POST /v1/messages
curl --request POST \
  --url https://direct.evolink.ai/v1/messages \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "glm-5.2",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "Hello, world"
    }
  ]
}
'
{
  "id": "msg_0842a705-9d0b-4eaa-b12d-09a4106326c5",
  "type": "message",
  "role": "assistant",
  "model": "glm-5.2",
  "content": [
    {
      "type": "thinking",
      "thinking": "The user asked to greet them with one word, so answering \"Hi\" will do.",
      "signature": ""
    },
    {
      "type": "text",
      "text": "Hi."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 18,
    "output_tokens": 101,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}
BaseURL: The default BaseURL is https://direct.evolink.ai, which has better support for text models and long-lived connections. https://api.evolink.ai is the primary endpoint for multimodal services and serves as a fallback address for text models.
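
The default/fallback relationship between the two hosts can be sketched as a small helper that lists candidate endpoints in order. This is a minimal sketch: the two hosts come from the note above, while the helper name `candidate_urls` and the "default first, then fallback" ordering policy are assumptions, not documented EvoLink behavior.

```python
# Hosts are taken from the BaseURL note; the try-in-order policy is an assumption.
DEFAULT_BASE_URL = "https://direct.evolink.ai"
FALLBACK_BASE_URL = "https://api.evolink.ai"


def candidate_urls(path: str = "/v1/messages") -> list[str]:
    """Return full endpoint URLs in the order a text-model client might try them."""
    return [base + path for base in (DEFAULT_BASE_URL, FALLBACK_BASE_URL)]


print(candidate_urls())
```

A client could iterate over this list and move to the next entry only when a connection error occurs; when to retry is left to the caller.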

Authorizations

Authorization
string
header
required

All interfaces require authentication using a Bearer Token.

Get an API Key:

Visit the API Key management page to obtain your API Key

Add it to the request header:

Authorization: Bearer YOUR_API_KEY

Note: EvoLink uses Bearer Token authentication uniformly for /v1/messages.
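
As an illustration of the header format above, the following sketch builds (but does not send) a request with Python's standard library. The function name `build_request` is hypothetical, and `YOUR_API_KEY` is a placeholder for a real key.

```python
import json
import urllib.request


def build_request(api_key: str, payload: dict,
                  base_url: str = "https://direct.evolink.ai") -> urllib.request.Request:
    """Build a POST request to /v1/messages with Bearer Token authentication."""
    return urllib.request.Request(
        url=base_url + "/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("YOUR_API_KEY", {
    "model": "glm-5.2",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, world"}],
})
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```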

Body

application/json
model
enum<string>
required

The model to call

Available options:
glm-5.2
Example:

"glm-5.2"

messages
object[]
required

The list of conversation messages, alternating between user and assistant turns

Notes:

  • Must contain at least 1 message
  • The last message is usually role=user
  • Multi-turn context is supported, and the model references prior messages
Minimum array length: 1
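
A multi-turn messages array and a client-side check of the constraints above might look like the sketch below. The helper `validate_messages` is hypothetical and the conversation text is illustrative. It checks only the minimum length and the role values; whether the server strictly enforces alternation is not stated here, so the sketch does not enforce it either.

```python
def validate_messages(messages: list[dict]) -> None:
    """Client-side sanity check; a sketch, not the server's validator."""
    if len(messages) < 1:
        raise ValueError("messages must contain at least 1 message")
    for msg in messages:
        if msg.get("role") not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {msg.get('role')!r}")


# Illustrative multi-turn history: earlier turns give the model context.
history = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And its population?"},
]
validate_messages(history)
```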
max_tokens
integer

The upper limit on the length of generated content (number of tokens)

Notes:

  • Tokens produced by thinking also count toward this limit
  • When the limit is reached, content is truncated and the response returns stop_reason=max_tokens
Required range: x >= 1
Example:

1024

system

System prompt, used to set the AI's role and behavior

Notes:

  • Supports a string or an array of content blocks
  • Passed via the top-level system field (do not place it inside messages)
  • The model follows the system constraints
  • ⚠️ An overly long system prompt may be truncated. For long context, place the material in messages rather than packing everything into system
Example:

"You are a helpful assistant."

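To illustrate the top-level placement of system, the sketch below adds the prompt to a request payload and rejects a system-role entry inside messages. The helper name `with_system` is hypothetical, and raising an error for a misplaced system message is this sketch's choice rather than documented server behavior.

```python
def with_system(payload: dict, system_prompt: str) -> dict:
    """Return a copy of payload with the prompt in the top-level system field."""
    if any(m.get("role") == "system" for m in payload.get("messages", [])):
        raise ValueError("use the top-level 'system' field, not a message")
    return {**payload, "system": system_prompt}


payload = with_system(
    {
        "model": "glm-5.2",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello, world"}],
    },
    "You are a helpful assistant.",
)
```
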
temperature
number

Sampling temperature

Notes:

  • Higher values make output more varied, lower values more deterministic
  • Recommended range [0, 1]
Required range: 0 <= x <= 1
Example:

1

top_p
number

Nucleus sampling threshold

Notes:

  • Range [0, 1]
  • It is recommended not to adjust temperature and top_p at the same time
Required range: 0 <= x <= 1
Example:

0.9
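
The ranges for temperature and top_p, and the advice not to tune both at once, can be checked on the client before sending. This is a minimal sketch: `sampling_params` is a hypothetical helper, and issuing a warning (rather than an error) when both are set is an assumption, since the text above only gives a recommendation.

```python
import warnings


def sampling_params(temperature=None, top_p=None) -> dict:
    """Validate the documented [0, 1] ranges and return only the values that were set."""
    params = {}
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is None:
            continue
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        params[name] = value
    if len(params) == 2:
        # Only a recommendation in the docs, so warn instead of failing.
        warnings.warn("adjusting temperature and top_p together is not recommended")
    return params


print(sampling_params(temperature=0.2))
```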

top_k
integer

Sample only from the K highest-probability tokens (an Anthropic-specific parameter)

Notes:

  • Smaller values make output more deterministic, larger values make candidates more diverse
Required range: x >= 0
Example:

10

stop_sequences
string[]

Custom stop sequences: generation stops when it hits any of these strings

Notes:

  • Hitting one truncates output, and content before the hit is returned normally
  • ⚠️ Note: When a stop sequence is hit, GLM-5.2 returns stop_reason as end_turn (rather than the Anthropic-standard stop_sequence), and the response does not include a stop_sequence field. If your client relies on stop_reason=="stop_sequence" to detect a hit, special handling is required
Example:
["\n\n"]
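
Because a stop-sequence hit is reported as end_turn, a client cannot tell the two cases apart from the response alone. One possible form of the special handling mentioned above is to flag the result as ambiguous whenever the request included stop_sequences. The function name `classify_stop` and the label `end_turn_or_stop_sequence` are hypothetical.

```python
def classify_stop(request_payload: dict, response: dict) -> str:
    """Map stop_reason to a client-side label, marking the ambiguous end_turn case."""
    reason = response.get("stop_reason") or "unknown"
    if reason == "end_turn" and request_payload.get("stop_sequences"):
        # Either natural completion or a stop-sequence hit; the response does not say which.
        return "end_turn_or_stop_sequence"
    return reason
```

Code that previously branched on stop_reason == "stop_sequence" could branch on this label instead, treating it as "possibly stopped early".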
stream
boolean
default:false

Whether to return via SSE streaming

  • true: Server-Sent Events streaming (standard Anthropic event sequence: message_start / content_block_start / content_block_delta / message_delta / message_stop)
  • false: Returns the complete response all at once (default)
Example:

false
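
With stream set to true, a client reads the event sequence line by line. The sketch below assembles the answer text from already-decoded SSE lines. It assumes each data payload is a JSON object with a type field and that text arrives as text_delta deltas, as in the Anthropic streaming format; this page lists only the event names, so the payload shape is an assumption. `collect_text` is a hypothetical helper.

```python
import json


def collect_text(sse_lines) -> str:
    """Concatenate text deltas from an iterable of decoded SSE lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore non-JSON payloads
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif event.get("type") == "message_stop":
            break
    return "".join(parts)
```

Thinking deltas are skipped here; a client that wants the reasoning text would collect those in the same loop.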

thinking
object

Controls deep thinking

Notes:

  • GLM-5.2 is a reasoning model, and thinking is enabled by default when this field is not passed
  • When enabled, the response content array includes a type="thinking" reasoning-process block (billed as output tokens, and signature may be an empty string)
  • Pass {"type":"disabled"} to turn off thinking, significantly reducing output tokens
  • ⚠️ Only the binary type toggle takes effect: thinking budget / level parameters such as budget_tokens and effort have no effect (they are ignored), so the amount of thinking cannot be finely controlled
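
Since only the on/off toggle takes effect, a request builder needs just one flag. In the sketch below, enabling thinking means omitting the field and relying on the documented default, and disabling it sends {"type":"disabled"}. The helper name `apply_thinking` is hypothetical.

```python
def apply_thinking(payload: dict, enabled: bool = True) -> dict:
    """Return a copy of payload with thinking left at its default or explicitly disabled."""
    out = {k: v for k, v in payload.items() if k != "thinking"}
    if not enabled:
        # Parameters such as budget_tokens are ignored, so none are sent.
        out["thinking"] = {"type": "disabled"}
    return out


fast = apply_thinking(
    {"model": "glm-5.2", "max_tokens": 1024,
     "messages": [{"role": "user", "content": "Hello, world"}]},
    enabled=False,
)
```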
tools
object[]

The list of tool definitions

Notes:

  • Follows the Anthropic tool definition spec
  • input_schema uses a JSON Schema object
  • The model returns standard tool_use blocks with stop_reason=tool_use
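
A tool definition and the matching response handling might look like the sketch below. The `get_weather` tool is hypothetical, and the name / description / input_schema layout, along with the id / name / input fields of a tool_use block, follows the Anthropic tool spec that this endpoint is described as following.

```python
# Hypothetical tool definition in the Anthropic layout.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def tool_calls(response: dict) -> list[dict]:
    """Return the tool_use blocks of a response, or an empty list if none were requested."""
    if response.get("stop_reason") != "tool_use":
        return []
    return [b for b in response.get("content", []) if b.get("type") == "tool_use"]
```

The definition would be passed as "tools": [weather_tool] in the request body; each returned block then carries the arguments the model chose in its input field.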
tool_choice
object

Tool selection strategy

metadata
object

Request metadata

Response

Message object

Anthropic-style message response

id
string

The message's unique ID (format: msg_<uuid>)

type
enum<string>

Response object type

Available options:
message
role
enum<string>
Available options:
assistant
model
string

The model actually used

Example:

"glm-5.2"

content
object[]

The list of response content blocks

Possible block types:

  • thinking: the reasoning process (when thinking is enabled, which is the default)
  • text: the final answer text
  • tool_use: a tool call initiated by the model
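
Because thinking blocks appear by default, code that reads only content[0] may pick up the reasoning instead of the answer. A minimal sketch of separating the two, using the block fields shown in the sample response at the top of this page (`split_content` is a hypothetical helper):

```python
def split_content(response: dict) -> tuple[str, str]:
    """Return (thinking, text) by joining the blocks of each type."""
    thinking, text = [], []
    for block in response.get("content", []):
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            text.append(block.get("text", ""))
    return "".join(thinking), "".join(text)
```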
stop_reason
enum<string>

Stop reason

  • end_turn: natural completion (⚠️ also returned when stop_sequences is hit)
  • max_tokens: reached the max_tokens limit
  • tool_use: the model triggered a tool call
Available options:
end_turn,
max_tokens,
tool_use
usage
object

Token usage statistics (Anthropic spec)