GLM-5.2 - Anthropic Compatible API
- Call the GLM-5.2 model via the Anthropic Messages protocol
- Request / response structures aligned with the Anthropic API
- System prompt: Passed via the top-level
systemfield - Thinking mode: GLM-5.2 has thinking enabled by default; thinking content is returned via
content[type=thinking]blocks; passthinking.type=disabledto turn it off - Streaming output: SSE event stream
- Tool calling: Compatible with the Anthropic
tool_use/tool_resultflow - ⚠️ No multimodal support: GLM-5.2 is a text-only model, and image / video content blocks are ignored
https://direct.evolink.ai, which has better support for text models and long-lived connections. https://api.evolink.ai is the primary endpoint for multimodal services and serves as a fallback address for text models.Authorizations
##All interfaces require authentication using a Bearer Token##
Get an API Key:
Visit the API Key management page to obtain your API Key
Add it to the request header when using:
Authorization: Bearer YOUR_API_KEYNote: EvoLink uses Bearer Token authentication uniformly for /v1/messages.
Body
The model to call
glm-5.2 "glm-5.2"
The list of conversation messages, alternating between user / assistant by turn
Notes:
- Must contain at least 1 message
- The last message is usually
role=user - Multi-turn context is supported, and the model references prior messages
1The upper limit on the length of generated content (number of tokens)
Notes:
- Tokens produced by thinking also count toward this limit
- When the limit is reached, content is truncated and the response returns
stop_reason=max_tokens
x >= 11024
System prompt, used to set the AI's role and behavior
Notes:
- Supports a string or an array of content blocks
- Passed via the top-level
systemfield (do not place it inside messages) - The model follows the system constraints
- ⚠️ An overly long system may be truncated: For long context, place it in
messagesrather than piling everything intosystem
"You are a helpful assistant."
Sampling temperature
Notes:
- Higher values make output more varied, lower values more deterministic
- Recommended range
[0, 1]
0 <= x <= 11
Nucleus sampling threshold
Notes:
- Range
[0, 1] - It is recommended not to adjust temperature and top_p at the same time
0 <= x <= 10.9
Sample only from the K highest-probability tokens (an Anthropic-specific parameter)
Notes:
- Smaller values make output more deterministic, larger values make candidates more diverse
x >= 010
Custom stop sequences: generation stops when it hits any of these strings
Notes:
- Hitting one truncates output, and content before the hit is returned normally
- ⚠️ Note: When a stop sequence is hit, GLM-5.2 returns
stop_reasonasend_turn(rather than the Anthropic-standardstop_sequence), and the response does not include astop_sequencefield. If your client relies onstop_reason=="stop_sequence"to detect a hit, special handling is required
["\n\n"]Whether to return via SSE streaming
true: Server-Sent Events streaming (standard Anthropic event sequence: message_start / content_block_start / content_block_delta / message_delta / message_stop)false: Returns the complete response all at once (default)
false
Controls deep thinking
Notes:
- GLM-5.2 is a reasoning model, and thinking is enabled by default when this field is not passed
- When enabled, the response
contentarray includes atype="thinking"reasoning-process block (billed as output tokens, andsignaturemay be an empty string) - Pass
{"type":"disabled"}to turn off thinking, significantly reducing output tokens - ⚠️ Only the binary
typetoggle takes effect: thinking budget / level parameters such asbudget_tokensandefforthave no effect (they are ignored), so the amount of thinking cannot be finely controlled
The list of tool definitions
Notes:
- Follows the Anthropic tool definition spec
input_schemauses a JSON Schema object- The model returns standard
tool_useblocks withstop_reason=tool_use
Tool selection strategy
Request metadata
Response
Message object
Anthropic-style message response
The message's unique ID (format: msg_<uuid>)
Response object type
message assistant The model actually used
"glm-5.2"
The list of response content blocks
Possible block types:
thinking: the reasoning process (when thinking is enabled, which is the default)text: the final answer texttool_use: a tool call initiated by the model
Stop reason
end_turn: natural completion (⚠️ also returned when stop_sequences is hit)max_tokens: reached the max_tokens limittool_use: the model triggered a tool call
end_turn, max_tokens, tool_use Token usage statistics (Anthropic spec)