GLM-5.2 - OpenAI Compatible API
- Use the OpenAI Chat Completions protocol to call the GLM-5.2 model
- Synchronous processing mode, returning conversation content in real time
- Plain-text conversation: Single-turn or multi-turn contextual dialogue
- System prompts: Customize the AI’s role and behavior via
role=systemmessages - Deep thinking: Toggle the chain of thought via
thinking.type, and adjust reasoning intensity withreasoning_effort; the reasoning process is returned viareasoning_content - Streaming output: Supports SSE streaming responses (
stream=true) - Tool calling: Supports Function Calling, knowledge base retrieval (retrieval), web search (web_search), and MCP (up to 128 tools)
- Structured output: Enable JSON mode via
response_format
Streaming response notes: When stream=true, responses are returned via Server-Sent Events, with each message formatted as data: {JSON}, and data: [DONE] returned at the end. Each data chunk (ChatCompletionChunk) contains id, created, model, choices, and optional usage and content_filter; within it, choices[].delta incrementally returns role / content / reasoning_content / tool_calls, and choices[].finish_reason provides the termination reason in the final chunk.
https://direct.evolink.ai, which has better support for text models and long-lived connections. https://api.evolink.ai is the primary endpoint for multimodal services and serves as a fallback address for text models.Authorizations
##All interfaces require authentication using a Bearer Token##
Get an API Key:
Visit the API Key management page to obtain your API Key
Add it to the request header when using:
Authorization: Bearer YOUR_API_KEYBody
The model code to call
glm-5.2: Latest flagship model, offering complex reasoning, ultra-long context, and exceptional inference speed
glm-5.2 "glm-5.2"
The list of conversation messages, containing the full context of the current conversation
Supports four roles: system, user, assistant, tool. Messages of different roles have different field structures; please select the corresponding role to view details. Must contain at least 1 message, and cannot consist solely of system or assistant messages.
1- System Message
- User Message
- Assistant Message
- Tool Message
Whether to enable streaming output mode
false: The model generates the complete response and returns it all at once (default), suitable for short text and batch processingtrue: Returns chunks in real time via Server-Sent Events (SSE), suitable for chat and long text; returnsdata: [DONE]when the stream ends
false
Controls whether to enable the chain of thought (Chain of Thought)
Controls the model's reasoning intensity (exclusive to GLM-5.2)
Notes:
- Only takes effect when
thinkingis enabled; defaults tomax - Values from strongest to weakest:
max>xhigh>high>medium>low>minimal>none
GLM-5.2 mapping rules (for compatibility with other protocols):
xhigh→ equivalent tomaxlow/medium→ equivalent tohighnone/minimal→ skip thinking (no deep reasoning)
max, xhigh, high, medium, low, minimal, none "max"
Whether to enable the sampling strategy
true(default): Usestemperature/top_pfor random sampling, producing more varied outputfalse: Always selects the highest-probability token (greedy decoding), producing more deterministic output; in this casetemperatureandtop_pare ignored
For tasks requiring consistency and reproducibility (such as code generation and translation), setting this to false is recommended
true
Sampling temperature, controlling the randomness and creativity of the output
Notes:
- Value range:
[0.0, 1.0], limited to two decimal places - Higher values (e.g. 0.8): more random and creative, suitable for creative writing
- Lower values (e.g. 0.2): more stable and deterministic, suitable for factual Q&A and code generation
- GLM-5.2 default value:
1.0
Recommendation: Do not adjust both temperature and top_p at the same time
0 <= x <= 11
Nucleus Sampling parameter, an alternative to temperature sampling
Notes:
- Value range:
[0.01, 1.0], limited to two decimal places - The model only considers candidate tokens whose cumulative probability reaches
top_p; for example, 0.1 means only the top 10% probability tokens are considered - Smaller values produce more focused and consistent output; larger values increase diversity
- GLM-5.2 default value:
0.95
Recommendation: Do not adjust both temperature and top_p at the same time
0.01 <= x <= 10.95
The maximum number of tokens limit for the model's output
Notes:
- GLM-5.2 supports up to 131,072 tokens (128K) of output length; setting no less than
1024is recommended - When
thinkingis enabled, chain-of-thought tokens are also counted toward this limit - If generation is truncated due to
length, try increasing this value
1 <= x <= 1310721024
The list of tools the model can call
Notes:
- Supports function calling (
function), knowledge base retrieval (retrieval), web search (web_search), and MCP (mcp) - Supports up to 128 functions
128- Function Tool
- Retrieval Tool (Knowledge Base Retrieval)
- Web Search Tool (Web Search)
- MCP Tool
Controls how the model selects which function to call
Notes: Only takes effect when the tool type is function; defaults to and only supports auto (the model automatically decides whether to call a tool)
auto "auto"
The list of stop words
Notes:
- When the generated text encounters a specified string, generation stops immediately (the stop word itself is not included in the returned text)
- Currently only a single stop word is supported, in the format
["stop_word1"], for example["Human:"]
4["Human:"]Specifies the model's response output format; defaults to text
Notes:
{ "type": "json_object" }enables JSON mode, and the model returns valid JSON-formatted data, suitable for scenarios such as structured data extraction- When using JSON mode, it is recommended to explicitly request JSON output in the
systemorusermessage
Unique request identifier
Notes:
- Passed by the client, 6-64 characters long; using UUID format is recommended to ensure uniqueness
- If not provided, the platform will generate one automatically
6 - 64"req-7f3a2c1e8b9d4f0a"
Unique identifier of the end user
Notes: 6-128 characters long; using a unique identifier that does not contain sensitive information is recommended, which can help the platform monitor and detect abusive behavior
6 - 128"user-abc123456"
Response
Conversation generated successfully
Task ID
"chatcmpl-a6613b56-c61c-94ba-9a9f-43d4cdc7d77a"
Response type
chat.completion "chat.completion"
Request ID (returned when request_id is provided in the request)
"req-7f3a2c1e8b9d4f0a"
Request creation time, Unix timestamp (seconds)
1777021417
Model name
"glm-5.2"
The list of model responses
Token usage statistics returned when the call ends
Web search-related information, returned when the web_search tool is used and a search is triggered
Content safety-related information