curl --request POST \
  --url https://direct.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "glm-5.2",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself"
    }
  ]
}
'

{
  "id": "chatcmpl-a6613b56-c61c-94ba-9a9f-43d4cdc7d77a",
  "object": "chat.completion",
  "request_id": "req-7f3a2c1e8b9d4f0a",
  "created": 1777021417,
  "model": "glm-5.2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm GLM-5.2, and I can help you with a variety of tasks such as conversation, reasoning, writing, and code.",
        "reasoning_content": "Let me first analyze this problem...",
        "tool_calls": [
          {
            "id": "<string>",
            "function": {
              "name": "<string>",
              "arguments": "<string>"
            },
            "mcp": {
              "id": "<string>",
              "server_label": "<string>",
              "error": "<string>"
            }
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 346,
    "total_tokens": 370,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 321
    }
  },
  "web_search": [
    {
      "icon": "<string>",
      "title": "<string>",
      "link": "<string>",
      "media": "<string>",
      "publish_date": "<string>",
      "content": "<string>",
      "refer": "<string>"
    }
  ],
  "content_filter": [
    {
      "level": 1
    }
  ]
}

GLM-5.2 - OpenAI Compatible API

Use the OpenAI Chat Completions protocol to call the GLM-5.2 model
Synchronous processing mode, returning conversation content in real time
Plain-text conversation: Single-turn or multi-turn contextual dialogue
System prompts: Customize the AI’s role and behavior via role=system messages
Deep thinking: Toggle the chain of thought via thinking.type, and adjust reasoning intensity with reasoning_effort; the reasoning process is returned via reasoning_content
Streaming output: Supports SSE streaming responses (stream=true)
Tool calling: Supports Function Calling, knowledge base retrieval (retrieval), web search (web_search), and MCP (up to 128 tools)
Structured output: Enable JSON mode via response_format

Streaming response notes: When stream=true, the response is delivered as Server-Sent Events. Each event is formatted as data: {JSON}, and the stream ends with data: [DONE]. Each chunk (ChatCompletionChunk) contains id, created, model, and choices, plus optional usage and content_filter. Within a chunk, choices[].delta incrementally carries role, content, reasoning_content, and tool_calls, and choices[].finish_reason gives the termination reason in the final chunk.
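The streaming format described above can be consumed with a small line parser. The following is a minimal sketch, assuming only the chunk layout stated in the notes; the function name collect_stream and the sample lines are illustrative and were written by hand, not captured from the API.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> dict:
    """Accumulate SSE `data:` lines into final text, reasoning, and finish reason."""
    content, reasoning, finish_reason = [], [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            content.append(delta.get("content") or "")
            reasoning.append(delta.get("reasoning_content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return {
        "content": "".join(content),
        "reasoning_content": "".join(reasoning),
        "finish_reason": finish_reason,
    }


# Hand-written sample chunks for illustration only.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "reasoning_content": "Thinking..."}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
result = collect_stream(sample)
print(result["content"])  # Hello
```

In a real client the lines would come from the HTTP response body read line by line; tool_calls deltas are left out here to keep the sketch short.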

POST /v1/chat/completions

BaseURL: The default BaseURL is https://direct.evolink.ai, which has better support for text models and long-lived connections. https://api.evolink.ai is the primary endpoint for multimodal services and serves as a fallback address for text models.

Authorizations

Authorization

string

header

required

All interfaces require authentication using a Bearer Token.

Get an API Key:

Visit the API Key management page to obtain your API Key

Add it to the request header when using:

Authorization: Bearer YOUR_API_KEY
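As a rough sketch of how the header fits into a request, the snippet below builds (but does not send) the same POST call as the curl example using only the Python standard library. The helper name build_request is an assumption for illustration; the URL, headers, and body come from this page.

```python
import json
import urllib.request

BASE_URL = "https://direct.evolink.ai"  # default BaseURL from this page


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build, without sending, a POST request to /v1/chat/completions."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "YOUR_API_KEY",
    {
        "model": "glm-5.2",
        "messages": [{"role": "user", "content": "Please introduce yourself"}],
    },
)
# To actually send it: urllib.request.urlopen(req)
```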

Body

application/json

model

enum<string>

default:glm-5.2

required

The model code to call

glm-5.2: Latest flagship model, offering complex reasoning, ultra-long context, and exceptional inference speed

Available options:

glm-5.2

Example:

"glm-5.2"

messages

(System Message · object | User Message · object | Assistant Message · object | Tool Message · object)[]

required

The list of conversation messages, containing the full context of the current conversation

Supports four roles: system, user, assistant, and tool. Each role has its own field structure. The list must contain at least 1 message and cannot consist solely of system or assistant messages.

Minimum array length: 1
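The constraints above can be checked on the client before a request is sent. This is a minimal sketch: check_messages is a hypothetical helper, and it reads "cannot consist solely of system or assistant messages" as "at least one message must have the role user or tool", which is an interpretation rather than something this page spells out.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def check_messages(messages: list) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(messages) < 1:
        problems.append("messages must contain at least 1 message")
        return problems
    roles = [m.get("role") for m in messages]
    for role in roles:
        if role not in ALLOWED_ROLES:
            problems.append(f"unsupported role: {role!r}")
    # Assumed reading of the rule: at least one message must be user or tool.
    if all(role in ("system", "assistant") for role in roles):
        problems.append("messages cannot consist solely of system or assistant messages")
    return problems


conversation = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Please introduce yourself"},
]
print(check_messages(conversation))  # []
```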

System Message
User Message
Assistant Message
Tool Message

stream

boolean

default:false

Whether to enable streaming output mode

false: The model generates the complete response and returns it all at once (default), suitable for short text and batch processing
true: Returns chunks in real time via Server-Sent Events (SSE), suitable for chat and long text; returns data: [DONE] when the stream ends

Example:

false

thinking

object

Controls whether chain-of-thought reasoning is enabled

reasoning_effort

enum<string>

default:max

Controls the model's reasoning intensity (exclusive to GLM-5.2)

Notes:

Only takes effect when thinking is enabled; defaults to max
Values from strongest to weakest: max > xhigh > high > medium > low > minimal > none

GLM-5.2 mapping rules (for compatibility with other protocols):

xhigh → equivalent to max
low / medium → equivalent to high
none / minimal → skip thinking (no deep reasoning)
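The mapping rules above amount to a small lookup table. The sketch below restates them in code; effective_effort is a hypothetical helper name, and None is used here to stand for "thinking is skipped".

```python
# Accepted reasoning_effort values mapped to what GLM-5.2 effectively applies.
EFFECTIVE_EFFORT = {
    "max": "max",
    "xhigh": "max",    # xhigh is treated as max
    "high": "high",
    "medium": "high",  # low and medium are treated as high
    "low": "high",
    "minimal": None,   # none and minimal skip thinking
    "none": None,
}


def effective_effort(value: str = "max"):
    """Return the effective level, or None when thinking is skipped."""
    if value not in EFFECTIVE_EFFORT:
        raise ValueError(f"unsupported reasoning_effort: {value!r}")
    return EFFECTIVE_EFFORT[value]


print(effective_effort("xhigh"))   # max
print(effective_effort("medium"))  # high
print(effective_effort("none"))    # None
```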

Available options:

max,

xhigh,

high,

medium,

low,

minimal,

none

Example:

"max"

do_sample

boolean

default:true

Whether to enable the sampling strategy

true (default): Uses temperature / top_p for random sampling, producing more varied output
false: Always selects the highest-probability token (greedy decoding), producing more deterministic output; in this case temperature and top_p are ignored

For tasks requiring consistency and reproducibility (such as code generation and translation), setting this to false is recommended

Example:

true
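To make the difference between the two modes concrete, here is a toy sketch of token selection over a made-up probability table. It illustrates greedy decoding versus random sampling in general and is not a description of how the service implements either; pick_token and the table are assumptions for illustration.

```python
import random


def pick_token(probs: dict, do_sample: bool, rng: random.Random) -> str:
    """Pick the next token from a {token: probability} table."""
    if not do_sample:
        # Greedy decoding: always the single most likely token.
        return max(probs, key=probs.get)
    # Sampling: draw a token at random, weighted by its probability.
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


probs = {"cat": 0.6, "dog": 0.3, "fox": 0.1}  # made-up distribution
rng = random.Random(0)
greedy = [pick_token(probs, False, rng) for _ in range(5)]
sampled = [pick_token(probs, True, rng) for _ in range(5)]
print(greedy)  # ['cat', 'cat', 'cat', 'cat', 'cat']
```

With do_sample=false the result is the same on every call, which is why it suits tasks that need reproducibility.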

temperature

number<float>

default:1

Sampling temperature, controlling the randomness and creativity of the output

Notes:

Value range: [0.0, 1.0], limited to two decimal places
Higher values (e.g. 0.8): more random and creative, suitable for creative writing
Lower values (e.g. 0.2): more stable and deterministic, suitable for factual Q&A and code generation
GLM-5.2 default value: 1.0

Recommendation: Do not adjust both temperature and top_p at the same time

Required range: 0 <= x <= 1

Example:

1

top_p

number<float>

default:0.95

Nucleus Sampling parameter, an alternative to temperature sampling

Notes:

Value range: [0.01, 1.0], limited to two decimal places
The model samples only from the most likely tokens whose cumulative probability reaches top_p; for example, 0.1 means only the tokens making up the top 10% of the probability mass are considered
Smaller values produce more focused and consistent output; larger values increase diversity
GLM-5.2 default value: 0.95

Recommendation: Do not adjust both temperature and top_p at the same time

Required range: 0.01 <= x <= 1

Example:

0.95
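The cumulative-probability rule for top_p can be illustrated with a toy example. The sketch below shows nucleus filtering in general terms over a made-up distribution; the function name nucleus is an assumption, and the service's actual implementation is not documented on this page.

```python
def nucleus(probs: dict, top_p: float) -> list:
    """Return the smallest set of most-likely tokens whose cumulative
    probability reaches top_p."""
    kept, cumulative = [], 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


probs = {"cat": 0.6, "dog": 0.3, "fox": 0.08, "emu": 0.02}  # made-up distribution
print(nucleus(probs, 0.1))   # ['cat']
print(nucleus(probs, 0.95))  # ['cat', 'dog', 'fox']
```

A small top_p keeps only the leading candidate or two, which is why lower values give more focused output.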

max_tokens

integer

The maximum number of tokens the model may generate in its output

Notes:

GLM-5.2 supports an output length of up to 131,072 tokens (128K); a value of at least 1024 is recommended
When thinking is enabled, chain-of-thought tokens are also counted toward this limit
If generation is truncated due to length, try increasing this value

Required range: 1 <= x <= 131072

Example:

1024
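Because chain-of-thought tokens count toward max_tokens, it can help to see how much of a completion was visible answer text. The sketch below uses the usage object from the example response on this page and assumes that reasoning_tokens is included in completion_tokens, which is consistent with that example (24 + 346 = 370) but not stated explicitly. visible_tokens is a hypothetical helper.

```python
def visible_tokens(usage: dict) -> int:
    """Completion tokens left for the answer after subtracting reasoning tokens."""
    details = usage.get("completion_tokens_details") or {}
    return usage["completion_tokens"] - details.get("reasoning_tokens", 0)


# The usage object from the example response on this page.
usage = {
    "prompt_tokens": 24,
    "completion_tokens": 346,
    "total_tokens": 370,
    "prompt_tokens_details": {"cached_tokens": 0},
    "completion_tokens_details": {"reasoning_tokens": 321},
}
print(visible_tokens(usage))  # 25  (346 - 321)
```

In that example most of the budget went to reasoning, so a max_tokens value sized only for the visible answer would likely truncate the output.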

tools

(Function Tool · object | Retrieval Tool (Knowledge Base Retrieval) · object | Web Search Tool (Web Search) · object | MCP Tool · object)[]

The list of tools the model can call

Notes:

Supports function calling (function), knowledge base retrieval (retrieval), web search (web_search), and MCP (mcp)
Supports up to 128 tools

Maximum array length: 128

Function Tool
Retrieval Tool (Knowledge Base Retrieval)
Web Search Tool (Web Search)
MCP Tool

tool_choice

enum<string>

default:auto

Controls how the model selects which function to call

Notes: Only takes effect for tools of type function. The only supported value, which is also the default, is auto (the model decides on its own whether to call a tool)

Available options:

auto
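A request that offers a function tool might be assembled as sketched below. The child attributes of the tool objects are not shown on this page, so the shape used here follows the OpenAI Chat Completions convention and is an assumption; get_weather and its parameters are hypothetical.

```python
MAX_TOOLS = 128  # maximum array length stated for the tools field


def function_tool(name: str, description: str, parameters: dict) -> dict:
    """Build a function tool entry (assumed OpenAI-style shape)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


def with_tools(payload: dict, tools: list) -> dict:
    """Return a copy of the payload with tools attached and tool_choice set to auto."""
    if len(tools) > MAX_TOOLS:
        raise ValueError(f"at most {MAX_TOOLS} tools are supported")
    return {**payload, "tools": tools, "tool_choice": "auto"}


weather = function_tool(
    "get_weather",  # hypothetical function name
    "Look up the current weather for a city",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
payload = with_tools(
    {"model": "glm-5.2", "messages": [{"role": "user", "content": "Weather in Paris?"}]},
    [weather],
)
```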

Example:

"auto"

stop

string[]

The list of stop words

Notes:

When the generated text encounters a specified string, generation stops immediately (the stop word itself is not included in the returned text)
Currently only a single stop word is supported, in the format ["stop_word1"], for example ["Human:"], even though the array schema accepts up to 4 entries

Maximum array length: 4
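The effect of a stop word can be pictured with a client-side emulation. This is only a sketch of the behavior described above (generation ends at the stop word, which is not returned); apply_stop is a hypothetical helper, and the truncation itself happens on the server.

```python
def apply_stop(text: str, stop: list) -> str:
    """Cut text at the first occurrence of the stop word, excluding the stop word."""
    if len(stop) != 1:
        raise ValueError("only a single stop word is currently supported")
    cut = text.find(stop[0])
    return text if cut == -1 else text[:cut]


print(repr(apply_stop("Sure, here it is.\nHuman: thanks", ["Human:"])))
# 'Sure, here it is.\n'
```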

Example:

["Human:"]

response_format

object

Specifies the model's response output format; defaults to text

Notes:

{ "type": "json_object" } enables JSON mode, and the model returns valid JSON-formatted data, suitable for scenarios such as structured data extraction
When using JSON mode, it is recommended to explicitly request JSON output in the system or user message
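A JSON-mode request and the handling of its reply could look like the sketch below. The response_format value and the advice to ask for JSON in the prompt come from the notes above; the mock response is hand-written for illustration, and parse_json_content is a hypothetical helper.

```python
import json

payload = {
    "model": "glm-5.2",
    "messages": [
        # The notes recommend asking for JSON explicitly in the prompt.
        {"role": "system", "content": "Reply with a JSON object with keys city and country."},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
    "response_format": {"type": "json_object"},
}


def parse_json_content(response: dict) -> dict:
    """Decode the assistant message content of a JSON-mode response."""
    return json.loads(response["choices"][0]["message"]["content"])


# Hand-written mock of a response body, not real API output.
mock_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": json.dumps({"city": "Paris", "country": "France"}),
            },
            "finish_reason": "stop",
        }
    ]
}
data = parse_json_content(mock_response)
print(data["city"])  # Paris
```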

request_id

string

Unique request identifier

Notes:

Passed by the client, 6-64 characters long; using UUID format is recommended to ensure uniqueness
If not provided, the platform will generate one automatically

Required string length: 6 - 64

Example:

"req-7f3a2c1e8b9d4f0a"

user_id

string

Unique identifier of the end user

Notes: 6-128 characters long; using a unique identifier that does not contain sensitive information is recommended, which can help the platform monitor and detect abusive behavior

Required string length: 6 - 128
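Both identifiers have length limits that are easy to check before sending. The sketch below generates a UUID-based request_id, as the notes recommend, and verifies the stated ranges; the req- prefix mirrors the example value on this page, and the helper names are assumptions.

```python
import uuid


def new_request_id() -> str:
    """Generate a UUID-based request_id: 'req-' plus 32 hex characters (36 total)."""
    return f"req-{uuid.uuid4().hex}"


def length_ok(value: str, minimum: int, maximum: int) -> bool:
    """Check that a string's length falls inside an inclusive range."""
    return minimum <= len(value) <= maximum


request_id = new_request_id()
user_id = "user-abc123456"  # example value from this page

print(length_ok(request_id, 6, 64))  # True
print(length_ok(user_id, 6, 128))    # True
```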

Example:

"user-abc123456"

Response

Conversation generated successfully

id

string

Task ID

Example:

"chatcmpl-a6613b56-c61c-94ba-9a9f-43d4cdc7d77a"

object

enum<string>

Response type

Available options:

chat.completion

Example:

"chat.completion"

request_id

string

Request ID (returned when request_id is provided in the request)

Example:

"req-7f3a2c1e8b9d4f0a"

created

integer

Request creation time, Unix timestamp (seconds)

Example:

1777021417

model

string

Model name

Example:

"glm-5.2"

choices

object[]

The list of model responses

usage

object

Token usage statistics returned when the call ends

web_search

object[]

Web search-related information, returned when the web_search tool is used and a search is triggered

content_filter

object[]

Content safety-related information
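The response fields listed above can be read with a few dictionary lookups. The following sketch pulls out the parts most clients need; summarize is a hypothetical helper, and the sample is a trimmed copy of the example response on this page with the placeholder tool_calls, web_search, and content_filter entries left out.

```python
def summarize(response: dict) -> dict:
    """Extract the answer, reasoning, finish reason, tool names, and token total."""
    choice = response["choices"][0]
    message = choice["message"]
    tool_names = [
        (call.get("function") or {}).get("name")
        for call in message.get("tool_calls") or []
    ]
    return {
        "content": message.get("content"),
        "reasoning_content": message.get("reasoning_content"),
        "finish_reason": choice.get("finish_reason"),
        "tool_names": tool_names,
        "total_tokens": (response.get("usage") or {}).get("total_tokens"),
    }


# Trimmed copy of the example response shown on this page.
sample = {
    "id": "chatcmpl-a6613b56-c61c-94ba-9a9f-43d4cdc7d77a",
    "object": "chat.completion",
    "model": "glm-5.2",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I'm GLM-5.2, and I can help you with a variety of tasks such as conversation, reasoning, writing, and code.",
                "reasoning_content": "Let me first analyze this problem...",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 346, "total_tokens": 370},
}
summary = summarize(sample)
print(summary["finish_reason"], summary["total_tokens"])  # stop 370
```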
