POST /v1/chat/completions
curl --request POST \
  --url https://direct.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "glm-5.2",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself"
    }
  ]
}
'
{
  "id": "chatcmpl-a6613b56-c61c-94ba-9a9f-43d4cdc7d77a",
  "object": "chat.completion",
  "request_id": "req-7f3a2c1e8b9d4f0a",
  "created": 1777021417,
  "model": "glm-5.2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm GLM-5.2, and I can help you with a variety of tasks such as conversation, reasoning, writing, and code.",
        "reasoning_content": "Let me first analyze this problem...",
        "tool_calls": [
          {
            "id": "<string>",
            "function": {
              "name": "<string>",
              "arguments": "<string>"
            },
            "mcp": {
              "id": "<string>",
              "server_label": "<string>",
              "error": "<string>"
            }
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 346,
    "total_tokens": 370,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 321
    }
  },
  "web_search": [
    {
      "icon": "<string>",
      "title": "<string>",
      "link": "<string>",
      "media": "<string>",
      "publish_date": "<string>",
      "content": "<string>",
      "refer": "<string>"
    }
  ],
  "content_filter": [
    {
      "level": 1
    }
  ]
}
BaseURL: The default BaseURL is https://direct.evolink.ai, which offers better support for text models and long-lived connections. https://api.evolink.ai is the primary endpoint for multimodal services and also serves as a fallback address for text models.
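
The fallback arrangement described above could be handled on the client roughly as follows. This is a minimal sketch, not an official client: `call_with_fallback` and the injected `send` callable are hypothetical names, and it assumes the fallback host serves the same /v1/chat/completions path.

```python
from typing import Callable, Sequence

# Default host first, fallback host second (both taken from this page).
BASE_URLS = ("https://direct.evolink.ai", "https://api.evolink.ai")


def call_with_fallback(
    send: Callable[[str], dict],
    base_urls: Sequence[str] = BASE_URLS,
) -> dict:
    """Try each base URL in order and return the first successful result.

    `send` is a hypothetical caller-supplied function that performs the
    HTTP request for a full endpoint URL and returns the decoded JSON.
    """
    last_error = None
    for base_url in base_urls:
        try:
            # Assumption: the same path is served on both hosts.
            return send(f"{base_url}/v1/chat/completions")
        except OSError as exc:  # urllib.error.URLError subclasses OSError
            last_error = exc
    raise RuntimeError("all base URLs failed") from last_error
```

Injecting `send` keeps the retry logic independent of any particular HTTP library.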

Authorizations

Authorization
string
header
required

All interfaces require authentication using a Bearer token.

Get an API Key:

Visit the API Key management page to obtain your API Key

Include it in the request header of every call:

Authorization: Bearer YOUR_API_KEY
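
As an illustration of the header above, the following sketch builds the same request as the curl example using only the Python standard library. The helper name `build_chat_request` is hypothetical, and the request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://direct.evolink.ai"  # default base URL named on this page


def build_chat_request(
    api_key: str, payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request to /v1/chat/completions."""
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; reading the API key from an environment variable rather than hard-coding it is usually the safer choice.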

Body

application/json
model
enum<string>
default:glm-5.2
required

The model code to call

  • glm-5.2: Latest flagship model, offering complex reasoning, ultra-long context, and exceptional inference speed
Available options:
glm-5.2
Example:

"glm-5.2"

messages
(System Message · object | User Message · object | Assistant Message · object | Tool Message · object)[]
required

The list of conversation messages, containing the full context of the current conversation

Supports four roles: system, user, assistant, and tool. Each role has its own field structure. The array must contain at least 1 message and cannot consist solely of system or assistant messages.

Minimum array length: 1
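
The constraints on messages can be checked before sending. The sketch below is one possible reading of them: it assumes that "cannot consist solely of system or assistant messages" means at least one user or tool message must be present. `validate_messages` is a hypothetical helper, not part of the API.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages: list) -> None:
    """Raise ValueError if the messages array breaks the documented rules."""
    if not messages:
        raise ValueError("messages must contain at least 1 message")
    roles = [m.get("role") for m in messages]
    unknown = [r for r in roles if r not in ALLOWED_ROLES]
    if unknown:
        raise ValueError(f"unsupported roles: {unknown}")
    # Assumed interpretation: a user or tool message must be present.
    if all(r in {"system", "assistant"} for r in roles):
        raise ValueError(
            "messages cannot consist solely of system or assistant messages"
        )
```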
stream
boolean
default:false

Whether to enable streaming output mode

  • false: The model generates the complete response and returns it all at once (default), suitable for short text and batch processing
  • true: Returns chunks in real time via Server-Sent Events (SSE), suitable for chat and long text; returns data: [DONE] when the stream ends
Example:

false
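
When stream is true, the client has to read SSE lines until data: [DONE] arrives. A minimal sketch of that loop is shown here; it only decodes each data: payload as JSON and makes no assumption about the fields inside a chunk, which this page does not describe. `iter_sse_payloads` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream marker described above
        yield json.loads(data)
```

The function accepts any iterable of text lines, so it can be fed from a decoded HTTP response body or from a list in tests.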

thinking
object

Controls whether chain-of-thought reasoning is enabled

reasoning_effort
enum<string>
default:max

Controls the model's reasoning intensity (exclusive to GLM-5.2)

Notes:

  • Only takes effect when thinking is enabled; defaults to max
  • Values from strongest to weakest: max > xhigh > high > medium > low > minimal > none

GLM-5.2 mapping rules (for compatibility with other protocols):

  • xhigh → equivalent to max
  • low / medium → equivalent to high
  • none / minimal → skip thinking (no deep reasoning)
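
The mapping rules above can be written out as a lookup table. This is only an illustration of the documented equivalences; the function name and the use of `None` to represent "skip thinking" are choices made for this sketch.

```python
# Requested value -> level GLM-5.2 effectively applies (None = skip thinking).
EFFECTIVE_EFFORT = {
    "max": "max",
    "xhigh": "max",    # xhigh is equivalent to max
    "high": "high",
    "medium": "high",  # low / medium are equivalent to high
    "low": "high",
    "minimal": None,   # none / minimal skip thinking
    "none": None,
}


def effective_reasoning_effort(value: str = "max"):
    """Return the effective level for a requested reasoning_effort value."""
    if value not in EFFECTIVE_EFFORT:
        raise ValueError(f"unsupported reasoning_effort: {value!r}")
    return EFFECTIVE_EFFORT[value]
```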
Available options:
max,
xhigh,
high,
medium,
low,
minimal,
none
Example:

"max"

do_sample
boolean
default:true

Whether to enable the sampling strategy

  • true (default): Uses temperature / top_p for random sampling, producing more varied output
  • false: Always selects the highest-probability token (greedy decoding), producing more deterministic output; in this case temperature and top_p are ignored

For tasks requiring consistency and reproducibility (such as code generation and translation), setting this to false is recommended

Example:

true

temperature
number<float>
default:1

Sampling temperature, controlling the randomness and creativity of the output

Notes:

  • Value range: [0.0, 1.0], limited to two decimal places
  • Higher values (e.g. 0.8): more random and creative, suitable for creative writing
  • Lower values (e.g. 0.2): more stable and deterministic, suitable for factual Q&A and code generation
  • GLM-5.2 default value: 1.0

Recommendation: Do not adjust both temperature and top_p at the same time

Required range: 0 <= x <= 1
Example:

1

top_p
number<float>
default:0.95

Nucleus Sampling parameter, an alternative to temperature sampling

Notes:

  • Value range: [0.01, 1.0], limited to two decimal places
  • The model samples only from the smallest set of most likely tokens whose cumulative probability reaches top_p; for example, 0.1 means only the tokens that make up the top 10% of probability mass are considered
  • Smaller values produce more focused and consistent output; larger values increase diversity
  • GLM-5.2 default value: 0.95

Recommendation: Do not adjust both temperature and top_p at the same time

Required range: 0.01 <= x <= 1
Example:

0.95
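
The rules for do_sample, temperature, and top_p can be combined into one small helper that builds the sampling part of a request body. This is a sketch under stated assumptions: `sampling_params` is a hypothetical name, and rounding to two decimals is one way to respect the "two decimal places" limit.

```python
def sampling_params(temperature=None, top_p=None, do_sample=True) -> dict:
    """Return the sampling fields to merge into a request body."""
    if not do_sample:
        # Greedy decoding: temperature and top_p are ignored, so omit them.
        return {"do_sample": False}
    params = {}
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be within [0.0, 1.0]")
        params["temperature"] = round(temperature, 2)
    if top_p is not None:
        if not 0.01 <= top_p <= 1.0:
            raise ValueError("top_p must be within [0.01, 1.0]")
        params["top_p"] = round(top_p, 2)
    return params
```

Following the recommendation above, a caller would normally pass either temperature or top_p, not both.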

max_tokens
integer

The maximum number of tokens the model may generate in its output

Notes:

  • GLM-5.2 supports an output length of up to 131,072 tokens (128K); a value of at least 1024 is recommended
  • When thinking is enabled, chain-of-thought tokens are also counted toward this limit
  • If generation is truncated due to length, try increasing this value
Required range: 1 <= x <= 131072
Example:

1024
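
If a response comes back truncated, one simple retry policy is to grow max_tokens while staying inside the documented range. The helper below is a hypothetical sketch of that policy; how truncation is signalled in the response is not specified on this page, so detecting it is left to the caller.

```python
MAX_OUTPUT_TOKENS = 131072  # upper bound documented for GLM-5.2


def raise_max_tokens(current: int, factor: int = 2) -> int:
    """Return a larger max_tokens value, capped at the documented maximum."""
    if not 1 <= current <= MAX_OUTPUT_TOKENS:
        raise ValueError("max_tokens must be within [1, 131072]")
    return min(current * factor, MAX_OUTPUT_TOKENS)
```

Because chain-of-thought tokens count toward the limit, a larger starting value may be needed when thinking is enabled.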

tools
(Function Tool · object | Retrieval Tool (Knowledge Base Retrieval) · object | Web Search Tool (Web Search) · object | MCP Tool · object)[]

The list of tools the model can call

Notes:

  • Supports function calling (function), knowledge base retrieval (retrieval), web search (web_search), and MCP (mcp)
  • Supports up to 128 functions
Maximum array length: 128
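
The limits on tools can be checked client-side before a request is sent. This sketch assumes each tool object carries a "type" field holding one of the four names listed above; the inner structure of each tool type is not shown on this page and is not validated here. `validate_tools` is a hypothetical helper.

```python
ALLOWED_TOOL_TYPES = {"function", "retrieval", "web_search", "mcp"}
MAX_TOOLS = 128  # documented maximum array length


def validate_tools(tools: list) -> None:
    """Raise ValueError if the tools array breaks the documented limits."""
    if len(tools) > MAX_TOOLS:
        raise ValueError(f"at most {MAX_TOOLS} tools are supported")
    for index, tool in enumerate(tools):
        # Assumption: the tool kind is carried in a "type" field.
        if tool.get("type") not in ALLOWED_TOOL_TYPES:
            raise ValueError(f"tools[{index}] has an unsupported type")
```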
tool_choice
enum<string>
default:auto

Controls how the model selects which function to call

Notes: Only takes effect when the tool type is function. The only supported value, and the default, is auto (the model decides for itself whether to call a tool)

Available options:
auto
Example:

"auto"

stop
string[]

The list of stop words

Notes:

  • When the generated text encounters a specified string, generation stops immediately (the stop word itself is not included in the returned text)
  • Currently only a single stop word is supported, in the format ["stop_word1"], for example ["Human:"]
Maximum array length: 4
Example:
["Human:"]
response_format
object

Specifies the model's response output format; defaults to text

Notes:

  • { "type": "json_object" } enables JSON mode, and the model returns valid JSON-formatted data, suitable for scenarios such as structured data extraction
  • When using JSON mode, it is recommended to explicitly request JSON output in the system or user message
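
A request using JSON mode, and the matching parse step, might look like the sketch below. The helper names and the wording of the system message are illustrative; the response shape follows the example response at the top of this page.

```python
import json


def json_mode_payload(prompt: str, model: str = "glm-5.2") -> dict:
    """Build a request body that enables JSON mode."""
    return {
        "model": model,
        "messages": [
            # Explicitly ask for JSON, as recommended above.
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(response: dict) -> dict:
    """Decode the assistant message content of a JSON-mode response."""
    return json.loads(response["choices"][0]["message"]["content"])
```

`json.loads` raises `json.JSONDecodeError` on malformed content, so callers may still want to catch it.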
request_id
string

Unique request identifier

Notes:

  • Passed by the client, 6-64 characters long; using UUID format is recommended to ensure uniqueness
  • If not provided, the platform will generate one automatically
Required string length: 6 - 64
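
A UUID-based request_id that satisfies the length rule can be generated as in this sketch. The "req-" prefix mirrors the example value on this page and is not required by anything stated here.

```python
import uuid


def new_request_id() -> str:
    """Return a unique request_id within the 6-64 character limit."""
    request_id = f"req-{uuid.uuid4().hex}"  # 4 + 32 = 36 characters
    if not 6 <= len(request_id) <= 64:
        raise ValueError("request_id must be 6-64 characters long")
    return request_id
```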
Example:

"req-7f3a2c1e8b9d4f0a"

user_id
string

Unique identifier of the end user

Notes: 6-128 characters long; using a unique identifier that does not contain sensitive information is recommended, which can help the platform monitor and detect abusive behavior

Required string length: 6 - 128
Example:

"user-abc123456"

Response

Conversation generated successfully

id
string

Task ID

Example:

"chatcmpl-a6613b56-c61c-94ba-9a9f-43d4cdc7d77a"

object
enum<string>

Response type

Available options:
chat.completion
Example:

"chat.completion"

request_id
string

Request ID (returned when request_id is provided in the request)

Example:

"req-7f3a2c1e8b9d4f0a"

created
integer

Request creation time, Unix timestamp (seconds)

Example:

1777021417

model
string

Model name

Example:

"glm-5.2"

choices
object[]

The list of model responses

usage
object

Token usage statistics returned when the call ends

web_search
object[]

Web search-related information, returned when the web_search tool is used and a search is triggered

content_filter
object[]

Content safety-related information
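
To show how the response fields fit together, the sketch below pulls the most commonly needed values out of a decoded response. Field names follow the example response at the top of this page; `summarize_completion` is a hypothetical helper, and optional fields are read defensively because the page does not state which ones are always present.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the answer, reasoning text, and token counts from a response."""
    choice = response["choices"][0]
    message = choice["message"]
    usage = response.get("usage", {})
    details = usage.get("completion_tokens_details", {})
    return {
        "content": message.get("content"),
        "reasoning_content": message.get("reasoning_content"),
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
        "reasoning_tokens": details.get("reasoning_tokens", 0),
    }
```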