POST /v1/chat/completions
curl --request POST \
  --url https://api.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "kimi-k2-thinking",
    "messages": [
      {
        "role": "user",
        "content": "Please introduce yourself"
      }
    ],
    "temperature": 1
  }'
{
  "id": "cmpl-04ea926191a14749b7f2c7a48a68abc6",
  "model": "kimi-k2-thinking",
  "object": "chat.completion",
  "created": 1698999496,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hi there! How can I help you?",
        "reasoning_content": "The user just said \"hi\". This is a very simple greeting. I should be friendly, helpful, and professional in my response..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 292,
    "total_tokens": 300,
    "prompt_tokens_details": {
      "cached_tokens": 8
    }
  }
}

Authorizations

Authorization
string
header
required

All endpoints require Bearer Token authentication

Get an API Key:

Visit the API Key Management page to obtain your API Key

Add it to the request headers:

Authorization: Bearer YOUR_API_KEY

Body

application/json
model
enum<string>
required

Model name for chat completion

Available options:
kimi-k2-thinking,
kimi-k2-thinking-turbo
Example:

"kimi-k2-thinking"

messages
object[]
required

List of messages in the conversation; supports multi-turn dialogue and multimodal input (see the multi-turn sketch below)

Minimum length: 1
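
For multi-turn dialogue, earlier turns are replayed in order, including the assistant's prior replies. A minimal sketch of a second-turn request, reusing only roles and content already shown in this page's examples:

# Second turn: include the previous user/assistant exchange (illustrative)
curl --request POST \
  --url https://api.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "kimi-k2-thinking",
    "messages": [
      {"role": "user", "content": "Please introduce yourself"},
      {"role": "assistant", "content": "Hi there! How can I help you?"},
      {"role": "user", "content": "What can you do?"}
    ]
  }'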
stream
boolean
default:false

Whether to stream the response

  • true: Stream the response; content is returned chunk by chunk in real time (see the sketch below)
  • false: Wait for the complete response and return it all at once
Example:

false
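
A minimal streaming sketch against the same endpoint. It assumes the streamed chunks follow the common OpenAI-style server-sent-events format, which this page does not spell out; --no-buffer makes curl print each chunk as it arrives:

# Streaming request; chunks print as they arrive (SSE format assumed)
curl --request POST \
  --url https://api.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --no-buffer \
  --data '{
    "model": "kimi-k2-thinking",
    "messages": [{"role": "user", "content": "Please introduce yourself"}],
    "stream": true
  }'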

max_tokens
integer

Maximum number of tokens to generate in the response

Note:

  • Too small a value may cause the response to be truncated (see the sketch below)
  • If the token limit is reached, finish_reason will be "length"; otherwise it will be "stop"
Required range: x >= 1
Example:

2000
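
To observe the "length" finish reason in practice, cap the completion at a deliberately small budget; the cap of 50 below is arbitrary:

# Small token budget; expect finish_reason "length" if the cap is hit
curl --request POST \
  --url https://api.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "kimi-k2-thinking",
    "messages": [{"role": "user", "content": "Please introduce yourself"}],
    "max_tokens": 50
  }'

If the cap is hit mid-answer, choices[0].finish_reason comes back as "length" instead of "stop".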

temperature
number
default:1

Sampling temperature, controls randomness of output

Note:

  • Lower values (e.g., 0.2): More deterministic and focused output
  • Higher values (e.g., 1.5): More random and creative output
  • Recommended value for kimi-k2-thinking series: 1.0
Required range: 0 <= x <= 2
Example:

1

top_p
number
default:1

Nucleus sampling parameter

Note:

  • Restricts sampling to the smallest set of tokens whose cumulative probability reaches top_p
  • For example, 0.9 means sampling only from the tokens within the top 90% of cumulative probability
  • Default: 1.0 (considers all tokens)

Suggestion: Do not adjust both temperature and top_p simultaneously

Required range: 0 <= x <= 1
Example:

0.9

top_k
integer

Top-K sampling parameter

Note:

  • For example, 10 limits sampling to the top 10 highest probability tokens
  • Smaller values make output more focused
  • Default: no limit
Required range: x >= 1
Example:

40

n
integer
default:1

Number of completions to generate for each input message

Note:

  • Default: 1, maximum: 5
  • When temperature is very close to 0, only 1 result can be returned
Required range: 1 <= x <= 5
Example:

1

presence_penalty
number
default:0

Presence penalty, number between -2.0 and 2.0

Note:

  • Positive values penalize new tokens based on whether they appear in the text so far, increasing the likelihood of the model discussing new topics
Required range: -2 <= x <= 2
Example:

0

frequency_penalty
number
default:0

Frequency penalty, number between -2.0 and 2.0

Note:

  • Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the likelihood of repeating the same phrases verbatim
Required range: -2 <= x <= 2
Example:

0

response_format
object

Response format settings

Note:

  • Set to {"type": "json_object"} to enable JSON mode, ensuring model generates valid JSON
  • When using response_format with {"type": "json_object"}, explicitly guide the model to output JSON format in your prompt
  • Default: {"type": "text"}
  • Warning: Do not mix partial mode with response_format=json_object
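
A sketch of JSON mode. As noted above, the prompt itself should explicitly ask for JSON; the keys requested here (name, capabilities) are purely illustrative:

# JSON mode: prompt explicitly requests JSON output (illustrative keys)
curl --request POST \
  --url https://api.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "kimi-k2-thinking",
    "messages": [
      {"role": "user", "content": "Introduce yourself as a JSON object with the keys \"name\" and \"capabilities\"."}
    ],
    "response_format": {"type": "json_object"}
  }'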
stop

Stop sequences, generation stops when these sequences are matched

Note:

  • The stop sequences themselves will not be included in the output
  • Accepts either a single stop word or a list of stop words: maximum 5 strings, each no longer than 32 bytes
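
For example, generation can be cut off at marker strings; both stop sequences below are illustrative:

# Stop generation at either marker; matched sequences are not returned
curl --request POST \
  --url https://api.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "kimi-k2-thinking",
    "messages": [{"role": "user", "content": "Please introduce yourself"}],
    "stop": ["END", "\n\n---"]
  }'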
tools
object[]

List of tools for Tool Use or Function Calling

Note:

  • Each tool must include a type
  • The function structure must include name, description, and parameters
  • Maximum 128 functions in tools array
Maximum length: 128
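
A sketch of the tools array with a single hypothetical get_weather function; the function name and its parameters schema are illustrative, not part of this API:

# Function calling: one hypothetical tool with a JSON Schema parameters block
curl --request POST \
  --url https://api.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "kimi-k2-thinking",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
          }
        }
      }
    ]
  }'

Each entry carries the required type plus a function object with name, description, and parameters, matching the constraints listed above.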

Response

Chat completion successful

id
string

Unique identifier for the chat completion

Example:

"cmpl-04ea926191a14749b7f2c7a48a68abc6"

model
string

The model used for completion

Example:

"kimi-k2-thinking"

object
enum<string>

Response type

Available options:
chat.completion
Example:

"chat.completion"

created
integer

Unix timestamp when the completion was created

Example:

1698999496

choices
object[]

List of completion choices

usage
object

Token usage statistics
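
For quick inspection, the usage block of a non-streaming response can be pulled out with jq (a sketch, assuming jq is installed locally):

# Extract token counts from the response (requires jq)
curl --silent --request POST \
  --url https://api.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "kimi-k2-thinking",
    "messages": [{"role": "user", "content": "Please introduce yourself"}]
  }' | jq '{total: .usage.total_tokens, cached: .usage.prompt_tokens_details.cached_tokens}'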