Authorizations
All endpoints require Bearer Token authentication.
Get your API Key:
Visit the API Key Management page to get your API Key.
Add it to your request headers:
Authorization: Bearer YOUR_API_KEY

Body
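As a sketch, the header can be assembled like this in Python (the helper name and the `MOONSHOT_API_KEY` environment variable are illustrative, not part of the API; the HTTP client and endpoint URL are omitted):

```python
import os

def auth_headers(api_key: str) -> dict:
    # The API expects exactly this header shape:
    # Authorization: Bearer YOUR_API_KEY
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

# Illustrative: read the key from an environment variable of your choosing.
headers = auth_headers(os.environ.get("MOONSHOT_API_KEY", "YOUR_API_KEY"))
```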
model
Model name for chat completion
Available options: kimi-k2-thinking, kimi-k2-thinking-turbo
Example: "kimi-k2-thinking"
messages
List of messages for the conversation; supports multi-turn dialogue and multimodal input.
Minimum length: 1

stream
Whether to stream the response.
- true: stream the response, returning content chunk by chunk in real time
- false: wait for the complete response and return it all at once
Default: false
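A minimal request body using the fields above might be assembled as follows (the prompt content is illustrative; sending the request is omitted):

```python
import json

payload = {
    "model": "kimi-k2-thinking",
    "messages": [{"role": "user", "content": "Hello"}],
    # False: wait for the complete response; True: receive chunks in real time.
    "stream": False,
}
body = json.dumps(payload)
```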
max_tokens
Maximum number of tokens to generate in the response.
Note:
- Too small a value may cause a truncated response
- If the max token limit is reached, finish_reason will be "length"; otherwise "stop"
Required range: x >= 1
Example: 2000
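The finish_reason behavior above can be checked on a parsed choice; a small sketch (the helper is hypothetical, not part of the API):

```python
def was_truncated(choice: dict) -> bool:
    # "length" means the max token limit was hit; "stop" means a natural end.
    return choice.get("finish_reason") == "length"
```

If it returns True, consider retrying with a larger max_tokens.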
temperature
Sampling temperature; controls the randomness of the output.
Note:
- Lower values (e.g., 0.2): more deterministic, focused output
- Higher values (e.g., 1.5): more random, creative output
- Recommended value for the kimi-k2-thinking series: 1.0
Required range: 0 <= x <= 2
Example: 1
top_p
Nucleus sampling parameter.
Note:
- Restricts sampling to tokens within a cumulative probability mass
- For example, 0.9 means sampling from the tokens that make up the top 90% of cumulative probability
- Default: 1.0 (considers all tokens)
Suggestion: do not adjust temperature and top_p at the same time.
Required range: 0 <= x <= 1
Example: 0.9
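A sketch of a client-side check that enforces the documented ranges and the advice to adjust only one of the two knobs (the helper is illustrative, not part of the API):

```python
def sampling_errors(params: dict) -> list:
    """Return a list of problems with the temperature/top_p settings."""
    errors = []
    t = params.get("temperature")
    p = params.get("top_p")
    if t is not None and not 0 <= t <= 2:
        errors.append("temperature must be in [0, 2]")
    if p is not None and not 0 <= p <= 1:
        errors.append("top_p must be in [0, 1]")
    if t is not None and p is not None:
        # The doc suggests not adjusting both simultaneously.
        errors.append("adjust only one of temperature and top_p")
    return errors
```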
top_k
Top-K sampling parameter.
Note:
- For example, 10 limits sampling to the 10 highest-probability tokens
- Smaller values make the output more focused
- Default: no limit
Required range: x >= 1
Example: 40
n
Number of completions to generate for each input message.
Note:
- Default: 1; maximum: 5
- When temperature is very close to 0, only one result can be returned
Required range: 1 <= x <= 5
Example: 1
presence_penalty
Presence penalty; a number between -2.0 and 2.0.
Note:
- Positive values penalize new tokens based on whether they have already appeared in the text, increasing the likelihood of discussing new topics
Required range: -2 <= x <= 2
Example: 0
frequency_penalty
Frequency penalty; a number between -2.0 and 2.0.
Note:
- Positive values penalize new tokens based on their frequency in the text so far, decreasing the likelihood of repeating the same phrases verbatim
Required range: -2 <= x <= 2
Example: 0
response_format
Response format settings.
Note:
- Set to {"type": "json_object"} to enable JSON mode, which ensures the model generates valid JSON
- When using response_format with {"type": "json_object"}, explicitly instruct the model to output JSON in your prompt
- Default: {"type": "text"}
- Warning: do not mix partial mode with response_format={"type": "json_object"}
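For example, a JSON-mode request body might look like this; note that the prompt itself asks for JSON, as recommended above (the prompt wording is illustrative):

```python
payload = {
    "model": "kimi-k2-thinking",
    "messages": [
        # JSON mode works best when the prompt explicitly requests JSON.
        {"role": "user",
         "content": 'List two primary colors as a JSON object with key "colors".'}
    ],
    # Enables JSON mode: the model is constrained to emit valid JSON.
    "response_format": {"type": "json_object"},
}
```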
stop
Stop sequences; generation stops when any of these sequences is matched.
Note:
- The stop sequences themselves are not included in the output
- Maximum 5 strings, each no longer than 32 bytes
- A single stop word (string) is also accepted
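These limits can be validated client-side before sending; a sketch (the helper name is illustrative, not part of the API):

```python
def validate_stop(stop) -> list:
    # Accept a single stop word or a list, per the parameter description.
    seqs = [stop] if isinstance(stop, str) else list(stop)
    if len(seqs) > 5:
        raise ValueError("at most 5 stop sequences are allowed")
    for s in seqs:
        # The 32-byte limit is measured on the encoded string.
        if len(s.encode("utf-8")) > 32:
            raise ValueError("each stop sequence must be at most 32 bytes")
    return seqs
```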
tools
List of tools for Tool Use or Function Calling.
Note:
- Each tool must include a type
- The function structure must include name, description, and parameters
- Maximum of 128 functions in the tools array
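A single tool entry satisfying these requirements might look like this (the weather function and its schema are hypothetical examples, not a built-in tool):

```python
weather_tool = {
    "type": "function",  # each tool must include a type
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        # parameters is a JSON Schema object describing the arguments
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
tools = [weather_tool]  # up to 128 entries
```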
Maximum length: 128

Response
Chat completion successful
id
Unique identifier for the chat completion
Example: "cmpl-04ea926191a14749b7f2c7a48a68abc6"

model
The model used for the completion
Example: "kimi-k2-thinking"

object
Response type
Available option: chat.completion
Example: "chat.completion"

created
Unix timestamp when the completion was created
Example: 1698999496

choices
List of completion choices

usage
Token usage statistics
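Putting the response fields together, a returned object has roughly this shape (values taken from the examples above; the contents of choices and usage are omitted here):

```python
response = {
    "id": "cmpl-04ea926191a14749b7f2c7a48a68abc6",
    "model": "kimi-k2-thinking",
    "object": "chat.completion",
    "created": 1698999496,  # Unix timestamp
    "choices": [],  # list of completion choices (contents omitted)
    "usage": {},    # token usage statistics (contents omitted)
}
```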