Two chat models are available: **deepseek-v4-flash** (fast general-purpose) and **deepseek-v4-pro** (deep reasoning). Thinking mode is controlled via the `thinking.type` field; deepseek-v4-pro returns its thinking content through `reasoning_content`.

Endpoints:

- `https://direct.evolink.ai` — better support for text models and long-lived connections
- `https://api.evolink.ai` — primary endpoint for multimodal services; also serves as a fallback address for text models

## Authentication

All APIs require Bearer Token authentication.
Get an API Key:

1. Visit the API Key Management page to obtain your API Key.
2. Add it to the request header:

   Authorization: Bearer YOUR_API_KEY

## Request Parameters

### `model` (string)

Chat model name.
- `deepseek-v4-flash`: fast general-purpose model, 1M context
- `deepseek-v4-pro`: deep reasoning model, excels at math, programming, and complex logic

Tip: both models have thinking enabled by default, and responses include `reasoning_content`. Set `thinking.type="disabled"` to turn it off and reduce output token cost. Both models share identical parameters.

Allowed values: `deepseek-v4-flash`, `deepseek-v4-pro`. Default: `"deepseek-v4-flash"`.
### `messages` (array)

List of conversation messages; supports multi-turn dialogue. Messages with different roles have different field structures; select the corresponding role to view details.
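A minimal request built from the fields above can be sketched as follows. The request path `/v1/chat/completions` is an assumption (the exact path is not stated here; an OpenAI-compatible convention is assumed), and the key is a placeholder:

```python
import json

# Placeholder key; obtain a real one from the API Key Management page.
API_KEY = "YOUR_API_KEY"
BASE_URL = "https://direct.evolink.ai"  # text-model endpoint from the docs

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Minimal multi-turn message list: system + user roles.
payload = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

body = json.dumps(payload)
# To send: POST `body` with `headers` to f"{BASE_URL}/v1/chat/completions"
# (path assumed; check the endpoint docs for the exact route).
```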
### `thinking` (object)

Thinking mode control (new in V4).
- Thinking is on by default (`type=enabled`); thinking content is returned in `choices[].message.reasoning_content` and billed as output tokens.
- ⚠️ Multi-turn / tool-calling caveat: if the current response includes `reasoning_content`, the corresponding assistant message in the `messages` history of the next request must echo that field verbatim; otherwise the API returns `400 The reasoning_content in the thinking mode must be passed back to the API`. If you would rather not handle it, set `thinking.type="disabled"` explicitly for the whole session.
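The echo requirement above can be sketched as a small helper. The response shape follows the doc; the helper name and sample reply are ours:

```python
def append_assistant_turn(history, assistant_message):
    """Append the assistant reply to the messages history, echoing
    reasoning_content verbatim when thinking mode produced one."""
    turn = {"role": "assistant", "content": assistant_message["content"]}
    if "reasoning_content" in assistant_message:
        # Must be passed back unchanged, or the next request gets a 400.
        turn["reasoning_content"] = assistant_message["reasoning_content"]
    history.append(turn)
    return history

history = [{"role": "user", "content": "What is 17 * 24?"}]
reply = {
    "content": "17 * 24 = 408.",
    "reasoning_content": "17*24 = 17*20 + 17*4 = 340 + 68 = 408.",
}
history = append_assistant_turn(history, reply)
# `history` is now valid as the messages list for the next request.
```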
### `temperature` (number)

Sampling temperature; controls randomness of output.

Range: 0 <= x <= 2. Default: 1.
### `top_p` (number)

Nucleus sampling parameter. Suggestion: do not adjust `temperature` and `top_p` simultaneously.

Range: 0 <= x <= 1. Default: 1.
### `max_tokens` (integer)

Limits the maximum number of tokens generated.

Range: 1 <= x <= 384000. Default: 4096.
### `frequency_penalty` (number)

Frequency penalty, used to reduce repetitive content.

Range: -2 <= x <= 2. Default: 0.
### `presence_penalty` (number)

Presence penalty, used to encourage new topics.

Range: -2 <= x <= 2. Default: 0.
### `response_format` (object)

Specifies the response format. Set `{"type": "json_object"}` to enable JSON mode.

### `stop` (string or array)

Stop sequences; generation stops when the model encounters any of these strings.
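A request body combining JSON mode, a stop sequence, and disabled thinking can be sketched as below; the prompt and the choice of stop string are illustrative only:

```python
import json

# Sketch: enable JSON mode, stop at the first blank line, and skip
# reasoning output to save output tokens.
payload = {
    "model": "deepseek-v4-pro",
    "messages": [
        {"role": "user", "content": 'Return {"answer": <number>} for 2+2.'}
    ],
    "response_format": {"type": "json_object"},
    "stop": ["\n\n"],                  # generation halts at a blank line
    "thinking": {"type": "disabled"},  # no reasoning_content in the reply
}

body = json.dumps(payload)  # ready to POST as the request body
```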
### `stream` (boolean)

Whether to stream the response.

- `true`: stream response; returns content chunk by chunk in real time via SSE (Server-Sent Events)
- `false`: wait for the full response and return it at once (default)

Default: false.

### `stream_options` (object)

Streaming response options. Only effective when `stream=true`.
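With `stream=true`, content arrives as SSE `data:` lines. The offline sketch below parses sample lines shaped like that wire format; the `choices[].delta.content` chunk shape and the `[DONE]` sentinel follow the common SSE chat convention and are an assumption here:

```python
import json

# Sample SSE lines as they might arrive over the wire (assumed shape).
sample_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

text = []
for line in sample_stream:
    if not line.startswith("data: "):
        continue                      # skip comments / keep-alives
    chunk = line[len("data: "):]
    if chunk == "[DONE]":             # sentinel that ends the stream
        break
    delta = json.loads(chunk)["choices"][0]["delta"]
    text.append(delta.get("content", ""))

answer = "".join(text)                # accumulated reply text
```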
### `tools` (array)

List of tool definitions for Function Calling. Maximum of 128 tool definitions.

### `tool_choice` (string or object)

Controls tool-calling behavior. Options:

- `none`: do not call any tool
- `auto`: let the model decide whether to call a tool (default when tools are provided)
- `required`: force the model to call one or more tools
- `{"type":"function","function":{"name":"xxx"}}`: call the specified tool

Default: `none` when no tools are provided, `auto` when tools are provided.
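A request that defines one function tool and forces a call to it can be sketched as follows; the weather tool itself is hypothetical, while the `tools` / `tool_choice` shapes follow the options above:

```python
# Hypothetical function tool; the JSON-Schema "parameters" block
# describes its arguments.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,  # at most 128 tool definitions per request
    # Force the named tool instead of leaving the choice to the model:
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```

Leaving `tool_choice` out entirely would fall back to the documented default (`auto`, since tools are provided).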
Allowed string values: `none`, `auto`, `required`.

### `logprobs` (boolean)

Whether to return token log probabilities. If `true`, the response includes log probability information for each token.

### `top_logprobs` (integer)

Return log probabilities of the top N tokens.
Requires `logprobs` to be `true`. Range: 0 <= x <= 20.

### `logit_bias` (object)

Token bias map.
### `n` (integer)

Number of chat completion choices to generate for each input message.

Range: 1 <= x <= 8. Default: 1.
### `seed` (integer)

Random seed (Beta).

### `user` (string)

Unique identifier representing the end user.
## Response

Chat completion successful.
- `id` (string): unique identifier for the chat completion, e.g. `"53c548dc-ec02-4a2f-bbb6-eca4184630b8"`
- `model` (string): model name actually used, e.g. `"deepseek-v4-flash"`
- `object` (string): response type; always `"chat.completion"`
- `created` (integer): creation timestamp (Unix seconds), e.g. `1777021417`
- `choices` (array): list of completion choices
- `usage` (object): token usage statistics (including cache and reasoning breakdowns)
- `system_fingerprint` (string): system fingerprint identifier, e.g. `"fp_evolink_v4_20260402"`