DeepSeek V4 - OpenAI-Compatible API
- Call the DeepSeek V4 model using the OpenAI Chat Completions protocol
- Supports two models:
deepseek-v4-flash(fast general-purpose) anddeepseek-v4-pro(deep reasoning) - Plain text conversation: Single- or multi-turn contextual dialogue with 1M ultra-long context
- System prompts: Customize the AI’s role and behavior
- Thinking mode: Control deep reasoning via
thinking.type;deepseek-v4-proreturns thinking content throughreasoning_content - Streaming output: SSE streaming returns are supported
- Tool calling: Supports Function Calling (up to 128 tools)
- JSON mode: Enabled via
response_format - Context caching: Requests with identical prefixes automatically hit the cache, substantially lowering input cost
https://direct.evolink.ai, which has better support for text models and long-lived connections. https://api.evolink.ai is the primary endpoint for multimodal services and serves as a fallback address for text models.Authorizations
##All APIs require Bearer Token authentication##
Get API Key:
Visit the API Key Management Page to obtain your API Key
Add to request header:
Authorization: Bearer YOUR_API_KEYBody
Chat model name
deepseek-v4-flash: Fast general-purpose model, 1M contextdeepseek-v4-pro: Deep reasoning model, excels at math, programming, and complex logic
Tip: Both models have thinking enabled by default, and responses include reasoning_content. Set thinking.type="disabled" to turn it off and reduce output token cost. Both models share identical parameters.
deepseek-v4-flash, deepseek-v4-pro "deepseek-v4-flash"
List of conversation messages, supports multi-turn dialogue
Messages with different roles have different field structures; select the corresponding role to view details
1- System Message
- User Message
- Assistant Message
- Tool Message
Thinking mode control (new in V4)
Notes:
- Controls the deep thinking (Chain of Thought) feature
- Enabled by default on both models (
type=enabled) - When enabled, the reasoning process is returned through
choices[].message.reasoning_contentand billed as output tokens
⚠️ Multi-turn / tool-calling caveat: If the current response includes reasoning_content, the corresponding assistant message in the messages history of the next request must echo that field verbatim, otherwise the API returns 400 The reasoning_content in the thinking mode must be passed back to the API. If you would rather not handle it, set thinking.type="disabled" explicitly for the whole session.
Sampling temperature, controls randomness of output
Notes:
- Lower values (e.g., 0.2): More deterministic, more focused output
- Higher values (e.g., 1.5): More random, more creative output
- Default: 1
0 <= x <= 21
Nucleus sampling parameter
Notes:
- Controls sampling from tokens with cumulative probability
- For example, 0.9 means sampling from tokens whose cumulative probability reaches 90%
- Default: 1.0 (considers all tokens)
Suggestion: Do not adjust temperature and top_p simultaneously
0 <= x <= 11
Limits the maximum number of tokens generated
Notes:
- The V4 series can reach up to 384,000 tokens
- When thinking is enabled, reasoning_tokens also count toward the max_tokens limit
- If not set, the model decides the generation length on its own
1 <= x <= 3840004096
Frequency penalty, used to reduce repetitive content
Notes:
- Positive values penalize tokens based on their frequency in the already-generated text
- The higher the value, the less likely repetition becomes
- Default: 0 (no penalty)
-2 <= x <= 20
Presence penalty, used to encourage new topics
Notes:
- Positive values penalize tokens based on whether they have already appeared in the text
- The higher the value, the more the model tends to discuss new topics
- Default: 0 (no penalty)
-2 <= x <= 20
Specifies the response format
Notes:
- Set to
{"type": "json_object"}to enable JSON mode - In JSON mode the model outputs valid JSON content
- For best results, explicitly ask for JSON output in your system or user message
Stop sequences; generation stops when the model encounters any of these strings
Notes:
- Can be a single string or an array of strings
- Up to 16 stop sequences are supported
Whether to stream the response
true: Stream response; returns content chunk by chunk in real time via SSE (Server-Sent Events)false: Wait for the full response and return it at once (default)
false
Streaming response options
Only effective when stream=true
List of tool definitions for Function Calling
Notes:
- Up to 128 tool definitions are supported
- Each tool must define a name, description, and parameter schema
128Controls tool-calling behavior
Options:
none: Do not call any toolauto: Let the model decide whether to call a tool (default when tools are provided)required: Force the model to call one or more tools- Object form
{"type":"function","function":{"name":"xxx"}}: Call the specified tool
Default: none when no tools are provided, auto when tools are provided
none, auto, required Whether to return token log probabilities
Notes:
- When set to
true, the response includes log probability information for each token
Return log probabilities of the top N tokens
Notes:
- Requires
logprobsto betrue - Range:
[0, 20]
0 <= x <= 20Token bias map
Notes:
- Keys are token IDs in the tokenizer; values are bias values between -100 and 100
- -100 completely bans the token, 100 forces it to be generated
- Typical values in the range -1 to 1 already produce observable effects
Number of chat completion choices to generate for each input message
Notes:
- Default 1; if set to N, N candidates are returned (billed as N × output_tokens)
1 <= x <= 81
Random seed (Beta)
Notes:
- When specified, the model attempts deterministic sampling
- Same seed + same other parameters → same output (not guaranteed 100%)
Unique identifier representing the end user
Notes:
- Helps the platform monitor and detect abuse
- A hashed user ID is recommended
Response
Chat completion successful
Unique identifier for the chat completion
"53c548dc-ec02-4a2f-bbb6-eca4184630b8"
Model name actually used
"deepseek-v4-flash"
Response type
chat.completion "chat.completion"
Creation timestamp (Unix seconds)
1777021417
List of completion choices
Token usage statistics (including cache and reasoning breakdowns)
System fingerprint identifier
"fp_evolink_v4_20260402"