curl --request POST \
  --url https://direct.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "deepseek-v4-flash",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself"
    }
  ]
}
'

{
  "id": "837f529d-00f9-4731-b2e1-4a54fc31790a",
  "object": "chat.completion",
  "created": 1777026806,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I am the DeepSeek assistant, always ready to answer your questions and help you out."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 31,
    "total_tokens": 38,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 7
  },
  "system_fingerprint": "fp_evolink_v4_20260402"
}

Chat Completions API

DeepSeek V4 - OpenAI-Compatible API

Call the DeepSeek V4 model using the OpenAI Chat Completions protocol
Supports two models: deepseek-v4-flash (fast general-purpose) and deepseek-v4-pro (deep reasoning)
Plain text conversation: Single- or multi-turn contextual dialogue with 1M ultra-long context
System prompts: Customize the AI’s role and behavior
Thinking mode: Control deep reasoning via thinking.type; deepseek-v4-pro returns thinking content through reasoning_content
Streaming output: Responses can be streamed via SSE (Server-Sent Events)
Tool calling: Supports Function Calling (up to 128 tools)
JSON mode: Enabled via response_format
Context caching: Requests with identical prefixes automatically hit the cache, substantially lowering input cost

POST /v1/chat/completions

BaseURL: The default BaseURL is https://direct.evolink.ai, which has better support for text models and long-lived connections. https://api.evolink.ai is the primary endpoint for multimodal services and serves as a fallback address for text models.
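
A minimal sketch of how a client might use the two addresses above: try the default first and fall back on a connection failure. The `send` callable is a hypothetical stand-in for whatever HTTP client is in use, and reusing the `/v1/chat/completions` path on the fallback host is an assumption rather than something this page states.

```python
from typing import Callable, Optional

# Base URLs from the note above: preferred address first, fallback second.
BASE_URLS = ["https://direct.evolink.ai", "https://api.evolink.ai"]
CHAT_PATH = "/v1/chat/completions"  # assumed to be identical on both hosts


def post_with_fallback(send: Callable[[str], dict]) -> dict:
    """Call send(url) for each base URL in order and return the first success."""
    last_error: Optional[Exception] = None
    for base in BASE_URLS:
        try:
            return send(base + CHAT_PATH)
        except OSError as exc:  # connection-level failures only
            last_error = exc
    raise RuntimeError("all base URLs failed") from last_error
```

Only connection-level errors trigger the fallback here; HTTP errors such as 400 or 401 would fail the same way on either host, so retrying them elsewhere is unlikely to help.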

Authorizations

Authorization

string

header

required

All APIs require Bearer Token authentication.

Get API Key:

Visit the API Key Management Page to obtain your API Key

Add to request header:

Authorization: Bearer YOUR_API_KEY
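
As an illustration, the headers from the curl example can be built in one place. This is a sketch; the environment variable name `EVOLINK_API_KEY` in the usage comment is hypothetical.

```python
def auth_headers(api_key: str) -> dict:
    """Build the two request headers used by the curl example."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Usage (hypothetical variable name):
#   import os
#   headers = auth_headers(os.environ["EVOLINK_API_KEY"])
```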

Body

application/json

model

enum<string>

default:deepseek-v4-flash

required

Chat model name

deepseek-v4-flash: Fast general-purpose model, 1M context
deepseek-v4-pro: Deep reasoning model, excels at math, programming, and complex logic

Tip: Both models have thinking enabled by default, and responses include reasoning_content. Set thinking.type="disabled" to turn it off and reduce output token cost. Both models share identical parameters.

Available options:

deepseek-v4-flash,

deepseek-v4-pro

Example:

"deepseek-v4-flash"

messages

(System Message · object | User Message · object | Assistant Message · object | Tool Message · object)[]

required

List of conversation messages; supports multi-turn dialogue

Messages with different roles have different field structures; select the corresponding role to view details

Minimum array length: 1

System Message
User Message
Assistant Message
Tool Message
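
A multi-turn request is the same payload with a longer `messages` array. The sketch below builds one; the `"system"` role string follows the OpenAI protocol and is an assumption here, since only `user` and `assistant` appear in this page's examples. Thinking is disabled so that the hand-written assistant turn does not need a `reasoning_content` field.

```python
import json


def add_turn(history: list, role: str, content: str) -> list:
    """Return a new history with one more message appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unsupported role: {role}")
    return history + [{"role": role, "content": content}]


history = add_turn([], "system", "You are a concise assistant.")
history = add_turn(history, "user", "Please introduce yourself")
history = add_turn(history, "assistant", "Hello! I am the DeepSeek assistant.")
history = add_turn(history, "user", "Summarize that in five words.")

payload = {
    "model": "deepseek-v4-flash",
    "messages": history,
    "thinking": {"type": "disabled"},  # avoids the reasoning_content echo rule
}
print(json.dumps(payload, indent=2))
```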

thinking

object

Thinking mode control (new in V4)

Notes:

Controls the deep thinking (Chain of Thought) feature
Enabled by default on both models (type=enabled)
When enabled, the reasoning process is returned through choices[].message.reasoning_content and billed as output tokens

⚠️ Multi-turn / tool-calling caveat: If the current response includes reasoning_content, the corresponding assistant message in the messages history of the next request must echo that field verbatim. Otherwise the API returns a 400 error: "The reasoning_content in the thinking mode must be passed back to the API." If you would rather not handle this, explicitly set thinking.type="disabled" for the whole session.
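
The caveat above can be handled with a small helper that copies the assistant message out of a response before appending it to the history. This is a sketch: the helper names are hypothetical, and carrying `tool_calls` along follows the OpenAI convention rather than anything stated on this page.

```python
def to_history_entry(response: dict) -> dict:
    """Copy the assistant message from a response so it can be sent back."""
    message = response["choices"][0]["message"]
    entry = {"role": "assistant", "content": message.get("content")}
    # Echo reasoning_content unchanged whenever the response carried it.
    if "reasoning_content" in message:
        entry["reasoning_content"] = message["reasoning_content"]
    # Assumption (OpenAI convention): tool calls are echoed as well.
    if message.get("tool_calls"):
        entry["tool_calls"] = message["tool_calls"]
    return entry


def without_thinking(payload: dict) -> dict:
    """Return a copy of the payload with thinking explicitly disabled."""
    return {**payload, "thinking": {"type": "disabled"}}
```

Either approach should avoid the 400 error: echo the field on every turn, or apply `without_thinking` to every request in the session.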

temperature

number

default:1

Sampling temperature, controls randomness of output

Notes:

Lower values (e.g., 0.2): More deterministic, more focused output
Higher values (e.g., 1.5): More random, more creative output
Default: 1

Required range: 0 <= x <= 2

Example:

1

top_p

number

default:1

Nucleus sampling parameter

Notes:

Restricts sampling to the most likely tokens whose cumulative probability reaches top_p
For example, 0.9 means sampling from tokens whose cumulative probability reaches 90%
Default: 1.0 (considers all tokens)

Suggestion: Do not adjust temperature and top_p simultaneously

Required range: 0 <= x <= 1

Example:

1

max_tokens

integer

Limits the maximum number of tokens generated

Notes:

The V4 series can reach up to 384,000 tokens
When thinking is enabled, reasoning_tokens also count toward the max_tokens limit
If not set, the model decides the generation length on its own

Required range: 1 <= x <= 384000

Example:

4096

frequency_penalty

number

default:0

Frequency penalty, used to reduce repetitive content

Notes:

Positive values penalize tokens based on their frequency in the already-generated text
The higher the value, the less likely repetition becomes
Default: 0 (no penalty)

Required range: -2 <= x <= 2

Example:

0

presence_penalty

number

default:0

Presence penalty, used to encourage new topics

Notes:

Positive values penalize tokens based on whether they have already appeared in the text
The higher the value, the more the model tends to discuss new topics
Default: 0 (no penalty)

Required range: -2 <= x <= 2

Example:

0
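
The numeric ranges documented for temperature, top_p, max_tokens, frequency_penalty, and presence_penalty can be checked on the client before a request is sent. A minimal sketch, with a hypothetical helper name:

```python
# Inclusive ranges as documented for each parameter above.
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 384000),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}


def validate_sampling(params: dict) -> list:
    """Return a list of human-readable range violations (empty if none)."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            errors.append(f"{name}={params[name]} is outside [{low}, {high}]")
    return errors


print(validate_sampling({"temperature": 2.5, "top_p": 0.9, "max_tokens": 4096}))
```

Parameters that are absent are skipped, so the server defaults still apply.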

response_format

object

Specifies the response format

Notes:

Set to {"type": "json_object"} to enable JSON mode
In JSON mode the model outputs valid JSON content
For best results, explicitly ask for JSON output in your system or user message
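
A sketch of a JSON-mode request and of parsing the reply. The helper names and the system prompt wording are illustrative; the `response_format` value is the one given above.

```python
import json


def json_mode_payload(question: str) -> dict:
    """Build a request that enables JSON mode and also asks for JSON in the prompt."""
    return {
        "model": "deepseek-v4-flash",
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object only."},
            {"role": "user", "content": question},
        ],
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(response: dict) -> dict:
    """Decode the assistant content; raises json.JSONDecodeError if it is not JSON."""
    return json.loads(response["choices"][0]["message"]["content"])
```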

stop

Stop sequences; generation stops when the model encounters any of these strings

Notes:

Can be a single string or an array of strings
Up to 16 stop sequences are supported
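
Since `stop` accepts either form, a client may want to normalize it and enforce the 16-sequence limit locally. A small sketch with a hypothetical helper name:

```python
def normalize_stop(stop) -> list:
    """Accept a single string or an iterable of strings; return a list of at most 16."""
    sequences = [stop] if isinstance(stop, str) else list(stop)
    if len(sequences) > 16:
        raise ValueError("at most 16 stop sequences are supported")
    return sequences
```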

stream

boolean

default:false

Whether to stream the response

true: Stream response; returns content chunk by chunk in real time via SSE (Server-Sent Events)
false: Wait for the full response and return it at once (default)

Example:

false
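
With `stream` set to true, the body arrives as SSE lines. The sketch below extracts text from such lines; it assumes the OpenAI chunk layout (`choices[].delta.content`) and a `data: [DONE]` terminator, neither of which is spelled out on this page.

```python
import json


def iter_content(sse_lines):
    """Yield content fragments from an iterable of SSE text lines."""
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream marker
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

The function takes any iterable of decoded lines, so it can be fed from whichever HTTP client is used to read the response.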

stream_options

object

Streaming response options

Only effective when stream=true

tools

object[]

List of tool definitions for Function Calling

Notes:

Up to 128 tool definitions are supported
Each tool must define a name, description, and parameter schema

Maximum array length: 128
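
A sketch of one tool definition. The `{"type": "function", "function": {...}}` wrapper follows the OpenAI protocol and is assumed to apply here; `get_weather` and its parameters are hypothetical.

```python
def make_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a name, description, and JSON Schema into one tool definition."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


def check_tools(tools: list) -> list:
    """Enforce the documented limit of 128 tool definitions."""
    if len(tools) > 128:
        raise ValueError("at most 128 tools are supported")
    return tools


weather_tool = make_tool(
    "get_weather",  # hypothetical tool
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
tools = check_tools([weather_tool])
```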

tool_choice

Controls tool-calling behavior

Options:

none: Do not call any tool
auto: Let the model decide whether to call a tool (default when tools are provided)
required: Force the model to call one or more tools
Object form {"type":"function","function":{"name":"xxx"}}: Call the specified tool

Default: none when no tools are provided, auto when tools are provided

Available options:

none,

auto,

required
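
The options above can be folded into one helper that mirrors the documented default and produces the object form when a specific tool is forced. A sketch with a hypothetical helper name, assuming tools use the `function.name` layout:

```python
def resolve_tool_choice(tools: list, forced_name: str = None):
    """Return a tool_choice value: object form if forced, else the documented default."""
    if forced_name is not None:
        known = {tool["function"]["name"] for tool in tools}
        if forced_name not in known:
            raise ValueError(f"unknown tool: {forced_name}")
        return {"type": "function", "function": {"name": forced_name}}
    # Documented default: none without tools, auto with tools.
    return "auto" if tools else "none"
```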

logprobs

boolean

default:false

Whether to return token log probabilities

Notes:

When set to true, the response includes log probability information for each token

top_logprobs

integer

Return log probabilities of the top N tokens

Notes:

Requires logprobs to be true
Range: [0, 20]

Required range: 0 <= x <= 20
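
Because top_logprobs only makes sense together with logprobs, a client can set the two together. A minimal sketch with a hypothetical helper name:

```python
def logprob_options(top_n: int = None) -> dict:
    """Return the logprobs fields to merge into a request payload."""
    if top_n is None:
        return {"logprobs": False}
    if not 0 <= top_n <= 20:
        raise ValueError("top_logprobs must be between 0 and 20")
    # top_logprobs requires logprobs to be true, so both are set together.
    return {"logprobs": True, "top_logprobs": top_n}
```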

logit_bias

object

Token bias map

Notes:

Keys are token IDs in the tokenizer; values are bias values between -100 and 100
-100 completely bans the token, 100 forces it to be generated
Typical values in the range -1 to 1 already produce observable effects
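
A sketch of building the bias map. JSON object keys are always strings, so integer token IDs are converted; the token IDs used in the usage note are made up for illustration and do not correspond to real tokenizer entries.

```python
def make_logit_bias(biases: dict) -> dict:
    """Convert {token_id: bias} into a JSON-ready map with string keys."""
    result = {}
    for token_id, bias in biases.items():
        if not -100 <= bias <= 100:
            raise ValueError(f"bias for token {token_id} must be in [-100, 100]")
        result[str(token_id)] = bias
    return result
```

For example, `make_logit_bias({1234: -100, 5678: 0.5})` would ban the first (hypothetical) token and nudge the second slightly upward.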

n

integer

default:1

Number of chat completion choices to generate for each input message

Notes:

Default 1; if set to N, N candidates are returned (billed as N × output_tokens)

Required range: 1 <= x <= 8

Example:

1

seed

integer

Random seed (Beta)

Notes:

When specified, the model attempts deterministic sampling
Same seed + same other parameters → same output (not guaranteed 100%)

user

string

Unique identifier representing the end user

Notes:

Helps the platform monitor and detect abuse
A hashed user ID is recommended
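
One way to produce a hashed user ID is SHA-256 over the internal identifier. This is a sketch; the salt is an optional, application-chosen value and is not something the API requires.

```python
import hashlib


def hashed_user_id(raw_id: str, salt: str = "") -> str:
    """Return a stable hex digest that does not expose the raw identifier."""
    return hashlib.sha256((salt + raw_id).encode("utf-8")).hexdigest()
```

The same input always yields the same digest, so abuse monitoring can still group requests by user.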

Response

Chat completion successful

id

string

Unique identifier for the chat completion

Example:

"53c548dc-ec02-4a2f-bbb6-eca4184630b8"

model

string

Model name actually used

Example:

"deepseek-v4-flash"

object

enum<string>

Response type

Available options:

chat.completion

Example:

"chat.completion"

created

integer

Creation timestamp (Unix seconds)

Example:

1777021417
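
The timestamp is in Unix seconds, so it can be converted with the standard library. A short sketch using the example value:

```python
from datetime import datetime, timezone

# Example value from the field description above.
created_at = datetime.fromtimestamp(1777021417, tz=timezone.utc)
print(created_at.isoformat())
```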

choices

object[]

List of completion choices

usage

object

Token usage statistics (including cache and reasoning breakdowns)
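
The cache fields in the sample response make it possible to estimate how much of a prompt was served from cache. A sketch using the `usage` object from the example response; the helper name is hypothetical.

```python
# usage object copied from the example response on this page
usage = {
    "prompt_tokens": 7,
    "completion_tokens": 31,
    "total_tokens": 38,
    "prompt_tokens_details": {"cached_tokens": 0},
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 7,
}


def cache_hit_ratio(usage: dict) -> float:
    """Fraction of prompt tokens that were served from the context cache."""
    prompt = usage.get("prompt_tokens", 0)
    hits = usage.get("prompt_cache_hit_tokens", 0)
    return hits / prompt if prompt else 0.0


print(cache_hit_ratio(usage))
```

In the sample, hit and miss tokens add up to `prompt_tokens`, and prompt plus completion tokens add up to `total_tokens`.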

system_fingerprint

string

System fingerprint identifier

Example:

"fp_evolink_v4_20260402"
