POST /v1/chat/completions
curl --request POST \
  --url https://direct.evolink.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "deepseek-v4-flash",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself"
    }
  ]
}
'
{
  "id": "837f529d-00f9-4731-b2e1-4a54fc31790a",
  "object": "chat.completion",
  "created": 1777026806,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I am the DeepSeek assistant, always ready to answer your questions and help you out."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 31,
    "total_tokens": 38,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 7
  },
  "system_fingerprint": "fp_evolink_v4_20260402"
}
BaseURL: The default BaseURL is https://direct.evolink.ai, which has better support for text models and long-lived connections. https://api.evolink.ai is the primary endpoint for multimodal services and serves as a fallback address for text models.

Authorizations

Authorization
string
header
required

All APIs require Bearer Token authentication

Get API Key:

Visit the API Key Management Page to obtain your API Key

Add to request header:

Authorization: Bearer YOUR_API_KEY

Body

application/json
model
enum<string>
default:deepseek-v4-flash
required

Chat model name

  • deepseek-v4-flash: Fast general-purpose model, 1M context
  • deepseek-v4-pro: Deep reasoning model, excels at math, programming, and complex logic

Tip: Both models have thinking enabled by default, and responses include reasoning_content. Set thinking.type="disabled" to turn it off and reduce output token cost. Both models share identical parameters.
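The tip above can be sketched as a request body. This is a minimal example, assuming the thinking object takes the {"type": "disabled"} shape described in the thinking parameter notes below:

```python
import json

# Request body that turns off chain-of-thought for this call,
# per the thinking.type="disabled" tip above.
payload = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Please introduce yourself"}],
    "thinking": {"type": "disabled"},  # no reasoning_content, lower output-token cost
}
body = json.dumps(payload)  # send as the POST body with Content-Type: application/json
```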

Available options:
deepseek-v4-flash,
deepseek-v4-pro
Example:

"deepseek-v4-flash"

messages
(System Message · object | User Message · object | Assistant Message · object | Tool Message · object)[]
required

List of conversation messages, supports multi-turn dialogue

Messages with different roles have different field structures; select the corresponding role to view details

Minimum array length: 1
thinking
object

Thinking mode control (new in V4)

Notes:

  • Controls the deep thinking (Chain of Thought) feature
  • Enabled by default on both models (type=enabled)
  • When enabled, the reasoning process is returned through choices[].message.reasoning_content and billed as output tokens

⚠️ Multi-turn / tool-calling caveat: If the current response includes reasoning_content, the corresponding assistant message in the messages history of the next request must echo that field verbatim; otherwise the API returns a 400 error ("The reasoning_content in the thinking mode must be passed back to the API"). If you would rather not handle it, set thinking.type="disabled" explicitly for the whole session.
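The echo requirement above can be handled with a small helper when appending each assistant turn to the history. This is a sketch; the function name and sample values are illustrative, not part of the API:

```python
# Hypothetical helper: copy reasoning_content back into the history verbatim
# so the next request does not fail with HTTP 400.
def append_assistant_turn(history, message):
    turn = {"role": "assistant", "content": message["content"]}
    if message.get("reasoning_content") is not None:
        # Must be passed back exactly as received when thinking is enabled.
        turn["reasoning_content"] = message["reasoning_content"]
    history.append(turn)
    return history

history = [{"role": "user", "content": "What is 2+2?"}]
reply = {"role": "assistant", "content": "4", "reasoning_content": "2+2=4."}
append_assistant_turn(history, reply)  # history is now safe to resend
```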

temperature
number
default:1

Sampling temperature, controls randomness of output

Notes:

  • Lower values (e.g., 0.2): More deterministic, more focused output
  • Higher values (e.g., 1.5): More random, more creative output
  • Default: 1
Required range: 0 <= x <= 2
Example:

1

top_p
number
default:1

Nucleus sampling parameter

Notes:

  • Samples only from the smallest set of tokens whose cumulative probability reaches top_p
  • For example, 0.9 means sampling from tokens whose cumulative probability reaches 90%
  • Default: 1.0 (considers all tokens)

Suggestion: Do not adjust temperature and top_p simultaneously

Required range: 0 <= x <= 1
Example:

1

max_tokens
integer

Limits the maximum number of tokens generated

Notes:

  • The V4 series can reach up to 384,000 tokens
  • When thinking is enabled, reasoning_tokens also count toward the max_tokens limit
  • If not set, the model decides the generation length on its own
Required range: 1 <= x <= 384000
Example:

4096

frequency_penalty
number
default:0

Frequency penalty, used to reduce repetitive content

Notes:

  • Positive values penalize tokens based on their frequency in the already-generated text
  • The higher the value, the less likely repetition becomes
  • Default: 0 (no penalty)
Required range: -2 <= x <= 2
Example:

0

presence_penalty
number
default:0

Presence penalty, used to encourage new topics

Notes:

  • Positive values penalize tokens based on whether they have already appeared in the text
  • The higher the value, the more the model tends to discuss new topics
  • Default: 0 (no penalty)
Required range: -2 <= x <= 2
Example:

0

response_format
object

Specifies the response format

Notes:

  • Set to {"type": "json_object"} to enable JSON mode
  • In JSON mode the model outputs valid JSON content
  • For best results, explicitly ask for JSON output in your system or user message
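Putting the notes above together, a JSON-mode request pairs response_format with a prompt that explicitly asks for JSON. A minimal sketch (the sample reply is illustrative; actual output varies):

```python
import json

# JSON-mode request: response_format plus an explicit ask for JSON in the prompt.
payload = {
    "model": "deepseek-v4-flash",
    "messages": [{
        "role": "user",
        "content": 'List three primary colors as a JSON object with a "colors" array.',
    }],
    "response_format": {"type": "json_object"},
}

# In JSON mode the returned content is valid JSON, so it parses directly.
sample_content = '{"colors": ["red", "yellow", "blue"]}'  # illustrative reply
colors = json.loads(sample_content)["colors"]
```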
stop
string | string[]

Stop sequences; generation stops when the model encounters any of these strings

Notes:

  • Can be a single string or an array of strings
  • Up to 16 stop sequences are supported
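For example, a request body using the array form, assuming generation should halt at a literal "END" marker or a blank line:

```python
# Stop sequences: generation halts the first time the model would emit
# "END" or a double newline. A single string would also be accepted.
payload = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Write one paragraph, then END."}],
    "stop": ["END", "\n\n"],  # at most 16 sequences
}
```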
stream
boolean
default:false

Whether to stream the response

  • true: Stream response; returns content chunk by chunk in real time via SSE (Server-Sent Events)
  • false: Wait for the full response and return it at once (default)
Example:

false
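When stream=true, each SSE line carries a JSON chunk after a "data: " prefix, terminated by "data: [DONE]". The accumulator below is a sketch that assumes the OpenAI-compatible chunk shape (choices[0].delta.content); verify against your actual stream:

```python
import json

# Minimal SSE accumulator, assuming OpenAI-compatible streaming chunks.
def collect_stream(sse_lines):
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

# Illustrative captured stream (not real API output):
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
```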

stream_options
object

Streaming response options

Only effective when stream=true

tools
object[]

List of tool definitions for Function Calling

Notes:

  • Up to 128 tool definitions are supported
  • Each tool must define a name, description, and parameter schema
Maximum array length: 128
tool_choice
enum<string> | object

Controls tool-calling behavior

Options:

  • none: Do not call any tool
  • auto: Let the model decide whether to call a tool (default when tools are provided)
  • required: Force the model to call one or more tools
  • Object form {"type":"function","function":{"name":"xxx"}}: Call the specified tool

Default: none when no tools are provided, auto when tools are provided

Available options:
none,
auto,
required
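A request body combining tools with the object form of tool_choice might look like the sketch below. The get_weather tool is illustrative, not part of the API:

```python
# Function-calling request: define a tool, then force a call to it
# using the object form of tool_choice.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,  # up to 128 definitions
    # Object form: call the named tool rather than letting the model decide.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```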
logprobs
boolean
default:false

Whether to return token log probabilities

Notes:

  • When set to true, the response includes log probability information for each token
top_logprobs
integer

Return log probabilities of the top N tokens

Notes:

  • Requires logprobs to be true
  • Range: [0, 20]
Required range: 0 <= x <= 20
logit_bias
object

Token bias map

Notes:

  • Keys are token IDs in the tokenizer; values are bias values between -100 and 100
  • -100 completely bans the token, 100 forces it to be generated
  • Typical values in the range -1 to 1 already produce observable effects
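A small helper can build the map while keeping every value inside [-100, 100]. The token IDs below are tokenizer-specific placeholders, not real IDs for these models:

```python
# Build a logit_bias map, clamping each value to the allowed [-100, 100] range.
def make_logit_bias(biases):
    clamped = {}
    for token_id, bias in biases.items():
        # -100 bans the token outright; 100 forces it; small values nudge.
        clamped[str(token_id)] = max(-100, min(100, bias))
    return clamped

# Placeholder token IDs for illustration only.
bias = make_logit_bias({15339: -100, 2028: 0.5, 999: 150})
```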
n
integer
default:1

Number of chat completion choices to generate for each input message

Notes:

  • Default 1; if set to N, N candidates are returned (billed as N × output_tokens)
Required range: 1 <= x <= 8
Example:

1

seed
integer

Random seed (Beta)

Notes:

  • When specified, the model attempts deterministic sampling
  • Same seed + same other parameters → same output (not guaranteed 100%)
user
string

Unique identifier representing the end user

Notes:

  • Helps the platform monitor and detect abuse
  • A hashed user ID is recommended

Response

Chat completion successful

id
string

Unique identifier for the chat completion

Example:

"53c548dc-ec02-4a2f-bbb6-eca4184630b8"

model
string

Model name actually used

Example:

"deepseek-v4-flash"

object
enum<string>

Response type

Available options:
chat.completion
Example:

"chat.completion"

created
integer

Creation timestamp (Unix seconds)

Example:

1777021417

choices
object[]

List of completion choices

usage
object

Token usage statistics (including cache and reasoning breakdowns)

system_fingerprint
string

System fingerprint identifier

Example:

"fp_evolink_v4_20260402"