> ## Documentation Index
> Fetch the complete documentation index at: https://docs.evolink.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# DeepSeek V4 - OpenAI-Compatible API

> - Call the DeepSeek V4 model using the OpenAI Chat Completions protocol
- Supports two models: `deepseek-v4-flash` (fast general-purpose) and `deepseek-v4-pro` (deep reasoning)
- **Plain text conversation**: Single- or multi-turn contextual dialogue with 1M ultra-long context
- **System prompts**: Customize the AI's role and behavior
- **Thinking mode**: Control deep reasoning via `thinking.type`; `deepseek-v4-pro` returns thinking content through `reasoning_content`
- **Streaming output**: SSE streaming returns are supported
- **Tool calling**: Supports Function Calling (up to 128 tools)
- **JSON mode**: Enabled via `response_format`
- **Context caching**: Requests with identical prefixes automatically hit the cache, substantially lowering input cost

<Note>
  **BaseURL**: The default BaseURL is `https://direct.evolink.ai`, which has better support for text models and long-lived connections. `https://api.evolink.ai` is the primary endpoint for multimodal services and serves as a fallback address for text models.
</Note>


## OpenAPI

````yaml /en/api-manual/language-series/deepseek-v4/deepseek-v4-chat.json POST /v1/chat/completions
openapi: 3.1.0
info:
  title: DeepSeek V4 Complete Parameter Reference (OpenAI-Compatible)
  description: >-
    Complete API reference for the DeepSeek V4 series chat interface
    (`deepseek-v4-flash` / `deepseek-v4-pro`).


    **Model capabilities**:

    - Context length: **1,000,000 tokens** (1M)

    - Maximum output: **384,000 tokens** (384K)

    - Thinking mode: toggled via the `thinking` field; `deepseek-v4-pro` excels
    at complex reasoning

    - Context disk cache: automatic hits, with hits and misses billed separately


    **Billing tiers (UC/1K tokens, EvoLink internal unit)**:

    | Model | Input cache hit | Input cache miss | Output |

    | --- | --- | --- | --- |

    | deepseek-v4-flash | 20 | 100 | 200 |

    | deepseek-v4-pro | 100 | 1200 | 2400 |
  license:
    name: MIT
  version: 1.0.0
servers:
  - url: https://direct.evolink.ai
    description: Production (recommended)
  - url: https://api.evolink.ai
    description: Alternative URL
security:
  - bearerAuth: []
tags:
  - name: Chat Completion
    description: AI chat completion related endpoints
paths:
  /v1/chat/completions:
    post:
      tags:
        - Chat Completion
      summary: DeepSeek V4 Chat Interface (OpenAI-Compatible)
      description: >-
        - Call the DeepSeek V4 model using the OpenAI Chat Completions protocol

        - Supports two models: `deepseek-v4-flash` (fast general-purpose) and
        `deepseek-v4-pro` (deep reasoning)

        - **Plain text conversation**: Single- or multi-turn contextual dialogue
        with 1M ultra-long context

        - **System prompts**: Customize the AI's role and behavior

        - **Thinking mode**: Control deep reasoning via `thinking.type`;
        `deepseek-v4-pro` returns thinking content through `reasoning_content`

        - **Streaming output**: SSE streaming returns are supported

        - **Tool calling**: Supports Function Calling (up to 128 tools)

        - **JSON mode**: Enabled via `response_format`

        - **Context caching**: Requests with identical prefixes automatically
        hit the cache, substantially lowering input cost
      operationId: createChatCompletionDeepSeekV4
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ChatCompletionRequest'
            examples:
              simple_text:
                summary: Single-turn text conversation (Flash)
                value:
                  model: deepseek-v4-flash
                  messages:
                    - role: user
                      content: Please introduce yourself
              multi_turn:
                summary: Multi-turn conversation (context understanding)
                value:
                  model: deepseek-v4-flash
                  messages:
                    - role: user
                      content: What is Python?
                    - role: assistant
                      content: Python is a high-level programming language...
                    - role: user
                      content: What are its advantages?
              system_prompt:
                summary: Using system prompts
                value:
                  model: deepseek-v4-flash
                  messages:
                    - role: system
                      content: >-
                        You are a professional Python programming assistant.
                        Answer questions concisely.
                    - role: user
                      content: How do I read a file?
              thinking_mode:
                summary: Pro model + explicitly enabled thinking mode
                value:
                  model: deepseek-v4-pro
                  thinking:
                    type: enabled
                    reasoning_effort: high
                  messages:
                    - role: user
                      content: Prove that √2 is irrational
              disable_thinking:
                summary: Disable thinking mode (direct answer only)
                value:
                  model: deepseek-v4-pro
                  thinking:
                    type: disabled
                  messages:
                    - role: user
                      content: What is the capital of France?
              json_mode:
                summary: JSON mode structured output
                value:
                  model: deepseek-v4-flash
                  response_format:
                    type: json_object
                  messages:
                    - role: system
                      content: You must output strict JSON.
                    - role: user
                      content: Give me an example JSON with name and age fields
              tool_calling:
                summary: Function Calling tool use
                value:
                  model: deepseek-v4-flash
                  messages:
                    - role: user
                      content: Query today's weather in Beijing
                  tools:
                    - type: function
                      function:
                        name: get_weather
                        description: Query the weather for a specified city
                        parameters:
                          type: object
                          properties:
                            city:
                              type: string
                              description: City name
                          required:
                            - city
                  tool_choice: auto
              streaming:
                summary: Streaming output
                value:
                  model: deepseek-v4-flash
                  stream: true
                  stream_options:
                    include_usage: true
                  messages:
                    - role: user
                      content: Write a short poem about spring
      responses:
        '200':
          description: Chat completion successful
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ChatCompletionResponse'
              examples:
                thinking_disabled:
                  summary: thinking disabled (plain text reply)
                  value:
                    id: 837f529d-00f9-4731-b2e1-4a54fc31790a
                    object: chat.completion
                    created: 1777026806
                    model: deepseek-v4-flash
                    choices:
                      - index: 0
                        message:
                          role: assistant
                          content: >-
                            Hello! I am the DeepSeek assistant, always ready to
                            answer your questions and help you out.
                        logprobs: null
                        finish_reason: stop
                    usage:
                      prompt_tokens: 7
                      completion_tokens: 31
                      total_tokens: 38
                      prompt_tokens_details:
                        cached_tokens: 0
                      prompt_cache_hit_tokens: 0
                      prompt_cache_miss_tokens: 7
                    system_fingerprint: fp_evolink_v4_20260402
                thinking_enabled:
                  summary: thinking enabled (with reasoning_content)
                  value:
                    id: 658083bb-1137-49d2-8c4d-900e508cbd53
                    object: chat.completion
                    created: 1777026807
                    model: deepseek-v4-flash
                    choices:
                      - index: 0
                        message:
                          role: assistant
                          content: The capital of France is **Paris**.
                          reasoning_content: >-
                            The user is asking "What is the capital of France?"
                            — a basic general-knowledge question. Just give the
                            answer "Paris".
                        logprobs: null
                        finish_reason: stop
                    usage:
                      prompt_tokens: 7
                      completion_tokens: 53
                      total_tokens: 60
                      prompt_tokens_details:
                        cached_tokens: 0
                      completion_tokens_details:
                        reasoning_tokens: 45
                      prompt_cache_hit_tokens: 0
                      prompt_cache_miss_tokens: 7
                    system_fingerprint: fp_evolink_v4_20260402
                cache_hit:
                  summary: Context cache hit (large cache_hit_tokens)
                  value:
                    id: 3e4a1b70-8c59-4b22-a011-9f2c7d5a3e88
                    object: chat.completion
                    created: 1777026900
                    model: deepseek-v4-flash
                    choices:
                      - index: 0
                        message:
                          role: assistant
                          content: Hello!
                        logprobs: null
                        finish_reason: stop
                    usage:
                      prompt_tokens: 694
                      completion_tokens: 10
                      total_tokens: 704
                      prompt_tokens_details:
                        cached_tokens: 640
                      prompt_cache_hit_tokens: 640
                      prompt_cache_miss_tokens: 54
                    system_fingerprint: fp_evolink_v4_20260402
        '400':
          description: Invalid request parameters
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                error:
                  code: 400
                  message: Invalid request parameters
                  type: invalid_request_error
        '401':
          description: Unauthenticated, invalid or expired token
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                error:
                  code: 401
                  message: Invalid or expired token
                  type: authentication_error
        '402':
          description: Insufficient quota, recharge required
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                error:
                  code: 402
                  message: Insufficient quota
                  type: insufficient_quota_error
        '403':
          description: Access denied for this model
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                error:
                  code: 403
                  message: Access denied for this model
                  type: permission_error
                  param: model
        '404':
          description: Resource not found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                error:
                  code: 404
                  message: Specified model not found
                  type: not_found_error
                  param: model
        '413':
          description: Request body too large
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                error:
                  code: 413
                  message: Request body too large
                  type: request_too_large_error
                  param: messages
        '429':
          description: Rate limit exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                error:
                  code: 429
                  message: Rate limit exceeded
                  type: rate_limit_error
        '500':
          description: Internal server error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                error:
                  code: 500
                  message: Internal server error
                  type: internal_server_error
        '502':
          description: Gateway error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                error:
                  code: 502
                  message: Bad gateway
                  type: bad_gateway_error
        '503':
          description: Service temporarily unavailable
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                error:
                  code: 503
                  message: Service temporarily unavailable
                  type: service_unavailable_error
components:
  schemas:
    ChatCompletionRequest:
      type: object
      required:
        - model
        - messages
      properties:
        model:
          type: string
          description: >-
            Chat model name


            - `deepseek-v4-flash`: Fast general-purpose model, 1M context

            - `deepseek-v4-pro`: Deep reasoning model, excels at math,
            programming, and complex logic


            **Tip**: Both models **have `thinking` enabled by default**, and
            responses include `reasoning_content`. Set
            `thinking.type="disabled"` to turn it off and reduce output token
            cost. Both models share identical parameters.
          enum:
            - deepseek-v4-flash
            - deepseek-v4-pro
          default: deepseek-v4-flash
          example: deepseek-v4-flash
        messages:
          type: array
          description: >-
            List of conversation messages, supports multi-turn dialogue


            Messages with different roles have different field structures;
            select the corresponding role to view details
          items:
            oneOf:
              - $ref: '#/components/schemas/SystemMessage'
              - $ref: '#/components/schemas/UserMessage'
              - $ref: '#/components/schemas/AssistantRequestMessage'
              - $ref: '#/components/schemas/ToolMessage'
            discriminator:
              propertyName: role
              mapping:
                system:
                  $ref: '#/components/schemas/SystemMessage'
                user:
                  $ref: '#/components/schemas/UserMessage'
                assistant:
                  $ref: '#/components/schemas/AssistantRequestMessage'
                tool:
                  $ref: '#/components/schemas/ToolMessage'
          minItems: 1
        thinking:
          type: object
          description: >-
            Thinking mode control (new in V4)


            **Notes**:

            - Controls the deep thinking (Chain of Thought) feature

            - **Enabled by default on both models** (`type=enabled`)

            - When enabled, the reasoning process is returned through
            `choices[].message.reasoning_content` and billed as output tokens


            ⚠️ **Multi-turn / tool-calling caveat**: If the current response
            includes `reasoning_content`, **the corresponding assistant message
            in the `messages` history of the next request must echo that field
            verbatim**, otherwise the API returns 400 `The reasoning_content in
            the thinking mode must be passed back to the API`. If you would
            rather not handle it, set `thinking.type="disabled"` explicitly for
            the whole session.
          properties:
            type:
              type: string
              description: |-
                Thinking mode switch

                - `enabled`: Enable deep thinking (default)
                - `disabled`: Disable deep thinking; the model answers directly
              enum:
                - enabled
                - disabled
              default: enabled
            reasoning_effort:
              type: string
              description: >-
                Reasoning effort level


                - `low`: Low effort, faster response and fewer reasoning_tokens

                - `medium`: Medium effort (default)

                - `high`: High effort, more thorough thinking process, consumes
                more reasoning_tokens
              enum:
                - low
                - medium
                - high
              default: medium
        temperature:
          type: number
          description: |-
            Sampling temperature, controls randomness of output

            **Notes**:
            - Lower values (e.g., 0.2): More deterministic, more focused output
            - Higher values (e.g., 1.5): More random, more creative output
            - Default: 1
          minimum: 0
          maximum: 2
          default: 1
          example: 1
        top_p:
          type: number
          description: >-
            Nucleus sampling parameter


            **Notes**:

            - Controls sampling from tokens with cumulative probability

            - For example, 0.9 means sampling from tokens whose cumulative
            probability reaches 90%

            - Default: 1.0 (considers all tokens)


            **Suggestion**: Do not adjust temperature and top_p simultaneously
          minimum: 0
          maximum: 1
          default: 1
          example: 1
        max_tokens:
          type: integer
          description: >-
            Limits the maximum number of tokens generated


            **Notes**:

            - The V4 series can reach up to **384,000 tokens**

            - When thinking is enabled, reasoning_tokens also count toward the
            max_tokens limit

            - If not set, the model decides the generation length on its own
          minimum: 1
          maximum: 384000
          example: 4096
        frequency_penalty:
          type: number
          description: >-
            Frequency penalty, used to reduce repetitive content


            **Notes**:

            - Positive values penalize tokens based on their frequency in the
            already-generated text

            - The higher the value, the less likely repetition becomes

            - Default: 0 (no penalty)
          minimum: -2
          maximum: 2
          default: 0
          example: 0
        presence_penalty:
          type: number
          description: >-
            Presence penalty, used to encourage new topics


            **Notes**:

            - Positive values penalize tokens based on whether they have already
            appeared in the text

            - The higher the value, the more the model tends to discuss new
            topics

            - Default: 0 (no penalty)
          minimum: -2
          maximum: 2
          default: 0
          example: 0
        response_format:
          type: object
          description: >-
            Specifies the response format


            **Notes**:

            - Set to `{"type": "json_object"}` to enable JSON mode

            - In JSON mode the model outputs valid JSON content

            - For best results, explicitly ask for JSON output in your system or
            user message
          properties:
            type:
              type: string
              enum:
                - text
                - json_object
              description: Response format type
              default: text
        stop:
          description: >-
            Stop sequences; generation stops when the model encounters any of
            these strings


            **Notes**:

            - Can be a single string or an array of strings

            - Up to 16 stop sequences are supported
          oneOf:
            - type: string
            - type: array
              items:
                type: string
              maxItems: 16
        stream:
          type: boolean
          description: >-
            Whether to stream the response


            - `true`: Stream response; returns content chunk by chunk in real
            time via SSE (Server-Sent Events)

            - `false`: Wait for the full response and return it at once
            (default)
          default: false
          example: false
        stream_options:
          type: object
          description: |-
            Streaming response options

            Only effective when `stream=true`
          properties:
            include_usage:
              type: boolean
              description: >-
                Return usage statistics (including cache breakdown) at the end
                of the stream
        tools:
          type: array
          description: |-
            List of tool definitions for Function Calling

            **Notes**:
            - Up to 128 tool definitions are supported
            - Each tool must define a name, description, and parameter schema
          items:
            $ref: '#/components/schemas/Tool'
          maxItems: 128
        tool_choice:
          description: >-
            Controls tool-calling behavior


            **Options**:

            - `none`: Do not call any tool

            - `auto`: Let the model decide whether to call a tool (default when
            tools are provided)

            - `required`: Force the model to call one or more tools

            - Object form `{"type":"function","function":{"name":"xxx"}}`: Call
            the specified tool


            **Default**: `none` when no tools are provided, `auto` when tools
            are provided
          oneOf:
            - type: string
              enum:
                - none
                - auto
                - required
            - type: object
              description: Specify a particular tool to call
              properties:
                type:
                  type: string
                  enum:
                    - function
                function:
                  type: object
                  properties:
                    name:
                      type: string
                      description: Name of the function to call
                  required:
                    - name
        logprobs:
          type: boolean
          description: >-
            Whether to return token log probabilities


            **Notes**:

            - When set to `true`, the response includes log probability
            information for each token
          default: false
        top_logprobs:
          type: integer
          description: |-
            Return log probabilities of the top N tokens

            **Notes**:
            - Requires `logprobs` to be `true`
            - Range: `[0, 20]`
          minimum: 0
          maximum: 20
        logit_bias:
          type: object
          description: >-
            Token bias map


            **Notes**:

            - Keys are token IDs in the tokenizer; values are bias values
            between -100 and 100

            - -100 completely bans the token, 100 forces it to be generated

            - Typical values in the range -1 to 1 already produce observable
            effects
          additionalProperties:
            type: number
            minimum: -100
            maximum: 100
        'n':
          type: integer
          description: >-
            Number of chat completion choices to generate for each input message


            **Notes**:

            - Default 1; if set to N, N candidates are returned (billed as N ×
            output_tokens)
          minimum: 1
          maximum: 8
          default: 1
          example: 1
        seed:
          type: integer
          description: >-
            Random seed (Beta)


            **Notes**:

            - When specified, the model attempts deterministic sampling

            - Same seed + same other parameters → same output (not guaranteed
            100%)
        user:
          type: string
          description: |-
            Unique identifier representing the end user

            **Notes**:
            - Helps the platform monitor and detect abuse
            - A hashed user ID is recommended
    ChatCompletionResponse:
      type: object
      properties:
        id:
          type: string
          description: Unique identifier for the chat completion
          example: 53c548dc-ec02-4a2f-bbb6-eca4184630b8
        model:
          type: string
          description: Model name actually used
          example: deepseek-v4-flash
        object:
          type: string
          enum:
            - chat.completion
          description: Response type
          example: chat.completion
        created:
          type: integer
          description: Creation timestamp (Unix seconds)
          example: 1777021417
        choices:
          type: array
          description: List of completion choices
          items:
            $ref: '#/components/schemas/Choice'
        usage:
          $ref: '#/components/schemas/Usage'
        system_fingerprint:
          type: string
          description: System fingerprint identifier
          example: fp_evolink_v4_20260402
    ErrorResponse:
      type: object
      properties:
        error:
          type: object
          properties:
            code:
              type: integer
              description: HTTP status error code
            message:
              type: string
              description: Error description
            type:
              type: string
              description: Error type
            param:
              type: string
              description: Related parameter name
    SystemMessage:
      title: System Message
      type: object
      required:
        - role
        - content
      properties:
        role:
          type: string
          enum:
            - system
          description: Role identifier, fixed to `system`
        content:
          type: string
          description: System prompt content, used to define the AI's role and behavior
        name:
          type: string
          description: >-
            Participant name, used to distinguish different sources of system
            prompts
    UserMessage:
      title: User Message
      type: object
      required:
        - role
        - content
      properties:
        role:
          type: string
          enum:
            - user
          description: Role identifier, fixed to `user`
        content:
          type: string
          description: User message content (plain text string)
        name:
          type: string
          description: Participant name, used to distinguish different users
    AssistantRequestMessage:
      title: Assistant Message
      type: object
      required:
        - role
        - content
      properties:
        role:
          type: string
          enum:
            - assistant
          description: Role identifier, fixed to `assistant`
        content:
          type:
            - string
            - 'null'
          description: |-
            Assistant message content

            **Notes**:
            - Used to pass historical assistant replies in multi-turn dialogues
            - Can be `null` when `tool_calls` is present
        name:
          type: string
          description: Participant name
        prefix:
          type: boolean
          description: >-
            Whether to enable prefix continuation mode (Beta)


            **Notes**:

            - Only set in the last message

            - When set to `true`, the model continues generation using this
            message's `content` as the prefix
          default: false
        reasoning_content:
          type:
            - string
            - 'null'
          description: >-
            Chain-of-thought content (Beta)


            **Notes**:

            - Produced by both `deepseek-v4-flash` and `deepseek-v4-pro` when
            thinking is enabled (default)

            - **Must be echoed verbatim in multi-turn scenarios**: copy
            `choices[0].message.reasoning_content` from the previous response
            directly into the `reasoning_content` field of the historical
            assistant message; omission will be rejected by the API (400)

            - No need to pair with `prefix` when passed as historical context;
            set `prefix=true` only when explicitly enabling prefix continuation
        tool_calls:
          type: array
          description: >-
            List of tool calls


            Used to carry historical tool-call information in multi-turn
            dialogues
          items:
            type: object
            properties:
              id:
                type: string
                description: Unique identifier of the tool call
              type:
                type: string
                enum:
                  - function
              function:
                type: object
                properties:
                  name:
                    type: string
                    description: Name of the called function
                  arguments:
                    type: string
                    description: Function arguments (JSON string)
    ToolMessage:
      title: Tool Message
      type: object
      required:
        - role
        - content
        - tool_call_id
      properties:
        role:
          type: string
          enum:
            - tool
          description: Role identifier, fixed to `tool`
        content:
          type: string
          description: Return value from the tool call
        tool_call_id:
          type: string
          description: >-
            Tool call ID


            Corresponds to the `id` field returned in the assistant message's
            `tool_calls`
    Tool:
      type: object
      required:
        - type
        - function
      properties:
        type:
          type: string
          enum:
            - function
          description: Tool type; currently only `function` is supported
        function:
          type: object
          required:
            - name
          properties:
            name:
              type: string
              description: >-
                Name of the function to call


                **Notes**:

                - Must consist of a-z, A-Z, 0-9 characters, optionally with
                underscores and hyphens

                - Maximum length is 64 characters
            description:
              type: string
              description: >-
                Description of the function, helping the model understand when
                and how to call it
            parameters:
              type: object
              description: >-
                Function input parameters, described as a JSON Schema object


                **Notes**:

                - Omitting `parameters` defines a function with an empty
                parameter list
            strict:
              type: boolean
              description: >-
                Whether to enable strict mode (Beta)


                **Notes**:

                - When set to `true`, the API uses strict mode for function
                calls

                - Ensures the output always conforms to the function's JSON
                Schema definition
              default: false
    Choice:
      type: object
      properties:
        index:
          type: integer
          description: Index of this choice
          example: 0
        message:
          $ref: '#/components/schemas/AssistantMessage'
        logprobs:
          type:
            - object
            - 'null'
          description: >-
            Log probability information (returned only when `logprobs=true` is
            requested)
        finish_reason:
          type: string
          description: |-
            Finish reason

            - `stop`: Natural completion or a stop sequence was hit
            - `length`: Reached the maximum token limit
            - `content_filter`: Content was filtered by safety policy
            - `tool_calls`: The model called a tool
            - `insufficient_system_resource`: Insufficient backend resources
          enum:
            - stop
            - length
            - content_filter
            - tool_calls
            - insufficient_system_resource
          example: stop
    Usage:
      type: object
      description: Token usage statistics (including cache and reasoning breakdowns)
      properties:
        prompt_tokens:
          type: integer
          description: Total tokens in the input (cache hit + miss)
          example: 694
        completion_tokens:
          type: integer
          description: Tokens in the output (including the reasoning portion)
          example: 20
        total_tokens:
          type: integer
          description: Total tokens = prompt_tokens + completion_tokens
          example: 714
        prompt_cache_hit_tokens:
          type: integer
          description: >-
            Number of input tokens that hit the context cache


            **Notes**: Cache-hit tokens are billed at the **cache hit rate**
            (Flash 20 UC/1K, Pro 100 UC/1K)
          example: 640
        prompt_cache_miss_tokens:
          type: integer
          description: >-
            Number of input tokens that missed the cache


            **Notes**: Billed at the **standard input rate** (Flash 100 UC/1K,
            Pro 1200 UC/1K)
          example: 54
        prompt_tokens_details:
          type: object
          description: Detailed breakdown of input tokens (OpenAI style)
          properties:
            cached_tokens:
              type: integer
              description: >-
                Number of cache-hit tokens (equivalent to
                `prompt_cache_hit_tokens`; auto-mapped by the framework)
              example: 640
        completion_tokens_details:
          type: object
          description: Detailed breakdown of output tokens
          properties:
            reasoning_tokens:
              type: integer
              description: >-
                Number of reasoning tokens produced by thinking mode (counted as
                output, billed at output rate)
              example: 10
    AssistantMessage:
      type: object
      properties:
        role:
          type: string
          description: Role of the message sender
          enum:
            - assistant
          example: assistant
        content:
          type: string
          description: AI's response message content
          example: >-
            Hello! I am DeepSeek V4. I excel at general conversation, code
            generation, mathematical reasoning, and many other tasks.
        reasoning_content:
          type: string
          description: >-
            Chain-of-thought content (returned only when thinking is enabled)


            **Notes**:

            - Enabled by default on `deepseek-v4-pro`; returns the full
            reasoning process

            - Returned on `deepseek-v4-flash` only when
            `thinking.type="enabled"` is set explicitly

            - Billed as output tokens and counted in
            `completion_tokens_details.reasoning_tokens`
          example: Let me analyze this question...
        tool_calls:
          type: array
          description: List of tool calls (returned when the model decides to call a tool)
          items:
            type: object
            properties:
              id:
                type: string
                description: Unique identifier of the tool call
              type:
                type: string
                enum:
                  - function
              function:
                type: object
                properties:
                  name:
                    type: string
                    description: Name of the called function
                  arguments:
                    type: string
                    description: Function arguments (JSON string)
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: >-
        ##All APIs require Bearer Token authentication##


        **Get API Key**:


        Visit the [API Key Management Page](https://evolink.ai/dashboard/keys)
        to obtain your API Key


        **Add to request header**:

        ```

        Authorization: Bearer YOUR_API_KEY

        ```

````