POST /v1/audios/generations
curl --request POST \
  --url https://api.evolink.ai/v1/audios/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "qwen-voice-design",
  "voice_prompt": "A calm middle-aged male news anchor with a deep, resonant voice, rich in magnetism, steady pace, and clear articulation",
  "preview_text": "Good evening, listeners. Welcome to the evening news broadcast.",
  "preferred_name": "announcer"
}
'
{
  "created": 1775123456,
  "id": "task-unified-1775123456-abcd1234",
  "model": "qwen-voice-design",
  "object": "audio.generation.task",
  "progress": 0,
  "status": "pending",
  "task_info": {
    "can_cancel": true,
    "estimated_time": 15,
    "audio_type": "voice_design"
  },
  "type": "audio",
  "usage": {
    "credits_reserved": 2
  }
}

Authorizations

Authorization
string
header
required

All endpoints require Bearer Token authentication.

Get your API Key:

Visit the API Key management page to obtain your API Key

Add the following header to every request:

Authorization: Bearer YOUR_API_KEY
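
As a rough illustration of the header requirement above, the following Python sketch builds the same POST request as the curl sample using only the standard library. The helper name `build_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl sample on this page.
API_URL = "https://api.evolink.ai/v1/audios/generations"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Bearer header.

    build_request is a hypothetical helper name, not part of any SDK.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` and decode the JSON body of the response.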

Body

application/json
model
enum<string>
default:qwen-voice-design
required

Model name

Available options:
qwen-voice-design
Example:

"qwen-voice-design"

voice_prompt
string
required

A text description of the voice characteristics used to define the voice profile

Constraints:

  • Maximum 2048 characters
  • Supports Chinese and English only

Suggested dimensions:

  • Gender: male, female, neutral
  • Age: child (5-12), teen (13-18), young adult (19-35), middle-aged (36-55), senior (55+)
  • Pitch: high, medium, low
  • Pace: fast, moderate, slow
  • Emotion: cheerful, calm, gentle, serious, lively, composed
  • Character: magnetic, crisp, husky, mellow, sweet, deep
  • Use case: news broadcasting, commercial narration, audiobook, animation character, voice assistant

Example descriptions:

  • A calm middle-aged male with a slow pace, deep magnetic voice, suitable for news reading or documentary narration
  • A cute child voice, approximately an 8-year-old girl, slightly childlike speech, suitable for animated character dubbing
  • A gentle and intellectual female, around 30 years old, calm tone, suitable for audiobook narration
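
One possible way to assemble a description from the suggested dimensions is sketched below. The function name `compose_voice_prompt` and the sentence template are illustrative assumptions; the API accepts any free-form text, and the sketch assumes the 2048-character limit is counted with Python's `len()`.

```python
MAX_VOICE_PROMPT_CHARS = 2048  # limit documented for voice_prompt


def compose_voice_prompt(*, gender: str, age: str, pitch: str, pace: str,
                         emotion: str, character: str, use_case: str) -> str:
    """Join the suggested dimensions into one description (hypothetical template)."""
    prompt = (
        f"A {emotion} {age} {gender} voice, {pitch} pitch, {pace} pace, "
        f"{character} tone, suitable for {use_case}"
    )
    # Assumption: the documented limit corresponds to len() on the string.
    if len(prompt) > MAX_VOICE_PROMPT_CHARS:
        raise ValueError("voice_prompt exceeds 2048 characters")
    return prompt
```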
Maximum string length: 2048
Example:

"A calm middle-aged male news anchor with a deep, resonant voice, rich in magnetism, steady pace, and clear articulation"

preview_text
string
required

Preview text used to generate a sample audio clip

Constraints:

  • Maximum 1024 characters
  • Supports 10 languages: Chinese, English, Japanese, Korean, German, French, Italian, Russian, Portuguese, Spanish
  • Recommended to match the language field
Maximum string length: 1024
Example:

"Good evening, listeners. Welcome to the evening news broadcast."

preferred_name
string
required

Voice name prefix

Constraints:

  • Only digits, English letters, and underscores
  • No more than 16 characters

The full generated voice name follows the format: qwen-tts-vd-{preferred_name}-voice-{timestamp}

For example, passing announcer results in a voice name like: qwen-tts-vd-announcer-voice-20260402-a1b2

Maximum string length: 16
Pattern: ^[a-zA-Z0-9_]+$
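
A client can check the name before sending the request. The sketch below applies the documented pattern and length limit; the function name `is_valid_preferred_name` is hypothetical.

```python
import re

# Pattern and length limit as documented for preferred_name.
PREFERRED_NAME_RE = re.compile(r"^[a-zA-Z0-9_]+$")
MAX_PREFERRED_NAME_CHARS = 16


def is_valid_preferred_name(name: str) -> bool:
    """Return True if name uses only letters, digits, underscores and is 1-16 chars."""
    # fullmatch avoids accepting a trailing newline, which "$" alone would allow.
    return (
        len(name) <= MAX_PREFERRED_NAME_CHARS
        and PREFERRED_NAME_RE.fullmatch(name) is not None
    )
```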
Example:

"announcer"

language
enum<string>

Language preference for the voice profile; recommended to match preview_text

Defaults to zh if not provided

Available options:
zh,
en,
ja,
ko,
de,
fr,
it,
ru,
pt,
es
Example:

"zh"

sample_rate
enum<integer>

Preview audio sample rate (Hz)

Defaults to 24000 if not provided

Available options:
8000,
16000,
24000,
48000
Example:

24000

response_format
enum<string>

Preview audio format

Defaults to wav if not provided

Available options:
pcm,
wav,
mp3,
opus
Example:

"wav"

target_model
enum<string>
default:qwen3-tts-vd-2026-01-26

The TTS model that will drive the created voice

Important: The target_model specified when creating the voice must match the model used in subsequent speech synthesis; otherwise synthesis will fail

Value                             Description
qwen3-tts-vd-2026-01-26           Qwen3-TTS-VD non-streaming (default)
qwen3-tts-vd-realtime-2026-01-15  Qwen3-TTS-VD-Realtime bidirectional streaming (new)
qwen3-tts-vd-realtime-2025-12-16  Qwen3-TTS-VD-Realtime bidirectional streaming (legacy)

This platform currently supports only qwen3-tts-vd-2026-01-26 (non-streaming) for synthesis. The realtime models are not yet integrated, but voices targeting them can be created in advance.

Available options:
qwen3-tts-vd-2026-01-26,
qwen3-tts-vd-realtime-2026-01-15,
qwen3-tts-vd-realtime-2025-12-16
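
The optional enum fields (language, sample_rate, response_format, target_model) can be checked client-side before a request is made. The following is a minimal sketch; `build_payload` is a hypothetical helper, and it assumes that omitted optional fields are left to the server-side defaults listed on this page.

```python
# Allowed values copied from the "Available options" lists on this page.
ALLOWED_OPTIONS = {
    "language": {"zh", "en", "ja", "ko", "de", "fr", "it", "ru", "pt", "es"},
    "sample_rate": {8000, 16000, 24000, 48000},
    "response_format": {"pcm", "wav", "mp3", "opus"},
    "target_model": {
        "qwen3-tts-vd-2026-01-26",
        "qwen3-tts-vd-realtime-2026-01-15",
        "qwen3-tts-vd-realtime-2025-12-16",
    },
}


def build_payload(voice_prompt: str, preview_text: str, preferred_name: str,
                  **optional) -> dict:
    """Assemble the request body, rejecting unknown or out-of-range options."""
    payload = {
        "model": "qwen-voice-design",
        "voice_prompt": voice_prompt,
        "preview_text": preview_text,
        "preferred_name": preferred_name,
    }
    for key, value in optional.items():
        if key not in ALLOWED_OPTIONS:
            raise ValueError(f"unsupported option: {key}")
        if value not in ALLOWED_OPTIONS[key]:
            raise ValueError(f"invalid value for {key}: {value!r}")
        payload[key] = value
    return payload
```

Because a voice is tied to its target_model, it may help to store the value passed here alongside the resulting voice name so that later synthesis calls use the same model.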
Example:

"qwen3-tts-vd-2026-01-26"

callback_url
string<uri>

HTTPS callback URL invoked when the task completes

Trigger conditions:

  • Triggered when the task is completed, failed, or cancelled
  • Sent after billing confirmation

Security restrictions:

  • HTTPS only
  • Internal IP addresses are blocked (127.0.0.1, 10.x.x.x, 172.16-31.x.x, 192.168.x.x, etc.)
  • URL length must not exceed 2048 characters

Callback behavior:

  • Timeout: 10 seconds
  • Up to 3 retries after failure (at 1s / 2s / 4s intervals)
  • Response body format matches the task query API response
  • A 2xx status code is considered success; other codes trigger a retry
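
The security restrictions above can be approximated on the client before a callback URL is submitted. This sketch only inspects the URL string: the function name `is_allowed_callback_url` is hypothetical, hostnames are not resolved through DNS, and the server's actual checks may be stricter.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_CALLBACK_URL_CHARS = 2048  # limit documented for callback_url


def is_allowed_callback_url(url: str) -> bool:
    """Approximate the documented rules: HTTPS only, no internal IPs, length limit."""
    if len(url) > MAX_CALLBACK_URL_CHARS:
        return False
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        ip = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # Not a literal IP address; DNS resolution is out of scope for this sketch.
        return True
    # Covers 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 and similar.
    return not (ip.is_private or ip.is_loopback)
```

On the receiving side, the endpoint should answer with a 2xx status within 10 seconds, since any other outcome triggers the retries described above.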
Example:

"https://your-domain.com/webhooks/voice-design-completed"

Response

Voice design task created successfully

created
integer

Task creation timestamp

Example:

1775123456

id
string

Task ID

Example:

"task-unified-1775123456-abcd1234"

model
string

Actual model name used

Example:

"qwen-voice-design"

object
enum<string>

Specific task type

Available options:
audio.generation.task
progress
integer

Task progress percentage (0-100)

Required range: 0 <= x <= 100
Example:

0

status
enum<string>

Task status

Available options:
pending,
processing,
completed,
failed
Example:

"pending"

task_info
object

Audio task details

type
enum<string>

Task output type

Available options:
audio
Example:

"audio"

usage
object

Usage and billing information
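
To show how the response fields fit together, the sketch below parses a task object and reports whether it has reached a final state. The helper name `summarize_task` is hypothetical, and treating "completed" and "failed" as the terminal statuses is an assumption based on the status options listed above.

```python
import json

# Assumption: these two documented statuses are the terminal ones.
TERMINAL_STATUSES = {"completed", "failed"}


def summarize_task(body: str) -> dict:
    """Extract the fields a client typically needs from a task response."""
    task = json.loads(body)
    return {
        "id": task["id"],
        "status": task["status"],
        "progress": task["progress"],
        "done": task["status"] in TERMINAL_STATUSES,
        # usage may be absent, so fall back to None instead of raising.
        "credits_reserved": task.get("usage", {}).get("credits_reserved"),
    }
```

Applied to the sample response at the top of this page, the summary reports the status as pending and `done` as false, so the client would continue to wait for the callback or query the task again.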