Workflow:
Pass result_data.voice (the voice name) as the voice parameter for speech synthesis.
All endpoints require Bearer Token authentication.
Get your API Key:
Visit the API Key management page to obtain your API Key
Add the following header to every request:
Authorization: Bearer YOUR_API_KEY
Model name
Allowed values: qwen-voice-design
Example: "qwen-voice-design"
A text description of the voice characteristics used to define the voice profile
Constraints:
- Maximum length: 2048 characters
Suggested dimensions:
Example descriptions:
- A calm middle-aged male with a slow pace and a deep, magnetic voice, suitable for news reading or documentary narration
- A cute child voice, approximately an 8-year-old girl, with slightly childlike speech, suitable for animated character dubbing
- A gentle and intellectual female, around 30 years old, with a calm tone, suitable for audiobook narration
Maximum length: 2048
Example: "A calm middle-aged male news anchor with a deep, resonant voice, rich in magnetism, steady pace, and clear articulation"
Preview text used to generate a sample audio clip
Constraints:
- Maximum length: 1024 characters
- The text language should match the language field
Maximum length: 1024
Example: "Good evening, listeners. Welcome to the evening news broadcast."
Voice name prefix
Constraints:
- Maximum length: 16 characters
The generated full voice name format: qwen-tts-vd-{preferred_name}-voice-{timestamp}
For example, passing announcer results in a voice name like: qwen-tts-vd-announcer-voice-20260402-a1b2
Maximum length: 16
Pattern: ^[a-zA-Z0-9_]+$
Example: "announcer"
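The prefix constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the official API: the function names `is_valid_preferred_name` and `voice_name_prefix` are hypothetical, and it assumes that "16" is a maximum length. Only the fixed leading part of the voice name is built, since the timestamp suffix is assigned by the service.

```python
import re

# Constraints taken from the text above: at most 16 characters,
# pattern ^[a-zA-Z0-9_]+$ (letters, digits, underscores).
PREFERRED_NAME_PATTERN = re.compile(r"[a-zA-Z0-9_]+")
MAX_PREFERRED_NAME_LENGTH = 16  # assumption: "16" is a maximum length


def is_valid_preferred_name(name: str) -> bool:
    """Return True if name satisfies the documented prefix constraints."""
    # fullmatch is used so that a trailing newline is rejected as well.
    return (
        len(name) <= MAX_PREFERRED_NAME_LENGTH
        and PREFERRED_NAME_PATTERN.fullmatch(name) is not None
    )


def voice_name_prefix(name: str) -> str:
    """Return the fixed leading part of the generated voice name.

    The trailing {timestamp} part is generated by the service and is
    therefore not reproduced here.
    """
    if not is_valid_preferred_name(name):
        raise ValueError(f"invalid preferred_name: {name!r}")
    return f"qwen-tts-vd-{name}-voice-"
```

For instance, `voice_name_prefix("announcer")` yields the leading part of the example name shown above, while a value such as `"news-anchor"` is rejected because of the hyphen.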
Language preference for the voice profile; recommended to match preview_text
Defaults to zh if not provided
Allowed values: zh, en, ja, ko, de, fr, it, ru, pt, es
Example: "zh"
Preview audio sample rate (Hz)
Defaults to 24000 if not provided
Allowed values: 8000, 16000, 24000, 48000
Example: 24000
Preview audio format
Defaults to wav if not provided
Allowed values: pcm, wav, mp3, opus
Example: "wav"
The TTS model that will drive the created voice
Important: The target_model specified when creating the voice must match the model used in subsequent speech synthesis; otherwise synthesis will fail
| Value | Description |
|---|---|
| qwen3-tts-vd-2026-01-26 | Qwen3-TTS-VD non-streaming (default) |
| qwen3-tts-vd-realtime-2026-01-15 | Qwen3-TTS-VD-Realtime bidirectional streaming (new) |
| qwen3-tts-vd-realtime-2025-12-16 | Qwen3-TTS-VD-Realtime bidirectional streaming (legacy) |
Currently this platform supports qwen3-tts-vd-2026-01-26 (non-streaming); realtime models are not yet integrated, but voices can be pre-created for them.
Allowed values: qwen3-tts-vd-2026-01-26, qwen3-tts-vd-realtime-2026-01-15, qwen3-tts-vd-realtime-2025-12-16
Example: "qwen3-tts-vd-2026-01-26"
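The parameters described so far can be assembled and validated locally before a request is sent. The sketch below is illustrative only. The names preview_text, language, and target_model appear in this section; the JSON keys `model`, `voice_prompt`, `sample_rate`, and `response_format`, the `Content-Type` header, and both function names are assumptions, so check them against the actual request schema. No endpoint URL is given in this section, so the sketch stops short of sending anything.

```python
ALLOWED_LANGUAGES = {"zh", "en", "ja", "ko", "de", "fr", "it", "ru", "pt", "es"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 24000, 48000}
ALLOWED_FORMATS = {"pcm", "wav", "mp3", "opus"}
ALLOWED_TARGET_MODELS = {
    "qwen3-tts-vd-2026-01-26",
    "qwen3-tts-vd-realtime-2026-01-15",
    "qwen3-tts-vd-realtime-2025-12-16",
}


def build_headers(api_key: str) -> dict:
    """Build the Bearer Token header required by every endpoint."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumption: JSON request body
    }


def build_voice_design_request(
    voice_prompt: str,
    preview_text: str,
    language: str = "zh",
    sample_rate: int = 24000,
    response_format: str = "wav",
    target_model: str = "qwen3-tts-vd-2026-01-26",
) -> dict:
    """Validate the documented constraints and return a request body."""
    if not voice_prompt or len(voice_prompt) > 2048:
        raise ValueError("voice description must be 1 to 2048 characters")
    if not preview_text or len(preview_text) > 1024:
        raise ValueError("preview text must be 1 to 1024 characters")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate!r}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio format: {response_format!r}")
    if target_model not in ALLOWED_TARGET_MODELS:
        raise ValueError(f"unsupported target_model: {target_model!r}")
    return {
        "model": "qwen-voice-design",        # key name is an assumption
        "voice_prompt": voice_prompt,        # key name is an assumption
        "preview_text": preview_text,
        "language": language,
        "sample_rate": sample_rate,          # key name is an assumption
        "response_format": response_format,  # key name is an assumption
        "target_model": target_model,
    }
```

Keeping the returned target_model value alongside the created voice name makes it easier to use the same model for later synthesis, as the note above requires.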
HTTPS callback URL invoked when the task completes
Trigger conditions:
Security restrictions:
- Maximum length: 2048 characters
Callback behavior:
- Timeout: 10 seconds
- 3 retries after failure (at 1s / 2s / 4s intervals)
Example: "https://your-domain.com/webhooks/voice-design-completed"
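A callback URL can be sanity-checked on the client before it is submitted. This is a minimal sketch that covers only the two restrictions visible in this section, namely the HTTPS scheme and the 2048-character figure (assumed here to be a maximum length). The function name `is_valid_callback_url` is hypothetical, and the service may enforce further restrictions that are not listed here.

```python
from urllib.parse import urlsplit

MAX_CALLBACK_URL_LENGTH = 2048  # assumption: "2048 characters" is a maximum


def is_valid_callback_url(url: str) -> bool:
    """Check the HTTPS scheme, a non-empty host, and the length limit."""
    if len(url) > MAX_CALLBACK_URL_LENGTH:
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs,
        # for example an unterminated IPv6 host such as "https://[::1".
        return False
    return parts.scheme == "https" and bool(parts.hostname)
```

Since the callback is described as having a 10-second limit followed by retries, a receiving endpoint would typically acknowledge quickly and defer any heavy processing.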
Voice design task created successfully
Task creation timestamp
Example: 1775123456
Task ID
Example: "task-unified-1775123456-abcd1234"
Actual model name used
Example: "qwen-voice-design"
Specific task type
Allowed values: audio.generation.task
Task progress percentage (0-100)
Required range: 0 <= x <= 100
Example: 0
Task status
Allowed values: pending, processing, completed, failed
Example: "pending"
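The four status values suggest a simple way for a client to decide when a task is finished. The sketch below is hedged: the four values come from the list above and result_data.voice is named in the workflow, but the response keys `status` and `result_data` and their placement in the task object are assumptions, as are the function names.

```python
from typing import Optional

KNOWN_STATUSES = {"pending", "processing", "completed", "failed"}
TERMINAL_STATUSES = {"completed", "failed"}


def is_terminal(status: str) -> bool:
    """Return True once the task can no longer change state."""
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unknown task status: {status!r}")
    return status in TERMINAL_STATUSES


def extract_voice_name(task: dict) -> Optional[str]:
    """Return the voice name of a completed task, or None if still running.

    Assumes the task object carries "status" and, once completed,
    "result_data" with a "voice" entry (key placement is an assumption).
    """
    status = task["status"]
    if not is_terminal(status):
        return None  # pending or processing: check again later
    if status == "failed":
        raise RuntimeError("voice design task failed")
    return task["result_data"]["voice"]
```

A caller would typically repeat the check with a delay, or rely on the completion callback instead, and then pass the returned name as the voice parameter for synthesis.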
Audio task details
Task output type
Allowed values: audio
Example: "audio"
Usage and billing information