Qwen Voice Design
- Create a custom voice profile from a text description and receive the voice name and a preview audio clip
- Qwen3 TTS VD speech synthesis must use a voice created by this API — system built-in voices are not supported
- Asynchronous processing mode; use the returned task ID to query the result
- Generated audio links are valid for 24 hours — save them promptly
Workflow:
- Call this API to create a voice
- Poll the task result to obtain result_data.voice (the voice name)
- Call Qwen3 TTS VD with the voice parameter for speech synthesis
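The polling step in the workflow above can be sketched as follows. This is a minimal illustration, not the platform's client: `wait_for_voice` and `fetch_task` are hypothetical names, the task query endpoint is left to the caller because this page does not give it, and the placement of `status` and `result_data` at the top level of the task response is an assumption.

```python
import time
from typing import Callable

# "failed" comes from the documented task statuses; "cancelled" is mentioned
# only in the callback section, so treating it as terminal is an assumption.
FAILURE_STATUSES = {"failed", "cancelled"}


def wait_for_voice(
    fetch_task: Callable[[], dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a task until it completes and return result_data.voice.

    fetch_task is a caller-supplied function that queries the task by ID
    and returns the decoded JSON response as a dict.
    """
    for _ in range(max_attempts):
        task = fetch_task()
        status = task.get("status")
        if status == "completed":
            return task["result_data"]["voice"]
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"voice design task ended with status {status!r}")
        # Still pending or processing: wait before the next query.
        sleep(interval_s)
    raise TimeoutError("task did not finish within the polling budget")
```

The returned voice name is what gets passed as the voice parameter in the later synthesis call. Injecting `fetch_task` and `sleep` keeps the loop testable without network access.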
Authorizations
All endpoints require Bearer Token authentication
Get your API Key:
Visit the API Key management page to obtain your API Key
Add the following header to every request:
Authorization: Bearer YOUR_API_KEY
Body
Model name
Allowed value: qwen-voice-design
Example: "qwen-voice-design"
A text description of the voice characteristics used to define the voice profile
Constraints:
- Maximum 2048 characters
- Supports Chinese and English only
Suggested dimensions:
- Gender: male, female, neutral
- Age: child (5-12), teen (13-18), young adult (19-35), middle-aged (36-55), senior (55+)
- Pitch: high, medium, low
- Pace: fast, moderate, slow
- Emotion: cheerful, calm, gentle, serious, lively, composed
- Character: magnetic, crisp, husky, mellow, sweet, deep
- Use case: news broadcasting, commercial narration, audiobook, animation character, voice assistant
Example descriptions:
- A calm middle-aged male with a slow pace, deep magnetic voice, suitable for news reading or documentary narration
- A cute child voice, approximately an 8-year-old girl, slightly childlike speech, suitable for animated character dubbing
- A gentle and intellectual female, around 30 years old, calm tone, suitable for audiobook narration
Example: "A calm middle-aged male news anchor with a deep, resonant voice, rich in magnetism, steady pace, and clear articulation"
Preview text used to generate a sample audio clip
Constraints:
- Maximum 1024 characters
- Supports 10 languages: Chinese, English, Japanese, Korean, German, French, Italian, Russian, Portuguese, Spanish
- Recommended to match the language field
Example: "Good evening, listeners. Welcome to the evening news broadcast."
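The length limits on the voice description (2048 characters) and the preview text (1024 characters) can be checked before submitting a request. The sketch below is only illustrative: `check_text_lengths` is a hypothetical helper, and it assumes the service counts characters the way Python's `len` counts code points, which this page does not confirm.

```python
MAX_DESCRIPTION_CHARS = 2048  # documented limit for the voice description
MAX_PREVIEW_CHARS = 1024      # documented limit for the preview text


def check_text_lengths(description: str, preview_text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means both texts fit."""
    problems = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"description has {len(description)} characters, limit is {MAX_DESCRIPTION_CHARS}"
        )
    if len(preview_text) > MAX_PREVIEW_CHARS:
        problems.append(
            f"preview text has {len(preview_text)} characters, limit is {MAX_PREVIEW_CHARS}"
        )
    return problems
```

The language restrictions (Chinese and English for the description, ten languages for the preview text) are not checked here, since detecting language reliably needs more than a length test.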
Voice name prefix
Constraints:
- Only digits, English letters, and underscores
- No more than 16 characters
The generated full voice name format: qwen-tts-vd-{preferred_name}-voice-{timestamp}
For example, passing announcer results in a voice name like: qwen-tts-vd-announcer-voice-20260402-a1b2
Pattern: ^[a-zA-Z0-9_]+$
Example: "announcer"
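The prefix constraints and the generated name format can be checked client-side. This is a sketch under assumptions: both helper names are hypothetical, and only the fixed parts of the documented format qwen-tts-vd-{preferred_name}-voice-{timestamp} are verified, because the exact shape of the timestamp suffix is known only from the single example above.

```python
import re

# Digits, English letters, and underscores; 1 to 16 characters.
PREFERRED_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,16}")


def is_valid_preferred_name(name: str) -> bool:
    """Check the documented prefix constraints.

    fullmatch is used instead of ^...$ because $ in Python also matches
    just before a trailing newline.
    """
    return PREFERRED_NAME_RE.fullmatch(name) is not None


def voice_name_has_prefix(voice_name: str, preferred_name: str) -> bool:
    """Check that a returned voice name starts with the expected fixed part."""
    return voice_name.startswith(f"qwen-tts-vd-{preferred_name}-voice-")
```

A prefix such as news-anchor is rejected because hyphens are not in the allowed character set; news_anchor would pass.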
Language preference for the voice profile; recommended to match preview_text
Defaults to zh if not provided
Allowed values: zh, en, ja, ko, de, fr, it, ru, pt, es
Example: "zh"
Preview audio sample rate (Hz)
Defaults to 24000 if not provided
Allowed values: 8000, 16000, 24000, 48000
Example: 24000
Preview audio format
Defaults to wav if not provided
Allowed values: pcm, wav, mp3, opus
Example: "wav"
The TTS model that will drive the created voice
Important: The target_model specified when creating the voice must match the model used in subsequent speech synthesis; otherwise synthesis will fail
| Value | Description |
|---|---|
| qwen3-tts-vd-2026-01-26 | Qwen3-TTS-VD non-streaming (default) |
| qwen3-tts-vd-realtime-2026-01-15 | Qwen3-TTS-VD-Realtime bidirectional streaming (new) |
| qwen3-tts-vd-realtime-2025-12-16 | Qwen3-TTS-VD-Realtime bidirectional streaming (legacy) |
Currently this platform supports qwen3-tts-vd-2026-01-26 (non-streaming); realtime models are not yet integrated, but voices can be pre-created
Allowed values: qwen3-tts-vd-2026-01-26, qwen3-tts-vd-realtime-2026-01-15, qwen3-tts-vd-realtime-2025-12-16
Example: "qwen3-tts-vd-2026-01-26"
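Because synthesis fails when the model differs from the voice's target_model, it can help to record the target model alongside each voice name and compare before calling synthesis. The sketch below assumes the caller stores that value at creation time; `check_synthesis_model` is a hypothetical helper, not part of the API.

```python
SUPPORTED_TARGET_MODELS = (
    "qwen3-tts-vd-2026-01-26",
    "qwen3-tts-vd-realtime-2026-01-15",
    "qwen3-tts-vd-realtime-2025-12-16",
)
DEFAULT_TARGET_MODEL = "qwen3-tts-vd-2026-01-26"


def check_synthesis_model(voice_target_model: str, synthesis_model: str) -> None:
    """Raise ValueError if a voice is about to be used with a different model."""
    if voice_target_model not in SUPPORTED_TARGET_MODELS:
        raise ValueError(f"unknown target_model: {voice_target_model!r}")
    if voice_target_model != synthesis_model:
        raise ValueError(
            f"voice was created for {voice_target_model!r} "
            f"but synthesis requested {synthesis_model!r}"
        )
```

Failing locally gives a clearer error than waiting for the synthesis request to be rejected.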
HTTPS callback URL invoked when the task completes
Trigger conditions:
- Triggered when the task is completed, failed, or cancelled
- Sent after billing confirmation
Security restrictions:
- HTTPS only
- Internal IP addresses are blocked (127.0.0.1, 10.x.x.x, 172.16-31.x.x, 192.168.x.x, etc.)
- URL length must not exceed 2048 characters
Callback behavior:
- Timeout: 10 seconds
- Up to 3 retries after failure (at 1s / 2s / 4s intervals)
- Response body format matches the task query API response
- A 2xx status code is considered success; other codes trigger a retry
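The security restrictions and retry rules above can be mirrored in a client-side pre-check. This is a rough sketch, not the platform's actual validation: `callback_url_problems` and `counts_as_success` are hypothetical helpers, only literal IP addresses are inspected (a hostname that resolves to an internal address is not detected here), and the server may apply further checks.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_CALLBACK_URL_CHARS = 2048
RETRY_DELAYS_S = (1, 2, 4)  # documented retry intervals after a failed delivery


def callback_url_problems(url: str) -> list[str]:
    """Return the documented restrictions that a callback URL violates."""
    problems = []
    if len(url) > MAX_CALLBACK_URL_CHARS:
        problems.append("URL exceeds 2048 characters")
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("scheme must be https")
    host = parts.hostname
    if not host:
        problems.append("missing host")
        return problems
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Host is a DNS name, not a literal IP; resolution is not checked here.
        return problems
    if ip.is_private or ip.is_loopback:
        problems.append("internal IP addresses are blocked")
    return problems


def counts_as_success(status_code: int) -> bool:
    """A 2xx response from the receiver is treated as a successful delivery."""
    return 200 <= status_code < 300
```

On the receiving side, responding with a 2xx code well within the 10-second timeout avoids triggering the retries.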
"https://your-domain.com/webhooks/voice-design-completed"
Response
Voice design task created successfully
Task creation timestamp
1775123456
Task ID
"task-unified-1775123456-abcd1234"
Actual model name used
"qwen-voice-design"
Specific task type
audio.generation.task
Task progress percentage (0-100)
Range: 0 <= x <= 100
Example: 0
Task status
Allowed values: pending, processing, completed, failed
Example: "pending"
Audio task details
Task output type
Allowed value: audio
Example: "audio"
Usage and billing information