Qwen Voice Design

Authorizations

Authorization

string

header

required

All endpoints require Bearer Token authentication

Get your API Key:

Visit the API Key management page to obtain your API Key

Add the following header to every request:

Authorization: Bearer YOUR_API_KEY
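As a minimal sketch, the header can be assembled in one place so that every request carries it. The helper name `build_auth_headers` is hypothetical, and the `Content-Type` entry is an assumption based on the `application/json` request body described in this reference.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return headers carrying the Bearer token for a JSON request.

    `build_auth_headers` is an illustrative helper, not part of any SDK.
    """
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    return {
        # Format taken from this reference: "Authorization: Bearer YOUR_API_KEY"
        "Authorization": f"Bearer {api_key.strip()}",
        # Assumption: the request body is sent as application/json.
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    print(build_auth_headers("YOUR_API_KEY"))
```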

Body

application/json

model

enum<string>

default:qwen-voice-design

required

Model name

Available options:

qwen-voice-design

Example:

"qwen-voice-design"

voice_prompt

string

required

A text description of the voice characteristics used to define the voice profile

Constraints:

Maximum 2048 characters
Supports Chinese and English only

Suggested dimensions:

Gender: male, female, neutral
Age: child (5-12), teen (13-18), young adult (19-35), middle-aged (36-55), senior (55+)
Pitch: high, medium, low
Pace: fast, moderate, slow
Emotion: cheerful, calm, gentle, serious, lively, composed
Character: magnetic, crisp, husky, mellow, sweet, deep
Use case: news broadcasting, commercial narration, audiobook, animation character, voice assistant

Example descriptions:

A calm middle-aged male with a slow pace, deep magnetic voice, suitable for news reading or documentary narration
A cute child voice, approximately an 8-year-old girl, slightly childlike speech, suitable for animated character dubbing
A gentle and intellectual female, around 30 years old, calm tone, suitable for audiobook narration

Maximum string length: 2048

Example:

"A calm middle-aged male news anchor with a deep, resonant voice, rich in magnetism, steady pace, and clear articulation"
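One possible way to assemble a `voice_prompt` from the suggested dimensions is sketched below. The function name, the sentence template, and the argument names are assumptions for illustration only. The API accepts free-form text, so this template is just one option. The sketch enforces only the 2048-character limit and does not check the Chinese/English restriction.

```python
MAX_VOICE_PROMPT_CHARS = 2048  # limit stated for voice_prompt


def compose_voice_prompt(gender: str, age: str, pitch: str, pace: str,
                         emotion: str, character: str, use_case: str) -> str:
    """Join the suggested dimensions into one English description.

    Hypothetical helper: the wording of the template is an assumption.
    """
    prompt = (
        f"A {emotion} {age} {gender} voice with a {pitch} pitch and "
        f"{pace} pace, {character} in character, suitable for {use_case}"
    )
    if len(prompt) > MAX_VOICE_PROMPT_CHARS:
        raise ValueError(
            f"voice_prompt is {len(prompt)} characters; "
            f"the maximum is {MAX_VOICE_PROMPT_CHARS}"
        )
    return prompt


if __name__ == "__main__":
    print(compose_voice_prompt(
        gender="male", age="middle-aged", pitch="low", pace="slow",
        emotion="calm", character="deep", use_case="news broadcasting",
    ))
```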

preview_text

string

required

Preview text used to generate a sample audio clip

Constraints:

Maximum 1024 characters
Supports 10 languages: Chinese, English, Japanese, Korean, German, French, Italian, Russian, Portuguese, Spanish
Recommended to match the language field

Maximum string length: 1024

Example:

"Good evening, listeners. Welcome to the evening news broadcast."

preferred_name

string

required

Voice name prefix

Constraints:

Only digits, English letters, and underscores
No more than 16 characters

Format of the full generated voice name: qwen-tts-vd-{preferred_name}-voice-{timestamp}

For example, passing announcer produces a voice name such as: qwen-tts-vd-announcer-voice-20260402-a1b2

Maximum string length: 16

Pattern: ^[a-zA-Z0-9_]+$

Example:

"announcer"
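The constraints above can be checked on the client before a request is sent. The sketch below uses the pattern and length limit exactly as stated. The helper name `is_valid_preferred_name` is hypothetical.

```python
import re

# Pattern and length limit as stated for preferred_name.
PREFERRED_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_]+$")
MAX_PREFERRED_NAME_CHARS = 16


def is_valid_preferred_name(name: str) -> bool:
    """Return True if `name` satisfies the documented constraints."""
    return (
        len(name) <= MAX_PREFERRED_NAME_CHARS
        # fullmatch rejects a trailing newline that "$" alone would tolerate
        and PREFERRED_NAME_PATTERN.fullmatch(name) is not None
    )


if __name__ == "__main__":
    for candidate in ("announcer", "news-anchor", "a" * 17):
        print(candidate, is_valid_preferred_name(candidate))
```

Hyphens are rejected because the pattern allows only letters, digits, and underscores. The hyphens in the full voice name are added by the service.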

language

enum<string>

Language preference for the voice profile; recommended to match preview_text

Defaults to zh if not provided

Available options:

zh,

en,

ja,

ko,

de,

fr,

it,

ru,

pt,

es

Example:

"zh"

sample_rate

enum<integer>

Preview audio sample rate (Hz)

Defaults to 24000 if not provided

Available options:

8000,

16000,

24000,

48000

Example:

24000

response_format

enum<string>

Preview audio format

Defaults to wav if not provided

Available options:

pcm,

wav,

mp3,

opus

Example:

"wav"

target_model

enum<string>

default:qwen3-tts-vd-2026-01-26

The TTS model that will drive the created voice

Important: The target_model specified when creating the voice must match the model used in subsequent speech synthesis; otherwise synthesis will fail

Value	Description
`qwen3-tts-vd-2026-01-26`	Qwen3-TTS-VD non-streaming (default)
`qwen3-tts-vd-realtime-2026-01-15`	Qwen3-TTS-VD-Realtime bidirectional streaming (new)
`qwen3-tts-vd-realtime-2025-12-16`	Qwen3-TTS-VD-Realtime bidirectional streaming (legacy)

This platform currently supports only qwen3-tts-vd-2026-01-26 (non-streaming) for synthesis. The realtime models are not yet integrated, but voices targeting them can be created in advance.

Available options:

qwen3-tts-vd-2026-01-26,

qwen3-tts-vd-realtime-2026-01-15,

qwen3-tts-vd-realtime-2025-12-16

Example:

"qwen3-tts-vd-2026-01-26"
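The body fields described so far can be gathered into a request payload. The following is a sketch under stated assumptions: the function name `build_voice_design_payload` is hypothetical, the enum values and defaults are copied from this reference, and no request is sent because the endpoint URL is not given in this section.

```python
import json

LANGUAGES = {"zh", "en", "ja", "ko", "de", "fr", "it", "ru", "pt", "es"}
SAMPLE_RATES = {8000, 16000, 24000, 48000}
RESPONSE_FORMATS = {"pcm", "wav", "mp3", "opus"}
TARGET_MODELS = {
    "qwen3-tts-vd-2026-01-26",
    "qwen3-tts-vd-realtime-2026-01-15",
    "qwen3-tts-vd-realtime-2025-12-16",
}


def build_voice_design_payload(
    voice_prompt: str,
    preview_text: str,
    preferred_name: str,
    language: str = "zh",
    sample_rate: int = 24000,
    response_format: str = "wav",
    target_model: str = "qwen3-tts-vd-2026-01-26",
) -> dict:
    """Validate enum and length constraints, then return the JSON body."""
    if len(voice_prompt) > 2048:
        raise ValueError("voice_prompt exceeds 2048 characters")
    if len(preview_text) > 1024:
        raise ValueError("preview_text exceeds 1024 characters")
    for label, value, allowed in (
        ("language", language, LANGUAGES),
        ("sample_rate", sample_rate, SAMPLE_RATES),
        ("response_format", response_format, RESPONSE_FORMATS),
        ("target_model", target_model, TARGET_MODELS),
    ):
        if value not in allowed:
            raise ValueError(f"{label}={value!r} is not one of {sorted(allowed)}")
    return {
        "model": "qwen-voice-design",
        "voice_prompt": voice_prompt,
        "preview_text": preview_text,
        "preferred_name": preferred_name,
        "language": language,
        "sample_rate": sample_rate,
        "response_format": response_format,
        "target_model": target_model,
    }


if __name__ == "__main__":
    body = build_voice_design_payload(
        voice_prompt="A calm middle-aged male news anchor with a deep, resonant voice",
        preview_text="Good evening, listeners. Welcome to the evening news broadcast.",
        preferred_name="announcer",
        language="en",
    )
    print(json.dumps(body, indent=2))
```

Recording `target_model` alongside the returned voice name may help, since synthesis fails when the two do not match.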

callback_url

string<uri>

HTTPS callback URL invoked when the task completes

Trigger conditions:

Triggered when the task is completed, failed, or cancelled
Sent after billing confirmation

Security restrictions:

HTTPS only
Internal IP addresses are blocked (127.0.0.1, 10.x.x.x, 172.16-31.x.x, 192.168.x.x, etc.)
URL length must not exceed 2048 characters

Callback behavior:

Timeout: 10 seconds
Up to 3 retries after failure (at 1s / 2s / 4s intervals)
Response body format matches the task query API response
A 2xx status code is considered success; other codes trigger a retry

Example:

"https://your-domain.com/webhooks/voice-design-completed"
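The security restrictions can be approximated with a client-side pre-check, sketched below using only the Python standard library. The helper name `check_callback_url` is hypothetical. The check inspects literal IP addresses only and does not resolve hostnames, so the service's own validation may be stricter.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_CALLBACK_URL_CHARS = 2048  # limit stated for callback_url


def check_callback_url(url: str) -> list[str]:
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    if len(url) > MAX_CALLBACK_URL_CHARS:
        problems.append("URL is longer than 2048 characters")
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("URL must use HTTPS")
    host = parts.hostname
    if not host:
        problems.append("URL has no host")
        return problems
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a hostname, not a literal IP; DNS is not resolved here
    if ip is not None and (ip.is_private or ip.is_loopback):
        problems.append("internal IP addresses are blocked")
    return problems


if __name__ == "__main__":
    for candidate in (
        "https://your-domain.com/webhooks/voice-design-completed",
        "http://your-domain.com/webhooks/voice-design-completed",
        "https://192.168.1.10/hook",
    ):
        print(candidate, check_callback_url(candidate))
```

On the receiving side, the handler should return a 2xx status within the 10-second timeout, since any other status code triggers a retry.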

Response

Voice design task created successfully

created

integer

Task creation timestamp (Unix time, in seconds)

Example:

1775123456

id

string

Task ID

Example:

"task-unified-1775123456-abcd1234"

model

string

Actual model name used

Example:

"qwen-voice-design"

object

enum<string>

Specific task type

Available options:

audio.generation.task

progress

integer

Task progress percentage (0-100)

Required range: 0 <= x <= 100

Example:

0

status

enum<string>

Task status

Available options:

pending,

processing,

completed,

failed

Example:

"pending"
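When handling a task object, it can help to separate in-progress states from final ones. The sketch below assumes the response has already been parsed into a dictionary. The helper names are hypothetical, and treating only `completed` and `failed` as final is an assumption based on the four status values listed above.

```python
# Assumption: of the four listed statuses, these two are final.
TERMINAL_STATUSES = {"completed", "failed"}


def is_terminal(task: dict) -> bool:
    """Return True once the task will no longer change state."""
    return task.get("status") in TERMINAL_STATUSES


def describe_task(task: dict) -> str:
    """Format status and progress, checking the documented 0-100 range."""
    progress = task.get("progress", 0)
    if not 0 <= progress <= 100:
        raise ValueError(f"progress {progress} is outside 0-100")
    return f"{task.get('status')} ({progress}%)"


if __name__ == "__main__":
    sample = {"status": "pending", "progress": 0}
    print(describe_task(sample), "terminal:", is_terminal(sample))
```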

task_info

object

Audio task details

type

enum<string>

Task output type

Available options:

audio

Example:

"audio"

usage

object

Usage and billing information
