Seed-Audio 1.0 Audio Generation
- Multimodal audio generation with three modes: text-to-audio, reference-audio (voice cloning), and reference-image
- Up to 120 seconds of audio per request
- Asynchronous mode: use the returned task ID to query the result
- Generated audio links are valid for 24 hours; save them promptly
Authorizations
All endpoints require Bearer Token authentication.
Get your API Key:
Visit the API Key management page to obtain your API Key
Add it to the request header:
Authorization: Bearer YOUR_API_KEY
Body
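A minimal sketch in Python of an authenticated request that carries a JSON body. The endpoint URL and the `model` key name are hypothetical placeholders (this reference names the model but not its field or the endpoint), and JSON encoding of the body is an assumption. The request is only assembled here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL is not given in this reference.
ENDPOINT = "https://api.example.com/v1/audio/generations"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Assemble (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: JSON request body
        },
        method="POST",
    )


req = build_request(
    "YOUR_API_KEY",
    {
        "model": "doubao-seed-audio-1-0",  # assumed key name for "Model name"
        "prompt": "Welcome to the audio generation service. The weather is lovely today.",
    },
)
# Sending it would be urllib.request.urlopen(req); the response carries the task ID.
```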
Model name
Available options: doubao-seed-audio-1-0
Example: "doubao-seed-audio-1-0"
The prompt or text to synthesize into audio
Three generation modes (auto-detected from the reference resources you pass):
- Text-to-audio: pass only `prompt` to generate audio directly from the prompt
- Reference-audio (voice cloning): pair with `audio_references`; use the literal marker `@audioN` to reference the Nth item (numbered from 1, in array order)
- Reference-image: pair with `image_urls`; `prompt` only needs the text to synthesize
Audio references (`audio_references`) and image references (`image_urls`) are mutually exclusive: only one may be used per request.
Constraints:
- Up to 1500 characters
Example: "Welcome to the audio generation service. The weather is lovely today."
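The mode selection and the prompt limit above can be mirrored client-side before a request is sent. This is a sketch under two labeled assumptions: an empty list is treated the same as an omitted field, and characters are counted with Python's `len()`, which counts Unicode code points and may not match the service's own counting rule. The function name is hypothetical.

```python
from typing import Optional, Sequence

MAX_PROMPT_CHARS = 1500  # documented limit on prompt length


def detect_mode(
    prompt: str,
    audio_references: Optional[Sequence[str]] = None,
    image_urls: Optional[Sequence[str]] = None,
) -> str:
    """Mirror the documented mode selection and prompt limit on the client side."""
    # len() counts Unicode code points; the service may count characters differently.
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    # Assumption: an empty list is treated the same as omitting the field.
    if audio_references and image_urls:
        raise ValueError("audio_references and image_urls are mutually exclusive")
    if audio_references:
        return "reference-audio"
    if image_urls:
        return "reference-image"
    return "text-to-audio"
```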
List of reference resources. Each item can be a voice ID or a reference-audio URL, and the two may be mixed within the same array.
- Voice ID: the `voice_type` of a preset voice; see the full list in Seed-Audio 1.0 Voice List
- Audio URL: a URL pointing to an uploaded reference audio clip for voice cloning
- Mutually exclusive with `image_urls`: reference audio and reference image are either-or; they cannot be sent together in one request
- Use the literal marker `@audioN` in `prompt` to reference the Nth item (numbered from 1, in array order)
- If omitted, the model generates a voice freely based on `prompt`
Quantity limit:
- Up to 3 items total in the array (voice IDs and audio URLs combined)
Audio URL constraints:
- Each reference clip ≤ 30 seconds and ≤ 10 MB
- Formats: wav/mp3/pcm/ogg_opus
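A sketch of pre-checking `audio_references` locally: it enforces the 3-item limit, checks that every `@audioN` marker in `prompt` points at an existing item, and measures a local WAV clip with the standard-library `wave` module. The function names are hypothetical, and only WAV duration is checked because the other listed formats need third-party decoders.

```python
import re
import wave

MAX_REFERENCES = 3          # voice IDs and audio URLs combined
MAX_CLIP_SECONDS = 30.0     # documented per-clip limit for audio URLs
MARKER = re.compile(r"@audio(\d+)")


def check_audio_markers(prompt: str, audio_references: list[str]) -> list[int]:
    """Check the item count and that each @audioN marker has a matching item.

    Returns the 1-based indices referenced in `prompt`, in order of appearance.
    """
    if len(audio_references) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} audio references are allowed")
    indices = [int(n) for n in MARKER.findall(prompt)]
    for n in indices:
        # Markers are numbered from 1, in array order.
        if not 1 <= n <= len(audio_references):
            raise ValueError(f"@audio{n} has no matching item in audio_references")
    return indices


def wav_clip_within_limit(path: str) -> bool:
    """True if a local WAV file is at most MAX_CLIP_SECONDS long."""
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return duration <= MAX_CLIP_SECONDS
```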
Example: ["zh_female_vv_uranus_bigtts"]
List of reference-image URLs; the model generates audio that matches the mood of the image
- When using an image reference, `prompt` only needs the text to synthesize
- Mutually exclusive with `audio_references`: reference image and reference audio are either-or; they cannot be sent together in one request
Constraints:
- Currently only 1 image, ≤ 10 MB
- Formats: jpeg/png/webp
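A similar pre-flight check for a local image before you upload it and pass its URL. Treating `.jpg` as JPEG and reading 10 MB as 10,000,000 bytes are assumptions (the decimal reading is the stricter of the two common interpretations), and a file extension does not prove the actual encoding. The helper names are hypothetical.

```python
import os

MAX_IMAGES = 1                       # "Currently only 1 image"
MAX_IMAGE_BYTES = 10 * 1000 * 1000   # 10 MB, read as decimal megabytes (assumption)
# Assumption: .jpg is accepted as the common extension for JPEG files.
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png", ".webp"}


def check_image_urls(image_urls: list[str]) -> None:
    """Enforce the current one-image limit."""
    if len(image_urls) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} image is currently accepted")


def check_local_image(path: str) -> None:
    """Pre-flight check on a local file before uploading it and passing its URL."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in IMAGE_EXTENSIONS:
        raise ValueError(f"unsupported image extension: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {size} bytes; the limit is {MAX_IMAGE_BYTES}")
```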
Example: ["https://example.com/scene.jpg"]
Output audio format
Available options: wav, mp3, pcm, ogg_opus
Example: "mp3"
Output sample rate (Hz)
Available options: 8000, 16000, 24000, 32000, 44100, 48000
Example: 24000
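The output format and sample rate accept only the listed values, so a membership check catches typos before a task is created. A sketch with a hypothetical helper name; the values "mp3" and 24000 above are examples, and this reference does not state the server defaults, so the helper takes both explicitly.

```python
OUTPUT_FORMATS = {"wav", "mp3", "pcm", "ogg_opus"}
SAMPLE_RATES_HZ = {8000, 16000, 24000, 32000, 44100, 48000}


def check_output_settings(audio_format: str, sample_rate: int) -> None:
    """Raise ValueError unless both values are among the documented options."""
    if audio_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {audio_format!r}")
    if sample_rate not in SAMPLE_RATES_HZ:
        raise ValueError(f"unsupported sample rate: {sample_rate} Hz")
```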
Speech-rate multiplier (supports two decimal places)
- 1.0: normal speed (default)
- 2.0: 2x speed
- 0.5: half speed
Range 0.5 to 2.0
0.5 <= x <= 2; must be a multiple of 0.01
Example: 1.25
Loudness multiplier (supports two decimal places)
- 1.0: normal loudness (default)
- 2.0: 2x loudness
- 0.5: half loudness
Range 0.5 to 2.0
0.5 <= x <= 2; must be a multiple of 0.01
Example: 0.85
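The speech-rate and loudness multipliers share one rule: 0.5 to 2.0 in steps of 0.01. Binary floats cannot represent most two-decimal values exactly, so this sketch uses `decimal.Decimal` for the step check. The helper name is hypothetical.

```python
from decimal import Decimal
from typing import Union

STEP = Decimal("0.01")
LOW, HIGH = Decimal("0.5"), Decimal("2")


def check_multiplier(value: Union[float, str], name: str) -> Decimal:
    """Validate a speech-rate or loudness multiplier and return it as a Decimal."""
    # Going through str() keeps 1.25 as exactly 1.25 instead of its binary approximation.
    d = Decimal(str(value))
    if not LOW <= d <= HIGH:
        raise ValueError(f"{name} must be between 0.5 and 2.0, got {d}")
    if d % STEP != 0:
        raise ValueError(f"{name} must be a multiple of 0.01, got {d}")
    return d
```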
Pitch adjustment, in semitones
- 0: default pitch (no change)
- Positive values raise the pitch: the larger the value, the higher and sharper the voice; 12 raises it by one octave
- Negative values lower the pitch: the smaller the value, the lower and deeper the voice; -12 lowers it by one octave
Range -12 to 12
-12 <= x <= 12
Example: 0
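One octave doubles the frequency, so a shift of n semitones corresponds to a frequency ratio of 2^(n/12). This sketch assumes the service follows the standard equal-tempered definition for intermediate values; the reference only states the ±12 endpoints. The helper name is hypothetical.

```python
def pitch_ratio(semitones: int) -> float:
    """Frequency ratio for a pitch shift of `semitones` (assumes equal temperament)."""
    if not -12 <= semitones <= 12:
        raise ValueError("pitch must be between -12 and 12 semitones")
    # Twelve semitones make one octave, i.e. a doubling of frequency.
    return 2.0 ** (semitones / 12)
```

For example, `pitch_ratio(12)` returns `2.0` and `pitch_ratio(-12)` returns `0.5`.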
HTTPS callback URL invoked when the task finishes
When it fires:
- Triggered when the task is completed, failed, or cancelled
- Sent after billing is finalized
Security restrictions:
- HTTPS only
- Callbacks to internal IP addresses are forbidden (127.0.0.1, 10.x.x.x, 172.16-31.x.x, 192.168.x.x, etc.)
- URL length must not exceed 2048 characters
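The security restrictions can be pre-checked before a task is submitted. This sketch uses only `urllib.parse` and `ipaddress`: it rejects non-HTTPS URLs, over-long URLs, the name `localhost`, and literal private, loopback, or link-local addresses. It does not resolve other host names, so the server-side check remains authoritative. The helper name is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_CALLBACK_URL_LENGTH = 2048


def check_callback_url(url: str) -> None:
    """Raise ValueError if `url` visibly breaks the documented callback restrictions."""
    if len(url) > MAX_CALLBACK_URL_LENGTH:
        raise ValueError(f"callback URL exceeds {MAX_CALLBACK_URL_LENGTH} characters")
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("callback URL must use HTTPS")
    host = parts.hostname or ""
    if not host or host == "localhost":
        raise ValueError(f"callback host is not allowed: {host!r}")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return  # a DNS name: this sketch does not resolve it
    # Covers 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 and similar ranges.
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        raise ValueError(f"callback URL points at an internal address: {ip}")
```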
Callback mechanism:
- Timeout: 10 seconds
- Up to 3 retries on failure (at 1/2/4 seconds after each failure)
- The callback body has the same format as the task-query response
- A 2xx response is treated as success; other status codes trigger a retry
"https://your-domain.com/webhooks/audio-completed"
Response
Audio generation task created successfully
Task creation timestamp
1775200000
Task ID
"task-unified-1775200000-abcd1234"
The model actually used
"doubao-seed-audio-1-0"
Specific task type
audio.generation.task
Task progress percentage (0-100)
0 <= x <= 100
Example: 0
Task status
Available options: pending, processing, completed, failed
Example: "pending"
Detailed audio task information
Task output type
Available options: audio
Example: "audio"
Usage and billing information
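Because the task runs asynchronously, a client typically polls with the returned task ID until the status leaves `pending`/`processing`. The query endpoint and the response field names are not given in this reference, so this sketch takes a hypothetical caller-supplied `fetch_task` function and assumes the status lives under a "status" key. The interval and timeout defaults are arbitrary.

```python
import time
from typing import Any, Callable, Mapping

# completed and failed end a task; "cancelled" appears only in the callback
# description, so it is included defensively.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_task(
    fetch_task: Callable[[str], Mapping[str, Any]],
    task_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll `fetch_task(task_id)` until the task reaches a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        status = task.get("status")  # assumed field name
        if status in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

Once the task is `completed`, download the audio promptly, since generated links expire after 24 hours. Registering a callback URL avoids polling altogether.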