Seed-Audio 1.0 Audio Generation

Authorizations

Authorization

string

header

required

All endpoints require Bearer Token authentication.

Get your API Key:

Visit the API Key management page to obtain your API Key

Add it to the request header:

Authorization: Bearer YOUR_API_KEY
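As a minimal sketch of the header described above, the helper below builds the request headers for a JSON call. The function name build_auth_headers is hypothetical and not part of the API; only the Authorization: Bearer format and the application/json content type come from this page.

```python
def build_auth_headers(api_key: str) -> dict:
    """Return HTTP headers carrying the Bearer token for a JSON request."""
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    return {
        # Format documented above: "Authorization: Bearer YOUR_API_KEY"
        "Authorization": f"Bearer {api_key.strip()}",
        # The request body is documented as application/json
        "Content-Type": "application/json",
    }


headers = build_auth_headers("YOUR_API_KEY")
print(headers["Authorization"])
```

In practice the key would normally be read from an environment variable or a secret store rather than written into source code.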

Body

application/json

model

enum<string>

default:doubao-seed-audio-1-0

required

Model name

Available options:

doubao-seed-audio-1-0

Example:

"doubao-seed-audio-1-0"

prompt

string

required

The prompt or text to synthesize into audio

Three generation modes (auto-detected by which reference resources you pass):

Text-to-audio: pass only prompt to generate audio directly from the prompt
Reference-audio (voice cloning): pair with audio_references; use the literal marker @audioN to reference the Nth item (numbered from 1, in array order)
Reference-image: pair with image_urls; prompt only needs the text to synthesize

Audio references (audio_references) and image references (image_urls) are mutually exclusive — only one may be used per request.
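The mode selection described above can be mirrored on the client side before sending a request. This is only a sketch: detect_mode is a hypothetical helper, and treating an empty array the same as an omitted field is an assumption, not documented behavior.

```python
def detect_mode(body: dict) -> str:
    """Guess which generation mode a request body would select."""
    # Assumption: an empty list counts as "not passed".
    has_audio = bool(body.get("audio_references"))
    has_image = bool(body.get("image_urls"))
    if has_audio and has_image:
        # The two reference types are documented as mutually exclusive.
        raise ValueError("audio_references and image_urls cannot be combined")
    if has_audio:
        return "reference-audio"
    if has_image:
        return "reference-image"
    return "text-to-audio"


print(detect_mode({"prompt": "Hello"}))
print(detect_mode({"prompt": "Hello", "image_urls": ["https://example.com/scene.jpg"]}))
```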

Constraints:

Up to 1500 characters

Maximum string length: 1500

Example:

"Welcome to the audio generation service. The weather is lovely today."

audio_references

string[]

List of reference resources. Each item can be a voice ID or a reference-audio URL, and the two may be mixed within the same array

Voice ID: the voice_type of a preset voice — see the full list in Seed-Audio 1.0 Voice List
Audio URL: upload a reference audio clip for voice cloning
Mutually exclusive with image_urls: reference audio and reference image are either-or; they cannot be sent together in one request
Use the literal marker @audioN in prompt to reference the Nth item (numbered from 1, in array order)
If omitted, the model generates a voice freely based on prompt

Quantity limit:

Up to 3 items total in the array (voice IDs and audio URLs combined)

Audio URL constraints:

Each reference clip ≤ 30 seconds and ≤ 10 MB
Formats: wav / mp3 / pcm / ogg_opus

Maximum array length: 3

Example:

["zh_female_vv_uranus_bigtts"]
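To illustrate the @audioN numbering, the sketch below checks that every marker in a prompt points at an existing array item and that the array stays within the 3-item limit. The helper name check_audio_markers and the position of the marker inside the sample prompt are illustrative assumptions; this page does not specify where in the prompt a marker should go.

```python
import re

MAX_AUDIO_REFERENCES = 3          # documented maximum array length
AUDIO_MARKER = re.compile(r"@audio(\d+)")


def check_audio_markers(prompt: str, audio_references: list) -> list:
    """Return the 1-based indices used in the prompt, or raise if any is invalid."""
    if len(audio_references) > MAX_AUDIO_REFERENCES:
        raise ValueError("audio_references accepts at most 3 items")
    indices = [int(n) for n in AUDIO_MARKER.findall(prompt)]
    for n in indices:
        # Markers are numbered from 1, in array order.
        if not 1 <= n <= len(audio_references):
            raise ValueError(f"@audio{n} has no matching item in audio_references")
    return indices


refs = ["zh_female_vv_uranus_bigtts"]
used = check_audio_markers("@audio1 Welcome to the audio generation service.", refs)
print(used)
```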

image_urls

string<uri>[]

List of reference-image URLs; generates audio matching the mood of the image

When using an image reference, prompt only needs the text to synthesize
Mutually exclusive with audio_references: reference image and reference audio are either-or; they cannot be sent together in one request

Constraints:

Currently only 1 image, ≤ 10 MB
Formats: jpeg / png / webp

Maximum array length: 1

Example:

["https://example.com/scene.jpg"]
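A reference-image request body might be assembled as in the sketch below. build_image_request is a hypothetical helper; the length check uses Python's len, which counts Unicode code points, and whether the service counts characters the same way is an assumption.

```python
import json

MAX_PROMPT_CHARS = 1500  # documented maximum prompt length


def build_image_request(prompt: str, image_url: str) -> dict:
    """Build a request body that uses a single reference image."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds 1500 characters")
    return {
        "model": "doubao-seed-audio-1-0",
        "prompt": prompt,              # only the text to synthesize
        "image_urls": [image_url],     # currently limited to 1 image
    }


body = build_image_request(
    "Welcome to the audio generation service. The weather is lovely today.",
    "https://example.com/scene.jpg",
)
print(json.dumps(body, indent=2))
```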

format

enum<string>

default:wav

Output audio format

Available options:

wav, mp3, pcm, ogg_opus

Example:

"mp3"

sample_rate

enum<integer>

default:24000

Output sample rate (Hz)

Available options:

8000, 16000, 24000, 32000, 44100, 48000

Example:

24000
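The format and sample_rate enums can be checked locally before a request is sent. The sketch below uses a hypothetical helper, output_options, whose defaults mirror the documented ones (wav, 24000).

```python
ALLOWED_FORMATS = ("wav", "mp3", "pcm", "ogg_opus")
ALLOWED_SAMPLE_RATES = (8000, 16000, 24000, 32000, 44100, 48000)


def output_options(audio_format: str = "wav", sample_rate: int = 24000) -> dict:
    """Return the output-related request fields after checking the enums."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {ALLOWED_SAMPLE_RATES}")
    return {"format": audio_format, "sample_rate": sample_rate}


print(output_options("mp3", 44100))
```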

speech_rate

number

default:1

Speech-rate multiplier (supports two decimal places)

1.0: normal speed (default)
2.0: 2x speed; 0.5: half speed

Range 0.5 to 2.0

Required range: 0.5 <= x <= 2

Must be a multiple of 0.01

Example:

1.25

loudness_rate

number

default:1

Loudness multiplier (supports two decimal places)

1.0: normal loudness (default)
2.0: 2x loudness; 0.5: half loudness

Range 0.5 to 2.0

Required range: 0.5 <= x <= 2

Must be a multiple of 0.01

Example:

0.85
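Both speech_rate and loudness_rate share the same constraints (0.5 to 2.0, in steps of 0.01). The sketch below checks them with Decimal so that binary floating-point rounding does not cause false rejections; validate_rate is a hypothetical helper, not part of the API.

```python
from decimal import Decimal

RATE_MIN = Decimal("0.5")
RATE_MAX = Decimal("2")
RATE_STEP = Decimal("0.01")


def validate_rate(value: float, name: str = "speech_rate") -> float:
    """Check a rate multiplier against the documented range and step."""
    d = Decimal(str(value))  # str() keeps the decimal literal the caller wrote
    if not RATE_MIN <= d <= RATE_MAX:
        raise ValueError(f"{name} must be between 0.5 and 2.0")
    if d != d.quantize(RATE_STEP):
        raise ValueError(f"{name} must be a multiple of 0.01")
    return float(d)


print(validate_rate(1.25, "speech_rate"))
print(validate_rate(0.85, "loudness_rate"))
```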

pitch_rate

integer

default:0

Pitch adjustment, in semitones

0: default pitch (no change)
Positive values raise the pitch: the larger the value, the higher and sharper the voice; 12 raises it by one octave
Negative values lower the pitch: the smaller the value, the lower and deeper the voice; -12 lowers it by one octave

Range -12 to 12

Required range: -12 <= x <= 12

Example:

0
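For intuition about the semitone scale: in twelve-tone equal temperament, a shift of n semitones multiplies frequency by 2^(n/12), which agrees with the statement above that 12 semitones equal one octave. The sketch below applies that standard relation; whether the service applies exactly this mapping internally is an assumption, and semitones_to_ratio is a hypothetical helper.

```python
def semitones_to_ratio(pitch_rate: int) -> float:
    """Frequency ratio for a semitone shift in equal temperament."""
    if isinstance(pitch_rate, bool) or not isinstance(pitch_rate, int):
        raise TypeError("pitch_rate must be an integer")
    if not -12 <= pitch_rate <= 12:
        raise ValueError("pitch_rate must be between -12 and 12")
    return 2.0 ** (pitch_rate / 12)


for n in (-12, 0, 12):
    print(n, semitones_to_ratio(n))
```

An octave up doubles the frequency and an octave down halves it; values in between scale geometrically.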

callback_url

string<uri>

HTTPS callback URL invoked when the task finishes

When it fires:

Triggered when the task is completed, failed, or cancelled
Sent after billing is finalized

Security restrictions:

HTTPS only
Callbacks to internal IP addresses are forbidden (127.0.0.1, 10.x.x.x, 172.16-31.x.x, 192.168.x.x, etc.)
URL length must not exceed 2048 characters
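These restrictions can be pre-checked on the client, as in the sketch below. validate_callback_url is a hypothetical helper; it only inspects literal IP addresses in the URL and does not resolve hostnames, so a domain that points at an internal address would still pass here and may be rejected by the service.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048


def validate_callback_url(url: str) -> str:
    """Raise ValueError if the URL breaks a documented callback restriction."""
    if len(url) > MAX_URL_LENGTH:
        raise ValueError("callback_url must not exceed 2048 characters")
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("callback_url must be an HTTPS URL")
    try:
        ip = ipaddress.ip_address(parts.hostname)
    except ValueError:
        return url  # a hostname, not an IP literal; DNS is not checked here
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        raise ValueError("callback_url must not point at an internal address")
    return url


print(validate_callback_url("https://your-domain.com/webhooks/audio-completed"))
```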

Callback mechanism:

Timeout: 10 seconds
Up to 3 retries on failure, sent 1, 2, and 4 seconds after the first, second, and third failed attempt respectively
The callback body has the same format as the task-query response
A 2xx response is treated as success; other status codes trigger a retry
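On the receiving side, a handler needs to answer with a 2xx status within the 10-second timeout, and because failed deliveries are retried, it should tolerate seeing the same callback more than once. The sketch below shows only the decision logic as a plain function that could be wired into any web framework. handle_callback is a hypothetical name, and reading a status field from the body is an assumption based on the statement that the callback body matches the task-query response.

```python
import json
from typing import Optional, Tuple


def handle_callback(raw_body: bytes) -> Tuple[int, Optional[str]]:
    """Return (HTTP status to reply with, task status found in the body)."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Malformed body: a non-2xx reply asks the sender to retry.
        return 400, None
    if not isinstance(payload, dict):
        return 400, None
    # Assumption: the body carries a "status" field like the task response.
    return 200, payload.get("status")


print(handle_callback(b'{"status": "completed", "progress": 100}'))
```

Heavy work such as downloading the generated audio is better queued and done after the reply has been sent, so the response stays well inside the timeout.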

Example:

"https://your-domain.com/webhooks/audio-completed"

Response

Audio generation task created successfully

created

integer

Task creation timestamp (Unix time, in seconds)

Example:

1775200000

string

Task ID

Example:

"task-unified-1775200000-abcd1234"

model

string

The model actually used

Example:

"doubao-seed-audio-1-0"

object

enum<string>

Specific task type

Available options:

audio.generation.task

progress

integer

Task progress percentage (0-100)

Required range: 0 <= x <= 100

Example:

0

status

enum<string>

Task status

Available options:

pending, processing, completed, failed
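When tracking a task, client code usually needs to tell in-flight states from final ones. The sketch below groups the four documented values that way; treating completed and failed as the only final states is an interpretation of the enum, and is_finished is a hypothetical helper.

```python
IN_PROGRESS_STATUSES = {"pending", "processing"}
FINAL_STATUSES = {"completed", "failed"}


def is_finished(task: dict) -> bool:
    """Return True once the task has reached a final status."""
    status = task.get("status")
    if status in FINAL_STATUSES:
        return True
    if status in IN_PROGRESS_STATUSES:
        return False
    # Any value outside the documented enum is surfaced rather than guessed at.
    raise ValueError(f"unexpected task status: {status!r}")


print(is_finished({"status": "pending", "progress": 0}))
print(is_finished({"status": "completed", "progress": 100}))
```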

Example:

"pending"

task_info

object

Detailed audio task information


type

enum<string>

Task output type

Available options:

audio

Example:

"audio"

usage

object

Usage and billing information
