POST /v1/audios/generations
curl --request POST \
  --url https://api.evolink.ai/v1/audios/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "doubao-seed-audio-1-0",
  "prompt": "Welcome to the audio generation service. The weather is lovely today.",
  "format": "mp3"
}
'
{
  "created": 1775200000,
  "id": "task-unified-1775200000-abcd1234",
  "model": "doubao-seed-audio-1-0",
  "object": "audio.generation.task",
  "progress": 0,
  "status": "pending",
  "task_info": {
    "can_cancel": true,
    "estimated_time": 15,
    "audio_type": "audio_generation"
  },
  "type": "audio",
  "usage": {
    "credits_reserved": 9.6
  }
}

Authorizations

Authorization
string
header
required

All endpoints require Bearer Token authentication.

Get your API Key:

Visit the API Key management page to obtain your API Key

Add it to the request header:

Authorization: Bearer YOUR_API_KEY
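
As a rough illustration of the header above, the following Python sketch prepares (but does not send) the same request as the curl example, using only the standard library. The helper name `build_request` is hypothetical and not part of any SDK; the URL and body fields are taken from the curl example.

```python
import json
import urllib.request

API_URL = "https://api.evolink.ai/v1/audios/generations"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare a POST request carrying the Bearer token (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "YOUR_API_KEY",
    {"model": "doubao-seed-audio-1-0", "prompt": "Hello", "format": "mp3"},
)
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

Replace YOUR_API_KEY with a real key before sending; the request is only constructed here, so the sketch runs without network access.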

Body

application/json
model
enum<string>
default:doubao-seed-audio-1-0
required

Model name

Available options:
doubao-seed-audio-1-0
Example:

"doubao-seed-audio-1-0"

prompt
string
required

The prompt or text to synthesize into audio

Three generation modes (auto-detected by which reference resources you pass):

  • Text-to-audio: pass only prompt to generate audio directly from the prompt
  • Reference-audio (voice cloning): pair with audio_references; use the literal marker @audioN to reference the Nth item (numbered from 1, in array order)
  • Reference-image: pair with image_urls; prompt only needs the text to synthesize

Audio references (audio_references) and image references (image_urls) are mutually exclusive — only one may be used per request.

Constraints:

  • Up to 1500 characters
Maximum string length: 1500
Example:

"Welcome to the audio generation service. The weather is lovely today."

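The mode selection and constraints described for prompt can be checked client-side before a request is sent. The sketch below is one possible pre-flight check, not the service's own logic: the function name `detect_mode` and the returned labels are hypothetical, and it assumes the 1500-character limit counts characters the way Python's `len` does.

```python
from typing import Optional, Sequence

MAX_PROMPT_CHARS = 1500  # documented maximum string length


def detect_mode(prompt: str,
                audio_references: Optional[Sequence[str]] = None,
                image_urls: Optional[Sequence[str]] = None) -> str:
    """Return which documented generation mode a request body would select."""
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds 1500 characters")
    if audio_references and image_urls:
        raise ValueError("audio_references and image_urls are mutually exclusive")
    if audio_references:
        return "reference-audio"
    if image_urls:
        return "reference-image"
    return "text-to-audio"


print(detect_mode("Welcome to the audio generation service."))
print(detect_mode("@audio1 Welcome.", audio_references=["zh_female_vv_uranus_bigtts"]))
print(detect_mode("A quiet morning by the lake.", image_urls=["https://example.com/scene.jpg"]))
```
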
audio_references
string[]

List of reference resources. Each item can be a voice ID or a reference-audio URL, and the two may be mixed within the same array

  • Voice ID: the voice_type of a preset voice — see the full list in Seed-Audio 1.0 Voice List
  • Audio URL: the URL of a reference audio clip to use for voice cloning
  • Mutually exclusive with image_urls: reference audio and reference image are either-or; they cannot be sent together in one request
  • Use the literal marker @audioN in prompt to reference the Nth item (numbered from 1, in array order)
  • If omitted, the model generates a voice freely based on prompt

Quantity limit:

  • Up to 3 items total in the array (voice IDs and audio URLs combined)

Audio URL constraints:

  • Each reference clip ≤ 30 seconds and ≤ 10 MB
  • Formats: wav / mp3 / pcm / ogg_opus
Maximum array length: 3
Example:
["zh_female_vv_uranus_bigtts"]
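
To illustrate the @audioN numbering, this sketch extracts the markers from a prompt and checks that each one points at an existing item of audio_references. The helper name `check_audio_markers` is hypothetical, and where markers should be placed within the prompt text is an assumption, since the reference only states that @audioN refers to the Nth array item.

```python
import re
from typing import List

MAX_REFERENCES = 3                    # documented maximum array length
MARKER = re.compile(r"@audio(\d+)")   # literal marker followed by a 1-based index


def check_audio_markers(prompt: str, audio_references: List[str]) -> List[int]:
    """Return the 1-based indices used in prompt; raise if any is out of range."""
    if len(audio_references) > MAX_REFERENCES:
        raise ValueError("audio_references accepts at most 3 items")
    indices = [int(n) for n in MARKER.findall(prompt)]
    for n in indices:
        if not 1 <= n <= len(audio_references):
            raise ValueError(f"@audio{n} has no matching item in audio_references")
    return indices


refs = [
    "zh_female_vv_uranus_bigtts",          # preset voice ID
    "https://example.com/my-voice.wav",    # placeholder URL of a reference clip
]
# Marker placement below is an assumption made for illustration.
print(check_audio_markers("@audio1 Good morning. @audio2 Good morning to you too.", refs))
```

Voice IDs and audio URLs share one numbering because they sit in the same array, so @audio2 here refers to the URL entry.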
image_urls
string<uri>[]

List of reference-image URLs; generates audio matching the mood of the image

  • When using an image reference, prompt only needs the text to synthesize
  • Mutually exclusive with audio_references: reference image and reference audio are either-or; they cannot be sent together in one request

Constraints:

  • Currently only 1 image, ≤ 10 MB
  • Formats: jpeg / png / webp
Maximum array length: 1
Example:
["https://example.com/scene.jpg"]
format
enum<string>
default:wav

Output audio format

Available options:
wav, mp3, pcm, ogg_opus
Example:

"mp3"

sample_rate
enum<integer>
default:24000

Output sample rate (Hz)

Available options:
8000, 16000, 24000, 32000, 44100, 48000
Example:

24000

speech_rate
number
default:1

Speech-rate multiplier (supports two decimal places)

  • 1.0: normal speed (default)
  • 2.0: 2x speed; 0.5: half speed

Range 0.5 to 2.0

Required range: 0.5 <= x <= 2
Must be a multiple of 0.01
Example:

1.25
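
The range and two-decimal constraint can be verified before sending a request. The sketch below is a client-side check only, with a hypothetical helper name `check_rate`; it uses `Decimal` so that values such as 1.25 are not misjudged because of binary floating-point rounding. The same constraints are documented for loudness_rate, so the helper takes the field name as a parameter.

```python
from decimal import Decimal


def check_rate(value, name: str = "speech_rate") -> float:
    """Validate a multiplier: 0.5 <= x <= 2 and a multiple of 0.01."""
    d = Decimal(str(value))
    if not Decimal("0.5") <= d <= Decimal("2"):
        raise ValueError(f"{name} must be between 0.5 and 2.0")
    if d != d.quantize(Decimal("0.01")):
        raise ValueError(f"{name} supports at most two decimal places")
    return float(d)


print(check_rate(1.25))                    # accepted
print(check_rate(0.85, "loudness_rate"))   # accepted
```
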

loudness_rate
number
default:1

Loudness multiplier (supports two decimal places)

  • 1.0: normal loudness (default)
  • 2.0: 2x loudness; 0.5: half loudness

Range 0.5 to 2.0

Required range: 0.5 <= x <= 2
Must be a multiple of 0.01
Example:

0.85

pitch_rate
integer
default:0

Pitch adjustment, in semitones

  • 0: default pitch (no change)
  • Positive values raise the pitch: the larger the value, the higher and sharper the voice; 12 raises it by one octave
  • Negative values lower the pitch: the more negative the value, the lower and deeper the voice; -12 lowers it by one octave

Range -12 to 12

Required range: -12 <= x <= 12
Example:

0
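
For intuition about what a semitone value means, the sketch below converts pitch_rate into a frequency ratio using the standard equal-temperament relation ratio = 2^(n/12), which matches the statement that 12 semitones equal one octave. How the service applies the shift internally is not documented here; the function name `pitch_ratio` is hypothetical.

```python
def pitch_ratio(semitones: int) -> float:
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    if not isinstance(semitones, int) or not -12 <= semitones <= 12:
        raise ValueError("pitch_rate must be an integer between -12 and 12")
    return 2.0 ** (semitones / 12)


for n in (-12, 0, 7, 12):
    print(n, round(pitch_ratio(n), 4))
```

A value of 12 doubles the frequency and -12 halves it, while 0 leaves it unchanged.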

callback_url
string<uri>

HTTPS callback URL invoked when the task finishes

When it fires:

  • Triggered when the task is completed, failed, or cancelled
  • Sent after billing is finalized

Security restrictions:

  • HTTPS only
  • Callbacks to internal IP addresses are forbidden (127.0.0.1, 10.x.x.x, 172.16-31.x.x, 192.168.x.x, etc.)
  • URL length must not exceed 2048 characters

Callback mechanism:

  • Timeout: 10 seconds
  • Up to 3 retries on failure, with delays of 1, 2, and 4 seconds after successive failures
  • The callback body has the same format as the task-query response
  • A 2xx response is treated as success; other status codes trigger a retry
Example:

"https://your-domain.com/webhooks/audio-completed"

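The security restrictions on callback_url can be approximated with a local check before submitting a task. This is a sketch under stated assumptions, not the service's validation: the helper name `check_callback_url` is hypothetical, and it only inspects literal IP addresses, so a hostname that resolves to an internal address is not detected here.

```python
import ipaddress
from urllib.parse import urlparse

MAX_URL_LENGTH = 2048  # documented limit


def check_callback_url(url: str) -> str:
    """Reject URLs that break the documented callback restrictions."""
    if len(url) > MAX_URL_LENGTH:
        raise ValueError("callback_url must not exceed 2048 characters")
    parts = urlparse(url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("callback_url must be an HTTPS URL")
    try:
        ip = ipaddress.ip_address(parts.hostname)
    except ValueError:
        return url  # a hostname rather than a literal IP; DNS is not resolved here
    if ip.is_private or ip.is_loopback:
        raise ValueError("callbacks to internal IP addresses are forbidden")
    return url


print(check_callback_url("https://your-domain.com/webhooks/audio-completed"))
```

On the receiving side, the endpoint should answer with a 2xx status within the 10-second timeout, since any other response triggers a retry.
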
Response

Audio generation task created successfully

created
integer

Task creation timestamp

Example:

1775200000

id
string

Task ID

Example:

"task-unified-1775200000-abcd1234"

model
string

The model actually used

Example:

"doubao-seed-audio-1-0"

object
enum<string>

Specific task type

Available options:
audio.generation.task
progress
integer

Task progress percentage (0-100)

Required range: 0 <= x <= 100
Example:

0

status
enum<string>

Task status

Available options:
pending, processing, completed, failed
Example:

"pending"

task_info
object

Detailed audio task information

type
enum<string>

Task output type

Available options:
audio
Example:

"audio"

usage
object

Usage and billing information
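
As an illustration of reading the response fields, the sketch below parses the sample response shown at the top of this page. The helper names `summarize` and `is_finished` are hypothetical. Treating "cancelled" as a finished state is an assumption based on the callback_url description, because the status enum above lists only pending, processing, completed, and failed.

```python
import json

SAMPLE_RESPONSE = """
{
  "created": 1775200000,
  "id": "task-unified-1775200000-abcd1234",
  "model": "doubao-seed-audio-1-0",
  "object": "audio.generation.task",
  "progress": 0,
  "status": "pending",
  "task_info": {"can_cancel": true, "estimated_time": 15, "audio_type": "audio_generation"},
  "type": "audio",
  "usage": {"credits_reserved": 9.6}
}
"""

# "cancelled" is an assumption taken from the callback description, not the status enum.
FINISHED_STATUSES = {"completed", "failed", "cancelled"}


def is_finished(task: dict) -> bool:
    """True once the task no longer needs to be waited on."""
    return task["status"] in FINISHED_STATUSES


def summarize(task: dict) -> str:
    """One-line summary built from the documented response fields."""
    return (f"{task['id']}: {task['status']} ({task['progress']}%), "
            f"reserved {task['usage']['credits_reserved']} credits")


task = json.loads(SAMPLE_RESPONSE)
print(summarize(task))
print(is_finished(task))
```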