POST /v1/audio/speech

Body

application/json
input
string
required

The text to convert to speech.

voice
enum<string>
required

The voice to use for the generated speech. Must be one of 'nova', 'shimmer', 'echo', 'onyx', 'fable', or 'alloy'.

Available options: nova, shimmer, echo, onyx, fable, alloy

model
enum<string>
required

The model to use for generating the speech. 'tts-1' is a high-quality model. 'tts-1-hd' produces higher-quality audio but is slower and more expensive than 'tts-1'.

Read more about the models here.

Available options: tts-1, tts-1-hd

response_format
enum<string>
default: mp3

The format of the audio response. Defaults to mp3.

Available options: mp3, opus, aac, flac, wav, pcm

speed
number
default: 1

The speed of the generated audio. Accepts a value from 0.25 to 4.0. Defaults to 1.0.
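
As an illustration of the body parameters above, the following minimal sketch assembles and validates a request body on the client side before sending it. The helper name `build_speech_payload` is hypothetical and not part of the API. The allowed values and the speed range are copied from the parameter descriptions, and the server remains the authority on what it accepts.

```python
import json

# Allowed values copied from the parameter descriptions in this reference.
VOICES = {"nova", "shimmer", "echo", "onyx", "fable", "alloy"}
MODELS = {"tts-1", "tts-1-hd"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(text, voice, model, response_format="mp3", speed=1.0):
    """Hypothetical helper: check the documented fields and return the body as a dict."""
    if not isinstance(text, str):
        raise ValueError("input must be a string")
    if voice not in VOICES:
        raise ValueError(f"voice must be one of {sorted(VOICES)}")
    if model not in MODELS:
        raise ValueError(f"model must be one of {sorted(MODELS)}")
    if response_format not in FORMATS:
        raise ValueError(f"response_format must be one of {sorted(FORMATS)}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "input": text,
        "voice": voice,
        "model": model,
        "response_format": response_format,
        "speed": speed,
    }


# Serialize to the application/json body expected by POST /v1/audio/speech.
body = json.dumps(build_speech_payload("Hello, world.", "nova", "tts-1"))
```

Validating locally is optional. It only surfaces obvious mistakes, such as a misspelled voice name, before a request is made.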

Response

200 - audio/mp3

The response body is a file containing the generated audio.
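
Because the response is a binary file rather than JSON, a client would typically write the returned bytes straight to disk. The sketch below shows one way to do that with the Python standard library. The host `https://api.example.com`, the bearer-token `Authorization` header, and the function name `synthesize_to_file` are assumptions for illustration, since this reference does not specify the base URL or the authentication scheme.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with the real one


def synthesize_to_file(payload, out_path, api_key, base_url=BASE_URL,
                       opener=urllib.request.urlopen):
    """POST the JSON payload to /v1/audio/speech and save the audio bytes to out_path.

    The Authorization header format is an assumption, not taken from this reference.
    `opener` is injectable so the function can be exercised without a network call.
    """
    request = urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
    # The body of a 200 response is the raw audio, so read it as bytes.
    with opener(request) as response:
        audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)


# Example call (not executed here):
# synthesize_to_file(
#     {"input": "Hello, world.", "voice": "nova", "model": "tts-1"},
#     "hello.mp3",
#     api_key="YOUR_API_KEY",
# )
```

The file extension should match the requested `response_format`, which is mp3 when the field is omitted.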