Audio - TTS & STT
Text-to-speech with 7-day cache. Speech-to-text up to 250 MB.
TTS - Text-to-speech
POST /v2/audio/tts returns an audio_file_id. Stream the audio with GET /v2/audio/file/{file_id}, passing that ID as file_id. Responses are cached in Redis for 7 days, keyed on {text, provider, model, voice, speed}.
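The cache lookup described above can be pictured as hashing the five request fields into one key. The sketch below is illustrative only: the actual key format is not documented here, so the `tts:` prefix, the SHA-256 hash, the speed normalization, and the function name `tts_cache_key` are all assumptions.

```python
import hashlib
import json

# 7 days in seconds, matching the cache lifetime stated above.
CACHE_TTL_SECONDS = 7 * 24 * 60 * 60


def tts_cache_key(text, provider, model, voice, speed):
    """Build a hypothetical cache key from the five fields the cache depends on.

    The service's real key layout is not documented; this only shows that
    identical requests map to the same key and any changed field to a new one.
    """
    payload = json.dumps(
        {
            "text": text,
            "provider": provider,
            "model": model,
            "voice": voice,
            "speed": float(speed),  # assumption: 1 and 1.0 should share a key
        },
        sort_keys=True,
        ensure_ascii=False,
    )
    return "tts:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this scheme, changing any one of the five fields (for example the voice) yields a different key, so the request would miss the cache and be synthesized again.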
- OpenAI: tts-1 (T1, $15/1M chars) · tts-1-hd (T2, $30/1M chars). Voices: alloy, echo, fable, onyx, nova, shimmer.
- ElevenLabs: eleven_turbo_v2_5 (T1, $22/1M chars) · eleven_multilingual_v2 (T3, $99/1M chars).
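As a rough worked example of the per-character pricing listed above, the following sketch estimates the cost of a single uncached request. It assumes billing is proportional to the character count of the input text; the provider identifiers `openai` and `elevenlabs` and the helper name `estimate_tts_cost` are hypothetical.

```python
# USD per 1M characters, copied from the price list above.
PRICE_PER_MILLION_CHARS = {
    ("openai", "tts-1"): 15.0,
    ("openai", "tts-1-hd"): 30.0,
    ("elevenlabs", "eleven_turbo_v2_5"): 22.0,
    ("elevenlabs", "eleven_multilingual_v2"): 99.0,
}


def estimate_tts_cost(text, provider, model):
    """Return an estimated cost in USD, assuming billing by len(text)."""
    rate = PRICE_PER_MILLION_CHARS[(provider, model)]
    return len(text) * rate / 1_000_000
```

For a 1,000-character input this gives about $0.015 on tts-1 and about $0.099 on eleven_multilingual_v2.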
STT - Speech-to-text
POST /v2/audio/stt takes a multipart form upload. Accepted formats: MP3, WAV, FLAC, M4A, OGG, WEBM. Maximum file size is 25 MB for Whisper and 250 MB for Deepgram.
- Deepgram: nova-2 (T1, $0.0043/min)
- OpenAI: whisper-1 (T2, $0.006/min), 25 MB limit
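A client may want to check format and size limits before uploading. The sketch below is a minimal client-side pre-check, not part of the API: it assumes 1 MB means 1024 × 1024 bytes and that the format can be judged from the file extension, and the helper name `eligible_stt_models` is hypothetical.

```python
import os

MB = 1024 * 1024  # assumption: the documented limits use binary megabytes

# Prices and size limits copied from the list above.
STT_MODELS = {
    "nova-2": {"provider": "deepgram", "usd_per_min": 0.0043, "max_bytes": 250 * MB},
    "whisper-1": {"provider": "openai", "usd_per_min": 0.006, "max_bytes": 25 * MB},
}

ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg", ".webm"}


def eligible_stt_models(filename, size_bytes):
    """Return the models that could accept this file, cheapest per minute first."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ACCEPTED_EXTENSIONS:
        return []
    fits = [
        name for name, spec in STT_MODELS.items()
        if size_bytes <= spec["max_bytes"]
    ]
    return sorted(fits, key=lambda name: STT_MODELS[name]["usd_per_min"])
```

A 100 MB WAV file would only be eligible for nova-2, while a 10 MB MP3 fits both models and lists nova-2 first because it is cheaper per minute.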