Multimodal quickstart

Generate your first image, audio, or embedding in under 5 minutes.

Prerequisites

Your workspace must have multimodal_enabled = TRUE (ask an admin). Add your provider API keys under Dashboard → API Keys before calling any /v2/* endpoint.

The multimodal module runs as a separate service on port 4010. https://app.hiway2llm.com routes /v2/* requests to it transparently, and the module shares JWT authentication and the BYOK vault with the core LLM router.

1. Add a multimodal provider key

Go to Dashboard → API Keys → Add BYOK key and select a multimodal provider: fal (images + video), openai (images + audio + embeddings), elevenlabs (TTS), deepgram (STT), or stability (images).

Provider setup guides

Need help getting an API key? Each provider has a blog article with step-by-step instructions: [OpenAI](/blog/get-openai-api-key) · [fal.ai](/blog/get-fal-ai-api-key) · [ElevenLabs](/blog/get-elevenlabs-api-key) · [Stability AI](/blog/get-stability-ai-api-key) · [Together AI](/blog/get-together-ai-api-key) · [Replicate](/blog/get-replicate-api-key) · [Cohere](/blog/get-cohere-api-key) · [BFL / Flux](/blog/get-bfl-flux-api-key)

Two requirements for multimodal

1. Add a BYOK key for a supported multimodal provider in Dashboard → API Keys (fal.ai, Stability AI, ElevenLabs, OpenAI, Cohere, etc.).
2. Make sure multimodal is enabled on your workspace: Settings → Multimodal → Enable.

Without both, /v2/* endpoints return 403.
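Since either missing requirement produces the same 403, a client can turn that status into a setup hint instead of a bare error. The following is a minimal sketch: `MultimodalSetupError` and `check_v2_status` are hypothetical names, and the hint text is an illustration, not a message returned by the API.

```python
class MultimodalSetupError(RuntimeError):
    """Raised when a /v2/* call fails because workspace setup is incomplete."""


def check_v2_status(status: int) -> None:
    """Raise a descriptive error for the 403 that /v2/* endpoints return
    when a requirement is missing. Other statuses are left to the caller."""
    if status == 403:
        raise MultimodalSetupError(
            "403 from a /v2/* endpoint: add a BYOK key for the provider "
            "(Dashboard -> API Keys) and enable multimodal on the workspace "
            "(Settings -> Multimodal -> Enable)."
        )
```

With an HTTP client such as httpx, this could be called as `check_v2_status(resp.status_code)` right after the request returns.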

2. Generate an image

curl https://app.hiway2llm.com/v2/image/generate \
  -H "Authorization: Bearer hw_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A cinematic shot of a futuristic city at night",
    "provider": "fal",
    "model": "fal-ai/flux/schnell",
    "aspect": "landscape",
    "seed": 42
  }'
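The same request can be prepared from Python. This sketch uses only the standard library and mirrors the curl call above field for field; `build_image_request` is a hypothetical helper name, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

IMAGE_URL = "https://app.hiway2llm.com/v2/image/generate"


def build_image_request(prompt: str, api_key: str, seed: int = 42) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "prompt": prompt,
        "provider": "fal",
        "model": "fal-ai/flux/schnell",
        "aspect": "landscape",
        "seed": seed,
    }
    return urllib.request.Request(
        IMAGE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, open the request with `urllib.request.urlopen(req)` and decode the body as JSON. The response fields are not described on this page, so inspect the full body before relying on any particular key.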

3. Text-to-speech

curl -s https://app.hiway2llm.com/v2/audio/tts \
  -H "Authorization: Bearer hw_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from HiWay2LLM!", "provider": "openai", "voice": "nova"}' | jq -r .audio_file_id

# Then download the audio, replacing FILE_ID with the returned audio_file_id:
curl https://app.hiway2llm.com/v2/audio/file/FILE_ID \
  -H "Authorization: Bearer hw_live_YOUR_KEY" --output speech.mp3
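The two calls above form a small pipeline: the TTS response carries an `audio_file_id`, and that ID becomes the last path segment of the download URL. The following sketch shows the glue step in Python. `audio_file_url` is a hypothetical helper, and it assumes the response body is a JSON object with a top-level `audio_file_id` field, as the jq filter above implies.

```python
import json
from urllib.parse import quote

BASE_URL = "https://app.hiway2llm.com"


def audio_file_url(tts_response_body: str) -> str:
    """Return the download URL for the audio referenced by a TTS response body."""
    file_id = json.loads(tts_response_body)["audio_file_id"]
    # Percent-encode the ID so it is always a single, safe path segment.
    return f"{BASE_URL}/v2/audio/file/{quote(str(file_id), safe='')}"
```

Fetch the resulting URL with the same `Authorization` header and write the bytes to `speech.mp3`.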

4. Embeddings

import httpx

# Request embeddings for two inputs in a single call.
resp = httpx.post(
    "https://app.hiway2llm.com/v2/embed",
    headers={"Authorization": "Bearer hw_live_YOUR_KEY"},
    json={
        "input": ["Hello world", "Second sentence"],
        "provider": "openai",
        "model": "text-embedding-3-small",
    },
)
resp.raise_for_status()  # fail early on 4xx/5xx, e.g. the 403 described above
data = resp.json()
print(data["data"][0]["embedding"][:5])  # first 5 dims
print("cached:", data["cached"])
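A common next step is to compare the returned vectors. This sketch computes cosine similarity in plain Python; `cosine_similarity` is an illustrative helper, not part of the HiWay2LLM API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Usage with the response above (assumes one entry in data["data"] per input):
# vectors = [item["embedding"] for item in data["data"]]
# print(cosine_similarity(vectors[0], vectors[1]))
```

Values close to 1 indicate that the two inputs are semantically similar, and values near 0 indicate that they are unrelated.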