Running an A/B experiment

Scale+ — benchmark models on real traffic without writing glue code.

Open Dashboard → Experiments → New. Pick 2-5 candidate models, a sample rate, a match filter (so only matching requests participate), and a stop condition.

```bash
# Start an experiment via API
curl https://app.hiway2llm.com/v1/experiments \
  -H "Authorization: Bearer hw_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name":          "haiku-vs-gpt4o-mini",
    "candidates":    ["anthropic/claude-haiku-4-5", "openai/gpt-4o-mini"],
    "sample_rate":   0.05,
    "match_filter":  { "tier": "light" },
    "stop_after":    { "requests": 1000 }
  }'
```
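Note how `sample_rate` and `stop_after` interact: the request above samples 5% of requests matching `{ "tier": "light" }` and stops after 1,000 sampled requests, so roughly 20,000 matching requests must flow through before the experiment completes. A quick back-of-envelope sketch (the traffic figure is an assumed input, not anything returned by the API):

```python
# Back-of-envelope: how long until the stop condition fires.
sample_rate = 0.05   # from the request body above
stop_after = 1000    # sampled requests needed

# Assumed traffic figure for illustration -- substitute your own.
matching_requests_per_hour = 2_000

required_matching = stop_after / sample_rate          # 20,000 matching requests
hours_to_complete = required_matching / matching_requests_per_hour

print(f"{required_matching:.0f} matching requests, ~{hours_to_complete:.0f} h")
# → 20000 matching requests, ~10 h
```

If that completion time is too slow, raise `sample_rate`; if you're worried about exposing too much traffic to an untested candidate, lower it and accept a longer run.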

Reading results

Open Dashboard → Experiments → [your experiment] for the live dashboard: per-candidate cost, p50/p95 latency, error rate, and the pairwise winner with confidence interval. Export raw data to CSV for your own analysis.
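The CSV export is handy when you want statistics the dashboard doesn't compute. A minimal sketch of per-candidate analysis with the Python standard library, assuming the export has `candidate`, `latency_ms`, and `cost_usd` columns (the real column names may differ — check the header row of your own export; the sample rows below are made up):

```python
import csv
import io
import statistics

# Hypothetical excerpt of an exported experiment CSV.
raw = """candidate,latency_ms,cost_usd
anthropic/claude-haiku-4-5,412,0.0008
anthropic/claude-haiku-4-5,530,0.0009
anthropic/claude-haiku-4-5,1210,0.0011
openai/gpt-4o-mini,388,0.0007
openai/gpt-4o-mini,760,0.0010
openai/gpt-4o-mini,905,0.0009
"""

def summarize(rows):
    """Per-candidate p50/p95 latency and total cost."""
    by_candidate = {}
    for row in rows:
        by_candidate.setdefault(row["candidate"], []).append(row)
    out = {}
    for name, group in by_candidate.items():
        latencies = sorted(float(r["latency_ms"]) for r in group)
        out[name] = {
            "p50_ms": statistics.median(latencies),
            # quantiles(n=20)[18] is the 95th-percentile cut point
            "p95_ms": (statistics.quantiles(latencies, n=20)[18]
                       if len(latencies) >= 2 else latencies[-1]),
            "total_cost_usd": sum(float(r["cost_usd"]) for r in group),
        }
    return out

summary = summarize(csv.DictReader(io.StringIO(raw)))
for name, stats in summary.items():
    print(name, stats["p50_ms"], round(stats["total_cost_usd"], 4))
```

To run it against a real export, replace the inline string with `open("export.csv")`.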