# Running an A/B experiment
Scale+ — benchmark models on real traffic without writing glue code.
Open Dashboard → Experiments → New. Pick 2–5 candidate models, a sample rate, a match filter (only requests matching the filter participate), and a stop condition that ends the experiment automatically.
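The participation rule described above can be sketched in a few lines. This is a hedged illustration, not the platform's actual routing code: the field names (`tier`) and the exact-match semantics of the filter are assumptions based on the example below.

```python
import random

def participates(request: dict, sample_rate: float, match_filter: dict) -> bool:
    """Sketch of experiment participation: the request must match every
    key/value in the filter, then a fraction `sample_rate` of matching
    requests is sampled. Hypothetical logic, for illustration only."""
    if any(request.get(k) != v for k, v in match_filter.items()):
        return False
    return random.random() < sample_rate

# A "heavy"-tier request never participates when the filter requires "light".
print(participates({"tier": "heavy"}, 0.05, {"tier": "light"}))  # → False
```

With `sample_rate: 0.05`, roughly 5% of matching requests would be shadowed to the candidate models.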
```bash
# Start an experiment via the API
curl https://app.hiway2llm.com/v1/experiments \
  -H "Authorization: Bearer hw_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "haiku-vs-gpt4o-mini",
    "candidates": ["anthropic/claude-haiku-4-5", "openai/gpt-4o-mini"],
    "sample_rate": 0.05,
    "match_filter": { "tier": "light" },
    "stop_after": { "requests": 1000 }
  }'
```

## Reading results
Open Dashboard → Experiments → [your experiment] to see live results: per-candidate cost, p50/p95 latency, error rate, and the pairwise winner with a confidence interval. You can also export the raw data to CSV for your own analysis.
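A CSV export can be summarized per candidate with the standard library alone. This is a minimal sketch: the column names (`candidate`, `latency_ms`, `cost_usd`) are assumptions — check the header row of your actual export before adapting it.

```python
import csv
import io
import statistics

# Stand-in for the downloaded export; column names are hypothetical.
raw = """candidate,latency_ms,cost_usd
anthropic/claude-haiku-4-5,420,0.0011
anthropic/claude-haiku-4-5,510,0.0012
openai/gpt-4o-mini,380,0.0009
openai/gpt-4o-mini,700,0.0010
"""

rows = list(csv.DictReader(io.StringIO(raw)))
for name in sorted({r["candidate"] for r in rows}):
    latencies = [float(r["latency_ms"]) for r in rows if r["candidate"] == name]
    total_cost = sum(float(r["cost_usd"]) for r in rows if r["candidate"] == name)
    # Median latency stands in for p50; a real analysis would also compute p95.
    print(f"{name}: p50={statistics.median(latencies):.0f}ms "
          f"total_cost=${total_cost:.4f}")
```

For a real export, replace `raw` with `open("export.csv")` and feed the file object to `csv.DictReader` directly.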