
How to Get Your Groq API Key (500+ Tokens per Second)
LPU chips, real-time inference - here's how to get started
Step-by-step guide to get your Groq API key, access Llama 4, Mixtral, and Whisper models, and connect it to HiWay2LLM for 500+ tok/s inference.
Groq is unlike any other LLM provider. Where OpenAI, Anthropic, and Mistral run on standard GPUs, Groq uses its own LPU (Language Processing Unit) chips - designed from the ground up for sequence inference. The practical result: 500 to 800 tokens per second on models like Llama 3.3 70B. That's 10 to 20 times faster than most alternatives.
For real-time applications, voice pipelines, streaming, or anything latency-sensitive, Groq is hard to beat.
Prerequisites
- An email address, or a GitHub/Google account for SSO
- No credit card required to get started (generous free tier)
Get your key in 3 steps
1. Create your account at console.groq.com
Go to console.groq.com. Sign up in seconds using GitHub or Google SSO - no lengthy forms.
2. Open "API Keys" in the left menu
Once logged in, click API Keys in the left navigation menu.
3. Create your key
Click Create API Key, give it a name (e.g. hiway2llm), then copy the key immediately. Like all providers, Groq only shows it once.
Models unlocked with your key
Groq gives access to a curated set of high-performance open-source models:
- Meta Llama 4 Scout / Maverick - latest Meta generation, multimodal
- Llama 3.3 70B - excellent performance-to-cost ratio, widely used in production
- Gemma 2 9B - compact and efficient Google model
- Mixtral 8x7B - Mixture of Experts architecture, versatile
- Whisper Large v3 - reference-grade multilingual audio transcription
Note: Groq does not natively offer embedding models. For embeddings, pair Groq with a provider like Mistral or OpenAI.
Free tier and pricing
Groq has one of the most generous free tiers available:
- Up to 6,000 requests/day on Llama 3.3 70B for free
- Per-model limits (RPD and RPM) visible in the console
- Then: pay-as-you-go at very competitive rates
For high-volume workloads, Groq offers dedicated plans with higher throughput limits.
Security tips
- Never commit your key to a Git repository
- Always use environment variables (
.envfiles) - Create one key per environment (dev, prod) to isolate usage
- If a key leaks, revoke it immediately from the Groq console
Why speed changes everything
Most LLM APIs run at 40-80 tokens per second. Groq delivers 500-800 tokens per second. In practice:
- Complete responses arrive before the user finishes reading the first sentence
- Streaming becomes genuinely real-time
- Voice pipelines (TTS) achieve sub-300ms end-to-end latency
- Agents chaining multiple LLM calls finish in seconds instead of tens of seconds
If your use case isn't latency-sensitive, other providers may be a better fit on price. But when speed matters, Groq is the obvious choice.
Connect your key to HiWay2LLM
Bring your Groq key to HiWay2LLM to centralize your calls, monitor usage, and intelligently route between providers based on latency or cost requirements.
Connect in 30 seconds
Was this useful?
Comments
Be the first to comment.