December 20252 min readJohan Bretonneau

How to Get Your Groq API Key (500+ Tokens per Second)
LPU chips, real-time inference - here's how to get started

Step-by-step guide to get your Groq API key, access Llama 4, Mixtral, and Whisper models, and connect it to HiWay2LLM for 500+ tok/s inference.

Groq is unlike any other LLM provider. Where OpenAI, Anthropic, and Mistral run on standard GPUs, Groq uses its own LPU (Language Processing Unit) chips - designed from the ground up for sequence inference. The practical result: 500 to 800 tokens per second on models like Llama 3.3 70B. That's 10 to 20 times faster than most alternatives.

For real-time applications, voice pipelines, streaming, or anything latency-sensitive, Groq is hard to beat.

Prerequisites

An email address, or a GitHub/Google account for SSO
No credit card required to get started (generous free tier)

Get your key in 3 steps

1. Create your account at console.groq.com

Go to console.groq.com. Sign up in seconds using GitHub or Google SSO - no lengthy forms.

2. Open "API Keys" in the left menu

Once logged in, click API Keys in the left navigation menu.

3. Create your key

Click Create API Key, give it a name (e.g. hiway2llm), then copy the key immediately. Like all providers, Groq only shows it once.

Models unlocked with your key

Groq gives access to a curated set of high-performance open-source models:

Meta Llama 4 Scout / Maverick - latest Meta generation, multimodal
Llama 3.3 70B - excellent performance-to-cost ratio, widely used in production
Gemma 2 9B - compact and efficient Google model
Mixtral 8x7B - Mixture of Experts architecture, versatile
Whisper Large v3 - reference-grade multilingual audio transcription

Note: Groq does not natively offer embedding models. For embeddings, pair Groq with a provider like Mistral or OpenAI.

Free tier and pricing

Groq has one of the most generous free tiers available:

Up to 6,000 requests/day on Llama 3.3 70B for free
Per-model limits (RPD and RPM) visible in the console
Then: pay-as-you-go at very competitive rates

For high-volume workloads, Groq offers dedicated plans with higher throughput limits.

Security tips

Never commit your key to a Git repository
Always use environment variables (.env files)
Create one key per environment (dev, prod) to isolate usage
If a key leaks, revoke it immediately from the Groq console

Why speed changes everything

Most LLM APIs run at 40-80 tokens per second. Groq delivers 500-800 tokens per second. In practice:

Complete responses arrive before the user finishes reading the first sentence
Streaming becomes genuinely real-time
Voice pipelines (TTS) achieve sub-300ms end-to-end latency
Agents chaining multiple LLM calls finish in seconds instead of tens of seconds

If your use case isn't latency-sensitive, other providers may be a better fit on price. But when speed matters, Groq is the obvious choice.

Connect your key to HiWay2LLM

Bring your Groq key to HiWay2LLM to centralize your calls, monitor usage, and intelligently route between providers based on latency or cost requirements.

Bring my key to HiWay2LLM →

Connect in 30 seconds

LinkedIn X Email

Was this useful?

Comments

…

Be the first to comment.