December 20252 min readJohan Bretonneau

How to Get Your Groq API Key (500+ Tokens per Second)
LPU chips, real-time inference - here's how to get started

Step-by-step guide to get your Groq API key, access Llama 4, Mixtral, and Whisper models, and connect it to HiWay2LLM for 500+ tok/s inference.

Groq is unlike any other LLM provider. Where OpenAI, Anthropic, and Mistral run on standard GPUs, Groq uses its own LPU (Language Processing Unit) chips - designed from the ground up for sequence inference. The practical result: 500 to 800 tokens per second on models like Llama 3.3 70B. That's 10 to 20 times faster than most alternatives.

For real-time applications, voice pipelines, streaming, or anything latency-sensitive, Groq is hard to beat.

Prerequisites

  • An email address, or a GitHub/Google account for SSO
  • No credit card required to get started (generous free tier)

Get your key in 3 steps

1. Create your account at console.groq.com

Go to console.groq.com. Sign up in seconds using GitHub or Google SSO - no lengthy forms.

2. Open "API Keys" in the left menu

Once logged in, click API Keys in the left navigation menu.

3. Create your key

Click Create API Key, give it a name (e.g. hiway2llm), then copy the key immediately. Like all providers, Groq only shows it once.

Models unlocked with your key

Groq gives access to a curated set of high-performance open-source models:

  • Meta Llama 4 Scout / Maverick - latest Meta generation, multimodal
  • Llama 3.3 70B - excellent performance-to-cost ratio, widely used in production
  • Gemma 2 9B - compact and efficient Google model
  • Mixtral 8x7B - Mixture of Experts architecture, versatile
  • Whisper Large v3 - reference-grade multilingual audio transcription

Note: Groq does not natively offer embedding models. For embeddings, pair Groq with a provider like Mistral or OpenAI.

Free tier and pricing

Groq has one of the most generous free tiers available:

  • Up to 6,000 requests/day on Llama 3.3 70B for free
  • Per-model limits (RPD and RPM) visible in the console
  • Then: pay-as-you-go at very competitive rates

For high-volume workloads, Groq offers dedicated plans with higher throughput limits.

Security tips

  • Never commit your key to a Git repository
  • Always use environment variables (.env files)
  • Create one key per environment (dev, prod) to isolate usage
  • If a key leaks, revoke it immediately from the Groq console

Why speed changes everything

Most LLM APIs run at 40-80 tokens per second. Groq delivers 500-800 tokens per second. In practice:

  • Complete responses arrive before the user finishes reading the first sentence
  • Streaming becomes genuinely real-time
  • Voice pipelines (TTS) achieve sub-300ms end-to-end latency
  • Agents chaining multiple LLM calls finish in seconds instead of tens of seconds

If your use case isn't latency-sensitive, other providers may be a better fit on price. But when speed matters, Groq is the obvious choice.

Connect your key to HiWay2LLM

Bring your Groq key to HiWay2LLM to centralize your calls, monitor usage, and intelligently route between providers based on latency or cost requirements.

Bring my key to HiWay2LLM →

Connect in 30 seconds

Share

Was this useful?

Comments

Be the first to comment.