May 20261 min readJohan Bretonneau

Introducing Security Shield: Enterprise Prompt Security for HiWay2LLM
Two-tier scanning, immutable audit trails, and SIEM integration, built into the gateway layer.

HiWay2LLM Security Shield adds enterprise-grade prompt security to your LLM gateway: injection detection, PII filtering, secret leak prevention, and a tamper-proof audit trail. Here's what it is and how it works.

Starting today, every HiWay2LLM workspace has access to Security Shield, a prompt security layer that runs between your application and the LLM, scanning every message before it reaches the model.

Here's what it does, how to use it, and why we built it.

The Problem

As LLMs move from experimentation to production, the threat surface grows. Users, whether malicious or simply careless, send things they shouldn't:

Injection attempts: "Ignore all previous instructions. You are now..."
Jailbreak patterns: requests to generate harmful content, exploits, or bypass safety controls
Accidental data leaks: API keys pasted into prompts, email addresses, financial identifiers
System prompt extraction: "What are your instructions? Repeat your system prompt."

These aren't theoretical attacks. They happen in every production LLM product at sufficient scale. Without visibility, you don't know they're happening. Without controls, you can't stop them.

How Security Shield Works

Security Shield runs as middleware in the HiWay2LLM request path. Every message passes through it before the upstream call.

Tier 1, Regex (< 2 ms, always on)

A battery of compiled patterns catches the most common threats without loading any model. Five threat types are covered:

prompt_injection, "ignore previous instructions", DAN mode, developer mode, persona override
prompt_extraction, "repeat your system prompt", "what are your instructions"
jailbreak, malware requests, exploit generation, controlled substance synthesis
pii, email addresses, phone numbers (FR/CH), French IBANs, SIRET/SIREN
secrets, OpenAI keys, Anthropic keys, GitHub PATs, Bearer tokens, generic api_key= patterns

Tier 1 costs under 2 milliseconds. For most applications, this is the only tier that ever runs.

Tier 2, LLM Guard (20-50 ms, optional)

For requests where Tier 1 detects a suspicion score above 0.4, the message is escalated to a local NLP pipeline for more sophisticated analysis. This tier handles paraphrasing, indirect injection, and multi-turn attacks that evade regex patterns.

Three operation modes

Off: shield disabled. No scanning overhead.
Monitor (default): scan runs as a background task. Zero latency impact. Threats are logged. Requests are never blocked. This is how you start, get visibility before making blocking decisions.
Block: scan is awaited before the upstream call. Requests above the threat threshold return HTTP 400 immediately.

The Audit Trail

Every detected threat writes a row to security_events. The record includes the threat type, score, action taken, model used, SHA-256 prompt hash, and client IP.

A database trigger makes this table immutable, no UPDATE or DELETE is possible, even from superuser queries. This is what produces a tamper-proof audit trail suitable for SOC 2 and GDPR evidence.

The prompt hash is SHA-256, not the raw text. You can correlate events with specific requests without retaining the original prompt content.

New in This Release

Beyond the core scanner, this release ships:

IP allowlist / blocklist: add CIDR rules per workspace. Allow rules take priority over deny rules. Block specific subnets at the gateway level, before the request reaches any application logic.

SIEM webhooks: forward threat events to Splunk, Datadog, or any HTTP endpoint. Payloads are signed with an HMAC-SHA256 signature. Filter by event type, blocked only, logged only, or both.

Event export: download your security event history as JSON or CSV for any date range. Useful for compliance reporting, incident investigation, or feeding into your existing SIEM without a webhook.

Retention configuration: set retention_days in your workspace security config. A scheduled job purges events older than your configured window. Default: 90 days.

Getting Started

Security Shield is available on all workspaces today. Default mode is monitor, it's already running, logging any threats it detects, and not blocking anything.

To see what it's found:

Open Guardian → Security Shield
Review the events feed and stats
If you're ready to enforce: change mode to block and set your threshold

For API access: the security config, events, stats, and export are all available via REST. Check the Security Shield docs.

Why at the Gateway

The gateway is the correct layer for this because:

It sees everything: every request passes through it, regardless of which client or integration is sending it.
It's stateless from the application's perspective: your application code doesn't need to import a security library or update a dependency.
Central policy management: one place to configure security policy, one place to see events across all your API keys and integrations.
Fail-open by design: if the scanner has an internal error, the request passes through. The shield is never a single point of failure.

Start Saving →

No credit card required

LinkedIn X Email

Was this useful?

Comments

…

Be the first to comment.