Enabling semantic cache

Qdrant-backed cache — Scale plan and above.

Open Dashboard → Cache. Toggle *Semantic cache on* and set the cosine similarity threshold (default 0.92 — lower = more hits but looser match) and the TTL (default 24h).

What to watch

  • Hit rate in Dashboard → Usage — target 15-40% for typical chat apps.
  • _hiway.cache_hit and _hiway.cache_similarity in your response logs — sanity-check that similar but-different prompts don't collide.
  • Cache eviction log — when entries expire or are manually purged.

Don't cache personalized responses

Responses that embed user-specific data (account name, balance, personalized recommendation) should not hit cache. Either mark them with cache: false in the request body, or enable PII masking so that the user-specific part doesn't affect similarity.