Enabling semantic cache

Qdrant-backed cache - available on all packs.

Open Dashboard → Cache. Toggle *Semantic cache on* and set the cosine similarity threshold (default 0.92 - lower = more hits but looser match) and the TTL (default 24h).

What to watch

  • Hit rate in Dashboard → Usage - target 15-40% for typical chat apps.
  • _hiway.cache_hit and _hiway.cache_similarity in your response logs - sanity-check that similar but-different prompts don't collide.
  • Cache eviction log - when entries expire or are manually purged.

Don't cache personalized responses

Responses that embed user-specific data (account name, balance, personalized recommendation) should not hit cache. Either mark them with cache: false in the request body, or enable PII masking so that the user-specific part doesn't affect similarity.