Enabling semantic cache
Qdrant-backed cache - available on all packs.
Open Dashboard → Cache. Toggle *Semantic cache on* and set the cosine similarity threshold (default 0.92 - lower = more hits but looser match) and the TTL (default 24h).
What to watch
- Hit rate in Dashboard → Usage - target 15-40% for typical chat apps.
_hiway.cache_hitand_hiway.cache_similarityin your response logs - sanity-check that similar but-different prompts don't collide.- Cache eviction log - when entries expire or are manually purged.
Don't cache personalized responses
Responses that embed user-specific data (account name, balance, personalized recommendation) should not hit cache. Either mark them with cache: false in the request body, or enable PII masking so that the user-specific part doesn't affect similarity.