Enabling semantic cache
Qdrant-backed cache — Scale plan and above.
Open Dashboard → Cache. Toggle *Semantic cache on* and set the cosine similarity threshold (default 0.92 — lower = more hits but looser match) and the TTL (default 24h).
What to watch
- Hit rate in Dashboard → Usage — target 15-40% for typical chat apps.
_hiway.cache_hitand_hiway.cache_similarityin your response logs — sanity-check that similar but-different prompts don't collide.- Cache eviction log — when entries expire or are manually purged.
Don't cache personalized responses
Responses that embed user-specific data (account name, balance, personalized recommendation) should not hit cache. Either mark them with cache: false in the request body, or enable PII masking so that the user-specific part doesn't affect similarity.