Caching
Squeezr uses an in-process cache to avoid re-compressing content it has already seen, plus a session-level expand store that holds original content for lossless retrieval.
Compression cache
Compressed results are stored in an LRU cache keyed by a deterministic MD5 hash of the original content. When the same tool result appears again (within the same session, or across sessions served by the same proxy process), the cached compressed version is returned immediately, without re-running the pipeline.
[cache]
enabled = true
max_entries = 1000

The cache persists for as long as the proxy process is running. It is cleared by squeezr stop / squeezr start.
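The lookup path can be sketched as a small MD5-keyed LRU. This is a hypothetical model of the behavior described above, not Squeezr's actual code; the `compress` callable stands in for the real pipeline:

```python
import hashlib
from collections import OrderedDict

def content_key(text: str) -> str:
    # Deterministic MD5 hash of the original content serves as the cache key.
    return hashlib.md5(text.encode("utf-8")).hexdigest()

class CompressionCache:
    def __init__(self, max_entries: int = 1000):
        self.max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get_or_compress(self, original: str, compress) -> str:
        key = content_key(original)
        if key in self._entries:
            self._entries.move_to_end(key)       # mark as recently used
            return self._entries[key]
        compressed = compress(original)          # run the pipeline only on a miss
        self._entries[key] = compressed
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)    # evict the least recently used entry
        return compressed
```

Identical content always hashes to the same key, so the second occurrence of a tool result is a pure dictionary hit.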
KV cache warming
Squeezr uses deterministic MD5-based content IDs to keep compressed content prefix-stable across requests. This means the LLM's KV (key-value) cache — which stores attention computation results — can be reused across turns when the compressed prefix is identical, reducing compute at the API level.
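Prefix stability follows from the ID being a pure function of the content: the same bytes always produce the same compressed block, so the prompt prefix is byte-identical across turns. A minimal sketch (the `sq-` prefix and 8-character truncation are assumptions for illustration):

```python
import hashlib

def block_id(content: str) -> str:
    # Same content -> same ID on every request. Because nothing random or
    # time-dependent enters the ID, the compressed block is byte-identical
    # across turns and the provider's KV cache can be reused.
    digest = hashlib.md5(content.encode("utf-8")).hexdigest()
    return f"sq-{digest[:8]}"
```

A random or per-request ID (e.g. a UUID) would break this: every turn would produce a different prefix and force the attention computation to be redone.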
Expand store
When content is compressed, the full original is stored in an in-memory expand store keyed by a short ID. When the model calls squeezr_expand(id), the original is retrieved in under 5ms without any API call.
- Scoped to the current proxy session.
- Cleared on proxy restart.
- No explicit size limit; grows with the number of compressed blocks in the session.
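The store can be modeled as a plain in-memory dictionary. This is a hypothetical sketch (the 8-character content-hash ID is an assumption); `squeezr_expand` above would resolve to the `expand` lookup:

```python
import hashlib

class ExpandStore:
    """Session-scoped and in-memory: cleared when the proxy restarts."""

    def __init__(self):
        self._originals: dict = {}

    def put(self, original: str) -> str:
        # Store the full original under a short content-derived ID.
        short_id = hashlib.md5(original.encode("utf-8")).hexdigest()[:8]
        self._originals[short_id] = original
        return short_id

    def expand(self, short_id: str) -> str:
        # Pure dictionary lookup, no API round trip -- this is why
        # retrieval stays in the low-millisecond range.
        return self._originals[short_id]
```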
Session cache summarization
After approximately 50 tool results in a session, older results are batch-summarized into a single compact block. This prevents the expand store from growing unbounded in very long sessions while keeping the most recent results fully accessible.
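The trigger logic might look like the following. The threshold of 50 comes from the text; the size of the fully retained tail and the `summarize_batch` function are assumptions:

```python
SUMMARIZE_AFTER = 50   # approximate threshold from the docs
KEEP_RECENT = 10       # assumption: how many recent results stay fully accessible

def maybe_summarize(results: list, summarize_batch) -> list:
    # Once the session accumulates enough tool results, fold the older
    # ones into a single compact block and keep the recent tail intact.
    if len(results) <= SUMMARIZE_AFTER:
        return results
    older, recent = results[:-KEEP_RECENT], results[-KEEP_RECENT:]
    return [summarize_batch(older)] + recent
```

This bounds the store at roughly one summary block plus the recent tail, however long the session runs.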
Cache accuracy
The cache is keyed on content hashes, so it will never serve stale compressed content. If a file changes between reads, the hash changes and the pipeline runs fresh. This makes the cache safe for use in active development where files change frequently.
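The staleness guarantee is a direct consequence of content hashing, which a two-line check makes concrete: any edit to the file changes the key, so the old cached entry simply can never be looked up again.

```python
import hashlib

def key(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

before = key("def main(): pass\n")
after = key("def main(): ...\n")   # the file was edited between reads
assert before != after             # new key -> cache miss -> pipeline reruns
```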
Checking cache stats
curl http://localhost:8080/squeezr/stats