Squeezr — AI Context Compression

Squeezr uses an in-process cache to avoid re-compressing content it has already seen, plus a session-level expand store that holds original content for lossless retrieval.

Compression cache

Compressed results are stored in an LRU cache keyed by a deterministic MD5 hash of the original content. When the same tool result appears again, whether in the same session or in a later session served by the same proxy process, the cached compressed version is returned without re-running the pipeline.

[cache]
enabled = true
max_entries = 1000

The cache survives as long as the proxy process is running. It resets on squeezr stop / squeezr start.
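The lookup path described above can be sketched in a few lines of Python. This is an illustrative model only, not Squeezr's actual implementation: the class name CompressionCache, the get_or_compress method, and the hit/miss counters are assumptions made for the example. Only the MD5 keying, the LRU eviction, and the max_entries limit come from the text.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class CompressionCache:
    """Illustrative LRU cache keyed by the MD5 hex digest of the original content."""

    def __init__(self, max_entries: int = 1000) -> None:
        self.max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0      # hypothetical counters, not a documented stats schema
        self.misses = 0

    @staticmethod
    def key_for(content: str) -> str:
        # Same content always produces the same key.
        return hashlib.md5(content.encode("utf-8")).hexdigest()

    def get_or_compress(self, content: str, compress: Callable[[str], str]) -> str:
        key = self.key_for(content)
        if key in self._entries:
            # Cache hit: mark as most recently used and skip the pipeline.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Cache miss: run the (caller-supplied) pipeline and remember the result.
        self.misses += 1
        result = compress(content)
        self._entries[key] = result
        if len(self._entries) > self.max_entries:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return result
```

Because the store is an ordinary in-memory object, it disappears with the process, which matches the reset-on-restart behavior described above.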

KV cache warming

Squeezr uses deterministic MD5-based content IDs to keep compressed content prefix-stable across requests. Because identical content always yields the same ID, the compressed prefix stays identical from one turn to the next, so the LLM's KV (key-value) cache, which stores attention computation results, can be reused across turns, reducing compute at the API level.
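To see why deterministic IDs matter for prefix stability, consider the following sketch. The placeholder format, the 8-character ID length, and the function names are assumptions made for illustration; the document does not specify how Squeezr renders compressed blocks.

```python
import hashlib


def content_id(original: str, length: int = 8) -> str:
    """Derive a short, deterministic ID from the MD5 digest of the original content."""
    return hashlib.md5(original.encode("utf-8")).hexdigest()[:length]


def render_compressed(original: str, summary: str) -> str:
    # Hypothetical placeholder format, chosen only for this example.
    return f"[squeezr:{content_id(original)}] {summary}"


tool_output = "FAILED test_auth.py::test_login\n" * 200

# Turn 1 and turn 2 compress the same tool output independently.
turn_1 = render_compressed(tool_output, "200 failures in test_auth.py")
turn_2 = render_compressed(tool_output, "200 failures in test_auth.py") + "\nuser: fix the first failure"

# The earlier request is an exact prefix of the later one, so a provider-side
# KV cache keyed on the prefix could be reused.
print(turn_2.startswith(turn_1))
```

Had the ID been random or time-based, the two renderings would diverge at the ID and the shared prefix would end there.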

Expand store

When content is compressed, the full original is stored in an in-memory expand store keyed by a short ID. When the model calls squeezr_expand(id), the original is retrieved in under 5ms without any API call.

The expand store is scoped to the current proxy session and is cleared on proxy restart. It has no explicit size limit; it grows with the number of compressed blocks in the session.
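A minimal model of the expand store is a dictionary from short ID to original content. In the sketch below, the ExpandStore class, its put and expand methods, and the choice to derive the ID from an MD5 prefix are assumptions; the document only states that the store is in memory, keyed by a short ID, and unbounded.

```python
import hashlib
from typing import Dict, Optional


class ExpandStore:
    """Illustrative in-memory store mapping short IDs to original content."""

    def __init__(self) -> None:
        # Plain dict: no size limit, contents are lost when the process exits.
        self._originals: Dict[str, str] = {}

    def put(self, original: str) -> str:
        # Assumed ID scheme: first 8 hex characters of the MD5 digest.
        short_id = hashlib.md5(original.encode("utf-8")).hexdigest()[:8]
        self._originals[short_id] = original
        return short_id

    def expand(self, short_id: str) -> Optional[str]:
        # Local dictionary lookup; no network or API call is involved.
        return self._originals.get(short_id)
```

A handler for the model's squeezr_expand(id) tool call would then amount to calling expand(id) and returning the result, or an error when the ID is unknown (for example after a proxy restart).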

Stale turn summarization

In long sessions, old assistant/user turns are replaced with compact placeholders to prevent the context window from filling with irrelevant history. The last N turns are always kept at full fidelity. The behavior is configurable via squeezr.toml:

[compression]
stale_turns = true            # default: true
stale_turn_threshold = 50     # activate after N user turns (default: 50)
stale_turn_keep_recent = 20   # always keep last N turns intact (default: 20)

Summarization applies only to text blocks. tool_use and tool_result blocks are never collapsed, because they carry linked IDs that must remain paired. Summarization also runs only when the session has no Anthropic cache markers, so content behind a cache barrier is never rewritten.
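The rules above can be sketched as a single pass over an Anthropic-style message list. This is a simplified model under stated assumptions: each message counts as one turn for the keep-recent window, the feature activates once the number of user turns exceeds the threshold, and the placeholder text is invented for the example. The function names are hypothetical.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def has_cache_markers(messages: List[Message]) -> bool:
    """True if any content block carries an Anthropic cache_control marker."""
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list) and any("cache_control" in block for block in content):
            return True
    return False


def collapse_stale_turns(
    messages: List[Message],
    threshold: int = 50,
    keep_recent: int = 20,
    placeholder: str = "[earlier turn omitted]",  # hypothetical placeholder text
) -> List[Message]:
    user_turns = sum(1 for m in messages if m["role"] == "user")
    # Do nothing for short sessions or when cache markers are present.
    if user_turns <= threshold or has_cache_markers(messages):
        return messages

    cutoff = len(messages) - keep_recent
    result: List[Message] = []
    for index, msg in enumerate(messages):
        if index >= cutoff:
            result.append(msg)  # recent turns stay at full fidelity
            continue
        content = msg["content"]
        if isinstance(content, str):
            result.append({**msg, "content": placeholder})
        else:
            # Collapse text blocks only; tool_use / tool_result pass through untouched.
            blocks = [
                {**block, "text": placeholder} if block.get("type") == "text" else block
                for block in content
            ]
            result.append({**msg, "content": blocks})
    return result
```

The function returns a new list and never mutates its input, which keeps the tool_use and tool_result pairing intact by construction.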

Cross-session glossary store

Squeezr passively scans messages for long repeated paths and identifiers (≥30 chars, ≥20 occurrences per request) and maps them to short refs ($P1, $P2, …). The mapping is persisted to ~/.squeezr/glossary.json and reloaded automatically on proxy restart.

The glossary currently runs in passive mode: message bodies are never mutated, and the mapping is prep work for a future active-substitution release. It grows automatically across sessions and proxy restarts.
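As a rough illustration of the passive scan, the sketch below counts long path-like tokens in a request and assigns refs to those that repeat often enough. The token pattern, the function names, and the flat JSON layout are assumptions; the actual schema of glossary.json is not described here.

```python
import json
import re
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable

# Hypothetical token pattern: runs of path/identifier characters, 30 or more long.
TOKEN_RE = re.compile(r"[A-Za-z0-9_./\\:-]{30,}")


def update_glossary(
    texts: Iterable[str], glossary: Dict[str, str], min_count: int = 20
) -> Dict[str, str]:
    """Return a copy of the glossary extended with tokens seen >= min_count times."""
    counts = Counter(token for text in texts for token in TOKEN_RE.findall(text))
    updated = dict(glossary)
    for token, count in sorted(counts.items()):
        if count >= min_count and token not in updated:
            updated[token] = f"$P{len(updated) + 1}"
    return updated


def load_glossary(path: Path) -> Dict[str, str]:
    # An absent file means an empty glossary (first run).
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_glossary(path: Path, glossary: Dict[str, str]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(glossary, indent=2), encoding="utf-8")
```

Note that update_glossary only reads the message texts and returns a mapping; nothing is substituted into the messages, which mirrors the passive mode described above.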

Cache accuracy

The cache is keyed on content hashes, so it will never serve stale compressed content. If a file changes between reads, the hash changes and the pipeline runs fresh. This makes the cache safe for use in active development where files change frequently.

Checking cache stats

curl http://localhost:8080/squeezr/stats