Claude Code

Claude Code is the primary target for Squeezr. Because it sends large tool results (file reads, shell output, search results) in every turn, it benefits the most from compression — typically saving 50–70% of input tokens per session.

Setup

Run squeezr setup once. It automatically sets ANTHROPIC_BASE_URL=http://localhost:8080 in your environment. No manual configuration needed.

Then start the proxy:

squeezr setup   # one-time
squeezr start

Claude Code reads ANTHROPIC_BASE_URL automatically and routes all API calls through the proxy. Your existing ANTHROPIC_API_KEY or OAuth token continues to work — Squeezr forwards it to the Anthropic API transparently.

OAuth support

Claude Max, Team, and Enterprise plans authenticate via OAuth tokens instead of API keys. Squeezr detects and forwards OAuth tokens transparently. No additional configuration is needed.
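A minimal sketch of what "forwards transparently" means at the header level. The header names here are assumptions for illustration: `x-api-key` is Anthropic's API-key header, and OAuth clients are assumed to send `Authorization: Bearer …`; the proxy simply copies whichever credential is present, without inspecting it.

```python
def forward_auth_headers(incoming: dict) -> dict:
    """Copy the client's credential header through unchanged.

    Claude Code sends either `x-api-key` (API keys) or an
    `Authorization: Bearer ...` header (OAuth plans); the proxy
    forwards whichever one is present and never stores it.
    """
    out = {}
    for name in ("x-api-key", "authorization"):
        if name in incoming:
            out[name] = incoming[name]
    return out
```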

How it works with Claude Code

Layer 1: System prompt compression

Claude Code's system prompt is ~13KB and is sent with every request. Squeezr compresses it once using a cheap AI model (Haiku) and caches the result. Every subsequent request reuses the cached version, saving ~3,000 tokens per request.
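The compress-once-then-cache behavior can be sketched as a content-addressed cache. `compress_fn` is a hypothetical stand-in for the Haiku call; keying on a hash of the prompt means the expensive call happens once per unique system prompt, and every later request is a dictionary lookup.

```python
import hashlib

class SystemPromptCache:
    """Compress a system prompt once and reuse the cached result.

    `compress_fn` is a placeholder for the cheap-model (Haiku) call;
    it is invoked only on a cache miss.
    """

    def __init__(self, compress_fn):
        self._compress = compress_fn
        self._cache = {}  # sha256(prompt) -> compressed prompt

    def get(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._compress(prompt)  # paid once per unique prompt
        return self._cache[key]
```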

Layer 2: Deterministic preprocessing

Zero-latency rule-based transforms applied to every tool result before anything else:

  • ANSI escape codes and progress bars stripped
  • Duplicate stack frames and repeated lines deduplicated
  • JSON whitespace collapsed
  • Blank lines consolidated
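The transforms above are pure string rewrites, which is why they add no latency. A sketch of three of them (ANSI stripping, consecutive-duplicate removal, blank-line consolidation); the exact rules Squeezr applies may differ:

```python
import re

ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # CSI escape sequences

def preprocess(text: str) -> str:
    """Zero-latency, rule-based cleanup applied before any other layer."""
    text = ANSI_RE.sub("", text)                 # strip ANSI escape codes
    out, prev = [], None
    for line in text.split("\n"):
        if line == prev and line.strip():        # drop consecutive duplicate lines
            continue
        out.append(line)
        prev = line
    text = "\n".join(out)
    return re.sub(r"\n{3,}", "\n\n", text)       # consolidate runs of blank lines
```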

Layer 3: Tool-specific patterns

Each tool result is matched against 30+ specialized patterns. Errors and actionable information are always preserved:

Tool       Compression strategy                             Typical savings
Read       Cross-turn dedup, large file → signatures only   40–90%
Bash       Pattern matching (git, test, build)              50–80%
Grep       Grouped by file, matches capped                  30–60%
Glob       Directory tree summary (>30 files)               30–50%
WebFetch   AI summarization                                 60–80%
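As an illustration of one row of the table, here is a sketch of the Grep strategy: group `path:line:text` matches by file and cap the matches shown per file. The `cap` parameter and output format are assumptions, not Squeezr's actual output:

```python
from collections import defaultdict

def compress_grep(result: str, cap: int = 3) -> str:
    """Group grep-style "path:line:text" matches by file, capping per file."""
    groups = defaultdict(list)
    for line in result.splitlines():
        path, _, rest = line.partition(":")
        groups[path].append(rest)
    out = []
    for path, matches in groups.items():
        out.append(path)
        out.extend(f"  {m}" for m in matches[:cap])
        if len(matches) > cap:
            out.append(f"  ({len(matches) - cap} more matches omitted)")
    return "\n".join(out)
```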

Cross-turn file read deduplication

When Claude Code reads the same file multiple times in a conversation, Squeezr detects the duplication and replaces repeated reads with a compact reference pointer. This alone can save thousands of tokens in long sessions.
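A sketch of the mechanism, assuming dedup keys on a hash of the file content: the first read passes through untouched, and any later read of identical content is replaced by a pointer back to the first one. The pointer format here is illustrative only.

```python
import hashlib

class ReadDeduper:
    """Replace repeated reads of identical file content with a compact pointer."""

    def __init__(self):
        self._seen = {}  # content hash -> (path, turn) of the first read

    def process(self, path: str, content: str, turn: int) -> str:
        key = hashlib.sha256(content.encode()).hexdigest()
        if key in self._seen:
            first_path, first_turn = self._seen[key]
            return f"[file unchanged since turn {first_turn}: {first_path}]"
        self._seen[key] = (path, turn)
        return content  # first read passes through in full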

Adaptive pressure

Compression aggressiveness scales automatically with context window usage:

Context usage   Threshold     Behavior
< 50%           1,500 chars   Light: only compress large results
50–75%          800 chars     Normal: standard compression
75–90%          400 chars     Aggressive: compress most results
> 90%           150 chars     Critical: compress everything, 0 git diff context
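The thresholds in the table can be read as a simple step function from context usage to the minimum result size (in characters) that triggers compression:

```python
def compression_threshold(context_usage: float) -> int:
    """Map context-window usage (0.0-1.0) to the size threshold, in chars,
    above which a tool result is compressed."""
    if context_usage < 0.50:
        return 1500  # light: only compress large results
    if context_usage < 0.75:
        return 800   # normal
    if context_usage < 0.90:
        return 400   # aggressive
    return 150       # critical: compress nearly everything
```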

Per-command skip

Add # squeezr:skip anywhere in a Bash command to bypass compression for that specific result:

cat important-file.txt  # squeezr:skip

Configuration tips

  • Skip specific tools:
    [compression]
    skip_tools = ["Read"]
  • Only compress specific tools:
    [compression]
    only_tools = ["Bash"]
  • Raise the threshold to compress less (fewer false positives):
    [compression]
    threshold = 1500

Verifying it works

squeezr status

After a few interactions, you should see a positive savings percentage. Check the troubleshooting guide if you see 502 errors or the proxy is not intercepting requests.