Works with your tools
Auto-detects API format from request headers. Zero per-tool config.
See the difference
Real compression results from actual coding sessions. Every byte counts.
7-Layer Pipeline
Each request passes through seven independent stages. Each layer catches what the previous one missed.
01System Prompt
~13KB → 600 tokens
02Read Dedup
Collapse duplicate reads
03Noise Strip
ANSI, progress bars, spinners
04Tool Patterns
30+ specific compressors
05Line Dedup
Repeated lines & stacks
06AI Compress
Haiku / GPT-mini / Flash
07Session Cache
KV cache warming
Everything you need
30+ Patterns
Git diffs, test runners, build tools, Docker, Terraform, package managers — each has a dedicated compressor that knows exactly what to keep.
AI Fallback
When no pattern matches, Haiku, GPT-4o-mini, or Gemini Flash compress to under 150 tokens. The best model wins.
File Dedup
Read the same file 5 times? Only the latest stays full. Earlier reads become lightweight references.
Session Cache
Identical compressed strings reuse API provider KV cache — up to 90% cost reduction on cache hits.
Expand Tool
The AI can call squeezr_expand() to retrieve any original content. Nothing is permanently lost.
Zero Config
One install, one command, works immediately. Optional TOML config for fine-grained control.
See the compression
Before and after from real coding sessions. Click to toggle.
Three steps. Thirty seconds.
From install to savings in under a minute. No configuration required.
Install & Setup
One npm install, one setup command. Auto-detects your OS, configures env vars, and starts the daemon.
Proxy Intercepts
Your AI tool sends requests through localhost. Squeezr intercepts transparently — no code changes needed.
Savings Begin
Compressed requests go to the API. Your AI gets all essential info with a fraction of the tokens.
Estimate your savings
See how much you could save based on your usage.
Ready to compress?
Three commands. Thirty seconds. That's it.