Meet Zest
An 800M parameter model trained to compress coding tool outputs — bash, git, tests, file reads. Runs on your machine via Ollama. No APIs. No cost. No added latency.
Deterministic compression (always on, free) saves 15–20% of your full context. Zest adds 52–72% savings on large tool outputs on top of that.
Works with your tools
Auto-detects API format from request headers. Zero per-tool config.
See the difference
Real compression results from actual coding sessions. Every byte counts.
7-Layer Pipeline
Each request passes through seven independent stages. Each layer catches what the previous one missed.
01System Prompt
~13KB → 600 tokens
02Read Dedup
Collapse duplicate reads
03Noise Strip
ANSI, progress bars, spinners
04Tool Patterns
30+ specific compressors
05Line Dedup
Repeated lines & stacks
06AI Compress
Haiku / GPT-mini / Flash
07Session Cache
KV cache warming
Everything you need
30+ Patterns
Git diffs, test runners, build tools, Docker, Terraform, package managers — each has a dedicated compressor that knows exactly what to keep.
AI Compression (opt-in)
Blocks over 1500 chars summarized by Zest (Squeezr’s own local model), Haiku, GPT-4o-mini or Gemini Flash. Off by default, rate-limited, cache-safe.
File Dedup
Read the same file 5 times? Only the latest stays full. Earlier reads become lightweight references.
Prompt-Cache Safe
Compression is byte-stable: the cached prefix never changes, so Anthropic’s 10x-cheaper cache reads keep hitting. Live Hit Health metric included.
Expand Tool
The AI can call squeezr_expand() to retrieve any original content. Nothing is permanently lost.
Zero Config
One install, one command, works immediately. Optional TOML config for fine-grained control.
See the compression
Before and after from real coding sessions. Click to toggle.
Three steps. Thirty seconds.
From install to savings in under a minute. No configuration required.
Install & Setup
One npm install, one setup command. Auto-detects your OS, configures env vars, and starts the daemon.
Proxy Intercepts
Your AI tool sends requests through localhost. Squeezr intercepts transparently — no code changes needed.
Savings Begin
Compressed requests go to the API. Your AI gets all essential info with a fraction of the tokens.
Estimate your savings
See how much you could save based on your usage.
Ready to compress?
Three commands. Thirty seconds. That's it.