New — Local AI compression

Meet Zest

An 800M-parameter model trained to compress coding tool outputs: bash, git, tests, file reads. Runs on your machine via Ollama. No APIs. No cost. No added latency.

Model size: 505 MB
Cost per compression: €0
Response time: ~1s
Savings on large inputs: 52–72%

Deterministic compression (always on, free) saves 15–20% of your full context. Zest adds 52–72% savings on large tool outputs on top of that.

One command
$ squeezr zest
and done
1. Install Ollama
Squeezr detects if Ollama is already installed. If not, it installs it automatically.
2. Download Zest
The 505 MB Q4_K_M model is downloaded from HuggingFace and registered in Ollama.
3. Smoke test
Squeezr compresses a real sample and shows you the result before committing to anything.
4. Done: zero config
Squeezr rewires itself to use Zest. From now on every block is compressed locally, for free.
New: Squeezr is now available for Claude Desktop
Compatibility

Works with your tools

Auto-detects API format from request headers. Zero per-tool config.

Claude Code: Anthropic Messages API
Claude Desktop: MCP Server
OpenAI Codex: Chat Completions API
Aider: OpenAI-compatible
Gemini CLI: Google AI API
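
Header-based detection could look roughly like the sketch below. It relies on headers these APIs are known to use (`anthropic-version` for Anthropic, `x-goog-api-key` for Google, a Bearer token for OpenAI-style clients). The function name `detectApiFormat` and the check order are assumptions for illustration, not Squeezr's actual logic.

```typescript
type ApiFormat = "anthropic" | "openai" | "google" | "unknown";

// Guess the upstream API from request headers (keys assumed to be lower-cased).
function detectApiFormat(headers: Record<string, string>): ApiFormat {
  // Anthropic's Messages API expects an `anthropic-version` header.
  if ("anthropic-version" in headers) return "anthropic";
  // Google's API accepts the key in `x-goog-api-key`.
  if ("x-goog-api-key" in headers) return "google";
  // OpenAI and OpenAI-compatible clients send a Bearer token.
  const auth = headers["authorization"] || "";
  if (auth.indexOf("Bearer ") === 0) return "openai";
  return "unknown";
}
```

Checking the Anthropic header first matters because some Anthropic clients also send a Bearer token.
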
Compression Gains

See the difference

Real compression results from actual coding sessions. Every byte counts.

7 Layers
30+ Patterns
$$$ Saved
Test Output (vitest · 188 tests): 2,340 chars → 198 chars (-92%)
File Read (server.ts · 3200 lines): 3,200 chars → 84 chars (-97%)
Git Diff (feature branch · 47 files): 1,800 chars → 320 chars (-82%)
System Prompt (Claude Code · 13KB): 13,000 chars → 600 chars (-95%)
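
The percentages above follow from the before and after character counts. A small sketch of that arithmetic, assuming rounding to the nearest whole percent (the helper name `percentSaved` is illustrative):

```typescript
// Share of characters removed, rounded to the nearest whole percent.
function percentSaved(beforeChars: number, afterChars: number): number {
  return Math.round((1 - afterChars / beforeChars) * 100);
}

console.log(percentSaved(2340, 198));  // 92 (test output)
console.log(percentSaved(3200, 84));   // 97 (file read)
console.log(percentSaved(1800, 320));  // 82 (git diff)
console.log(percentSaved(13000, 600)); // 95 (system prompt)
```
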
Architecture

7-Layer Pipeline

Each request passes through seven independent stages. Each layer catches what the previous one missed.

01 System Prompt: ~13KB → 600 tokens (95%)
02 Read Dedup: Collapse duplicate reads (80%)
03 Noise Strip: ANSI, progress bars, spinners (30%)
04 Tool Patterns: 30+ specific compressors (60%)
05 Line Dedup: Repeated lines & stacks (25%)
06 AI Compress: Haiku / GPT-mini / Flash (85%)
07 Session Cache: KV cache warming (90%)
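
To make the layering concrete, here is a minimal sketch of two of the stages (noise strip and line dedup) chained in order. The names `stripAnsi`, `dedupLines`, and `runPipeline` are hypothetical, and the real layers are likely more involved. It only illustrates how each stage works on the previous stage's output.

```typescript
// A stage takes text and returns a (hopefully shorter) version of it.
type Stage = (text: string) => string;

// Noise strip: remove ANSI color and cursor escape sequences.
const stripAnsi: Stage = (text) => text.replace(/\x1b\[[0-9;]*[A-Za-z]/g, "");

// Line dedup: collapse runs of identical consecutive lines into one line with a count.
const dedupLines: Stage = (text) => {
  const out: string[] = [];
  let prev: string | null = null;
  let run = 0;
  const flush = () => {
    if (prev !== null) out.push(run > 1 ? prev + " (x" + run + ")" : prev);
  };
  for (const line of text.split("\n")) {
    if (line === prev) {
      run += 1;
    } else {
      flush();
      prev = line;
      run = 1;
    }
  }
  flush();
  return out.join("\n");
};

// Run every stage in order; each one sees the previous stage's output.
function runPipeline(text: string, stages: Stage[]): string {
  return stages.reduce((acc, stage) => stage(acc), text);
}

// Yields "ok\nretry (x3)\ndone"
console.log(runPipeline("\x1b[32mok\x1b[0m\nretry\nretry\nretry\ndone", [stripAnsi, dedupLines]));
```
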
Features

Everything you need

Deterministic

30+ Patterns

Git diffs, test runners, build tools, Docker, Terraform, package managers — each has a dedicated compressor that knows exactly what to keep.

PASS src/config.test.ts (12 tests)
PASS src/cache.test.ts (8 tests)
FAIL src/server.test.ts (2 failed)
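
A dedicated compressor for test output like the sample above might fold passing files into a single summary line and keep everything else verbatim. This is a sketch under that assumption; `compressTestOutput` and the summary wording are illustrative, not one of Squeezr's actual patterns.

```typescript
// Fold "PASS <file> (N tests)" lines into one summary; keep all other lines as-is.
function compressTestOutput(output: string): string {
  const kept: string[] = [];
  let passedFiles = 0;
  let passedTests = 0;
  for (const line of output.split("\n")) {
    const pass = /^PASS\s+\S+\s+\((\d+) tests?\)/.exec(line);
    if (pass) {
      passedFiles += 1;
      passedTests += parseInt(pass[1] || "0", 10);
    } else {
      kept.push(line); // failures and unknown lines survive untouched
    }
  }
  if (passedFiles === 0) return kept.join("\n");
  const summary = "PASS " + passedFiles + " files (" + passedTests + " tests)";
  return [summary].concat(kept).join("\n");
}
```

Applied to the three sample lines, this returns a two-line result: `PASS 2 files (20 tests)` followed by the unchanged `FAIL` line.
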
Zest

AI Compression (opt-in)

Blocks over 1,500 chars are summarized by Zest (Squeezr’s own local model), Haiku, GPT-4o-mini, or Gemini Flash. Off by default, rate-limited, cache-safe.

Haiku: 120ms
GPT-mini: 95ms
Flash: 80ms
Dedup

File Dedup

Read the same file 5 times? Only the latest stays full. Earlier reads become lightweight references.
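
One way to picture this: walk the list of reads, remember the last index for each path, and replace every earlier read with a short stub. The `FileRead` shape, the `dedupReads` name, and the stub wording are assumptions made for this sketch.

```typescript
interface FileRead {
  path: string;
  content: string;
}

// Keep only the most recent read of each path in full;
// earlier reads become short reference stubs.
function dedupReads(reads: FileRead[]): FileRead[] {
  const lastIndex: Record<string, number> = Object.create(null);
  reads.forEach((read, i) => {
    lastIndex[read.path] = i;
  });
  return reads.map((read, i) =>
    lastIndex[read.path] === i
      ? read
      : { path: read.path, content: "[see latest read of " + read.path + "]" }
  );
}
```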

Cache-safe

Prompt-Cache Safe

Compression is byte-stable: the cached prefix never changes, so Anthropic’s 10x-cheaper cache reads keep hitting. Live Hit Health metric included.
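
Byte stability can be approximated by memoizing: once a block has been compressed, later requests reuse exactly the same output instead of recompressing it. The sketch below shows that idea only. `makeStableCompressor` is a hypothetical helper, and how Squeezr actually achieves stability is not described here.

```typescript
// Wrap any compressor so a given block always maps to the same output bytes,
// even if the underlying compressor is not deterministic.
function makeStableCompressor(compress: (block: string) => string): (block: string) => string {
  const seen: Record<string, string> = Object.create(null);
  return (block) => {
    const cached = seen[block];
    if (cached !== undefined) return cached; // reuse the earlier bytes
    const fresh = compress(block);
    seen[block] = fresh;
    return fresh;
  };
}
```

Because earlier messages are re-sent unchanged, the prefix the provider has cached keeps matching.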

Lossless

Expand Tool

The AI can call squeezr_expand() to retrieve any original content. Nothing is permanently lost.
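
A rough sketch of how expansion could work: originals are stored under a short id, and the compressed placeholder carries that id so a tool call can fetch the full text later. The `ExpandStore` class, the id scheme, and the placeholder format are all assumptions, not the actual `squeezr_expand()` implementation.

```typescript
// In-memory store mapping a short id to the original, uncompressed content.
class ExpandStore {
  private originals: Record<string, string> = Object.create(null);
  private nextId = 1;

  // Save the original and return a placeholder that carries its id.
  stash(original: string, summary: string): string {
    const id = "blk" + this.nextId;
    this.nextId += 1;
    this.originals[id] = original;
    return summary + " [expand id=" + id + "]";
  }

  // Look up the original content for an id; undefined if the id is unknown.
  expand(id: string): string | undefined {
    return this.originals[id];
  }
}
```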

Simple

Zero Config

One install, one command, works immediately. Optional TOML config for fine-grained control.

Real Examples

See the compression

Before and after from real coding sessions.

vitest · 188 tests
✓ config (12) cache (8) expand (15)
✓ compressor (24) deterministic (89)
✗ server.test.ts (40 | 2 failed)
FAIL streaming — expected 500 to be 200
FAIL health — Cannot read undefined
1 failed | 5 passed · 2 failed | 186 passed
2,340 chars → 198 chars
How it works

Three steps. Thirty seconds.

From install to savings in under a minute. No configuration required.

01 Install & Setup

One npm install, one setup command. Auto-detects your OS, configures env vars, and starts the daemon.

terminal
$ npm i -g squeezr-ai
$ squeezr setup
✓ Done
02 Proxy Intercepts

Your AI tool sends requests through localhost. Squeezr intercepts transparently — no code changes needed.

proxy
→ POST /v1/messages
12,847 tokens input
Compressing...
03 Savings Begin

Compressed requests go to the API. Your AI gets all essential info with a fraction of the tokens.

stats
✓ 42 requests processed
✓ 34,291 tokens saved
✓ 78% average compression
Calculator

Estimate your savings

See how much you could save based on your usage.

Tokens saved / session: 192,000
Tokens saved / month: 12.7M (~3 sessions/day × 22 days)
Cost saved / month: $38.02 (based on Claude Sonnet input pricing)
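
The monthly figures can be reproduced from the per-session number. The sketch assumes a Sonnet input price of $3 per million tokens, which is not stated on the page but matches the $38.02 result. `monthlySavings` is an illustrative name.

```typescript
// Assumed price: USD per million input tokens for Claude Sonnet.
const USD_PER_MILLION_INPUT = 3;

function monthlySavings(tokensPerSession: number, sessionsPerDay: number, workDays: number) {
  const tokens = tokensPerSession * sessionsPerDay * workDays;
  const usd = (tokens / 1e6) * USD_PER_MILLION_INPUT;
  // Round to cents for display.
  return { tokens, usd: Math.round(usd * 100) / 100 };
}

// 192,000 × 3 × 22 = 12,672,000 tokens (about 12.7M), worth $38.02
console.log(monthlySavings(192000, 3, 22));
```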

Ready to compress?

Three commands. Thirty seconds. That's it.

MIT Licensed · Zero Config · < 30s Setup