Squeezr — AI Context Compression

New — Local AI compression

Meet Zest

An 800M-parameter model trained to compress coding tool outputs: bash, git, tests, file reads. Runs on your machine via Ollama. No APIs. No cost. No added latency.

505 MB

Model size

0€

Cost per compression

~1s

Response time

52–72%

Savings on large inputs

Deterministic compression (always on, free) saves 15–20% of your full context. Zest adds 52–72% savings on large tool outputs on top of that.

One command

squeezr zest and done

Install Ollama

Squeezr detects if Ollama is already installed. If not, it installs it automatically.

Download Zest

The 505 MB Q4_K_M model is downloaded from HuggingFace and registered in Ollama.

Smoke test

Squeezr compresses a real sample and shows you the result before committing to anything.

Done — zero config

Squeezr rewires itself to use Zest. From now on every block is compressed locally, for free.

Read Zest docs · HuggingFace

New: Squeezr is now available for Claude Desktop

View documentation

Compatibility

Works with your tools

Auto-detects API format from request headers. Zero per-tool config.

Claude Code: Anthropic Messages API

Claude Desktop: MCP Server

OpenAI Codex: Chat Completions API

Aider: OpenAI-compatible

Gemini CLI: Google AI API
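How the detection works internally is not shown on this page. As a minimal sketch of the idea, assuming the proxy looks at well-known provider headers (`anthropic-version` and `x-api-key` for Anthropic, `x-goog-api-key` for Google, a bearer `Authorization` header for OpenAI-compatible clients), a hypothetical `detect_api_format` helper could look like this:

```python
from typing import Mapping


def detect_api_format(headers: Mapping[str, str]) -> str:
    """Guess the API family from request headers.

    Hypothetical helper for illustration; not Squeezr's actual logic.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    if "anthropic-version" in lowered or "x-api-key" in lowered:
        return "anthropic"
    if "x-goog-api-key" in lowered:
        return "google"
    if lowered.get("authorization", "").lower().startswith("bearer "):
        return "openai"
    return "unknown"
```

A request that matches none of these rules returns "unknown", which a proxy could forward untouched.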

Compression Gains

See the difference

Real compression results from actual coding sessions. Every byte counts.

0% avg. compressed

Layers

30+

Patterns

$$$

Saved

Test Output (vitest · 188 tests): 2,340 chars → 198 chars (-92%)

File Read (server.ts · 3200 lines): 3,200 chars → 84 chars (-97%)

Git Diff (feature branch · 47 files): 1,800 chars → 320 chars (-82%)

System Prompt (Claude Code · 13KB): 13,000 chars → 600 chars (-95%)
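The percentages above follow directly from the before and after character counts. A short sketch of that arithmetic, using the figures from this page (the function name `reduction_percent` is illustrative):

```python
def reduction_percent(before_chars: int, after_chars: int) -> int:
    """Return the size reduction as a whole-number percentage."""
    if before_chars <= 0:
        raise ValueError("before_chars must be positive")
    return round(100 * (before_chars - after_chars) / before_chars)


# Before/after character counts as listed on this page.
samples = {
    "Test Output": (2340, 198),
    "File Read": (3200, 84),
    "Git Diff": (1800, 320),
    "System Prompt": (13000, 600),
}

for name, (before, after) in samples.items():
    print(f"{name}: -{reduction_percent(before, after)}%")
```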

Architecture

7-Layer Pipeline

Each request passes through seven independent stages. Each layer catches what the previous one missed.

01 System Prompt: ~13KB → 600 tokens (95%)

02 Read Dedup: Collapse duplicate reads (80%)

03 Noise Strip: ANSI, progress bars, spinners (30%)

04 Tool Patterns: 30+ specific compressors (60%)

05 Line Dedup: Repeated lines & stacks (25%)

06 AI Compress: Haiku / GPT-mini / Flash (85%)

07 Session Cache: KV cache warming (90%)
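To make the layered design concrete, here is a minimal sketch of two of the deterministic stages (noise strip and line dedup) chained as plain functions. The function names, the ANSI regex, and the repeat marker text are assumptions for illustration, not Squeezr's implementation:

```python
import re
from typing import Callable, List

# Matches common ANSI CSI sequences such as colour codes (e.g. "\x1b[32m").
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def strip_noise(text: str) -> str:
    """Remove ANSI escape sequences from tool output."""
    return ANSI_ESCAPE.sub("", text)


def dedup_lines(text: str) -> str:
    """Collapse runs of identical consecutive lines into one line plus a marker."""
    out: List[str] = []
    previous = None
    repeats = 0
    for line in text.split("\n"):
        if line == previous:
            repeats += 1
            continue
        if repeats:
            out.append(f"[previous line repeated {repeats}x]")
        out.append(line)
        previous, repeats = line, 0
    if repeats:
        out.append(f"[previous line repeated {repeats}x]")
    return "\n".join(out)


def run_pipeline(text: str, layers: List[Callable[[str], str]]) -> str:
    """Pass the text through each layer in order."""
    for layer in layers:
        text = layer(text)
    return text


LAYERS = [strip_noise, dedup_lines]
```

Because every layer takes and returns a string, stages can be added, removed, or reordered without touching the others.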

Features

Everything you need

Deterministic

30+ Patterns

Git diffs, test runners, build tools, Docker, Terraform, package managers — each has a dedicated compressor that knows exactly what to keep.

PASS src/config.test.ts (12 tests)

PASS src/cache.test.ts (8 tests)

FAIL src/server.test.ts (2 failed)

Zest

AI Compression (opt-in)

Blocks over 1500 chars summarized by Zest (Squeezr’s own local model), Haiku, GPT-4o-mini or Gemini Flash. Off by default, rate-limited, cache-safe.

Haiku: 120ms

GPT-mini: 95ms

Flash: 80ms

Dedup

File Dedup

Read the same file 5 times? Only the latest stays full. Earlier reads become lightweight references.
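A minimal sketch of that behaviour, assuming each read is a (path, content) record and that earlier reads are replaced with a short placeholder string. The `FileRead` type and the placeholder wording are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileRead:
    path: str
    content: str


def dedup_reads(reads: List[FileRead]) -> List[FileRead]:
    """Keep only the latest read of each file in full."""
    # Index of the most recent read for every path.
    last_index = {read.path: i for i, read in enumerate(reads)}

    result: List[FileRead] = []
    for i, read in enumerate(reads):
        if last_index[read.path] == i:
            result.append(read)  # latest read stays untouched
        else:
            placeholder = f"[earlier read of {read.path}; latest read kept in full]"
            result.append(FileRead(read.path, placeholder))
    return result
```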

Cache-safe

Prompt-Cache Safe

Compression is byte-stable: the cached prefix never changes, so Anthropic’s 10x-cheaper cache reads keep hitting. Live Hit Health metric included.
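One way to get byte-stable output, sketched here under the assumption that compressed results are memoized by a hash of the original block, is to make sure the same input always maps to the same stored bytes even if the underlying compressor is not deterministic. The `StableCompressor` class is illustrative only:

```python
import hashlib
from typing import Callable, Dict


class StableCompressor:
    """Wrap a compressor so identical input always yields identical output."""

    def __init__(self, compress: Callable[[str], str]) -> None:
        self._compress = compress
        self._memo: Dict[str, str] = {}

    def __call__(self, block: str) -> str:
        key = hashlib.sha256(block.encode("utf-8")).hexdigest()
        # Compress only on first sight; later calls reuse the stored bytes,
        # so an already-sent prefix is never rewritten differently.
        if key not in self._memo:
            self._memo[key] = self._compress(block)
        return self._memo[key]
```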

Lossless

Expand Tool

The AI can call squeezr_expand() to retrieve any original content. Nothing is permanently lost.
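The signature of squeezr_expand() is not documented on this page. As a rough sketch of the idea behind it, assuming originals are kept in a local store and addressed by a short content hash, a hypothetical `OriginalStore` might look like this:

```python
import hashlib
from typing import Dict


class OriginalStore:
    """Keep uncompressed originals so they can be restored on request."""

    def __init__(self) -> None:
        self._originals: Dict[str, str] = {}

    def put(self, original: str) -> str:
        """Store the original text and return a short reference id."""
        ref = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        self._originals[ref] = original
        return ref

    def expand(self, ref: str) -> str:
        """Return the original text for a reference id (KeyError if unknown)."""
        return self._originals[ref]
```

The compressed block would carry the reference id, and an expand call would trade that id for the full text.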

Simple

Zero Config

One install, one command, works immediately. Optional TOML config for fine-grained control.

Real Examples

See the compression

Before and after from real coding sessions.

Before: vitest · 188 tests

✓ config (12) cache (8) expand (15)

✓ compressor (24) deterministic (89)

✗ server.test.ts (40 | 2 failed)

FAIL streaming — expected 500 to be 200

FAIL health — Cannot read undefined

1 failed | 5 passed · 2 failed | 186 passed

2,340 chars → 198 chars

How it works

Three steps. Thirty seconds.

From install to savings in under a minute. No configuration required.

Install & Setup

One npm install, one setup command. Auto-detects your OS, configures env vars, and starts the daemon.

terminal

$ npm i -g squeezr-ai

$ squeezr setup

✓ Done

Proxy Intercepts

Your AI tool sends requests through localhost. Squeezr intercepts transparently — no code changes needed.

proxy

→ POST /v1/messages

12,847 tokens input

Compressing...

Savings Begin

Compressed requests go to the API. Your AI gets all essential info with a fraction of the tokens.

stats

✓ 42 requests processed

✓ 34,291 tokens saved

✓ 78% average compression

Calculator

Estimate your savings

See how much you could save based on your usage.

Requests per session: 60

Avg tokens per request: 8K

AI Provider: Claude (Sonnet)

Tokens saved / session

192,000

Tokens saved / month

12.7M

~3 sessions/day × 22 days

Cost saved / month

$38.02

Based on Claude (Sonnet) input pricing
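The calculator figures can be reproduced with a few lines of arithmetic. The sketch below assumes a 40% savings rate and a price of $3 per million input tokens. Neither number is stated on this page; both are inferred from the figures shown (192,000 of 480,000 tokens per session, and $38.02 for about 12.7M tokens).

```python
from typing import Tuple


def estimate_savings(
    requests_per_session: int,
    tokens_per_request: int,
    savings_percent: int,          # assumed: inferred from the page's figures
    sessions_per_day: int,
    working_days: int,
    usd_per_million_tokens: float,  # assumed: inferred from the page's figures
) -> Tuple[int, int, float]:
    """Return (tokens saved per session, tokens saved per month, USD saved per month)."""
    per_session = requests_per_session * tokens_per_request * savings_percent // 100
    per_month = per_session * sessions_per_day * working_days
    cost = per_month * usd_per_million_tokens / 1_000_000
    return per_session, per_month, cost


per_session, per_month, cost = estimate_savings(60, 8_000, 40, 3, 22, 3.0)
print(per_session, per_month, round(cost, 2))
```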

Ready to compress?

Three commands. Thirty seconds. That's it.


Read the docs · View on GitHub

MIT Licensed · Zero Config · < 30s Setup