Squeezr — AI Context Compression

Zest is an 800M-parameter language model fine-tuned from Qwen3.5 to compress coding tool outputs — bash commands, git diffs, test failures, file reads, npm installs, kubectl, docker, and 40+ more categories. It runs entirely on your machine via Ollama, which means zero API cost, zero data sent to the cloud, and ~1s response time.

Install in one command

squeezr zest

The wizard handles everything automatically:

Checks if Ollama is installed — installs it automatically if not.
Downloads the zest-Q4_K_M.gguf model (505 MB) from HuggingFace.
Creates the zest model in Ollama with the correct compression system prompt.
Runs a smoke test and shows you the compression result.
Updates ~/.squeezr/squeezr.toml to use Zest as the AI backend and restarts Squeezr.

What it compresses

Zest was distilled from Claude Opus 4.7 on a dataset of 1,100+ real tool outputs covering:

Category	Examples	Typical savings
Package managers	npm install, pip, cargo, pnpm	60–80%
Test runners	vitest, jest, pytest, cargo test, playwright	50–85%
Build tools	tsc, eslint, next build, cargo build	40–70%
Git	diff, log, status, branch	40–70%
Infrastructure	docker, kubectl, terraform	50–75%
HTTP / CLI	curl, wget, gh cli	40–70%
Stack traces	JS, Python, Rust, Go, Java	50–70%
File reads	Any code file, logs, configs	30–90%

Performance

Metric	Value
Model architecture	Qwen3.5-0.8B (800M params)
Quantization	Q4_K_M (4 bits/param)
File size	505 MB
Training accuracy	89.12%
Compression on inputs >5K chars	52–72%
Minimum input size	1,500 chars (smaller inputs may expand)
Response time	~1s (CPU), <500ms (GPU)
Cost	Free — runs locally
Deterministic compression (no Zest)	15–20% of full context
With Zest active	35–45% of full context on tool-heavy sessions

How it works with Squeezr

Once installed, Zest replaces Haiku/GPT-mini as the AI compression backend. Squeezr still runs its deterministic pre-pass (regex, dedup, diff), then sends blocks larger than 1,500 chars to Zest for further compression. The thinking mode of Qwen3.5 is disabled via think: false so only the compressed output is returned.

# ~/.squeezr/squeezr.toml after squeezr zest:
[compression]
ai_compression = true
ai_min_chars = 1500

[local]
enabled = true
upstream_url = "http://localhost:11434"
compression_model = "zest"

Verify it is running

Open the dashboard and check the AI Compression card. You should see calls attributed to local:zest once tool outputs large enough to compress come through.

# PowerShell — watch for Zest compressions in real time
Get-Content ~/.squeezr/squeezr.log -Tail 0 -Wait | Select-String "local:zest"

Disable Zest

[compression]
ai_compression = false

Or toggle the AI Compression switch in the dashboard — state persists across restarts.

Model on HuggingFace

The GGUF is hosted at huggingface.co/ramosvs/zest. The training code and dataset pipeline are in the zest/ recipe directory of the Squeezr repository.

Roadmap

Zest v2 — retrained on 3,000+ examples with stronger variant signal.
GPU quantization — Q5_K_M and Q8_0 variants for users with VRAM.
Auto-updates — squeezr zest --update to pull the latest model.

Zest — Local AI compression model