Zest — Local AI compression model
Zest is an 800M-parameter language model fine-tuned from Qwen3.5 to compress coding tool outputs — bash commands, git diffs, test failures, file reads, npm installs, kubectl, docker, and 40+ more categories. It runs entirely on your machine via Ollama, which means zero API cost, zero data sent to the cloud, and ~1s response time.
Install in one command
squeezr zestThe wizard handles everything automatically:
- Checks if Ollama is installed — installs it automatically if not.
- Downloads the
zest-Q4_K_M.ggufmodel (505 MB) from HuggingFace. - Creates the
zestmodel in Ollama with the correct compression system prompt. - Runs a smoke test and shows you the compression result.
- Updates
~/.squeezr/squeezr.tomlto use Zest as the AI backend and restarts Squeezr.
What it compresses
Zest was distilled from Claude Opus 4.7 on a dataset of 1,100+ real tool outputs covering:
| Category | Examples | Typical savings |
|---|---|---|
| Package managers | npm install, pip, cargo, pnpm | 60–80% |
| Test runners | vitest, jest, pytest, cargo test, playwright | 50–85% |
| Build tools | tsc, eslint, next build, cargo build | 40–70% |
| Git | diff, log, status, branch | 40–70% |
| Infrastructure | docker, kubectl, terraform | 50–75% |
| HTTP / CLI | curl, wget, gh cli | 40–70% |
| Stack traces | JS, Python, Rust, Go, Java | 50–70% |
| File reads | Any code file, logs, configs | 30–90% |
Performance
| Metric | Value |
|---|---|
| Model architecture | Qwen3.5-0.8B (800M params) |
| Quantization | Q4_K_M (4 bits/param) |
| File size | 505 MB |
| Training accuracy | 89.12% |
| Compression on inputs >5K chars | 52–72% |
| Minimum input size | 1,500 chars (smaller inputs may expand) |
| Response time | ~1s (CPU), <500ms (GPU) |
| Cost | Free — runs locally |
| Deterministic compression (no Zest) | 15–20% of full context |
| With Zest active | 35–45% of full context on tool-heavy sessions |
How it works with Squeezr
Once installed, Zest replaces Haiku/GPT-mini as the AI compression backend. Squeezr still runs its deterministic pre-pass (regex, dedup, diff), then sends blocks larger than 1,500 chars to Zest for further compression. The thinking mode of Qwen3.5 is disabled via think: false so only the compressed output is returned.
# ~/.squeezr/squeezr.toml after squeezr zest:
[compression]
ai_compression = true
ai_min_chars = 1500
[local]
enabled = true
upstream_url = "http://localhost:11434"
compression_model = "zest"Verify it is running
Open the dashboard and check the AI Compression card. You should see calls attributed to local:zest once tool outputs large enough to compress come through.
# PowerShell — watch for Zest compressions in real time
Get-Content ~/.squeezr/squeezr.log -Tail 0 -Wait | Select-String "local:zest"Disable Zest
[compression]
ai_compression = falseOr toggle the AI Compression switch in the dashboard — state persists across restarts.
Model on HuggingFace
The GGUF is hosted at huggingface.co/ramosvs/zest. The training code and dataset pipeline are in the zest/ recipe directory of the Squeezr repository.
Roadmap
- Zest v2 — retrained on 3,000+ examples with stronger variant signal.
- GPU quantization — Q5_K_M and Q8_0 variants for users with VRAM.
- Auto-updates —
squeezr zest --updateto pull the latest model.