Zest — Local AI compression model

Zest is an 800M-parameter language model fine-tuned from Qwen3.5 to compress coding tool outputs: bash commands, git diffs, test failures, file reads, npm installs, kubectl, docker, and 40+ more categories. It runs entirely on your machine via Ollama, so there is no API cost, no data is sent to the cloud, and responses take about 1s on CPU.

Install in one command

squeezr zest

The wizard handles everything automatically:

  1. Checks whether Ollama is installed and installs it if it is missing.
  2. Downloads the zest-Q4_K_M.gguf model (505 MB) from HuggingFace.
  3. Creates the zest model in Ollama with the correct compression system prompt.
  4. Runs a smoke test and shows you the compression result.
  5. Updates ~/.squeezr/squeezr.toml to use Zest as the AI backend and restarts Squeezr.
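After the wizard finishes, one way to confirm that steps 1 to 3 took effect is to ask Ollama which models it has registered. The sketch below is a minimal check, assuming Ollama listens on its default address (http://localhost:11434) and that the model is registered under the name zest. The helper names has_model and list_models are hypothetical and not part of Squeezr.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed unchanged)


def has_model(tags: dict, name: str) -> bool:
    """Return True if `name` appears in an Ollama /api/tags response.

    Ollama reports names with a tag suffix such as "zest:latest",
    so only the part before the colon is compared.
    """
    for model in tags.get("models", []):
        if model.get("name", "").split(":")[0] == name:
            return True
    return False


def list_models(base_url: str = OLLAMA_URL) -> dict:
    """Fetch the list of locally registered models from Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With Ollama running, has_model(list_models(), "zest") should return True once the wizard has completed. A connection error from list_models suggests Ollama itself is not running.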

What it compresses

Zest was distilled from Claude Opus 4.7 on a dataset of 1,100+ real tool outputs covering:

| Category | Examples | Typical savings |
| --- | --- | --- |
| Package managers | npm install, pip, cargo, pnpm | 60–80% |
| Test runners | vitest, jest, pytest, cargo test, playwright | 50–85% |
| Build tools | tsc, eslint, next build, cargo build | 40–70% |
| Git | diff, log, status, branch | 40–70% |
| Infrastructure | docker, kubectl, terraform | 50–75% |
| HTTP / CLI | curl, wget, gh cli | 40–70% |
| Stack traces | JS, Python, Rust, Go, Java | 50–70% |
| File reads | Any code file, logs, configs | 30–90% |

Performance

| Metric | Value |
| --- | --- |
| Model architecture | Qwen3.5-0.8B (800M params) |
| Quantization | Q4_K_M (4 bits/param) |
| File size | 505 MB |
| Training accuracy | 89.12% |
| Compression on inputs >5K chars | 52–72% |
| Minimum input size | 1,500 chars (smaller inputs may expand) |
| Response time | ~1s (CPU), <500ms (GPU) |
| Cost | Free (runs locally) |
| Deterministic compression (no Zest) | 15–20% of full context |
| With Zest active | 35–45% of full context on tool-heavy sessions |
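The compression figures can be reproduced on your own outputs with a small helper. The sketch below reads them as savings relative to the original length, which is an assumption because the metric is not defined here. It also shows a guard for the case where a small input expands. Whether Squeezr applies such a guard itself is not stated, and both function names are hypothetical.

```python
def savings_pct(original: str, compressed: str) -> float:
    """Percentage of characters removed; negative if the text grew."""
    if not original:
        return 0.0
    return 100.0 * (len(original) - len(compressed)) / len(original)


def pick_output(original: str, compressed: str) -> str:
    """Keep the compressed text only when it is actually shorter."""
    return compressed if len(compressed) < len(original) else original
```

For example, a 1,000-character output compressed to 400 characters gives savings_pct of 60.0, which would sit inside the 52–72% range under this reading.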

How it works with Squeezr

Once installed, Zest replaces Haiku/GPT-mini as the AI compression backend. Squeezr still runs its deterministic pre-pass (regex, dedup, diff), then sends blocks larger than 1,500 chars to Zest for further compression. The thinking mode of Qwen3.5 is disabled via think: false so only the compressed output is returned.

# ~/.squeezr/squeezr.toml after squeezr zest:
[compression]
ai_compression = true
ai_min_chars = 1500

[local]
enabled = true
upstream_url = "http://localhost:11434"
compression_model = "zest"
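The routing described in this section can be sketched as follows. This is an illustration under stated assumptions, not Squeezr's actual implementation. It assumes the request goes to Ollama's /api/generate endpoint (Squeezr may use a different endpoint), that the system prompt is already baked into the zest model, and that the threshold is a strict "larger than". The function names are hypothetical, and the constants mirror the config values above.

```python
import json
import urllib.request

AI_MIN_CHARS = 1500                      # mirrors ai_min_chars
UPSTREAM_URL = "http://localhost:11434"  # mirrors upstream_url
MODEL = "zest"                           # mirrors compression_model


def needs_ai(block: str, min_chars: int = AI_MIN_CHARS) -> bool:
    """Only blocks larger than the threshold go to the model."""
    return len(block) > min_chars


def build_payload(block: str, model: str = MODEL) -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": block,
        "stream": False,  # one JSON object instead of a token stream
        "think": False,   # suppress the reasoning trace
    }


def compress(block: str, base_url: str = UPSTREAM_URL) -> str:
    """Send one block to the local model, or pass small blocks through."""
    if not needs_ai(block):
        return block
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(build_payload(block)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["response"]
```

Blocks at or below the threshold are returned unchanged without any network call, which matches the note that inputs under 1,500 chars may expand rather than shrink.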

Verify it is running

Open the dashboard and check the AI Compression card. You should see calls attributed to local:zest once tool outputs large enough to compress come through.

# PowerShell — watch for Zest compressions in real time
Get-Content ~/.squeezr/squeezr.log -Tail 0 -Wait | Select-String "local:zest"
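Where PowerShell is not available, a rough equivalent can be sketched in Python. It assumes the same log path and that relevant lines contain the literal string local:zest. The log line format itself is not documented here, and follow and zest_lines are hypothetical helpers.

```python
import time
from pathlib import Path
from typing import Iterable, Iterator

LOG_PATH = Path.home() / ".squeezr" / "squeezr.log"


def zest_lines(lines: Iterable[str], marker: str = "local:zest") -> Iterator[str]:
    """Yield only the lines that mention the marker."""
    for line in lines:
        if marker in line:
            yield line.rstrip("\n")


def follow(path: Path, poll_seconds: float = 0.5) -> Iterator[str]:
    """Yield text appended to `path` after this call, like `tail -n 0 -f`."""
    with path.open("r", encoding="utf-8", errors="replace") as fh:
        fh.seek(0, 2)  # start at the end of the file
        while True:
            line = fh.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)
```

To watch in real time, iterate with for line in zest_lines(follow(LOG_PATH)): print(line) and stop with Ctrl+C.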

Disable Zest

# In ~/.squeezr/squeezr.toml:
[compression]
ai_compression = false

Or toggle the AI Compression switch in the dashboard — state persists across restarts.

Model on HuggingFace

The GGUF is hosted at huggingface.co/ramosvs/zest. The training code and dataset pipeline are in the zest/ recipe directory of the Squeezr repository.

Roadmap

  • Zest v2: retrained on 3,000+ examples with stronger variant signal.
  • GPU quantization: Q5_K_M and Q8_0 variants for users with VRAM.
  • Auto-updates: squeezr zest --update to pull the latest model.