Squeezr — AI Context Compression

Squeezr is an HTTP proxy running on Node.js. It runs on any machine from the last 10 years. No GPU required. AI compression is delegated to cloud APIs (Haiku / GPT-mini / Gemini Flash) or an optional local Ollama server.

Minimum requirements

Resource	Minimum	Recommended	Notes
CPU	1 core x64/arm64	2+ cores	Regex + MD5 + Myers diff; brief spikes when processing 1–2 MB requests.
RAM	256 MB free	512 MB free	Measured: ~140 MB working set with all 3 processes (proxy + MITM + desktop proxy).
Disk	200 MB	300 MB	128 MB node_modules + ~10 MB code + bounded caches in `~/.squeezr/`.
Network	Outbound HTTPS (443)	—	`api.anthropic.com`, `api.openai.com`, `generativelanguage.googleapis.com`
GPU	Not required		All AI compression is cloud-based or via optional external Ollama.

Software

Requirement	Version
Node.js	≥ 18 (tested on 24.x)
OS	Windows 10/11, macOS, Linux, WSL2
npm	Whichever ships with your Node version

Measured resource usage

Main proxy RAM: ~140 MB working set (proxy + dashboard + MITM 8081).
Desktop proxy (optional, Claude/Codex Desktop only): separate process, ~80 MB.
CPU idle: ~0% — only active while a request is in flight.
CPU peak: one core for 10–50 ms per request (deterministic pipeline is synchronous).
Disk in ~/.squeezr/: bounded — cache.json (LRU 1000 entries), history.json (500 sessions max), captures/ (20 files max, opt-in).

Added latency per request

Pipeline stage	Latency
Deterministic (regex, dedup, Myers diff)	5–50 ms
AI compression (Haiku / GPT-mini / Gemini Flash)	300–1500 ms — only on old blocks above threshold, runs in parallel
Forward to upstream	Passthrough — streaming preserved

Ports used (configurable)

Port	Purpose
`8080`	Main HTTP proxy (Claude Code, Codex CLI, Aider, Gemini CLI) + dashboard
`8081`	MITM proxy (Codex OAuth)
`8443` / `8088`	Desktop proxy — only needed for Claude Desktop / Codex Desktop

What you do NOT need

❌ GPU / CUDA
❌ Docker
❌ A database — everything is JSON in ~/.squeezr/
❌ Ollama — optional, only if you want a fully local AI compression backend

With Zest local model (future)

When the Zest local model (zest-0.8b) ships via Ollama, requirements increase: ~2 GB extra RAM for the model (~1 GB in Q4 quantization) and a CPU with AVX2 or a small GPU for reasonable inference speed. This will be documented separately.