Squeezr — AI Context Compression

Squeezr uses TOML for configuration. This page is the complete reference for every key in the config file, organized by section.

File locations

# Global config — next to the installed binary (in npm global prefix)
squeezr.toml

# Project config — deep-merged over global, apply per-repo overrides
.squeezr.toml   (in your project root)

Use squeezr config to print the resolved path and current values.

[proxy]

Controls the proxy server ports.

Key	Type	Default	Description
`port`	integer	`8080`	HTTP proxy port (Claude Code, Aider, Gemini CLI).
`mitm_port`	integer	`8081`	MITM proxy port (Codex). Defaults to port + 1.

[compression]

Controls how and when content is compressed.

Key	Type	Default	Description
`threshold`	integer	`800`	Minimum content size (chars) to trigger compression.
`keep_recent`	integer	`3`	Last N tool results to leave uncompressed.
`compress_system_prompt`	boolean	`true`	Compress and cache the system prompt.
`compress_conversation`	boolean	`false`	Also compress assistant messages (aggressive mode).
`skip_tools`	array	`[]`	Tool names to never compress (e.g. `["Read"]`).
`only_tools`	array	`[]`	Only compress these tools, skip all others (e.g. `["Bash"]`).
`ai_compression`	boolean	`false`	Enable AI-based compression (Haiku/GPT-mini/Gemini Flash). Off by default — deterministic compression still runs. Can also be toggled from the dashboard and is persisted in `~/.squeezr/ai-compression.json`.
`ai_min_chars`	integer	`1500`	Minimum block size (chars) to send to the AI backend. Blocks smaller than this use deterministic-only compression. Data shows blocks <500 chars are often expanded by AI; blocks ≥1500 save 70–91%.
`stale_turns`	boolean	`true`	Collapse old assistant/user turns in very long sessions to save context. Only triggers when the session exceeds `stale_turn_threshold` user turns. Never touches the last `stale_turn_keep_recent` turns.
`stale_turn_threshold`	integer	`50`	Number of user turns after which stale turn summarization activates.
`stale_turn_keep_recent`	integer	`20`	Number of recent turns to always keep at full fidelity (never summarized).
`capture_requests`	boolean	`false`	Save anonymized incoming request payloads to `~/.squeezr/captures/` for debugging. Auth headers are redacted. Stops after `capture_limit` files.
`capture_limit`	integer	`20`	Maximum number of capture files to write before stopping.

[cache]

Controls in-process caching of compressed results.

Key	Type	Default	Description
`enabled`	boolean	`true`	Enable the cache.
`max_entries`	integer	`1000`	Maximum number of cached compressed results.

[adaptive]

Adaptive pressure automatically increases compression aggressiveness as the context window fills up.

Key	Type	Default	Description
`enabled`	boolean	`true`	Enable adaptive compression.
`low_threshold`	integer	`1500`	Min chars to compress when context is below 50%.
`mid_threshold`	integer	`800`	Min chars to compress when context is 50–75%.
`high_threshold`	integer	`400`	Min chars to compress when context is 75–90%.
`critical_threshold`	integer	`150`	Min chars to compress when context exceeds 90%. Git diff context set to 0.

[local]

Configuration for local model servers (Ollama) used as the compression backend.

Key	Type	Default	Description
`enabled`	boolean	`true`	Enable local model support.
`upstream_url`	string	`"http://localhost:11434"`	URL of the local model server.
`compression_model`	string	`"qwen2.5-coder:1.5b"`	Local model to use for AI compression.

Full example

# squeezr.toml — v1.67.4 complete reference

[proxy]
port = 8080
mitm_port = 8081

[compression]
threshold = 800
keep_recent = 3
compress_system_prompt = true
compress_conversation = false
ai_compression = false        # OFF by default — toggle from dashboard or set here
ai_min_chars = 1500           # Minimum block size for AI compression
stale_turns = true            # Summarize old turns in long sessions
stale_turn_threshold = 50     # Activate after N user turns
stale_turn_keep_recent = 20   # Keep last N turns at full fidelity
# skip_tools = ["Read"]
# only_tools = ["Bash"]
# capture_requests = false    # Debug: save anonymized payloads to ~/.squeezr/captures/
# capture_limit = 20

[cache]
enabled = true
max_entries = 1000

[adaptive]
enabled = true
low_threshold = 1500
mid_threshold = 800
high_threshold = 400
critical_threshold = 150

[local]
enabled = true
upstream_url = "http://localhost:11434"
compression_model = "qwen2.5-coder:1.5b"