Dashboard

Squeezr includes a built-in web dashboard with real-time visibility into compression activity, savings history, prompt-cache health and configuration.

Accessing the dashboard

http://localhost:8080/squeezr/dashboard

The dashboard is available whenever the proxy is running. It updates in real time over Server-Sent Events (SSE) at /squeezr/events and falls back to polling when the event stream is unavailable.
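To make the SSE mechanism concrete, here is a minimal sketch of how a client could group raw lines from an event stream such as /squeezr/events into events. It follows the standard SSE wire format only; the event names and payloads in the sample are hypothetical, since this page does not document what Squeezr actually sends.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Group raw SSE lines into events; a blank line ends each event."""
    event_type = "message"  # default event type in the SSE format
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event if it carries data.
            if data_parts:
                yield {"event": event_type, "data": "\n".join(data_parts)}
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
        # Other fields (id, retry) are ignored in this sketch.


# Hypothetical stream contents, for illustration only.
sample = [
    ": keep-alive\n",
    "event: stats\n",
    'data: {"requests": 3}\n',
    "\n",
]
for event in parse_sse(sample):
    print(event["event"], event["data"])
```

A real client would feed this function the lines of a streaming HTTP response and switch to periodic polling if the connection cannot be established.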

Pages

The dashboard has three pages:

Overview

All-time stats from a single source of truth (~/.squeezr/stats.json):

  • Hero cards — net tokens saved, tokens processed, ratio + avg saved per request, cost saved, requests.
  • Compression Mode — mode selector, Toggle Bypass (persisted) and AI Compression ON/OFF (persisted master switch).
  • Rate Limits — live gauges: Anthropic plan limits, OpenAI billing, Gemini quota.
  • Top Tools — per-tool compressed block counts (deterministic + dedup + AI).
  • Session Cache — AI-layer block reuses, expand calls, LRU size.
  • AI Compression — calls, tokens saved vs tokens spent by the backend, net balance.
  • Prompt Cache (Anthropic): cache_read vs cache_creation tokens and a Hit Health %. Green (≥80%) = cache working, you pay the minimum. Red = something is invalidating your prefix and re-billing your full context. Hit Health = cache_read / (cache_read + cache_creation) × 100. A new session starts at 0% and should reach ≥80% by the 2nd–3rd message as the prefix stabilises. If it stays low or drops mid-session, check for system prompt mutations or stale-turns activity.
  • Savings by type — per-technique breakdown with the authoritative net total.
  • By model / By client — savings per model (incl. compression-backend spend) and per CLI.
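The Hit Health formula from the Prompt Cache card can be reproduced by hand. The sketch below applies it to two token counts and checks the result against the 80% green threshold. The function names are hypothetical, and returning 0% when there is no cache traffic yet is an assumption, not documented behaviour.

```python
def hit_health(cache_read: int, cache_creation: int) -> float:
    """Hit Health % = cache_read / (cache_read + cache_creation) x 100."""
    total = cache_read + cache_creation
    if total == 0:
        return 0.0  # assumption: no cache traffic yet is shown as 0%
    return 100.0 * cache_read / total


def is_green(percent: float, threshold: float = 80.0) -> bool:
    """True when the value meets the green threshold (>= 80% by default)."""
    return percent >= threshold


# First message of a session: the prefix is written, nothing is read yet.
first = hit_health(cache_read=0, cache_creation=12_000)

# Later message: most of the prefix is served from the cache.
later = hit_health(cache_read=90_000, cache_creation=10_000)

print(first, is_green(first))
print(later, is_green(later))
```

With these illustrative numbers the first message sits at 0% and the later one at 90%, which matches the expected pattern of a session warming up.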

Savings

Historical savings with Day / Week / Month / All-time filters and period navigation. Tokens, cost, sessions, charts, By Model, By Client, Top Tools, AI Compression and Session Cache — all per-period and persisted across restarts.

Settings

Client base-URLs, ports, version/uptime, bypass and circuit-breaker state, the AI Compression switch (with billing warning for subscription tokens), Restart/Stop buttons, and update check.

Safety indicators

  • Hit Health — catches silent over-billing: if anything mutates your conversation prefix, Anthropic re-bills the whole context. Red = alarm. Expected behaviour: starts at 0%, rises to ≥80% within the first 2–3 messages of a session, then stays stable. A drop mid-session means something changed the cached prefix (e.g. non-deterministic compression output, adaptive threshold shift, stale-turns boundary move).
  • Expand rate — how often the model recovers compressed content. 0 means nothing important was lost.
  • Circuit breaker — AI backend health.
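The expected Hit Health pattern (warm up within the first 2–3 messages, then stay stable) can be checked mechanically. The sketch below assumes you have one Hit Health reading per message, which is a hypothetical input format: this page does not say whether the dashboard value is per request or cumulative, and both function names are illustrative.

```python
from typing import Optional, Sequence


def warmed_up_in_time(series: Sequence[float],
                      threshold: float = 80.0,
                      max_messages: int = 3) -> bool:
    """True if any of the first max_messages readings reaches the threshold."""
    return any(pct >= threshold for pct in series[:max_messages])


def first_drop(series: Sequence[float],
               threshold: float = 80.0) -> Optional[int]:
    """Index of the first reading below threshold after warm-up, else None."""
    warmed_up = False
    for i, pct in enumerate(series):
        if pct >= threshold:
            warmed_up = True
        elif warmed_up:
            return i  # the cached prefix was probably invalidated here
    return None


# Hypothetical per-message readings for one session.
readings = [0.0, 62.0, 85.0, 91.0, 40.0, 88.0]
print(warmed_up_in_time(readings))
print(first_drop(readings))
```

A non-None result from first_drop points at the message where something changed the cached prefix, which is the moment to look for the causes listed under Hit Health.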