← back to knowledge-hub

Caveman: The One-Line Install That Cuts AI Token Usage by 65%

Every AI coding agent I have ever used has the same problem: it narrates.

Ask it why your React component re-renders and it will open with “Great question! The reason your component is re-rendering is likely because…” before it gets to the actual answer. Ask it to look at an auth bug and it will say “Sure! I’d be happy to help you with that.” as if enthusiasm is billable. You are paying for those words. Every token in a response costs money and takes time to generate.

The fix is a small, weird, open-source skill called caveman.


Why use many token when few do trick

Caveman is a Claude Code skill (it also supports GitHub Copilot, Cursor, Windsurf, Gemini CLI, and 30+ other agents) that changes how your AI responds — not what it knows. The same model, the same reasoning, the same code it would write. Just stripped of every word that was never load-bearing.

Here is the same answer with and without it:

Normal Claude (69 tokens):

“The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle. When you pass an inline object as a prop, React’s shallow comparison sees it as a different object every time, which triggers a re-render. I’d recommend using useMemo to memoize the object.”

Caveman Claude (19 tokens):

“New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”

Same fix. 75% less word. Brain still big.

The technical accuracy does not degrade — a March 2026 paper actually found that constraining large models to brief responses improved accuracy by 26 points on certain benchmarks. Verbose is not always better.


Benchmarks: what 65% means in practice

These are real token counts from the Claude API, not estimates:

TaskNormalCavemanSaved
Explain React re-render bug1,18015987%
Fix auth middleware token expiry70412183%
Set up PostgreSQL connection pool2,34738084%
Explain git rebase vs merge70229258%
Docker multi-stage build1,04229072%
Debug PostgreSQL race condition1,20023281%
Average1,21429465%

The heavy wins are on explanatory tasks — exactly the kind you ask most often. The smaller wins (22–30%) are on architecture discussions where nuance genuinely earns its words.

One important note: caveman only touches output tokens. Thinking/reasoning tokens are untouched. The model still reasons fully; it just reports the conclusion without the preamble.


Four levels of grunt

You are not locked into one style. Caveman ships with four modes you can switch between with a single command:

ModeWhat it does
liteDrops filler phrases and pleasantries. Reads like a direct colleague.
fullDefault caveman. Telegraphic fragments, no articles, no hedging.
ultraAbsolute minimum. Single-line answers, terse to the point of grunts.
wenyanClassical Chinese compression style. Even shorter than ultra.

Trigger any of them: /caveman lite, /caveman full, /caveman ultra, or /caveman wenyan. Levels stick for the session. Return to normal with “normal mode”.

Caveman also respects your language. If you write in Portuguese, Spanish, or French, it compresses the style, not your language. Code, paths, and error strings are always preserved byte-for-byte.


Installing on Claude Code

Prerequisites: Node ≥ 18. The install takes about 30 seconds.

The one-liner detects every supported agent on your machine and installs for all of them:

1
2
3
4
5
# macOS / Linux / WSL / Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash

# Windows (PowerShell 5.1+)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex

For Claude Code specifically, the installer does three things:

  1. Registers the plugin: claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
  2. Wires a SessionStart hook that writes a flag file so caveman activates automatically — no /caveman needed each session.
  3. Adds a statusline badge: [CAVEMAN] ⛏ 12.4k showing lifetime tokens saved.

After install, open Claude Code and type /caveman. The response should be terse fragments. Run /caveman-stats to see your running token and USD savings.


Installing on GitHub Copilot

Copilot does not have a hook system, so caveman activates through a repo-level instructions file at .github/copilot-instructions.md. The install command:

1
npx -y github:JuliusBrussee/caveman -- --only copilot --with-init

The --with-init flag writes the caveman rule file into .github/copilot-instructions.md in your current repo. Copilot picks it up automatically on the next request — no restart, no per-session trigger needed.

If you want to install for just one agent and skip the rest, use --only:

1
2
3
4
5
# Claude Code only
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman

# GitHub Copilot only
npx -y github:JuliusBrussee/caveman -- --only copilot --with-init

Beyond response compression: the extra skills

Once installed, you get several companion commands that apply the same philosophy to other parts of your workflow:

/caveman-commit — Generates Conventional Commit messages with a subject under 50 characters, focused on why over what. Useful when you want consistent commit history without thinking about it.

/caveman-review — PR comments in one line per finding: L42: 🔴 bug: user null. Add guard. No paragraph-length explanations for a missing null check.

/caveman-compress <file> — Rewrites a memory file (like your CLAUDE.md or project notes) into caveman-speak. Since CLAUDE.md is loaded as context on every session, a compressed version saves input tokens permanently — not just for one response. Real numbers from actual files:

FileOriginalCompressedSaved
claude-md-preferences.md70628560%
project-notes.md1,14553553%
todo-list.md62738838%

/caveman-compress preserves all code blocks, URLs, and paths exactly. Only the prose around them changes.

/caveman-stats — Reads your Claude Code session log, counts tokens saved, and prints a tweetable line. Updates the statusline badge.


The thing it is not

Caveman is not a jailbreak, a prompt injection, or a way to get around rate limits. It does not make the model smarter, faster on inference, or cheaper at the provider level — it makes the responses shorter, which means fewer tokens billed and faster time-to-answer in your terminal.

It also does not change how the model reasons. Thinking tokens, tool calls, code it writes — all of that is unchanged. The only thing that changes is how the model narrates its conclusions back to you.

If you are already happy with your agent’s verbosity, this is not for you. If you find yourself skimming past the first two sentences of every response to get to the actual answer, that is exactly the problem caveman solves.


Uninstalling

1
npx -y github:JuliusBrussee/caveman -- --uninstall

Removes hooks, the plugin, and the flag file. Per-repo rule files written by --with-init (like .github/copilot-instructions.md) need to be deleted manually — caveman does not touch other projects.


The model’s reasoning has not gotten worse. The filler has just stopped taking up space on the way out. That is the whole trick, and it works.

GitHub repository · caveman.so

graph cloud