Every AI coding agent I have ever used has the same problem: it narrates.
Ask it why your React component re-renders and it will open with “Great question! The reason your component is re-rendering is likely because…” before it gets to the actual answer. Ask it to look at an auth bug and it will say “Sure! I’d be happy to help you with that.” as if enthusiasm is billable. You are paying for those words. Every token in a response costs money and takes time to generate.
The fix is a small, weird, open-source skill called caveman.
Why use many token when few do trick
Caveman is a Claude Code skill (it also supports GitHub Copilot, Cursor, Windsurf, Gemini CLI, and 30+ other agents) that changes how your AI responds — not what it knows. The same model, the same reasoning, the same code it would write. Just stripped of every word that was never load-bearing.
Here is the same answer with and without it:
Normal Claude (69 tokens):
“The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle. When you pass an inline object as a prop, React’s shallow comparison sees it as a different object every time, which triggers a re-render. I’d recommend using useMemo to memoize the object.”
Caveman Claude (19 tokens):
“New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”
Same fix. 75% less word. Brain still big.
The technical accuracy does not degrade — a March 2026 paper actually found that constraining large models to brief responses improved accuracy by 26 points on certain benchmarks. Verbose is not always better.
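To put a rough number on "you are paying for those words", here is a minimal sketch of the cost arithmetic for the example above. The token counts come from the example; the price per million output tokens and the response volume are placeholder assumptions, not figures from any provider's price list, so substitute your own.

```python
def saved_fraction(normal_tokens: int, terse_tokens: int) -> float:
    """Fraction of output tokens removed by the terse version."""
    return 1 - terse_tokens / normal_tokens


def output_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of generating `tokens` output tokens at a given price."""
    return tokens / 1_000_000 * usd_per_million


# Token counts from the example above.
NORMAL, CAVEMAN = 69, 19

# ASSUMPTIONS: placeholder price and volume, chosen only for illustration.
USD_PER_MILLION_OUTPUT = 15.0
RESPONSES = 10_000

saved = saved_fraction(NORMAL, CAVEMAN)
delta = output_cost_usd((NORMAL - CAVEMAN) * RESPONSES, USD_PER_MILLION_OUTPUT)

print(f"saved per response: {saved:.1%}")                     # 72.5%
print(f"saved over {RESPONSES:,} responses: ${delta:.2f}")    # $7.50
```

On this one example the exact saving is 72.5%, which caveman rounds to 75. The per-response amount is tiny; the point is that it scales linearly with how often you ask.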
Benchmarks: what 65% means in practice
These are real token counts from the Claude API, not estimates:
| Task | Normal | Caveman | Saved |
|---|---|---|---|
| Explain React re-render bug | 1,180 | 159 | 87% |
| Fix auth middleware token expiry | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2,347 | 380 | 84% |
| Explain git rebase vs merge | 702 | 292 | 58% |
| Docker multi-stage build | 1,042 | 290 | 72% |
| Debug PostgreSQL race condition | 1,200 | 232 | 81% |
| Average | 1,214 | 294 | 65% |
The heavy wins are on explanatory tasks, exactly the kind you ask most often. The smaller wins (22–30%), which are not among the rows listed above, are on architecture discussions where nuance genuinely earns its words.
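If you want to check the Saved column yourself, the arithmetic is one line per row. The sketch below recomputes it from the six task rows in the table; nothing here comes from the caveman codebase, it is only the percentage formula applied to the published counts.

```python
from statistics import mean

# (task, normal tokens, caveman tokens) copied from the rows above.
ROWS = [
    ("Explain React re-render bug", 1180, 159),
    ("Fix auth middleware token expiry", 704, 121),
    ("Set up PostgreSQL connection pool", 2347, 380),
    ("Explain git rebase vs merge", 702, 292),
    ("Docker multi-stage build", 1042, 290),
    ("Debug PostgreSQL race condition", 1200, 232),
]


def saved_pct(normal: int, caveman: int) -> int:
    """Percentage of output tokens saved, rounded to a whole percent."""
    return round(100 * (1 - caveman / normal))


per_task = {task: saved_pct(n, c) for task, n, c in ROWS}
for task, pct in per_task.items():
    print(f"{task:<36}{pct:>3}%")

# Mean of the per-task percentages for the listed rows only.
print(f"mean of listed rows: {mean(per_task.values()):.1f}%")
```

Run against the six rows shown, this reproduces the Saved column and gives a mean of 77.5%. That is higher than the 65% in the Average row, which is consistent with that row also counting lower-saving tasks that are not listed here.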
One important note: caveman only touches output tokens. Thinking/reasoning tokens are untouched. The model still reasons fully; it just reports the conclusion without the preamble.
Four levels of grunt
You are not locked into one style. Caveman ships with four modes you can switch between with a single command:
| Mode | What it does |
|---|---|
| lite | Drops filler phrases and pleasantries. Reads like a direct colleague. |
| full | Default caveman. Telegraphic fragments, no articles, no hedging. |
| ultra | Absolute minimum. Single-line answers, terse to the point of grunts. |
| wenyan | Classical Chinese compression style. Even shorter than ultra. |
Trigger any of them: /caveman lite, /caveman full, /caveman ultra, or /caveman wenyan. Levels stick for the session. Return to normal with “normal mode”.
Caveman also respects your language. If you write in Portuguese, Spanish, or French, it compresses the style, not your language. Code, paths, and error strings are always preserved byte-for-byte.
Installing on Claude Code
Prerequisites: Node ≥ 18. The install takes about 30 seconds.
The one-liner detects every supported agent on your machine and installs for all of them:
| |
For Claude Code specifically, the installer does three things:
- Registers the plugin: claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
- Wires a SessionStart hook that writes a flag file so caveman activates automatically, with no /caveman needed each session.
- Adds a statusline badge: [CAVEMAN] ⛏ 12.4k showing lifetime tokens saved.
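The flag-file idea is simple enough to sketch. The following is not caveman's actual hook script: the file location, the JSON payload, and both function names are hypothetical, chosen only to show how a script run at session start could record "caveman is on" and how something else could read it back.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# HYPOTHETICAL location and format; the real caveman flag file may differ.
DEFAULT_FLAG = Path.home() / ".claude" / "caveman.flag"


def write_flag(mode: str = "full", flag_path: Path = DEFAULT_FLAG) -> Path:
    """Record that caveman should be active for this session."""
    flag_path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "mode": mode,
        "started_at": datetime.now(timezone.utc).isoformat(),
    }
    flag_path.write_text(json.dumps(payload), encoding="utf-8")
    return flag_path


def active_mode(flag_path: Path = DEFAULT_FLAG) -> Optional[str]:
    """Return the recorded mode, or None when no flag file exists."""
    if not flag_path.exists():
        return None
    return json.loads(flag_path.read_text(encoding="utf-8"))["mode"]
```

A file on disk is a reasonable design for this kind of state: it survives between processes, and removing it is all an uninstaller has to do to switch the behavior off.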
After install, open Claude Code and type /caveman. The response should be terse fragments. Run /caveman-stats to see your running token and USD savings.
Installing on GitHub Copilot
Copilot does not have a hook system, so caveman activates through a repo-level instructions file at .github/copilot-instructions.md. The install command:
| |
The --with-init flag writes the caveman rule file into .github/copilot-instructions.md in your current repo. Copilot picks it up automatically on the next request — no restart, no per-session trigger needed.
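For a sense of what "writes the rule file" could involve, here is a minimal sketch of an idempotent writer for .github/copilot-instructions.md. The rule text, the begin/end markers, and the function name are all assumptions for illustration; the file the real installer produces may look quite different.

```python
from pathlib import Path

# HYPOTHETICAL rule text and markers; the file the installer writes may differ.
BEGIN, END = "<!-- caveman:begin -->", "<!-- caveman:end -->"
RULE = "Respond tersely. Drop filler and pleasantries. Keep code, paths, and errors exact."


def ensure_rule(repo_root: Path) -> Path:
    """Add the rule block to .github/copilot-instructions.md exactly once."""
    target = repo_root / ".github" / "copilot-instructions.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    if BEGIN in existing:
        return target  # already present; leave the file untouched
    block = f"{BEGIN}\n{RULE}\n{END}\n"
    # Keep any instructions the repo already has, separated by a newline.
    separator = "\n" if existing and not existing.endswith("\n") else ""
    target.write_text(existing + separator + block, encoding="utf-8")
    return target
```

Wrapping the rule in markers keeps existing instructions intact and makes the block easy to find if you later need to delete it by hand.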
If you want to install for just one agent and skip the rest, use --only:
| |
Beyond response compression: the extra skills
Once installed, you get several companion commands that apply the same philosophy to other parts of your workflow:
/caveman-commit — Generates Conventional Commit messages with a subject under 50 characters, focused on why over what. Useful when you want consistent commit history without thinking about it.
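If you want to enforce the same shape in a commit hook or CI, a check like the one below would do it. This is a sketch, not part of caveman: the type list is the commonly used set (the Conventional Commits spec itself only defines feat and fix), and I am assuming the 50-character limit applies to the whole first line.

```python
import re

# Commonly used types; the Conventional Commits spec only defines feat and fix.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf",
         "test", "build", "ci", "chore", "revert")
SUBJECT_RE = re.compile(
    r"^(?:" + "|".join(TYPES) + r")(?:\([\w./-]+\))?!?: \S.*$"
)
MAX_SUBJECT = 50  # ASSUMPTION: "under 50 characters" covers the whole first line


def check_subject(message: str) -> list[str]:
    """Return the problems found in the commit subject; empty means OK."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not SUBJECT_RE.match(subject):
        problems.append("subject is not in 'type(scope): description' form")
    if len(subject) >= MAX_SUBJECT:
        problems.append(
            f"subject is {len(subject)} chars, limit is under {MAX_SUBJECT}"
        )
    return problems


print(check_subject("fix(auth): refresh token before expiry"))  # []
```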
/caveman-review — PR comments in one line per finding: L42: 🔴 bug: user null. Add guard. No paragraph-length explanations for a missing null check.
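A side effect of one-line findings is that they are easy to parse. The sketch below assumes every finding follows the shape of the single example above (line reference, severity marker, kind, message); that grammar is my guess from one sample, not a documented format, and the names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# ASSUMPTION: findings look like "L<line>: <marker> <kind>: <message>",
# inferred from the one example above. The real format may allow more.
FINDING_RE = re.compile(
    r"^L(?P<line>\d+): (?P<marker>\S+) (?P<kind>[\w-]+): (?P<message>.+)$"
)


class Finding(NamedTuple):
    line: int
    marker: str
    kind: str
    message: str


def parse_finding(text: str) -> Optional[Finding]:
    """Parse one review line; return None if it does not match the shape."""
    m = FINDING_RE.match(text.strip())
    if m is None:
        return None
    return Finding(int(m["line"]), m["marker"], m["kind"], m["message"])
```

With findings in this form you could sort them by line, filter by kind, or feed them to another tool without any natural-language parsing.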
/caveman-compress <file> — Rewrites a memory file (like your CLAUDE.md or project notes) into caveman-speak. Since CLAUDE.md is loaded as context on every session, a compressed version saves input tokens permanently — not just for one response. Real numbers from actual files:
| File | Original | Compressed | Saved |
|---|---|---|---|
| claude-md-preferences.md | 706 | 285 | 60% |
| project-notes.md | 1,145 | 535 | 53% |
| todo-list.md | 627 | 388 | 38% |
/caveman-compress preserves all code blocks, URLs, and paths exactly. Only the prose around them changes.
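That guarantee is something you can verify rather than trust. Here is a minimal sketch of a before/after check, assuming markdown-style fenced code blocks and http(s) URLs; it does not cover file paths, and it is my own check, not how caveman verifies its output.

```python
import re

FENCE = "`" * 3  # built this way so the marker never appears literally here
CODE_BLOCK_RE = re.compile(
    rf"^{FENCE}[^\n]*\n.*?^{FENCE}[ \t]*$", re.DOTALL | re.MULTILINE
)
URL_RE = re.compile(r"https?://[^\s)>\]]+")


def protected_spans(text: str) -> tuple[list[str], list[str]]:
    """Return (fenced code blocks, URLs found outside them), in order."""
    blocks = CODE_BLOCK_RE.findall(text)
    prose = CODE_BLOCK_RE.sub("", text)
    return blocks, URL_RE.findall(prose)


def preserved(original: str, compressed: str) -> bool:
    """True when code blocks and URLs are identical and in the same order."""
    return protected_spans(original) == protected_spans(compressed)
```

Run it on a memory file before and after compression: if preserved() returns False, diff the two protected_spans() results to see what changed. The URL pattern is deliberately crude, so a URL followed directly by punctuation may produce a false alarm.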
/caveman-stats — Reads your Claude Code session log, counts tokens saved, and prints a tweetable line. Updates the statusline badge.
The thing it is not
Caveman is not a jailbreak, a prompt injection, or a way to get around rate limits. It does not make the model smarter, speed up generation per token, or lower the provider’s price per token; it makes the responses shorter, which means fewer output tokens billed and faster time-to-answer in your terminal.
It also does not change how the model reasons. Thinking tokens, tool calls, code it writes — all of that is unchanged. The only thing that changes is how the model narrates its conclusions back to you.
If you are already happy with your agent’s verbosity, this is not for you. If you find yourself skimming past the first two sentences of every response to get to the actual answer, that is exactly the problem caveman solves.
Uninstalling
| |
Removes hooks, the plugin, and the flag file. Per-repo rule files written by --with-init (like .github/copilot-instructions.md) need to be deleted manually — caveman does not touch other projects.
The model’s reasoning has not gotten worse. The filler has just stopped taking up space on the way out. That is the whole trick, and it works.