Your Prompts Are Bleeding Money. Here's How to Build an Optimizer Agent on Telegram
A single commit message containing the string "HERMES.md" silently routed $200 of Claude Code usage to the wrong billing bucket. Meanwhile, a developer benchmarked a popular "compression prompt" plugin against two words ("be brief") — and found the plugin didn't beat the boring default. Your prompts have a cost problem. Here's how to fix it.
Two stories trending on Hacker News this week expose a painful truth about AI agent costs:
- **CVE-2026-31431 "Copy Fail"**: A commit message containing the string `HERMES.md` caused Claude Code API requests to route to expensive "extra usage" billing instead of included plan quota. One developer burned $200.98 in extra credits while their Max plan sat 87% unused — and no error message told them why.
- **"Be brief." beats a premium plugin**: Max Taylor benchmarked the popular "Caveman" Claude Code compression plugin (six modes, slash commands, intensity dials) against simply prepending "Be brief." to each prompt. Result: same quality (within 1.5%), same token count. The plugin didn't beat the two-word default on either axis.
These aren't edge cases. They're symptoms of the same problem: most people have no visibility into what their prompts actually cost, and no systematic way to optimize them.
How Invisible Costs Add Up
Every time you paste a verbose prompt into Claude, GPT, or your custom agent, you're paying for every token in the conversation — including the parts that don't add value. Here's what the numbers look like in practice:
| Pattern | Extra Tokens (per call) | 100 calls/day | 30 days |
|---|---|---|---|
| Context dump (full git log, entire file) | ~4,000 | 400K tokens | 12M tokens (~$60) |
| Verbose system prompt (>500 words) | ~800 | 80K tokens | 2.4M tokens (~$12) |
| Repetitive history (same prompt, daily) | ~2,000/turn | 200K tokens | 6M tokens (~$30) |
| Unnecessary "thinking" overhead | ~15% over baseline | — | ~$15 on $100 plan |
That's over $100/month in waste — per person. Scale across a team, and you're looking at serious budget bleed for nothing.
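The monthly figures in the table follow from simple arithmetic. As a sketch (assuming a blended rate of ~$5 per million tokens, which is illustrative, not any provider's actual pricing):

```python
def monthly_waste_usd(extra_tokens_per_call: int,
                      calls_per_day: int = 100,
                      days: int = 30,
                      usd_per_million_tokens: float = 5.0) -> float:
    """Estimate monthly cost of wasted tokens for one usage pattern."""
    wasted_tokens = extra_tokens_per_call * calls_per_day * days
    return wasted_tokens / 1_000_000 * usd_per_million_tokens

# Patterns from the table above
print(monthly_waste_usd(4_000))  # context dump: ~$60
print(monthly_waste_usd(800))    # verbose system prompt: ~$12
print(monthly_waste_usd(2_000))  # repetitive history: ~$30
```

Plug in your own call volume and per-token rate; the shape of the result is what matters, not the exact dollars.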
Why "Just Be Brief" Isn't the Full Solution
The Caveman benchmark showed that "Be brief." works, but it's a blunt instrument. On multi-step setup and security warnings, compression actually hurt — the model dropped warnings about irreversible actions to save tokens.
What you actually need is context-aware optimization. Different tasks need different token budgets:
- Bug diagnosis: Needs full error context, compressed trade-off analysis
- Code generation: Needs structural clarity, can skip marketing-speak
- Research/writing: Needs nuance, shouldn't compress heavily
- Security review: Must never compress warnings — ever
A "be brief" blanket approach saves tokens but erodes quality where it matters most. What you want is an optimizer that pre-processes your prompts, trims waste, preserves safety-critical instructions, and estimates cost — before the prompt ever hits the model.
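That routing logic can be sketched in a few lines. The task profiles and safety markers below are illustrative assumptions, not a definitive taxonomy:

```python
# Hypothetical per-task compression profiles (names are illustrative).
PROFILES = {
    "bug_diagnosis":   {"compress_prose": True},   # keep errors, trim analysis
    "code_generation": {"compress_prose": True},   # skip marketing-speak
    "research":        {"compress_prose": False},  # nuance matters
    "security_review": {"compress_prose": False},  # never trim warnings
}

# Phrases that flag a line as safety-critical (a rough heuristic).
SAFETY_MARKERS = ("irreversible", "warning", "destructive", "do not")

def may_compress(task: str, line: str) -> bool:
    """Return True only if this line is safe to compress or drop."""
    if any(marker in line.lower() for marker in SAFETY_MARKERS):
        return False  # safety-critical instructions always survive
    profile = PROFILES.get(task, {"compress_prose": False})
    return profile["compress_prose"]
```

The key design choice: the safety check runs before the task profile, so even an aggressive profile can never drop a warning.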
The Prompt: AI Prompt Cost Optimizer Agent
Copy this prompt into your OpenClaw-powered Telegram bot, then send it any prompt you use, and it will return an optimized version with a cost breakdown.
How to use:
- Deploy OpenClaw on GetClawCloud (60-second setup)
- Connect Telegram (built-in pairing, no config)
- Send this prompt as your agent's system prompt
- Then send any prompt you want optimized
💡 Works in any OpenClaw agent with web search. Paste your prompt and get an instant audit + optimized version.
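The exact wording of the optimizer prompt is yours to tune. As a minimal starting-point sketch (illustrative, not a canonical prompt), a system prompt along these lines covers the audit-and-rewrite loop:

```text
You are a Prompt Cost Optimizer. When the user sends a prompt:
1. Estimate its approximate token count and cost per call.
2. Classify the task (bug diagnosis, code generation, research,
   security review) and pick an appropriate compression level.
3. Produce a leaner version that removes redundancy but preserves
   all safety-critical instructions and technical constraints verbatim.
4. Report: original tokens, optimized tokens, % saved, and estimated
   monthly savings at the user's stated call volume.
Never compress or drop warnings about irreversible or destructive actions.
```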
Real Scenarios This Agent Handles
💰 "Here's my daily research prompt — what's it costing me?"
Paste any prompt you use regularly. The agent estimates cost per call and annual spend, then gives you a leaner version. A single optimization can save hundreds per year.
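The estimate behind that scenario is straightforward. A sketch, using the rough chars/4 token heuristic (real tokenizers differ) and an assumed $5-per-million-token rate:

```python
def prompt_cost(prompt: str, calls_per_day: int,
                usd_per_million_tokens: float = 5.0) -> dict:
    """Rough per-call and annual cost for a recurring prompt.

    Token count is approximated as len(text) / 4; swap in a real
    tokenizer for accurate numbers.
    """
    tokens = len(prompt) / 4
    per_call = tokens / 1_000_000 * usd_per_million_tokens
    return {
        "approx_tokens": round(tokens),
        "usd_per_call": per_call,
        "usd_per_year": per_call * calls_per_day * 365,
    }
```

Run your daily research prompt through this once and the annualized number usually makes the case for trimming on its own.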
🔧 "Optimize my code review prompt"
Your code review prompt has accumulated 400 words over months of tweaking. The agent strips the fluff, preserves technical constraints, and gives you a 200-word version that works just as well.
📊 "Compare my prompts"
Send multiple prompts (e.g., for research, writing, code review, analysis). The agent audits all of them, ranks them by efficiency, and shows which ones are wasting the most tokens.
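The ranking step reduces to sorting by estimated size. A minimal sketch, again using the chars/4 approximation:

```python
def rank_by_size(prompts: dict) -> list:
    """Rank named prompts by approximate token count, heaviest first."""
    sizes = {name: len(text) // 4 for name, text in prompts.items()}
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)

# The heaviest prompts are the best optimization targets.
```

The agent version adds a quality judgment on top of this, but the raw ordering already tells you where to look first.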
🏢 "Audit my team's system prompt"
Paste your team's shared system prompt. The agent checks for redundant instructions, asks what each sentence actually adds, and produces a lean version everyone can use.
📋 "Generate a cost report for my Claude Code usage"
Describe your workflow (calls per day, typical prompt length, model used). The agent estimates monthly spend and identifies the biggest optimization opportunities.
What the HERMES.md Bug Teaches About Prompt Awareness
The HERMES.md bug (CVE-2026-31431) was discovered through systematic binary search. A developer who noticed their Max plan wasn't being used correctly cloned affected repos, tested orphan branches, and isolated individual commit message strings until the exact trigger was found.
The scary part: no error message told them what was happening. The API just said "out of extra usage" while the dashboard showed 86%+ remaining weekly capacity.
If you don't measure your token consumption, you don't know what it's costing you. The model tells you what it output, not what it could have output for half the price.
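Measurement doesn't require a dashboard. A minimal per-call logger (the chars/4 estimator is a stand-in; a real tokenizer such as tiktoken gives accurate counts for OpenAI models, and Anthropic's tokenizer differs again):

```python
import json
import time

def log_usage(prompt: str, response: str,
              logfile: str = "token_log.jsonl") -> dict:
    """Append a per-call token record so spend can be audited later."""
    record = {
        "ts": time.time(),
        "prompt_tokens": len(prompt) // 4,    # rough estimate
        "response_tokens": len(response) // 4,
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Even rough numbers, logged consistently, would have surfaced the HERMES.md anomaly as a sudden jump in spend with no matching jump in usage.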
The Caveman Benchmark Proves Less Is More
Max Taylor's benchmark of the Caveman Claude Code plugin ran 24 prompts across 6 categories, 5 arms (baseline, "be brief", lite, full, ultra), with a separate Claude scoring every response. Key findings:
- Quality: All arms within 1.5% of each other (baseline: 0.985, brief: 0.985, ultra: 0.970)
- Key points: Every arm hit 100% of required key points
- Token savings: "Be brief" cut 34% — and the Caveman plugin matched it, not beat it
- Safety risk: On multi-step setup and security categories, compression modes were more variable — sometimes dropping warnings human reviewers need
The takeaway: just adding "be brief" saves you one-third of your tokens. But a context-aware optimizer saves you more without the safety risk — because it knows when not to compress.
Automate Your Prompt Optimization Workflow
Once you have the optimizer agent running in your Telegram bot, you can level up further:
- Weekly prompt audit: Schedule a cron job that reviews your last 7 days of prompts, identifies patterns, and suggests a better system prompt
- Pre-flight cost check: Route every prompt through the optimizer before it hits your main model — think of it as a lint step for prompt efficiency
- Team dashboard: Collect anonymized prompts from your team, run them through the optimizer, and share a weekly "cost savings leaderboard"
With OpenClaw's cron scheduling, all of this runs automatically and delivers results to your Telegram — no dashboards to check, no web UI to load.
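The pre-flight check in particular is easy to prototype locally. A sketch of a prompt "linter" (budget and filler phrases are assumptions to tune for your own workflow):

```python
MAX_TOKENS = 2_000  # assumed per-call budget
FILLER_PHRASES = ("please", "kindly", "i would like you to")

def preflight(prompt: str) -> list:
    """Lint a prompt before it reaches the model; return warnings."""
    warnings = []
    if len(prompt) // 4 > MAX_TOKENS:  # rough chars/4 token estimate
        warnings.append(f"prompt exceeds ~{MAX_TOKENS} token budget")
    for phrase in FILLER_PHRASES:
        if phrase in prompt.lower():
            warnings.append(f"filler phrase: {phrase!r}")
    if "\n\n\n" in prompt:
        warnings.append("excess blank lines")
    return warnings
```

Wire this in front of your main model call and an oversized or padded prompt gets flagged before it costs anything.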
Getting Started
Two steps, under one minute:
- Launch an OpenClaw agent on GetClawCloud — no VPS, no Docker, nothing to configure
- Paste the optimizer prompt above, then send any complex prompt for a cost audit
The same agent handles prompt optimization, research, monitoring, code review, and more — it's a single OpenClaw deployment that grows with your workflow.
Deploy Your Prompt Optimizer in 1 Minute
Launch OpenClaw on the cloud, connect Telegram, and paste the optimizer prompt. No server setup, no complex pipelines — just instant cost visibility and leaner prompts.
Start with GetClawCloud →