Your Prompts Are Bleeding Money. Here's How to Build an Optimizer Agent on Telegram
A single commit message containing the string "HERMES.md" silently routed $200 of Claude Code usage to the wrong billing bucket. Meanwhile, a developer benchmarked a popular "compression prompt" plugin against two words ("be brief") — and found the plugin didn't beat the boring default. Your prompts have a cost problem. Here's how to fix it.
Two stories trending on Hacker News this week expose a painful truth about AI agent costs:
- **CVE-2026-31431 "Copy Fail"**: A commit message containing the string `HERMES.md` caused Claude Code API requests to route to expensive "extra usage" billing instead of included plan quota. One developer burned $200.98 in extra credits while their Max plan sat 87% unused — and no error message told them why.
- **"Be brief." beats a premium plugin**: Max Taylor benchmarked the popular "Caveman" Claude Code compression plugin (six modes, slash commands, intensity dials) against simply prepending "Be brief." to each prompt. Result: same quality (within 1.5%), same token count. The plugin didn't beat the two-word default on either axis.
These aren't edge cases. They're symptoms of the same problem: most people have no visibility into what their prompts actually cost, and no systematic way to optimize them.
How Invisible Costs Add Up
Every time you paste a verbose prompt into Claude, GPT, or your custom agent, you're paying for every token in the conversation — including the parts that don't add value. Here's what the numbers look like in practice:
| Pattern | Extra Tokens (per call) | 100 calls/day | 30 days |
|---|---|---|---|
| Context dump (full git log, entire file) | ~4,000 | 400K tokens | 12M tokens (~$60) |
| Verbose system prompt (>500 words) | ~800 | 80K tokens | 2.4M tokens (~$12) |
| Repetitive history (same prompt, daily) | ~2,000/turn | 200K tokens | 6M tokens (~$30) |
| Unnecessary "thinking" overhead | ~15% over baseline | — | ~$15 on $100 plan |
That's over $100/month in waste — per person. Scale across a team, and you're looking at serious budget bleed for nothing.
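The monthly figures in the table follow from simple arithmetic. As a sketch (assuming a blended rate of ~$5 per million tokens, which is illustrative, not any provider's actual pricing):

```python
def monthly_waste_usd(extra_tokens_per_call: int,
                      calls_per_day: int = 100,
                      days: int = 30,
                      usd_per_million_tokens: float = 5.0) -> float:
    """Estimate monthly cost of wasted tokens for one usage pattern."""
    wasted_tokens = extra_tokens_per_call * calls_per_day * days
    return wasted_tokens / 1_000_000 * usd_per_million_tokens

# Patterns from the table above
print(monthly_waste_usd(4_000))  # context dump: ~$60
print(monthly_waste_usd(800))    # verbose system prompt: ~$12
print(monthly_waste_usd(2_000))  # repetitive history: ~$30
```

Plug in your own call volume and per-token rate; the shape of the result is what matters, not the exact dollars.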
Why "Just Be Brief" Isn't the Full Solution
The Caveman benchmark showed that "Be brief." works, but it's a blunt instrument. On multi-step setup and security warnings, compression actually hurt — the model dropped warnings about irreversible actions to save tokens.
What you actually need is context-aware optimization. Different tasks need different token budgets:
- Bug diagnosis: Needs full error context, compressed trade-off analysis
- Code generation: Needs structural clarity, can skip marketing-speak
- Research/writing: Needs nuance, shouldn't compress heavily
- Security review: Must never compress warnings — ever
A "be brief" blanket approach saves tokens but erodes quality where it matters most. What you want is an optimizer that pre-processes your prompts, trims waste, preserves safety-critical instructions, and estimates cost — before the prompt ever hits the model.
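That routing logic can be sketched in a few lines. The task profiles and safety markers below are illustrative assumptions, not a definitive taxonomy:

```python
# Hypothetical per-task compression profiles (names are illustrative).
PROFILES = {
    "bug_diagnosis":   {"compress_prose": True},   # keep errors, trim analysis
    "code_generation": {"compress_prose": True},   # skip marketing-speak
    "research":        {"compress_prose": False},  # nuance matters
    "security_review": {"compress_prose": False},  # never trim warnings
}

# Phrases that flag a line as safety-critical (a rough heuristic).
SAFETY_MARKERS = ("irreversible", "warning", "destructive", "do not")

def may_compress(task: str, line: str) -> bool:
    """Return True only if this line is safe to compress or drop."""
    if any(marker in line.lower() for marker in SAFETY_MARKERS):
        return False  # safety-critical instructions always survive
    profile = PROFILES.get(task, {"compress_prose": False})
    return profile["compress_prose"]
```

The key design choice: the safety check runs before the task profile, so even an aggressive profile can never drop a warning.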
The Prompt: AI Prompt Cost Optimizer Agent
Copy this prompt into your OpenClaw-powered Telegram bot, then send it any prompt you use, and it will return an optimized version with a cost breakdown.
How to use:
- Deploy OpenClaw on GetClawCloud (60-second setup)
- Connect Telegram (built-in pairing, no config)
- Send this prompt as your agent's system prompt
- Then send any prompt you want optimized
💡 Works in any OpenClaw agent with web search. Paste your prompt and get an instant audit + optimized version.
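The exact wording of the optimizer prompt is yours to tune. As a minimal starting-point sketch (illustrative, not a canonical prompt), a system prompt along these lines covers the audit-and-rewrite loop:

```text
You are a Prompt Cost Optimizer. When the user sends a prompt:
1. Estimate its approximate token count and cost per call.
2. Classify the task (bug diagnosis, code generation, research,
   security review) and pick an appropriate compression level.
3. Produce a leaner version that removes redundancy but preserves
   all safety-critical instructions and technical constraints verbatim.
4. Report: original tokens, optimized tokens, % saved, and estimated
   monthly savings at the user's stated call volume.
Never compress or drop warnings about irreversible or destructive actions.
```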
Real Scenarios This Agent Handles
💰 "Here's my daily research prompt — what's it costing me?"
Paste any prompt you use regularly. The agent estimates cost per call and annual spend, then gives you a leaner version. A single optimization can save hundreds per year.
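The estimate behind that scenario is straightforward. A sketch, using the rough chars/4 token heuristic (real tokenizers differ) and an assumed $5-per-million-token rate:

```python
def prompt_cost(prompt: str, calls_per_day: int,
                usd_per_million_tokens: float = 5.0) -> dict:
    """Rough per-call and annual cost for a recurring prompt.

    Token count is approximated as len(text) / 4; swap in a real
    tokenizer for accurate numbers.
    """
    tokens = len(prompt) / 4
    per_call = tokens / 1_000_000 * usd_per_million_tokens
    return {
        "approx_tokens": round(tokens),
        "usd_per_call": per_call,
        "usd_per_year": per_call * calls_per_day * 365,
    }
```

Run your daily research prompt through this once and the annualized number usually makes the case for trimming on its own.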
🔧 "Optimize my code review prompt"
Your code review prompt has accumulated 400 words over months of tweaking. The agent strips the fluff, preserves technical constraints, and gives you a 200-word version that works just as well.
📊 "Compare my prompts"
Send multiple prompts (e.g., for research, writing, code review, analysis). The agent audits all of them, ranks them by efficiency, and shows which ones are wasting the most tokens.
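The ranking step reduces to sorting by estimated size. A minimal sketch, again using the chars/4 approximation:

```python
def rank_by_size(prompts: dict) -> list:
    """Rank named prompts by approximate token count, heaviest first."""
    sizes = {name: len(text) // 4 for name, text in prompts.items()}
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)

# The heaviest prompts are the best optimization targets.
```

The agent version adds a quality judgment on top of this, but the raw ordering already tells you where to look first.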
🏢 "Audit my team's system prompt"
Paste your team's shared system prompt. The agent checks for redundant instructions, asks what each sentence actually adds, and produces a lean version everyone can use.
📋 "Generate a cost report for my Claude Code usage"
Describe your workflow (calls per day, typical prompt length, model used). The agent estimates monthly spend and identifies the biggest optimization opportunities.
What the HERMES.md Bug Teaches About Prompt Awareness
The HERMES.md bug (CVE-2026-31431) was discovered through systematic binary search. A developer who noticed their Max plan wasn't being used correctly cloned affected repos, tested orphan branches, and isolated individual commit message strings until the exact trigger was found.
The scary part: no error message told them what was happening. The API just said "out of extra usage" while the dashboard showed 86%+ remaining weekly capacity.
If you don't measure your token consumption, you don't know what it's costing you. The model tells you what it output, not what it could have output for half the price.
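Measurement doesn't require a dashboard. A minimal per-call logger (the chars/4 estimator is a stand-in; a real tokenizer such as tiktoken gives accurate counts for OpenAI models, and Anthropic's tokenizer differs again):

```python
import json
import time

def log_usage(prompt: str, response: str,
              logfile: str = "token_log.jsonl") -> dict:
    """Append a per-call token record so spend can be audited later."""
    record = {
        "ts": time.time(),
        "prompt_tokens": len(prompt) // 4,    # rough estimate
        "response_tokens": len(response) // 4,
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Even rough numbers, logged consistently, would have surfaced the HERMES.md anomaly as a sudden jump in spend with no matching jump in usage.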
The Caveman Benchmark Proves Less Is More
Max Taylor's benchmark of the Caveman Claude Code plugin ran 24 prompts across 6 categories, 5 arms (baseline, "be brief", lite, full, ultra), with a separate Claude scoring every response. Key findings:
- Quality: All arms within 1.5% of each other (baseline: 0.985, brief: 0.985, ultra: 0.970)
- Key points: Every arm hit 100% of required key points
- Token savings: "Be brief" cut 34% — and the Caveman plugin matched it, not beat it
- Safety risk: On multi-step setup and security categories, compression modes were more variable — sometimes dropping warnings human reviewers need
The takeaway: just adding "be brief" saves you one-third of your tokens. But a context-aware optimizer saves you more without the safety risk — because it knows when not to compress.
Automate Your Prompt Optimization Workflow
Once you have the optimizer agent running in your Telegram bot, you can level up further:
- Weekly prompt audit: Schedule a cron job that reviews your last 7 days of prompts, identifies patterns, and suggests a better system prompt
- Pre-flight cost check: Route every prompt through the optimizer before it hits your main model — think of it as a lint step for prompt efficiency
- Team dashboard: Collect anonymized prompts from your team, run them through the optimizer, and share a weekly "cost savings leaderboard"
With OpenClaw's cron scheduling, all of this runs automatically and delivers results to your Telegram — no dashboards to check, no web UI to load.
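The pre-flight check in particular is easy to prototype locally. A sketch of a prompt "linter" (budget and filler phrases are assumptions to tune for your own workflow):

```python
MAX_TOKENS = 2_000  # assumed per-call budget
FILLER_PHRASES = ("please", "kindly", "i would like you to")

def preflight(prompt: str) -> list:
    """Lint a prompt before it reaches the model; return warnings."""
    warnings = []
    if len(prompt) // 4 > MAX_TOKENS:  # rough chars/4 token estimate
        warnings.append(f"prompt exceeds ~{MAX_TOKENS} token budget")
    for phrase in FILLER_PHRASES:
        if phrase in prompt.lower():
            warnings.append(f"filler phrase: {phrase!r}")
    if "\n\n\n" in prompt:
        warnings.append("excess blank lines")
    return warnings
```

Wire this in front of your main model call and an oversized or padded prompt gets flagged before it costs anything.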
Getting Started
Two steps, under one minute:
- Launch an OpenClaw agent on GetClawCloud — no VPS, no Docker, nothing to configure
- Paste the optimizer prompt above, then send any complex prompt for a cost audit
The same agent handles prompt optimization, research, monitoring, code review, and more — it's a single OpenClaw deployment that grows with your workflow.
Deploy Your Prompt Optimizer in 1 Minute
Launch OpenClaw on the cloud, connect Telegram, and paste the optimizer prompt. No server setup, no complex pipelines — just instant cost visibility and leaner prompts.
Start with GetClawCloud →