← Back to Blog

Your Prompts Are Bleeding Money. Here's How to Build an Optimizer Agent on Telegram

A single commit message containing the string "HERMES.md" silently routed $200 of Claude Code usage to the wrong billing bucket. Meanwhile, a developer benchmarked a popular "compression prompt" plugin against two words ("be brief") — and found the plugin didn't beat the boring default. Your prompts have a cost problem. Here's how to fix it.

Published by GetClawCloud · April 30, 2026

Two stories trending on Hacker News this week expose a painful truth about AI agent costs:

  1. The HERMES.md billing bug (CVE-2026-31431): a single string in a commit message silently routed roughly $200 of Claude Code usage to the wrong billing bucket.
  2. The Caveman benchmark: a popular prompt-compression plugin failed to beat the two-word instruction "be brief" on most tasks.

These aren't edge cases. They're symptoms of the same problem: most people have no visibility into what their prompts actually cost, and no systematic way to optimize them.

How Invisible Costs Add Up

Every time you paste a verbose prompt into Claude, GPT, or your custom agent, you're paying for every token in the conversation — including the parts that don't add value. Here's what the numbers look like in practice:

| Pattern | Extra tokens (per call) | 100 calls/day | 30 days |
|---|---|---|---|
| Context dump (full git log, entire file) | ~4,000 | 400K tokens | 12M tokens (~$60) |
| Verbose system prompt (>500 words) | ~800 | 80K tokens | 2.4M tokens (~$12) |
| Repetitive history (same prompt, daily) | ~2,000/turn | 200K tokens | 6M tokens (~$30) |
| Unnecessary "thinking" overhead | ~15% over baseline | n/a | ~$15 on a $100 plan |

That's over $100/month in waste — per person. Scale across a team, and you're looking at serious budget bleed for nothing.
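The table's arithmetic is easy to reproduce. A minimal sketch; the flat $5 per million input tokens is an assumption implied by the table's dollar figures, not an official price:

```python
def monthly_waste(extra_tokens_per_call: int, calls_per_day: int,
                  price_per_m_tokens: float, days: int = 30) -> float:
    """Estimate the monthly dollar cost of wasted input tokens."""
    tokens = extra_tokens_per_call * calls_per_day * days
    return tokens / 1_000_000 * price_per_m_tokens

# Reproduce the "context dump" row: ~4,000 wasted tokens, 100 calls/day
print(monthly_waste(4_000, 100, 5.0))  # 60.0, i.e. ~$60/month
```

Plug in your own per-call waste and call volume to see where you land.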

Why "Just Be Brief" Isn't the Full Solution

The Caveman benchmark showed that "Be brief." works, but it's a blunt instrument. On multi-step setup and security warnings, compression actually hurt — the model dropped warnings about irreversible actions to save tokens.

What you actually need is context-aware optimization. Different tasks need different token budgets:

  1. Research and summarization tolerate aggressive compression.
  2. Bug diagnosis and code generation need repro steps, constraints, and technical detail kept intact.
  3. Security reviews and instructions around destructive operations should barely be compressed at all.

A "be brief" blanket approach saves tokens but erodes quality where it matters most. What you want is an optimizer that pre-processes your prompts, trims waste, preserves safety-critical instructions, and estimates cost — before the prompt ever hits the model.
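One way to make "context-aware" concrete is a per-task compression policy. In this sketch the task names mirror the optimizer prompt's classifier, but the thresholds and preserve-lists are illustrative assumptions, not measured values:

```python
# Illustrative compression policy keyed by task type. The reduction caps
# and preserve-lists here are assumptions for demonstration only.
COMPRESSION_POLICY = {
    "bug_diagnosis":   {"max_reduction": 0.40, "preserve": ["stack traces", "repro steps"]},
    "code_generation": {"max_reduction": 0.30, "preserve": ["constraints", "type signatures"]},
    "research":        {"max_reduction": 0.50, "preserve": ["source requirements"]},
    "security_review": {"max_reduction": 0.10, "preserve": ["all warnings, verbatim"]},
    "creative":        {"max_reduction": 0.20, "preserve": ["tone and style notes"]},
}

def allowed_reduction(task_type: str) -> float:
    # Unknown task types fall back to the safest (least aggressive) budget
    return COMPRESSION_POLICY.get(task_type, {"max_reduction": 0.10})["max_reduction"]
```

The key design choice is the fallback: when the optimizer can't classify a task, it compresses least, not most.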

The Prompt: AI Prompt Cost Optimizer Agent

Copy this prompt into your OpenClaw-powered Telegram bot, then send it any prompt you use, and it will return an optimized version with a cost breakdown.

How to use:

  1. Deploy OpenClaw on GetClawCloud (60-second setup)
  2. Connect Telegram (built-in pairing, no config)
  3. Send this prompt as your agent's system prompt
  4. Then send any prompt you want optimized
You are an AI Prompt Cost Optimizer Agent. Your job is to analyze prompts for token efficiency, identify waste, produce optimized versions, and estimate cost savings.

## Instructions

User will send you a prompt they use with an AI model. Process it as follows:

### Phase 1: Audit

Analyze the prompt and report:

1. **Total estimated tokens** (use ~1.3 tokens per word for English, 4 chars per token)
2. **Waste categories detected** (check all that apply):
   - 🟡 Fluff words: "Please", "kindly", "I'd like you to", "could you please"
   - 🟡 Redundant context: repeating instructions already in the user's system prompt
   - 🟡 Over-explanation: explaining basic concepts the model already knows
   - 🟡 Excessive examples: more than 3 examples for a pattern
   - 🟡 Nested instructions: multi-level bullet lists that can be flattened
   - 🟡 Empty politeness: greetings, sign-offs, conversational filler
   - 🔴 Safety-critical: Do NOT compress warnings about destructive operations, irreversible actions, or security-sensitive content
3. **Task type classification**: bug_diagnosis, code_generation, research, writing, security_review, creative, analysis

### Phase 2: Optimize

Produce an optimized version:

1. Remove all waste identified in Phase 1
2. Preserve all safety-critical instructions verbatim
3. Keep technical terms and constraints intact
4. Flatten nested structures where possible
5. Use concise, direct language
6. Remove conversational framing (greetings, sign-offs)
7. Target: reduce tokens by at least 30% without losing semantic intent

### Phase 3: Report

Present your output in this format:

## Cost Analysis
- Original: X tokens
- Optimized: Y tokens
- Savings: Z tokens (N%)
- Estimated cost at claude-opus-4: $A vs $B per call
- Estimated cost at gpt-4o: $C vs $D per call
- Estimated cost at claude-sonnet-4: $E vs $F per call

## Waste Log
| Token Type | Est. Waste | Removed? |
|---|---|---|
| Fluff/politeness | X tokens | ✅/❌ |
| Redundant context | X tokens | ✅/❌ |
| Over-explanation | X tokens | ✅/❌ |
| Excessive examples | X tokens | ✅/❌ |
| Nested instructions | X tokens | ✅/❌ |
| (Other) | X tokens | ✅/❌ |

## Optimized Prompt
```
[optimized prompt here]
```

## Task-Specific Guidance
[If applicable, note what compression level is safe for this task type]

## Safety Note
[If any safety-critical content was detected, confirm it was preserved verbatim]

### Rules

- Never compress safety warnings, destructive operation instructions, or security review content
- Flag if the prompt is already well-optimized (<15% possible savings)
- Use current API pricing for estimates (claude-opus-4: $15/M in / $75/M out, gpt-4o: $2.5/M in / $10/M out, claude-sonnet-4: $3/M in / $15/M out)
- Provide both markdown and plain-text versions of the optimized prompt
- For prompts under 50 words, note "minimal optimization possible"

## Start

User has sent a prompt. Begin Phase 1: Audit.

💡 Works in any OpenClaw agent with web search. Paste your prompt and get an instant audit + optimized version.
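If you want a quick sanity check before sending anything, the prompt's own heuristics (~1.3 tokens per word, ~4 characters per token) are easy to script. A rough sketch; real tokenizers will produce different counts:

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the optimizer prompt's heuristics:
    ~1.3 tokens per English word and ~4 characters per token.
    Averages the two estimates; real tokenizers will differ."""
    by_words = len(text.split()) * 1.3
    by_chars = len(text) / 4
    return round((by_words + by_chars) / 2)

def cost_per_call(text: str, price_per_m_input: float) -> float:
    """Estimated input cost in dollars of sending `text` once."""
    return estimate_tokens(text) / 1_000_000 * price_per_m_input

print(estimate_tokens("be brief"))  # 2
```

Two tokens for "be brief" versus hundreds for a bloated preamble: that gap is what the audit phase quantifies.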

Real Scenarios This Agent Handles

💰 "Here's my daily research prompt — what's it costing me?"
Paste any prompt you use regularly. The agent estimates cost per call and annual spend, then gives you a leaner version. A single optimization can save hundreds per year.

🔧 "Optimize my code review prompt"
Your code review prompt has accumulated 400 words over months of tweaking. The agent strips the fluff, preserves technical constraints, and gives you a 200-word version that works just as well.

📊 "Compare my prompts"
Send multiple prompts (e.g., for research, writing, code review, analysis). The agent audits all of them, ranks by efficiency, and shows which ones are wasting the most tokens.

🏢 "Audit my team's system prompt"
Paste your team's shared system prompt. The agent checks for redundant instructions, asks what each sentence actually adds, and produces a lean version everyone can use.

📋 "Generate a cost report for my Claude Code usage"
Describe your workflow (how many calls per day, typical prompt length, model used). The agent estimates monthly spend and identifies the biggest optimization opportunities.

What the HERMES.md Bug Teaches About Prompt Awareness

The HERMES.md bug (CVE-2026-31431) was discovered through systematic binary search. A developer who noticed their Max plan wasn't being used correctly cloned affected repos, tested orphan branches, and isolated individual commit message strings until the exact trigger was found.

The scary part: no error message told them what was happening. The API just said "out of extra usage" while the dashboard showed more than 86% of weekly capacity remaining.

If you don't measure your token consumption, you don't know what it's costing you. The model tells you what it output, not what it could have output for half the price.

The HERMES.md bug wouldn't have been caught for weeks (or months) if that developer hadn't been systematically checking their usage. Most teams have no equivalent audit. A prompt optimizer agent is the difference between "I wonder where my credits went" and "here's exactly how we save $X/month."
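The isolation technique itself is worth stealing. Here's a sketch of that binary search, where `triggers_bug` is a hypothetical probe that replays a subset of commit messages against the API and reports whether billing misroutes (it assumes exactly one trigger):

```python
def find_trigger(strings, triggers_bug):
    """Isolate a single offending string by binary search, the way the
    HERMES.md investigation narrowed commit messages to one trigger.
    `triggers_bug(subset)` is a hypothetical probe returning True when
    the bug reproduces with that subset. Assumes a single trigger."""
    if not triggers_bug(strings):
        return None  # bug not reproducible with this set
    while len(strings) > 1:
        mid = len(strings) // 2
        left = strings[:mid]
        # Keep whichever half still reproduces the bug
        strings = left if triggers_bug(left) else strings[mid:]
    return strings[0]
```

With N candidate strings this takes about log2(N) probes, which is why one determined developer could isolate the trigger in an evening rather than weeks.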

The Caveman Benchmark Proves Less Is More

Max Taylor's benchmark of the Caveman Claude Code plugin ran 24 prompts across 6 categories and 5 arms (baseline, "be brief", lite, full, ultra), with a separate Claude instance scoring every response. Key findings:

  1. "Be brief." alone cut token usage by roughly a third, matching the plugin's heavier presets on most tasks.
  2. Compression hurt on multi-step setup instructions and security warnings: the model dropped warnings about irreversible actions to save tokens.

The takeaway: "be brief" gets you the easy one-third savings. A context-aware optimizer saves more without the safety risk, because it knows when not to compress.
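A benchmark like this is simple to reproduce for your own prompts. A sketch of the arms-times-prompts loop, where `run_model` and `judge_score` are hypothetical stand-ins for real API calls:

```python
# Sketch of a Caveman-style benchmark loop. `run_model` and `judge_score`
# are hypothetical stand-ins for real model and judge API calls.
ARMS = {
    "baseline": "",
    "be_brief": "Be brief.",
    # "lite", "full", "ultra" would be the plugin's compression presets
}

def run_benchmark(prompts, run_model, judge_score):
    results = {}
    for arm, suffix in ARMS.items():
        scores, tokens = [], []
        for p in prompts:
            reply, n_tokens = run_model(f"{p}\n{suffix}".strip())
            scores.append(judge_score(p, reply))  # separate model grades quality
            tokens.append(n_tokens)
        results[arm] = {
            "mean_score": sum(scores) / len(scores),
            "mean_tokens": sum(tokens) / len(tokens),
        }
    return results
```

The crucial detail is the separate judge: token savings mean nothing unless a model other than the one being tested confirms quality held up.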

Automate Your Prompt Optimization Workflow

Once you have the optimizer agent running in your Telegram bot, you can level up further:

  1. Schedule a weekly audit of the prompts you use most.
  2. Track estimated savings over time as you trim and re-test.
  3. Re-run cost estimates whenever your call volume or model changes.

With OpenClaw's cron scheduling, all of this runs automatically and delivers results to your Telegram — no dashboards to check, no web UI to load.

Getting Started

Two steps, under one minute:

  1. Launch an OpenClaw agent on GetClawCloud — no VPS, no Docker, nothing to configure
  2. Paste the optimizer prompt above, then send any complex prompt for a cost audit

The same agent handles prompt optimization, research, monitoring, code review, and more — it's a single OpenClaw deployment that grows with your workflow.

Two trending HN stories this week prove that prompt awareness isn't optional anymore. The HERMES.md billing bug silently burned $200 because nobody was watching. The Caveman benchmark proved that raw token savings are table stakes — safety-aware optimization is the real skill. Paste the optimizer prompt into your Telegram bot and start auditing your prompts today.

Deploy Your Prompt Optimizer in 1 Minute

Launch OpenClaw on the cloud, connect Telegram, and paste the optimizer prompt. No server setup, no complex pipelines — just instant cost visibility and leaner prompts.

Start with GetClawCloud →