AI Agent Smart Code Search: Stop Burning Tokens on grep+read

Semble hit #6 on Hacker News with a shocking stat: AI agents waste up to 98% of tokens using grep+read to find code. Here's the smarter approach — and a Telegram agent that does semantic code search instead.

Published: May 18, 2026

This week, Semble landed on Hacker News with 313 points. It's an open-source code search library built for AI agents, and its core claim stopped me cold:

AI agents using grep+read waste ~98% of tokens on irrelevant code. Semble's approach retrieves exactly what's needed at 99% of the retrieval quality of a 137M-parameter transformer, using 200x less compute.

Think about what that means. Every time your AI coding agent searches for a function, a class, or an implementation detail, it's grepping the entire codebase and then reading full files to find what it needs. In a large repo — tens or hundreds of thousands of lines — that's thousands of wasted tokens per query. Across a day of agentic coding, it adds up to real money.

The Token Waste Problem

Here's the math. When an AI agent needs to find "how authentication is handled" in a codebase, the fallback is usually:

  1. grep for keywords like "auth" or "login" — returns dozens of matches
  2. Read each matching file to understand context — potentially thousands of lines
  3. Parse all that noise to extract the relevant 3-5 functions

Steps 2 and 3 are where the waste lives. The agent pays for every line it reads, even lines that have nothing to do with the question. Semble's benchmark shows that on a typical repo, grep+read consumes roughly 15,000 tokens per query vs. Semble's ~250 — a 98% reduction.
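
To make the cost of steps 2 and 3 concrete, here is a minimal sketch of the grep+read pattern run over an in-memory stand-in for a repo. The file contents, the helper names, and the 4-characters-per-token heuristic are illustrative assumptions, not Semble's benchmark methodology.

```python
import re

# Hypothetical in-memory stand-in for a repo: path -> file contents.
REPO = {
    "src/middleware/auth.py": (
        "def check_auth(request):\n    return request.user is not None\n"
        + "# unrelated helper code\n" * 200
    ),
    "src/views/login.py": (
        "def login_view(request):\n    pass\n" + "# template plumbing\n" * 300
    ),
    "docs/changelog.md": "Fixed login typo\n" + "older release notes\n" * 500,
    "src/billing/invoice.py": "def total(items):\n    return sum(items)\n" * 50,
}


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about 4 characters per token."""
    return len(text) // 4


def grep_then_read(pattern: str, repo: dict[str, str]) -> tuple[list[str], int]:
    """Step 1: grep for a keyword. Step 2: read every matching file in full."""
    regex = re.compile(pattern, re.IGNORECASE)
    matched = [path for path, text in repo.items() if regex.search(text)]
    tokens_read = sum(estimate_tokens(repo[path]) for path in matched)
    return matched, tokens_read


def matching_line_tokens(pattern: str, repo: dict[str, str], paths: list[str]) -> int:
    """Tokens on the lines that actually matched (a crude proxy for relevance)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return sum(
        estimate_tokens(line)
        for path in paths
        for line in repo[path].splitlines()
        if regex.search(line)
    )


matched, tokens_read = grep_then_read(r"auth|login", REPO)
relevant = matching_line_tokens(r"auth|login", REPO, matched)
waste = 1 - relevant / tokens_read

print(f"files read in full: {matched}")
print(f"tokens read: {tokens_read}, on matching lines: {relevant}")
print(f"share of tokens spent on non-matching lines: {waste:.1%}")
```

The exact percentage depends entirely on the toy data. The point is the shape of the cost: one matching line is enough to pull the whole file into context.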

| Approach | Tokens per Query | Latency | Setup |
|---|---|---|---|
| grep + read full files | ~15,000 | Seconds to minutes | None (built-in) |
| Semble (static BM25 + RRF) | ~250 | ~1.5 ms | pip install |
| Code transformer (137M params) | ~250 | ~300 ms | GPU required |

Translation: If your AI agent runs 100 code searches per day with grep+read, you're burning roughly 1.5 million tokens on noise. Semble drops that to 25,000: same result, no noise.
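
The arithmetic behind those figures can be checked in a few lines. The per-query token counts and the 100 queries per day come from the text above; the price per million tokens is a hypothetical placeholder, so substitute your own provider's rate.

```python
# Figures quoted in the post.
GREP_READ_TOKENS = 15_000   # tokens per query with grep + read
SEMBLE_TOKENS = 250         # tokens per query with Semble
QUERIES_PER_DAY = 100


def daily_tokens(tokens_per_query: int, queries_per_day: int) -> int:
    """Total tokens spent on code search in one day."""
    return tokens_per_query * queries_per_day


def cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count to dollars at a given (assumed) price."""
    return tokens / 1_000_000 * usd_per_million_tokens


reduction = 1 - SEMBLE_TOKENS / GREP_READ_TOKENS
grep_daily = daily_tokens(GREP_READ_TOKENS, QUERIES_PER_DAY)
semble_daily = daily_tokens(SEMBLE_TOKENS, QUERIES_PER_DAY)

print(f"reduction per query: {reduction:.1%}")      # 98.3%
print(f"grep+read per day:   {grep_daily:,}")       # 1,500,000
print(f"Semble per day:      {semble_daily:,}")     # 25,000

# Hypothetical price, for illustration only.
ASSUMED_USD_PER_MILLION = 3.0
print(f"daily cost at the assumed price: "
      f"${cost_usd(grep_daily, ASSUMED_USD_PER_MILLION):.2f} vs "
      f"${cost_usd(semble_daily, ASSUMED_USD_PER_MILLION):.2f}")
```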

Why Semantic Code Search Matters for AI Agents

The deeper insight here isn't just about tokens. It's about agent architecture. An agent that has to grep and read full files is an agent that spends most of its context window on irrelevant content. That leaves less room for actual reasoning, planning, and code generation.

Semantic code search changes the equation. Instead of matching keywords, it matches on meaning. Ask "how do we handle rate limiting?" and it returns the actual middleware function, not every file that mentions "limit." The agent gets what it needs in one shot, using roughly 98% fewer tokens.
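
The comparison table lists Semble as BM25 + RRF. Reciprocal rank fusion (RRF) merges several ranked lists by summing 1 / (k + rank) for each result, so a file ranked well by more than one method rises to the top. Below is a minimal sketch of that idea, assuming two hypothetical rankings (one keyword-based, one embedding-based) and the commonly used constant k = 60. The file names and the value of k are illustrative and not taken from Semble's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "how do we handle rate limiting?"
keyword_ranking = [          # e.g. from a BM25-style keyword scorer
    "docs/limits.md",
    "src/middleware/rate_limit.py",
    "src/config.py",
]
semantic_ranking = [         # e.g. from an embedding similarity search
    "src/middleware/rate_limit.py",
    "src/throttle.py",
    "docs/limits.md",
]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused[0])  # src/middleware/rate_limit.py
```

The middleware file wins because both rankings place it near the top, while the docs page that merely mentions "limit" only leads the keyword list.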

With OpenClaw, you can build a Smart Code Search Agent on Telegram that handles this for your team: paste a natural language query about any codebase, and the agent retrieves the exact code you need — no grep skills required.

The Prompt

Copy-paste this into your OpenClaw Telegram bot. Send any code search question (with or without a repo URL) and get back the files, functions, and line ranges most likely to contain the answer, each with a one-line summary.

You are a semantic code search assistant. Your job is to help the user find exactly the right code in a codebase using natural language queries, not grep keywords.

When the user sends a message, interpret it as one of these intents:

## 1. SEARCH QUERY (default)
"If the user describes code they're looking for, search a codebase for it."

Examples:
- "How does authentication work in my project?"
- "Find the rate limiting middleware"
- "Where's the payment webhook handler?"
- "Show me the database migration for users table"

Respond with:
- A clear, single-sentence interpretation of what they're looking for
- The specific file(s) and function(s) that likely contain the answer
- A summary of what the code does (not the full code, unless asked)
- If the query is ambiguous, ask for clarification before searching

## 2. REPO SETUP
"If the user provides a GitHub URL or path."

Guide them:
- For local repos: "Point your agent at the repo root with: semble search 'query' ./path/to/repo"
- For remote repos: "Use: semble search 'query' https://github.com/user/repo"
- Note that Semble handles cloning and indexing automatically for git URLs

## 3. OPTIMIZATION ADVICE
"If the user asks about token efficiency or how to set up code search for their agent."

Explain:
- Semble combines Model2Vec embeddings with BM25, fused via RRF
- It runs entirely on CPU with no API keys or GPU
- Set it up as an MCP server for Claude Code, Cursor, or Codex
- Or add the bash integration to AGENTS.md/CLAUDE.md

## Core Rules
- ALWAYS prioritize semantic understanding over keyword matching
- When listing results, group by file path and relevance
- If the query matches multiple possible areas, present the top candidates
- NEVER generate replacement code unless explicitly asked
- For each result, include: file path, line number, and one-line summary
- Ask clarifying questions when the query is vague

## Example Response Format

**Query understood:** Looking for the authentication middleware that validates JWT tokens

**Results:**
1. `src/middleware/auth.py:42-78` - JWT validation middleware, checks expiry and signature
2. `src/utils/tokens.py:15-33` - Token generation and refresh utilities
3. `config/auth_settings.py:1-20` - Auth configuration constants

**Summary:** Your auth pipeline validates JWTs in middleware, generates tokens in utils, and reads config from a settings file. The core logic is ~36 lines in auth.py.
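
If another tool needs to consume the agent's replies, the result lines in the example response format can be parsed into structured records. This is a minimal sketch that assumes the agent follows that format closely. `SearchHit`, `HIT_PATTERN`, and `parse_hits` are hypothetical names introduced here, and the separator between the location and the summary is matched loosely (em dash, hyphen, or colon).

```python
import re
from dataclasses import dataclass


@dataclass
class SearchHit:
    path: str
    start_line: int
    end_line: int
    summary: str


# Matches lines such as: 1. `src/middleware/auth.py:42-78` - JWT validation ...
HIT_PATTERN = re.compile(
    r"^\s*\d+\.\s*`(?P<path>[^`:]+):(?P<start>\d+)(?:-(?P<end>\d+))?`"
    r"\s*[\u2014:-]\s*(?P<summary>.+?)\s*$"
)


def parse_hits(reply: str) -> list[SearchHit]:
    """Extract every numbered result line; lines that do not match are skipped."""
    hits = []
    for line in reply.splitlines():
        match = HIT_PATTERN.match(line)
        if match is None:
            continue
        start = int(match["start"])
        # A single line number (no range) means start and end are the same.
        end = int(match["end"]) if match["end"] else start
        hits.append(SearchHit(match["path"], start, end, match["summary"]))
    return hits


sample_reply = (
    "**Results:**\n"
    "1. `src/middleware/auth.py:42-78` - JWT validation middleware\n"
    "2. `src/utils/tokens.py:15-33` \u2014 Token generation and refresh utilities\n"
    "3. `config/auth_settings.py:7`: Auth configuration constants\n"
)

for hit in parse_hits(sample_reply):
    print(hit.path, hit.start_line, hit.end_line, hit.summary)
```

Because a language model produces the reply, treat the parse as best effort and fall back to showing the raw text when no lines match.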

How to Use It

  1. Deploy OpenClaw on GetClawCloud
  2. Paste the prompt above as your agent's system prompt
  3. Send a natural language code search — the agent returns structured results

Pro tip: For maximum efficiency, install Semble on your development server and configure the bash integration in your AGENTS.md. Then your Telegram agent can delegate actual search queries to Semble for sub-2ms response times.
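
One way to wire up that delegation is a thin wrapper that shells out to the CLI. This sketch assumes the `semble search 'query' ./path/to/repo` invocation quoted in the prompt above and that `semble` is on the PATH. The function names and the timeout are hypothetical, and the output is returned as raw text because its format is not specified here.

```python
import shutil
import subprocess


def build_semble_command(query: str, repo: str) -> list[str]:
    """Build the argv list; passing a list avoids shell-quoting problems."""
    return ["semble", "search", query, repo]


def run_semble_search(query: str, repo: str, timeout_s: float = 30.0) -> str:
    """Run the search and return stdout unparsed (output format is an unknown here)."""
    if shutil.which("semble") is None:
        raise RuntimeError("semble CLI not found on PATH; install it first")
    result = subprocess.run(
        build_semble_command(query, repo),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Example call; the query and path are placeholders.
    print(run_semble_search("rate limiting middleware", "./path/to/repo"))
```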

Set Up the MCP Server (One-Time)

Semble slots into any MCP-compatible agent with a single command or a few lines of config. Run or add the snippet for your tool on your dev machine:

# For Claude Code
claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

# For Cursor (in .cursor/mcp.json)
{
  "mcpServers": {
    "semble": {
      "command": "uvx",
      "args": ["--from", "semble[mcp]", "semble"]
    }
  }
}

# For Codex CLI (in ~/.codex/config.toml)
[mcp_servers.semble]
command = "uvx"
args = ["--from", "semble[mcp]", "semble"]

Once configured, your agent can use Semble for code search instead of falling back to grep+read, with no extra prompting or per-project setup.
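
If you already have other servers in `.cursor/mcp.json`, pasting the snippet by hand risks overwriting them. A minimal sketch of a merge script follows, assuming the file uses the `mcpServers` layout shown above. The helper names are hypothetical, and the Semble entry is copied from the Cursor snippet.

```python
import json
from pathlib import Path

# Server entry copied from the Cursor snippet in this post.
SEMBLE_SERVER = {"command": "uvx", "args": ["--from", "semble[mcp]", "semble"]}


def add_semble_server(config: dict) -> dict:
    """Return a copy of the config with the semble entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["semble"] = SEMBLE_SERVER
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the existing JSON (if any), merge in semble, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_semble_server(config), indent=2) + "\n")


if __name__ == "__main__":
    update_config_file(Path(".cursor/mcp.json"))
```

Existing entries under `mcpServers` are preserved; only the `semble` key is added or replaced.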

Why Token Efficiency Is a Feature, Not a Detail

The AI agent economy runs on tokens. Every query you make costs something — either in API fees (for hosted models) or in context window space (for reasoning depth). A token-efficient agent isn't just cheaper; it's smarter, because it can spend its limited context on actual problem-solving instead of noise.

Semble's 98% reduction isn't an optimization you apply later. It's a fundamental architecture choice: give your agent the ability to find code semantically, and it will consistently produce better results, faster, at lower cost.

The best optimization you can make to an AI coding agent isn't a better model. It's better retrieval. Stop burning tokens on grep. Search with intent.

And because this runs on Telegram via OpenClaw, your entire team gets access from the chat app they already use. No new tools, no onboarding, no monthly seat fees.

Build Your Smart Code Search Agent

Stop burning tokens on grep+read. Deploy OpenClaw on GetClawCloud, paste the prompt above, and give your team a semantic code search agent in 5 minutes.

Start on GetClawCloud →