
AI Video Generation Agent: Give Your Agent an API Key, Watch It Make Videos

I gave my Telegram AI agent a Wavespeed API key and told it to generate a video. It found the docs, wrote a script, called the API, waited for it to finish, and delivered the result. No hand-holding. Here's exactly how.

Published May 18, 2026

The Setup

I had a simple goal: generate a short AI video using ByteDance's Seedance 2.0 model. I knew Wavespeed.ai hosted it behind an API. I had my API key ready.

Instead of manually curl-ing endpoints and stitching together a pipeline, I handed everything to my Telegram AI agent in one message:

"Use wavespeed.ai and bytedance/seedance-2.0 to generate a video. My API key is sk-xxxx. Here's my prompt: [describe the video]."

That was it. No step-by-step instructions. No explanation of Wavespeed's authentication flow. No Python template. One sentence, one API key.

What the Agent Did

Here's the full chain the agent executed autonomously:

Step	What the Agent Did
1	Visited docs.wavespeed.ai, found the /v1/video/generations endpoint
2	Discovered the required request format (model id, prompt, parameters, API key in Authorization header)
3	Wrote a Python script using `requests` to call the async generation endpoint
4	Extracted the `generation_id` from the initial response and started polling
5	Polled the status endpoint every 15 seconds until status changed to "completed"
6	Downloaded the resulting video URL and sent it back as a Telegram message
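For a sense of what the script in steps 3 to 5 might look like, here is a minimal sketch. It is not the agent's actual code. The endpoint path and the `generation_id` field come from the steps above; the host, the Bearer scheme, the `model_id` field name, the status route, the `failed` status, and the `video_url` field are all assumptions to check against docs.wavespeed.ai. The agent used `requests`; this version uses the standard library's `urllib` so it runs without extra installs.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.wavespeed.ai"      # assumed host; confirm in the docs
CREATE_PATH = "/v1/video/generations"      # endpoint named in step 1
MODEL_ID = "bytedance/seedance-2.0"


def call_json(method, url, api_key, payload=None):
    """Send one JSON request with the key in the Authorization header."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer scheme is an assumption
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def generate_video(api_key, prompt, call=call_json, sleep=time.sleep,
                   interval=15, max_polls=40):
    """Create a job, poll until it completes, and return the video URL."""
    job = call("POST", BASE_URL + CREATE_PATH, api_key,
               {"model_id": MODEL_ID, "prompt": prompt})
    generation_id = job["generation_id"]

    # Assumed status route: the create path plus the job id.
    status_url = f"{BASE_URL}{CREATE_PATH}/{generation_id}"
    for _ in range(max_polls):
        state = call("GET", status_url, api_key)
        if state.get("status") == "completed":
            return state["video_url"]  # assumed response field
        if state.get("status") == "failed":  # assumed failure status
            raise RuntimeError(f"generation failed: {state}")
        sleep(interval)
    raise TimeoutError(f"job {generation_id} not finished after {max_polls} polls")
```

With the defaults, `generate_video(api_key, prompt)` polls for up to 10 minutes (40 polls, 15 seconds apart) before giving up. Downloading the file and sending it to Telegram (step 6) are left out.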

Total time from "go" to "video received": ~4 minutes. The model took most of that time generating. The agent's part was done in under 30 seconds.

Important: The agent used the async endpoint. It didn't just fire-and-forget — it actively polled until the video was ready. This is the difference between a real agent and a simple API wrapper.

Why This Matters

Most people think AI agents are good for chat, summarization, and maybe some light web scraping. This experience shows a much more powerful pattern:

"Give the agent a tool (API key + docs URL) and a goal. It handles the rest — discovery, implementation, error handling, delivery."

The agent didn't ask for clarification. It didn't say "I can't write code." It treated video generation like any other problem: search the docs, understand the API, write the code, run it, wait, deliver.

This is the same pattern you'd use for any API-based task:

Image generation — pass a Replicate or Stability AI key
Transcription — hand it an AssemblyAI or Deepgram key
Voice cloning — ElevenLabs key + a reference clip
Data enrichment — any REST API with public docs
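What these tasks share is the shape "start a job, then wait for it". As a rough illustration, that waiting step can be written once and reused. The helper below is a hypothetical sketch, not part of any of the services named above; `fetch` and `is_done` are placeholders for whatever status call and completion check a given API's docs describe.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], T],
               is_done: Callable[[T], bool],
               interval: float = 15.0,
               timeout: float = 600.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> T:
    """Call fetch() every `interval` seconds until is_done(result) is true.

    Raises TimeoutError if the next wait would run past `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if clock() + interval > deadline:
            raise TimeoutError(f"gave up after about {timeout} seconds")
        sleep(interval)
```

A caller would pass something like `fetch=lambda: get_status(job_id)` and `is_done=lambda s: s["status"] == "completed"`, where `get_status` is a function you write for the API in question.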

The Prompt

This is the exact prompt I used. It's designed so the agent does everything — no boilerplate needed on your end.

You are an AI Video Generation Agent. Your only tool is the user's Wavespeed API key.

Your goal: generate a video using bytedance/seedance-2.0.

Rules:
1. Read the Wavespeed API docs first (https://docs.wavespeed.ai); do not assume the endpoint format.
2. Find the correct endpoints for:
   - Creating an async video generation job
   - Checking generation status
   - Retrieving the completed video URL
3. Write and execute a Python script that:
   - Sends a POST to the create endpoint with the user's prompt and model_id=bytedance/seedance-2.0
   - Extracts the generation_id from the response
   - Polls the status endpoint every 15 seconds
   - When status=completed, gets the video URL
4. Do NOT return a placeholder or a template. Generate the video for real.
5. If the API returns an error, read the error message, fix the issue, and retry.
6. Once the video URL is available, present it clearly with the prompt used.

API Key: {YOUR_WAVESPEED_API_KEY}
User's video prompt: {Describe what you want to see in the video}

Replace {YOUR_WAVESPEED_API_KEY} with your actual key, and {Describe what you want to see in the video} with your video idea. Send it to your agent. That's it.
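If you would rather fill in the two placeholders with a script than by hand, something like the following would do it. This is a small sketch: `fill_prompt` is a hypothetical helper, and the `WAVESPEED_API_KEY` environment variable in the usage note is an assumed name, not something Wavespeed defines.

```python
API_KEY_SLOT = "{YOUR_WAVESPEED_API_KEY}"
IDEA_SLOT = "{Describe what you want to see in the video}"


def fill_prompt(template: str, api_key: str, video_idea: str) -> str:
    """Return the prompt with both placeholders replaced.

    Fails loudly if a placeholder is missing or a value is empty, so a
    half-filled prompt is never sent to the agent.
    """
    for slot in (API_KEY_SLOT, IDEA_SLOT):
        if slot not in template:
            raise ValueError(f"placeholder missing from template: {slot}")
    if not api_key.strip() or not video_idea.strip():
        raise ValueError("api_key and video_idea must be non-empty")
    filled = template.replace(API_KEY_SLOT, api_key.strip())
    return filled.replace(IDEA_SLOT, video_idea.strip())
```

For example, `fill_prompt(template, os.environ["WAVESPEED_API_KEY"], "A futuristic city at sunset")` (after `import os`) keeps the key out of the script itself.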

How to Use It

Get a Wavespeed account — sign up at wavespeed.ai, grab your API key
Paste the prompt into a Telegram agent that can execute code
Send your video idea and wait ~4 minutes for the result

Real Output Example

I asked for: "A futuristic city at sunset with flying cars and neon signs. Cinematic lighting, slow pan." The agent returned a shareable video URL from Wavespeed within 4 minutes. No manual coding, no terminal, no cloud console, just Telegram.

This is a prime example of what I call Agent-as-Engineer: the agent treats APIs as tools, reads documentation as context, and executes multi-step workflows as easily as answering a question.

Run Your Own Video Generation Agent

Deploy this prompt on GetClawCloud in one click. Your agent gets web search, code execution, and file delivery — everything needed to turn an API key into a finished video.
