
Budget Alerts for Self-Hosted OpenClaw: Stop Runaway LLM Costs

Here's a story that plays out more often than anyone admits. A developer deploys an OpenClaw agent on a Friday afternoon. The agent has a skill that retries on failure. The upstream API starts returning intermittent errors. The retry logic kicks in, each retry making a fresh LLM call to reformat the request. By Monday morning, the agent has made 40,000 API calls and the Anthropic bill is north of $800.

Nobody noticed because the agent was self-hosted. No managed platform was watching the spend. No alerts fired. The agent just kept calling the API, burning through budget, returning errors to nobody because it was the weekend.

**This is the number one financial risk of running a self-hosted AI agent.** Not the server costs. Not the bandwidth. The LLM API spend that nobody is watching.

## Why runaway costs happen

LLM API pricing is per-token. Every call to Claude, GPT-4, or any other model costs money based on the input and output length. For a well-behaved agent handling normal requests, costs are predictable. Maybe $5-20 per day depending on volume.

But agents aren't always well-behaved. Here are the most common ways costs spiral:

### Retry loops

A skill fails, the agent retries, the retry fails, the agent retries again. Each retry is a full LLM call with the complete conversation context. If the failure is persistent (the upstream API is down, a required service is unreachable), the agent will keep retrying until it hits a retry limit. If there is no retry limit, or if the limit is set too high, costs accumulate fast.

**A single retry loop with a 60-second interval and no cap generates 1,440 LLM calls per day.** At $0.03 per call, that's about $43/day from one broken skill.

### Context window stuffing

Some skills pass large amounts of data into the LLM context: full email threads, long documents, database query results. If the data source grows unexpectedly (someone emails a 50-page PDF, a database query returns 10,000 rows instead of 100), the input tokens per call jump dramatically.

**A single call with a stuffed context window can cost 10-50x what a normal call costs.** If the skill runs frequently, this multiplies fast.

### Conversation memory bloat

Agents that maintain long conversation histories send that history with every new message. Over time, the conversation context grows, and every call gets more expensive. Without conversation pruning, a week-old conversation can cost 5x per message compared to a fresh one.

### Multi-agent chains

Orchestrated multi-agent setups multiply the problem. Agent A calls Agent B, which calls Agent C. Each hop is a separate LLM call. If the chain has a loop (Agent C asks Agent A for clarification), costs compound geometrically.

## The self-hosted monitoring gap

Cloud-hosted OpenClaw providers typically include spend dashboards and basic alerts. Self-hosted agents have none of this by default. You're calling the LLM API directly from your server, and the only way to see your spend is to check the provider's billing dashboard manually.

The problem with checking manually is that you don't check when things are going well. You check after you get the bill. By then, the damage is done.

What you need is **real-time spend monitoring with automatic alerts** that fires before you hit your budget limit, not after.

## Setting up budget alerts with ClawPulsar

ClawPulsar's cost monitoring works by connecting to your LLM provider's API to track actual spend in near-real-time. Here's how to set it up.

### Step 1: Connect your API keys

In the ClawPulsar dashboard, add your LLM provider API keys. ClawPulsar supports Anthropic, OpenAI, Google, Cohere, and any OpenAI-compatible endpoint. The keys are used read-only to check usage and billing data. They're stored encrypted and never used to make LLM calls.

### Step 2: Set your budget

Define your monthly budget for each provider. Be realistic. Check your last three months of billing to establish a baseline, then set the budget at 120-150% of your typical spend to allow for normal variation.

### Step 3: Configure alert thresholds

ClawPulsar supports multiple alert levels:

- **50% threshold**: informational alert. Everything is probably fine, but you're on pace to use your full budget. Good for awareness.
- **80% threshold**: warning alert. You're likely to exceed your budget if the current rate continues. Time to investigate.
- **100% threshold**: critical alert. You've hit your budget. Immediate action needed.
- **Rate spike alert**: fires when your hourly spend exceeds 3x the trailing 7-day average. This catches runaway loops before they eat your entire budget.

**The rate spike alert is the most important one.** Budget percentage alerts tell you about cumulative spend. The rate spike alert catches sudden cost explosions in real-time. A retry loop that starts at 2 AM fires the rate spike alert within an hour, not at the end of the month when you hit 100%.
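As a rough sketch, a rate-spike check could look like the following. The function name, the hourly spend history, and the 3x/168-hour parameters are illustrative assumptions for this post, not ClawPulsar's internals:

```python
# Sketch of a rate-spike check over hourly spend totals. The 3x
# multiplier and 7-day (168-hour) window mirror the alert described
# above; how you collect hourly spend is up to your setup.

def is_rate_spike(current_hour_spend: float,
                  trailing_hourly: list[float],
                  multiplier: float = 3.0) -> bool:
    """Fire when the current hour's spend exceeds `multiplier` times
    the average hourly spend over the trailing window."""
    if not trailing_hourly:
        return False  # no baseline yet, nothing to compare against
    baseline = sum(trailing_hourly) / len(trailing_hourly)
    return current_hour_spend > multiplier * baseline

# Example: baseline ~$0.50/hour, a runaway loop pushes this hour to $4.80.
history = [0.5] * 168
print(is_rate_spike(4.80, history))  # True
```

Run hourly from a cron job or scheduler, this kind of check catches a cost explosion within an hour of it starting.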

### Step 4: Choose alert channels

Send alerts to wherever you'll actually see them:

- **Email**: reliable but slow. Good for informational alerts.
- **Slack**: fast and visible to the whole team. Good for warnings.
- **PagerDuty / Opsgenie**: for critical alerts that need immediate attention, even at 3 AM.
- **Webhook**: trigger your own automation when an alert fires. Useful for auto-pausing the agent.

### Step 5: Set up auto-pause (optional but recommended)

ClawPulsar can automatically pause your agent's LLM calls when a critical threshold is hit. The agent stays running but returns a "temporarily unavailable" response instead of making API calls. This is the nuclear option, but it's better than a $2,000 surprise on your credit card.

Auto-pause can be configured per threshold. A common setup is: alert at 80%, auto-pause at 120% of budget. This gives you a window to investigate before the hard stop kicks in.
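The "alert at 80%, pause at 120%" setup can be sketched as a small decision function. The function and its thresholds are illustrative, not ClawPulsar's API:

```python
# Minimal sketch of the alert-vs-pause decision described above, given
# a monthly budget and cumulative month-to-date spend. The 0.8 / 1.2
# thresholds match the "alert at 80%, auto-pause at 120%" example.

def budget_action(spend: float, budget: float,
                  alert_at: float = 0.8, pause_at: float = 1.2) -> str:
    ratio = spend / budget
    if ratio >= pause_at:
        return "pause"   # hard stop: agent returns "temporarily unavailable"
    if ratio >= alert_at:
        return "alert"   # notify, keep serving while you investigate
    return "ok"

print(budget_action(130, 100))  # 'pause'
```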

## What to monitor beyond raw spend

Raw spend tells you how much you're paying. It doesn't tell you why. ClawPulsar tracks additional metrics that help you understand and optimize your costs:

### Cost per conversation

How much does a typical conversation cost? Track the average and the outliers. If your average conversation costs $0.05 but one conversation cost $3.00, you have a skill or workflow that needs optimization.

### Cost per skill

Which skills are the most expensive? A skill that makes multiple LLM calls per invocation (a research skill that searches, summarizes, and formats) costs more than a simple Q&A skill. Knowing the per-skill breakdown helps you target optimization efforts.

### Token usage trends

Are your average input tokens growing? This might indicate conversation memory bloat or context window stuffing. Are output tokens growing? The model might be getting more verbose, which costs money and usually means worse answers.

### Error-to-cost ratio

How much are you spending on failed requests? If 20% of your LLM spend goes to requests that ultimately fail (retries, timeouts, malformed responses), you have an efficiency problem worth fixing.

ClawPulsar surfaces all of these metrics in its [cost dashboard](/try). You can drill down by time period, skill, conversation, or provider.

## Practical cost optimization tips

Once you have visibility into your costs, here are the most effective optimizations:

**Set retry limits on every skill.** No skill should retry more than 3 times. Use exponential backoff between retries. If a call fails 3 times, log the error and move on. Don't burn money on a problem that won't fix itself.
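A capped retry wrapper with exponential backoff might look like this. It's a hedged sketch: `call_fn` stands in for whatever makes the LLM call in your skill.

```python
import time

# Capped retries with exponential backoff. After max_retries failures,
# the last exception propagates instead of looping forever.

def call_with_retries(call_fn, max_retries: int = 3, base_delay: float = 1.0):
    for attempt in range(max_retries):
        try:
            return call_fn()
        except Exception:
            if attempt == max_retries - 1:
                # Give up: log upstream and move on rather than
                # burning money on a problem that won't fix itself.
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```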

**Prune conversation history.** Keep the last 10-20 messages in context, not the entire conversation. Summarize older context instead of including it verbatim. This keeps input tokens manageable.
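One way to sketch pruning, assuming messages are role/content dicts; the summary placeholder stands in for a cheap summarization call in a real setup:

```python
# Keep the last N messages verbatim and collapse everything older into
# a single summary message. In practice the summary would come from a
# cheap LLM call; here it's a placeholder string.

def prune_history(messages: list[dict], keep_last: int = 20) -> list[dict]:
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {"role": "system",
               "content": f"[Summary of {len(older)} earlier messages]"}
    return [summary] + recent
```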

**Use the right model for each task.** Not every LLM call needs Claude Opus. Use Haiku or Sonnet for simple tasks (classification, formatting, short answers) and reserve Opus for complex reasoning. ClawPulsar's model routing feature can automate this.
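A minimal routing table might look like this; the task labels and model names are illustrative assumptions, not an actual routing config:

```python
# Illustrative task-to-model routing: cheap models for simple work,
# the expensive model only for complex reasoning.

MODEL_FOR_TASK = {
    "classify": "claude-haiku",
    "format": "claude-haiku",
    "qa": "claude-sonnet",
    "reasoning": "claude-opus",
}

def pick_model(task: str) -> str:
    # Fall back to a mid-tier model for unknown task types.
    return MODEL_FOR_TASK.get(task, "claude-sonnet")
```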

**Cache common responses.** If your agent answers the same FAQ 50 times a day, cache the response instead of making 50 identical LLM calls. Even a simple in-memory cache with a 1-hour TTL can cut costs dramatically.
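A minimal in-memory TTL cache along these lines (a sketch, not a production cache: no size bound, and expiry only happens on read):

```python
import time

# Simple in-memory cache with a TTL, keyed on the prompt string.
# Real setups might normalize or hash the key first.

class TTLCache:
    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.time() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and miss
            return None
        return value

    def set(self, key: str, value: str):
        self._store[key] = (time.time(), value)
```

Check the cache before each LLM call and store the response after; for a 50-a-day FAQ, that's one call per hour instead of 50 per day.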

**Monitor context window sizes.** Add instrumentation that logs the token count of every LLM call. Set alerts for calls that exceed a threshold (e.g., 10,000 input tokens). Investigate and optimize the outliers.
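A rough instrumentation sketch: `count_tokens` here uses a crude 4-characters-per-token heuristic, so swap in your provider's real tokenizer (e.g. tiktoken for OpenAI models) for accurate counts:

```python
import logging

TOKEN_ALERT_THRESHOLD = 10_000

def count_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Replace with the provider's tokenizer for real accuracy.
    return max(1, len(text) // 4)

def check_context_size(prompt: str) -> int:
    """Log a warning for oversized contexts; return the token count."""
    tokens = count_tokens(prompt)
    if tokens > TOKEN_ALERT_THRESHOLD:
        logging.warning("Oversized context: %d input tokens", tokens)
    return tokens
```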

## The cost of not monitoring

Let's do the math. An unmonitored self-hosted agent with a retry loop bug:

- **Retry interval**: 30 seconds
- **Calls per hour**: 120
- **Average cost per call**: $0.04
- **Cost per hour**: $4.80
- **Cost per day**: $115.20
- **Cost over a weekend**: $230.40
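The same numbers as a quick sanity-check function, using the assumptions above (30-second interval, $0.04 per call):

```python
# Back-of-the-envelope cost of an uncapped retry loop.

def runaway_cost(interval_s: float, cost_per_call: float, hours: float) -> float:
    """Total spend for a loop firing every `interval_s` seconds."""
    calls_per_hour = 3600 / interval_s
    return calls_per_hour * cost_per_call * hours

print(round(runaway_cost(30, 0.04, 48), 2))  # 48-hour weekend -> 230.4
```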

That's one bug, one weekend, one skill. Multiply by every skill with retry logic and every edge case that triggers a loop. The monitoring and alerting cost from ClawPulsar is a rounding error compared to a single runaway incident.

## Getting started

If you're running a self-hosted OpenClaw agent and you're not monitoring LLM spend, you're flying blind. The first surprise bill is a matter of when, not if. Head to [ClawPulsar's try page](/try) to connect your API keys and set up budget alerts. The setup takes less than 10 minutes, and the rate spike alert alone can save you hundreds of dollars the first time it catches a runaway loop.

Related posts

- How to Set Up a Webhook Relay for Self-Hosted OpenClaw
- Webhook Monitoring Best Practices for Production AI Agents
- Agent Uptime Monitoring: Why Internal Health Checks Are Not Enough
- OpenClaw Uptime Monitoring Without a Cloud Vendor