OpenClaw Rate Limiting: Staying Within API Budgets

OpenClaw rate limiting strategies to keep your AI agent within API quotas. Learn how to configure limits, handle 429s, and monitor token usage.

Mark Rachapoom
·9 min read

OpenClaw agents can burn through API quotas fast. A single multi-step task might make dozens of LLM calls, plus API calls to Stripe, GitHub, or Notion. Without rate limiting, you'll hit quota errors at the worst moments — or rack up a bill that surprises you at month end. This guide covers the practical controls you have, how to implement them, and how to monitor usage over time.

If you haven't set up DenchClaw yet, start with the setup guide. For background on how DenchClaw works, see what DenchClaw is.

The Two Types of Rate Limits You're Managing#

Rate limits in an OpenClaw deployment come from two sources:

1. LLM provider limits — Anthropic, OpenAI, or whichever LLM provider you use enforces per-minute and per-day token limits. Exceed them and you get 429 errors.

2. External API limits — Every third-party API your agent calls (Stripe, GitHub, Notion, etc.) has its own rate limits. These are independent of the LLM limits.

Managing both requires different strategies. LLM limits are about token budgets and request frequency. External API limits are about request volume and timing.

Understanding Your LLM Token Budget#

Check your current tier#

Before you can manage token usage, know your limits. Check your provider's console or account dashboard for your current tier. The limits that matter for OpenClaw:

  • TPM (tokens per minute) — affects how fast your agent can operate
  • RPM (requests per minute) — affects how many separate LLM calls per minute
  • TPD (tokens per day) — the daily budget ceiling
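As a back-of-envelope check, a TPM ceiling translates directly into a sustainable call rate. The figures below are assumptions for illustration, not real tier values:

```shell
# Assumed figures for illustration; substitute your tier's real numbers
TPM_LIMIT=80000        # tokens per minute on your tier
TOKENS_PER_CALL=5000   # rough average for one agent call

# How many LLM calls per minute you can sustain before hitting 429s
MAX_CALLS_PER_MIN=$((TPM_LIMIT / TOKENS_PER_CALL))
echo "Sustainable calls/min: $MAX_CALLS_PER_MIN"
```

If your agent makes calls faster than this, you will be throttled even though each individual call is well-formed.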

What burns tokens in OpenClaw#

Every agent response involves:

  1. System prompt (sent with every call) — ~500-2,000 tokens
  2. Conversation history (grows with each turn) — varies
  3. Tool outputs that get fed back — can be large (e.g., full file contents)
  4. The agent's response — typically 500-2,000 tokens

A simple question might use 3,000 tokens. A complex task with many tool calls and large outputs can use 50,000+.
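To see how a multi-tool task reaches that range, add the components up. The sizes below are assumed mid-range values, not measurements:

```shell
# Assumed per-component token counts; real numbers vary by task
SYSTEM_PROMPT=1500
HISTORY=4000
TOOL_OUTPUT=8000
RESPONSE=1000

PER_CALL=$((SYSTEM_PROMPT + HISTORY + TOOL_OUTPUT + RESPONSE))
TOOL_CALLS=10   # a moderately complex task

echo "Per call: $PER_CALL tokens"
echo "Task total: $((PER_CALL * TOOL_CALLS)) tokens"
```

Note that history and tool outputs are re-sent on every call, which is why long sessions with large outputs dominate the bill.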

Controlling Token Usage#

1. Keep the system prompt lean#

The system prompt is sent with every LLM call, and every skill you load adds to it. Be selective about which skills are active. Don't preload every skill; load them on demand:

# Instead of always having 10 skills loaded,
# load only what the current task needs
"Load the Stripe skill and check my recent charges."

2. Limit conversation history#

Long chat histories increase token usage per call. When a task is done, start a fresh session rather than continuing the same conversation:

# Open a fresh session for a new task
openclaw session new

DenchClaw sessions are cheap to create. Use them liberally — a fresh context is also a cleaner context for the agent.

3. Control tool output size#

Large tool outputs (reading big files, fetching long web pages) consume tokens. Instruct the agent to limit what it returns:

Read only the first 50 lines of the log file.
Fetch just the title and description from each URL, not the full page.
Return only records where status='active', not the entire table.
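The same trimming can be done in the shell before output re-enters the context window. A minimal sketch, with hypothetical file paths:

```shell
# Simulate a large log file, then keep only the first 50 lines
# (stands in for: head -n 50 on a real application log)
seq 1 500 > /tmp/app.log
head -n 50 /tmp/app.log > /tmp/app.head

LINES=$(wc -l < /tmp/app.head)
echo "Kept $LINES of 500 lines"
```

Trimming at the tool boundary is cheaper than asking the model to ignore content it has already been charged for.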

4. Use streaming for long responses#

For long operations, streaming lets you see progress without waiting for a massive response. Enable it in your DenchClaw config.

Implementing Rate Limiting in Skills#

When your agent calls external APIs in loops, rate limiting needs to be explicit in your skill files:

Add sleep to bulk operations#

In your skill markdown, always specify delays for bulk operations:

## Bulk Operation Pattern
 
When processing more than 10 items in a loop:
1. Process each item
2. Sleep 0.1 seconds between requests (10/sec rate)
3. If you receive a 429 response, sleep 2 seconds and retry once
4. After 3 consecutive failures, stop and report the error
 
Never fire more than 10 API requests per second to any external service.
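The pattern above can be sketched as a shell loop. Here `call_api` is a stub standing in for a real curl-based request, and the 25-item list is an assumption:

```shell
# Stub in place of a real API request; replace in your skill
call_api() { echo "ok"; }

failures=0
processed=0
for item in $(seq 1 25); do
  result=$(call_api "$item")
  if [ "$result" = "429" ]; then
    failures=$((failures + 1))
    if [ "$failures" -ge 3 ]; then
      echo "Stopping after 3 consecutive failures" >&2
      break
    fi
    sleep 2          # back off before trying again
  else
    failures=0       # a success resets the failure streak
    processed=$((processed + 1))
  fi
  sleep 0.1          # cap at roughly 10 requests/sec
done
echo "Processed $processed items"
```

Resetting the failure counter on success matters: three scattered 429s over a long run should not abort the job the way three consecutive ones do.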

Handle 429 responses#

# Pattern for shell-based API calls with retry on 429
make_api_call() {
  local url=$1
  local max_retries=3
  local retry=0

  while [ "$retry" -lt "$max_retries" ]; do
    response=$(curl -s -w "\n%{http_code}" "$url" ...)
    status_code=$(echo "$response" | tail -n 1)
    body=$(echo "$response" | sed '$d')   # everything except the status line

    if [ "$status_code" = "429" ]; then
      # Use a retry hint from the response body if the API provides one
      retry_after=$(echo "$body" | jq -r '.retry_after // 2')
      echo "Rate limited. Waiting ${retry_after}s..." >&2
      sleep "$retry_after"
      retry=$((retry + 1))
    else
      echo "$body"
      return 0
    fi
  done

  echo "Failed after $max_retries retries" >&2
  return 1
}

Include this pattern in your skill files for any high-volume external API calls.

Monitoring Token Usage#

Track per-session token usage#

DenchClaw logs session metadata including token counts. Query your usage from DuckDB:

-- Token usage by session (from logs)
SELECT 
  session_id,
  SUM(input_tokens) as total_input,
  SUM(output_tokens) as total_output,
  SUM(input_tokens + output_tokens) as total_tokens,
  DATE(created_at) as date
FROM session_logs
GROUP BY session_id, date
ORDER BY total_tokens DESC
LIMIT 20;

Set up usage alerts#

Create a simple script that checks your Anthropic API usage and alerts if you're approaching limits:

#!/bin/bash
# ~/.openclaw-dench/workspace/scripts/check-usage.sh

# Get today's usage (endpoint and response fields vary by provider;
# check your provider's current API docs)
USAGE=$(curl -s \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  "https://api.anthropic.com/v1/usage?date=$(date +%Y-%m-%d)")

TOKENS_USED=$(echo "$USAGE" | jq '.tokens_used // 0')
DAILY_LIMIT=1000000  # Adjust to your tier

PERCENT=$((TOKENS_USED * 100 / DAILY_LIMIT))

if [ "$PERCENT" -gt 80 ]; then
  echo "WARNING: ${PERCENT}% of daily token budget used ($TOKENS_USED / $DAILY_LIMIT)"
  # Optionally: send a notification
fi
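To run the check on a schedule, a cron entry works; this is a config fragment, and the script path is assumed from above:

```shell
# crontab -e: run the usage check at the top of every hour
0 * * * * /bin/bash $HOME/.openclaw-dench/workspace/scripts/check-usage.sh >> /tmp/usage-check.log 2>&1
```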

Track external API usage#

For external APIs, track usage in DuckDB:

-- Create an API call tracking table
-- (DuckDB needs a sequence to auto-generate ids)
CREATE SEQUENCE IF NOT EXISTS api_call_log_seq;
CREATE TABLE IF NOT EXISTS api_call_log (
  id INTEGER PRIMARY KEY DEFAULT nextval('api_call_log_seq'),
  service VARCHAR NOT NULL,
  endpoint VARCHAR,
  status_code INTEGER,
  tokens_used INTEGER,
  called_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
-- Query usage by service per day
SELECT 
  service,
  DATE(called_at) as date,
  COUNT(*) as calls,
  SUM(CASE WHEN status_code = 429 THEN 1 ELSE 0 END) as rate_limited
FROM api_call_log
GROUP BY service, date
ORDER BY date DESC, calls DESC;

Have your skills insert into this table when they make API calls.
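For example, a skill's shell step might compose the insert before handing it to the duckdb CLI. The service and endpoint values here are placeholders:

```shell
# Placeholder values; a real skill would fill these from the call it just made
SERVICE="stripe"
ENDPOINT="/v1/customers"
STATUS=200

SQL="INSERT INTO api_call_log (service, endpoint, status_code) VALUES ('$SERVICE', '$ENDPOINT', $STATUS);"
echo "$SQL"
# Then run it, e.g.:
# duckdb ~/.openclaw-dench/workspace/workspace.duckdb "$SQL"
```

Logging the status code is the important part: it is what makes the rate-limited-calls query above possible.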

Configuration Controls in OpenClaw#

Set a maximum tokens-per-session limit#

Add a session budget to your agent config to cap how much a single session can spend:

{
  "session": {
    "max_tokens_per_session": 100000,
    "max_tool_calls_per_session": 50,
    "timeout_minutes": 30
  }
}

When the session hits these limits, the agent stops and reports what it accomplished before being cut off. This prevents runaway sessions from consuming your entire daily budget.

Limit tool call depth#

For complex tasks that spawn subagents, limit nesting depth:

{
  "subagent": {
    "max_depth": 2,
    "max_concurrent": 3
  }
}

Deeper subagent trees multiply token usage. Most tasks don't need more than 2 levels of delegation.
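To see why, count the worst-case number of sessions a delegation tree can spawn under the settings above:

```shell
MAX_DEPTH=2        # levels of delegation below the root
MAX_CONCURRENT=3   # children each agent may spawn

# Sum agents per level: 1 root + 3 children + 9 grandchildren
TOTAL=0
AT_LEVEL=1
for depth in $(seq 0 "$MAX_DEPTH"); do
  TOTAL=$((TOTAL + AT_LEVEL))
  AT_LEVEL=$((AT_LEVEL * MAX_CONCURRENT))
done
echo "Worst case: $TOTAL sessions burning tokens"
```

Bumping max_depth to 3 with the same fan-out would add 27 more sessions, each re-sending its own system prompt and history.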

Cost-Saving Patterns#

Cache expensive queries#

If your agent frequently runs the same DuckDB query, cache the result:

# Cache a slow query result for 1 hour
CACHE_FILE="/tmp/crm_summary_$(date +%Y%m%d%H).json"
if [ ! -f "$CACHE_FILE" ]; then
  duckdb ~/.openclaw-dench/workspace/workspace.duckdb \
    "SELECT ..." > "$CACHE_FILE"
fi
cat "$CACHE_FILE"

Use fast models for simple tasks#

Not every task needs the most capable model. Route simple tasks to faster, cheaper models. In your skill files:

## Model Selection Guidance
 
For simple tasks (single lookups, status checks): use a fast model
For complex tasks (multi-step analysis, code generation): use the default model
For creative tasks (writing, synthesis): use the default model with thinking enabled

Batch API calls#

Instead of making 50 individual API calls, batch where the API supports it:

# Stripe: retrieve multiple customers in one call
curl -s -u "$STRIPE_API_KEY:" \
  "https://api.stripe.com/v1/customers?limit=100"
 
# Rather than 100 individual:
# curl ".../customers/cus_1"
# curl ".../customers/cus_2"
# ...
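For endpoints with cursor pagination (Stripe's list endpoints accept limit and starting_after), one loop over 100-item pages replaces hundreds of single fetches. A sketch where fetch_page is a stub serving 250 fake record ids:

```shell
# Stub serving 250 fake ids in pages of up to 100; a real version
# would curl with limit=100 and starting_after=<last id seen>
fetch_page() {
  local after=$1
  local end=$((after + 100))
  [ "$end" -gt 250 ] && end=250
  [ "$after" -ge 250 ] && return 0
  seq $((after + 1)) "$end"
}

after=0
total=0
while :; do
  page=$(fetch_page "$after")
  [ -z "$page" ] && break
  count=$(echo "$page" | wc -l)
  total=$((total + count))
  after=$(echo "$page" | tail -n 1)   # cursor = last id in the page
  [ "$count" -lt 100 ] && break       # a short page means we're done
done
echo "Fetched $total records in pages of up to 100"
```

Three requests instead of 250 keeps you well under any per-second limit without adding sleeps.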

Handling Rate Limit Errors Gracefully#

When you hit a rate limit, the agent should degrade gracefully rather than fail completely.

Add this to your skill files:

## Error Recovery
 
If you receive a rate limit error (429) from any API:
1. Stop making new requests immediately
2. Wait the time specified in Retry-After header (or 60 seconds if not provided)
3. Resume where you left off — do not restart from the beginning
4. Report how many items were processed before the rate limit hit
5. Ask the user if they want to continue or stop
 
For LLM rate limits: save your current progress to DuckDB before stopping.

Rate Limiting in Enterprise Deployments#

For the enterprise deployment scenario with multiple users:

  1. Per-user API keys — Give each user their own API key so you can track and limit per-person
  2. Shared key with quotas — If using shared keys, implement a token bucket in Redis or DuckDB that all instances check before calling
  3. Priority queues — Give human-interactive sessions priority over background batch jobs

FAQ#

Q: How do I know if I'm being rate limited vs. hitting an error? A: Rate limits return HTTP 429 with a Retry-After header. Check the response status code. A 500 is an API error; a 429 is rate limiting.

Q: Can I set different token budgets for different task types? A: Not natively, but you can implement this in your skill files by instructing the agent to estimate task complexity and abort if the estimated tokens exceed a threshold.

Q: Does DenchClaw cache LLM responses? A: Not automatically. You can implement caching in your skills using DuckDB or the filesystem, but the LLM layer itself doesn't deduplicate identical requests.

Q: What happens when a subagent hits a rate limit? A: The subagent's session fails and the error is reported back to the parent orchestrator. The parent can retry or take alternative action depending on the skill instructions.

Q: Is there a way to see real-time token usage during a session? A: Check the session logs in ~/.openclaw-dench/workspace/.openclaw/logs/. The JSONL format includes token counts per call if your LLM provider returns them in the response.

Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →

Written by Mark Rachapoom, building the future of AI CRM software.

© 2026 DenchHQ · San Francisco, CA