Why Claude Code Keeps Hitting Its Limit (And How to Stretch It)

Hamza Musa

14 Apr 2026 — 6 min read

If you’re running Claude Code daily, you’ve probably noticed the same annoyance we have: your message cap burns out way faster than it should. Anthropic rolled out the 1-million-token context window expecting it to smooth things out, but for a lot of teams, it’s actually made the problem worse.

We were hitting that wall constantly, so we dug into how the limits actually work, tracked down what’s secretly eating tokens, and built a set of fixes that keep Claude Code running through the workday.

If you’re tired of watching that counter drop before you finish a feature, this is for you. We also broke down a few extra field-tested tricks in our blog post: How to Survive the Claude Limit.

How the 5-Hour Window Actually Works

First, let’s clear up how the limit works. Claude doesn’t reset limits at midnight or when you close the app. It uses a rolling 5-hour window that starts the moment you send your first message. Every message you send, whether from desktop, web, or the CLI, counts toward your plan’s cap. The window keeps ticking even if you step away for two hours. When it finally resets, your quota returns.

Pro: ~45 messages per window
Max 5x: ~225 messages per window
Max 20x: ~900 messages per window

But those numbers are soft caps. Your actual mileage depends on three things: the model you pick, the kind of tasks you run, and server load. Anthropic also tightens limits during peak hours, so you might get throttled before you even finish your morning standup.
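
To make the window behaviour concrete, here is a minimal Python sketch of the model described above. It assumes the window is anchored to the first message and uses the approximate caps from the list; the class and field names are hypothetical, and the real limits shift with model, task, and load.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)

# Approximate per-window caps quoted above; treat them as soft numbers.
APPROX_CAPS = {"pro": 45, "max": 225, "max_20x": 900}


@dataclass
class UsageWindow:
    plan: str
    first_message_at: datetime
    messages_sent: int = 0

    @property
    def resets_at(self) -> datetime:
        # Anchored to the first message, not to midnight or an app restart.
        return self.first_message_at + WINDOW

    def remaining(self) -> int:
        return max(APPROX_CAPS[self.plan] - self.messages_sent, 0)

    def record(self, now: datetime) -> None:
        # A message sent after the reset time opens a fresh window.
        if now >= self.resets_at:
            self.first_message_at = now
            self.messages_sent = 0
        self.messages_sent += 1


window = UsageWindow(plan="pro", first_message_at=datetime(2026, 4, 14, 9, 0))
window.record(datetime(2026, 4, 14, 9, 0))
window.record(datetime(2026, 4, 14, 11, 30))  # stepping away does not pause the clock
print(window.resets_at)    # 2026-04-14 14:00:00
print(window.remaining())  # 43
```

The point of the sketch is the anchor: a long break in the middle of the window buys you nothing, because the reset time was fixed by the first message.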

What’s Secretly Eating Your Tokens

It’s not just the official limits. A few hidden drains are quietly chewing through your quota:

Truncated errors stick around. If you hit a rate limit mid-response, Claude sometimes keeps the broken reply in the context and retries. That dead weight stays in the window.
Skill listings bloat the prompt. The system injects full skill menus even when you only need one, wasting tokens on options you’re not using.
Context never shrinks on its own. Every new message ships the entire conversation history, system prompts, and tool definitions. It compounds fast.
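
A quick back-of-the-envelope calculation shows how fast that compounding adds up. The sketch below assumes a fixed overhead (system prompt plus tool definitions) and a constant number of tokens added per turn; both figures are hypothetical and only there to show the shape of the curve.

```python
def cumulative_input_tokens(turns: int, overhead: int, per_turn: int) -> int:
    """Total input tokens sent across `turns` messages when every message
    resends the fixed overhead plus the full history so far."""
    total = 0
    history = 0
    for _ in range(turns):
        history += per_turn          # the new exchange joins the history
        total += overhead + history  # and the whole thing is shipped again
    return total


# Hypothetical numbers: 5,000 tokens of overhead, 1,000 tokens added per turn.
for turns in (10, 20, 40):
    print(turns, cumulative_input_tokens(turns, overhead=5_000, per_turn=1_000))
# 10 105000
# 20 310000
# 40 1020000
```

Under these assumptions, going from 20 turns to 40 more than triples the total input, which is why trimming the history early pays off more than trimming it late.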

Session Commands That Keep You Under the Cap

You don’t need to rewrite your workflow. Just start using these commands strategically:

/clear – Wipes the context when a task is done. Don’t carry implementation details into testing.
/compact – Summarizes the conversation and keeps the summary in the window instead of the raw chat.
/btw – Short for “by the way.” Opens a side channel for quick questions. Keeps your main window clean and cuts token count on the next reply.
/rewind (or double-tap Escape) – Rolls back to before a bad turn. Stops Claude from sending broken code back into the context and wasting a retry.

Fix Your `CLAUDE.md` (Before It Fixes Your Budget)

Most teams treat CLAUDE.md like a manual. It shouldn’t be one. The whole file is loaded into context, so if it’s over 300 lines, it’s probably costing you tokens every single turn. Claude already knows how to run standard dev servers, read file trees, and work with common frameworks. Only include:

Things it shouldn’t do
Team-specific conventions
Edge cases or custom flags

For everything else, split it out. Put database schemas in one doc, UI rules in another, and link them from CLAUDE.md so Claude pulls them in only when needed. The same goes for project rules: scope them to specific paths so they don’t load globally. Use skills for repetitive workflows, and bundle deterministic steps in scripts so the model doesn’t burn tokens figuring out what a computer could just run.
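
If you want a quick check on whether your memory file has outgrown its budget, something like the following works. It is a rough sketch: the 300-line threshold comes from the rule of thumb above, the four-characters-per-token estimate is only a heuristic, and the function name is made up for illustration.

```python
def audit_memory_file(text: str, max_lines: int = 300) -> dict:
    """Report the size of a memory file and whether it exceeds the line budget."""
    lines = text.splitlines()
    non_blank = [line for line in lines if line.strip()]
    return {
        "lines": len(lines),
        "non_blank_lines": len(non_blank),
        # Rough heuristic only: about 4 characters per token for English text.
        "approx_tokens": len(text) // 4,
        "over_budget": len(lines) > max_lines,
    }


sample = "# Conventions\n\n- Never edit generated files\n- Use pnpm, not npm\n"
print(audit_memory_file(sample))
```

In practice you would pass in the contents of your own file, for example with `pathlib.Path(...).read_text()`, and run the check in CI so the file cannot quietly grow back.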

If you need a one-off instruction, use the --append-system-prompt flag when you launch Claude Code. It attaches to the current session only and disappears when you close it. Dumping it in CLAUDE.md means it stays forever, charging you for every message.
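
For teams that launch sessions from scripts, here is a small sketch of how that one-off instruction could be passed. It assumes the `claude` binary is on your PATH and that the flag is spelled `--append-system-prompt` in your version (check `claude --help`); the helper name and the example instruction are hypothetical.

```python
import shlex


def build_claude_command(instruction: str) -> list[str]:
    """Build an argv list that starts Claude Code with a session-only instruction."""
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    # Assumed flag name; confirm against `claude --help` for your version.
    return ["claude", "--append-system-prompt", instruction]


cmd = build_claude_command("For this session, only touch files under src/billing/.")
print(shlex.join(cmd))  # shell-safe form you can paste into a terminal
```

Passing the list to `subprocess.run(cmd)` would start the session. Building an argv list instead of a single string avoids quoting problems when the instruction contains spaces or punctuation.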

`.claude` Config Tweaks That Actually Matter

Open your .claude folder and adjust these settings:

disable_prompt_caching: false – Keeps prompt caching on. Repeated prefixes are read from cache at a reduced rate, so you pay full price only for new content.
auto_memory: false – Stops background analysis from stuffing extra files into your context.
disable_background_tasks: true – Kills background indexing, memory refactoring, and dream processes that run silently and eat tokens.
disable_thinking: true (when appropriate) – Thinking mode adds internal reasoning steps. If the task is straightforward, turn it off. If it needs light reasoning, lower the effort level instead of leaving it on auto.
max_output_tokens – Set a hard cap on response length so a single rambling reply can’t eat a large chunk of your budget.
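
If you manage these settings across several machines, merging them programmatically is safer than hand-editing. The sketch below is only illustrative: the key names are copied as written in the list above and should be confirmed against the settings reference for your Claude Code version, while the `theme` key and the 4096 cap are placeholders.

```python
import json

# Key names copied verbatim from the list above; verify them before relying on them.
TOKEN_SAVING_OVERRIDES = {
    "disable_prompt_caching": False,
    "auto_memory": False,
    "disable_background_tasks": True,
    "max_output_tokens": 4096,  # hypothetical cap, pick your own
}


def apply_overrides(settings_json: str, overrides: dict) -> str:
    """Merge overrides into an existing settings document, keeping unrelated keys."""
    settings = json.loads(settings_json) if settings_json.strip() else {}
    settings.update(overrides)
    return json.dumps(settings, indent=2, sort_keys=True)


existing = '{"theme": "dark", "auto_memory": true}'  # placeholder existing settings
print(apply_overrides(existing, TOKEN_SAVING_OVERRIDES))
```

The merge keeps whatever else is already in the file and only overwrites the keys you name, so it can be rerun without side effects.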

Model Choice & Effort Settings

Opus is powerful, but it burns through tokens ~3x faster than Sonnet. Save it for architecture reviews or complex debugging. For day-to-day coding, stick to Sonnet. For quick refactors, lint fixes, or boilerplate, drop down to Haiku.

Pair that with manual effort settings: low for simple edits, medium for feature work, high only when you actually need deep reasoning. Default is auto, which often picks heavier reasoning than the job requires.
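
One way to make this a habit is to write the routing down as a lookup table. The sketch below is our reading of the guidance above, not an official mapping: the task labels are hypothetical, and `opus`, `sonnet`, and `haiku` stand for the tiers rather than exact model identifiers.

```python
from typing import NamedTuple


class Choice(NamedTuple):
    model: str
    effort: str


# Hypothetical task labels mapped onto the tiers and effort levels described above.
ROUTING = {
    "architecture_review": Choice("opus", "high"),
    "complex_debugging": Choice("opus", "high"),
    "feature_work": Choice("sonnet", "medium"),
    "simple_edit": Choice("sonnet", "low"),
    "quick_refactor": Choice("haiku", "low"),
    "lint_fix": Choice("haiku", "low"),
    "boilerplate": Choice("haiku", "low"),
}


def pick(task: str) -> Choice:
    # Unknown tasks fall back to the day-to-day default, not the most expensive tier.
    return ROUTING.get(task, Choice("sonnet", "medium"))


print(pick("lint_fix"))
print(pick("something_new"))
```

The fallback matters as much as the table: anything unclassified lands on the mid-tier default instead of silently escalating.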

Filter the Noise with Hooks

Hooks are underrated. If your test suite prints 50 passed and 2 failed, Claude gets all 52 lines in context. Write a quick hook that strips out the passed tests before they hit the prompt. You can do the same for logs, build output, or lint warnings. Only feed the model what it needs to fix.
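
Here is a minimal sketch of the filtering step such a hook could run. It assumes a test runner that marks results with words like PASSED and FAILED, as pytest does in verbose mode; the patterns and function name are illustrative, and how the script is wired into your hook configuration is left out because it depends on your setup.

```python
import re

# Assumed markers for passing and failing lines; adjust to your test runner.
PASS = re.compile(r"\b(PASSED|passed|ok)\b")
FAIL = re.compile(r"\b(FAILED|failed|ERROR|error)\b")


def strip_passed(output: str) -> str:
    """Drop lines that only report passing tests; keep failures and summaries."""
    kept = [
        line
        for line in output.splitlines()
        # A line that mentions a failure is always kept, even if it also says "passed".
        if FAIL.search(line) or not PASS.search(line)
    ]
    return "\n".join(kept)


raw = """tests/test_cart.py::test_add PASSED
tests/test_cart.py::test_remove PASSED
tests/test_cart.py::test_total FAILED
2 passed, 1 failed"""

print(strip_passed(raw))
# tests/test_cart.py::test_total FAILED
# 2 passed, 1 failed
```

Checking for failure markers first means the summary line survives, so the model still sees the overall count without the noise of every passing test.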

Wrap Up

None of this is about working around the limits. It’s about working smarter inside them. Clean up your context, strip the dead weight, and configure the system to only load what’s necessary. Your 5-hour window will stretch further, and you’ll actually finish tasks before the counter hits zero.

We’ve collected a few more tips in our blog post: How to Survive the Claude Limit. If you’re shipping daily or managing a team, it’s worth a quick read.

Let me know what setup you’re running and where the limits usually catch you. We’re tweaking ours weekly, and I’ll share what actually sticks.