Claude Code Performance and Cost Optimization Guide: Making Your AI Coding Assistant Fast and Efficient
2026-03-18 · 17 min read · AI · Engineering
Introduction
In previous posts, we covered context management operations in detail, discussed how to write effective prompts, and touched on basic cost control strategies in the CI/CD guide. But we've been missing a post that systematically addresses optimization from a performance and cost perspective.
Claude Code is powerful, but it's not free. Every conversation, every file read, every code generation consumes tokens. Used well, it's a productivity multiplier. Used poorly, the bill will sting.
The goal of this post is straightforward: maximize efficiency and minimize cost without sacrificing output quality.
Understanding Costs: Token Pricing Mechanics
To optimize costs, you first need to understand how costs are generated.
What Are Tokens
Tokens are the basic units that large language models use to process text. They're not characters or words — they're "fragments" produced by the model's tokenizer.
A key insight: Chinese text consumes significantly more tokens than English. The same semantic content typically requires 1.5-2x as many tokens in Chinese, because Chinese characters have lower coverage in the model's vocabulary and are often split into multiple tokens.
# English: 2 tokens
"Hello world" → ["Hello", " world"]
# Chinese: ~4-8 tokens
"你好世界" → ["你", "好", "世", "界"] (each character may be 1-2 tokens)
If you primarily interact with Claude in Chinese, your token consumption will be higher than English users. Worth considering when writing your CLAUDE.md and prompts.
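If you want a quick sanity check before sending a long prompt, a rough character-based estimate is often enough. The sketch below assumes ~4 ASCII characters per token and ~1.5 tokens per CJK character (ballpark ratios, not the model's actual tokenizer):

```typescript
// Rough token estimator. Real tokenizers vary by model, so treat these
// ratios (~4 ASCII chars per token, ~1.5 tokens per CJK char) as
// ballpark assumptions, not exact counts.
function estimateTokens(text: string): number {
  let ascii = 0;
  let cjk = 0;
  for (const ch of text) {
    if (/[\u4e00-\u9fff]/.test(ch)) cjk++;
    else ascii++;
  }
  return Math.ceil(ascii / 4 + cjk * 1.5);
}

console.log(estimateTokens("Hello world")); // → 3
console.log(estimateTokens("你好世界"));     // → 6
```

Even this crude heuristic makes the Chinese-vs-English gap visible: four CJK characters cost about as much as a dozen ASCII ones.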
Token Consumption Breakdown
Every time you interact with Claude Code, token consumption consists of these components:
| Component | Description | Consumed Every Turn? |
|---|---|---|
| System prompt | Claude Code's built-in instructions | Yes (sent every turn) |
| CLAUDE.md | Your project configuration | Yes (sent every turn) |
| Conversation history | All previous conversation content | Yes (accumulates) |
| User input | Your current message | Yes |
| Tool call results | File reads, command outputs, etc. | Yes (accumulates) |
| Model output | Claude's response and code | Yes |
The critical point: conversation history and tool call results accumulate. The longer the session, the more tokens each turn consumes, because all previous content is re-sent to the model.
Input vs Output Price Difference
In Claude's pricing model, output tokens are significantly more expensive than input tokens (5x across the current lineup, as the pricing table below shows). This means:
- Having Claude generate verbose, redundant code = burning money
- Having Claude output concise, precise answers = saving money
- Explicitly requesting "only output the key code, no explanations" in your prompt can significantly reduce output tokens
Model Price Comparison
| Model | Input Price (per MTok) | Output Price (per MTok) | Capability |
|---|---|---|---|
| Opus | $15 | $75 | Strongest reasoning, complex architecture |
| Sonnet | $3 | $15 | Daily development, best value |
| Haiku | $0.25 | $1.25 | Simple tasks, maximum efficiency |
To put it in perspective: Opus output costs 5x Sonnet and 60x Haiku.
A Real Session Cost Breakdown
Consider a typical bug-fixing session:
Session flow:
1. You describe the bug (200 tokens input)
2. Claude reads 3 files (3000 tokens tool results)
3. Claude analyzes and asks questions (500 tokens output)
4. You provide more info (100 tokens input)
5. Claude reads 2 more files (2000 tokens tool results)
6. Claude proposes a fix (800 tokens output)
7. You confirm execution (50 tokens input)
8. Claude modifies code (600 tokens output)
Cumulative consumption:
- Turn 1: system prompt(2000) + CLAUDE.md(500) + input(200) = 2700 input + analysis output
- Turn 4: all above + conversation history + new input ≈ 8000 input + proposal output
- Turn 7: all accumulated ≈ 12000 input + code output
With Sonnet: ~$0.05-0.08
With Opus: ~$0.25-0.40
The longer the session, the faster the per-turn cost grows. This is why session management matters so much.
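To make the accumulation concrete, here is a small sketch using the illustrative token counts from the session above. Each turn re-sends the fixed prefix plus all accumulated history, so the billed input grows even when the new user message is tiny:

```typescript
// How re-sent history inflates per-turn input tokens.
// All numbers are illustrative, mirroring the bug-fix session above.
const FIXED_PREFIX = 2500; // system prompt (2000) + CLAUDE.md (500)

// Each turn: [new input tokens (message + tool results), new output tokens]
const turns: [number, number][] = [
  [200 + 3000, 500], // describe bug + 3 file reads → analysis
  [100 + 2000, 800], // more info + 2 file reads → proposed fix
  [50, 600],         // confirm → code changes
];

let history = 0;
let billedInput = 0;
let billedOutput = 0;
for (const [input, output] of turns) {
  // Every turn re-sends the fixed prefix plus all accumulated history.
  billedInput += FIXED_PREFIX + history + input;
  billedOutput += output;
  history += input + output; // tool results and outputs accumulate too
}
console.log({ billedInput, billedOutput }); // → { billedInput: 23150, billedOutput: 1900 }
```

Note that the third turn's user message is only 50 tokens, yet the turn bills over 9000 input tokens. Prompt caching (covered later in this post) softens this, but the growth pattern is why long sessions get expensive.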
Model Selection Strategy
Choosing the right model is the first step in cost optimization — and the most commonly overlooked. Many people pick one model and stick with it for everything, but different tasks call for different models.
The Three Models
Opus — The deep thinker:
- Complex architecture design and refactoring
- Large-scale changes spanning multiple files
- Deep reasoning for tricky bug investigations
- Security audits and code reviews
Sonnet — The all-rounder:
- Day-to-day feature development
- Medium-complexity bug fixes
- Code explanations and documentation
- Best price-to-performance ratio for most coding tasks
Haiku — The quick executor:
- Simple code formatting
- Variable renaming, type annotations
- Generating boilerplate code
- Simple Q&A and lookups
Choosing Models by Task Type
| Task Type | Recommended Model | Reason |
|---|---|---|
| New feature (simple) | Sonnet | Best value |
| New feature (complex architecture) | Opus | Needs deep reasoning |
| Bug fix (straightforward) | Sonnet | Sufficient and fast |
| Bug fix (mysterious) | Opus | Needs cross-file reasoning |
| Code refactoring | Sonnet/Opus | Depends on complexity |
| Writing tests | Sonnet | Pattern-based task |
| Code review | Sonnet | Sufficient |
| Generating boilerplate | Haiku | Simple repetitive task |
| Documentation/comments | Haiku | No deep reasoning needed |
| Explaining code | Sonnet | Needs comprehension |
Configuring Models in settings.json
You can set a default model in settings.json to avoid manual switching every time (see the settings guide for details):
{
"model": "claude-sonnet-4-20250514",
"smallFastModel": "claude-haiku-4-5-20251001"
}

smallFastModel is used for Claude Code's internal lightweight tasks (like auto-completion and file summaries). Setting it to Haiku further reduces costs.
Switching Models Mid-Session
Use the /model command to switch models on the fly:
# Start with Sonnet for daily development
> Help me implement user login
# Hit a complex problem, switch to Opus
/model opus
> This auth flow has a race condition, help me analyze it
# Problem solved, switch back to Sonnet
/model sonnet
> Great, now write the fix

Practical advice: Default to Sonnet, switch to Opus only when Sonnet can't handle the problem. For most daily development tasks, the output quality difference between Sonnet and Opus is marginal, but the cost difference is 5x.
Same Task, Different Model Costs
Take "refactoring a 200-line React component" as an example:
| Metric | Opus | Sonnet | Haiku |
|---|---|---|---|
| Input tokens | ~8000 | ~8000 | ~8000 |
| Output tokens | ~3000 | ~3000 | ~3000 |
| Input cost | $0.12 | $0.024 | $0.002 |
| Output cost | $0.225 | $0.045 | $0.00375 |
| Total cost | $0.345 | $0.069 | $0.00575 |
| Quality | Excellent | Good | Adequate |
For this kind of medium-complexity task, Sonnet is the optimal choice — quality close to Opus at 1/5 the cost.
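As a sanity check, the table's numbers can be reproduced with a few lines of arithmetic, using the per-MTok prices from the model comparison table earlier and the estimated token counts for this task:

```typescript
// Per-MTok prices from the model comparison table above.
const PRICES = {
  opus:   { input: 15,   output: 75 },
  sonnet: { input: 3,    output: 15 },
  haiku:  { input: 0.25, output: 1.25 },
} as const;

// Dollar cost of a task given its input/output token counts.
function taskCost(
  model: keyof typeof PRICES,
  inputTokens: number,
  outputTokens: number,
): number {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// The 200-line component refactor: ~8000 input, ~3000 output tokens.
console.log(taskCost("opus", 8000, 3000));   // → 0.345
console.log(taskCost("sonnet", 8000, 3000)); // → 0.069
console.log(taskCost("haiku", 8000, 3000));  // → 0.00575
```

The same helper works for any task estimate — plug in your own token counts to compare models before committing to Opus.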
Context Optimization
If model selection determines the "unit price," context management determines the "volume." Context is the biggest cost variable — for the same task, someone with good context management might spend only 1/3 of the tokens.
For detailed context management commands and techniques, see the context management guide. Here we focus on the "why" and the strategies from a cost perspective.
Common Causes of Context Bloat
| Cause | Impact | Solution |
|---|---|---|
| Not clearing completed tasks | Conversation history keeps accumulating | /clear after task completion |
| Reading large files at once | Thousands of lines dumped into context | Specify line ranges or ask Claude to read only key sections |
| Repeated trial-and-error during debugging | Failed attempts fill up context | /compact after reaching a certain point |
| Overly verbose CLAUDE.md | Sent every single turn | Trim to core rules only |
| Unnecessary tool calls | Each call result enters context | Specify exactly what info you need in your prompt |
Three Core Strategies
Strategy 1: Timely /clear
Clear immediately after completing a task. Don't do unrelated work in the same session. This is the simplest and most effective cost control measure.
# Bad: one session for three tasks
> Fix the login bug ← 5000 tokens consumed
> Also add dark mode ← 15000 tokens (includes previous 5000)
> And write some unit tests ← 30000 tokens (includes previous 20000)
# Total: ~50000 tokens
# Good: three independent sessions
> Fix the login bug ← 5000 tokens consumed
/clear
> Add dark mode ← 10000 tokens consumed
/clear
> Write unit tests ← 10000 tokens consumed
# Total: ~25000 tokens — saved half
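The difference comes purely from re-sent history. A minimal sketch, using each task's fresh-context token footprint from the example above:

```typescript
// Cumulative tokens: chaining tasks in one session vs /clear between them.
// Footprints are illustrative, matching the three-task example above.
const tasks = [5000, 10000, 10000]; // each task's own fresh-context cost

// One long session: every task also re-sends everything before it.
let carried = 0;
let longSession = 0;
for (const t of tasks) {
  longSession += carried + t;
  carried += t;
}

// Short sessions: /clear resets the history each time.
const shortSessions = tasks.reduce((a, b) => a + b, 0);

console.log(longSession, shortSessions); // → 45000 25000
```

With only three tasks the long session already costs nearly double; every additional chained task widens the gap.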
Strategy 2: Smart /compact
When a task is mid-progress but context is already large, use /compact to compress. The key is providing a good compression prompt:
/compact Keep: 1. Current modification plan 2. Confirmed file list 3. Next steps. Discard all debugging details.

Strategy 3: Precise Context Feeding
Don't say "help me look at this project." Say "help me look at src/auth/login.ts lines 50-80." The more precise you are, the fewer files Claude reads, and the cleaner your context stays.
Long Session vs Short Session Cost Comparison
| Scenario | Long session (no clearing) | Short sessions (timely clearing) | Savings |
|---|---|---|---|
| Fix 3 independent bugs | ~45000 tokens | ~20000 tokens | ~55% |
| Develop a feature + write tests | ~60000 tokens | ~35000 tokens | ~42% |
| Code review 5 files | ~80000 tokens | ~40000 tokens | ~50% |
CLAUDE.md Lean Principles
CLAUDE.md is sent in every single turn of conversation, so every line continuously consumes tokens.
# Bad: verbose CLAUDE.md (~2000 tokens)
This project is a blog system built with Next.js. We use Tailwind CSS
for styling, next-intl for internationalization supporting both Chinese
and English languages...
(long descriptive paragraphs)
# Good: lean CLAUDE.md (~500 tokens)
## Stack
Next.js 16 + Tailwind v4 + next-intl (zh/en)
## Rules
- Components in components/, utilities in lib/
- All text via i18n keys, no hardcoding
- Run npm run lint before committing

Trimming CLAUDE.md from 2000 to 500 tokens saves 30,000 input tokens over a 20-turn session.
Prompt Economics
Good prompts don't just improve output quality — they directly save money. This isn't a rehash of the prompt guide, but a re-examination of prompt writing from a cost perspective.
One-Shot vs Multi-Round Iteration
Multi-round iteration costs grow quadratically with the number of turns, because each turn re-sends all previous conversation history.
# Vague prompt — takes 5 rounds
> Help me write a form
Claude: What kind of form? What fields?
> A login form with email and password
Claude: Need validation? Which UI library?
> Yes validation, use shadcn
Claude: Need a remember-me feature?
> Yes
Claude: (finally starts writing code)
# Cost: ~25000 tokens
# Precise prompt — done in 1 round
> Create a login form in app/login/page.tsx:
> - Fields: email (required, format validation), password (required, min 8 chars)
> - UI: shadcn/ui Form components
> - Features: remember-me checkbox, submit calls /api/auth/login
> - Error handling: display below the form
# Cost: ~8000 tokens
A precise prompt can save 60-70% of tokens.
Structured Prompt Template
Less ambiguity means fewer retries. Use this template:
[Task] Brief description of what to do
[Files] Which files are involved
[Requirements] Specific technical requirements
[Constraints] What NOT to do
Practical example:
Add reading time display to the BlogPost component:
- File: components/BlogPost.tsx
- Display below the title, next to the date
- Calculate at 250 words/minute for English
- Don't modify the existing style structure
- Don't add new dependencies
Prompt Techniques from a Cost Perspective
Techniques from the prompt guide, reinterpreted through a cost lens:
| Prompt Technique | Quality Benefit | Cost Benefit | Reason |
|---|---|---|---|
| Provide specific file paths | High | High | Claude doesn't need to guess and search |
| State the tech stack | Medium | High | Avoids Claude exploring project structure first |
| Give example code | High | Medium | Reduces back-and-forth confirmation |
| Specify "don't do X" | Medium | High | Avoids unnecessary output |
| Request concise output | Low | High | Directly reduces output tokens |
Batch Operations vs One-by-One
If you have multiple similar changes, stating them all at once is much cheaper than submitting them individually:
# One-by-one — 3 separate requests
> Convert UserCard component classNames to Tailwind
> Convert UserList component classNames to Tailwind
> Convert UserProfile component classNames to Tailwind
# Each reloads context, total ~30000 tokens
# Batch — 1 request
> Migrate these three components from CSS modules to Tailwind:
> 1. components/UserCard.tsx
> 2. components/UserList.tsx
> 3. components/UserProfile.tsx
> Keep existing visual styles intact.
# Done in one go, total ~15000 tokens
Controlling Output Length
Explicitly tell Claude how detailed you want the output:
# Token-saving approach
> Fix this bug. Only output the changed code, no explanations needed.
# Token-burning approach
> Help me fix this bug
# Claude will output: problem analysis + solution + full code + explanation + follow-up suggestions

A simple "no explanations needed" can reduce output tokens by 30-50%.
Session Management Strategies
Session management extends context optimization, but focuses more on "when to start a new session" and "how to split tasks."
Single Long Session vs Multiple Short Sessions
| Dimension | Single Long Session | Multiple Short Sessions |
|---|---|---|
| Context continuity | High (Claude remembers earlier discussion) | Low (fresh start each time) |
| Token consumption | High (accumulates) | Low (starts from zero each time) |
| Error recovery | Hard (wrong assumptions persist) | Easy (new session = clean state) |
| Best for | Continuous development of a single complex task | Multiple independent tasks |
Rule of thumb: If a task requires more than 15 conversation turns, consider splitting it into subtasks.
When to Start a New Session
- Current task is done, moving to a new one
- Claude starts "getting confused" (repeating previous mistakes, forgetting agreements)
- Debugging is going in circles (same approach tried repeatedly)
- You need a completely different approach
- The session has exceeded 20 conversation turns
/resume Cost Considerations
/resume can restore a previous session, but keep in mind: the restored session loads a summary of the previous conversation, which itself consumes tokens.
# Good use of /resume
- Yesterday's task was half-done, continuing today
- Need previous architectural decisions as context
# Bad use of /resume
- The previous session was already very long (summary will be large too)
- New task is barely related to the previous session
- Previous session had lots of failed attempts

If the previous session was long, it's often cheaper to start a new session and manually copy over key decisions. Cleaner context, lower cost.
Subtask Splitting
Breaking large tasks into smaller ones, each in its own session, is one of the most effective cost control strategies:
# Bad: one session for the entire feature
> Implement a complete user auth system including registration, login,
> forgot password, email verification, OAuth, permission management...
# Session will be extremely long, per-turn cost skyrockets toward the end
# Good: split into independent sessions
Session 1: Design the auth system data model and API interfaces (output a design doc)
/clear
Session 2: Implement registration and login APIs
/clear
Session 3: Implement forgot password and email verification
/clear
Session 4: Implement OAuth integration
/clear
Session 5: Implement permission management
# Each session starts from a clean state, total cost is lower
The key technique: have the first session output a design document or plan, then feed that document as input to subsequent sessions. This maintains coherence while avoiding context accumulation.
Multi-Agent Parallel Cost Impact
Claude Code supports multi-agent parallel execution (see the multi-agent guide for details). From a cost perspective:
- Advantage: Each agent has its own independent context, no cross-contamination
- Disadvantage: Each agent loads the system prompt and CLAUDE.md, incurring fixed overhead
- Good for: Multiple independent subtasks (e.g., modifying frontend and backend simultaneously)
- Bad for: Tightly coupled tasks that need frequent context sharing
# Good for parallel
claude --task "Refactor UserService" &
claude --task "Refactor OrderService" &
# Two independent services, parallel is faster with similar total cost
# Bad for parallel
# If OrderService depends on UserService interface changes,
# sequential execution is more sensible to avoid rework

Caching and Reuse
Claude Code has built-in caching mechanisms. Understanding and leveraging them can significantly reduce costs.
Prompt Caching Mechanism
The Claude API supports prompt caching: when consecutive requests share an identical prefix, the cached portion is billed at roughly 10% of the regular input price (writing the prefix into the cache carries a small surcharge).
In Claude Code, the following content is typically cached:
| Content | Cache Hit Probability | Reason |
|---|---|---|
| System prompt | Very high | Identical every turn |
| CLAUDE.md | Very high | Identical every turn |
| Early conversation history | High | Unchanged within a session |
| Recent conversation | Low | Changes every turn |
Maximizing Cache Hit Rate
Keep CLAUDE.md stable: Frequently modifying CLAUDE.md invalidates the cache. Write it well once and avoid unnecessary changes.
Avoid frequent project switching: Working continuously in the same project maximizes cache hit rates for the system prompt and CLAUDE.md.
Stay coherent within a session: Don't frequently switch topics within a single session — this reduces caching efficiency for conversation history.
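Assuming cache reads are billed at 10% of the normal input price (as described above), the impact on a single turn is easy to quantify. This sketch uses Sonnet's $3/MTok input price and illustrative token counts:

```typescript
// Effect of prompt caching on one turn's input bill, assuming cache
// reads are billed at 10% of the normal input price.
function turnInputCost(
  cachedTokens: number,        // stable prefix: system prompt, CLAUDE.md, old history
  freshTokens: number,         // new content this turn
  inputPricePerMTok: number,   // e.g. 3 for Sonnet
): number {
  return (cachedTokens * inputPricePerMTok * 0.1 +
          freshTokens * inputPricePerMTok) / 1_000_000;
}

// A late-session Sonnet turn: 10000 cached prefix tokens, 500 fresh tokens.
const withCache = turnInputCost(10000, 500, 3);
const withoutCache = turnInputCost(0, 10500, 3);
console.log(withCache, withoutCache); // → 0.0045 0.0315
```

A stable prefix cuts this turn's input cost by 7x, which is why keeping CLAUDE.md and early history unchanged pays off so directly.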
Custom Commands for Reuse
Custom slash commands (see the skills guide for details) don't just improve efficiency — they save money too:
# .claude/commands/review.md
Review the following code changes:
- Check for security vulnerabilities
- Check compliance with project coding standards
- Check for performance issues
- Only output issues found, don't repeat the code
$ARGUMENTS

Benefits of custom commands:
- Prompt reuse: No need to manually type detailed review requirements each time
- Output control: Commands can include constraints like "only output issues," reducing output tokens
- Consistency: The same prompt every time means higher cache hit rates
Code Snippets and Template Reuse
Put commonly used code patterns in CLAUDE.md or project docs, so Claude references them instead of regenerating from scratch:
# In CLAUDE.md — template references
## Component Template
When creating new React components, follow the structure in components/Button.tsx.
## API Route Template
When creating new API routes, follow the error handling pattern in app/api/users/route.ts.

This is far more token-efficient than describing the desired structure in every conversation: Claude reads the reference file directly instead of having you describe it in the chat.
Using .claudeignore to Reduce Noise
Create a .claudeignore file to exclude files and directories Claude doesn't need to see:
# .claudeignore
node_modules/
dist/
.next/
coverage/
*.lock
*.log
This doesn't directly reduce token consumption, but it prevents Claude from reading irrelevant content when searching files, indirectly reducing tokens from tool calls.
CI/CD Cost Control
Claude Code in CI environments has a unique problem: unattended execution means you can't interrupt waste. A misconfigured CI task can burn through tokens without you even knowing.
For complete CI/CD configuration, see the CI/CD guide. Here we focus on cost control strategies.
Trigger Condition Optimization
Not every PR needs Claude's review. Not every push needs Claude to run tests.
# GitHub Actions example: only trigger under specific conditions
on:
pull_request:
paths:
- 'src/**' # Only trigger on source code changes
- '!src/**/*.md' # Exclude documentation changes
types: [opened, synchronize] # Don't trigger on close
# Further optimization: only trigger deep review for large PRs
jobs:
review:
if: github.event.pull_request.changed_files > 5
steps:
- uses: anthropics/claude-code-action@v1
with:
model: claude-sonnet-4-20250514

max_turns Limits
Always set max_turns in CI to prevent Claude from entering infinite loops:
# Limit to 10 interaction turns maximum
claude --max-turns 10 --task "Review this PR's code changes"

A CI task without max_turns is like an HTTP request without a timeout — it will eventually cause problems.
Model Downgrade for CI
Most CI tasks don't need Opus. A sensible model allocation:
| CI Task | Recommended Model | Reason |
|---|---|---|
| Code review | Sonnet | Best value |
| Lint fixes | Haiku | Simple formatting corrections |
| Test generation | Sonnet | Needs business logic understanding |
| Documentation updates | Haiku | Template-based task |
| Security scanning | Sonnet | Needs some reasoning ability |
| Complex refactoring | Opus | Only when necessary |
# Choose model by task type
- name: Code Review
run: claude --model claude-sonnet-4-20250514 --max-turns 5 --task "..."
- name: Fix Lint
run: claude --model claude-haiku-4-5-20251001 --max-turns 3 --task "..."

Cost Budgets and Alerts
Use Anthropic API usage monitoring to set daily/monthly cost caps:
# Check daily usage in CI script (pseudocode)
DAILY_COST=$(curl -s https://api.anthropic.com/v1/usage | jq '.daily_cost')
MAX_DAILY=50 # $50/day cap
if (( $(echo "$DAILY_COST > $MAX_DAILY" | bc -l) )); then
echo "Warning: Daily cost limit reached ($DAILY_COST/$MAX_DAILY), skipping Claude review"
exit 0
fi

Set up three alert tiers:
- Notice: 70% of budget reached
- Warning: 90% of budget reached
- Stop: 100% of budget reached — automatically skip non-critical Claude tasks
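The three tiers map naturally to a small helper you could drop into a monitoring script. A sketch, using the thresholds from the list above:

```typescript
// Three-tier budget alerts as described above. Thresholds (70%/90%/100%)
// come from the text; adjust to your own budget policy.
type Tier = "ok" | "notice" | "warning" | "stop";

function budgetTier(spent: number, budget: number): Tier {
  const ratio = spent / budget;
  if (ratio >= 1.0) return "stop";
  if (ratio >= 0.9) return "warning";
  if (ratio >= 0.7) return "notice";
  return "ok";
}

console.log(budgetTier(30, 50)); // → "ok"     (60%)
console.log(budgetTier(36, 50)); // → "notice" (72%)
console.log(budgetTier(45, 50)); // → "warning" (90%)
console.log(budgetTier(50, 50)); // → "stop"   (100%)
```

At the "stop" tier, a CI script would skip non-critical Claude tasks (exit 0, as in the snippet above) rather than fail the pipeline.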
Team Cost Management
Individual developers can rely on self-discipline for cost control, but teams need processes and tools.
API Key Management
Don't share a single API Key. Problems with shared keys:
- Can't track who consumed how much
- One person's mistake affects everyone
- Can't set per-person limits
Recommended approaches:
Option A: One API Key per person
- Pros: Precise usage tracking, per-person limits
- Cons: Higher management overhead
Option B: API Keys per project
- Pros: Track costs by project
- Cons: Can't distinguish individual usage within a project
Option C: API Gateway (recommended for teams)
- Unified management through a proxy layer
- Track by user/project/team dimensions
- Fine-grained limits and alerts
Per-Project Budget Allocation
Different projects have vastly different Claude usage patterns. Allocate budgets based on project characteristics:
| Project Type | Suggested Monthly Budget/Person | Notes |
|---|---|---|
| New project development | $100-200 | Heavy code generation needs |
| Maintenance project | $30-50 | Mainly bug fixes |
| Infrastructure/DevOps | $20-40 | Occasional use |
| Documentation project | $10-20 | Low frequency |
Cost Monitoring Setup
Build a simple cost monitoring dashboard:
// Cost tracking script example
interface UsageRecord {
user: string;
project: string;
model: string;
inputTokens: number;
outputTokens: number;
cost: number;
timestamp: Date;
}
// Fetch usage data from Anthropic API
async function fetchUsage(apiKey: string): Promise<UsageRecord[]> {
  const response = await fetch('https://api.anthropic.com/v1/usage', {
    headers: { 'x-api-key': apiKey }
  });
  if (!response.ok) throw new Error(`Usage API returned ${response.status}`);
  return response.json();
}
// Aggregate by dimension
function aggregateCost(records: UsageRecord[], groupBy: 'user' | 'project' | 'model') {
const groups = new Map<string, number>();
for (const record of records) {
const key = record[groupBy];
groups.set(key, (groups.get(key) || 0) + record.cost);
}
return groups;
}

Key metrics to monitor:
- Daily/weekly/monthly total cost
- Per-person cost ranking — identify unusually high consumers (may need optimization training)
- Model distribution — if Opus usage is disproportionately high, there's likely room to optimize
- Time distribution — identify usage peaks, optimize CI scheduling
Team Usage Guidelines
Add cost-awareness clauses to your project's CLAUDE.md:
# CLAUDE.md Cost Control Guidelines
## Model Usage Policy
- Default to Sonnet for daily development
- Use Opus only for complex architecture design and difficult bugs
- Prefer Haiku for CI tasks
## Session Management Policy
- Run /clear after completing each task
- Consider /compact or starting a new session after 15+ turns
- Don't handle unrelated tasks in the same session
## Output Control
- Add "only output the changed code" for modification tasks
- Explicitly state when explanations aren't needed

Cost Strategies by Team Size
| Dimension | Individual Developer | Small Team (3-10) | Enterprise (10+) |
|---|---|---|---|
| API Keys | Personal key | Per-project | API Gateway |
| Budget management | Self-discipline | Monthly budgets | Fine-grained quotas |
| Model strategy | Switch as needed | Team guidelines | Enforced policies |
| Monitoring | Check the bill | Simple dashboard | Full monitoring system |
| Training | Self-learning | Share best practices | Formal training |
| Avg monthly cost/person | $30-100 | $50-150 | $80-200 |
Performance Optimization Tips
Beyond cost, speed matters too. Nobody likes waiting 30 seconds for Claude to think.
Factors Affecting Response Speed
| Factor | Impact | Controllability |
|---|---|---|
| Model selection | High | Fully controllable |
| Context size | High | Fully controllable |
| Output length | Medium | Partially controllable |
| Network latency | Medium | Partially controllable |
| API load | Low | Not controllable |
Model speed ranking: Haiku >> Sonnet > Opus
Haiku's response speed is typically 3-5x faster than Opus. For tasks that don't require deep reasoning, Haiku isn't just cheaper — it's faster.
Reducing Unnecessary Tool Calls
Every Claude Code tool call (reading files, executing commands, searching) adds latency and token consumption.
# Inefficient: let Claude find the files
> Help me fix the user login bug
# Claude might: search files → read 5 files → search again → read 3 more files
# Efficient: tell Claude exactly where to look
> Fix the null pointer error at src/auth/login.ts line 45,
> related type definitions are in src/types/auth.ts
# Claude reads 2 files directly and starts fixing
Tips for reducing tool calls:
- Provide specific file paths — don't make Claude guess
- Document project structure — list key directories in CLAUDE.md
- Give enough info upfront — avoid Claude needing multiple reads to understand context
- Use @file references — feed file contents directly to Claude
Hook Performance Impact
Hooks execute on every tool call. If the hook script itself is slow, it will noticeably impact the overall experience.
// Hook configuration in settings.json
{
"hooks": {
"afterWrite": {
"command": "eslint --fix $FILE" // Auto-lint after every file write
}
}
}

Optimization tips:
- Keep hooks fast: Target under 1 second. If linting the entire project takes 10 seconds, lint only the changed file instead
- Avoid unnecessary hooks: Not every event needs a hook. Only add them where truly needed
- Run asynchronously: If a hook doesn't affect subsequent operations, consider async execution
# Slow: lint the entire project
eslint .
# Fast: lint only the changed file
eslint "$FILE"

MCP Tool Latency Considerations
MCP (Model Context Protocol) tools (see the MCP guide for details) introduce external service calls, each with network latency.
Optimization tips:
- Local first: If a local tool can solve it, don't use a remote MCP service
- Batch queries: If the MCP tool supports batch operations, querying multiple items at once is faster than multiple single queries
- Cache results: For data that doesn't change often (like database schemas), consider caching to a local file
- Set timeouts: Configure reasonable timeouts for MCP tools to prevent a slow request from blocking the entire session
Network Optimization
If you're using Claude Code from regions with high latency, network can be the biggest performance bottleneck:
- Use a stable network connection
- Consider using an API proxy to reduce latency
- Avoid executing large tasks during network instability (disconnection and reconnection wastes already-consumed tokens)
Conclusion
Performance optimization and cost control for Claude Code ultimately comes down to three core principles:
Choose the right model — The most expensive option isn't always the best. Sonnet handles 80% of daily tasks, Haiku can manage simple formatting and queries, and only truly complex architecture design and tricky bugs warrant Opus.
Manage your context — Context is the biggest cost variable. Timely /clear, smart /compact, and precise context feeding — these three practices alone can save over half your tokens.
Write good prompts — One precise prompt beats five rounds of vague conversation. Get it right the first time, reduce back-and-forth, and control output length.
Quick Reference Checklist
| Optimization | Action | Expected Savings |
|---|---|---|
| Model selection | Sonnet for daily work, Haiku for simple tasks | 50-80% |
| Timely clearing | /clear after task completion | 40-55% |
| Precise prompts | Give enough info upfront, avoid multi-round | 60-70% |
| Output control | Request concise output, skip explanations | 30-50% |
| Batch operations | Combine similar tasks into one request | 40-50% |
| Session splitting | Break large tasks into multiple short sessions | 30-40% |
| CI optimization | Limit triggers and max_turns | 50-70% |
| Lean CLAUDE.md | Keep it concise, core rules only | 10-20% |
Recommended Reading
- Context Management Guide — Complete context management operations manual
- Prompt Guide — Techniques for writing high-quality prompts
- CI/CD Guide — Complete CI/CD environment configuration
- Multi-Agent Guide — Multi-agent parallel architecture
- Settings Guide — Detailed model and tool configuration
- Hooks Guide — Hook system configuration and optimization
- MCP Guide — MCP tool usage and configuration
- Skills Guide — Creating and reusing custom commands