Modern Claude models (Opus 4.7, Sonnet 4.6, etc.) can do extended thinking — taking extra time to reason through a hard problem before answering. You see the thinking, then the answer.
resp = client.messages.create(
model="claude-opus-4-7",
max_tokens=8192,
thinking={"type": "enabled", "budget_tokens": 4096},
messages=[{"role": "user", "content": "Plan a 3-day Tokyo itinerary optimized for foodies who hate crowds. Reason carefully."}],
)
The budget_tokens is roughly how much “thinking” Claude can do. More budget → deeper reasoning, slower, more expensive.
The response will contain both thinking blocks and text blocks. Render the text to users; keep the thinking for debugging or display it as collapsible “show reasoning” UI.
For Opus 4.7, Sonnet 4.6, and Opus 4.6, you can also tune an effort parameter — letting Claude decide adaptively how much to think on each step. Lower effort is faster/cheaper; higher effort handles harder problems.
An agent is Claude in a loop, using tools to make progress on a goal.
🎯 Receive goal
│
▼
🧠 Plan next action ◀──────────────┐
│ │
▼ │
Need a tool? │
│ │ │
yes no (have answer) │
▼ │ │
🔧 Call tool │ │
│ │ │
▼ │ │
👀 Observe │ │
│ │ │
▼ ▼ │
Goal achieved? │
│ │ │
no ──────────┘ (loop) │
│ │
yes │
▼ │
💬 Give answer ──▶ ✅ Done │
| Level | Description | Example |
|---|---|---|
| 0 | Single shot, no tools | “Summarize this.” |
| 1 | One tool call | “What’s the weather?” → calls weather API → answers |
| 2 | Multi-step with one tool | “Research these 5 companies” → searches each, summarizes |
| 3 | Multi-step, multi-tool | “Book me a flight under $500 and add it to my calendar” |
| 4 | Long-horizon | “Refactor this codebase to use TypeScript” (hours of work) |
max_tokens and a hard limit on iterations.A simple agent loop pattern:
def run_agent(goal: str, tools, max_iterations=20):
messages = [{"role": "user", "content": goal}]
for i in range(max_iterations):
resp = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
tools=tools,
messages=messages,
)
if resp.stop_reason == "end_turn":
return next(b.text for b in resp.content if b.type == "text")
if resp.stop_reason == "tool_use":
tool_use = next(b for b in resp.content if b.type == "tool_use")
result = execute_tool(tool_use.name, tool_use.input)
messages.append({"role": "assistant", "content": resp.content})
messages.append({
"role": "user",
"content": [{
"type": "tool_result",
"tool_use_id": tool_use.id,
"content": str(result),
}],
})
return "Hit iteration limit without finishing."
MCP is an open standard (originated by Anthropic, now adopted broadly) for connecting AI models to tools and data sources. Think of it as USB for AI: instead of every AI vendor and every tool building bespoke integrations, they all speak MCP.
📧 Mail MCP server ──▶ Tools · Resources · Prompts
🐙 GitHub MCP server ──▶ Tools · Resources · Prompts
🧠 Claude ──MCP protocol──▶
🗄️ Your DB MCP server ──▶ Tools · Resources · Prompts
📓 Notion MCP server ──▶ Tools · Resources · Prompts
An MCP server exposes:
The official MCP SDKs (Python, TypeScript) make writing one ~30 lines of code. See modelcontextprotocol.io for the spec and SDKs.
The computer use tool gives Claude a screen, keyboard, and mouse. It can:
Use cases: web automation that doesn’t have a clean API, legacy enterprise software, QA testing, accessibility tooling.
When you change a prompt, how do you know things actually got better and not silently worse?
You build a small eval set and run it on every change.
# eval_set.json
test_cases = [
{"input": "I want a refund for order 12345", "expected_intent": "refund"},
{"input": "When does my package arrive?", "expected_intent": "shipping"},
{"input": "How do I change my password?", "expected_intent": "account"},
# ... 20-50 of these
]
correct = 0
for case in test_cases:
pred = classify(case["input"]) # your prompt + Claude
if pred == case["expected_intent"]:
correct += 1
print(f"Accuracy: {correct}/{len(test_cases)} = {correct/len(test_cases):.0%}")
For most production work, levels 3 + 4 combined are plenty.
The Console Workbench lets you A/B prompts, save versions, and run them against test inputs. Use it instead of bouncing between scripts.
Claude hallucinates less than it used to, but not zero. Defenses:
For anything that downstream code will parse, force structure.
resp = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system="You are an extraction engine. Return ONLY JSON matching the schema. No prose.",
messages=[{
"role": "user",
"content": f"""Extract from the email below into this schema:
from
EMAIL:
{email_text}"""
}],
)
import json
data = json.loads(resp.content[0].text)
Or use the API’s strict tool use feature, which guarantees the output matches your JSON schema (or raises an error).
The four levers:
A common architecture — the router pattern:
┌──▶ 🚀 Haiku (easy 90%) ──▶ 💬 Answer
👤 User query ──▶ ⚡ Router (cheap/fast) ──┤
└──▶ 💎 Opus (hard 10%) ──▶ 💬 Answer
A router prompt (cheap, fast) decides which lane each query takes.
If your app uses Claude on user-generated input, you should think about:
Anthropic publishes a Cookbook — Jupyter notebooks covering tool use, embeddings, RAG, evaluations, and more, with runnable code. Highly recommended next stop.
You should now understand:
👉 Next up: Module 08 — Real-World Projects — four full projects with code.
| ← Previous | 🏠 Home | Next → |
|---|---|---|
| Module 06 — API Development | Course README | Module 08 — Real-World Projects |