Preface
This book was written for one person: someone who has heard the words AI agent and felt a mixture of curiosity and mild terror. You do not need a computer science degree. You do not need to know how to code. You need only a willingness to read carefully and experiment boldly.
The course material on which this guide is based was developed by practitioners who run real businesses, generating millions of dollars in annual revenue, using the techniques described here. The goal is not theory for its own sake but practical fluency: the ability to put AI agents to work on tasks that matter to you.
How to Use This Book
Each chapter builds on the last, but experienced readers may jump to any chapter using the sidebar. Throughout the text you will encounter four types of special boxes:
- Tip: Practical shortcuts and best practices drawn from real-world use.
- Note: Key concepts and clarifications that deserve extra attention.
- Warning: Common mistakes and pitfalls you will want to avoid.
- Exercise: Hands-on prompts and activities to reinforce the concepts.
At the end of each chapter, a short quiz tests your comprehension. These are low-stakes: treat them as a self-check, not an exam. Glossary terms appear in blue dotted underline and reveal their definitions on hover.
A Note on Rapid Change
AI is evolving at a pace unlike almost any technology in history. Specific model names, pricing figures, and platform features cited in this book reflect the state of the field as of mid-2025. Treat those details as illustrative examples rather than permanent facts. The underlying principles (parallelization, context management, multi-agent orchestration) are durable.
Acknowledgements
The foundational course material was authored by Nick Sarif, whose practical, business-first approach to AI agents has informed the structure of this guide. Academic references throughout the text draw on seminal work in large language models, tool-augmented reasoning, and multi-agent systems from researchers at Google DeepMind, Anthropic, OpenAI, Stanford, and MIT.
Introduction: The Age of Agents
For most of computing history, software did exactly what it was told. A spreadsheet calculated what you entered. A search engine returned pages matching your query. Every outcome was a direct, predictable consequence of a human instruction.
Then came large language models, systems trained on vast amounts of human text that could generate coherent, contextually aware responses to open-ended questions. That was remarkable. But an AI agent goes further: it sets intermediate goals, uses tools, observes results, adapts its plan, and keeps going until the job is done.
A chatbot responds to a single message with a single answer. An agent receives a high-level goal and then autonomously takes the sequence of actions required to achieve it, potentially over many minutes or hours, without human input at each step.
Why Now?
Several developments converged around 2024–2025 to make practical AI agents possible for non-engineers:
- Model intelligence crossed a threshold where models reliably follow complex, multi-step instructions.
- Tool use became standardized: models can now call APIs, browse the web, read files, write code, and control browsers using protocols like MCP.
- Agentic platforms (Codex, Claude Code, Anti-Gravity) wrapped these capabilities in consumer-friendly interfaces.
- Parallelization allowed multiple agents to work simultaneously, compressing days of work into minutes.
What Can Agents Actually Do?
- Outreach at Scale: Scrape leads, visit their websites, fill contact forms, personalize messages, all in parallel.
- Research & Synthesis: Compile dozens of sources, extract key findings, produce structured reports.
- Software Development: Write, test, debug, and deploy applications end-to-end.
- Data Processing: Scrape, clean, analyze, and visualize large datasets automatically.
- Content Creation: Draft, refine, and publish content tailored to specific audiences.
- Workflow Automation: Orchestrate multi-step business processes across multiple services.
What This Book Will Teach You
By the end of this guide you will understand:
- How agents think and act (the core loop)
- Which platform to choose and why
- How to write prompts that produce consistent, high-quality results
- How to orchestrate multiple agents working in parallel
- How to control costs without sacrificing quality
Agents are not smarter than humans at any given task, at least not yet. What they are is faster and parallelizable. Ten agents working simultaneously on ten sub-tasks accomplish in five minutes what would take a human fifty minutes. The skill you are learning is how to orchestrate that parallelism effectively.
Core Concepts: How Agents Think
Before you can direct an AI agent effectively, you need to understand what is happening inside it. This chapter demystifies the agent's internal process, a simple three-step loop that repeats until the job is done, and explains why that loop is so powerful.
1.1 The Core Agent Loop
Every AI agent, regardless of which platform or model powers it, executes the same fundamental cycle. Researchers at Google DeepMind and Stanford have formalized variants of this cycle under names like ReAct [5] and Reflexion [7], but the intuition is simple:
The loop repeats until the Definition of Done is satisfied.
Step 1: Observe
The agent reads everything available to it. This includes:
- Your original instruction (the prompt)
- Any files, documents, or data you provided
- The results of previous tool calls (web searches, code execution, etc.)
- Its own memory files (claude.md, gemini.md, agents.md)
- Multimodal data (images, audio, video frames), if applicable
All of this information sits in the agent's context window, a finite memory space that fills up a little more with each loop iteration.
Step 2: Think
The agent reasons about what to do next. Modern platforms expose this reasoning step visually: you can literally read the agent's mini-plan before it acts. This interpretability is one of Claude Code's standout strengths.
When you can see the agent's reasoning, you can steer it. If the plan looks wrong, pause the agent mid-thought and redirect it. This is far more efficient than letting it run to completion and discovering the error at the end.
Step 3: Act
The agent executes its plan. Common actions include:
- Tool calls: searching the web, reading a file, calling an API
- Code execution: writing and running a script
- Browser control: navigating a webpage, filling a form, clicking a button
- File editing: creating or modifying documents
After the action completes, its result is fed back into the Observe step, and the loop begins again, now with a slightly fuller context window.
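The three steps can be pictured as a few lines of code. What follows is a toy sketch, not how any real platform is implemented: the `Agent` class, its method names, and the `is_done` callback are all hypothetical, and the `think` and `act` bodies are placeholders where a real agent would call a model and a tool.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    """Toy agent: every name here is hypothetical, not a real platform API."""
    goal: str
    is_done: Callable[[list[str]], bool]          # stands in for the Definition of Done
    context: list[str] = field(default_factory=list)

    def observe(self) -> list[str]:
        # Step 1: read everything currently in the context window.
        return [self.goal, *self.context]

    def think(self, observations: list[str]) -> str:
        # Step 2: decide the next action (a real agent would call an LLM here).
        return f"step {len(observations)}"

    def act(self, plan: str) -> str:
        # Step 3: execute the plan (a real agent would call a tool here).
        return f"result of {plan}"

    def run(self, max_iterations: int = 20) -> list[str]:
        for _ in range(max_iterations):
            if self.is_done(self.context):
                break
            plan = self.think(self.observe())
            # The result is fed back in, so the context grows every cycle.
            self.context.append(self.act(plan))
        return self.context


agent = Agent(goal="collect three results", is_done=lambda ctx: len(ctx) >= 3)
history = agent.run()
print(len(history), "results collected before the loop stopped")
```

The `max_iterations` cap is a design choice of this sketch: it keeps a loop with a badly specified stop condition from running forever.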
The Definition of Done
The loop continues until the agent decides the task is complete. This decision is governed by the Definition of Done: the set of success criteria you specify in your prompt. This is the single most overlooked element in beginner prompting, and its absence is the primary reason people feel underwhelmed by agent results.
Giving an agent a task without a Definition of Done is like hiring a contractor without agreeing on what "finished" means. The agent will stop when it thinks it is done, which may be nowhere near what you actually wanted. Always define success criteria explicitly.
Vague (produces inconsistent results):
Well-defined (reliable, high-quality output):
The second prompt includes an explicit Definition of Done ("10 or more peer-reviewed empirical sources"), a format specification, and a stop condition. The agent has no ambiguity about when to quit.
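A Definition of Done is easiest to trust when it can be checked mechanically. The sketch below turns the "10 or more peer-reviewed empirical sources" criterion into a yes/no test. The `Source` record and the `definition_of_done_met` function are hypothetical names invented for illustration; an agent platform does not expose anything like them.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    peer_reviewed: bool
    empirical: bool


def definition_of_done_met(sources: list[Source], minimum: int = 10) -> bool:
    """True once enough sources satisfy the stated success criteria."""
    qualifying = [s for s in sources if s.peer_reviewed and s.empirical]
    return len(qualifying) >= minimum


# Nine qualifying studies plus one opinion piece: ten items, but not done yet.
found = [Source(f"Study {i}", peer_reviewed=True, empirical=True) for i in range(9)]
found.append(Source("Opinion piece", peer_reviewed=False, empirical=False))
print(definition_of_done_met(found))
```

Counting items alone would have said "ten sources, finished"; counting only the items that meet the criteria is what removes the ambiguity.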
1.2 Agent Architecture: More Than a Model
A common misconception is that an AI agent is just a chatbot with a fancier interface. In reality, the LLM is only one component of a larger system. Think of the LLM as the brain: capable and intelligent, but helpless without a body.
- LLM (The Brain): Reasons, generates language, makes decisions. Examples: Claude Opus, GPT-4, Gemini Pro.
- Tools (The Hands): Web search, file read/write, code execution, browser control, API calls.
- Memory (The Journal): claude.md, gemini.md, agents.md. Persistent files that carry knowledge across sessions.
- The Loop (The Work Ethic): The observe-think-act cycle that lets the agent pursue goals autonomously over time.
This architecture is why the term agent, rather than chatbot, is appropriate. An agent is embedded in an environment, observes that environment, and takes actions to change it toward a desired goal.[6]
1.3 The Growing Context Window
Each time the loop completes one cycle, the result of the action is added to the context. Imagine a notepad that gets a new line after every step. By the fifth loop, the agent has five lines of context: its original instructions, the result of step 1, the result of step 2, and so on.
This is both a strength (the agent accumulates knowledge) and a constraint (the notepad has a maximum size). Context management β deciding what to keep, what to summarize, and what to discard β is one of the core skills covered in Chapter 6.
Everything in the context window is measured in tokens. A token is approximately 0.75 words. Current models (Claude Opus 4.6, Gemini 2.5 Pro) support context windows of 200,000 to 1,000,000 tokens, equivalent to roughly 150,000 to 750,000 words, or several long novels.
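The conversion above is simple arithmetic, sketched here so you can plug in your own numbers. The 0.75 ratio is the rule of thumb from the text; real tokenizers vary by model and language, so treat the output as an estimate only. The function names are made up for this example.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the text; real tokenizers vary


def tokens_to_words(tokens: int) -> int:
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    return round(words / WORDS_PER_TOKEN)


for window in (200_000, 1_000_000):
    print(f"{window:,} tokens is roughly {tokens_to_words(window):,} words")
```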
🧪 Chapter 1 Knowledge Check
1. What are the three steps in the core agent loop, in order?
2. What is the most common reason beginners are disappointed by agent results?
3. Which of the following is NOT a component of an AI agent's architecture?
The Three Platforms
Three major platforms dominate the agentic coding landscape as of mid-2025. Each wraps a world-class LLM in a purpose-built interface for autonomous work. Understanding their differences, and their surprising similarities, will help you choose the right tool for each job.
2.1 The Big Picture
The intelligence gap between these platforms is small: a few percentage points at most. The differences that matter are in interpretability, design quality, multimodal capabilities, and ecosystem maturity. For most tasks, any of the three will serve you well.
| Platform | Underlying Model | Made By | Best For | Price |
|---|---|---|---|---|
| Codex | GPT-4 / GPT-5 series | OpenAI | Backend, math, test-driven development | API pricing |
| Claude Code | Claude Opus / Sonnet | Anthropic | Orchestration, interpretable reasoning | $17–20/mo or API |
| Anti-Gravity | Gemini 2.5 Pro/Flash | Google | Frontend design, video understanding | API pricing |
2.2 Claude Code: The Orchestrator
Claude Code's greatest strength is interpretability. Its reasoning tab shows you, in plain language, exactly what the model is planning to do before it does it. This makes Claude ideal as an orchestrator: the top-level manager that delegates work to other models and reviews their results.
- Strengths: Most interpretable reasoning; excellent for orchestration and multi-agent workflows; consistent quality.
- Weaknesses: Slower than competitors unless Fast Mode is enabled (which burns credits); weaker at frontend/visual design.
2.3 Anti-Gravity (Gemini): The Designer
Gemini's standout feature is its native video understanding. Unlike Claude and GPT, Gemini can process video at the API level: analyzing frames, extracting steps, and executing what it watches. It also consistently produces the most visually polished frontend designs.
- Strengths: Best frontend/design output; native video understanding; very fast text generation; massive 1M-token context.
- Weaknesses: Least interpretable reasoning; quality can be inconsistent day-to-day.
2.4 Codex (GPT): The Engineer
The GPT model series excels at backend development, rigorous mathematics, and test-driven development. Its "fire and forget" style (point it at a target, let it run) suits tasks with a clear, verifiable Definition of Done.
- Strengths: Best backend/API development; strongest at mathematics; largest ecosystem of documentation and examples.
- Weaknesses: Less interpretable than Claude; reasoning is harder to steer mid-task.
2.5 Getting Started
Claude Code
- Visit claude.ai and create an account (Google login works).
- Subscribe to the Pro plan ($17/mo annual or $20/mo monthly).
- Search for "Claude Code desktop download" and install for your OS (Mac, Windows, Windows ARM).
- Open Claude Code, click the Code button, choose a working folder.
- Enable "bypass permissions" for autonomous operation, then type your first task.
Anti-Gravity
- Search "Google Anti-Gravity download". You likely already have a Google account.
- Download for Mac (Intel or Apple Silicon) or Windows/Linux.
- Open Anti-Gravity. Google logs you in automatically.
- In the right-hand panel, find the Agent modal and start prompting.
Codex
- Visit openai.com and create an account.
- Search "OpenAI Codex download" and install for Mac or Windows.
- Open Codex, create a new folder, and start your first project.
Pick any one platform and give it this prompt:
"Make a single-page portfolio site for [your name]. Keep it simple, minimal, and light-themed. When done, open it in the browser."
Observe the reasoning tab (if available) and notice each loop iteration: observe → think → act.
🧪 Chapter 2 Knowledge Check
1. Which platform is most recommended as a top-level orchestrator for multi-agent workflows?
2. Which platform has built-in native video understanding?
Prompting Techniques
Prompting is the primary interface between you and an agent. A well-structured prompt is the difference between an agent that wanders and one that executes with surgical precision. This chapter covers four high-leverage techniques: self-modifying system prompts, agent skills, prompt contracts, and reverse prompting.
3.1 Self-Modifying System Prompts
Every major platform provides a special file that is automatically prepended to every conversation. This file goes by different names depending on the platform:
| Platform | File Name |
|---|---|
| Claude Code | claude.md |
| Anti-Gravity (Gemini) | gemini.md |
| Codex | agents.md |
Because this file is read at the start of every session, it functions as a persistent, growing memory. The key insight is to instruct the agent to update this file automatically whenever you correct it, turning your feedback into permanent rules.
Over time, this creates a compounding improvement. After one session you have one rule; after ten sessions you have ten rules; after fifty sessions your agent almost never makes mistakes relative to your preferences.
You can maintain two levels: a global file with universal preferences (your name, tone, language style) that applies to all projects, and a local file in each project folder with project-specific rules. Claude Code stores the global file at ~/.claude/claude.md.
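In practice the agent edits the memory file itself, but the mechanics are nothing more than appending a line to a text file. The sketch below shows that idea under stated assumptions: `remember_rule` is a hypothetical helper written for this example, the "one rule per bullet line" layout is just one possible convention, and the file lives in a temporary folder so nothing on your machine is touched.

```python
import tempfile
from pathlib import Path


def remember_rule(memory_file: Path, rule: str) -> bool:
    """Append a correction to the memory file unless it is already recorded."""
    existing = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    line = f"- {rule.strip()}"
    if line in existing.splitlines():
        return False                      # rule already known, nothing to add
    with memory_file.open("a", encoding="utf-8") as handle:
        if existing and not existing.endswith("\n"):
            handle.write("\n")            # keep each rule on its own line
        handle.write(line + "\n")
    return True


with tempfile.TemporaryDirectory() as folder:
    memory = Path(folder) / "claude.md"
    remember_rule(memory, "Use British spelling in all reports.")
    remember_rule(memory, "Use British spelling in all reports.")  # duplicate, ignored
    remember_rule(memory, "Never delete files without asking.")
    print(memory.read_text(encoding="utf-8"))
```

Skipping duplicates matters more than it looks: the file is loaded into context every session, so every redundant line costs tokens forever.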
3.2 Agent Skills
An agent skill is a reusable workflow stored as a markdown file. Where the system prompt file captures your preferences, a skill captures a process. Skills transform the LLM's natural statistical variability into a predictable, deterministic workflow.
A skill file has two parts: a short YAML header (loaded into context every session) and a detailed body (loaded only when the skill is invoked):
Only the YAML header is loaded into the context window at startup. The detailed body is read only when you invoke the skill by name. This keeps your context lean and your costs low, a technique explored further in Chapter 6.
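To make the header/body split concrete, here is a minimal sketch of reading only the header of a skill file. Everything in it is an assumption made for illustration: the `weekly-report` skill, its field names, and the `---` delimiters are invented, and the tiny parser handles only flat `key: value` lines rather than full YAML.

```python
# A made-up skill file: header between the '---' lines, body underneath.
SKILL_TEXT = """---
name: weekly-report
description: Compile a weekly status report from project notes
---
1. Read every file in notes/.
2. Group the items by project.
3. Write report.md with one section per project.
"""


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Return (header fields, body) for a skill file with a '---' delimited header."""
    _, header_block, body = text.split("---", 2)
    header = {}
    for line in header_block.strip().splitlines():
        key, _, value = line.partition(":")
        header[key.strip()] = value.strip()
    return header, body.strip()


header, body = split_skill(SKILL_TEXT)
startup_context = f"{header['name']}: {header['description']}"   # always loaded
print(startup_context)
print(len(body.splitlines()), "body lines stay out of context until the skill is invoked")
```

The one-line summary is all the agent needs in order to know the skill exists; the full procedure only costs tokens on the occasions it is actually used.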
3.3 Prompt Contracts
A prompt contract is a structured agreement between you and the agent, established before any work begins, that defines four things:
- Goal: What does success look like? Be specific and measurable.
- Constraints: What limits apply? (file size, line count, technology stack, timeline)
- Format: How should the output be structured? (sections, file types, style)
- Failure Conditions: What does a bad result look like? Explicit failures prevent silent mediocrity.
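One way to picture the four parts is as a small record that refuses to run until every section is filled in and you have approved it. This is only a sketch: the `PromptContract` class is hypothetical, and the field values are invented for illustration rather than taken from the course material.

```python
from dataclasses import dataclass, field


@dataclass
class PromptContract:
    goal: str
    constraints: list[str] = field(default_factory=list)
    output_format: list[str] = field(default_factory=list)
    failure_conditions: list[str] = field(default_factory=list)
    approved: bool = False

    def missing_sections(self) -> list[str]:
        """Name any of the four sections that is still empty."""
        sections = {
            "goal": self.goal,
            "constraints": self.constraints,
            "format": self.output_format,
            "failure conditions": self.failure_conditions,
        }
        return [name for name, value in sections.items() if not value]

    def ready_to_run(self) -> bool:
        # Work starts only when the contract is complete and the user approved it.
        return self.approved and not self.missing_sections()


draft = PromptContract(
    goal="Single-page marketing site that loads in a browser without errors",
    constraints=["One HTML file", "No external frameworks"],
    output_format=["index.html with hero, services, and contact sections"],
)
print(draft.missing_sections())
print(draft.ready_to_run())
```

The draft above is deliberately incomplete: it has no failure conditions and no approval, so it is not ready to run.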
User instruction: "Build a beautiful single-page site for LeftClick.ai"
Agent-generated contract (requires approval before proceeding):
3.4 Reverse Prompting
Reverse prompting takes prompt contracts one step further. Instead of having the agent generate the contract directly from your vague instruction, it first asks you five clarifying questions β surfacing assumptions and preferences you might not have known to mention. Only after you answer does it draft the contract.
The practical benefit is dramatically improved one-shot potential: the probability the agent gets it right on the first try, with no corrections needed.
Add this line to the end of any task prompt you give an agent today:
Notice how the final output differs from what you would have received without this step.
🧪 Chapter 3 Knowledge Check
1. What is the purpose of the self-modifying system prompt file (e.g., claude.md)?
2. What are the four components of a prompt contract?
Multi-Agent Strategies
A single agent is impressive. A coordinated fleet of agents is transformative. This chapter covers the four multi-agent design patterns that separate casual users from power users: MCP orchestration, stochastic consensus, agent chat rooms, and sub-agent verification loops.
4.1 Multi-Agent MCP Orchestration
MCP orchestration uses one model as a manager that delegates subtasks to specialist sub-agents. Each sub-agent is chosen for its comparative advantage:
[Diagram: an orchestrator (plans, delegates, validates) hands work to a frontend / UI / video sub-agent and a backend / API / tests sub-agent, followed by review and integration.]
The orchestrator decomposes the task, routes each piece to the best model via MCP server calls, then collects and validates results. This approach has two advantages: the best model handles each subtask, and sub-agents work in parallel, reducing total completion time.
Multi-model orchestration requires API keys for each platform and bills at API rates, so you lose the monthly plan subsidization. Reserve this pattern for high-complexity projects where the quality gain justifies the cost.
4.2 Stochastic Multi-Agent Consensus
LLMs are stochastic: ask the same question twice and you get slightly different answers. Most users treat this as a bug. Power users treat it as a feature.
Stochastic consensus works by spawning N agents simultaneously, each with a slightly different prompt framing, then aggregating their outputs. The result traverses far more of the "answer space" than any single query.
The aggregated report categorizes findings into three tiers:
- Consensus items: Ideas that multiple agents independently surfaced. High confidence; act on these.
- Divergent items: Ideas supported by some agents but not others. Worth careful evaluation.
- Outliers: Ideas from only one agent. Potentially brilliant, potentially hallucinated.
Stochastic consensus is ideal for: strategic decision-making, content ideation, keyword research, competitive analysis, product naming, and any question where the goal is to maximize the diversity of ideas rather than execute a known process.
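The aggregation step can be sketched as a simple vote count. The thresholds here are assumptions chosen for this example, since the text does not fix them: an idea proposed by more than half of the agents counts as consensus, an idea from exactly one agent is an outlier, and anything in between is divergent. The sketch also matches ideas by exact string, which a real aggregator would replace with some form of semantic matching. `tier_findings` is a hypothetical name.

```python
from collections import Counter


def tier_findings(agent_outputs: list[list[str]]) -> dict[str, list[str]]:
    """Group ideas by how many agents independently proposed them."""
    # set() so an agent that repeats itself is still counted once
    counts = Counter(idea for output in agent_outputs for idea in set(output))
    majority = len(agent_outputs) / 2
    tiers: dict[str, list[str]] = {"consensus": [], "divergent": [], "outlier": []}
    for idea, votes in sorted(counts.items()):
        if votes == 1:
            tiers["outlier"].append(idea)
        elif votes > majority:
            tiers["consensus"].append(idea)
        else:
            tiers["divergent"].append(idea)
    return tiers


outputs = [
    ["seo audit", "newsletter", "podcast"],
    ["seo audit", "newsletter"],
    ["seo audit", "webinar"],
    ["seo audit", "podcast"],
]
for tier, ideas in tier_findings(outputs).items():
    print(f"{tier}: {ideas}")
```

With four agents, "seo audit" (four votes) lands in consensus, "newsletter" and "podcast" (two votes each) are divergent, and "webinar" (one vote) is an outlier.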
4.3 Agent Chat Rooms
Where stochastic consensus runs agents in parallel isolation (they don't communicate), agent chat rooms run agents in a shared debate. Each agent is assigned a distinct personality, and they argue about the problem in a shared chat.json file.
| Persona | Role in the Debate |
|---|---|
| Systems Thinker | Examines structural and systemic causes |
| Pragmatist | Focuses on what is feasible and measurable |
| Edge Case Finder | Stress-tests every assumption |
| User Advocate | Represents the end-user perspective |
| Contrarian | Challenges consensus, prevents groupthink |
The debate sharpens ideas: agents push back on each other's weak points, surface hidden assumptions, and converge on more nuanced answers than any single agent could produce. The resulting chat.json log serves as valuable context for an orchestrator's final synthesis.
| | Stochastic Consensus | Chat Rooms |
|---|---|---|
| Agent interaction | None (parallel, isolated) | Active debate |
| Best for | Idea volume, search space coverage | Depth, nuance, decision quality |
| Time | Fast (parallel) | Slower (sequential rounds) |
| Output | Frequency/consensus map | Structured debate transcript |
4.4 Sub-Agent Verification Loops
When an agent spends a long time building something, it develops a kind of sunk-cost bias: it has explored many dead ends, made many decisions, and accumulated a sense that its approach was the best one. Ask that same agent to critique its own work and it will find remarkably little wrong with it.
The solution is to pass the output only, stripped of all reasoning history, to a fresh agent that has never seen the problem before.
[Diagram: an implementer builds the thing, a reviewer with fresh context and zero bias critiques it, and a final step fixes the issues.]
The reviewer evaluates for four dimensions: correctness, edge cases, simplification opportunities, and security vulnerabilities. Because it sees only the output, not the journey to get there, it catches errors the implementer is blind to. This is conceptually identical to peer review in academic publishing.[3]
A 200,000-token context is not an unbiased judge. Every wrong turn the implementer explored, every bug it fixed, every assumption it made: all of that is in the context window. A fresh agent sees only what exists, not the journey to create it. Fresh eyes catch what tired eyes miss.
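The essential move is what gets left out of the reviewer's prompt. The sketch below builds a review request from the finished artifact alone and never touches the implementer's history. `WorkSession` and `build_review_prompt` are hypothetical names for this example, and the prompt wording is invented.

```python
from dataclasses import dataclass, field


@dataclass
class WorkSession:
    """Everything the implementer accumulated while building (hypothetical shape)."""
    final_output: str
    reasoning_history: list[str] = field(default_factory=list)


REVIEW_DIMENSIONS = ("correctness", "edge cases", "simplification", "security")


def build_review_prompt(session: WorkSession) -> str:
    """Hand the reviewer the artifact only; the reasoning history is left out."""
    checklist = "\n".join(f"- {dimension}" for dimension in REVIEW_DIMENSIONS)
    return f"Review the following work for:\n{checklist}\n\n{session.final_output}"


session = WorkSession(
    final_output="def add(a, b):\n    return a + b",
    reasoning_history=["tried recursion, abandoned", "fixed off-by-one bug"],
)
prompt = build_review_prompt(session)
print(prompt)
```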
🧪 Chapter 4 Knowledge Check
1. In a sub-agent verification loop, what does the Reviewer agent receive?
2. What is the key difference between Stochastic Consensus and Agent Chat Rooms?
Advanced Pipelines
5.1 Video-to-Action Pipeline
Until recently, agents could only learn from text. A YouTube tutorial, rich with visual context, cursor movements, and UI navigation, was opaque to any model. The video-to-action pipeline changes this by routing video through Gemini's native video understanding API.
[Diagram: Claude (cannot watch video) passes the video to Gemini (native video understanding), which returns a numbered step list; Claude then executes each step with tools.]
Gemini samples the video at one frame per second, analyzes the image sequence, and produces a hyper-specific numbered step list (e.g., "At 0:17, click the blue 'Add Node' button in the top-left toolbar"). Claude receives this list and executes each step using browser control, file editing, or other tools.
Spencer Sterling demonstrated an agent that watched the canonical Blender "donut tutorial" on YouTube, extracted every step, and autonomously reproduced the 3D donut model, without any human guidance beyond providing the URL.[9]
5.2 Multi-Agent Chrome Automation
The Chrome DevTools MCP server lets an agent control a real browser: navigate pages, click elements, fill forms, take screenshots, and read page content. A single agent performing these actions sequentially is useful. Ten agents performing them in parallel is transformative.
[Diagram: an orchestrator distributes target URLs to parallel agents, one each for Site A, Site B, Site C, and so on.]
The throughput math is compelling. If a single agent takes 2–3 minutes per form submission, one agent processes at most ~0.5 forms/minute. With 10 agents: ~5 forms/minute. With 100 agents: ~50 forms/minute. At that rate, a list of 2,000 contacts is processed in under an hour.
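The same arithmetic as a short script, so you can try your own list size and agent count. It is an idealised estimate: it assumes every agent runs at the same speed with no failures, retries, or rate limits, and the function names are invented for this example.

```python
import math


def forms_per_minute(agents: int, minutes_per_form: float) -> float:
    return agents / minutes_per_form


def minutes_to_finish(contacts: int, agents: int, minutes_per_form: float) -> float:
    # Each agent works through its share of the list one form at a time.
    forms_per_agent = math.ceil(contacts / agents)
    return forms_per_agent * minutes_per_form


for agents in (1, 10, 100):
    rate = forms_per_minute(agents, minutes_per_form=2)
    total = minutes_to_finish(2_000, agents, minutes_per_form=2)
    print(f"{agents:>3} agents: {rate:g} forms/min, {total:g} minutes for 2,000 contacts")
```

At 2 minutes per form, 100 agents finish 2,000 contacts in 40 minutes; at 3 minutes per form the same run takes a full hour, so the "under an hour" figure depends on the faster end of the range.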
Multi-agent browser automation is powerful enough to be misused. Respect robots.txt files, website terms of service, and applicable laws. Responsible practitioners use these capabilities for legitimate outreach, research, and automation, not scraping at a scale that damages services or violates consent.
The Shared Chat File Pattern
Coordinating N agents requires a communication mechanism. The simplest and most reliable approach is a shared chat.json file in a central workspace folder. The orchestrator resets this file at the start of each run. Each sub-agent appends its status, results, and any issues every 30 seconds. The orchestrator polls the file periodically and re-routes work as needed.
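A minimal sketch of the three roles in this pattern (reset, append, poll) might look like the following. The message fields (`agent`, `status`, `detail`, `time`) and all function names are assumptions made for this example, not a format any platform prescribes. Note also that this plain read-modify-write approach is not safe when several agents write at the same instant; a real setup would need file locking or one file per agent.

```python
import json
import tempfile
import time
from pathlib import Path


def reset_chat(chat_file: Path) -> None:
    """Orchestrator: start each run with an empty message list."""
    chat_file.write_text("[]", encoding="utf-8")


def post_status(chat_file: Path, agent: str, status: str, detail: str = "") -> None:
    """Sub-agent: append one status entry to the shared file."""
    messages = json.loads(chat_file.read_text(encoding="utf-8"))
    messages.append({"agent": agent, "status": status, "detail": detail, "time": time.time()})
    chat_file.write_text(json.dumps(messages, indent=2), encoding="utf-8")


def unfinished_agents(chat_file: Path, agents: list[str]) -> list[str]:
    """Orchestrator: poll the file and list agents that have not reported 'done'."""
    messages = json.loads(chat_file.read_text(encoding="utf-8"))
    done = {m["agent"] for m in messages if m["status"] == "done"}
    return [name for name in agents if name not in done]


with tempfile.TemporaryDirectory() as workspace:
    chat = Path(workspace) / "chat.json"
    reset_chat(chat)
    post_status(chat, "agent-1", "working", "filling form on Site A")
    post_status(chat, "agent-2", "done", "Site B submitted")
    print(unfinished_agents(chat, ["agent-1", "agent-2"]))
```

The orchestrator would call `unfinished_agents` on a timer and re-route work from any agent that stays silent for too long.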
Install the Chrome DevTools MCP for your platform, then give an agent this prompt:
This single-agent example helps you feel the Chrome MCP before scaling to multi-agent.
🧪 Chapter 5 Knowledge Check
1. In the video-to-action pipeline, which model actually watches the video?
Cost & Optimization
AI agents are not free. Every token processed costs money, and complex multi-agent workflows can accumulate costs quickly. This chapter covers two interrelated topics: managing the context window to maintain output quality, and structuring model usage to minimize cost without sacrificing results.
6.1 Why Quality Degrades Over Time
As a conversation grows, so does the context window. And as the context window grows, model performance measurably declines. This is not speculation. It is a documented phenomenon in large language model research[2], sometimes called "lost in the middle": models attend less effectively to information buried deep in a long context.
Conceptual illustration (not to scale)
6.2 Context Compaction
When the context window approaches its limit, the platform automatically triggers compaction: a summarization process that compresses old context into fewer tokens. Think of it as a hydraulic press squishing older parts of the conversation into dense summaries.
Compaction preserves meaning but loses precision. Specific tool outputs, exact error messages, and nuanced reasoning steps may be reduced to one-line summaries. For long-running agentic tasks, this can cause subtle regressions.
In Claude Code, type /context in the terminal panel to see a breakdown of exactly how your tokens are being used β system prompt, memory files, skills, tool results, and conversation history β before you've even sent a message.
6.3 The Iceberg Technique
The iceberg technique is the leading approach to strategic context management. The idea: keep only a small amount of information immediately loaded in the context window, and give the agent tools to access everything else on demand.
Above the water (loaded at startup):
- System prompt + memory file (claude.md)
- Skill YAML headers (not full bodies)
- Current task context
- Active file being edited
Below the water (accessed on demand):
- Full codebase (read via the read tool, selectively)
- Web data (fetched only when needed)
- Full skill bodies (loaded only when invoked)
- Git history (queried only when relevant)
- Database / external files
6.4 Model Selection Strategy: The 60/30/10 Rule
Not every subtask requires your most powerful (and expensive) model. The 60/30/10 rule provides a practical allocation framework:
| Allocation | Model Tier | Example Tasks | Approx. Cost |
|---|---|---|---|
| 60% | Small/Fast (Haiku, Flash) | Classification, extraction, simple formatting | $0.25–1/M tokens |
| 30% | Mid-tier (Sonnet, GPT-4o) | Research, content generation, code review | $3–5/M tokens |
| 10% | Top-tier (Opus, GPT-5) | Routing decisions, final synthesis, complex reasoning | $15–30/M tokens |
Scenario: 100 million tokens of work on a large project.
Opus-only approach:
100M tokens × $15/M = $1,500
60/30/10 tiered approach:
60M × $0.80/M = $48
30M × $3.00/M = $90
10M × $15.00/M = $150
Total: $288, an 81% cost reduction
Quality impact: minimal, because the tasks routed to cheaper models are those for which cheaper models are already sufficient.
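The scenario's arithmetic can be reproduced in a few lines, which makes it easy to substitute your own prices or split. The per-million prices are the ones used in the scenario above and will drift over time; the dictionary and function names are invented for this sketch.

```python
PRICE_PER_MILLION = {"small": 0.80, "mid": 3.00, "top": 15.00}   # USD, from the scenario
ALLOCATION = {"small": 0.60, "mid": 0.30, "top": 0.10}           # the 60/30/10 split


def tiered_cost(total_millions: float) -> float:
    """Cost in USD when tokens are split across tiers according to ALLOCATION."""
    return sum(total_millions * share * PRICE_PER_MILLION[tier]
               for tier, share in ALLOCATION.items())


total_millions = 100
top_only = total_millions * PRICE_PER_MILLION["top"]
tiered = tiered_cost(total_millions)
saving = 1 - tiered / top_only

print(f"Top-tier only:  ${top_only:,.0f}")
print(f"60/30/10 split: ${tiered:,.0f}")
print(f"Saving:         {saving:.0%}")
```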
6.5 Batch API Pricing
All major providers offer a Batch API that processes requests asynchronously during off-peak server hours. In exchange for accepting up to 24-hour latency, you receive approximately 50% off standard pricing. For high-volume, non-time-sensitive workloads (overnight research, bulk data processing), the batch API can halve your monthly AI costs.
Open Claude Code, navigate to a project, and type /context in the terminal. Note:
1. How many tokens your system prompt consumes before any conversation starts.
2. How many tokens your skills use.
3. What percentage of your total context window is already consumed at session start.
If your starting token count exceeds 10% of your limit, trim your claude.md and consolidate your skills.
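The 10% check is a single division. The sketch below assumes you have copied the token counts by hand from the /context breakdown; the three figures and the 200,000-token limit are made-up sample values, and the function names are hypothetical.

```python
def startup_share(startup_tokens: int, context_limit: int) -> float:
    """Fraction of the context window consumed before the first message."""
    return startup_tokens / context_limit


def needs_trimming(startup_tokens: int, context_limit: int, threshold: float = 0.10) -> bool:
    return startup_share(startup_tokens, context_limit) > threshold


# Hypothetical readings copied by hand from a /context breakdown.
usage = {"system prompt": 9_000, "memory files": 6_500, "skills": 10_500}
startup_tokens = sum(usage.values())

print(f"{startup_share(startup_tokens, 200_000):.1%} used at session start")
print("Trim claude.md and consolidate skills:", needs_trimming(startup_tokens, 200_000))
```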
🧪 Chapter 6 Knowledge Check
1. What does the Iceberg Technique recommend keeping "below water" (not loaded at startup)?
2. In the 60/30/10 rule, which tier handles routing decisions and final synthesis?
Real-World Applications
Everything in Chapters 1–6 was preparation. This chapter is where you actually do things. We will walk through four complete, real-world applications, from your very first agent setup to running a multi-agent virtual company, automating repetitive jobs, and producing research-grade written work. Every step is explained, illustrated, and shown in full.
You will need at least one platform installed and working (Claude Code is recommended). If you have not completed the setup in Chapter 2, go back and do that first. Everything else in this chapter assumes you can open your agent platform and type a prompt.
7.1 Setting Up Your First Agent Task
This section is for the complete beginner. We are going to build a working agent setup from scratch, step by step, with no assumed knowledge. By the end, your agent will autonomously perform a real task while you watch.
Step 1: Open Claude Code and Choose a Folder
Think of a folder on your computer the same way you think of a desk drawer. Your agent works inside that drawer: it can read, write, and organise files there, but it cannot reach outside unless you give it permission.
- Create a new folder on your Desktop called My First Agent.
- Open Claude Code.
- Click the Code button (top left).
- Click Open Folder and select My First Agent.
- Toggle Bypass Permissions to ON. This lets the agent act without asking for confirmation at every step.
Step 2: Create Your Memory File
Before giving the agent any task, set up your claude.md, the persistent memory file from Chapter 3. This is a plain text file you create once and the agent reads forever.
In the Claude Code chat box, type exactly this:
Replace [YOUR NAME] with your actual first name. Press Enter and wait for the confirmation.
You created a self-improving memory. From this moment forward, every conversation in this folder begins with the agent reading those rules. When you correct the agent, it adds the correction to the file automatically. Your agent will never make the same mistake twice.
Step 3: Give Your First Real Task
Now let us give the agent a task with a proper Definition of Done, the most important lesson from Chapter 1. We will ask it to research a topic and produce a structured report.
Watch the agent work. You will see it: (1) observe your instructions, (2) think through a plan, (3) search the web, (4) write the file. This is the core loop in action.
Step 4: Correct and Improve
When the agent finishes, read the report. If anything is not to your liking, correct it in plain English:
The agent will fix the report and write a new rule into claude.md. You have now completed your first full agent learning cycle.
7.2 Running a Virtual Company with Multi-Agents
A virtual company is a group of specialised AI agents that each handle one department, just like a real business. One agent handles marketing, another handles operations, another handles finance, and a manager agent oversees and coordinates them all. Together they can accomplish in a few hours what would take a small human team days.
The Blueprint: A Four-Department Virtual Company
CEO Agent (Orchestrator)
Receives the high-level goal, breaks it into department tasks, delegates work, reviews outputs, resolves conflicts.
Marketing Agent
Writes copy, drafts social posts, creates campaign strategies, analyses competitors.
Operations Agent
Builds workflows, writes process documents, automates routine tasks, manages file organisation.
Research Agent
Gathers data, synthesises reports, fact-checks claims, monitors trends.
Step-by-Step: Setting Up Your Virtual Company
Step 1: Create the Company Folder Structure
In your agent chat, type:
Step 2: Write the Company Briefing
The briefing is the single document every agent reads first. It tells them who they are, what the company does, and what the current goal is.
Fill in the bracketed sections with your actual company or project details.
Step 3: Create Agent Identity Files
Each agent gets its own claude.md-style identity file inside its folder. This is what makes each agent behave differently from the others.
Step 4: Run Your First Multi-Agent Task
Now open Claude Code and run the CEO agent. Give it a real company task and watch it delegate:
Each agent "reads" its identity file before acting, which changes how it approaches the task. The Marketing Agent focuses on audience and persuasion; the Research Agent focuses on evidence and sources; the Operations Agent focuses on timelines and processes. The CEO reviews all three outputs with fresh eyes, the same principle as the sub-agent verification loop from Chapter 4.
Scaling Up: True Parallel Agents
Once you are comfortable with one-at-a-time delegation, you can run real parallel agents by opening multiple Claude Code windows simultaneously, each pointed at a different department folder. All four run at the same time, writing to chat.json as they go. The CEO window monitors the chat log and synthesises when all departments report completion.
[Figure: four Claude Code windows running in parallel. The CEO window assigns tasks, monitors chat.json, and synthesises; Windows 2, 3, and 4 each run one department agent.]
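To make the shared chat log less abstract, the sketch below shows one way the coordination could work. The text does not specify what chat.json looks like, so the structure here (a JSON list of messages with `agent`, `status`, and `text` fields) is an assumption, as are the department names and the helper functions. It also ignores file locking, so treat it as an illustration of the idea rather than something safe for several simultaneous writers.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical department names; match them to your own folders.
DEPARTMENTS = ("marketing", "operations", "research")


def read_log(path: Path) -> list[dict]:
    """Return every message in the log, or an empty list if no log exists yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def post_message(path: Path, agent: str, status: str, text: str) -> None:
    """Append one message. No locking: simultaneous writers could overwrite each other."""
    log = read_log(path)
    log.append({"agent": agent, "status": status, "text": text})
    path.write_text(json.dumps(log, indent=2), encoding="utf-8")


def pending_departments(path: Path) -> list[str]:
    """List the departments that have not yet posted a 'done' message."""
    done = {m["agent"] for m in read_log(path) if m["status"] == "done"}
    return [d for d in DEPARTMENTS if d not in done]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        chat = Path(tmp) / "chat.json"
        post_message(chat, "marketing", "done", "Campaign draft saved.")
        post_message(chat, "research", "working", "Gathering sources.")
        # The CEO window would only synthesise once this list is empty.
        print("Still waiting on:", pending_departments(chat))
```

The CEO's job in the parallel setup reduces to a simple question: is the pending list empty yet? Until it is, the CEO waits; once it is, the CEO reads each department's output and synthesises.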
7.3 Automating Routine Jobs
Routine jobs are the best first target for AI agents: they are repetitive, well-defined, and the Definition of Done is obvious. This section shows you how to automate five common professional tasks completely.
Routine Job 1: Weekly Email Digest
Scenario: Every Monday morning you want a summary of emails received the previous week, organised by priority.
Using the Schedule skill (available in this app), you can set this prompt to run automatically every Monday at 8:00 AM, so the digest is waiting for you when you start work. No manual trigger needed.
Routine Job 2: Meeting Preparation Brief
Before any important meeting, an agent can prepare a one-page brief: background on attendees, agenda summary, suggested questions, and relevant context from your files.
Routine Job 3: Social Media Content Calendar
Creating a month of social media content is one of the most time-consuming marketing tasks. An agent can produce a complete 30-day calendar in minutes.
Routine Job 4: Financial Expense Report
If you keep receipts or expense logs in a folder, an agent can read them and produce a formatted expense report automatically.
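Under the hood, the arithmetic an agent performs for this job is simple. The sketch below assumes your expense log is a CSV file with `date`, `category`, and `amount` columns; that layout, the function names, and the sample rows are all made up for illustration. Real receipts are messier, which is exactly why you hand the reading to an agent.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Made-up sample rows, purely to show the expected shape of the log.
SAMPLE_LOG = """date,category,amount
2025-01-06,Travel,42.50
2025-01-07,Meals,18.20
2025-01-09,Travel,7.50
"""


def summarise_expenses(csv_text: str) -> dict[str, Decimal]:
    """Total the amounts per category. Decimal avoids float rounding on money."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["category"].strip()] += Decimal(row["amount"])
    return dict(totals)


def format_report(totals: dict[str, Decimal]) -> str:
    """Render the totals as aligned text, one category per line, then a grand total."""
    lines = [f"{cat:<12}{amt:>10.2f}" for cat, amt in sorted(totals.items())]
    grand_total = sum(totals.values(), Decimal("0"))
    lines.append(f"{'TOTAL':<12}{grand_total:>10.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(summarise_expenses(SAMPLE_LOG)))
```

A useful Definition of Done for this job is that the category totals add up to the grand total, which is easy to check by hand on a small report.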
Routine Job 5: Competitive Intelligence Monitor
Stay ahead of competitors by running a weekly intelligence sweep.
7.4 Research and Producing Publishable Papers
This is the most sophisticated application in the chapter. We will walk through how to use AI agents to produce rigorous, well-structured academic or professional research papers, and how to ensure the writing is genuinely original, not plagiarised.
AI-assisted research is a legitimate and growing practice in professional and academic settings. However, your institution or publisher may have specific policies about AI use. Always check and disclose AI assistance where required. The techniques in this section are designed to produce original synthesis and analysis, not to copy or reproduce others' work. Used ethically, these tools make your research more thorough, not less honest.
Why AI-Written Text Can Fail Plagiarism Detectors, and Why That Is Not Enough
AI text is not plagiarised in the traditional sense: it does not copy sentences from existing sources. However, AI detectors (Turnitin's AI detection, GPTZero, Originality.ai) look for statistical patterns in writing: unnaturally consistent sentence length, predictable word choices, and absence of the "noise" that human writing naturally contains.
The strategies below produce writing that passes these detectors not by tricking them, but by producing genuinely high-quality, original, analytically rich text that reflects real thinking.
The Six-Stage Research Pipeline
Stage 1: Scoping (Define the Research Question)
Never let the agent choose your research question. You define it. The agent helps you refine it into something researchable and specific.
Stage 2: Source Gathering
Use the Research Agent pattern to gather sources. Crucially, instruct the agent to save raw source information separately from its interpretation of that information.
AI agents can occasionally hallucinate citations, inventing plausible-sounding paper titles that do not exist. Before submitting any paper, manually verify each citation by searching the DOI or title on Google Scholar, PubMed, or your institution's library database. This step is non-negotiable.
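One way to make the manual check harder to skip is to turn the reference list into a checklist first. The sketch below is a minimal illustration under stated assumptions: the DOI pattern is a common approximation rather than a complete validator, the function name is hypothetical, and the two sample references are deliberately fake placeholders. It performs no lookups; confirming that each source exists remains your job.

```python
import re

# Approximate DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def build_checklist(references: list[str]) -> list[dict]:
    """Pair every reference with the lookup a human should perform for it."""
    checklist = []
    for ref in references:
        match = DOI_PATTERN.search(ref)
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        doi = match.group(0).rstrip(".,;)") if match else None
        checklist.append({
            "reference": ref,
            "doi": doi,
            "action": "look up DOI" if doi else "search by title",
            "verified": False,  # a person flips this only after checking
        })
    return checklist


if __name__ == "__main__":
    # Placeholder references, invented purely to exercise the code.
    refs = [
        "Placeholder, A. (2024). A made-up title. https://doi.org/10.1234/example.5678.",
        "Placeholder, B. (2023). Another made-up title.",
    ]
    for item in build_checklist(refs):
        print(item["action"], "->", item["doi"] or item["reference"])
```

A paper is ready to submit only when every `verified` flag has been set to true by a person, not by the agent that gathered the sources.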
Stage 3: Synthesis (Finding the Argument)
Synthesis is the most intellectually demanding step, and the one most likely to produce genuinely original content, because it requires the agent to find patterns and tensions across sources.
Stage 4: Drafting
Now write the paper section by section. Writing it in sections (rather than all at once) produces better, more focused output and reduces the statistical AI-writing patterns that detectors flag.
| Section | Purpose | Typical Length |
|---|---|---|
| Abstract | Summary of the whole paper (written last) | 150–300 words |
| Introduction | Context, research gap, question, structure | 400–600 words |
| Literature Review | What is already known; your synthesis | 800–1,500 words |
| Methodology | How you approached the research | 300–600 words |
| Findings / Analysis | Your core argument and evidence | 1,000–2,000 words |
| Discussion | Implications, limitations, future research | 500–800 words |
| Conclusion | Summary and contribution | 200–400 words |
| References | All citations in APA format | n/a |
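The typical lengths in the table can double as a quick Definition of Done for each drafted section. Here is a small sketch that compares a draft's word count with those ranges. The function name is hypothetical, and counting words by splitting on whitespace is a simplification that will differ slightly from your word processor's count.

```python
# Typical word ranges, taken from the table above.
SECTION_RANGES = {
    "Abstract": (150, 300),
    "Introduction": (400, 600),
    "Literature Review": (800, 1500),
    "Methodology": (300, 600),
    "Findings / Analysis": (1000, 2000),
    "Discussion": (500, 800),
    "Conclusion": (200, 400),
}


def check_length(section: str, text: str) -> str:
    """Report whether a draft falls below, within, or above its typical range."""
    low, high = SECTION_RANGES[section]
    count = len(text.split())  # whitespace-separated tokens as a rough word count
    if count < low:
        verdict = "below"
    elif count > high:
        verdict = "above"
    else:
        verdict = "within"
    return f"{section}: {count} words ({verdict} the typical {low}-{high} range)"


if __name__ == "__main__":
    draft = "word " * 120  # stand-in for a real draft abstract
    print(check_length("Abstract", draft))
```

The ranges are typical, not mandatory, so a section that lands outside them is a prompt to look again rather than a failure.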
Stage 5: Humanisation
This is the stage that makes the difference between text that reads as AI-generated and text that reads as a human scholar. Humanisation does not mean hiding AI use; it means elevating the quality to genuine human-level writing.
Add Your Voice
Ask the agent to rewrite any section "as a sceptical scholar who challenges the mainstream view" or "as a practitioner who has seen this theory fail in the real world."
Vary Rhythm
Have the agent deliberately alternate between short and long sentences. Academic AI writing tends to use uniformly medium-length sentences, the giveaway pattern detectors look for.
Add Specificity
Instruct the agent to add concrete examples, specific numbers, or named cases wherever it has used vague generalisations. Vague generalisations are a hallmark of AI filler.
Insert Disagreement
The most human-sounding academic writing acknowledges counterarguments. Ask the agent to add a "however" or "this view is not without its critics" moment to every major claim.
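The "Vary Rhythm" advice is the easiest of the four to measure yourself. The sketch below counts the words in each sentence and reports how much the lengths vary. It is only a crude proxy: the sentence splitter is naive, the function names are hypothetical, and nothing here reproduces what any particular detector actually computes.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Split on sentence-ending punctuation and count the words in each sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def rhythm_report(text: str) -> dict:
    """Summarise sentence-length variety; a spread near zero means uniform rhythm."""
    lengths = sentence_lengths(text)
    if not lengths:
        return {"sentences": 0, "mean": 0.0, "spread": 0.0}
    return {
        "sentences": len(lengths),
        "mean": statistics.mean(lengths),
        "spread": statistics.pstdev(lengths),  # population standard deviation
    }


if __name__ == "__main__":
    uniform = "One two three four. Five six seven eight. Nine ten eleven twelve."
    varied = "Short. This one is quite a bit longer than the first sentence. Done."
    print("uniform:", rhythm_report(uniform))
    print("varied: ", rhythm_report(varied))
```

If a section's spread is close to zero, that is a good cue to ask the agent for a rewrite that mixes short and long sentences.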
Stage 6: Verification
Before submitting, run a final verification using a fresh sub-agent with no context of how the paper was written:
Assembling and Exporting the Final Paper
Papers produced using this pipeline are original because: (1) the research question is yours; (2) the synthesis (finding the argument) is generated from your specific set of sources; (3) the humanisation stage adds analytical depth and specificity that generic AI cannot produce; and (4) the verification stage catches and eliminates any remaining generic language. The result is a paper that reflects genuine intellectual work, assisted by AI rather than replaced by it.
🧪 Chapter 7 Knowledge Check
1. What is the very first thing you should create before giving an agent any task?
2. In the virtual company setup, what is the CEO agent's primary responsibility?
3. In the research pipeline, which stage is most responsible for originality?
4. Which of these is a reliable way to reduce AI-writing detection flags?
Conclusion: Putting It All Together
You have now covered the full arc of practical AI agent mastery: from the three-step loop that powers every agent, to the multi-agent architectures that compress weeks of work into hours, to the cost optimization strategies that make serious deployment economically viable.
The Hierarchy of Skills
Think of everything you have learned as a hierarchy, where each layer builds on the one below:
A Recommended Learning Path
- Week 1: Set up one platform (Claude Code recommended). Give it 10 different real tasks. Observe the loop. Add a Definition of Done to each prompt.
- Week 2: Create your first claude.md with 5 rules. Correct the agent 3 times and verify the corrections are written to the file. Build your first agent skill.
- Week 3: Run your first prompt contract and reverse prompting session. Notice the difference in output quality.
- Week 4: Attempt a simple two-agent setup: one agent builds, another reviews. Run a stochastic consensus with 5 agents on a decision you have been putting off.
- Month 2+: Build a domain-specific skill library. Implement the 60/30/10 model rule. Explore Chrome automation for a real outreach or research task.
The Bigger Picture
AI agents are shifting the nature of knowledge work. Tasks that required a team (research, drafting, coding, quality review, outreach) can increasingly be accomplished by one person with a well-orchestrated fleet of agents. This is not a threat to thoughtful human judgment; it is an amplifier of it.
The practitioners who will thrive are not those who use agents blindly, but those who understand the architecture deeply enough to direct, verify, and correct agent work with authority. That is what this book aimed to give you.
The single highest-ROI habit you can build: after every agent session that produces something useful, write one new rule into your claude.md. After six months, that file will be the most valuable prompt engineering asset you own.
Thank you for reading. Continue to the Bibliography for academic references.
Bibliography
- [1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762
- [2] Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2023). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics. https://arxiv.org/abs/2307.03172
- [3] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., … & Clark, P. (2023). Self-Refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36. https://arxiv.org/abs/2303.17651
- [4] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., … & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35. https://arxiv.org/abs/2201.11903
- [5] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing reasoning and acting in language models. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2210.03629
- [6] Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. https://arxiv.org/abs/2304.03442
- [7] Shinn, N., Cassano, F., Berman, E., Gopinath, A., Narasimhan, K., & Yao, S. (2023). Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36. https://arxiv.org/abs/2303.11366
- [8] Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., … & Wang, C. (2023). AutoGen: Enabling next-gen LLM applications via multi-agent conversation. arXiv preprint. https://arxiv.org/abs/2308.08155
- [9] Sterling, S. (2025). Autonomous Blender tutorial replication via YouTube video understanding. Post on X (formerly Twitter). [Social media post demonstrating video-to-action agent pipeline for 3D modeling tutorial execution.]
- [10] Jiang, A. Q., Sablayrolles, A., Roux, A., Mensch, A., Savary, B., Bamford, C., … & Sayed, W. E. (2024). Mixtral of experts. arXiv preprint. https://arxiv.org/abs/2401.04088
- [11] Anthropic. (2025). Claude model documentation and API reference. Anthropic Technical Documentation. https://docs.anthropic.com
- [12] OpenAI. (2025). Codex and GPT-4 API documentation. OpenAI Platform Documentation. https://platform.openai.com/docs
- [13] Google DeepMind. (2025). Gemini model family technical report. Google AI. https://deepmind.google/technologies/gemini
- [14] Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., … & Wen, J.-R. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science. https://arxiv.org/abs/2308.11432
- [15] Yerkes, R. M., & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology, 18(5), 459–482. [Foundation for the inverted-U performance curve referenced in Chapter 6.]