Front Matter

Preface

● Beginner Friendly Estimated reading time: 5 minutes

This book was written for one person: someone who has heard the words AI agent and felt a mixture of curiosity and mild terror. You do not need a computer science degree. You do not need to know how to code. You need only a willingness to read carefully and experiment boldly.

The course material on which this guide is based was developed by practitioners who run real businesses — generating millions of dollars in annual revenue — using the techniques described here. The goal is not theory for its own sake but practical fluency: the ability to put AI agents to work on tasks that matter to you.

How to Use This Book

Each chapter builds on the last, but experienced readers may jump to any chapter using the sidebar. Throughout the text you will encounter four types of special boxes:

💡

Tip

Practical shortcuts and best practices drawn from real-world use.

📝

Note

Key concepts and clarifications that deserve extra attention.

⚠️

Warning

Common mistakes and pitfalls you will want to avoid.

🧪

Exercise

Hands-on prompts and activities to reinforce the concepts.

At the end of each chapter, a short quiz tests your comprehension. These are low-stakes — treat them as a self-check, not an exam. Glossary terms appear in blue dotted underline and reveal their definitions on hover.

A Note on Rapid Change

⚠️ The Field Moves Fast

AI is evolving at a pace unlike almost any technology in history. Specific model names, pricing figures, and platform features cited in this book reflect the state of the field as of mid-2025. Treat those details as illustrative examples rather than permanent facts. The underlying principles — parallelization, context management, multi-agent orchestration — are durable.

Acknowledgements

The foundational course material was authored by Nick Sarif, whose practical, business-first approach to AI agents has informed the structure of this guide. Academic references throughout the text draw on seminal work in large language models, tool-augmented reasoning, and multi-agent systems from researchers at Google DeepMind, Anthropic, OpenAI, Stanford, and MIT.

Introduction: The Age of Agents

● Beginner Friendly Estimated reading time: 10 minutes

For most of computing history, software did exactly what it was told. A spreadsheet calculated what you entered. A search engine returned pages matching your query. Every outcome was a direct, predictable consequence of a human instruction.

Then came large language models — systems trained on vast amounts of human text that could generate coherent, contextually aware responses to open-ended questions. That was remarkable. But an AI agent goes further: it sets intermediate goals, uses tools, observes results, adapts its plan, and keeps going until the job is done.

📝 Chatbot vs. Agent

A chatbot responds to a single message with a single answer. An agent receives a high-level goal and then autonomously takes the sequence of actions required to achieve it — potentially over many minutes or hours, without human input at each step.

Why Now?

Several developments converged around 2024–2025 to make practical AI agents possible for non-engineers:

Model intelligence crossed a threshold where models reliably follow complex, multi-step instructions.
Tool use became standardized: models can now call APIs, browse the web, read files, write code, and control browsers using protocols like the Model Context Protocol (MCP).
Agentic platforms (Codex, Claude Code, Anti-Gravity) wrapped these capabilities in consumer-friendly interfaces.
Parallelization allowed multiple agents to work simultaneously, compressing days of work into minutes.

What Can Agents Actually Do?

📧

Outreach at Scale

Scrape leads, visit their websites, fill contact forms, personalize messages — all in parallel.

🔬

Research & Synthesis

Compile dozens of sources, extract key findings, produce structured reports.

💻

Software Development

Write, test, debug, and deploy applications end-to-end.

📊

Data Processing

Scrape, clean, analyze, and visualize large datasets automatically.

🎨

Content Creation

Draft, refine, and publish content tailored to specific audiences.

🤝

Workflow Automation

Orchestrate multi-step business processes across multiple services.

What This Book Will Teach You

By the end of this guide you will understand:

How agents think and act (the core loop)
Which platform to choose and why
How to write prompts that produce consistent, high-quality results
How to orchestrate multiple agents working in parallel
How to control costs without sacrificing quality

💡 The Central Insight

Agents are not yet smarter than humans at any given task. What they are is faster and parallelizable. Ten agents working simultaneously on ten sub-tasks accomplish in five minutes what would take one person fifty minutes working through them one at a time. The skill you are learning is how to orchestrate that parallelism effectively.

Chapter 1

Core Concepts: How Agents Think

● Beginner Friendly Estimated reading time: 20 minutes

Before you can direct an AI agent effectively, you need to understand what is happening inside it. This chapter demystifies the agent's internal process — a simple three-step loop that repeats until the job is done — and explains why that loop is so powerful.

1.1 The Core Agent Loop

Every AI agent, regardless of which platform or model powers it, executes the same fundamental cycle. Researchers at Princeton, Google, and other institutions have formalized variants of this cycle under names like ReAct^[5] and Reflexion^[7], but the intuition is simple:

The Core Agent Loop

👁 Observe → 🧠 Think → ⚡ Act → 📋 Result ↩ back to Observe

The loop repeats until the Definition of Done is satisfied.

Step 1 — Observe

The agent reads everything available to it. This includes:

Your original instruction (the prompt)
Any files, documents, or data you provided
The results of previous tool calls (web searches, code execution, etc.)
Its own memory files (claude.md, gemini.md, agents.md)
Multimodal data — images, audio, video frames — if applicable

All of this information sits in the agent's context window — a finite memory space that grows with each loop iteration.

Step 2 — Think

The agent reasons about what to do next. Modern platforms expose this reasoning step visually: you can read the agent's mini-plan before it acts. This interpretability is one of Claude Code's standout strengths.

💡 Why the Thinking Step Matters

When you can see the agent's reasoning, you can steer it. If the plan looks wrong, pause the agent mid-thought and redirect it. This is far more efficient than letting it run to completion and discovering the error at the end.

Step 3 — Act

The agent executes its plan. Common actions include:

Tool calls: searching the web, reading a file, calling an API
Code execution: writing and running a script
Browser control: navigating a webpage, filling a form, clicking a button
File editing: creating or modifying documents

After the action completes, its result is fed back into the Observe step, and the loop begins again — now with a slightly fuller context window.
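The cycle described above can be sketched in a few lines of Python. This is a toy illustration, not how any particular platform implements it: `run_agent`, `toy_think`, `toy_act`, and `three_sources_found` are hypothetical names, and the "model" and "tool" are simple stand-in functions.

```python
from typing import Callable


def run_agent(goal: str,
              think: Callable[[list[str]], str],
              act: Callable[[str], str],
              is_done: Callable[[list[str]], bool],
              max_steps: int = 10) -> list[str]:
    """Run observe -> think -> act until is_done(context) or max_steps."""
    context = [f"GOAL: {goal}"]              # Observe: everything the agent can see
    for _ in range(max_steps):
        if is_done(context):                 # Definition of Done check
            break
        plan = think(context)                # Think: decide the next action
        result = act(plan)                   # Act: run a tool, script, etc.
        context.append(f"RESULT: {result}")  # The result feeds the next Observe
    return context


# Toy stand-ins for the model and its tools (hypothetical, for illustration).
def toy_think(context: list[str]) -> str:
    return f"find source #{len(context)}"


def toy_act(plan: str) -> str:
    return f"done: {plan}"


def three_sources_found(context: list[str]) -> bool:
    return sum(line.startswith("RESULT:") for line in context) >= 3


final_context = run_agent("collect 3 sources", toy_think, toy_act, three_sources_found)
print(len(final_context))  # 4: the goal plus three results
```

Note how the context list grows by one entry per cycle, and how the loop ends only when the done-check passes or the step budget runs out.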

The Definition of Done

The loop continues until the agent decides the task is complete. This decision is governed by the Definition of Done — the set of success criteria you specify in your prompt. This is the single most overlooked element in beginner prompting, and its absence is the primary reason people feel underwhelmed by agent results.

⚠️ The #1 Beginner Mistake

Giving an agent a task without a Definition of Done is like hiring a contractor without agreeing on what "finished" means. The agent will stop when it thinks it is done — which may be nowhere near what you actually wanted. Always define success criteria explicitly.

🔍 Example: Vague vs. Well-Defined Prompt

Vague (produces inconsistent results):

Research creatine supplementation for me.

Well-defined (reliable, high-quality output):

Research creatine supplementation in men aged 25–45.
Once you have compiled 10 or more peer-reviewed empirical sources,
return a structured report with: an executive summary, key findings,
dosage recommendations, and a table of citations.
Do not stop until all 10 sources are found.

The second prompt includes an explicit Definition of Done ("10 or more peer-reviewed empirical sources"), a format specification, and a stop condition. The agent has no ambiguity about when to quit.

1.2 Agent Architecture: More Than a Model

A common misconception is that an AI agent is just a chatbot with a fancier interface. In reality, the LLM is only one component of a larger system. Think of the LLM as the brain: capable and intelligent, but helpless without a body.

🧠

LLM (The Brain)

Reasons, generates language, makes decisions. Examples: Claude Opus, GPT-4, Gemini Pro.

🔧

Tools (The Hands)

Web search, file read/write, code execution, browser control, API calls.

💾

Memory (The Journal)

claude.md, gemini.md, agents.md — persistent files that carry knowledge across sessions.

🔄

The Loop (The Work Ethic)

The observe-think-act cycle that lets the agent pursue goals autonomously over time.

This architecture is why the term agent — rather than chatbot — is appropriate. An agent is embedded in an environment, observes that environment, and takes actions to change it toward a desired goal.^[6]

1.3 The Growing Context Window

Each time the loop completes one cycle, the result of the action is added to the context. Imagine a notepad that gets a new line after every step. After four cycles, the notepad holds five entries: the original instructions plus the results of steps 1 through 4.

This is both a strength (the agent accumulates knowledge) and a constraint (the notepad has a maximum size). Context management — deciding what to keep, what to summarize, and what to discard — is one of the core skills covered in Chapter 6.

📝 Token Counting

Everything in the context window is measured in tokens. A token is approximately 0.75 words. Current models (Claude Opus 4.6, Gemini 2.5 Pro) support context windows of 200,000 to 1,000,000 tokens — equivalent to roughly 150,000 to 750,000 words, or several long novels.
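As a rough check on these figures, the 0.75 words-per-token rule of thumb can be turned into a small estimator. This is only an approximation under that assumption: real tokenizers differ by model and by language, and the function names here are illustrative.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the note above; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def tokens_to_words(tokens: int) -> int:
    """Approximate number of words that fit in a given token budget."""
    return round(tokens * WORDS_PER_TOKEN)


print(estimate_tokens("Research creatine supplementation for me."))  # 5 words -> 7
print(tokens_to_words(200_000))    # 150000
print(tokens_to_words(1_000_000))  # 750000
```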

🧪 Chapter 1 Knowledge Check

1. What are the three steps in the core agent loop, in order?

✅ Observe, Think, Act. The agent first observes all available context, then thinks (plans), then acts. It must observe its environment before it can reason about what to do.

2. What is the most common reason beginners are disappointed by agent results?

✅ A missing Definition of Done. Without one, the agent decides when to stop, and its judgment rarely matches yours. Model capability is rarely the problem: current models are powerful enough for most practical tasks.

3. Which of the following is NOT a component of an AI agent's architecture?

✅ Internet speed. The four components are: LLM, Tools, Memory, and the Reasoning Loop.

Chapter 2

The Three Platforms

● Beginner Friendly Estimated reading time: 15 minutes

Three major platforms dominate the agentic coding landscape as of mid-2025. Each wraps a world-class LLM in a purpose-built interface for autonomous work. Understanding their differences — and their surprising similarities — will help you choose the right tool for each job.

2.1 The Big Picture

📝 The Key Insight

The intelligence gap between these platforms is small — a few percentage points at most. The differences that matter are in interpretability, design quality, multimodal capabilities, and ecosystem maturity. For most tasks, any of the three will serve you well.

Platform	Underlying Model	Made By	Best For	Price
Codex	GPT-4 / GPT-5 series	OpenAI	Backend, math, test-driven development	API pricing
Claude Code	Claude Opus / Sonnet	Anthropic	Orchestration, interpretable reasoning	$17–20/mo or API
Anti-Gravity	Gemini 2.5 Pro/Flash	Google	Frontend design, video understanding	API pricing

2.2 Claude Code — The Orchestrator

Claude Code's greatest strength is interpretability. Its reasoning tab shows you, in plain language, exactly what the model is planning to do before it does it. This makes Claude ideal as an orchestrator — the top-level manager that delegates work to other models and reviews their results.

✅

Strengths

Most interpretable reasoning; excellent for orchestration and multi-agent workflows; consistent quality.

⚠️

Weaknesses

Slower than competitors unless Fast Mode is enabled (which burns credits); weaker at frontend/visual design.

2.3 Anti-Gravity (Gemini) — The Designer

Gemini's standout feature is its native video understanding. Unlike Claude and GPT, Gemini can process video at the API level — analyzing frames, extracting steps, and executing what it watches. It also consistently produces the most visually polished frontend designs.

✅

Strengths

Best frontend/design output; native video understanding; very fast text generation; massive 1M-token context.

⚠️

Weaknesses

Least interpretable reasoning; quality can be inconsistent day-to-day.

2.4 Codex (GPT) — The Engineer

The GPT model series excels at backend development, rigorous mathematics, and test-driven development. Its "fire and forget" style — point it at a target, let it run — suits tasks with a clear, verifiable Definition of Done.

✅

Strengths

Best backend/API development; strongest at mathematics; largest ecosystem of documentation and examples.

⚠️

Weaknesses

Less interpretable than Claude; reasoning is harder to steer mid-task.

2.5 Getting Started

📋 Setting Up Claude Code

Visit claude.ai and create an account (Google login works).
Subscribe to the Pro plan ($17/mo annual or $20/mo monthly).
Search for "Claude Code desktop download" and install for your OS (Mac, Windows, Windows ARM).
Open Claude Code, click the Code button, choose a working folder.
Enable "bypass permissions" for autonomous operation, then type your first task.

📋 Setting Up Anti-Gravity (Gemini)

Search "Google Anti-Gravity download" — you likely already have a Google account.
Download for Mac (Intel or Apple Silicon) or Windows/Linux.
Open Anti-Gravity — Google logs you in automatically.
In the right-hand panel, find the Agent modal and start prompting.

📋 Setting Up Codex (OpenAI)

Visit openai.com and create an account.
Search "OpenAI Codex download" and install for Mac or Windows.
Open Codex, create a new folder, and start your first project.

🧪 Try It

Pick any one platform and give it this prompt:

"Make a single-page portfolio site for [your name]. Keep it simple, minimal, and light-themed. When done, open it in the browser."

Observe the reasoning tab (if available) and notice each loop iteration: observe → think → act.

🧪 Chapter 2 Knowledge Check

1. Which platform is most recommended as a top-level orchestrator for multi-agent workflows?

✅ Claude Code. Its reasoning tab lets you see and steer decisions in real time, which makes it well suited to orchestrating and reviewing other agents' work.

2. Which platform has built-in native video understanding?

✅ Anti-Gravity (Gemini). It is the only one of the three platforms with native video understanding at the API level, which enables the video-to-action pipeline covered in Chapter 5.

Chapter 3

Prompting Techniques

● Intermediate Estimated reading time: 25 minutes

Prompting is the primary interface between you and an agent. A well-structured prompt is the difference between an agent that wanders and one that executes with surgical precision. This chapter covers four high-leverage techniques: self-modifying system prompts, agent skills, prompt contracts, and reverse prompting.

3.1 Self-Modifying System Prompts

Every major platform provides a special file that is automatically prepended to every conversation. This file goes by different names depending on the platform:

Platform	File Name
Claude Code	`claude.md`
Anti-Gravity (Gemini)	`gemini.md`
Codex	`agents.md`

Because this file is read at the start of every session, it functions as a persistent, growing memory. The key insight is to instruct the agent to update this file automatically whenever you correct it — turning your feedback into permanent rules.

Self-Improving Agent Loop

Session starts → Agent reads claude.md → Agent completes task
↓ (you correct: "no dark mode")
Agent writes rule to claude.md → Next session: rule applied automatically

Over time, this creates a compounding improvement. If each session adds one rule, you have one rule after one session, ten after ten sessions, and after fifty sessions the agent rarely repeats a mistake you have already corrected.

# Add this block to your claude.md / gemini.md

## How This File Works
Before starting any task, read this entire file.
It contains a growing ruleset that improves over time.

## When to Add a Rule
- When the user explicitly corrects your output
- When you hit a bug caused by a wrong assumption
- When the user states a preference

## Rule Format
[Category] Never/Always do X because Y.

## Learned Rules
Rule 1: [Design] Never use dark mode by default because the user prefers light themes.
Rule 2: [Code] Always use TypeScript, not JavaScript.
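In practice the agent edits the memory file itself, but the update it performs can be sketched in Python to show what "writing a rule" amounts to. The helper name `add_rule` is hypothetical, and the sketch assumes rules are stored one per line in the `Rule N: [Category] ...` format shown above.

```python
import tempfile
from pathlib import Path


def add_rule(memory_file: Path, category: str, rule: str) -> str:
    """Append 'Rule N: [Category] rule' to the memory file and return the new line."""
    if memory_file.exists():
        text = memory_file.read_text(encoding="utf-8")
    else:
        text = "## Learned Rules\n"
    # Number the new rule after the rules already present.
    existing = sum(line.startswith("Rule ") for line in text.splitlines())
    new_line = f"Rule {existing + 1}: [{category}] {rule}"
    if not text.endswith("\n"):
        text += "\n"
    memory_file.write_text(text + new_line + "\n", encoding="utf-8")
    return new_line


# Demo in a temporary folder so no real memory file is touched.
with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "claude.md"
    first = add_rule(memory, "Design",
                     "Never use dark mode by default because the user prefers light themes.")
    second = add_rule(memory, "Code", "Always use TypeScript, not JavaScript.")
    saved = memory.read_text(encoding="utf-8")

print(second)  # Rule 2: [Code] Always use TypeScript, not JavaScript.
```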
  

💡 Global vs. Local Files

You can maintain two levels: a global file with universal preferences (your name, tone, language style) that applies to all projects, and a local file in each project folder with project-specific rules. Claude Code stores the global file at ~/.claude/CLAUDE.md (on disk, the file name is written in capitals).

3.2 Agent Skills

An agent skill is a reusable workflow stored as a markdown file. Where the system prompt file captures your preferences, a skill captures a process. Skills turn the LLM's natural statistical variability into a far more predictable, repeatable workflow.

A skill file has two parts: a short YAML header (loaded into context every session) and a detailed body (loaded only when the skill is invoked):

---
name: pdf-processing
description: >
  Extract text and tables from PDF files.
  Use when the user provides a PDF and wants
  its contents analyzed or summarized.
---

## Steps
1. Read the uploaded PDF using the file tool.
2. Extract all text, preserving section headings.
3. Identify any tables and convert them to markdown.
4. Return a structured summary with: executive summary,
   key sections, and all extracted tables.
5. Ask the user if they need the raw text as well.
  

📝 Token Efficiency of Skills

Only the YAML header is loaded into the context window at startup. The detailed body is read only when you invoke the skill by name. This keeps your context lean and your costs low — a technique explored further in Chapter 6.
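The header/body split can be illustrated with a short parser. This is a sketch of the idea only, assuming the `---` delimiters shown in the example skill file; `split_skill` is a hypothetical helper, not a platform API, and it treats the header as plain text rather than parsing the YAML.

```python
def split_skill(text: str) -> tuple[str, str]:
    """Split a skill file into (yaml_header, body) at the '---' delimiters."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text.strip()              # no header present
    # Find the closing '---' that ends the header.
    end = next((i for i, line in enumerate(lines[1:], start=1)
                if line.strip() == "---"), None)
    if end is None:
        return "", text.strip()              # unterminated header: treat as body
    header = "\n".join(lines[1:end])
    body = "\n".join(lines[end + 1:]).strip()
    return header, body


SKILL = """---
name: pdf-processing
description: >
  Extract text and tables from PDF files.
---

## Steps
1. Read the uploaded PDF using the file tool.
2. Extract all text, preserving section headings.
"""

header, body = split_skill(SKILL)
startup_context = header    # small: loaded every session
on_invoke_context = body    # larger: loaded only when the skill is called
print(header.splitlines()[0])  # name: pdf-processing
```

The longer the skill body, the more this split saves, because the body costs nothing until the skill is actually used.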

3.3 Prompt Contracts

A prompt contract is a structured agreement between you and the agent, established before any work begins, that defines four things:

🎯

Goal

What does success look like? Be specific and measurable.

🚧

Constraints

What limits apply? (file size, line count, technology stack, timeline)

📄

Format

How should the output be structured? (sections, file types, style)

❌

Failure Conditions

What does a bad result look like? Explicit failures prevent silent mediocrity.

📋 Example Prompt Contract

User instruction: "Build a beautiful single-page site for LeftClick.ai"

Agent-generated contract (requires approval before proceeding):

GOAL: Single-page marketing site for LeftClick.ai with
      smooth scroll animations, light theme, 5 sections.

CONSTRAINTS:
- Under 500 lines of HTML/CSS/JS
- No external frameworks (vanilla only)
- Must be mobile-responsive
- Deploy-ready as a single .html file

FORMAT: Sections: Hero, About, Services, Testimonials, CTA
        Fade-in animations on scroll
        Hover states on buttons

FAILURE CONDITIONS:
- Looks like a generic Bootstrap template
- Broken on mobile
- Janky or missing animations
- Exceeds 500 lines

Do you approve this contract? [yes/no/modify]
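One way to make the four-part structure concrete is to model it as a small data type that refuses to look complete until every part is filled in. The class and method names below are illustrative assumptions, not part of any platform.

```python
from dataclasses import dataclass, field


@dataclass
class PromptContract:
    goal: str = ""
    constraints: list[str] = field(default_factory=list)
    output_format: list[str] = field(default_factory=list)
    failure_conditions: list[str] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Names of the four parts that are still empty."""
        parts = {
            "goal": self.goal,
            "constraints": self.constraints,
            "format": self.output_format,
            "failure conditions": self.failure_conditions,
        }
        return [name for name, value in parts.items() if not value]

    def render(self) -> str:
        """Format the contract as text the user can approve or modify."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)
        return (
            f"GOAL: {self.goal}\n\n"
            f"CONSTRAINTS:\n{bullets(self.constraints)}\n\n"
            f"FORMAT:\n{bullets(self.output_format)}\n\n"
            f"FAILURE CONDITIONS:\n{bullets(self.failure_conditions)}\n\n"
            "Do you approve this contract? [yes/no/modify]"
        )


draft = PromptContract(
    goal="Single-page marketing site for LeftClick.ai, light theme, 5 sections.",
    constraints=["Under 500 lines of HTML/CSS/JS", "Must be mobile-responsive"],
    output_format=["Sections: Hero, About, Services, Testimonials, CTA"],
)
print(draft.missing_parts())  # ['failure conditions']
draft.failure_conditions = ["Broken on mobile", "Exceeds 500 lines"]
print(draft.render())
```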
      

3.4 Reverse Prompting

Reverse prompting takes prompt contracts one step further. Instead of having the agent generate the contract directly from your vague instruction, it first asks you five clarifying questions — surfacing assumptions and preferences you might not have known to mention. Only after you answer does it draft the contract.

Reverse Prompting Flow

Vague task → 5 clarifying Qs → Your answers → Prompt contract → Execution

The practical benefit is dramatically improved one-shot potential — the probability the agent gets it right on the first try, with no corrections needed.

🧪 Try It — Reverse Prompting

Add this line to the end of any task prompt you give an agent today:

Before you begin, ask me exactly 5 clarifying questions
to surface assumptions, constraints, and preferences I may
not have stated. Then summarize the answers as a prompt
contract before proceeding.

Notice how the final output differs from what you would have received without this step.

🧪 Chapter 3 Knowledge Check

1. What is the purpose of the self-modifying system prompt file (e.g., claude.md)?

✅ To persist learned rules and preferences. The file is prepended to every conversation as a system prompt, so rules added in one session are automatically applied in all future sessions.

2. What are the four components of a prompt contract?

✅ Goal (what success looks like), Constraints (limits), Format (structure of output), and Failure Conditions (what counts as a bad result).

Chapter 4

Multi-Agent Strategies

● Intermediate ● Advanced Estimated reading time: 30 minutes

A single agent is impressive. A coordinated fleet of agents is transformative. This chapter covers the four multi-agent design patterns that separate casual users from power users: MCP orchestration, stochastic consensus, agent chat rooms, and sub-agent verification loops.

4.1 Multi-Agent MCP Orchestration

MCP orchestration uses one model as a manager that delegates subtasks to specialist sub-agents. Each sub-agent is chosen for its comparative advantage:

Multi-Agent Orchestration Stack

🧠 Claude (Orchestrator): plans, delegates, validates
↓
🎨 Gemini: Frontend / UI / Video
⚙️ Codex: Backend / API / Tests
🔍 Claude: Review & Integration

The orchestrator decomposes the task, routes each piece to the best model via MCP server calls, then collects and validates results. This approach has two advantages: the best model handles each subtask, and sub-agents work in parallel, reducing total completion time.
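The decompose, route, and collect pattern can be sketched with Python's standard thread pool. Everything model-related here is a stand-in: `call_model` is a hypothetical placeholder for a real MCP or API call, and the `ROUTES` table simply encodes the strengths described in this chapter.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical routing table based on the strengths described above.
ROUTES = {"frontend": "gemini", "backend": "codex"}


def call_model(model: str, subtask: str) -> str:
    """Stand-in for a real MCP/API call; returns a labeled placeholder result."""
    return f"[{model}] finished: {subtask}"


def orchestrate(subtasks: dict[str, str]) -> dict[str, str]:
    """Route each subtask to its specialist in parallel, then review the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        futures = {kind: pool.submit(call_model, ROUTES[kind], text)
                   for kind, text in subtasks.items()}
        drafts = {kind: future.result() for kind, future in futures.items()}
    # The orchestrator reviews only after every specialist has reported back.
    review = call_model("claude", "review and integrate: " + ", ".join(sorted(drafts)))
    return {**drafts, "review": review}


results = orchestrate({"frontend": "build landing page", "backend": "write REST API"})
print(results["frontend"])  # [gemini] finished: build landing page
print(results["review"])    # [claude] finished: review and integrate: backend, frontend
```

Threads suit this sketch because real model calls spend most of their time waiting on the network, so the specialists genuinely overlap.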

⚠️ Cost Consideration

Multi-model orchestration requires API keys for each platform and bills at API rates, so you lose the subsidy built into flat-rate monthly plans. Reserve this pattern for high-complexity projects where the quality gain justifies the cost.

4.2 Stochastic Multi-Agent Consensus

LLMs are stochastic: ask the same question twice and you get slightly different answers. Most users treat this as a bug. Power users treat it as a feature.

Stochastic consensus works by spawning N agents simultaneously, each with a slightly different prompt framing, then aggregating their outputs. The result traverses far more of the "answer space" than any single query.

Stochastic Consensus — Traversing the Answer Space

Your question
→ Agent 1: Conservative framing
→ Agent 2: Devil's advocate framing
→ Agent 3: User-perspective framing
→ Agents 4–10: Other framings…
→ Aggregated consensus report

The aggregated report categorizes findings into three tiers:

Consensus items: Ideas that multiple agents independently surfaced — high confidence, act on these.
Divergent items: Ideas supported by some agents but not others — worth careful evaluation.
Outliers: Ideas from only one agent — potentially brilliant, potentially hallucinated.
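The three tiers above can be computed by counting how many agents proposed each idea. This is a minimal sketch under two stated assumptions: ideas are matched by exact text (ignoring case), whereas a real aggregator would need the orchestrator to merge paraphrases, and the 50% consensus cutoff (`consensus_share`) is an arbitrary illustrative choice.

```python
from collections import Counter


def tier_ideas(agent_outputs: list[list[str]],
               consensus_share: float = 0.5) -> dict[str, list[str]]:
    """Group ideas by how many agents proposed them.

    consensus_share is an assumed cutoff: ideas named by at least that
    fraction of agents count as consensus.
    """
    n_agents = len(agent_outputs)
    # Count each idea at most once per agent, ignoring case and stray spaces.
    counts = Counter(idea
                     for output in agent_outputs
                     for idea in {item.strip().lower() for item in output})
    tiers: dict[str, list[str]] = {"consensus": [], "divergent": [], "outliers": []}
    for idea, count in sorted(counts.items()):
        if count == 1:
            tiers["outliers"].append(idea)
        elif count / n_agents >= consensus_share:
            tiers["consensus"].append(idea)
        else:
            tiers["divergent"].append(idea)
    return tiers


outputs = [
    ["Free trial", "Referral program"],
    ["free trial", "Referral program", "Podcast ads"],
    ["Free trial", "Annual discount"],
    ["Free trial", "annual discount"],
    ["Free trial"],
]
report = tier_ideas(outputs)
print(report["consensus"])  # ['free trial']
print(report["divergent"])  # ['annual discount', 'referral program']
print(report["outliers"])   # ['podcast ads']
```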

💡 Best Use Cases

Stochastic consensus is ideal for: strategic decision-making, content ideation, keyword research, competitive analysis, product naming, and any question where the goal is to maximize the diversity of ideas rather than execute a known process.

4.3 Agent Chat Rooms

Where stochastic consensus runs agents in parallel isolation (they don't communicate), agent chat rooms run agents in a shared debate. Each agent is assigned a distinct personality, and they argue about the problem in a shared chat.json file.

Persona	Role in the Debate
🧩 Systems Thinker	Examines structural and systemic causes
⚙️ Pragmatist	Focuses on what is feasible and measurable
🔍 Edge Case Finder	Stress-tests every assumption
👤 User Advocate	Represents the end-user perspective
🗣️ Contrarian	Challenges consensus, prevents groupthink

The debate sharpens ideas: agents push back on each other's weak points, surface hidden assumptions, and converge on more nuanced answers than any single agent could produce. The resulting chat.json log serves as valuable context for an orchestrator's final synthesis.

🔍 When to Use Chat Rooms vs. Stochastic Consensus

	Stochastic Consensus	Chat Rooms
Agent interaction	None (parallel, isolated)	Active debate
Best for	Idea volume, search space coverage	Depth, nuance, decision quality
Time	Fast (parallel)	Slower (sequential rounds)
Output	Frequency/consensus map	Structured debate transcript

4.4 Sub-Agent Verification Loops

When an agent spends a long time building something, it develops a kind of sunk-cost bias: it has explored many dead ends, made many decisions, and accumulated a sense that its approach was the best one. Ask that same agent to critique its own work and it will find remarkably little wrong with it.

The solution is to pass the output only — stripped of all reasoning history — to a fresh agent that has never seen the problem before.

Sub-Agent Verification Loop

🔨 Implementer (builds the thing) → output only → 🔎 Reviewer (fresh context, zero bias) → issue list → 🔧 Resolver (fixes issues) → ✅ Verified output

The reviewer evaluates the output along four dimensions: correctness, edge cases, simplification opportunities, and security vulnerabilities. Because it sees only the output, and not the journey to get there, it catches errors the implementer is blind to. This is conceptually similar to peer review in academic publishing.^[3]
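The key property of this loop, that the reviewer is handed the output and nothing else, is easy to see in code. The sketch below uses toy functions in place of real agents; `verify`, `toy_review`, and `toy_resolve` are hypothetical names, and the single "eval" check stands in for a full review.

```python
from typing import Callable


def verify(output: str,
           review: Callable[[str], list[str]],
           resolve: Callable[[str, list[str]], str],
           max_rounds: int = 3) -> str:
    """Review and fix `output` until the reviewer reports no issues.

    If max_rounds is exhausted, the last (possibly unverified) version is returned.
    """
    for _ in range(max_rounds):
        issues = review(output)           # reviewer sees the output only
        if not issues:
            return output                 # verified
        output = resolve(output, issues)  # resolver applies the fixes
    return output


# Toy reviewer and resolver (stand-ins for fresh agents).
def toy_review(code: str) -> list[str]:
    return ["eval on user input is unsafe"] if "eval(" in code else []


def toy_resolve(code: str, issues: list[str]) -> str:
    return code.replace("eval(", "int(")


final = verify("x = eval(user_input)", toy_review, toy_resolve)
print(final)  # x = int(user_input)
```

Notice that `verify` has no parameter for the implementer's reasoning history, so there is nothing to leak into the review.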

📝 Key Insight — Context Pollution

A 200,000-token context is not an unbiased judge. Every wrong turn the implementer explored, every bug it fixed, every assumption it made — all of that is in the context window. A fresh agent sees only what exists, not the journey to create it. Fresh eyes catch what tired eyes miss.

🧪 Chapter 4 Knowledge Check

1. In a sub-agent verification loop, what does the Reviewer agent receive?

✅ Only the output. The Reviewer gets no reasoning history and no context baggage, which is what lets it evaluate the work with fresh, unbiased eyes.

2. What is the key difference between Stochastic Consensus and Agent Chat Rooms?

✅ Interaction. Consensus agents work in isolation and in parallel, which maximizes breadth; chat room agents actively debate and challenge each other's ideas, which maximizes depth.

Chapter 5

Advanced Pipelines

● Advanced Estimated reading time: 20 minutes

5.1 Video-to-Action Pipeline

Until recently, agents could only learn from text. A YouTube tutorial — rich with visual context, cursor movements, and UI navigation — was opaque to any model. The video-to-action pipeline changes this by routing video through Gemini's native video understanding API.

Video-to-Action Pipeline

YouTube URL → Claude receives URL (cannot watch video) → Calls Gemini API (native video understanding) → Gemini extracts numbered step list → Claude executes each step with tools

Gemini samples the video at one frame per second, analyzes the image sequence, and produces a hyper-specific numbered step list (e.g., "At 0:17, click the blue 'Add Node' button in the top-left toolbar"). Claude receives this list and executes each step using browser control, file editing, or other tools.

💡 Real-World Example

Spencer Sterling demonstrated an agent that watched the canonical Blender "donut tutorial" on YouTube, extracted every step, and autonomously reproduced the 3D donut model — without any human guidance beyond providing the URL.^[9]

5.2 Multi-Agent Chrome Automation

The Chrome DevTools MCP server lets an agent control a real browser: navigate pages, click elements, fill forms, take screenshots, and read page content. A single agent performing these actions sequentially is useful. Ten agents performing them in parallel is transformative.

Parallel Chrome Automation

🧠 Orchestrator: distributes target URLs
↓ spawns N agents ↓
🌐 Chrome 1: Site A
🌐 Chrome 2: Site B
🌐 Chrome 3: Site C
🌐 Chrome N: …
↓ results aggregated ↓
📊 Final Report

The throughput math is compelling. If a single agent takes 2–3 minutes per form submission, it processes roughly 0.33–0.5 forms per minute. Ten agents working in parallel process about 3–5 forms per minute, and 100 agents about 33–50. At that rate, a list of 2,000 contacts takes between 40 minutes and an hour.
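The arithmetic behind these figures can be checked with a few lines of Python. The function name is illustrative, and the calculation assumes every agent works at the same speed with no failures or retries, which real runs will not quite achieve.

```python
def minutes_to_process(contacts: int, agents: int, minutes_per_form: float) -> float:
    """Wall-clock minutes if all agents work in parallel at the same speed."""
    return contacts * minutes_per_form / agents


for agents in (1, 10, 100):
    fast = minutes_to_process(2000, agents, 2.0)  # 2 minutes per form
    slow = minutes_to_process(2000, agents, 3.0)  # 3 minutes per form
    print(f"{agents:>3} agents: {fast:.0f} to {slow:.0f} minutes")

# With 100 agents the job takes 40 to 60 minutes; with one agent, 4000 to 6000.
```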

⚠️ Ethical Use

Multi-agent browser automation is powerful enough to be misused. Respect robots.txt files, website terms of service, and applicable laws. Responsible practitioners use these capabilities for legitimate outreach, research, and automation — not scraping at a scale that damages services or violates consent.

The Shared Chat File Pattern

Coordinating N agents requires a communication mechanism. The simplest and most reliable approach is a shared chat.json file in a central workspace folder. The orchestrator resets this file at the start of each run. Each sub-agent appends its status, results, and any issues every 30 seconds. The orchestrator polls the file periodically and re-routes work as needed.
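A minimal version of this pattern needs only three operations: reset, append, and read. The sketch below is an illustration with assumed field names (`agent`, `status`, `detail`, `ts`); it is safe for one writer at a time, whereas truly concurrent agents would need file locking or one file per agent.

```python
import json
import tempfile
import time
from pathlib import Path


def reset_chat(path: Path) -> None:
    """Orchestrator: start each run with an empty message list."""
    path.write_text("[]", encoding="utf-8")


def append_message(path: Path, agent: str, status: str, detail: str = "") -> None:
    """Sub-agent: add a status entry (read-modify-write, not concurrency-safe)."""
    messages = json.loads(path.read_text(encoding="utf-8"))
    messages.append({"agent": agent, "status": status,
                     "detail": detail, "ts": time.time()})
    path.write_text(json.dumps(messages, indent=2), encoding="utf-8")


def latest_status(path: Path) -> dict[str, str]:
    """Orchestrator: most recent status per agent (later entries win)."""
    messages = json.loads(path.read_text(encoding="utf-8"))
    return {m["agent"]: m["status"] for m in messages}


# Demo in a temporary workspace folder.
with tempfile.TemporaryDirectory() as tmp:
    chat = Path(tmp) / "chat.json"
    reset_chat(chat)
    append_message(chat, "chrome-1", "working", "filling form on Site A")
    append_message(chat, "chrome-2", "error", "captcha on Site B")
    append_message(chat, "chrome-1", "done")
    status = latest_status(chat)

print(status)  # {'chrome-1': 'done', 'chrome-2': 'error'}
```

An orchestrator polling `latest_status` would see that the second agent is stuck and could reassign its target.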

🧪 Try It — Simple Chrome Automation

Install the Chrome DevTools MCP for your platform, then give an agent this prompt:

Open my Chrome browser and navigate to google.com.
Search for "AI agents 2025 use cases".
Take a screenshot of the first results page.
List the top 5 result titles and URLs in a markdown table.

This single-agent example gives you a feel for the Chrome DevTools MCP before you scale up to multiple agents.

🧪 Chapter 5 Knowledge Check

1. In the video-to-action pipeline, which model actually watches the video?

✅ Gemini. Claude receives the URL and orchestrates the pipeline, but it cannot natively process video, so it calls the Gemini API to watch and interpret it.

Chapter 6

Cost & Optimization

● Intermediate Estimated reading time: 25 minutes

AI agents are not free. Every token processed costs money, and complex multi-agent workflows can accumulate costs quickly. This chapter covers two interrelated topics: managing the context window to maintain output quality, and structuring model usage to minimize cost without sacrificing results.

6.1 Why Quality Degrades Over Time

As a conversation grows, so does the context window. And as the context window grows, model performance measurably declines. This is a documented phenomenon in large language model research^[2], sometimes called "lost in the middle": models attend less reliably to information located in the middle of a long context than to information near its beginning or end.

Quality vs. Token Count (conceptual illustration, not to scale)

Context size	Approx. quality
~5k tokens	~95%
~50k tokens	~80%
~100k tokens	~65%
~180k tokens	~45%

6.2 Context Compaction

When the context window approaches its limit, the platform automatically triggers compaction: a summarization process that compresses old context into fewer tokens. Think of it as a hydraulic press squishing older parts of the conversation into dense summaries.

Compaction preserves meaning but loses precision. Specific tool outputs, exact error messages, and nuanced reasoning steps may be reduced to one-line summaries. For long-running agentic tasks, this can cause subtle regressions.

💡 Check Your Context Usage

In Claude Code, type /context in the terminal panel to see a breakdown of exactly how your tokens are being used — system prompt, memory files, skills, tool results, and conversation history — before you've even sent a message.

6.3 The Iceberg Technique

The iceberg technique is the leading approach to strategic context management. The idea: keep only a small amount of information immediately loaded in the context window, and give the agent tools to access everything else on demand.

The Iceberg Context Model

Above water (always in context) — ~20–30%

System prompt + memory file (claude.md)
Skill YAML headers (not full bodies)
Current task context
Active file being edited

Below water (accessed on demand) — ~70–80%

Full codebase (read via read tool, selectively)
Web data (fetched only when needed)
Full skill bodies (loaded only when invoked)
Git history (queried only when relevant)
Database / external files
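The split between "above water" and "below water" can be sketched in a few lines. This is an illustrative model only, not the loading mechanism of any real platform; the `Skill` and `IcebergContext` classes and the example skill are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Skill:
    name: str
    header: str                    # short description, always in context
    load_body: Callable[[], str]   # fetches the full body only when invoked


@dataclass
class IcebergContext:
    system_prompt: str
    skills: Dict[str, Skill] = field(default_factory=dict)
    loaded: Dict[str, str] = field(default_factory=dict)

    def above_water(self) -> str:
        """Text that is always in context: system prompt plus skill headers."""
        headers = "\n".join(f"- {s.name}: {s.header}" for s in self.skills.values())
        return f"{self.system_prompt}\n{headers}"

    def invoke(self, name: str) -> str:
        """Load a skill body on first use, then reuse the cached copy."""
        if name not in self.loaded:
            self.loaded[name] = self.skills[name].load_body()
        return self.loaded[name]


load_calls = []


def load_report_skill() -> str:
    load_calls.append("report")  # track how often the body is fetched
    return "FULL BODY: step-by-step instructions for writing reports..."


ctx = IcebergContext(
    system_prompt="You are a helpful agent.",
    skills={"report": Skill("report", "Writes structured reports", load_report_skill)},
)
print(ctx.above_water())          # headers only, no skill body
print(ctx.invoke("report")[:9])   # body fetched on demand
```

The design choice worth noticing is that the always-loaded text grows with the number of skills only by one header line each, while the bodies cost tokens only in the sessions that actually use them.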

6.4 Model Selection Strategy — The 60/30/10 Rule

Not every subtask requires your most powerful (and expensive) model. The 60/30/10 rule provides a practical allocation framework:

Allocation	Model Tier	Example Tasks	Approx. Cost
60%	Small/Fast (Haiku, Flash)	Classification, extraction, simple formatting	$0.25–1/M tokens
30%	Mid-tier (Sonnet, GPT-4o)	Research, content generation, code review	$3–5/M tokens
10%	Top-tier (Opus, GPT-5)	Routing decisions, final synthesis, complex reasoning	$15–30/M tokens

📊 Cost Comparison: Naive vs. Tiered

Scenario: 100 million tokens of work on a large project.

Opus-only approach:
100M tokens × $15/M = $1,500

60/30/10 tiered approach:
60M × $0.80 = $48
30M × $3.00 = $90
10M × $15.00 = $150
Total: $288 — an 81% cost reduction

Quality impact: minimal, because the tasks routed to cheaper models are those for which cheaper models are already sufficient.
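The comparison above is simple enough to recompute yourself whenever prices change. The sketch below uses the same illustrative figures from the scenario (100 million tokens; $0.80, $3.00, and $15.00 per million tokens); the function name `tiered_cost` is just a label chosen for this example.

```python
def tiered_cost(total_millions: float, tiers: list) -> float:
    """Sum the cost of each tier: share of tokens x price per million tokens."""
    return sum(total_millions * share * price for share, price in tiers)


# (share of work, price in USD per million tokens), taken from the scenario above
TIERS = [(0.60, 0.80), (0.30, 3.00), (0.10, 15.00)]

opus_only = 100 * 15.00
tiered = tiered_cost(100, TIERS)
reduction = 100 * (1 - tiered / opus_only)

print(f"Opus-only: ${opus_only:,.0f}  tiered: ${tiered:,.0f}  reduction: {reduction:.0f}%")
```

Swapping in your own shares and current prices shows how sensitive the saving is to the top-tier share: every extra percentage point routed to the most expensive model costs far more than a point routed to the cheapest one.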

6.5 Batch API Pricing

Several major providers, including Anthropic and OpenAI, offer a Batch API that processes requests asynchronously. In exchange for accepting results within a window of up to 24 hours, you pay approximately 50% less than standard pricing. For high-volume, non-time-sensitive workloads (overnight research, bulk data processing), moving that work to the batch API can roughly halve its cost.

🧪 Try It — Audit Your Context

Open Claude Code, navigate to a project, and type /context in the terminal. Note:
1. How many tokens your system prompt consumes before any conversation starts.
2. How many tokens your skills use.
3. What percentage of your total context window is already consumed at session start.

If your starting token count exceeds 10% of your limit, trim your claude.md and consolidate your skills.
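Once you have the numbers from /context, the 10% check is a one-line calculation. In the sketch below every figure is hypothetical, including the 200,000-token window; substitute the values your own session reports.

```python
def startup_share(breakdown: dict, window: int) -> float:
    """Percentage of the context window consumed before any conversation."""
    return 100 * sum(breakdown.values()) / window


# Hypothetical figures copied by hand from a /context readout
breakdown = {"system_prompt": 3_000, "memory_files": 1_500, "skills": 2_500}
window = 200_000  # assumption: replace with your model's actual limit

share = startup_share(breakdown, window)
needs_trim = share > 10  # the threshold suggested in the exercise above
print(f"{share:.1f}% used at session start; trim needed: {needs_trim}")
```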

🧪 Chapter 6 Knowledge Check

1. What does the Iceberg Technique recommend keeping "below water" (not loaded at startup)?

Answer: Large resources such as the full codebase, full skill bodies, and web data. They stay out of context and are accessed on demand via tools, which keeps the active context lean.

2. In the 60/30/10 rule, which tier handles routing decisions and final synthesis?

Answer: The 10% top tier (Opus, GPT-5). High-stakes work such as routing, final synthesis, and final review deserves your best model; volume work (classification, extraction) goes to cheaper models.

Chapter 7

Real-World Applications

● Step-by-Step ● Practical Estimated reading time: 45 minutes · The most hands-on chapter in the book

Everything in Chapters 1–6 was preparation. This chapter is where you actually do things. We will walk through four complete, real-world applications — from your very first agent setup to running a multi-agent virtual company, automating repetitive jobs, and producing research-grade written work. Every step is explained, illustrated, and shown in full.

📝 Before You Begin

You will need at least one platform installed and working (Claude Code is recommended). If you have not completed the setup in Chapter 2, go back and do that first. Everything else in this chapter assumes you can open your agent platform and type a prompt.

7.1 Setting Up Your First Agent Task

This section is for the complete beginner. We are going to build a working agent setup from scratch — step by step, with no assumed knowledge. By the end, your agent will autonomously perform a real task while you watch.

Step 1 — Open Claude Code and Choose a Folder

Think of a folder on your computer the same way you think of a desk drawer. Your agent works inside that drawer — it can read, write, and organise files there, but it cannot reach outside unless you give it permission.

🧪 Do This Now

Create a new folder on your Desktop called My First Agent.
Open Claude Code.
Click the Code button (top left).
Click Open Folder and select My First Agent.
Toggle Bypass Permissions to ON — this lets the agent act without asking for confirmation at every step.

Step 2 — Create Your Memory File

Before giving the agent any task, set up your claude.md, the persistent memory file from Chapter 3. This is a plain text file you create once; the agent then reads it at the start of every session in this folder.

🧪 Do This Now

In the Claude Code chat box, type exactly this:

Create a file called claude.md in this folder.
Write the following into it:

# My Agent Rules
- My name is [YOUR NAME]. Always address me by name.
- Always use British English spelling.
- Format all reports with: Executive Summary, Main Findings, Recommendations.
- Never use dark mode in any web output.
- At the end of every session, add one new rule you learned about my preferences to this file.

Confirm when done.

Replace [YOUR NAME] with your actual first name. Press Enter and wait for the confirmation.

💡 What Just Happened?

You created a self-improving memory. From this moment forward, every conversation in this folder begins with the agent reading those rules. Because the last rule tells the agent to record what it learns, the corrections you make are added to the file, so the same mistake becomes far less likely to recur.

Step 3 — Give Your First Real Task

Now let us give the agent a task with a proper Definition of Done — the most important lesson from Chapter 1. We will ask it to research a topic and produce a structured report.

🧪 Do This Now — Your First Research Task

Research the top 5 benefits of regular exercise for adults over 40.

Definition of Done:
- Minimum 5 peer-reviewed or reputable sources cited
- Report saved as exercise-report.md in this folder
- Format: Executive Summary (3 sentences), 5 numbered findings each with evidence, Recommendations section
- Do not stop until all sections are complete and the file is saved

Watch the agent work. You will see it: (1) observe your instructions, (2) think through a plan, (3) search the web, (4) write the file. This is the core loop in action.

Step 4 — Correct and Improve

When the agent finishes, read the report. If anything is not to your liking, correct it in plain English:

The recommendations section is too vague. Rewrite it with specific,
actionable advice — e.g. exact exercise types, durations, and frequencies.
Also add this correction to claude.md so you remember it next time.

The agent will fix the report and write a new rule into claude.md. You have now completed your first full agent learning cycle.

7.2 Running a Virtual Company with Multi-Agents

A virtual company is a group of specialised AI agents that each handle one department, just like a real business. One agent handles marketing, another handles operations, another handles research, and a manager agent oversees and coordinates them all. Together they can accomplish in a few hours what would take a small human team days.

The Blueprint: A Four-Department Virtual Company

🧠

CEO Agent (Orchestrator)

Receives the high-level goal, breaks it into department tasks, delegates work, reviews outputs, resolves conflicts.

📣

Marketing Agent

Writes copy, drafts social posts, creates campaign strategies, analyses competitors.

⚙️

Operations Agent

Builds workflows, writes process documents, automates routine tasks, manages file organisation.

📊

Research Agent

Gathers data, synthesises reports, fact-checks claims, monitors trends.

Step-by-Step: Setting Up Your Virtual Company

Step 1 — Create the Company Folder Structure

🧪 Do This Now

In your agent chat, type:

Create the following folder structure in our workspace:

/company
  /ceo          ← CEO agent workspace
  /marketing    ← Marketing agent workspace
  /operations   ← Operations agent workspace
  /research     ← Research agent workspace
  /shared       ← Shared files all agents can read
    chat.json   ← Communication log (create as empty JSON array [])
    briefing.md ← Company mission and active goals
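If you prefer to create the structure yourself rather than asking the agent, the same layout can be built with a short script. This is a sketch using only the Python standard library; the function name `create_company` is an arbitrary choice, and the folder and file names are the ones listed in the prompt above.

```python
import json
from pathlib import Path

DEPARTMENTS = ["ceo", "marketing", "operations", "research"]


def create_company(root: Path) -> Path:
    """Create /company with one folder per agent plus the shared files."""
    company = root / "company"
    for name in DEPARTMENTS + ["shared"]:
        (company / name).mkdir(parents=True, exist_ok=True)

    chat = company / "shared" / "chat.json"
    if not chat.exists():
        # Communication log starts as an empty JSON array
        chat.write_text(json.dumps([]), encoding="utf-8")

    # Briefing is created empty; it is filled in during Step 2
    (company / "shared" / "briefing.md").touch()
    return company
```

Calling `create_company(Path.cwd())` from inside your workspace folder builds the tree there. Running it a second time is harmless, because existing folders and an existing chat.json are left untouched.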

Step 2 — Write the Company Briefing

The briefing is the single document every agent reads first. It tells them who they are, what the company does, and what the current goal is.

🧪 Do This Now

Write the following into /company/shared/briefing.md:

# Company Briefing
**Company:** [YOUR COMPANY NAME]
**Mission:** [ONE SENTENCE DESCRIPTION]

## Active Goal
[DESCRIBE YOUR CURRENT PROJECT IN 2-3 SENTENCES]

## Department Rules
- Marketing: Always match brand voice. No unverified claims.
- Operations: Document every process. Prefer simple solutions.
- Research: Cite sources for every claim. Flag uncertainty explicitly.
- CEO: Coordinate departments. Review all outputs before finalising.

## Communication Protocol
All agents log their progress to /company/shared/chat.json
Format: {"agent": "name", "status": "message", "timestamp": "HH:MM"}

Fill in the bracketed sections with your actual company or project details.
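Agents do not always follow a logging format perfectly, so it can help to check chat.json against the protocol from time to time. The sketch below validates entries against the three keys and the HH:MM timestamp named in the briefing; the function name `invalid_entries` and the sample log are made up for illustration.

```python
import json
import re

REQUIRED_KEYS = {"agent", "status", "timestamp"}
HHMM = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")  # 24-hour clock, e.g. 09:30


def invalid_entries(raw: str) -> list:
    """Return (index, reason) pairs for entries that break the stated format."""
    problems = []
    for i, entry in enumerate(json.loads(raw)):
        if not isinstance(entry, dict) or set(entry) != REQUIRED_KEYS:
            problems.append((i, "wrong keys"))
        elif not HHMM.match(str(entry["timestamp"])):
            problems.append((i, "timestamp not HH:MM"))
    return problems


sample_log = json.dumps([
    {"agent": "research", "status": "competitor analysis saved", "timestamp": "09:30"},
    {"agent": "ceo", "status": "reviewing outputs", "timestamp": "9am"},
    {"agent": "operations"},
])
print(invalid_entries(sample_log))
```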

Step 3 — Create Agent Identity Files

Each agent gets its own claude.md-style identity file inside its folder. This is what makes each agent behave differently from the others.

📋 CEO Agent Identity File

Save this as /company/ceo/claude.md:

# CEO Agent Identity
You are the CEO of [COMPANY NAME].
Your job is to coordinate all departments, not to do their work for them.

## Your Responsibilities
1. Read briefing.md before every task
2. Break the goal into subtasks for each department
3. Assign tasks with clear Definitions of Done
4. Review each department's output for quality
5. Write the final synthesised deliverable
6. Log your decisions to /company/shared/chat.json

## Your Principles
- Never skip the planning step
- Always verify facts before accepting research outputs
- If two departments disagree, resolve by reasoning, not by picking a side arbitrarily

📋 Marketing Agent Identity File

Save this as /company/marketing/claude.md:

# Marketing Agent Identity
You are the Head of Marketing at [COMPANY NAME].

## Your Responsibilities
- Write persuasive, accurate marketing copy
- Analyse competitors when asked
- Draft social media content tailored to the platform
- Always log completed work to /company/shared/chat.json

## Your Constraints
- Never make unverified claims
- Always ask: "Who is the audience?" before writing
- Keep social posts under 280 characters unless told otherwise

📋 Research Agent Identity File

Save this as /company/research/claude.md:

# Research Agent Identity
You are the Head of Research at [COMPANY NAME].

## Your Responsibilities
- Gather, verify, and synthesise information from reputable sources
- Flag any claims you cannot verify with [UNVERIFIED]
- Save all research outputs to /company/research/ with clear filenames
- Log progress to /company/shared/chat.json

## Quality Standards
- Minimum 3 sources per claim
- Include publication date and author for every source
- Never fabricate citations

Step 4 — Run Your First Multi-Agent Task

Now open Claude Code and run the CEO agent. Give it a real company task and watch it delegate:

🧪 Do This Now — Launch the Virtual Company

You are the CEO agent. Read /company/shared/briefing.md first.

Our goal this session:
Produce a complete go-to-market plan for our new product launch
next month. The plan must include:
1. A competitor analysis (delegate to Research Agent)
2. Three marketing campaign concepts (delegate to Marketing Agent)
3. A launch week timeline (delegate to Operations Agent)
4. A synthesised executive summary (you write this after reviewing all three)

For each department task:
- Write the task instructions into their folder as task.md
- Simulate their response by completing the task yourself, working from
  their /company/[department]/claude.md identity
- Save each output to their folder
- Log completion to /company/shared/chat.json

Do not skip any department. Do not write the executive summary until
all three department outputs are reviewed.

💡 Why This Works

Each agent "reads" its identity file before acting, which changes how it approaches the task. The Marketing Agent focuses on audience and persuasion; the Research Agent focuses on evidence and sources; the Operations Agent focuses on timelines and processes. The CEO reviews all three outputs with fresh eyes — the same principle as the sub-agent verification loop from Chapter 4.

Scaling Up: True Parallel Agents

Once you are comfortable with one-at-a-time delegation, you can run real parallel agents by opening multiple Claude Code windows simultaneously, each pointed at a different department folder. All four run at the same time, writing to chat.json as they go. The CEO window monitors the chat log and synthesises when all departments report completion.

Parallel Virtual Company in Action

🧠 CEO Window
Assigns tasks · Monitors chat.json · Synthesises

↙ ↓ ↘

📣 Marketing
Window 2

⚙️ Operations
Window 3

📊 Research
Window 4

All writing to /company/shared/chat.json simultaneously
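One practical caveat with simultaneous writers: chat.json is a single JSON array, so two windows that read, modify, and rewrite it at the same moment can overwrite each other's entries. The sketch below shows one way a small helper script could serialise appends using a lock file. It is an illustration rather than something the agents do by default; `append_entry` and the `.lock` and `.tmp` file names are assumptions, and a lock left behind by a crashed process would need to be deleted by hand.

```python
import json
import os
import time
from pathlib import Path


def append_entry(chat_path: Path, entry: dict, timeout: float = 5.0) -> None:
    """Append one entry to the JSON array, one writer at a time."""
    lock = chat_path.with_suffix(".lock")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another writer holds the lock
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not lock {chat_path}")
            time.sleep(0.05)
    try:
        log = json.loads(chat_path.read_text(encoding="utf-8") or "[]")
        log.append(entry)
        tmp = chat_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(log, indent=2), encoding="utf-8")
        os.replace(tmp, chat_path)  # swap in the complete new file in one step
    finally:
        os.unlink(lock)  # always release the lock, even after an error
```

A simpler alternative is to give each department its own log file and let the CEO window read all of them, which avoids shared writes altogether.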

7.3 Automating Routine Jobs

Routine jobs are the best first target for AI agents — they are repetitive, well-defined, and the Definition of Done is obvious. This section shows you how to automate five common professional tasks completely.

Routine Job 1 — Weekly Email Digest

Scenario: Every Monday morning you want a summary of emails received the previous week, organised by priority.

# Paste this prompt every Monday morning (or schedule it)

Search my Gmail inbox for all emails received in the past 7 days.

Organise them into three groups:
1. URGENT — requires a response within 24 hours
2. THIS WEEK — requires action but not immediately
3. FYI — informational only, no action needed

For each email provide: sender, subject, one-sentence summary, suggested action.
Save the output as weekly-digest-[DATE].md in my Documents folder.

Definition of Done: All emails from the past 7 days are categorised.
No email is missed. File is saved and confirmed.

💡 Make It Truly Automatic

Using the Schedule skill (available in this app), you can set this prompt to run automatically every Monday at 8:00 AM — so the digest is waiting for you when you start work. No manual trigger needed.

Routine Job 2 — Meeting Preparation Brief

Before any important meeting, an agent can prepare a one-page brief — background on attendees, agenda summary, suggested questions, and relevant context from your files.

I have a meeting at [TIME] with [NAME/ORGANISATION] about [TOPIC].

Prepare a one-page meeting brief that includes:
1. Background: Who is [NAME/ORGANISATION]? (search the web)
2. Agenda: Key discussion points based on [TOPIC]
3. My objectives: What I want to achieve in this meeting
4. Suggested questions I should ask
5. Relevant context: Search my /Documents folder for any prior notes
   or files related to [NAME] or [TOPIC]

Save as meeting-brief-[DATE]-[NAME].md
Definition of Done: All 5 sections complete. File saved.

Routine Job 3 — Social Media Content Calendar

Creating a month of social media content is one of the most time-consuming marketing tasks. An agent can produce a complete 30-day calendar in minutes.

Create a 30-day social media content calendar for [YOUR BRAND/TOPIC].

Audience: [DESCRIBE YOUR AUDIENCE]
Platforms: LinkedIn and X (Twitter)
Tone: [PROFESSIONAL / CASUAL / EDUCATIONAL]
Posting frequency: 5 times per week (Mon–Fri)

For each post provide:
- Day and date
- Platform (LinkedIn or X)
- Full post text (LinkedIn max 1,300 chars; X max 280 chars)
- 3 relevant hashtags
- Content type (tip / story / question / announcement / case study)

Vary content types so no two consecutive posts are the same type.
Save as content-calendar-[MONTH]-[YEAR].md

Definition of Done: One post for every weekday in the 30-day window
(20–22 posts, depending on the start date), all sections
complete, no two consecutive identical content types.
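Both conditions in this Definition of Done are easy to check mechanically. A 30-day window contains between 20 and 22 weekdays depending on the day it starts, so the expected post count is worth computing rather than assuming. The helper names below are hypothetical, and the list of content types is a made-up example.

```python
from datetime import date, timedelta


def weekdays_in_window(start: date, days: int = 30) -> list:
    """All Monday-to-Friday dates in the window beginning at `start`."""
    window = (start + timedelta(days=d) for d in range(days))
    return [day for day in window if day.weekday() < 5]  # 0-4 = Mon-Fri


def has_consecutive_repeat(types: list) -> bool:
    """True if any two neighbouring posts share the same content type."""
    return any(a == b for a, b in zip(types, types[1:]))


monday_start = weekdays_in_window(date(2025, 9, 1))    # window starts on a Monday
saturday_start = weekdays_in_window(date(2025, 9, 6))  # window starts on a Saturday
print(len(monday_start), len(saturday_start))

example_types = ["tip", "story", "question", "question", "case study"]
print(has_consecutive_repeat(example_types))
```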

Routine Job 4 — Financial Expense Report

If you keep receipts or expense logs in a folder, an agent can read them and produce a formatted expense report automatically.

Read all files in /Expenses/[MONTH]/ folder.
Each file contains a receipt or expense entry.

Produce a professional expense report with:
1. Summary table: Date | Description | Category | Amount | Currency
2. Subtotals by category (Travel, Meals, Software, Equipment, Other)
3. Grand total
4. Any receipts that appear unusual or missing details (flag these)

Save as expense-report-[MONTH]-[YEAR].xlsx and also as .md

Definition of Done: Every file in the folder is processed.
Totals are verified (recheck arithmetic). Both files saved.
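Because arithmetic is exactly the kind of detail worth double-checking, you may want to recompute the totals independently of the agent. The sketch below assumes the expenses have already been extracted into simple records in a single currency; the field names and sample entries are hypothetical. It uses `Decimal` rather than floating point so that amounts add up exactly.

```python
from collections import defaultdict
from decimal import Decimal

CATEGORIES = ("Travel", "Meals", "Software", "Equipment", "Other")


def summarise(entries: list):
    """Return (subtotals by category, grand total, flagged entries)."""
    subtotals = defaultdict(Decimal)
    flagged = []
    for e in entries:
        # Flag entries with an unknown category or missing details
        if e.get("category") not in CATEGORIES or not e.get("amount") or not e.get("date"):
            flagged.append(e)
            continue
        subtotals[e["category"]] += Decimal(e["amount"])
    grand_total = sum(subtotals.values(), Decimal("0"))
    return dict(subtotals), grand_total, flagged


entries = [
    {"date": "2025-06-03", "description": "Train to client", "category": "Travel", "amount": "120.50"},
    {"date": "2025-06-03", "description": "Lunch", "category": "Meals", "amount": "35.25"},
    {"date": "2025-06-10", "description": "Taxi", "category": "Travel", "amount": "79.50"},
    {"date": "2025-06-12", "description": "Gift basket", "category": "Gifts", "amount": "10.00"},
]
subtotals, grand_total, flagged = summarise(entries)
print(subtotals, grand_total, len(flagged))
```

Flagged entries are deliberately left out of the totals, so the grand total only reflects receipts that passed the basic checks.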

Routine Job 5 — Competitive Intelligence Monitor

Stay ahead of competitors by running a weekly intelligence sweep.

Conduct a competitive intelligence sweep for [YOUR INDUSTRY].

Monitor the following competitors: [LIST 3-5 COMPETITORS]

For each competitor, search the web and find:
1. Any new product announcements in the past 7 days
2. Any pricing changes
3. Any significant press coverage (positive or negative)
4. Any new job postings that hint at their strategic direction
5. Their most recent social media posts (tone, themes, engagement)

Produce a structured report with an executive summary and
one section per competitor.
Rate the overall competitive threat level: LOW / MEDIUM / HIGH
with reasoning.

Save as competitive-intel-[DATE].md
Definition of Done: All listed competitors covered. All 5 data points
per competitor addressed. Threat rating included with reasoning.

7.4 Research and Producing Publishable Papers

This is the most sophisticated application in the chapter. We will walk through how to use AI agents to produce rigorous, well-structured academic or professional research papers — and how to ensure the writing is genuinely original, not plagiarised.

⚠️ Academic Integrity — Read This First

AI-assisted research is a legitimate and growing practice in professional and academic settings. However, your institution or publisher may have specific policies about AI use. Always check and disclose AI assistance where required. The techniques in this section are designed to produce original synthesis and analysis — not to copy or reproduce others' work. Used ethically, these tools make your research more thorough, not less honest.

Why AI-Written Text Can Fail Plagiarism Detectors — and Why That Is Not Enough

AI text is not plagiarised in the traditional sense — it does not copy sentences from existing sources. However, AI detectors (Turnitin's AI detection, GPTZero, Originality.ai) look for statistical patterns in writing: unnaturally consistent sentence length, predictable word choices, and absence of the "noise" that human writing naturally contains.

The strategies below produce writing that passes these detectors not by tricking them, but by producing genuinely high-quality, original, analytically rich text that reflects real thinking.

The Six-Stage Research Pipeline

Research Paper Production Pipeline

① Scoping → ② Source Gathering → ③ Synthesis → ④ Drafting → ⑤ Humanisation → ⑥ Verification

Stage 1 — Scoping (Define the Research Question)

Never let the agent choose your research question. You define it. The agent helps you refine it into something researchable and specific.

🧪 Scoping Prompt

I want to write a research paper on [BROAD TOPIC].

Help me refine this into a specific, researchable question by:
1. Identifying 5 possible specific angles within this topic
2. For each angle, assessing: research gap, available evidence, novelty
3. Recommending the strongest angle with reasoning
4. Drafting a formal research question and hypothesis for that angle

Do not start researching yet. Only scope and recommend.
Wait for my approval before proceeding.

Stage 2 — Source Gathering

Use the Research Agent pattern to gather sources. Crucially, instruct the agent to save raw source information separately from its interpretation of that information.

🧪 Source Gathering Prompt

Research question: [PASTE YOUR APPROVED QUESTION HERE]

Search for a minimum of 15 credible sources on this topic.
Credible means: peer-reviewed journals, government reports, established
think tanks, major academic publishers.

For each source, record in sources.md:
- Full citation (APA 7th edition format)
- URL or DOI
- Publication year
- Key finding or argument (2-3 sentences, in YOUR words — do not quote)
- Relevance to our research question (1 sentence)
- Credibility rating: HIGH / MEDIUM (with reason)

Do NOT include any source you cannot verify.
Flag any finding you are uncertain about with [VERIFY].

Definition of Done: 15+ verified sources in sources.md,
all in APA format, all with key findings in your own words.

⚠️ Always Verify Citations Manually

AI agents can occasionally hallucinate citations — inventing plausible-sounding paper titles that do not exist. Before submitting any paper, manually verify each citation by searching the DOI or title on Google Scholar, PubMed, or your institution's library database. This step is non-negotiable.
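To speed up the manual check, you can pull every DOI out of sources.md and turn it into a resolver link to open in your browser. This sketch only finds strings that look like DOIs; a well-formed DOI proves nothing about whether the paper exists or says what the agent claims, so you still need to open each link. The regular expression is a common approximation of DOI syntax, not an official definition, and `doi_links` is a hypothetical helper name.

```python
import re

# Approximate DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def doi_links(sources_text: str) -> list:
    """Return one https://doi.org/ link per distinct DOI, in order of appearance."""
    found = []
    for match in DOI_PATTERN.findall(sources_text):
        doi = match.rstrip(".,;)")  # drop trailing sentence punctuation
        if doi not in found:
            found.append(doi)
    return [f"https://doi.org/{doi}" for doi in found]


sample = (
    "Smith (2022). Example title. https://doi.org/10.1000/xyz123. "
    "Jones (2021). Another title. DOI: 10.5555/abc.def; "
    "cited again as 10.1000/xyz123"
)
for link in doi_links(sample):
    print(link)
```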

Stage 3 — Synthesis (Finding the Argument)

Synthesis is the most intellectually demanding step — and the one most likely to produce genuinely original content, because it requires the agent to find patterns and tensions across sources.

🧪 Synthesis Prompt

Read sources.md carefully.

Produce a synthesis document (synthesis.md) that:
1. Identifies the 3-5 major themes that emerge across the sources
2. For each theme: which sources agree, which disagree, and why
3. Identifies the key tensions or unresolved debates in the literature
4. States where our research question fits within these debates
5. Proposes our paper's original contribution:
   what argument or perspective have we found that is not already
   well-represented in the literature?

This must be analytical, not descriptive. Do not just summarise
sources — find the argument that connects them.

Definition of Done: synthesis.md covers all 5 points,
each theme references specific sources by author name,
original contribution is clearly stated.

Stage 4 — Drafting

Now write the paper section by section. Writing it in sections (rather than all at once) produces better, more focused output and reduces the statistical AI-writing patterns that detectors flag.

📋 Standard Academic Paper Structure

Section	Purpose	Typical Length
Abstract	Summary of the whole paper (written last)	150–300 words
Introduction	Context, research gap, question, structure	400–600 words
Literature Review	What is already known; your synthesis	800–1,500 words
Methodology	How you approached the research	300–600 words
Findings / Analysis	Your core argument and evidence	1,000–2,000 words
Discussion	Implications, limitations, future research	500–800 words
Conclusion	Summary and contribution	200–400 words
References	All citations in APA format	—

🧪 Section Drafting Prompt (repeat for each section)

Read synthesis.md and sources.md.

Write the [SECTION NAME] for our paper on [TOPIC].

Research question: [YOUR QUESTION]
Our argument/contribution: [FROM SYNTHESIS.MD]

Requirements:
- Write in an academic register: precise, evidence-based, analytical
- Vary sentence length significantly (mix short punchy sentences
  with longer analytical ones)
- Use active voice where appropriate, not exclusively passive
- Reference specific authors by name: "Smith (2022) argues..."
- Do NOT use bullet points or lists — full paragraphs only
- Do NOT use the words: "delve", "crucial", "it is worth noting",
  "in conclusion", "leverage", "utilise" — these are AI clichés
- Target length: [WORD COUNT] words

Definition of Done: [SECTION NAME] complete, no AI clichés,
properly cited, saved as draft-[section].md
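The "no AI clichés" condition in this Definition of Done can be checked with a quick scan of the saved draft. The sketch below uses the phrase list from the prompt above; `find_cliches` is a hypothetical helper, and the matching is deliberately simple: it catches a phrase plus any letters that directly follow it (so "delves" and "utilised" match), but not forms that change the stem, such as "leveraging".

```python
import re

BANNED = ["delve", "crucial", "it is worth noting", "in conclusion", "leverage", "utilise"]


def find_cliches(text: str) -> dict:
    """Count case-insensitive occurrences of each banned phrase."""
    hits = {}
    for phrase in BANNED:
        pattern = r"\b" + re.escape(phrase) + r"\w*"
        count = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if count:
            hits[phrase] = count
    return hits


draft = "We delve into the data. This step is crucial. Smith delves further and utilised a survey."
print(find_cliches(draft))
```

An empty result means the draft passes this particular check; it says nothing about the other requirements, which still need a human read.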

Stage 5 — Humanisation

This is the stage that makes the difference between text that reads as AI-generated and text that reads as a human scholar. Humanisation does not mean hiding AI use — it means elevating the quality to genuine human-level writing.

🎭

Add Your Voice

Ask the agent to rewrite any section "as a sceptical scholar who challenges the mainstream view" or "as a practitioner who has seen this theory fail in the real world."

✂️

Vary Rhythm

Have the agent deliberately alternate between short and long sentences. Academic AI writing tends to use uniformly medium-length sentences — the giveaway pattern detectors look for.

🔍

Add Specificity

Instruct the agent to add concrete examples, specific numbers, or named cases wherever it has used vague generalisations. Vague generalisations are a hallmark of AI filler.

💬

Insert Disagreement

The most human-sounding academic writing acknowledges counterarguments. Ask the agent to add a "however" or "this view is not without its critics" moment to every major claim.

🧪 Humanisation Prompt

Read draft-[section].md.

Rewrite this section to read more like a thoughtful human scholar:
1. Vary sentence length dramatically — include at least 3 sentences
   under 10 words and at least 2 sentences over 40 words
2. Replace any vague generalisation with a specific example or data point
3. Add one counterargument or scholarly tension for each main claim
4. Remove any sentence that could have been written by any AI about any topic
   (generic transitions, filler sentences, bland summaries)
5. Keep all citations intact. Do not invent new ones.

The goal is not to disguise AI writing — it is to raise it to
genuine analytical quality that reflects original scholarly thinking.

Save as draft-[section]-revised.md
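The first requirement in this prompt is measurable, so you can confirm the revised draft actually meets it. The sketch below counts words per sentence using a naive split on ".", "!" and "?", which will miscount around abbreviations such as "et al."; treat the result as a rough signal. The function names are hypothetical.

```python
import re


def sentence_lengths(text: str) -> list:
    """Word count of each sentence, splitting after ., ! or ?"""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def meets_rhythm_target(text: str) -> bool:
    """At least 3 sentences under 10 words and at least 2 over 40 words."""
    lengths = sentence_lengths(text)
    short = sum(1 for n in lengths if n < 10)
    long_ = sum(1 for n in lengths if n > 40)
    return short >= 3 and long_ >= 2


long_sentence = " ".join(["word"] * 41) + "."
sample = "This fails. Why? Nobody knows. " + long_sentence + " " + long_sentence
print(sentence_lengths(sample), meets_rhythm_target(sample))
```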

Stage 6 — Verification

Before submitting, run a final verification using a fresh sub-agent with no context of how the paper was written:

🧪 Verification Prompt (open a NEW chat window)

Read the paper in /research/full-draft.md (assembled from all sections).

Review it for:
1. CITATIONS: Does every factual claim have a citation?
   List any uncited claims.
2. CONSISTENCY: Does the argument flow logically from introduction
   to conclusion? Note any contradictions.
3. EVIDENCE: Are claims proportionate to the evidence cited?
   Flag any claims that seem overstated.
4. WORD CHOICE: List any sentences that sound generic, vague,
   or like AI boilerplate.
5. ORIGINALITY: Based only on what you read, what is the paper's
   unique contribution? Can you state it in one sentence?

Produce a structured review report. Do not rewrite anything —
only identify issues for the author to address.

Assembling and Exporting the Final Paper

🧪 Final Assembly Prompt

Assemble the final paper from these files (in this order):
- draft-abstract-revised.md
- draft-introduction-revised.md
- draft-literature-review-revised.md
- draft-methodology-revised.md
- draft-findings-revised.md
- draft-discussion-revised.md
- draft-conclusion-revised.md
- sources.md (formatted as References section)

Combine into a single document: final-paper.md
Then convert to final-paper.docx with:
- Title page (paper title, author name, date, institution)
- Page numbers
- 1.5 line spacing
- 12pt Times New Roman
- APA 7th edition formatting throughout

Definition of Done: Both .md and .docx files saved and confirmed.
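The Markdown half of this assembly step is plain file concatenation, which you can also do yourself if you want full control over the order. The sketch below joins the files named in the prompt above and stops with an error if any are missing; `assemble` is a hypothetical helper, and conversion to .docx is left to the agent or a separate converter.

```python
from pathlib import Path

SECTIONS = ["abstract", "introduction", "literature-review", "methodology",
            "findings", "discussion", "conclusion"]


def assemble(folder: Path, out_name: str = "final-paper.md") -> Path:
    """Concatenate the revised section drafts and sources.md, in order."""
    files = [folder / f"draft-{s}-revised.md" for s in SECTIONS]
    files.append(folder / "sources.md")  # becomes the References section

    missing = [f.name for f in files if not f.exists()]
    if missing:
        raise FileNotFoundError("missing files: " + ", ".join(missing))

    parts = [f.read_text(encoding="utf-8").strip() for f in files]
    out = folder / out_name
    out.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return out
```

Failing loudly on a missing section is intentional: a paper that silently lacks its methodology is worse than a script that refuses to run.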

💡 The Originality Guarantee

Papers produced using this pipeline are original because: (1) the research question is yours; (2) the synthesis — finding the argument — is generated from your specific set of sources; (3) the humanisation stage adds analytical depth and specificity that generic AI cannot produce; and (4) the verification stage catches and eliminates any remaining generic language. The result is a paper that reflects genuine intellectual work, assisted by AI rather than replaced by it.

🧪 Chapter 7 Knowledge Check

1. What is the very first thing you should create before giving an agent any task?

Answer: The memory file (claude.md). It is the foundation: every preference you set there is applied automatically in all future sessions, so you never have to repeat yourself.

2. In the virtual company setup, what is the CEO agent's primary responsibility?

Answer: Coordinating the departments and reviewing their outputs. The CEO orchestrates rather than doing all the work itself, which would defeat the purpose of a multi-agent system.

3. In the research pipeline, which stage is most responsible for originality?

Answer: Synthesis. This is where the unique argument is formed: finding patterns, tensions, and a contribution not already well represented in the literature, that is, what the sources collectively suggest that no single source has said.

4. Which of these is a reliable way to reduce AI-writing detection flags?

Answer: Raising the genuine quality of the writing through varied sentence rhythm, concrete specificity, and analytical nuance. These are qualities of good writing that detectors associate with human scholarship; the goal is better writing, not evasion.

Back Matter

Conclusion: Putting It All Together

● All Levels Estimated reading time: 10 minutes

You have now covered the full arc of practical AI agent mastery — from the three-step loop that powers every agent, to the multi-agent architectures that compress weeks of work into hours, to the cost optimization strategies that make serious deployment economically viable.

The Hierarchy of Skills

Think of everything you have learned as a hierarchy, where each layer builds on the one below:

🏆 Level 5 — Cost-Optimized Multi-Agent Systems

📡 Level 4 — Advanced Pipelines (Video-to-Action, Chrome Automation)

🤝 Level 3 — Multi-Agent Orchestration, Consensus, Chat Rooms, Verification

✍️ Level 2 — Prompting Techniques (Contracts, Reverse Prompting, Skills)

🧱 Level 1 — Core Concepts (Agent Loop, Architecture, Platforms)

A Recommended Learning Path

Week 1: Set up one platform (Claude Code recommended). Give it 10 different real tasks. Observe the loop. Add a Definition of Done to each prompt.
Week 2: Create your first claude.md with 5 rules. Correct the agent 3 times and verify the corrections are written to the file. Build your first agent skill.
Week 3: Run your first prompt contract and reverse prompting session. Notice the difference in output quality.
Week 4: Attempt a simple two-agent setup: one agent builds, another reviews. Run a stochastic consensus with 5 agents on a decision you have been putting off.
Month 2+: Build a domain-specific skill library. Implement the 60/30/10 model rule. Explore Chrome automation for a real outreach or research task.

The Bigger Picture

AI agents are shifting the nature of knowledge work. Tasks that required a team — research, drafting, coding, quality review, outreach — can increasingly be accomplished by one person with a well-orchestrated fleet of agents. This is not a threat to thoughtful human judgment; it is an amplifier of it.

The practitioners who will thrive are not those who use agents blindly, but those who understand the architecture deeply enough to direct, verify, and correct agent work with authority. That is what this book aimed to give you.

💡 Final Advice

The single highest-ROI habit you can build: after every agent session that produces something useful, write one new rule into your claude.md. After six months, that file will be the most valuable prompt engineering asset you own.
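If you prefer to log rules yourself rather than asking the agent to do it, the habit can be reduced to a one-line command. The following is a small sketch, not part of any official tooling: the helper name `add_rule` and the dated-bullet format are assumptions made for illustration, and the demo writes to a temporary folder so it does not touch your real file.

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import Union


def add_rule(rule: str, path: Union[str, Path] = "claude.md") -> str:
    """Append one dated rule as a bullet line and return the line written.

    Assumption: rules are stored one per line as "- [YYYY-MM-DD] text".
    """
    line = f"- [{date.today().isoformat()}] {rule.strip()}"
    file = Path(path)
    existing = file.read_text(encoding="utf-8") if file.exists() else ""
    # Make sure the new bullet starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with file.open("a", encoding="utf-8") as handle:
        handle.write(f"{separator}{line}\n")
    return line


if __name__ == "__main__":
    # Demo in a throwaway folder; point `path` at your project file in real use.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "claude.md"
        add_rule("Always run the test suite before declaring a task done.", target)
        add_rule("Ask before deleting any file.", target)
        print(target.read_text(encoding="utf-8"))
```

Dating each rule is a design choice rather than a requirement: it makes it easy to review, six months on, which rules are still earning their place.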

Thank you for reading. Continue to the Bibliography for academic references.

Back Matter

Bibliography

Academic and technical references cited throughout this guide.

[1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762
[2] Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. https://arxiv.org/abs/2307.03172
[3] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., … & Clark, P. (2023). Self-Refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36. https://arxiv.org/abs/2303.17651
[4] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., … & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35. https://arxiv.org/abs/2201.11903
[5] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing reasoning and acting in language models. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2210.03629
[6] Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. https://arxiv.org/abs/2304.03442
[7] Shinn, N., Cassano, F., Berman, E., Gopinath, A., Narasimhan, K., & Yao, S. (2023). Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36. https://arxiv.org/abs/2303.11366
[8] Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., … & Wang, C. (2023). AutoGen: Enabling next-gen LLM applications via multi-agent conversation. arXiv preprint. https://arxiv.org/abs/2308.08155
[9] Sterling, S. (2025). Autonomous Blender tutorial replication via YouTube video understanding. Post on X (formerly Twitter). [Social media post demonstrating video-to-action agent pipeline for 3D modeling tutorial execution.]
[10] Jiang, A. Q., Sablayrolles, A., Roux, A., Mensch, A., Savary, B., Bamford, C., … & Sayed, W. E. (2024). Mixtral of experts. arXiv preprint. https://arxiv.org/abs/2401.04088
[11] Anthropic. (2025). Claude model documentation and API reference. Anthropic Technical Documentation. https://docs.anthropic.com
[12] OpenAI. (2025). Codex and GPT-4 API documentation. OpenAI Platform Documentation. https://platform.openai.com/docs
[13] Google DeepMind. (2025). Gemini model family technical report. Google AI. https://deepmind.google/technologies/gemini
[14] Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., … & Wen, J.-R. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science. https://arxiv.org/abs/2308.11432
[15] Yerkes, R. M., & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology, 18(5), 459–482. [Foundation for the inverted-U performance curve referenced in Chapter 6.]