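Lesson 3: TodoWrite (Teaching the Agent to Plan)

In one sentence: give the Agent a "to-do list" tool so it can track its own task progress, then add a "nag reminder" so it does not get lazy and forget to update the list.

What you will learn

What TodoWrite is, and why an Agent needs a "plan management" capability
How to implement a TodoManager class that manages to-do items
The three to-do statuses (pending / in_progress / completed) and the "only one task at a time" constraint
What the nag reminder mechanism is, and how it keeps the Agent from forgetting to update its progress
Why the todo tool is, at the code level, no different from bash or read_file: it is just an ordinary tool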

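Core concept: why does an Agent need a to-do list?

Imagine asking an intern to do a complex task: "Refactor this project for me." If they dive straight into the code with no plan, what happens?

They forget halfway through what is still left to do
They redo work that is already done
They cannot report progress clearly

So what does a good intern do? Write a checklist first, then cross items off one by one.

An AI Agent is no different. When you hand it a complex task (for example, "create a web app for me"), it needs to:

Break the task down first → write a to-do list
Mark which item it is working on → focus on one thing at a time
Check off each item when it is done → so that both it and you know the progress

TodoWrite hands the Agent a "pad of sticky notes" so it can manage its own task progress.

Key insight: the todo tool is not magic. It is registered and dispatched exactly like bash and read_file. The Agent calls it to update its to-do list as naturally as it calls bash to run a command.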

ASCII flowchart

User input: "Create a calculator project for me"
   |
   v
+---------------------------+
| Agent receives the task   |
| and plans with todo:      |
| [ ] #1: Create structure  |
| [ ] #2: Write calc logic  |
| [ ] #3: Write tests       |
+---------------------------+
   |
   v
+---------------------------+
| Agent marks #1 in_progress|<---------------------------+
| [>] #1: Create structure  |                            |
| [ ] #2: Write calc logic  |                            |
| [ ] #3: Write tests       |                            |
+---------------------------+                            |
   |                                                     |
   v                                                     |
+---------------------------+                            |
| Agent uses bash/write_file|                            |
| to finish #1, marks done  |                            |
| [x] #1: Create structure  |                            |
| [>] #2: Write calc logic -+--- mark next in_progress   |
| [ ] #3: Write tests       |                            |
+---------------------------+                            |
   |                                                     |
   +--- loop until every item is done -------------------+
   |
   v
+---------------------------+       +---------------------------+
| If the Agent goes 3 rounds|       | [x] #1: Create structure  |
| without updating todo...  | ----> | [x] #2: Write calc logic  |
| the system injects a nag: |       | [x] #3: Write tests       |
| "Update your todos!"      |       | (3/3 completed)           |
+---------------------------+       +---------------------------+
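
The loop in the diagram can be sketched as a tiny state machine. This is only an illustration and not part of the lesson's code: `advance` is a hypothetical helper that completes the current in_progress item and starts the next pending one. In the real agent, the model performs these transitions itself by calling the todo tool.

```python
def advance(items: list[dict]) -> list[dict]:
    """Return a new list: finish the in_progress item, then start the next pending one."""
    result = [dict(item) for item in items]  # copy so the input is not mutated
    for item in result:
        if item["status"] == "in_progress":
            item["status"] = "completed"
    for item in result:
        if item["status"] == "pending":
            item["status"] = "in_progress"
            break  # only one item may be in progress at a time
    return result


plan = [
    {"id": "1", "text": "Create project structure", "status": "pending"},
    {"id": "2", "text": "Write calculator logic", "status": "pending"},
    {"id": "3", "text": "Write tests", "status": "pending"},
]

# Keep advancing until every item is completed, recording each snapshot.
history = [plan]
while any(item["status"] != "completed" for item in history[-1]):
    history.append(advance(history[-1]))

for step in history:
    print([item["status"] for item in step])
```

Every snapshot has at most one in_progress item, which is the invariant the diagram relies on.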

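Compared with the previous lesson

In the previous lesson (Lesson 2) we gave the Agent more tools (read_file / write_file / edit_file). The core changes in this lesson are:

Aspect	Previous lesson (more tools)	This lesson (TodoWrite)
Number of tools	4 (bash / read / write / edit)	5 (adds todo)
Can the Agent plan?	No, it works step by step with no plan	Yes, it lists a plan first and then acts
Visibility of progress	You cannot tell how far along it is	The checklist shows it at a glance
Anti-slacking mechanism	None	Nag reminder
New core class	None	TodoManager

In short: last lesson's Agent was an intern who "does whatever comes to mind"; this lesson's Agent is a dependable employee who "plans first, then acts."

Full code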

python
#!/usr/bin/env python3
"""s03_todo_write.py - TodoWrite (to-do list tool)

Teach the Agent to plan: write a checklist first, then check items off one by one.
The core is the TodoManager class plus the nag reminder mechanism.
"""

import os          # read environment variables
import subprocess  # run shell commands
from pathlib import Path  # path handling

from anthropic import Anthropic          # official Anthropic SDK
from dotenv import load_dotenv           # load variables from .env

# ============================================================
# Step 1: initialize configuration (same as in earlier lessons)
# ============================================================

# Load the .env file; override=True lets .env values override system environment variables
load_dotenv(override=True)

# If a custom base URL (proxy) is set, drop ANTHROPIC_AUTH_TOKEN, which could conflict with it
if os.getenv("ANTHROPIC_BASE_URL"):
    os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

# Current working directory: the file tools only operate on paths under it (see safe_path)
WORKDIR = Path.cwd()

# Create the Anthropic client
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))

# Read the model name from the environment
MODEL = os.environ["MODEL_ID"]

# ============================================================
# Step 2: system prompt, telling the model to plan with the todo tool
# ============================================================

# Key change! The prompt explicitly tells the model to:
# 1. use the todo tool to plan multi-step tasks
# 2. mark an item in_progress before starting it
# 3. mark it completed when done
SYSTEM = f"""You are a coding agent at {WORKDIR}.
Use the todo tool to plan multi-step tasks. Mark in_progress before starting, completed when done.
Prefer tools over prose."""


# ============================================================
# Step 3: TodoManager, the to-do list manager (the core of this lesson!)
# ============================================================

class TodoManager:
    """
    To-do list manager.

    Like a pad of sticky notes, the Agent uses it to:
    - list what needs to be done
    - mark which item it is working on
    - check off what is finished

    Three statuses:
    - pending     (not started) → shown as [ ]
    - in_progress (in progress) → shown as [>]
    - completed   (done)        → shown as [x]

    Core constraints:
    - at most 20 items (so the list stays manageable)
    - at most 1 item in_progress at a time (one thing at a time, stay focused!)
    """

    def __init__(self):
        # items is the to-do list; each entry is {"id": "1", "text": "what to do", "status": "pending"}
        self.items = []

    def update(self, items: list) -> str:
        """
        Replace the whole to-do list.

        Every time the Agent calls the todo tool, it passes in the complete list.
        This method validates, saves, and renders it.

        Args:
            items: the to-do list; each entry contains id, text, status

        Returns:
            The rendered to-do list text

        Raises:
            ValueError: more than 20 items / more than 1 in_progress / missing text, etc.
        """
        # Constraint 1: at most 20 items
        if len(items) > 20:
            raise ValueError("Max 20 todos allowed")

        validated = []          # items that passed validation
        in_progress_count = 0   # how many items are in_progress

        for i, item in enumerate(items):
            # Extract and normalize each field
            text = str(item.get("text", "")).strip()       # task description
            status = str(item.get("status", "pending")).lower()  # status, defaults to pending
            item_id = str(item.get("id", str(i + 1)))     # ID, defaults to the 1-based position

            # Validate: text must not be empty
            if not text:
                raise ValueError(f"Item {item_id}: text required")

            # Validate: status must be one of the three legal values
            if status not in ("pending", "in_progress", "completed"):
                raise ValueError(f"Item {item_id}: invalid status '{status}'")

            # Count the in_progress items
            if status == "in_progress":
                in_progress_count += 1

            validated.append({"id": item_id, "text": text, "status": status})

        # Constraint 2: at most 1 item in_progress at a time
        # This matters! It forces the Agent to do one thing at a time and avoids "parallel" chaos
        if in_progress_count > 1:
            raise ValueError("Only one task can be in_progress at a time")

        # Validation passed: save the new list
        self.items = validated

        # Return the rendered text; it goes back to the model as the tool result
        return self.render()

    def render(self) -> str:
        """
        Render the to-do list in a human-readable format.

        The result looks like:
        [ ] #1: Create project structure
        [>] #2: Write calculator logic
        [x] #3: Write unit tests

        (1/3 completed)
        """
        if not self.items:
            return "No todos."

        lines = []
        for item in self.items:
            # Pick the marker that matches the status
            marker = {
                "pending": "[ ]",       # not started → empty box
                "in_progress": "[>]",   # in progress → arrow
                "completed": "[x]",     # done → checked
            }[item["status"]]
            lines.append(f"{marker} #{item['id']}: {item['text']}")

        # Count completed items and append a progress summary
        done = sum(1 for t in self.items if t["status"] == "completed")
        lines.append(f"\n({done}/{len(self.items)} completed)")
        return "\n".join(lines)


# Create the global to-do manager instance
# The same list is shared for the whole lifetime of the Agent process
TODO = TodoManager()


# ============================================================
# Step 4: path safety check + tool implementations (same as in earlier lessons)
# ============================================================

def safe_path(p: str) -> Path:
    """
    Path safety check: make sure the Agent cannot touch files outside the working directory.
    For example, if the Agent passes "../../etc/passwd", it is rejected here.
    """
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path

def run_bash(command: str) -> str:
    """Run a shell command with a basic blocklist check and a timeout."""
    dangerous = ["rm -rf /", "sudo", "shutdown", "reboot", "> /dev/"]
    if any(d in command for d in dangerous):
        return "Error: Dangerous command blocked"
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR,
                           capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"

def run_read(path: str, limit: int = None) -> str:
    """Read a file's contents, optionally limited to the first `limit` lines."""
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines):
            lines = lines[:limit] + [f"... ({len(lines) - limit} more)"]
        return "\n".join(lines)[:50000]
    except Exception as e:
        return f"Error: {e}"

def run_write(path: str, content: str) -> str:
    """Write a file, creating parent directories as needed."""
    try:
        fp = safe_path(path)
        fp.parent.mkdir(parents=True, exist_ok=True)
        fp.write_text(content)
        return f"Wrote {len(content)} bytes"
    except Exception as e:
        return f"Error: {e}"

def run_edit(path: str, old_text: str, new_text: str) -> str:
    """Replace the first exact occurrence of old_text in the file."""
    try:
        fp = safe_path(path)
        content = fp.read_text()
        if old_text not in content:
            return f"Error: Text not found in {path}"
        fp.write_text(content.replace(old_text, new_text, 1))
        return f"Edited {path}"
    except Exception as e:
        return f"Error: {e}"


# ============================================================
# Step 5: tool registration, putting todo alongside the other tools
# ============================================================

# Tool handler map: tool name -> function that executes it
# Note that todo is on equal footing with the other tools; it has no special status!
TOOL_HANDLERS = {
    "bash":       lambda **kw: run_bash(kw["command"]),
    "read_file":  lambda **kw: run_read(kw["path"], kw.get("limit")),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file":  lambda **kw: run_edit(kw["path"], kw["old_text"], kw["new_text"]),
    # todo tool: calls TODO.update() with the complete to-do list
    # The return value is the rendered checklist, so the model sees the progress
    "todo":       lambda **kw: TODO.update(kw["items"]),
}

# Tool definitions: tell the model which tools exist and what their parameters look like
TOOLS = [
    # Basic tools (same as in earlier lessons)
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    # ★ The new todo tool ★
    # Its parameter is an items array; each entry has id, text, status
    {"name": "todo", "description": "Update task list. Track progress on multi-step tasks.",
     "input_schema": {"type": "object", "properties": {"items": {"type": "array", "items": {"type": "object", "properties": {"id": {"type": "string"}, "text": {"type": "string"}, "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}}, "required": ["id", "text", "status"]}}}, "required": ["items"]}},
]


# ============================================================
# Step 6: Agent loop, the core loop, now with the nag reminder
# ============================================================

def agent_loop(messages: list):
    """
    Main Agent loop.

    The biggest change from earlier lessons:
    a rounds_since_todo counter tracks how long the Agent has gone without updating todo.
    After 3 rounds in a row without an update, a nag reminder is injected.
    """
    # Key variable: how many rounds have passed since the last todo update
    rounds_since_todo = 0

    while True:
        # Call the Claude API
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        # Append the model's reply to the conversation history
        messages.append({"role": "assistant", "content": response.content})

        # If the model no longer needs a tool, the task is done: leave the loop
        if response.stop_reason != "tool_use":
            return

        # Handle the tool calls the model made
        results = []
        used_todo = False  # did this round use the todo tool?

        for block in response.content:
            if block.type == "tool_use":
                # Look up the handler and run it
                handler = TOOL_HANDLERS.get(block.name)
                try:
                    output = handler(**block.input) if handler else f"Unknown tool: {block.name}"
                except Exception as e:
                    output = f"Error: {e}"

                # Print the tool call (so you can watch what the Agent is doing)
                print(f"> {block.name}:")
                print(str(output)[:200])

                # Wrap the tool output in the tool_result format
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(output),
                })

                # Remember whether this round used the todo tool
                if block.name == "todo":
                    used_todo = True

        # ★ Nag reminder ★
        # If this round used todo → reset the counter
        # Otherwise → counter + 1
        rounds_since_todo = 0 if used_todo else rounds_since_todo + 1

        # After 3 rounds in a row without a todo update → inject a reminder
        # The model sees it, like a tap on the shoulder: "hey, don't forget to update your progress"
        if rounds_since_todo >= 3:
            results.append({
                "type": "text",
                "text": "<reminder>Update your todos.</reminder>",
            })

        # Send all tool results (plus the reminder, if any) back as a user message
        messages.append({"role": "user", "content": results})


# ============================================================
# Step 7: interactive main loop
# ============================================================

if __name__ == "__main__":
    history = []  # conversation history
    while True:
        try:
            query = input("\033[36ms03 >> \033[0m")  # cyan prompt
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        # Append the user input to the conversation history
        history.append({"role": "user", "content": query})
        # Run the Agent loop
        agent_loop(history)
        # Print the model's final answer
        response_content = history[-1]["content"]
        if isinstance(response_content, list):
            for block in response_content:
                if hasattr(block, "text"):
                    print(block.text)
        print()

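Code walkthrough

Module 1: the TodoManager class, the heart of the lesson

This class is the main new piece of code in this lesson. Besides __init__, it has only two methods:

`update()` method: writing the to-do list

python
def update(self, items: list) -> str:

Every time the Agent calls the todo tool, it passes in the complete to-do list (a full replacement, not an incremental update). update() does three things:

Validate the data

At most 20 items (if len(items) > 20)
Every item must have non-empty text
status must be one of pending, in_progress, completed
At most one in_progress item at any time (the most important constraint!)

Save the data → self.items = validated

Return the rendered result → call self.render() and return the readable checklist text

Why allow only one in_progress item? Because it forces the Agent to focus on one thing. You do not want it to mark five tasks as "in progress" and then poke at each of them a little.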
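
To see the constraints in action, here is a standalone sketch that mirrors the validation rules of update(). The names `validate_todos` and `try_validate` are introduced here for illustration only; they condense the checks from the full code above and are not a replacement for it.

```python
VALID_STATUSES = ("pending", "in_progress", "completed")


def validate_todos(items: list[dict], max_items: int = 20) -> list[dict]:
    """Apply the same checks as TodoManager.update() and return the cleaned items."""
    if len(items) > max_items:
        raise ValueError(f"Max {max_items} todos allowed")
    validated = []
    for i, item in enumerate(items):
        text = str(item.get("text", "")).strip()
        status = str(item.get("status", "pending")).lower()
        item_id = str(item.get("id", str(i + 1)))
        if not text:
            raise ValueError(f"Item {item_id}: text required")
        if status not in VALID_STATUSES:
            raise ValueError(f"Item {item_id}: invalid status '{status}'")
        validated.append({"id": item_id, "text": text, "status": status})
    if sum(1 for t in validated if t["status"] == "in_progress") > 1:
        raise ValueError("Only one task can be in_progress at a time")
    return validated


def try_validate(items: list[dict]) -> str:
    """Return 'ok' or the error text, the way the agent loop reports tool errors."""
    try:
        validate_todos(items)
        return "ok"
    except ValueError as e:
        return f"Error: {e}"


good = [{"id": "1", "text": "Write tests", "status": "in_progress"}]
two_active = good + [{"id": "2", "text": "Write docs", "status": "in_progress"}]
blank_text = [{"id": "1", "text": "   ", "status": "pending"}]

print(try_validate(good))        # ok
print(try_validate(two_active))  # Error: Only one task can be in_progress at a time
print(try_validate(blank_text))  # Error: Item 1: text required
```

Because the agent loop catches exceptions and returns the "Error: ..." text as the tool result, a rejected update goes back to the model, which can then fix the list and call todo again.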

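`render()` method: turning the list into readable text

python
def render(self) -> str:

Output format:

text
[ ] #1: Create project structure   ← pending: not started yet
[>] #2: Write calculator logic     ← in_progress: being worked on
[x] #3: Write unit tests           ← completed: done

(1/3 completed)                    ← progress summary

This text is sent back to the model as the tool's return value. When the model reads it, it knows: "I am working on #2, #3 is already done, and #1 has not started yet."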
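
The rendering logic is small enough to try on its own. The sketch below restates it as a free function (`render_todos` and `MARKERS` are names chosen here for illustration; the lesson's code keeps this logic inside TodoManager.render()).

```python
MARKERS = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}


def render_todos(items: list[dict]) -> str:
    """Render one line per item, then a blank line and a progress summary."""
    if not items:
        return "No todos."
    lines = [f"{MARKERS[item['status']]} #{item['id']}: {item['text']}" for item in items]
    done = sum(1 for item in items if item["status"] == "completed")
    lines.append(f"\n({done}/{len(items)} completed)")
    return "\n".join(lines)


items = [
    {"id": "1", "text": "Create project structure", "status": "completed"},
    {"id": "2", "text": "Write calculator logic", "status": "in_progress"},
    {"id": "3", "text": "Write unit tests", "status": "pending"},
]
print(render_todos(items))
```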

Module 2: registering todo as an ordinary tool

python
TOOL_HANDLERS = {
    "bash":       lambda **kw: run_bash(kw["command"]),
    "read_file":  lambda **kw: run_read(kw["path"], kw.get("limit")),
    ...
    "todo":       lambda **kw: TODO.update(kw["items"]),  # ← exactly like the other tools!
}

See? todo is just one key-value pair in the TOOL_HANDLERS dict, on equal footing with bash and read_file. The Agent calls the todo tool → we call TODO.update() → it returns the rendered checklist. That is all there is to it.
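
The dispatch path can be exercised without any API call. In this minimal sketch the handlers are stubs, and `ToolCall` is a hypothetical stand-in for the SDK's tool_use block that models only the `name` and `input` attributes the loop reads.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical stand-in for a tool_use block: just a name and an input dict."""
    name: str
    input: dict = field(default_factory=dict)


# Stub handlers: each takes keyword arguments and returns a string.
HANDLERS = {
    "bash": lambda **kw: f"(would run) {kw['command']}",
    "todo": lambda **kw: f"{len(kw['items'])} item(s) saved",
}


def dispatch(call: ToolCall) -> str:
    """Look up the handler by name; todo takes exactly the same path as bash."""
    handler = HANDLERS.get(call.name)
    try:
        return handler(**call.input) if handler else f"Unknown tool: {call.name}"
    except Exception as e:  # errors become tool output instead of crashing the loop
        return f"Error: {e}"


print(dispatch(ToolCall("bash", {"command": "ls"})))
print(dispatch(ToolCall("todo", {"items": [{"id": "1", "text": "Plan", "status": "pending"}]})))
print(dispatch(ToolCall("deploy")))  # no such tool registered
print(dispatch(ToolCall("todo")))    # missing "items": the KeyError is reported as text
```

Nothing in `dispatch` mentions todo by name, which is the point: adding a tool means adding one dictionary entry and one schema.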

In the TOOLS list, the todo tool takes a single parameter, an items array:

python
{"name": "todo", "description": "Update task list. Track progress on multi-step tasks.",
 "input_schema": {
     "type": "object",
     "properties": {
         "items": {
             "type": "array",
             "items": {
                 "type": "object",
                 "properties": {
                     "id":     {"type": "string"},
                     "text":   {"type": "string"},
                     "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}
                 },
                 "required": ["id", "text", "status"]
             }
         }
     },
     "required": ["items"]
 }}

Each time the model calls todo, it passes in JSON like this:

json
{
    "items": [
        {"id": "1", "text": "Create project structure", "status": "completed"},
        {"id": "2", "text": "Write calculator logic",   "status": "in_progress"},
        {"id": "3", "text": "Write unit tests",         "status": "pending"}
    ]
}
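
The schema tells the model what to send; on the Python side the payload arrives as a plain dict. As a rough sketch, the `required` part of the item schema can be checked by hand as below. The helper `missing_fields` is hypothetical, and this stdlib check is a simplified stand-in, not a JSON Schema validator.

```python
import json

REQUIRED = ("id", "text", "status")  # mirrors "required" in the item schema

# The third item is deliberately incomplete to show what the check reports.
payload = json.loads("""
{
    "items": [
        {"id": "1", "text": "Create project structure", "status": "completed"},
        {"id": "2", "text": "Write calculator logic", "status": "in_progress"},
        {"text": "Write unit tests"}
    ]
}
""")


def missing_fields(items: list[dict]) -> dict[int, list[str]]:
    """Map item index -> required keys that are absent (empty dict if none)."""
    report = {}
    for index, item in enumerate(items):
        missing = [key for key in REQUIRED if key not in item]
        if missing:
            report[index] = missing
    return report


print(missing_fields(payload["items"]))  # {2: ['id', 'status']}
```

TodoManager.update() is more lenient than the schema: it fills in a missing id with the item's position and a missing status with pending, and rejects only missing or blank text.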

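Module 3: the nag reminder mechanism

This is the only change to the Agent loop itself. Apart from the used_todo flag set while handling tool calls, the core logic is just three lines of code:

python
# Track how long it has been since todo was updated
rounds_since_todo = 0 if used_todo else rounds_since_todo + 1

# 3 rounds in a row without an update → give it a nudge
if rounds_since_todo >= 3:
    results.append({"type": "text", "text": "<reminder>Update your todos.</reminder>"})

Why nag? Because the model can get absorbed in the work and forget to update its to-do list, especially midway through a task. The nag reminder works like a small alarm clock: once the Agent has gone three rounds without touching the list, every following round carries the message "hey, time to update your list" until it calls todo again.

How is the nag delivered? By appending one extra text block after the tool results. Because it sits in results, it is sent to the model as part of the user message, and the model treats it as a friendly reminder.

Why 3 rounds? It is a rule of thumb. Too small (say, nagging every round) is annoying; too large and the reminder comes too late to help. Three rounds is a reasonable balance.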
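
The counter's behavior is easy to check by simulation. The sketch below replays a made-up sequence of rounds (each round is the list of tool names the model called) through the same counter logic; `reminder_rounds` is a helper name introduced here, not part of the lesson's code.

```python
def reminder_rounds(tools_per_round: list[list[str]], threshold: int = 3) -> list[int]:
    """Return the 1-based round numbers in which the reminder would be injected."""
    rounds_since_todo = 0
    fired = []
    for round_no, tools in enumerate(tools_per_round, start=1):
        used_todo = "todo" in tools
        rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
        if rounds_since_todo >= threshold:
            fired.append(round_no)
    return fired


rounds = [
    ["todo"],          # 1: plan written, counter reset
    ["bash"],          # 2: counter = 1
    ["write_file"],    # 3: counter = 2
    ["bash"],          # 4: counter = 3 -> reminder
    ["write_file"],    # 5: counter = 4 -> reminder again
    ["todo", "bash"],  # 6: todo used, counter reset
    ["bash"],          # 7: counter = 1
]
print(reminder_rounds(rounds))  # [4, 5]
```

Note that the counter is not reset when the reminder fires, so the reminder repeats every round until the model actually calls todo.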

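Sample run

Example: asking the Agent to create a calculator project

text
s03 >> Create a Python calculator project with add, subtract, multiply, divide, and tests

> todo:
[ ] #1: Create project directory structure
[ ] #2: Implement calculator core module
[ ] #3: Write unit tests
[ ] #4: Run tests to verify

(0/4 completed)

> todo:
[>] #1: Create project directory structure
[ ] #2: Implement calculator core module
[ ] #3: Write unit tests
[ ] #4: Run tests to verify

(0/4 completed)

> bash:
(no output)

> write_file:
Wrote 380 bytes

> todo:
[x] #1: Create project directory structure
[>] #2: Implement calculator core module
[ ] #3: Write unit tests
[ ] #4: Run tests to verify

(1/4 completed)

> write_file:
Wrote 620 bytes

> todo:
[x] #1: Create project directory structure
[x] #2: Implement calculator core module
[>] #3: Write unit tests
[ ] #4: Run tests to verify

(2/4 completed)

> write_file:
Wrote 890 bytes

> todo:
[x] #1: Create project directory structure
[x] #2: Implement calculator core module
[x] #3: Write unit tests
[>] #4: Run tests to verify

(3/4 completed)

> bash:
....
----------------------------------------------------------------------
Ran 4 tests in 0.001s
OK

> todo:
[x] #1: Create project directory structure
[x] #2: Implement calculator core module
[x] #3: Write unit tests
[x] #4: Run tests to verify

(4/4 completed)

The calculator project is complete! All 4 tests pass.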

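Note the Agent's behavior pattern:

List first (the first thing it does after receiving the task is call todo)
Mark in_progress before starting (telling you which item it is about to work on)
Mark completed when done, and mark the next item in_progress in the same call (orderly progress)
Summarize for you once everything is done

What the nag reminder does

If the Agent goes 3 rounds in a row using only bash / write_file without updating todo, the reminder is appended to the next user message. The code above does not print the reminder, so it will not appear in the console output unless you add a print where it is injected. After receiving it, the Agent will usually call the todo tool right away to update its progress.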
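
If you want to watch the injection happen, one option is to factor the step into a small function that also logs. This is a sketch under assumptions: `build_user_turn` is a hypothetical helper, the `[nag]` log line is made up, and `toolu_example` is a placeholder id rather than a real tool_use id.

```python
REMINDER = "<reminder>Update your todos.</reminder>"


def build_user_turn(results: list[dict], rounds_since_todo: int, threshold: int = 3) -> list[dict]:
    """Return the content list for the next user message, with the nag appended if due."""
    content = list(results)  # tool_result blocks stay first
    if rounds_since_todo >= threshold:
        content.append({"type": "text", "text": REMINDER})
        print(f"[nag] reminder injected after {rounds_since_todo} rounds without todo")
    return content


# One tool result, as the agent loop would build it (placeholder id).
results = [{"type": "tool_result", "tool_use_id": "toolu_example", "content": "Wrote 380 bytes"}]

quiet = build_user_turn(results, rounds_since_todo=2)   # below the threshold: unchanged
nagged = build_user_turn(results, rounds_since_todo=3)  # at the threshold: reminder added
print(len(quiet), len(nagged))
print(nagged[-1]["text"])
```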

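Key takeaways

todo is just an ordinary tool: at the code level it is no different from bash or read_file. Its "special" role comes entirely from the instructions in the system prompt and from the nag reminder, not from any magic in the code.

Constraints produce good results: "only one in_progress item at a time" looks trivial, but it forces the Agent to focus instead of jumping between tasks. Good tool design does not give the Agent maximum freedom; it gives it sensible constraints.

The nag reminder is soft control: we do not force the Agent to update todo (that would interrupt its workflow); we give it a gentle reminder instead. This "soft nudge" pattern is common in Agent design.

Full replacement, not incremental updates: every todo call passes in the complete list rather than "set #2 to completed." The design is simpler and less error-prone (no need to worry about the ordering of incremental operations).

Observable progress: with todo, both you (the user) and the Agent (the executor) can clearly see the overall progress of the task. This matters a great deal in complex tasks.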
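
The full-replacement takeaway can be made concrete with a small comparison. This is an illustrative sketch only: `replace_all` restates what the lesson's todo tool does, while `apply_patch` is a hypothetical incremental alternative that the lesson does not implement.

```python
def replace_all(_current: list[dict], new_items: list[dict]) -> list[dict]:
    """Full replacement: the new list is the whole state."""
    return [dict(item) for item in new_items]


def apply_patch(current: list[dict], item_id: str, status: str) -> list[dict]:
    """Hypothetical incremental update: change one item's status by id."""
    if not any(item["id"] == item_id for item in current):
        raise KeyError(f"No todo with id {item_id}")
    return [
        {**item, "status": status} if item["id"] == item_id else dict(item)
        for item in current
    ]


state = [
    {"id": "1", "text": "Create project structure", "status": "in_progress"},
    {"id": "2", "text": "Write calculator logic", "status": "pending"},
]
target = [
    {"id": "1", "text": "Create project structure", "status": "completed"},
    {"id": "2", "text": "Write calculator logic", "status": "in_progress"},
]

# Full replacement: one call, and repeating it changes nothing (idempotent).
once = replace_all(state, target)
twice = replace_all(once, target)
print(once == twice == target)

# Incremental: two calls are needed, and the state in between depends on their order.
step1 = apply_patch(state, "2", "in_progress")  # both items are in_progress here
step2 = apply_patch(step1, "1", "completed")
active_midway = sum(item["status"] == "in_progress" for item in step1)
print(active_midway)
print(step2 == target)
```

With patches applied in this order, the intermediate state briefly has two in_progress items, which the one-at-a-time rule would reject. Sending the whole list in one call avoids having to think about such orderings.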

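Next lesson preview

So far our Agent can write a plan, but it still does all the work by itself. What if the task is especially complex?

Lesson 4: Sub-agents, teaching the Agent to "delegate subtasks." Just as a manager hands tasks to their reports, a parent Agent can spawn child Agents to handle independent subtasks. A child Agent starts with a fresh context and reports back only a summary when it is done. This is divide and conquer for AI Agents.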
