BoxAgnts in Depth: Agent Multi-Turn Conversations and Tool/Skill Invocation

2026-05-31


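If you have only ever chatted with ChatGPT, you might think an AI Agent just "sends the prompt to an API and displays the reply."

The reality is far more complex. Below is a complete Agent interaction flow in BoxAgnts: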


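User input: "Read config.toml for me and change port to 9090"
1. The user message is appended to the conversation history
2. Build the system prompt (tool list + skill list + AGENTS.md + Agent role definition)
3. Call the LLM API → receive the response as a stream
4. The AI decides to call a tool: tool_use("read", {path: "config.toml"})
5. Execute the read tool (inside the WASM sandbox)
6. Inject the tool result into the conversation history
7. Call the API again → the AI analyzes the config
8. The AI decides to call a tool: tool_use("edit", {path: "config.toml", old: "port = 8080", new: "port = 9090"})
9. Execute the edit tool
10. Inject the tool result into the conversation
11. Call the API again → the AI replies: "Changed the port from 8080 to 9090"
12. end_turn → the conversation ends

This flow involves 3 API calls, 2 tool executions, streaming, and context management. This article breaks down the design and implementation of each stage.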

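Agent Definition: Giving the Agent an "Identity"

Before the reasoning loop starts, the Agent's "role" has to be defined. BoxAgnts ships with three preset Agents, each described by an AgentDefinition: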


// boxagnts-workspace/src/config.rs
pub struct AgentDefinition {
    pub description: Option<String>,    // description
    pub model: Option<String>,          // model override
    pub temperature: Option<f32>,       // temperature override
    pub prompt: Option<String>,         // system prompt prefix
    pub access: String,                 // access level: full / read-only / search-only
    pub visible: bool,                  // whether it appears in @agent autocomplete
    pub max_turns: Option<u32>,         // max-turns override
    pub color: Option<String>,          // terminal display color
}

The three preset Agent roles:

Agent	Access	Prompt excerpt	Use case
build	full	"You are the build agent. Focus on implementing..."	Coding, modifying files
plan	read-only	"You are the plan agent. You can read files and analyze..."	Code analysis, architecture design
explore	search-only	"Fast search-only agent for code exploration"	Fast search, locating files
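
To make the access column concrete, here is a minimal sketch of how an access string might gate the tool list. The `Access` enum, the `allows` policy, and the lowercase tool names `grep` and `glob` are illustrative assumptions; the article does not show how BoxAgnts actually enforces these levels.

```rust
/// Access levels named in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Full,
    ReadOnly,
    SearchOnly,
}

impl Access {
    /// Parse the `access` string stored in an agent definition.
    pub fn parse(s: &str) -> Option<Access> {
        match s {
            "full" => Some(Access::Full),
            "read-only" => Some(Access::ReadOnly),
            "search-only" => Some(Access::SearchOnly),
            _ => None,
        }
    }

    /// Hypothetical policy: which tool names each level may call.
    pub fn allows(self, tool: &str) -> bool {
        match self {
            Access::Full => true,
            Access::ReadOnly => matches!(tool, "read" | "grep" | "glob"),
            Access::SearchOnly => matches!(tool, "grep" | "glob"),
        }
    }
}

/// Keep only the tools the agent is allowed to see, preserving order.
pub fn filter_tools<'a>(access: Access, tools: &[&'a str]) -> Vec<&'a str> {
    tools.iter().copied().filter(|t| access.allows(t)).collect()
}
```

Filtering the tool list before it reaches the system prompt means a restricted agent never even sees the tools it cannot call, which is one plausible way to implement the roles above.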

How the Agent prompt is injected

When the query loop starts, the prompt field of the Agent definition is injected at the very front of the system prompt:


// boxagnts-query/src/query.rs
if let Some(ref agent) = config.agent_definition {
    if let Some(ref agent_prompt) = agent.prompt {
        patched.system_prompt = Some(match &config.system_prompt {
            Some(existing) => format!("{}\n\n{}", agent_prompt, existing),
            None => agent_prompt.clone(),
        });
    }
}

The Agent can also override the model and the maximum number of turns:


let effective_model = if let Some(ref agent) = config.agent_definition {
    agent.model.clone().unwrap_or_else(|| config.model.clone())
} else {
    config.model.clone()
};
let effective_max_turns = config.agent_definition
    .as_ref()
    .and_then(|a| a.max_turns)
    .unwrap_or(config.max_turns);

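This lets users run different phases of the same session with different models and roles through Agent definitions. For example, the planning phase can use a read-only, slow-thinking model, while the execution phase uses a full-access, fast model.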
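
The two snippets above can be combined into one self-contained sketch. The structs here are trimmed stand-ins that keep only the fields involved, and `Effective` and `resolve` are hypothetical names introduced for illustration.

```rust
/// Trimmed stand-in for the agent definition (only the override fields).
#[derive(Default)]
pub struct AgentDefinition {
    pub prompt: Option<String>,
    pub model: Option<String>,
    pub max_turns: Option<u32>,
}

/// Trimmed stand-in for the loop configuration.
pub struct QueryConfig {
    pub system_prompt: Option<String>,
    pub model: String,
    pub max_turns: u32,
    pub agent_definition: Option<AgentDefinition>,
}

/// The values the loop would use once agent overrides are applied.
#[derive(Debug, PartialEq)]
pub struct Effective {
    pub system_prompt: Option<String>,
    pub model: String,
    pub max_turns: u32,
}

pub fn resolve(config: &QueryConfig) -> Effective {
    let agent = config.agent_definition.as_ref();
    // The agent prompt goes first, followed by the existing system prompt.
    let system_prompt = match (agent.and_then(|a| a.prompt.as_ref()), &config.system_prompt) {
        (Some(p), Some(existing)) => Some(format!("{}\n\n{}", p, existing)),
        (Some(p), None) => Some(p.clone()),
        (None, existing) => existing.clone(),
    };
    Effective {
        system_prompt,
        // Agent values win; the session config is the fallback.
        model: agent
            .and_then(|a| a.model.clone())
            .unwrap_or_else(|| config.model.clone()),
        max_turns: agent.and_then(|a| a.max_turns).unwrap_or(config.max_turns),
    }
}
```

An agent that sets only `prompt` and `model` keeps the session's `max_turns`, so each override is independent of the others.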

run_query_loop: The Heart of the Agent

run_query_loop() is the most central function in BoxAgnts and lives in the boxagnts-query crate:


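pub async fn run_query_loop(
    client: &AnthropicClient,           // API client
    messages: &mut Vec<Message>,        // conversation history (mutable reference)
    tools: &[Box<dyn Tool>],            // tool collection
    tool_ctx: &ToolContext,             // tool execution context
    config: &QueryConfig,               // loop configuration
    cost_tracker: Arc<CostTracker>,     // cost tracking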
    event_tx: Option<mpsc::Sender<QueryEvent>>, // event push
    cancel_token: CancellationToken,    // cancellation signal
    pending_messages: Option<&mut Vec<String>>, // pending message queue
) -> QueryOutcome

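The function signature is an architecture document in itself. Each parameter reflects a design decision:

Parameter	Design intent
`client`	A single entry point; internally, ProviderRegistry can switch among 20+ models
`messages: &mut Vec<Message>`	The conversation history is modified in place; each iteration appends to it
`tools: &[Box<dyn Tool>]`	A type-erased tool collection; the AI calls tools by name
`tool_ctx`	Carries sandbox settings such as work_dir and allowed_hosts
`event_tx`	Pushes the state of each turn to the Dashboard / TUI in real time
`cancel_token`	Lets the user interrupt the loop at any time
`pending_messages`	Lets commands be inserted mid-run (for example, the user sends a new message while a tool is executing)

The Five-Step Rhythm of the Main Loop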


┌──────────────────────────────────────────────────────┐
│ loop {                                               │
│                                                      │
│ ① Check termination conditions                       │
│    · turn > max_turns ? → EndTurn                    │
│    · cancel_token ?     → Cancelled                  │
│    · budget exceeded?   → BudgetExceeded             │
│                                                      │
│ ② Preprocess messages                                │
│    · drain pending_messages queue                    │
│    · apply_tool_result_budget (truncate old results) │
│    · auto_compact (context compression)              │
│                                                      │
│ ③ Build system prompt + call LLM API                 │
│    · inject Agent definition / AGENTS.md             │
│    · build CreateMessageRequest                      │
│    · receive StreamEvent as a stream                 │
│    · accumulate text / thinking / tool_use blocks    │
│                                                      │
│ ④ Handle the response                                │
│    · end_turn → return                               │
│    · tool_use → run tools in parallel →              │
│      inject results → continue                       │
│    · max_tokens → resume the conversation → continue │
│                                                      │
│ ⑤ Error recovery                                     │
│    · overloaded → switch fallback model              │
│    · stream stall → retry (at most 2 times)          │
│                                                      │
│ }                                                    │
└──────────────────────────────────────────────────────┘
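
The control flow in the diagram can be sketched synchronously with a stand-in for the model. This is a simplified illustration, not the real function: `run_loop`, the string-based history, and the `next_stop` closure (which plays the role of one API call) are assumptions, and the budget check, preprocessing, and error recovery steps are omitted.

```rust
/// Why the (stand-in) model stopped generating.
#[derive(Debug, PartialEq)]
pub enum StopReason {
    EndTurn,
    ToolUse(String),
    MaxTokens,
}

#[derive(Debug, PartialEq)]
pub enum QueryOutcome {
    EndTurn,
    Cancelled,
}

/// Drive the loop; `next_stop` stands in for one streamed API call.
pub fn run_loop(
    messages: &mut Vec<String>,
    max_turns: u32,
    is_cancelled: impl Fn() -> bool,
    mut next_stop: impl FnMut(&[String]) -> StopReason,
) -> QueryOutcome {
    let mut turn = 0;
    loop {
        // Step 1: termination checks (the diagram maps the turn limit to EndTurn).
        if is_cancelled() {
            return QueryOutcome::Cancelled;
        }
        turn += 1;
        if turn > max_turns {
            return QueryOutcome::EndTurn;
        }
        // Steps 2 and 3: preprocessing and prompt building would happen here.
        // Step 4: react to the stop reason.
        match next_stop(messages.as_slice()) {
            StopReason::EndTurn => return QueryOutcome::EndTurn,
            // Tool results are appended to the history, then the loop continues.
            StopReason::ToolUse(name) => messages.push(format!("tool_result:{}", name)),
            // A truncated reply triggers a resume message instead of stopping.
            StopReason::MaxTokens => messages.push("resume".to_string()),
        }
    }
}
```

Replaying the config.toml example from the introduction through this sketch gives three model calls and two appended tool results, matching the "3 API calls, 2 tool executions" count.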

System Prompt Construction: The Agent's "Worldview"

Before every API call, BoxAgnts builds the complete system prompt:


fn build_system_prompt(config: &QueryConfig) -> SystemPrompt {
    let opts = SystemPromptOptions {
        custom_system_prompt: config.system_prompt.clone(),    // user-defined prompt
        append_system_prompt: config.append_system_prompt.clone(), // appended content
        output_style: config.output_style,                     // output style
        custom_output_style_prompt: config.output_style_prompt.clone(),
        working_directory: config.working_directory.clone(),   // current working directory
        ..Default::default()
    };
    let text = boxagnts_core::system_prompt::build_system_prompt(&opts);
    SystemPrompt::Text(text)
}

The system prompt has a layered structure:


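┌──────────────────────────────────────────────┐
│ Agent role definition (build/plan/explore)   │ ← AgentDefinition.prompt
├──────────────────────────────────────────────┤
│ Core capability declarations                 │
│ · Available tools (16+)                      │ ← generated dynamically from the tools parameter
│ · Skill list                                 │ ← discovered by SkillTool
│ · Output format requirements                 │
│ · Safety boundaries                          │
├──────────────────────────────────────────────┤
│ AGENTS.md content                            │ ← project-level behavior rules from the user
├──────────────────────────────────────────────┤
│ Dynamic boundary marker                      │
│ --- cached above, not cached below ---       │
├──────────────────────────────────────────────┤
│ Session-specific information                 │ ← current working directory, time, etc.
└──────────────────────────────────────────────┘

The "--- cached above, not cached below ---" divider is a clever design choice: the Anthropic API supports prompt caching, so caching everything above the divider can significantly reduce the token cost of each API call.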
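
A minimal sketch of the layering idea: assemble the stable layers into one prefix and keep the session-specific text separate, so the prefix stays byte-identical across calls and remains cacheable. `PromptParts` and `build_layers` are hypothetical names, and the layer wording is invented for the example.

```rust
/// Inputs for the layered prompt (hypothetical, trimmed to a few layers).
pub struct PromptParts {
    pub agent_prompt: Option<String>,
    pub tool_names: Vec<String>,
    pub agents_md: Option<String>,
    pub working_directory: String,
}

/// Returns (cacheable prefix, per-session suffix).
pub fn build_layers(parts: &PromptParts) -> (String, String) {
    let mut stable: Vec<String> = Vec::new();
    // Layer 1: agent role definition.
    if let Some(p) = &parts.agent_prompt {
        stable.push(p.clone());
    }
    // Layer 2: capability declarations, generated from the tool list.
    stable.push(format!("Available tools: {}", parts.tool_names.join(", ")));
    // Layer 3: project-level rules from AGENTS.md.
    if let Some(md) = &parts.agents_md {
        stable.push(md.clone());
    }
    // Everything below the boundary changes per session and is not cached.
    let dynamic = format!("Working directory: {}", parts.working_directory);
    (stable.join("\n\n"), dynamic)
}
```

Two sessions that differ only in working directory produce the same prefix, which is the property a cache breakpoint relies on.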

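max_tokens Recovery: The Agent's "Resume from Breakpoint"

When the AI's reply hits the max_tokens limit, the model's output is cut off midway. An ordinary API call would simply end there, but an Agent cannot stop.

BoxAgnts handles this neatly: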


// boxagnts-query/src/query.rs
const MAX_TOKENS_RECOVERY_LIMIT: u32 = 3;

const MAX_TOKENS_RECOVERY_MSG: &str =
    "Output token limit hit. Resume directly — no apology, no recap of what 
    you were doing. Pick up mid-thought if that is where the cut happened. 
    Break remaining work into smaller pieces.";

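When stop_reason == "max_tokens" is detected:

1. The partial reply is added to the conversation as an assistant message
2. A special user message (MAX_TOKENS_RECOVERY_MSG) is appended
3. The loop continues, and the model picks up generating from where it was cut off

The wording of the prompt deserves attention. "no apology, no recap" is there because an interrupted LLM instinctively responds with something like "Sorry, I was cut off, let me start over...", which produces useless output. The prompt forbids this pattern outright.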
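
The three recovery steps can be sketched as a small helper. The `Message` and `Role` types are simplified stand-ins, `RECOVERY_MSG` is a shortened placeholder for the real prompt, and what happens once the limit is exhausted (here: the helper declines and leaves the history untouched) is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub text: String,
}

pub const MAX_TOKENS_RECOVERY_LIMIT: u32 = 3;
/// Shortened placeholder for the recovery prompt quoted above.
pub const RECOVERY_MSG: &str = "Output token limit hit. Resume directly.";

/// Returns true if the loop should continue, false once the limit is used up.
pub fn recover_from_max_tokens(
    messages: &mut Vec<Message>,
    partial: String,
    recoveries: &mut u32,
) -> bool {
    if *recoveries >= MAX_TOKENS_RECOVERY_LIMIT {
        return false;
    }
    *recoveries += 1;
    // Step 1: keep the truncated output as an assistant message.
    messages.push(Message { role: Role::Assistant, text: partial });
    // Step 2: ask the model to continue from the cut.
    messages.push(Message { role: Role::User, text: RECOVERY_MSG.to_string() });
    // Step 3: the caller goes around the loop again.
    true
}
```

Capping the number of recoveries keeps a model that repeatedly overruns its output limit from looping forever.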

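auto_compact: When the Context Gets Too Long

An LLM's context window is finite. As the conversation grows and tool results pile up, at some point it no longer fits.

BoxAgnts responds with automatic compaction, triggered when the estimated token count reaches 90% of the context window: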


// boxagnts-query/src/compact.rs
const AUTOCOMPACT_TRIGGER_FRACTION: f64 = 0.90;
const WARNING_PCT: f64 = 0.80;   // warn at 80%
const CRITICAL_PCT: f64 = 0.95;  // critical warning at 95%

The core of the compaction strategy is to call another LLM to "summarize" the conversation history:
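
As a rough illustration of how these three constants could be applied, the sketch below classifies an estimated token count against the window size. The `ContextState` enum, the `classify` function, and the way the thresholds are ordered into a single state are assumptions, not the crate's actual logic.

```rust
const WARNING_PCT: f64 = 0.80;
const AUTOCOMPACT_TRIGGER_FRACTION: f64 = 0.90;
const CRITICAL_PCT: f64 = 0.95;

#[derive(Debug, PartialEq)]
pub enum ContextState {
    Ok,
    Warning,
    Compact,
    Critical,
}

/// Map context usage to a state; `window` must be non-zero.
pub fn classify(estimated_tokens: u64, window: u64) -> ContextState {
    let used = estimated_tokens as f64 / window as f64;
    if used >= CRITICAL_PCT {
        ContextState::Critical
    } else if used >= AUTOCOMPACT_TRIGGER_FRACTION {
        ContextState::Compact
    } else if used >= WARNING_PCT {
        ContextState::Warning
    } else {
        ContextState::Ok
    }
}
```

With a hypothetical 200,000-token window, 185,000 estimated tokens (92.5%) would fall into the compaction band.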


Original conversation (possibly thousands of messages)
     │
     ▼
Compaction prompt (NO_TOOLS_PREAMBLE → forces summary mode)
     │
     ▼
LLM generates a structured summary:
    · Primary Request and Intent
    · Key Technical Concepts
    · Files and Code Sections
    · Errors and fixes
    · Pending Tasks
    · Current Work
     │
     ▼
The summary replaces the earlier history; the 10 most recent messages are kept verbatim

The compaction prompt contains one key design element, NO_TOOLS_PREAMBLE:
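
The final replacement step can be sketched with the summarizer abstracted into a closure. `compact`, `KEEP_RECENT`, and the string-based history are illustrative stand-ins; in the real system the closure would be an LLM call.

```rust
/// Number of recent messages kept verbatim, per the flow above.
const KEEP_RECENT: usize = 10;

/// Replace everything except the last KEEP_RECENT messages with one summary.
pub fn compact(messages: &mut Vec<String>, summarize: impl Fn(&[String]) -> String) {
    if messages.len() <= KEEP_RECENT {
        return; // nothing old enough to fold away
    }
    let split = messages.len() - KEEP_RECENT;
    // Summarize the older part of the history.
    let summary = summarize(&messages[..split]);
    // Detach the recent tail, drop the old head, then rebuild.
    let recent = messages.split_off(split);
    messages.clear();
    messages.push(summary);
    messages.extend(recent);
}
```

A 25-message history becomes 11 entries: one summary covering the first 15 messages, followed by the last 10 unchanged.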


CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.
- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn.

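If the compacting LLM tried to call a tool, the whole compaction would be wasted. This preamble prevents that kind of meta-recursion.

Tool Execution: From the AI's Decision to the Result

When the LLM returns stop_reason == "tool_use", the conversation enters the tool execution phase: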


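┌───────────────────────────────────────────────────────┐
│ Phase 1: sequential PreToolUse preprocessing          │
│ (each tool block in order; execution may be blocked)  │
├───────────────────────────────────────────────────────┤
│ Phase 2: run non-blocked tools in parallel            │
│ join_all(futures) → all tools run concurrently        │
│ (blocked tools return a precomputed error result)     │
└───────────────────────────────────────────────────────┘

Key design point: tool results are injected as user-role messages. This uses the role semantics of LLM messages: the Assistant issued the tool call, and the User (that is, the system acting on the user's behalf) returned the tool result. The model reads this as "the user answered your request" and proceeds naturally to the next round of reasoning.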
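
The two phases can be illustrated without an async runtime. This sketch substitutes scoped threads from the standard library for join_all, and `ToolCall`, `ToolOutput`, `execute_calls`, and the two closures are hypothetical names; it shows the shape of the technique, not the project's code.

```rust
use std::thread;

pub struct ToolCall {
    pub id: u32,
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub struct ToolOutput {
    pub id: u32,
    pub text: String,
    pub is_error: bool,
}

pub fn execute_calls(
    calls: &[ToolCall],
    is_blocked: impl Fn(&ToolCall) -> bool,
    run: impl Fn(&ToolCall) -> String + Sync,
) -> Vec<ToolOutput> {
    // Phase 1: sequential pre-check, one tool block at a time.
    let blocked: Vec<bool> = calls.iter().map(|c| is_blocked(c)).collect();
    let run = &run;
    // Phase 2: spawn every allowed call, then join in the original order.
    thread::scope(|s| {
        let handles: Vec<_> = calls
            .iter()
            .zip(&blocked)
            .map(|(call, &skip)| {
                if skip {
                    None
                } else {
                    Some(s.spawn(move || run(call)))
                }
            })
            .collect();
        calls
            .iter()
            .zip(handles)
            .map(|(call, handle)| match handle {
                Some(h) => ToolOutput {
                    id: call.id,
                    text: h.join().expect("tool thread panicked"),
                    is_error: false,
                },
                // Blocked calls get a precomputed error instead of running.
                None => ToolOutput {
                    id: call.id,
                    text: format!("blocked: {}", call.name),
                    is_error: true,
                },
            })
            .collect()
    })
}
```

Each output keeps the id of its call, so the results can be packed into a single user-role message that answers every tool_use block in order.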

execute_tool: The Core of Tool Dispatch


// boxagnts-query/src/lib.rs
async fn execute_tool(
    name: &str,
    input: &Value,
    tools: &[Box<dyn Tool>],
    ctx: &ToolContext,
) -> ToolResult {
    let tool = tools.iter().find(|t| t.name() == name);

    match tool {
        Some(tool) => {
            debug!(tool = name, "Executing tool");
            tool.execute(input.clone(), ctx).await
        }
        None => {
            warn!(tool = name, "Unknown tool requested");
            ToolResult::error(format!("Unknown tool: {}", name))
        }
    }
}

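The implementation is extremely simple: a linear search. The tools vector usually holds only a dozen or so elements, so the cost of a linear search is negligible. Simplicity is more reliable than complexity.

Managed Agent Mode: The Manager-Executor Architecture

When a task's complexity exceeds what a single Agent can handle, BoxAgnts offers a managed Agent mode: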
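
For readers who want to run the dispatch pattern, here is a synchronous, self-contained variant. The simplified `Tool` trait (string input, `Result` output) and the `Echo` and `Upper` tools are invented for the example and differ from the real trait.

```rust
/// Simplified stand-in for the tool trait.
pub trait Tool {
    fn name(&self) -> &str;
    fn execute(&self, input: &str) -> Result<String, String>;
}

pub struct Echo;
impl Tool for Echo {
    fn name(&self) -> &str {
        "echo"
    }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(input.to_string())
    }
}

pub struct Upper;
impl Tool for Upper {
    fn name(&self) -> &str {
        "upper"
    }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(input.to_uppercase())
    }
}

/// Linear lookup by name over a type-erased tool list.
pub fn execute_tool(name: &str, input: &str, tools: &[Box<dyn Tool>]) -> Result<String, String> {
    match tools.iter().find(|t| t.name() == name) {
        Some(tool) => tool.execute(input),
        // Unknown names become an error result rather than a panic,
        // so the model can see the mistake and correct itself.
        None => Err(format!("Unknown tool: {}", name)),
    }
}
```

Returning an error value for an unknown tool, instead of aborting, keeps the loop alive when the model hallucinates a tool name.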


             ┌──────────────────────────┐
             │      Manager Agent       │
             │ (strong model like Opus) │
             │ Only plans and delegates │
             └────────────┬─────────────┘
                          │
             ┌────────────┼──────────────┐
             ▼            ▼              ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ Executor │ │ Executor │ │ Executor │
        │ (Sonnet) │ │ (Sonnet) │ │ (Sonnet) │
        │ Subtask A│ │ Subtask B│ │ Subtask C│
        └──────────┘ └──────────┘ └──────────┘
        Run in parallel, each with its own context

The Manager's system prompt is extended with managed-mode instructions:


pub fn managed_agent_system_prompt(config: &ManagedAgentConfig) -> String {
    format!(r#"
## Managed Agent Mode
You are the MANAGER in a manager-executor architecture.
### Your Role
- You coordinate work but do NOT execute tasks directly.
- Delegate all implementation work to executor agents.
- Each executor uses model `{executor_model}` with up to {max_turns} turns.
- You may run up to {max_concurrent} executors in parallel.

### Workflow
1. Analyze the user's request and break into sub-tasks.
2. Spawn executors using the Agent tool.
3. Review results. If insufficient, spawn follow-up executors.
4. Synthesize all results into a coherent response."#, ...)
}

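The Manager does not execute tools itself; it only plans, delegates, and synthesizes results. Executors are ordinary Agent instances with the full tool set. This mode separates "thinking" from "doing," which avoids the context bloat of a single Agent and enables genuinely parallel processing.

The Skill System: Teaching the Agent "Professional Skills"

Tools are the Agent's "hands": reading files, writing files, running commands. Skills are the Agent's "expertise": code review methodology, CSS refactoring guides, front-end component templates.

Skill file format

A Skill is simply a SKILL.md file: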
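
The fan-out side of this architecture can be sketched with scoped threads. `run_executors` is a hypothetical helper: the `executor` closure stands in for a full Agent run with its own context, and batching by `max_concurrent` is one simple way to honor the concurrency limit mentioned in the prompt (an assumption about scheduling, not the project's implementation).

```rust
use std::thread;

/// Run sub-tasks in batches of at most `max_concurrent`, preserving order.
pub fn run_executors(
    subtasks: &[String],
    max_concurrent: usize,
    executor: impl Fn(&str) -> String + Sync,
) -> Vec<String> {
    let executor = &executor;
    let mut results = Vec::with_capacity(subtasks.len());
    for batch in subtasks.chunks(max_concurrent.max(1)) {
        // Every executor in the batch runs on its own thread.
        let batch_results: Vec<String> = thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|task| s.spawn(move || executor(task.as_str())))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("executor panicked"))
                .collect()
        });
        results.extend(batch_results);
    }
    results
}
```

The manager then only sees the returned strings, which is what keeps its own context small.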


app/extensions/skills/
├── code-review/SKILL.md
├── css-refactor-advisor/SKILL.md
├── current-weather/SKILL.md
├── weather-forecast/SKILL.md
└── front-component-generator/SKILL.md

SkillTool implementation


pub struct SkillTool;

#[async_trait]
impl Tool for SkillTool {
    fn name(&self) -> &str { "skill-tool" }

    async fn execute(&self, input: Value, ctx: &ToolContext) -> ToolResult {
        let params: SkillInput = serde_json::from_value(input)?;
        let dirs = skill_search_dirs(ctx).await;

        // "skill": "list" → list all available skills
        if params.skill == "list" {
            return list_skills(&dirs).await;
        }

        // Find and read SKILL.md
        let (skill_path, raw) = find_and_read_skill(&params.skill, &dirs).await?;

        // Strip the YAML frontmatter
        let content = strip_frontmatter(&raw);

        // Substitute the $ARGUMENTS placeholder
        let prompt = if let Some(args) = &params.args {
            content.replace("$ARGUMENTS", args)
        } else {
            content.replace("$ARGUMENTS", "")
        };

        ToolResult::success(prompt)
    }
}

Two-tier search path for Skills

Skill lookup checks the workspace directory first, then the application extensions directory:


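async fn skill_search_dirs(ctx: &ToolContext) -> Vec<PathBuf> {
    let mut dirs = vec![
        ctx.get_workspace_extensions_dir().await.join("skills") // project level
    ];
    dirs.push(ctx.get_app_extensions_dir().await.join("skills")); // global level
    dirs
}

This means you can define project-specific Skills in the project directory (such as "understand this project's build system") while still using global Skills (such as "general code review standards"). Project-level Skills take precedence over global ones.

The $ARGUMENTS placeholder

The most important mechanism in a Skill template is $ARGUMENTS: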
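
The precedence rule amounts to "first directory that contains the skill wins." A minimal sketch, with the file-system check passed in as a closure so the logic can be exercised without touching disk; `resolve_skill` and its `exists` parameter are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Return the first `<dir>/<skill>/SKILL.md` that exists, in search order.
pub fn resolve_skill(
    skill: &str,
    search_dirs: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    search_dirs
        .iter()
        .map(|dir| dir.join(skill).join("SKILL.md"))
        .find(|candidate| exists(candidate.as_path()))
}
```

In real use the closure could simply be `|p| p.is_file()`. Because the workspace directory comes first in the list, a project-level skill shadows a global skill of the same name.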


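# Code review Skill template

Please review: $ARGUMENTS

Checklist:
1. Are any functions too long (>50 lines)?
2. Are there unhandled Result/Option values?
3. Are there unnecessary .clone() calls?
4. Do names follow Rust conventions?

When the AI calls the Skill with args: "src/main.rs", $ARGUMENTS is replaced with src/main.rs. This turns a Skill from "static knowledge" into a "parameterized tool."

Streaming: Letting the User Watch the Agent "Think"

The whole query loop pushes status updates in real time through the event_tx channel: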
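
The substitution step is easy to try in isolation. The `strip_frontmatter` below is a simplified guess at what such a helper might do (it only recognizes a leading `---` line closed by another `---` line), and `render_skill` is a hypothetical wrapper name.

```rust
/// Drop a leading YAML frontmatter block delimited by `---` lines, if present.
pub fn strip_frontmatter(raw: &str) -> &str {
    if let Some(rest) = raw.strip_prefix("---\n") {
        if let Some(end) = rest.find("\n---\n") {
            return &rest[end + "\n---\n".len()..];
        }
    }
    raw
}

/// Produce the prompt text for a skill call; missing args become an empty string.
pub fn render_skill(raw: &str, args: Option<&str>) -> String {
    strip_frontmatter(raw).replace("$ARGUMENTS", args.unwrap_or(""))
}
```

Since `replace` substitutes every occurrence, a template can mention $ARGUMENTS more than once and all of them receive the same value.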


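pub enum QueryEvent {
    Token { text: String },                                           // pushed token by token
    ToolStart { tool_name: String, tool_id: String, input: Value },   // tool started
    ToolEnd { tool_name: String, tool_id: String, result: ToolResult }, // tool finished
    Status(String),                                                   // status message
}

These events are pushed to the Dashboard front end over WebSocket in real time, so the user can see every decision the Agent makes instead of facing a black box.

Summary

Multi-turn conversation in an AI Agent is a complex control system: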
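
The producer side of this pattern can be sketched with a standard-library channel in place of whatever async channel the project uses. The trimmed `QueryEvent` variants and the `run_turn` function are illustrative assumptions.

```rust
use std::sync::mpsc;

/// Trimmed event type for the sketch (fewer fields than the real enum).
#[derive(Debug, PartialEq)]
pub enum QueryEvent {
    Token { text: String },
    ToolStart { tool_name: String },
    ToolEnd { tool_name: String, result: String },
    Status(String),
}

/// Emit the events of one hypothetical turn; a missing sender disables events.
pub fn run_turn(event_tx: Option<mpsc::Sender<QueryEvent>>) {
    let emit = |ev: QueryEvent| {
        if let Some(tx) = &event_tx {
            // A closed receiver must not break the loop, so send errors are ignored.
            let _ = tx.send(ev);
        }
    };
    emit(QueryEvent::Status("calling model".to_string()));
    emit(QueryEvent::Token { text: "Reading".to_string() });
    emit(QueryEvent::ToolStart { tool_name: "read".to_string() });
    emit(QueryEvent::ToolEnd {
        tool_name: "read".to_string(),
        result: "port = 8080".to_string(),
    });
}
```

Making the sender optional lets the same loop run headless (for example in tests or batch jobs) without a UI attached.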


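System Prompt → API call → stream parsing → tool detection → tool execution → result injection → call again
   ↑                                                                                                 │
   └────────────────────────────────────── loop until end_turn ──────────────────────────────────────┘

The robustness of this loop depends on:

Mechanism	Problem solved
Agent definition system	Multiple roles, model switching
System prompt construction	Agent worldview + prompt caching
max_tokens recovery	Long output gets truncated
auto_compact (structured summary)	Context exceeds the window
tool_result_budget	Tool results pile up
fallback_model	Primary model overloaded
Managed Agent mode	Decomposing very complex tasks
Skill system	Parameterized injection of expertise
Parallel tool execution	Speeding up multi-step operations

Each mechanism corresponds to a real production problem. Getting them right is what takes an Agent from "it runs" to "it is reliable."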