iceCoder Acceptance Gates in Depth: Hands-On Comparison and Recommendations

2026-06-13

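A Comprehensive Analysis of the Digital Acceptance Gating System

This acceptance gating mechanism is a multi-layer checkpoint system: before any task is delivered, it must pass predefined acceptance criteria. The system is built around two core gates: the Acceptance Gate, for multi-step acceptance flows (typically benchmark evaluation), and the Verification Gate, for unit-test verification after code changes. The sections below break down the implementation and the engineering practice layer by layer.

Acceptance Gate (acceptance command gate): for multi-step acceptance flows, and central in benchmark scenarios
Verification Gate: handles the unit tests that must run after code changes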

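1. Acceptance Gate (Acceptance Command Gate)

1.1 Core Architecture

The core implementation lives in src/harness/task-acceptance-tracker.ts; the main class is TaskAcceptanceTracker.

Activation condition

// Parse the acceptance commands from the goal
const parsed = parseAcceptanceCommandsFromGoal(goal);
// Activate only when there are ≥2 commands and the goal is a long-running benchmark/implementation goal
this.active = parsed.length >= 2 && isLongRunningImplementationGoal(goal);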

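Command parsing rules

Multi-step acceptance chain: chained commands are recognized automatically in the goal text, for example npm ci → npm test → npm run build → npm run test:e2e.
Normalized matching:
- npx vitest / npx vitest run --reporter=verbose → mapped to npm test
- npx playwright test / npx cypress run → mapped to npm run test:e2e
- Noise such as cd /d X &&, 2>&1, and pipes or redirections is stripped automatically
- Namespaced commands such as npm run test:e2e are preserved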
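
The normalization rules above can be illustrated with a minimal sketch. The function name normalizeCommandKeySketch and its regular expressions are assumptions made for illustration; the article does not show the real implementation, which likely handles more variants.

```typescript
// Hypothetical sketch of command-key normalization; not the iceCoder source.
function normalizeCommandKeySketch(raw: string): string {
  let cmd = raw.trim();
  // Drop a leading "cd <dir> &&" or "cd /d <dir> &&" prefix.
  cmd = cmd.replace(/^cd\s+(\/d\s+)?\S+\s*&&\s*/i, "");
  // Drop stderr redirection.
  cmd = cmd.replace(/\s*2>&1/g, "");
  // Drop everything from the first pipe onward.
  const pipeAt = cmd.indexOf("|");
  if (pipeAt >= 0) cmd = cmd.slice(0, pipeAt);
  // Collapse runs of whitespace.
  cmd = cmd.replace(/\s+/g, " ").trim();
  // Map known runner variants onto the canonical npm scripts.
  if (/^npx vitest\b/.test(cmd)) return "npm test";
  if (/^npx (playwright test|cypress run)\b/.test(cmd)) return "npm run test:e2e";
  // Everything else, including namespaced scripts, is kept as-is.
  return cmd;
}
```

With this sketch, "cd /d C:\proj && npm run build 2>&1" and "npm run build" produce the same key, which is the property the matching step relies on.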

State transitions

type AcceptanceCommandStatus = 'pending' | 'passed' | 'failed';

interface AcceptanceCommandEntry {
  key: string;           // normalized key
  label: string;         // original display text
  status: AcceptanceCommandStatus;
  lastRunAt?: number;
}

The status update flow is:

recordRunCommand(rawCommand, success) or recordRunCommandToolResult(classifiedResult) is called
The command is semantically matched against the first acceptance item that is still pending or has failed
An AcceptanceTransition is returned (the original command, the previous status, and the new status)

Background task support:

background_start / background_running → stays pending
background_completed(exitCode: 0) → marked passed
background_failed / exit code ≠ 0 → marked failed
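
As a rough illustration of the update flow, the sketch below records one foreground run against a list of entries. It assumes exact key matching and a Transition shape with command/from/to fields; the real tracker matches semantically, and recordRunSketch is a hypothetical name.

```typescript
type Status = "pending" | "passed" | "failed";

interface Entry {
  key: string;
  label: string;
  status: Status;
  lastRunAt?: number;
}

// Assumed shape of the returned transition (command text, previous and new status).
interface Transition {
  command: string;
  from: Status;
  to: Status;
}

// Hypothetical sketch: update the first entry with this key that has not passed yet.
function recordRunSketch(
  entries: Entry[],
  key: string,
  success: boolean,
  now: number
): Transition | null {
  for (const entry of entries) {
    if (entry.key !== key || entry.status === "passed") continue;
    const from = entry.status;
    const to: Status = success ? "passed" : "failed";
    entry.status = to;
    entry.lastRunAt = now;
    return { command: entry.label, from, to };
  }
  // No pending or failed entry matched this command.
  return null;
}
```

A failed run followed by a successful re-run yields two transitions for the same entry: pending to failed, then failed to passed.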

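Completion check

isComplete(): boolean {
  if (!this.isActive()) return true;
  return this.commands.every(c => c.status === 'passed');
}

Key properties:

Every command must be passed before the chain counts as complete
A single command may be run repeatedly (for example, re-running tests)
Checkpoint snapshot restore is supported: TaskAcceptanceTracker.fromSnapshot(snapshot)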
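
The snapshot roundtrip can be pictured with the sketch below. Only the name fromSnapshot appears in the article; the Snapshot shape, the toSnapshot method, and the TrackerSketch class are assumptions chosen to keep the example self-contained.

```typescript
type Status = "pending" | "passed" | "failed";

interface Entry {
  key: string;
  label: string;
  status: Status;
  lastRunAt?: number;
}

// Assumed snapshot shape: plain JSON-serializable data.
interface Snapshot {
  active: boolean;
  commands: Entry[];
}

class TrackerSketch {
  private active: boolean;
  private commands: Entry[];

  constructor(active: boolean, commands: Entry[]) {
    this.active = active;
    this.commands = commands;
  }

  // An inactive tracker never blocks completion.
  isComplete(): boolean {
    if (!this.active) return true;
    return this.commands.every(c => c.status === "passed");
  }

  // Copy the entries so later mutations do not leak into the snapshot.
  toSnapshot(): Snapshot {
    return { active: this.active, commands: this.commands.map(c => ({ ...c })) };
  }

  static fromSnapshot(snapshot: Snapshot): TrackerSketch {
    return new TrackerSketch(snapshot.active, snapshot.commands.map(c => ({ ...c })));
  }
}
```

Because the snapshot is plain data, it survives JSON.stringify and JSON.parse, so a checkpoint written to disk can restore the same pass/fail state.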

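1.2 Feedback Injection

The code is in src/harness/harness-tool-round.ts; the core function is buildAcceptanceSuccessFeedbackMessage.

Feedback format

// One command passed (chain not yet complete)
[System / Acceptance ] npm test — 8 files / 22 tests passed (1/4 passed)

// All commands passed (completion signal)
[System / Acceptance ] npm run test:e2e — 5 e2e tests passed in 4.4s (4/4 passed)
[System / Acceptance ] All 4 acceptance commands passed.
Output ≤10 delivery bullets now and STOP calling tools.

Rules

Single pass: only the progress is shown (e.g. 1/4); no stop signal is injected
All passed: "All N acceptance commands passed" and the stop instruction are appended
Command label truncation: a label longer than 80 characters is cut off and ... is appended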
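
The three rules can be sketched as follows. The function names and the exact line layout are assumptions that only approximate the format shown above; in particular, whether the real code keeps 80 characters before appending ... is not stated in the article.

```typescript
const MAX_LABEL_LENGTH = 80;

// Assumption: keep the first 80 characters, then append "...".
function truncateLabelSketch(label: string): string {
  return label.length > MAX_LABEL_LENGTH
    ? label.slice(0, MAX_LABEL_LENGTH) + "..."
    : label;
}

// Hypothetical sketch of the feedback builder for one newly passed command.
function buildFeedbackSketch(
  label: string,
  detail: string,
  passedCount: number,
  totalCount: number
): string {
  const lines = [
    `[System / Acceptance] ${truncateLabelSketch(label)}: ${detail} (${passedCount}/${totalCount} passed)`
  ];
  // The stop instruction is appended only once every command has passed.
  if (passedCount === totalCount) {
    lines.push(`[System / Acceptance] All ${totalCount} acceptance commands passed.`);
    lines.push("Output ≤10 delivery bullets now and STOP calling tools.");
  }
  return lines.join("\n");
}
```

Keeping the stop instruction out of the intermediate messages means the model sees progress without being told to finish early.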

2. Verification Gate

2.1 Core Architecture

The code is in src/harness/task-state.ts and src/harness/document-deliverable.ts.

Verification state machine

type VerificationStatus = 'not_required' | 'required' | 'passed' | 'failed';
type TaskPhase = 'intent' | 'context' | 'editing' | 'verification';

The transitions are straightforward:

No changes                  → verificationStatus = 'not_required'
Engineering source written  → verificationStatus = 'required', phase = 'editing' → 'verification'
Unit tests run and succeed  → verificationStatus = 'passed'
Unit tests run and fail     → verificationStatus = 'failed'
Acceptance fully passed     → markVerificationPassed() → 'passed'
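
Read as a table, these transitions depend only on the latest event. The sketch below replays a list of events under that reading; the event names and both function names are hypothetical, and phase handling is left out.

```typescript
type VerificationStatus = "not_required" | "required" | "passed" | "failed";

// Hypothetical event names, one per row of the transition table.
type VerificationEvent =
  | "engineering_write"
  | "unit_tests_passed"
  | "unit_tests_failed"
  | "acceptance_all_passed";

// Status that results from a single event.
function statusAfterSketch(event: VerificationEvent): VerificationStatus {
  switch (event) {
    case "engineering_write":
      return "required";
    case "unit_tests_passed":
      return "passed";
    case "unit_tests_failed":
      return "failed";
    case "acceptance_all_passed":
      return "passed";
  }
}

// Replay events from the initial "no changes" state.
function replayVerificationSketch(events: VerificationEvent[]): VerificationStatus {
  let status: VerificationStatus = "not_required";
  for (const event of events) {
    status = statusAfterSketch(event);
  }
  return status;
}
```

Under this reading, a new source write after a passing test run puts the task back into 'required', which is the behavior one would expect from a gate that guards the final state of the code.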

Deliverable classification

type DeliverableKind = 'engineering' | 'file_deliverable' | 'none';

// Engineering source (requires unit tests)
ENGINEERING_EXTENSIONS = ['ts', 'tsx', 'js', 'jsx', 'vue', 'py', 'go', ...];

// File deliverables (require confirmation via file_info/read_file)
// All other extensions (json, yaml, md, sql, ...)
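
A minimal sketch of extension-based classification follows. It uses only the extensions the article lists (the original list is elided and longer), and treating a path with no extension as 'none' is an assumption, as is the function name.

```typescript
type DeliverableKind = "engineering" | "file_deliverable" | "none";

// Only the extensions named in the article; the real list contains more entries.
const ENGINEERING_EXTENSIONS_SKETCH = ["ts", "tsx", "js", "jsx", "vue", "py", "go"];

// Hypothetical classifier keyed on the file extension.
function classifyDeliverableSketch(path: string): DeliverableKind {
  const match = /\.([A-Za-z0-9]+)$/.exec(path);
  // Assumption: a path without an extension needs no deliverable check.
  if (match === null) return "none";
  const ext = (match[1] || "").toLowerCase();
  return ENGINEERING_EXTENSIONS_SKETCH.indexOf(ext) >= 0
    ? "engineering"
    : "file_deliverable";
}
```

The split matters because the two kinds are verified differently: engineering files by running unit tests, file deliverables by reading them back.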

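Write-then-read gate

// A write bumps the version number
bumpFileDeliverableWriteVersion(path): void;

// A read or file_info call confirms that the versions match
tryConfirmFileDeliverable(toolName, path, result): void;

// Statistics on paths that are not yet confirmed
verificationConfirmationStats(filesChanged, writeVersions, confirmVersions): {
  required: number;  // total number that need confirmation
  pending: number;   // number still awaiting confirmation
  exempt: number;    // number exempted (temporary files or drafts)
}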
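
The version bookkeeping behind the write-then-read gate can be sketched with two maps. All names below are hypothetical, and the sketch ignores exemptions and tool-name checks that the real functions take into account.

```typescript
type VersionMap = Record<string, number>;

// Each write bumps the path's write version.
function bumpWriteVersionSketch(writeVersions: VersionMap, path: string): void {
  writeVersions[path] = (writeVersions[path] || 0) + 1;
}

// A read confirms the version that is current at the time of the read.
function confirmReadSketch(
  writeVersions: VersionMap,
  confirmVersions: VersionMap,
  path: string
): void {
  const current = writeVersions[path];
  if (current !== undefined) confirmVersions[path] = current;
}

// A path is pending while its latest write has not been read back.
function pendingPathsSketch(
  filesChanged: string[],
  writeVersions: VersionMap,
  confirmVersions: VersionMap
): string[] {
  return filesChanged.filter(
    p => (writeVersions[p] || 0) > (confirmVersions[p] || 0)
  );
}
```

Comparing versions rather than storing a boolean means a second write to an already confirmed file makes it pending again.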

Exempt path rules:

isGenericTempPath: the path ends with .tmp or .bak, or lies under tmp/, temp/, or cache/ relative to the workspace
isDotPrefixedDirPath: the parent directory starts with . (for example .scratch/out.md)
isEphemeralScriptPath: temporary scripts such as check-*.ps1, cleanup.ps1, verify-*.sh
isProjectCustomExemptPath: paths listed under verificationExemptDirs in config.json or .icecoder.json
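
The four predicates might look roughly like the sketch below. The regular expressions are assumptions derived from the rule descriptions (for instance, tmp/ is matched as any path segment, which the article does not specify), and every function name carries a Sketch suffix to mark it as hypothetical.

```typescript
// Normalize Windows separators so the patterns only deal with "/".
function toPosixSketch(relPath: string): string {
  return relPath.replace(/\\/g, "/");
}

function isGenericTempPathSketch(relPath: string): boolean {
  const p = toPosixSketch(relPath);
  return /\.(tmp|bak)$/i.test(p) || /(^|\/)(tmp|temp|cache)\//i.test(p);
}

// True when the immediate parent directory starts with "." (but is not "." or "..").
function isDotPrefixedDirPathSketch(relPath: string): boolean {
  return /(^|\/)\.[^./][^/]*\/[^/]+$/.test(toPosixSketch(relPath));
}

function isEphemeralScriptPathSketch(relPath: string): boolean {
  const name = toPosixSketch(relPath).replace(/^.*\//, "");
  return /^check-.*\.ps1$/i.test(name)
    || /^cleanup\.ps1$/i.test(name)
    || /^verify-.*\.sh$/i.test(name);
}

// exemptDirs corresponds to verificationExemptDirs in the project config.
function isCustomExemptPathSketch(relPath: string, exemptDirs: string[]): boolean {
  const p = toPosixSketch(relPath);
  return exemptDirs.some(dir => {
    const d = dir.replace(/\/+$/, "");
    return p === d || p.indexOf(d + "/") === 0;
  });
}

function isVerificationExemptSketch(relPath: string, exemptDirs: string[]): boolean {
  return isGenericTempPathSketch(relPath)
    || isDotPrefixedDirPathSketch(relPath)
    || isEphemeralScriptPathSketch(relPath)
    || isCustomExemptPathSketch(relPath, exemptDirs);
}
```

Any one rule is enough to exempt a path, so the combined check is a plain logical OR.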

2.2 Unit Test Gate

The code is in src/harness/document-deliverable.ts and src/harness/verification-digest.ts.

Decision logic

shouldPromptEngineeringUnitTest(filesChanged, verificationStatus): boolean {
  if (!hasEngineeringTestTargets(filesChanged)) return false;
  return verificationStatus === 'required';  // unit tests have not been run yet
}

shouldInjectFailedUnitTestReminder(filesChanged, verificationStatus): boolean {
  if (!hasEngineeringTestTargets(filesChanged)) return false;
  return verificationStatus === 'failed';  // unit tests were run but failed
}

Engineering source path detection

engineeringTestTargetPaths(filesChanged): string[] {
  return filesChanged.filter(
    path => isEngineeringUnitTestTargetPath(path) && !isVerificationExemptPath(path)
  );
}

Prompt injection

Reminder prompt (unit tests have not been run yet):

[System] You changed source code but have not run unit tests yet.
Run unit tests covered these changed files (pick the command for this project):
- src/foo.ts
- src/bar.ts
Use run_command, then fix failures before claiming the task is complete.

Escalated prompt after a failure:

[System] Unit tests failed for your recent changes.
Please complete unit tests: fix the failures, re-run tests via run_command, and only then finish.
Changed source files:
- src/foo.ts
- src/bar.ts

2.3 Verification Gate Counter

The code is in src/harness/harness-verification-gate.ts.

Counter reset rules

shouldResetVerificationGateCounter(
  pendingBefore, pendingAfter, blockingAfter,
  acceptancePendingBefore, acceptancePendingAfter
): boolean {
  if (!blockingAfter) return true;              // blocking has been lifted
  if (pendingAfter < pendingBefore) return true; // net decrease in pending files
  if (acceptancePendingAfter < acceptancePendingBefore) return true; // net decrease in pending acceptance commands
  return false;
}

The counter exists to keep the LLM from stopping early while verification work is unfinished. Once the counter reaches its threshold, a block is enforced so that the flow runs to completion.
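
How the reset rule might combine with a threshold is sketched below. The threshold value of 3, the idea that the counter increases by one per round without progress, and all names are assumptions; the article gives only the reset rule.

```typescript
interface GateStateSketch {
  filePending: number;
  acceptancePending: number;
  blocking: boolean;
}

// Same three conditions as the reset rule quoted in the article.
function shouldResetSketch(before: GateStateSketch, after: GateStateSketch): boolean {
  if (!after.blocking) return true;
  if (after.filePending < before.filePending) return true;
  if (after.acceptancePending < before.acceptancePending) return true;
  return false;
}

// HYPOTHETICAL threshold; the real value is not given in the article.
const HARD_BLOCK_THRESHOLD_SKETCH = 3;

// Advance the counter by one round and report whether a hard block is due.
function stepCounterSketch(
  counter: number,
  before: GateStateSketch,
  after: GateStateSketch
): { counter: number; hardBlock: boolean } {
  const next = shouldResetSketch(before, after) ? 0 : counter + 1;
  return { counter: next, hardBlock: next >= HARD_BLOCK_THRESHOLD_SKETCH };
}
```

Resetting on any net progress means the hard block only fires after several consecutive rounds in which nothing pending was cleared.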

3. Gate Integration Flow

3.1 Harness Tool-Round Loop

The code is in src/harness/harness-tool-round.ts.

Acceptance Gate integration point

// Classify the run_command result
const classified = classifyRunCommandResult(args, rawOutput, result.success);

// Update the acceptance tracker
tracker.recordRunCommandToolResult(classified);

// Build the feedback message
const feedback = buildAcceptanceSuccessFeedbackMessage({
  newlyPassed: [...],
  completedAll: tracker.isComplete(),
  passedCount: tracker.getPassedCount(),
  totalCount: tracker.commands.length
});

// Inject it into the LLM context
if (feedback) msgs.push({ role: 'system', content: feedback });

Verification Gate integration point

// Record the tool result
taskState.recordToolResult(toolCall, result);

// Sync the state from the Acceptance Gate
syncTaskVerificationFromAcceptance(taskState, tracker);

// Check whether the gate is blocking
const acceptanceIncomplete = tracker.hasPendingAcceptanceWork();
const isBlocking = taskState.isVerificationBlockingFinal(acceptanceIncomplete);

// Build the prompt
if (isBlocking) {
  const prompt = taskState.buildVerificationPrompt();
  msgs.push({ role: 'system', content: prompt });
}

3.2 Completion Check

The code is in src/harness/incomplete-completion.ts.

hasPendingWork(task, acceptance, workspaceRoot): boolean {
  if (hasPendingAcceptanceWork(acceptance)) return true;
  if (hasUnfulfilledFileDeliverableGoal(task.goal, task.filesChanged, task.intent)) return true;
  return false;
}

If the task is not complete, the following prompt is injected:

buildIncompleteContinuationPrompt(task, repo, acceptance): string {
  const lines = [
    '[System] The task is NOT complete. Do not stop without calling tools.',
    '',
    'Evidence:'
  ];

  if (hasPendingAcceptanceWork(acceptance)) {
    lines.push(acceptance.buildAcceptancePrompt());
  }
  if (task.verificationStatus === 'failed') {
    lines.push('- Unit tests failed; fix and re-run before stopping.');
  }
  if (shouldPromptEngineeringUnitTest(...)) {
    lines.push('- Source code changed but unit tests have not passed yet.');
  }

  return lines.join('\n');
}

4. Execution Modes and Gate Coordination

4.1 Execution Mode

The code is in src/harness/supervisor/mode-decision-engine.ts.

Mode signals

type ModeSignal = 'task_graph_active' | 'pending_steps' | 'multi_write'
                | 'branch_switched' | 'checkpoint_resumed' | 'tool_failure'
                | 'large_diff' | 'explicit_impl';

// Decide whether to enter forced mode; returns the signals that triggered it
shouldEnterForcedMode(state, config, signals): ModeSignal[] {
  const reasons: ModeSignal[] = [];
  if (state.pendingStepCount >= config.pendingStepsEnterThreshold) reasons.push('pending_steps');
  if (state.writeTargetsThisRound > config.writeTargetsEnterThreshold) reasons.push('multi_write');
  if (!state.lastToolSuccess) reasons.push('tool_failure');
  // ...
  return reasons;
}

Gate coordination

free mode: the LLM decides on its own which tools to call
forced mode: the gates inject strong prompts, and the LLM must handle gate tasks first
There is also a ToolGate: DefaultToolGate.decide() can block specific tools in forced mode
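
To make the ToolGate idea concrete, here is a sketch of a gate that blocks a configured set of tools only in forced mode. The interface shape, the decision object, the class name, and the tool name finish_task are all hypothetical; the article states only that DefaultToolGate.decide() can block specific tools in forced mode.

```typescript
type ExecutionModeSketch = "free" | "forced";

// Assumed decision shape: allow flag plus an optional reason.
interface ToolGateDecisionSketch {
  allow: boolean;
  reason?: string;
}

// Assumed interface; the real ToolGate signature is not shown in the article.
interface ToolGateSketch {
  decide(mode: ExecutionModeSketch, toolName: string): ToolGateDecisionSketch;
}

class BlocklistGateSketch implements ToolGateSketch {
  private blockedInForced: string[];

  constructor(blockedInForced: string[]) {
    this.blockedInForced = blockedInForced;
  }

  decide(mode: ExecutionModeSketch, toolName: string): ToolGateDecisionSketch {
    // In free mode the LLM chooses tools without restriction.
    if (mode === "forced" && this.blockedInForced.indexOf(toolName) >= 0) {
      return { allow: false, reason: `${toolName} is blocked in forced mode` };
    }
    return { allow: true };
  }
}
```

A gate built with new BlocklistGateSketch(["finish_task"]) would let every tool through in free mode and reject only finish_task once forced mode is active.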

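5. Test Coverage

5.1 Acceptance Gate Tests

Test file: test/harness/task-acceptance-tracker.test.ts

Core cases:

Parsing a four-command acceptance chain (benchmark goal)
Activation condition (≥2 commands + long-running goal)
Normalized matching (e.g. cd /d X && npm run build normalizes to npm run build)
Playwright/Cypress normalized to npm run test:e2e
Background task state transitions (start → running → completed/failed)
Snapshot restore roundtrip
hasPendingWork integration check

5.2 Verification Gate Tests

Test file: test/harness/harness-verification-gate.test.ts

Core cases:

Counter reset conditions (blocking lifted / pending decreased / acceptance decreased)
Counter retention (no reset when there is no progress)

5.3 Execution Mode Tests

Test file: test/harness/execution-mode-acceptance.test.ts

Core cases:

An L0 read-only plan stays in free mode
Multiple file writes or a tool failure enter forced mode
Signal priority ordering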

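6. Key Design Principles

6.1 Layered Gates

Acceptance Gate: top-level multi-step acceptance (benchmarks / complex tasks)
Verification Gate: unit-test verification after code changes
File Deliverable Gate: write-then-read confirmation (non-engineering files)

6.2 Progressive Feedback

One acceptance command passed: light prompt (progress only)
All passed: the stop signal is triggered
Unit tests failed: escalated prompt (no hard block; the model may explain why the tests failed)

6.3 Fault Tolerance and Recovery

Commands may be re-run (several npm test runs cover the same acceptance item)
Background tasks are supported (run_command started in the background + action:check polling)
Checkpoint snapshot restore (TaskAcceptanceTracker.fromSnapshot)

6.4 Semantic Matching

Normalized command keys (noise stripped, variants unified)
Fuzzy matching (cd /d X && npm run build matches npm run build)
Command takes precedence over label (a background task's check response uses the real command field)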

7. Configuration Parameters

7.1 Execution Mode Parameters (supervisor-config.json)

{
  "executionMode": {
    "pendingStepsEnterThreshold": 2,
    "writeTargetsEnterThreshold": 1,
    "diffLinesEnterThreshold": 200,
    "stableRoundsExitThreshold": 2,
    "modeLockRounds": 2,
    "forcedMinDwellRounds": 1,
    "readonlyToolNames": ["read_file", "glob", "grep", "list_dir"]
  }
}
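
As a worked example, the sketch below evaluates the forced-mode checks against the threshold values shown in this config. The state field diffLines and the >= comparison used for large_diff are assumptions; the other three checks mirror the shouldEnterForcedMode excerpt in section 4.1.

```typescript
type ModeSignalSketch = "pending_steps" | "multi_write" | "tool_failure" | "large_diff";

interface ModeConfigSketch {
  pendingStepsEnterThreshold: number;
  writeTargetsEnterThreshold: number;
  diffLinesEnterThreshold: number;
}

interface RoundStateSketch {
  pendingStepCount: number;
  writeTargetsThisRound: number;
  diffLines: number; // assumed field name
  lastToolSuccess: boolean;
}

// Threshold values copied from the supervisor-config.json sample.
const sampleConfigSketch: ModeConfigSketch = {
  pendingStepsEnterThreshold: 2,
  writeTargetsEnterThreshold: 1,
  diffLinesEnterThreshold: 200
};

function forcedModeReasonsSketch(
  state: RoundStateSketch,
  config: ModeConfigSketch
): ModeSignalSketch[] {
  const reasons: ModeSignalSketch[] = [];
  if (state.pendingStepCount >= config.pendingStepsEnterThreshold) reasons.push("pending_steps");
  if (state.writeTargetsThisRound > config.writeTargetsEnterThreshold) reasons.push("multi_write");
  if (!state.lastToolSuccess) reasons.push("tool_failure");
  // Assumption: large_diff fires when the diff reaches the configured line count.
  if (state.diffLines >= config.diffLinesEnterThreshold) reasons.push("large_diff");
  return reasons;
}
```

With these values, writing a single file in a round does not trigger multi_write (1 is not greater than 1), while writing two files does.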

7.2 Verification Exempt Paths (config.json / .icecoder.json)

{
  "verificationExemptDirs": [
    ".scratch",
    ".temp",
    "tmp/"
  ]
}

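8. Typical Scenarios

8.1 Benchmark Four-Command Acceptance Chain

An example goal:

Implement a survivors roguelike from scratch.
Only after **`npm ci` → `npm test` → `npm run build` → `npm run test:e2e` have all succeeded** should the delivery bullets be output and the task end

What actually happens:

Four acceptance commands are parsed and the Acceptance Gate is activated
npm ci → npm test → npm run build → npm run test:e2e are executed in order
Progress feedback is injected after each pass (1/4, 2/4, 3/4)
After the fourth pass, "All 4 acceptance commands passed" and the stop signal are injected
hasPendingWork() returns false and the task is allowed to end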
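
The parsing and progress steps of this flow can be sketched as follows. The parser assumes that every acceptance command in the goal is wrapped in backticks, as in the example goal; the real parseAcceptanceCommandsFromGoal is not shown in the article and is presumably more tolerant. All names are hypothetical.

```typescript
// Hypothetical parser: collects every backtick-quoted span from the goal text.
function parseChainSketch(goal: string): string[] {
  const matches: string[] = goal.match(/`[^`]+`/g) || [];
  return matches.map(s => s.slice(1, -1).trim());
}

// Progress label in the "passed/total" form used by the feedback messages.
function progressLabelSketch(passedCount: number, total: number): string {
  return `${passedCount}/${total}`;
}

const goalSketch =
  "Only after `npm ci` → `npm test` → `npm run build` → `npm run test:e2e` all succeed, output the delivery bullets.";

const chainSketch = parseChainSketch(goalSketch);

// The long-running-goal check from the activation condition is omitted here.
const gateActiveSketch = chainSketch.length >= 2;

// One label per command, as if each command passed in order.
const progressSketch = chainSketch.map((_, i) =>
  progressLabelSketch(i + 1, chainSketch.length)
);
```

For the example goal, chainSketch holds the four commands in order and progressSketch runs from 1/4 to 4/4.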

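8.2 Engineering Source Change

Scenario: src/foo.ts has been modified but unit tests have not been run yet

Flow:

write_file('src/foo.ts') → verificationStatus = 'required'
On the next round, isVerificationBlockingFinal() returns true
buildVerificationPrompt() is injected to prompt a unit test run
npm test is run → verificationStatus becomes 'passed' or 'failed'
On failure, buildFailedUnitTestReminderPrompt() is injected as an escalated prompt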

8.3 Long-Running Background Test

Scenario: npm run test:e2e takes 5 minutes to run

Flow:

run_command('npm run test:e2e 2>&1') → returns background_start
recordRunCommandToolResult(background_start) → stays pending
Polling with action:check → background_running
Final action:check → background_completed(exitCode: 0)
recordRunCommandToolResult(background_completed) → passed
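
The mapping from background events to acceptance status can be sketched as a small function. The event object shape (a kind field plus exitCode on completion) is an assumption; the status mapping itself follows the background task rules in section 1.1.

```typescript
type AcceptanceStatusSketch = "pending" | "passed" | "failed";

// Assumed event shape for classified background results.
type BackgroundEventSketch =
  | { kind: "background_start" }
  | { kind: "background_running" }
  | { kind: "background_completed"; exitCode: number }
  | { kind: "background_failed" };

function statusForBackgroundEventSketch(ev: BackgroundEventSketch): AcceptanceStatusSketch {
  switch (ev.kind) {
    case "background_start":
    case "background_running":
      // Still running: the acceptance item stays pending.
      return "pending";
    case "background_completed":
      // A non-zero exit code counts as a failure.
      return ev.exitCode === 0 ? "passed" : "failed";
    case "background_failed":
      return "failed";
  }
}

// Status after replaying a whole polling sequence.
function finalStatusSketch(events: BackgroundEventSketch[]): AcceptanceStatusSketch {
  let status: AcceptanceStatusSketch = "pending";
  for (const ev of events) {
    status = statusForBackgroundEventSketch(ev);
  }
  return status;
}
```

Replaying start, running, and completed with exit code 0 ends in 'passed', matching the five-step flow above.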

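9. Extension Points

9.1 Adding a New Acceptance Command Type

Add a new normalization rule in normalizeAcceptanceCommandKey()
Add the corresponding command match in isHarnessVerificationCommand()

9.2 Custom Gate Strategies

Implement the ToolGate interface with custom decide() logic
Extend the ExecutionModeConfig parameters to tune the thresholds

9.3 Multi-Language Support

The text produced by buildAcceptanceSuccessFeedbackMessage() can be internationalized
buildIncompleteContinuationPrompt() can use multi-language templates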
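
One simple way to internationalize the completion message is a per-locale template table, sketched below. The locale set, the {n} placeholder syntax, and all names are assumptions; the article only notes that the text could be localized.

```typescript
type LocaleSketch = "en" | "zh";

// Hypothetical templates; {n} is replaced with the number of commands.
const ALL_PASSED_TEMPLATES_SKETCH: Record<LocaleSketch, string> = {
  en: "All {n} acceptance commands passed.",
  zh: "全部 {n} 条验收命令已通过。"
};

function renderAllPassedSketch(locale: LocaleSketch, n: number): string {
  return ALL_PASSED_TEMPLATES_SKETCH[locale].replace("{n}", String(n));
}
```

Keeping the templates in one table leaves the feedback builder itself unchanged when a locale is added.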

Generated: 2026-06-12. Scope of analysis: acceptance gating mechanism (Acceptance Gate + Verification Gate)
