High-Quality Prompt for Automated Test Cases for Open-Source Models

2026-05-12

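This prompt scheme aims to provide AI model test engineers and quality assurance specialists with a structured, executable set of visual test-case generation...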

Open-source models · Automated testing · Test cases

Prompt

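Act as an "AI Model Quality Assessment Architect" and apply this scheme. Your core objective is to design and generate a benchmark image set for automated testing of open-source text-to-image models (such as the Stable Diffusion series). These images will serve as visual test cases for objectively evaluating model performance in four areas: prompt understanding, detail fidelity, style consistency, and edge-case handling.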

Basic object and attribute test: A photorealistic image of a [red ceramic mug] placed on a [wooden desk] next to a [spiral notebook], sharp focus, studio lighting.
Complex scene and relationship test: An astronaut riding a horse on Mars, surrealism, detailed spacesuit, Martian landscape with two moons in the sky, cinematic lighting.
Style and art-movement adherence test: A bustling 1920s New York street scene, in the style of Art Deco poster, geometric shapes, bold colors, flat perspective.
Negative instruction and exclusion test: A serene landscape with a lake and mountains, no people, no buildings, foggy morning, muted color palette.
Long-prompt and detail-density test: Close-up of an ancient, rusted mechanical lock, intricate gears visible through cracks, covered in morning dew and delicate spider webs, hyperdetailed, steampunk aesthetic, volumetric light.
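One way to keep these five baseline prompts stable across test runs is to store them as fixed, identified records rather than loose strings. The sketch below is only an illustration: the `PromptCase` type, the case IDs, and the `fill_slots` helper are hypothetical, and it assumes the square brackets in the first prompt mark swappable slots (for example, replacing the mug with another object while keeping the rest of the scene).

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PromptCase:
    """One benchmark case: a stable ID, the capability it probes, and its prompt."""
    case_id: str   # hypothetical identifier, reused in file names and reports
    category: str
    prompt: str


# The five baseline prompts above, copied verbatim; the IDs are illustrative.
BENCHMARK_SUITE = [
    PromptCase("basic-objects", "Basic object and attribute test",
               "A photorealistic image of a [red ceramic mug] placed on a [wooden desk] "
               "next to a [spiral notebook], sharp focus, studio lighting."),
    PromptCase("complex-scene", "Complex scene and relationship test",
               "An astronaut riding a horse on Mars, surrealism, detailed spacesuit, "
               "Martian landscape with two moons in the sky, cinematic lighting."),
    PromptCase("style-adherence", "Style and art-movement adherence test",
               "A bustling 1920s New York street scene, in the style of Art Deco poster, "
               "geometric shapes, bold colors, flat perspective."),
    PromptCase("negation", "Negative instruction and exclusion test",
               "A serene landscape with a lake and mountains, no people, no buildings, "
               "foggy morning, muted color palette."),
    PromptCase("detail-density", "Long-prompt and detail-density test",
               "Close-up of an ancient, rusted mechanical lock, intricate gears visible "
               "through cracks, covered in morning dew and delicate spider webs, "
               "hyperdetailed, steampunk aesthetic, volumetric light."),
]

# Matches "[slot text]" and captures the text inside the brackets.
_SLOT = re.compile(r"\[([^\]]+)\]")


def fill_slots(prompt: str, substitutions: Optional[Dict[str, str]] = None) -> str:
    """Replace each [slot] with its substitute, or with the slot text itself if none is given.

    Assumption: brackets denote swappable slots; prompts without brackets pass through unchanged.
    """
    subs = substitutions or {}
    return _SLOT.sub(lambda m: subs.get(m.group(1), m.group(1)), prompt)
```

Calling `fill_slots(BENCHMARK_SUITE[0].prompt, {"red ceramic mug": "blue glass vase"})` yields a sibling case that differs from the baseline in exactly one object, which keeps comparisons controlled.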

Material and texture: explicitly specify "weathered wood", "polished chrome", and "knitted wool" to test material rendering.
Lighting and atmosphere: use "dappled sunlight through leaves", "neon glow at dusk", and "candlelit interior" as key test variables.
Precise counts and spatial relationships: embed precise descriptions such as "exactly three birds", "symmetrically arranged", and "in the far distance" in the prompts to quantify how accurately the model follows instructions.
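A minimal sketch of turning these test variables into prompt variants, assuming a single base prompt that each variable is appended to. The base prompt and function names are hypothetical. `one_factor_variants` changes one variable at a time, so a failure can be attributed to that variable; `factorial_variants` uses `itertools.product` to cover every combination, at the cost of a count that grows multiplicatively with each dimension.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# Test variables from the text above, grouped by the dimension they probe.
TEST_VARIABLES: Dict[str, List[str]] = {
    "material": ["weathered wood", "polished chrome", "knitted wool"],
    "lighting": ["dappled sunlight through leaves", "neon glow at dusk", "candlelit interior"],
    "count_and_space": ["exactly three birds", "symmetrically arranged", "in the far distance"],
}


def one_factor_variants(base_prompt: str,
                        variables: Dict[str, List[str]]) -> Iterator[Tuple[str, str, str]]:
    """Yield (dimension, value, prompt), appending exactly one test variable per variant."""
    for dimension, values in variables.items():
        for value in values:
            yield dimension, value, f"{base_prompt}, {value}"


def factorial_variants(base_prompt: str,
                       variables: Dict[str, List[str]]) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (combination, prompt) for every pick of one value per dimension."""
    for combo in product(*variables.values()):
        yield combo, f"{base_prompt}, " + ", ".join(combo)
```

With the three dimensions above, the one-factor design produces 9 prompts and the full factorial produces 27; the one-factor design is usually the cheaper first pass.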

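Use the core prompts above as "baseline prompts" and run them in an automated script with a fixed random seed (e.g., `seed: 42`), then compare how outputs differ across models.
Run a "progressive complexity test": start with a single object and add attribute, environment, and style instructions one step at a time to locate the limits of the model's capabilities.
Set up "paired prompt" tests: one prompt in each pair is a plain baseline (e.g., "a cat") and the other adds detail (e.g., "a fluffy Siberian cat sitting on a velvet cushion"); compare the results to assess how well the model incorporates the added detail.
Record the generation parameters (sampler, number of steps, CFG scale, and seed) and archive them alongside each image so every test case is reproducible.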
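The progressive-complexity and paired-prompt tests can be sketched as plain prompt construction, assuming pass/fail judgments for each generated image come from a separate reviewer or metric that is not shown here. All names below (`progressive_ladder`, `first_failure`, `PromptPair`) and the example layers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


def progressive_ladder(subject: str, layers: List[str]) -> List[str]:
    """Return prompts of increasing complexity: the bare subject, then one more layer per step."""
    prompts = [subject]
    for layer in layers:
        prompts.append(f"{prompts[-1]}, {layer}")
    return prompts


def first_failure(passed: List[bool]) -> Optional[int]:
    """Return the index of the first ladder step judged a failure, or None if all passed.

    The step just before it approximates the model's capability boundary.
    """
    for step, ok in enumerate(passed):
        if not ok:
            return step
    return None


@dataclass(frozen=True)
class PromptPair:
    """A standard prompt and its detail-enriched counterpart, generated under identical settings."""
    pair_id: str
    standard: str
    detailed: str


PAIRS = [PromptPair("cat-detail", "a cat", "a fluffy Siberian cat sitting on a velvet cushion")]
```

For example, `progressive_ladder("a ceramic mug", ["red glossy glaze", "on a wooden desk", "in the style of an Art Deco poster"])` yields four prompts (object, then attribute, environment, and style), and `first_failure([True, True, False, False])` returns `2`, pointing at the environment step.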
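A minimal, backend-agnostic sketch of archiving each image with its generation parameters. The `generate` callable is a hypothetical stand-in for whatever pipeline is under test; in Hugging Face diffusers, for instance, steps and CFG scale typically correspond to the `num_inference_steps` and `guidance_scale` arguments and the seed to a seeded `torch.Generator`, but that wiring is left out here. Each image is written next to a JSON sidecar that also stores a SHA-256 hash, so a later rerun with the same settings can be checked for drift.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class GenerationParams:
    model_id: str      # model under test; "/" is sanitized before use in file names
    sampler: str       # sampler/scheduler name as reported by the backend
    steps: int
    cfg_scale: float
    seed: int = 42     # fixed seed so runs are comparable across models


# Hypothetical backend signature: (prompt, params) -> encoded image bytes.
GenerateFn = Callable[[str, GenerationParams], bytes]


def run_and_archive(case_id: str, prompt: str, params: GenerationParams,
                    generate: GenerateFn, out_dir: Path) -> Path:
    """Generate one image, write it to out_dir, and write a JSON sidecar with prompt and params."""
    out_dir.mkdir(parents=True, exist_ok=True)
    image_bytes = generate(prompt, params)
    safe_model = params.model_id.replace("/", "_")
    stem = f"{case_id}__{safe_model}__seed{params.seed}"
    image_path = out_dir / f"{stem}.png"
    image_path.write_bytes(image_bytes)
    record = {
        "case_id": case_id,
        "prompt": prompt,
        "params": asdict(params),
        "image_file": image_path.name,
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    sidecar = out_dir / f"{stem}.json"
    sidecar.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return sidecar
```

Keeping the parameters in a sidecar file rather than only in a log means the archive stays self-describing when images are copied or shared between teams.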