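Testing prompts with Mocha/Chai

promptfoo can be integrated with test frameworks such as Mocha and assertion libraries such as Chai, so that prompts can be evaluated as part of existing testing and CI workflows.

This guide includes examples that show how to create Mocha test cases for desired prompt quality using semantic similarity and LLM grading.

For more information on supported checks, see the Expected Outputs documentation.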

Prerequisites

Before you begin, make sure you have the following node packages installed:

mocha: npm install --save-dev mocha
chai: npm install --save-dev chai
promptfoo: npm install --save-dev promptfoo

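Creating custom chai assertions

First, we'll create custom chai assertions:

toMatchSemanticSimilarity: Compares two strings for semantic similarity.
toPassLLMRubric: Checks whether a string meets the specified LLM Rubric criteria.
toMatchFactuality: Checks whether a string meets the specified factuality criteria.
toMatchClosedQA: Checks whether a string meets the specified question-answering criteria.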
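
To make the idea behind a similarity threshold concrete, here is a minimal sketch of a threshold check on cosine similarity. It is a toy stand-in, not promptfoo's implementation: a real similarity check compares embedding vectors obtained from a provider, while this example uses hand-written vectors. The names `cosineSimilarity` and `checkSimilarity` are hypothetical and do not come from promptfoo.

```javascript
// Toy illustration only: real checks compare embeddings returned by a provider.
// `cosineSimilarity` and `checkSimilarity` are hypothetical helper names.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns a result shaped like the one the custom assertions read
// (`pass` and `reason`), plus the raw score.
function checkSimilarity(a, b, threshold = 0.8) {
  const score = cosineSimilarity(a, b);
  const pass = score >= threshold;
  return {
    pass,
    score,
    reason: pass
      ? `Similarity ${score.toFixed(3)} meets threshold ${threshold}`
      : `Similarity ${score.toFixed(3)} is below threshold ${threshold}`,
  };
}
```

Raising the threshold makes the check stricter: two vectors that pass at 0.8 may fail at 0.999.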

Create a new file called assertions.js and add the following:

JavaScript

import { Assertion } from 'chai';
import { assertions } from 'promptfoo';

// All four matchers used below come from promptfoo's `assertions` export.
const { matchesSimilarity, matchesLlmRubric, matchesFactuality, matchesClosedQa } = assertions;

// chai's `addMethod` returns whatever the function returns, so an async
// function yields a promise that the test can `await`.
Assertion.addMethod('toMatchSemanticSimilarity', async function (expected, threshold = 0.8) {
  const received = this._obj;
  const result = await matchesSimilarity(received, expected, threshold);
  // Identical strings always count as a match.
  const pass = received === expected || result.pass;

  this.assert(
    pass,
    `expected #{this} to match semantic similarity with #{exp}, but it did not. Reason: ${result.reason}`,
    `expected #{this} not to match semantic similarity with #{exp}`,
    expected,
  );
});

Assertion.addMethod('toPassLLMRubric', async function (expected, gradingConfig) {
  const received = this._obj;
  const gradingResult = await matchesLlmRubric(expected, received, gradingConfig);

  this.assert(
    gradingResult.pass,
    `expected #{this} to pass LLM Rubric with #{exp}, but it did not. Reason: ${gradingResult.reason}`,
    `expected #{this} not to pass LLM Rubric with #{exp}`,
    expected,
  );
});

Assertion.addMethod('toMatchFactuality', async function (input, expected, gradingConfig) {
  const received = this._obj;
  const gradingResult = await matchesFactuality(input, expected, received, gradingConfig);

  this.assert(
    gradingResult.pass,
    `expected #{this} to match factuality with #{exp}, but it did not. Reason: ${gradingResult.reason}`,
    `expected #{this} not to match factuality with #{exp}`,
    expected,
  );
});

Assertion.addMethod('toMatchClosedQA', async function (input, expected, gradingConfig) {
  const received = this._obj;
  const gradingResult = await matchesClosedQa(input, expected, received, gradingConfig);

  this.assert(
    gradingResult.pass,
    `expected #{this} to match ClosedQA with #{exp}, but it did not. Reason: ${gradingResult.reason}`,
    `expected #{this} not to match ClosedQA with #{exp}`,
    expected,
  );
});

TypeScript

import { Assertion } from 'chai';
import { assertions } from 'promptfoo';
import type { GradingConfig } from 'promptfoo';

const { matchesSimilarity, matchesLlmRubric } = assertions;

// chai's `addMethod` returns whatever the function returns, so an async
// function yields a promise that the test can `await`.
Assertion.addMethod(
  'toMatchSemanticSimilarity',
  async function (this: Assertion, expected: string, threshold: number = 0.8) {
    const received = this._obj;
    const result = await matchesSimilarity(received, expected, threshold);
    // Identical strings always count as a match.
    const pass = received === expected || result.pass;

    this.assert(
      pass,
      `expected #{this} to match semantic similarity with #{exp}, but it did not. Reason: ${result.reason}`,
      `expected #{this} not to match semantic similarity with #{exp}`,
      expected,
    );
  },
);

Assertion.addMethod(
  'toPassLLMRubric',
  async function (this: Assertion, expected: string, gradingConfig: GradingConfig) {
    const received = this._obj;
    const gradingResult = await matchesLlmRubric(expected, received, gradingConfig);

    this.assert(
      gradingResult.pass,
      `expected #{this} to pass LLM Rubric with #{exp}, but it did not. Reason: ${gradingResult.reason}`,
      `expected #{this} not to pass LLM Rubric with #{exp}`,
      expected,
    );
  },
);
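
Each custom assertion passes two messages to `this.assert`: one for a plain failure and one for a failure under `.not`. The following is a simplified model of that behavior, written to show why both messages are needed. It is not chai's source code, and `assertWithNegation` is a hypothetical name used only for illustration.

```javascript
// Simplified model of how an assertion treats the `.not` flag.
// `assertWithNegation` is a hypothetical helper, not part of chai.
function assertWithNegation(pass, negate, message, negatedMessage) {
  // Under `.not`, the assertion succeeds only when the check did NOT pass.
  const ok = negate ? !pass : pass;
  if (!ok) {
    // The negated message is reported when a `.not` assertion fails.
    throw new Error(negate ? negatedMessage : message);
  }
}

// Plain assertion on a passing check: no error.
assertWithNegation(true, false, 'expected a match', 'expected no match');

// Negated assertion on a failing check: no error either.
assertWithNegation(false, true, 'expected a match', 'expected no match');
```

This is why `expect(output).not.toMatchSemanticSimilarity(...)` succeeds when the matcher reports `pass: false`, and reports the second message when the strings turn out to be similar after all.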

Writing tests

Our test code will use the custom chai assertions to run a few test cases.

Create a new file called index.test.js and add the following code:

import { expect } from 'chai';
import './assertions';

const gradingConfig = {
  provider: 'openai:chat:gpt-4o-mini',
};

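describe('semantic similarity tests', () => {
  it('should pass when strings are semantically similar', async () => {
    await expect('The quick brown fox').toMatchSemanticSimilarity('A fast brown fox');
  });

  it('should not match when strings are not semantically similar', async () => {
    await expect('The quick brown fox').not.toMatchSemanticSimilarity('The weather is nice today');
  });

  it('should pass when strings are semantically similar with a custom threshold', async () => {
    await expect('The quick brown fox').toMatchSemanticSimilarity('A fast brown fox', 0.7);
  });

  it('should not match when strings are not semantically similar with a custom threshold', async () => {
    await expect('The quick brown fox').not.toMatchSemanticSimilarity(
      'The weather is nice today',
      0.9,
    );
  });
});

describe('LLM evaluation tests', () => {
  it('should pass when the string meets the LLM Rubric criteria', async () => {
    await expect('Four score and seven years ago').toPassLLMRubric(
      'Contains part of a famous speech',
      gradingConfig,
    );
  });

  it('should not pass when the string does not meet the LLM Rubric criteria', async () => {
    await expect('It is time to do laundry').not.toPassLLMRubric(
      'Contains part of a famous speech',
      gradingConfig,
    );
  });
});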

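Final setup

Add the following line to the scripts section in your package.json:

"test": "mocha"

Now you can run the tests with the following command:

npm test

This will execute the tests and display the results in the terminal.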
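
Two Mocha defaults may get in the way here: Mocha looks for spec files under ./test unless told otherwise, and each test times out after 2000 ms, which a remote grading call can exceed. A possible `.mocharc.cjs` is sketched below. It assumes the test file sits at the project root as index.test.js, and the 30-second timeout is an arbitrary illustrative value, not a promptfoo recommendation.

```javascript
// .mocharc.cjs (a sketch; adjust paths and values to your project)
module.exports = {
  // Assumption: index.test.js lives at the project root.
  spec: ['index.test.js'],
  // Assumption: 30 s leaves room for embedding and grading requests.
  timeout: 30000,
};
```

The `.cjs` extension keeps the config file in CommonJS format even if the project itself is set up for ES modules.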

Note that if you are using the default providers, you will need to set the OPENAI_API_KEY environment variable.
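
If the suite may run in environments where the key is absent, such as a contributor's machine, one option is to skip the tests instead of letting them fail. The sketch below assumes the default OpenAI-backed providers are in use. `hasOpenAiKey` is a hypothetical helper name, while `before` and `this.skip()` are standard Mocha features.

```javascript
// Hypothetical helper: true when a non-empty OPENAI_API_KEY is present.
function hasOpenAiKey(env = process.env) {
  const key = env.OPENAI_API_KEY;
  return typeof key === 'string' && key.trim().length > 0;
}

// Under Mocha, a root-level `before` hook can skip the tests when the key
// is missing. The `typeof` guard lets this file load outside Mocha as well.
if (typeof before === 'function') {
  // A regular function (not an arrow) is needed so `this` is Mocha's context.
  before(function () {
    if (!hasOpenAiKey()) {
      this.skip();
    }
  });
}
```

Placing this at the top of index.test.js marks the tests as pending rather than failed when no key is configured.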
