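Testing prompts with Jest and Vitest

promptfoo can be integrated with test frameworks such as Jest and Vitest, so that prompts are evaluated as part of your existing test and CI workflows.

This guide includes examples that show how to create test cases for desired prompt quality using semantic similarity and LLM grading. You can also skip ahead to the full example code.

For more information on supported checks, see the Expected Outputs documentation.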

Prerequisites

Before you begin, make sure the following Node packages are installed:

jest: npm install --save-dev jest
vitest: npm install --save-dev vitest
promptfoo: npm install --save-dev promptfoo

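Creating custom matchers

First, we'll create the custom matchers:

toMatchSemanticSimilarity: Compares two strings for semantic similarity.
toPassLLMRubric: Checks whether a string meets the specified LLM rubric criteria.
toMatchFactuality: Checks whether a string meets the specified factuality criteria.
toMatchClosedQA: Checks whether a string meets the specified question-answering criteria.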
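
To make "semantic similarity" and the threshold argument used by the matcher below more concrete, here is a minimal sketch, assuming the similarity check compares embedding vectors with cosine similarity. The function names and the toy three-dimensional vectors are made up for illustration; real embeddings come from a provider and have many more dimensions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('vectors must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('cosine similarity is undefined for a zero vector');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: passes when the similarity reaches the threshold.
function meetsThreshold(a, b, threshold = 0.8) {
  const similarity = cosineSimilarity(a, b);
  return { pass: similarity >= threshold, similarity };
}

// Toy "embeddings" (assumed values, for illustration only).
const quickFox = [0.9, 0.1, 0.2];
const fastFox = [0.8, 0.2, 0.25];
const weather = [0.1, 0.9, 0.1];

console.log(meetsThreshold(quickFox, fastFox).pass); // true
console.log(meetsThreshold(quickFox, weather).pass); // false
```

A higher threshold demands closer agreement between the two texts, which is why the tests later in this guide pass an explicit value when the default of 0.8 is not appropriate.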

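Create a new file named matchers.js and add the following content. The JavaScript version comes first, followed by a TypeScript version with type declarations.

JavaScript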

import { assertions } from 'promptfoo';

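// All four helpers used by the matchers below must be pulled from `assertions`.
const { matchesSimilarity, matchesLlmRubric, matchesFactuality, matchesClosedQa } = assertions;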

export function installMatchers() {
  expect.extend({
    async toMatchSemanticSimilarity(received, expected, threshold = 0.8) {
      const result = await matchesSimilarity(received, expected, threshold);
      const pass = received === expected || result.pass;
      if (pass) {
        return {
          message: () => `expected ${received} not to match semantic similarity with ${expected}`,
          pass: true,
        };
      } else {
        return {
          message: () =>
            `expected ${received} to match semantic similarity with ${expected}, but it did not. Reason: ${result.reason}`,
          pass: false,
        };
      }
    },

    async toPassLLMRubric(received, expected, gradingConfig) {
      const gradingResult = await matchesLlmRubric(expected, received, gradingConfig);
      if (gradingResult.pass) {
        return {
          message: () => `expected ${received} not to pass LLM Rubric with ${expected}`,
          pass: true,
        };
      } else {
        return {
          message: () =>
            `expected ${received} to pass LLM Rubric with ${expected}, but it did not. Reason: ${gradingResult.reason}`,
          pass: false,
        };
      }
    },

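    // The first parameter receives the value passed to expect(), so the call
    // site is expect(input).toMatchFactuality(expected, received, gradingConfig).
    async toMatchFactuality(input, expected, received, gradingConfig) {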
      const gradingResult = await matchesFactuality(input, expected, received, gradingConfig);
      if (gradingResult.pass) {
        return {
          message: () => `expected ${received} not to match factuality with ${expected}`,
          pass: true,
        };
      } else {
        return {
          message: () =>
            `expected ${received} to match factuality with ${expected}, but it did not. Reason: ${gradingResult.reason}`,
          pass: false,
        };
      }
    },

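    // The first parameter receives the value passed to expect(), so the call
    // site is expect(input).toMatchClosedQA(expected, received, gradingConfig).
    async toMatchClosedQA(input, expected, received, gradingConfig) {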
      const gradingResult = await matchesClosedQa(input, expected, received, gradingConfig);
      if (gradingResult.pass) {
        return {
          message: () => `expected ${received} not to match ClosedQA with ${expected}`,
          pass: true,
        };
      } else {
        return {
          message: () =>
            `expected ${received} to match ClosedQA with ${expected}, but it did not. Reason: ${gradingResult.reason}`,
          pass: false,
        };
      }
    },
  });
}

TypeScript

import { assertions } from 'promptfoo';
import type { GradingConfig } from 'promptfoo';

const { matchesSimilarity, matchesLlmRubric } = assertions;

declare global {
  namespace jest {
    interface Matchers<R> {
      toMatchSemanticSimilarity(expected: string, threshold?: number): R;
      toPassLLMRubric(expected: string, gradingConfig: GradingConfig): R;
    }
  }
}

export function installMatchers() {
  expect.extend({
    async toMatchSemanticSimilarity(
      received: string,
      expected: string,
      threshold: number = 0.8,
    ): Promise<jest.CustomMatcherResult> {
      const result = await matchesSimilarity(received, expected, threshold);
      const pass = received === expected || result.pass;
      if (pass) {
        return {
          message: () => `expected ${received} not to match semantic similarity with ${expected}`,
          pass: true,
        };
      } else {
        return {
          message: () =>
            `expected ${received} to match semantic similarity with ${expected}, but it did not. Reason: ${result.reason}`,
          pass: false,
        };
      }
    },

    async toPassLLMRubric(
      received: string,
      expected: string,
      gradingConfig: GradingConfig,
    ): Promise<jest.CustomMatcherResult> {
      const gradingResult = await matchesLlmRubric(expected, received, gradingConfig);
      if (gradingResult.pass) {
        return {
          message: () => `expected ${received} not to pass LLM Rubric with ${expected}`,
          pass: true,
        };
      } else {
        return {
          message: () =>
            `expected ${received} to pass LLM Rubric with ${expected}, but it did not. Reason: ${gradingResult.reason}`,
          pass: false,
        };
      }
    },
  });
}
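
Each matcher above repeats the same pattern: take a grading result with pass and reason fields and turn it into the { pass, message } object that expect.extend expects. One possible way to reduce the repetition is a small helper. This is only a sketch; toMatcherResult is a hypothetical name and is not part of promptfoo or Jest.

```javascript
// Hypothetical helper (not part of promptfoo or Jest): converts a grading
// result of the shape { pass, reason } into the { pass, message } object
// returned by a custom matcher.
function toMatcherResult(gradingResult, received, expected, label) {
  const pass = Boolean(gradingResult.pass);
  return {
    pass,
    // The message is built lazily, only when the test runner asks for it.
    message: () =>
      pass
        ? `expected ${received} not to ${label} ${expected}`
        : `expected ${received} to ${label} ${expected}, but it did not. Reason: ${gradingResult.reason}`,
  };
}

// Example with a hand-written grading result (no provider call involved).
const failed = toMatcherResult(
  { pass: false, reason: 'No famous speech detected' },
  'It is time to do laundry',
  'Contains part of a famous speech',
  'pass LLM Rubric with',
);
console.log(failed.pass); // false
console.log(failed.message());
```

With such a helper, the body of toPassLLMRubric could shrink to a single return statement that wraps the awaited matchesLlmRubric result. Whether that is clearer than the explicit if/else is a matter of taste.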

Writing tests

Our test code will use the custom matchers to run a few test cases.

Create a new file named index.test.js and add the following code:

import { installMatchers } from './matchers';

installMatchers();

const gradingConfig = {
  provider: 'openai:chat:gpt-4o-mini',
};

describe('semantic similarity tests', () => {
  test('should pass when strings are semantically similar', async () => {
    await expect('The quick brown fox').toMatchSemanticSimilarity('A fast brown fox');
  });

  test('should fail when strings are not semantically similar', async () => {
    await expect('The quick brown fox').not.toMatchSemanticSimilarity('The weather is nice today');
  });

  test('should pass when strings are semantically similar with custom threshold', async () => {
    await expect('The quick brown fox').toMatchSemanticSimilarity('A fast brown fox', 0.7);
  });

  test('should fail when strings are not semantically similar with custom threshold', async () => {
    await expect('The quick brown fox').not.toMatchSemanticSimilarity(
      'The weather is nice today',
      0.9,
    );
  });
});

describe('LLM evaluation tests', () => {
  test('should pass when strings meet the LLM Rubric criteria', async () => {
    await expect('Four score and seven years ago').toPassLLMRubric(
      'Contains part of a famous speech',
      gradingConfig,
    );
  });

  test('should fail when strings do not meet the LLM Rubric criteria', async () => {
    await expect('It is time to do laundry').not.toPassLLMRubric(
      'Contains part of a famous speech',
      gradingConfig,
    );
  });
});
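
Because these matchers call an embedding or LLM provider over the network, Jest's default per-test timeout of 5 seconds may be too short. The following is a minimal sketch of a jest.config.js, assuming a CommonJS project and a hypothetical setup file named jest.setup.js that imports installMatchers from ./matchers and calls it once. The 30-second value is an arbitrary assumption; tune it for your provider.

```javascript
// jest.config.js (a sketch; the setup file name is an assumption)
module.exports = {
  // Run the setup file after the test framework is installed, so that
  // expect.extend is available when the matchers are registered.
  setupFilesAfterEnv: ['<rootDir>/jest.setup.js'],
  // Allow up to 30 seconds per test; Jest's default is 5 seconds.
  testTimeout: 30000,
};
```

If the matchers are registered in a setup file like this, the installMatchers() call at the top of each test file becomes unnecessary.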

Final setup

Add the following line to the scripts section of your package.json:

"test": "jest"

You can now run the tests with the following command:

npm test

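This executes the tests and displays the results in the terminal.

Note that if you are using the default providers, you need to set the OPENAI_API_KEY environment variable.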
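
A missing API key otherwise tends to surface as a provider error in the middle of a test run. One option is to check for it up front. The sketch below uses a hypothetical requireEnv helper (not part of promptfoo or Jest) that could be called at the top of a test file or in a setup file.

```javascript
// Hypothetical helper: fail fast with a clear message when a required
// environment variable is missing or empty.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Demonstration with an explicit object instead of the real environment.
const key = requireEnv('OPENAI_API_KEY', { OPENAI_API_KEY: 'sk-example' });
console.log(key.length > 0); // true
```

In a real test file the call would simply be requireEnv('OPENAI_API_KEY'), which reads from process.env.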
