Embedding Similarity Evaluator

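This notebook demonstrates the SemanticSimilarityEvaluator, which evaluates the quality of a question answering system via semantic similarity.

Concretely, it calculates a similarity score between the embeddings of the generated answer and the reference answer.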
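As a rough illustration of what such a score measures, the sketch below computes the cosine similarity between two toy vectors. The vectors and the `cosine_similarity` helper are hypothetical stand-ins, not part of LlamaIndex. A real embedding model produces much higher-dimensional vectors, and the similarity function the evaluator actually applies depends on its configured similarity mode.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", for illustration only.
response_embedding = [1.0, 0.0, 1.0]
reference_embedding = [1.0, 1.0, 1.0]

score = cosine_similarity(response_embedding, reference_embedding)
print("Score: ", score)
```

Vectors pointing in nearly the same direction score close to 1, while orthogonal vectors score 0.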

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [ ]:

!pip install llama-index

In [ ]:

from llama_index.core.evaluation import SemanticSimilarityEvaluator

evaluator = SemanticSimilarityEvaluator()

In [ ]:

# This evaluator only uses `response` and `reference`; passing in a query does not influence the evaluation
# query = 'What is the color of the sky'

response = "The sky is typically blue"
reference = """The color of the sky can vary depending on several factors, including time of day, weather conditions, and location.
During the day, when the sun is in the sky, the sky often appears blue.
This is due to a phenomenon called Rayleigh scattering, where molecules and particles in the Earth's atmosphere scatter sunlight in all directions, and blue light is scattered more than other colors because it travels as shorter, smaller waves.
This is why we perceive the sky as blue on a clear day.
"""

result = await evaluator.aevaluate(
    response=response,
    reference=reference,
)

In [ ]:

# Print the score
print("Score: ", result.score)
# Print whether the result passes; the default similarity threshold is 0.8
print("Passing: ", result.passing)

Score:  0.874911773340899
Passing:  True

In [ ]:

response = "Sorry, I do not have sufficient context to answer this question."
reference = """The color of the sky can vary depending on several factors, including time of day, weather conditions, and location.
During the day, when the sun is in the sky, the sky often appears blue.
This is due to a phenomenon called Rayleigh scattering, where molecules and particles in the Earth's atmosphere scatter sunlight in all directions, and blue light is scattered more than other colors because it travels as shorter, smaller waves.
This is why we perceive the sky as blue on a clear day.
"""

result = await evaluator.aevaluate(
    response=response,
    reference=reference,
)

In [ ]:

print("Score: ", result.score)
print("Passing: ", result.passing)  # the default similarity threshold is 0.8

Score:  0.7221738929165528
Passing:  False
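The two results above are consistent with a simple threshold comparison. The sketch below is a minimal illustration, assuming the evaluator passes a result when its score is at least the threshold. The `is_passing` helper is hypothetical and not part of LlamaIndex, and the exact comparison the library uses is an assumption here.

```python
# Default similarity threshold mentioned in this notebook.
DEFAULT_THRESHOLD = 0.8


def is_passing(score, threshold=DEFAULT_THRESHOLD):
    """Assumed rule: pass when the score reaches the threshold."""
    return score >= threshold


# Scores taken from the two evaluations above.
first_passing = is_passing(0.874911773340899)
second_passing = is_passing(0.7221738929165528)

print("Passing: ", first_passing)
print("Passing: ", second_passing)
```

Under this rule, lowering the threshold to 0.7 would let the second response pass as well, which is why the threshold should be chosen with the embedding model's score distribution in mind.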

Customization

The evaluator can be customized with a different embedding model, similarity mode, and similarity threshold.

In [ ]:

from llama_index.core.evaluation import SemanticSimilarityEvaluator
from llama_index.core.embeddings import SimilarityMode, resolve_embed_model

embed_model = resolve_embed_model("local")
evaluator = SemanticSimilarityEvaluator(
    embed_model=embed_model,
    similarity_mode=SimilarityMode.DEFAULT,
    similarity_threshold=0.6,
)

In [ ]:

response = "The sky is yellow."
reference = "The sky is blue."

result = await evaluator.aevaluate(
    response=response,
    reference=reference,
)

In [ ]:

print("Score: ", result.score)
print("Passing: ", result.passing)

Score:  0.9178505509625874
Passing:  True

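Note that a high score does not imply the answer is always correct.

Embedding similarity primarily captures the notion of "relevancy". Since both the response and the reference discuss "the sky" and colors, they are semantically similar.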
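To make this caveat concrete, the toy sketch below replaces the embedding model with simple word counts, which is an assumption made purely for illustration; real embeddings are learned dense vectors, and the `bag_of_words` and `cosine` helpers are hypothetical. Even so, it shows how two sentences that disagree on the one fact that matters can still look very similar.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens, ignoring periods."""
    return Counter(text.lower().replace(".", "").split())


def cosine(c1, c2):
    """Cosine similarity between two word-count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1)  # missing words count as 0
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2)


response = "The sky is yellow."
reference = "The sky is blue."

toy_score = cosine(bag_of_words(response), bag_of_words(reference))
print("Toy score: ", toy_score)
```

Three of the four words are shared, so the toy score is high even though the stated color is wrong. A similarity-based check is therefore best paired with a correctness-oriented evaluation when factual accuracy matters.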