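Evaluate Plato's Dialogue

Background

The following prompt tests an LLM's ability to evaluate the outputs of two different models as if it were a teacher.

First, two models (e.g., ChatGPT and GPT-4) are each given the following prompt: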

Plato’s Gorgias is a critique of rhetoric and sophistic oratory, where he makes the point that not only is it not a proper form of art, but the use of rhetoric and oratory can often be harmful and malicious. Can you write a dialogue by Plato where instead he criticizes the use of autoregressive language models?

Then, the two outputs are evaluated using the evaluation prompt below.

Prompt

Can you compare the two outputs below as if you were a teacher?

Output from ChatGPT: {output 1}

Output from GPT-4: {output 2}
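
The `{output 1}` and `{output 2}` placeholders have to be replaced with the two generated dialogues before the prompt is sent. A minimal sketch of one way to do that follows; the helper name `build_eval_prompt` and the placeholder names `output_1` and `output_2` are assumptions made for illustration and are not part of the original prompt.

```python
# Evaluation template from the prompt above, with the placeholders renamed
# to valid Python identifiers (an assumption made for this sketch).
EVAL_TEMPLATE = (
    "Can you compare the two outputs below as if you were a teacher?\n\n"
    "Output from ChatGPT:\n{output_1}\n\n"
    "Output from GPT-4:\n{output_2}"
)


def build_eval_prompt(output_1: str, output_2: str) -> str:
    """Fill the evaluation template with the two model outputs (hypothetical helper)."""
    # str.format only parses braces in the template itself, so braces that
    # appear inside the dialogues are inserted unchanged.
    return EVAL_TEMPLATE.format(
        output_1=output_1.strip(),
        output_2=output_2.strip(),
    )
```

Stripping surrounding whitespace keeps the blank lines of the template as the only separators between the two outputs.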

Code / API

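from openai import OpenAI

# By default the client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            # Replace {output 1} and {output 2} with the two dialogues before sending.
            "content": "Can you compare the two outputs below as if you were a teacher?\n\nOutput from ChatGPT:\n{output 1}\n\nOutput from GPT-4:\n{output 2}",
        }
    ],
    temperature=1,
    max_tokens=1500,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

# Print the evaluator's comparison of the two outputs.
print(response.choices[0].message.content)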
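
The two-step procedure described in the background (generate a dialogue with each model, then ask an evaluator to compare them) could be sketched end to end as follows. The helper names `ask` and `compare_as_teacher` are hypothetical, and using `gpt-3.5-turbo` as the stand-in for ChatGPT is an assumption; the original does not name the exact model identifiers used in the first step.

```python
def ask(client, model: str, prompt: str) -> str:
    """Send one user message and return the reply text (hypothetical helper)."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def compare_as_teacher(
    client,
    task_prompt: str,
    model_a: str = "gpt-3.5-turbo",  # assumed stand-in for ChatGPT
    model_b: str = "gpt-4",
    judge: str = "gpt-4",
) -> str:
    """Generate one output per model, then have the judge model compare them."""
    # Step 1: both models answer the same task prompt.
    out_a = ask(client, model_a, task_prompt)
    out_b = ask(client, model_b, task_prompt)

    # Step 2: embed both outputs in the evaluation prompt and ask the judge.
    eval_prompt = (
        "Can you compare the two outputs below as if you were a teacher?\n\n"
        f"Output from ChatGPT:\n{out_a}\n\n"
        f"Output from GPT-4:\n{out_b}"
    )
    return ask(client, judge, eval_prompt)
```

Passing an `OpenAI()` client together with the Plato prompt from the background section as `task_prompt` would run the whole flow with three API calls. Taking the client as a parameter also makes it possible to substitute a stub while testing.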

Reference

Sparks of Artificial General Intelligence: Early experiments with GPT-4 (13 April 2023)
