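Evaluating Plato's Dialogue
Background
The following prompt tests an LLM's ability to act as a teacher and evaluate the outputs of two different models.
First, two models (e.g., ChatGPT and GPT-4) are each given the following prompt: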
Plato’s Gorgias is a critique of rhetoric and sophistic oratory, where he makes the point that not only is it not a proper form of art, but the use of rhetoric and oratory can often be harmful and malicious. Can you write a dialogue by Plato where instead he criticizes the use of autoregressive language models?
The two outputs are then evaluated with the evaluation prompt below.
Prompt
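The first stage, sending the same prompt to both models and collecting their dialogues, can be sketched as below. This is a minimal sketch rather than the original experiment's code. The following are all assumptions: the model IDs in `MODELS` (with `"gpt-3.5-turbo"` standing in for ChatGPT), the helper names `generation_request` and `generate_outputs`, and the reuse of the evaluation call's `temperature` and `max_tokens` for generation. `client` is any object exposing `client.chat.completions.create`, such as an `openai.OpenAI()` instance.

```python
# Stage 1 sketch: send the same Gorgias prompt to two models and collect replies.
# Assumptions: the model IDs below ("gpt-3.5-turbo" stands in for ChatGPT) and the
# sampling settings, which are borrowed from the evaluation call.

GENERATION_PROMPT = (
    "Plato’s Gorgias is a critique of rhetoric and sophistic oratory, where he "
    "makes the point that not only is it not a proper form of art, but the use "
    "of rhetoric and oratory can often be harmful and malicious. Can you write "
    "a dialogue by Plato where instead he criticizes the use of autoregressive "
    "language models?"
)

# Hypothetical mapping from the label used in the evaluation prompt to an API model ID.
MODELS = {"ChatGPT": "gpt-3.5-turbo", "GPT-4": "gpt-4"}


def generation_request(model: str) -> dict:
    """Build the keyword arguments for client.chat.completions.create for one model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": GENERATION_PROMPT}],
        "temperature": 1,  # assumed: the source only gives settings for the evaluation call
        "max_tokens": 1500,
    }


def generate_outputs(client) -> dict:
    """Send the same prompt to every model and collect the replies by label."""
    outputs = {}
    for label, model in MODELS.items():
        response = client.chat.completions.create(**generation_request(model))
        outputs[label] = response.choices[0].message.content
    return outputs
```

With the `openai` package installed and `OPENAI_API_KEY` set, `generate_outputs(OpenAI())` would return a dict keyed by `"ChatGPT"` and `"GPT-4"`, whose values are the two dialogues to place in the evaluation prompt.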
Can you compare the two outputs below as if you were a teacher?
Output from ChatGPT: {output 1}
Output from GPT-4: {output 2}
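Here `{output 1}` and `{output 2}` are template slots to be replaced with each model's full text. A minimal sketch of that substitution follows, assuming both outputs are plain strings. `EVALUATION_TEMPLATE` and `fill_template` are hypothetical names. The sketch replaces every slot in a single regular-expression pass, so a model output that happens to contain the literal text `{output 2}` is not substituted a second time.

```python
import re

# The evaluation prompt with its "{output 1}" / "{output 2}" slots, laid out as in the API call.
EVALUATION_TEMPLATE = (
    "Can you compare the two outputs below as if you were a teacher?\n\n"
    "Output from ChatGPT:\n{output 1}\n\n"
    "Output from GPT-4:\n{output 2}"
)


def fill_template(template: str, values: dict) -> str:
    """Replace every {name} slot in one pass; substituted text is never rescanned."""
    pattern = re.compile(r"\{(" + "|".join(map(re.escape, values)) + r")\}")
    # A function replacement is inserted literally (no backslash processing).
    return pattern.sub(lambda m: values[m.group(1)], template)


prompt = fill_template(
    EVALUATION_TEMPLATE,
    {"output 1": "<dialogue written by ChatGPT>", "output 2": "<dialogue written by GPT-4>"},
)
print(prompt)
```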
Code / API
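```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Placeholders: replace with the two dialogues produced in the first step.
output_1 = "..."  # dialogue generated by ChatGPT
output_2 = "..."  # dialogue generated by GPT-4

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": (
                "Can you compare the two outputs below as if you were a teacher?\n\n"
                f"Output from ChatGPT:\n{output_1}\n\n"
                f"Output from GPT-4:\n{output_2}"
            ),
        }
    ],
    temperature=1,
    max_tokens=1500,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

print(response.choices[0].message.content)  # the teacher-style comparison
```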
References
- Sparks of Artificial General Intelligence: Early experiments with GPT-4 (April 13, 2023)