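Evaluate Plato's Dialogue
Background
The following prompt tests an LLM's ability to evaluate the outputs of two different models as if it were a teacher.
First, prompt two models (for example, ChatGPT and GPT-4) with the following to generate the outputs:
Plato's Gorgias is a critique of rhetoric and sophistic oratory, in which he argues that rhetoric is not only not a proper form of art, but that the use of rhetoric and oratory can often be harmful and malicious. Can you write a dialogue by Plato where he criticizes the use of autoregressive language models?
Then, evaluate those outputs using the evaluation prompt below.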
Prompt
Can you compare the two outputs below as if you were a teacher?
Output from ChatGPT: {output 1}
Output from GPT-4: {output 2}
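The `{output 1}` and `{output 2}` placeholders have to be replaced with the two generated dialogues before the prompt is sent. A minimal sketch of that substitution follows; the names `EVAL_TEMPLATE` and `build_eval_prompt` are hypothetical helpers introduced here for illustration, not part of any library.

```python
import re

# Evaluation prompt with the same placeholders used in this guide.
EVAL_TEMPLATE = (
    "Can you compare the two outputs below as if you were a teacher?\n\n"
    "Output from ChatGPT:\n{output 1}\n\n"
    "Output from GPT-4:\n{output 2}"
)


def build_eval_prompt(output_1: str, output_2: str, template: str = EVAL_TEMPLATE) -> str:
    """Return the evaluation prompt with both model outputs filled in."""
    values = {"{output 1}": output_1, "{output 2}": output_2}
    # A single regex pass over the template: only the two placeholders are
    # replaced, and any braces inside the model outputs are inserted verbatim.
    return re.sub(r"\{output [12]\}", lambda m: values[m.group(0)], template)
```

A single-pass substitution is used instead of `str.format` because generated dialogues may themselves contain curly braces.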
Code / API
- GPT-4 (OpenAI)
- Mixtral MoE 8x7B Instruct (Fireworks)
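# Evaluate the two outputs with GPT-4 through the OpenAI Python client (v1+).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            # Replace {output 1} and {output 2} with the two generated dialogues.
            "content": "Can you compare the two outputs below as if you were a teacher?\n\nOutput from ChatGPT:\n{output 1}\n\nOutput from GPT-4:\n{output 2}"
        }
    ],
    temperature=1,
    max_tokens=1500,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)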
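The whole workflow from the Background section (generate two dialogues, then grade them) can be sketched end to end as below. This is an illustration under stated assumptions, not a definitive implementation: `ask` and `generate_and_evaluate` are hypothetical helper names, `gpt-3.5-turbo` is assumed to stand in for "ChatGPT", and `DIALOGUE_PROMPT` is an English rendering of the generation prompt. The client is passed in so that any object exposing `chat.completions.create` can be used.

```python
# English rendering of the generation prompt from the Background section.
DIALOGUE_PROMPT = (
    "Plato's Gorgias is a critique of rhetoric and sophistic oratory, in which "
    "he argues that rhetoric is not only not a proper form of art, but that the "
    "use of rhetoric and oratory can often be harmful and malicious. Can you "
    "write a dialogue by Plato where he criticizes the use of autoregressive "
    "language models?"
)


def ask(client, model: str, prompt: str) -> str:
    """Send one user message and return the text of the first choice."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def generate_and_evaluate(
    client,
    generator_models=("gpt-3.5-turbo", "gpt-4"),  # assumption: stand-ins for ChatGPT and GPT-4
    evaluator_model="gpt-4",
) -> str:
    """Generate one dialogue per model, then ask the evaluator to compare them."""
    model_1, model_2 = generator_models
    output_1 = ask(client, model_1, DIALOGUE_PROMPT)
    output_2 = ask(client, model_2, DIALOGUE_PROMPT)
    eval_prompt = (
        "Can you compare the two outputs below as if you were a teacher?\n\n"
        f"Output from ChatGPT:\n{output_1}\n\n"
        f"Output from GPT-4:\n{output_2}"
    )
    return ask(client, evaluator_model, eval_prompt)
```

With the OpenAI package installed and an API key configured, calling `generate_and_evaluate(OpenAI())` would make three requests and return the evaluator's comparison as a string.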
# Evaluate the two outputs with Mixtral 8x7B Instruct through the Fireworks client.
import fireworks.client

fireworks.client.api_key = "<FIREWORKS_API_KEY>"

completion = fireworks.client.ChatCompletion.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[
        {
            "role": "user",
            # Replace {output 1} and {output 2} with the two generated dialogues.
            "content": "Can you compare the two outputs below as if you were a teacher?\n\nOutput from ChatGPT:\n{output 1}\n\nOutput from GPT-4:\n{output 2}",
        }
    ],
    stop=["<|im_start|>", "<|im_end|>", "<|endoftext|>"],
    stream=True,  # the result is delivered incrementally, chunk by chunk
    n=1,
    top_p=1,
    top_k=40,
    presence_penalty=0,
    frequency_penalty=0,
    prompt_truncate_len=1024,
    context_length_exceeded_behavior="truncate",
    temperature=0.9,
    max_tokens=4000
)
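Because the Fireworks request sets `stream=True`, the evaluation arrives as a sequence of chunks rather than one response object. The sketch below joins those chunks into a single string. It assumes each chunk follows the OpenAI-style streaming shape (`chunk.choices[0].delta.content`), which should be checked against the installed client version; `collect_stream` is a hypothetical helper name.

```python
def collect_stream(chunks) -> str:
    """Concatenate the text deltas from an iterable of streamed chat chunks."""
    parts = []
    for chunk in chunks:
        # Assumption: chunks expose `choices[0].delta.content`.
        if not chunk.choices:
            continue
        text = getattr(chunk.choices[0].delta, "content", None)
        if text:  # skip role-only or empty deltas
            parts.append(text)
    return "".join(parts)
```

Under that assumption, `collect_stream(completion)` would return the full teacher-style comparison once the stream ends.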