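Evaluate Plato's Dialogue

Background

The following prompt tests an LLM's ability to evaluate the outputs of two different models, as if it were a teacher.

First, prompt two models (e.g., ChatGPT and GPT-4) to generate outputs using the following prompt:

Plato's Gorgias is a critique of rhetoric and sophistic oratory, where he makes the point that not only is it not a proper form of art, but the use of rhetoric and oratory can often be harmful and malicious. Can you write a dialogue by Plato where he instead criticizes the use of autoregressive language models?

Then, evaluate those outputs using the evaluation prompt below.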

Prompt

Can you compare the two outputs below as if you were a teacher?

Output from ChatGPT: {output 1}

Output from GPT-4: {output 2}
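The `{output 1}` and `{output 2}` placeholders have to be replaced with the actual model outputs before the prompt is sent. The following is a minimal sketch of one way to do that. The names `EVALUATION_TEMPLATE` and `build_evaluation_prompt` are hypothetical helpers introduced here for illustration and are not part of any library. Plain string replacement is used because the placeholder names contain a space.

```python
# Evaluation prompt template; the placeholder names match the ones used in the prompt.
EVALUATION_TEMPLATE = (
    "Can you compare the two outputs below as if you were a teacher?\n\n"
    "Output from ChatGPT:\n{output 1}\n\n"
    "Output from GPT-4:\n{output 2}"
)


def build_evaluation_prompt(output_1: str, output_2: str) -> str:
    """Return the evaluation prompt with both placeholders filled in.

    Hypothetical helper: it substitutes the two model outputs into the template.
    """
    return (
        EVALUATION_TEMPLATE
        .replace("{output 1}", output_1)
        .replace("{output 2}", output_2)
    )


if __name__ == "__main__":
    # Short stand-in texts; in practice these would be the two generated dialogues.
    print(build_evaluation_prompt("First dialogue text", "Second dialogue text"))
```

The returned string can then be passed as the `content` of the user message in the API call.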

Code / API

GPT-4 (OpenAI)

from openai import OpenAI

# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            # Replace {output 1} and {output 2} with the two model outputs before sending.
            "content": "Can you compare the two outputs below as if you were a teacher?\n\nOutput from ChatGPT:\n{output 1}\n\nOutput from GPT-4:\n{output 2}"
        }
    ],
    temperature=1,
    max_tokens=1500,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)

# Print the evaluation text returned by the model.
print(response.choices[0].message.content)

Mixtral MoE 8x7B Instruct (Fireworks)

import fireworks.client

fireworks.client.api_key = "<FIREWORKS_API_KEY>"

# With stream=True the call returns an iterator of chunks rather than a single response.
completion = fireworks.client.ChatCompletion.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[
        {
            "role": "user",
            # Replace {output 1} and {output 2} with the two model outputs before sending.
            "content": "Can you compare the two outputs below as if you were a teacher?\n\nOutput from ChatGPT:\n{output 1}\n\nOutput from GPT-4:\n{output 2}",
        }
    ],
    stop=["<|im_start|>","<|im_end|>","<|endoftext|>"],
    stream=True,
    n=1,
    top_p=1,
    top_k=40,
    presence_penalty=0,
    frequency_penalty=0,
    # Prompt truncation length in tokens; increase it if the two pasted outputs are long.
    prompt_truncate_len=1024,
    context_length_exceeded_behavior="truncate",
    temperature=0.9,
    max_tokens=4000
)
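The two-step workflow described in the Background section (generate one dialogue per model, then ask an evaluator model to compare them) could be wired together roughly as follows. This is only a sketch under stated assumptions: `compare_models` is a hypothetical helper, and the model calls are passed in as plain callables that take a prompt and return text, so any client (OpenAI, Fireworks, or a stub for testing) can be plugged in.

```python
from typing import Callable, Dict

# Generation prompt from the Background section.
GENERATION_PROMPT = (
    "Plato's Gorgias is a critique of rhetoric and sophistic oratory, where he "
    "makes the point that not only is it not a proper form of art, but the use "
    "of rhetoric and oratory can often be harmful and malicious. Can you write "
    "a dialogue by Plato where he instead criticizes the use of autoregressive "
    "language models?"
)


def compare_models(
    generators: Dict[str, Callable[[str], str]],
    evaluator: Callable[[str], str],
) -> str:
    """Generate one output per named model, then have the evaluator compare them.

    Hypothetical helper: each callable takes a prompt string and returns text.
    """
    # Step 1: every model answers the same generation prompt.
    outputs = {name: generate(GENERATION_PROMPT) for name, generate in generators.items()}

    # Step 2: label each output with its model name and build the evaluation prompt.
    sections = [f"Output from {name}:\n{text}" for name, text in outputs.items()]
    evaluation_prompt = (
        "Can you compare the two outputs below as if you were a teacher?\n\n"
        + "\n\n".join(sections)
    )
    return evaluator(evaluation_prompt)


if __name__ == "__main__":
    # Stub callables stand in for real API calls; the evaluator just echoes its prompt.
    stubs = {"ChatGPT": lambda p: "Dialogue A", "GPT-4": lambda p: "Dialogue B"}
    print(compare_models(stubs, evaluator=lambda p: p))
```

Keeping the model calls behind callables means the orchestration can be checked offline with stubs before any API credits are spent.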

Reference

Sparks of Artificial General Intelligence: Early experiments with GPT-4 (13 April 2023)
