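Creating slides with the Assistants API (GPT-4) and DALL·E-3
This notebook demonstrates how to use the new Assistants API (GPT-4) and DALL·E-3 to create informative, visually appealing slides. Creating slides is a key part of many jobs, but it can be laborious and time-consuming. Extracting insights from data and expressing them effectively on a slide can also be challenging. This cookbook shows how the new Assistants API can streamline the end-to-end slide creation process for you, without touching Microsoft PowerPoint or Google Slides, saving you valuable time and effort!
0. Setup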
from IPython.display import display, Image
from openai import OpenAI
import os
import pandas as pd
import json
import io
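# Note: this import rebinds the name Image, so from here on Image refers to
# PIL.Image rather than IPython.display.Image.
from PIL import Image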
import requests
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
# Let's import some assistant helper functions from https://cookbook.openai.com/examples/assistants_api_overview_python
def show_json(obj):
    display(json.loads(obj.model_dump_json()))

def submit_message(assistant_id, thread, user_message, file_ids=None):
    # Add the user message to the thread, optionally attaching files
    params = {
        'thread_id': thread.id,
        'role': 'user',
        'content': user_message,
    }
    if file_ids:
        params['file_ids'] = file_ids
    client.beta.threads.messages.create(
        **params
    )
    # Start a run so the assistant processes the new message
    return client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant_id,
    )

def get_response(thread):
    return client.beta.threads.messages.list(thread_id=thread.id)
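The message list returned by get_response is easier to read in chronological order. The following is a minimal sketch of a formatting helper; the name format_thread is hypothetical (it is not part of the original helpers), and it assumes each message exposes role and a content list whose parts have type "text" or "image_file", and that the list arrives newest first.

```python
def format_thread(messages):
    """Return the thread as readable lines, oldest message first."""
    lines = []
    # Assumption: the list is ordered newest first, so reverse it.
    for message in reversed(messages.data):
        for part in message.content:
            if part.type == "text":
                lines.append(f"{message.role}: {part.text.value}")
            elif part.type == "image_file":
                # Images are referenced by file ID rather than embedded
                lines.append(f"{message.role}: [image file {part.image_file.file_id}]")
    return lines
```

With a live thread, this could be used as print("\n".join(format_thread(get_response(thread)))).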
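1. Creating the content
In this example, we will create a short fictional presentation for the quarterly financial review of our company, NotReal Corporation. We want to highlight some key trends that are affecting our company's profitability. Let's say we have some financial data available. Let's load the data and take a look…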
financial_data_path = 'data/NotRealCorp_financial_data.json'
financial_data = pd.read_json(financial_data_path)
financial_data.head(5)
| | Year | Quarter | Distribution channel | Revenue ($M) | Costs ($M) | Customer count | Time |
|---|---|---|---|---|---|---|---|
| 0 | 2021 | Q1 | Online Sales | 1.50 | 1.301953 | 150 | 2021 Q1 |
| 1 | 2021 | Q1 | Direct Sales | 1.50 | 1.380809 | 151 | 2021 Q1 |
| 2 | 2021 | Q1 | Retail Partners | 1.50 | 1.348246 | 152 | 2021 Q1 |
| 3 | 2021 | Q2 | Online Sales | 1.52 | 1.308608 | 152 | 2021 Q2 |
| 4 | 2021 | Q2 | Direct Sales | 1.52 | 1.413305 | 153 | 2021 Q2 |
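As you can see, this data contains quarterly revenue, costs, and customer counts across different distribution channels. Let's create an assistant that can act as a personal analyst and produce good-looking visualizations for our PowerPoint!
First, we need to upload our file so that the assistant can access it.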
file = client.files.create(
    file=open('data/NotRealCorp_financial_data.json', "rb"),
    purpose='assistants',
)
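Now we are ready to create our assistant. We can instruct it to act as a data scientist, take any query we give it, and run the code needed to output the appropriate data visualization. The instructions parameter here is similar to the system instruction in the ChatCompletions endpoint and helps guide the assistant. We can also turn on the Code Interpreter tool so that the assistant is able to write and run code. Finally, we can specify any files we want to use, which in this case is just the financial_data file we uploaded above.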
assistant = client.beta.assistants.create(
    instructions="You are a data scientist assistant. When given data and a query, write the proper code and create the proper visualization",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}],
    file_ids=[file.id]
)
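Now let's create a thread. As our first request, we ask the assistant to calculate quarterly profit and then plot profit over time by distribution channel. The assistant will automatically calculate the profit for each quarter and create a new column combining quarter and year, without us asking for that directly. We can also specify the color of each line.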
thread = client.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": "Calculate profit (revenue minus cost) by quarter and year, and visualize as a line plot across the distribution channels, where the colors of the lines are green, light red, and light blue",
            "file_ids": [file.id]
        }
    ]
)
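To make the requested calculation concrete, here is a small dependency-free sketch of "profit (revenue minus cost) by quarter and year" per distribution channel. It is only an illustration of the arithmetic, not the code the assistant writes; the function name profit_by_channel is hypothetical, and the sample rows are the five rows displayed by financial_data.head(5) above.

```python
from collections import defaultdict

# First five rows of the data shown earlier:
# (year, quarter, distribution channel, revenue in $M, costs in $M)
sample_rows = [
    (2021, "Q1", "Online Sales", 1.50, 1.301953),
    (2021, "Q1", "Direct Sales", 1.50, 1.380809),
    (2021, "Q1", "Retail Partners", 1.50, 1.348246),
    (2021, "Q2", "Online Sales", 1.52, 1.308608),
    (2021, "Q2", "Direct Sales", 1.52, 1.413305),
]

def profit_by_channel(rows):
    """Map channel -> {"<year> <quarter>": profit in $M}."""
    profits = defaultdict(dict)
    for year, quarter, channel, revenue, costs in rows:
        # Combine year and quarter into one label, like the "Time" column
        period = f"{year} {quarter}"
        previous = profits[channel].get(period, 0.0)
        profits[channel][period] = previous + (revenue - costs)
    return dict(profits)

profits = profit_by_channel(sample_rows)
```

Each inner dictionary corresponds to one line in the plot we are asking for, with the period labels on the x-axis and profit on the y-axis.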
Now we can execute a run on the thread.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
Now we can start a loop that checks whether the image has been created. Note: this may take a few minutes.
import time

while True:
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    try:
        # Check whether the newest message starts with an image
        messages.data[0].content[0].image_file
        # Sleep to make sure the run has finished
        time.sleep(5)
        print('Plot created!')
        break
    except (AttributeError, IndexError):
        # No image yet: the newest content part is still text
        time.sleep(10)
        print('Assistant still working...')
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Assistant still working...
Plot created!
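The loop above waits for an image to appear, so it would keep waiting if the run failed before producing one. An alternative is to poll the run's status instead. The sketch below is an illustration under assumptions rather than the notebook's method: the helper name wait_for_run and its default intervals are hypothetical, it relies on client.beta.threads.runs.retrieve returning an object with a status field, and the set of terminal statuses listed is an assumption that may not be exhaustive.

```python
import time

# Assumed terminal run statuses; a run in any of these will not change again.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}

def wait_for_run(client, thread_id, run_id, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll a run until it reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run {run_id} still '{run.status}' after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

A call such as run = wait_for_run(client, thread.id, run.id) returns the final run object, so run.status can be checked for "completed" before the messages are read.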
Let's see the messages the assistant added.
messages = client.beta.threads.messages.list(thread_id=thread.id)
[message.content[0] for message in messages.data]
[MessageContentImageFile(image_file=ImageFile(file_id='file-0rKABLygI02MgwwhpgWdRFY1'), type='image_file'),
MessageContentText(text=Text(annotations=[], value="The profit has been calculated for each distribution channel by quarter and year. Next, I'll create a line plot to visualize these profits. As specified, I will use green for the 'Online Sales', light red for 'Direct Sales', and light blue for 'Retail Partners' channels. Let's create the plot."), type='text'),
MessageContentText(text=Text(annotations=[], value="The JSON data has been successfully restructured into a tabular dataframe format. It includes the year, quarter, distribution channel, revenue, costs, customer count, and a combined 'Time' representation of 'Year Quarter'. Now, we have the necessary information to calculate the profit (revenue minus cost) by quarter and year.\n\nTo visualize the profit across the different distribution channels with a line plot, we will proceed with the following steps:\n\n1. Calculate the profit for each row in the dataframe.\n2. Group the data by 'Time' (which is a combination of Year and Quarter) and 'Distribution channel'.\n3. Aggregate the profit for each group.\n4. Plot the aggregated profits as a line plot with the distribution channels represented in different colors as requested.\n\nLet's calculate the profit for each row and then continue with the visualization."), type='text'),
MessageContentText(text=Text(annotations=[], value='The structure of the JSON data shows that it is a dictionary with "Year", "Quarter", "Distribution channel", and potentially other keys that map to dictionaries containing the data. The keys of the inner dictionaries are indices, indicating that the data is tabular but has been converted into a JSON object.\n\nTo properly convert this data into a DataFrame, I will restructure the JSON data into a more typical list of dictionaries, where each dictionary represents a row in our target DataFrame. Subsequent to this restructuring, I can then load the data into a Pandas DataFrame. Let\'s restructure and load the data.'), type='text'),
MessageContentText(text=Text(annotations=[], value="The JSON data has been incorrectly loaded into a single-row DataFrame with numerous columns representing each data point. This implies the JSON structure is not as straightforward as expected, and a direct conversion to a flat table is not possible without further processing.\n\nTo better understand the JSON structure and figure out how to properly normalize it into a table format, I will print out the raw JSON data structure. We will analyze its format and then determine the correct approach to extract the profit by quarter and year, as well as the distribution channel information. Let's take a look at the JSON structure."), type='text'),
MessageContentText(text=Text(annotations=[], value="It seems that the file content was successfully parsed as JSON, and thus, there was no exception raised. The variable `error_message` is not defined because the `except` block was not executed.\n\nI'll proceed with displaying the data that was parsed from JSON."), type='text'),
MessageContentText(text=Text(annotations=[], value="It appears that the content of the dataframe has been incorrectly parsed, resulting in an empty dataframe with a very long column name that seems to contain JSON data rather than typical CSV columns and rows.\n\nTo address this issue, I will take a different approach to reading the file. I will attempt to parse the content as JSON. If this is not successful, I'll adjust the loading strategy accordingly. Let's try to read the contents as JSON data first."), type='text'),
MessageContentText(text=Text(annotations=[], value="Before we can calculate profits and visualize the data as requested, I need to first examine the contents of the file that you have uploaded. Let's go ahead and read the file to understand its structure and the kind of data it contains. Once I have a clearer picture of the dataset, we can proceed with the profit calculations. I'll begin by loading the file into a dataframe and displaying the first few entries to see the data schema."), type='text'),
MessageContentText(text=Text(annotations=[], value='Calculate profit (revenue minus cost) by quarter and year, and visualize as a line plot across the distribution channels, where the colors of the lines are green, light red, and light blue'), type='text')]
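We can see that the assistant's last message (the latest message is shown first) contains the image file we are looking for. An interesting point here is that the assistant retried parsing the JSON data several times after the first attempt failed, which demonstrates its adaptability.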
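In this run the image happens to be the first content part of the newest message, but that position is not guaranteed. As a hedged sketch, the hypothetical helper below (find_image_file_id is not part of the original notebook) scans every content part, newest message first, and returns the first image file ID it finds. It assumes the same message shape as in the output above.

```python
def find_image_file_id(messages):
    """Return the file_id of the newest image in the thread, or None if there is none."""
    for message in messages.data:  # newest message first
        for part in message.content:
            if part.type == "image_file":
                return part.image_file.file_id
    return None
```

Returning None instead of raising lets the caller decide whether to keep waiting or report that no plot was produced.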
# Quick helper function to save our output file as a PNG
def convert_file_to_png(file_id, write_path):
    # Download the raw bytes of the file and write them to disk unchanged
    data = client.files.content(file_id)
    data_bytes = data.read()
    with open(write_path, "wb") as f:
        f.write(data_bytes)

plot_file_id = messages.data[0].content[0].image_file.file_id
image_path = "../images/NotRealCorp_chart.png"
convert_file_to_png(plot_file_id, image_path)
# Upload
plot_file = client.files.create(
    file=open(image_path, "rb"),
    purpose='assistants'
)
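Note that convert_file_to_png writes the downloaded bytes to disk unchanged, so the saved file is a PNG only if the assistant produced one. A quick way to check is to compare the first eight bytes with the standard PNG signature. The helper name looks_like_png below is hypothetical and is shown only as a sketch.

```python
# The fixed 8-byte signature that every PNG file starts with
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data: bytes) -> bool:
    """Return True if the bytes begin with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)
```

For the file saved above, this could be applied by opening image_path in "rb" mode and passing the result of read(8) to looks_like_png.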
Let's load the chart!