Examples - PandasAI

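Here are some examples of how to use PandasAI. More examples, along with sample data, are included in the repository.

Working with a pandas DataFrame

Using PandasAI with a pandas DataFrame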

import os

from pandasai import SmartDataframe

import pandas as pd

# pandas dataframe

sales_by_country = pd.DataFrame({

    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],

    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]

})

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# convert to SmartDataframe

sdf = SmartDataframe(sales_by_country)

response = sdf.chat('Which are the top 5 countries by sales?')

print(response)

# Output: China, United States, Japan, Germany, United Kingdom
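Because the answer is produced by an LLM, it can be worth cross-checking it against a direct pandas computation. The sketch below is not part of PandasAI: it rebuilds the same DataFrame and ranks it with `DataFrame.nlargest`. The variable name `top5` is only illustrative.

```python
import pandas as pd

sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
})

# Rank rows by the "sales" column and keep the five largest.
top5 = sales_by_country.nlargest(5, "sales")["country"].tolist()
print(top5)
# ['China', 'United States', 'Japan', 'Germany', 'United Kingdom']
```

A check like this is cheap for small tables and helps catch cases where the model's answer does not match the data.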

Working with CSV files

Example of using PandasAI with a CSV file

import os

from pandasai import SmartDataframe

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# You can instantiate a SmartDataframe with a path to a CSV file

sdf = SmartDataframe("data/Loan payments data.csv")

response = sdf.chat("How many loans are from men and have been paid off?")

print(response)

# Output: 247 loans have been paid off by men.

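Working with Excel files

Example of using PandasAI with an Excel file. To use Excel files as a data source, you need to install the pandasai[excel] extra dependency.

pip install pandasai[excel]

Then you can use PandasAI with an Excel file as follows: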

import os

from pandasai import SmartDataframe

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# You can instantiate a SmartDataframe with a path to an Excel file

sdf = SmartDataframe("data/Loan payments data.xlsx")

response = sdf.chat("How many loans are from men and have been paid off?")

print(response)

# Output: 247 loans have been paid off by men.

Working with Parquet files

Example of using PandasAI with a Parquet file

import os

from pandasai import SmartDataframe

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# You can instantiate a SmartDataframe with a path to a Parquet file

sdf = SmartDataframe("data/Loan payments data.parquet")

response = sdf.chat("How many loans are from men and have been paid off?")

print(response)

# Output: 247 loans have been paid off by men.

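Working with Google Sheets

Example of using PandasAI with Google Sheets. To use Google Sheets as a data source, you need to install the pandasai[google-sheet] extra dependency.

pip install pandasai[google-sheet]

Then you can use PandasAI with Google Sheets as follows: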

import os

from pandasai import SmartDataframe

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# You can instantiate a SmartDataframe with a path to a Google Sheet

sdf = SmartDataframe("https://docs.google.com/spreadsheets/d/fake/edit#gid=0")

response = sdf.chat("How many loans are from men and have been paid off?")

print(response)

# Output: 247 loans have been paid off by men.

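Remember that, for now, the Google Sheet needs to be public.

Working with a Modin DataFrame

Example of using PandasAI with a Modin DataFrame. To use Modin DataFrames as a data source, you need to install the pandasai[modin] extra dependency.

pip install pandasai[modin]

Then you can use PandasAI with a Modin DataFrame as follows: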

import os

import pandasai

from pandasai import SmartDataframe

import modin.pandas as pd

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

sales_by_country = pd.DataFrame({

    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],

    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]

})

pandasai.set_pd_engine("modin")

sdf = SmartDataframe(sales_by_country)

response = sdf.chat('Which are the top 5 countries by sales?')

print(response)

# Output: China, United States, Japan, Germany, United Kingdom

# you can switch back to pandas using

# pandasai.set_pd_engine("pandas")

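Working with a Polars DataFrame

Example of using PandasAI with a Polars DataFrame (still in beta). To use Polars DataFrames as a data source, you need to install the pandasai[polars] extra dependency.

pip install pandasai[polars]

Then you can use PandasAI with a Polars DataFrame as follows: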

import os

from pandasai import SmartDataframe

import polars as pl

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# You can instantiate a SmartDataframe with a Polars DataFrame

sales_by_country = pl.DataFrame({

    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],

    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]

})

sdf = SmartDataframe(sales_by_country)

response = sdf.chat("Which are the top 5 countries by sales?")

print(response)

# Output: China, United States, Japan, Germany, United Kingdom

Plotting

Example of using PandasAI to plot a chart from a pandas DataFrame

import os

from pandasai import SmartDataframe

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

sdf = SmartDataframe("data/Countries.csv")

response = sdf.chat(

    "Plot the histogram of countries showing for each the gdp, using different colors for each bar",

)

print(response)

# Output: check out assets/histogram-chart.png

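Saving plots with a user-defined path

You can pass a custom path to save the charts. The path must be a valid absolute path. Here is an example of saving charts to a user-defined location.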

import os

from pandasai import SmartDataframe

user_defined_path = os.getcwd()

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

sdf = SmartDataframe("data/Countries.csv", config={

    "save_charts": True,

    "save_charts_path": user_defined_path,

})

response = sdf.chat(

    "Plot the histogram of countries showing for each the gdp,"

    " using different colors for each bar",

)

print(response)

# Output: check out $pwd/exports/charts/{hashid}/chart.png
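If the chart directory is known only relative to the project, it can be resolved to an absolute path before it is passed as `save_charts_path`. This is a minimal sketch using only the standard library; the folder name `my_charts` is a hypothetical placeholder, not something PandasAI requires.

```python
from pathlib import Path

# "my_charts" is a hypothetical folder name used only for illustration.
charts_dir = (Path.cwd() / "my_charts").resolve()

# Create the folder up front so the path exists when charts are saved.
charts_dir.mkdir(parents=True, exist_ok=True)

# This string is what would be passed as "save_charts_path" in the config.
user_defined_path = str(charts_dir)
print(user_defined_path)
```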

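Working with multiple DataFrames (using the SmartDatalake)

Example of using PandasAI with multiple DataFrames. To use multiple DataFrames as a data source, you need to use a SmartDatalake instead of a SmartDataframe. You can instantiate a SmartDatalake as follows: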

import os

from pandasai import SmartDatalake

import pandas as pd

employees_data = {

    'EmployeeID': [1, 2, 3, 4, 5],

    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],

    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']

}

salaries_data = {

    'EmployeeID': [1, 2, 3, 4, 5],

    'Salary': [5000, 6000, 4500, 7000, 5500]

}

employees_df = pd.DataFrame(employees_data)

salaries_df = pd.DataFrame(salaries_data)

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

lake = SmartDatalake([employees_df, salaries_df])

response = lake.chat("Who gets paid the most?")

print(response)

# Output: Olivia gets paid the most.
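Answering this question requires combining both tables on their shared `EmployeeID` column. The following sketch shows the equivalent join in plain pandas, which can serve as a sanity check on the answer; it is not how PandasAI works internally, and the names `merged` and `top_earner` are illustrative.

```python
import pandas as pd

employees_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
})
salaries_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join the two tables on their shared key.
merged = employees_df.merge(salaries_df, on="EmployeeID")

# Pick the name on the row with the highest salary.
top_earner = merged.loc[merged["Salary"].idxmax(), "Name"]
print(top_earner)  # Olivia
```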

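Working with Agent

With the chat agent, you can hold dynamic conversations in which the agent retains context throughout the discussion. This allows for more interactive and meaningful exchanges.

Key features

Context retention: The agent remembers the conversation history, allowing for seamless, context-aware interactions.
Clarification questions: You can use the clarification_questions method to request clarification on any aspect of the conversation. This helps ensure that you fully understand the information provided.
Explanation: The explain method is available to obtain a detailed explanation of how the agent arrived at a particular solution or response. It offers transparency and insight into the agent's decision-making process.

Feel free to start conversations, ask for clarification, and explore explanations to get more out of your interactions with the chat agent!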

import os

import pandas as pd

from pandasai import Agent

employees_data = {

    "EmployeeID": [1, 2, 3, 4, 5],

    "Name": ["John", "Emma", "Liam", "Olivia", "William"],

    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],

}

salaries_data = {

    "EmployeeID": [1, 2, 3, 4, 5],

    "Salary": [5000, 6000, 4500, 7000, 5500],

}

employees_df = pd.DataFrame(employees_data)

salaries_df = pd.DataFrame(salaries_data)

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

agent = Agent([employees_df, salaries_df], memory_size=10)

query = "Who gets paid the most?"

# Chat with the agent

response = agent.chat(query)

print(response)

# Get Clarification Questions

questions = agent.clarification_questions(query)

for question in questions:

    print(question)

# Explain how the chat response is generated

response = agent.explain()

print(response)
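To make context retention more concrete, here is a toy sketch of a bounded conversation memory. It is not PandasAI's implementation, and how `memory_size` is interpreted internally is an assumption here: the hypothetical `ConversationMemory` class simply keeps the most recent `memory_size` messages and renders them as context for the next question.

```python
from collections import deque


class ConversationMemory:
    """Hypothetical bounded memory: keeps only the most recent messages."""

    def __init__(self, memory_size: int = 10):
        # A deque with maxlen drops the oldest entry once it is full.
        self._messages = deque(maxlen=memory_size)

    def add(self, role: str, text: str) -> None:
        self._messages.append((role, text))

    def as_prompt(self) -> str:
        # Render the retained history as context for the next question.
        return "\n".join(f"{role}: {text}" for role, text in self._messages)


memory = ConversationMemory(memory_size=2)
memory.add("user", "Who gets paid the most?")
memory.add("agent", "Olivia gets paid the most.")
memory.add("user", "Which department is she in?")

# Only the two most recent messages are still available as context.
print(memory.as_prompt())
```

With a bound like this, a follow-up such as "Which department is she in?" can still be resolved as long as the earlier answer has not been pushed out of the window.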

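Description for an Agent

When you instantiate an agent, you can provide a description of the agent. This description is used to describe the agent in the chat and gives the LLM more context on how to respond to queries.

Some examples of descriptions are:

You are a data analysis agent. Your main goal is to help non-technical users to analyze data
Act as a data analyst. Every time I ask you a question, you should provide the code to visualize the answer using plotly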

import os

from pandasai import Agent

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

agent = Agent(

    "data.csv",

    description="You are a data analysis agent. Your main goal is to help non-technical users to analyze data",

)

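Add Skills to the Agent

You can add custom functions to the agent to extend its capabilities. These custom functions integrate seamlessly with the agent's skills, enabling a wide range of user-defined operations.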

import os

import pandas as pd

from pandasai import Agent

from pandasai.skills import skill

employees_data = {

    "EmployeeID": [1, 2, 3, 4, 5],

    "Name": ["John", "Emma", "Liam", "Olivia", "William"],

    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],

}

salaries_data = {

    "EmployeeID": [1, 2, 3, 4, 5],

    "Salary": [5000, 6000, 4500, 7000, 5500],

}

employees_df = pd.DataFrame(employees_data)

salaries_df = pd.DataFrame(salaries_data)

@skill

def plot_salaries(merged_df: pd.DataFrame):

    """

    Saves a bar chart with names on the x-axis and salaries on the y-axis using matplotlib

    """

    import matplotlib.pyplot as plt

    plt.bar(merged_df["Name"], merged_df["Salary"])

    plt.xlabel("Employee Name")

    plt.ylabel("Salary")

    plt.title("Employee Salaries")

    plt.xticks(rotation=45)

    plt.savefig("temp_chart.png")

    plt.close()

# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

agent = Agent([employees_df, salaries_df], memory_size=10)

agent.add_skills(plot_salaries)

# Chat with the agent

response = agent.chat("Plot the employee salaries against names")

print(response)
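As a rough mental model of why a skill's name and docstring matter, the sketch below shows one way a decorator could register functions and expose their descriptions. Everything in it (`SKILLS`, `toy_skill`, `describe_skills`, `double`) is hypothetical and written for illustration; it does not reproduce PandasAI's `@skill` internals.

```python
from typing import Callable, Dict

# Hypothetical registry; PandasAI's real internals may differ.
SKILLS: Dict[str, Callable] = {}


def toy_skill(func: Callable) -> Callable:
    """Register a function under its name so it can be looked up later."""
    SKILLS[func.__name__] = func
    return func


@toy_skill
def double(x: int) -> int:
    """Returns twice the given number."""
    return 2 * x


def describe_skills() -> str:
    # The name and docstring are what a prompt could show to an LLM.
    return "\n".join(
        f"{name}: {(func.__doc__ or '').strip()}" for name, func in SKILLS.items()
    )


print(describe_skills())
print(SKILLS["double"](21))
```

Under this model, a clear and accurate docstring is the main signal for deciding when a skill applies, so it is worth keeping it in sync with what the function actually does.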
