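Here are some examples of how to use PandasAI. More examples, along with sample data, are included in the repository.

Working with pandas dataframes

Using PandasAI with a Pandas DataFrame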

import os
from pandasai import SmartDataframe
import pandas as pd

# pandas dataframe
sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]
})

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# convert to SmartDataframe
sdf = SmartDataframe(sales_by_country)

response = sdf.chat('Which are the top 5 countries by sales?')
print(response)
# Output: China, United States, Japan, Germany, United Kingdom
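Because the answer is produced by an LLM, it can be worth cross-checking it against a deterministic computation. The following is a minimal sketch in plain Python (no PandasAI involved, and the variable names are only illustrative) that ranks the same data by sales:

```python
# Same data as in the example, held in plain Python lists.
countries = ["United States", "United Kingdom", "France", "Germany", "Italy",
             "Spain", "Canada", "Australia", "Japan", "China"]
sales = [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]

# Pair each country with its sales, sort by sales descending, keep the first five.
ranked = sorted(zip(countries, sales), key=lambda pair: pair[1], reverse=True)
top5 = [country for country, _ in ranked[:5]]

print(", ".join(top5))
# China, United States, Japan, Germany, United Kingdom
```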

Working with CSVs

Example of using PandasAI with a CSV file

import os
from pandasai import SmartDataframe

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# You can instantiate a SmartDataframe with a path to a CSV file
sdf = SmartDataframe("data/Loan payments data.csv")

response = sdf.chat("How many loans are from men and have been paid off?")
print(response)
# Output: 247 loans have been paid off by men.
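The examples set the API key inline for brevity, and the comment notes it can also be configured through the environment. One possible way to avoid hard-coding it is to read it from the environment and fail early when it is missing. The sketch below uses only the standard library; get_api_key and the placeholder check are hypothetical helpers for illustration, not part of PandasAI.

```python
import os

PLACEHOLDER = "YOUR_API_KEY"  # the dummy value used throughout the examples


def get_api_key(env=None):
    """Return PANDASAI_API_KEY, or raise if it is unset or still the placeholder."""
    env = os.environ if env is None else env
    key = env.get("PANDASAI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("Set PANDASAI_API_KEY before creating a SmartDataframe")
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(get_api_key({"PANDASAI_API_KEY": "abc123"}))  # abc123
```

Calling get_api_key() with no argument reads from os.environ, so it works the same whether the variable was exported in the shell or loaded from a .env file by another tool.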

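Working with Excel files

Example of using PandasAI with an Excel file. To use Excel files as a data source, you need to install the pandasai[excel] extra dependency.

pip install pandasai[excel]

Then, you can use PandasAI with an Excel file as follows: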

import os
from pandasai import SmartDataframe

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# You can instantiate a SmartDataframe with a path to an Excel file
sdf = SmartDataframe("data/Loan payments data.xlsx")

response = sdf.chat("How many loans are from men and have been paid off?")
print(response)
# Output: 247 loans have been paid off by men.

Working with Parquet files

Example of using PandasAI with a Parquet file

import os
from pandasai import SmartDataframe

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# You can instantiate a SmartDataframe with a path to a Parquet file
sdf = SmartDataframe("data/Loan payments data.parquet")

response = sdf.chat("How many loans are from men and have been paid off?")
print(response)
# Output: 247 loans have been paid off by men.

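Working with Google Sheets

Example of using PandasAI with a Google Sheet. To use Google Sheets as a data source, you need to install the pandasai[google-sheet] extra dependency.

pip install pandasai[google-sheet]

Then, you can use PandasAI with a Google Sheet as follows: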

import os
from pandasai import SmartDataframe

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# You can instantiate a SmartDataframe with the URL of a Google Sheet
sdf = SmartDataframe("https://docs.google.com/spreadsheets/d/fake/edit#gid=0")
response = sdf.chat("How many loans are from men and have been paid off?")
print(response)
# Output: 247 loans have been paid off by men.

Remember that, at the moment, you need to make sure the Google Sheet is public.
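Before handing a URL to PandasAI, it may help to confirm that it at least has the shape of a Google Sheets link. The sketch below is an assumption-laden illustration using only the standard library: sheet_id_from_url is a hypothetical helper, and it checks the URL format only. It cannot tell whether the sheet is actually shared publicly.

```python
import re
from urllib.parse import urlparse

# Path prefix of a Google Sheets document URL, capturing the spreadsheet ID.
_SHEET_PATH = re.compile(r"^/spreadsheets/d/([A-Za-z0-9_-]+)")


def sheet_id_from_url(url):
    """Return the spreadsheet ID from a Google Sheets URL, or None if it does not match."""
    parts = urlparse(url)
    if parts.scheme != "https" or parts.netloc != "docs.google.com":
        return None
    match = _SHEET_PATH.match(parts.path)
    return match.group(1) if match else None


print(sheet_id_from_url("https://docs.google.com/spreadsheets/d/fake/edit#gid=0"))  # fake
print(sheet_id_from_url("https://example.com/spreadsheets/d/fake"))  # None
```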

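Working with Modin dataframes

Example of using PandasAI with a Modin DataFrame. To use Modin dataframes as a data source, you need to install the pandasai[modin] extra dependency.

pip install pandasai[modin]

Then, you can use PandasAI with a Modin DataFrame as follows: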

import os
import pandasai
from pandasai import SmartDataframe
import modin.pandas as pd

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]
})

pandasai.set_pd_engine("modin")
sdf = SmartDataframe(sales_by_country)
response = sdf.chat('Which are the top 5 countries by sales?')
print(response)
# Output: China, United States, Japan, Germany, United Kingdom

# you can switch back to pandas using
# pandasai.set_pd_engine("pandas")

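Working with Polars dataframes

Example of using PandasAI with a Polars DataFrame (still in beta). To use Polars dataframes as a data source, you need to install the pandasai[polars] extra dependency.

pip install pandasai[polars]

Then, you can use PandasAI with a Polars DataFrame as follows: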

import os
from pandasai import SmartDataframe
import polars as pl

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

# You can instantiate a SmartDataframe with a Polars DataFrame
sales_by_country = pl.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]
})

sdf = SmartDataframe(sales_by_country)
response = sdf.chat("Which are the top 5 countries by sales?")
print(response)
# Output: China, United States, Japan, Germany, United Kingdom

Plotting

Example of using PandasAI to plot a chart from a Pandas DataFrame

import os
from pandasai import SmartDataframe

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

sdf = SmartDataframe("data/Countries.csv")
response = sdf.chat(
    "Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)
print(response)
# Output: check out assets/histogram-chart.png

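Saving plots to a user-defined path

You can pass a custom path for saving charts. The path must be a valid absolute path. Below is an example of saving charts to a user-defined location.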

import os
from pandasai import SmartDataframe

user_defined_path = os.getcwd()

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

sdf = SmartDataframe("data/Countries.csv", config={
    "save_charts": True,
    "save_charts_path": user_defined_path,
})
response = sdf.chat(
    "Plot the histogram of countries showing for each the gdp,"
    " using different colors for each bar",
)
print(response)
# Output: check out $pwd/exports/charts/{hashid}/chart.png
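Since the save path has to be a valid global path, it can be convenient to normalize whatever the user supplies before passing it in the config. The following is a minimal sketch using pathlib; resolve_chart_dir is a hypothetical helper, not a PandasAI function, and the directory name my_charts is only an example.

```python
import tempfile
from pathlib import Path


def resolve_chart_dir(path):
    """Expand and resolve `path` to an absolute directory, creating it if needed."""
    resolved = Path(path).expanduser().resolve()
    resolved.mkdir(parents=True, exist_ok=True)
    return resolved


# Demonstration inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = resolve_chart_dir(Path(tmp) / "my_charts")
    is_abs, exists = target.is_absolute(), target.is_dir()
    print(is_abs, exists)  # True True
```

The string form, str(target), is what would then be supplied as the save_charts_path value.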

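Working with multiple dataframes (using the SmartDatalake)

Example of using PandasAI with multiple dataframes. To use multiple dataframes as a data source, you need to use a SmartDatalake instead of a SmartDataframe. You can instantiate a SmartDatalake as follows: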

import os
from pandasai import SmartDatalake
import pandas as pd

employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}

salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

lake = SmartDatalake([employees_df, salaries_df])
response = lake.chat("Who gets paid the most?")
print(response)
# Output: Olivia gets paid the most.
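Answering this question requires relating the two tables through their shared EmployeeID column. As a sanity check that does not depend on the LLM, the same join-then-maximum logic can be sketched in plain Python (the dictionaries below simply restate the example data, and the variable names are illustrative):

```python
# EmployeeID -> Name and EmployeeID -> Salary, taken from the two example tables.
employees = {1: "John", 2: "Emma", 3: "Liam", 4: "Olivia", 5: "William"}
salaries = {1: 5000, 2: 6000, 3: 4500, 4: 7000, 5: 5500}

# Find the ID with the largest salary, then look up the name for that ID.
top_id = max(salaries, key=salaries.get)
top_name = employees[top_id]

print(f"{top_name} gets paid the most ({salaries[top_id]}).")
# Olivia gets paid the most (7000).
```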

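Working with Agent

With the chat agent, you can engage in dynamic conversations in which the agent retains context throughout the discussion. This allows for more interactive and meaningful exchanges.

Key Features

  • Context Retention: The agent remembers the conversation history, allowing for seamless, context-aware interactions.

  • Clarification Questions: You can use the clarification_questions method to request clarification on any aspect of the conversation. This helps ensure you fully understand the information provided.

  • Explanation: The explain method gives a detailed explanation of how the agent arrived at a particular solution or response, offering transparency into the agent's decision-making process.

Feel free to start conversations, seek clarifications, and explore explanations to get more out of your interactions with the chat agent!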

import os
import pandas as pd
from pandasai import Agent

employees_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}

salaries_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

agent = Agent([employees_df, salaries_df], memory_size=10)

query = "Who gets paid the most?"

# Chat with the agent
response = agent.chat(query)
print(response)

# Get Clarification Questions
questions = agent.clarification_questions(query)

for question in questions:
    print(question)

# Explain how the chat response is generated
response = agent.explain()
print(response)
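Context retention with a bounded memory_size can be pictured as a fixed-length window over the conversation. The sketch below is an illustration only and not PandasAI's implementation: ConversationMemory is a hypothetical class built on collections.deque, and it assumes the size counts individual messages, which may differ from how the library counts.

```python
from collections import deque


class ConversationMemory:
    """Keep only the most recent `size` messages (hypothetical illustration)."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest entry automatically when full.
        self._messages = deque(maxlen=size)

    def add(self, role, text):
        self._messages.append((role, text))

    def context(self):
        """Render the retained messages as a prompt-style transcript."""
        return "\n".join(f"{role}: {text}" for role, text in self._messages)


memory = ConversationMemory(size=2)
memory.add("user", "Who gets paid the most?")
memory.add("agent", "Olivia gets paid the most.")
memory.add("user", "And in which department does she work?")
print(memory.context())
# agent: Olivia gets paid the most.
# user: And in which department does she work?
```

With a window of two, the first question has already been dropped by the time the follow-up arrives, while the agent's answer is still available to resolve "she".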

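Description for an Agent

When you instantiate an agent, you can provide a description of the agent. This description is used to describe the agent in the chat and gives the LLM more context on how to respond to queries.

Some examples of descriptions:

  • You are a data analysis agent. Your main goal is to help non-technical users to analyze data
  • Act as a data analyst. Every time I ask you a question, you should provide the code to visualize the answer using plotly

import os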

from pandasai import Agent

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

agent = Agent(
    "data.csv",
    description="You are a data analysis agent. Your main goal is to help non-technical users to analyze data",
)

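Add Skills to the Agent

You can add custom functions for the agent to use, extending its capabilities. These custom functions integrate with the agent's existing skills, enabling a wide range of user-defined operations.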

import os
import pandas as pd
from pandasai import Agent
from pandasai.skills import skill

employees_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}

salaries_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)


@skill
def plot_salaries(merged_df: pd.DataFrame):
    """
    Displays the bar chart having name on x-axis and salaries on y-axis using matplotlib
    """
    import matplotlib.pyplot as plt

    plt.bar(merged_df["Name"], merged_df["Salary"])
    plt.xlabel("Employee Name")
    plt.ylabel("Salary")
    plt.title("Employee Salaries")
    plt.xticks(rotation=45)
    plt.savefig("temp_chart.png")
    plt.close()


# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

agent = Agent([employees_df, salaries_df], memory_size=10)
agent.add_skills(plot_salaries)

# Chat with the agent
response = agent.chat("Plot the employee salaries against names")
print(response)
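To give a rough intuition for what a skill decorator can do, the sketch below registers a function together with its docstring so that a caller could later look it up by name and show its description to an LLM. This is a simplified stand-in written for illustration: register_skill and SKILLS are hypothetical names, and nothing here reflects how pandasai.skills.skill is actually implemented.

```python
import inspect

SKILLS = {}  # name -> (function, description); hypothetical registry


def register_skill(func):
    """Record `func` under its name, using its cleaned docstring as the description."""
    SKILLS[func.__name__] = (func, inspect.getdoc(func) or "")
    return func  # return the function unchanged so it stays directly callable


@register_skill
def total_salary(salaries):
    """Return the sum of all salaries."""
    return sum(salaries)


func, description = SKILLS["total_salary"]
print(description)                           # Return the sum of all salaries.
print(func([5000, 6000, 4500, 7000, 5500]))  # 28000
```

The docstring matters in a design like this because it is the only natural-language hint about when the function should be used, which is one reason to keep a skill's docstring accurate.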