Kuzu
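Kùzu is an embeddable property graph database management system built for query speed and scalability.
Kùzu has a permissive (MIT) license and implements Cypher, a declarative graph query language that allows expressive and efficient querying of data in a property graph.
It uses columnar storage, and its query processor contains novel join algorithms that let it scale to very large graphs without sacrificing query performance.
This notebook shows how to use LLMs to provide a natural language interface to a Kùzu database, queried with Cypher.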
Setting up
Kùzu is an embedded database (it runs in-process), so there are no servers to manage.
Simply install it via its Python package:
pip install kuzu
Create a database on the local machine and connect to it:
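import kuzu
# Open (or create) an embedded database at the local path "test_db"
db = kuzu.Database("test_db")
# A connection is what executes Cypher queries against the database
conn = kuzu.Connection(db)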
First, we create the schema for a simple movie database:
conn.execute("CREATE NODE TABLE Movie (name STRING, PRIMARY KEY(name))")
conn.execute("CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))")
conn.execute("CREATE REL TABLE ActedIn (FROM Person TO Movie)")
Then we can insert some data:
conn.execute("CREATE (:Person {name: 'Al Pacino', birthDate: '1940-04-25'})")
conn.execute("CREATE (:Person {name: 'Robert De Niro', birthDate: '1943-08-17'})")
conn.execute("CREATE (:Movie {name: 'The Godfather'})")
conn.execute("CREATE (:Movie {name: 'The Godfather: Part II'})")
conn.execute("CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})")
conn.execute("MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)")
conn.execute("MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)")
conn.execute("MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)")
conn.execute("MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)")
Creating KuzuQAChain
Now we can create the KuzuGraph and the KuzuQAChain. To create the KuzuGraph, we only need to pass the database object to the KuzuGraph constructor:
from langchain.chains import KuzuQAChain
from langchain_community.graphs import KuzuGraph
from langchain_openai import ChatOpenAI
graph = KuzuGraph(db)
chain = KuzuQAChain.from_llm(
    llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
    graph=graph,
    verbose=True,
)
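Refresh graph schema information
If the schema of the database changes, you can refresh the schema information used to generate Cypher statements.
You can also display the schema of the Kùzu graph, as shown below: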
# graph.refresh_schema()
print(graph.get_schema)
Node properties: [{'properties': [('name', 'STRING')], 'label': 'Movie'}, {'properties': [('name', 'STRING'), ('birthDate', 'STRING')], 'label': 'Person'}]
Relationships properties: [{'properties': [], 'label': 'ActedIn'}]
Relationships: ['(:Person)-[:ActedIn]->(:Movie)']
Querying the graph
We can now use the KuzuQAChain to ask questions of the graph:
chain.invoke("Who acted in The Godfather: Part II?")
{'query': 'Who acted in The Godfather: Part II?',
'result': 'Al Pacino, Robert De Niro acted in The Godfather: Part II.'}
chain.invoke("Robert De Niro played in which movies?")
{'query': 'Robert De Niro played in which movies?',
'result': 'Robert De Niro played in The Godfather: Part II.'}
chain.invoke("How many actors played in the Godfather: Part II?")
{'query': 'How many actors played in the Godfather: Part II?',
'result': '0'}
Using separate LLMs for Cypher and answer generation
You can specify cypher_llm and qa_llm separately to use different LLMs for Cypher generation and answer generation.
chain = KuzuQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
    qa_llm=ChatOpenAI(temperature=0, model="gpt-4"),
    graph=graph,
    verbose=True,
)
chain.invoke("How many actors played in The Godfather: Part II?")
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (:Person)-[:ActedIn]->(:Movie {name: 'The Godfather: Part II'})
RETURN count(*)
Full Context:
[{'COUNT_STAR()': 2}]
> Finished chain.
{'query': 'How many actors played in The Godfather: Part II?',
'result': 'Two actors played in The Godfather: Part II.'}