Kuzu
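Kùzu is an embeddable property graph database management system built for query speed and scalability.
Kùzu has a permissive (MIT) open source license and implements Cypher, a declarative graph query language that allows expressive and efficient querying of data in a property graph. It uses columnar storage, and its query processor contains novel join algorithms that allow it to scale to very large graphs without sacrificing query performance.
This notebook shows how to use LLMs to provide a natural language interface to a Kùzu database through Cypher.
Setup
Kùzu is an embedded database (it runs in-process), so there is no server to manage. Simply install it via its Python package: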
pip install kuzu
Create a database on the local machine and connect to it:
import kuzu
db = kuzu.Database("test_db")
conn = kuzu.Connection(db)
First, we create the schema for a simple movie database:
conn.execute("CREATE NODE TABLE Movie (name STRING, PRIMARY KEY(name))")
conn.execute(
    "CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))"
)
conn.execute("CREATE REL TABLE ActedIn (FROM Person TO Movie)")
<kuzu.query_result.QueryResult at 0x103a72290>
Then we can insert some data.
conn.execute("CREATE (:Person {name: 'Al Pacino', birthDate: '1940-04-25'})")
conn.execute("CREATE (:Person {name: 'Robert De Niro', birthDate: '1943-08-17'})")
conn.execute("CREATE (:Movie {name: 'The Godfather'})")
conn.execute("CREATE (:Movie {name: 'The Godfather: Part II'})")
conn.execute(
    "CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})"
)
conn.execute(
    "MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
    "MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
    "MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
    "MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)"
)
<kuzu.query_result.QueryResult at 0x103a9e750>
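The four relationship statements above differ only in the person and movie names, so they can be generated from a list of pairs. The following is a minimal sketch using only the standard library; `acted_in_statement` and `CAST` are hypothetical names introduced here, not part of Kùzu or LangChain, and the helper simply rejects values containing a single quote rather than attempting to escape them.

```python
from typing import List, Tuple


def acted_in_statement(person: str, movie: str) -> str:
    """Build the MATCH ... CREATE statement for one (person, movie) pair.

    Hypothetical helper: it refuses names containing a single quote instead
    of guessing at an escaping rule.
    """
    for value in (person, movie):
        if "'" in value:
            raise ValueError(f"single quote not supported in name: {value!r}")
    return (
        "MATCH (p:Person), (m:Movie) "
        f"WHERE p.name = '{person}' AND m.name = '{movie}' "
        "CREATE (p)-[:ActedIn]->(m)"
    )


# The same four relationships that are inserted above.
CAST: List[Tuple[str, str]] = [
    ("Al Pacino", "The Godfather"),
    ("Al Pacino", "The Godfather: Part II"),
    ("Al Pacino", "The Godfather Coda: The Death of Michael Corleone"),
    ("Robert De Niro", "The Godfather: Part II"),
]

statements = [acted_in_statement(person, movie) for person, movie in CAST]

# With the connection created earlier, each statement could then be run as:
#     for statement in statements:
#         conn.execute(statement)
```

Each generated string is identical to the corresponding hand-written statement, so running them on a fresh database should produce the same graph.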
Creating KuzuQAChain
We can now create the KuzuGraph and the KuzuQAChain. To create the KuzuGraph, we simply pass the database object to the KuzuGraph constructor.
from langchain.chains import KuzuQAChain
from langchain_community.graphs import KuzuGraph
from langchain_openai import ChatOpenAI
graph = KuzuGraph(db)
chain = KuzuQAChain.from_llm(
    llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
    graph=graph,
    verbose=True,
)
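Refresh graph schema information
If the schema of the database changes, you can refresh the schema information needed to generate Cypher statements. You can also display the schema of the Kùzu graph as shown below.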
# graph.refresh_schema()
print(graph.get_schema)
Node properties: [{'properties': [('name', 'STRING')], 'label': 'Movie'}, {'properties': [('name', 'STRING'), ('birthDate', 'STRING')], 'label': 'Person'}]
Relationships properties: [{'properties': [], 'label': 'ActedIn'}]
Relationships: ['(:Person)-[:ActedIn]->(:Movie)']
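Because the Cypher is generated from this schema, a lightweight sanity check is to confirm that a generated statement only uses labels that appear in it. The sketch below is an illustration under stated assumptions, not part of LangChain or Kùzu: `schema_labels` and `unknown_labels` are hypothetical helpers, the schema structures are copied by hand from the printed output above, and the regular expression only recognises labels written directly after a colon (as in `(p:Person)`).

```python
import re
from typing import Iterable, Set

# Copied by hand from the schema printed above.
node_properties = [
    {"properties": [("name", "STRING")], "label": "Movie"},
    {"properties": [("name", "STRING"), ("birthDate", "STRING")], "label": "Person"},
]
relationship_properties = [{"properties": [], "label": "ActedIn"}]


def schema_labels(*tables: Iterable[dict]) -> Set[str]:
    """Collect every 'label' value from the given schema lists."""
    return {entry["label"] for table in tables for entry in table}


def unknown_labels(cypher: str, known: Set[str]) -> Set[str]:
    """Return labels used in `cypher` that are not in `known`.

    Single-quoted string literals are blanked first so that colons inside
    values such as 'The Godfather: Part II' are not mistaken for labels.
    """
    without_strings = re.sub(r"'(?:[^'\\]|\\.)*'", "''", cypher)
    used = set(re.findall(r":([A-Za-z_][A-Za-z0-9_]*)", without_strings))
    return used - known


known = schema_labels(node_properties, relationship_properties)
ok = unknown_labels(
    "MATCH (p:Person)-[:ActedIn]->(m:Movie {name: 'The Godfather: Part II'}) "
    "RETURN p.name",
    known,
)
bad = unknown_labels("MATCH (a:Actor)-[:ActedIn]->(m:Movie) RETURN a.name", known)
```

Here `ok` is empty and `bad` contains the label `Actor`, which does not exist in the schema. A pattern-based check like this is deliberately simple and would miss or misread less common Cypher syntax.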
Querying the graph
We can now use the KuzuQAChain to ask questions of the graph.
chain.invoke("Who acted in The Godfather: Part II?")
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:ActedIn]->(m:Movie)
WHERE m.name = 'The Godfather: Part II'
RETURN p.name
Full Context:
[{'p.name': 'Al Pacino'}, {'p.name': 'Robert De Niro'}]
> Finished chain.
{'query': 'Who acted in The Godfather: Part II?',
'result': 'Al Pacino, Robert De Niro acted in The Godfather: Part II.'}
chain.invoke("Robert De Niro played in which movies?")
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:ActedIn]->(m:Movie)
WHERE p.name = 'Robert De Niro'
RETURN m.name
Full Context:
[{'m.name': 'The Godfather: Part II'}]
> Finished chain.
{'query': 'Robert De Niro played in which movies?',
'result': 'Robert De Niro played in The Godfather: Part II.'}
chain.invoke("How many actors played in the Godfather: Part II?")
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (:Person)-[:ActedIn]->(:Movie {name: 'Godfather: Part II'})
RETURN count(*)
Full Context:
[{'COUNT_STAR()': 0}]
> Finished chain.
{'query': 'How many actors played in the Godfather: Part II?',
'result': "I don't know the answer."}
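In the run above, the generated Cypher filters on the name 'Godfather: Part II', which is not a stored movie name, so the count is 0 and the chain cannot answer. One possible mitigation, sketched here with the standard library only, is to map a near-miss title to the closest stored title before querying. `closest_title` is a hypothetical helper, and the 0.6 cutoff is an arbitrary assumption rather than a recommended value.

```python
import difflib
from typing import List, Optional

# Movie names inserted into the database earlier in this notebook.
STORED_TITLES: List[str] = [
    "The Godfather",
    "The Godfather: Part II",
    "The Godfather Coda: The Death of Michael Corleone",
]


def closest_title(candidate: str, titles: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the stored title most similar to `candidate`, or None.

    Similarity is difflib's sequence ratio; the cutoff is an assumption
    chosen for this illustration.
    """
    matches = difflib.get_close_matches(candidate, titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None


resolved = closest_title("Godfather: Part II", STORED_TITLES)
unresolved = closest_title("Zzz", STORED_TITLES)
```

With these inputs, `resolved` is the stored name 'The Godfather: Part II', while `unresolved` is `None` because nothing is similar enough.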
chain.invoke("Who is the oldest actor who played in The Godfather: Part II?")
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:ActedIn]->(m:Movie {name: 'The Godfather: Part II'})
RETURN p.name
ORDER BY p.birthDate ASC
LIMIT 1
Full Context:
[{'p.name': 'Al Pacino'}]
> Finished chain.
{'query': 'Who is the oldest actor who played in The Godfather: Part II?',
'result': 'Al Pacino is the oldest actor who played in The Godfather: Part II.'}
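The query above orders by birthDate even though the schema stores it as a STRING. This gives the right answer here because the dates are written as YYYY-MM-DD, a format in which alphabetical order matches chronological order. The sketch below checks that on the two birth dates from this notebook; the function names are hypothetical and introduced only for this illustration.

```python
from datetime import date
from typing import Dict

# Birth dates exactly as inserted into the Person table above.
birth_dates: Dict[str, str] = {
    "Al Pacino": "1940-04-25",
    "Robert De Niro": "1943-08-17",
}


def oldest_by_string(people: Dict[str, str]) -> str:
    """Pick the person whose birth date string sorts first."""
    return min(people, key=lambda name: people[name])


def oldest_by_date(people: Dict[str, str]) -> str:
    """Pick the person with the earliest birth date, parsed as a real date."""
    return min(people, key=lambda name: date.fromisoformat(people[name]))


by_string = oldest_by_string(birth_dates)
by_date = oldest_by_date(birth_dates)
```

Both functions select Al Pacino. The equivalence would not hold for other string formats such as '25/04/1940', so string ordering is only safe while every value follows the same zero-padded layout.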
Use separate LLMs for Cypher and answer generation
You can specify cypher_llm and qa_llm separately to use different LLMs for Cypher generation and answer generation.
chain = KuzuQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
    qa_llm=ChatOpenAI(temperature=0, model="gpt-4"),
    graph=graph,
    verbose=True,
)
/Users/prrao/code/langchain/.venv/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The class `LLMChain` was deprecated in LangChain 0.1.17 and will be removed in 0.3.0. Use RunnableSequence, e.g., `prompt | llm` instead.
warn_deprecated(
chain.invoke("How many actors played in The Godfather: Part II?")
> Entering new KuzuQAChain chain...
/Users/prrao/code/langchain/.venv/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 0.2.0. Use invoke instead.
warn_deprecated(
Generated Cypher:
MATCH (:Person)-[:ActedIn]->(:Movie {name: 'The Godfather: Part II'})
RETURN count(*)
Full Context:
[{'COUNT_STAR()': 2}]
/Users/prrao/code/langchain/.venv/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The method `Chain.__call__` was deprecated in langchain 0.1.0 and will be removed in 0.2.0. Use invoke instead.
warn_deprecated(
> Finished chain.
{'query': 'How many actors played in The Godfather: Part II?',
'result': 'Two actors played in The Godfather: Part II.'}
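Every call shown above returns a dictionary with a 'query' key and a 'result' key, so several questions can be asked in a loop and the answers collected. The sketch below assumes only that the chain object has an `invoke` method with that return shape. `ask_all` and `EchoChain` are hypothetical names; `EchoChain` is a stand-in that makes the example runnable without an LLM or a database.

```python
from typing import Any, Dict, Iterable


def ask_all(chain: Any, questions: Iterable[str]) -> Dict[str, str]:
    """Invoke `chain` once per question and map each question to its answer.

    Assumes chain.invoke(question) returns a dict containing a 'result' key,
    as in the outputs shown in this notebook.
    """
    answers: Dict[str, str] = {}
    for question in questions:
        response = chain.invoke(question)
        answers[question] = response["result"]
    return answers


class EchoChain:
    """Stand-in for a real chain: returns a canned answer for any question."""

    def invoke(self, question: str) -> Dict[str, str]:
        return {"query": question, "result": f"(answer to: {question})"}


demo = ask_all(
    EchoChain(),
    [
        "Who acted in The Godfather: Part II?",
        "Robert De Niro played in which movies?",
    ],
)
```

Passing the KuzuQAChain created above in place of `EchoChain()` would send each question through Cypher generation and answer generation in turn.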