Neo4j Graph Database¶
Neo4j is a high-performance graph database that stores and processes data as a graph. Graph databases excel at handling highly connected data, such as social networks and recommendation systems. Neo4j uses the Cypher query language to query and manipulate the data in the graph.
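For illustration, a Cypher query over a hypothetical social graph (the Person label, FRIENDS_WITH relationship, and name property are invented for this example) could look like this:
MATCH (p:Person)-[:FRIENDS_WITH]->(f:Person)
RETURN p.name, f.name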
%pip install llama-index-llms-openai
%pip install llama-index-graph-stores-neo4j
%pip install llama-index-embeddings-openai
%pip install llama-index-llms-azure-openai
# For OpenAI
import os
os.environ["OPENAI_API_KEY"] = "API_KEY_HERE"
import logging
import sys
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# Define the LLM
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = llm
Settings.chunk_size = 512
# For Azure OpenAI
import os
import json
import openai
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import (
VectorStoreIndex,
SimpleDirectoryReader,
KnowledgeGraphIndex,
)
import logging
import sys
from IPython.display import Markdown, display
logging.basicConfig(
stream=sys.stdout, level=logging.INFO
) # logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
openai.api_type = "azure"
openai.api_base = "https://<foo-bar>.openai.azure.com"
openai.api_version = "2022-12-01"
os.environ["OPENAI_API_KEY"] = "<your-openai-key>"
openai.api_key = os.getenv("OPENAI_API_KEY")
llm = AzureOpenAI(
deployment_name="<foo-bar-deployment>",
temperature=0,
openai_api_version=openai.api_version,
model_kwargs={
"api_key": openai.api_key,
"api_base": openai.api_base,
"api_type": openai.api_type,
"api_version": openai.api_version,
},
)
# You need to deploy your own embedding model as well as your own chat completion model
embedding_llm = OpenAIEmbedding(
model="text-embedding-ada-002",
deployment_name="<foo-bar-deployment>",
api_key=openai.api_key,
api_base=openai.api_base,
api_type=openai.api_type,
api_version=openai.api_version,
)
Settings.llm = llm
Settings.embed_model = embedding_llm
Settings.chunk_size = 512
Using Knowledge Graph with Neo4jGraphStore¶
Building the Knowledge Graph¶
from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader
from llama_index.core import StorageContext
from llama_index.graph_stores.neo4j import Neo4jGraphStore
from llama_index.llms.openai import OpenAI
from IPython.display import Markdown, display
documents = SimpleDirectoryReader(
"../../../../examples/paul_graham_essay/data"
).load_data()
Prepare for Neo4j¶
Before connecting to Neo4j from Python, you need to install the neo4j package, which can be done with the following command:
pip install neo4j
Once it is installed, you can use the neo4j package to connect to and work with a Neo4j database from Python.
%pip install neo4j
Requirement already satisfied: neo4j in /home/tomaz/anaconda3/envs/snakes/lib/python3.9/site-packages (5.11.0) Requirement already satisfied: pytz in /home/tomaz/anaconda3/envs/snakes/lib/python3.9/site-packages (from neo4j) (2023.3) Note: you may need to restart the kernel to use updated packages.
username = "neo4j"
password = "retractor-knot-thermocouples"
url = "bolt://44.211.44.239:7687"
database = "neo4j"
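Optionally, you can verify the connection details before building the index. A minimal sketch using the official neo4j driver (verify_connectivity raises an exception if the instance is unreachable):
from neo4j import GraphDatabase
# quick connectivity check against the instance configured above
driver = GraphDatabase.driver(url, auth=(username, password))
driver.verify_connectivity()
driver.close()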
Instantiate Neo4jGraph KG Indexes¶
graph_store = Neo4jGraphStore(
username=username,
password=password,
url=url,
database=database,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
# NOTE: this can take a while!
index = KnowledgeGraphIndex.from_documents(
documents,
storage_context=storage_context,
max_triplets_per_chunk=2,
)
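To inspect what was written to Neo4j, you can run Cypher directly through the graph store. A minimal sketch, assuming the store's default Entity node label and id property:
# peek at a few of the extracted triplets stored in Neo4j
graph_store.query(
"""
MATCH (a)-[r]->(b)
RETURN a.id AS subject, type(r) AS predicate, b.id AS object
LIMIT 5
"""
)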
Querying the Knowledge Graph¶
First, we can query and send only the triplets to the LLM.
query_engine = index.as_query_engine(
include_text=False, response_mode="tree_summarize"
)
response = query_engine.query("Tell me more about Interleaf")
INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me more about Interleaf INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Interleaf'] ERROR:llama_index.indices.knowledge_graph.retriever:Index was not constructed with embeddings, skipping embedding usage... INFO:llama_index.indices.knowledge_graph.retriever:> Extracted relationships: The following are knowledge sequence in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]` Interleaf ['IS_ABOUT', 'what not to do'] Interleaf ['ADDED', 'scripting language'] Interleaf ['MADE', 'software for creating documents']
display(Markdown(f"<b>{response}</b>"))
Interleaf is a subject that is related to "what not to do" and "scripting language". It is also associated with the predicates "ADDED" and "MADE", with the objects being "scripting language" and "software for creating documents" respectively.
For more detailed answers, we can also send the text from which the retrieved triplets were extracted.
query_engine = index.as_query_engine(
include_text=True, response_mode="tree_summarize"
)
response = query_engine.query(
"Tell me more about what the author worked on at Interleaf"
)
INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me more about what the author worked on at Interleaf INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Interleaf', 'worked', 'author'] ERROR:llama_index.indices.knowledge_graph.retriever:Index was not constructed with embeddings, skipping embedding usage... INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: c3fd9444-6c20-4cdc-9598-8f0e9ed0b85d: each student had. But the Accademia wasn't teaching me anything except Italia... INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: f4bfad23-0cde-4425-99f9-9229ca0a5cc5: learned some useful things at Interleaf, though they were mostly about what n... INFO:llama_index.indices.knowledge_graph.retriever:> Extracted relationships: The following are knowledge sequence in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]` Interleaf ['IS_ABOUT', 'what not to do'] Interleaf ['ADDED', 'scripting language'] Interleaf ['MADE', 'software for creating documents']
display(Markdown(f"<b>{response}</b>"))
At Interleaf, the author worked on software for creating documents. The company had added a scripting language, inspired by Emacs, and the author was hired as a Lisp hacker to write things in it. However, the author admits to being a bad employee and not fully understanding the software, as it was primarily written in C. Despite this, the author was paid well and managed to save enough money to go back to RISD and pay off their college loans. The author also learned some valuable lessons at Interleaf, particularly about what not to do in technology companies.
Query with embeddings¶
# Clean up the dataset first
graph_store.query(
"""
MATCH (n) DETACH DELETE n
"""
)
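As a quick sanity check, the count below should now be zero (the exact result format depends on the driver, roughly a list of records):
# the graph should be empty after the cleanup above
graph_store.query("MATCH (n) RETURN count(n) AS nodes")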
# NOTE: this can take a while!
index = KnowledgeGraphIndex.from_documents(
documents,
storage_context=storage_context,
max_triplets_per_chunk=2,
include_embeddings=True,
)
query_engine = index.as_query_engine(
include_text=True,
response_mode="tree_summarize",
embedding_mode="hybrid",
similarity_top_k=5,
)
# query using the top matching triplets plus keywords (duplicate triplets are removed)
response = query_engine.query(
"告诉我更多关于作者在Interleaf工作过的内容"
)
INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me more about what the author worked on at Interleaf INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Interleaf', 'worked', 'author'] INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: e0067958-8b62-4186-b78c-a07281531e40: each student had. But the Accademia wasn't teaching me anything except Italia... INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 38459cd5-bc20-428d-a2db-9dc2e716bd15: learned some useful things at Interleaf, though they were mostly about what n... INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 6be24830-85d5-49d1-8caa-d297cd0e8b14: It had been so long since I'd painted anything that I'd half forgotten why I ... INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 2ec81827-d6d5-470d-8851-b97b8d8d80b4: Robert Morris showed it to me when I visited him in Cambridge, where he was n... INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 46b8b977-4176-4622-8d4d-ee3ab16132b4: in decent shape at painting and drawing from the RISD foundation that summer,... INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 71363c09-ec6b-47c8-86ac-e18be46f1cc2: as scare-quotes. At the time this bothered me, but now it seems amusingly acc... INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 2dded283-d876-4014-8352-056fccace896: of my old life. Idelle was in New York at least, and there were other people ... INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: de937aec-ebee-4348-9f23-c94d0a5d7436: and I had a lot of time to think on those flights. On one of them I realized ... INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 33936f7a-0f89-48c7-af9a-171372b4b4b0: What I Worked On February 2021 Before college the two main things I worked ... INFO:llama_index.indices.knowledge_graph.retriever:> Extracted relationships: The following are knowledge sequence in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]` ('Interleaf', 'made', 'software for creating documents') Interleaf ['MADE', 'software for creating documents'] ('Interleaf', 'added', 'scripting language') ('Interleaf', 'is about', 'what not to do') Interleaf ['ADDED', 'scripting language'] Interleaf ['IS_ABOUT', 'what not to do'] ('I', 'worked on', 'programming') ('I', 'worked on', 'writing')
display(Markdown(f"<b>{response}</b>"))
At Interleaf, the author worked on writing scripts in a Lisp dialect for the company's software, which was used for creating documents.
[Optional] Try building the graph and manually add triplets!¶
from llama_index.core.node_parser import SentenceSplitter
node_parser = SentenceSplitter()
nodes = node_parser.get_nodes_from_documents(documents)
# initialize an empty index for now
index = KnowledgeGraphIndex.from_documents([], storage_context=storage_context)
# add keyword mappings and nodes manually
# add triplets (subject, relationship, object)
# for node 0
node_0_tups = [
("author", "worked on", "writing"),
("author", "worked on", "programming"),
]
for tup in node_0_tups:
index.upsert_triplet_and_node(tup, nodes[0])
# for node 1
node_1_tups = [
("Interleaf", "made software for", "creating documents"),
("Interleaf", "added", "scripting language"),
("software", "generate", "websites"),
]
for tup in node_1_tups:
index.upsert_triplet_and_node(tup, nodes[1])
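Before querying, you can confirm that the manually added triplets landed in Neo4j. A minimal sketch, again assuming the store's default Entity label and id property:
# list the manually added triplets for "Interleaf"
graph_store.query(
"""
MATCH (a:Entity {id: 'Interleaf'})-[r]->(b)
RETURN a.id AS subject, type(r) AS predicate, b.id AS object
"""
)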
query_engine = index.as_query_engine(
include_text=False, response_mode="tree_summarize"
)
response = query_engine.query("Tell me more about Interleaf")
INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me more about Interleaf INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Solutions', 'Interleaf', 'Software', 'Information', 'Technology'] ERROR:llama_index.indices.knowledge_graph.retriever:Index was not constructed with embeddings, skipping embedding usage... INFO:llama_index.indices.knowledge_graph.retriever:> Extracted relationships: The following are knowledge sequence in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]` Interleaf ['MADE_SOFTWARE_FOR', 'creating documents'] Interleaf ['IS_ABOUT', 'what not to do'] Interleaf ['ADDED', 'scripting language'] Interleaf ['MADE', 'software for creating documents']
display(Markdown(f"<b>{response}</b>"))