知识图谱索引¶
本教程将基本概述如何使用我们的 KnowledgeGraphIndex
,该索引处理从非结构化文本中自动构建知识图谱以及基于实体的查询。
如果您希望以更灵活的方式查询知识图谱,包括预先存在的知识图谱,请查看我们的 KnowledgeGraphQueryEngine
和其他构造。
In [ ]:
Copied!
%pip install llama-index-llms-openai
%pip install llama-index-llms-openai
In [ ]:
Copied!
# 我的OpenAI密钥
import os
os.environ["OPENAI_API_KEY"] = "插入OpenAI密钥"
# 我的OpenAI密钥
import os
os.environ["OPENAI_API_KEY"] = "插入OpenAI密钥"
In [ ]:
Copied!
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
使用知识图谱¶
构建知识图谱¶
In [ ]:
Copied!
from llama_index.core import SimpleDirectoryReader, KnowledgeGraphIndex
from llama_index.core.graph_stores import SimpleGraphStore
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from IPython.display import Markdown, display
from llama_index.core import SimpleDirectoryReader, KnowledgeGraphIndex
from llama_index.core.graph_stores import SimpleGraphStore
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from IPython.display import Markdown, display
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
In [ ]:
Copied!
documents = SimpleDirectoryReader(
"../../../../examples/paul_graham_essay/data"
).load_data()
documents = SimpleDirectoryReader(
"../../../../examples/paul_graham_essay/data"
).load_data()
In [ ]:
Copied!
# 定义LLM
# 注意:在演示时,text-davinci-002模型没有速率限制错误
llm = OpenAI(temperature=0, model="text-davinci-002")
Settings.llm = llm
Settings.chunk_size = 512
# 定义LLM
# 注意:在演示时,text-davinci-002模型没有速率限制错误
llm = OpenAI(temperature=0, model="text-davinci-002")
Settings.llm = llm
Settings.chunk_size = 512
In [ ]:
Copied!
from llama_index.core import StorageContext
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)
# 注意:可能需要一些时间!
index = KnowledgeGraphIndex.from_documents(
documents,
max_triplets_per_chunk=2,
storage_context=storage_context,
)
from llama_index.core import StorageContext
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)
# 注意:可能需要一些时间!
index = KnowledgeGraphIndex.from_documents(
documents,
max_triplets_per_chunk=2,
storage_context=storage_context,
)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
[可选] 尝试构建图并手动添加三元组!¶
查询知识图谱¶
In [ ]:
Copied!
query_engine = index.as_query_engine(
include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
"Tell me more about Interleaf",
)
query_engine = index.as_query_engine(
include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
"Tell me more about Interleaf",
)
INFO:llama_index.indices.knowledge_graph.retrievers:> Starting query: Tell me more about Interleaf INFO:llama_index.indices.knowledge_graph.retrievers:> Query keywords: ['Interleaf', 'company', 'software', 'history'] ERROR:llama_index.indices.knowledge_graph.retrievers:Index was not constructed with embeddings, skipping embedding usage... INFO:llama_index.indices.knowledge_graph.retrievers:> Extracted relationships: The following are knowledge triplets in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]` INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 116 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 116 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
In [ ]:
Copied!
display(Markdown(f"<b>{response}</b>"))
display(Markdown(f"{response}"))
Interleaf was a software company that developed and published document preparation and desktop publishing software. It was founded in 1986 and was headquartered in Waltham, Massachusetts. The company was acquired by Quark, Inc. in 2000.
In [ ]:
Copied!
query_engine = index.as_query_engine(
include_text=True, response_mode="tree_summarize"
)
response = query_engine.query(
"Tell me more about what the author worked on at Interleaf",
)
query_engine = index.as_query_engine(
include_text=True, response_mode="tree_summarize"
)
response = query_engine.query(
"Tell me more about what the author worked on at Interleaf",
)
INFO:llama_index.indices.knowledge_graph.retrievers:> Starting query: Tell me more about what the author worked on at Interleaf INFO:llama_index.indices.knowledge_graph.retrievers:> Query keywords: ['author', 'Interleaf', 'work'] ERROR:llama_index.indices.knowledge_graph.retrievers:Index was not constructed with embeddings, skipping embedding usage... INFO:llama_index.indices.knowledge_graph.retrievers:> Extracted relationships: The following are knowledge triplets in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]` INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 104 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 104 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
In [ ]:
Copied!
display(Markdown(f"<b>{response}</b>"))
display(Markdown(f"{response}"))
The author worked on a number of projects at Interleaf, including the development of the company's flagship product, the Interleaf Publisher.
查询嵌入向量¶
In [ ]:
Copied!
# 注意:可能需要一些时间!
new_index = KnowledgeGraphIndex.from_documents(
documents,
max_triplets_per_chunk=2,
include_embeddings=True,
)
# 注意:可能需要一些时间!
new_index = KnowledgeGraphIndex.from_documents(
documents,
max_triplets_per_chunk=2,
include_embeddings=True,
)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
In [ ]:
Copied!
# 使用前3个三元组加关键词进行查询(重复的三元组将被移除)
query_engine = index.as_query_engine(
include_text=True,
response_mode="tree_summarize",
embedding_mode="hybrid",
similarity_top_k=5,
)
response = query_engine.query(
"告诉我更多关于作者在Interleaf工作的内容",
)
# 使用前3个三元组加关键词进行查询(重复的三元组将被移除)
query_engine = index.as_query_engine(
include_text=True,
response_mode="tree_summarize",
embedding_mode="hybrid",
similarity_top_k=5,
)
response = query_engine.query(
"告诉我更多关于作者在Interleaf工作的内容",
)
INFO:llama_index.indices.knowledge_graph.retrievers:> Starting query: Tell me more about what the author worked on at Interleaf INFO:llama_index.indices.knowledge_graph.retrievers:> Query keywords: ['author', 'Interleaf', 'work'] ERROR:llama_index.indices.knowledge_graph.retrievers:Index was not constructed with embeddings, skipping embedding usage... INFO:llama_index.indices.knowledge_graph.retrievers:> Extracted relationships: The following are knowledge triplets in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]` INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 104 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 104 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
In [ ]:
Copied!
display(Markdown(f"<b>{response}</b>"))
display(Markdown(f"{response}"))
The author worked on a number of projects at Interleaf, including the development of the company's flagship product, the Interleaf Publisher.
可视化图表¶
In [ ]:
Copied!
## 创建图形
from pyvis.network import Network
g = index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.show("example.html")
## 创建图形
from pyvis.network import Network
g = index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.show("example.html")
example.html
Out[ ]:
[可选] 尝试构建图并手动添加三元组!¶
In [ ]:
Copied!
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.node_parser import SentenceSplitter
In [ ]:
Copied!
node_parser = SentenceSplitter()
node_parser = SentenceSplitter()
In [ ]:
Copied!
nodes = node_parser.get_nodes_from_documents(documents)
nodes = node_parser.get_nodes_from_documents(documents)
In [ ]:
Copied!
# 现在初始化一个空的索引
index = KnowledgeGraphIndex(
[],
)
# 现在初始化一个空的索引
index = KnowledgeGraphIndex(
[],
)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
In [ ]:
Copied!
# 手动添加关键字映射和节点
# 添加三元组(主语,关系,宾语)
# 对于节点0
node_0_tups = [
("author", "worked on", "writing"),
("author", "worked on", "programming"),
]
for tup in node_0_tups:
index.upsert_triplet_and_node(tup, nodes[0])
# 对于节点1
node_1_tups = [
("Interleaf", "made software for", "creating documents"),
("Interleaf", "added", "scripting language"),
("software", "generate", "web sites"),
]
for tup in node_1_tups:
index.upsert_triplet_and_node(tup, nodes[1])
# 手动添加关键字映射和节点
# 添加三元组(主语,关系,宾语)
# 对于节点0
node_0_tups = [
("author", "worked on", "writing"),
("author", "worked on", "programming"),
]
for tup in node_0_tups:
index.upsert_triplet_and_node(tup, nodes[0])
# 对于节点1
node_1_tups = [
("Interleaf", "made software for", "creating documents"),
("Interleaf", "added", "scripting language"),
("software", "generate", "web sites"),
]
for tup in node_1_tups:
index.upsert_triplet_and_node(tup, nodes[1])
In [ ]:
Copied!
query_engine = index.as_query_engine(
include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
"Tell me more about Interleaf",
)
query_engine = index.as_query_engine(
include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
"Tell me more about Interleaf",
)
INFO:llama_index.indices.knowledge_graph.retrievers:> Starting query: Tell me more about Interleaf INFO:llama_index.indices.knowledge_graph.retrievers:> Query keywords: ['Interleaf', 'company', 'software', 'history'] ERROR:llama_index.indices.knowledge_graph.retrievers:Index was not constructed with embeddings, skipping embedding usage... INFO:llama_index.indices.knowledge_graph.retrievers:> Extracted relationships: The following are knowledge triplets in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]` INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 116 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 116 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
In [ ]:
Copied!
str(response)
str(response)
Out[ ]:
'\nInterleaf was a software company that developed and published document preparation and desktop publishing software. It was founded in 1986 and was headquartered in Waltham, Massachusetts. The company was acquired by Quark, Inc. in 2000.'