Property Graph Construction with a Predefined Schema
In this notebook, we demonstrate how to build a property graph using Neo4j, Ollama, and HuggingFace.
Specifically, we will use the SchemaLLMPathExtractor, which lets us specify an exact schema of possible entity types and relation types, and define how they can connect to one another.
This is useful when you want a graph of a particular shape and want to constrain what the LLM is allowed to predict.
In [ ]:
%pip install llama-index
%pip install llama-index-llms-ollama
%pip install llama-index-embeddings-huggingface
%pip install llama-index-graph-stores-neo4j
Load Data
First, let's download some sample data to work with.
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
In [ ]:
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
In [ ]:
import nest_asyncio
nest_asyncio.apply()
In [ ]:
from typing import Literal

from llama_index.llms.ollama import Ollama
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# Best practice: use UPPERCASE for entity and relation labels
entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

# Define which relations each entity type can have
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}

kg_extractor = SchemaLLMPathExtractor(
    llm=Ollama(model="llama3", json_mode=True, request_timeout=3600),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    # if false, allows values outside of the schema
    # useful when using the schema only as a suggestion
    strict=True,
)
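Conceptually, `strict=True` amounts to filtering predicted triples against the allow-list in `kg_validation_schema`. A minimal pure-Python sketch of that idea (the helper `is_valid_triple` is illustrative, not a LlamaIndex internal):

```python
# Same schema as above: which relations each entity label may use
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}


def is_valid_triple(subj_label: str, relation: str, schema: dict) -> bool:
    """Return True if the subject's label permits this relation."""
    return relation in schema.get(subj_label, [])


predicted = [
    ("PERSON", "WORKED_AT", "ORGANIZATION"),
    ("PLACE", "WORKED_ON", "PERSON"),  # WORKED_ON is not allowed for PLACE
]
kept = [t for t in predicted if is_valid_triple(t[0], t[1], validation_schema)]
print(kept)  # [('PERSON', 'WORKED_AT', 'ORGANIZATION')]
```

With `strict=False`, out-of-schema triples like the second one would be kept rather than dropped.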
To launch Neo4j locally, first make sure you have Docker installed. Then, you can start the database with the following Docker command:
docker run \
-p 7474:7474 -p 7687:7687 \
-v $PWD/data:/data -v $PWD/plugins:/plugins \
--name neo4j-apoc \
-e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
-e NEO4J_apoc_import_file_use__neo4j__config=true \
-e NEO4JLABS_PLUGINS=\[\"apoc\"\] \
neo4j:latest
From here, you can open the database at http://localhost:7474/. On that page, you will be asked to sign in. Use the default username/password of neo4j and neo4j.
After logging in for the first time, you will be asked to change the password.
After that, you are ready to create your first property graph!
In [ ]:
from llama_index.graph_stores.neo4j import Neo4jPGStore
graph_store = Neo4jPGStore(
username="neo4j",
password="<password>",
url="bolt://localhost:7687",
)
NOTE: Extraction with a local model is slower than with an API-hosted model. Local models (e.g. via Ollama) are typically limited to sequential processing; expect this step to take roughly 10 minutes on an M2 Max.
In [ ]:
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
index = PropertyGraphIndex.from_documents(
documents,
kg_extractors=[kg_extractor],
embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
property_graph_store=graph_store,
)
If we inspect the graph we created, we can see that it only includes the relation and entity types we defined!
For information on all available kg_extractors, see the documentation.
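One way to spot-check this is to run Cypher queries in the Neo4j browser at http://localhost:7474/ (run each statement separately). These are standard Cypher introspection queries, not LlamaIndex calls:

```cypher
// List the distinct node labels present in the graph
MATCH (n) RETURN DISTINCT labels(n);

// List the distinct relationship types present in the graph
MATCH ()-[r]->() RETURN DISTINCT type(r);
```

With `strict=True`, the results should contain only the labels and relation types from the schema defined above.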
In [ ]:
from llama_index.core.indices.property_graph import (
LLMSynonymRetriever,
VectorContextRetriever,
)
llm_synonym = LLMSynonymRetriever(
index.property_graph_store,
llm=Ollama(model="llama3", request_timeout=3600),
include_text=False,
)
vector_context = VectorContextRetriever(
index.property_graph_store,
embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
include_text=False,
)
In [ ]:
retriever = index.as_retriever(
sub_retrievers=[
llm_synonym,
vector_context,
]
)
In [ ]:
nodes = retriever.retrieve("What happened at Interleaf?")
for node in nodes:
print(node.text)
Paul Graham -> WORKED_AT -> Interleaf
Paul Graham -> WORKED_AT -> Yahoo
Paul Graham -> WORKED_AT -> Cambridge
Tom Cheatham -> WORKED_AT -> Cambridge
Kevin Hale -> WORKED_AT -> Viaweb
Paul Graham -> WORKED_AT -> Viaweb
Paul Graham -> WORKED_ON -> Viaweb
Paul Graham -> PART_OF -> Viaweb
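With `include_text=False`, each retrieved node's text is a triple path of the form `subject -> RELATION -> object`. If you want to post-process these results, a small hedged helper (illustrative, not part of LlamaIndex) can split them back into tuples:

```python
def parse_triple(path: str) -> tuple[str, str, str]:
    """Split a 'subject -> RELATION -> object' path into a 3-tuple."""
    subj, rel, obj = (part.strip() for part in path.split("->"))
    return subj, rel, obj


paths = [
    "Paul Graham -> WORKED_AT -> Interleaf",
    "Paul Graham -> WORKED_ON -> Viaweb",
]
# Keep only the WORKED_AT edges
worked_at = [parse_triple(p) for p in paths if parse_triple(p)[1] == "WORKED_AT"]
print(worked_at)  # [('Paul Graham', 'WORKED_AT', 'Interleaf')]
```

This simple split assumes entity names never contain the `->` separator, which holds for the schema used here.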
We can also create a query engine with similar syntax.
In [ ]:
query_engine = index.as_query_engine(
sub_retrievers=[
llm_synonym,
vector_context,
],
llm=Ollama(model="llama3", request_timeout=3600),
)
response = query_engine.query("What happened at Interleaf?")
print(str(response))
Paul Graham worked at Interleaf.
For more information on all available retrievers, see the complete guide.