Redis Docstore + Index Store Demo¶
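This guide shows you how to directly use our Redis-backed DocumentStore abstraction and IndexStore abstraction. Because the nodes live in the docstore, you can define multiple indices over the same underlying docstore without duplicating data across indices.
The indices themselves are also stored in Redis, through the IndexStore implementation.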
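The idea of sharing one docstore across several indices can be pictured with a toy in-memory model. This is only an illustrative sketch in plain Python, not how LlamaIndex implements it: `ToyDocStore` and `ToyIndex` are hypothetical names, and a dict stands in for Redis.

```python
class ToyDocStore:
    """Hypothetical stand-in for a document store: node_id -> node text."""

    def __init__(self):
        self.docs = {}

    def add_documents(self, nodes):
        # Re-adding a node with the same ID overwrites it instead of duplicating it.
        for node_id, text in nodes:
            self.docs[node_id] = text


class ToyIndex:
    """Hypothetical index that keeps only node IDs and reads text from the shared store."""

    def __init__(self, node_ids, docstore):
        self.node_ids = list(node_ids)
        self.docstore = docstore

    def get_texts(self):
        return [self.docstore.docs[node_id] for node_id in self.node_ids]


nodes = [("n1", "I wrote short stories."), ("n2", "I started programming.")]
docstore = ToyDocStore()
docstore.add_documents(nodes)

# Two indices over the same store: each holds references, not copies of the text.
index_a = ToyIndex(["n1", "n2"], docstore)
index_b = ToyIndex(["n2"], docstore)

print(len(docstore.docs))
print(index_b.get_texts())
```

Each toy index stores only IDs, so adding another index does not grow the number of stored nodes.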
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-storage-docstore-redis
%pip install llama-index-storage-index-store-redis
%pip install llama-index-llms-openai
!pip install llama-index
import nest_asyncio
nest_asyncio.apply()
import logging
import sys
import os
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core import VectorStoreIndex, SimpleKeywordTableIndex
from llama_index.core import SummaryIndex
from llama_index.core import ComposableGraph
from llama_index.llms.openai import OpenAI
from llama_index.core.response.notebook_utils import display_response
from llama_index.core import Settings
INFO:numexpr.utils:Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8. Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8. INFO:numexpr.utils:NumExpr defaulting to 8 threads. NumExpr defaulting to 8 threads.
/home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm
Download Data¶
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Load Documents¶
reader = SimpleDirectoryReader("./data/paul_graham/")
documents = reader.load_data()
Parse into Nodes¶
from llama_index.core.node_parser import SentenceSplitter
nodes = SentenceSplitter().get_nodes_from_documents(documents)
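To give a rough feel for what parsing a document into nodes involves, here is a much simplified sketch that packs whole sentences into size-limited chunks. It is not the `SentenceSplitter` algorithm (which works on tokens and supports overlap, among other things); `split_into_chunks` and its `max_chars` parameter are hypothetical.

```python
import re


def split_into_chunks(text, max_chars=80):
    """Greedily pack whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars becomes its own (oversized) chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The next sentence does not fit: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "I wrote stories. I also programmed. Later I studied art. Then I started a company."
chunks = split_into_chunks(text, max_chars=40)
for chunk in chunks:
    print(chunk)
```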
Add to Docstore¶
REDIS_HOST = os.getenv("REDIS_HOST", "127.0.0.1")
# os.getenv returns a string when the variable is set, so convert the port to int
REDIS_PORT = int(os.getenv("REDIS_PORT", "6379"))
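Environment variables are always strings, so a port read from the environment generally needs converting before it is used as a number. A minimal sketch of that pattern follows; `redis_settings` is a hypothetical helper, not part of LlamaIndex, and the defaults simply mirror the values used in this guide.

```python
import os


def redis_settings(env=None):
    """Return (host, port) from an environment mapping, coercing the port to int."""
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "127.0.0.1")
    port = int(env.get("REDIS_PORT", "6379"))
    return host, port


# Passing a plain dict makes the behaviour easy to check without touching os.environ.
print(redis_settings({}))
print(redis_settings({"REDIS_HOST": "redis.local", "REDIS_PORT": "6380"}))
```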
from llama_index.storage.docstore.redis import RedisDocumentStore
from llama_index.storage.index_store.redis import RedisIndexStore
storage_context = StorageContext.from_defaults(
    docstore=RedisDocumentStore.from_host_and_port(
        host=REDIS_HOST, port=REDIS_PORT, namespace="llama_index"
    ),
    index_store=RedisIndexStore.from_host_and_port(
        host=REDIS_HOST, port=REDIS_PORT, namespace="llama_index"
    ),
)
storage_context.docstore.add_documents(nodes)
len(storage_context.docstore.docs)
20
Define Multiple Indexes¶
Each index uses the same underlying nodes.
summary_index = SummaryIndex(nodes, storage_context=storage_context)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens > [build_index_from_nodes] Total LLM token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens > [build_index_from_nodes] Total embedding token usage: 0 tokens
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens > [build_index_from_nodes] Total LLM token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 17050 tokens > [build_index_from_nodes] Total embedding token usage: 17050 tokens
keyword_table_index = SimpleKeywordTableIndex(
    nodes, storage_context=storage_context
)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens > [build_index_from_nodes] Total LLM token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens > [build_index_from_nodes] Total embedding token usage: 0 tokens
# NOTE: the docstore still has the same nodes
len(storage_context.docstore.docs)
20
Test out saving and loading¶
# NOTE: the docstore and index_store are persisted in Redis by default
# NOTE: here we only need to persist the simple vector store to disk
storage_context.persist(persist_dir="./storage")
# note down the index IDs
list_id = summary_index.index_id  # summary index ID
vector_id = vector_index.index_id  # vector index ID
keyword_id = keyword_table_index.index_id  # keyword table index ID
from llama_index.core import load_index_from_storage
# re-create the storage context
storage_context = StorageContext.from_defaults(
    docstore=RedisDocumentStore.from_host_and_port(
        host=REDIS_HOST, port=REDIS_PORT, namespace="llama_index"
    ),
    index_store=RedisIndexStore.from_host_and_port(
        host=REDIS_HOST, port=REDIS_PORT, namespace="llama_index"
    ),
    # also load the simple vector store that was persisted to ./storage above
    persist_dir="./storage",
)
# load the indices
summary_index = load_index_from_storage(
    storage_context=storage_context, index_id=list_id
)
vector_index = load_index_from_storage(
    storage_context=storage_context, index_id=vector_id
)
keyword_table_index = load_index_from_storage(
    storage_context=storage_context, index_id=keyword_id
)
INFO:llama_index.indices.loading:Loading indices with ids: ['24e98f9b-9586-4fc6-8341-8dce895e5bcc'] Loading indices with ids: ['24e98f9b-9586-4fc6-8341-8dce895e5bcc'] INFO:llama_index.indices.loading:Loading indices with ids: ['f7b2aeb3-4dad-4750-8177-78d5ae706284'] Loading indices with ids: ['f7b2aeb3-4dad-4750-8177-78d5ae706284'] INFO:llama_index.indices.loading:Loading indices with ids: ['9a9198b4-7cb9-4c96-97a7-5f404f43b9cd'] Loading indices with ids: ['9a9198b4-7cb9-4c96-97a7-5f404f43b9cd']
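Because several indices share one index store, each one has to be looked up by its ID, which is why the IDs were noted down before reloading. The toy model below illustrates that lookup. It is a sketch under assumptions: `ToyIndexStore` is a hypothetical class, a dict stands in for Redis, and the serialized structure is invented for illustration.

```python
import json
import uuid


class ToyIndexStore:
    """Hypothetical index store: index_id -> serialized index structure."""

    def __init__(self):
        self._kv = {}  # a dict stands in for Redis

    def add_index_struct(self, kind, node_ids):
        index_id = str(uuid.uuid4())
        self._kv[index_id] = json.dumps({"kind": kind, "node_ids": node_ids})
        return index_id

    def load(self, index_id=None):
        if index_id is None:
            # Without an ID the choice is ambiguous when several indices exist.
            if len(self._kv) != 1:
                raise ValueError("multiple indices stored; pass an index_id")
            index_id = next(iter(self._kv))
        return json.loads(self._kv[index_id])


store = ToyIndexStore()
summary_id = store.add_index_struct("summary", ["n1", "n2"])
vector_id = store.add_index_struct("vector", ["n1", "n2"])

# Each index is recovered by the ID recorded when it was created.
print(store.load(vector_id)["kind"])
```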
Test out some Queries¶
chatgpt = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = chatgpt
Settings.chunk_size = 1024
query_engine = summary_index.as_query_engine()
list_response = query_engine.query("What is a summary of this document?")
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 26111 tokens > [get_response] Total LLM token usage: 26111 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens > [get_response] Total embedding token usage: 0 tokens
display_response(list_response)
Final Response:
This document is a narrative of the author's journey from writing and programming as a young person to pursuing a career in art. It describes his experiences in high school, college, and graduate school, and how he eventually decided to pursue art as a career. He applied to art schools and eventually was accepted to RISD and the Accademia di Belli Arti in Florence. He passed the entrance exam for the Accademia and began studying art there. He then moved to New York and worked freelance while writing a book on Lisp. He eventually started a company to put art galleries online, but it was unsuccessful. He then pivoted to creating software to build online stores, which eventually became successful. He had the idea to run the software on the server and let users control it by clicking on links, which meant users wouldn't need anything more than a browser. This kind of software, known as "internet storefronts," was eventually successful. He and his team worked hard to make the software user-friendly and inexpensive, and eventually the company was bought by Yahoo. After the sale, he left to pursue his dream of painting, and eventually found success in New York. He was able to afford luxuries such as taxis and restaurants, and he experimented with a new kind of still life painting. He also had the idea to create a web app for making web apps, which he eventually pursued and was successful with. He then started Y Combinator, an investment firm that focused on helping startups, with his own money and the help of his friends Robert and Trevor. He wrote essays and books, invited undergrads to apply to the Summer Founders Program, and eventually married Jessica Livingston. After his mother's death, he decided to quit Y Combinator and pursue painting, but eventually ran out of steam and started writing essays and working on Lisp again. He wrote a new Lisp, called Bel, in itself in Arc, and it took him four years to complete. 
During this time, he worked hard to make the language user-friendly and precise, and he also took time to enjoy life with his family. He encountered various obstacles along the way, such as customs that constrained him even after the restrictions that caused them had disappeared, and he also had to deal with misinterpretations of his essays on forums. In the end, he was successful in creating Bel and was able to pursue his dream of painting.
query_engine = vector_index.as_query_engine()
vector_response = query_engine.query("What did the author do growing up?")
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens > [retrieve] Total LLM token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 8 tokens > [retrieve] Total embedding token usage: 8 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens > [get_response] Total LLM token usage: 0 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens > [get_response] Total embedding token usage: 0 tokens
display_response(vector_response)
Final Response:
None
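A vector index answers a query by embedding the query text and retrieving the nodes whose embeddings are most similar to it. The sketch below shows that retrieval step in isolation, assuming cosine similarity and made-up three-dimensional vectors; `top_k` is a hypothetical helper rather than a LlamaIndex function.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, node_vecs, k=2):
    """Return the IDs of the k nodes whose vectors are most similar to the query."""
    scored = sorted(
        node_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [node_id for node_id, _ in scored[:k]]


# Made-up embeddings; real embeddings have far more dimensions.
node_vecs = {
    "n1": [1.0, 0.0, 0.0],
    "n2": [0.8, 0.6, 0.0],
    "n3": [0.0, 0.0, 1.0],
}
result = top_k([1.0, 0.1, 0.0], node_vecs, k=2)
print(result)
```

If the store holding the embeddings is empty, a retrieval step like this has nothing to return, and the query cannot produce a grounded answer.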
query_engine = keyword_table_index.as_query_engine()
keyword_response = query_engine.query(
    "What did the author do after his time at YC?"
)
INFO:llama_index.indices.keyword_table.retrievers:> Starting query: What did the author do after his time at YC? > Starting query: What did the author do after his time at YC? INFO:llama_index.indices.keyword_table.retrievers:query keywords: ['action', 'yc', 'after', 'time', 'author'] query keywords: ['action', 'yc', 'after', 'time', 'author'] INFO:llama_index.indices.keyword_table.retrievers:> Extracted keywords: ['yc', 'time'] > Extracted keywords: ['yc', 'time'] INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 10216 tokens > [get_response] Total LLM token usage: 10216 tokens INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens > [get_response] Total embedding token usage: 0 tokens
display_response(keyword_response)
Final Response:
After his time at YC, the author decided to pursue painting and writing. He wanted to see how good he could get if he really focused on it, so he started painting the day after he stopped working on YC. He spent most of the rest of 2014 painting and was able to become better than he had been before. He also wrote essays and started working on Lisp again in March 2015. He then spent 4 years working on a new Lisp, called Bel, which he wrote in itself in Arc. He had to ban himself from writing essays during most of this time, and he moved to England in the summer of 2016. He also wrote a book about Lisp hacking, called On Lisp, which was published in 1993. In the fall of 2019, Bel was finally finished. He also experimented with a new kind of still life painting, and tried to build a web app for making web apps, which he named Aspra. He eventually decided to build a subset of this app as an open source project, which was the new Lisp dialect he called Arc.
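The keyword query log above lists the keywords taken from the query and the subset that actually matched the index. A toy version of that lookup might look like the following. It is a sketch under assumptions: the stopword list, `extract_keywords`, `build_keyword_table`, and `retrieve` are all hypothetical and far simpler than the keyword extraction LlamaIndex performs.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"what", "did", "the", "do", "his", "at", "a", "of", "i", "in"}


def extract_keywords(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = (word.strip(".,?!").lower() for word in text.split())
    return {word for word in words if word and word not in STOPWORDS}


def build_keyword_table(nodes):
    """Map each keyword to the set of node IDs that contain it."""
    table = defaultdict(set)
    for node_id, text in nodes.items():
        for keyword in extract_keywords(text):
            table[keyword].add(node_id)
    return table


def retrieve(query, table):
    """Rank node IDs by how many query keywords they match."""
    counts = defaultdict(int)
    for keyword in extract_keywords(query):
        for node_id in table.get(keyword, ()):
            counts[node_id] += 1
    return sorted(counts, key=lambda node_id: (-counts[node_id], node_id))


nodes = {
    "n1": "After YC I started painting.",
    "n2": "I spent time working on Bel.",
    "n3": "During my time at YC we funded startups.",
}
table = build_keyword_table(nodes)
result = retrieve("What did the author do after his time at YC?", table)
print(result)
```

Query keywords that do not appear in the table, such as "author" here, simply match nothing.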