Elasticsearch
Elasticsearch is a search database that supports both full-text and vector search.
Basic Example
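In this basic example, we take a Paul Graham essay, split it into chunks, embed the chunks with an open-source embedding model, load them into Elasticsearch, and then query the result. For an example that uses different retrieval strategies, see Elasticsearch Vector Store.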
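The chunk, embed, load, and query steps described above can be sketched without any external services. The following is a toy illustration only: the bag-of-words "embedding", the in-memory list standing in for Elasticsearch, and all function names are assumptions made for this sketch, not how LlamaIndex or Elasticsearch implement these steps.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8):
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_store(text, chunk_size=8):
    """Chunk the text and keep (chunk, vector) pairs in a list (the toy 'index')."""
    return [(chunk, embed(chunk)) for chunk in chunk_text(text, chunk_size)]


def search(store, query, top_k=1):
    """Return the `top_k` chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        store, key=lambda item: cosine(query_vec, item[1]), reverse=True
    )
    return [chunk for chunk, _ in ranked[:top_k]]


essay = (
    "Before college I worked on writing and programming. "
    "Later I studied painting in Florence and then started a company."
)
store = build_store(essay, chunk_size=8)
print(search(store, "What did the author do with programming?"))
```

In the notebook itself, a real embedding model replaces `embed`, Elasticsearch replaces the list, and an LLM turns the retrieved chunks into an answer.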
If you are opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install -qU llama-index-vector-stores-elasticsearch llama-index-embeddings-huggingface llama-index
# imports
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.elasticsearch import ElasticsearchStore
from llama_index.core import StorageContext
# set up OpenAI
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
Download Data
!mkdir -p 'data/paul_graham/'
!wget -nv 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
2024-05-13 15:10:43 URL:https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt [75042/75042] -> "data/paul_graham/paul_graham_essay.txt" [1]
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

# define the embedding function
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

# define the index
vector_store = ElasticsearchStore(
    es_url="http://localhost:9200",  # see Elasticsearch Vector Store for more authentication options
    index_name="paul_graham_essay",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
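The cell above connects to a local, unauthenticated instance. As a hedged sketch of the other authentication options the comment alludes to, the helper below builds connection keyword arguments from settings. The environment variable names (`ES_CLOUD_ID`, `ES_API_KEY`, `ES_URL`, `ES_USER`, `ES_PASSWORD`) and both function names are hypothetical; the keyword names `es_cloud_id`, `es_api_key`, `es_user`, and `es_password` are assumed to match the `ElasticsearchStore` constructor and should be checked against your installed version.

```python
import os


def es_connection_kwargs(env=None):
    """Build ElasticsearchStore connection kwargs from environment-style settings.

    The variable names read here are hypothetical choices for this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {}
    # Prefer an Elastic Cloud ID if one is set; otherwise fall back to a URL.
    if env.get("ES_CLOUD_ID"):
        kwargs["es_cloud_id"] = env["ES_CLOUD_ID"]
    else:
        kwargs["es_url"] = env.get("ES_URL", "http://localhost:9200")
    # Prefer an API key; otherwise use basic auth if both parts are present.
    if env.get("ES_API_KEY"):
        kwargs["es_api_key"] = env["ES_API_KEY"]
    elif env.get("ES_USER") and env.get("ES_PASSWORD"):
        kwargs["es_user"] = env["ES_USER"]
        kwargs["es_password"] = env["ES_PASSWORD"]
    return kwargs


def make_vector_store(index_name="paul_graham_essay", env=None):
    """Create the store; imported lazily so the helper above has no dependencies."""
    from llama_index.vector_stores.elasticsearch import ElasticsearchStore

    return ElasticsearchStore(index_name=index_name, **es_connection_kwargs(env))


# With no settings at all, this falls back to the local URL used in the notebook.
print(es_connection_kwargs({}))
```

Keeping credentials in the environment rather than in the notebook avoids committing secrets along with the code.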
# query the data
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
The author worked on writing and programming outside of school. They wrote short stories and tried writing programs on an IBM 1401 computer. They also built a microcomputer kit and started programming on it, writing simple games and a word processor.
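To see which chunks an answer like the one above was built from, you can inspect the retrieved sources. The sketch below assumes the response object exposes a `source_nodes` list whose items carry a `score` and a `node` with a `get_content()` method, as LlamaIndex responses generally do. The helper name `summarize_sources` is hypothetical, and the stand-in objects exist only so the example runs without Elasticsearch.

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars=80):
    """Return (score, snippet) pairs for the chunks behind a response."""
    rows = []
    for source in getattr(response, "source_nodes", []):
        # Collapse newlines so each snippet prints on one line.
        text = source.node.get_content().strip().replace("\n", " ")
        rows.append((source.score, text[:max_chars]))
    return rows


# Stand-in objects for illustration; in the notebook, pass the `response`
# returned by query_engine.query(...) instead.
fake_node = SimpleNamespace(
    get_content=lambda: "I wrote short stories\nand programs."
)
fake_response = SimpleNamespace(
    source_nodes=[SimpleNamespace(node=fake_node, score=0.87)]
)
for score, snippet in summarize_sources(fake_response):
    print(score, snippet)
```

Low scores or unrelated snippets are a quick signal that the chunking or the embedding model needs another look.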