If you are opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
%pip install llama-index-vector-stores-lancedb
In [ ]:
!pip install llama-index
In [ ]:
import logging
import sys

# Uncomment to see debug logs
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import SimpleDirectoryReader, Document, StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.lancedb import LanceDBVectorStore
import textwrap
Set up OpenAI
The first step is to configure the OpenAI key. It is used to create embeddings for the documents loaded into the index.
In [ ]:
import openai
openai.api_key = ""
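A key typed directly into a cell is easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the key has been exported in an environment variable named `OPENAI_API_KEY`; the helper `read_api_key` is a hypothetical name and is not part of LlamaIndex or the `openai` package:

```python
import os


def read_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError if the variable is missing or blank, so the
    problem surfaces before any embedding request is attempted.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key
```

The returned value could then be assigned in place of the empty string, for example `openai.api_key = read_api_key()`.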
Download data
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Load documents
Use SimpleDirectoryReader to load the documents stored in data/paul_graham/.
In [ ]:
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
print("Document ID:", documents[0].doc_id, "Document Hash:", documents[0].hash)
Document ID: 855fe1d1-1c1a-4fbe-82ba-6bea663a5920 Document Hash: 4c702b4df575421e1d1af4b1fd50511b226e0c9863dbfffeccb8b689b8448f35
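The printed hash is 64 hexadecimal characters long, which is the length of a SHA-256 digest. The sketch below shows how a content hash of that shape can be derived from text with the standard library. It is an illustration only: it is an assumption that LlamaIndex does something comparable, the exact fields it hashes are not reproduced here, and the values will not match the output above. `content_hash` is a hypothetical helper.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return the SHA-256 digest of the UTF-8 encoded text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = content_hash("The sky is purple in Portland, Maine")
edited = content_hash("The sky is purple in Portland, Oregon")

print(len(original))       # 64 hex characters
print(original == edited)  # False: changing the text changes the hash
```

A hash like this gives a cheap way to tell whether a document's content has changed since it was last indexed.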
In [ ]:
vector_store = LanceDBVectorStore(uri="/tmp/lancedb")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
Query the index
We can now use the index to ask questions.
In [ ]:
query_engine = index.as_query_engine()
response = query_engine.query("How much did Viaweb charge per month?")
In [ ]:
print(textwrap.fill(str(response), 100))
Viaweb charged $100 per month for a small store and $300 per month for a big one.
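Conceptually, a query against a vector index embeds the question and ranks the stored chunks by similarity to it; the best matches are then passed to the LLM as context. The sketch below shows only the ranking step, using cosine similarity on hand-written three-dimensional vectors. The vectors, chunk labels, and the `top_k` helper are invented for illustration; in the notebook the embeddings come from the embedding model and the search is delegated to the vector store.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k chunk labels whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda label: cosine_similarity(query, store[label]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings": invented numbers, not the output of a real model.
store = {
    "pricing chunk": [0.9, 0.1, 0.0],
    "childhood chunk": [0.1, 0.9, 0.0],
    "painting chunk": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

print(top_k(query_vector, store, k=1))  # ['pricing chunk']
```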
In [ ]:
response = query_engine.query("What did the author do growing up?")
In [ ]:
print(textwrap.fill(str(response), 100))
The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on the IBM 1401 computer. They also mentioned getting a microcomputer, a TRS-80, and started programming on it.
Appending data
You can also add data to an existing index.
In [ ]:
del index
index = VectorStoreIndex.from_documents(
    [Document(text="The sky is purple in Portland, Maine")],
    uri="/tmp/new_dataset",
)
In [ ]:
query_engine = index.as_query_engine()
response = query_engine.query("Where is the sky purple?")
print(textwrap.fill(str(response), 100))
The sky is purple in Portland, Maine.
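When adding data, the distinction that matters is between appending to an existing collection and rebuilding it from scratch. The toy store below is a stand-in written for this illustration, not LanceDB or LlamaIndex code: `add` appends records and keeps the earlier ones, while `from_texts` starts a fresh store that knows nothing about older data. `ToyStore` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """In-memory stand-in for a vector store that holds plain text records."""

    records: list[str] = field(default_factory=list)

    @classmethod
    def from_texts(cls, texts: list[str]) -> "ToyStore":
        """Build a brand-new store containing only the given texts."""
        return cls(records=list(texts))

    def add(self, texts: list[str]) -> None:
        """Append texts to this store, keeping everything already in it."""
        self.records.extend(texts)


store = ToyStore.from_texts(["The sky is purple in Portland, Maine"])
store.add(["Viaweb charged $100 per month for a small store"])
print(len(store.records))  # 2: the earlier record is still there

rebuilt = ToyStore.from_texts(["Viaweb charged $100 per month for a small store"])
print(len(rebuilt.records))  # 1: a rebuild does not carry over old records
```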
In [ ]:
index = VectorStoreIndex.from_documents(documents, uri="/tmp/new_dataset")
In [ ]:
query_engine = index.as_query_engine()
response = query_engine.query("What companies did the author start?")
print(textwrap.fill(str(response), 100))
The author started two companies: Viaweb and Y Combinator.