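Astra DB
DataStax Astra DB is a serverless vector database built on Apache Cassandra and accessed through an easy-to-use JSON API.
To run this notebook, you need a DataStax Astra DB instance running in the cloud (you can get one for free at datastax.com).
Make sure llama-index and astrapy are installed: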
In [ ]:
%pip install llama-index-vector-stores-astra-db
In [ ]:
!pip install llama-index
!pip install "astrapy>=0.6.0"
Provide the database connection parameters and secrets:
In [ ]:
import os
import getpass

api_endpoint = input(
    "\nPlease enter your Database Endpoint URL (e.g. 'https://4bc...datastax.com'):"
)
token = getpass.getpass(
    "\nPlease enter your 'Database Administrator' Token (e.g. 'AstraCS:...'):"
)
os.environ["OPENAI_API_KEY"] = getpass.getpass(
    "\nPlease enter your OpenAI API Key (e.g. 'sk-...'):"
)
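When the notebook is run unattended, it can be convenient to read these values from environment variables and to sanity-check them before connecting. The following is a minimal sketch, not part of the original notebook: the helper names `read_setting` and `check_astra_settings` are hypothetical, any environment variable name passed to `read_setting` is your own choice, and the prefix checks are only inferred from the example values shown in the prompts above (`https://...` and `AstraCS:...`).

```python
import getpass
import os


def read_setting(env_name, prompt, secret=False):
    """Return the value of env_name, prompting interactively if it is unset.

    Hypothetical helper: env_name is whatever variable name you choose.
    """
    value = os.environ.get(env_name)
    if value:
        return value.strip()
    # Use getpass for secrets so the value is not echoed to the screen.
    ask = getpass.getpass if secret else input
    return ask(prompt).strip()


def check_astra_settings(api_endpoint, token):
    """Return a list of problems; an empty list means the values look plausible.

    Assumption: endpoints start with 'https://' and tokens with 'AstraCS:',
    as in the example values shown in the prompts.
    """
    problems = []
    if not api_endpoint.startswith("https://"):
        problems.append("API endpoint should start with 'https://'")
    if not token.startswith("AstraCS:"):
        problems.append("token should start with 'AstraCS:'")
    return problems
```

A plausible-looking pair of values yields an empty list, while a malformed pair yields one message per problem, which can be printed before any network call is attempted.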
Import the required package dependencies:
In [ ]:
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.vector_stores.astra_db import AstraDBVectorStore
Load some example data:
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Read the data:
In [ ]:
# Load the documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
print(f"Total documents: {len(documents)}")
print(f"First document, id: {documents[0].doc_id}")
print(f"First document, hash: {documents[0].hash}")
print(
    "First document, text"
    f" ({len(documents[0].text)} characters):\n{'='*20}\n{documents[0].text[:360]} ..."
)
Create the Astra DB vector store object:
In [ ]:
astra_db_store = AstraDBVectorStore(
    token=token,
    api_endpoint=api_endpoint,
    collection_name="astra_v_table",
    embedding_dimension=1536,
)
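The `embedding_dimension` value has to match the length of the vectors that the embedding model produces. The value 1536 is the size produced by OpenAI's text-embedding-ada-002 model, which is presumably the model in use here, since the notebook asks for an OpenAI API key. The sketch below is a hypothetical guard, not part of the LlamaIndex or Astra DB APIs, that could be run on an embedding before storing it to catch a mismatch early.

```python
EXPECTED_DIMENSION = 1536  # same value passed as embedding_dimension above


def check_dimension(vector, expected=EXPECTED_DIMENSION):
    """Return the vector unchanged, or raise if its length is not `expected`.

    Hypothetical helper for illustration only.
    """
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, expected {expected}"
        )
    return vector
```

If a different embedding model is used, `embedding_dimension` and the expected value in this check would both need to change to that model's output size.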
Build the index from the documents:
In [ ]:
storage_context = StorageContext.from_defaults(vector_store=astra_db_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
Query using the index:
In [ ]:
query_engine = index.as_query_engine()
response = query_engine.query("Why did the author choose to work on AI?")
print(response.response)
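Conceptually, the query step embeds the question, asks the vector store for the stored chunks whose vectors are most similar to it, and passes those chunks to the language model. The toy sketch below illustrates only the similarity-ranking part in pure Python. It is not how Astra DB or LlamaIndex implement retrieval; the three-dimensional vectors, the chunk labels, and the `top_k` helper are made up for illustration, and cosine similarity is assumed as the metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, stored, k=2):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up chunks with toy 3-dimensional "embeddings".
chunks = [
    ("chunk about painting", [0.9, 0.1, 0.0]),
    ("chunk about AI", [0.1, 0.9, 0.2]),
    ("chunk about startups", [0.0, 0.2, 0.9]),
]

best = top_k([0.2, 1.0, 0.1], chunks, k=1)
print(best[0][1])  # prints "chunk about AI"
```

In the real notebook the vectors have 1536 dimensions and the ranking happens inside the database, so only the best-matching chunks travel back to the client.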