Tair Vector Store¶

This notebook gives a quick demo of how to use TairVectorStore.

If you are opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [ ]:

%pip install llama-index-vector-stores-tair

In [ ]:

!pip install llama-index

In [ ]:

import os
import sys
import logging
import textwrap

import warnings

warnings.filterwarnings("ignore")

# Suppress Hugging Face tokenizers warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Uncomment to see debug logs
# logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import (
    GPTVectorStoreIndex,
    SimpleDirectoryReader,
    Document,
)
from llama_index.vector_stores.tair import TairVectorStore
from IPython.display import Markdown, display

Setup OpenAI¶

Let's first add the OpenAI API key. This gives us access to OpenAI for computing embeddings and for using ChatGPT.
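
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch of an alternative, assuming you would rather read the key from the environment and prompt only when it is missing, something like the following could work. `resolve_api_key` is a hypothetical helper written for this example; it is not part of LlamaIndex or the OpenAI SDK.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the OpenAI key from `env`, prompting only if it is missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Fall back to an interactive prompt that does not echo the input.
        key = prompt("OpenAI API key: ").strip()
    if not key:
        raise ValueError("No OpenAI API key provided")
    return key
```

In the notebook this would be used as `os.environ["OPENAI_API_KEY"] = resolve_api_key()`, so the key never appears in a saved cell.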

In [ ]:

import os

os.environ["OPENAI_API_KEY"] = "sk-<your key here>"

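Download Data¶

In this section we download the data: the Paul Graham essay that the rest of the notebook indexes and queries.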
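
The download cell relies on the shell commands `mkdir` and `wget`, which may not be available everywhere (for example on Windows). A rough pure-Python equivalent, using only the standard library, might look like this; `download` is a hypothetical helper name chosen for the sketch.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download(url: str, dest: str) -> Path:
    """Fetch `url` and write it to `dest`, creating parent directories first."""
    path = Path(dest)
    path.parent.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    with urlopen(url) as response, path.open("wb") as out:
        shutil.copyfileobj(response, out)  # stream the body to disk
    return path
```

Calling `download("https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt", "data/paul_graham/paul_graham_essay.txt")` should leave the essay in the same location the shell commands use.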

In [ ]:

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load the Dataset¶

In [ ]:

# Load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print(
    "Document ID:",
    documents[0].doc_id,
    "Document Hash:",
    documents[0].doc_hash,
)

Build an Index from the Documents¶

Let's build a vector index with GPTVectorStoreIndex, using TairVectorStore as its backend. Replace tair_url with the actual URL of your Tair instance.
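
Passwords that contain characters such as `@`, `:` or `/` can break a connection URL if they are pasted in verbatim. The sketch below shows one way to assemble the URL from its parts with percent-encoded credentials. `build_tair_url` is a hypothetical helper, the host name is a placeholder, and the default port of 6379 is an assumption (the usual Redis port); check the values for your own instance.

```python
from urllib.parse import quote


def build_tair_url(username: str, password: str, host: str, port: int = 6379) -> str:
    """Assemble a redis:// URL, percent-encoding the credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"redis://{user}:{pwd}@{host}:{port}"


# Placeholder credentials and host, for illustration only.
print(build_tair_url("demo", "p@ss:word/1", "tair.example.com"))
# redis://demo:p%40ss%3Aword%2F1@tair.example.com:6379
```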

In [ ]:

from llama_index.core import StorageContext

tair_url = "redis://{username}:{password}@r-bp****************.redis.rds.aliyuncs.com:{port}"

vector_store = TairVectorStore(
    tair_url=tair_url, index_name="pg_essays", overwrite=True
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

Query the Data¶

Now we can use the index as a knowledge base and ask it questions.
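
Conceptually, a query is embedded into a vector, the stored chunks closest to that vector are retrieved, and those chunks are handed to the LLM as context. The toy sketch below illustrates only the retrieval step, with hand-made three-dimensional vectors and cosine similarity. It is not how Tair implements search (the real lookup runs inside Tair on real embeddings, and the distance metric may differ); the names `cosine` and `top_k` are made up for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to `query`."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up chunk texts and vectors standing in for real embeddings.
chunks = [
    ("essay chunk about painting", [0.9, 0.1, 0.0]),
    ("essay chunk about startups", [0.1, 0.9, 0.1]),
    ("essay chunk about Lisp", [0.0, 0.2, 0.9]),
]
print(top_k([0.2, 0.8, 0.1], chunks, k=2))
# ['essay chunk about startups', 'essay chunk about painting']
```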

In [ ]:

query_engine = index.as_query_engine()
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))

In [ ]:

response = query_engine.query("What was a hard moment for the author?")
print(textwrap.fill(str(response), 100))

Delete Documents¶

To delete a document from the index, use the delete method.
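
A single document is usually split into several nodes when it is indexed, so deleting by document ID can remove more than one stored entry. The in-memory sketch below illustrates that idea only. `ToyVectorStore` is a made-up stand-in, not TairVectorStore, and it says nothing about how Tair stores or scans its keys.

```python
class ToyVectorStore:
    """In-memory stand-in that tracks which document each node came from."""

    def __init__(self):
        self._nodes = {}  # node_id -> (ref_doc_id, vector)

    def add(self, node_id, ref_doc_id, vector):
        self._nodes[node_id] = (ref_doc_id, vector)

    def delete(self, ref_doc_id):
        """Remove every node that was derived from `ref_doc_id`."""
        stale = [nid for nid, (doc, _) in self._nodes.items() if doc == ref_doc_id]
        for nid in stale:
            del self._nodes[nid]

    def __len__(self):
        return len(self._nodes)


store = ToyVectorStore()
store.add("n1", "essay", [0.1, 0.2])
store.add("n2", "essay", [0.3, 0.4])
store.add("n3", "other", [0.5, 0.6])
store.delete("essay")  # drops both nodes that came from "essay"
print(len(store))  # 1
```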

In [ ]:

document_id = documents[0].doc_id
document_id

In [ ]:

info = vector_store.client.tvs_get_index("pg_essays")
print("Number of documents", int(info["data_count"]))

In [ ]:

vector_store.delete(document_id)

In [ ]:

info = vector_store.client.tvs_get_index("pg_essays")
print("Number of documents", int(info["data_count"]))

Delete the Index¶

Use the delete_index method to delete the entire index.

In [ ]:

vector_store.delete_index()

In [ ]:

print("Check index existence:", vector_store.client._index_exists())