Getting Started with Milvus and OpenAI

Finding your next book

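In this notebook we show how to use OpenAI to generate embeddings of book descriptions and how to use those embeddings in Milvus to find relevant books. The dataset in this example comes from Hugging Face Datasets and contains over 1 million title-description pairs.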

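Let's begin by installing the libraries this notebook needs:
- openai, to communicate with the OpenAI embedding service
- pymilvus, to communicate with the Milvus server
- datasets, to download the dataset
- tqdm, to display progress bars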

! pip install "openai<1" pymilvus datasets tqdm

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: openai in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (0.27.2)
Requirement already satisfied: pymilvus in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (2.2.2)
Requirement already satisfied: datasets in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (2.10.1)
Requirement already satisfied: tqdm in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (4.64.1)

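With the required packages installed, we can get started. Let's begin by launching the Milvus service. The file to run is the docker-compose.yaml found in this folder. The command below launches a Milvus standalone instance, which we will use for this test.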

! docker compose up -d

[+] Running 4/4
⠿ Network milvus Created 0.1s
⠿ Container milvus-minio Started 1.8s
⠿ Container milvus-etcd Started 1.7s
⠿ Container milvus-standalone Started 2.6s
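
Docker reporting the container as started does not necessarily mean Milvus is already accepting connections. The following is a minimal sketch, assuming the default port 19530 used later in this notebook, of a helper that polls the port until a TCP connection succeeds. The name `wait_for_port` and its parameters are hypothetical, and an open port is only a rough readiness signal, not a guarantee that the server is fully initialized.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port('localhost', 19530)` could be called before connecting to Milvus.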

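With Milvus running, we can set up our global variables:
- HOST: the Milvus host address
- PORT: the Milvus port number
- COLLECTION_NAME: the name of the collection in Milvus
- DIMENSION: the dimension of the embeddings
- OPENAI_ENGINE: the embedding model to use
- openai.api_key: your OpenAI account key
- INDEX_PARAM: the index settings to use for the collection
- QUERY_PARAM: the search parameters to use
- BATCH_SIZE: how many texts to embed and insert at once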

import openai

HOST = 'localhost'
PORT = 19530
COLLECTION_NAME = 'book_search'
DIMENSION = 1536
OPENAI_ENGINE = 'text-embedding-3-small'
openai.api_key = 'sk-your_key'

INDEX_PARAM = {
    'metric_type':'L2',
    'index_type':"HNSW",
    'params':{'M': 8, 'efConstruction': 64}
}

QUERY_PARAM = {
    "metric_type": "L2",
    "params": {"ef": 64},
}

BATCH_SIZE = 1000
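
Hardcoding the key in the notebook makes it easy to leak by accident. As one possible alternative, here is a small sketch that reads the key from an environment variable instead. The helper name `load_api_key` is hypothetical, and `OPENAI_API_KEY` is assumed to be the variable where you stored your key.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in env_var, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Possible usage in place of the hardcoded assignment above:
# openai.api_key = load_api_key()
```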

Milvus

This section deals with Milvus and setting up the database for this use case. Within Milvus, we need to create a collection and build an index on it.

from pymilvus import connections, utility, FieldSchema, Collection, CollectionSchema, DataType

# Connect to the Milvus database
connections.connect(host=HOST, port=PORT)

# Remove the collection if it already exists
if utility.has_collection(COLLECTION_NAME):
    utility.drop_collection(COLLECTION_NAME)

# Create a collection with id, title, description, and embedding fields
fields = [
    FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name='title', dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name='description', dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=DIMENSION)
]
schema = CollectionSchema(fields=fields)
collection = Collection(name=COLLECTION_NAME, schema=schema)

# Create an index on the collection and load it
collection.create_index(field_name="embedding", index_params=INDEX_PARAM)
collection.load()
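
To make the `L2` metric concrete, the sketch below runs an exhaustive nearest-neighbor search in plain Python. It illustrates what an HNSW index approximates, not how Milvus implements it. The function names and the tiny two-dimensional vectors are made up for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(query, vectors, top_k=5):
    """Return (index, distance) pairs for the top_k vectors closest to query."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    # With L2, a smaller distance means a closer match.
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Illustrative vectors only; real embeddings here have DIMENSION components.
vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
nearest = brute_force_search([0.9, 0.1], vectors, top_k=2)
print([index for index, _ in nearest])  # → [1, 0]
```

An exhaustive scan like this compares the query against every stored vector, which is why an approximate index such as HNSW is used for a collection of this size.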

Dataset

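With Milvus up and running, we can start fetching our data. Hugging Face Datasets is a hub that hosts many user-contributed datasets, and for this example we use Skelebor's book dataset. It contains title-description pairs for over 1 million books. We will embed each description and store it in Milvus along with its title.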

import datasets

# Download the dataset and use only its `train` split (about 800 MB)
dataset = datasets.load_dataset('Skelebor/book_titles_and_descriptions_en_clean', split='train')

/Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Found cached dataset parquet (/Users/filiphaltmayer/.cache/huggingface/datasets/Skelebor___parquet/Skelebor--book_titles_and_descriptions_en_clean-3596935b1d8a7747/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)

Insert the data

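Now that the data is on our machine, we can begin embedding it and inserting it into Milvus. The embedding function takes a list of texts and returns their embeddings as a list.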

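# Simple function that converts a list of texts into embeddings.
# openai.Embedding.create is the interface of the pre-1.0 openai package.
def embed(texts):
    embeddings = openai.Embedding.create(
        input=texts,
        model=OPENAI_ENGINE
    )
    return [x['embedding'] for x in embeddings['data']]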
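
Embedding more than a million descriptions means many API requests, and some of them may fail temporarily. Here is a minimal sketch of a retry wrapper with exponential backoff. The name `with_retries` and its parameters are hypothetical, and catching every `Exception` is a simplification; in practice you would probably retry only on rate-limit or connection errors.

```python
import time


def with_retries(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Wrap func so that failures are retried with exponentially growing delays."""
    def wrapper(*args, **kwargs):
        for attempt in range(max_attempts):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Re-raise once the final attempt has failed.
                if attempt == max_attempts - 1:
                    raise
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * 2 ** attempt)
    return wrapper
```

A wrapped version could then stand in for the original function, for example `embed_with_retries = with_retries(embed)`. Because `KeyboardInterrupt` is not a subclass of `Exception`, interrupting the cell still stops it immediately.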


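The next step performs the actual insertion. Because there are so many data points, you can stop the insert cell early and move on if you want to test things right away. Doing so will probably lower the accuracy of the results because fewer data points are indexed, but they should still be good enough.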

from tqdm import tqdm

data = [
    [], # title
    [], # description
]

# Embed and insert in batches
for i in tqdm(range(0, len(dataset))):
    data[0].append(dataset[i]['title'])
    data[1].append(dataset[i]['description'])
    if len(data[0]) % BATCH_SIZE == 0:
        data.append(embed(data[1]))
        collection.insert(data)
        data = [[],[]]

# Embed and insert the remainder
if len(data[0]) != 0:
    data.append(embed(data[1]))
    collection.insert(data)
    data = [[],[]]


  0%|          | 1999/1032335 [00:06<57:22, 299.31it/s]  
KeyboardInterrupt:
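
Instead of interrupting the cell by hand, the number of rows can be capped up front. The sketch below shows the batching pattern on its own, with an optional row limit. The helper `iter_batches` and its `limit` parameter are hypothetical and are demonstrated on a plain range rather than the real dataset.

```python
from itertools import islice


def iter_batches(rows, batch_size, limit=None):
    """Yield lists of at most batch_size rows, stopping after limit rows if given."""
    iterator = iter(rows) if limit is None else islice(rows, limit)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


print([len(b) for b in iter_batches(range(2500), 1000)])              # → [1000, 1000, 500]
print([len(b) for b in iter_batches(range(2500), 1000, limit=1500)])  # → [1000, 500]

# Possible use with the objects defined in this notebook (not executed here):
# for batch in iter_batches(dataset, BATCH_SIZE, limit=10_000):
#     titles = [row['title'] for row in batch]
#     descriptions = [row['description'] for row in batch]
#     collection.insert([titles, descriptions, embed(descriptions)])
```

The final, shorter batch is yielded like any other, so no separate remainder step is needed.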

Query the database

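With our data safely inserted into Milvus, we can now perform a query. The query function takes a string or a list of strings and searches for them. For each query, the output prints the description you provided, followed by the results, each with its score, title, and book description.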

import textwrap

def query(queries, top_k=5):
    # Accept a single string as well as a list of strings
    if not isinstance(queries, list):
        queries = [queries]
    res = collection.search(embed(queries), anns_field='embedding', param=QUERY_PARAM, limit=top_k, output_fields=['title', 'description'])
    # res holds one list of hits per query
    for i, hits in enumerate(res):
        print('Description:', queries[i])
        print('Results:')
        for rank, hit in enumerate(hits, start=1):
            print('\t' + 'Rank:', rank, 'Score:', hit.score, 'Title:', hit.entity.get('title'))
            print(textwrap.fill(hit.entity.get('description'), 88))
            print()

query('Book about a k-9 from europe')

RPC error: [search], <MilvusException: (code=1, message=code: UnexpectedError, reason: code: CollectionNotExists, reason: can't find collection: book_search)>, <Time:{'RPC start': '2023-03-17 14:22:18.368461', 'RPC error': '2023-03-17 14:22:18.382086'}>
MilvusException: <MilvusException: (code=1, message=code: UnexpectedError, reason: code: CollectionNotExists, reason: can't find collection: book_search)>
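
The search result is nested: one list of hits per query, with each hit carrying a score and the requested output fields. The sketch below mimics that shape with plain dictionaries so the formatting logic can be tried without a running Milvus instance. The function `format_results` is hypothetical, and the hits are made-up placeholders, not real search output.

```python
import textwrap


def format_results(queries, results, width=88):
    """Build printable text from one list of hit dictionaries per query."""
    lines = []
    for query_text, hits in zip(queries, results):
        lines.append(f"Description: {query_text}")
        lines.append("Results:")
        for rank, hit in enumerate(hits, start=1):
            lines.append(f"\tRank: {rank} Score: {hit['score']} Title: {hit['title']}")
            lines.append(textwrap.fill(hit['description'], width))
            lines.append("")
    return "\n".join(lines)


# Made-up placeholder hits, ordered by ascending L2 score (lower means closer).
fake_results = [[
    {"score": 0.0, "title": "Placeholder A", "description": "Placeholder text."},
    {"score": 1.0, "title": "Placeholder B", "description": "More placeholder text."},
]]
report = format_results(["example query"], fake_results)
print(report)
```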