



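In this notebook we will go over generating embeddings of book descriptions with OpenAI and using those embeddings within Zilliz to find relevant books. The dataset in this example is sourced from HuggingFace datasets and contains a little over 1 million title-description pairs.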

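Let's begin by downloading the required libraries for this notebook:
- openai is used for communicating with the OpenAI embedding service
- pymilvus is used for communicating with the Zilliz instance
- datasets is used for downloading the dataset
- tqdm is used for the progress bars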

! pip install openai pymilvus datasets tqdm

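To get Zilliz up and running, take a look here. With your account and database set up, proceed to set the following values:
- URI: the URI your database is running on
- TOKEN: your database credentials, either `user:password` or an API key
- COLLECTION_NAME: what to name the collection within Zilliz
- DIMENSION: the dimension of the embeddings
- OPENAI_ENGINE: which embedding model to use
- openai.api_key: your OpenAI account key
- INDEX_PARAM: the index settings to use for the collection
- QUERY_PARAM: the search parameters to use
- BATCH_SIZE: how many texts to embed and insert at once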

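import openai

URI = 'your_uri'
TOKEN = 'your_token' # TOKEN == user:password or api_key
COLLECTION_NAME = 'book_search'
DIMENSION = 1536  # default output size of text-embedding-3-small
OPENAI_ENGINE = 'text-embedding-3-small'
openai.api_key = 'sk-your-key'

# Index settings for the collection; the metric must match the one in QUERY_PARAM
INDEX_PARAM = {
    'metric_type': 'L2',
    'index_type': 'AUTOINDEX',
    'params': {}
}

# Search settings used at query time
QUERY_PARAM = {
    "metric_type": "L2",
    "params": {},
}

BATCH_SIZE = 1000  # example value: number of texts embedded and inserted per request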
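DIMENSION has to equal the length of the vectors the chosen model returns. If it does not, the `FLOAT_VECTOR` field rejects the inserts. The following is a minimal sanity check, assuming the model's default output size (no `dimensions` argument is passed to the embeddings call). The lookup table covers only three OpenAI models. `KNOWN_DIMENSIONS` and `check_dimension` are hypothetical helpers, not part of openai or pymilvus.

```python
# Hypothetical helper: default output sizes of some OpenAI embedding models.
KNOWN_DIMENSIONS = {
    'text-embedding-3-small': 1536,
    'text-embedding-3-large': 3072,
    'text-embedding-ada-002': 1536,
}

def check_dimension(model: str, dimension: int) -> None:
    """Raise ValueError if `dimension` differs from the model's default output size.

    Models missing from KNOWN_DIMENSIONS are not checked.
    """
    expected = KNOWN_DIMENSIONS.get(model)
    if expected is not None and expected != dimension:
        raise ValueError(
            f"DIMENSION={dimension} does not match {model}, "
            f"which returns {expected}-d vectors by default"
        )

# Passes silently for the configuration used in this notebook
check_dimension('text-embedding-3-small', 1536)
```

In the notebook this would be called as `check_dimension(OPENAI_ENGINE, DIMENSION)` before the collection is created.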




from pymilvus import connections, utility, FieldSchema, Collection, CollectionSchema, DataType

# Connect to the Zilliz database
connections.connect(uri=URI, token=TOKEN)

# Remove the collection if it already exists
if utility.has_collection(COLLECTION_NAME):
    utility.drop_collection(COLLECTION_NAME)

# Create a collection with an id, title, description, and embedding
fields = [
    FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name='title', dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name='description', dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=DIMENSION)
]
schema = CollectionSchema(fields=fields)
collection = Collection(name=COLLECTION_NAME, schema=schema)

# Create the index on the collection and load it
collection.create_index(field_name="embedding", index_params=INDEX_PARAM)
collection.load()


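With Zilliz up and running, we can begin grabbing our data. Hugging Face Datasets is a hub that holds many different user datasets, and for this example we are using Skelebor's book dataset. This dataset contains title-description pairs for a little over 1 million books. We are going to embed each description and store it within Zilliz along with its title.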

import datasets

# Download the dataset and only use the `train` portion (file is around 800Mb)
dataset = datasets.load_dataset('Skelebor/book_titles_and_descriptions_en_clean', split='train')



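Now that we have our data on our machine, we can begin embedding it and inserting it into Zilliz. The embedding function takes in a list of texts and returns their embeddings as a list.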

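# Simple function that converts the texts to embeddings.
# Uses the module-level client of openai>=1.0, which picks up openai.api_key set above.
def embed(texts):
    response = openai.embeddings.create(
        input=texts,
        model=OPENAI_ENGINE
    )
    # Sort by index so the embeddings line up with the input texts
    return [x.embedding for x in sorted(response.data, key=lambda x: x.index)]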


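from tqdm import tqdm

data = [
    [], # title
    [], # description
]

# Embed and insert in batches
for i in tqdm(range(0, len(dataset))):
    data[0].append(dataset[i]['title'])
    data[1].append(dataset[i]['description'])
    if len(data[0]) % BATCH_SIZE == 0:
        # Columns follow the schema order: title, description, embedding (id is auto-generated)
        data.append(embed(data[1]))
        collection.insert(data)
        data = [[],[]]

# Embed and insert the remainder
if len(data[0]) != 0:
    data.append(embed(data[1]))
    collection.insert(data)
    data = [[],[]]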
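The loop above fills the title and description columns. Whenever BATCH_SIZE rows have accumulated, it appends their embeddings as a third column and inserts the batch. Any remainder is inserted at the end. The same batching can be sketched with slicing instead of a counter.

`column_batches` and `fake_embed` are hypothetical names. The fake embedder only stands in for `embed` so that the sketch runs without an API key. The sketch assumes `rows` is a list of dicts. Slicing a Hugging Face `Dataset` with `dataset[a:b]` returns a dict of columns instead, so the helper would need adapting for that case.

```python
from typing import Callable, Iterator, List, Sequence

def column_batches(
    rows: Sequence[dict],
    batch_size: int,
    embed: Callable[[List[str]], List[List[float]]],
) -> Iterator[List[list]]:
    """Yield column-based batches [titles, descriptions, embeddings].

    The final batch holds the remainder and may be smaller than batch_size.
    """
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        titles = [row['title'] for row in chunk]
        descriptions = [row['description'] for row in chunk]
        # Embed only the descriptions, as in the notebook
        yield [titles, descriptions, embed(descriptions)]

# Hypothetical stand-in for the OpenAI call: a 2-d "embedding" per text
def fake_embed(texts):
    return [[float(len(t)), 1.0] for t in texts]

rows = [{'title': f'Book {n}', 'description': 'x' * n} for n in range(1, 6)]
batches = list(column_batches(rows, batch_size=2, embed=fake_embed))
print([len(b[0]) for b in batches])  # [2, 2, 1]
```

Each yielded batch has the same column layout as `data` in the loop, so under the schema defined earlier it could be passed to `collection.insert` unchanged.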

import textwrap

def query(queries, top_k = 5):
    # Accept a single query string as well as a list of queries
    if not isinstance(queries, list):
        queries = [queries]
    res = collection.search(embed(queries), anns_field='embedding', param=QUERY_PARAM, limit=top_k, output_fields=['title', 'description'])
    # res holds one list of hits per query, ordered from best to worst match
    for i, hit in enumerate(res):
        print('Description:', queries[i])
        for ii, hits in enumerate(hit):
            print('\t' + 'Rank:', ii + 1, 'Score:', hits.score, 'Title:', hits.entity.get('title'))
            print(textwrap.fill(hits.entity.get('description'), 88))
            print()

query('Book about a k-9 from europe')

Description: Book about a k-9 from europe
Rank: 1 Score: 0.3047754764556885 Title: Bark M For Murder
Who let the dogs out? Evildoers beware! Four of mystery fiction's top storytellers are
setting the hounds on your trail -- in an incomparable quartet of crime stories with a
canine edge. Man's (and woman's) best friends take the lead in this phenomenal
collection of tales tense and surprising, humorous and thrilling: New York
Timesbestselling author J.A. Jance's spellbinding saga of a scam-busting septuagenarian
and her two golden retrievers; Anthony Award winner Virginia Lanier's pureblood thriller
featuring bloodhounds and bloody murder; Chassie West's suspenseful stunner about a
life-saving German shepherd and a ghastly forgotten crime; rising star Lee Charles
Kelley's edge-of-your-seat yarn that pits an ex-cop/kennel owner and a yappy toy poodle
against a craven killer.

Rank: 2 Score: 0.3283390402793884 Title: Texas K-9 Unit Christmas: Holiday Hero\Rescuing Christmas
CHRISTMAS COMES WRAPPED IN DANGER Holiday Hero by Shirlee McCoy Emma Fairchild never
expected to find trouble in sleepy Sagebrush, Texas. But when she's attacked and left
for dead in her own diner, her childhood friend turned K-9 cop Lucas Harwood offers a
chance at justice--and love. Rescuing Christmas by Terri Reed She escaped a kidnapper,
but now a killer has set his sights on K-9 dog trainer Lily Anderson. When fellow
officer Jarrod Evans appoints himself her bodyguard, Lily knows more than her life is at
risk--so is her heart. Texas K-9 Unit: These lawmen solve the toughest cases with the
help of their brave canine partners

Rank: 3 Score: 0.33899369835853577 Title: Dogs on Duty: Soldiers' Best Friends on the Battlefield and Beyond
When the news of the raid on Osama Bin Laden's compound broke, the SEAL team member that
stole the show was a highly trained canine companion. Throughout history, dogs have been
key contributors to military units. Dorothy Hinshaw Patent follows man's best friend
onto the battlefield, showing readers why dogs are uniquely qualified for the job at
hand, how they are trained, how they contribute to missions, and what happens when they
retire. With full-color photographs throughout and sidebars featuring heroic canines
throughout history, Dogs on Duty provides a fascinating look at these exceptional
soldiers and companions.

Rank: 4 Score: 0.34207457304000854 Title: Toute Allure: Falling in Love in Rural France
After saying goodbye to life as a successful fashion editor in London, Karen Wheeler is
now happy in her small village house in rural France. Her idyll is complete when she
meets the love of her life - he has shaggy hair, four paws and a wet nose!

Rank: 5 Score: 0.343595951795578 Title: Otherwise Alone (Evan Arden, #1)
Librarian's note: This is an alternate cover edition for ASIN: B00AP5NNWC. Lieutenant
Evan Arden sits in a shack in the middle of nowhere, waiting for orders that will send
him back home - if he ever gets them. Other than his loyal Great Pyrenees, there's no
one around to break up the monotony. The tedium is excruciating, but it is suddenly
interrupted when a young woman stumbles up his path. "It's only 50-something pages, but
in that short amount of time, the author's awesome writing packs in a whole lotta
character detail. And sets the stage for the series, perfectly." -Maryse.net, 4.5 Stars
He has two choices - pick her off from a distance with his trusty sniper-rifle, or dare
let her approach his cabin and enter his life. Why not? It's been ages, and he is
otherwise alone...
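The collection uses the L2 metric, so the Score column is a distance. Smaller means more similar, which is why the scores increase from rank 1 to rank 5. Two assumptions let you read these distances as cosine similarities. First, Zilliz reports the squared Euclidean distance for L2; the Milvus documentation describes L2 this way. Second, OpenAI embeddings are normalized to unit length. Under those assumptions, a score d corresponds to a cosine similarity of 1 - d/2.

The sketch below checks that identity on small hand-made vectors and then converts the rank-1 score. The helper names are hypothetical, and the sketch is an illustration rather than part of the pipeline.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_from_squared_l2(d):
    """For unit vectors, ||a - b||^2 = 2 - 2*cos(a, b), so cos = 1 - d/2."""
    return 1 - d / 2

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
cos_ab = sum(x * y for x, y in zip(a, b))
print(round(cosine_from_squared_l2(squared_l2(a, b)), 6), round(cos_ab, 6))

# Rank-1 score from the output above, under the stated assumptions
print(round(cosine_from_squared_l2(0.3047754764556885), 4))  # 0.8476
```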