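Recommendation using embeddings and nearest neighbor search
Recommendations are widespread across the web.
- 'Bought that item? Try these similar items.'
- 'Liked that book? Try these similar titles.'
- 'Not the help page you were looking for? Try these similar pages.'
This notebook demonstrates how to use embeddings to find similar items to recommend. In particular, we use the AG corpus of news articles as our dataset.
Our model will answer the question: given an article, which other articles are most similar to it?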
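To make the idea concrete before touching real data, here is a minimal sketch of nearest neighbor search over embeddings. It assumes cosine distance as the similarity measure and uses made-up 3-dimensional vectors; the helper names `cosine_distance` and `nearest_neighbors` are hypothetical and are not the functions imported from `utils.embeddings_utils`.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_neighbors(query_index, embeddings):
    """Return all item indices sorted from most to least similar to the query item."""
    query = embeddings[query_index]
    distances = [cosine_distance(query, e) for e in embeddings]
    return sorted(range(len(embeddings)), key=lambda i: distances[i])

# Toy "embeddings", invented for illustration; real embeddings have far more dimensions.
toy_embeddings = [
    [1.0, 0.0, 0.0],   # item 0: the query item
    [0.9, 0.1, 0.0],   # item 1: nearly the same direction as item 0
    [0.0, 1.0, 0.0],   # item 2: orthogonal to item 0
    [-1.0, 0.0, 0.0],  # item 3: opposite direction to item 0
]

ranking = nearest_neighbors(0, toy_embeddings)
print(ranking)  # the query item itself comes first, since its distance to itself is 0
```

In a recommender, the query item would be dropped from the ranking and the next few indices returned as suggestions.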
1. Imports
import pandas as pd
import pickle
from utils.embeddings_utils import (
    get_embedding,
    distances_from_embeddings,
    tsne_components_from_embeddings,
    chart_from_components,
    indices_of_nearest_neighbors_from_distances,
)
EMBEDDING_MODEL = "text-embedding-3-small"
2. Load data
Next, let's load the AG news data and see what it looks like.
# load data (full dataset available at http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html)
dataset_path = "data/AG_news_samples.csv"
df = pd.read_csv(dataset_path)
n_examples = 5
df.head(n_examples)
 | title | description | label_int | label |
---|---|---|---|---|
0 | World Briefings | BRITAIN: BLAIR WARNS OF CLIMATE THREAT Prime M... | 1 | World |
1 | Nvidia Puts a Firewall on a Motherboard (PC Wo... | PC World - Upcoming chip set will include buil... | 4 | Sci/Tech |
2 | Olympic joy in Greek, Chinese press | Newspapers in Greece reflect a mixture of exhi... | 2 | Sports |
3 | U2 Can iPod with Pictures | SAN JOSE, Calif. -- Apple Computer (Quote, Cha... | 4 | Sci/Tech |
4 | The Dream Factory | Any product, any shape, any size -- manufactur... | 4 | Sci/Tech |
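A quick sanity check at this point can confirm that the `label` and `label_int` columns agree with each other. The sketch below uses only the five rows shown above, retyped by hand so that it runs without the CSV file; on the real data the same two lines could be applied to `df` instead of the hypothetical `sample` frame.

```python
import pandas as pd

# The five preview rows, retyped by hand (not read from data/AG_news_samples.csv).
sample = pd.DataFrame({
    "title": [
        "World Briefings",
        "Nvidia Puts a Firewall on a Motherboard (PC World)",
        "Olympic joy in Greek, Chinese press",
        "U2 Can iPod with Pictures",
        "The Dream Factory",
    ],
    "label_int": [1, 4, 2, 4, 4],
    "label": ["World", "Sci/Tech", "Sports", "Sci/Tech", "Sci/Tech"],
})

# How many articles fall under each label?
label_counts = sample["label"].value_counts()
print(label_counts)

# Does every label name map to exactly one integer code?
codes_per_label = sample.groupby("label")["label_int"].nunique()
labels_consistent = bool((codes_per_label == 1).all())
print(labels_consistent)
```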
Let's take a look at those same examples, but not truncated by ellipses.
# print the title, description, and label of each example
for idx, row in df.head(n_examples).iterrows():
    print("")
    print(f"Title: {row['title']}")
    print(f"Description: {row['description']}")
    print(f"Label: {row['label']}")
Title: World Briefings
Description: BRITAIN: BLAIR WARNS OF CLIMATE THREAT Prime Minister Tony Blair urged the international community to consider global warming a dire threat and agree on a plan of action to curb the quot;alarming quot; growth of greenhouse gases.
Label: World
Title: Nvidia Puts a Firewall on a Motherboard (PC World)
Description: PC World - Upcoming chip set will include built-in security features for your PC.
Label: Sci/Tech
Title: Olympic joy in Greek, Chinese press
Description: Newspapers in Greece reflect a mixture of exhilaration that the Athens Olympics proved successful, and relief that they passed off without any major setback.
Label: Sports
Title: U2 Can iPod with Pictures
Description: SAN JOSE, Calif. -- Apple Computer (Quote, Chart) unveiled a batch of new iPods, iTunes software and promos designed to keep it atop the heap of digital music players.
Label: Sci/Tech
Title: The Dream Factory
Description: Any product, any shape, any size -- manufactured on your desktop! The future is the fabricator. By Bruce Sterling from Wired magazine.
Label: Sci/Tech
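As an alternative to the explicit loop, pandas can be told not to shorten long strings when it renders a DataFrame. This is a sketch rather than part of the original notebook: it relies on the `display.max_colwidth` option, and the one-row `sample` frame is retyped by hand from the output above.

```python
import pandas as pd

# One row copied by hand from the printed examples, so the snippet runs without the CSV file.
sample = pd.DataFrame({
    "title": ["Nvidia Puts a Firewall on a Motherboard (PC World)"],
    "description": [
        "PC World - Upcoming chip set will include built-in security features for your PC."
    ],
})

# With default settings, long cell values are cut off with an ellipsis.
truncated = repr(sample)

# Inside this context manager, max_colwidth=None turns the truncation off.
with pd.option_context("display.max_colwidth", None):
    full = repr(sample)

print(truncated)
print(full)
```

The option is restored automatically when the `with` block exits, so the rest of the notebook keeps its compact tables.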