Using GPT-4V to tag and caption images

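This notebook explores how to leverage GPT-4V to tag and caption images.

We can use the multimodal capabilities of GPT-4V by providing input images along with additional context on what they represent, and prompting the model to output tags or image descriptions. The image descriptions can then be refined further with a language model (in this notebook, we'll use GPT-4-turbo) to generate captions.

Generating text content from images is useful for many use cases, especially those involving search.
We will illustrate a search use case by using the generated keywords and product captions to search for products, both from a text input and from an image input.

As an example, we will use a dataset of Amazon furniture items, tag them with relevant keywords and generate short, descriptive captions.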

Setup

# Install dependencies if needed
%pip install openai
%pip install scikit-learn

from IPython.display import Image, display
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from openai import OpenAI

# Initializing the OpenAI client - see https://platform.openai.com/docs/quickstart?context=python
client = OpenAI()

# Loading the dataset
dataset_path = "data/amazon_furniture_dataset.csv"
df = pd.read_csv(dataset_path)
df.head()

asin url title brand price availability categories primary_image images upc ... color material style important_information product_overview about_item description specifications uniq_id scraped_at
0 B0CJHKVG6P https://www.amazon.com/dp/B0CJHKVG6P GOYMFK 1pc Free Standing Shoe Rack, Multi-laye... GOYMFK $24.99 Only 13 left in stock - order soon. ['Home & Kitchen', 'Storage & Organization', '... https://m.media-amazon.com/images/I/416WaLx10j... ['https://m.media-amazon.com/images/I/416WaLx1... NaN ... White Metal Modern [] [{'Brand': ' GOYMFK '}, {'Color': ' White '}, ... ['Multiple layers: Provides ample storage spac... multiple shoes, coats, hats, and other items E... ['Brand: GOYMFK', 'Color: White', 'Material: M... 02593e81-5c09-5069-8516-b0b29f439ded 2024-02-02 15:15:08
1 B0B66QHB23 https://www.amazon.com/dp/B0B66QHB23 subrtex Leather ding Room, Dining Chairs Set o... subrtex NaN NaN ['Home & Kitchen', 'Furniture', 'Dining Room F... https://m.media-amazon.com/images/I/31SejUEWY7... ['https://m.media-amazon.com/images/I/31SejUEW... NaN ... Black Sponge Black Rubber Wood [] NaN ['【Easy Assembly】: Set of 2 dining room chairs... subrtex Dining chairs Set of 2 ['Brand: subrtex', 'Color: Black', 'Product Di... 5938d217-b8c5-5d3e-b1cf-e28e340f292e 2024-02-02 15:15:09
2 B0BXRTWLYK https://www.amazon.com/dp/B0BXRTWLYK Plant Repotting Mat MUYETOL Waterproof Transpl... MUYETOL $5.98 In Stock ['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo... https://m.media-amazon.com/images/I/41RgefVq70... ['https://m.media-amazon.com/images/I/41RgefVq... NaN ... Green Polyethylene Modern [] [{'Brand': ' MUYETOL '}, {'Size': ' 26.8*26.8 ... ['PLANT REPOTTING MAT SIZE: 26.8" x 26.8", squ... NaN ['Brand: MUYETOL', 'Size: 26.8*26.8', 'Item We... b2ede786-3f51-5a45-9a5b-bcf856958cd8 2024-02-02 15:15:09
3 B0C1MRB2M8 https://www.amazon.com/dp/B0C1MRB2M8 Pickleball Doormat, Welcome Doormat Absorbent ... VEWETOL $13.99 Only 10 left in stock - order soon. ['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo... https://m.media-amazon.com/images/I/61vz1Igler... ['https://m.media-amazon.com/images/I/61vz1Igl... NaN ... A5589 Rubber Modern [] [{'Brand': ' VEWETOL '}, {'Size': ' 16*24INCH ... ['Specifications: 16x24 Inch ', " High-Quality... The decorative doormat features a subtle textu... ['Brand: VEWETOL', 'Size: 16*24INCH', 'Materia... 8fd9377b-cfa6-5f10-835c-6b8eca2816b5 2024-02-02 15:15:10
4 B0CG1N9QRC https://www.amazon.com/dp/B0CG1N9QRC JOIN IRON Foldable TV Trays for Eating Set of ... JOIN IRON Store $89.99 Usually ships within 5 to 6 weeks ['Home & Kitchen', 'Furniture', 'Game & Recrea... https://m.media-amazon.com/images/I/41p4d4VJnN... ['https://m.media-amazon.com/images/I/41p4d4VJ... NaN ... Grey Set of 4 Iron X Classic Style [] NaN ['Includes 4 Folding Tv Tray Tables And one Co... Set of Four Folding Trays With Matching Storag... ['Brand: JOIN IRON', 'Shape: Rectangular', 'In... bdc9aa30-9439-50dc-8e89-213ea211d66a 2024-02-02 15:15:11

5 rows × 25 columns

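Tag images

In this section, we'll use GPT-4V to generate relevant tags for our products.

We'll use a simple zero-shot approach to extract keywords, and deduplicate those keywords using embeddings to avoid having multiple keywords that are too similar.

We will combine an image with the product title to avoid extracting keywords for other items depicted in the image. Sometimes a scene shows several items, and we want to focus only on the one we want to tag.

Extract keywords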

system_prompt = '''
You are an agent specialized in tagging images of furniture items, decorative items, or furnishings with relevant keywords that could be used to search for these items on a marketplace.

You will be provided with an image and the title of the item that is depicted in the image, and your goal is to extract keywords for only the item specified.

Keywords should be concise and in lower case.

Keywords can describe things like:
- Item type e.g. 'sofa bed', 'chair', 'desk', 'plant'
- Item material e.g. 'wood', 'metal', 'fabric'
- Item style e.g. 'scandinavian', 'vintage', 'industrial'
- Item color e.g. 'red', 'blue', 'white'

Only deduce material, style or color keywords when it is obvious that they make the item depicted in the image stand out.

Return keywords in the format of an array of strings, like this:
['desk', 'industrial', 'metal']

'''

def analyze_image(img_url, title):
    # Send the product image plus its title so the model tags only the titled item
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        # The image is passed by URL, wrapped in an object with a "url" key
                        "image_url": {"url": img_url},
                    },
                ],
            },
            {
                "role": "user",
                "content": title
            }
        ],
        max_tokens=300,
        top_p=0.1
    )

    return response.choices[0].message.content
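
The system prompt asks the model to return the keywords as a single string formatted like a Python list, so the reply has to be parsed before the keywords can be used one by one. A minimal sketch of a defensive parser follows; `parse_keywords` is a hypothetical helper that is not part of the original notebook, and it assumes the reply is a plain string.

```python
import ast

def parse_keywords(raw):
    """Parse a model reply such as "['desk', 'metal']" into a list of clean strings.

    Hypothetical helper: returns an empty list when the reply is not a list literal.
    """
    try:
        parsed = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError):
        return []
    if not isinstance(parsed, (list, tuple)):
        return []
    # Keep only non-empty string items, lower-cased and without surrounding spaces
    return [item.strip().lower() for item in parsed if isinstance(item, str) and item.strip()]

print(parse_keywords("['Shoe Rack', ' metal ', 'white']"))
print(parse_keywords("Sorry, I cannot help with that."))
```

Returning an empty list instead of raising keeps a long batch run going when a single reply is malformed, at the cost of silently dropping that item's tags.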

Testing with a few examples

examples = df.iloc[:5]

for index, ex in examples.iterrows():
    url = ex['primary_image']
    img = Image(url=url)
    display(img)
    result = analyze_image(url, ex['title'])
    print(result)
    print("\n\n")

['shoe rack', 'free standing', 'multi-layer', 'metal', 'white']


['dining chairs', 'set of 2', 'leather', 'black']


['plant repotting mat', 'waterproof', 'portable', 'foldable', 'easy to clean', 'green']


['doormat', 'absorbent', 'non-slip', 'brown']


['tv tray table set', 'foldable', 'iron', 'grey']


Looking up existing keywords

Using embeddings to avoid duplicates (synonyms) and/or to match pre-defined keywords
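
To make the matching idea concrete before calling the embeddings API, here is a small sketch that uses made-up 3-dimensional vectors in place of real embeddings. The names `cosine`, `existing` and `match_existing` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 3-dimensional vectors standing in for real embeddings
existing = {
    "wood": [0.9, 0.1, 0.0],
    "metal": [0.0, 0.2, 0.9],
}

def match_existing(vector, threshold=0.6):
    """Return the closest existing keyword if it clears the threshold, else None."""
    best_keyword, best_score = max(
        ((kw, cosine(vector, emb)) for kw, emb in existing.items()),
        key=lambda pair: pair[1],
    )
    return best_keyword if best_score > threshold else None

print(match_existing([0.8, 0.2, 0.1]))  # close to the "wood" vector
print(match_existing([0.1, 0.9, 0.1]))  # not close to either vector
```

A candidate whose best match stays below the threshold is treated as a new keyword; one that clears it is replaced by the existing keyword.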

# Feel free to change the embedding model here
def get_embedding(value, model="text-embedding-3-large"):
    embeddings = client.embeddings.create(
        model=model,
        input=value,
        encoding_format="float"
    )
    return embeddings.data[0].embedding

Testing with example keywords

# Existing keywords
keywords_list = ['industrial', 'metal', 'wood', 'vintage', 'bed']

df_keywords = pd.DataFrame(keywords_list, columns=['keyword'])
df_keywords['embedding'] = df_keywords['keyword'].apply(lambda x: get_embedding(x))
df_keywords

keyword embedding
0 industrial [-0.026137426, 0.021297162, -0.007273361, -0.0...
1 metal [-0.020530997, 0.004478126, -0.011049379, -0.0...
2 wood [0.013877833, 0.02955235, 0.0006239023, -0.035...
3 vintage [-0.05235507, 0.008213689, -0.015532949, 0.002...
4 bed [-0.011677503, 0.023275835, 0.0026937425, -0.0...

def compare_keyword(keyword):
    # Embed the candidate and score it against every existing keyword
    embedded_value = get_embedding(keyword)
    df_keywords['similarity'] = df_keywords['embedding'].apply(
        lambda x: cosine_similarity(np.array(x).reshape(1, -1), np.array(embedded_value).reshape(1, -1))[0][0]
    )
    most_similar = df_keywords.sort_values('similarity', ascending=False).iloc[0]
    return most_similar

def replace_keyword(keyword, threshold = 0.6):
    # Reuse the closest existing keyword when it is similar enough
    most_similar = compare_keyword(keyword)
    if most_similar['similarity'] > threshold:
        print(f"Replacing '{keyword}' with existing keyword: '{most_similar['keyword']}'")
        return most_similar['keyword']
    return keyword

# Example keywords to compare against our list of existing keywords
example_keywords = ['bed frame', 'wooden', 'vintage', 'old school', 'desk', 'table', 'old', 'metal', 'metallic', 'woody']
final_keywords = []

for k in example_keywords:
    final_keywords.append(replace_keyword(k))

final_keywords = set(final_keywords)
print(f"Final keywords: {final_keywords}")

Replacing 'bed frame' with existing keyword: 'bed'
Replacing 'wooden' with existing keyword: 'wood'
Replacing 'vintage' with existing keyword: 'vintage'
Replacing 'metal' with existing keyword: 'metal'
Replacing 'metallic' with existing keyword: 'metal'
Replacing 'woody' with existing keyword: 'wood'
Final keywords: {'table', 'desk', 'bed', 'old', 'vintage', 'metal', 'wood', 'old school'}

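Generate captions

In this section, we'll use GPT-4V to generate an image description, and then use a few-shot examples approach with GPT-4-turbo to turn that description into a caption.

If few-shot examples are not enough for your use case, consider fine-tuning a model so that the generated captions match the style and tone you are targeting.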
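
If you go down the fine-tuning route, the description/caption pairs need to be turned into training data. The sketch below assumes the chat-style JSONL layout used for fine-tuning chat models, with one `{"messages": [...]}` object per line. The helper `to_training_lines`, the sample pairs (descriptions shortened for illustration) and the short system prompt are assumptions, not part of the original notebook.

```python
import json

# Sample pairs (descriptions shortened for illustration)
pairs = [
    {"description": "This is a multi-layer metal shoe rack featuring a free-standing design with a clean, white finish.",
     "caption": "White metal free-standing shoe rack"},
    {"description": "This is a square plant repotting mat made from a waterproof material in a vibrant green color.",
     "caption": "Waterproof square plant repotting mat"},
]

def to_training_lines(pairs, system_prompt):
    """Turn description/caption pairs into one JSON string per training example."""
    lines = []
    for pair in pairs:
        example = {"messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": pair["description"]},
            {"role": "assistant", "content": pair["caption"]},
        ]}
        lines.append(json.dumps(example))
    return lines

training_lines = to_training_lines(pairs, "Generate a short caption for the described item.")
print("\n".join(training_lines))
```

Writing these lines to a `.jsonl` file would give a starting point for a training set; a real one would need many more pairs than the few shown here.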

# Cleaning up dataset columns
selected_columns = ['title', 'primary_image', 'style', 'material', 'color', 'url']
df = df[selected_columns].copy()
df.head()

title primary_image style material color url
0 GOYMFK 1pc Free Standing Shoe Rack, Multi-laye... https://m.media-amazon.com/images/I/416WaLx10j... Modern Metal White https://www.amazon.com/dp/B0CJHKVG6P
1 subrtex Leather ding Room, Dining Chairs Set o... https://m.media-amazon.com/images/I/31SejUEWY7... Black Rubber Wood Sponge Black https://www.amazon.com/dp/B0B66QHB23
2 Plant Repotting Mat MUYETOL Waterproof Transpl... https://m.media-amazon.com/images/I/41RgefVq70... Modern Polyethylene Green https://www.amazon.com/dp/B0BXRTWLYK
3 Pickleball Doormat, Welcome Doormat Absorbent ... https://m.media-amazon.com/images/I/61vz1Igler... Modern Rubber A5589 https://www.amazon.com/dp/B0C1MRB2M8
4 JOIN IRON Foldable TV Trays for Eating Set of ... https://m.media-amazon.com/images/I/41p4d4VJnN... X Classic Style Iron Grey Set of 4 https://www.amazon.com/dp/B0CG1N9QRC

Describing images with GPT-4V

describe_system_prompt = '''
You are a system generating descriptions for furniture items, decorative items, or furnishings on an e-commerce website.
Provided with an image and a title, you will describe the main item that you see in the image, giving details but staying concise.
You can describe unambiguously what the item is and its material, color, and style if clearly identifiable.
If there are multiple items depicted, refer to the title to understand which item you should describe.
'''

def describe_image(img_url, title):
    # Ask the model for a concise description of the item named in the title
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        temperature=0.2,
        messages=[
            {
                "role": "system",
                "content": describe_system_prompt
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        # The image is passed by URL, wrapped in an object with a "url" key
                        "image_url": {"url": img_url},
                    },
                ],
            },
            {
                "role": "user",
                "content": title
            }
        ],
        max_tokens=300,
    )

    return response.choices[0].message.content

Testing on a few examples

for index, row in examples.iterrows():
    print(f"{row['title'][:50]}{'...' if len(row['title']) > 50 else ''} - {row['url']} :\n")
    img_description = describe_image(row['primary_image'], row['title'])
    print(f"{img_description}\n--------------------------\n")

GOYMFK 1pc Free Standing Shoe Rack, Multi-layer Me... - https://www.amazon.com/dp/B0CJHKVG6P :

This is a free-standing shoe rack featuring a multi-layer design, constructed from metal for durability. The rack is finished in a clean white color, which gives it a modern and versatile look, suitable for various home decor styles. It includes several horizontal shelves dedicated to organizing shoes, providing ample space for multiple pairs.

Additionally, the rack is equipped with 8 double hooks, which are integrated into the frame above the shoe shelves. These hooks offer extra functionality, allowing for the hanging of accessories such as hats, scarves, or bags. The design is space-efficient and ideal for placement in living rooms, bathrooms, hallways, or entryways, where it can serve as a practical storage solution while contributing to the tidiness and aesthetic of the space.
--------------------------

subrtex Leather ding Room, Dining Chairs Set of 2,... - https://www.amazon.com/dp/B0B66QHB23 :

This image showcases a set of two dining chairs. The chairs are upholstered in black leather, featuring a sleek and modern design. They have a high back with subtle stitching details that create vertical lines, adding an element of elegance to the overall appearance. The legs of the chairs are also black, maintaining a consistent color scheme and enhancing the sophisticated look. These chairs would make a stylish addition to any contemporary dining room setting.
--------------------------

Plant Repotting Mat MUYETOL Waterproof Transplanti... - https://www.amazon.com/dp/B0BXRTWLYK :

This is a square plant repotting mat designed for indoor gardening activities such as transplanting or changing soil for plants. The mat measures 26.8 inches by 26.8 inches, providing ample space for gardening tasks. It is made from a waterproof material, which is likely a durable, easy-to-clean fabric, in a vibrant green color. The edges of the mat are raised with corner fastenings to keep soil and water contained, making the workspace tidy and preventing messes. The mat is also foldable, which allows for convenient storage when not in use. This mat is suitable for a variety of gardening tasks, including working with succulents and other small plants, and it can be a practical gift for garden enthusiasts.
--------------------------

Pickleball Doormat, Welcome Doormat Absorbent Non-... - https://www.amazon.com/dp/B0C1MRB2M8 :

This is a rectangular doormat featuring a playful design that caters to pickleball enthusiasts. The mat's background is a natural coir color, which is a fibrous material made from coconut husks, known for its durability and excellent scraping properties. Emblazoned across the mat in bold, black letters is the phrase "it's a good day to play PICKLEBALL," with the word "PICKLEBALL" being prominently displayed in larger font size. Below the text, there are two crossed pickleball paddles in black, symbolizing the sport.

The doormat measures approximately 16x24 inches, making it a suitable size for a variety of entryways. Its design suggests that it has an absorbent quality, which would be useful for wiping shoes and preventing dirt from entering the home. Additionally, the description implies that the doormat has a non-slip feature, which is likely due to a backing material that helps keep the mat in place on various floor surfaces. This mat would be a welcoming addition to the home of any pickleball player or sports enthusiast, offering both functionality and a touch of personal interest.
--------------------------

JOIN IRON Foldable TV Trays for Eating Set of 4 wi... - https://www.amazon.com/dp/B0CG1N9QRC :

This image features a set of four foldable TV trays with a stand, designed for eating or as snack tables. The tables are presented in a sleek grey finish, which gives them a modern and versatile look, suitable for a variety of home decor styles. Each tray table has a rectangular top with a wood grain texture, supported by a sturdy black iron frame that folds easily for compact storage. The accompanying stand allows for neat organization and easy access when the tables are not in use. These tables are ideal for small spaces where multifunctional furniture is essential, offering a convenient surface for meals, work, or leisure activities.
--------------------------

Turning descriptions into captions

Using a few-shot examples approach to turn a long description into a short image caption

caption_system_prompt = '''
Your goal is to generate short, descriptive captions for images of furniture items, decorative items, or furnishings based on an image description.
You will be provided with a description of an item image and you will output a caption that captures the most important information about the item.
Your generated caption should be short (1 sentence), and include the most relevant information about the item.
The most important information could be: the type of the item, the style (if mentioned), the material if especially relevant and any distinctive features.
'''

few_shot_examples = [
{
"description": "This is a multi-layer metal shoe rack featuring a free-standing design. It has a clean, white finish that gives it a modern and versatile look, suitable for various home decors. The rack includes several horizontal shelves dedicated to organizing shoes, providing ample space for multiple pairs. Above the shoe storage area, there are 8 double hooks arranged in two rows, offering additional functionality for hanging items such as hats, scarves, or bags. The overall structure is sleek and space-saving, making it an ideal choice for placement in living rooms, bathrooms, hallways, or entryways where efficient use of space is essential.",
"caption": "White metal free-standing shoe rack"
},
{
"description": "The image shows a set of two dining chairs in black. These chairs are upholstered in a leather-like material, giving them a sleek and sophisticated appearance. The design features straight lines with a slight curve at the top of the high backrest, which adds a touch of elegance. The chairs have a simple, vertical stitching detail on the backrest, providing a subtle decorative element. The legs are also black, creating a uniform look that would complement a contemporary dining room setting. The chairs appear to be designed for comfort and style, suitable for both casual and formal dining environments.",
"caption": "Set of 2 modern black leather dining chairs"
},
{
"description": "This is a square plant repotting mat designed for indoor gardening tasks such as transplanting and changing soil for plants. It measures 26.8 inches by 26.8 inches and is made from a waterproof material, which appears to be a durable, easy-to-clean fabric in a vibrant green color. The edges of the mat are raised with integrated corner loops, likely to keep soil and water contained during gardening activities. The mat is foldable, enhancing its portability, and can be used as a protective surface for various gardening projects, including working with succulents. It's a practical accessory for garden enthusiasts and makes for a thoughtful gift for those who enjoy indoor plant care.",
"caption": "Waterproof square plant repotting mat"
}
]

formatted_examples = [[{
"role": "user",
"content": ex['description']
},
{
"role": "assistant",
"content": ex['caption']
}]
for ex in few_shot_examples
]

formatted_examples = [i for ex in formatted_examples for i in ex]

def caption_image(description, model="gpt-4-turbo-preview"):
    # Build a fresh message list on every call so the shared few-shot examples are not mutated
    messages = [
        {
            "role": "system",
            "content": caption_system_prompt
        },
        *formatted_examples,
        {
            "role": "user",
            "content": description
        }
    ]
    response = client.chat.completions.create(
        model=model,
        temperature=0.2,
        messages=messages
    )

    return response.choices[0].message.content

Testing on a few examples

examples = df.iloc[5:8]

for index, row in examples.iterrows():
    print(f"{row['title'][:50]}{'...' if len(row['title']) > 50 else ''} - {row['url']} :\n")
    img_description = describe_image(row['primary_image'], row['title'])
    print(f"{img_description}\n--------------------------\n")
    img_caption = caption_image(img_description)
    print(f"{img_caption}\n--------------------------\n")

LOVMOR 30'' Bathroom Vanity Sink Base Cabine, Stor... - https://www.amazon.com/dp/B0C9WYYFLB :

This is a LOVMOR 30'' Bathroom Vanity Sink Base Cabinet featuring a classic design with a rich brown finish. The cabinet is designed to provide ample storage with three drawers on the left side, offering organized space for bathroom essentials. The drawers are likely to have smooth glides for easy operation. Below the drawers, there is a large cabinet door that opens to reveal additional storage space, suitable for larger items. The paneling on the drawers and door features a raised, framed design, adding a touch of elegance to the overall appearance. This vanity base is versatile and can be used not only in bathrooms but also in kitchens, laundry rooms, and other areas where extra storage is needed. The construction material is not specified, but it appears to be made of wood or a wood-like composite. Please note that the countertop and sink are not included and would need to be purchased separately.
--------------------------

LOVMOR 30'' classic brown bathroom vanity base cabinet with three drawers and additional storage space.
--------------------------

Folews Bathroom Organizer Over The Toilet Storage,... - https://www.amazon.com/dp/B09NZY3R1T :

This is a 4-tier bathroom organizer designed to fit over a standard toilet, providing a space-saving storage solution. The unit is constructed with a sturdy metal frame in a black finish, which offers both durability and a sleek, modern look. The design includes four shelves that offer ample space for bathroom essentials, towels, and decorative items. Two of the shelves are designed with a metal grid pattern, while the other two feature a solid metal surface for stable storage.

Additionally, the organizer includes adjustable baskets, which can be positioned according to your storage needs, allowing for customization and flexibility. The freestanding structure is engineered to maximize the unused vertical space above the toilet, making it an ideal choice for small bathrooms or for those looking to declutter countertops and cabinets.

The overall design is minimalist and functional, with clean lines that can complement a variety of bathroom decors. The open shelving concept ensures that items are easily accessible and visible. Installation is typically straightforward, with no need for wall mounting, making it a convenient option for renters or those who prefer not to drill into walls.
--------------------------

Modern 4-tier black metal bathroom organizer with adjustable shelves and baskets, designed to fit over a standard toilet for space-saving storage.
--------------------------

GOYMFK 1pc Free Standing Shoe Rack, Multi-layer Me... - https://www.amazon.com/dp/B0CJHKVG6P :

This is a multi-functional free-standing shoe rack featuring a sturdy metal construction with a white finish. It is designed with multiple layers, providing ample space to organize and store shoes. The rack includes four tiers dedicated to shoe storage, each tier capable of holding several pairs of shoes.

Above the shoe storage area, there is an additional shelf that can be used for placing bags, small decorative items, or other accessories. At the top, the rack is equipped with 8 double hooks, which are ideal for hanging hats, scarves, coats, or umbrellas, making it a versatile piece for an entryway, living room, bathroom, or hallway.

The overall design is sleek and modern, with clean lines that would complement a variety of home decor styles. The vertical structure of the rack makes it a space-saving solution for keeping footwear and accessories organized in areas with limited floor space.
--------------------------

Multi-layer white metal shoe rack with additional shelf and 8 double hooks for versatile storage in entryways or hallways.
--------------------------

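Image search

In this section, we will use the generated keywords and captions to search for items that match a given input, either text or image.

We will use our embeddings model to generate embeddings for the keywords and captions, and compare them to either the input text or the caption generated from an input image.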

# DataFrame we'll use to compare keywords
df_keywords = pd.DataFrame(columns=['keyword', 'embedding'])
df['keywords'] = ''
df['img_description'] = ''
df['caption'] = ''

# Function to replace a keyword with an existing keyword if it's too similar
def get_keyword(keyword, df_keywords, threshold = 0.6):
    embedded_value = get_embedding(keyword)
    if len(df_keywords) > 0:
        similarities = df_keywords['embedding'].apply(
            lambda x: cosine_similarity(np.array(x).reshape(1, -1), np.array(embedded_value).reshape(1, -1))[0][0]
        )
        best_index = similarities.idxmax()
        if similarities[best_index] > threshold:
            most_similar = df_keywords.loc[best_index, 'keyword']
            print(f"Replacing '{keyword}' with existing keyword: '{most_similar}'")
            return most_similar
    # Record the new keyword in place: reassigning df_keywords here would only rebind
    # the local name and leave the caller's DataFrame empty
    df_keywords.loc[len(df_keywords)] = [keyword, embedded_value]
    return keyword

Preparing the dataset

import ast

def tag_and_caption(row):
    keywords = analyze_image(row['primary_image'], row['title'])
    try:
        keywords = ast.literal_eval(keywords)
        mapped_keywords = [get_keyword(k, df_keywords) for k in keywords]
    except Exception as e:
        print(f"Error parsing keywords: {keywords}")
        mapped_keywords = []
    img_description = describe_image(row['primary_image'], row['title'])
    caption = caption_image(img_description)
    return {
        'keywords': mapped_keywords,
        'img_description': img_description,
        'caption': caption
    }

df.shape

(312, 9)

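Processing all 312 rows of the dataset will take a while. To test out the idea, we will only run it on the first 50 rows, which takes about 20 minutes. Feel free to skip this step and load the already processed dataset (see below).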
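
A run this long makes many API calls, and any one of them may fail temporarily. As a minimal sketch, a retry wrapper with exponential backoff could look like the following; `with_retries` and its parameters are hypothetical and not part of the original notebook, and the `flaky` function only stands in for a real API call.

```python
import time

def with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception, wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as error:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({error}); retrying in {delay:.2f}s")
            sleep(delay)

# Demo with a stand-in function that fails twice before succeeding
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
print(result)
```

In the processing loop, a call such as `with_retries(lambda: tag_and_caption(row))` would be one way to apply it.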

# Running on the first 50 rows
for index, row in df[:50].iterrows():
    print(f"{index} - {row['title'][:50]}{'...' if len(row['title']) > 50 else ''}")
    updates = tag_and_caption(row)
    df.loc[index, updates.keys()] = updates.values()

df.head()

title primary_image style material color url keywords img_description caption
0 GOYMFK 1pc Free Standing Shoe Rack, Multi-laye... https://m.media-amazon.com/images/I/416WaLx10j... Modern Metal White https://www.amazon.com/dp/B0CJHKVG6P [shoe rack, free standing, multi-layer, metal,... This is a free-standing shoe rack featuring a ... White metal free-standing shoe rack with multi...
1 subrtex Leather ding Room, Dining Chairs Set o... https://m.media-amazon.com/images/I/31SejUEWY7... Black Rubber Wood Sponge Black https://www.amazon.com/dp/B0B66QHB23 [dining chairs, set of 2, leather, black] This image features a set of two black dining ... Set of 2 sleek black faux leather dining chair...
2 Plant Repotting Mat MUYETOL Waterproof Transpl... https://m.media-amazon.com/images/I/41RgefVq70... Modern Polyethylene Green https://www.amazon.com/dp/B0BXRTWLYK [plant repotting mat, waterproof, portable, fo... This is a square plant repotting mat designed ... Waterproof green square plant repotting mat
3 Pickleball Doormat, Welcome Doormat Absorbent ... https://m.media-amazon.com/images/I/61vz1Igler... Modern Rubber A5589 https://www.amazon.com/dp/B0C1MRB2M8 [doormat, absorbent, non-slip, brown] This is a rectangular doormat featuring a play... Pickleball-themed coir doormat with playful de...
4 JOIN IRON Foldable TV Trays for Eating Set of ... https://m.media-amazon.com/images/I/41p4d4VJnN... X Classic Style Iron Grey Set of 4 https://www.amazon.com/dp/B0CG1N9QRC [tv tray table set, foldable, iron, grey] This image showcases a set of two foldable TV ... Set of two foldable TV trays with grey wood gr...

# Saving locally for later use
data_path = "data/items_tagged_and_captioned.csv"
df.to_csv(data_path, index=False)

# Optional: load the data from the saved file
df = pd.read_csv(data_path)

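Embedding captions and keywords

We can now use the generated captions and keywords to match relevant content to an input text query or caption. To do this, we will embed a combination of keywords + caption. Note: creating the embeddings takes about 3 minutes to run. Feel free to load the pre-processed dataset instead (see below).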

df_search = df.copy()

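from ast import literal_eval

def embed_tags_caption(x):
    caption = x['caption']
    # Unprocessed rows have an empty caption ('' in memory, NaN after a CSV reload)
    if not isinstance(caption, str) or caption == '':
        return None
    keywords = x['keywords']
    # After a CSV reload the keyword list comes back as its string representation
    if isinstance(keywords, str):
        keywords = literal_eval(keywords) if keywords else []
    keywords_string = ",".join(keywords) + '\n'
    content = keywords_string + caption
    return get_embedding(content)

df_search['embedding'] = df_search.apply(lambda x: embed_tags_caption(x), axis=1)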

df_search.head()

title primary_image style material color url keywords img_description caption embedding
0 GOYMFK 1pc Free Standing Shoe Rack, Multi-laye... https://m.media-amazon.com/images/I/416WaLx10j... Modern Metal White https://www.amazon.com/dp/B0CJHKVG6P ['shoe rack', 'free standing', 'multi-layer', ... This is a free-standing shoe rack featuring a ... White metal free-standing shoe rack with multi... [-0.06596625, -0.026769113, -0.013789515, -0.0...
1 subrtex Leather ding Room, Dining Chairs Set o... https://m.media-amazon.com/images/I/31SejUEWY7... Black Rubber Wood Sponge Black https://www.amazon.com/dp/B0B66QHB23 ['dining chairs', 'set of 2', 'leather', 'black'] This image features a set of two black dining ... Set of 2 sleek black faux leather dining chair... [-0.0077859573, -0.010376813, -0.01928079, -0....
2 Plant Repotting Mat MUYETOL Waterproof Transpl... https://m.media-amazon.com/images/I/41RgefVq70... Modern Polyethylene Green https://www.amazon.com/dp/B0BXRTWLYK ['plant repotting mat', 'waterproof', 'portabl... This is a square plant repotting mat designed ... Waterproof green square plant repotting mat [-0.023248248, 0.005370147, -0.0048999498, -0....
3 Pickleball Doormat, Welcome Doormat Absorbent ... https://m.media-amazon.com/images/I/61vz1Igler... Modern Rubber A5589 https://www.amazon.com/dp/B0C1MRB2M8 ['doormat', 'absorbent', 'non-slip', 'brown'] This is a rectangular doormat featuring a play... Pickleball-themed coir doormat with playful de... [-0.028953036, -0.026369056, -0.011363288, 0.0...
4 JOIN IRON Foldable TV Trays for Eating Set of ... https://m.media-amazon.com/images/I/41p4d4VJnN... X Classic Style Iron Grey Set of 4 https://www.amazon.com/dp/B0CG1N9QRC ['tv tray table set', 'foldable', 'iron', 'grey'] This image showcases a set of two foldable TV ... Set of two foldable TV trays with grey wood gr... [-0.030723095, -0.0051356032, -0.027088132, 0....

# Keep only the rows that have an embedding
df_search = df_search.dropna(subset=['embedding'])
print(df_search.shape)

(49, 10)

# Saving locally for later use
data_embeddings_path = "data/items_tagged_and_captioned_embeddings.csv"
df_search.to_csv(data_embeddings_path, index=False)

# Optional: load the data from the saved file
from ast import literal_eval
df_search = pd.read_csv(data_embeddings_path)
df_search["embedding"] = df_search.embedding.apply(literal_eval).apply(np.array)
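
The `literal_eval` step above is needed because a CSV file stores everything as text, so list-valued cells come back as strings. The following self-contained sketch shows the round trip on a tiny made-up frame, using an in-memory buffer instead of a file; the frame and its values are illustrative only.

```python
import io
from ast import literal_eval

import pandas as pd

# A tiny frame with a list-valued column, similar in shape to the 'keywords' column
frame = pd.DataFrame({"caption": ["White metal shoe rack"],
                      "keywords": [["shoe rack", "metal", "white"]]})

buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# The list comes back as its string representation
print(type(reloaded.loc[0, "keywords"]))

# literal_eval turns the string back into a real list
reloaded["keywords"] = reloaded["keywords"].apply(literal_eval)
print(reloaded.loc[0, "keywords"])
```

The same applies to any other list column that is saved and reloaded this way, such as the keywords.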

Search from input text

We can compare the input text from a user directly to the embeddings we just created.

# Find the N most similar results
def search_from_input_text(query, n = 2):
    embedded_value = get_embedding(query)
    df_search['similarity'] = df_search['embedding'].apply(lambda x: cosine_similarity(np.array(x).reshape(1,-1), np.array(embedded_value).reshape(1, -1)))
    most_similar = df_search.sort_values('similarity', ascending=False).iloc[:n]
    return most_similar
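
For larger catalogues, scoring one row at a time can become slow. As a sketch of an alternative, assuming all item embeddings fit in memory as one NumPy matrix, the similarities can be computed with a single matrix-vector product. The helper `top_n` and the 2-D vectors are made up for illustration and are not part of the original notebook.

```python
import numpy as np

def top_n(query_vector, item_vectors, n=2):
    """Return (indices, scores) of the n rows of item_vectors most similar to query_vector."""
    items = np.asarray(item_vectors, dtype=float)
    query = np.asarray(query_vector, dtype=float)
    # Normalise so that a dot product equals cosine similarity
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = items @ query
    order = np.argsort(-scores)[:n]
    return order, scores[order]

# Made-up 2-D vectors standing in for real embeddings
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
indices, scores = top_n([0.9, 0.1], vectors, n=2)
print(indices, scores.round(2))
```

The returned indices are row positions, so they could be used with `df_search.iloc[...]` to fetch the matching items.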

user_inputs = ['shoe storage', 'black metal side table', 'doormat', 'step bookshelf', 'ottoman']

for i in user_inputs:
    print(f"Input: {i}\n")
    res = search_from_input_text(i)
    for index, row in res.iterrows():
        similarity_score = row['similarity']
        if isinstance(similarity_score, np.ndarray):
            similarity_score = similarity_score[0][0]
        print(f"{row['title'][:50]}{'...' if len(row['title']) > 50 else ''} ({row['url']}) - Similarity: {similarity_score:.2f}")
        img = Image(url=row['primary_image'])
        display(img)
        print("\n\n")

Input: shoe storage

GOYMFK 1pc Free Standing Shoe Rack, Multi-layer Me... (https://www.amazon.com/dp/B0CJHKVG6P) - Similarity: 0.62



GOYMFK 1pc Free Standing Shoe Rack, Multi-layer Me... (https://www.amazon.com/dp/B0CJHKVG6P) - Similarity: 0.57



Input: black metal side table

FLYJOE Narrow Side Table with PU Leather Magazine ... (https://www.amazon.com/dp/B0CHYDTQKN) - Similarity: 0.59



HomePop Metal Accent Table Triangle Base Round Mir... (https://www.amazon.com/dp/B08N5H868H) - Similarity: 0.57



Input: doormat

Pickleball Doormat, Welcome Doormat Absorbent Non-... (https://www.amazon.com/dp/B0C1MRB2M8) - Similarity: 0.59



Caroline's Treasures PPD3013JMAT Enchanted Garden ... (https://www.amazon.com/dp/B08Q5KDSQK) - Similarity: 0.57



Input: step bookshelf

Leick Home 70007-WTGD Mixed Metal and Wood Stepped... (https://www.amazon.com/dp/B098KNRNLQ) - Similarity: 0.61



Wildkin Kids Canvas Sling Bookshelf with Storage f... (https://www.amazon.com/dp/B07GBVFZ1Y) - Similarity: 0.47



Input: ottoman

HomePop Home Decor | K2380-YDQY-2 | Luxury Large F... (https://www.amazon.com/dp/B0B94T1TZ1) - Similarity: 0.53



Moroccan Leather Pouf Ottoman for Living Room - Ro... (https://www.amazon.com/dp/B0CP45784G) - Similarity: 0.51


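Search from image

If the input is an image, we can find similar images by first turning the image into a caption, then embedding that caption and comparing it to the embeddings we already created.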

# We'll take a mix of images: some we haven't seen and some that are already in the dataset
example_images = df.iloc[306:]['primary_image'].to_list() + df.iloc[5:10]['primary_image'].to_list()

for i in example_images:
    img_description = describe_image(i, '')
    caption = caption_image(img_description)
    img = Image(url=i)
    print('Input: \n')
    display(img)
    res = search_from_input_text(caption, 1).iloc[0]
    similarity_score = res['similarity']
    if isinstance(similarity_score, np.ndarray):
        similarity_score = similarity_score[0][0]
    print(f"{res['title'][:50]}{'...' if len(res['title']) > 50 else ''} ({res['url']}) - Similarity: {similarity_score:.2f}")
    img_res = Image(url=res['primary_image'])
    display(img_res)
    print("\n\n")


Input: 
Mimoglad Office Chair, High Back Ergonomic Desk Ch... (https://www.amazon.com/dp/B0C2YQZS69) - Similarity: 0.63



Input:
CangLong Mid Century Modern Side Chair with Wood L... (https://www.amazon.com/dp/B08RTLBD1T) - Similarity: 0.51



Input:
MAEPA RV Shoe Storage for Bedside - 8 Extra Large ... (https://www.amazon.com/dp/B0C4PL1R3F) - Similarity: 0.61



Input:
Chief Mfg.Swing-Arm Wall Mount Hardware Mount Blac... (https://www.amazon.com/dp/B007E40Z5K) - Similarity: 0.63



Input:
HomePop Home Decor | K2380-YDQY-2 | Luxury Large F... (https://www.amazon.com/dp/B0B94T1TZ1) - Similarity: 0.63



Input:
CangLong Mid Century Modern Side Chair with Wood L... (https://www.amazon.com/dp/B08RTLBD1T) - Similarity: 0.58



Input:
LOVMOR 30'' Bathroom Vanity Sink Base Cabine, Stor... (https://www.amazon.com/dp/B0C9WYYFLB) - Similarity: 0.69



Input:
Folews Bathroom Organizer Over The Toilet Storage,... (https://www.amazon.com/dp/B09NZY3R1T) - Similarity: 0.82



Input:
GOYMFK 1pc Free Standing Shoe Rack, Multi-layer Me... (https://www.amazon.com/dp/B0CJHKVG6P) - Similarity: 0.69



Input:
subrtex Leather ding Room, Dining Chairs Set of 2,... (https://www.amazon.com/dp/B0B66QHB23) - Similarity: 0.87



Input:
Plant Repotting Mat MUYETOL Waterproof Transplanti... (https://www.amazon.com/dp/B0BXRTWLYK) - Similarity: 0.69


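Wrapping up

In this notebook, we explored how to use the multimodal capabilities of GPT-4V to tag and caption images. By providing the model with images along with contextual information, we were able to generate tags and descriptions, which can then be refined with a language model such as GPT-4-turbo to create captions. This process has practical applications in many scenarios, particularly for enhancing search.

The search use case illustrated here can be applied directly to applications such as recommendation systems, and the techniques covered in this notebook extend beyond item search to other use cases, for example RAG applications that draw on unstructured image data.

As a next step, you could explore combining rule-based filtering on keywords with embedding search on captions to retrieve more relevant results.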
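
As a rough sketch of that next step, the example below first applies a rule-based filter on keywords and then ranks the remaining items by cosine similarity. The catalogue entries, the 2-D vectors standing in for embeddings, and the helper `hybrid_search` are all made up for illustration.

```python
import numpy as np

# Made-up catalogue entries; the 2-D vectors stand in for real embeddings
catalogue = [
    {"title": "White metal shoe rack", "keywords": {"shoe rack", "metal", "white"}, "embedding": [0.9, 0.1]},
    {"title": "Wooden shoe bench", "keywords": {"shoe rack", "wood"}, "embedding": [0.8, 0.3]},
    {"title": "Black metal side table", "keywords": {"side table", "metal", "black"}, "embedding": [0.2, 0.9]},
]

def hybrid_search(query_vector, required_keywords, items, n=2):
    """Keep items that carry every required keyword, then rank them by cosine similarity."""
    query = np.asarray(query_vector, dtype=float)
    results = []
    for item in items:
        if not set(required_keywords) <= item["keywords"]:
            continue  # rule-based filter: skip items missing a required keyword
        vector = np.asarray(item["embedding"], dtype=float)
        score = float(query @ vector / (np.linalg.norm(query) * np.linalg.norm(vector)))
        results.append((item["title"], score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)[:n]

hits = hybrid_search([1.0, 0.0], {"metal"}, catalogue)
print(hits)
```

Filtering first keeps items that fail a hard constraint out of the results, while the embedding score orders whatever remains.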