Using Weaviate for Embeddings Search

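This notebook takes you through a simple flow: download some data, embed it, and then index and search it using the Weaviate vector database. This is a common requirement for customers who want to store and search our embeddings in a secure environment to support production use cases such as chatbots, topic modelling and more.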

What is a Vector Database

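A vector database is a database built to store, manage and search embedding vectors. The use of embeddings to encode unstructured data (text, audio, video and more) as vectors for consumption by machine-learning models has grown rapidly in recent years, because AI has become increasingly effective at solving use cases involving natural language, image recognition and other unstructured forms of data. Vector databases have emerged as an effective way for enterprises to deliver and scale these use cases.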
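
To make "searching embedding vectors" concrete, here is a minimal sketch of the operation a vector database performs: ranking stored vectors by similarity to a query vector. The three-dimensional vectors and the helper names `cosine_similarity` and `nearest` are made up for illustration, and the brute-force scan is an assumption chosen for clarity; production systems typically rely on approximate indexes instead.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, store, top_k=2):
    """Rank every stored vector against the query (brute force)."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in store.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:top_k]]

# Made-up 3-dimensional "embeddings"; real embeddings have far more dimensions.
toy_store = {
    "April": [0.9, 0.1, 0.0],
    "August": [0.8, 0.2, 0.1],
    "Art": [0.0, 0.1, 0.9],
}

print(nearest([1.0, 0.0, 0.0], toy_store))
```

The query points in almost the same direction as the two month vectors, so they rank ahead of "Art".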

Why use a Vector Database

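Vector databases enable enterprises to take many of the embeddings use cases we've shared in this repo (question answering, chatbots and recommendation services, for example) and run them in a secure, scalable environment. Many of our customers use embeddings to solve their problems at small scale, but performance and security hold them back from going into production. We see vector databases as a key component in solving that. In this guide we'll walk through the basics of embedding text data, storing it in a vector database and using it for semantic search.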

Demo Flow

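The demo flow is:
- Setup: Import packages and set any required variables
- Load data: Load a dataset that has been embedded using OpenAI embeddings
- Weaviate
  - Setup: Here we'll set up the Python client for Weaviate. For more details, see the Weaviate Python client documentation
  - Index Data: We'll create an index with __title__ search vectors in it
  - Search Data: We'll run a few searches to confirm it works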

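Once you've run through this notebook you should have a basic understanding of how to set up and use a vector database, and can move on to more complex use cases that make use of our embeddings.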

Setup

Import the required libraries and set the embedding model that we'd like to use.

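# We'll need to install the Weaviate client
# (the code below uses the v3-style client API, weaviate.Client)
!pip install weaviate-client

# Install wget to pull the zip file
!pip install wget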

import openai

from typing import List, Iterator
import pandas as pd
import numpy as np
import os
import wget
from ast import literal_eval

# Weaviate's client library for Python
import weaviate

# Embedding model used for queries. You can change it to the model of your choice,
# but it should match the model that produced the stored vectors.
EMBEDDING_MODEL = "text-embedding-3-small"

# Ignore unclosed SSL socket warnings (optional, in case you see these messages)
import warnings

warnings.filterwarnings(action="ignore", message="unclosed", category=ResourceWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)

Load data

In this section we'll load embedded data that we prepared ahead of this session.

embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip'

# The file is ~700 MB, so this will take some time
wget.download(embeddings_url)

import zipfile
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip","r") as zip_ref:
    zip_ref.extractall("../data")

article_df = pd.read_csv('../data/vector_database_wikipedia_articles_embedded.csv')

article_df.head()

| | id | url | title | text | title_vector | content_vector | vector_id |
|---|---|---|---|---|---|---|---|
| 0 | 1 | https://simple.wikipedia.org/wiki/April | April | April is the fourth month of the year in the J... | [0.001009464613161981, -0.020700545981526375, ... | [-0.011253940872848034, -0.013491976074874401,... | 0 |
| 1 | 2 | https://simple.wikipedia.org/wiki/August | August | August (Aug.) is the eighth month of the year ... | [0.0009286514250561595, 0.000820168002974242, ... | [0.0003609954728744924, 0.007262262050062418, ... | 1 |
| 2 | 6 | https://simple.wikipedia.org/wiki/Art | Art | Art is a creative activity that expresses imag... | [0.003393713850528002, 0.0061537534929811954, ... | [-0.004959689453244209, 0.015772193670272827, ... | 2 |
| 3 | 8 | https://simple.wikipedia.org/wiki/A | A | A or a is the first letter of the English alph... | [0.0153952119871974, -0.013759135268628597, 0.... | [0.024894846603274345, -0.022186409682035446, ... | 3 |
| 4 | 9 | https://simple.wikipedia.org/wiki/Air | Air | Air refers to the Earth's atmosphere. Air is a... | [0.02224554680287838, -0.02044147066771984, -0... | [0.021524671465158463, 0.018522677943110466, -... | 4 |
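
Because the archive is large, it can help to skip the download when the file is already on disk. The sketch below is one way to do that; `download_if_missing` and `fake_downloader` are hypothetical helpers, not part of the notebook, and the example URL is a placeholder.

```python
import os
import tempfile

def download_if_missing(url, local_path, downloader):
    """Call downloader(url, local_path) only when local_path is absent.

    Returns True if a download was triggered, False otherwise.
    """
    if os.path.exists(local_path):
        return False
    downloader(url, local_path)
    return True

# Stand-in downloader for illustration. In the notebook this role could be
# played by something like `lambda url, path: wget.download(url, out=path)`.
def fake_downloader(url, local_path):
    with open(local_path, "w") as handle:
        handle.write("placeholder")

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "articles.zip")
    first = download_if_missing("https://example.com/articles.zip", target, fake_downloader)
    second = download_if_missing("https://example.com/articles.zip", target, fake_downloader)
    print(first, second)
```

The second call finds the file created by the first and does nothing.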

# Read vectors from strings back into a list
article_df['title_vector'] = article_df.title_vector.apply(literal_eval)
article_df['content_vector'] = article_df.content_vector.apply(literal_eval)

# Set `vector_id` to be a string
article_df['vector_id'] = article_df['vector_id'].apply(str)

article_df.info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 25000 non-null int64
1 url 25000 non-null object
2 title 25000 non-null object
3 text 25000 non-null object
4 title_vector 25000 non-null object
5 content_vector 25000 non-null object
6 vector_id 25000 non-null object
dtypes: int64(1), object(6)
memory usage: 1.3+ MB
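
The CSV stores each vector as text, which is why the columns are passed through `literal_eval` before use. A minimal illustration of that conversion, using a shortened, made-up cell value rather than a real embedding:

```python
from ast import literal_eval

# A shortened, made-up stand-in for one cell of the `title_vector` column.
raw_cell = "[0.001, -0.0207, 0.0154]"

# literal_eval safely parses the Python literal in the string into a real list.
vector = literal_eval(raw_cell)

print(type(raw_cell).__name__, "->", type(vector).__name__)
print(len(vector), vector[1])
```

Unlike `eval`, `literal_eval` only accepts literals such as numbers, strings and lists, so it does not execute arbitrary code found in the file.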

Weaviate

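The vector database option we'll explore here is Weaviate, which offers both a managed SaaS option and a self-hosted open-source option. In this notebook we'll try the self-hosted option.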

For this we will:
- Deploy Weaviate locally
- Create an index in Weaviate
- Store our data there
- Fire some similarity search queries
- Try a real use case

Bring your own vectors approach

In this cookbook, we provide data with vectors that have already been generated. This is a good approach when your data is already vectorized.

Automated vectorization with OpenAI module

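When your data is not yet vectorized, you can delegate the vectorization to Weaviate, which calls OpenAI on your behalf. Weaviate offers a built-in module, text2vec-openai, which takes care of vectorization for you at:
* import time
* any CRUD operation
* semantic search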

Check out the Getting Started with Weaviate and OpenAI module cookbook to learn, step by step, how to import and vectorize data in one step.

Setup

To run Weaviate locally, you'll need Docker. Following the instructions in the Weaviate documentation, we created an example docker-compose.yml file in this repo, saved at ./weaviate/docker-compose.yml

After starting Docker, you can start Weaviate locally by navigating to the examples/vector_databases/weaviate/ directory and running docker-compose up -d.

SaaS

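Alternatively, you can use Weaviate Cloud Service (WCS) to create a free Weaviate cluster.
1. Create a free account and/or log in to WCS
2. Create a Weaviate cluster with the following settings:
   * Sandbox: Sandbox Free
   * Weaviate Version: use the default (latest)
   * OIDC Authentication: Disabled
3. Your instance should be ready in a minute or two
4. Make a note of the Cluster Id. The link will take you to the full path of your cluster (you will need it later to connect to it). It should look something like: https://your-project-name-suffix.weaviate.network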

# Option #1 - Self-hosted - Weaviate Open Source
client = weaviate.Client(
    url="http://localhost:8080",
    additional_headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")
    }
)

# Option #2 - SaaS - (Weaviate Cloud Service)
# Run only one of the two options; this one replaces the client created above
client = weaviate.Client(
    url="https://your-wcs-instance-name.weaviate.network",
    additional_headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")
    }
)

client.is_ready()
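
A freshly started container may need a few seconds before it reports ready, so a single `client.is_ready()` call can return `False` or fail to connect. The sketch below polls a readiness callable with a short delay; `wait_until_ready` is a hypothetical helper, and `fake_is_ready` stands in for `client.is_ready` so the example runs without a Weaviate instance.

```python
import time

def wait_until_ready(is_ready, attempts=10, delay_seconds=1.0):
    """Poll is_ready() until it returns True or attempts run out."""
    for attempt in range(attempts):
        try:
            if is_ready():
                return True
        except Exception:
            # Treat errors raised during startup as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False

# Stand-in for `client.is_ready`: reports ready on the third call.
calls = {"count": 0}

def fake_is_ready():
    calls["count"] += 1
    return calls["count"] >= 3

print(wait_until_ready(fake_is_ready, attempts=5, delay_seconds=0.01))
```

With a real client, the call would presumably look like `wait_until_ready(client.is_ready)`.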

Index data

In Weaviate you create __schemas__ to capture each of the entities you will be searching.

In this case we'll create a schema called Article that includes the title vector from above, so that we can search by it.

The next few steps closely follow the documentation that Weaviate provides.

# Clear up the schema so that we can recreate it
# (delete_all removes every class and its objects from this instance)
client.schema.delete_all()
client.schema.get()

# Define the schema: text2vec-openai (model `ada`, version `002`) vectorizes `title`, while `content` is skipped
article_schema = {
    "class": "Article",
    "description": "A collection of articles",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "model": "ada",
            "modelVersion": "002",
            "type": "text"
        }
    },
    "properties": [{
        "name": "title",
        "description": "Title of the article",
        "dataType": ["string"]
    },
    {
        "name": "content",
        "description": "Contents of the article",
        "dataType": ["text"],
        "moduleConfig": { "text2vec-openai": { "skip": True } }
    }]
}

# Add the Article schema
client.schema.create_class(article_schema)

# Get the schema to make sure it worked
client.schema.get()

{'classes': [{'class': 'Article',
'description': 'A collection of articles',
'invertedIndexConfig': {'bm25': {'b': 0.75, 'k1': 1.2},
'cleanupIntervalSeconds': 60,
'stopwords': {'additions': None, 'preset': 'en', 'removals': None}},
'moduleConfig': {'text2vec-openai': {'model': 'ada',
'modelVersion': '002',
'type': 'text',
'vectorizeClassName': True}},
'properties': [{'dataType': ['string'],
'description': 'Title of the article',
'moduleConfig': {'text2vec-openai': {'skip': False,
'vectorizePropertyName': False}},
'name': 'title',
'tokenization': 'word'},
{'dataType': ['text'],
'description': 'Contents of the article',
'moduleConfig': {'text2vec-openai': {'skip': True,
'vectorizePropertyName': False}},
'name': 'content',
'tokenization': 'word'}],
'replicationConfig': {'factor': 1},
'shardingConfig': {'virtualPerPhysical': 128,
'desiredCount': 1,
'actualCount': 1,
'desiredVirtualCount': 128,
'actualVirtualCount': 128,
'key': '_id',
'strategy': 'hash',
'function': 'murmur3'},
'vectorIndexConfig': {'skip': False,
'cleanupIntervalSeconds': 300,
'maxConnections': 64,
'efConstruction': 128,
'ef': -1,
'dynamicEfMin': 100,
'dynamicEfMax': 500,
'dynamicEfFactor': 8,
'vectorCacheMaxObjects': 1000000000000,
'flatSearchCutoff': 40000,
'distance': 'cosine'},
'vectorIndexType': 'hnsw',
'vectorizer': 'text2vec-openai'}]}
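
The per-property `skip` flag decides which fields the text2vec-openai module would vectorize. As a quick way to read that off a schema dictionary, the sketch below lists the properties that are not skipped; `vectorized_properties` is a hypothetical helper and the dictionary is a trimmed copy of the Article schema, so nothing here calls Weaviate.

```python
def vectorized_properties(class_schema, module_name="text2vec-openai"):
    """Return property names whose module config does not set skip=True."""
    names = []
    for prop in class_schema.get("properties", []):
        module_config = prop.get("moduleConfig", {}).get(module_name, {})
        if not module_config.get("skip", False):
            names.append(prop["name"])
    return names

# Trimmed copy of the Article schema defined above.
article_schema_excerpt = {
    "class": "Article",
    "properties": [
        {"name": "title", "dataType": ["string"]},
        {
            "name": "content",
            "dataType": ["text"],
            "moduleConfig": {"text2vec-openai": {"skip": True}},
        },
    ],
}

print(vectorized_properties(article_schema_excerpt))
```

This agrees with the schema returned by Weaviate, where `title` shows `'skip': False` and `content` shows `'skip': True`.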
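
# Step 1 - configure Weaviate batching, which optimizes CRUD operations in bulk
# - starting batch size of 100
# - dynamically increase or decrease based on performance
# - add timeout retries in case something goes wrong

client.batch.configure(
    batch_size=100,
    dynamic=True,
    timeout_retries=3,
)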

<weaviate.batch.crud_batch.Batch at 0x3f0ca0fa0>
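
Conceptually, a batch size of 100 means objects are sent in groups rather than one request per object. The sketch below shows plain fixed-size chunking to illustrate the idea; `chunked` is a hypothetical helper, it does not model the `dynamic=True` resizing, and it is not how the Weaviate client is implemented internally.

```python
from itertools import islice

def chunked(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# 250 made-up items split into groups of at most 100.
batch_sizes = [len(batch) for batch in chunked(range(250), 100)]
print(batch_sizes)
```

The final group is smaller because only 50 items remain after two full batches.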

# Step 2 - import data

print("Uploading data with vectors to Article schema..")

counter = 0

with client.batch as batch:
    for k, v in article_df.iterrows():

        # Print an update message every 100 objects
        if counter % 100 == 0:
            print(f"Import {counter} / {len(article_df)} ")

        properties = {
            "title": v["title"],
            "content": v["text"]
        }

        # Bring our own vector: the precomputed title embedding
        vector = v["title_vector"]

        batch.add_data_object(properties, "Article", None, vector)
        counter = counter + 1

print(f"Importing ({len(article_df)}) Articles complete")

Uploading data with vectors to Article schema..
Import 0 / 25000
Import 100 / 25000
Import 200 / 25000
Import 300 / 25000
Import 400 / 25000
Import 500 / 25000
Import 600 / 25000
Import 700 / 25000
Import 800 / 25000
Import 900 / 25000
Import 1000 / 25000
Import 1100 / 25000
Import 1200 / 25000
Import 1300 / 25000
Import 1400 / 25000
Import 1500 / 25000
Import 1600 / 25000
Import 1700 / 25000
Import 1800 / 25000
Import 1900 / 25000
Import 2000 / 25000
Import 2100 / 25000
Import 2200 / 25000
Import 2300 / 25000
Import 2400 / 25000
Import 2500 / 25000
Import 2600 / 25000
Import 2700 / 25000
Import 2800 / 25000
Import 2900 / 25000
Import 3000 / 25000
Import 3100 / 25000
Import 3200 / 25000
Import 3300 / 25000
Import 3400 / 25000
Import 3500 / 25000
Import 3600 / 25000
Import 3700 / 25000
Import 3800 / 25000
Import 3900 / 25000
Import 4000 / 25000
Import 4100 / 25000
Import 4200 / 25000
Import 4300 / 25000
Import 4400 / 25000
Import 4500 / 25000
Import 4600 / 25000
Import 4700 / 25000
Import 4800 / 25000
Import 4900 / 25000
Import 5000 / 25000
Import 5100 / 25000
Import 5200 / 25000
Import 5300 / 25000
Import 5400 / 25000
Import 5500 / 25000
Import 5600 / 25000
Import 5700 / 25000
Import 5800 / 25000
Import 5900 / 25000
Import 6000 / 25000
Import 6100 / 25000
Import 6200 / 25000
Import 6300 / 25000
Import 6400 / 25000
Import 6500 / 25000
Import 6600 / 25000
Import 6700 / 25000
Import 6800 / 25000
Import 6900 / 25000
Import 7000 / 25000
Import 7100 / 25000
Import 7200 / 25000
Import 7300 / 25000
Import 7400 / 25000
Import 7500 / 25000
Import 7600 / 25000
Import 7700 / 25000
Import 7800 / 25000
Import 7900 / 25000
Import 8000 / 25000
Import 8100 / 25000
Import 8200 / 25000
Import 8300 / 25000
Import 8400 / 25000
Import 8500 / 25000
Import 8600 / 25000
Import 8700 / 25000
Import 8800 / 25000
Import 8900 / 25000
Import 9000 / 25000
Import 9100 / 25000
Import 9200 / 25000
Import 9300 / 25000
Import 9400 / 25000
Import 9500 / 25000
Import 9600 / 25000
Import 9700 / 25000
Import 9800 / 25000
Import 9900 / 25000
Import 10000 / 25000
Import 10100 / 25000
Import 10200 / 25000
Import 10300 / 25000
Import 10400 / 25000
Import 10500 / 25000
Import 10600 / 25000
Import 10700 / 25000
Import 10800 / 25000
Import 10900 / 25000
Import 11000 / 25000
Import 11100 / 25000
Import 11200 / 25000
Import 11300 / 25000
Import 11400 / 25000
Import 11500 / 25000
Import 11600 / 25000
Import 11700 / 25000
Import 11800 / 25000
Import 11900 / 25000
Import 12000 / 25000
Import 12100 / 25000
Import 12200 / 25000
Import 12300 / 25000
Import 12400 / 25000
Import 12500 / 25000
Import 12600 / 25000
Import 12700 / 25000
Import 12800 / 25000
Import 12900 / 25000
Import 13000 / 25000
Import 13100 / 25000
Import 13200 / 25000
Import 13300 / 25000
Import 13400 / 25000
Import 13500 / 25000
Import 13600 / 25000
Import 13700 / 25000
Import 13800 / 25000
Import 13900 / 25000
Import 14000 / 25000
Import 14100 / 25000
Import 14200 / 25000
Import 14300 / 25000
Import 14400 / 25000
Import 14500 / 25000
Import 14600 / 25000
Import 14700 / 25000
Import 14800 / 25000
Import 14900 / 25000
Import 15000 / 25000
Import 15100 / 25000
Import 15200 / 25000
Import 15300 / 25000
Import 15400 / 25000
Import 15500 / 25000
Import 15600 / 25000
Import 15700 / 25000
Import 15800 / 25000
Import 15900 / 25000
Import 16000 / 25000
Import 16100 / 25000
Import 16200 / 25000
Import 16300 / 25000
Import 16400 / 25000
Import 16500 / 25000
Import 16600 / 25000
Import 16700 / 25000
Import 16800 / 25000
Import 16900 / 25000
Import 17000 / 25000
Import 17100 / 25000
Import 17200 / 25000
Import 17300 / 25000
Import 17400 / 25000
Import 17500 / 25000
Import 17600 / 25000
Import 17700 / 25000
Import 17800 / 25000
Import 17900 / 25000
Import 18000 / 25000
Import 18100 / 25000
Import 18200 / 25000
Import 18300 / 25000
Import 18400 / 25000
Import 18500 / 25000
Import 18600 / 25000
Import 18700 / 25000
Import 18800 / 25000
Import 18900 / 25000
Import 19000 / 25000
Import 19100 / 25000
Import 19200 / 25000
Import 19300 / 25000
Import 19400 / 25000
Import 19500 / 25000
Import 19600 / 25000
Import 19700 / 25000
Import 19800 / 25000
Import 19900 / 25000
Import 20000 / 25000
Import 20100 / 25000
Import 20200 / 25000
Import 20300 / 25000
Import 20400 / 25000
Import 20500 / 25000
Import 20600 / 25000
Import 20700 / 25000
Import 20800 / 25000
Import 20900 / 25000
Import 21000 / 25000
Import 21100 / 25000
Import 21200 / 25000
Import 21300 / 25000
Import 21400 / 25000
Import 21500 / 25000
Import 21600 / 25000
Import 21700 / 25000
Import 21800 / 25000
Import 21900 / 25000
Import 22000 / 25000
Import 22100 / 25000
Import 22200 / 25000
Import 22300 / 25000
Import 22400 / 25000
Import 22500 / 25000
Import 22600 / 25000
Import 22700 / 25000
Import 22800 / 25000
Import 22900 / 25000
Import 23000 / 25000
Import 23100 / 25000
Import 23200 / 25000
Import 23300 / 25000
Import 23400 / 25000
Import 23500 / 25000
Import 23600 / 25000
Import 23700 / 25000
Import 23800 / 25000
Import 23900 / 25000
Import 24000 / 25000
Import 24100 / 25000
Import 24200 / 25000
Import 24300 / 25000
Import 24400 / 25000
Import 24500 / 25000
Import 24600 / 25000
Import 24700 / 25000
Import 24800 / 25000
Import 24900 / 25000
Importing (25000) Articles complete

# Test that all data has loaded: get the object count
result = (
    client.query.aggregate("Article")
    .with_fields("meta { count }")
    .do()
)
print("Object count: ", result["data"]["Aggregate"]["Article"])

Object count:  [{'meta': {'count': 25000}}]
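
The aggregate response is a nested dictionary, so checking the import programmatically means digging out the count. A small sketch, assuming the response shape printed above; `extract_count` is a hypothetical helper and `sample_result` is a hand-written copy of that shape rather than a live response.

```python
def extract_count(aggregate_result, class_name):
    """Pull the object count out of an Aggregate response dictionary."""
    entries = aggregate_result["data"]["Aggregate"][class_name]
    return entries[0]["meta"]["count"]

# Response shape copied from the output above.
sample_result = {"data": {"Aggregate": {"Article": [{"meta": {"count": 25000}}]}}}

count = extract_count(sample_result, "Article")
print(count)
```

In the notebook, the extracted number could then be compared with `len(article_df)` to confirm that every row was imported.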

# Test that one article has been stored correctly by inspecting a single object
test_article = (
    client.query
    .get("Article", ["title", "content", "_additional {id}"])
    .with_limit(1)
    .do()
)["data"]["Get"]["Article"][0]

print(test_article["_additional"]["id"])
print(test_article["title"])
print(test_article["content"])

000393f2-1182-4e3d-abcf-4217eda64be0
Lago d'Origlio
Lago d'Origlio is a lake in the municipality of Origlio, in Ticino, Switzerland.

Lakes of Ticino

Search data

Next we'll fire some queries at our new index and get back results ranked by how close they are to our existing vectors.

def query_weaviate(query, collection_name, top_k=20):

    # Create an embedding vector from the user query
    # (openai.Embedding.create is the interface of the pre-1.0 openai package)
    embedded_query = openai.Embedding.create(
        input=query,
        model=EMBEDDING_MODEL,
    )["data"][0]['embedding']

    near_vector = {"vector": embedded_query}

    # Query the given class with the vectorized user query
    query_result = (
        client.query
        .get(collection_name, ["title", "content", "_additional {certainty distance}"])
        .with_near_vector(near_vector)
        .with_limit(top_k)
        .do()
    )

    return query_result

query_result = query_weaviate("modern art in Europe", "Article")
counter = 0
for article in query_result["data"]["Get"]["Article"]:
    counter += 1
    print(f"{counter}. { article['title']} (Certainty: {round(article['_additional']['certainty'],3) }) (Distance: {round(article['_additional']['distance'],3) })")

1. Museum of Modern Art (Certainty: 0.938) (Distance: 0.125)
2. Western Europe (Certainty: 0.934) (Distance: 0.133)
3. Renaissance art (Certainty: 0.932) (Distance: 0.136)
4. Pop art (Certainty: 0.93) (Distance: 0.14)
5. Northern Europe (Certainty: 0.927) (Distance: 0.145)
6. Hellenistic art (Certainty: 0.926) (Distance: 0.147)
7. Modernist literature (Certainty: 0.924) (Distance: 0.153)
8. Art film (Certainty: 0.922) (Distance: 0.157)
9. Central Europe (Certainty: 0.921) (Distance: 0.157)
10. European (Certainty: 0.921) (Distance: 0.159)
11. Art (Certainty: 0.921) (Distance: 0.159)
12. Byzantine art (Certainty: 0.92) (Distance: 0.159)
13. Postmodernism (Certainty: 0.92) (Distance: 0.16)
14. Eastern Europe (Certainty: 0.92) (Distance: 0.161)
15. Europe (Certainty: 0.919) (Distance: 0.161)
16. Cubism (Certainty: 0.919) (Distance: 0.161)
17. Impressionism (Certainty: 0.919) (Distance: 0.162)
18. Bauhaus (Certainty: 0.919) (Distance: 0.162)
19. Expressionism (Certainty: 0.918) (Distance: 0.163)
20. Surrealism (Certainty: 0.918) (Distance: 0.163)
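
The two numbers reported for each hit are related. For the cosine metric configured in the schema, Weaviate's documentation describes certainty as a rescaling of distance, and the printed values are consistent with `certainty = 1 - distance / 2`. The sketch below applies that conversion to a few distances in the range seen above; `certainty_from_cosine_distance` is a hypothetical helper name.

```python
def certainty_from_cosine_distance(distance):
    """Map a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1 - distance / 2

# Distances in the range reported by the search above.
for distance in (0.125, 0.14, 0.16):
    print(distance, "->", round(certainty_from_cosine_distance(distance), 3))
```

Because the notebook prints rounded values, recomputing certainty from a rounded distance can differ from the printed certainty in the last digit.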

query_result = query_weaviate("Famous battles in Scottish history", "Article")
counter = 0
for article in query_result["data"]["Get"]["Article"]:
    counter += 1
    print(f"{counter}. {article['title']} (Score: {round(article['_additional']['certainty'],3) })")

1. Historic Scotland (Score: 0.946)
2. First War of Scottish Independence (Score: 0.946)
3. Battle of Bannockburn (Score: 0.946)
4. Wars of Scottish Independence (Score: 0.944)
5. Second War of Scottish Independence (Score: 0.94)
6. List of Scottish monarchs (Score: 0.937)
7. Scottish Borders (Score: 0.932)
8. Braveheart (Score: 0.929)
9. John of Scotland (Score: 0.929)
10. Guardians of Scotland (Score: 0.926)
11. Holyrood Abbey (Score: 0.925)
12. Scottish (Score: 0.925)
13. Scots (Score: 0.925)
14. Robert I of Scotland (Score: 0.924)
15. Scottish people (Score: 0.924)
16. Edinburgh Castle (Score: 0.924)
17. Alexander I of Scotland (Score: 0.924)
18. Robert Burns (Score: 0.924)
19. Battle of Bosworth Field (Score: 0.922)
20. David II of Scotland (Score: 0.922)

Let Weaviate handle vector embeddings

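Weaviate has a built-in module for OpenAI that takes care of the steps required to generate vector embeddings for your queries and for any CRUD operation.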

This allows you to run a vector query with the with_near_text filter, which uses your OPENAI_API_KEY.

def near_text_weaviate(query, collection_name):

    nearText = {
        "concepts": [query],
        "distance": 0.7,
    }

    properties = [
        "title", "content",
        "_additional {certainty distance}"
    ]

    query_result = (
        client.query
        .get(collection_name, properties)
        .with_near_text(nearText)
        .with_limit(20)
        .do()
    )["data"]["Get"][collection_name]

    print (f"Objects returned: {len(query_result)}")

    return query_result

query_result = near_text_weaviate("modern art in Europe","Article")
counter = 0
for article in query_result:
    counter += 1
    print(f"{counter}. { article['title']} (Certainty: {round(article['_additional']['certainty'],3) }) (Distance: {round(article['_additional']['distance'],3) })")

Objects returned: 20
1. Museum of Modern Art (Certainty: 0.938) (Distance: 0.125)
2. Western Europe (Certainty: 0.934) (Distance: 0.133)
3. Renaissance art (Certainty: 0.932) (Distance: 0.136)
4. Pop art (Certainty: 0.93) (Distance: 0.14)
5. Northern Europe (Certainty: 0.927) (Distance: 0.145)
6. Hellenistic art (Certainty: 0.926) (Distance: 0.147)
7. Modernist literature (Certainty: 0.923) (Distance: 0.153)
8. Art film (Certainty: 0.922) (Distance: 0.157)
9. Central Europe (Certainty: 0.921) (Distance: 0.157)
10. European (Certainty: 0.921) (Distance: 0.159)
11. Art (Certainty: 0.921) (Distance: 0.159)
12. Byzantine art (Certainty: 0.92) (Distance: 0.159)
13. Postmodernism (Certainty: 0.92) (Distance: 0.16)
14. Eastern Europe (Certainty: 0.92) (Distance: 0.161)
15. Europe (Certainty: 0.919) (Distance: 0.161)
16. Cubism (Certainty: 0.919) (Distance: 0.161)
17. Impressionism (Certainty: 0.919) (Distance: 0.162)
18. Bauhaus (Certainty: 0.919) (Distance: 0.162)
19. Surrealism (Certainty: 0.918) (Distance: 0.163)
20. Expressionism (Certainty: 0.918) (Distance: 0.163)

query_result = near_text_weaviate("Famous battles in Scottish history","Article")
counter = 0
for article in query_result:
    counter += 1
    print(f"{counter}. { article['title']} (Certainty: {round(article['_additional']['certainty'],3) }) (Distance: {round(article['_additional']['distance'],3) })")

Objects returned: 20
1. Historic Scotland (Certainty: 0.946) (Distance: 0.107)
2. First War of Scottish Independence (Certainty: 0.946) (Distance: 0.108)
3. Battle of Bannockburn (Certainty: 0.946) (Distance: 0.109)
4. Wars of Scottish Independence (Certainty: 0.944) (Distance: 0.111)
5. Second War of Scottish Independence (Certainty: 0.94) (Distance: 0.121)
6. List of Scottish monarchs (Certainty: 0.937) (Distance: 0.127)
7. Scottish Borders (Certainty: 0.932) (Distance: 0.137)
8. Braveheart (Certainty: 0.929) (Distance: 0.141)
9. John of Scotland (Certainty: 0.929) (Distance: 0.142)
10. Guardians of Scotland (Certainty: 0.926) (Distance: 0.148)
11. Holyrood Abbey (Certainty: 0.925) (Distance: 0.15)
12. Scottish (Certainty: 0.925) (Distance: 0.15)
13. Scots (Certainty: 0.925) (Distance: 0.15)
14. Robert I of Scotland (Certainty: 0.924) (Distance: 0.151)
15. Scottish people (Certainty: 0.924) (Distance: 0.152)
16. Edinburgh Castle (Certainty: 0.924) (Distance: 0.153)
17. Alexander I of Scotland (Certainty: 0.924) (Distance: 0.153)
18. Robert Burns (Certainty: 0.924) (Distance: 0.153)
19. Battle of Bosworth Field (Certainty: 0.922) (Distance: 0.155)
20. David II of Scotland (Certainty: 0.922) (Distance: 0.157)
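
The `"distance": 0.7` entry in `nearText` acts as a maximum allowed distance. Every distance returned above is well below it, so here the limit of 20 determines how many objects come back, not the threshold. The sketch below mimics a tighter threshold on the client side; `within_distance` is a hypothetical helper and the sample entries reuse three titles and distances from the output above. Weaviate applies its own threshold on the server, so this is only an illustration.

```python
def within_distance(results, max_distance):
    """Keep results whose cosine distance is at most max_distance."""
    return [r for r in results if r["_additional"]["distance"] <= max_distance]

# Three hits from the output above, in the response shape used by the notebook.
sample_results = [
    {"title": "Historic Scotland", "_additional": {"distance": 0.107}},
    {"title": "Braveheart", "_additional": {"distance": 0.141}},
    {"title": "David II of Scotland", "_additional": {"distance": 0.157}},
]

kept = within_distance(sample_results, max_distance=0.15)
print([r["title"] for r in kept])
```

Lowering the threshold trades recall for precision: fewer, closer matches are kept.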