Nomic Embedding¶
Nomic has released v1.5 🪆🪆🪆, which supports variable-sized embeddings (between 64 and 768 dimensions) via Matryoshka Representation Learning, along with an 8192 context length.
In this notebook, we will explore Nomic v1.5 embeddings at different dimensionalities.
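Before diving in, here is a minimal sketch (plain Python, not the Nomic API) of the idea behind Matryoshka embeddings: a lower-dimensional embedding is obtained by keeping the first k coordinates of the full vector and renormalizing to unit length.

```python
import math


def truncate_embedding(vec, k):
    """Keep the first k dimensions and rescale to unit norm.

    This mimics how Matryoshka-style models derive a smaller
    embedding from a larger one; it is an illustration, not the
    actual Nomic implementation.
    """
    prefix = vec[:k]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]


full = [0.6, 0.8, 0.0, 0.0]  # a toy 4-dim unit vector
small = truncate_embedding(full, 2)
print(small)  # -> [0.6, 0.8]: the prefix, renormalized
```

Because similarity is typically computed with cosine distance on unit vectors, the truncated embedding remains directly comparable to other embeddings truncated to the same size.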
Installation¶
In [ ]:
%pip install -U llama-index llama-index-embeddings-nomic
Set up the API key¶
In [ ]:
nomic_api_key = "<NOMIC API KEY>"
In [ ]:
import nest_asyncio

nest_asyncio.apply()

from llama_index.embeddings.nomic import NomicEmbedding
With 128 dimensions¶
In [ ]:
embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=128,
    model_name="nomic-embed-text-v1.5",
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")
In [ ]:
print(len(embedding))
128
In [ ]:
embedding[:5]
Out[ ]:
[0.05569458, 0.057922363, -0.30126953, -0.09832764, 0.05947876]
With 256 dimensions¶
In [ ]:
embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=256,
    model_name="nomic-embed-text-v1.5",
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")
In [ ]:
print(len(embedding))
256
In [ ]:
embedding[:5]
Out[ ]:
[0.044708252, 0.04650879, -0.24182129, -0.07897949, 0.04776001]
With 768 dimensions¶
In [ ]:
embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=768,
    model_name="nomic-embed-text-v1.5",
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")
In [ ]:
print(len(embedding))
768
In [ ]:
embedding[:5]
Out[ ]:
[0.027282715, 0.028381348, -0.14758301, -0.048187256, 0.029144287]
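As a quick sanity check on the Matryoshka property, we can compare the prefixes printed above: if the 128-dim embedding is a renormalized prefix of the 768-dim embedding, the elementwise ratio between the two should be (approximately) one constant.

```python
# First five values of the 128-dim and 768-dim embeddings printed above.
dim128 = [0.05569458, 0.057922363, -0.30126953, -0.09832764, 0.05947876]
dim768 = [0.027282715, 0.028381348, -0.14758301, -0.048187256, 0.029144287]

# If the shorter embedding is a rescaled prefix, every ratio is the same.
ratios = [a / b for a, b in zip(dim128, dim768)]
print(ratios)  # all close to ~2.04

assert max(ratios) - min(ratios) < 0.01
```

The near-constant ratio is consistent with the truncate-and-renormalize behavior described in the introduction.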
You can still use the v1 Nomic Embeddings¶
It has a fixed embedding dimension of 768.
In [ ]:
embed_model = NomicEmbedding(
    api_key=nomic_api_key, model_name="nomic-embed-text-v1"
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")
In [ ]:
print(len(embedding))
768
In [ ]:
embedding[:5]
Out[ ]:
[0.0059013367, 0.03744507, 0.0035305023, -0.047180176, 0.0154418945]
Let's build an end-to-end RAG pipeline with Nomic v1.5 embeddings.¶
We will use OpenAI for the generation step.
Set up the embedding model and LLM.¶
In [ ]:
from llama_index.core import Settings
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

import os

os.environ["OPENAI_API_KEY"] = "<YOUR OPENAI API KEY>"

embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=128,
    model_name="nomic-embed-text-v1.5",
)

llm = OpenAI(model="gpt-3.5-turbo")

Settings.llm = llm
Settings.embed_model = embed_model
Download Data¶
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
--2024-02-16 18:37:03--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8003::154, 2606:50c0:8000::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: 'data/paul_graham/paul_graham_essay.txt'

data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.02s

2024-02-16 18:37:03 (3.87 MB/s) - 'data/paul_graham/paul_graham_essay.txt' saved [75042/75042]
Load Data¶
In [ ]:
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
Index Creation¶
In [ ]:
index = VectorStoreIndex.from_documents(documents)
Query Engine¶
In [ ]:
query_engine = index.as_query_engine()
In [ ]:
response = query_engine.query("what did author do growing up?")
print(response)
The author, growing up, worked on writing and programming. They wrote short stories and also tried writing programs on an IBM 1401 computer. Later, they got a microcomputer and started programming more extensively, writing simple games and a word processor.