NVIDIA NIM Inference Microservices
This notebook walks you through using NVIDIA NIM inference microservices, a fast path to inference built on the NVIDIA software platform. NIM provides state-of-the-art, GPU-accelerated model serving with easy-to-use API endpoints that can run locally or in the cloud, and you can also test NVIDIA-hosted models through the NVIDIA API catalog.
In this notebook, you'll see several ways to use NIMs in a RAG pipeline:
- a self-hosted LLM served by a NIM,
- an embedding model hosted in the NVIDIA API catalog,
- a reranking model hosted in the NVIDIA API catalog.
The models hosted in the NVIDIA API catalog run on NIMs, so you can start testing NIMs from the catalog and then move to your own hosted models by changing a single line of code.
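To make the "single line of code" point concrete, here's a minimal plain-Python sketch of the switch. The model names and placeholder URL mirror the ones used later in this notebook; your own deployment's model name and address may differ.

```python
# A plain-Python sketch of the one-line difference between a catalog-hosted
# NIM and a self-hosted NIM: only the base_url argument changes.
# (These are the kwargs you would pass to llama_index's NVIDIA LLM class.)

def nim_llm_kwargs(self_hosted: bool) -> dict:
    if self_hosted:
        # Point base_url at your own NIM deployment (placeholder address).
        return {
            "model": "meta-llama3-8b-instruct",
            "base_url": "http://your-nim-host-address:8000/v1",
        }
    # With no base_url, the client talks to the NVIDIA API catalog.
    return {"model": "meta/llama3-70b-instruct"}

print(nim_llm_kwargs(self_hosted=False))
```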
First, let's make sure llama-index and the related packages are installed.
!pip install llama-index-core
!pip install llama-index-readers-file
!pip install llama-index-llms-nvidia
!pip install llama-index-embeddings-nvidia
!pip install llama-index-postprocessor-nvidia-rerank
Next, let's download a test document to use as our data: San Francisco's 2021 Housing Inventory report.
!mkdir data
!wget "https://www.dropbox.com/scl/fi/p33j9112y0ysgwg77fdjz/2021_Housing_Inventory.pdf?rlkey=yyok6bb18s5o31snjd2dxkxz3&dl=0" -O "data/housing_data.pdf"
--2024-05-28 17:42:44-- https://www.dropbox.com/scl/fi/p33j9112y0ysgwg77fdjz/2021_Housing_Inventory.pdf?rlkey=yyok6bb18s5o31snjd2dxkxz3&dl=0
Resolving www.dropbox.com (www.dropbox.com)... connected.
HTTP request sent, awaiting response... 302 Found (redirected to dl.dropboxusercontent.com)
HTTP request sent, awaiting response... 200 OK
Length: 4808625 (4.6M) [application/pdf]
Saving to: ‘data/housing_data.pdf’
2024-05-28 17:42:47 (8.26 MB/s) - ‘data/housing_data.pdf’ saved [4808625/4808625]
Let's import our dependencies and set up our NVIDIA API key, from the API catalog at https://build.nvidia.com, for the two models we'll use that are hosted on the catalog (the embedding and reranking models).
from llama_index.core import SimpleDirectoryReader, Settings, VectorStoreIndex
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.llms.nvidia import NVIDIA
from llama_index.core.node_parser import SentenceSplitter
from google.colab import userdata
import os
os.environ["NVIDIA_API_KEY"] = userdata.get("nvidia-api-key")
Let's use a NIM hosted by NVIDIA for our embedding model.
NVIDIA's default embedding model embeds only the first 512 tokens, so we set our chunk size to 500 to maximize the accuracy of our embeddings.
Settings.text_splitter = SentenceSplitter(chunk_size=500)
documents = SimpleDirectoryReader("./data").load_data()
We set our embedding model to NVIDIA's default. If a chunk exceeds the number of tokens the model can encode, the default behavior is to throw an error, so we set truncate="END" to instead discard tokens that go over the limit (which, given our chunk size above, will hopefully not be many).
Settings.embed_model = NVIDIAEmbedding(model="NV-Embed-QA", truncate="END")
index = VectorStoreIndex.from_documents(documents)
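As a toy illustration of what truncate="END" means, here's a plain-Python sketch using a hypothetical whitespace "tokenizer" (not the embedding model's real tokenizer): tokens beyond the limit are silently dropped from the end of the input instead of raising an error.

```python
# Toy model of truncate="END": keep the first `limit` tokens, drop the rest.
# Real NIM embedding models tokenize differently; this only shows the policy.
def truncate_end(text: str, limit: int = 512) -> str:
    tokens = text.split()  # hypothetical stand-in for real tokenization
    return " ".join(tokens[:limit])

long_text = " ".join(f"word{i}" for i in range(600))
print(len(truncate_end(long_text).split()))  # → 512
```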
Now that we've got our data embedded and indexed in memory, we'll set up our LLM, hosted locally by us. NIMs can be hosted locally using Docker in 5 minutes, following the NIM quickstart guide.
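For reference, launching a local NIM looks roughly like the sketch below. This is an illustrative outline, not the authoritative command: the exact image name, tag, and flags come from the NIM quickstart guide, and you'll need an NGC API key.

```shell
# Sketch of running a NIM container locally (consult the NIM quickstart
# guide for the exact, current image name and flags).
export NGC_API_KEY=<your-ngc-api-key>
docker login nvcr.io --username '$oauthtoken' --password "$NGC_API_KEY"
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:latest
# The OpenAI-compatible endpoint is then served at http://localhost:8000/v1
```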
Below, we'll show how to:
- use Meta's open-source meta-llama3-8b-instruct model as a local NIM, and
- use NVIDIA-hosted meta/llama3-70b-instruct from the API catalog as a NIM.
If you're using a local NIM, make sure you change the base_url to your deployed NIM's URL!
We'll retrieve the 20 most relevant chunks to answer our question.
# Self-hosted NIM: to use a NIM you host yourself, uncomment the line below
# and comment out the API-catalog line
# Settings.llm = NVIDIA(model="meta-llama3-8b-instruct", base_url="http://your-nim-host-address:8000/v1")

# API-catalog NIM: if you're using a self-hosted NIM, comment out the line
# below and uncomment the local NIM line above
Settings.llm = NVIDIA(model="meta/llama3-70b-instruct")

query_engine = index.as_query_engine(similarity_top_k=20)
Let's ask a simple question whose answer we know can be found in one place in the document (page 18).
response = query_engine.query(
"How many new housing units were built in San Francisco in 2021?"
)
print(response)
There was a net addition of 4,649 units to the City’s housing stock in 2021.
Now let's ask a more complicated question, one that requires reading a table (on page 41 of the document):
response = query_engine.query(
"What was the net gain in housing units in the Mission in 2021?"
)
print(response)
There is no specific information about the net gain in housing units in the Mission in 2021. The provided data is about the city's overall housing stock and production, but it does not provide a breakdown by neighborhood, including the Mission.
Not great! It couldn't find the neighborhood-level figure; that's not the number we wanted. Let's try a more advanced PDF parser, LlamaParse:
!pip install llama-parse
from llama_parse import LlamaParse

# LlamaParse needs this to work in a notebook
import nest_asyncio

nest_asyncio.apply()

# You can get a key at cloud.llamaindex.ai
os.environ["LLAMA_CLOUD_API_KEY"] = userdata.get("llama-cloud-key")

# Set up the parser
parser = LlamaParse(result_type="markdown")  # "markdown" and "text" are available

# Use SimpleDirectoryReader to parse our file
file_extractor = {".pdf": parser}
documents2 = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
Started parsing the file under job_id 84cb91f7-45ec-4b99-8281-0f4beef6a892
index2 = VectorStoreIndex.from_documents(documents2)
query_engine2 = index2.as_query_engine(similarity_top_k=20)
response = query_engine2.query(
"What was the net gain in housing units in the Mission in 2021?"
)
print(response)
The net gain in housing units in the Mission in 2021 was 1,305 units.
Perfect! With a better parser, the LLM was able to answer the question.
Now let's try an even trickier question:
response = query_engine2.query(
"How many affordable housing units were completed in 2021?"
)
print(response)
Repeat: 110
The LLM is getting confused; this number appears to be the percentage growth in housing units.
Let's try giving the LLM more context (40 chunks instead of 20), and then sorting those chunks with a reranker. We'll use NVIDIA's reranker for this:
from llama_index.postprocessor.nvidia_rerank import NVIDIARerank
query_engine3 = index2.as_query_engine(
similarity_top_k=40, node_postprocessors=[NVIDIARerank(top_n=10)]
)
response = query_engine3.query(
"How many affordable housing units were completed in 2021?"
)
print(response)
1,495
Excellent! That figure is now correct (it's on page 35, in case you were wondering).