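Retrieval-Augmented Image Captioning¶
In this example, we show how to use LLaVA + Replicate for image understanding/captioning, and how to retrieve relevant unstructured text and embedded tables from the Tesla 10-K filing based on that image understanding.
- LLaVA provides image understanding based on a user prompt.
- We use Unstructured to parse out the tables, and LlamaIndex recursive retrieval to index/retrieve both tables and text.
- We use the image understanding from the first step to retrieve relevant information from the knowledge base built in the second step (which is indexed by LlamaIndex).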
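The three steps above can be sketched as a single function. This is a minimal illustration, not the notebook's actual code: `caption_then_retrieve`, `fake_caption`, and `fake_retrieve` are hypothetical names, and the two stubs stand in for the multi-modal LLM call and the query engine.

```python
from typing import Callable


def caption_then_retrieve(
    image_path: str,
    caption_fn: Callable[[str], str],
    retrieve_fn: Callable[[str], str],
    template: str = "please provide relevant information about: ",
) -> str:
    """Caption the image, then use the caption as the retrieval query."""
    caption = caption_fn(image_path).strip()
    if not caption:
        # An empty caption would turn the query into the bare template.
        raise ValueError(f"empty caption for {image_path!r}")
    return retrieve_fn(template + caption)


# Hypothetical stand-ins for the image model and the knowledge base.
def fake_caption(path: str) -> str:
    return " a Tesla charging station "


def fake_retrieve(query: str) -> str:
    return f"[retrieved for] {query}"


result = caption_then_retrieve("tesla_supercharger.jpg", fake_caption, fake_retrieve)
print(result)
```

Passing the two steps in as callables keeps the flow independent of any particular model or index, so the same function could wrap a real captioning call and a real query engine.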
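Background on LLaVA:
For LlamaIndex: LLaVA + Replicate lets us run image understanding from within the notebook (the model itself is hosted on Replicate) and combine the multi-modal knowledge with our RAG knowledge base system.
TODO:
Waiting for llama-cpp-python to support the LLaVA model in its Python wrapper.
LlamaIndex could then use the LlamaCPP class to serve the LLaVA model directly/locally.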
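Using Replicate to serve the LLaVA model through LlamaIndex¶
Build and run the LLaVA model locally through llama.cpp (deprecated)¶
- Clone https://github.com/ggerganov/llama.cpp.git, then run cd llama.cpp. See the llama.cpp repo for more details.
- Run make to build.
- Download the LLaVA model files, including ggml-model-* and mmproj-model-*, from this Hugging Face repo. Choose a model based on your own local configuration.
- Run ./llava to check whether LLaVA runs locally.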
In [ ]:
%pip install llama-index-readers-file
%pip install llama-index-multi-modal-llms-replicate
In [ ]:
%load_ext autoreload
%autoreload 2
In [ ]:
!pip install unstructured
In [ ]:
from unstructured.partition.html import partition_html
import pandas as pd
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
pd.set_option("display.max_colwidth", None)
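Extracting data from the Tesla 10-K filing¶
In these sections, we use Unstructured to parse out table and non-table elements.
Extract elements¶
We use Unstructured to extract table and non-table elements from the 10-K filing.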
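To make the table/non-table distinction concrete, here is a toy sketch that uses only the standard library. It is not how Unstructured works internally; `TableSplitter` and the sample HTML are hypothetical, and the parser assumes well-formed, non-nested tables.

```python
from html.parser import HTMLParser


class TableSplitter(HTMLParser):
    """Collect table rows separately from text found outside tables."""

    def __init__(self):
        super().__init__()
        self.tables = []       # each table: list of rows; each row: list of cell strings
        self.text = []         # text fragments found outside any table
        self._depth = 0        # greater than 0 while inside a <table>
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append([])
        elif tag == "tr" and self._depth == 1:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self._depth == 1:
            self._in_cell = True
            self.tables[-1][-1].append("")

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            self._depth -= 1
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        stripped = data.strip()
        if not stripped:
            return
        if self._depth == 0:
            self.text.append(stripped)
        elif self._in_cell:
            self.tables[-1][-1][-1] += stripped


sample_html = (
    "<p>Overview of results</p>"
    "<table><tr><th>Item</th><th>Value</th></tr>"
    "<tr><td>A</td><td>1</td></tr></table>"
    "<p>Closing note</p>"
)
splitter = TableSplitter()
splitter.feed(sample_html)
print(splitter.text)
print(splitter.tables)
```

A real filing has far messier markup (nested tables, layout tables, inline styling), which is the reason to rely on a dedicated parser rather than a hand-rolled one.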
In [ ]:
!wget "https://www.dropbox.com/scl/fi/mlaymdy1ni1ovyeykhhuk/tesla_2021_10k.htm?rlkey=qf9k4zn0ejrbm716j0gg7r802&dl=1" -O tesla_2021_10k.htm
!wget "https://docs.google.com/uc?export=download&id=1THe1qqM61lretr9N3BmINc_NWDvuthYf" -O shanghai.jpg
!wget "https://docs.google.com/uc?export=download&id=1PDVCf_CzLWXNnNoRV8CFgoJxv6U0sHAO" -O tesla_supercharger.jpg
In [ ]:
from llama_index.readers.file import FlatReader
from pathlib import Path
reader = FlatReader()
docs_2021 = reader.load_data(Path("tesla_2021_10k.htm"))
In [ ]:
from llama_index.core.node_parser import UnstructuredElementNodeParser
node_parser = UnstructuredElementNodeParser()
In [ ]:
import os
REPLICATE_API_TOKEN = "..."  # Your Replicate API token here
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
In [ ]:
import openai
OPENAI_API_KEY = "sk-..."  # Your OpenAI API key here
openai.api_key = OPENAI_API_KEY
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
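Hardcoding keys in a notebook makes them easy to leak. One alternative, sketched here under the assumption that the keys are exported in the shell before the notebook starts, is to read them from the environment and fail early when one is missing. `require_env` and `DEMO_API_TOKEN` are hypothetical names used only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


# Demo with a placeholder variable; in the notebook the names would be
# REPLICATE_API_TOKEN and OPENAI_API_KEY.
os.environ.setdefault("DEMO_API_TOKEN", "demo-value")
token = require_env("DEMO_API_TOKEN")
print(len(token) > 0)
```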
In [ ]:
import os
import pickle

# Parsing is slow, so parse once and cache the resulting nodes on disk.
if not os.path.exists("2021_nodes.pkl"):
    raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)
    with open("2021_nodes.pkl", "wb") as f:
        pickle.dump(raw_nodes_2021, f)
else:
    with open("2021_nodes.pkl", "rb") as f:
        raw_nodes_2021 = pickle.load(f)
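The parse-once-then-cache pattern in the cell above can be factored into a small helper. This is a sketch, not part of the notebook: `load_or_compute` and `expensive` are hypothetical names, and the demo writes to a temporary directory rather than `2021_nodes.pkl`.

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def load_or_compute(path: str, compute: Callable[[], Any]) -> Any:
    """Load a pickled result from path if it exists; otherwise compute and cache it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def expensive() -> list:
    # Stand-in for a slow parsing step; records how often it runs.
    calls.append(1)
    return ["node-a", "node-b"]


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "nodes.pkl")
    first = load_or_compute(cache_path, expensive)   # computes and writes the cache
    second = load_or_compute(cache_path, expensive)  # reads the cache

print(first == second, len(calls))
```

Two caveats apply: a stale cache file has to be deleted by hand whenever the source document or parser settings change, and pickle files should only be loaded from sources you trust.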
In [ ]:
nodes_2021, objects_2021 = node_parser.get_nodes_and_objects(raw_nodes_2021)
Set up a composable retriever¶
Now that we have extracted the tables and their summaries, we can set up a composable retriever in LlamaIndex to query them.
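The idea behind a composable (recursive) retriever is that a short table summary is what gets matched against the query, and a match is then resolved to the full table. The toy below illustrates only that idea; it is not LlamaIndex's implementation, it uses word overlap in place of embeddings, and every name and data value in it is made up.

```python
from typing import Dict, List, Optional


def score(query: str, text: str) -> int:
    """Count distinct query words that also occur in the text (toy relevance score)."""
    text_words = set(text.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in text_words)


# Top-level entries: plain text chunks, plus table summaries that reference a full table.
top_level: List[Dict[str, Optional[str]]] = [
    {"text": "the factory ramped production quickly", "ref": None},
    {"text": "summary of revenue by segment table", "ref": "table_1"},
]

# Hypothetical table store keyed by reference id.
tables: Dict[str, str] = {"table_1": "segment | revenue\nA | 10\nB | 20"}


def retrieve(query: str, top_k: int = 1) -> List[str]:
    """Rank top-level entries, then follow references to the underlying tables."""
    ranked = sorted(top_level, key=lambda entry: score(query, entry["text"]), reverse=True)
    results = []
    for entry in ranked[:top_k]:
        # A matching summary resolves to the full table; plain text is returned as-is.
        results.append(tables[entry["ref"]] if entry["ref"] else entry["text"])
    return results


table_hits = retrieve("revenue by segment")
text_hits = retrieve("factory production")
print(table_hits)
print(text_hits)
```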
Build the retriever¶
In [ ]:
from llama_index.core import VectorStoreIndex
# Construct the top-level vector index + query engine
vector_index = VectorStoreIndex(nodes=nodes_2021, objects=objects_2021)
query_engine = vector_index.as_query_engine(similarity_top_k=2, verbose=True)
In [ ]:
from PIL import Image
import matplotlib.pyplot as plt
imageUrl = "./tesla_supercharger.jpg"
image = Image.open(imageUrl).convert("RGB")
plt.figure(figsize=(16, 5))
plt.imshow(image)
Out[ ]:
<matplotlib.image.AxesImage at 0x7f24f9bb8410>
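Run the LLaVA model for image understanding using Replicate through LlamaIndex¶
LLaVA is a multi-modal model for image understanding. Here the model is hosted on Replicate, a platform for running models through an API, and is called through LlamaIndex's ReplicateMultiModal wrapper, so the model's answer can be passed directly to the query engine built above.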
In [ ]:
from llama_index.multi_modal_llms.replicate import ReplicateMultiModal
from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.replicate.base import (
    REPLICATE_MULTI_MODAL_LLM_MODELS,
)
multi_modal_llm = ReplicateMultiModal(
    model=REPLICATE_MULTI_MODAL_LLM_MODELS["llava-13b"],
    max_new_tokens=200,
    temperature=0.1,
)
prompt = "what is the main object for tesla in the image?"
llava_response = multi_modal_llm.complete(
    prompt=prompt,
    image_documents=[ImageDocument(image_path=imageUrl)],
)
Retrieve relevant information from the LlamaIndex knowledge base based on LLaVA's image understanding¶
In [ ]:
prompt_template = "please provide relevant information about: "
rag_response = query_engine.query(prompt_template + llava_response.text)
Retrieval entering id_1836_table: TextNode
Retrieving from object TextNode with query please provide relevant information about: The main object for Tesla in the image is a red and white electric car charging station.
Retrieval entering id_431_table: TextNode
Retrieving from object TextNode with query please provide relevant information about: The main object for Tesla in the image is a red and white electric car charging station.
Show the final RAG image captioning result from LlamaIndex¶
In [ ]:
print(str(rag_response))
The main object for Tesla in the image is a red and white electric car charging station.
In [ ]:
from PIL import Image
import matplotlib.pyplot as plt
imageUrl = "./shanghai.jpg"
image = Image.open(imageUrl).convert("RGB")
plt.figure(figsize=(16, 5))
plt.imshow(image)
Out[ ]:
<matplotlib.image.AxesImage at 0x7f24f787aa50>
Retrieve relevant information from LlamaIndex for a new image¶
In [ ]:
prompt = "which Tesla factory is shown in the image?"
llava_response = multi_modal_llm.complete(
    prompt=prompt,
    image_documents=[ImageDocument(image_path=imageUrl)],
)
In [ ]:
prompt_template = "please provide relevant information about: "
rag_response = query_engine.query(prompt_template + llava_response.text)
Retrieving with query id None: please provide relevant information about: a large Tesla factory with a white roof, located in Shanghai, China. The factory is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. The scene gives an impression of a busy and well-organized facility, likely producing electric vehicles for the global market Retrieved node with id, entering: id_431_table Retrieving with query id id_431_table: please provide relevant information about: a large Tesla factory with a white roof, located in Shanghai, China. The factory is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. The scene gives an impression of a busy and well-organized facility, likely producing electric vehicles for the global market Retrieving text node: We continue to increase the degree of localized procurement and manufacturing there. Gigafactory Shanghai is representative of our plan to iteratively improve our manufacturing operations as we establish new factories, as we implemented the learnings from our Model 3 and Model Y ramp at the Fremont Factory to commence and ramp our production at Gigafactory Shanghai quickly and cost-effectively. Other Manufacturing Generally, we continue to expand production capacity at our existing facilities. We also intend to further increase cost-competitiveness in our significant markets by strategically adding local manufacturing, including at Gigafactory Berlin in Germany and Gigafactory Texas in Austin, Texas, which will begin production in 2022. Supply Chain Our products use thousands of purchased parts that are sourced from hundreds of suppliers across the world. 
We have developed close relationships with vendors of key parts such as battery cells, electronics and complex vehicle assemblies. Certain components purchased from these suppliers are shared or are similar across many product lines, allowing us to take advantage of pricing efficiencies from economies of scale. As is the case for most automotive companies, most of our procured components and systems are sourced from single suppliers. Where multiple sources are available for certain key components, we work to qualify multiple suppliers for them where it is sensible to do so in order to minimize production risks owing to disruptions in their supply. We also mitigate risk by maintaining safety stock for key parts and assemblies and die banks for components with lengthy procurement lead times. Our products use various raw materials including aluminum, steel, cobalt, lithium, nickel and copper. Pricing for these materials is governed by market conditions and may fluctuate due to various factors outside of our control, such as supply and demand and market speculation. We strive to execute long-term supply contracts for such materials at competitive pricing when feasible, and we currently believe that we have adequate access to raw materials supplies in order to meet the needs of our operations. Governmental Programs, Incentives and Regulations Globally, both the operation of our business by us and the ownership of our products by our customers are impacted by various government programs, incentives and other arrangements. Our business and products are also subject to numerous governmental regulations that vary among jurisdictions. 
Programs and Incentives California Alternative Energy and Advanced Transportation Financing Authority Tax Incentives We have agreements with the California Alternative Energy and Advanced Transportation Financing Authority that provide multi-year sales tax exclusions on purchases of manufacturing equipment that will be used for specific purposes, including the expansion and ongoing development of electric vehicles and powertrain production in California, thus reducing our cost basis in the related assets in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K. Gigafactory Nevada—Nevada Tax Incentives In connection with the construction of Gigafactory Nevada, we entered into agreements with the State of Nevada and Storey County in Nevada that provide abatements for specified taxes, discounts to the base tariff energy rates and transferable tax credits in consideration of capital investment and hiring targets that were met at Gigafactory Nevada. These incentives are available until June 2024 or June 2034, depending on the incentive and primarily offset related costs in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K. Gigafactory New York—New York State Investment and Lease We have a lease through the Research Foundation for the State University of New York (the “SUNY Foundation”) with respect to Gigafactory New York. Under the lease and a related research and development agreement, we are continuing to designate further buildouts at the facility. We are required to comply with certain covenants, including hiring and cumulative investment targets. This incentive offsets the related lease costs of the facility in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K. 
As we temporarily suspended most of our manufacturing operations at Gigafactory New York pursuant to a New York State executive order issued in March 2020 as a result of the COVID-19 pandemic, we were granted a deferral of our obligation to be compliant with our applicable targets through December 31, 2021 in an amendment memorialized in August 2021. As of December 31, 2021, we are in excess of such targets relating to investments and personnel in the State of New York and Buffalo. Gigafactory Shanghai—Land Use Rights and Economic Benefits We have an agreement with the local government of Shanghai for land use rights at Gigafactory Shanghai. Under the terms of the arrangement, we are required to meet a cumulative capital expenditure target and an annual tax revenue target starting at the end of 2023. In addition, the Shanghai government has granted to our Gigafactory Shanghai subsidiary certain incentives to be used in connection with eligible capital investments at Gigafactory Shanghai.
Show the final RAG image captioning result from LlamaIndex¶
In [ ]:
print(rag_response)
The Gigafactory Shanghai in Shanghai, China is a large Tesla factory that produces electric vehicles for the global market. The factory has a white roof and is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. This scene gives an impression of a busy and well-organized facility.