设置模型¶
!pip install llama-index-multi-modal-llms-ollama
!pip install llama-index-readers-file
!pip install unstructured
!pip install llama-index-embeddings-huggingface
!pip install llama-index-vector-stores-qdrant
!pip install llama-index-embeddings-clip
from llama_index.multi_modal_llms.ollama import OllamaMultiModal
mm_model = OllamaMultiModal(model="llava:13b")
从图像中提取结构化数据¶
在这里,我们将展示如何使用LLaVa从图像中提取信息,并将其转换为一个结构化的Pydantic对象。
我们可以通过我们的MultiModalLLMCompletionProgram
来实现这一点。它通过一个提示模板、一组您想要提出问题的图像以及所需的输出Pydantic对象进行实例化。
加载数据¶
让我们首先加载一张炸鸡的图片。
from pathlib import Path
from llama_index.core import SimpleDirectoryReader
from PIL import Image
import matplotlib.pyplot as plt
input_image_path = Path("restaurant_images")
if not input_image_path.exists():
Path.mkdir(input_image_path)
!wget "https://docs.google.com/uc?export=download&id=1GlqcNJhGGbwLKjJK1QJ_nyswCTQ2K2Fq" -O ./restaurant_images/fried_chicken.png
# 作为图像文档加载
image_documents = SimpleDirectoryReader("./restaurant_images").load_data()
# 显示图片
imageUrl = "./restaurant_images/fried_chicken.png"
image = Image.open(imageUrl).convert("RGB")
plt.figure(figsize=(16, 5))
plt.imshow(image)
<matplotlib.image.AxesImage at 0x3efdae470>
from pydantic import BaseModel
class Restaurant(BaseModel):
"""餐厅的数据模型。"""
restaurant: str
food: str
discount: str
price: str
rating: str
review: str
from llama_index.core.program import MultiModalLLMCompletionProgram
from llama_index.core.output_parsers import PydanticOutputParser
prompt_template_str = """\
{query_str}
Return the answer as a Pydantic object. The Pydantic schema is given below:
"""
mm_program = MultiModalLLMCompletionProgram.from_defaults(
output_parser=PydanticOutputParser(Restaurant),
image_documents=image_documents,
prompt_template_str=prompt_template_str,
multi_modal_llm=mm_model,
verbose=True,
)
response = mm_program(query_str="Can you summarize what is in the image?")
for res in response:
print(res)
> Raw output: ```
{
"restaurant": "Buffalo Wild Wings",
"food": "8 wings or chicken poppers",
"discount": "20% discount on orders over $25",
"price": "$8.73 each",
"rating": "",
"review": ""
}
```
('restaurant', 'Buffalo Wild Wings')
('food', '8 wings or chicken poppers')
('discount', '20% discount on orders over $25')
('price', '$8.73 each')
('rating', '')
('review', '')
检索增强图像字幕¶
在这里,我们通过我们的查询管道语法展示了一个检索增强的图像字幕的简单示例。
!wget "https://www.dropbox.com/scl/fi/mlaymdy1ni1ovyeykhhuk/tesla_2021_10k.htm?rlkey=qf9k4zn0ejrbm716j0gg7r802&dl=1" -O tesla_2021_10k.htm
!wget "https://docs.google.com/uc?export=download&id=1THe1qqM61lretr9N3BmINc_NWDvuthYf" -O shanghai.jpg
# from llama_index import SimpleDirectoryReader
from pathlib import Path
from llama_index.readers.file import UnstructuredReader
from llama_index.core.schema import ImageDocument
loader = UnstructuredReader()
documents = loader.load_data(file=Path("tesla_2021_10k.htm"))
image_doc = ImageDocument(image_path="./shanghai.jpg")
from llama_index.core import VectorStoreIndex
from llama_index.core.embeddings import resolve_embed_model
embed_model = resolve_embed_model("local:BAAI/bge-m3")
vector_index = VectorStoreIndex.from_documents(
documents, embed_model=embed_model
)
query_engine = vector_index.as_query_engine()
from llama_index.core.prompts import PromptTemplate
from llama_index.core.query_pipeline import QueryPipeline, FnComponent
query_prompt_str = """\
请使用特斯拉10K报告中提供的上下文,扩展初始陈述。
{initial_statement}
"""
query_prompt_tmpl = PromptTemplate(query_prompt_str)
# MM模型 --> 查询提示 --> 查询引擎
qp = QueryPipeline(
modules={
"mm_model": mm_model.as_query_component(
partial={"image_documents": [image_doc]}
),
"query_prompt": query_prompt_tmpl,
"query_engine": query_engine,
},
verbose=True,
)
qp.add_chain(["mm_model", "query_prompt", "query_engine"])
rag_response = qp.run("图像中显示了特斯拉的哪家工厂?")
> Running module mm_model with input: prompt: Which Tesla Factory is shown in the image? > Running module query_prompt with input: initial_statement: The image you've provided is a photograph of the Tesla Gigafactory, which is located in Shanghai, China. This facility is one of Tesla's large-scale production plants and is used for manufacturing el... > Running module query_engine with input: input: Please expand the initial statement using the provided context from the Tesla 10K report. The image you've provided is a photograph of the Tesla Gigafactory, which is located in Shanghai, China. Thi...
print(f"> Retrieval Augmented Response: {rag_response}")
> Retrieval Augmented Response: The Gigafactory Shanghai in China is an important manufacturing facility for Tesla. It was established to increase the affordability of Tesla vehicles for customers in local markets by reducing transportation and manufacturing costs and eliminating the impact of unfavorable tariffs. The factory allows Tesla to access high volumes of lithium-ion battery cells manufactured by their partner Panasonic, while achieving a significant reduction in the cost of their battery packs. Tesla continues to invest in Gigafactory Shanghai to achieve additional output. This factory is representative of Tesla's plan to improve their manufacturing operations as they establish new factories, incorporating the learnings from their previous ramp-ups.
rag_response.source_nodes[1].get_content()
'For example, we are currently constructing Gigafactory Berlin under conditional permits in anticipation of being granted final permits. Moreover, we will have to establish and ramp production of our proprietary battery cells and packs at our new factories, and we additionally intend to incorporate sequential design and manufacturing changes into vehicles manufactured at each new factory. We have limited experience to date with developing and implementing manufacturing innovations outside of the Fremont Factory and Gigafactory Shanghai. In particular, the majority of our design and engineering resources are currently located in California. In order to meet our expectations for our new factories, we must expand and manage localized design and engineering talent and resources. If we experience any issues or delays in meeting our projected timelines, costs, capital efficiency and production capacity for our new factories, expanding and managing teams to implement iterative design and production changes there, maintaining and complying with the terms of any debt financing that we obtain to fund them or generating and maintaining demand for the vehicles we manufacture there, our business, prospects, operating results and financial condition may be harmed.\n\nWe will need to maintain and significantly grow our access to battery cells, including through the development and manufacture of our own cells, and control our related costs.\n\nWe are dependent on the continued supply of lithium-ion battery cells for our vehicles and energy storage products, and we will require substantially more cells to grow our business according to our plans. Currently, we rely on suppliers such as Panasonic and Contemporary Amperex Technology Co. Limited (CATL) for these cells. We have to date fully qualified only a very limited number\n\n16\n\nof such suppliers and have limited flexibility in changing suppliers. Any disruption in the supply of battery cells from our suppliers could limit production of our vehicles and energy storage products. In the long term, we intend to supplement cells from our suppliers with cells manufactured by us, which we believe will be more efficient, manufacturable at greater volumes and more cost-effective than currently available cells. However, our efforts to develop and manufacture such battery cells have required, and may continue to require, significant investments, and there can be no assurance that we will be able to achieve these targets in the timeframes that we have planned or at all. If we are unable to do so, we may have to curtail our planned vehicle and energy storage product production or procure additional cells from suppliers at potentially greater costs, either of which may harm our business and operating results.\n\nIn addition, the cost of battery cells, whether manufactured by our suppliers or by us, depends in part upon the prices and availability of raw materials such as lithium, nickel, cobalt and/or other metals. The prices for these materials fluctuate and their available supply may be unstable, depending on market conditions and global demand for these materials, including as a result of increased global production of electric vehicles and energy storage products. Any reduced availability of these materials may impact our access to cells and any increases in their prices may reduce our profitability if we cannot recoup the increased costs through increased vehicle prices. Moreover, any such attempts to increase product prices may harm our brand, prospects and operating results.\n\nWe face strong competition for our products and services from a growing list of established and new competitors.\n\nThe worldwide automotive market is highly competitive today and we expect it will become even more so in the future. For example, Model 3 and Model Y face competition from existing and future automobile manufacturers in the extremely competitive entry-level premium sedan and compact SUV markets. A significant and growing number of established and new automobile manufacturers, as well as other companies, have entered, or are reported to have plans to enter, the market for electric and other alternative fuel vehicles, including hybrid, plug-in hybrid and fully electric vehicles, as well as the market for self-driving technology and other vehicle applications and software platforms. In some cases, our competitors offer or will offer electric vehicles in important markets such as China and Europe, and/or have announced an intention to produce electric vehicles exclusively at some point in the future. Many of our competitors have significantly greater or better-established resources than we do to devote to the design, development, manufacturing, distribution, promotion, sale and support of their products. Increased competition could result in our lower vehicle unit sales, price reductions, revenue shortfalls, loss of customers and loss of market share, which may harm our business, financial condition and operating results.\n\nWe also face competition in our energy generation and storage business from other manufacturers, developers, installers and service providers of competing energy technologies, as well as from large utilities. Decreases in the retail or wholesale prices of electricity from utilities or other renewable energy sources could make our products less attractive to customers and lead to an increased rate of residential customer defaults under our existing long-term leases and PPAs.'
多模态RAG¶
我们使用本地的CLIP嵌入模型对一组图像和文本进行索引。我们可以通过我们的MultiModalVectorStoreIndex
联合索引它们。
注意:当前的实现混合了图像和文本。您可以并且可能应该为图像和文本分别定义索引/检索器,从而可以针对每种模态使用单独的嵌入/检索策略。
!wget "https://drive.usercontent.google.com/download?id=1qQDcaKuzgRGuEC1kxgYL_4mx7vG-v4gC&export=download&authuser=1&confirm=t&uuid=f944e95f-a31f-4b55-b68f-8ea67a6e90e5&at=APZUnTVZ6n1aOg7rtkcjBjw7Pt1D:1707010667927" -O mixed_wiki.zip
!unzip mixed_wiki.zip
!wget "https://www.dropbox.com/scl/fi/mlaymdy1ni1ovyeykhhuk/tesla_2021_10k.htm?rlkey=qf9k4zn0ejrbm716j0gg7r802&dl=1" -O ./mixed_wiki/tesla_2021_10k.htm
from llama_index.core.indices.multi_modal.base import (
MultiModalVectorStoreIndex,
)
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.embeddings.clip import ClipEmbedding
import qdrant_client
from llama_index import (
SimpleDirectoryReader,
)
# 创建一个本地的Qdrant向量存储
client = qdrant_client.QdrantClient(path="qdrant_mm_db")
text_store = QdrantVectorStore(
client=client, collection_name="text_collection"
)
image_store = QdrantVectorStore(
client=client, collection_name="image_collection"
)
storage_context = StorageContext.from_defaults(
vector_store=text_store, image_store=image_store
)
image_embed_model = ClipEmbedding()
# 创建多模态索引
documents = SimpleDirectoryReader("./mixed_wiki/").load_data()
index = MultiModalVectorStoreIndex.from_documents(
documents,
storage_context=storage_context,
image_embed_model=image_embed_model,
)
# 保存索引
# index.storage_context.persist(persist_dir="./storage")
# # 加载索引
# from llama_index import load_index_from_storage
# storage_context = StorageContext.from_defaults(
# vector_store=text_store, persist_dir="./storage"
# )
# index = load_index_from_storage(storage_context, image_store=image_store)
from llama_index.core.prompts import PromptTemplate
from llama_index.core.query_engine import SimpleMultiModalQueryEngine
qa_tmpl_str = (
"Context information is below.\n"
"---------------------\n"
"{context_str}\n"
"---------------------\n"
"Given the context information and not prior knowledge, "
"answer the query.\n"
"Query: {query_str}\n"
"Answer: "
)
qa_tmpl = PromptTemplate(qa_tmpl_str)
query_engine = index.as_query_engine(
multi_modal_llm=mm_model, text_qa_template=qa_tmpl
)
query_str = "Tell me more about the Porsche"
response = query_engine.query(query_str)
print(str(response))
The image shows a Porsche sports car displayed at an auto show. It appears to be the latest model, possibly the Taycan Cross Turismo or a similar variant, which is designed for off-road use and has raised suspension. This type of vehicle combines the performance of a sports car with the utility of an SUV, allowing it to handle rougher terrain and provide more cargo space than a traditional two-door sports car. The design incorporates sleek lines and aerodynamic elements typical of modern electric vehicles, which are often associated with luxury and high performance.
from PIL import Image
import matplotlib.pyplot as plt
import os
def plot_images(image_paths):
images_shown = 0
plt.figure(figsize=(16, 9))
for img_path in image_paths:
if os.path.isfile(img_path):
image = Image.open(img_path)
plt.subplot(2, 3, images_shown + 1)
plt.imshow(image)
plt.xticks([])
plt.yticks([])
images_shown += 1
if images_shown >= 9:
break
# 显示来源
from llama_index.core.response.notebook_utils import display_source_node
for text_node in response.metadata["text_nodes"]:
display_source_node(text_node, source_length=200)
plot_images(
[n.metadata["file_path"] for n in response.metadata["image_nodes"]]
)
Node ID: 3face2c9-3b86-4445-b21e-5b7fc9683adb
Similarity: 0.8281288080117539
Text: === Porsche Mission E Cross Turismo ===
The Porsche Mission E Cross Turismo previewed the Taycan Cross Turismo, and was presented at the 2018 Geneva Motor Show. The design language of the Mission E...
Node ID: ef43aa15-30b6-4f0f-bade-fd91f90bfd0b
Similarity: 0.8281039313464207
Text: The Porsche Taycan is a battery electric saloon and shooting brake produced by German automobile manufacturer Porsche. The concept version of the Taycan, named the Porsche Mission E, debuted at the...