In [ ]:
%pip install llama-index-multi-modal-llms-azure-openai
In [ ]:
!pip install openai
Use GPT4V to understand images from URLs / base64
In [ ]:
import os
os.environ["AZURE_OPENAI_API_KEY"] = "<your-api-key>"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource-name>.openai.azure.com/"
os.environ["OPENAI_API_VERSION"] = "2023-12-01-preview"
Initialize AzureOpenAIMultiModal and load an image from a URL

Unlike regular OpenAI, you need to pass an engine argument in addition to model. The engine is the name of your model deployment as selected in Azure OpenAI Studio.
In [ ]:
from llama_index.multi_modal_llms.azure_openai import AzureOpenAIMultiModal
In [ ]:
azure_openai_mm_llm = AzureOpenAIMultiModal(
    engine="gpt-4-vision-preview",
    api_version="2023-12-01-preview",
    model="gpt-4-vision-preview",
    max_new_tokens=300,
)
Alternatively, you can skip setting the environment variables and pass the parameters in directly via the constructor.
In [ ]:
azure_openai_mm_llm = AzureOpenAIMultiModal(
    azure_endpoint="https://<your-endpoint>.openai.azure.com",
    engine="gpt-4-vision-preview",
    api_version="2023-12-01-preview",
    model="gpt-4-vision-preview",
    max_new_tokens=300,
)
In [ ]:
import base64

import requests

from llama_index.core.schema import ImageDocument

image_url = "https://www.visualcapitalist.com/wp-content/uploads/2023/10/US_Mortgage_Rate_Surge-Sept-11-1.jpg"
response = requests.get(image_url)
if response.status_code != 200:
    raise ValueError("Error: Could not retrieve image from URL.")
base64str = base64.b64encode(response.content).decode("utf-8")
image_document = ImageDocument(image=base64str, image_mimetype="image/jpeg")
In [ ]:
from IPython.display import HTML
HTML(f'<img width=400 src="data:image/jpeg;base64,{base64str}"/>')
Out[ ]:
Complete a prompt with an image
In [ ]:
complete_response = azure_openai_mm_llm.complete(
    prompt="Describe the images as an alternative text",
    image_documents=[image_document],
)
In [ ]:
print(complete_response)
The image is a line graph showing the U.S. 30-year fixed-rate mortgage percentage rate and existing home sales from 2015 to 2021. The mortgage rate is represented by a red line, while the home sales are represented by a blue line. The graph shows that the mortgage rate has reached its highest level in over 20 years, while home sales have fluctuated over the same period. There is also a note that the data is sourced from the U.S. Federal Reserve, Trading Economics, and Visual Capitalist.
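The same base64 path also works for images stored on disk. Below is a minimal sketch (the file name local_chart.jpg is a placeholder, not part of this notebook) that reads a local file, base64-encodes it, wraps it in an ImageDocument, and reuses the azure_openai_mm_llm instance created above.

In [ ]:
import base64

from llama_index.core.schema import ImageDocument

# Placeholder path -- replace with your own local image file
with open("local_chart.jpg", "rb") as f:
    local_base64str = base64.b64encode(f.read()).decode("utf-8")

local_image_document = ImageDocument(
    image=local_base64str, image_mimetype="image/jpeg"
)

# Reuse the multi-modal LLM initialized earlier in this notebook
local_response = azure_openai_mm_llm.complete(
    prompt="Describe the image as an alternative text",
    image_documents=[local_image_document],
)
print(local_response)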