使用可对话代理生成 Dalle 图像

本笔记本演示了如何为可对话代理添加图像生成功能。

需求

此笔记本需要一些额外的依赖项，可以通过 pip 安装：

pip install pyautogen[lmm]

更多信息，请参阅安装指南。

首先，让我们导入所有所需的模块来运行此示例。

import os
import re
from typing import Dict, Optional

from IPython.display import display
from PIL.Image import Image

import autogen
from autogen.agentchat.contrib import img_utils
from autogen.agentchat.contrib.capabilities import generate_images
from autogen.cache import Cache
from autogen.oai import openai_utils

让我们定义我们的 LLM 配置。

gpt_config = {
    "config_list": [{"model": "gpt-4-turbo-preview", "api_key": os.environ["OPENAI_API_KEY"]}],
    "timeout": 120,
    "temperature": 0.7,
}
gpt_vision_config = {
    "config_list": [{"model": "gpt-4-vision-preview", "api_key": os.environ["OPENAI_API_KEY"]}],
    "timeout": 120,
    "temperature": 0.7,
}
dalle_config = {
    "config_list": [{"model": "dall-e-3", "api_key": os.environ["OPENAI_API_KEY"]}],
    "timeout": 120,
    "temperature": 0.7,
}

tip

了解有关为代理配置 LLM 的更多信息，请点击此处。

我们的系统将由两个主要的代理组成：1. 图像生成器代理。2. 评论家代理。

图像生成器代理将与评论家进行对话，并根据评论家的要求生成图像。

CRITIC_SYSTEM_MESSAGE = """You need to improve the prompt of the figures you saw.
How to create an image that is better in terms of color, shape, text (clarity), and other things.
Reply with the following format:

CRITICS: the image needs to improve...
PROMPT: here is the updated prompt!

If you have no critique or a prompt, just say TERMINATE
"""

def _is_termination_message(msg) -> bool:
    # 检测是否应该终止对话
    if isinstance(msg.get("content"), str):
        return msg["content"].rstrip().endswith("TERMINATE")
    elif isinstance(msg.get("content"), list):
        for content in msg["content"]:
            if isinstance(content, dict) and "text" in content:
                return content["text"].rstrip().endswith("TERMINATE")
    return False


def critic_agent() -> autogen.ConversableAgent:
    return autogen.ConversableAgent(
        name="critic",
        llm_config=gpt_vision_config,
        system_message=CRITIC_SYSTEM_MESSAGE,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
        is_termination_msg=lambda msg: _is_termination_message(msg),
    )


def image_generator_agent() -> autogen.ConversableAgent:
    # 创建代理
    agent = autogen.ConversableAgent(
        name="dalle",
        llm_config=gpt_vision_config,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
        is_termination_msg=lambda msg: _is_termination_message(msg),
    )

    # 为代理添加图像生成能力
    dalle_gen = generate_images.DalleImageGenerator(llm_config=dalle_config)
    image_gen_capability = generate_images.ImageGeneration(
        image_generator=dalle_gen, text_analyzer_llm_config=gpt_config
    )

    image_gen_capability.add_to_agent(agent)
    return agent

我们将定义extract_img函数来帮助我们提取图像生成器代理生成的图像。

def extract_images(sender: autogen.ConversableAgent, recipient: autogen.ConversableAgent) -> Image:
    images = []
    all_messages = sender.chat_messages[recipient]

    for message in reversed(all_messages):
        # GPT-4V的格式，其中内容是一个数据数组
        contents = message.get("content", [])
        for content in contents:
            if isinstance(content, str):
                continue
            if content.get("type", "") == "image_url":
                img_data = content["image_url"]["url"]
                images.append(img_utils.get_pil_image(img_data))

    if not images:
        raise ValueError("在消息中找不到图像数据。")

    return images

开始对话

dalle = image_generator_agent()
critic = critic_agent()

img_prompt = "一只穿着写着'I Love AutoGen'字样的快乐狗。确保文字清晰可见。"
# img_prompt = "问我最近怎么样"

result = dalle.initiate_chat(critic, message=img_prompt)

dalle (to critic):

一只穿着写着'I Love AutoGen'字样的快乐狗。确保文字清晰可见。

--------------------------------------------------------------------------------
critic (to dalle):

评论家：图像需要提高对比度和文字大小，以增强清晰度，并且衬衫的颜色不应与狗的毛色相冲突，以保持和谐的色彩方案。

提示：这是更新后的提示！
创建一张快乐的狗的图像，它的外套颜色与毛色形成对比，穿着一件有着清晰可读的大字体的衬衫，上面写着'I Love AutoGen'。

--------------------------------------------------------------------------------
dalle (to critic):

我生成了一张带有提示的图像：快乐的狗，外套颜色与毛色形成对比，衬衫上有着清晰可读的大字体"I Love AutoGen"。<image>

--------------------------------------------------------------------------------
critic (to dalle):

评论家：图像有效地展示了一只快乐的狗，衬衫颜色与狗的毛色形成对比，'I Love AutoGen'的文字大而粗，确保清晰可读。

提示：终止对话

--------------------------------------------------------------------------------

让我们显示Dalle生成的所有图像

images = extract_images(dalle, critic)

for image in reversed(images):
    display(image.resize((300, 300)))