Querying Endpoints in the MLflow Deployments Server

Now that the deployments server is running, it's time to send it some data. You can interact with the gateway server via the deployments API or the REST API. In this example, we'll use the deployments API for simplicity.

Let's take a closer look at the three supported model types:

1. Completions: This type of model is used to generate predictions or suggestions based on the input provided, helping to “complete” a sequence or pattern.

2. Chat: These models facilitate interactive conversations, capable of understanding and responding to user inputs in a conversational manner.

3. Embeddings: Embedding models transform input data (like text or images) into a numerical vector space, where similar items are positioned closely in the space, facilitating various machine learning tasks.

In the steps below, we'll walk through how to query the gateway server with each of these model types.

Example 1: Completions

Completions models are designed to finish sentences or respond to prompts.

To query these models through the MLflow AI Gateway, provide a prompt parameter: the string that the language model (LLM) will respond to. The gateway server also supports a variety of other parameters; see the documentation for details.

from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:5000")
name = "completions"
data = dict(
    prompt="Name three potions or spells in harry potter that sound like an insult. Only show the names.",
    n=2,
    temperature=0.2,
    max_tokens=1000,
)

response = client.predict(endpoint=name, inputs=data)
print(response)
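Once you have the response, you'll typically want to pull out just the generated text. The sketch below assumes an OpenAI-style response shape (a "choices" list whose entries carry a "text" field); the dummy response is purely illustrative, so inspect the printed output from your own gateway to confirm the actual fields before relying on this.

```python
def extract_completions(response: dict) -> list[str]:
    """Return the generated text from each choice in a completions response."""
    return [choice["text"] for choice in response.get("choices", [])]


# Illustrative dummy response (not real model output) with an assumed
# OpenAI-style shape; verify the field names against your gateway's output.
dummy_response = {
    "choices": [
        {"index": 0, "text": "Expelliarmus"},
        {"index": 1, "text": "Petrificus Totalus"},
    ],
}

print(extract_completions(dummy_response))
```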

Example 2: Chat

Chat models facilitate interactive conversations with a user, accumulating context over time.

Creating a chat payload is slightly more involved than for the other model types because it accommodates an unlimited number of messages from three distinct roles: system, user, and assistant. To set up a chat payload through the MLflow AI Gateway, specify a messages parameter. This parameter takes a list of dictionaries in the following format:

{"role": "system/user/assistant", "content": "user-specified content"}

For more details, see the documentation.

from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:5000")
name = "chat_3.5"
data = dict(
    messages=[
        {"role": "system", "content": "You are the sorting hat from harry potter."},
        {"role": "user", "content": "I am brave, hard-working, wise, and backstabbing."},
        {"role": "user", "content": "Which harry potter house am I most likely to belong to?"},
    ],
    n=3,
    temperature=0.5,
)

response = client.predict(endpoint=name, inputs=data)
print(response)
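Note that the conversational context is maintained client-side: to continue a conversation, you append the assistant's reply to the messages list and send the whole history back on the next call. The helper below sketches that bookkeeping; the "choices"/"message" response shape is an assumption modeled on OpenAI-style chat payloads, so verify it against your gateway's actual output.

```python
def extend_history(messages: list[dict], response: dict, follow_up: str) -> list[dict]:
    """Append the first assistant reply and a new user turn to the chat history."""
    assistant_message = response["choices"][0]["message"]
    return messages + [assistant_message, {"role": "user", "content": follow_up}]


# Illustrative dummy response (not real model output) with an assumed shape:
history = [{"role": "user", "content": "Which house values bravery?"}]
dummy_response = {
    "choices": [{"message": {"role": "assistant", "content": "Gryffindor."}}],
}
history = extend_history(history, dummy_response, "And which values loyalty?")
print([m["role"] for m in history])  # roles alternate: user, assistant, user
```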

Example 3: Embeddings

Embeddings models convert tokens into numerical vectors.

To use an embeddings model through the MLflow AI Gateway, provide an input parameter, which can be a string or a list of strings. The gateway server then processes these strings and returns their respective numerical vectors. Let's proceed with an example…

from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:5000")
name = "embeddings"
data = dict(
    input=[
       "Gryffindor: Values bravery, courage, and leadership.",
       "Hufflepuff: Known for loyalty, a strong work ethic, and a grounded nature.",
       "Ravenclaw: A house for individuals who value wisdom, intellect, and curiosity.",
       "Slytherin: Appreciates ambition, cunning, and resourcefulness."
    ],
)

response = client.predict(endpoint=name, inputs=data)
print(response)
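Once you have the vectors back, a common next step is measuring how similar two texts are via cosine similarity. The sketch below works on plain lists of floats; how you extract the vectors from the gateway response (for example, a "data" list with "embedding" fields) depends on the provider, so inspect the printed response to find them.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Illustrative toy vectors (real embeddings have hundreds of dimensions):
gryffindor = [0.9, 0.1, 0.0]
slytherin = [0.8, 0.2, 0.1]
print(round(cosine_similarity(gryffindor, slytherin), 3))
```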

That's it! You have successfully set up your first gateway server and served three OpenAI models.