Working with Large Models in the MLflow Transformers Flavor

Warning

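The features described in this guide are intended for advanced users who are familiar with Transformers and MLflow. Make sure you understand their limitations and potential risks before using them.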

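The MLflow Transformers flavor lets you track various Transformers models in MLflow. However, logging large models such as large language models (LLMs) can be resource-intensive because of their size and memory requirements. This guide outlines MLflow features that reduce memory and disk usage when logging models, so that you can work with large models in resource-constrained environments.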

Overview

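The following table summarizes the different methods for logging models with the Transformers flavor. Note that each method has certain limitations and requirements, as described in the sections that follow.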

Save method: Normal pipeline-based logging
Description: Log a model using a pipeline instance or a dictionary of pipeline components.
Memory usage: High
Disk usage: High
Example:

import mlflow
import transformers

pipeline = transformers.pipeline(
    task="text-generation",
    model="meta-llama/Meta-Llama-3.1-70B",
)

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipeline,
        artifact_path="model",
    )

Save method: Memory-efficient model logging
Description: Log a model by specifying a path to a local checkpoint, which avoids loading the model into memory.
Memory usage: Low
Disk usage: High
Example:

import mlflow

with mlflow.start_run():
    mlflow.transformers.log_model(
        # Pass a path to local checkpoint as a model
        transformers_model="/path/to/local/checkpoint",
        # Task argument is required for this saving mode.
        task="text-generation",
        artifact_path="model",
    )

Save method: Storage-efficient model logging
Description: Log a model by saving a reference to the HuggingFace Hub repository instead of saving the model weights.
Memory usage: High
Disk usage: Low
Example:

import mlflow
import transformers

pipeline = transformers.pipeline(
    task="text-generation",
    model="meta-llama/Meta-Llama-3.1-70B",
)

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipeline,
        artifact_path="model",
        # Set save_pretrained to False to save storage space
        save_pretrained=False,
    )

Memory-Efficient Model Logging

Introduced in MLflow 2.16.1, this method lets you log a model without loading it into memory:

import mlflow

with mlflow.start_run():
    mlflow.transformers.log_model(
        # Pass a path to local checkpoint as a model to avoid loading the model instance
        transformers_model="path/to/local/checkpoint",
        # Task argument is required for this saving mode.
        task="text-generation",
        artifact_path="model",
    )

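In the example above, we pass the path to a local model checkpoint (weights) as the transformers_model argument of the mlflow.transformers.log_model() API, instead of passing a pipeline instance. MLflow inspects the checkpoint's model metadata and logs the model weights without loading them into memory. This way, you can log a huge model with billions of parameters to MLflow with minimal computational resources.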

Important Notes

Be aware of the following requirements and limitations when using this feature:

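The checkpoint directory must contain a valid config.json file and the model weight files. If a tokenizer is required, its state files must also be present in the checkpoint directory. You can save the tokenizer state to the checkpoint directory by calling tokenizer.save_pretrained("path/to/local/checkpoint").
You must specify the task argument with the appropriate task name for which the model is designed.
MLflow may not be able to infer model dependencies accurately in this mode. See Managing Dependencies in MLflow Models for more information about managing model dependencies.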

Warning

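Make sure you specify the correct task argument, because an incompatible task will cause the model to fail at load time. You can check the valid task types for your model on the HuggingFace Hub.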
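The checkpoint requirements above can be verified before calling log_model(). The following is a minimal sketch, not part of MLflow: check_checkpoint_dir is a hypothetical helper, and the file names it looks for (weight files ending in .safetensors or .bin, and tokenizer_config.json for tokenizer state) are assumptions based on common Hugging Face checkpoint layouts.

```python
from pathlib import Path

# Assumed file-name conventions of Hugging Face checkpoints (not defined by this guide).
WEIGHT_PATTERNS = ("*.safetensors", "*.bin")
TOKENIZER_FILE = "tokenizer_config.json"


def check_checkpoint_dir(checkpoint_dir, require_tokenizer=True):
    """Return a list of problems found in a local checkpoint directory (hypothetical helper)."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    # config.json is required so that the model metadata can be inspected.
    if not (root / "config.json").is_file():
        problems.append("config.json is missing")
    # At least one weight file must be present.
    if not any(any(root.glob(pattern)) for pattern in WEIGHT_PATTERNS):
        problems.append("no model weight files found")
    # Tokenizer state is only needed when the task uses a tokenizer.
    if require_tokenizer and not (root / TOKENIZER_FILE).is_file():
        problems.append("tokenizer state is missing")
    return problems
```

An empty list means the directory passes these basic checks. It does not guarantee that the files are valid or that the chosen task matches the model.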

Storage-Efficient Model Logging

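Typically, when MLflow logs an ML model, it saves a copy of the model weights to the artifact store. However, this is not optimal when you use a pretrained model from the HuggingFace Hub and have no intention of fine-tuning or otherwise modifying the model or its weights before logging it. In this very common case, for example while developing prompts or testing inference parameters, copying the (typically very large) model weights is redundant and only wastes storage space.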

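To address this issue, MLflow 2.11.0 introduced a new argument, save_pretrained, in the mlflow.transformers.save_model() and mlflow.transformers.log_model() APIs. When this argument is set to False, MLflow skips saving the pretrained model weights and instead stores a reference to the underlying repository entry on the HuggingFace Hub. Specifically, it stores the repository name and the unique commit hash of the model weights when your components or pipeline are logged. When such a reference-only model is loaded, MLflow reads the repository name and commit hash from the saved metadata, and then either downloads the model weights from the HuggingFace Hub or uses the locally cached model from your HuggingFace local cache directory.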

Here is an example of using the save_pretrained argument to log a model:

import mlflow
import torch
import transformers

pipeline = transformers.pipeline(
    task="text-generation",
    model="meta-llama/Meta-Llama-3.1-70B",
    # Pass a torch.dtype object rather than the string "torch.float16"
    torch_dtype=torch.float16,
)

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipeline,
        artifact_path="model",
        # Set save_pretrained to False to save storage space
        save_pretrained=False,
    )

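In the example above, MLflow does not save a copy of the Llama-3.1-70B model's weights. Instead, it logs the following metadata as a reference to the model on the HuggingFace Hub. This saves roughly 150GB of storage space and also significantly reduces logging latency for each run you start during development.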

In the MLflow UI, you can see the model logged with its repository ID and commit hash:

flavors:
    ...
    transformers:
        source_model_name: meta-llama/Meta-Llama-3.1-70B
        source_model_revision: 33101ce6ccc08fa6249c10a543ebfcac65173393
        ...
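For a rough sense of where the storage figure of about 150GB comes from, the size of the raw weights can be approximated as the number of parameters times the bytes per parameter. The sketch below assumes about 70 billion parameters stored as 16-bit floats. The function name and both inputs are illustrative assumptions, and actual on-disk size also depends on the exact parameter count and the additional files in the repository.

```python
def estimate_weight_size_gb(num_params, bytes_per_param):
    """Approximate size of raw model weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 10**9


# Assumption: about 70 billion parameters stored as 16-bit floats (2 bytes each).
size_gb = estimate_weight_size_gb(70 * 10**9, 2)
print(f"~{size_gb:.0f} GB")  # prints "~140 GB"
```

The estimate of roughly 140 GB is in the same range as the figure quoted above, which is why skipping the weight copy matters for every run logged during development.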

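Before deploying to production, you may want to persist the model weights rather than the repository reference. To do so, use the mlflow.transformers.persist_pretrained_model() API to download the model weights from the HuggingFace Hub and save them to the artifact location. See the OSS Model Registry or Legacy Workspace Model Registry section for more information.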

Registering Reference-Only Models for Production

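Models logged without their weights, as with the storage-efficient method above, are "reference-only": the model weights are not saved to the artifact store, and only a reference to the HuggingFace Hub repository is saved. When you load such a model normally, MLflow downloads the model weights from the HuggingFace Hub.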

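However, this may not be suitable for production use cases, because the model weights may become unavailable or the download may fail due to network issues. MLflow provides a way to address this when you register a reference-only model to the Model Registry.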

Databricks Unity Catalog

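Registering a reference-only model to the Databricks Unity Catalog Model Registry requires no additional steps beyond the normal model registration process. MLflow automatically downloads the model weights and registers them to Unity Catalog along with the model metadata.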

import mlflow

mlflow.set_registry_uri("databricks-uc")

# Log the repository ID as a model. The model weight will not be saved to the artifact store
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        artifact_path="model",
    )

# When registering the model to Unity Catalog Model Registry, MLflow will automatically
# persist the model weight files. This may take several minutes for large models.
mlflow.register_model(model_info.model_uri, "your.model.name")

OSS Model Registry or Legacy Workspace Model Registry

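For the OSS Model Registry or the legacy Workspace Model Registry in Databricks, you need to manually persist the model weights to the artifact store before registering the model. You can use the mlflow.transformers.persist_pretrained_model() API to download the model weights from the HuggingFace Hub and save them to the artifact location. This process does not require re-logging the model; it efficiently updates the existing model and its metadata in place.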

import mlflow

# Log the repository ID as a model. The model weight will not be saved to the artifact store
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        artifact_path="model",
    )

# Before registering the model to the non-UC model registry, persist the model weight
# from the HuggingFace Hub to the artifact location.
mlflow.transformers.persist_pretrained_model(model_info.model_uri)

# Register the model
mlflow.register_model(model_info.model_uri, "your.model.name")

Caveats for Skipping Saving of Pretrained Model Weights

While these features are useful for saving compute resources and storage space when logging large models, keep the following caveats in mind:

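Changes in model availability: If you use a model from another user's repository, the model may be deleted or made private on the HuggingFace Hub. In that case, MLflow cannot load the model again. For production use cases, we recommend saving a copy of the model weights to the artifact store before moving from a development or staging environment to production.
HuggingFace Hub access: Downloading a model from the HuggingFace Hub can be slow or unstable because of network latency or the status of the HuggingFace Hub service. MLflow does not provide any retry mechanism or robust error handling for model downloads from the HuggingFace Hub. You should therefore not rely on this feature for your final production-candidate runs.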
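Because MLflow does not retry Hub downloads itself, one option during development is to wrap the load call in your own retry logic. The following is a minimal sketch of exponential backoff using only the standard library. call_with_retries and its parameters are hypothetical names, not an MLflow API, and catching every exception is a simplification you would likely narrow to network errors. It does not replace persisting the weights before production.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception, wait and retry with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2**attempt)


# Hypothetical usage, assuming `model_uri` points at a reference-only model:
# pipeline = call_with_retries(lambda: mlflow.transformers.load_model(model_uri))
```

The sleep parameter is injectable so that the wait can be replaced in tests; with the defaults, three attempts wait 1 second and then 2 seconds between tries.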

By understanding these methods and their limitations, you can work effectively with large Transformers models in MLflow while optimizing resource usage.