
Supervised Models

Configuration Classes

Bases: ModelConfig

Automatic Feature Interaction configuration.

Parameters:

Name Type Description Default
attn_embed_dim int

The number of hidden units in the Multi-Headed Attention layers. Defaults to 32

32
num_heads int

The number of heads in the Multi-Headed Attention layer. Defaults to 2

2
num_attn_blocks int

The number of layers of stacked Multi-Headed Attention layers. Defaults to 3

3
attn_dropouts float

Dropout between layers of Multi-Headed Attention layers. Defaults to 0.0

0.0
has_residuals bool

Flag to have a residual connection from the embedded output to the attention layer output. Defaults to True

True
embedding_dim int

The dimensions of the embedding for continuous and categorical columns. Defaults to 16

16
embedding_initialization Optional[str]

Initialization scheme for the embedding layers. Defaults to kaiming_uniform. Choices are: [kaiming_uniform,kaiming_normal]

'kaiming_uniform'
embedding_bias bool

Flag to turn on Embedding Bias. Defaults to True

True
share_embedding bool

The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole along with embeddings of each unique value of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False

False
share_embedding_strategy Optional[str]

There are two strategies for adding shared embeddings. 1. add - a separate embedding for the feature is added to the embedding of the unique values of the feature. 2. fraction - a fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [add,fraction]

'fraction'
shared_embedding_fraction float

Fraction of the input_embed_dim to be reserved by the shared embedding. Should be less than one. Defaults to 0.25

0.25
deep_layers bool

Flag to enable a deep MLP layer before the Multi-Headed Attention layer. Defaults to False

False
layers str

Hyphen-separated number of layers and units in the deep MLP. Defaults to 128-64-32

'128-64-32'
activation str

The activation type in the deep MLP. The default activations in PyTorch like ReLU, TanH, LeakyReLU, etc. are supported. https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity Defaults to ReLU

'ReLU'
use_batch_norm bool

Flag to include a BatchNorm layer after each Linear Layer+DropOut in the deep MLP. Defaults to False

False
initialization str

Initialization scheme for the linear layers in the deep MLP. Defaults to kaiming. Choices are: [kaiming,xavier,random]

'kaiming'
dropout float

Probability of an element to be zeroed in the deep MLP. Defaults to 0.0

0.0
attention_pooling bool

If True, the attention outputs of each block are combined for the final prediction. Defaults to False

False
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone]

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead]

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, the continuous layer is normalized through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/autoint/config.py
@dataclass
class AutoIntConfig(ModelConfig):
    """自动特征交互配置.

    Parameters:
        attn_embed_dim (int): 多头注意力层中的隐藏单元数量.默认为 32

        num_heads (int): 多头注意力层中的头数.默认为 2

        num_attn_blocks (int): 堆叠的多头注意力层的层数.默认为 3

        attn_dropouts (float): 多头注意力层之间的 dropout.默认为 0.0

        has_residuals (bool): 标志,用于在嵌入输出和注意力层输出之间添加残差连接.默认为 True

        embedding_dim (int): 连续和分类列的嵌入维度.默认为 16

        embedding_initialization (Optional[str]): 嵌入层的初始化方案.默认为 `kaiming`.可选值为: [`kaiming_uniform`,`kaiming_normal`]

        embedding_bias (bool): 标志,用于开启嵌入偏置.默认为 True

        share_embedding (bool): 标志,用于在输入嵌入过程中开启共享嵌入.关键思想是为特征整体以及该列的每个唯一值提供嵌入.更多详情请参阅 TabTransformer 论文的附录 A.默认为 False

        share_embedding_strategy (Optional[str]): 添加共享嵌入有两种策略.1. `add` - 为特征添加一个单独的嵌入到特征唯一值的嵌入中.2. `fraction` - 输入嵌入的一部分保留给特征的共享嵌入.默认为 fraction.可选值为: [`add`,`fraction`]

        shared_embedding_fraction (float): 保留给共享嵌入的输入嵌入维度的一部分.应小于 1.默认为 0.25

        deep_layers (bool): 标志,用于在多头注意力层之前启用深层 MLP 层.默认为 False

        layers (str): 深层 MLP 中的层数和单元数,用连字符分隔.默认为 128-64-32

        activation (str): 深层 MLP 中的激活类型.默认激活类型为 PyTorch 中的 ReLU、TanH、LeakyReLU 等.
                https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity
                默认为 ReLU

        use_batch_norm (bool): 标志,用于在深层 MLP 中的每个线性层+DropOut 后添加 BatchNorm 层.默认为 False

        initialization (str): 深层 MLP 中线性层的初始化方案.默认为 `kaiming`.可选值为: [`kaiming`,`xavier`,`random`]

        dropout (float): 深层 MLP 中元素被置零的概率.默认为 0.0

        attention_pooling (bool): 如果为 True,将组合每个块的注意力输出以进行最终预测.默认为 False

        task (str): 指定问题是回归还是分类.`backbone` 是一种任务,将模型视为生成特征的骨干.主要用于 SSL 及相关任务.可选值为: [`regression`,`classification`,`backbone`]

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`]

        head_config (Optional[Dict]): 定义头部的配置字典.如果为空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果为空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的 dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保持为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数

        metrics_prob_input (Optional[List]): 配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None

        target_range (Optional[List]): 输出变量应限制的范围.当前在多目标回归中被忽略.通常用于回归问题.如果为空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    attn_embed_dim: int = field(
        default=32,
        metadata={"help": "The number of hidden units in the Multi-Headed Attention layers. Defaults to 32"},
    )
    num_heads: int = field(
        default=2,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 2"},
    )
    num_attn_blocks: int = field(
        default=3,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 3"},
    )
    attn_dropouts: float = field(
        default=0.0,
        metadata={"help": "Dropout between layers of Multi-Headed Attention Layers. Defaults to 0.0"},
    )
    has_residuals: bool = field(
        default=True,
        metadata={
            "help": "Flag to have a residual connect from enbedded output to attention layer output. Defaults to True"
        },
    )
    embedding_dim: int = field(
        default=16,
        metadata={"help": "The dimensions of the embedding for continuous and categorical columns. Defaults to 16"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=True,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to True"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique values of that column."
            " For more details refer to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    deep_layers: bool = field(
        default=False,
        metadata={"help": "Flag to enable a deep MLP layer before the Multi-Headed Attention layer. Defaults to False"},
    )
    layers: str = field(
        default="128-64-32",
        metadata={"help": "Hyphen-separated number of layers and units in the deep MLP. Defaults to 128-64-32"},
    )
    activation: str = field(
        default="ReLU",
        metadata={
            "help": "The activation type in the deep MLP. The default activation in PyTorch"
            " like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity."
            " Defaults to ReLU"
        },
    )
    use_batch_norm: bool = field(
        default=False,
        metadata={
            "help": "Flag to include a BatchNorm layer after each Linear Layer+DropOut in the deep MLP."
            " Defaults to False"
        },
    )
    initialization: str = field(
        default="kaiming",
        metadata={
            "help": "Initialization scheme for the linear layers in the deep MLP. Defaults to `kaiming`",
            "choices": ["kaiming", "xavier", "random"],
        },
    )
    dropout: float = field(
        default=0.0,
        metadata={"help": "Probability of an element to be zeroed in the deep MLP. Defaults to 0.0"},
    )
    attention_pooling: bool = field(
        default=False,
        metadata={
            "help": "If True, will combine the attention outputs of each block for final prediction. Defaults to False"
        },
    )
    _module_src: str = field(default="models.autoint")
    _model_name: str = field(default="AutoIntModel")
    _backbone_name: str = field(default="AutoIntBackbone")
    _config_name: str = field(default="AutoIntConfig")
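
The generated reference above stops at the config definition. As an illustrative sketch that is not part of the original page, the snippet below trains an AutoInt model with this config; the dataframe train_df and its column names are hypothetical placeholders.

from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models import AutoIntConfig

# Hypothetical dataset: two continuous columns, one categorical column,
# and a "target" column to predict.
data_config = DataConfig(
    target=["target"],
    continuous_cols=["num_a", "num_b"],
    categorical_cols=["cat_a"],
)
model_config = AutoIntConfig(
    task="classification",
    attn_embed_dim=32,  # hidden units in each Multi-Headed Attention layer
    num_heads=2,
    num_attn_blocks=3,
    deep_layers=True,  # adds the 128-64-32 MLP before the attention blocks
)
tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=OptimizerConfig(),
    trainer_config=TrainerConfig(max_epochs=5),
)
tabular_model.fit(train=train_df)  # train_df: a pandas DataFrame (assumed to exist)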

Bases: ModelConfig

CategoryEmbeddingModel configuration.

Parameters:

Name Type Description Default
layers str

DEPRECATED: Hyphen-separated number of layers and units in the classification head. E.g. 32-64-32. Defaults to 128-64-32

'128-64-32'
activation str

DEPRECATED: The activation type in the classification head. The default activations in PyTorch like ReLU, TanH, LeakyReLU, etc. are supported. https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity. Defaults to ReLU

'ReLU'
use_batch_norm bool

DEPRECATED: Flag to include a BatchNorm layer after each Linear Layer+DropOut. Defaults to False

False
initialization str

DEPRECATED: Initialization scheme for the linear layers. Defaults to kaiming. Choices are: [kaiming,xavier,random].

'kaiming'
dropout float

DEPRECATED: Probability of a classification element to be zeroed. This is added to each linear layer. Defaults to 0.0

0.0
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/category_embedding/config.py
@dataclass
class CategoryEmbeddingModelConfig(ModelConfig):
    """类别嵌入模型配置.

    Parameters:
        layers (str): 已弃用: 分类头中层数和单元数的连字符分隔字符串.例如 32-64-32.
                默认为 128-64-32

        activation (str): 已弃用: 分类头中的激活类型.默认激活类型为 PyTorch 中的 ReLU、TanH、LeakyReLU 等.
                https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity.
                默认为 ReLU

        use_batch_norm (bool): 已弃用: 标志,用于在每个线性层+DropOut 后包含一个 BatchNorm 层.默认为 False

        initialization (str): 已弃用: 线性层的初始化方案.默认为 `kaiming`.可选值为: [`kaiming`,`xavier`,`random`].

        dropout (float): 已弃用: 分类元素被置零的概率.这会添加到每个线性层.默认为 0.0


        task (str): 指定问题是回归还是分类.`backbone` 是一种任务,将模型视为生成特征的骨干.主要用于 SSL 及相关任务.
                可选值为: [`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.
                默认为 LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,
                使用规则 min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的 Dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,我们将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3.

        loss (Optional[str]): 要应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,
                否则请保留为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.
                默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,
                为了简单起见,我们只使用 `multiclass`.

        metrics_prob_input (Optional[List]): 配置中定义的分类指标的强制参数.
            这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None.

        target_range (Optional[List]): 限制输出变量的范围.目前忽略多目标回归.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    layers: str = field(
        default="128-64-32",
        metadata={
            "help": (
                "Hyphen-separated number of layers and units in the classification"
                " head. eg. 32-64-32. Defaults to 128-64-32"
            )
        },
    )
    activation: str = field(
        default="ReLU",
        metadata={
            "help": (
                "The activation type in the classification head. The default"
                " activation in PyTorch like ReLU, TanH, LeakyReLU, etc."
                " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity."
                " Defaults to ReLU"
            )
        },
    )
    use_batch_norm: bool = field(
        default=False,
        metadata={"help": ("Flag to include a BatchNorm layer after each Linear Layer+DropOut." " Defaults to False")},
    )
    initialization: str = field(
        default="kaiming",
        metadata={
            "help": ("Initialization scheme for the linear layers. Defaults to `kaiming`"),
            "choices": ["kaiming", "xavier", "random"],
        },
    )
    dropout: float = field(
        default=0.0,
        metadata={
            "help": (
                "probability of an classification element to be zeroed."
                " This is added to each linear layer. Defaults to 0.0"
            )
        },
    )

    # def __post_init__(self):
    #     deprecated_args = [
    #         "layers",
    #         "activation",
    #         "use_batch_norm",
    #         "initialization",
    #         "dropout",
    #     ]
    #     # for arg in deprecated_args:
    #     if any([getattr(self, arg) is not None for arg in deprecated_args]):
    #         warnings.warn(
    #             f"{deprecated_args} are deprecated and will be remoevd in next version. "
    #             "Please use 'head' and `head_config` and set deprecated args "
    #             "to `None` to turn off warning. CategoricalEmbedding model is just a "
    #             "linear head with embedding layers."
    #         )
    #     return super().__post_init__()

    _module_src: str = field(default="models.category_embedding")
    _model_name: str = field(default="CategoryEmbeddingModel")
    _backbone_name: str = field(default="CategoryEmbeddingBackbone")
    _config_name: str = field(default="CategoryEmbeddingModelConfig")
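
As an illustrative sketch (not from the original page): since the top-level layers, activation, use_batch_norm, initialization and dropout arguments are deprecated, the head is configured through head_config instead.

from pytorch_tabular.models import CategoryEmbeddingModelConfig

# Sketch: size the MLP head via head_config rather than the deprecated
# top-level arguments.
model_config = CategoryEmbeddingModelConfig(
    task="regression",
    head="LinearHead",
    head_config={
        "layers": "64-32",  # two hidden layers with 64 and 32 units
        "activation": "ReLU",
        "dropout": 0.1,
    },
    learning_rate=1e-3,
)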

Bases: ModelConfig

DANet configuration.

Parameters:

Name Type Description Default
n_layers int

Number of Blocks in the DANet. 8, 20 and 32 are the configurations evaluated in the paper. Defaults to 8

8
abstlay_dim_1 int

The dimension for the intermediate output in the first ABSTLAY layer in a Block. Defaults to 32

32
abstlay_dim_2 int

The dimension for the intermediate output in the second ABSTLAY layer in a Block. If None, it will be twice abstlay_dim_1. Defaults to None

None
k int

The number of feature groups in the ABSTLAY layer. Defaults to 5

5
dropout_rate float

Dropout to be applied in the Block. Defaults to 0.1

0.1
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression, classification, backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None, LinearHead, MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/danet/config.py
@dataclass
class DANetConfig(ModelConfig):
    """DANet 配置.

    Parameters:
        n_layers (int): DANet 中块的数量.8、20、32 是论文评估的配置.默认为 8

        abstlay_dim_1 (int): 块中第一个 ABSTLAY 层中间输出的维度.默认为 32

        abstlay_dim_2 (int): 块中第二个 ABSTLAY 层中间输出的维度.默认为 64

        k (int): ABSTLAY 层中特征组的数量.默认为 5

        dropout_rate (float): 块中应用的 dropout.默认为 0.1

        task (str): 指定问题是回归还是分类.`backbone` 是一种任务,将模型视为生成特征的骨干.主要用于 SSL 及相关任务.可选值为: [`regression`, `classification`, `backbone`].

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为: [`None`, `LinearHead`, `MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度,格式为列表中的元组 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2)

        embedding_dropout (float): 分类嵌入应用的 dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保留为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,为了简单起见,我们只使用 `multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None.

        target_range (Optional[List]): 限制输出变量的范围.目前忽略多目标回归.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 可重复性的种子.默认为 42"""

    n_layers: int = field(
        default=8,
        metadata={"help": "Number of Blocks in the DANet. Each block has 2 Abstlay Blocks each. Defaults to 8"},
    )

    abstlay_dim_1: int = field(
        default=32,
        metadata={
            "help": "The dimension for the intermediate output in the first ABSTLAY layer in a Block. Defaults to 32"
        },
    )

    abstlay_dim_2: Optional[int] = field(
        default=None,
        metadata={
            "help": "The dimension for the intermediate output in the second ABSTLAY layer in a Block."
            "If None, it will be twice abstlay_dim_1. Defaults to None"
        },
    )
    k: int = field(
        default=5,
        metadata={"help": "The number of feature groups in the ABSTLAY layer. Defaults to 5"},
    )
    dropout_rate: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Block. Defaults to 0.1"},
    )
    block_activation: str = field(
        default="LeakyReLU",
        metadata={
            "help": "The activation type in the classification head. The default activation in PyTorch"
            " like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity"
        },
    )
    virtual_batch_size: Optional[int] = field(
        default=256,
        metadata={
            "help": "If not None, all BatchNorms will be converted to GhostBatchNorm's "
            " with this virtual batch size. Defaults to None"
        },
    )

    _module_src: str = field(default="models.danet")
    _model_name: str = field(default="DANetModel")
    _backbone_name: str = field(default="DANetBackbone")
    _config_name: str = field(default="DANetConfig")

    def __post_init__(self):
        if self.abstlay_dim_2 is None:
            self.abstlay_dim_2 = self.abstlay_dim_1 * 2
        return super().__post_init__()
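
A small sketch (assuming the config can be instantiated standalone) of the abstlay_dim_2 defaulting behaviour implemented in __post_init__ above:

from pytorch_tabular.models import DANetConfig

cfg = DANetConfig(task="classification", n_layers=8, abstlay_dim_1=32)
# abstlay_dim_2 was left as None, so __post_init__ set it to twice abstlay_dim_1
print(cfg.abstlay_dim_2)  # 64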

Bases: ModelConfig

FT Transformer configuration.

Parameters:

Name Type Description Default
input_embed_dim int

The embedding dimension for the input categorical features. Defaults to 32

32
embedding_initialization Optional[str]

Initialization scheme for the embedding layers. Defaults to kaiming_uniform. Choices are: [kaiming_uniform,kaiming_normal].

'kaiming_uniform'
embedding_bias bool

Flag to turn on Embedding Bias. Defaults to True

True
share_embedding bool

The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole along with embeddings of each unique value of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False

False
share_embedding_strategy Optional[str]

There are two strategies for adding shared embeddings. 1. add - a separate embedding for the feature is added to the embedding of the unique values of the feature. 2. fraction - a fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [add,fraction].

'fraction'
shared_embedding_fraction float

Fraction of the input_embed_dim to be reserved by the shared embedding. Should be less than one. Defaults to 0.25

0.25
attn_feature_importance bool

If you are facing memory issues, you can turn off feature importance, which will stop saving the attention weights. Defaults to True

True
num_heads int

The number of heads in the Multi-Headed Attention layer. Defaults to 8

8
num_attn_blocks int

The number of layers of stacked Multi-Headed Attention layers. Defaults to 6

6
transformer_head_dim Optional[int]

The number of hidden units in the Multi-Headed Attention layers. Defaults to None and will be the same as input_dim.

None
attn_dropout float

Dropout to be applied after Multi-Headed Attention. Defaults to 0.1

0.1
add_norm_dropout float

Dropout to be applied in the AddNorm layer. Defaults to 0.1

0.1
ff_dropout float

Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1

0.1
ff_hidden_multiplier int

Multiple by which the Positionwise FF layer scales the input. Defaults to 4

4
transformer_activation str

The activation type in the transformer feed forward layers. In addition to the default activations in PyTorch like ReLU, TanH, LeakyReLU, etc. (https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity), GEGLU, ReGLU and SwiGLU are also implemented (https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU

'GEGLU'
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/ft_transformer/config.py
@dataclass
class FTTransformerConfig(ModelConfig):
    """Tab Transformer 配置.

    Parameters:
        input_embed_dim (int): 输入分类特征的嵌入维度.默认为 32

        embedding_initialization (Optional[str]): 嵌入层的初始化方案.默认为 `kaiming`.可选值为: [`kaiming_uniform`,`kaiming_normal`].

        embedding_bias (bool): 是否开启嵌入偏置的标志.默认为 True

        share_embedding (bool): 该标志用于在输入嵌入过程中开启共享嵌入.其核心思想是为整个特征及其每个唯一值分别设置嵌入.更多详情请参阅 TabTransformer 论文的附录 A.默认为 False

        share_embedding_strategy (Optional[str]): 添加共享嵌入有两种策略.1. `add` - 为特征添加一个独立的嵌入,并将其与特征唯一值的嵌入相加.2. `fraction` - 输入嵌入的一部分保留给特征的共享嵌入.默认为 fraction.可选值为: [`add`,`fraction`].

        shared_embedding_fraction (float): 保留给共享嵌入的输入嵌入维度比例.应小于 1.默认为 0.25

        attn_feature_importance (bool): 如果遇到内存问题,可以关闭特征重要性,这样就不会保存注意力权重.默认为 True

        num_heads (int): 多头注意力层中的头数.默认为 8

        num_attn_blocks (int): 堆叠的多头注意力层数.默认为 6

        transformer_head_dim (Optional[int]): 多头注意力层中的隐藏单元数.默认为 None,将与输入维度相同.

        attn_dropout (float): 多头注意力后应用的 dropout.默认为 0.1

        add_norm_dropout (float): AddNorm 层中应用的 dropout.默认为 0.1

        ff_dropout (float): 位置前馈网络中应用的 dropout.默认为 0.1

        ff_hidden_multiplier (int): 位置前馈层对输入的缩放倍数.默认为 4

        transformer_activation (str): 变换器前馈层中的激活类型.除了 PyTorch 中的默认激活函数如 ReLU、TanH、LeakyReLU 等(https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity),还实现了 GEGLU、ReGLU 和 SwiGLU(https://arxiv.org/pdf/2002.05202.pdf).默认为 GEGLU

        task (str): 指定问题是回归还是分类.`backbone` 是一种任务,将模型视为生成特征的骨干网络.主要用于 SSL 及相关任务.可选值为: [`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2)

        embedding_dropout (float): 分类嵌入中应用的 dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保持为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,并且为了简化,我们仅使用 `multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None.

        target_range (Optional[List]): 限制输出变量的范围.目前多目标回归中忽略.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    input_embed_dim: int = field(
        default=32,
        metadata={"help": "The embedding dimension for the input categorical features. Defaults to 32"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=True,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to True"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique values of that column. For more details refer"
            " to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    attn_feature_importance: bool = field(
        default=True,
        metadata={
            "help": "If you are facing memory issues, you can turn off feature importance"
            " which will not save the attention weights. Defaults to True"
        },
    )
    num_heads: int = field(
        default=8,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 8"},
    )
    num_attn_blocks: int = field(
        default=6,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 6"},
    )
    transformer_head_dim: Optional[int] = field(
        default=None,
        metadata={
            "help": "The number of hidden units in the Multi-Headed Attention layers."
            " Defaults to None and will be same as input_dim."
        },
    )
    attn_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied after Multi headed Attention. Defaults to 0.1"},
    )
    add_norm_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the AddNorm Layer. Defaults to 0.1"},
    )
    ff_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1"},
    )
    ff_hidden_multiplier: int = field(
        default=4,
        metadata={"help": "Multiple by which the Positionwise FF layer scales the input. Defaults to 4"},
    )

    transformer_activation: str = field(
        default="GEGLU",
        metadata={
            "help": "The activation type in the transformer feed forward layers."
            " In addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity,"
            " GEGLU, ReGLU and SwiGLU are also implemented (https://arxiv.org/pdf/2002.05202.pdf)."
            " Defaults to GEGLU",
        },
    )

    _module_src: str = field(default="models.ft_transformer")
    _model_name: str = field(default="FTTransformerModel")
    _backbone_name: str = field(default="FTTransformerBackbone")
    _config_name: str = field(default="FTTransformerConfig")
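
For illustration only (hypothetical column names), a sketch of a lighter FT Transformer; as a standard multi-head attention constraint, num_heads should typically divide the embedding dimension evenly.

from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models import FTTransformerConfig

model_config = FTTransformerConfig(
    task="regression",
    input_embed_dim=32,
    num_heads=4,
    num_attn_blocks=4,
    transformer_activation="GEGLU",
    attn_feature_importance=False,  # skip storing attention weights to save memory
)
tabular_model = TabularModel(
    data_config=DataConfig(
        target=["price"],  # hypothetical columns
        continuous_cols=["area"],
        categorical_cols=["city"],
    ),
    model_config=model_config,
    optimizer_config=OptimizerConfig(),
    trainer_config=TrainerConfig(max_epochs=10),
)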

Bases: ModelConfig

Gated Adaptive Network for Deep Automated Learning of Features (GANDALF) configuration.

Parameters:

Name Type Description Default
gflu_stages int

Number of layers in the feature abstraction layer. Defaults to 6

6
gflu_dropout float

Dropout rate for the feature abstraction layer. Defaults to 0.0

0.0
gflu_feature_init_sparsity float

Only valid for t-softmax. The percentage of features to be selected in each GFLU stage. This is just the initialization; it may change during learning. Defaults to 0.3

0.3
learnable_sparsity bool

Only valid for t-softmax. If True, the sparsity parameters will be learned. If False, the sparsity parameters will be fixed to the initial values specified in gflu_feature_init_sparsity and tree_feature_init_sparsity. Defaults to True

True
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone]

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead]

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/gandalf/config.py
@dataclass
class GANDALFConfig(ModelConfig):
    """门控自适应网络用于深度自动化特征学习(GANDALF)配置.

    Parameters:
        gflu_stages (int): 特征抽象层的层数.默认为 6

        gflu_dropout (float): 特征抽象层的丢弃率.默认为 0.0

        gflu_feature_init_sparsity (float): 仅对 t-softmax 有效.在每个 GFLU 阶段中选择的特征百分比.这只是初始化值,在学习过程中可能会改变.默认为 0.3

        learnable_sparsity (bool): 仅对 t-softmax 有效.如果为 True,稀疏性参数将被学习.如果为 False,稀疏性参数将固定为 `gflu_feature_init_sparsity` 和 `tree_feature_init_sparsity` 中指定的初始值.默认为 True

        task (str): 指定问题是回归还是分类.`backbone` 是一种将模型视为生成特征的主干的任务.主要用于 SSL 及相关任务.可选值为: [`regression`,`classification`,`backbone`]

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`]

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度,格式为 (基数, 嵌入维度) 的元组列表.如果留空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的丢弃率.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保持为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,并且为了简单起见,我们仅使用 `multiclass`

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None

        target_range (Optional[List]): 限制输出变量的范围.目前忽略多目标回归.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    gflu_stages: int = field(
        default=6,
        metadata={"help": "Number of layers in the feature abstraction layer. Defaults to 6"},
    )

    gflu_dropout: float = field(
        default=0.0, metadata={"help": "Dropout rate for the feature abstraction layer. Defaults to 0.0"}
    )

    gflu_feature_init_sparsity: float = field(
        default=0.3,
        metadata={
            "help": "Only valid for t-softmax. The perecentge of features to be selected in "
            "each GFLU stage. This is just initialized and during learning it may change"
        },
    )
    learnable_sparsity: bool = field(
        default=True,
        metadata={
            "help": "Only valid for t-softmax. If True, the sparsity parameters will be learned."
            "If False, the sparsity parameters will be fixed to the initial values specified in "
            "`gflu_feature_init_sparsity` and `tree_feature_init_sparsity`"
        },
    )
    _module_src: str = field(default="models.gandalf")
    _model_name: str = field(default="GANDALFModel")
    _backbone_name: str = field(default="GANDALFBackbone")
    _config_name: str = field(default="GANDALFConfig")

    def __post_init__(self):
        assert self.gflu_stages > 0, "gflu_stages should be greater than 0"
        return super().__post_init__()
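
A minimal configuration sketch (illustrative values, not recommendations):

from pytorch_tabular.models import GANDALFConfig

model_config = GANDALFConfig(
    task="classification",
    gflu_stages=4,  # must be > 0, enforced by __post_init__ above
    gflu_dropout=0.1,
    gflu_feature_init_sparsity=0.3,  # initial fraction of features picked per stage
    learnable_sparsity=True,  # let the model adjust sparsity during training
)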

Bases: ModelConfig

Gated Additive Tree Ensemble configuration.

Parameters:

Name Type Description Default
gflu_stages int

Number of layers in the feature abstraction layer. Defaults to 6

6
gflu_dropout float

Dropout rate for the feature abstraction layer. Defaults to 0.0

0.0
tree_depth int

Depth of the tree. Defaults to 4

4
num_trees int

Number of trees to use in the ensemble. Defaults to 10

10
binning_activation str

The binning function to use. Defaults to sparsemoid. Choices are: [entmoid,sparsemoid,sigmoid].

'sparsemoid'
feature_mask_function str

The feature mask function to use. Defaults to t-softmax. Choices are: [entmax,sparsemax,softmax,t-softmax].

't-softmax'
tree_dropout float

Probability of dropout in the tree binning transformation. Defaults to 0.0

0.0
chain_trees bool

If True, we will chain the trees together; chaining trees is analogous to boosting, while parallel trees are analogous to bagging. Defaults to True

True
tree_wise_attention bool

If True, we will use tree-wise attention to combine trees. Defaults to True

True
tree_wise_attention_dropout float

Probability of dropout in the tree-wise attention layer. Defaults to 0.0

0.0
share_head_weights bool

If True, we will share the weights between the heads. Defaults to True

True
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/gate/config.py
@dataclass
class GatedAdditiveTreeEnsembleConfig(ModelConfig):
    """门控加性树集成配置.

    Parameters:
        gflu_stages (int): 特征抽象层的层数.默认为6

        gflu_dropout (float): 特征抽象层的dropout率.默认为0.0

        tree_depth (int): 树的深度.默认为5

        num_trees (int): 集成中使用的树的数量.默认为20

        binning_activation (str): 使用的分箱函数.默认为entmoid.可选值为: [`entmoid`,`sparsemoid`,`sigmoid`].

        feature_mask_function (str): 使用的特征掩码函数.默认为sparsemax.可选值为: [`entmax`,`sparsemax`,`softmax`].

        tree_dropout (float): 树分箱变换中的dropout概率.默认为0.0

        chain_trees (bool): 如果为True,我们将把树串联起来.等同于提升(串联树)或装袋(并行树).默认为True

        tree_wise_attention (bool): 如果为True,我们将使用树级注意力来组合树.默认为True

        tree_wise_attention_dropout (float): 树级注意力层中的dropout概率.默认为0.0

        share_head_weights (bool): 如果为True,我们将共享头部的权重.默认为True


        task (str): 指定问题是回归还是分类.`backbone`是一个任务,它将模型视为生成特征的骨干.主要用于SSL及相关任务.可选值为: [`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为`pytorch_tabular.models.common.heads`中定义的头部之一.默认为LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果为空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为(基数, 嵌入维度).如果为空,将根据分类列的基数推断,规则为min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的dropout.默认为0.0

        batch_norm_continuous_input (bool): 如果为True,我们将通过BatchNorm层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为MSELoss,分类为CrossEntropyLoss.除非你确定自己在做什么,否则请保持为MSELoss或L1Loss用于回归,CrossEntropyLoss用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为``torchmetrics``中实现的功能性指标之一.默认情况下,分类为accuracy,回归为mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task`强制为`multiclass`,因为多分类版本可以处理二分类,并且为了简单起见,我们只使用`multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为None.

        target_range (Optional[List]): 限制输出变量的范围.目前忽略多目标回归.通常用于回归问题.如果为空,将不应用任何限制

        seed (int): 可重复性的种子.默认为42"""

    gflu_stages: int = field(
        default=6,
        metadata={"help": "Number of layers in the feature abstraction layer. Defaults to 6"},
    )

    gflu_dropout: float = field(
        default=0.0, metadata={"help": "Dropout rate for the feature abstraction layer. Defaults to 0.0"}
    )

    tree_depth: int = field(default=4, metadata={"help": "Depth of the tree. Defaults to 4"})

    num_trees: int = field(
        default=10,
        metadata={"help": "Number of trees to use in the ensemble. Defaults to 20"},
    )

    binning_activation: str = field(
        default="sparsemoid",
        metadata={
            "help": "The binning function to use. Defaults to entmoid. Defaults to entmoid",
            "choices": ["entmoid", "sparsemoid", "sigmoid"],
        },
    )
    feature_mask_function: str = field(
        default="t-softmax",
        metadata={
            "help": "The feature mask function to use. Defaults to entmax",
            "choices": ["entmax", "sparsemax", "softmax", "t-softmax"],
        },
    )
    gflu_feature_init_sparsity: float = field(
        default=0.3,
        metadata={
            "help": "Only valid for t-softmax. The percentage of features to be dropped in "
            "each GFLU stage. This is just initialized and during learning it may change"
        },
    )
    tree_feature_init_sparsity: float = field(
        default=0.8,
        metadata={
            "help": "Only valid for t-softmax. The perecentge of features to be dropped in "
            "each split in the tree. This is just initialized and during learning it may change"
        },
    )
    learnable_sparsity: bool = field(
        default=True,
        metadata={
            "help": "Only valid for t-softmax. If True, the sparsity parameters will be learned."
            "If False, the sparsity parameters will be fixed to the initial values specified in "
            "`gflu_feature_init_sparsity` and `tree_feature_init_sparsity`"
        },
    )

    tree_dropout: float = field(
        default=0.0,
        metadata={"help": "probability of dropout in tree binning transformation. Defaults to 0.0"},
    )
    chain_trees: bool = field(
        default=True,
        metadata={
            "help": "If True, we will chain the trees together."
            " Synonymous to boosting (chaining trees) or bagging (parallel trees). Defaults to True"
        },
    )
    tree_wise_attention: bool = field(
        default=True,
        metadata={"help": "If True, we will use tree wise attention to combine trees. Defaults to True"},
    )
    tree_wise_attention_dropout: float = field(
        default=0.0,
        metadata={"help": "probability of dropout in the tree wise attention layer. Defaults to 0.0"},
    )
    share_head_weights: bool = field(
        default=True,
        metadata={"help": "If True, we will share the weights between the heads. Defaults to True"},
    )

    _module_src: str = field(default="models.gate")
    _model_name: str = field(default="GatedAdditiveTreeEnsembleModel")
    _backbone_name: str = field(default="GatedAdditiveTreesBackbone")
    _config_name: str = field(default="GatedAdditiveTreeEnsembleConfig")

    def __post_init__(self):
        assert self.tree_depth > 0, "tree_depth should be greater than 0"
        # Either gflu_stages or num_trees should be greater than 0
        assert self.num_trees > 0, (
            "`num_trees` must be greater than 0." "If you want a lighter model which performs better, use GANDALF."
        )
        super().__post_init__()
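
A minimal configuration sketch (illustrative values only):

from pytorch_tabular.models import GatedAdditiveTreeEnsembleConfig

model_config = GatedAdditiveTreeEnsembleConfig(
    task="regression",
    gflu_stages=6,
    num_trees=10,  # must be > 0, enforced by __post_init__ above
    tree_depth=4,
    chain_trees=True,  # boosting-style chaining; False runs trees in parallel (bagging-style)
)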

Bases: ModelConfig

MDN configuration.

Parameters:

Name Type Description Default
backbone_config_class str

The config class for defining the Backbone. The config class should be a valid module path from models, e.g. FTTransformerConfig.

None
backbone_config_params Dict

The dict of config parameters for defining the Backbone.

None
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression, classification, backbone].

required
head str
'LinearHead'
head_config Dict

The config for defining the Mixture Density Network Head.

None
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2).

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0.

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification.

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression.

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied.

None
seed int

Seed for reproducibility. Defaults to 42.

42
Source code in src/pytorch_tabular/models/mixture_density/config.py
@dataclass
class MDNConfig(ModelConfig):
    """MDN配置.

    Parameters:
        backbone_config_class (str): 用于定义Backbone的配置类.配置类应为`models`中的有效模块路径,例如`FTTransformerConfig`.

        backbone_config_params (Dict): 用于定义Backbone的配置参数字典.

        task (str): 指定问题是回归还是分类.`backbone`是一种将模型视为生成特征的Backbone的任务.主要用于SSL及相关任务.可选值为:[`regression`, `classification`, `backbone`].

        head (str):

        head_config (Dict): 用于定义混合密度网络头部的配置.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度,以元组列表形式表示(基数,嵌入维度).如果留空,将根据分类列的基数推断,使用规则min(50, (x + 1) // 2).

        embedding_dropout (float): 应用于分类嵌入的Dropout.默认为0.0.

        batch_norm_continuous_input (bool): 如果为True,将通过BatchNorm层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为1e-3.

        loss (Optional[str]): 要应用的损失函数.默认情况下,回归为MSELoss,分类为CrossEntropyLoss.除非你确定自己在做什么,否则请保留为MSELoss或L1Loss用于回归,CrossEntropyLoss用于分类.

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为``torchmetrics``中实现的功能性指标之一.默认情况下,分类为accuracy,回归为mean_squared_error.

        metrics_params (Optional[List]): 传递给指标函数的参数.`task`强制为`multiclass`,因为多分类版本可以处理二分类,并且为了简化,我们仅使用`multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为None.

        target_range (Optional[List]): 限制输出变量的范围.当前在多目标回归中被忽略.通常用于回归问题.如果留空,将不应用任何限制.

        seed (int): 用于可重复性的种子.默认为42."""

    backbone_config_class: str = field(
        default=None,
        metadata={
            "help": "The config class for defining the Backbone."
            " The config class should be a valid module path from `models`. e.g. `FTTransformerConfig`"
        },
    )
    backbone_config_params: Dict = field(
        default=None,
        metadata={"help": "The dict of config parameters for defining the Backbone."},
    )
    head: str = field(init=False, default="MixtureDensityHead")
    head_config: Dict = field(
        default=None,
        metadata={"help": "The config for defining the Mixed Density Network Head"},
    )
    _module_src: str = field(default="models.mixture_density")
    _model_name: str = field(default="MDNModel")
    _config_name: str = field(default="MDNConfig")
    _probabilistic: bool = field(default=True)

    def __post_init__(self):
        assert (
            self.backbone_config_class not in INCOMPATIBLE_BACKBONES
        ), f"{self.backbone_config_class} is not a supported backbone for MDN head"
        assert self.head == "MixtureDensityHead"
        return super().__post_init__()
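
A probabilistic-regression sketch; treat the head_config keys as an assumption following MixtureDensityHeadConfig, where num_gaussian is the number of mixture components.

from pytorch_tabular.models import MDNConfig

model_config = MDNConfig(
    task="regression",
    backbone_config_class="CategoryEmbeddingModelConfig",
    backbone_config_params={"task": "backbone"},  # the backbone only generates features
    head_config={"num_gaussian": 2},  # mixture of two Gaussians (assumed key)
)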

Bases: ModelConfig

Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data configuration.

Parameters:

Name Type Description Default
num_layers int

Number of Oblivious Decision Tree layers in the dense architecture

1
num_trees int

Number of Oblivious Decision Trees in each layer

2048
additional_tree_output_dim int

The additional output dimensions which are only used to pass through different layers of the architecture. Only the first output_dim outputs will be used for prediction

3
depth int

The depth of the individual Oblivious Decision Trees

6
choice_function str

Generates a sparse probability distribution to be used as feature weights (aka soft feature selection). Choices are: [entmax15,sparsemax]

'entmax15'
bin_function str

Generates a sparse probability distribution to be used as tree leaf weights. Choices are: [entmoid15,sparsemoid]

'entmoid15'
max_features Optional[int]

If not None, sets a max limit on the number of features to be carried forward from layer to layer in the dense architecture

None
input_dropout float

Dropout to be applied to the inputs between layers of the dense architecture

0.0
initialize_response str

Initializing the response variable in the Oblivious Decision Trees. By default, it is a standard normal distribution. Choices are: [normal,uniform]

'normal'
initialize_selection_logits str

Initializing the feature selector. By default it is a uniform distribution across the features. Choices are: [uniform,normal]

'uniform'
threshold_init_beta float

Used in the data-aware initialization of thresholds, where the threshold is initialized randomly (with a beta distribution) to feature values in the first batch. It initializes the threshold to a q-th quantile of the data points, where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:). If this param is set to 1, initial thresholds will have the same distribution as the data points; if greater than 1 (e.g. 10), thresholds will be closer to the median data value; if less than 1 (e.g. 0.1), thresholds will approach the min/max data values

1.0
threshold_init_cutoff float

Used in the data-aware initialization of scales (used in scaling ODTs). It is initialized in such a way that all the samples in the first batch belong to the linear region of the entmoid/sparsemoid (bin-selectors) and thereby have non-zero gradients. It is a threshold log-temperature initializer, in (0, inf). By default (1.0), log-temperatures are initialized in such a way that all bin selectors end up in the linear region of the sparse-sigmoid; the temperatures are then scaled by this parameter. Setting this value > 1.0 will result in some margin between the data points and the sparse-sigmoid cutoff value; setting this value < 1.0 will cause (1 - value) of the data points to end up in the flat region of the sparse-sigmoid. For instance, threshold_init_cutoff = 0.9 will set 10% of the points equal to 0.0 or 1.0. All points will be between (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff)

1.0
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone]

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead]

None
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer

True
learning_rate float

The learning rate of the model. Defaults to 1e-3

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/node/config.py
@dataclass
class NodeConfig(ModelConfig):
    """神经遗忘决策集成用于表格数据的深度学习配置.

    Parameters:
        num_layers (int): 密集架构中遗忘决策树层的数量

        num_trees (int): 每层中遗忘决策树的数量

        additional_tree_output_dim (int): 仅用于在架构的不同层之间传递的额外输出维度.只有前 output_dim 个输出将用于预测

        depth (int): 单个遗忘决策树的深度

        choice_function (str): 生成稀疏概率分布以用作特征权重(即软特征选择).可选值为:[`entmax15`,`sparsemax`]

        bin_function (str): 生成稀疏概率分布以用作树叶子权重.可选值为:[`entmoid15`,`sparsemoid`]

        max_features (Optional[int]): 如果不为 None,则设置在密集架构中从一层传递到下一层的特征数量的最大限制

        input_dropout (float): 在密集架构的层之间应用于输入的 Dropout

        initialize_response (str): 初始化遗忘决策树中的响应变量.默认情况下,它是标准正态分布.可选值为:[`normal`,`uniform`]

        initialize_selection_logits (str): 初始化特征选择器.默认情况下,是特征上的均匀分布.可选值为:[`uniform`,`normal`]

        threshold_init_beta (float): 用于数据感知初始化阈值,其中阈值随机初始化(使用 beta 分布)为第一个批次中的特征值.它将阈值初始化为数据点的 q-th 分位数,其中 q ~ Beta(:threshold_init_beta:, :threshold_init_beta:).如果此参数设置为 1,初始阈值将具有与数据点相同的分布;如果大于 1(例如 10),阈值将更接近中位数数据值;如果小于 1(例如 0.1),阈值将接近最小/最大数据值

        threshold_init_cutoff (float): 用于数据感知初始化尺度(用于缩放 ODTs).它以这样的方式初始化,使得第一个批次中的所有样本都属于 entmoid/sparsemoid(二进制选择器)的线性区域,从而具有非零梯度.阈值对数温度初始化器,在 (0, inf) 范围内.默认情况下(1.0),对数温度以这样的方式初始化,使得所有二进制选择器最终都位于稀疏-sigmoid 的线性区域.然后温度由该参数缩放.设置此值 > 1.0 将在数据点和稀疏-sigmoid 截止值之间产生一些余量;设置此值 < 1.0 将导致 (1 - 值) 部分数据点最终位于稀疏-sigmoid 的平坦区域.例如,threshold_init_cutoff = 0.9 将设置 10% 的点等于 0.0 或 1.0.设置此值 > 1.0 将在数据点和稀疏-sigmoid 截止值之间产生余量.所有点将介于 (0.5 - 0.5 / threshold_init_cutoff) 和 (0.5 + 0.5 / threshold_init_cutoff) 之间

        task (str): 指定问题是回归还是分类.`backbone` 是一种将模型视为生成特征的骨干的任务.主要用于内部 SSL 及相关任务.可选值为:[`regression`,`classification`,`backbone`]

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为:[`None`,`LinearHead`,`MixtureDensityHead`]

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,使用规则 min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的 Dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,我们将通过 BatchNorm 层对连续层进行归一化

        learning_rate (float): 模型的学习率.默认为 1e-3

        loss (Optional[str]): 要应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保留为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多类版本可以处理二进制分类,并且为了简单起见,我们仅使用 `multiclass`

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None

        target_range (Optional[List]): 我们应该限制输出变量的范围.当前在多目标回归中被忽略.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    num_layers: int = field(
        default=1,
        metadata={"help": "Number of Oblivious Decision Tree Layers in the Dense Architecture"},
    )
    num_trees: int = field(
        default=2048,
        metadata={"help": "Number of Oblivious Decision Trees in each layer"},
    )
    additional_tree_output_dim: int = field(
        default=3,
        metadata={
            "help": "The additional output dimensions which is only used to pass through different layers"
            " of the architectures. Only the first output_dim outputs will be used for prediction"
        },
    )
    depth: int = field(
        default=6,
        metadata={"help": "The depth of the individual Oblivious Decision Trees"},
    )
    choice_function: str = field(
        default="entmax15",
        metadata={
            "help": "Generates a sparse probability distribution to be used"
            " as feature weights(aka, soft feature selection)",
            "choices": ["entmax15", "sparsemax"],
        },
    )
    bin_function: str = field(
        default="entmoid15",
        metadata={
            "help": "Generates a sparse probability distribution to be used as tree leaf weights",
            "choices": ["entmoid15", "sparsemoid"],
        },
    )
    max_features: Optional[int] = field(
        default=None,
        metadata={
            "help": "If not None, sets a max limit on the number of features to be carried forward"
            " from layer to layer in the Dense Architecture"
        },
    )
    input_dropout: float = field(
        default=0.0,
        metadata={"help": "Dropout to be applied to the inputs between layers of the Dense Architecture"},
    )
    initialize_response: str = field(
        default="normal",
        metadata={
            "help": "Initializing the response variable in the Oblivious Decision Trees."
            " By default, it is a standard normal distribution",
            "choices": ["normal", "uniform"],
        },
    )
    initialize_selection_logits: str = field(
        default="uniform",
        metadata={
            "help": "Initializing the feature selector. By default is a uniform distribution across the features",
            "choices": ["uniform", "normal"],
        },
    )
    threshold_init_beta: float = field(
        default=1.0,
        metadata={
            "help": """
                Used in the Data-aware initialization of thresholds where the threshold is initialized randomly
                (with a beta distribution) to feature values in the first batch.
                It initializes threshold to a q-th quantile of data points.
                where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:)
                If this param is set to 1, initial thresholds will have the same distribution as data points
                If greater than 1 (e.g. 10), thresholds will be closer to median data value
                If less than 1 (e.g. 0.1), thresholds will approach min/max data values.
            """
        },
    )
    threshold_init_cutoff: float = field(
        default=1.0,
        metadata={
            "help": """
                Used in the Data-aware initialization of scales(used in the scaling ODTs).
                It is initialized in such a way that all the samples in the first batch belong to the linear
                region of the entmoid/sparsemoid(bin-selectors) and thereby have non-zero gradients
                Threshold log-temperatures initializer, in (0, inf)
                By default(1.0), log-temperatures are initialized in such a way that all bin selectors
                end up in the linear region of sparse-sigmoid. The temperatures are then scaled by this parameter.
                Setting this value > 1.0 will result in some margin between data points and sparse-sigmoid cutoff value
                Setting this value < 1.0 will cause (1 - value) part of data points to end up in flat sparse-sigmoid
                region. For instance, threshold_init_cutoff = 0.9 will set 10% points equal to 0.0 or 1.0
                Setting this value > 1.0 will result in a margin between data points and sparse-sigmoid cutoff value
                All points will be between (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff)
            """
        },
    )

    head: Optional[str] = field(
        default=None,
    )

    _module_src: str = field(default="models.node")
    _model_name: str = field(default="NODEModel")
    _backbone_name: str = field(default="NODEBackbone")
    _config_name: str = field(default="NodeConfig")

    def __post_init__(self):
        if self.head is not None:
            warnings.warn(
                "`head` and `head_config` is ignored as NODE has a specific"
                " head which subsets the tree outputs. Set `head=None`"
                " to turn off the warning"
            )
        else:
            # Setting Head to LinearHead for compatibility
            self.head = "LinearHead"
        return super().__post_init__()
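
NODE supplies its own head that subsets and averages the tree outputs, so `head` should be left as None. A minimal usage sketch (assuming `NodeConfig` is imported from `pytorch_tabular.models`, as in the library's public API):

from pytorch_tabular.models import NodeConfig

# Default head=None: silently mapped to "LinearHead" internally for compatibility
config = NodeConfig(task="regression")

# Setting any head explicitly triggers the warning shown in __post_init__ above
config_with_warning = NodeConfig(task="regression", head="LinearHead")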

Bases: ModelConfig

TabNet: Attentive Interpretable Tabular Learning configuration.

Parameters:

Name Type Description Default
n_d int

Dimension of the prediction layer (usually between 4 and 64)

8
n_a int

Dimension of the attention layer (usually between 4 and 64)

8
n_steps int

Number of successive steps in the network (usually between 3 and 10)

3
gamma float

Float above 1, scaling factor for attention updates (usually between 1.0 and 2.0)

1.3
n_independent int

Number of independent GLU layers in each GLU block (default 2)

2
n_shared int

Number of shared GLU layers in each GLU block (default 2)

2
virtual_batch_size int

Batch size for Ghost Batch Normalization

128
mask_type str

Either 'sparsemax' or 'entmax': the masking function to use. Choices are: [sparsemax,entmax].

'sparsemax'
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the categorical embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, the continuous layer will be normalized by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be the same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restrictions will be applied.

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/tabnet/config.py
@dataclass
class TabNetModelConfig(ModelConfig):
    """TabNet: 注意力可解释表格学习配置

    Parameters:
        n_d (int): 预测层的维度(通常在4到64之间)

        n_a (int): 注意力层的维度(通常在4到64之间)

        n_steps (int): 网络中连续步骤的数量(通常在3到10之间)

        gamma (float): 大于1的浮点数,注意力更新的缩放因子(通常在1.0到2.0之间)

        n_independent (int): 每个GLU块中独立GLU层的数量(默认2)

        n_shared (int): 每个GLU块中独立GLU层的数量(默认2)

        virtual_batch_size (int): Ghost Batch Normalization的批次大小

        mask_type (str): 使用的掩码函数,可以是'sparsemax'或'entmax'.选择包括:
                [`sparsemax`,`entmax`].

        task (str): 指定问题是回归还是分类.`backbone`是一种任务,将模型视为生成特征的骨干.主要用于内部SSL及相关任务.选择包括:
                [`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为`pytorch_tabular.models.common.heads`中定义的头部之一.默认为LinearHead.选择包括:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为(基数, 嵌入维度).如果留空,将根据分类列的基数推断,规则为min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的丢弃率.默认为0.0

        batch_norm_continuous_input (bool): 如果为True,将通过BatchNorm层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为MSELoss,分类为CrossEntropyLoss.除非你确定自己在做什么,否则请保留为MSELoss或L1Loss用于回归,CrossEntropyLoss用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为``torchmetrics``中实现的功能性指标之一.默认情况下,分类为accuracy,回归为mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task`强制为`multiclass`,因为多分类版本可以处理二分类,并且为了简单起见,我们仅使用`multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为None.

        target_range (Optional[List]): 应限制输出变量的范围.当前忽略多目标回归.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 可重复性的种子.默认为42"""

    n_d: int = field(
        default=8,
        metadata={"help": "Dimension of the prediction layer (usually between 4 and 64)"},
    )
    n_a: int = field(
        default=8,
        metadata={"help": "Dimension of the attention layer (usually between 4 and 64)"},
    )
    n_steps: int = field(
        default=3,
        metadata={"help": ("Number of successive steps in the network (usually between 3 and 10)")},
    )
    gamma: float = field(
        default=1.3,
        metadata={"help": ("Float above 1, scaling factor for attention updates (usually between" " 1.0 and 2.0)")},
    )
    n_independent: int = field(
        default=2,
        metadata={"help": "Number of independent GLU layers in each GLU block (default 2)"},
    )
    n_shared: int = field(
        default=2,
        metadata={"help": "Number of shared GLU layers in each GLU block (default 2)"},
    )
    virtual_batch_size: int = field(
        default=128,
        metadata={"help": "Batch size for Ghost Batch Normalization"},
    )
    mask_type: str = field(
        default="sparsemax",
        metadata={
            "help": ("Either 'sparsemax' or 'entmax' : this is the masking function to use"),
            "choices": ["sparsemax", "entmax"],
        },
    )
    grouped_features: Optional[List[List[str]]] = field(
        default=None,
        metadata={
            "help": (
                "List of list of feature names to be grouped together. This allows the"
                " model to share it's attention accross feature inside a same group."
                " This can be especially useful when your preprocessing generates"
                " correlated or dependant features: like if you use a TF-IDF or a PCA"
                " on a text column. Note that feature importance will be exactly the"
                " same between features on a same group. Please also note that"
                " embeddings generated for a categorical variable are always inside a"
                " same group."
            )
        },
    )
    _module_src: str = field(default="models.tabnet")
    _model_name: str = field(default="TabNetModel")
    _config_name: str = field(default="TabNetModelConfig")

Bases: ModelConfig

Tab Transformer configuration.

Parameters:

Name Type Description Default
input_embed_dim int

The embedding dimension for the input categorical features. Defaults to 32

32
embedding_initialization Optional[str]

Initialization scheme for the embedding layers. Defaults to kaiming. Choices are: [kaiming_uniform,kaiming_normal].

'kaiming_uniform'
embedding_bias bool

Flag to turn on embedding bias. Defaults to False

False
share_embedding bool

The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole, along with embeddings for each unique value of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False

False
share_embedding_strategy Optional[str]

There are two strategies for adding shared embeddings. 1. add - A separate embedding for the feature is added to the embedding of the unique values of the feature. 2. fraction - A fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [add,fraction].

'fraction'
shared_embedding_fraction float

Fraction of the input_embed_dim to be reserved for the shared embedding. Should be less than one. Defaults to 0.25

0.25
num_heads int

The number of heads in the Multi-Headed Attention layer. Defaults to 8

8
num_attn_blocks int

The number of layers of stacked Multi-Headed Attention layers. Defaults to 6

6
transformer_head_dim Optional[int]

The number of hidden units in the Multi-Headed Attention layers. Defaults to None and will be the same as input_dim.

None
attn_dropout float

Dropout to be applied after Multi-Headed Attention. Defaults to 0.1

0.1
add_norm_dropout float

Dropout to be applied in the AddNorm layer. Defaults to 0.1

0.1
ff_dropout float

Dropout to be applied in the Positionwise FeedForward network. Defaults to 0.1

0.1
ff_hidden_multiplier int

Multiple by which the Positionwise FF layer scales the input. Defaults to 4

4
transformer_activation str

The activation type in the transformer feed forward layers. In addition to the default activations in PyTorch like ReLU, TanH, LeakyReLU, etc. (https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity), GEGLU, ReGLU and SwiGLU are also implemented (https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU

'GEGLU'
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the categorical embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, the continuous layer will be normalized by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be the same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restrictions will be applied.

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/tab_transformer/config.py
@dataclass
class TabTransformerConfig(ModelConfig):
    """Tab Transformer 配置.

    Parameters:
        input_embed_dim (int): 输入分类特征的嵌入维度.默认为 32

        embedding_initialization (Optional[str]): 嵌入层的初始化方案.默认为 `kaiming`.可选值为: [`kaiming_uniform`,`kaiming_normal`].

        embedding_bias (bool): 是否开启嵌入偏置的标志.默认为 False

        share_embedding (bool): 在输入嵌入过程中开启共享嵌入的标志.其核心思想是为整个特征及其每个唯一值分别设置嵌入.更多详情请参阅 TabTransformer 论文的附录 A.默认为 False

        share_embedding_strategy (Optional[str]): 添加共享嵌入有两种策略.1. `add` - 为特征添加一个独立的嵌入,并将其与特征唯一值的嵌入相加.2. `fraction` - 输入嵌入的一部分保留给特征的共享嵌入.默认为 fraction.可选值为: [`add`,`fraction`].

        shared_embedding_fraction (float): 共享嵌入保留的 input_embed_dim 的比例.应小于 1.默认为 0.25

        num_heads (int): 多头注意力层中的头数.默认为 8

        num_attn_blocks (int): 堆叠的多头注意力层的层数.默认为 6

        transformer_head_dim (Optional[int]): 多头注意力层中的隐藏单元数.默认为 None,将与 input_dim 相同.

        attn_dropout (float): 多头注意力后应用的 dropout.默认为 0.1

        add_norm_dropout (float): AddNorm 层中应用的 dropout.默认为 0.1

        ff_dropout (float): 逐位置前馈网络中应用的 dropout.默认为 0.1

        ff_hidden_multiplier (int): 逐位置前馈层对输入的缩放倍数.默认为 4

        transformer_activation (str): 变换器前馈层中的激活类型.除了 PyTorch 中的默认激活函数如 ReLU、TanH、LeakyReLU 等(https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity),还实现了 GEGLU、ReGLU 和 SwiGLU(https://arxiv.org/pdf/2002.05202.pdf).默认为 GEGLU

        task (str): 指定问题是回归还是分类.`backbone` 是一种将模型视为生成特征的主干的任务.主要用于 SSL 及相关任务.可选值为: [`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2)

        embedding_dropout (float): 分类嵌入中应用的 dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保持为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,并且为了简化,我们仅使用 `multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None.

        target_range (Optional[List]): 限制输出变量的范围.目前多目标回归中忽略.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    input_embed_dim: int = field(
        default=32,
        metadata={"help": "The embedding dimension for the input categorical features. Defaults to 32"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=False,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to False"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique values of that column. For more details refer"
            " to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    num_heads: int = field(
        default=8,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 8"},
    )
    num_attn_blocks: int = field(
        default=6,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 6"},
    )
    transformer_head_dim: Optional[int] = field(
        default=None,
        metadata={
            "help": "The number of hidden units in the Multi-Headed Attention layers."
            " Defaults to None and will be same as input_dim."
        },
    )
    attn_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied after Multi headed Attention. Defaults to 0.1"},
    )
    add_norm_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the AddNorm Layer. Defaults to 0.1"},
    )
    ff_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1"},
    )
    ff_hidden_multiplier: int = field(
        default=4,
        metadata={"help": "Multiple by which the Positionwise FF layer scales the input. Defaults to 4"},
    )
    transformer_activation: str = field(
        default="GEGLU",
        metadata={
            "help": "The activation type in the transformer feed forward layers."
            " In addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity,"
            " GEGLU, ReGLU and SwiGLU are also implemented(https://arxiv.org/pdf/2002.05202.pdf)."
            " Defaults to GEGLU",
        },
    )
    _module_src: str = field(default="models.tab_transformer")
    _model_name: str = field(default="TabTransformerModel")
    _backbone_name: str = field(default="TabTransformerBackbone")
    _config_name: str = field(default="TabTransformerConfig")

Base Model configuration.

Parameters:

Name Type Description Default
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2).

None
embedding_dropout float

Dropout to be applied to the categorical embedding. Defaults to 0.0.

0.0
batch_norm_continuous_input bool

If True, the continuous layer will be normalized by passing it through a BatchNorm layer.

True
virtual_batch_size Optional[int]

If not None, all BatchNorms will be converted to GhostBatchNorm with this virtual batch size. Defaults to None.

None
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification.

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy for classification and mean_squared_error for regression.

None
metrics_prob_input Optional[bool]

A mandatory parameter for classification metrics defined in the config. Defines whether the input to the metric function is the probability or the class. Length should be the same as the number of metrics. Defaults to None.

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restrictions will be applied.

None
seed int

The seed for reproducibility. Defaults to 42.

42
Source code in src/pytorch_tabular/config/config.py
@dataclass
class ModelConfig:
    """基础模型配置.

    Parameters:
        task (str): 指定问题是回归还是分类.`backbone` 是一种将模型视为生成特征的主干的任务.主要用于内部SSL及相关任务.可选值为:[`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为:[`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2).

        embedding_dropout (float): 应用于分类嵌入的丢弃率.默认为 0.0.

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        virtual_batch_size (Optional[int]): 如果不为 None,所有 BatchNorm 将被转换为 GhostBatchNorm,并指定虚拟批量大小.默认为 None.

        learning_rate (float): 模型的学习率.默认为 1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保持为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类.

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error.

        metrics_prob_input (Optional[bool]): 配置中定义的分类指标的强制参数.定义指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None.

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,并且为了简化,我们仅使用 `multiclass`.

        target_range (Optional[List]): 限制输出变量的范围.当前忽略多目标回归.通常用于回归问题.如果留空,将不应用任何限制.

        seed (int): 用于可重复性的种子.默认为 42."""

    task: str = field(
        metadata={
            "help": "Specify whether the problem is regression or classification."
            " `backbone` is a task which considers the model as a backbone to generate features."
            " Mostly used internally for SSL and related tasks.",
            "choices": ["regression", "classification", "backbone"],
        }
    )

    head: Optional[str] = field(
        default="LinearHead",
        metadata={
            "help": "The head to be used for the model. Should be one of the heads defined"
            " in `pytorch_tabular.models.common.heads`. Defaults to  LinearHead",
            "choices": [None, "LinearHead", "MixtureDensityHead"],
        },
    )

    head_config: Optional[Dict] = field(
        default_factory=lambda: {"layers": ""},
        metadata={
            "help": "The config as a dict which defines the head."
            " If left empty, will be initialized as default linear head."
        },
    )
    embedding_dims: Optional[List] = field(
        default=None,
        metadata={
            "help": "The dimensions of the embedding for each categorical column as a list of tuples "
            "(cardinality, embedding_dim). If left empty, will infer using the cardinality of the "
            "categorical column using the rule min(50, (x + 1) // 2)"
        },
    )
    embedding_dropout: float = field(
        default=0.0,
        metadata={"help": "Dropout to be applied to the Categorical Embedding. Defaults to 0.0"},
    )
    batch_norm_continuous_input: bool = field(
        default=True,
        metadata={"help": "If True, we will normalize the continuous layer by passing it through a BatchNorm layer."},
    )

    learning_rate: float = field(
        default=1e-3,
        metadata={"help": "The learning rate of the model. Defaults to 1e-3."},
    )
    loss: Optional[str] = field(
        default=None,
        metadata={
            "help": "The loss function to be applied. By Default it is MSELoss for regression "
            "and CrossEntropyLoss for classification. Unless you are sure what you are doing, "
            "leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification"
        },
    )
    metrics: Optional[List[str]] = field(
        default=None,
        metadata={
            "help": "the list of metrics you need to track during training. The metrics should be one "
            "of the functional metrics implemented in ``torchmetrics``. To use your own metric, please "
            "use the `metric` param in the `fit` method By default, it is accuracy if classification "
            "and mean_squared_error for regression"
        },
    )
    metrics_prob_input: Optional[List[bool]] = field(
        default=None,
        metadata={
            "help": "Is a mandatory parameter for classification metrics defined in the config. This defines "
            "whether the input to the metric function is the probability or the class. Length should be same "
            "as the number of metrics. Defaults to None."
        },
    )
    metrics_params: Optional[List] = field(
        default=None,
        metadata={
            "help": "The parameters to be passed to the metrics function. `task` is forced to be `multiclass`` "
            "because the multiclass version can handle binary as well and for simplicity we are only using "
            "`multiclass`."
        },
    )
    target_range: Optional[List] = field(
        default=None,
        metadata={
            "help": "The range in which we should limit the output variable. "
            "Currently ignored for multi-target regression. Typically used for Regression problems. "
            "If left empty, will not apply any restrictions"
        },
    )

    virtual_batch_size: Optional[int] = field(
        default=None,
        metadata={
            "help": "If not None, all BatchNorms will be converted to GhostBatchNorm's "
            " with this virtual batch size. Defaults to None"
        },
    )

    seed: int = field(
        default=42,
        metadata={"help": "The seed for reproducibility. Defaults to 42"},
    )

    _module_src: str = field(default="models")
    _model_name: str = field(default="Model")
    _backbone_name: str = field(default="Backbone")
    _config_name: str = field(default="Config")

    def __post_init__(self):
        if self.task == "regression":
            self.loss = self.loss or "MSELoss"
            self.metrics = self.metrics or ["mean_squared_error"]
            self.metrics_params = [{} for _ in self.metrics] if self.metrics_params is None else self.metrics_params
            self.metrics_prob_input = [False for _ in self.metrics]  # not used in Regression. just for compatibility
        elif self.task == "classification":
            self.loss = self.loss or "CrossEntropyLoss"
            self.metrics = self.metrics or ["accuracy"]
            self.metrics_params = [{} for _ in self.metrics] if self.metrics_params is None else self.metrics_params
            self.metrics_prob_input = (
                [False for _ in self.metrics] if self.metrics_prob_input is None else self.metrics_prob_input
            )
        elif self.task == "backbone":
            self.loss = None
            self.metrics = None
            self.metrics_params = None
            if self.head is not None:
                logger.warning("`head` is not a valid parameter for backbone task. Making `head=None`")
                self.head = None
                self.head_config = None
        else:
            raise NotImplementedError(
                f"{self.task} is not a valid task. Should be one of "
                f"{self.__dataclass_fields__['task'].metadata['choices']}"
            )
        if self.metrics is not None:
            assert len(self.metrics) == len(self.metrics_params), "metrics and metric_params should have same length"

        if self.task != "backbone":
            assert self.head in dir(heads.blocks), f"{self.head} is not a valid head"
            if hasattr(self, "_config_name") and self._config_name != "MDNConfig":
                assert self.head != "MixtureDensityHead", "MixtureDensityHead is not supported as a head for regular "
                "models. Use `MDNConfig` instead. Please see Probabilistic Regression with MDN How-to-Guide in "
                "documentation for the right usage."
            _head_callable = getattr(heads.blocks, self.head)
            ideal_head_config = _head_callable._config_template
            invalid_keys = set(self.head_config.keys()) - set(ideal_head_config.__dict__.keys())
            assert len(invalid_keys) == 0, f"`head_config` has some invalid keys: {invalid_keys}"

        # For Custom models, setting these values for compatibility
        if not hasattr(self, "_config_name"):
            self._config_name = type(self).__name__
        if not hasattr(self, "_model_name"):
            self._model_name = re.sub("[Cc]onfig", "Model", self._config_name)
        if not hasattr(self, "_backbone_name"):
            self._backbone_name = re.sub("[Cc]onfig", "Backbone", self._config_name)
        _validate_choices(self)
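
A quick sketch of the task-dependent defaults that `__post_init__` fills in, using `CategoryEmbeddingModelConfig` (one of the concrete subclasses) for illustration:

from pytorch_tabular.models import CategoryEmbeddingModelConfig

cfg = CategoryEmbeddingModelConfig(task="classification")
print(cfg.loss)                # "CrossEntropyLoss"
print(cfg.metrics)             # ["accuracy"]
print(cfg.metrics_prob_input)  # [False]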

Model Classes

Bases: BaseModel

Source code in src/pytorch_tabular/models/autoint/autoint.py
class AutoIntModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = AutoIntBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

Bases: BaseModel

Source code in src/pytorch_tabular/models/category_embedding/category_embedding_model.py
class CategoryEmbeddingModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = CategoryEmbeddingBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

Bases: BaseModel

Source code in src/pytorch_tabular/models/danet/danet.py
class DANetModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        if self.hparams.virtual_batch_size > self.hparams.batch_size:
            warnings.warn(
                f"virtual_batch_size({self.hparams.virtual_batch_size}) is greater "
                f"than batch_size ({self.hparams.batch_size}). Setting virtual_batch_size "
                f"to {self.hparams.batch_size}. DANet uses Ghost Batch Normalization, "
                f"which works best when virtual_batch_size is small. Consider setting "
                "virtual_batch_size to something like 256 or 512."
            )
            self.hparams.virtual_batch_size = self.hparams.batch_size
        # Backbone
        self._backbone = DANetBackbone(
            cat_embedding_dims=self.hparams.embedding_dims,
            n_continuous_features=self.hparams.continuous_dim,
            n_layers=self.hparams.n_layers,
            abstlay_dim_1=self.hparams.abstlay_dim_1,
            abstlay_dim_2=self.hparams.abstlay_dim_2,
            k=self.hparams.k,
            dropout_rate=self.hparams.dropout_rate,
            block_activation=getattr(nn, self.hparams.block_activation)(),
            virtual_batch_size=self.hparams.virtual_batch_size,
            embedding_dropout=self.hparams.embedding_dropout,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()
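
A sketch of the Ghost Batch Normalization sizing rule enforced by the warning above: the virtual batch size is effectively clamped to the real batch size.

batch_size, virtual_batch_size = 512, 1024
virtual_batch_size = min(virtual_batch_size, batch_size)  # 512; small values like 256 work best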

Bases: BaseModel

Source code in src/pytorch_tabular/models/ft_transformer/ft_transformer.py
class FTTransformerModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = FTTransformerBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    def feature_importance(self):
        if self.hparams.attn_feature_importance:
            return super().feature_importance()
        else:
            raise ValueError("If you want feature importance, `attn_feature_importance` should be `True`.")
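
`feature_importance()` is gated on the backbone retaining attention weights. A configuration sketch (the flag name follows the attribute checked in the listing above, assumed to live on FTTransformerConfig):

from pytorch_tabular.models import FTTransformerConfig

cfg = FTTransformerConfig(task="classification", attn_feature_importance=True)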

Bases: BaseModel

Source code in src/pytorch_tabular/models/gandalf/gandalf.py
class GANDALFModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = GANDALFBackbone(
            cat_embedding_dims=self.hparams.embedding_dims,
            n_continuous_features=self.hparams.continuous_dim,
            gflu_stages=self.hparams.gflu_stages,
            gflu_dropout=self.hparams.gflu_dropout,
            gflu_feature_init_sparsity=self.hparams.gflu_feature_init_sparsity,
            learnable_sparsity=self.hparams.learnable_sparsity,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
            embedding_dropout=self.hparams.embedding_dropout,
            virtual_batch_size=self.hparams.virtual_batch_size,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self.T0 = nn.Parameter(torch.rand(self.hparams.output_dim), requires_grad=True)
        self._head = nn.Sequential(self._get_head_from_config(), Add(self.T0))

    def data_aware_initialization(self, datamodule):
        if self.hparams.task == "regression":
            logger.info("Data Aware Initialization of T0")
            # Need a big batch to initialize properly
            alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
            batch = next(iter(alt_loader))
            self.T0.data = torch.mean(batch["target"], dim=0)
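
The data-aware initialization above sets T0 to the per-target mean of one large batch, so the additive output starts near the target scale. The core computation, in isolation:

import torch

targets = torch.tensor([[10.0], [12.0], [14.0]])  # stand-in for batch["target"]
T0 = torch.mean(targets, dim=0)                   # tensor([12.])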

Bases: BaseModel

Source code in src/pytorch_tabular/models/gate/gate_model.py
class GatedAdditiveTreeEnsembleModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = GatedAdditiveTreesBackbone(
            n_continuous_features=self.hparams.continuous_dim,
            cat_embedding_dims=self.hparams.embedding_dims,
            gflu_stages=self.hparams.gflu_stages,
            gflu_dropout=self.hparams.gflu_dropout,
            num_trees=self.hparams.num_trees,
            tree_depth=self.hparams.tree_depth,
            tree_dropout=self.hparams.tree_dropout,
            binning_activation=self.hparams.binning_activation,
            feature_mask_function=self.hparams.feature_mask_function,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
            chain_trees=self.hparams.chain_trees,
            tree_wise_attention=self.hparams.tree_wise_attention,
            tree_wise_attention_dropout=self.hparams.tree_wise_attention_dropout,
            gflu_feature_init_sparsity=self.hparams.gflu_feature_init_sparsity,
            tree_feature_init_sparsity=self.hparams.tree_feature_init_sparsity,
            virtual_batch_size=self.hparams.virtual_batch_size,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        if self.hparams.num_trees == 0:
            self.T0 = nn.Parameter(torch.rand(self.hparams.output_dim), requires_grad=True)
            self._head = nn.Sequential(self._get_head_from_config(), Add(self.T0))
        else:
            self._head = CustomHead(self.backbone.output_dim, self.hparams)

    def data_aware_initialization(self, datamodule):
        if self.hparams.task == "regression":
            logger.info("Data Aware Initialization of T0")
            # Need a big batch to initialize properly
            alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
            batch = next(iter(alt_loader))
            t0 = torch.mean(batch["target"], dim=0)
            if self.hparams.num_trees != 0:
                self.head.T0.data = t0
            else:
                self.T0.data = t0

Bases: BaseModel

Source code in src/pytorch_tabular/models/mixture_density/mdn.py
class MDNModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
        self.inferred_config = kwargs["inferred_config"]
        assert config.task == "regression", "MDN is only implemented for Regression"
        super().__init__(config, **kwargs)
        assert self.hparams.output_dim == 1, "MDN is not implemented for multi-targets"
        if config.target_range is not None:
            logger.warning("MDN does not use target range. Ignoring it.")
        self._val_output = []

    def _get_head_from_config(self):
        _head_callable = getattr(blocks, self.hparams.head)
        self.hparams.head_config.input_dim = self.backbone.output_dim
        return _head_callable(
            config=_head_callable._config_template(**self.hparams.head_config),
        )  # output_dim auto-calculated from other configs

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        callable, config = (
            self.hparams.backbone_config_class,
            self.hparams.backbone_config_params,
        )
        try:
            callable = getattr(models, callable)
        except ModuleNotFoundError as e:
            logger.error(
                "`config class` in `backbone_config` is not valid."
                " The config class should be a valid module path from `models`."
                " e.g. `ft_transformer.FTTransformerConfig`."
            )
            raise e
        assert issubclass(callable, ModelConfig), "`config_class` should be a subclass of `ModelConfig`"
        backbone_config = callable(**config)
        backbone_callable = getattr_nested(backbone_config._module_src, backbone_config._backbone_name)
        # Merging the config and inferred config
        backbone_config = safe_merge_config(OmegaConf.structured(backbone_config), self.inferred_config)
        self._backbone = backbone_callable(backbone_config)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    # Redefining forward because TabTransformer flow is slightly different
    def forward(self, x: Dict):
        if isinstance(self.backbone, TabTransformerBackbone):
            if self.hparams.categorical_dim > 0:
                x_cat = self.embed_input({"categorical": x["categorical"]})
            else:
                x_cat = None
            x = self.compute_backbone({"categorical": x_cat, "continuous": x["continuous"]})
        else:
            x = self.embedding_layer(x)
            x = self.compute_backbone(x)
        return self.compute_head(x)

    # Redefining compute_backbone because the TabTransformer flow is slightly different

    def compute_backbone(self, x: Union[Dict, torch.Tensor]):
        # Returns output
        if isinstance(self.backbone, TabTransformerBackbone):
            x = self.backbone(x["categorical"], x["continuous"])
        else:
            x = self.backbone(x)
        return x

    def compute_head(self, x: Tensor):
        pi, sigma, mu = self.head(x)
        return {"pi": pi, "sigma": sigma, "mu": mu, "backbone_features": x}

    def predict(self, x: Dict):
        ret_value = self.forward(x)
        return self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])

    def sample(self, x: Dict, n_samples: Optional[int] = None, ret_model_output=False):
        ret_value = self.forward(x)
        samples = self.head.generate_samples(ret_value["pi"], ret_value["sigma"], ret_value["mu"], n_samples)
        if ret_model_output:
            return samples, ret_value
        else:
            return samples

    def calculate_loss(self, y, pi, sigma, mu, tag="train"):
        # NLL Loss
        log_prob = self.head.log_prob(pi, sigma, mu, y)
        loss = torch.mean(-log_prob)
        if self.head.hparams.weight_regularization is not None:
            sigma_l1_reg = 0
            pi_l1_reg = 0
            mu_l1_reg = 0
            if self.head.hparams.lambda_sigma > 0:
                # Weight Regularization Sigma
                sigma_params = torch.cat([x.view(-1) for x in self.head.sigma.parameters()])
                sigma_l1_reg = self.head.hparams.lambda_sigma * torch.norm(
                    sigma_params, self.head.hparams.weight_regularization
                )
            if self.head.hparams.lambda_pi > 0:
                pi_params = torch.cat([x.view(-1) for x in self.head.pi.parameters()])
                pi_l1_reg = self.head.hparams.lambda_pi * torch.norm(pi_params, self.head.hparams.weight_regularization)
            if self.head.hparams.lambda_mu > 0:
                mu_params = torch.cat([x.view(-1) for x in self.head.mu.parameters()])
                mu_l1_reg = self.head.hparams.lambda_mu * torch.norm(mu_params, self.head.hparams.weight_regularization)

            loss = loss + sigma_l1_reg + pi_l1_reg + mu_l1_reg
        self.log(
            f"{tag}_loss",
            loss,
            on_epoch=(tag == "valid") or (tag == "test"),
            on_step=(tag == "train"),
            # on_step=False,
            logger=True,
            prog_bar=True,
        )
        return loss

    def training_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        loss = self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="train")
        if self.head.hparams.speedup_training:
            pass
        else:
            y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
            self.calculate_metrics(y, y_hat, tag="train")
        return loss

    def validation_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="valid")
        y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
        self.calculate_metrics(y, y_hat, tag="valid")
        return y_hat, y, ret_value

    def test_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="test")
        y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
        self.calculate_metrics(y, y_hat, tag="test")
        return y_hat, y

    def on_validation_batch_end(self, outputs, batch, batch_idx: int) -> None:
        self._val_output.append(outputs)
        super().on_validation_batch_end(outputs, batch, batch_idx)

    def on_validation_epoch_end(self) -> None:
        pi = [
            nn.functional.gumbel_softmax(output[2]["pi"], tau=self.head.hparams.softmax_temperature, dim=-1)
            for output in self._val_output
        ]
        pi = torch.cat(pi).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_pi_{i}",
                pi[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )

        mu = [output[2]["mu"] for output in self._val_output]
        mu = torch.cat(mu).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_mu_{i}",
                mu[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )

        sigma = [output[2]["sigma"] for output in self._val_output]
        sigma = torch.cat(sigma).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_sigma_{i}",
                sigma[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )
        if self.do_log_logits:
            logits = [output[0] for output in self._val_output]
            logits = torch.cat(logits).detach().cpu()
            fig = self.create_plotly_histogram(logits.unsqueeze(1), "logits")
            wandb.log(
                {
                    "valid_logits": fig,
                    "global_step": self.global_step,
                },
                commit=False,
            )
            if self.head.hparams.log_debug_plot:
                fig = self.create_plotly_histogram(pi, "pi", bin_dict={"start": 0.0, "end": 1.0, "size": 0.1})
                wandb.log(
                    {
                        "valid_pi": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )

                fig = self.create_plotly_histogram(mu, "mu")
                wandb.log(
                    {
                        "valid_mu": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )

                fig = self.create_plotly_histogram(sigma, "sigma")
                wandb.log(
                    {
                        "valid_sigma": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )
        self._val_output = []
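
The MDN head returns mixture parameters (pi, sigma, mu) rather than a single logit. One common way to reduce them to a point estimate, which is the spirit of `generate_point_predictions` above, is the pi-weighted mean of the component means (a sketch, not the library's exact implementation):

import torch

pi = torch.softmax(torch.randn(8, 3), dim=-1)  # mixture weights per sample
mu = torch.randn(8, 3)                         # component means per sample
point_prediction = (pi * mu).sum(dim=-1)       # shape: (8,)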

Bases: BaseModel

Source code in src/pytorch_tabular/models/node/node_model.py
class NODEModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    def subset(self, x):
        return x[..., : self.hparams.output_dim].mean(dim=-2)

    def data_aware_initialization(self, datamodule):
        """执行针对 NODE 的数据感知初始化."""
        logger.info(
            "Data Aware Initialization of NODE using a forward pass with "
            f"{self.hparams.data_aware_init_batch_size} batch size...."
        )
        # Need a big batch to initialize properly
        alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
        batch = next(iter(alt_loader))
        for k, v in batch.items():
            if isinstance(v, list) and (len(v) == 0):
                # Skipping empty list
                continue
            # batch[k] = v.to("cpu" if self.config.gpu == 0 else "cuda")
            batch[k] = v.to(self.device)

        # single forward pass to initialize the ODST
        with torch.no_grad():
            self(batch)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        self._backbone = NODEBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # average first n channels of every tree, where n is the number of output targets for regression
        # and number of classes for classification
        # Not using config head because NODE has a specific head
        warnings.warn("Ignoring head config because NODE has a specific head which subsets the tree outputs")
        self._head = Lambda(self.subset)

data_aware_initialization(datamodule)

Performs data-aware initialization for NODE.

Source code in src/pytorch_tabular/models/node/node_model.py
def data_aware_initialization(self, datamodule):
    """执行针对 NODE 的数据感知初始化."""
    logger.info(
        "Data Aware Initialization of NODE using a forward pass with "
        f"{self.hparams.data_aware_init_batch_size} batch size...."
    )
    # Need a big batch to initialize properly
    alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
    batch = next(iter(alt_loader))
    for k, v in batch.items():
        if isinstance(v, list) and (len(v) == 0):
            # Skipping empty list
            continue
        # batch[k] = v.to("cpu" if self.config.gpu == 0 else "cuda")
        batch[k] = v.to(self.device)

    # single forward pass to initialize the ODST
    with torch.no_grad():
        self(batch)
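
A standalone sketch of the device transfer in the loop above: every tensor in the batch dict is moved to the model's device, while empty lists are skipped.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = {"continuous": torch.randn(8, 4), "target": torch.randn(8, 1), "categorical": []}
batch = {k: v if isinstance(v, list) and len(v) == 0 else v.to(device) for k, v in batch.items()}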

Bases: BaseModel

Source code in src/pytorch_tabular/models/tabnet/tabnet_model.py
class TabNetModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        assert config.task in [
            "regression",
            "classification",
        ], "TabNet is only implemented for Regression and Classification"
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # TabNet has its own embedding layer.
        # So we are not using the embedding layer from BaseModel
        self._embedding_layer = nn.Identity()
        self._backbone = TabNetBackbone(self.hparams)
        setattr(self.backbone, "output_dim", self.hparams.output_dim)
        # TabNet has its own head
        self._head = nn.Identity()

    def extract_embedding(self):
        raise ValueError("Extracting Embeddings is not supported by Tabnet. Please use another" " compatible model")
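
An end-to-end sketch wiring TabNetModelConfig into the high-level TabularModel API (assumes `train` is a pandas DataFrame with columns "x1", "x2" and "target"):

from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models import TabNetModelConfig

tabular_model = TabularModel(
    data_config=DataConfig(target=["target"], continuous_cols=["x1", "x2"]),
    model_config=TabNetModelConfig(task="regression"),
    optimizer_config=OptimizerConfig(),
    trainer_config=TrainerConfig(max_epochs=5),
)
# tabular_model.fit(train=train)  # TabNet brings its own embeddings and head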

Bases: BaseModel

Source code in src/pytorch_tabular/models/tab_transformer/tab_transformer.py
class TabTransformerModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = TabTransformerBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    # Redefining forward because this model flow is slightly different
    def forward(self, x: Dict):
        if self.hparams.categorical_dim > 0:
            x_cat = self.embed_input({"categorical": x["categorical"]})
        else:
            x_cat = None
        x = self.compute_backbone({"categorical": x_cat, "continuous": x["continuous"]})
        return self.compute_head(x)

    # Redefining compute_backbone because this model flow is slightly different
    def compute_backbone(self, x: Dict):
        # Returns output
        x = self.backbone(x["categorical"], x["continuous"])
        return x
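
The flow above embeds only the categorical inputs before the backbone; continuous features bypass the attention blocks and are joined afterwards. A shape-level sketch (dimensions are illustrative, not from the library):

import torch

batch_size, n_cat, n_cont, embed_dim = 32, 3, 4, 16
x_cat_embedded = torch.randn(batch_size, n_cat, embed_dim)  # from embed_input
x_cont = torch.randn(batch_size, n_cont)                    # skips the transformer
# backbone(x_cat_embedded, x_cont) -> features of shape
# (batch_size, n_cat * embed_dim + n_cont), which feed the head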

Base Model Class

Bases: LightningModule

Source code in src/pytorch_tabular/models/base_model.py
class BaseModel(pl.LightningModule, metaclass=ABCMeta):
    def __init__(
        self,
        config: DictConfig,
        custom_loss: Optional[torch.nn.Module] = None,
        custom_metrics: Optional[List[Callable]] = None,
        custom_metrics_prob_inputs: Optional[List[bool]] = None,
        custom_optimizer: Optional[torch.optim.Optimizer] = None,
        custom_optimizer_params: Dict = {},
        **kwargs,
    ):
        """    PyTorch Tabular 的基础模型.

Parameters:
    config (DictConfig): 模型的配置.
    custom_loss (Optional[torch.nn.Module], optional): 自定义损失函数.默认为 None.
    custom_metrics (Optional[List[Callable]], optional): 自定义指标列表.默认为 None.
    custom_metrics_prob_inputs (Optional[List[bool]], optional): 布尔值列表,指示指标是否需要概率输入.默认为 None.
    custom_optimizer (Optional[torch.optim.Optimizer], optional): 自定义优化器,可为可调用对象或导入的字符串.默认为 None.
    custom_optimizer_params (Dict, optional): 自定义优化器参数的字典.默认为 {}.
    kwargs (Dict, optional): 其他关键字参数.
"""
        super().__init__()
        assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
        inferred_config = kwargs["inferred_config"]
        # Merging the config and inferred config
        config = safe_merge_config(config, inferred_config)
        self.custom_loss = custom_loss
        self.custom_metrics = custom_metrics
        self.custom_metrics_prob_inputs = custom_metrics_prob_inputs
        self.custom_optimizer = custom_optimizer
        self.custom_optimizer_params = custom_optimizer_params
        self.kwargs = kwargs
        # Updating config with custom parameters for experiment tracking
        if self.custom_loss is not None:
            config.loss = str(self.custom_loss)
        if self.custom_metrics is not None:
            # Adding metrics to config for hparams logging and tracking
            config.metrics = []
            config.metrics_params = []
            for metric in self.custom_metrics:
                if isinstance(metric, partial):
                    # extracting func names from partial functions
                    config.metrics.append(metric.func.__name__)
                    config.metrics_params.append(metric.keywords)
                else:
                    config.metrics.append(metric.__name__)
                    config.metrics_params.append(vars(metric))
            if config.task == "classification":
                config.metrics_prob_input = self.custom_metrics_prob_inputs
                for i, mp in enumerate(config.metrics_params):
                    mp.sub_params_list = []
                    for j, num_classes in enumerate(inferred_config.output_cardinality):
                        config.metrics_params[i].sub_params_list.append(
                            OmegaConf.create(
                                {
                                    "task": mp.get("task", "multiclass"),
                                    "num_classes": mp.get("num_classes", num_classes),
                                }
                            )
                        )

        # Updating default metrics in config
        elif config.task == "classification":
            # Adding metric_params to config for classification task
            for i, mp in enumerate(config.metrics_params):
                mp.sub_params_list = []
                for j, num_classes in enumerate(inferred_config.output_cardinality):
                    # config.metrics_params[i][j]["task"] = mp.get("task", "multiclass")
                    # config.metrics_params[i][j]["num_classes"] = mp.get("num_classes", num_classes)

                    config.metrics_params[i].sub_params_list.append(
                        OmegaConf.create(
                            {"task": mp.get("task", "multiclass"), "num_classes": mp.get("num_classes", num_classes)}
                        )
                    )

                    if config.metrics[i] in (
                        "accuracy",
                        "precision",
                        "recall",
                        "precision_recall",
                        "specificity",
                        "f1_score",
                        "fbeta_score",
                    ):
                        config.metrics_params[i].sub_params_list[j]["top_k"] = mp.get("top_k", 1)

        if self.custom_optimizer is not None:
            config.optimizer = str(self.custom_optimizer.__class__.__name__)
        if len(self.custom_optimizer_params) > 0:
            config.optimizer_params = self.custom_optimizer_params
        self.save_hyperparameters(config)
        # The concatenated output dim of the embedding layer
        self._build_network()
        self._setup_loss()
        self._setup_metrics()
        self._check_and_verify()
        self.do_log_logits = (
            hasattr(self.hparams, "log_logits") and self.hparams.log_logits and self.hparams.log_target == "wandb"
        )
        if self.do_log_logits:
            self._val_logits = []
        if not WANDB_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Wandb is not installed. Please install wandb to log logits. "
                "You can install wandb using pip install wandb or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )
        if not PLOTLY_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Plotly is not installed. Please install plotly to log logits. "
                "You can install plotly using pip install plotly or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )

    @abstractmethod
    def _build_network(self):
        pass

    @property
    def backbone(self):
        raise NotImplementedError("backbone property needs to be implemented by inheriting classes")

    @property
    def embedding_layer(self):
        raise NotImplementedError("embedding_layer property needs to be implemented by inheriting classes")

    @property
    def head(self):
        raise NotImplementedError("head property needs to be implemented by inheriting classes")

    def _check_and_verify(self):
        assert hasattr(self, "backbone"), "Model has no attribute called `backbone`"
        assert hasattr(self.backbone, "output_dim"), "Backbone needs to have attribute `output_dim`"
        assert hasattr(self, "head"), "Model has no attribute called `head`"

    def _get_head_from_config(self):
        _head_callable = getattr(blocks, self.hparams.head)
        return _head_callable(
            in_units=self.backbone.output_dim,
            output_dim=self.hparams.output_dim,
            config=_head_callable._config_template(**self.hparams.head_config),
        )  # output_dim auto-calculated from other configs

    def _setup_loss(self):
        if self.custom_loss is None:
            try:
                self.loss = getattr(nn, self.hparams.loss)()
            except AttributeError as e:
                logger.error(f"{self.hparams.loss} is not a valid loss defined in the torch.nn module")
                raise e
        else:
            self.loss = self.custom_loss

    def _setup_metrics(self):
        if self.custom_metrics is None:
            self.metrics = []
            task_module = torchmetrics.functional
            for metric in self.hparams.metrics:
                try:
                    self.metrics.append(getattr(task_module, metric))
                except AttributeError as e:
                    logger.error(
                        f"{metric} is not a valid functional metric defined in the torchmetrics.functional module"
                    )
                    raise e
        else:
            self.metrics = self.custom_metrics

    def calculate_loss(self, output: Dict, y: torch.Tensor, tag: str) -> torch.Tensor:
        """计算模型的损失.

Parameters:
    output (Dict): 模型输出的字典
    y (torch.Tensor): 目标张量
    tag (str): 用于日志记录的标签

Returns:
    torch.Tensor: 损失值
"""
        y_hat = output["logits"]
        reg_terms = [k for k, v in output.items() if "regularization" in k]
        reg_loss = 0
        for t in reg_terms:
            # Log only if non-zero
            if output[t] != 0:
                reg_loss += output[t]
                self.log(
                    f"{tag}_{t}_loss",
                    output[t],
                    on_epoch=True,
                    on_step=False,
                    logger=True,
                    prog_bar=False,
                )
        if self.hparams.task == "regression":
            computed_loss = reg_loss
            for i in range(self.hparams.output_dim):
                _loss = self.loss(y_hat[:, i], y[:, i])
                computed_loss += _loss
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_loss_{i}",
                        _loss,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                    )
        else:
            # TODO loss fails with batch size of 1?
            computed_loss = reg_loss
            start_index = 0
            for i in range(len(self.hparams.output_cardinality)):
                end_index = start_index + self.hparams.output_cardinality[i]
                _loss = self.loss(y_hat[:, start_index:end_index], y[:, i])
                computed_loss += _loss
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_loss_{i}",
                        _loss,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                    )
                start_index = end_index
        self.log(
            f"{tag}_loss",
            computed_loss,
            on_epoch=(tag in ["valid", "test"]),
            on_step=(tag == "train"),
            # on_step=False,
            logger=True,
            prog_bar=True,
        )
        return computed_loss

    def calculate_metrics(self, y: torch.Tensor, y_hat: torch.Tensor, tag: str) -> List[torch.Tensor]:
        """    计算模型的各项指标.

Parameters:
    y (torch.Tensor): 目标张量

    y_hat (torch.Tensor): 预测张量

    tag (str): 用于日志记录的标签

Returns:
    List[torch.Tensor]: 指标值列表
"""
        metrics = []
        for metric, metric_str, prob_inp, metric_params in zip(
            self.metrics,
            self.hparams.metrics,
            self.hparams.metrics_prob_input,
            self.hparams.metrics_params,
        ):
            if self.hparams.task == "regression":
                _metrics = []
                for i in range(self.hparams.output_dim):
                    name = metric.func.__name__ if isinstance(metric, partial) else metric.__name__
                    if name == torchmetrics.functional.mean_squared_log_error.__name__:
                        # MSLE should only be used in strictly positive targets. It is undefined otherwise
                        _metric = metric(
                            torch.clamp(y_hat[:, i], min=0),
                            torch.clamp(y[:, i], min=0),
                            **metric_params,
                        )
                    else:
                        _metric = metric(y_hat[:, i], y[:, i], **metric_params)
                    if self.hparams.output_dim > 1:
                        self.log(
                            f"{tag}_{metric_str}_{i}",
                            _metric,
                            on_epoch=True,
                            on_step=False,
                            logger=True,
                            prog_bar=False,
                        )
                    _metrics.append(_metric)
                avg_metric = torch.stack(_metrics, dim=0).sum()
            else:
                _metrics = []
                start_index = 0
                for i, cardinality in enumerate(self.hparams.output_cardinality):
                    end_index = start_index + cardinality
                    y_hat_i = nn.Softmax(dim=-1)(y_hat[:, start_index:end_index].squeeze())
                    if prob_inp:
                        _metric = metric(y_hat_i, y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i])
                    else:
                        _metric = metric(
                            torch.argmax(y_hat_i, dim=-1), y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i]
                        )
                    if len(self.hparams.output_cardinality) > 1:
                        self.log(
                            f"{tag}_{metric_str}_{i}",
                            _metric,
                            on_epoch=True,
                            on_step=False,
                            logger=True,
                            prog_bar=False,
                        )
                    _metrics.append(_metric)
                    start_index = end_index
                avg_metric = torch.stack(_metrics, dim=0).sum()
            metrics.append(avg_metric)
            self.log(
                f"{tag}_{metric_str}",
                avg_metric,
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=True,
            )
        return metrics

    def data_aware_initialization(self, datamodule):
        """在定义模型时执行数据感知初始化."""
        pass

    def compute_backbone(self, x: Dict) -> torch.Tensor:
        # Returns output
        x = self.backbone(x)
        return x

    def embed_input(self, x: Dict) -> torch.Tensor:
        return self.embedding_layer(x)

    def apply_output_sigmoid_scaling(self, y_hat: torch.Tensor) -> torch.Tensor:
        """对模型输出应用Sigmoid缩放(如果任务是回归且目标范围已定义).

Parameters:
    y_hat (torch.Tensor): 模型的输出

Returns:
    torch.Tensor: 应用了Sigmoid缩放的模型输出
"""
        if (self.hparams.task == "regression") and (self.hparams.target_range is not None):
            for i in range(self.hparams.output_dim):
                y_min, y_max = self.hparams.target_range[i]
                y_hat[:, i] = y_min + nn.Sigmoid()(y_hat[:, i]) * (y_max - y_min)
        return y_hat

    def pack_output(self, y_hat: torch.Tensor, backbone_features: torch.tensor) -> Dict[str, Any]:
        """打包模型的输出.

Parameters:
    y_hat (torch.Tensor): 模型的输出

    backbone_features (torch.tensor): 主干网络的特征

Returns:
    打包后的模型输出
"""
        # If self.head is the Identity function, it means we cannot extract backbone features,
        # because the model cannot be divided into a backbone and a head (e.g. TabNet)
        if type(self.head) is nn.Identity:
            return {"logits": y_hat}
        return {"logits": y_hat, "backbone_features": backbone_features}

    def compute_head(self, backbone_features: Tensor) -> Dict[str, Any]:
        """    计算模型的头部.

Parameters:
    backbone_features (Tensor): 主干网络的特征

Returns:
    模型的输出
"""
        y_hat = self.head(backbone_features)
        y_hat = self.apply_output_sigmoid_scaling(y_hat)
        return self.pack_output(y_hat, backbone_features)

    def forward(self, x: Dict) -> Dict[str, Any]:
        """   模型的前向传播.

Parameters:
    x (Dict): 模型的输入,包含'continuous'和'categorical'键
"""
        x = self.embed_input(x)
        x = self.compute_backbone(x)
        return self.compute_head(x)

    def predict(self, x: Dict, ret_model_output: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Dict]]:
        """    预测模型的输出.

Parameters:
    x (Dict): 模型的输入,包含'continuous'和'categorical'键

    ret_model_output (bool): 如果为True,方法返回模型的输出

Returns:
    模型的输出
"""
        assert self.hparams.task != "ssl", "It's not allowed to use the method predict in case of ssl task"
        ret_value = self.forward(x)
        if ret_model_output:
            return ret_value.get("logits"), ret_value
        return ret_value.get("logits")

    def forward_pass(self, batch):
        return self(batch), None

    def extract_embedding(self):
        """提取模型的嵌入.

这在 `CategoricalEmbeddingTransformer` 中使用
"""
        if self.hparams.categorical_dim > 0:
            if not isinstance(self.embedding_layer, PreEncoded1dLayer):
                return self.embedding_layer.cat_embedding_layers
            else:
                raise ValueError(
                    "Cannot extract embedding for PreEncoded1dLayer. Please use a different embedding layer."
                )
        else:
            raise ValueError(
                "Model has been trained with no categorical feature and therefore can't be used"
                " as a Categorical Encoder"
            )

    def training_step(self, batch, batch_idx):
        output, y = self.forward_pass(batch)
        # y is not None for the SSL task. For the rest of the tasks,
        # the target is fetched from the batch
        y = batch["target"] if y is None else y
        y_hat = output["logits"]
        loss = self.calculate_loss(output, y, tag="train")
        self.calculate_metrics(y, y_hat, tag="train")
        return loss

    def validation_step(self, batch, batch_idx):
        with torch.no_grad():
            output, y = self.forward_pass(batch)
            # y is not None for the SSL task. For the rest of the tasks,
            # the target is fetched from the batch
            y = batch["target"] if y is None else y
            y_hat = output["logits"]
            self.calculate_loss(output, y, tag="valid")
            self.calculate_metrics(y, y_hat, tag="valid")
        return y_hat, y

    def test_step(self, batch, batch_idx):
        with torch.no_grad():
            output, y = self.forward_pass(batch)
            # y is not None for the SSL task. For the rest of the tasks,
            # the target is fetched from the batch
            y = batch["target"] if y is None else y
            y_hat = output["logits"]
            self.calculate_loss(output, y, tag="test")
            self.calculate_metrics(y, y_hat, tag="test")
        return y_hat, y

    def configure_optimizers(self):
        if self.custom_optimizer is None:
            # Loading from the config
            try:
                self._optimizer = _create_optimizer(self.hparams.optimizer)
                opt = self._optimizer(
                    self.parameters(),
                    lr=self.hparams.learning_rate,
                    **self.hparams.optimizer_params,
                )
            except AttributeError as e:
                logger.error(f"{self.hparams.optimizer} is not a valid optimizer defined in the torch.optim module")
                raise e
        else:
            # Loading from custom fit arguments
            self._optimizer = _create_optimizer(self.custom_optimizer)

            opt = self._optimizer(
                self.parameters(),
                lr=self.hparams.learning_rate,
                **self.custom_optimizer_params,
            )
        if self.hparams.lr_scheduler is not None:
            try:
                self._lr_scheduler = getattr(torch.optim.lr_scheduler, self.hparams.lr_scheduler)
            except AttributeError as e:
                logger.error(
                    f"{self.hparams.lr_scheduler} is not a valid learning rate sheduler defined"
                    f" in the torch.optim.lr_scheduler module"
                )
                raise e
            # issubclass (not isinstance): self._lr_scheduler holds the scheduler class itself
            if issubclass(self._lr_scheduler, torch.optim.lr_scheduler._LRScheduler):
                return {
                    "optimizer": opt,
                    "lr_scheduler": self._lr_scheduler(opt, **self.hparams.lr_scheduler_params),
                }
            return {
                "optimizer": opt,
                "lr_scheduler": self._lr_scheduler(opt, **self.hparams.lr_scheduler_params),
                "monitor": self.hparams.lr_scheduler_monitor_metric,
            }
        else:
            return opt

    def create_plotly_histogram(self, arr, name, bin_dict=None):
        fig = go.Figure()
        for i in range(arr.shape[-1]):
            fig.add_trace(
                go.Histogram(
                    x=arr[:, i],
                    histnorm="probability",
                    name=f"{name}_{i}",
                    xbins=bin_dict,  # dict(start=0.0, end=1.0, size=0.1),  # bins used for histogram
                )
            )
        # Overlay both histograms
        fig.update_layout(
            barmode="overlay",
            legend={"orientation": "h", "yanchor": "bottom", "y": 1.02, "xanchor": "right", "x": 1},
        )
        # Reduce opacity to see both histograms
        fig.update_traces(opacity=0.5)
        return fig

    def on_validation_batch_end(self, outputs, batch, batch_idx: int) -> None:
        if self.do_log_logits:
            self._val_logits.append(outputs[0][0])
        super().on_validation_batch_end(outputs, batch, batch_idx)

    def on_validation_epoch_end(self) -> None:
        if self.do_log_logits:
            logits = torch.cat(self._val_logits).detach().cpu()
            self._val_logits = []
            fig = self.create_plotly_histogram(logits, "logits")
            wandb.log(
                {"valid_logits": wandb.Plotly(fig), "global_step": self.global_step},
                commit=False,
            )
        super().on_validation_epoch_end()

    def reset_weights(self):
        reset_all_weights(self.backbone)
        reset_all_weights(self.head)
        reset_all_weights(self.embedding_layer)

    def feature_importance(self) -> DataFrame:
        """返回一个包含模型特征重要性的数据框."""
        if hasattr(self.backbone, "feature_importance_"):
            imp = self.backbone.feature_importance_
            n_feat = len(self.hparams.categorical_cols + self.hparams.continuous_cols)
            if self.hparams.categorical_dim > 0:
                if imp.shape[0] != n_feat:
                    # Combining Cat Embedded Dimensions to a single one by averaging
                    wt = []
                    norm = []
                    ft_idx = 0
                    for _, embd_dim in self.hparams.embedding_dims:
                        wt.extend([ft_idx] * embd_dim)
                        norm.append(embd_dim)
                        ft_idx += 1
                    for _ in self.hparams.continuous_cols:
                        wt.extend([ft_idx])
                        norm.append(1)
                        ft_idx += 1
                    imp = np.bincount(wt, weights=imp) / np.array(norm)
                else:
                    # For models like FTTransformer, we don't need to do anything.
                    # It takes categorical and continuous as individual 2-D features
                    pass
            importance_df = DataFrame(
                {
                    "Features": self.hparams.categorical_cols + self.hparams.continuous_cols,
                    "importance": imp,
                }
            )
            return importance_df
        else:
            raise ValueError("Feature Importance unavailable for this model.")

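Before the per-method reference below, here is a minimal sketch (not part of the library) of what an inheriting model must supply to satisfy the contract enforced by _check_and_verify: an implementation of _build_network plus the backbone, embedding_layer, and head properties. The class names and the continuous_dim hyperparameter used here are illustrative assumptions.

import torch.nn as nn

from pytorch_tabular.models.base_model import BaseModel


class ContinuousPassThrough(nn.Module):
    # Hypothetical embedding layer: forwards only the continuous features
    def forward(self, x):
        return x["continuous"]


class TinyBackbone(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.output_dim = output_dim  # required by _check_and_verify
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x)


class TinyModel(BaseModel):
    def _build_network(self):
        self._embedding = ContinuousPassThrough()
        self._backbone = TinyBackbone(self.hparams.continuous_dim, 32)
        self._head = self._get_head_from_config()  # builds the configured head

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding

    @property
    def head(self):
        return self._head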
__init__(config, custom_loss=None, custom_metrics=None, custom_metrics_prob_inputs=None, custom_optimizer=None, custom_optimizer_params={}, **kwargs)

Base Model for PyTorch Tabular.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | DictConfig | The config of the model. | required |
| custom_loss | Optional[Module] | A custom loss function. Defaults to None. | None |
| custom_metrics | Optional[List[Callable]] | A list of custom metrics. Defaults to None. | None |
| custom_metrics_prob_inputs | Optional[List[bool]] | A list of booleans indicating whether the metrics need probability inputs. Defaults to None. | None |
| custom_optimizer | Optional[Optimizer] | A custom optimizer, either as a callable or an import string. Defaults to None. | None |
| custom_optimizer_params | Dict | A dict of parameters for the custom optimizer. Defaults to {}. | {} |
| kwargs | Dict | Additional keyword arguments. | {} |
Source code in src/pytorch_tabular/models/base_model.py
    def __init__(
        self,
        config: DictConfig,
        custom_loss: Optional[torch.nn.Module] = None,
        custom_metrics: Optional[List[Callable]] = None,
        custom_metrics_prob_inputs: Optional[List[bool]] = None,
        custom_optimizer: Optional[torch.optim.Optimizer] = None,
        custom_optimizer_params: Dict = {},
        **kwargs,
    ):
        """    PyTorch Tabular 的基础模型.

Parameters:
    config (DictConfig): 模型的配置.
    custom_loss (Optional[torch.nn.Module], optional): 自定义损失函数.默认为 None.
    custom_metrics (Optional[List[Callable]], optional): 自定义指标列表.默认为 None.
    custom_metrics_prob_inputs (Optional[List[bool]], optional): 布尔值列表,指示指标是否需要概率输入.默认为 None.
    custom_optimizer (Optional[torch.optim.Optimizer], optional): 自定义优化器,可为可调用对象或导入的字符串.默认为 None.
    custom_optimizer_params (Dict, optional): 自定义优化器参数的字典.默认为 {}.
    kwargs (Dict, optional): 其他关键字参数.
"""
        super().__init__()
        assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
        inferred_config = kwargs["inferred_config"]
        # Merging the config and inferred config
        config = safe_merge_config(config, inferred_config)
        self.custom_loss = custom_loss
        self.custom_metrics = custom_metrics
        self.custom_metrics_prob_inputs = custom_metrics_prob_inputs
        self.custom_optimizer = custom_optimizer
        self.custom_optimizer_params = custom_optimizer_params
        self.kwargs = kwargs
        # Updating config with custom parameters for experiment tracking
        if self.custom_loss is not None:
            config.loss = str(self.custom_loss)
        if self.custom_metrics is not None:
            # Adding metrics to config for hparams logging and tracking
            config.metrics = []
            config.metrics_params = []
            for metric in self.custom_metrics:
                if isinstance(metric, partial):
                    # extracting func names from partial functions
                    config.metrics.append(metric.func.__name__)
                    config.metrics_params.append(metric.keywords)
                else:
                    config.metrics.append(metric.__name__)
                    config.metrics_params.append(vars(metric))
            if config.task == "classification":
                config.metrics_prob_input = self.custom_metrics_prob_inputs
                for i, mp in enumerate(config.metrics_params):
                    mp.sub_params_list = []
                    for j, num_classes in enumerate(inferred_config.output_cardinality):
                        config.metrics_params[i].sub_params_list.append(
                            OmegaConf.create(
                                {
                                    "task": mp.get("task", "multiclass"),
                                    "num_classes": mp.get("num_classes", num_classes),
                                }
                            )
                        )

        # Updating default metrics in config
        elif config.task == "classification":
            # Adding metric_params to config for classification task
            for i, mp in enumerate(config.metrics_params):
                mp.sub_params_list = []
                for j, num_classes in enumerate(inferred_config.output_cardinality):
                    # config.metrics_params[i][j]["task"] = mp.get("task", "multiclass")
                    # config.metrics_params[i][j]["num_classes"] = mp.get("num_classes", num_classes)

                    config.metrics_params[i].sub_params_list.append(
                        OmegaConf.create(
                            {"task": mp.get("task", "multiclass"), "num_classes": mp.get("num_classes", num_classes)}
                        )
                    )

                    if config.metrics[i] in (
                        "accuracy",
                        "precision",
                        "recall",
                        "precision_recall",
                        "specificity",
                        "f1_score",
                        "fbeta_score",
                    ):
                        config.metrics_params[i].sub_params_list[j]["top_k"] = mp.get("top_k", 1)

        if self.custom_optimizer is not None:
            config.optimizer = str(self.custom_optimizer.__class__.__name__)
        if len(self.custom_optimizer_params) > 0:
            config.optimizer_params = self.custom_optimizer_params
        self.save_hyperparameters(config)
        # The concatenated output dim of the embedding layer
        self._build_network()
        self._setup_loss()
        self._setup_metrics()
        self._check_and_verify()
        self.do_log_logits = (
            hasattr(self.hparams, "log_logits") and self.hparams.log_logits and self.hparams.log_target == "wandb"
        )
        if self.do_log_logits:
            self._val_logits = []
        if not WANDB_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Wandb is not installed. Please install wandb to log logits. "
                "You can install wandb using pip install wandb or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )
        if not PLOTLY_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Plotly is not installed. Please install plotly to log logits. "
                "You can install plotly using pip install plotly or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )
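
As the code above shows, a custom metric may be a plain torchmetrics function or a functools.partial: for a partial, __init__ records metric.func.__name__ and metric.keywords for hyperparameter tracking. A hedged sketch of assembling such arguments (in normal use they are forwarded to this constructor, e.g. through TabularModel.fit):

from functools import partial

import torchmetrics

# Plain functional metric: __init__ records metric.__name__ and vars(metric)
plain_accuracy = torchmetrics.functional.accuracy

# Partial: __init__ unwraps metric.func.__name__ and metric.keywords instead
fbeta_05 = partial(torchmetrics.functional.fbeta_score, beta=0.5)

custom_metrics = [plain_accuracy, fbeta_05]
# For classification, one boolean per metric: does it expect probabilities?
custom_metrics_prob_inputs = [False, False]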

apply_output_sigmoid_scaling(y_hat)

Applies sigmoid scaling to the model output if the task is regression and a target range is defined.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y_hat | Tensor | The output of the model | required |

Returns:

| Type | Description |
| --- | --- |
| Tensor | The model output with sigmoid scaling applied |

Source code in src/pytorch_tabular/models/base_model.py
    def apply_output_sigmoid_scaling(self, y_hat: torch.Tensor) -> torch.Tensor:
        """对模型输出应用Sigmoid缩放(如果任务是回归且目标范围已定义).

Parameters:
    y_hat (torch.Tensor): 模型的输出

Returns:
    torch.Tensor: 应用了Sigmoid缩放的模型输出
"""
        if (self.hparams.task == "regression") and (self.hparams.target_range is not None):
            for i in range(self.hparams.output_dim):
                y_min, y_max = self.hparams.target_range[i]
                y_hat[:, i] = y_min + nn.Sigmoid()(y_hat[:, i]) * (y_max - y_min)
        return y_hat
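
A standalone numeric sketch of the scaling above, with a hypothetical target_range entry of (10, 20) for a single output dimension:

import torch

y_hat = torch.tensor([-2.0, 0.0, 2.0])  # raw regression outputs
y_min, y_max = 10.0, 20.0               # hypothetical target_range entry

scaled = y_min + torch.sigmoid(y_hat) * (y_max - y_min)
print(scaled)  # tensor([11.1920, 15.0000, 18.8080]) -- bounded to (10, 20)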

calculate_loss(output, y, tag)

Calculates the loss for the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| output | Dict | The output dictionary from the model | required |
| y | Tensor | The target tensor | required |
| tag | str | The tag to use for logging | required |

Returns:

| Type | Description |
| --- | --- |
| Tensor | The computed loss |

Source code in src/pytorch_tabular/models/base_model.py
    def calculate_loss(self, output: Dict, y: torch.Tensor, tag: str) -> torch.Tensor:
        """计算模型的损失.

Parameters:
    output (Dict): 模型输出的字典
    y (torch.Tensor): 目标张量
    tag (str): 用于日志记录的标签

Returns:
    torch.Tensor: 损失值
"""
        y_hat = output["logits"]
        reg_terms = [k for k, v in output.items() if "regularization" in k]
        reg_loss = 0
        for t in reg_terms:
            # Log only if non-zero
            if output[t] != 0:
                reg_loss += output[t]
                self.log(
                    f"{tag}_{t}_loss",
                    output[t],
                    on_epoch=True,
                    on_step=False,
                    logger=True,
                    prog_bar=False,
                )
        if self.hparams.task == "regression":
            computed_loss = reg_loss
            for i in range(self.hparams.output_dim):
                _loss = self.loss(y_hat[:, i], y[:, i])
                computed_loss += _loss
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_loss_{i}",
                        _loss,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                    )
        else:
            # TODO loss fails with batch size of 1?
            computed_loss = reg_loss
            start_index = 0
            for i in range(len(self.hparams.output_cardinality)):
                end_index = start_index + self.hparams.output_cardinality[i]
                _loss = self.loss(y_hat[:, start_index:end_index], y[:, i])
                computed_loss += _loss
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_loss_{i}",
                        _loss,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                    )
                start_index = end_index
        self.log(
            f"{tag}_loss",
            computed_loss,
            on_epoch=(tag in ["valid", "test"]),
            on_step=(tag == "train"),
            # on_step=False,
            logger=True,
            prog_bar=True,
        )
        return computed_loss
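
In the classification branch above, the logits for all targets are concatenated along the last dimension and sliced per target using output_cardinality. A self-contained sketch with two hypothetical targets of 3 and 2 classes:

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
output_cardinality = [3, 2]                      # two targets: 3 and 2 classes
y_hat = torch.randn(8, sum(output_cardinality))  # concatenated logits, batch of 8
y = torch.stack(
    [torch.randint(0, 3, (8,)), torch.randint(0, 2, (8,))], dim=1
)  # one label column per target

computed_loss, start = 0.0, 0
for i, cardinality in enumerate(output_cardinality):
    end = start + cardinality
    computed_loss = computed_loss + loss_fn(y_hat[:, start:end], y[:, i])
    start = end
print(computed_loss)  # sum of the per-target cross-entropy losses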

calculate_metrics(y, y_hat, tag)

Calculates the metrics for the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | The target tensor | required |
| y_hat | Tensor | The predicted tensor | required |
| tag | str | The tag to use for logging | required |

Returns:

| Type | Description |
| --- | --- |
| List[Tensor] | The list of metric values |

Source code in src/pytorch_tabular/models/base_model.py
    def calculate_metrics(self, y: torch.Tensor, y_hat: torch.Tensor, tag: str) -> List[torch.Tensor]:
        """    计算模型的各项指标.

Parameters:
    y (torch.Tensor): 目标张量

    y_hat (torch.Tensor): 预测张量

    tag (str): 用于日志记录的标签

Returns:
    List[torch.Tensor]: 指标值列表
"""
        metrics = []
        for metric, metric_str, prob_inp, metric_params in zip(
            self.metrics,
            self.hparams.metrics,
            self.hparams.metrics_prob_input,
            self.hparams.metrics_params,
        ):
            if self.hparams.task == "regression":
                _metrics = []
                for i in range(self.hparams.output_dim):
                    name = metric.func.__name__ if isinstance(metric, partial) else metric.__name__
                    if name == torchmetrics.functional.mean_squared_log_error.__name__:
                        # MSLE should only be used in strictly positive targets. It is undefined otherwise
                        _metric = metric(
                            torch.clamp(y_hat[:, i], min=0),
                            torch.clamp(y[:, i], min=0),
                            **metric_params,
                        )
                    else:
                        _metric = metric(y_hat[:, i], y[:, i], **metric_params)
                    if self.hparams.output_dim > 1:
                        self.log(
                            f"{tag}_{metric_str}_{i}",
                            _metric,
                            on_epoch=True,
                            on_step=False,
                            logger=True,
                            prog_bar=False,
                        )
                    _metrics.append(_metric)
                avg_metric = torch.stack(_metrics, dim=0).sum()
            else:
                _metrics = []
                start_index = 0
                for i, cardinality in enumerate(self.hparams.output_cardinality):
                    end_index = start_index + cardinality
                    y_hat_i = nn.Softmax(dim=-1)(y_hat[:, start_index:end_index].squeeze())
                    if prob_inp:
                        _metric = metric(y_hat_i, y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i])
                    else:
                        _metric = metric(
                            torch.argmax(y_hat_i, dim=-1), y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i]
                        )
                    if len(self.hparams.output_cardinality) > 1:
                        self.log(
                            f"{tag}_{metric_str}_{i}",
                            _metric,
                            on_epoch=True,
                            on_step=False,
                            logger=True,
                            prog_bar=False,
                        )
                    _metrics.append(_metric)
                    start_index = end_index
                avg_metric = torch.stack(_metrics, dim=0).sum()
            metrics.append(avg_metric)
            self.log(
                f"{tag}_{metric_str}",
                avg_metric,
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=True,
            )
        return metrics
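
The prob_inp flag above switches between two input conventions: the metric receives softmax probabilities, or hard class predictions via argmax. A hedged, self-contained sketch of both modes:

import torch
from torchmetrics.functional import accuracy, auroc

logits = torch.randn(8, 3)
y = torch.tensor([0, 1, 2, 0, 1, 2, 0, 1])
probs = torch.softmax(logits, dim=-1)

# metrics_prob_input=True style: the metric consumes probabilities (e.g. AUROC)
score_prob = auroc(probs, y, task="multiclass", num_classes=3)

# metrics_prob_input=False style: the metric consumes predicted classes
score_cls = accuracy(torch.argmax(probs, dim=-1), y, task="multiclass", num_classes=3)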

compute_head(backbone_features)

Computes the head of the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| backbone_features | Tensor | The features from the backbone | required |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | The output of the model |

Source code in src/pytorch_tabular/models/base_model.py
    def compute_head(self, backbone_features: Tensor) -> Dict[str, Any]:
        """    计算模型的头部.

Parameters:
    backbone_features (Tensor): 主干网络的特征

Returns:
    模型的输出
"""
        y_hat = self.head(backbone_features)
        y_hat = self.apply_output_sigmoid_scaling(y_hat)
        return self.pack_output(y_hat, backbone_features)

data_aware_initialization(datamodule)

Performs data-aware initialization of the model when it is defined.

Source code in src/pytorch_tabular/models/base_model.py
def data_aware_initialization(self, datamodule):
    """在定义模型时执行数据感知初始化."""
    pass

extract_embedding()

Extracts the embedding of the model.

This is used in the CategoricalEmbeddingTransformer.

Source code in src/pytorch_tabular/models/base_model.py
    def extract_embedding(self):
        """提取模型的嵌入.

这在 `CategoricalEmbeddingTransformer` 中使用
"""
        if self.hparams.categorical_dim > 0:
            if not isinstance(self.embedding_layer, PreEncoded1dLayer):
                return self.embedding_layer.cat_embedding_layers
            else:
                raise ValueError(
                    "Cannot extract embedding for PreEncoded1dLayer. Please use a different embedding layer."
                )
        else:
            raise ValueError(
                "Model has been trained with no categorical feature and therefore can't be used"
                " as a Categorical Encoder"
            )
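
A hedged usage sketch: extract_embedding is consumed by CategoricalEmbeddingTransformer, which replaces categorical columns in a DataFrame with their learned embeddings. `tabular_model` (a trained TabularModel) and `train_df` are assumed to exist:

from pytorch_tabular.categorical_encoders import CategoricalEmbeddingTransformer

transformer = CategoricalEmbeddingTransformer(tabular_model)
train_encoded = transformer.fit_transform(train_df)  # pandas DataFrame in and out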

feature_importance()

Returns a dataframe with the feature importance of the model.

Source code in src/pytorch_tabular/models/base_model.py
def feature_importance(self) -> DataFrame:
    """返回一个包含模型特征重要性的数据框."""
    if hasattr(self.backbone, "feature_importance_"):
        imp = self.backbone.feature_importance_
        n_feat = len(self.hparams.categorical_cols + self.hparams.continuous_cols)
        if self.hparams.categorical_dim > 0:
            if imp.shape[0] != n_feat:
                # Combining Cat Embedded Dimensions to a single one by averaging
                wt = []
                norm = []
                ft_idx = 0
                for _, embd_dim in self.hparams.embedding_dims:
                    wt.extend([ft_idx] * embd_dim)
                    norm.append(embd_dim)
                    ft_idx += 1
                for _ in self.hparams.continuous_cols:
                    wt.extend([ft_idx])
                    norm.append(1)
                    ft_idx += 1
                imp = np.bincount(wt, weights=imp) / np.array(norm)
            else:
                # For models like FTTransformer, we don't need to do anything.
                # It takes categorical and continuous as individual 2-D features
                pass
        importance_df = DataFrame(
            {
                "Features": self.hparams.categorical_cols + self.hparams.continuous_cols,
                "importance": imp,
            }
        )
        return importance_df
    else:
        raise ValueError("Feature Importance unavailable for this model.")
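
A standalone sketch of the averaging step above: per-embedding-dimension importances are pooled back into one value per original feature with np.bincount, then normalized by each feature's embedding width. The numbers are illustrative:

import numpy as np

embedding_dims = [(5, 3), (10, 2)]  # (cardinality, embed_dim) for 2 categorical cols
n_continuous = 1
imp = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.9])  # 3 + 2 + 1 raw importances

wt, norm, ft_idx = [], [], 0
for _, embd_dim in embedding_dims:
    wt.extend([ft_idx] * embd_dim)  # map each embedding dim to its feature index
    norm.append(embd_dim)
    ft_idx += 1
for _ in range(n_continuous):
    wt.append(ft_idx)
    norm.append(1)
    ft_idx += 1

per_feature = np.bincount(wt, weights=imp) / np.array(norm)
print(per_feature)  # [0.2 0.5 0.9] -- mean importance per feature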

forward(x)

The forward pass of the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Dict | The input of the model, with 'continuous' and 'categorical' keys | required |
Source code in src/pytorch_tabular/models/base_model.py
    def forward(self, x: Dict) -> Dict[str, Any]:
        """   模型的前向传播.

Parameters:
    x (Dict): 模型的输入,包含'continuous'和'categorical'键
"""
        x = self.embed_input(x)
        x = self.compute_backbone(x)
        return self.compute_head(x)

pack_output(y_hat, backbone_features)

Packs the output of the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y_hat | Tensor | The output of the model | required |
| backbone_features | tensor | The features from the backbone | required |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | The packed output of the model |

Source code in src/pytorch_tabular/models/base_model.py
    def pack_output(self, y_hat: torch.Tensor, backbone_features: torch.tensor) -> Dict[str, Any]:
        """打包模型的输出.

Parameters:
    y_hat (torch.Tensor): 模型的输出

    backbone_features (torch.tensor): 主干网络的特征

Returns:
    打包后的模型输出
"""
        # If self.head is the Identity function, it means we cannot extract backbone features,
        # because the model cannot be divided into a backbone and a head (e.g. TabNet)
        if type(self.head) is nn.Identity:
            return {"logits": y_hat}
        return {"logits": y_hat, "backbone_features": backbone_features}

predict(x, ret_model_output=False)

Predicts the output of the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Dict | The input of the model, with 'continuous' and 'categorical' keys | required |
| ret_model_output | bool | If True, the method also returns the full model output | False |

Returns:

| Type | Description |
| --- | --- |
| Union[Tensor, Tuple[Tensor, Dict]] | The output of the model |

Source code in src/pytorch_tabular/models/base_model.py
    def predict(self, x: Dict, ret_model_output: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Dict]]:
        """    预测模型的输出.

Parameters:
    x (Dict): 模型的输入,包含'continuous'和'categorical'键

    ret_model_output (bool): 如果为True,方法返回模型的输出

Returns:
    模型的输出
"""
        assert self.hparams.task != "ssl", "It's not allowed to use the method predict in case of ssl task"
        ret_value = self.forward(x)
        if ret_model_output:
            return ret_value.get("logits"), ret_value
        return ret_value.get("logits")
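
A hedged sketch of calling predict directly on the module; in normal use TabularModel.predict is called on a DataFrame instead. The batch dict mirrors the 'continuous'/'categorical' input contract, and `model` is assumed to be an initialized subclass of this base model:

import torch

batch = {
    "continuous": torch.randn(4, 3),             # 4 rows, 3 continuous columns
    "categorical": torch.randint(0, 5, (4, 2)),  # 4 rows, 2 categorical columns
}
logits = model.predict(batch)
logits, full_output = model.predict(batch, ret_model_output=True)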