
Supervised Models

Configuration Classes

Bases: ModelConfig

Automatic Feature Interaction configuration.

Parameters:

Name Type Description Default
attn_embed_dim int

The number of hidden units in the Multi-Headed Attention layers. Defaults to 32

32
num_heads int

The number of heads in the Multi-Headed Attention layer. Defaults to 2

2
num_attn_blocks int

The number of layers of stacked Multi-Headed Attention layers. Defaults to 3

3
attn_dropouts float

Dropout between layers of Multi-Headed Attention layers. Defaults to 0.0

0.0
has_residuals bool

Flag to have a residual connection from the embedded output to the attention layer output. Defaults to True

True
embedding_dim int

The dimensions of the embedding for continuous and categorical columns. Defaults to 16

16
embedding_initialization Optional[str]

Initialization scheme for the embedding layers. Defaults to kaiming_uniform. Choices are: [kaiming_uniform,kaiming_normal]

'kaiming_uniform'
embedding_bias bool

Flag to turn on Embedding Bias. Defaults to True

True
share_embedding bool

The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole along with embeddings of each unique value of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False

False
share_embedding_strategy Optional[str]

There are two strategies for adding shared embeddings. 1. add - a separate embedding for the feature is added to the embedding of the unique values of the feature. 2. fraction - a fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [add,fraction]

'fraction'
shared_embedding_fraction float

Fraction of the input_embed_dim to be reserved by the shared embedding. Should be less than one. Defaults to 0.25

0.25
deep_layers bool

Flag to enable a deep MLP layer before the Multi-Headed Attention layer. Defaults to False

False
layers str

Hyphen-separated number of layers and units in the deep MLP. Defaults to 128-64-32

'128-64-32'
activation str

The activation type in the deep MLP. The default activations in PyTorch like ReLU, TanH, LeakyReLU, etc. are supported. https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity Defaults to ReLU

'ReLU'
use_batch_norm bool

Flag to include a BatchNorm layer after each Linear Layer+DropOut in the deep MLP. Defaults to False

False
initialization str

Initialization scheme for the linear layers in the deep MLP. Defaults to kaiming. Choices are: [kaiming,xavier,random]

'kaiming'
dropout float

Probability of an element to be zeroed in the deep MLP. Defaults to 0.0

0.0
attention_pooling bool

If True, the attention outputs of each block are combined for the final prediction. Defaults to False

False
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone]

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead]

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, the continuous layer is normalized through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/autoint/config.py
@dataclass
class AutoIntConfig(ModelConfig):
    """自动特征交互配置.

    Parameters:
        attn_embed_dim (int): 多头注意力层中的隐藏单元数量.默认为 32

        num_heads (int): 多头注意力层中的头数.默认为 2

        num_attn_blocks (int): 堆叠的多头注意力层的层数.默认为 3

        attn_dropouts (float): 多头注意力层之间的 dropout.默认为 0.0

        has_residuals (bool): 标志,用于在嵌入输出和注意力层输出之间添加残差连接.默认为 True

        embedding_dim (int): 连续和分类列的嵌入维度.默认为 16

        embedding_initialization (Optional[str]): 嵌入层的初始化方案.默认为 `kaiming`.可选值为: [`kaiming_uniform`,`kaiming_normal`]

        embedding_bias (bool): 标志,用于开启嵌入偏置.默认为 True

        share_embedding (bool): 标志,用于在输入嵌入过程中开启共享嵌入.关键思想是为特征整体以及该列的每个唯一值提供嵌入.更多详情请参阅 TabTransformer 论文的附录 A.默认为 False

        share_embedding_strategy (Optional[str]): 添加共享嵌入有两种策略.1. `add` - 为特征添加一个单独的嵌入到特征唯一值的嵌入中.2. `fraction` - 输入嵌入的一部分保留给特征的共享嵌入.默认为 fraction.可选值为: [`add`,`fraction`]

        shared_embedding_fraction (float): 保留给共享嵌入的输入嵌入维度的一部分.应小于 1.默认为 0.25

        deep_layers (bool): 标志,用于在多头注意力层之前启用深层 MLP 层.默认为 False

        layers (str): 深层 MLP 中的层数和单元数,用连字符分隔.默认为 128-64-32

        activation (str): 深层 MLP 中的激活类型.默认激活类型为 PyTorch 中的 ReLU、TanH、LeakyReLU 等.
                https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity
                默认为 ReLU

        use_batch_norm (bool): 标志,用于在深层 MLP 中的每个线性层+DropOut 后添加 BatchNorm 层.默认为 False

        initialization (str): 深层 MLP 中线性层的初始化方案.默认为 `kaiming`.可选值为: [`kaiming`,`xavier`,`random`]

        dropout (float): 深层 MLP 中元素被置零的概率.默认为 0.0

        attention_pooling (bool): 如果为 True,将组合每个块的注意力输出以进行最终预测.默认为 False

        task (str): 指定问题是回归还是分类.`backbone` 是一种任务,将模型视为生成特征的骨干.主要用于 SSL 及相关任务.可选值为: [`regression`,`classification`,`backbone`]

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`]

        head_config (Optional[Dict]): 定义头部的配置字典.如果为空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果为空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的 dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保持为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数

        metrics_prob_input (Optional[List]): 配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None

        target_range (Optional[List]): 输出变量应限制的范围.当前在多目标回归中被忽略.通常用于回归问题.如果为空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    attn_embed_dim: int = field(
        default=32,
        metadata={"help": "The number of hidden units in the Multi-Headed Attention layers. Defaults to 32"},
    )
    num_heads: int = field(
        default=2,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 2"},
    )
    num_attn_blocks: int = field(
        default=3,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 3"},
    )
    attn_dropouts: float = field(
        default=0.0,
        metadata={"help": "Dropout between layers of Multi-Headed Attention Layers. Defaults to 0.0"},
    )
    has_residuals: bool = field(
        default=True,
        metadata={
            "help": "Flag to have a residual connect from enbedded output to attention layer output. Defaults to True"
        },
    )
    embedding_dim: int = field(
        default=16,
        metadata={"help": "The dimensions of the embedding for continuous and categorical columns. Defaults to 16"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=True,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to True"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique values of that column."
            " For more details refer to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    deep_layers: bool = field(
        default=False,
        metadata={"help": "Flag to enable a deep MLP layer before the Multi-Headed Attention layer. Defaults to False"},
    )
    layers: str = field(
        default="128-64-32",
        metadata={"help": "Hyphen-separated number of layers and units in the deep MLP. Defaults to 128-64-32"},
    )
    activation: str = field(
        default="ReLU",
        metadata={
            "help": "The activation type in the deep MLP. The default activation in PyTorch"
            " like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity."
            " Defaults to ReLU"
        },
    )
    use_batch_norm: bool = field(
        default=False,
        metadata={
            "help": "Flag to include a BatchNorm layer after each Linear Layer+DropOut in the deep MLP."
            " Defaults to False"
        },
    )
    initialization: str = field(
        default="kaiming",
        metadata={
            "help": "Initialization scheme for the linear layers in the deep MLP. Defaults to `kaiming`",
            "choices": ["kaiming", "xavier", "random"],
        },
    )
    dropout: float = field(
        default=0.0,
        metadata={"help": "Probability of an element to be zeroed in the deep MLP. Defaults to 0.0"},
    )
    attention_pooling: bool = field(
        default=False,
        metadata={
            "help": "If True, will combine the attention outputs of each block for final prediction. Defaults to False"
        },
    )
    _module_src: str = field(default="models.autoint")
    _model_name: str = field(default="AutoIntModel")
    _backbone_name: str = field(default="AutoIntBackbone")
    _config_name: str = field(default="AutoIntConfig")
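
The generated reference above stops at the config definition. As an illustrative sketch that is not part of the original page, the snippet below trains an AutoInt model with this config; the dataframe train_df and its column names are hypothetical placeholders.

from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models import AutoIntConfig

# Hypothetical dataset: two continuous columns, one categorical column,
# and a "target" column to predict.
data_config = DataConfig(
    target=["target"],
    continuous_cols=["num_a", "num_b"],
    categorical_cols=["cat_a"],
)
model_config = AutoIntConfig(
    task="classification",
    attn_embed_dim=32,  # hidden units in each Multi-Headed Attention layer
    num_heads=2,
    num_attn_blocks=3,
    deep_layers=True,  # adds the 128-64-32 MLP before the attention blocks
)
tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=OptimizerConfig(),
    trainer_config=TrainerConfig(max_epochs=5),
)
tabular_model.fit(train=train_df)  # train_df: a pandas DataFrame (assumed to exist)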

Bases: ModelConfig

CategoryEmbeddingModel configuration.

Parameters:

Name Type Description Default
layers str

DEPRECATED: Hyphen-separated number of layers and units in the classification head. E.g. 32-64-32. Defaults to 128-64-32

'128-64-32'
activation str

DEPRECATED: The activation type in the classification head. The default activations in PyTorch like ReLU, TanH, LeakyReLU, etc. are supported. https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity. Defaults to ReLU

'ReLU'
use_batch_norm bool

DEPRECATED: Flag to include a BatchNorm layer after each Linear Layer+DropOut. Defaults to False

False
initialization str

DEPRECATED: Initialization scheme for the linear layers. Defaults to kaiming. Choices are: [kaiming,xavier,random].

'kaiming'
dropout float

DEPRECATED: Probability of a classification element to be zeroed. This is added to each linear layer. Defaults to 0.0

0.0
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/category_embedding/config.py
@dataclass
class CategoryEmbeddingModelConfig(ModelConfig):
    """类别嵌入模型配置.

    Parameters:
        layers (str): 已弃用: 分类头中层数和单元数的连字符分隔字符串.例如 32-64-32.
                默认为 128-64-32

        activation (str): 已弃用: 分类头中的激活类型.默认激活类型为 PyTorch 中的 ReLU、TanH、LeakyReLU 等.
                https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity.
                默认为 ReLU

        use_batch_norm (bool): 已弃用: 标志,用于在每个线性层+DropOut 后包含一个 BatchNorm 层.默认为 False

        initialization (str): 已弃用: 线性层的初始化方案.默认为 `kaiming`.可选值为: [`kaiming`,`xavier`,`random`].

        dropout (float): 已弃用: 分类元素被置零的概率.这会添加到每个线性层.默认为 0.0


        task (str): 指定问题是回归还是分类.`backbone` 是一种任务,将模型视为生成特征的骨干.主要用于 SSL 及相关任务.
                可选值为: [`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.
                默认为 LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,
                使用规则 min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的 Dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,我们将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3.

        loss (Optional[str]): 要应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,
                否则请保留为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.
                默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,
                为了简单起见,我们只使用 `multiclass`.

        metrics_prob_input (Optional[List]): 配置中定义的分类指标的强制参数.
            这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None.

        target_range (Optional[List]): 限制输出变量的范围.目前忽略多目标回归.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    layers: str = field(
        default="128-64-32",
        metadata={
            "help": (
                "Hyphen-separated number of layers and units in the classification"
                " head. eg. 32-64-32. Defaults to 128-64-32"
            )
        },
    )
    activation: str = field(
        default="ReLU",
        metadata={
            "help": (
                "The activation type in the classification head. The default"
                " activation in PyTorch like ReLU, TanH, LeakyReLU, etc."
                " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity."
                " Defaults to ReLU"
            )
        },
    )
    use_batch_norm: bool = field(
        default=False,
        metadata={"help": ("Flag to include a BatchNorm layer after each Linear Layer+DropOut." " Defaults to False")},
    )
    initialization: str = field(
        default="kaiming",
        metadata={
            "help": ("Initialization scheme for the linear layers. Defaults to `kaiming`"),
            "choices": ["kaiming", "xavier", "random"],
        },
    )
    dropout: float = field(
        default=0.0,
        metadata={
            "help": (
                "probability of an classification element to be zeroed."
                " This is added to each linear layer. Defaults to 0.0"
            )
        },
    )

    # def __post_init__(self):
    #     deprecated_args = [
    #         "layers",
    #         "activation",
    #         "use_batch_norm",
    #         "initialization",
    #         "dropout",
    #     ]
    #     # for arg in deprecated_args:
    #     if any([getattr(self, arg) is not None for arg in deprecated_args]):
    #         warnings.warn(
    #             f"{deprecated_args} are deprecated and will be remoevd in next version. "
    #             "Please use 'head' and `head_config` and set deprecated args "
    #             "to `None` to turn off warning. CategoricalEmbedding model is just a "
    #             "linear head with embedding layers."
    #         )
    #     return super().__post_init__()

    _module_src: str = field(default="models.category_embedding")
    _model_name: str = field(default="CategoryEmbeddingModel")
    _backbone_name: str = field(default="CategoryEmbeddingBackbone")
    _config_name: str = field(default="CategoryEmbeddingModelConfig")
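
As an illustrative sketch (not from the original page): since the top-level layers, activation, use_batch_norm, initialization and dropout arguments are deprecated, the head is configured through head_config instead.

from pytorch_tabular.models import CategoryEmbeddingModelConfig

# Sketch: size the MLP head via head_config rather than the deprecated
# top-level arguments.
model_config = CategoryEmbeddingModelConfig(
    task="regression",
    head="LinearHead",
    head_config={
        "layers": "64-32",  # two hidden layers with 64 and 32 units
        "activation": "ReLU",
        "dropout": 0.1,
    },
    learning_rate=1e-3,
)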

Bases: ModelConfig

DANet configuration.

Parameters:

Name Type Description Default
n_layers int

Number of Blocks in the DANet. 8, 20 and 32 are the configurations evaluated in the paper. Defaults to 8

8
abstlay_dim_1 int

The dimension for the intermediate output in the first ABSTLAY layer in a Block. Defaults to 32

32
abstlay_dim_2 int

The dimension for the intermediate output in the second ABSTLAY layer in a Block. If None, it will be twice abstlay_dim_1. Defaults to None

None
k int

The number of feature groups in the ABSTLAY layer. Defaults to 5

5
dropout_rate float

Dropout to be applied in the Block. Defaults to 0.1

0.1
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression, classification, backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None, LinearHead, MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/danet/config.py
@dataclass
class DANetConfig(ModelConfig):
    """DANet 配置.

    Parameters:
        n_layers (int): DANet 中块的数量.8、20、32 是论文评估的配置.默认为 8

        abstlay_dim_1 (int): 块中第一个 ABSTLAY 层中间输出的维度.默认为 32

        abstlay_dim_2 (int): 块中第二个 ABSTLAY 层中间输出的维度.默认为 64

        k (int): ABSTLAY 层中特征组的数量.默认为 5

        dropout_rate (float): 块中应用的 dropout.默认为 0.1

        task (str): 指定问题是回归还是分类.`backbone` 是一种任务,将模型视为生成特征的骨干.主要用于 SSL 及相关任务.可选值为: [`regression`, `classification`, `backbone`].

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为: [`None`, `LinearHead`, `MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度,格式为列表中的元组 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2)

        embedding_dropout (float): 分类嵌入应用的 dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保留为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,为了简单起见,我们只使用 `multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None.

        target_range (Optional[List]): 限制输出变量的范围.目前忽略多目标回归.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 可重复性的种子.默认为 42"""

    n_layers: int = field(
        default=8,
        metadata={"help": "Number of Blocks in the DANet. Each block has 2 Abstlay Blocks each. Defaults to 8"},
    )

    abstlay_dim_1: int = field(
        default=32,
        metadata={
            "help": "The dimension for the intermediate output in the first ABSTLAY layer in a Block. Defaults to 32"
        },
    )

    abstlay_dim_2: Optional[int] = field(
        default=None,
        metadata={
            "help": "The dimension for the intermediate output in the second ABSTLAY layer in a Block."
            "If None, it will be twice abstlay_dim_1. Defaults to None"
        },
    )
    k: int = field(
        default=5,
        metadata={"help": "The number of feature groups in the ABSTLAY layer. Defaults to 5"},
    )
    dropout_rate: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Block. Defaults to 0.1"},
    )
    block_activation: str = field(
        default="LeakyReLU",
        metadata={
            "help": "The activation type in the classification head. The default activation in PyTorch"
            " like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity"
        },
    )
    virtual_batch_size: Optional[int] = field(
        default=256,
        metadata={
            "help": "If not None, all BatchNorms will be converted to GhostBatchNorm's "
            " with this virtual batch size. Defaults to None"
        },
    )

    _module_src: str = field(default="models.danet")
    _model_name: str = field(default="DANetModel")
    _backbone_name: str = field(default="DANetBackbone")
    _config_name: str = field(default="DANetConfig")

    def __post_init__(self):
        if self.abstlay_dim_2 is None:
            self.abstlay_dim_2 = self.abstlay_dim_1 * 2
        return super().__post_init__()
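
A small sketch (assuming the config can be instantiated standalone) of the abstlay_dim_2 defaulting behaviour implemented in __post_init__ above:

from pytorch_tabular.models import DANetConfig

cfg = DANetConfig(task="classification", n_layers=8, abstlay_dim_1=32)
# abstlay_dim_2 was left as None, so __post_init__ set it to twice abstlay_dim_1
print(cfg.abstlay_dim_2)  # 64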

Bases: ModelConfig

FT Transformer configuration.

Parameters:

Name Type Description Default
input_embed_dim int

The embedding dimension for the input categorical features. Defaults to 32

32
embedding_initialization Optional[str]

Initialization scheme for the embedding layers. Defaults to kaiming_uniform. Choices are: [kaiming_uniform,kaiming_normal].

'kaiming_uniform'
embedding_bias bool

Flag to turn on Embedding Bias. Defaults to True

True
share_embedding bool

The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole along with embeddings of each unique value of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False

False
share_embedding_strategy Optional[str]

There are two strategies for adding shared embeddings. 1. add - a separate embedding for the feature is added to the embedding of the unique values of the feature. 2. fraction - a fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [add,fraction].

'fraction'
shared_embedding_fraction float

Fraction of the input_embed_dim to be reserved by the shared embedding. Should be less than one. Defaults to 0.25

0.25
attn_feature_importance bool

If you are facing memory issues, you can turn off feature importance, which will stop saving the attention weights. Defaults to True

True
num_heads int

The number of heads in the Multi-Headed Attention layer. Defaults to 8

8
num_attn_blocks int

The number of layers of stacked Multi-Headed Attention layers. Defaults to 6

6
transformer_head_dim Optional[int]

The number of hidden units in the Multi-Headed Attention layers. Defaults to None and will be the same as input_dim.

None
attn_dropout float

Dropout to be applied after Multi-Headed Attention. Defaults to 0.1

0.1
add_norm_dropout float

Dropout to be applied in the AddNorm layer. Defaults to 0.1

0.1
ff_dropout float

Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1

0.1
ff_hidden_multiplier int

Multiple by which the Positionwise FF layer scales the input. Defaults to 4

4
transformer_activation str

The activation type in the transformer feed forward layers. In addition to the default activations in PyTorch like ReLU, TanH, LeakyReLU, etc. (https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity), GEGLU, ReGLU and SwiGLU are also implemented (https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU

'GEGLU'
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/ft_transformer/config.py
@dataclass
class FTTransformerConfig(ModelConfig):
    """Tab Transformer 配置.

    Parameters:
        input_embed_dim (int): 输入分类特征的嵌入维度.默认为 32

        embedding_initialization (Optional[str]): 嵌入层的初始化方案.默认为 `kaiming`.可选值为: [`kaiming_uniform`,`kaiming_normal`].

        embedding_bias (bool): 是否开启嵌入偏置的标志.默认为 True

        share_embedding (bool): 该标志用于在输入嵌入过程中开启共享嵌入.其核心思想是为整个特征及其每个唯一值分别设置嵌入.更多详情请参阅 TabTransformer 论文的附录 A.默认为 False

        share_embedding_strategy (Optional[str]): 添加共享嵌入有两种策略.1. `add` - 为特征添加一个独立的嵌入,并将其与特征唯一值的嵌入相加.2. `fraction` - 输入嵌入的一部分保留给特征的共享嵌入.默认为 fraction.可选值为: [`add`,`fraction`].

        shared_embedding_fraction (float): 保留给共享嵌入的输入嵌入维度比例.应小于 1.默认为 0.25

        attn_feature_importance (bool): 如果遇到内存问题,可以关闭特征重要性,这样就不会保存注意力权重.默认为 True

        num_heads (int): 多头注意力层中的头数.默认为 8

        num_attn_blocks (int): 堆叠的多头注意力层数.默认为 6

        transformer_head_dim (Optional[int]): 多头注意力层中的隐藏单元数.默认为 None,将与输入维度相同.

        attn_dropout (float): 多头注意力后应用的 dropout.默认为 0.1

        add_norm_dropout (float): AddNorm 层中应用的 dropout.默认为 0.1

        ff_dropout (float): 位置前馈网络中应用的 dropout.默认为 0.1

        ff_hidden_multiplier (int): 位置前馈层对输入的缩放倍数.默认为 4

        transformer_activation (str): 变换器前馈层中的激活类型.除了 PyTorch 中的默认激活函数如 ReLU、TanH、LeakyReLU 等(https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity),还实现了 GEGLU、ReGLU 和 SwiGLU(https://arxiv.org/pdf/2002.05202.pdf).默认为 GEGLU

        task (str): 指定问题是回归还是分类.`backbone` 是一种任务,将模型视为生成特征的骨干网络.主要用于 SSL 及相关任务.可选值为: [`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2)

        embedding_dropout (float): 分类嵌入中应用的 dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保持为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,并且为了简化,我们仅使用 `multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None.

        target_range (Optional[List]): 限制输出变量的范围.目前多目标回归中忽略.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    input_embed_dim: int = field(
        default=32,
        metadata={"help": "The embedding dimension for the input categorical features. Defaults to 32"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=True,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to True"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique values of that column. For more details refer"
            " to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    attn_feature_importance: bool = field(
        default=True,
        metadata={
            "help": "If you are facing memory issues, you can turn off feature importance"
            " which will not save the attention weights. Defaults to True"
        },
    )
    num_heads: int = field(
        default=8,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 8"},
    )
    num_attn_blocks: int = field(
        default=6,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 6"},
    )
    transformer_head_dim: Optional[int] = field(
        default=None,
        metadata={
            "help": "The number of hidden units in the Multi-Headed Attention layers."
            " Defaults to None and will be same as input_dim."
        },
    )
    attn_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied after Multi headed Attention. Defaults to 0.1"},
    )
    add_norm_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the AddNorm Layer. Defaults to 0.1"},
    )
    ff_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1"},
    )
    ff_hidden_multiplier: int = field(
        default=4,
        metadata={"help": "Multiple by which the Positionwise FF layer scales the input. Defaults to 4"},
    )

    transformer_activation: str = field(
        default="GEGLU",
        metadata={
            "help": "The activation type in the transformer feed forward layers."
            " In addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity,"
            " GEGLU, ReGLU and SwiGLU are also implemented (https://arxiv.org/pdf/2002.05202.pdf)."
            " Defaults to GEGLU",
        },
    )

    _module_src: str = field(default="models.ft_transformer")
    _model_name: str = field(default="FTTransformerModel")
    _backbone_name: str = field(default="FTTransformerBackbone")
    _config_name: str = field(default="FTTransformerConfig")
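
For illustration only (hypothetical column names), a sketch of a lighter FT Transformer; as a standard multi-head attention constraint, num_heads should typically divide the embedding dimension evenly.

from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models import FTTransformerConfig

model_config = FTTransformerConfig(
    task="regression",
    input_embed_dim=32,
    num_heads=4,
    num_attn_blocks=4,
    transformer_activation="GEGLU",
    attn_feature_importance=False,  # skip storing attention weights to save memory
)
tabular_model = TabularModel(
    data_config=DataConfig(
        target=["price"],  # hypothetical columns
        continuous_cols=["area"],
        categorical_cols=["city"],
    ),
    model_config=model_config,
    optimizer_config=OptimizerConfig(),
    trainer_config=TrainerConfig(max_epochs=10),
)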

Bases: ModelConfig

Gated Adaptive Network for Deep Automated Learning of Features (GANDALF) configuration.

Parameters:

Name Type Description Default
gflu_stages int

Number of layers in the feature abstraction layer. Defaults to 6

6
gflu_dropout float

Dropout rate for the feature abstraction layer. Defaults to 0.0

0.0
gflu_feature_init_sparsity float

Only valid for t-softmax. The percentage of features to be selected in each GFLU stage. This is just the initialization; it may change during learning. Defaults to 0.3

0.3
learnable_sparsity bool

Only valid for t-softmax. If True, the sparsity parameters will be learned. If False, the sparsity parameters will be fixed to the initial values specified in gflu_feature_init_sparsity and tree_feature_init_sparsity. Defaults to True

True
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone]

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead]

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/gandalf/config.py
@dataclass
class GANDALFConfig(ModelConfig):
    """门控自适应网络用于深度自动化特征学习(GANDALF)配置.

    Parameters:
        gflu_stages (int): 特征抽象层的层数.默认为 6

        gflu_dropout (float): 特征抽象层的丢弃率.默认为 0.0

        gflu_feature_init_sparsity (float): 仅对 t-softmax 有效.在每个 GFLU 阶段中选择的特征百分比.这只是初始化值,在学习过程中可能会改变.默认为 0.3

        learnable_sparsity (bool): 仅对 t-softmax 有效.如果为 True,稀疏性参数将被学习.如果为 False,稀疏性参数将固定为 `gflu_feature_init_sparsity` 和 `tree_feature_init_sparsity` 中指定的初始值.默认为 True

        task (str): 指定问题是回归还是分类.`backbone` 是一种将模型视为生成特征的主干的任务.主要用于 SSL 及相关任务.可选值为: [`regression`,`classification`,`backbone`]

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`]

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度,格式为 (基数, 嵌入维度) 的元组列表.如果留空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的丢弃率.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保持为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,并且为了简单起见,我们仅使用 `multiclass`

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None

        target_range (Optional[List]): 限制输出变量的范围.目前忽略多目标回归.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    gflu_stages: int = field(
        default=6,
        metadata={"help": "Number of layers in the feature abstraction layer. Defaults to 6"},
    )

    gflu_dropout: float = field(
        default=0.0, metadata={"help": "Dropout rate for the feature abstraction layer. Defaults to 0.0"}
    )

    gflu_feature_init_sparsity: float = field(
        default=0.3,
        metadata={
            "help": "Only valid for t-softmax. The perecentge of features to be selected in "
            "each GFLU stage. This is just initialized and during learning it may change"
        },
    )
    learnable_sparsity: bool = field(
        default=True,
        metadata={
            "help": "Only valid for t-softmax. If True, the sparsity parameters will be learned."
            "If False, the sparsity parameters will be fixed to the initial values specified in "
            "`gflu_feature_init_sparsity` and `tree_feature_init_sparsity`"
        },
    )
    _module_src: str = field(default="models.gandalf")
    _model_name: str = field(default="GANDALFModel")
    _backbone_name: str = field(default="GANDALFBackbone")
    _config_name: str = field(default="GANDALFConfig")

    def __post_init__(self):
        assert self.gflu_stages > 0, "gflu_stages should be greater than 0"
        return super().__post_init__()
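
A minimal configuration sketch (illustrative values, not recommendations):

from pytorch_tabular.models import GANDALFConfig

model_config = GANDALFConfig(
    task="classification",
    gflu_stages=4,  # must be > 0, enforced by __post_init__ above
    gflu_dropout=0.1,
    gflu_feature_init_sparsity=0.3,  # initial fraction of features picked per stage
    learnable_sparsity=True,  # let the model adjust sparsity during training
)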

Bases: ModelConfig

Gated Additive Tree Ensemble configuration.

Parameters:

Name Type Description Default
gflu_stages int

Number of layers in the feature abstraction layer. Defaults to 6

6
gflu_dropout float

Dropout rate for the feature abstraction layer. Defaults to 0.0

0.0
tree_depth int

Depth of the tree. Defaults to 4

4
num_trees int

Number of trees to use in the ensemble. Defaults to 10

10
binning_activation str

The binning function to use. Defaults to sparsemoid. Choices are: [entmoid,sparsemoid,sigmoid].

'sparsemoid'
feature_mask_function str

The feature mask function to use. Defaults to t-softmax. Choices are: [entmax,sparsemax,softmax,t-softmax].

't-softmax'
tree_dropout float

Probability of dropout in the tree binning transformation. Defaults to 0.0

0.0
chain_trees bool

If True, we will chain the trees together; chaining trees is analogous to boosting, while parallel trees are analogous to bagging. Defaults to True

True
tree_wise_attention bool

If True, we will use tree-wise attention to combine trees. Defaults to True

True
tree_wise_attention_dropout float

Probability of dropout in the tree-wise attention layer. Defaults to 0.0

0.0
share_head_weights bool

If True, we will share the weights between the heads. Defaults to True

True
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/gate/config.py
@dataclass
class GatedAdditiveTreeEnsembleConfig(ModelConfig):
    """门控加性树集成配置.

    Parameters:
        gflu_stages (int): 特征抽象层的层数.默认为6

        gflu_dropout (float): 特征抽象层的dropout率.默认为0.0

        tree_depth (int): 树的深度.默认为5

        num_trees (int): 集成中使用的树的数量.默认为20

        binning_activation (str): 使用的分箱函数.默认为entmoid.可选值为: [`entmoid`,`sparsemoid`,`sigmoid`].

        feature_mask_function (str): 使用的特征掩码函数.默认为sparsemax.可选值为: [`entmax`,`sparsemax`,`softmax`].

        tree_dropout (float): 树分箱变换中的dropout概率.默认为0.0

        chain_trees (bool): 如果为True,我们将把树串联起来.等同于提升(串联树)或装袋(并行树).默认为True

        tree_wise_attention (bool): 如果为True,我们将使用树级注意力来组合树.默认为True

        tree_wise_attention_dropout (float): 树级注意力层中的dropout概率.默认为0.0

        share_head_weights (bool): 如果为True,我们将共享头部的权重.默认为True


        task (str): 指定问题是回归还是分类.`backbone`是一个任务,它将模型视为生成特征的骨干.主要用于SSL及相关任务.可选值为: [`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为`pytorch_tabular.models.common.heads`中定义的头部之一.默认为LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果为空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为(基数, 嵌入维度).如果为空,将根据分类列的基数推断,规则为min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的dropout.默认为0.0

        batch_norm_continuous_input (bool): 如果为True,我们将通过BatchNorm层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为MSELoss,分类为CrossEntropyLoss.除非你确定自己在做什么,否则请保持为MSELoss或L1Loss用于回归,CrossEntropyLoss用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为``torchmetrics``中实现的功能性指标之一.默认情况下,分类为accuracy,回归为mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task`强制为`multiclass`,因为多分类版本可以处理二分类,并且为了简单起见,我们只使用`multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为None.

        target_range (Optional[List]): 限制输出变量的范围.目前忽略多目标回归.通常用于回归问题.如果为空,将不应用任何限制

        seed (int): 可重复性的种子.默认为42"""

    gflu_stages: int = field(
        default=6,
        metadata={"help": "Number of layers in the feature abstraction layer. Defaults to 6"},
    )

    gflu_dropout: float = field(
        default=0.0, metadata={"help": "Dropout rate for the feature abstraction layer. Defaults to 0.0"}
    )

    tree_depth: int = field(default=4, metadata={"help": "Depth of the tree. Defaults to 4"})

    num_trees: int = field(
        default=10,
        metadata={"help": "Number of trees to use in the ensemble. Defaults to 20"},
    )

    binning_activation: str = field(
        default="sparsemoid",
        metadata={
            "help": "The binning function to use. Defaults to entmoid. Defaults to entmoid",
            "choices": ["entmoid", "sparsemoid", "sigmoid"],
        },
    )
    feature_mask_function: str = field(
        default="t-softmax",
        metadata={
            "help": "The feature mask function to use. Defaults to entmax",
            "choices": ["entmax", "sparsemax", "softmax", "t-softmax"],
        },
    )
    gflu_feature_init_sparsity: float = field(
        default=0.3,
        metadata={
            "help": "Only valid for t-softmax. The percentage of features to be dropped in "
            "each GFLU stage. This is just initialized and during learning it may change"
        },
    )
    tree_feature_init_sparsity: float = field(
        default=0.8,
        metadata={
            "help": "Only valid for t-softmax. The perecentge of features to be dropped in "
            "each split in the tree. This is just initialized and during learning it may change"
        },
    )
    learnable_sparsity: bool = field(
        default=True,
        metadata={
            "help": "Only valid for t-softmax. If True, the sparsity parameters will be learned."
            "If False, the sparsity parameters will be fixed to the initial values specified in "
            "`gflu_feature_init_sparsity` and `tree_feature_init_sparsity`"
        },
    )

    tree_dropout: float = field(
        default=0.0,
        metadata={"help": "probability of dropout in tree binning transformation. Defaults to 0.0"},
    )
    chain_trees: bool = field(
        default=True,
        metadata={
            "help": "If True, we will chain the trees together."
            " Synonymous to boosting (chaining trees) or bagging (parallel trees). Defaults to True"
        },
    )
    tree_wise_attention: bool = field(
        default=True,
        metadata={"help": "If True, we will use tree wise attention to combine trees. Defaults to True"},
    )
    tree_wise_attention_dropout: float = field(
        default=0.0,
        metadata={"help": "probability of dropout in the tree wise attention layer. Defaults to 0.0"},
    )
    share_head_weights: bool = field(
        default=True,
        metadata={"help": "If True, we will share the weights between the heads. Defaults to True"},
    )

    _module_src: str = field(default="models.gate")
    _model_name: str = field(default="GatedAdditiveTreeEnsembleModel")
    _backbone_name: str = field(default="GatedAdditiveTreesBackbone")
    _config_name: str = field(default="GatedAdditiveTreeEnsembleConfig")

    def __post_init__(self):
        assert self.tree_depth > 0, "tree_depth should be greater than 0"
        # Either gflu_stages or num_trees should be greater than 0
        assert self.num_trees > 0, (
            "`num_trees` must be greater than 0." "If you want a lighter model which performs better, use GANDALF."
        )
        super().__post_init__()
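
A minimal configuration sketch (illustrative values only):

from pytorch_tabular.models import GatedAdditiveTreeEnsembleConfig

model_config = GatedAdditiveTreeEnsembleConfig(
    task="regression",
    gflu_stages=6,
    num_trees=10,  # must be > 0, enforced by __post_init__ above
    tree_depth=4,
    chain_trees=True,  # boosting-style chaining; False runs trees in parallel (bagging-style)
)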

Bases: ModelConfig

MDN configuration.

Parameters:

Name Type Description Default
backbone_config_class str

The config class for defining the Backbone. The config class should be a valid module path from models, e.g. FTTransformerConfig.

None
backbone_config_params Dict

The dict of config parameters for defining the Backbone.

None
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression, classification, backbone].

required
head str
'LinearHead'
head_config Dict

The config for defining the Mixture Density Network Head.

None
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2).

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0.

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification.

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression.

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied.

None
seed int

Seed for reproducibility. Defaults to 42.

42
Source code in src/pytorch_tabular/models/mixture_density/config.py
@dataclass
class MDNConfig(ModelConfig):
    """MDN配置.

    Parameters:
        backbone_config_class (str): 用于定义Backbone的配置类.配置类应为`models`中的有效模块路径,例如`FTTransformerConfig`.

        backbone_config_params (Dict): 用于定义Backbone的配置参数字典.

        task (str): 指定问题是回归还是分类.`backbone`是一种将模型视为生成特征的Backbone的任务.主要用于SSL及相关任务.可选值为:[`regression`, `classification`, `backbone`].

        head (str):

        head_config (Dict): 用于定义混合密度网络头部的配置.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度,以元组列表形式表示(基数,嵌入维度).如果留空,将根据分类列的基数推断,使用规则min(50, (x + 1) // 2).

        embedding_dropout (float): 应用于分类嵌入的Dropout.默认为0.0.

        batch_norm_continuous_input (bool): 如果为True,将通过BatchNorm层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为1e-3.

        loss (Optional[str]): 要应用的损失函数.默认情况下,回归为MSELoss,分类为CrossEntropyLoss.除非你确定自己在做什么,否则请保留为MSELoss或L1Loss用于回归,CrossEntropyLoss用于分类.

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为``torchmetrics``中实现的功能性指标之一.默认情况下,分类为accuracy,回归为mean_squared_error.

        metrics_params (Optional[List]): 传递给指标函数的参数.`task`强制为`multiclass`,因为多分类版本可以处理二分类,并且为了简化,我们仅使用`multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为None.

        target_range (Optional[List]): 限制输出变量的范围.当前在多目标回归中被忽略.通常用于回归问题.如果留空,将不应用任何限制.

        seed (int): 用于可重复性的种子.默认为42."""

    backbone_config_class: str = field(
        default=None,
        metadata={
            "help": "The config class for defining the Backbone."
            " The config class should be a valid module path from `models`. e.g. `FTTransformerConfig`"
        },
    )
    backbone_config_params: Dict = field(
        default=None,
        metadata={"help": "The dict of config parameters for defining the Backbone."},
    )
    head: str = field(init=False, default="MixtureDensityHead")
    head_config: Dict = field(
        default=None,
        metadata={"help": "The config for defining the Mixed Density Network Head"},
    )
    _module_src: str = field(default="models.mixture_density")
    _model_name: str = field(default="MDNModel")
    _config_name: str = field(default="MDNConfig")
    _probabilistic: bool = field(default=True)

    def __post_init__(self):
        assert (
            self.backbone_config_class not in INCOMPATIBLE_BACKBONES
        ), f"{self.backbone_config_class} is not a supported backbone for MDN head"
        assert self.head == "MixtureDensityHead"
        return super().__post_init__()
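
A probabilistic-regression sketch; treat the head_config keys as an assumption following MixtureDensityHeadConfig, where num_gaussian is the number of mixture components.

from pytorch_tabular.models import MDNConfig

model_config = MDNConfig(
    task="regression",
    backbone_config_class="CategoryEmbeddingModelConfig",
    backbone_config_params={"task": "backbone"},  # the backbone only generates features
    head_config={"num_gaussian": 2},  # mixture of two Gaussians (assumed key)
)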

Bases: ModelConfig

Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data configuration.

Parameters:

Name Type Description Default
num_layers int

Number of Oblivious Decision Tree layers in the dense architecture

1
num_trees int

Number of Oblivious Decision Trees in each layer

2048
additional_tree_output_dim int

The additional output dimensions which are only used to pass through different layers of the architecture. Only the first output_dim outputs will be used for prediction

3
depth int

The depth of the individual Oblivious Decision Trees

6
choice_function str

Generates a sparse probability distribution to be used as feature weights (aka soft feature selection). Choices are: [entmax15,sparsemax]

'entmax15'
bin_function str

Generates a sparse probability distribution to be used as tree leaf weights. Choices are: [entmoid15,sparsemoid]

'entmoid15'
max_features Optional[int]

If not None, sets a max limit on the number of features to be carried forward from layer to layer in the dense architecture

None
input_dropout float

Dropout to be applied to the inputs between layers of the dense architecture

0.0
initialize_response str

Initializing the response variable in the Oblivious Decision Trees. By default, it is a standard normal distribution. Choices are: [normal,uniform]

'normal'
initialize_selection_logits str

Initializing the feature selector. By default it is a uniform distribution across the features. Choices are: [uniform,normal]

'uniform'
threshold_init_beta float

Used in the data-aware initialization of thresholds, where the threshold is initialized randomly (with a beta distribution) to feature values in the first batch. It initializes the threshold to a q-th quantile of the data points, where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:). If this param is set to 1, initial thresholds will have the same distribution as the data points; if greater than 1 (e.g. 10), thresholds will be closer to the median data value; if less than 1 (e.g. 0.1), thresholds will approach the min/max data values

1.0
threshold_init_cutoff float

Used in the data-aware initialization of scales (used in scaling ODTs). It is initialized in such a way that all the samples in the first batch belong to the linear region of the entmoid/sparsemoid (bin-selectors) and thereby have non-zero gradients. It is a threshold log-temperature initializer, in (0, inf). By default (1.0), log-temperatures are initialized in such a way that all bin selectors end up in the linear region of the sparse-sigmoid; the temperatures are then scaled by this parameter. Setting this value > 1.0 will result in some margin between the data points and the sparse-sigmoid cutoff value; setting this value < 1.0 will cause (1 - value) of the data points to end up in the flat region of the sparse-sigmoid. For instance, threshold_init_cutoff = 0.9 will set 10% of the points equal to 0.0 or 1.0. All points will be between (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff)

1.0
task str

Specifies whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone]

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead]

None
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will be inferred from the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer through a BatchNorm layer

True
learning_rate float

The learning rate of the model. Defaults to 1e-3

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None

None
target_range Optional[List]

The range in which the output variable should be limited. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restriction is applied

None
seed int

Seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/node/config.py
@dataclass
class NodeConfig(ModelConfig):
    """神经遗忘决策集成用于表格数据的深度学习配置.

    Parameters:
        num_layers (int): 密集架构中遗忘决策树层的数量

        num_trees (int): 每层中遗忘决策树的数量

        additional_tree_output_dim (int): 仅用于在架构的不同层之间传递的额外输出维度.只有前 output_dim 个输出将用于预测

        depth (int): 单个遗忘决策树的深度

        choice_function (str): 生成稀疏概率分布以用作特征权重(即软特征选择).可选值为:[`entmax15`,`sparsemax`]

        bin_function (str): 生成稀疏概率分布以用作树叶子权重.可选值为:[`entmoid15`,`sparsemoid`]

        max_features (Optional[int]): 如果不为 None,则设置在密集架构中从一层传递到下一层的特征数量的最大限制

        input_dropout (float): 在密集架构的层之间应用于输入的 Dropout

        initialize_response (str): 初始化遗忘决策树中的响应变量.默认情况下,它是标准正态分布.可选值为:[`normal`,`uniform`]

        initialize_selection_logits (str): 初始化特征选择器.默认情况下,是特征上的均匀分布.可选值为:[`uniform`,`normal`]

        threshold_init_beta (float): 用于数据感知初始化阈值,其中阈值随机初始化(使用 beta 分布)为第一个批次中的特征值.它将阈值初始化为数据点的 q-th 分位数,其中 q ~ Beta(:threshold_init_beta:, :threshold_init_beta:).如果此参数设置为 1,初始阈值将具有与数据点相同的分布;如果大于 1(例如 10),阈值将更接近中位数数据值;如果小于 1(例如 0.1),阈值将接近最小/最大数据值

        threshold_init_cutoff (float): 用于数据感知初始化尺度(用于缩放 ODTs).它以这样的方式初始化,使得第一个批次中的所有样本都属于 entmoid/sparsemoid(二进制选择器)的线性区域,从而具有非零梯度.阈值对数温度初始化器,在 (0, inf) 范围内.默认情况下(1.0),对数温度以这样的方式初始化,使得所有二进制选择器最终都位于稀疏-sigmoid 的线性区域.然后温度由该参数缩放.设置此值 > 1.0 将在数据点和稀疏-sigmoid 截止值之间产生一些余量;设置此值 < 1.0 将导致 (1 - 值) 部分数据点最终位于稀疏-sigmoid 的平坦区域.例如,threshold_init_cutoff = 0.9 将设置 10% 的点等于 0.0 或 1.0.设置此值 > 1.0 将在数据点和稀疏-sigmoid 截止值之间产生余量.所有点将介于 (0.5 - 0.5 / threshold_init_cutoff) 和 (0.5 + 0.5 / threshold_init_cutoff) 之间

        task (str): 指定问题是回归还是分类.`backbone` 是一种将模型视为生成特征的骨干的任务.主要用于内部 SSL 及相关任务.可选值为:[`regression`,`classification`,`backbone`]

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为:[`None`,`LinearHead`,`MixtureDensityHead`]

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,使用规则 min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的 Dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,我们将通过 BatchNorm 层对连续层进行归一化

        learning_rate (float): 模型的学习率.默认为 1e-3

        loss (Optional[str]): 要应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保留为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多类版本可以处理二进制分类,并且为了简单起见,我们仅使用 `multiclass`

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None

        target_range (Optional[List]): 我们应该限制输出变量的范围.当前在多目标回归中被忽略.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    num_layers: int = field(
        default=1,
        metadata={"help": "Number of Oblivious Decision Tree Layers in the Dense Architecture"},
    )
    num_trees: int = field(
        default=2048,
        metadata={"help": "Number of Oblivious Decision Trees in each layer"},
    )
    additional_tree_output_dim: int = field(
        default=3,
        metadata={
            "help": "The additional output dimensions which is only used to pass through different layers"
            " of the architectures. Only the first output_dim outputs will be used for prediction"
        },
    )
    depth: int = field(
        default=6,
        metadata={"help": "The depth of the individual Oblivious Decision Trees"},
    )
    choice_function: str = field(
        default="entmax15",
        metadata={
            "help": "Generates a sparse probability distribution to be used"
            " as feature weights(aka, soft feature selection)",
            "choices": ["entmax15", "sparsemax"],
        },
    )
    bin_function: str = field(
        default="entmoid15",
        metadata={
            "help": "Generates a sparse probability distribution to be used as tree leaf weights",
            "choices": ["entmoid15", "sparsemoid"],
        },
    )
    max_features: Optional[int] = field(
        default=None,
        metadata={
            "help": "If not None, sets a max limit on the number of features to be carried forward"
            " from layer to layer in the Dense Architecture"
        },
    )
    input_dropout: float = field(
        default=0.0,
        metadata={"help": "Dropout to be applied to the inputs between layers of the Dense Architecture"},
    )
    initialize_response: str = field(
        default="normal",
        metadata={
            "help": "Initializing the response variable in the Oblivious Decision Trees."
            " By default, it is a standard normal distribution",
            "choices": ["normal", "uniform"],
        },
    )
    initialize_selection_logits: str = field(
        default="uniform",
        metadata={
            "help": "Initializing the feature selector. By default is a uniform distribution across the features",
            "choices": ["uniform", "normal"],
        },
    )
    threshold_init_beta: float = field(
        default=1.0,
        metadata={
            "help": """
                Used in the Data-aware initialization of thresholds where the threshold is initialized randomly
                (with a beta distribution) to feature values in the first batch.
                It initializes threshold to a q-th quantile of data points.
                where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:)
                If this param is set to 1, initial thresholds will have the same distribution as data points
                If greater than 1 (e.g. 10), thresholds will be closer to median data value
                If less than 1 (e.g. 0.1), thresholds will approach min/max data values.
            """
        },
    )
    threshold_init_cutoff: float = field(
        default=1.0,
        metadata={
            "help": """
                Used in the Data-aware initialization of scales(used in the scaling ODTs).
                It is initialized in such a way that all the samples in the first batch belong to the linear
                region of the entmoid/sparsemoid(bin-selectors) and thereby have non-zero gradients
                Threshold log-temperatures initializer, in (0, inf)
                By default(1.0), log-temperatures are initialized in such a way that all bin selectors
                end up in the linear region of sparse-sigmoid. The temperatures are then scaled by this parameter.
                Setting this value > 1.0 will result in some margin between data points and sparse-sigmoid cutoff value
                Setting this value < 1.0 will cause (1 - value) part of data points to end up in flat sparse-sigmoid
                region. For instance, threshold_init_cutoff = 0.9 will set 10% points equal to 0.0 or 1.0
                Setting this value > 1.0 will result in a margin between data points and sparse-sigmoid cutoff value
                All points will be between (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff)
            """
        },
    )

    head: Optional[str] = field(
        default=None,
    )

    _module_src: str = field(default="models.node")
    _model_name: str = field(default="NODEModel")
    _backbone_name: str = field(default="NODEBackbone")
    _config_name: str = field(default="NodeConfig")

    def __post_init__(self):
        if self.head is not None:
            warnings.warn(
                "`head` and `head_config` is ignored as NODE has a specific"
                " head which subsets the tree outputs. Set `head=None`"
                " to turn off the warning"
            )
        else:
            # Setting Head to LinearHead for compatibility
            self.head = "LinearHead"
        return super().__post_init__()
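
NODE supplies its own head that subsets and averages the tree outputs, so `head` should be left as None. A minimal usage sketch (assuming `NodeConfig` is imported from `pytorch_tabular.models`, as in the library's public API):

from pytorch_tabular.models import NodeConfig

# Default head=None: silently mapped to "LinearHead" internally for compatibility
config = NodeConfig(task="regression")

# Setting any head explicitly triggers the warning shown in __post_init__ above
config_with_warning = NodeConfig(task="regression", head="LinearHead")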

Bases: ModelConfig

TabNet: Attentive Interpretable Tabular Learning configuration.

Parameters:

Name Type Description Default
n_d int

Dimension of the prediction layer (usually between 4 and 64)

8
n_a int

Dimension of the attention layer (usually between 4 and 64)

8
n_steps int

Number of successive steps in the network (usually between 3 and 10)

3
gamma float

Float above 1, scaling factor for attention updates (usually between 1.0 and 2.0)

1.3
n_independent int

Number of independent GLU layers in each GLU block (default 2)

2
n_shared int

Number of shared GLU layers in each GLU block (default 2)

2
virtual_batch_size int

Batch size for Ghost Batch Normalization

128
mask_type str

Either 'sparsemax' or 'entmax': the masking function to use. Choices are: [sparsemax,entmax].

'sparsemax'
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the categorical embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, the continuous layer will be normalized by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be the same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restrictions will be applied.

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/tabnet/config.py
@dataclass
class TabNetModelConfig(ModelConfig):
    """TabNet: 注意力可解释表格学习配置

    Parameters:
        n_d (int): 预测层的维度(通常在4到64之间)

        n_a (int): 注意力层的维度(通常在4到64之间)

        n_steps (int): 网络中连续步骤的数量(通常在3到10之间)

        gamma (float): 大于1的浮点数,注意力更新的缩放因子(通常在1.0到2.0之间)

        n_independent (int): 每个GLU块中独立GLU层的数量(默认2)

        n_shared (int): 每个GLU块中独立GLU层的数量(默认2)

        virtual_batch_size (int): Ghost Batch Normalization的批次大小

        mask_type (str): 使用的掩码函数,可以是'sparsemax'或'entmax'.选择包括:
                [`sparsemax`,`entmax`].

        task (str): 指定问题是回归还是分类.`backbone`是一种任务,将模型视为生成特征的骨干.主要用于内部SSL及相关任务.选择包括:
                [`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为`pytorch_tabular.models.common.heads`中定义的头部之一.默认为LinearHead.选择包括:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为(基数, 嵌入维度).如果留空,将根据分类列的基数推断,规则为min(50, (x + 1) // 2)

        embedding_dropout (float): 应用于分类嵌入的丢弃率.默认为0.0

        batch_norm_continuous_input (bool): 如果为True,将通过BatchNorm层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为MSELoss,分类为CrossEntropyLoss.除非你确定自己在做什么,否则请保留为MSELoss或L1Loss用于回归,CrossEntropyLoss用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为``torchmetrics``中实现的功能性指标之一.默认情况下,分类为accuracy,回归为mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task`强制为`multiclass`,因为多分类版本可以处理二分类,并且为了简单起见,我们仅使用`multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为None.

        target_range (Optional[List]): 应限制输出变量的范围.当前忽略多目标回归.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 可重复性的种子.默认为42"""

    n_d: int = field(
        default=8,
        metadata={"help": "Dimension of the prediction layer (usually between 4 and 64)"},
    )
    n_a: int = field(
        default=8,
        metadata={"help": "Dimension of the attention layer (usually between 4 and 64)"},
    )
    n_steps: int = field(
        default=3,
        metadata={"help": ("Number of successive steps in the network (usually between 3 and 10)")},
    )
    gamma: float = field(
        default=1.3,
        metadata={"help": ("Float above 1, scaling factor for attention updates (usually between" " 1.0 and 2.0)")},
    )
    n_independent: int = field(
        default=2,
        metadata={"help": "Number of independent GLU layers in each GLU block (default 2)"},
    )
    n_shared: int = field(
        default=2,
        metadata={"help": "Number of shared GLU layers in each GLU block (default 2)"},
    )
    virtual_batch_size: int = field(
        default=128,
        metadata={"help": "Batch size for Ghost Batch Normalization"},
    )
    mask_type: str = field(
        default="sparsemax",
        metadata={
            "help": ("Either 'sparsemax' or 'entmax' : this is the masking function to use"),
            "choices": ["sparsemax", "entmax"],
        },
    )
    grouped_features: Optional[List[List[str]]] = field(
        default=None,
        metadata={
            "help": (
                "List of list of feature names to be grouped together. This allows the"
                " model to share it's attention accross feature inside a same group."
                " This can be especially useful when your preprocessing generates"
                " correlated or dependant features: like if you use a TF-IDF or a PCA"
                " on a text column. Note that feature importance will be exactly the"
                " same between features on a same group. Please also note that"
                " embeddings generated for a categorical variable are always inside a"
                " same group."
            )
        },
    )
    _module_src: str = field(default="models.tabnet")
    _model_name: str = field(default="TabNetModel")
    _config_name: str = field(default="TabNetModelConfig")

Bases: ModelConfig

Tab Transformer configuration.

Parameters:

Name Type Description Default
input_embed_dim int

The embedding dimension for the input categorical features. Defaults to 32

32
embedding_initialization Optional[str]

Initialization scheme for the embedding layers. Defaults to kaiming. Choices are: [kaiming_uniform,kaiming_normal].

'kaiming_uniform'
embedding_bias bool

Flag to turn on embedding bias. Defaults to False

False
share_embedding bool

The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole, along with embeddings for each unique value of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False

False
share_embedding_strategy Optional[str]

There are two strategies for adding shared embeddings. 1. add - A separate embedding for the feature is added to the embedding of the unique values of the feature. 2. fraction - A fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [add,fraction].

'fraction'
shared_embedding_fraction float

Fraction of the input_embed_dim to be reserved for the shared embedding. Should be less than one. Defaults to 0.25

0.25
num_heads int

The number of heads in the Multi-Headed Attention layer. Defaults to 8

8
num_attn_blocks int

The number of layers of stacked Multi-Headed Attention layers. Defaults to 6

6
transformer_head_dim Optional[int]

The number of hidden units in the Multi-Headed Attention layers. Defaults to None and will be the same as input_dim.

None
attn_dropout float

Dropout to be applied after Multi-Headed Attention. Defaults to 0.1

0.1
add_norm_dropout float

Dropout to be applied in the AddNorm layer. Defaults to 0.1

0.1
ff_dropout float

Dropout to be applied in the Positionwise FeedForward network. Defaults to 0.1

0.1
ff_hidden_multiplier int

Multiple by which the Positionwise FF layer scales the input. Defaults to 4

4
transformer_activation str

The activation type in the transformer feed forward layers. In addition to the default activations in PyTorch like ReLU, TanH, LeakyReLU, etc. (https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity), GEGLU, ReGLU and SwiGLU are also implemented (https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU

'GEGLU'
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the categorical embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, the continuous layer will be normalized by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy for classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
metrics_prob_input Optional[List]

A mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be the same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restrictions will be applied.

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/tab_transformer/config.py
@dataclass
class TabTransformerConfig(ModelConfig):
    """Tab Transformer 配置.

    Parameters:
        input_embed_dim (int): 输入分类特征的嵌入维度.默认为 32

        embedding_initialization (Optional[str]): 嵌入层的初始化方案.默认为 `kaiming`.可选值为: [`kaiming_uniform`,`kaiming_normal`].

        embedding_bias (bool): 是否开启嵌入偏置的标志.默认为 False

        share_embedding (bool): 在输入嵌入过程中开启共享嵌入的标志.其核心思想是为整个特征及其每个唯一值分别设置嵌入.更多详情请参阅 TabTransformer 论文的附录 A.默认为 False

        share_embedding_strategy (Optional[str]): 添加共享嵌入有两种策略.1. `add` - 为特征添加一个独立的嵌入,并将其与特征唯一值的嵌入相加.2. `fraction` - 输入嵌入的一部分保留给特征的共享嵌入.默认为 fraction.可选值为: [`add`,`fraction`].

        shared_embedding_fraction (float): 共享嵌入保留的 input_embed_dim 的比例.应小于 1.默认为 0.25

        num_heads (int): 多头注意力层中的头数.默认为 8

        num_attn_blocks (int): 堆叠的多头注意力层的层数.默认为 6

        transformer_head_dim (Optional[int]): 多头注意力层中的隐藏单元数.默认为 None,将与 input_dim 相同.

        attn_dropout (float): 多头注意力后应用的 dropout.默认为 0.1

        add_norm_dropout (float): AddNorm 层中应用的 dropout.默认为 0.1

        ff_dropout (float): 逐位置前馈网络中应用的 dropout.默认为 0.1

        ff_hidden_multiplier (int): 逐位置前馈层对输入的缩放倍数.默认为 4

        transformer_activation (str): 变换器前馈层中的激活类型.除了 PyTorch 中的默认激活函数如 ReLU、TanH、LeakyReLU 等(https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity),还实现了 GEGLU、ReGLU 和 SwiGLU(https://arxiv.org/pdf/2002.05202.pdf).默认为 GEGLU

        task (str): 指定问题是回归还是分类.`backbone` 是一种将模型视为生成特征的主干的任务.主要用于 SSL 及相关任务.可选值为: [`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为: [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2)

        embedding_dropout (float): 分类嵌入中应用的 dropout.默认为 0.0

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        learning_rate (float): 模型的学习率.默认为 1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保持为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,并且为了简化,我们仅使用 `multiclass`.

        metrics_prob_input (Optional[List]): 是配置中定义的分类指标的强制参数.这定义了指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None.

        target_range (Optional[List]): 限制输出变量的范围.目前多目标回归中忽略.通常用于回归问题.如果留空,将不应用任何限制

        seed (int): 用于可重复性的种子.默认为 42"""

    input_embed_dim: int = field(
        default=32,
        metadata={"help": "The embedding dimension for the input categorical features. Defaults to 32"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=False,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to False"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique values of that column. For more details refer"
            " to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    num_heads: int = field(
        default=8,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 8"},
    )
    num_attn_blocks: int = field(
        default=6,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 6"},
    )
    transformer_head_dim: Optional[int] = field(
        default=None,
        metadata={
            "help": "The number of hidden units in the Multi-Headed Attention layers."
            " Defaults to None and will be same as input_dim."
        },
    )
    attn_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied after Multi headed Attention. Defaults to 0.1"},
    )
    add_norm_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the AddNorm Layer. Defaults to 0.1"},
    )
    ff_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1"},
    )
    ff_hidden_multiplier: int = field(
        default=4,
        metadata={"help": "Multiple by which the Positionwise FF layer scales the input. Defaults to 4"},
    )
    transformer_activation: str = field(
        default="GEGLU",
        metadata={
            "help": "The activation type in the transformer feed forward layers."
            " In addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity,"
            " GEGLU, ReGLU and SwiGLU are also implemented(https://arxiv.org/pdf/2002.05202.pdf)."
            " Defaults to GEGLU",
        },
    )
    _module_src: str = field(default="models.tab_transformer")
    _model_name: str = field(default="TabTransformerModel")
    _backbone_name: str = field(default="TabTransformerBackbone")
    _config_name: str = field(default="TabTransformerConfig")

Base Model configuration.

Parameters:

Name Type Description Default
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as a default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2).

None
embedding_dropout float

Dropout to be applied to the categorical embedding. Defaults to 0.0.

0.0
batch_norm_continuous_input bool

If True, the continuous layer will be normalized by passing it through a BatchNorm layer.

True
virtual_batch_size Optional[int]

If not None, all BatchNorms will be converted to GhostBatchNorm with this virtual batch size. Defaults to None.

None
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification.

None
metrics Optional[List[str]]

The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy for classification and mean_squared_error for regression.

None
metrics_prob_input Optional[bool]

A mandatory parameter for classification metrics defined in the config. Defines whether the input to the metric function is the probability or the class. Length should be the same as the number of metrics. Defaults to None.

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well, and for simplicity we only use multiclass.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for regression problems. If left empty, no restrictions will be applied.

None
seed int

The seed for reproducibility. Defaults to 42.

42
Source code in src/pytorch_tabular/config/config.py
@dataclass
class ModelConfig:
    """基础模型配置.

    Parameters:
        task (str): 指定问题是回归还是分类.`backbone` 是一种将模型视为生成特征的主干的任务.主要用于内部SSL及相关任务.可选值为:[`regression`,`classification`,`backbone`].

        head (Optional[str]): 模型使用的头部.应为 `pytorch_tabular.models.common.heads` 中定义的头部之一.默认为 LinearHead.可选值为:[`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): 定义头部的配置字典.如果留空,将初始化为默认的线性头部.

        embedding_dims (Optional[List]): 每个分类列的嵌入维度列表,格式为 (基数, 嵌入维度).如果留空,将根据分类列的基数推断,规则为 min(50, (x + 1) // 2).

        embedding_dropout (float): 应用于分类嵌入的丢弃率.默认为 0.0.

        batch_norm_continuous_input (bool): 如果为 True,将通过 BatchNorm 层对连续层进行归一化.

        virtual_batch_size (Optional[int]): 如果不为 None,所有 BatchNorm 将被转换为 GhostBatchNorm,并指定虚拟批量大小.默认为 None.

        learning_rate (float): 模型的学习率.默认为 1e-3.

        loss (Optional[str]): 应用的损失函数.默认情况下,回归为 MSELoss,分类为 CrossEntropyLoss.除非你确定自己在做什么,否则请保持为 MSELoss 或 L1Loss 用于回归,CrossEntropyLoss 用于分类.

        metrics (Optional[List[str]]): 训练期间需要跟踪的指标列表.指标应为 ``torchmetrics`` 中实现的功能性指标之一.默认情况下,分类为 accuracy,回归为 mean_squared_error.

        metrics_prob_input (Optional[bool]): 配置中定义的分类指标的强制参数.定义指标函数的输入是概率还是类别.长度应与指标数量相同.默认为 None.

        metrics_params (Optional[List]): 传递给指标函数的参数.`task` 强制为 `multiclass`,因为多分类版本可以处理二分类,并且为了简化,我们仅使用 `multiclass`.

        target_range (Optional[List]): 限制输出变量的范围.当前忽略多目标回归.通常用于回归问题.如果留空,将不应用任何限制.

        seed (int): 用于可重复性的种子.默认为 42."""

    task: str = field(
        metadata={
            "help": "Specify whether the problem is regression or classification."
            " `backbone` is a task which considers the model as a backbone to generate features."
            " Mostly used internally for SSL and related tasks.",
            "choices": ["regression", "classification", "backbone"],
        }
    )

    head: Optional[str] = field(
        default="LinearHead",
        metadata={
            "help": "The head to be used for the model. Should be one of the heads defined"
            " in `pytorch_tabular.models.common.heads`. Defaults to  LinearHead",
            "choices": [None, "LinearHead", "MixtureDensityHead"],
        },
    )

    head_config: Optional[Dict] = field(
        default_factory=lambda: {"layers": ""},
        metadata={
            "help": "The config as a dict which defines the head."
            " If left empty, will be initialized as default linear head."
        },
    )
    embedding_dims: Optional[List] = field(
        default=None,
        metadata={
            "help": "The dimensions of the embedding for each categorical column as a list of tuples "
            "(cardinality, embedding_dim). If left empty, will infer using the cardinality of the "
            "categorical column using the rule min(50, (x + 1) // 2)"
        },
    )
    embedding_dropout: float = field(
        default=0.0,
        metadata={"help": "Dropout to be applied to the Categorical Embedding. Defaults to 0.0"},
    )
    batch_norm_continuous_input: bool = field(
        default=True,
        metadata={"help": "If True, we will normalize the continuous layer by passing it through a BatchNorm layer."},
    )

    learning_rate: float = field(
        default=1e-3,
        metadata={"help": "The learning rate of the model. Defaults to 1e-3."},
    )
    loss: Optional[str] = field(
        default=None,
        metadata={
            "help": "The loss function to be applied. By Default it is MSELoss for regression "
            "and CrossEntropyLoss for classification. Unless you are sure what you are doing, "
            "leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification"
        },
    )
    metrics: Optional[List[str]] = field(
        default=None,
        metadata={
            "help": "the list of metrics you need to track during training. The metrics should be one "
            "of the functional metrics implemented in ``torchmetrics``. To use your own metric, please "
            "use the `metric` param in the `fit` method By default, it is accuracy if classification "
            "and mean_squared_error for regression"
        },
    )
    metrics_prob_input: Optional[List[bool]] = field(
        default=None,
        metadata={
            "help": "Is a mandatory parameter for classification metrics defined in the config. This defines "
            "whether the input to the metric function is the probability or the class. Length should be same "
            "as the number of metrics. Defaults to None."
        },
    )
    metrics_params: Optional[List] = field(
        default=None,
        metadata={
            "help": "The parameters to be passed to the metrics function. `task` is forced to be `multiclass`` "
            "because the multiclass version can handle binary as well and for simplicity we are only using "
            "`multiclass`."
        },
    )
    target_range: Optional[List] = field(
        default=None,
        metadata={
            "help": "The range in which we should limit the output variable. "
            "Currently ignored for multi-target regression. Typically used for Regression problems. "
            "If left empty, will not apply any restrictions"
        },
    )

    virtual_batch_size: Optional[int] = field(
        default=None,
        metadata={
            "help": "If not None, all BatchNorms will be converted to GhostBatchNorm's "
            " with this virtual batch size. Defaults to None"
        },
    )

    seed: int = field(
        default=42,
        metadata={"help": "The seed for reproducibility. Defaults to 42"},
    )

    _module_src: str = field(default="models")
    _model_name: str = field(default="Model")
    _backbone_name: str = field(default="Backbone")
    _config_name: str = field(default="Config")

    def __post_init__(self):
        if self.task == "regression":
            self.loss = self.loss or "MSELoss"
            self.metrics = self.metrics or ["mean_squared_error"]
            self.metrics_params = [{} for _ in self.metrics] if self.metrics_params is None else self.metrics_params
            self.metrics_prob_input = [False for _ in self.metrics]  # not used in Regression. just for compatibility
        elif self.task == "classification":
            self.loss = self.loss or "CrossEntropyLoss"
            self.metrics = self.metrics or ["accuracy"]
            self.metrics_params = [{} for _ in self.metrics] if self.metrics_params is None else self.metrics_params
            self.metrics_prob_input = (
                [False for _ in self.metrics] if self.metrics_prob_input is None else self.metrics_prob_input
            )
        elif self.task == "backbone":
            self.loss = None
            self.metrics = None
            self.metrics_params = None
            if self.head is not None:
                logger.warning("`head` is not a valid parameter for backbone task. Making `head=None`")
                self.head = None
                self.head_config = None
        else:
            raise NotImplementedError(
                f"{self.task} is not a valid task. Should be one of "
                f"{self.__dataclass_fields__['task'].metadata['choices']}"
            )
        if self.metrics is not None:
            assert len(self.metrics) == len(self.metrics_params), "metrics and metric_params should have same length"

        if self.task != "backbone":
            assert self.head in dir(heads.blocks), f"{self.head} is not a valid head"
            if hasattr(self, "_config_name") and self._config_name != "MDNConfig":
                assert self.head != "MixtureDensityHead", "MixtureDensityHead is not supported as a head for regular "
                "models. Use `MDNConfig` instead. Please see Probabilistic Regression with MDN How-to-Guide in "
                "documentation for the right usage."
            _head_callable = getattr(heads.blocks, self.head)
            ideal_head_config = _head_callable._config_template
            invalid_keys = set(self.head_config.keys()) - set(ideal_head_config.__dict__.keys())
            assert len(invalid_keys) == 0, f"`head_config` has some invalid keys: {invalid_keys}"

        # For Custom models, setting these values for compatibility
        if not hasattr(self, "_config_name"):
            self._config_name = type(self).__name__
        if not hasattr(self, "_model_name"):
            self._model_name = re.sub("[Cc]onfig", "Model", self._config_name)
        if not hasattr(self, "_backbone_name"):
            self._backbone_name = re.sub("[Cc]onfig", "Backbone", self._config_name)
        _validate_choices(self)
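
A quick sketch of the task-dependent defaults that `__post_init__` fills in, using `CategoryEmbeddingModelConfig` (one of the concrete subclasses) for illustration:

from pytorch_tabular.models import CategoryEmbeddingModelConfig

cfg = CategoryEmbeddingModelConfig(task="classification")
print(cfg.loss)                # "CrossEntropyLoss"
print(cfg.metrics)             # ["accuracy"]
print(cfg.metrics_prob_input)  # [False]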

Model Classes

Bases: BaseModel

Source code in src/pytorch_tabular/models/autoint/autoint.py
class AutoIntModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = AutoIntBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

Bases: BaseModel

Source code in src/pytorch_tabular/models/category_embedding/category_embedding_model.py
class CategoryEmbeddingModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = CategoryEmbeddingBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

Bases: BaseModel

Source code in src/pytorch_tabular/models/danet/danet.py
class DANetModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        if self.hparams.virtual_batch_size > self.hparams.batch_size:
            warnings.warn(
                f"virtual_batch_size({self.hparams.virtual_batch_size}) is greater "
                f"than batch_size ({self.hparams.batch_size}). Setting virtual_batch_size "
                f"to {self.hparams.batch_size}. DANet uses Ghost Batch Normalization, "
                f"which works best when virtual_batch_size is small. Consider setting "
                "virtual_batch_size to something like 256 or 512."
            )
            self.hparams.virtual_batch_size = self.hparams.batch_size
        # Backbone
        self._backbone = DANetBackbone(
            cat_embedding_dims=self.hparams.embedding_dims,
            n_continuous_features=self.hparams.continuous_dim,
            n_layers=self.hparams.n_layers,
            abstlay_dim_1=self.hparams.abstlay_dim_1,
            abstlay_dim_2=self.hparams.abstlay_dim_2,
            k=self.hparams.k,
            dropout_rate=self.hparams.dropout_rate,
            block_activation=getattr(nn, self.hparams.block_activation)(),
            virtual_batch_size=self.hparams.virtual_batch_size,
            embedding_dropout=self.hparams.embedding_dropout,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()
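
A sketch of the Ghost Batch Normalization sizing rule enforced by the warning above: the virtual batch size is effectively clamped to the real batch size.

batch_size, virtual_batch_size = 512, 1024
virtual_batch_size = min(virtual_batch_size, batch_size)  # 512; small values like 256 work best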

Bases: BaseModel

Source code in src/pytorch_tabular/models/ft_transformer/ft_transformer.py
class FTTransformerModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = FTTransformerBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    def feature_importance(self):
        if self.hparams.attn_feature_importance:
            return super().feature_importance()
        else:
            raise ValueError("If you want feature importance, `attn_feature_importance` should be `True`.")
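
`feature_importance()` is gated on the backbone retaining attention weights. A configuration sketch (the flag name follows the attribute checked in the listing above, assumed to live on FTTransformerConfig):

from pytorch_tabular.models import FTTransformerConfig

cfg = FTTransformerConfig(task="classification", attn_feature_importance=True)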

Bases: BaseModel

Source code in src/pytorch_tabular/models/gandalf/gandalf.py
class GANDALFModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = GANDALFBackbone(
            cat_embedding_dims=self.hparams.embedding_dims,
            n_continuous_features=self.hparams.continuous_dim,
            gflu_stages=self.hparams.gflu_stages,
            gflu_dropout=self.hparams.gflu_dropout,
            gflu_feature_init_sparsity=self.hparams.gflu_feature_init_sparsity,
            learnable_sparsity=self.hparams.learnable_sparsity,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
            embedding_dropout=self.hparams.embedding_dropout,
            virtual_batch_size=self.hparams.virtual_batch_size,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self.T0 = nn.Parameter(torch.rand(self.hparams.output_dim), requires_grad=True)
        self._head = nn.Sequential(self._get_head_from_config(), Add(self.T0))

    def data_aware_initialization(self, datamodule):
        if self.hparams.task == "regression":
            logger.info("Data Aware Initialization of T0")
            # Need a big batch to initialize properly
            alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
            batch = next(iter(alt_loader))
            self.T0.data = torch.mean(batch["target"], dim=0)
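
The data-aware initialization above sets T0 to the per-target mean of one large batch, so the additive output starts near the target scale. The core computation, in isolation:

import torch

targets = torch.tensor([[10.0], [12.0], [14.0]])  # stand-in for batch["target"]
T0 = torch.mean(targets, dim=0)                   # tensor([12.])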

Bases: BaseModel

Source code in src/pytorch_tabular/models/gate/gate_model.py
class GatedAdditiveTreeEnsembleModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = GatedAdditiveTreesBackbone(
            n_continuous_features=self.hparams.continuous_dim,
            cat_embedding_dims=self.hparams.embedding_dims,
            gflu_stages=self.hparams.gflu_stages,
            gflu_dropout=self.hparams.gflu_dropout,
            num_trees=self.hparams.num_trees,
            tree_depth=self.hparams.tree_depth,
            tree_dropout=self.hparams.tree_dropout,
            binning_activation=self.hparams.binning_activation,
            feature_mask_function=self.hparams.feature_mask_function,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
            chain_trees=self.hparams.chain_trees,
            tree_wise_attention=self.hparams.tree_wise_attention,
            tree_wise_attention_dropout=self.hparams.tree_wise_attention_dropout,
            gflu_feature_init_sparsity=self.hparams.gflu_feature_init_sparsity,
            tree_feature_init_sparsity=self.hparams.tree_feature_init_sparsity,
            virtual_batch_size=self.hparams.virtual_batch_size,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        if self.hparams.num_trees == 0:
            self.T0 = nn.Parameter(torch.rand(self.hparams.output_dim), requires_grad=True)
            self._head = nn.Sequential(self._get_head_from_config(), Add(self.T0))
        else:
            self._head = CustomHead(self.backbone.output_dim, self.hparams)

    def data_aware_initialization(self, datamodule):
        if self.hparams.task == "regression":
            logger.info("Data Aware Initialization of T0")
            # Need a big batch to initialize properly
            alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
            batch = next(iter(alt_loader))
            t0 = torch.mean(batch["target"], dim=0)
            if self.hparams.num_trees != 0:
                self.head.T0.data = t0
            else:
                self.T0.data = t0

Bases: BaseModel

Source code in src/pytorch_tabular/models/mixture_density/mdn.py
class MDNModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
        self.inferred_config = kwargs["inferred_config"]
        assert config.task == "regression", "MDN is only implemented for Regression"
        super().__init__(config, **kwargs)
        assert self.hparams.output_dim == 1, "MDN is not implemented for multi-targets"
        if config.target_range is not None:
            logger.warning("MDN does not use target range. Ignoring it.")
        self._val_output = []

    def _get_head_from_config(self):
        _head_callable = getattr(blocks, self.hparams.head)
        self.hparams.head_config.input_dim = self.backbone.output_dim
        return _head_callable(
            config=_head_callable._config_template(**self.hparams.head_config),
        )  # output_dim auto-calculated from other configs

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        callable, config = (
            self.hparams.backbone_config_class,
            self.hparams.backbone_config_params,
        )
        try:
            callable = getattr(models, callable)
        except ModuleNotFoundError as e:
            logger.error(
                "`config class` in `backbone_config` is not valid."
                " The config class should be a valid module path from `models`."
                " e.g. `ft_transformer.FTTransformerConfig`."
            )
            raise e
        assert issubclass(callable, ModelConfig), "`config_class` should be a subclass of `ModelConfig`"
        backbone_config = callable(**config)
        backbone_callable = getattr_nested(backbone_config._module_src, backbone_config._backbone_name)
        # Merging the config and inferred config
        backbone_config = safe_merge_config(OmegaConf.structured(backbone_config), self.inferred_config)
        self._backbone = backbone_callable(backbone_config)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    # Redefining forward because TabTransformer flow is slightly different
    def forward(self, x: Dict):
        if isinstance(self.backbone, TabTransformerBackbone):
            if self.hparams.categorical_dim > 0:
                x_cat = self.embed_input({"categorical": x["categorical"]})
            else:
                x_cat = None
            x = self.compute_backbone({"categorical": x_cat, "continuous": x["continuous"]})
        else:
            x = self.embedding_layer(x)
            x = self.compute_backbone(x)
        return self.compute_head(x)

    # Redefining compute_backbone because the TabTransformer flow is slightly different

    def compute_backbone(self, x: Union[Dict, torch.Tensor]):
        # Returns output
        if isinstance(self.backbone, TabTransformerBackbone):
            x = self.backbone(x["categorical"], x["continuous"])
        else:
            x = self.backbone(x)
        return x

    def compute_head(self, x: Tensor):
        pi, sigma, mu = self.head(x)
        return {"pi": pi, "sigma": sigma, "mu": mu, "backbone_features": x}

    def predict(self, x: Dict):
        ret_value = self.forward(x)
        return self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])

    def sample(self, x: Dict, n_samples: Optional[int] = None, ret_model_output=False):
        ret_value = self.forward(x)
        samples = self.head.generate_samples(ret_value["pi"], ret_value["sigma"], ret_value["mu"], n_samples)
        if ret_model_output:
            return samples, ret_value
        else:
            return samples

    def calculate_loss(self, y, pi, sigma, mu, tag="train"):
        # NLL Loss
        log_prob = self.head.log_prob(pi, sigma, mu, y)
        loss = torch.mean(-log_prob)
        if self.head.hparams.weight_regularization is not None:
            sigma_l1_reg = 0
            pi_l1_reg = 0
            mu_l1_reg = 0
            if self.head.hparams.lambda_sigma > 0:
                # Weight Regularization Sigma
                sigma_params = torch.cat([x.view(-1) for x in self.head.sigma.parameters()])
                sigma_l1_reg = self.head.hparams.lambda_sigma * torch.norm(
                    sigma_params, self.head.hparams.weight_regularization
                )
            if self.head.hparams.lambda_pi > 0:
                pi_params = torch.cat([x.view(-1) for x in self.head.pi.parameters()])
                pi_l1_reg = self.head.hparams.lambda_pi * torch.norm(pi_params, self.head.hparams.weight_regularization)
            if self.head.hparams.lambda_mu > 0:
                mu_params = torch.cat([x.view(-1) for x in self.head.mu.parameters()])
                mu_l1_reg = self.head.hparams.lambda_mu * torch.norm(mu_params, self.head.hparams.weight_regularization)

            loss = loss + sigma_l1_reg + pi_l1_reg + mu_l1_reg
        self.log(
            f"{tag}_loss",
            loss,
            on_epoch=(tag == "valid") or (tag == "test"),
            on_step=(tag == "train"),
            # on_step=False,
            logger=True,
            prog_bar=True,
        )
        return loss

    def training_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        loss = self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="train")
        if self.head.hparams.speedup_training:
            pass
        else:
            y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
            self.calculate_metrics(y, y_hat, tag="train")
        return loss

    def validation_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="valid")
        y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
        self.calculate_metrics(y, y_hat, tag="valid")
        return y_hat, y, ret_value

    def test_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="test")
        y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
        self.calculate_metrics(y, y_hat, tag="test")
        return y_hat, y

    def on_validation_batch_end(self, outputs, batch, batch_idx: int) -> None:
        self._val_output.append(outputs)
        super().on_validation_batch_end(outputs, batch, batch_idx)

    def on_validation_epoch_end(self) -> None:
        pi = [
            nn.functional.gumbel_softmax(output[2]["pi"], tau=self.head.hparams.softmax_temperature, dim=-1)
            for output in self._val_output
        ]
        pi = torch.cat(pi).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_pi_{i}",
                pi[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )

        mu = [output[2]["mu"] for output in self._val_output]
        mu = torch.cat(mu).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_mu_{i}",
                mu[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )

        sigma = [output[2]["sigma"] for output in self._val_output]
        sigma = torch.cat(sigma).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_sigma_{i}",
                sigma[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )
        if self.do_log_logits:
            logits = [output[0] for output in self._val_output]
            logits = torch.cat(logits).detach().cpu()
            fig = self.create_plotly_histogram(logits.unsqueeze(1), "logits")
            wandb.log(
                {
                    "valid_logits": fig,
                    "global_step": self.global_step,
                },
                commit=False,
            )
            if self.head.hparams.log_debug_plot:
                fig = self.create_plotly_histogram(pi, "pi", bin_dict={"start": 0.0, "end": 1.0, "size": 0.1})
                wandb.log(
                    {
                        "valid_pi": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )

                fig = self.create_plotly_histogram(mu, "mu")
                wandb.log(
                    {
                        "valid_mu": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )

                fig = self.create_plotly_histogram(sigma, "sigma")
                wandb.log(
                    {
                        "valid_sigma": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )
        self._val_output = []
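
The MDN head returns mixture parameters (pi, sigma, mu) rather than a single logit. One common way to reduce them to a point estimate, which is the spirit of `generate_point_predictions` above, is the pi-weighted mean of the component means (a sketch, not the library's exact implementation):

import torch

pi = torch.softmax(torch.randn(8, 3), dim=-1)  # mixture weights per sample
mu = torch.randn(8, 3)                         # component means per sample
point_prediction = (pi * mu).sum(dim=-1)       # shape: (8,)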

Bases: BaseModel

Source code in src/pytorch_tabular/models/node/node_model.py
class NODEModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    def subset(self, x):
        return x[..., : self.hparams.output_dim].mean(dim=-2)

    def data_aware_initialization(self, datamodule):
        """执行针对 NODE 的数据感知初始化."""
        logger.info(
            "Data Aware Initialization of NODE using a forward pass with "
            f"{self.hparams.data_aware_init_batch_size} batch size...."
        )
        # Need a big batch to initialize properly
        alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
        batch = next(iter(alt_loader))
        for k, v in batch.items():
            if isinstance(v, list) and (len(v) == 0):
                # Skipping empty list
                continue
            # batch[k] = v.to("cpu" if self.config.gpu == 0 else "cuda")
            batch[k] = v.to(self.device)

        # single forward pass to initialize the ODST
        with torch.no_grad():
            self(batch)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        self._backbone = NODEBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # average first n channels of every tree, where n is the number of output targets for regression
        # and number of classes for classification
        # Not using config head because NODE has a specific head
        warnings.warn("Ignoring head config because NODE has a specific head which subsets the tree outputs")
        self._head = Lambda(self.subset)

data_aware_initialization(datamodule)

Performs data-aware initialization for NODE.

Source code in src/pytorch_tabular/models/node/node_model.py
def data_aware_initialization(self, datamodule):
    """执行针对 NODE 的数据感知初始化."""
    logger.info(
        "Data Aware Initialization of NODE using a forward pass with "
        f"{self.hparams.data_aware_init_batch_size} batch size...."
    )
    # Need a big batch to initialize properly
    alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
    batch = next(iter(alt_loader))
    for k, v in batch.items():
        if isinstance(v, list) and (len(v) == 0):
            # Skipping empty list
            continue
        # batch[k] = v.to("cpu" if self.config.gpu == 0 else "cuda")
        batch[k] = v.to(self.device)

    # single forward pass to initialize the ODST
    with torch.no_grad():
        self(batch)
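
A standalone sketch of the device transfer in the loop above: every tensor in the batch dict is moved to the model's device, while empty lists are skipped.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = {"continuous": torch.randn(8, 4), "target": torch.randn(8, 1), "categorical": []}
batch = {k: v if isinstance(v, list) and len(v) == 0 else v.to(device) for k, v in batch.items()}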

Bases: BaseModel

Source code in src/pytorch_tabular/models/tabnet/tabnet_model.py
class TabNetModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        assert config.task in [
            "regression",
            "classification",
        ], "TabNet is only implemented for Regression and Classification"
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # TabNet has its own embedding layer.
        # So we are not using the embedding layer from BaseModel
        self._embedding_layer = nn.Identity()
        self._backbone = TabNetBackbone(self.hparams)
        setattr(self.backbone, "output_dim", self.hparams.output_dim)
        # TabNet has its own head
        self._head = nn.Identity()

    def extract_embedding(self):
        raise ValueError("Extracting Embeddings is not supported by Tabnet. Please use another" " compatible model")
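
An end-to-end sketch wiring TabNetModelConfig into the high-level TabularModel API (assumes `train` is a pandas DataFrame with columns "x1", "x2" and "target"):

from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models import TabNetModelConfig

tabular_model = TabularModel(
    data_config=DataConfig(target=["target"], continuous_cols=["x1", "x2"]),
    model_config=TabNetModelConfig(task="regression"),
    optimizer_config=OptimizerConfig(),
    trainer_config=TrainerConfig(max_epochs=5),
)
# tabular_model.fit(train=train)  # TabNet brings its own embeddings and head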

Bases: BaseModel

Source code in src/pytorch_tabular/models/tab_transformer/tab_transformer.py
class TabTransformerModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = TabTransformerBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    # Redefining forward because this model flow is slightly different
    def forward(self, x: Dict):
        if self.hparams.categorical_dim > 0:
            x_cat = self.embed_input({"categorical": x["categorical"]})
        else:
            x_cat = None
        x = self.compute_backbone({"categorical": x_cat, "continuous": x["continuous"]})
        return self.compute_head(x)

    # Redefining compute_backbone because this model flow is slightly different
    def compute_backbone(self, x: Dict):
        # Returns output
        x = self.backbone(x["categorical"], x["continuous"])
        return x
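
The flow above embeds only the categorical inputs before the backbone; continuous features bypass the attention blocks and are joined afterwards. A shape-level sketch (dimensions are illustrative, not from the library):

import torch

batch_size, n_cat, n_cont, embed_dim = 32, 3, 4, 16
x_cat_embedded = torch.randn(batch_size, n_cat, embed_dim)  # from embed_input
x_cont = torch.randn(batch_size, n_cont)                    # skips the transformer
# backbone(x_cat_embedded, x_cont) -> features of shape
# (batch_size, n_cat * embed_dim + n_cont), which feed the head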

Base Model Class

Bases: LightningModule

Source code in src/pytorch_tabular/models/base_model.py
class BaseModel(pl.LightningModule, metaclass=ABCMeta):
    def __init__(
        self,
        config: DictConfig,
        custom_loss: Optional[torch.nn.Module] = None,
        custom_metrics: Optional[List[Callable]] = None,
        custom_metrics_prob_inputs: Optional[List[bool]] = None,
        custom_optimizer: Optional[torch.optim.Optimizer] = None,
        custom_optimizer_params: Dict = {},
        **kwargs,
    ):
        """    PyTorch Tabular 的基础模型.

Parameters:
    config (DictConfig): 模型的配置.
    custom_loss (Optional[torch.nn.Module], optional): 自定义损失函数.默认为 None.
    custom_metrics (Optional[List[Callable]], optional): 自定义指标列表.默认为 None.
    custom_metrics_prob_inputs (Optional[List[bool]], optional): 布尔值列表,指示指标是否需要概率输入.默认为 None.
    custom_optimizer (Optional[torch.optim.Optimizer], optional): 自定义优化器,可为可调用对象或导入的字符串.默认为 None.
    custom_optimizer_params (Dict, optional): 自定义优化器参数的字典.默认为 {}.
    kwargs (Dict, optional): 其他关键字参数.
"""
        super().__init__()
        assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
        inferred_config = kwargs["inferred_config"]
        # Merging the config and inferred config
        config = safe_merge_config(config, inferred_config)
        self.custom_loss = custom_loss
        self.custom_metrics = custom_metrics
        self.custom_metrics_prob_inputs = custom_metrics_prob_inputs
        self.custom_optimizer = custom_optimizer
        self.custom_optimizer_params = custom_optimizer_params
        self.kwargs = kwargs
        # Updating config with custom parameters for experiment tracking
        if self.custom_loss is not None:
            config.loss = str(self.custom_loss)
        if self.custom_metrics is not None:
            # Adding metrics to config for hparams logging and tracking
            config.metrics = []
            config.metrics_params = []
            for metric in self.custom_metrics:
                if isinstance(metric, partial):
                    # extracting func names from partial functions
                    config.metrics.append(metric.func.__name__)
                    config.metrics_params.append(metric.keywords)
                else:
                    config.metrics.append(metric.__name__)
                    config.metrics_params.append(vars(metric))
            if config.task == "classification":
                config.metrics_prob_input = self.custom_metrics_prob_inputs
                for i, mp in enumerate(config.metrics_params):
                    mp.sub_params_list = []
                    for j, num_classes in enumerate(inferred_config.output_cardinality):
                        config.metrics_params[i].sub_params_list.append(
                            OmegaConf.create(
                                {
                                    "task": mp.get("task", "multiclass"),
                                    "num_classes": mp.get("num_classes", num_classes),
                                }
                            )
                        )

        # Updating default metrics in config
        elif config.task == "classification":
            # Adding metric_params to config for classification task
            for i, mp in enumerate(config.metrics_params):
                mp.sub_params_list = []
                for j, num_classes in enumerate(inferred_config.output_cardinality):
                    # config.metrics_params[i][j]["task"] = mp.get("task", "multiclass")
                    # config.metrics_params[i][j]["num_classes"] = mp.get("num_classes", num_classes)

                    config.metrics_params[i].sub_params_list.append(
                        OmegaConf.create(
                            {"task": mp.get("task", "multiclass"), "num_classes": mp.get("num_classes", num_classes)}
                        )
                    )

                    if config.metrics[i] in (
                        "accuracy",
                        "precision",
                        "recall",
                        "precision_recall",
                        "specificity",
                        "f1_score",
                        "fbeta_score",
                    ):
                        config.metrics_params[i].sub_params_list[j]["top_k"] = mp.get("top_k", 1)

        if self.custom_optimizer is not None:
            config.optimizer = str(self.custom_optimizer.__class__.__name__)
        if len(self.custom_optimizer_params) > 0:
            config.optimizer_params = self.custom_optimizer_params
        self.save_hyperparameters(config)
        # The concatenated output dim of the embedding layer
        self._build_network()
        self._setup_loss()
        self._setup_metrics()
        self._check_and_verify()
        self.do_log_logits = (
            hasattr(self.hparams, "log_logits") and self.hparams.log_logits and self.hparams.log_target == "wandb"
        )
        if self.do_log_logits:
            self._val_logits = []
        if not WANDB_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Wandb is not installed. Please install wandb to log logits. "
                "You can install wandb using pip install wandb or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )
        if not PLOTLY_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Plotly is not installed. Please install plotly to log logits. "
                "You can install plotly using pip install plotly or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )

    @abstractmethod
    def _build_network(self):
        pass

    @property
    def backbone(self):
        raise NotImplementedError("backbone property needs to be implemented by inheriting classes")

    @property
    def embedding_layer(self):
        raise NotImplementedError("embedding_layer property needs to be implemented by inheriting classes")

    @property
    def head(self):
        raise NotImplementedError("head property needs to be implemented by inheriting classes")

    def _check_and_verify(self):
        assert hasattr(self, "backbone"), "Model has no attribute called `backbone`"
        assert hasattr(self.backbone, "output_dim"), "Backbone needs to have attribute `output_dim`"
        assert hasattr(self, "head"), "Model has no attribute called `head`"

    def _get_head_from_config(self):
        _head_callable = getattr(blocks, self.hparams.head)
        return _head_callable(
            in_units=self.backbone.output_dim,
            output_dim=self.hparams.output_dim,
            config=_head_callable._config_template(**self.hparams.head_config),
        )  # output_dim auto-calculated from other configs

    def _setup_loss(self):
        if self.custom_loss is None:
            try:
                self.loss = getattr(nn, self.hparams.loss)()
            except AttributeError as e:
                logger.error(f"{self.hparams.loss} is not a valid loss defined in the torch.nn module")
                raise e
        else:
            self.loss = self.custom_loss

    def _setup_metrics(self):
        if self.custom_metrics is None:
            self.metrics = []
            task_module = torchmetrics.functional
            for metric in self.hparams.metrics:
                try:
                    self.metrics.append(getattr(task_module, metric))
                except AttributeError as e:
                    logger.error(
                        f"{metric} is not a valid functional metric defined in the torchmetrics.functional module"
                    )
                    raise e
        else:
            self.metrics = self.custom_metrics

    def calculate_loss(self, output: Dict, y: torch.Tensor, tag: str) -> torch.Tensor:
        """计算模型的损失.

Parameters:
    output (Dict): 模型输出的字典
    y (torch.Tensor): 目标张量
    tag (str): 用于日志记录的标签

Returns:
    torch.Tensor: 损失值
"""
        y_hat = output["logits"]
        reg_terms = [k for k, v in output.items() if "regularization" in k]
        reg_loss = 0
        for t in reg_terms:
            # Log only if non-zero
            if output[t] != 0:
                reg_loss += output[t]
                self.log(
                    f"{tag}_{t}_loss",
                    output[t],
                    on_epoch=True,
                    on_step=False,
                    logger=True,
                    prog_bar=False,
                )
        if self.hparams.task == "regression":
            computed_loss = reg_loss
            for i in range(self.hparams.output_dim):
                _loss = self.loss(y_hat[:, i], y[:, i])
                computed_loss += _loss
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_loss_{i}",
                        _loss,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                    )
        else:
            # TODO loss fails with batch size of 1?
            computed_loss = reg_loss
            start_index = 0
            for i in range(len(self.hparams.output_cardinality)):
                end_index = start_index + self.hparams.output_cardinality[i]
                _loss = self.loss(y_hat[:, start_index:end_index], y[:, i])
                computed_loss += _loss
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_loss_{i}",
                        _loss,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                    )
                start_index = end_index
        self.log(
            f"{tag}_loss",
            computed_loss,
            on_epoch=(tag in ["valid", "test"]),
            on_step=(tag == "train"),
            # on_step=False,
            logger=True,
            prog_bar=True,
        )
        return computed_loss

    def calculate_metrics(self, y: torch.Tensor, y_hat: torch.Tensor, tag: str) -> List[torch.Tensor]:
        """    计算模型的各项指标.

Parameters:
    y (torch.Tensor): 目标张量

    y_hat (torch.Tensor): 预测张量

    tag (str): 用于日志记录的标签

Returns:
    List[torch.Tensor]: 指标值列表
"""
        metrics = []
        for metric, metric_str, prob_inp, metric_params in zip(
            self.metrics,
            self.hparams.metrics,
            self.hparams.metrics_prob_input,
            self.hparams.metrics_params,
        ):
            if self.hparams.task == "regression":
                _metrics = []
                for i in range(self.hparams.output_dim):
                    name = metric.func.__name__ if isinstance(metric, partial) else metric.__name__
                    if name == torchmetrics.functional.mean_squared_log_error.__name__:
                        # MSLE should only be used in strictly positive targets. It is undefined otherwise
                        _metric = metric(
                            torch.clamp(y_hat[:, i], min=0),
                            torch.clamp(y[:, i], min=0),
                            **metric_params,
                        )
                    else:
                        _metric = metric(y_hat[:, i], y[:, i], **metric_params)
                    if self.hparams.output_dim > 1:
                        self.log(
                            f"{tag}_{metric_str}_{i}",
                            _metric,
                            on_epoch=True,
                            on_step=False,
                            logger=True,
                            prog_bar=False,
                        )
                    _metrics.append(_metric)
                avg_metric = torch.stack(_metrics, dim=0).sum()
            else:
                _metrics = []
                start_index = 0
                for i, cardinality in enumerate(self.hparams.output_cardinality):
                    end_index = start_index + cardinality
                    y_hat_i = nn.Softmax(dim=-1)(y_hat[:, start_index:end_index].squeeze())
                    if prob_inp:
                        _metric = metric(y_hat_i, y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i])
                    else:
                        _metric = metric(
                            torch.argmax(y_hat_i, dim=-1), y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i]
                        )
                    if len(self.hparams.output_cardinality) > 1:
                        self.log(
                            f"{tag}_{metric_str}_{i}",
                            _metric,
                            on_epoch=True,
                            on_step=False,
                            logger=True,
                            prog_bar=False,
                        )
                    _metrics.append(_metric)
                    start_index = end_index
                avg_metric = torch.stack(_metrics, dim=0).sum()
            metrics.append(avg_metric)
            self.log(
                f"{tag}_{metric_str}",
                avg_metric,
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=True,
            )
        return metrics

    def data_aware_initialization(self, datamodule):
        """在定义模型时执行数据感知初始化."""
        pass

    def compute_backbone(self, x: Dict) -> torch.Tensor:
        # Returns output
        x = self.backbone(x)
        return x

    def embed_input(self, x: Dict) -> torch.Tensor:
        return self.embedding_layer(x)

    def apply_output_sigmoid_scaling(self, y_hat: torch.Tensor) -> torch.Tensor:
        """对模型输出应用Sigmoid缩放(如果任务是回归且目标范围已定义).

Parameters:
    y_hat (torch.Tensor): 模型的输出

Returns:
    torch.Tensor: 应用了Sigmoid缩放的模型输出
"""
        if (self.hparams.task == "regression") and (self.hparams.target_range is not None):
            for i in range(self.hparams.output_dim):
                y_min, y_max = self.hparams.target_range[i]
                y_hat[:, i] = y_min + nn.Sigmoid()(y_hat[:, i]) * (y_max - y_min)
        return y_hat

    def pack_output(self, y_hat: torch.Tensor, backbone_features: torch.tensor) -> Dict[str, Any]:
        """打包模型的输出.

Parameters:
    y_hat (torch.Tensor): 模型的输出

    backbone_features (torch.tensor): 主干网络的特征

Returns:
    打包后的模型输出
"""
        # If self.head is the Identity function, it means we cannot extract backbone features,
        # because the model cannot be divided into a backbone and a head (e.g. TabNet)
        if type(self.head) is nn.Identity:
            return {"logits": y_hat}
        return {"logits": y_hat, "backbone_features": backbone_features}

    def compute_head(self, backbone_features: Tensor) -> Dict[str, Any]:
        """    计算模型的头部.

Parameters:
    backbone_features (Tensor): 主干网络的特征

Returns:
    模型的输出
"""
        y_hat = self.head(backbone_features)
        y_hat = self.apply_output_sigmoid_scaling(y_hat)
        return self.pack_output(y_hat, backbone_features)

    def forward(self, x: Dict) -> Dict[str, Any]:
        """   模型的前向传播.

Parameters:
    x (Dict): 模型的输入,包含'continuous'和'categorical'键
"""
        x = self.embed_input(x)
        x = self.compute_backbone(x)
        return self.compute_head(x)

    def predict(self, x: Dict, ret_model_output: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Dict]]:
        """    预测模型的输出.

Parameters:
    x (Dict): 模型的输入,包含'continuous'和'categorical'键

    ret_model_output (bool): 如果为True,方法返回模型的输出

Returns:
    模型的输出
"""
        assert self.hparams.task != "ssl", "It's not allowed to use the method predict in case of ssl task"
        ret_value = self.forward(x)
        if ret_model_output:
            return ret_value.get("logits"), ret_value
        return ret_value.get("logits")

    def forward_pass(self, batch):
        return self(batch), None

    def extract_embedding(self):
        """提取模型的嵌入.

这在 `CategoricalEmbeddingTransformer` 中使用
"""
        if self.hparams.categorical_dim > 0:
            if not isinstance(self.embedding_layer, PreEncoded1dLayer):
                return self.embedding_layer.cat_embedding_layers
            else:
                raise ValueError(
                    "Cannot extract embedding for PreEncoded1dLayer. Please use a different embedding layer."
                )
        else:
            raise ValueError(
                "Model has been trained with no categorical feature and therefore can't be used"
                " as a Categorical Encoder"
            )

    def training_step(self, batch, batch_idx):
        output, y = self.forward_pass(batch)
        # y is not None for the SSL task. For the rest of the tasks,
        # the target is fetched from the batch
        y = batch["target"] if y is None else y
        y_hat = output["logits"]
        loss = self.calculate_loss(output, y, tag="train")
        self.calculate_metrics(y, y_hat, tag="train")
        return loss

    def validation_step(self, batch, batch_idx):
        with torch.no_grad():
            output, y = self.forward_pass(batch)
            # y is not None for the SSL task. For the rest of the tasks,
            # the target is fetched from the batch
            y = batch["target"] if y is None else y
            y_hat = output["logits"]
            self.calculate_loss(output, y, tag="valid")
            self.calculate_metrics(y, y_hat, tag="valid")
        return y_hat, y

    def test_step(self, batch, batch_idx):
        with torch.no_grad():
            output, y = self.forward_pass(batch)
            # y is not None for the SSL task. For the rest of the tasks,
            # the target is fetched from the batch
            y = batch["target"] if y is None else y
            y_hat = output["logits"]
            self.calculate_loss(output, y, tag="test")
            self.calculate_metrics(y, y_hat, tag="test")
        return y_hat, y

    def configure_optimizers(self):
        if self.custom_optimizer is None:
            # Loading from the config
            try:
                self._optimizer = _create_optimizer(self.hparams.optimizer)
                opt = self._optimizer(
                    self.parameters(),
                    lr=self.hparams.learning_rate,
                    **self.hparams.optimizer_params,
                )
            except AttributeError as e:
                logger.error(f"{self.hparams.optimizer} is not a valid optimizer defined in the torch.optim module")
                raise e
        else:
            # Loading from custom fit arguments
            self._optimizer = _create_optimizer(self.custom_optimizer)

            opt = self._optimizer(
                self.parameters(),
                lr=self.hparams.learning_rate,
                **self.custom_optimizer_params,
            )
        if self.hparams.lr_scheduler is not None:
            try:
                self._lr_scheduler = getattr(torch.optim.lr_scheduler, self.hparams.lr_scheduler)
            except AttributeError as e:
                logger.error(
                    f"{self.hparams.lr_scheduler} is not a valid learning rate sheduler defined"
                    f" in the torch.optim.lr_scheduler module"
                )
                raise e
            # issubclass (not isinstance): self._lr_scheduler holds the scheduler class itself
            if issubclass(self._lr_scheduler, torch.optim.lr_scheduler._LRScheduler):
                return {
                    "optimizer": opt,
                    "lr_scheduler": self._lr_scheduler(opt, **self.hparams.lr_scheduler_params),
                }
            return {
                "optimizer": opt,
                "lr_scheduler": self._lr_scheduler(opt, **self.hparams.lr_scheduler_params),
                "monitor": self.hparams.lr_scheduler_monitor_metric,
            }
        else:
            return opt

    def create_plotly_histogram(self, arr, name, bin_dict=None):
        fig = go.Figure()
        for i in range(arr.shape[-1]):
            fig.add_trace(
                go.Histogram(
                    x=arr[:, i],
                    histnorm="probability",
                    name=f"{name}_{i}",
                    xbins=bin_dict,  # dict(start=0.0, end=1.0, size=0.1),  # bins used for histogram
                )
            )
        # Overlay both histograms
        fig.update_layout(
            barmode="overlay",
            legend={"orientation": "h", "yanchor": "bottom", "y": 1.02, "xanchor": "right", "x": 1},
        )
        # Reduce opacity to see both histograms
        fig.update_traces(opacity=0.5)
        return fig

    def on_validation_batch_end(self, outputs, batch, batch_idx: int) -> None:
        if self.do_log_logits:
            self._val_logits.append(outputs[0][0])
        super().on_validation_batch_end(outputs, batch, batch_idx)

    def on_validation_epoch_end(self) -> None:
        if self.do_log_logits:
            logits = torch.cat(self._val_logits).detach().cpu()
            self._val_logits = []
            fig = self.create_plotly_histogram(logits, "logits")
            wandb.log(
                {"valid_logits": wandb.Plotly(fig), "global_step": self.global_step},
                commit=False,
            )
        super().on_validation_epoch_end()

    def reset_weights(self):
        reset_all_weights(self.backbone)
        reset_all_weights(self.head)
        reset_all_weights(self.embedding_layer)

    def feature_importance(self) -> DataFrame:
        """返回一个包含模型特征重要性的数据框."""
        if hasattr(self.backbone, "feature_importance_"):
            imp = self.backbone.feature_importance_
            n_feat = len(self.hparams.categorical_cols + self.hparams.continuous_cols)
            if self.hparams.categorical_dim > 0:
                if imp.shape[0] != n_feat:
                    # Combining Cat Embedded Dimensions to a single one by averaging
                    wt = []
                    norm = []
                    ft_idx = 0
                    for _, embd_dim in self.hparams.embedding_dims:
                        wt.extend([ft_idx] * embd_dim)
                        norm.append(embd_dim)
                        ft_idx += 1
                    for _ in self.hparams.continuous_cols:
                        wt.extend([ft_idx])
                        norm.append(1)
                        ft_idx += 1
                    imp = np.bincount(wt, weights=imp) / np.array(norm)
                else:
                    # For models like FTTransformer, we don't need to do anything.
                    # It takes categorical and continuous as individual 2-D features
                    pass
            importance_df = DataFrame(
                {
                    "Features": self.hparams.categorical_cols + self.hparams.continuous_cols,
                    "importance": imp,
                }
            )
            return importance_df
        else:
            raise ValueError("Feature Importance unavailable for this model.")

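Before the per-method reference below, here is a minimal sketch (not part of the library) of what an inheriting model must supply to satisfy the contract enforced by _check_and_verify: an implementation of _build_network plus the backbone, embedding_layer, and head properties. The class names and the continuous_dim hyperparameter used here are illustrative assumptions.

import torch.nn as nn

from pytorch_tabular.models.base_model import BaseModel


class ContinuousPassThrough(nn.Module):
    # Hypothetical embedding layer: forwards only the continuous features
    def forward(self, x):
        return x["continuous"]


class TinyBackbone(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.output_dim = output_dim  # required by _check_and_verify
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x)


class TinyModel(BaseModel):
    def _build_network(self):
        self._embedding = ContinuousPassThrough()
        self._backbone = TinyBackbone(self.hparams.continuous_dim, 32)
        self._head = self._get_head_from_config()  # builds the configured head

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding

    @property
    def head(self):
        return self._head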
__init__(config, custom_loss=None, custom_metrics=None, custom_metrics_prob_inputs=None, custom_optimizer=None, custom_optimizer_params={}, **kwargs)

Base Model for PyTorch Tabular.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | DictConfig | The config of the model. | required |
| custom_loss | Optional[Module] | A custom loss function. Defaults to None. | None |
| custom_metrics | Optional[List[Callable]] | A list of custom metrics. Defaults to None. | None |
| custom_metrics_prob_inputs | Optional[List[bool]] | A list of booleans indicating whether the metrics need probability inputs. Defaults to None. | None |
| custom_optimizer | Optional[Optimizer] | A custom optimizer, either as a callable or an import string. Defaults to None. | None |
| custom_optimizer_params | Dict | A dict of parameters for the custom optimizer. Defaults to {}. | {} |
| kwargs | Dict | Additional keyword arguments. | {} |
Source code in src/pytorch_tabular/models/base_model.py
    def __init__(
        self,
        config: DictConfig,
        custom_loss: Optional[torch.nn.Module] = None,
        custom_metrics: Optional[List[Callable]] = None,
        custom_metrics_prob_inputs: Optional[List[bool]] = None,
        custom_optimizer: Optional[torch.optim.Optimizer] = None,
        custom_optimizer_params: Dict = {},
        **kwargs,
    ):
        """    PyTorch Tabular 的基础模型.

Parameters:
    config (DictConfig): 模型的配置.
    custom_loss (Optional[torch.nn.Module], optional): 自定义损失函数.默认为 None.
    custom_metrics (Optional[List[Callable]], optional): 自定义指标列表.默认为 None.
    custom_metrics_prob_inputs (Optional[List[bool]], optional): 布尔值列表,指示指标是否需要概率输入.默认为 None.
    custom_optimizer (Optional[torch.optim.Optimizer], optional): 自定义优化器,可为可调用对象或导入的字符串.默认为 None.
    custom_optimizer_params (Dict, optional): 自定义优化器参数的字典.默认为 {}.
    kwargs (Dict, optional): 其他关键字参数.
"""
        super().__init__()
        assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
        inferred_config = kwargs["inferred_config"]
        # Merging the config and inferred config
        config = safe_merge_config(config, inferred_config)
        self.custom_loss = custom_loss
        self.custom_metrics = custom_metrics
        self.custom_metrics_prob_inputs = custom_metrics_prob_inputs
        self.custom_optimizer = custom_optimizer
        self.custom_optimizer_params = custom_optimizer_params
        self.kwargs = kwargs
        # Updating config with custom parameters for experiment tracking
        if self.custom_loss is not None:
            config.loss = str(self.custom_loss)
        if self.custom_metrics is not None:
            # Adding metrics to config for hparams logging and tracking
            config.metrics = []
            config.metrics_params = []
            for metric in self.custom_metrics:
                if isinstance(metric, partial):
                    # extracting func names from partial functions
                    config.metrics.append(metric.func.__name__)
                    config.metrics_params.append(metric.keywords)
                else:
                    config.metrics.append(metric.__name__)
                    config.metrics_params.append(vars(metric))
            if config.task == "classification":
                config.metrics_prob_input = self.custom_metrics_prob_inputs
                for i, mp in enumerate(config.metrics_params):
                    mp.sub_params_list = []
                    for j, num_classes in enumerate(inferred_config.output_cardinality):
                        config.metrics_params[i].sub_params_list.append(
                            OmegaConf.create(
                                {
                                    "task": mp.get("task", "multiclass"),
                                    "num_classes": mp.get("num_classes", num_classes),
                                }
                            )
                        )

        # Updating default metrics in config
        elif config.task == "classification":
            # Adding metric_params to config for classification task
            for i, mp in enumerate(config.metrics_params):
                mp.sub_params_list = []
                for j, num_classes in enumerate(inferred_config.output_cardinality):
                    # config.metrics_params[i][j]["task"] = mp.get("task", "multiclass")
                    # config.metrics_params[i][j]["num_classes"] = mp.get("num_classes", num_classes)

                    config.metrics_params[i].sub_params_list.append(
                        OmegaConf.create(
                            {"task": mp.get("task", "multiclass"), "num_classes": mp.get("num_classes", num_classes)}
                        )
                    )

                    if config.metrics[i] in (
                        "accuracy",
                        "precision",
                        "recall",
                        "precision_recall",
                        "specificity",
                        "f1_score",
                        "fbeta_score",
                    ):
                        config.metrics_params[i].sub_params_list[j]["top_k"] = mp.get("top_k", 1)

        if self.custom_optimizer is not None:
            config.optimizer = str(self.custom_optimizer.__class__.__name__)
        if len(self.custom_optimizer_params) > 0:
            config.optimizer_params = self.custom_optimizer_params
        self.save_hyperparameters(config)
        # The concatenated output dim of the embedding layer
        self._build_network()
        self._setup_loss()
        self._setup_metrics()
        self._check_and_verify()
        self.do_log_logits = (
            hasattr(self.hparams, "log_logits") and self.hparams.log_logits and self.hparams.log_target == "wandb"
        )
        if self.do_log_logits:
            self._val_logits = []
        if not WANDB_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Wandb is not installed. Please install wandb to log logits. "
                "You can install wandb using pip install wandb or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )
        if not PLOTLY_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Plotly is not installed. Please install plotly to log logits. "
                "You can install plotly using pip install plotly or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )
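
As the code above shows, a custom metric may be a plain torchmetrics function or a functools.partial: for a partial, __init__ records metric.func.__name__ and metric.keywords for hyperparameter tracking. A hedged sketch of assembling such arguments (in normal use they are forwarded to this constructor, e.g. through TabularModel.fit):

from functools import partial

import torchmetrics

# Plain functional metric: __init__ records metric.__name__ and vars(metric)
plain_accuracy = torchmetrics.functional.accuracy

# Partial: __init__ unwraps metric.func.__name__ and metric.keywords instead
fbeta_05 = partial(torchmetrics.functional.fbeta_score, beta=0.5)

custom_metrics = [plain_accuracy, fbeta_05]
# For classification, one boolean per metric: does it expect probabilities?
custom_metrics_prob_inputs = [False, False]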

apply_output_sigmoid_scaling(y_hat)

Applies sigmoid scaling to the model output if the task is regression and a target range is defined.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y_hat | Tensor | The output of the model | required |

Returns:

| Type | Description |
| --- | --- |
| Tensor | The model output with sigmoid scaling applied |

Source code in src/pytorch_tabular/models/base_model.py
    def apply_output_sigmoid_scaling(self, y_hat: torch.Tensor) -> torch.Tensor:
        """对模型输出应用Sigmoid缩放(如果任务是回归且目标范围已定义).

Parameters:
    y_hat (torch.Tensor): 模型的输出

Returns:
    torch.Tensor: 应用了Sigmoid缩放的模型输出
"""
        if (self.hparams.task == "regression") and (self.hparams.target_range is not None):
            for i in range(self.hparams.output_dim):
                y_min, y_max = self.hparams.target_range[i]
                y_hat[:, i] = y_min + nn.Sigmoid()(y_hat[:, i]) * (y_max - y_min)
        return y_hat
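
A standalone numeric sketch of the scaling above, with a hypothetical target_range entry of (10, 20) for a single output dimension:

import torch

y_hat = torch.tensor([-2.0, 0.0, 2.0])  # raw regression outputs
y_min, y_max = 10.0, 20.0               # hypothetical target_range entry

scaled = y_min + torch.sigmoid(y_hat) * (y_max - y_min)
print(scaled)  # tensor([11.1920, 15.0000, 18.8080]) -- bounded to (10, 20)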

calculate_loss(output, y, tag)

Calculates the loss for the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| output | Dict | The output dictionary from the model | required |
| y | Tensor | The target tensor | required |
| tag | str | The tag to use for logging | required |

Returns:

| Type | Description |
| --- | --- |
| Tensor | The computed loss |

Source code in src/pytorch_tabular/models/base_model.py
    def calculate_loss(self, output: Dict, y: torch.Tensor, tag: str) -> torch.Tensor:
        """计算模型的损失.

Parameters:
    output (Dict): 模型输出的字典
    y (torch.Tensor): 目标张量
    tag (str): 用于日志记录的标签

Returns:
    torch.Tensor: 损失值
"""
        y_hat = output["logits"]
        reg_terms = [k for k, v in output.items() if "regularization" in k]
        reg_loss = 0
        for t in reg_terms:
            # Log only if non-zero
            if output[t] != 0:
                reg_loss += output[t]
                self.log(
                    f"{tag}_{t}_loss",
                    output[t],
                    on_epoch=True,
                    on_step=False,
                    logger=True,
                    prog_bar=False,
                )
        if self.hparams.task == "regression":
            computed_loss = reg_loss
            for i in range(self.hparams.output_dim):
                _loss = self.loss(y_hat[:, i], y[:, i])
                computed_loss += _loss
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_loss_{i}",
                        _loss,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                    )
        else:
            # TODO loss fails with batch size of 1?
            computed_loss = reg_loss
            start_index = 0
            for i in range(len(self.hparams.output_cardinality)):
                end_index = start_index + self.hparams.output_cardinality[i]
                _loss = self.loss(y_hat[:, start_index:end_index], y[:, i])
                computed_loss += _loss
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_loss_{i}",
                        _loss,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                    )
                start_index = end_index
        self.log(
            f"{tag}_loss",
            computed_loss,
            on_epoch=(tag in ["valid", "test"]),
            on_step=(tag == "train"),
            # on_step=False,
            logger=True,
            prog_bar=True,
        )
        return computed_loss
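
In the classification branch above, the logits for all targets are concatenated along the last dimension and sliced per target using output_cardinality. A self-contained sketch with two hypothetical targets of 3 and 2 classes:

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
output_cardinality = [3, 2]                      # two targets: 3 and 2 classes
y_hat = torch.randn(8, sum(output_cardinality))  # concatenated logits, batch of 8
y = torch.stack(
    [torch.randint(0, 3, (8,)), torch.randint(0, 2, (8,))], dim=1
)  # one label column per target

computed_loss, start = 0.0, 0
for i, cardinality in enumerate(output_cardinality):
    end = start + cardinality
    computed_loss = computed_loss + loss_fn(y_hat[:, start:end], y[:, i])
    start = end
print(computed_loss)  # sum of the per-target cross-entropy losses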

calculate_metrics(y, y_hat, tag)

Calculates the metrics for the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | The target tensor | required |
| y_hat | Tensor | The predicted tensor | required |
| tag | str | The tag to use for logging | required |

Returns:

| Type | Description |
| --- | --- |
| List[Tensor] | The list of metric values |

Source code in src/pytorch_tabular/models/base_model.py
    def calculate_metrics(self, y: torch.Tensor, y_hat: torch.Tensor, tag: str) -> List[torch.Tensor]:
        """    计算模型的各项指标.

Parameters:
    y (torch.Tensor): 目标张量

    y_hat (torch.Tensor): 预测张量

    tag (str): 用于日志记录的标签

Returns:
    List[torch.Tensor]: 指标值列表
"""
        metrics = []
        for metric, metric_str, prob_inp, metric_params in zip(
            self.metrics,
            self.hparams.metrics,
            self.hparams.metrics_prob_input,
            self.hparams.metrics_params,
        ):
            if self.hparams.task == "regression":
                _metrics = []
                for i in range(self.hparams.output_dim):
                    name = metric.func.__name__ if isinstance(metric, partial) else metric.__name__
                    if name == torchmetrics.functional.mean_squared_log_error.__name__:
                        # MSLE should only be used in strictly positive targets. It is undefined otherwise
                        _metric = metric(
                            torch.clamp(y_hat[:, i], min=0),
                            torch.clamp(y[:, i], min=0),
                            **metric_params,
                        )
                    else:
                        _metric = metric(y_hat[:, i], y[:, i], **metric_params)
                    if self.hparams.output_dim > 1:
                        self.log(
                            f"{tag}_{metric_str}_{i}",
                            _metric,
                            on_epoch=True,
                            on_step=False,
                            logger=True,
                            prog_bar=False,
                        )
                    _metrics.append(_metric)
                avg_metric = torch.stack(_metrics, dim=0).sum()
            else:
                _metrics = []
                start_index = 0
                for i, cardinality in enumerate(self.hparams.output_cardinality):
                    end_index = start_index + cardinality
                    y_hat_i = nn.Softmax(dim=-1)(y_hat[:, start_index:end_index].squeeze())
                    if prob_inp:
                        _metric = metric(y_hat_i, y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i])
                    else:
                        _metric = metric(
                            torch.argmax(y_hat_i, dim=-1), y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i]
                        )
                    if len(self.hparams.output_cardinality) > 1:
                        self.log(
                            f"{tag}_{metric_str}_{i}",
                            _metric,
                            on_epoch=True,
                            on_step=False,
                            logger=True,
                            prog_bar=False,
                        )
                    _metrics.append(_metric)
                    start_index = end_index
                avg_metric = torch.stack(_metrics, dim=0).sum()
            metrics.append(avg_metric)
            self.log(
                f"{tag}_{metric_str}",
                avg_metric,
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=True,
            )
        return metrics
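
The prob_inp flag above switches between two input conventions: the metric receives softmax probabilities, or hard class predictions via argmax. A hedged, self-contained sketch of both modes:

import torch
from torchmetrics.functional import accuracy, auroc

logits = torch.randn(8, 3)
y = torch.tensor([0, 1, 2, 0, 1, 2, 0, 1])
probs = torch.softmax(logits, dim=-1)

# metrics_prob_input=True style: the metric consumes probabilities (e.g. AUROC)
score_prob = auroc(probs, y, task="multiclass", num_classes=3)

# metrics_prob_input=False style: the metric consumes predicted classes
score_cls = accuracy(torch.argmax(probs, dim=-1), y, task="multiclass", num_classes=3)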

compute_head(backbone_features)

Computes the head of the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| backbone_features | Tensor | The features from the backbone | required |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | The output of the model |

Source code in src/pytorch_tabular/models/base_model.py
    def compute_head(self, backbone_features: Tensor) -> Dict[str, Any]:
        """    计算模型的头部.

Parameters:
    backbone_features (Tensor): 主干网络的特征

Returns:
    模型的输出
"""
        y_hat = self.head(backbone_features)
        y_hat = self.apply_output_sigmoid_scaling(y_hat)
        return self.pack_output(y_hat, backbone_features)

data_aware_initialization(datamodule)

Performs data-aware initialization of the model when it is defined.

Source code in src/pytorch_tabular/models/base_model.py
def data_aware_initialization(self, datamodule):
    """在定义模型时执行数据感知初始化."""
    pass

extract_embedding()

Extracts the embedding of the model.

This is used in the CategoricalEmbeddingTransformer.

Source code in src/pytorch_tabular/models/base_model.py
    def extract_embedding(self):
        """提取模型的嵌入.

这在 `CategoricalEmbeddingTransformer` 中使用
"""
        if self.hparams.categorical_dim > 0:
            if not isinstance(self.embedding_layer, PreEncoded1dLayer):
                return self.embedding_layer.cat_embedding_layers
            else:
                raise ValueError(
                    "Cannot extract embedding for PreEncoded1dLayer. Please use a different embedding layer."
                )
        else:
            raise ValueError(
                "Model has been trained with no categorical feature and therefore can't be used"
                " as a Categorical Encoder"
            )
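
A hedged usage sketch: extract_embedding is consumed by CategoricalEmbeddingTransformer, which replaces categorical columns in a DataFrame with their learned embeddings. `tabular_model` (a trained TabularModel) and `train_df` are assumed to exist:

from pytorch_tabular.categorical_encoders import CategoricalEmbeddingTransformer

transformer = CategoricalEmbeddingTransformer(tabular_model)
train_encoded = transformer.fit_transform(train_df)  # pandas DataFrame in and out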

feature_importance()

Returns a dataframe with the feature importance of the model.

Source code in src/pytorch_tabular/models/base_model.py
def feature_importance(self) -> DataFrame:
    """返回一个包含模型特征重要性的数据框."""
    if hasattr(self.backbone, "feature_importance_"):
        imp = self.backbone.feature_importance_
        n_feat = len(self.hparams.categorical_cols + self.hparams.continuous_cols)
        if self.hparams.categorical_dim > 0:
            if imp.shape[0] != n_feat:
                # Combining Cat Embedded Dimensions to a single one by averaging
                wt = []
                norm = []
                ft_idx = 0
                for _, embd_dim in self.hparams.embedding_dims:
                    wt.extend([ft_idx] * embd_dim)
                    norm.append(embd_dim)
                    ft_idx += 1
                for _ in self.hparams.continuous_cols:
                    wt.extend([ft_idx])
                    norm.append(1)
                    ft_idx += 1
                imp = np.bincount(wt, weights=imp) / np.array(norm)
            else:
                # For models like FTTransformer, we don't need to do anything.
                # It takes categorical and continuous as individual 2-D features
                pass
        importance_df = DataFrame(
            {
                "Features": self.hparams.categorical_cols + self.hparams.continuous_cols,
                "importance": imp,
            }
        )
        return importance_df
    else:
        raise ValueError("Feature Importance unavailable for this model.")
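
A standalone sketch of the averaging step above: per-embedding-dimension importances are pooled back into one value per original feature with np.bincount, then normalized by each feature's embedding width. The numbers are illustrative:

import numpy as np

embedding_dims = [(5, 3), (10, 2)]  # (cardinality, embed_dim) for 2 categorical cols
n_continuous = 1
imp = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.9])  # 3 + 2 + 1 raw importances

wt, norm, ft_idx = [], [], 0
for _, embd_dim in embedding_dims:
    wt.extend([ft_idx] * embd_dim)  # map each embedding dim to its feature index
    norm.append(embd_dim)
    ft_idx += 1
for _ in range(n_continuous):
    wt.append(ft_idx)
    norm.append(1)
    ft_idx += 1

per_feature = np.bincount(wt, weights=imp) / np.array(norm)
print(per_feature)  # [0.2 0.5 0.9] -- mean importance per feature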

forward(x)

The forward pass of the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Dict | The input of the model, with 'continuous' and 'categorical' keys | required |
Source code in src/pytorch_tabular/models/base_model.py
    def forward(self, x: Dict) -> Dict[str, Any]:
        """   模型的前向传播.

Parameters:
    x (Dict): 模型的输入,包含'continuous'和'categorical'键
"""
        x = self.embed_input(x)
        x = self.compute_backbone(x)
        return self.compute_head(x)

pack_output(y_hat, backbone_features)

Packs the output of the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y_hat | Tensor | The output of the model | required |
| backbone_features | tensor | The features from the backbone | required |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | The packed output of the model |

Source code in src/pytorch_tabular/models/base_model.py
    def pack_output(self, y_hat: torch.Tensor, backbone_features: torch.tensor) -> Dict[str, Any]:
        """打包模型的输出.

Parameters:
    y_hat (torch.Tensor): 模型的输出

    backbone_features (torch.tensor): 主干网络的特征

Returns:
    打包后的模型输出
"""
        # If self.head is the Identity function, it means we cannot extract backbone features,
        # because the model cannot be divided into a backbone and a head (e.g. TabNet)
        if type(self.head) is nn.Identity:
            return {"logits": y_hat}
        return {"logits": y_hat, "backbone_features": backbone_features}

predict(x, ret_model_output=False)

Predicts the output of the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Dict | The input of the model, with 'continuous' and 'categorical' keys | required |
| ret_model_output | bool | If True, the method also returns the full model output | False |

Returns:

| Type | Description |
| --- | --- |
| Union[Tensor, Tuple[Tensor, Dict]] | The output of the model |

Source code in src/pytorch_tabular/models/base_model.py
    def predict(self, x: Dict, ret_model_output: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Dict]]:
        """    预测模型的输出.

Parameters:
    x (Dict): 模型的输入,包含'continuous'和'categorical'键

    ret_model_output (bool): 如果为True,方法返回模型的输出

Returns:
    模型的输出
"""
        assert self.hparams.task != "ssl", "It's not allowed to use the method predict in case of ssl task"
        ret_value = self.forward(x)
        if ret_model_output:
            return ret_value.get("logits"), ret_value
        return ret_value.get("logits")
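
A hedged sketch of calling predict directly on the module; in normal use TabularModel.predict is called on a DataFrame instead. The batch dict mirrors the 'continuous'/'categorical' input contract, and `model` is assumed to be an initialized subclass of this base model:

import torch

batch = {
    "continuous": torch.randn(4, 3),             # 4 rows, 3 continuous columns
    "categorical": torch.randint(0, 5, (4, 2)),  # 4 rows, 2 categorical columns
}
logits = model.predict(batch)
logits, full_output = model.predict(batch, ret_model_output=True)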