Head Models

Configuration Classes

LinearHeadConfig

A model class for Linear Head configuration; serves as a template and documentation. The models take a dictionary as input, but if there are keys which are not present in this model class, an exception is raised.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `layers` | `str` | Hyphen-separated string of the number of layers and units in the classification/regression head, e.g. `32-64-32`. By default, it is just a mapping from the input dimension to the output dimension. | `''` |
| `activation` | `str` | The activation type in the classification head. Any activation from PyTorch like ReLU, TanH, LeakyReLU, etc. See https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity | `'ReLU'` |
| `dropout` | `float` | The probability of a classification element being zeroed. | `0.0` |
| `use_batch_norm` | `bool` | Flag to include a BatchNorm layer after each Linear layer + Dropout. | `False` |
| `initialization` | `str` | Initialization scheme for the linear layers. Defaults to `kaiming`. Choices are: [`kaiming`, `xavier`, `random`]. | `'kaiming'` |
Source code in src/pytorch_tabular/models/common/heads/config.py
@dataclass
class LinearHeadConfig:
    """线性头配置的模型类;作为模板和文档使用.模型接受字典作为输入,但如果存在本模型类中不存在的键,则会抛出异常.

    Args:
        layers (str): 分类/回归头中层数和单元数的连字符分隔字符串.
                例如:32-64-32.默认情况下,仅从输入维度映射到输出维度.

        activation (str): 分类头中的激活类型.默认激活类型类似于PyTorch中的ReLU、TanH、LeakyReLU等.
                参考:https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity

        dropout (float): 分类元素被置零的概率.

        use_batch_norm (bool): 标志,用于在每个线性层+DropOut后添加BatchNorm层.

        initialization (str): 线性层的初始化方案.默认为`kaiming`.可选方案有:[`kaiming`,`xavier`,`random`]."""

    layers: str = field(
        default="",
        metadata={
            "help": "Hyphen-separated number of layers and units in the classification/regression head. eg. 32-64-32."
            " Default is just a mapping from intput dimension to output dimension"
        },
    )
    activation: str = field(
        default="ReLU",
        metadata={
            "help": "The activation type in the classification head. The default activation in PyTorch"
            " like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity"
        },
    )
    dropout: float = field(
        default=0.0,
        metadata={"help": "probability of an classification element to be zeroed."},
    )
    use_batch_norm: bool = field(
        default=False,
        metadata={"help": "Flag to include a BatchNorm layer after each Linear Layer+DropOut"},
    )
    initialization: str = field(
        default="kaiming",
        metadata={
            "help": "Initialization scheme for the linear layers. Defaults to `kaiming`",
            "choices": ["kaiming", "xavier", "random"],
        },
    )
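
A minimal usage sketch, following the pattern used in the PyTorch Tabular documentation (the surrounding model config mentioned in the comment is illustrative):

```python
from pytorch_tabular.models.common.heads import LinearHeadConfig

# Two hidden layers (64 -> 32), each followed by dropout and batch norm
head_config = LinearHeadConfig(
    layers="64-32",
    activation="LeakyReLU",
    dropout=0.1,
    use_batch_norm=True,
    initialization="kaiming",
).__dict__  # head configs are passed to model configs as plain dicts

# This dict is then supplied as head_config (with head="LinearHead")
# to a supported ModelConfig, e.g. CategoryEmbeddingModelConfig.
```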

MixtureDensityHeadConfig

Configuration for the Mixture Density Network head.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `num_gaussian` | `int` | Number of Gaussian distributions in the mixture model. Defaults to 1 | `1` |
| `sigma_bias_flag` | `bool` | Whether to have a bias term in the sigma layer. Defaults to False | `False` |
| `mu_bias_init` | `Optional[List]` | Initializes the bias parameter of the mu layer to predefined cluster centers. Should be a list with the same length as the number of Gaussians in the mixture model. It is highly recommended to set this parameter to combat mode collapse. Defaults to None | `None` |
| `weight_regularization` | `Optional[int]` | Whether to apply an L1 or L2 norm to the MDN layers. Defaults to L2. Choices are: [`1`, `2`] | `2` |
| `lambda_sigma` | `Optional[float]` | The regularization constant for weight regularization of the sigma layer. Defaults to 0.1 | `0.1` |
| `lambda_pi` | `Optional[float]` | The regularization constant for weight regularization of the pi layer. Defaults to 0.1 | `0.1` |
| `lambda_mu` | `Optional[float]` | The regularization constant for weight regularization of the mu layer. Defaults to 0 | `0` |
| `softmax_temperature` | `Optional[float]` | The temperature to be used in the Gumbel softmax of the mixing coefficients. Values less than one lead to sharper transitions between the components. Defaults to 1 | `1` |
| `n_samples` | `int` | Number of samples to draw from the posterior to get predictions. Defaults to 100 | `100` |
| `central_tendency` | `str` | Which measure to use to get the point prediction. Defaults to mean. Choices are: [`mean`, `median`] | `'mean'` |
| `speedup_training` | `bool` | Turning on this parameter skips sampling during training, which speeds up training but removes visibility into train metrics. Defaults to False | `False` |
| `log_debug_plot` | `bool` | Turning on this parameter plots histograms of the mu, sigma, and pi layers, in addition to the logits (if log_logits is turned on in the experiment config). Defaults to False | `False` |
| `input_dim` | `int` | The input dimension of the head. This is filled in automatically from `backbone.output_dim` during initialization | `None` |
Source code in src/pytorch_tabular/models/common/heads/config.py
@dataclass
class MixtureDensityHeadConfig:
    """混合密度网络头配置.

    Parameters:
        num_gaussian (int): 混合模型中高斯分布的数量.默认为1

        sigma_bias_flag (bool): 是否在sigma层中包含偏置项.默认为False

        mu_bias_init (Optional[List]): 将mu层的偏置参数初始化为预定义的聚类中心.应为一个与混合模型中高斯数量相同长度的列表.强烈建议设置此参数以对抗模式崩溃.默认为None

        weight_regularization (Optional[int]): 是否对MDN层应用L1或L2范数.默认为L2.可选值为: [`1`,`2`]

        lambda_sigma (Optional[float]): sigma层权重正则化的正则化常数.默认为0.1

        lambda_pi (Optional[float]): pi层权重正则化的正则化常数.默认为0.1

        lambda_mu (Optional[float]): mu层权重正则化的正则化常数.默认为0

        softmax_temperature (Optional[float]): 用于混合系数gumbel softmax的温度.小于1的值会导致多个成分之间的过渡更尖锐.默认为1

        n_samples (int): 从后验分布中抽取样本以获得预测的数量.默认为100

        central_tendency (str): 用于获取点预测的度量方法.默认为均值.可选值为: [`mean`,`median`]

        speedup_training (bool): 开启此参数将取消训练期间的采样,从而加快训练速度,但也会使您无法查看训练指标.默认为False

        log_debug_plot (bool): 开启此参数将绘制mu、sigma和pi层的直方图,以及logits(如果在实验配置中开启了log_logits).默认为False

        input_dim (int): 输入到头部的维度.这将在从`backbone.output_dim`初始化时自动填充"""

    num_gaussian: int = field(
        default=1,
        metadata={
            "help": "Number of Gaussian Distributions in the mixture model. Defaults to 1",
        },
    )
    sigma_bias_flag: bool = field(
        default=False,
        metadata={
            "help": "Whether to have a bias term in the sigma layer. Defaults to False",
        },
    )
    mu_bias_init: Optional[List] = field(
        default=None,
        metadata={
            "help": "To initialize the bias parameter of the mu layer to predefined cluster centers."
            " Should be a list with the same length as number of gaussians in the mixture model."
            " It is highly recommended to set the parameter to combat mode collapse. Defaults to None",
        },
    )

    weight_regularization: Optional[int] = field(
        default=2,
        metadata={
            "help": "Whether to apply L1 or L2 Norm to the MDN layers. Defaults to L2",
            "choices": [1, 2],
        },
    )

    lambda_sigma: Optional[float] = field(
        default=0.1,
        metadata={
            "help": "The regularization constant for weight regularization of sigma layer. Defaults to 0.1",
        },
    )
    lambda_pi: Optional[float] = field(
        default=0.1,
        metadata={
            "help": "The regularization constant for weight regularization of pi layer. Defaults to 0.1",
        },
    )
    lambda_mu: Optional[float] = field(
        default=0,
        metadata={
            "help": "The regularization constant for weight regularization of mu layer. Defaults to 0",
        },
    )
    softmax_temperature: Optional[float] = field(
        default=1,
        metadata={
            "help": "The temperature to be used in the gumbel softmax of the mixing coefficients."
            " Values less than one leads to sharper transition between the multiple components. Defaults to 1",
        },
    )
    n_samples: int = field(
        default=100,
        metadata={
            "help": "Number of samples to draw from the posterior to get prediction. Defaults to 100",
        },
    )
    central_tendency: str = field(
        default="mean",
        metadata={
            "help": "Which measure to use to get the point prediction. Defaults to mean",
            "choices": ["mean", "median"],
        },
    )
    speedup_training: bool = field(
        default=False,
        metadata={
            "help": "Turning on this parameter does away with sampling during training which speeds up training,"
            " but also doesn't give you visibility on train metrics. Defaults to False",
        },
    )
    log_debug_plot: bool = field(
        default=False,
        metadata={
            "help": "Turning on this parameter plots histograms of the mu, sigma, and pi layers in addition"
            " to the logits(if log_logits is turned on in experment config). Defaults to False",
        },
    )
    input_dim: int = field(
        default=None,
        metadata={
            "help": "The input dimensions to the head. This will be automatically filled in while initializing"
            " from the `backbone.output_dim`",
        },
    )
    _probabilistic: bool = field(default=True)

Head Classes

LinearHead

Bases: Head

Source code in src/pytorch_tabular/models/common/heads/blocks.py
class LinearHead(Head):
    _config_template = head_config.LinearHeadConfig

    def __init__(self, in_units: int, output_dim: int, config, **kwargs):
        # Linear Layers
        _layers = []
        _curr_units = in_units
        for units in config.layers.split("-"):
            try:
                int(units)
            except ValueError:
                if units == "":
                    continue
                else:
                    raise ValueError(f"Invalid units {units} in layers {config.layers}")
            _layers.extend(
                _linear_dropout_bn(
                    config.activation,
                    config.initialization,
                    config.use_batch_norm,
                    _curr_units,
                    int(units),
                    config.dropout,
                )
            )
            _curr_units = int(units)
        # Appending Final Output
        _layers.append(nn.Linear(_curr_units, output_dim))
        linear_layers = nn.Sequential(*_layers)
        _initialize_layers(config.activation, config.initialization, linear_layers)
        super().__init__(
            layers=linear_layers,
            config_template=head_config.LinearHeadConfig,
        )
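
A standalone sketch of building the head directly (normally the model wires this up from `backbone.output_dim`; `OmegaConf.structured` is one convenient way to turn the config dataclass into an attribute-accessible config):

```python
import torch
from omegaconf import OmegaConf
from pytorch_tabular.models.common.heads import LinearHead, LinearHeadConfig

config = OmegaConf.structured(LinearHeadConfig(layers="32-16", dropout=0.1))
head = LinearHead(in_units=64, output_dim=1, config=config)

features = torch.randn(8, 64)  # stand-in for backbone output
out = head(features)           # -> shape (8, 1)
```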

MixtureDensityHead

Bases: Module

Source code in src/pytorch_tabular/models/common/heads/blocks.py
class MixtureDensityHead(nn.Module):
    _config_template = head_config.MixtureDensityHeadConfig

    def __init__(self, config: DictConfig, **kwargs):
        self.hparams = config
        super().__init__()
        self._build_network()

    def _build_network(self):
        self.pi = nn.Linear(self.hparams.input_dim, self.hparams.num_gaussian)
        nn.init.normal_(self.pi.weight)
        self.sigma = nn.Linear(
            self.hparams.input_dim,
            self.hparams.num_gaussian,
            bias=self.hparams.sigma_bias_flag,
        )
        self.mu = nn.Linear(self.hparams.input_dim, self.hparams.num_gaussian)
        nn.init.normal_(self.mu.weight)
        if self.hparams.mu_bias_init is not None:
            for i, bias in enumerate(self.hparams.mu_bias_init):
                nn.init.constant_(self.mu.bias[i], bias)

    def forward(self, x):
        pi = self.pi(x)
        sigma = self.sigma(x)
        # Applying modified ELU activation
        sigma = nn.ELU()(sigma) + 1 + 1e-15
        mu = self.mu(x)
        return pi, sigma, mu

    def gaussian_probability(self, sigma, mu, target, log=False):
        """返回在给定高斯混合模型参数 `sigma` 和 `mu` 的条件下,`target` 的概率.

Parameters:
    sigma (BxGxO): 高斯分布的标准差.B 是批量大小,G 是高斯分布的数量,O 是每个高斯分布的维度数.
    mu (BxGxO): 高斯分布的均值.B 是批量大小,G 是高斯分布的数量,O 是每个高斯分布的维度数.
    target (BxI): 目标的批量.B 是批量大小,I 是输入维度数.
Returns:
    probabilities (BxG): 分布在相应 sigma/mu 索引中每个点的概率.
"""
        target = target.expand_as(sigma)
        if log:
            ret = -torch.log(sigma) - 0.5 * LOG2PI - 0.5 * torch.pow((target - mu) / sigma, 2)
        else:
            ret = (ONEOVERSQRT2PI / sigma) * torch.exp(-0.5 * ((target - mu) / sigma) ** 2)
        return ret  # torch.prod(ret, 2)

    def log_prob(self, pi, sigma, mu, y):
        log_component_prob = self.gaussian_probability(sigma, mu, y, log=True)
        log_mix_prob = torch.log(nn.functional.gumbel_softmax(pi, tau=self.hparams.softmax_temperature, dim=-1) + 1e-15)
        return torch.logsumexp(log_component_prob + log_mix_prob, dim=-1)

    def sample(self, pi, sigma, mu):
        """从高斯混合模型 (MoG) 中抽取样本."""
        categorical = Categorical(pi)
        pis = categorical.sample().unsqueeze(1)
        sample = Variable(sigma.data.new(sigma.size(0), 1).normal_())
        # Gathering from the n Gaussian Distribution based on sampled indices
        sample = sample * sigma.gather(1, pis) + mu.gather(1, pis)
        return sample

    def generate_samples(self, pi, sigma, mu, n_samples=None):
        if n_samples is None:
            n_samples = self.hparams.n_samples
        samples = []
        softmax_pi = nn.functional.gumbel_softmax(pi, tau=self.hparams.softmax_temperature, dim=-1)
        assert (softmax_pi < 0).sum().item() == 0, "pi parameter should not have negative values"
        for _ in range(n_samples):
            samples.append(self.sample(softmax_pi, sigma, mu))
        samples = torch.cat(samples, dim=1)
        return samples

    def generate_point_predictions(self, pi, sigma, mu, n_samples=None):
        # Sample using n_samples and take average
        samples = self.generate_samples(pi, sigma, mu, n_samples)
        if self.hparams.central_tendency == "mean":
            y_hat = torch.mean(samples, dim=-1)
        elif self.hparams.central_tendency == "median":
            y_hat = torch.median(samples, dim=-1).values
        return y_hat.unsqueeze(1)
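
A short sketch of the head in isolation (values are illustrative; in practice the MDN model fills in `input_dim` from the backbone):

```python
import torch
from omegaconf import OmegaConf
from pytorch_tabular.models.common.heads import MixtureDensityHead, MixtureDensityHeadConfig

config = OmegaConf.structured(MixtureDensityHeadConfig(num_gaussian=3, input_dim=16))
head = MixtureDensityHead(config)

backbone_out = torch.randn(8, 16)   # stand-in for backbone features
pi, sigma, mu = head(backbone_out)  # each of shape (8, 3)
y_hat = head.generate_point_predictions(pi, sigma, mu)  # shape (8, 1)
```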

gaussian_probability(sigma, mu, target, log=False)

Returns the probability of `target` given mixture-of-Gaussians parameters `sigma` and `mu`.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `sigma` | `BxGxO` | The standard deviation of the Gaussians. B is the batch size, G is the number of Gaussians, and O is the number of dimensions per Gaussian. | required |
| `mu` | `BxGxO` | The means of the Gaussians. B is the batch size, G is the number of Gaussians, and O is the number of dimensions per Gaussian. | required |
| `target` | `BxI` | A batch of target values. B is the batch size and I is the number of input dimensions. | required |

Returns: probabilities (BxG): The probability of each point in the distribution at the corresponding sigma/mu index.

Source code in src/pytorch_tabular/models/common/heads/blocks.py
    def gaussian_probability(self, sigma, mu, target, log=False):
        """返回在给定高斯混合模型参数 `sigma` 和 `mu` 的条件下,`target` 的概率.

Parameters:
    sigma (BxGxO): 高斯分布的标准差.B 是批量大小,G 是高斯分布的数量,O 是每个高斯分布的维度数.
    mu (BxGxO): 高斯分布的均值.B 是批量大小,G 是高斯分布的数量,O 是每个高斯分布的维度数.
    target (BxI): 目标的批量.B 是批量大小,I 是输入维度数.
Returns:
    probabilities (BxG): 分布在相应 sigma/mu 索引中每个点的概率.
"""
        target = target.expand_as(sigma)
        if log:
            ret = -torch.log(sigma) - 0.5 * LOG2PI - 0.5 * torch.pow((target - mu) / sigma, 2)
        else:
            ret = (ONEOVERSQRT2PI / sigma) * torch.exp(-0.5 * ((target - mu) / sigma) ** 2)
        return ret  # torch.prod(ret, 2)
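
For reference, the two branches evaluate the same Gaussian density, in log space and probability space respectively (`LOG2PI` and `ONEOVERSQRT2PI` are the module-level constants $\log 2\pi$ and $1/\sqrt{2\pi}$):

$$
\mathcal{N}(t \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}\right),
\qquad
\log \mathcal{N}(t \mid \mu, \sigma) = -\log\sigma - \frac{1}{2}\log 2\pi - \frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}.
$$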

sample(pi, sigma, mu)

Draws samples from a mixture of Gaussians (MoG).

Source code in src/pytorch_tabular/models/common/heads/blocks.py
def sample(self, pi, sigma, mu):
    """从高斯混合模型 (MoG) 中抽取样本."""
    categorical = Categorical(pi)
    pis = categorical.sample().unsqueeze(1)
    sample = Variable(sigma.data.new(sigma.size(0), 1).normal_())
    # Gathering from the n Gaussian Distribution based on sampled indices
    sample = sample * sigma.gather(1, pis) + mu.gather(1, pis)
    return sample
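
Note that `sample` expects `pi` already normalized to probabilities (as `generate_samples` ensures via Gumbel softmax before calling it), and the `Variable` wrapper is a legacy PyTorch idiom; a plain tensor behaves identically. A standalone restatement of the same logic, with hypothetical shapes:

```python
import torch
from torch.distributions import Categorical

B, G = 8, 3
pi = torch.softmax(torch.randn(B, G), dim=-1)  # normalized mixing weights
sigma = torch.rand(B, G) + 0.5
mu = torch.randn(B, G)

idx = Categorical(pi).sample().unsqueeze(1)    # pick one component per row
z = torch.randn(B, 1)                          # standard normal noise
sample = z * sigma.gather(1, idx) + mu.gather(1, idx)  # draw from the chosen component
```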