Fitting an Elastic Net with a precomputed Gram Matrix and Weighted Samples

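The following example shows how to precompute the Gram matrix while using weighted samples with an :class:`~sklearn.linear_model.ElasticNet`.

If weighted samples are used, the design matrix must be centered and then rescaled by the square root of the weight vector before the Gram matrix is computed.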

Note

The sample_weight vector is also rescaled to sum to n_samples; see the documentation for the sample_weight parameter of fit.

Let's start by loading the dataset and creating some sample weights.

import numpy as np

from sklearn.datasets import make_regression

rng = np.random.RandomState(0)

n_samples = int(1e5)
X, y = make_regression(n_samples=n_samples, noise=0.5, random_state=rng)

sample_weight = rng.lognormal(size=n_samples)
# normalize the sample weights so that they sum to n_samples
normalized_weights = sample_weight * (n_samples / (sample_weight.sum()))
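
The normalization above mirrors the rescaling described in the note. As a minimal sketch (on a smaller, hypothetical dataset, and with a tighter `tol` than the default purely so that the comparison is numerically clean), one can check that multiplying the weights by a constant leaves the fitted coefficients unchanged:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
n_samples = 300  # smaller than in the example, to keep the check fast
X, y = make_regression(
    n_samples=n_samples, n_features=10, noise=0.5, random_state=rng
)
raw_weights = rng.lognormal(size=n_samples)
normalized = raw_weights * (n_samples / raw_weights.sum())

# tol is tightened (an assumption of this sketch, not part of the example)
# so that solver stopping effects do not blur the comparison.
params = dict(alpha=0.01, tol=1e-8)
fit_raw = ElasticNet(**params).fit(X, y, sample_weight=raw_weights)
fit_scaled = ElasticNet(**params).fit(X, y, sample_weight=10.0 * raw_weights)
fit_normalized = ElasticNet(**params).fit(X, y, sample_weight=normalized)

# The normalized weights sum to n_samples.
weights_sum_to_n = np.isclose(normalized.sum(), n_samples)
# Rescaling the weights by a constant should not change the coefficients.
scale_invariant = np.allclose(fit_raw.coef_, fit_scaled.coef_) and np.allclose(
    fit_raw.coef_, fit_normalized.coef_
)
print(weights_sum_to_n, scale_invariant)
```

Normalizing the weights by hand is still useful here, because the same normalized vector is needed to build the Gram matrix.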

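To fit the elastic net using the precompute option together with the sample weights, we must first center the design matrix (using the weighted column means) and rescale it by the square root of the normalized weights before computing the Gram matrix.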

# weighted column means of the design matrix
X_offset = np.average(X, axis=0, weights=normalized_weights)
X_centered = X - X_offset
X_scaled = X_centered * np.sqrt(normalized_weights)[:, np.newaxis]
gram = np.dot(X_scaled.T, X_scaled)
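
As a sanity check on the construction above, the following sketch (using a small, hypothetical random matrix rather than the example's dataset) confirms that scaling the rows by the square root of the weights gives the same Gram matrix as the explicit weighted product of the centered matrix with itself, and that the centered matrix has a weighted column mean of zero:

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features = 200, 5  # illustrative sizes only
X = rng.normal(size=(n_samples, n_features))
w = rng.lognormal(size=n_samples)
w = w * (n_samples / w.sum())  # normalize so the weights sum to n_samples

# Same steps as in the example: center with the weighted mean,
# then scale each row by the square root of its weight.
X_offset = np.average(X, axis=0, weights=w)
X_centered = X - X_offset
X_scaled = X_centered * np.sqrt(w)[:, np.newaxis]
gram = X_scaled.T @ X_scaled

# Explicit weighted form: X_centered^T diag(w) X_centered.
gram_explicit = X_centered.T @ (w[:, np.newaxis] * X_centered)

gram_matches = np.allclose(gram, gram_explicit)
weighted_mean_is_zero = np.allclose(
    np.average(X_centered, axis=0, weights=w), 0.0
)
print(gram_matches, weighted_mean_is_zero)
```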

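We can now proceed with fitting. We must pass the centered design matrix to fit; otherwise the elastic net estimator will detect that it is uncentered and discard the Gram matrix we passed. However, if we pass the scaled design matrix, the preprocessing code will incorrectly rescale it a second time.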

from sklearn.linear_model import ElasticNet

lm = ElasticNet(alpha=0.01, precompute=gram)
lm.fit(X_centered, y, sample_weight=normalized_weights)
ElasticNet(alpha=0.01,
           precompute=array([[ 9.98809919e+04, -4.48938813e+02, -1.03237920e+03, ...,
        -2.25349312e+02, -3.53959628e+02, -1.67451144e+02],
       [-4.48938813e+02,  1.00768662e+05,  1.19112072e+02, ...,
        -1.07963978e+03,  7.47987268e+01, -5.76195467e+02],
       [-1.03237920e+03,  1.19112072e+02,  1.00393284e+05, ...,
        -3.07582983e+02,  6.66670169e+02,  2.65799352e+02],
       ...,
       [-2.25349312e+02, -1.07963978e+03, -3.07582983e+02, ...,
         9.99891212e+04, -4.58195950e+02, -1.58667835e+02],
       [-3.53959628e+02,  7.47987268e+01,  6.66670169e+02, ...,
        -4.58195950e+02,  9.98350372e+04,  5.60836363e+02],
       [-1.67451144e+02, -5.76195467e+02,  2.65799352e+02, ...,
        -1.58667835e+02,  5.60836363e+02,  1.00911944e+05]]))
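
To illustrate the two points made before the fit, here is a hedged sketch on a smaller, hypothetical dataset. It compares the precomputed-Gram fit with a plain fit (`precompute=False`) and then passes an uncentered matrix together with the same Gram matrix to show the fallback. The tightened `tol` and the `+ 1.0` shift are assumptions of this sketch, chosen only to make the comparison unambiguous.

```python
import warnings

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
n_samples = 500  # smaller than in the example, to keep the check fast
X, y = make_regression(
    n_samples=n_samples, n_features=20, noise=0.5, random_state=rng
)
w = rng.lognormal(size=n_samples)
w = w * (n_samples / w.sum())

X_centered = X - np.average(X, axis=0, weights=w)
X_scaled = X_centered * np.sqrt(w)[:, np.newaxis]
gram = X_scaled.T @ X_scaled

# tol is tightened so both solvers converge to (nearly) the same point.
params = dict(alpha=0.01, tol=1e-10)
lm_gram = ElasticNet(precompute=gram, **params)
lm_gram.fit(X_centered, y, sample_weight=w)
lm_plain = ElasticNet(precompute=False, **params)
lm_plain.fit(X, y, sample_weight=w)

# Centering only shifts the intercept, so the coefficients should agree.
coef_match = np.allclose(lm_gram.coef_, lm_plain.coef_, rtol=1e-5, atol=1e-5)

# Passing an uncentered matrix with the same Gram matrix: the estimator
# is expected to warn and recompute the Gram matrix itself.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lm_fallback = ElasticNet(precompute=gram, **params)
    lm_fallback.fit(X + 1.0, y, sample_weight=w)

fallback_warned = len(caught) > 0
fallback_coef_match = np.allclose(
    lm_fallback.coef_, lm_plain.coef_, rtol=1e-5, atol=1e-5
)
print(coef_match, fallback_warned, fallback_coef_match)
```

The fallback still produces the right coefficients; what is lost is the benefit of having precomputed the Gram matrix.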


Total running time of the script: (0 minutes 0.486 seconds)

Related examples

Metadata Routing

Probability calibration of classifiers

SGD: Weighted samples

SVM: Weighted samples
