StandardScaler#

class sklearn.preprocessing.StandardScaler(*, copy=True, with_mean=True, with_std=True)#

Standardize features by removing the mean and scaling to unit variance.

The standard score of a sample x is calculated as:

z = (x - u) / s

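where u is the mean of the training samples, or zero if with_mean=False, and s is the standard deviation of the training samples, or one if with_std=False.

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. The mean and standard deviation are then stored so that transform can apply them to later data.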
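As a minimal sketch of the formula above, the snippet below recomputes the standard score by hand with NumPy and compares it with the scaler's output. The array X is hypothetical toy data chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 4 samples, 2 features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])

scaler = StandardScaler().fit(X)

# Manual z-score, computed independently for each feature (column):
# u is the per-feature mean, s the per-feature standard deviation.
u = X.mean(axis=0)
s = X.std(axis=0)  # NumPy uses ddof=0 by default
Z_manual = (X - u) / s

Z_sklearn = scaler.transform(X)
print(np.allclose(Z_manual, Z_sklearn))  # True
```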

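Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with zero mean and unit variance).

For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variances of the same order. If a feature has a variance that is orders of magnitude larger than the others, it might dominate the objective function and make the estimator unable to learn from the other features correctly as expected.

StandardScaler is sensitive to outliers, and the features may scale differently from each other in the presence of outliers. For an example visualization, refer to Compare StandardScaler with other scalers.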
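The sensitivity to outliers can be illustrated with a small sketch. The feature below is hypothetical: nine inliers and one extreme value. Because the outlier inflates the standard deviation, the inliers end up squeezed into a very narrow range after scaling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single feature: nine inliers and one extreme outlier.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000], dtype=float).reshape(-1, 1)

z = StandardScaler().fit_transform(x)

# Range covered by the nine inliers after scaling.
inlier_spread = z[:9].max() - z[:9].min()
print(inlier_spread)  # a small fraction of one standard deviation
print(z[9, 0])        # the outlier sits far from the rest
```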

This scaler can also be applied to sparse CSR or CSC matrices by passing with_mean=False to avoid breaking the sparsity structure of the data.
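A minimal sketch of the sparse case, assuming SciPy is installed. The matrix is hypothetical toy data. With with_mean=False the result stays sparse and keeps the same number of stored entries, whereas asking for centering on sparse input raises an error.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Hypothetical sparse input in CSR format.
X = sparse.csr_matrix(np.array([[0.0, 2.0, 0.0],
                                [0.0, 0.0, 3.0],
                                [4.0, 0.0, 0.0],
                                [0.0, 6.0, 0.0]]))

# Scaling without centering keeps zeros as zeros.
X_scaled = StandardScaler(with_mean=False).fit_transform(X)
print(sparse.issparse(X_scaled))  # True
print(X_scaled.nnz == X.nnz)      # True

# Centering would turn zeros into non-zeros, so it is rejected.
raised = False
try:
    StandardScaler(with_mean=True).fit(X)
except ValueError:
    raised = True
print(raised)  # True
```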

Read more in the User Guide.

Parameters:

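copy : bool, default=True: If False, try to avoid a copy and scale in place instead. This is not guaranteed to always work in place; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
with_mean : bool, default=True: If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.
with_std : bool, default=True: If True, scale the data to unit variance (or equivalently, unit standard deviation).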

Attributes:

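scale_ : ndarray of shape (n_features,) or None: Per-feature relative scaling of the data to achieve zero mean and unit variance. Generally this is calculated using np.sqrt(var_). If a variance is zero, unit variance cannot be achieved, and the data is left as-is, giving a scaling factor of 1. scale_ is equal to None when with_std=False.

Added in version 0.17: scale_
mean_ : ndarray of shape (n_features,) or None: The mean value for each feature in the training set. Equal to None when with_mean=False and with_std=False.
var_ : ndarray of shape (n_features,) or None: The variance for each feature in the training set. Used to compute scale_. Equal to None when with_mean=False and with_std=False.
n_features_in_ : int: Number of features seen during fit.

Added in version 0.24.
feature_names_in_ : ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.
n_samples_seen_ : int or ndarray of shape (n_features,): The number of samples processed by the estimator for each feature. If there are no missing samples, n_samples_seen_ will be an integer, otherwise it will be an array of dtype int. If sample_weight is used, it will be a float (if there is no missing data) or an array of dtype float that sums the weights seen so far. It is reset on new calls to fit, but increments across partial_fit calls.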
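The fitted attributes can be inspected directly. The sketch below uses hypothetical toy data whose second feature is constant, which shows the zero-variance case: var_ is 0 for that feature, scale_ falls back to 1, and the feature is only centered.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: the second feature is constant (zero variance).
X = np.array([[1.0, 5.0],
              [3.0, 5.0],
              [5.0, 5.0]])

scaler = StandardScaler().fit(X)
print(scaler.mean_)            # [3. 5.]
print(scaler.var_)             # first entry is 8/3, second is 0
print(scaler.scale_)           # second entry is 1 because the variance is 0
print(scaler.n_samples_seen_)  # 3

# The constant feature is centered but not rescaled.
constant_col = scaler.transform(X)[:, 1]
print(constant_col)            # [0. 0. 0.]

# Without scaling, scale_ is None.
no_std = StandardScaler(with_std=False).fit(X)
print(no_std.scale_)           # None
```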

See also

scale: Equivalent function without the estimator API.
PCA: Further removes the linear correlation across features with whiten=True.

Notes

NaNs are treated as missing values: disregarded in fit, and maintained in transform.
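A small sketch of the NaN handling, using hypothetical toy data with one missing entry. The statistics of the first feature are computed from its two observed values only, and the NaN is passed through unchanged by transform.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data with one missing value in the first feature.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0]])

scaler = StandardScaler().fit(X)
print(scaler.mean_)            # [ 2. 20.]
print(scaler.n_samples_seen_)  # per-feature counts: [2 3]

Z = scaler.transform(X)
print(np.isnan(Z[1, 0]))       # True: the NaN is kept
```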

We use a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to affect model performance.
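The equivalence with numpy.std(x, ddof=0) can be checked on a short hypothetical sample. The values below are chosen so that the biased standard deviation is exactly 2.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical sample: mean 5, sum of squared deviations 32, n = 8.
x = np.array([[2.0], [4.0], [4.0], [4.0], [5.0], [5.0], [7.0], [9.0]])

scaler = StandardScaler().fit(x)
biased = np.std(x, ddof=0)    # sqrt(32 / 8) = 2.0
unbiased = np.std(x, ddof=1)  # sqrt(32 / 7), slightly larger

print(scaler.scale_[0], biased, unbiased)
```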

Examples

>>> from sklearn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> print(scaler.fit(data))
StandardScaler()
>>> print(scaler.mean_)
[0.5 0.5]
>>> print(scaler.transform(data))
[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]
>>> print(scaler.transform([[2, 2]]))
[[3. 3.]]

fit(X, y=None, sample_weight=None)#

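Compute the mean and standard deviation to be used for later scaling.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features): The data used to compute the mean and standard deviation used for later scaling along the features axis.
y : None: Ignored.
sample_weight : array-like of shape (n_samples,), default=None: Individual weights for each sample.

Added in version 0.24: support for the sample_weight parameter in StandardScaler.

Returns:

self : object: Fitted scaler.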
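As a sketch of how sample_weight enters the statistics, the hypothetical example below gives the last sample a weight of 2 and compares the result with fitting on data where that sample is simply repeated. The two fits are expected to agree on the mean and variance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: the last sample gets twice the weight of the others.
X = np.array([[1.0], [2.0], [4.0]])
w = np.array([1.0, 1.0, 2.0])
weighted = StandardScaler().fit(X, sample_weight=w)

# Same information expressed by repeating the last row.
X_repeated = np.array([[1.0], [2.0], [4.0], [4.0]])
repeated = StandardScaler().fit(X_repeated)

print(weighted.mean_, repeated.mean_)  # both [2.75]
print(weighted.var_, repeated.var_)    # both [1.6875]
```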

fit_transform(X, y=None, **fit_params)#

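Fit to data, then transform it.

Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X : array-like of shape (n_samples, n_features): Input samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None: Target values (None for unsupervised transformations).
**fit_params : dict: Additional fit parameters.

Returns:

X_new : ndarray of shape (n_samples, n_features_new): Transformed array.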
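A brief sketch, on hypothetical toy data, showing that fit_transform gives the same result as calling fit and then transform on the same array.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data.
X = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0]])

one_step = StandardScaler().fit_transform(X)
two_steps = StandardScaler().fit(X).transform(X)

print(np.allclose(one_step, two_steps))  # True
```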

get_feature_names_out(input_features=None)#

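Get output feature names for transformation.

Parameters:

input_features : array-like of str or None, default=None

Input features.

If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out : ndarray of str objects: Same as input features.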

get_metadata_routing()#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)#

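Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.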
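For illustration, the returned dictionary mirrors the constructor arguments:

```python
from sklearn.preprocessing import StandardScaler

params = StandardScaler(with_mean=False).get_params()
print(params)  # {'copy': True, 'with_mean': False, 'with_std': True}
```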

inverse_transform(X, copy=None)#

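Scale back the data to the original representation.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features): The data used to scale along the features axis.
copy : bool, default=None: Copy the input X or not.

Returns:

X_tr : {ndarray, sparse matrix} of shape (n_samples, n_features): Transformed array.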
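A round-trip sketch on hypothetical toy data: applying inverse_transform to the output of transform recovers the original values up to floating-point error.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data on two different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [6.0, 900.0]])

scaler = StandardScaler().fit(X)
Z = scaler.transform(X)
X_back = scaler.inverse_transform(Z)

print(np.allclose(X, X_back))  # True
```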

partial_fit(X, y=None, sample_weight=None)#

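Online computation of the mean and standard deviation on X for later scaling.

All of X is processed as a single batch. This is intended for cases when fit is not feasible, either because n_samples is very large or because X is read from a continuous stream.

The algorithm for the incremental mean and standard deviation is given in Equation 1.5a,b in Chan, Tony F., Gene H. Golub, and Randall J. LeVeque. "Algorithms for computing the sample variance: Analysis and recommendations." The American Statistician 37.3 (1983): 242-247.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features): The data used to compute the mean and standard deviation used for later scaling along the features axis.
y : None: Ignored.
sample_weight : array-like of shape (n_samples,), default=None: Individual weights for each sample.

Added in version 0.24: support for the sample_weight parameter in StandardScaler.

Returns:

self : object: Fitted scaler.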
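A minimal sketch of incremental fitting, assuming the data arrives in ten chunks. The random data and the chunk count are hypothetical. After all chunks are seen, the statistics are expected to match a single call to fit on the full array, up to floating-point error.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data set that we pretend is too large to fit at once.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(1000, 4))

# Reference: one pass over the full data.
full = StandardScaler().fit(X)

# Incremental: feed the data in ten batches.
online = StandardScaler()
for batch in np.array_split(X, 10):
    online.partial_fit(batch)

print(np.allclose(full.mean_, online.mean_))  # True
print(np.allclose(full.var_, online.var_))    # True
print(online.n_samples_seen_)                 # 1000
```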

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → StandardScaler#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in fit.

Returns:

self : object: The updated object.

set_inverse_transform_request(*, copy: bool | None | str = '$UNCHANGED$') → StandardScaler#

Request metadata passed to the inverse_transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

copy : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for copy parameter in inverse_transform.

Returns:

self : object: The updated object.

set_output(*, transform=None)#

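Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: the "polars" option was added.

Returns:

self : estimator instance: Estimator instance.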

set_params(**params)#

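Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.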
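The two forms described above can be sketched as follows. The step names "scaler" and "clf" and the choice of LogisticRegression are hypothetical and only serve to show the nested parameter syntax.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are addressed by name.
scaler = StandardScaler()
scaler.set_params(with_mean=False)
print(scaler.with_mean)  # False

# Nested object: parameters are addressed as <component>__<parameter>.
pipe = Pipeline([("scaler", StandardScaler()),
                 ("clf", LogisticRegression())])
pipe.set_params(scaler__with_std=False, clf__C=0.5)
print(pipe.named_steps["scaler"].with_std)  # False
print(pipe.named_steps["clf"].C)            # 0.5
```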

set_partial_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → StandardScaler#

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in partial_fit.

Returns:

self : object: The updated object.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') → StandardScaler#

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

copy : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for copy parameter in transform.

Returns:

self : object: The updated object.

transform(X, copy=None)#

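Perform standardization by centering and scaling.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features): The data used to scale along the features axis.
copy : bool, default=None: Copy the input X or not.

Returns:

X_tr : {ndarray, sparse matrix} of shape (n_samples, n_features): Transformed array.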
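A short sketch of the usual pattern: the statistics come from the training data only and are then applied unchanged to new data. The arrays X_train and X_test are hypothetical. With the default settings the input array is left untouched.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical split: statistics are learned from X_train only.
X_train = np.array([[0.0], [10.0]])  # mean 5, standard deviation 5
X_test = np.array([[5.0], [20.0]])

scaler = StandardScaler().fit(X_train)
Z_test = scaler.transform(X_test)

print(Z_test)        # (5 - 5) / 5 = 0 and (20 - 5) / 5 = 3
print(X_test[1, 0])  # 20.0: the input was copied, not modified
```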

Gallery examples#

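Release Highlights for scikit-learn 1.5

Release Highlights for scikit-learn 1.4

Release Highlights for scikit-learn 1.2

Release Highlights for scikit-learn 1.1

Release Highlights for scikit-learn 1.0

Release Highlights for scikit-learn 0.23

Release Highlights for scikit-learn 0.22

Lasso model selection: AIC-BIC / cross-validation

Tweedie regression on insurance claims

MNIST classification using multinomial logistic regression and L1 regularization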