SplineTransformer

class sklearn.preprocessing.SplineTransformer(n_knots=5, degree=3, *, knots='uniform', extrapolation='constant', include_bias=True, order='C', sparse_output=False)

Generate univariate B-spline bases for features.

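Generate a new feature matrix consisting of n_splines = n_knots + degree - 1 (n_knots - 1 for extrapolation="periodic") spline basis functions (B-splines) for each feature, where the polynomial degree of each basis function is given by degree.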

To learn more about the SplineTransformer class, see: Time-related feature engineering

Read more in the User Guide.

Added in version 1.0.

Parameters:

n_knots : int, default=5

Number of knots of the splines if knots equals one of {'uniform', 'quantile'}. Must be larger than or equal to 2. Ignored if knots is array-like.

degree : int, default=3

The polynomial degree of the spline basis. Must be a non-negative integer.

knots : {'uniform', 'quantile'} or array-like of shape (n_knots, n_features), default='uniform'

Sets the knot positions such that first knot <= features <= last knot.

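If 'uniform', n_knots knots are distributed uniformly between the minimum and maximum values of the features.
If 'quantile', they are distributed uniformly along the quantiles of the features.
If an array-like is given, it directly specifies the sorted knot positions, including the boundary knots. Note that, internally, degree additional knots are added before the first knot, and the same number after the last knot.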

extrapolation : {'error', 'constant', 'linear', 'continue', 'periodic'}, default='constant'

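If 'error', values outside the minimum and maximum values of the training features raise a ValueError. If 'constant', the values of the splines at the minimum and maximum of the features are used as constant extrapolation. If 'linear', a linear extrapolation is used. If 'continue', the splines are extrapolated as is, i.e. option extrapolate=True in scipy.interpolate.BSpline. If 'periodic', periodic splines with a periodicity equal to the distance between the first and last knot are used. Periodic splines enforce equal function values and derivatives at the first and last knot. For example, this makes it possible to avoid introducing an arbitrary jump between Dec 31st and Jan 1st in spline features derived from a naturally periodic "day-of-year" input feature. In this case it is recommended to set the knot values manually to control the period.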

include_bias : bool, default=True

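If False, the last spline element inside the data range of each feature is dropped. Because the B-spline basis functions sum to one at every data point, they implicitly include a bias term, i.e. a column of ones, which acts as the intercept term in a linear model.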

order : {'C', 'F'}, default='C'

Order of the output array in the dense case. 'F' order is faster to compute, but may slow down subsequent estimators.

sparse_output : bool, default=False

Returns a sparse CSR matrix if True, otherwise returns an array. This option is only available with scipy>=1.8.

Added in version 1.2.

Attributes:

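bsplines_ (list of shape (n_features,)): List of BSpline objects, one for each feature.
n_features_in_ (int): The total number of input features.
feature_names_in_ (ndarray of shape (n_features_in_,)): Names of features seen during fit. Defined only when X has feature names that are all strings.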

Added in version 1.0.
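n_features_out_ (int): The total number of output features, computed as n_features * n_splines, where n_splines is the number of basis elements of the B-splines: n_knots + degree - 1 for non-periodic splines and n_knots - 1 for periodic splines. If include_bias=False, it is only n_features * (n_splines - 1).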

See also

KBinsDiscretizer: Transformer that bins continuous data into intervals.
PolynomialFeatures: Transformer that generates polynomial and interaction features.

Notes

High degrees and a high number of knots can cause overfitting.

See examples/linear_model/plot_polynomial_interpolation.py.

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import SplineTransformer
>>> X = np.arange(6).reshape(6, 1)
>>> spline = SplineTransformer(degree=2, n_knots=3)
>>> spline.fit_transform(X)
array([[0.5 , 0.5 , 0.  , 0.  ],
       [0.18, 0.74, 0.08, 0.  ],
       [0.02, 0.66, 0.32, 0.  ],
       [0.  , 0.32, 0.66, 0.02],
       [0.  , 0.08, 0.74, 0.18],
       [0.  , 0.  , 0.5 , 0.5 ]])

fit(X, y=None, sample_weight=None)

Compute knot positions of splines.

Parameters:

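X (array-like of shape (n_samples, n_features)): The data.
y (None): Ignored.
sample_weight (array-like of shape (n_samples,), default=None): Individual weights for each sample. Used to calculate quantiles if knots="quantile". For knots="uniform", zero-weighted observations are ignored when finding the min and max of X.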

Returns:

self (object): Fitted transformer.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

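X (array-like of shape (n_samples, n_features)): Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None): Target values (None for unsupervised transformations).
**fit_params (dict): Additional fit parameters.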

Returns:

X_new (ndarray of shape (n_samples, n_features_new)): Transformed array.

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:

input_features : array-like of str or None, default=None

Input features.

If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out (ndarray of str objects): Transformed feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SplineTransformer

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True : metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False : metadata is not requested and the meta-estimator will not pass it to fit.
None : metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str : metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in fit.

Returns:

self (object): The updated object.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.

"default" : Default output format of a transformer
"pandas" : DataFrame output
"polars" : Polars output
None : Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:

self (estimator instance): Estimator instance.

set_params(**params)

Set the parameters of this estimator.

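The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.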

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.

transform(X)

Transform each feature to B-splines.

Parameters:

X (array-like of shape (n_samples, n_features)): The data to transform.

Returns:

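XBS ({ndarray, sparse matrix} of shape (n_samples, n_features * n_splines)): The matrix of features, where n_splines is the number of basis elements of the B-splines: n_knots + degree - 1 for non-periodic splines and n_knots - 1 for periodic splines. If include_bias=False, each feature contributes n_splines - 1 columns instead.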

Gallery examples

Release Highlights for scikit-learn 1.0

Time-related feature engineering

Polynomial and Spline interpolation

Evaluation of outlier detection estimators