mlxtend.preprocessing

mlxtend version: 0.23.1

CopyTransformer

CopyTransformer()

Transformer that returns a copy of the input array.

For usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/preprocessing/CopyTransformer/

Methods

fit(X, y=None)

Mock method. Does nothing.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] (default: None)

Returns

self

fit_transform(X, y=None)

Return a copy of the input array.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] (default: None)

Returns

X_copy : copy of the input X array.

get_metadata_routing()

Get metadata routing of this object.

Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.

Returns

routing : MetadataRequest

A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.

Parameters

**params : dict

Estimator parameters.

Returns

self : estimator instance

Estimator instance.

transform(X, y=None)

Return a copy of the input array.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] (default: None)

Returns

X_copy : copy of the input X array.
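What "a copy of the input array" implies can be illustrated in plain NumPy. The sketch below does not call CopyTransformer; it assumes that, for an ndarray input, the returned object behaves like the result of `ndarray.copy()`, and contrasts that with a view.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])

X_copy = X.copy()   # independent buffer (assumed to match what the transformer returns)
X_view = X[:]       # a view that still shares memory with X, shown for contrast

X_copy[0, 0] = 99.0  # modifies the copy only; X keeps its original value

print(X[0, 0])
print(np.shares_memory(X, X_copy), np.shares_memory(X, X_view))
```

Because the copy owns its own memory, later in-place steps in a pipeline cannot silently alter the original input.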

DenseTransformer

DenseTransformer(return_copy=True)

Convert a sparse array into a dense array.

For usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/preprocessing/DenseTransformer/

Methods

fit(X, y=None)

Mock method. Does nothing.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] (default: None)

Returns

self

fit_transform(X, y=None)

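Return a dense version of the input array.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] (default: None)

Returns

X_dense : dense version of the input X array.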

get_metadata_routing()

Get metadata routing of this object.

Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.

Returns

routing : MetadataRequest

A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.

Parameters

**params : dict

Estimator parameters.

Returns

self : estimator instance

Estimator instance.

transform(X, y=None)

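Return a dense version of the input array.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] (default: None)

Returns

X_dense : dense version of the input X array.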

MeanCenterer

MeanCenterer()

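Column centering of vectors and matrices.

Attributes

col_means : numpy.ndarray [n_columns]

NumPy array storing the mean values used for centering after fitting the MeanCenterer object.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/MeanCenterer/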

Methods

fit(X)

Gets the column means for mean centering.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Array of data vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

self

fit_transform(X)

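Fits and transforms an array.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Array of data vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

X_tr : {array-like, sparse matrix}, shape = [n_samples, n_features]

A copy of the input array with the columns centered.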

transform(X)

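Centers a NumPy array.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Array of data vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

X_tr : {array-like, sparse matrix}, shape = [n_samples, n_features]

A copy of the input array with the columns centered.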
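The arithmetic behind column centering can be sketched in plain NumPy. This is an illustration of the idea, not the MeanCenterer source; the variable names and toy data are assumptions made for the example.

```python
import numpy as np

# Toy data: 3 samples (rows), 2 features (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# "fit" step: one mean per column.
col_means = X.mean(axis=0)

# "transform" step: subtract the column means.
# The subtraction builds a new array, so X is left unchanged.
X_centered = X - col_means

print(col_means)
print(X_centered.mean(axis=0))  # every column mean is now (numerically) zero
```

Keeping the means from the fit step separate matters in practice: new data should be centered with the means learned from the training data, not with its own.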

TransactionEncoder

TransactionEncoder()

Encoder class for transaction data in Python lists

Parameters

None

Attributes

columns_ : list

List of unique names in the input list of lists X.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/

Methods

fit(X)

Learn unique column names from the transaction data.

Parameters

X : list of lists

A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction.

For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']]

fit_transform(X, sparse=False)

Fit a TransactionEncoder encoder and transform a dataset.

get_metadata_routing()

Get metadata routing of this object.

Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.

Returns

routing : MetadataRequest

A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict

Parameter names mapped to their values.

inverse_transform(array)

Transforms an encoded NumPy array back into transactions.

Parameters

array : NumPy array [n_transactions, n_unique_items]

The NumPy one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array, in alphabetical order.

For example, array([[True, False, True, True, False, True], [True, False, True, False, False, True], [True, False, True, False, False, False], [True, True, False, False, False, False], [False, False, True, True, True, True], [False, False, True, False, True, True], [False, False, True, False, True, False], [True, True, False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']

Returns

X : list of lists

A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction.

For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']]
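The decoding step can be sketched in pure Python. This is a minimal illustration of mapping boolean rows back to item names, not the TransactionEncoder source; `decode` is a hypothetical helper and the two rows are taken from the example above.

```python
# Column labels in alphabetical order, as in the example above.
columns = ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']

# Two one-hot encoded transactions (one boolean per column).
encoded = [[True, False, True, True, False, True],
           [True, True, False, False, False, False]]

def decode(array, columns):
    """Hypothetical helper: keep the label of every column flagged True."""
    return [[col for col, flag in zip(columns, row) if flag] for row in array]

decoded = decode(encoded, columns)
print(decoded)
```

In this sketch the items in each decoded transaction come back in column order, so the original ordering of items inside a transaction is not recoverable from the encoded form.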

set_inverse_transform_request(self: mlxtend.preprocessing.transactionencoder.TransactionEncoder, *, array: Union[bool, NoneType, str] = '$UNCHANGED$') -> mlxtend.preprocessing.transactionencoder.TransactionEncoder

Request metadata passed to the inverse_transform method.

Note that this method is only relevant if
``enable_metadata_routing=True`` (see :func:`sklearn.set_config`).
Please see :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.

The options for each parameter are:

- ``True``: metadata is requested, and passed to ``inverse_transform`` if provided. The request is ignored if metadata is not provided.

- ``False``: metadata is not requested and the meta-estimator will not pass it to ``inverse_transform``.

- ``None``: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

- ``str``: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (``sklearn.utils.metadata_routing.UNCHANGED``) retains the
existing request. This allows you to change the request for some
parameters and not others.

.. versionadded:: 1.3

.. note::
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
:class:`~sklearn.pipeline.Pipeline`. Otherwise it has no effect.

Parameters

array : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for array parameter in inverse_transform.

Returns

self : object

The updated object.

set_output(*, transform=None)

Set output container.

See :ref:`sphx_glr_auto_examples_miscellaneous_plot_set_output.py`
for an example on how to use the API.

Parameters

transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.
- "default": Default output format of a transformer
- "pandas": DataFrame output
- "polars": Polars output
- None: Transform configuration is unchanged
.. versionadded:: 1.4 "polars" option was added.

Returns

self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.

Parameters

**params : dict

Estimator parameters.

Returns

self : estimator instance

Estimator instance.

set_transform_request(self: mlxtend.preprocessing.transactionencoder.TransactionEncoder, *, sparse: Union[bool, NoneType, str] = '$UNCHANGED$') -> mlxtend.preprocessing.transactionencoder.TransactionEncoder

Request metadata passed to the transform method.

Note that this method is only relevant if
``enable_metadata_routing=True`` (see :func:`sklearn.set_config`).
Please see :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.

The options for each parameter are:

- ``True``: metadata is requested, and passed to ``transform`` if provided. The request is ignored if metadata is not provided.

- ``False``: metadata is not requested and the meta-estimator will not pass it to ``transform``.

- ``None``: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

- ``str``: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (``sklearn.utils.metadata_routing.UNCHANGED``) retains the
existing request. This allows you to change the request for some
parameters and not others.

.. versionadded:: 1.3

.. note::
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
:class:`~sklearn.pipeline.Pipeline`. Otherwise it has no effect.

Parameters

sparse : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sparse parameter in transform.

Returns

self : object

The updated object.

transform(X, sparse=False)

Transform transactions into a one-hot encoded NumPy array.

Parameters

X : list of lists

A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction.

For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']]

sparse : bool (default=False)

If True, transform will return a Compressed Sparse Row matrix instead of a regular NumPy array.

Returns

array : NumPy array [n_transactions, n_unique_items] if sparse=False (default), Compressed Sparse Row matrix otherwise

The one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array, in alphabetical order. The exact representation depends on the sparse argument.

For example, array([[True, False, True, True, False, True], [True, False, True, False, False, True], [True, False, True, False, False, False], [True, True, False, False, False, False], [False, False, True, True, True, True], [False, False, True, False, True, True], [False, False, True, False, True, False], [True, True, False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
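The fit and transform steps described above can be sketched in pure Python. This is a simplified illustration, not the TransactionEncoder implementation: `fit_columns` and `encode` are hypothetical helpers, the output is a list of lists rather than a NumPy array, and the sparse option is not covered.

```python
transactions = [['Apple', 'Beer', 'Rice', 'Chicken'],
                ['Apple', 'Beer', 'Rice'],
                ['Apple', 'Beer'],
                ['Apple', 'Bananas']]

def fit_columns(X):
    """Hypothetical helper: unique items over all transactions, sorted alphabetically."""
    return sorted({item for transaction in X for item in transaction})

def encode(X, columns):
    """Hypothetical helper: one boolean row per transaction, one column per item."""
    rows = []
    for transaction in X:
        present = set(transaction)  # set lookup keeps membership tests cheap
        rows.append([col in present for col in columns])
    return rows

columns = fit_columns(transactions)
encoded = encode(transactions, columns)

print(columns)
for row in encoded:
    print(row)
```

Separating the two steps mirrors the fit/transform split: the column list is learned once and can then be reused to encode other transactions with the same column layout.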

minmax_scaling

minmax_scaling(array, columns, min_val=0, max_val=1)

Min-max scaling of pandas DataFrames.

Parameters

array : pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns].
columns : array-like, shape = [n_columns]

Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...]
min_val : int or float, optional (default=0)

Minimum value after rescaling.
max_val : int or float, optional (default=1)

Maximum value after rescaling.

Returns

df_new : pandas DataFrame object.

Copy of the array or DataFrame with rescaled columns.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/minmax_scaling/
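The rescaling itself follows the usual min-max formula, which can be sketched in NumPy as below. This is not the mlxtend source: `minmax_scale_columns` is a hypothetical helper that only handles NumPy arrays with integer column indices and assumes no selected column is constant (a constant column would divide by zero).

```python
import numpy as np

def minmax_scale_columns(ary, columns, min_val=0.0, max_val=1.0):
    """Hypothetical helper: rescale selected columns to [min_val, max_val]."""
    out = np.array(ary, dtype=float)              # float copy; the input is untouched
    col_min = out[:, columns].min(axis=0)
    col_max = out[:, columns].max(axis=0)
    scaled = (out[:, columns] - col_min) / (col_max - col_min)   # now in [0, 1]
    out[:, columns] = scaled * (max_val - min_val) + min_val     # stretch and shift
    return out

X = np.array([[1, 10], [2, 9], [3, 8], [4, 7], [5, 6], [6, 5]])
X_scaled = minmax_scale_columns(X, columns=[0, 1])
X_wide = minmax_scale_columns(X, columns=[0], min_val=50, max_val=100)

print(X_scaled)
print(X_wide)
```

Only the listed columns are rescaled; in `X_wide` the second column keeps its original values.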

one_hot

one_hot(y, num_labels='auto', dtype='float')

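One-hot encoding of class labels

Parameters

y : array-like, shape = [n_classlabels]

Python list or NumPy array consisting of class labels.
num_labels : int or 'auto'

Number of unique labels in the class label array. If set to 'auto', the number of unique labels is inferred from the input array.
dtype : str

NumPy array type (float, float32, float64) of the output array.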

Returns

ary : numpy.ndarray, shape = [n_classlabels]

One-hot encoded array, where each sample is represented as a row vector in the returned array.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/one_hot/
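A minimal NumPy sketch of one-hot encoding is shown below. It is not the mlxtend source; `one_hot_sketch` is a hypothetical helper that assumes the labels are non-negative integers starting at 0, and it treats 'auto' as "largest label plus one".

```python
import numpy as np

def one_hot_sketch(y, num_labels='auto', dtype='float'):
    """Hypothetical helper: one row per sample, a single 1 in the label's column."""
    y = np.asarray(y)
    # Assumption: with 'auto', the number of columns is max(label) + 1.
    n_cols = int(y.max()) + 1 if num_labels == 'auto' else num_labels
    ary = np.zeros((len(y), n_cols), dtype=dtype)
    ary[np.arange(len(y)), y] = 1   # row i gets a 1 at column y[i]
    return ary

encoded = one_hot_sketch([0, 1, 2, 1])
padded = one_hot_sketch([0, 1], num_labels=4)

print(encoded)
print(padded.shape)
```

Passing num_labels explicitly is useful when a batch does not contain every class, since the output width would otherwise shrink to the largest label seen.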

shuffle_arrays_unison

shuffle_arrays_unison(arrays, random_seed=None)

Shuffle NumPy arrays in unison.

Parameters

arrays : array-like, shape = [n_arrays]

A list of NumPy arrays.
random_seed : int (default: None)

Sets the random state.

Returns

shuffled_arrays : A list of NumPy arrays after shuffling.

Examples

```
>>> import numpy as np
>>> from mlxtend.preprocessing import shuffle_arrays_unison
>>> X1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> y1 = np.array([1, 2, 3])
>>> X2, y2 = shuffle_arrays_unison(arrays=[X1, y1], random_seed=3)
>>> assert(X2.all() == np.array([[4, 5, 6], [1, 2, 3], [7, 8, 9]]).all())
>>> assert(y2.all() == np.array([2, 1, 3]).all())
```

For more usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/preprocessing/shuffle_arrays_unison/
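The idea of shuffling "in unison" is that a single permutation of row indices is applied to every array, so corresponding rows stay aligned. The sketch below illustrates this in NumPy; `shuffle_in_unison` is a hypothetical helper, not the mlxtend function, and its random-number handling is an assumption.

```python
import numpy as np

def shuffle_in_unison(arrays, random_seed=None):
    """Hypothetical helper: apply one shared permutation to all arrays."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    rng = np.random.RandomState(random_seed)
    order = rng.permutation(n)                  # one index order for every array
    return [np.asarray(a)[order] for a in arrays]

X1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y1 = np.array([1, 2, 3])
X2, y2 = shuffle_in_unison([X1, y1], random_seed=3)

# Whatever the order, each row of X2 still sits next to its original label.
for row, label in zip(X2, y2):
    print(row, label)
```

Shuffling each array with its own independent permutation would break the pairing between samples and labels, which is what the shared index order avoids.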

standardize

standardize(array, columns=None, ddof=0, return_params=False, params=None)

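Standardize columns in pandas DataFrames.

Parameters

array : pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns].
columns : array-like, shape = [n_columns] (default: None)

Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...] If None, standardizes all columns.
ddof : int (default: 0)

Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
return_params : bool (default: False)

If set to True, a dictionary is returned in addition to the standardized array. The parameter dictionary contains the column means ('avgs') and standard deviations ('stds') of the individual columns.
params : dict (default: None)

A dictionary with column means and standard deviations as returned by the standardize function if return_params was set to True. If a params dictionary is provided, the standardize function will use these instead of computing them from the current array.

Notes

If all values in a given column are the same, these values are all set to 0.0. The standard deviation in the parameters dictionary is consequently set to 1.0 to avoid dividing by zero.

Returns

df_new : pandas DataFrame object.

Copy of the array or DataFrame with standardized columns.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/standardize/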
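The behavior described above (ddof, the constant-column rule, and reusing parameters) can be sketched in NumPy. This is not the mlxtend source: `standardize_sketch` is a hypothetical helper that standardizes all columns of a NumPy array and always returns the parameter dictionary, using the 'avgs' and 'stds' keys named in the text.

```python
import numpy as np

def standardize_sketch(ary, ddof=0, params=None):
    """Hypothetical helper: z-score every column, optionally with given params."""
    ary = np.asarray(ary, dtype=float)
    if params is None:
        avgs = ary.mean(axis=0)
        stds = ary.std(axis=0, ddof=ddof)
        # Constant columns have std 0; use 1.0 instead so they become 0.0, not NaN.
        stds = np.where(stds == 0.0, 1.0, stds)
        params = {'avgs': avgs, 'stds': stds}
    return (ary - params['avgs']) / params['stds'], params

X_train = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])  # second column is constant
X_test = np.array([[4.0, 5.0]])

X_train_std, params = standardize_sketch(X_train)
# Reuse the training means and standard deviations for the test data.
X_test_std, _ = standardize_sketch(X_test, params=params)

print(X_train_std)
print(X_test_std)
```

Reusing the training parameters for new data keeps both sets on the same scale, which is the purpose of the params argument described above.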