TransactionEncoder: Converting item lists into transaction data for frequent itemset mining
Encoder class for transaction data in Python lists
> from mlxtend.preprocessing import TransactionEncoder
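Overview
Encodes database transaction data, given as a Python list of lists, into a NumPy array.
Example 1
Suppose we have the following transaction data: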
from mlxtend.preprocessing import TransactionEncoder
dataset = [['Apple', 'Beer', 'Rice', 'Chicken'],
['Apple', 'Beer', 'Rice'],
['Apple', 'Beer'],
['Apple', 'Bananas'],
['Milk', 'Beer', 'Rice', 'Chicken'],
['Milk', 'Beer', 'Rice'],
['Milk', 'Beer'],
['Apple', 'Bananas']]
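Using a TransactionEncoder object, we can transform this dataset into an array format suitable for typical machine learning APIs. Via the fit method, the TransactionEncoder learns the unique labels in the dataset, and via the transform method, it transforms the input dataset (a Python list of lists) into a one-hot encoded NumPy boolean array: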
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
te_ary
array([[ True, False, True, True, False, True],
[ True, False, True, False, False, True],
[ True, False, True, False, False, False],
[ True, True, False, False, False, False],
[False, False, True, True, True, True],
[False, False, True, False, True, True],
[False, False, True, False, True, False],
[ True, True, False, False, False, False]], dtype=bool)
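The NumPy array is boolean for memory efficiency when working with large datasets. If a classic integer representation is preferred, the array can be converted to the appropriate type: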
te_ary.astype("int")
array([[1, 0, 1, 1, 0, 1],
[1, 0, 1, 0, 0, 1],
[1, 0, 1, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 1],
[0, 0, 1, 0, 1, 1],
[0, 0, 1, 0, 1, 0],
[1, 1, 0, 0, 0, 0]])
After fitting, the unique column names that correspond to the data array shown above can be accessed via the columns_ attribute:
te.columns_
['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
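To make the fit and transform steps above concrete, here is a minimal pure-Python sketch of the same idea. The helper names sketch_fit and sketch_transform are hypothetical, and this is an illustration of the encoding, not mlxtend's actual implementation (which returns a NumPy array rather than nested lists).

```python
# Illustrative stand-in for fit/transform; NOT mlxtend's implementation.
dataset = [['Apple', 'Beer', 'Rice', 'Chicken'],
           ['Apple', 'Beer', 'Rice'],
           ['Apple', 'Beer'],
           ['Apple', 'Bananas'],
           ['Milk', 'Beer', 'Rice', 'Chicken'],
           ['Milk', 'Beer', 'Rice'],
           ['Milk', 'Beer'],
           ['Apple', 'Bananas']]


def sketch_fit(transactions):
    """Collect every unique item and return the items in sorted order."""
    return sorted({item for transaction in transactions for item in transaction})


def sketch_transform(transactions, columns):
    """Build one row of booleans per transaction, one entry per column."""
    rows = []
    for transaction in transactions:
        present = set(transaction)
        rows.append([column in present for column in columns])
    return rows


columns = sketch_fit(dataset)
encoded = sketch_transform(dataset, columns)
print(columns)     # the sorted unique item names
print(encoded[0])  # the first transaction as a row of booleans
```

Each row has one entry per learned column, so the first transaction yields True in the Apple, Beer, Chicken, and Rice positions, matching the first row of te_ary.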
For convenience, we can turn the encoded array into a pandas DataFrame:
import pandas as pd
pd.DataFrame(te_ary, columns=te.columns_)
| | Apple | Bananas | Beer | Chicken | Milk | Rice |
|---|---|---|---|---|---|---|
0 | True | False | True | True | False | True |
1 | True | False | True | False | False | True |
2 | True | False | True | False | False | False |
3 | True | True | False | False | False | False |
4 | False | False | True | True | True | True |
5 | False | False | True | False | True | True |
6 | False | False | True | False | True | False |
7 | True | True | False | False | False | False |
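One reason the one-hot layout is convenient for frequent itemset mining is that the support of a single item is just the fraction of rows in which its column is set. The following sketch computes this in plain Python from the integer form of the encoded array shown earlier; the variable names are illustrative only, and no mlxtend function is involved.

```python
# Per-item support computed from the one-hot rows (illustrative sketch).
columns = ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
encoded = [[1, 0, 1, 1, 0, 1],
           [1, 0, 1, 0, 0, 1],
           [1, 0, 1, 0, 0, 0],
           [1, 1, 0, 0, 0, 0],
           [0, 0, 1, 1, 1, 1],
           [0, 0, 1, 0, 1, 1],
           [0, 0, 1, 0, 1, 0],
           [1, 1, 0, 0, 0, 0]]

n_transactions = len(encoded)

# Support of an item = (number of transactions containing it) / (all transactions).
support = {name: sum(row[idx] for row in encoded) / n_transactions
           for idx, name in enumerate(columns)}

for name, value in support.items():
    print(f"{name}: {value}")
```

Here Beer appears in 6 of the 8 transactions, so its support is 0.75.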
If desired, we can turn the one-hot encoded array back into a transaction list of lists via the inverse_transform function:
first4 = te_ary[:4]
te.inverse_transform(first4)
[['Apple', 'Beer', 'Chicken', 'Rice'],
['Apple', 'Beer', 'Rice'],
['Apple', 'Beer'],
['Apple', 'Bananas']]
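The decoding step can be sketched in plain Python as follows. The function name sketch_inverse_transform is hypothetical and the code only illustrates the idea; it is not mlxtend's implementation.

```python
# Illustrative stand-in for inverse_transform; NOT mlxtend's implementation.
columns = ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
first4 = [[True, False, True, True, False, True],
          [True, False, True, False, False, True],
          [True, False, True, False, False, False],
          [True, True, False, False, False, False]]


def sketch_inverse_transform(rows, columns):
    """Keep the column name wherever the row holds a true value."""
    return [[name for name, flag in zip(columns, row) if flag]
            for row in rows]


decoded = sketch_inverse_transform(first4, columns)
print(decoded)
```

Because the names are read off in column order, the first transaction comes back as ['Apple', 'Beer', 'Chicken', 'Rice'] rather than in its original order; any duplicate items in the original transaction are likewise not recoverable from a boolean encoding.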
API
TransactionEncoder()
Encoder class for transaction data in Python lists
Parameters
None
Attributes
columns_: list
List of unique names in the X input list of lists
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/
Methods
fit(X)
Learn unique column names from the transaction data (a list of lists).
Parameters
- X : list of lists
A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction.
For example,
[['Apple', 'Beer', 'Rice', 'Chicken'],
['Apple', 'Beer', 'Rice'],
['Apple', 'Beer'],
['Apple', 'Bananas'],
['Milk', 'Beer', 'Rice', 'Chicken'],
['Milk', 'Beer', 'Rice'],
['Milk', 'Beer'],
['Apple', 'Bananas']]
fit_transform(X, sparse=False)
Fit a TransactionEncoder encoder and transform a dataset.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
Parameter names mapped to their values.
inverse_transform(array)
Transforms an encoded NumPy array back into transactions.
Parameters
- array : NumPy array [n_transactions, n_unique_items]
The NumPy one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetical order.
For example,
array([[True , False, True , True , False, True ],
[True , False, True , False, False, True ],
[True , False, True , False, False, False],
[True , True , False, False, False, False],
[False, False, True , True , True , True ],
[False, False, True , False, True , True ],
[False, False, True , False, True , False],
[True , True , False, False, False, False]])
The corresponding column labels are available as self.columns_,
e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
Returns
- X : list of lists
A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction.
For example, with the items of each transaction listed in column order,
[['Apple', 'Beer', 'Chicken', 'Rice'],
['Apple', 'Beer', 'Rice'],
['Apple', 'Beer'],
['Apple', 'Bananas'],
['Beer', 'Chicken', 'Milk', 'Rice'],
['Beer', 'Milk', 'Rice'],
['Beer', 'Milk'],
['Apple', 'Bananas']]
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns
self
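As a rough illustration of the <component>__<parameter> naming convention described above, the sketch below splits parameter names on the double underscore and groups them by component. The function split_nested_params and the parameter names in the usage are hypothetical; this is not the code that set_params itself runs.

```python
# Illustrative sketch of the "<component>__<parameter>" naming convention.
def split_nested_params(params):
    """Separate plain parameters from nested 'component__parameter' ones."""
    own, nested = {}, {}
    for key, value in params.items():
        component, delimiter, sub_key = key.partition('__')
        if delimiter:
            # Nested name: route the value to the named component.
            nested.setdefault(component, {})[sub_key] = value
        else:
            # Plain name: the parameter belongs to the object itself.
            own[key] = value
    return own, nested


# Hypothetical parameter names, chosen only for illustration.
own, nested = split_nested_params(
    {'verbose': True, 'scaler__with_mean': False, 'model__alpha': 0.1})
print(own)
print(nested)
```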
transform(X, sparse=False)
Transform transactions into a one-hot encoded NumPy array.
Parameters
- X : list of lists
A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction.
For example,
[['Apple', 'Beer', 'Rice', 'Chicken'],
['Apple', 'Beer', 'Rice'],
['Apple', 'Beer'],
['Apple', 'Bananas'],
['Milk', 'Beer', 'Rice', 'Chicken'],
['Milk', 'Beer', 'Rice'],
['Milk', 'Beer'],
['Apple', 'Bananas']]
- sparse : bool (default=False)
If True, transform will return a Compressed Sparse Row matrix instead of the regular NumPy array.
Returns
- array : NumPy array [n_transactions, n_unique_items] if sparse=False (default), Compressed Sparse Row matrix otherwise
The one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetical order. The exact representation depends on the sparse argument.
For example,
array([[True , False, True , True , False, True ],
[True , False, True , False, False, True ],
[True , False, True , False, False, False],
[True , True , False, False, False, False],
[False, False, True , True , True , True ],
[False, False, True , False, True , True ],
[False, False, True , False, True , False],
[True , True , False, False, False, False]])
The corresponding column labels are available as self.columns_,
e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
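To give a feel for what a Compressed Sparse Row representation stores, the following pure-Python sketch builds the three CSR arrays (data, indices, indptr) for a few transactions. It is an illustration under the assumption that the column names are already known; it is not mlxtend's implementation, and the variable names are chosen for this example only.

```python
# Illustrative sketch of the CSR layout for one-hot encoded transactions.
columns = ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
transactions = [['Apple', 'Beer', 'Rice', 'Chicken'],
                ['Apple', 'Bananas'],
                ['Milk', 'Beer']]

column_index = {name: idx for idx, name in enumerate(columns)}

indices = []   # column positions of all true entries, row after row
indptr = [0]   # indptr[i]:indptr[i + 1] slices out the entries of row i
for transaction in transactions:
    indices.extend(sorted(column_index[item] for item in set(transaction)))
    indptr.append(len(indices))

data = [True] * len(indices)  # every stored entry is True

print(indices)
print(indptr)
```

Only the positions of the true entries are stored, which is why the sparse form can save memory when each transaction contains a small fraction of all items. With SciPy available, arrays of this shape could be passed to scipy.sparse.csr_matrix((data, indices, indptr)) to obtain a matrix object.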