Fpmax - mlxtend

fpmax: Maximal itemsets via the FP-Max algorithm

Function implementing FP-Max to extract maximal itemsets for association rule mining

# Using fpmax

In this example, we will use the `fpmax` function from the `mlxtend` library.

Overview

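Apriori is one of the first and most popular algorithms for frequent itemset generation (frequent itemsets are then used for association rule mining). However, on datasets with a large number of unique items, the runtime of Apriori can be quite large, because it grows exponentially with the number of unique items.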
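To make the growth concrete: over n unique items there are 2^n - 1 non-empty candidate itemsets. The plain-Python sketch below (helper names are hypothetical, not part of mlxtend) only counts that worst-case search space. Apriori's pruning of supersets of infrequent itemsets can reduce the candidates it actually examines, so this shows the upper bound rather than Apriori's exact cost.

```python
from itertools import combinations


def count_candidate_itemsets(n_items: int) -> int:
    """Number of non-empty itemsets that can be formed from n_items unique items."""
    return 2 ** n_items - 1


def enumerate_candidates(items):
    """Yield every non-empty itemset (as a tuple) over the given items."""
    for k in range(1, len(items) + 1):
        yield from combinations(items, k)


# Example 1 below uses 11 unique items; larger item catalogs grow much faster.
for n in (5, 11, 20, 30):
    print(n, count_candidate_itemsets(n))
# 5 31
# 11 2047
# 20 1048575
# 30 1073741823

# Cross-check the closed form against explicit enumeration for a small case.
assert sum(1 for _ in enumerate_candidates(list("abcde"))) == count_candidate_itemsets(5)
```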

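In contrast to Apriori, FP-Growth is a frequent pattern generation algorithm that inserts items into a pattern search tree, which allows its runtime to increase linearly with respect to the number of unique items or entries.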
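A minimal sketch of the tree-insertion idea, assuming a simplified FP-tree: items below the minimum count are dropped, each transaction is sorted by descending item frequency, and transactions that start with the same items share one path whose node counts are incremented. The class and function names are hypothetical, and the sketch omits the header table and node links that a full FP-Growth or FP-Max implementation also keeps; it is not mlxtend's internal code.

```python
from collections import Counter


class FPNode:
    """One node of a simplified FP-tree: an item, a count, and child nodes."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Insert each transaction into a prefix tree, most frequent items first."""
    # First pass: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_count}

    root = FPNode()
    for t in transactions:
        # Keep frequent items only, ordered by descending count (ties by name),
        # so transactions with common frequent items share a prefix path.
        path = sorted((i for i in set(t) if i in frequent),
                      key=lambda i: (-counts[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item)
            child.count += 1
            node = child
    return root


def count_nodes(node):
    """Count the item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

tree = build_fp_tree(dataset, min_count=3)  # 3 of 5 transactions = 60%
kb = tree.children["Kidney Beans"]
print(kb.count)                    # 5
print(kb.children["Eggs"].count)   # 4
print(count_nodes(tree))           # 9
```

With these transactions and a minimum count of 3, the 18 frequent (item, transaction) occurrences collapse into 9 tree nodes, because every transaction starts with Kidney Beans and four of them continue with Eggs.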

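FP-Max is a variant of FP-Growth that focuses on obtaining maximal itemsets. An itemset X is said to be maximal if X is frequent and there exists no frequent super-pattern containing X. In other words, a frequent pattern X cannot be a sub-pattern of a larger frequent pattern if it is to qualify as a maximal itemset.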
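The definition can be checked directly when the complete collection of frequent itemsets is available: keep a set only if no other frequent set is a proper superset of it. The brute-force filter below is a hypothetical helper applied to a made-up toy input; it illustrates the definition and is not how FP-Max finds maximal itemsets.

```python
def maximal_itemsets(frequent_itemsets):
    """Keep only itemsets that have no frequent proper superset.

    Assumes `frequent_itemsets` contains *all* frequent itemsets; each one is
    converted to a frozenset so that `<` means "proper subset".
    """
    sets = [frozenset(s) for s in frequent_itemsets]
    return [s for s in sets if not any(s < other for other in sets)]


# Toy input: all frequent itemsets over items a, b, c (illustrative only).
frequent = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}]

# Sort the output so it prints deterministically.
print(sorted(sorted(s) for s in maximal_itemsets(frequent)))
# [['a', 'b'], ['a', 'c']]
```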

Example 1 -- Maximal Itemsets

The fpmax function expects data in a one-hot encoded pandas DataFrame. Suppose we have the following transaction data:

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

We can transform it into the right format via the TransactionEncoder as follows:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

	Apple	Corn	Dill	Eggs	Ice cream	Kidney Beans	Milk	Nutmeg	Onion	Unicorn	Yogurt
0	False	False	False	True	False	True	True	True	True	False	True
1	False	False	True	True	False	True	False	True	True	False	True
2	True	False	False	True	False	True	True	False	False	False	False
3	False	True	False	False	False	True	True	False	False	True	True
4	False	True	False	True	True	True	False	False	True	False	False
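To show what the encoded table represents, here is a plain-Python sketch that produces the same layout: one column per unique item in sorted order, and True wherever a transaction contains that item. It is an illustration under that assumption (the helper name is hypothetical), not TransactionEncoder's implementation.

```python
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]


def one_hot_encode(transactions):
    """Return (columns, rows): sorted unique items, one boolean row per transaction.

    Duplicate items inside a transaction (such as 'Onion' twice in the last
    transaction) collapse to a single True.
    """
    columns = sorted({item for t in transactions for item in t})
    rows = [[col in items for col in columns] for items in map(set, transactions)]
    return columns, rows


columns, rows = one_hot_encode(dataset)
print(columns)
print(rows[2])  # [True, False, False, True, False, True, True, False, False, False, False]
```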

Now, let us return the maximal itemsets with at least 60% support:

from mlxtend.frequent_patterns import fpmax

fpmax(df, min_support=0.6)

	support	itemsets
0	0.6	(5, 6)
1	0.6	(8, 3, 5)
2	0.6	(10, 5)
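Because the dataset is tiny, this result can be cross-checked by brute force: enumerate all 2047 itemsets over the 11 items, keep those with support of at least 0.6, and then keep only those without a frequent proper superset. This exhaustive sketch is for verification only and is not how fpmax works; the indices it prints refer to the sorted column order of the encoded table.

```python
from itertools import combinations

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

columns = sorted({item for t in dataset for item in t})  # same order as the encoded table
transactions = [set(t) for t in dataset]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


min_support = 0.6
# Exhaustively enumerate all 2**11 - 1 = 2047 candidate itemsets.
frequent = [frozenset(c)
            for k in range(1, len(columns) + 1)
            for c in combinations(columns, k)
            if support(frozenset(c)) >= min_support]
# Maximal: no other frequent itemset is a proper superset.
maximal = [s for s in frequent if not any(s < other for other in frequent)]

for s in maximal:
    print(support(s), sorted(columns.index(item) for item in s), sorted(s))
# 0.6 [5, 6] ['Kidney Beans', 'Milk']
# 0.6 [5, 10] ['Kidney Beans', 'Yogurt']
# 0.6 [3, 5, 8] ['Eggs', 'Kidney Beans', 'Onion']
```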

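By default, fpmax returns the column indices of the items, which may be useful in downstream operations such as association rule mining. For better readability, we can set use_colnames=True to convert these integer values into the respective item names: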

fpmax(df, min_support=0.6, use_colnames=True)

	support	itemsets
0	0.6	(Kidney Beans, Milk)
1	0.6	(Onion, Eggs, Kidney Beans)
2	0.6	(Kidney Beans, Yogurt)
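Since `use_colnames=True` only swaps each column index for the matching column name, the translation can be reproduced by hand. The sketch below copies the column order from the encoded table and the index itemsets from the earlier output; with a pandas DataFrame the same lookup would go through `df.columns`.

```python
# Column order of the one-hot table in Example 1.
columns = ['Apple', 'Corn', 'Dill', 'Eggs', 'Ice cream', 'Kidney Beans',
           'Milk', 'Nutmeg', 'Onion', 'Unicorn', 'Yogurt']

# Itemsets as reported with use_colnames=False (column indices).
index_itemsets = [frozenset({5, 6}), frozenset({8, 3, 5}), frozenset({10, 5})]

# Replace each index with its column name, keeping the frozenset type.
named_itemsets = [frozenset(columns[i] for i in s) for s in index_itemsets]
for s in named_itemsets:
    print(sorted(s))
# ['Kidney Beans', 'Milk']
# ['Eggs', 'Kidney Beans', 'Onion']
# ['Kidney Beans', 'Yogurt']
```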

API

fpmax(df, min_support=0.5, use_colnames=False, max_len=None, verbose=0)

Get maximal frequent itemsets from a one-hot DataFrame

Parameters

df : pandas DataFrame

pandas DataFrame in one-hot encoded format. Also supports DataFrames with sparse data; for more info, please see (https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#sparse-data-structures)

Please note that the old pandas SparseDataFrame format is no longer supported in mlxtend >= 0.17.2.

The allowed values are either 0/1 or True/False. For example,

           Apple  Bananas   Beer  Chicken   Milk   Rice
    0   True    False   True     True  False   True
    1   True    False   True    False  False   True
    2   True    False   True    False  False  False
    3   True     True  False    False  False  False
    4  False    False   True     True   True   True
    5  False    False   True    False   True   True
    6  False    False   True    False   True  False
    7   True     True  False    False  False  False
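A minimal validation sketch for the 0/1 or True/False rule, assuming the table is given as a list of rows; `check_one_hot` is a hypothetical helper, and mlxtend performs its own input checks, whose exact behavior is not shown here. Because Python treats True as 1 and False as 0, a single membership test accepts both encodings (and, as a side effect, the floats 0.0 and 1.0).

```python
def check_one_hot(rows):
    """Raise ValueError if any cell is not 0/1 or True/False (hypothetical helper)."""
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            # True == 1 and False == 0, so this accepts bool and int encodings.
            if value not in (0, 1):
                raise ValueError(f"row {r}, column {c}: {value!r} is not 0/1 or True/False")


check_one_hot([[True, False], [1, 0]])  # passes silently

try:
    check_one_hot([[1, 2]])
except ValueError as err:
    print(err)  # row 0, column 1: 2 is not 0/1 or True/False
```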

min_support : float (default: 0.5)

A float between 0 and 1 for minimum support of the itemsets returned. The support is computed as the fraction transactions_where_item(s)_occur / total_transactions.

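As a worked example of the support formula, the sketch below evaluates transactions_where_item(s)_occur / total_transactions on the eight-row example table from the `df` description, written as plain Python lists (the `support` helper is hypothetical).

```python
# The example table from the `df` parameter description, as plain lists.
columns = ["Apple", "Bananas", "Beer", "Chicken", "Milk", "Rice"]
rows = [
    [True,  False, True,  True,  False, True],
    [True,  False, True,  False, False, True],
    [True,  False, True,  False, False, False],
    [True,  True,  False, False, False, False],
    [False, False, True,  True,  True,  True],
    [False, False, True,  False, True,  True],
    [False, False, True,  False, True,  False],
    [True,  True,  False, False, False, False],
]


def support(itemset):
    """transactions_where_item(s)_occur / total_transactions for one itemset."""
    idx = [columns.index(item) for item in itemset]
    hits = sum(all(row[i] for i in idx) for row in rows)
    return hits / len(rows)


print(support({"Beer"}))           # 0.75  (6 of 8 rows)
print(support({"Beer", "Rice"}))   # 0.5   (4 of 8 rows)
print(support({"Apple", "Beer"}))  # 0.375 (3 of 8 rows)
```

With the default min_support=0.5, {Beer, Rice} would still qualify while {Apple, Beer} would not.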
use_colnames : bool (default: False)

If True, uses the DataFrame's column names in the returned DataFrame instead of column indices.

max_len : int (default: None)

Given the set of all maximal itemsets, return those whose length is less than max_len. If None (default), all possible itemset lengths are evaluated.

verbose : int (default: 0)

Shows the stages of conditional tree generation.

Returns

pandas DataFrame with columns ['support', 'itemsets'] of all maximal itemsets that are >= min_support and shorter than max_len (if max_len is not None). Each itemset in the 'itemsets' column is of type frozenset, which is a Python built-in type that behaves similarly to sets except that it is immutable (for more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset).
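Because each itemset is a frozenset, the usual set operations apply when filtering results, for example to find itemsets containing a given item. The plain-Python sketch below uses an itemset from Example 1; on the returned DataFrame the same tests could be applied element-wise to the 'itemsets' column (for instance with `Series.apply`).

```python
# An itemset as it appears in the 'itemsets' column of Example 1.
itemset = frozenset({"Onion", "Eggs", "Kidney Beans"})

# Membership and subset tests work exactly as with ordinary sets.
print("Eggs" in itemset)                         # True
print(frozenset({"Eggs", "Onion"}) <= itemset)   # True

# Unlike a set, a frozenset is hashable, so it can serve as a dict key.
supports = {itemset: 0.6}
print(supports[frozenset({"Kidney Beans", "Eggs", "Onion"})])  # 0.6

# It is also immutable: there is no add() method.
try:
    itemset.add("Milk")
except AttributeError:
    print("frozenset has no add()")
```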

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpmax/
