Mlxtend.frequent patterns

mlxtend version: 0.23.1

apriori

apriori(df, min_support=0.5, use_colnames=False, max_len=None, verbose=0, low_memory=False)

Get frequent itemsets from a one-hot encoded DataFrame

Parameters

df : pandas DataFrame

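A pandas DataFrame in the encoded format. DataFrames with sparse data are also supported; for more information, see https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#sparse-data-structures

Note that the legacy pandas SparseDataFrame format is no longer supported in mlxtend >= 0.17.2.

The allowed values are either 0/1 or True/False. For example,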

       Apple  Bananas   Beer  Chicken   Milk   Rice
    0   True    False   True     True  False   True
    1   True    False   True    False  False   True
    2   True    False   True    False  False  False
    3   True     True  False    False  False  False
    4  False    False   True     True   True   True
    5  False    False   True    False   True   True
    6  False    False   True    False   True  False
    7   True     True  False    False  False  False
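min_support : float (default: 0.5)

A float between 0 and 1 giving the minimum support of the itemsets returned. The support is computed as the number of transactions in which the itemset occurs divided by the total number of transactions.
use_colnames : bool (default: False)

If True, uses the DataFrame's column names in the returned DataFrame instead of column indices.
max_len : int (default: None)

Maximum length of the itemsets generated. If None (default), all possible itemset lengths (under the apriori condition) are evaluated.
verbose : int (default: 0)

If >= 1 and low_memory is True, shows the number of iterations. If >= 1 and low_memory is False, shows the number of combinations.
low_memory : bool (default: False)

If True, uses an iterator to search for combinations above min_support. Note that low_memory=True should only be used on large datasets when memory resources are limited, because this implementation is approximately 3-6x slower than the default.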
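The support definition given for min_support can be checked by hand. The following is a minimal plain-Python sketch, not mlxtend code; the `support` helper and the `transactions` list (a transcription of the example table above, one set of items per row) are illustrative names introduced here.

```python
# One set per row of the example table (items whose value is True).
transactions = [
    {"Apple", "Beer", "Chicken", "Rice"},
    {"Apple", "Beer", "Rice"},
    {"Apple", "Beer"},
    {"Apple", "Bananas"},
    {"Beer", "Chicken", "Milk", "Rice"},
    {"Beer", "Milk", "Rice"},
    {"Beer", "Milk"},
    {"Apple", "Bananas"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"Beer"}, transactions))           # 6 of 8 rows -> 0.75
print(support({"Beer", "Rice"}, transactions))   # 4 of 8 rows -> 0.5
print(support({"Apple", "Beer"}, transactions))  # 3 of 8 rows -> 0.375
```

With the default min_support=0.5, only the first two of these itemsets would count as frequent.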

Returns

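A pandas DataFrame with columns ['support', 'itemsets'] containing all itemsets whose support is >= min_support and whose length is at most max_len (if max_len is not None). Each itemset in the 'itemsets' column is of type frozenset, a Python built-in type that behaves like a set except that it is immutable (for more information, see https://docs.python.org/3.6/library/stdtypes.html#frozenset).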

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/

association_rules

association_rules(df, metric='confidence', min_threshold=0.8, support_only=False)

Generates a DataFrame of association rules including the metrics 'score', 'confidence', and 'lift'

Parameters

df : pandas DataFrame

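A pandas DataFrame of frequent itemsets with columns ['support', 'itemsets']
metric : string (default: 'confidence')

Metric used to evaluate whether a rule is of interest. Automatically set to 'support' if support_only=True. Otherwise, the supported metrics are 'support', 'confidence', 'lift', 'leverage', 'conviction' and 'zhangs_metric'. These metrics are computed as follows: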

- support(A->C) = support(A+C) [aka 'support'], range: [0, 1]

- confidence(A->C) = support(A+C) / support(A), range: [0, 1]

- lift(A->C) = confidence(A->C) / support(C), range: [0, inf]

- leverage(A->C) = support(A->C) - support(A)*support(C), range: [-1, 1]

- conviction(A->C) = [1 - support(C)] / [1 - confidence(A->C)], range: [0, inf]

- zhangs_metric(A->C) = leverage(A->C) / max(support(A->C)*(1-support(A)), support(A)*(support(C)-support(A->C))), range: [-1, 1]
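The formulas above can be worked through numerically. The following is a plain-Python sketch, not mlxtend's implementation: `rule_metrics` is a hypothetical helper, and returning inf for conviction when confidence is 1, and 0 for zhangs_metric when the denominator is 0, are assumptions made here to avoid division by zero. The supports used are those of Beer (0.75), Rice (0.5) and {Beer, Rice} (0.5) in the example table.

```python
def rule_metrics(s_a, s_c, s_ac):
    """Metrics for a rule A -> C, given support(A), support(C), support(A+C)."""
    confidence = s_ac / s_a
    lift = confidence / s_c
    leverage = s_ac - s_a * s_c
    # Assumption: treat the 1 - confidence == 0 case as infinite conviction.
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - s_c) / (1 - confidence)
    denominator = max(s_ac * (1 - s_a), s_a * (s_c - s_ac))
    # Assumption: report 0 when the denominator vanishes.
    zhangs_metric = leverage / denominator if denominator else 0.0
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhangs_metric": zhangs_metric,
    }


# Rule {Rice} -> {Beer}: every transaction with Rice also contains Beer.
rice_to_beer = rule_metrics(s_a=0.5, s_c=0.75, s_ac=0.5)
# Rule {Beer} -> {Rice}: only 4 of the 6 Beer transactions contain Rice.
beer_to_rice = rule_metrics(s_a=0.75, s_c=0.5, s_ac=0.5)

for name, metrics in [("Rice -> Beer", rice_to_beer), ("Beer -> Rice", beer_to_rice)]:
    print(name, metrics)
```

Note that support, lift and leverage are symmetric in A and C, while confidence, conviction and zhangs_metric depend on the direction of the rule.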

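min_threshold : float (default: 0.8)

Minimum threshold for the evaluation metric selected via the metric parameter, used to decide whether a candidate rule is of interest.
support_only : bool (default: False)

Only computes the rule support and fills the other metric columns with NaNs. This is useful if:

a) the input DataFrame is incomplete, e.g., does not contain support values for all rule antecedents and consequents

b) you simply want to speed up the computation because you don't need the other metrics.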

Returns

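A pandas DataFrame with columns "antecedents" and "consequents" that store itemsets, plus the scoring metric columns "antecedent support", "consequent support", "support", "confidence", "lift", "leverage", "conviction" and "zhangs_metric", for all rules where metric(rule) >= min_threshold. Each entry in the "antecedents" and "consequents" columns is of type frozenset, a Python built-in type that behaves like a set except that it is immutable (for more information, see https://docs.python.org/3.6/library/stdtypes.html#frozenset).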

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/

fpgrowth

fpgrowth(df, min_support=0.5, use_colnames=False, max_len=None, verbose=0)

Get frequent itemsets from a one-hot encoded DataFrame

Parameters

df : pandas DataFrame

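A pandas DataFrame in the encoded format. DataFrames with sparse data are also supported; for more information, see https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#sparse-data-structures

Note that the legacy pandas SparseDataFrame format is no longer supported in mlxtend >= 0.17.2.

The allowed values are either 0/1 or True/False. For example,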

       Apple  Bananas   Beer  Chicken   Milk   Rice
    0   True    False   True     True  False   True
    1   True    False   True    False  False   True
    2   True    False   True    False  False  False
    3   True     True  False    False  False  False
    4  False    False   True     True   True   True
    5  False    False   True    False   True   True
    6  False    False   True    False   True  False
    7   True     True  False    False  False  False
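min_support : float (default: 0.5)

A float between 0 and 1 giving the minimum support of the itemsets returned. The support is computed as the number of transactions in which the itemset occurs divided by the total number of transactions.
use_colnames : bool (default: False)

If True, uses the DataFrame's column names in the returned DataFrame instead of column indices.
max_len : int (default: None)

Maximum length of the itemsets generated. If None (default), all possible itemset lengths are evaluated.
verbose : int (default: 0)

Shows the stages of conditional tree generation.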

Returns

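A pandas DataFrame with columns ['support', 'itemsets'] containing all itemsets whose support is >= min_support and whose length is at most max_len (if max_len is not None). Each itemset in the 'itemsets' column is of type frozenset, a Python built-in type that behaves like a set except that it is immutable (for more information, see https://docs.python.org/3.6/library/stdtypes.html#frozenset).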

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpgrowth/

fpmax

fpmax(df, min_support=0.5, use_colnames=False, max_len=None, verbose=0)

Get maximal frequent itemsets from a one-hot encoded DataFrame

Parameters

df : pandas DataFrame

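A pandas DataFrame in the encoded format. DataFrames with sparse data are also supported; for more information, see https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#sparse-data-structures

Note that the legacy pandas SparseDataFrame format is no longer supported in mlxtend >= 0.17.2.

The allowed values are either 0/1 or True/False. For example,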

       Apple  Bananas   Beer  Chicken   Milk   Rice
    0   True    False   True     True  False   True
    1   True    False   True    False  False   True
    2   True    False   True    False  False  False
    3   True     True  False    False  False  False
    4  False    False   True     True   True   True
    5  False    False   True    False   True   True
    6  False    False   True    False   True  False
    7   True     True  False    False  False  False
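min_support : float (default: 0.5)

A float between 0 and 1 giving the minimum support of the itemsets returned. The support is computed as the number of transactions in which the itemset occurs divided by the total number of transactions.
use_colnames : bool (default: False)

If True, uses the DataFrame's column names in the returned DataFrame instead of column indices.
max_len : int (default: None)

Given the set of all maximal itemsets, returns only those that satisfy the max_len length limit. If None (default), all possible itemset lengths are evaluated.
verbose : int (default: 0)

Shows the stages of conditional tree generation.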

Returns

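A pandas DataFrame with columns ['support', 'itemsets'] containing all maximal itemsets whose support is >= min_support and that satisfy the max_len length limit (if max_len is not None). Each itemset in the 'itemsets' column is of type frozenset, a Python built-in type that behaves like a set except that it is immutable (for more information, see https://docs.python.org/3.6/library/stdtypes.html#frozenset).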

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpmax/

hmine

hmine(df, min_support=0.5, use_colnames=False, max_len=None, verbose=0) -> pandas.core.frame.DataFrame

Get frequent itemsets from a one-hot encoded DataFrame

Parameters

df : pandas DataFrame

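A pandas DataFrame in the encoded format. DataFrames with sparse data are also supported; for more information, see https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#sparse-data-structures

Note that the legacy pandas SparseDataFrame format is no longer supported in mlxtend >= 0.17.2.

The allowed values are either 0/1 or True/False. For example,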

       Apple  Bananas   Beer  Chicken   Milk   Rice
    0   True    False   True     True  False   True
    1   True    False   True    False  False   True
    2   True    False   True    False  False  False
    3   True     True  False    False  False  False
    4  False    False   True     True   True   True
    5  False    False   True    False   True   True
    6  False    False   True    False   True  False
    7   True     True  False    False  False  False
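min_support : float (default: 0.5)

A float between 0 and 1 giving the minimum support of the itemsets returned. The support is computed as the number of transactions in which the itemset occurs divided by the total number of transactions.
use_colnames : bool (default: False)

If True, uses the DataFrame's column names in the returned DataFrame instead of column indices.
max_len : int (default: None)

Maximum length of the itemsets generated. If None (default), all possible itemset lengths are evaluated.
verbose : int (default: 0)

Shows the stages of conditional tree generation.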

Returns

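A pandas DataFrame with columns ['support', 'itemsets'] containing all itemsets whose support is >= min_support and whose length is at most max_len (if max_len is not None). Each itemset in the 'itemsets' column is of type frozenset, a Python built-in type that behaves like a set except that it is immutable (for more information, see https://docs.python.org/3.6/library/stdtypes.html#frozenset).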

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/hmine/