featuretools.encode_features
- featuretools.encode_features(feature_matrix, features, top_n=10, include_unknown=True, to_encode=None, inplace=False, drop_first=False, verbose=False)
Encode categorical features.
- Parameters:
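feature_matrix (pd.DataFrame) – DataFrame of features.
features (list[PrimitiveBase]) – Feature definitions in feature_matrix.
top_n (int or dict[string -> int]) – Number of top values to include. If a dict[string -> int] is given, each key is a feature name and each value is the number of top values to include for that feature. Features whose names are not in the dict use the default of 10.
include_unknown (bool) – Add a feature encoding the unknown class (values outside the top values). Defaults to True.
to_encode (list[str]) – List of feature names to encode. Features not in this list are left unencoded in the output matrix. Defaults to encoding all necessary features.
inplace (bool) – Encode feature_matrix in place. Defaults to False.
drop_first (bool) – Whether to get k-1 dummy variables from k categorical levels by removing the first level. Defaults to False.
verbose (bool) – Print progress information. Defaults to False.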
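The way top_n, include_unknown, and drop_first interact on a single column can be sketched in plain Python. This is a minimal illustration under stated assumptions, not featuretools' implementation: encode_column and resolve_top_n are hypothetical helpers, ties in frequency are broken by first occurrence (the behavior of collections.Counter.most_common), and which level drop_first removes is an assumption. The sketch drops the last of the kept values, which is what the drop_first call in the examples appears to do (toothpaste is absent from Out[16]), and rows holding that value encode as all zeros.

```python
from collections import Counter

DEFAULT_TOP_N = 10  # documented fallback for features missing from a top_n dict


def resolve_top_n(top_n, name):
    """Return how many top values to keep for the feature called `name`."""
    if isinstance(top_n, dict):
        return top_n.get(name, DEFAULT_TOP_N)
    return top_n


def encode_column(name, values, top_n=DEFAULT_TOP_N,
                  include_unknown=True, drop_first=False):
    """Hypothetical helper: one-hot encode a single categorical column.

    Returns (encoded_names, encoded_rows); each row is a list of 0/1 ints.
    """
    n = resolve_top_n(top_n, name)
    # Most frequent values first; equal counts keep first-seen order.
    top = [value for value, _ in Counter(values).most_common(n)]
    kept = top
    if drop_first and len(top) > 1:
        # Assumption: the last kept level becomes the all-zeros reference.
        kept = top[:-1]
    names = [f"{name} = {value}" for value in kept]
    if include_unknown:
        names.append(f"{name} is unknown")
    rows = []
    for value in values:
        row = [int(value == level) for level in kept]
        if include_unknown:
            # Only values outside the top n count as unknown.
            row.append(int(value not in top))
        rows.append(row)
    return names, rows


values = ["a", "b", "a", "c", "a", "b", "d"]
print(encode_column("x", values, top_n=2)[0])
# ['x = a', 'x = b', 'x is unknown']
print(encode_column("x", values, top_n={"x": 3}, drop_first=True)[1][3])
# [0, 0, 0]  ("c" is the dropped reference level)
```

Values outside the top n land in the "is unknown" column, so with include_unknown=False such rows become all zeros in this sketch.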
- Returns:
Encoded feature matrix, encoded features
- Return type:
(pd.DataFrame, list)
Examples
The examples assume featuretools is imported as ft and that es is an EntitySet containing a "log" dataframe with product_id, purchased, and value columns.

In [1]: f1 = ft.Feature(es["log"].ww["product_id"])

In [2]: f2 = ft.Feature(es["log"].ww["purchased"])

In [3]: f3 = ft.Feature(es["log"].ww["value"])

In [4]: features = [f1, f2, f3]

In [5]: ids = [0, 1, 2, 3, 4, 5]

In [6]: feature_matrix = ft.calculate_feature_matrix(features, es,
   ...:                                              instance_ids=ids)
   ...:

In [7]: fm_encoded, f_encoded = ft.encode_features(feature_matrix,
   ...:                                            features)
   ...:

In [8]: f_encoded
Out[8]:
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id = toothpaste>,
 <Feature: product_id is unknown>,
 <Feature: purchased>,
 <Feature: value>]

In [9]: fm_encoded, f_encoded = ft.encode_features(feature_matrix,
   ...:                                            features, top_n=2)
   ...:

In [10]: f_encoded
Out[10]:
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id is unknown>,
 <Feature: purchased>,
 <Feature: value>]

In [11]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
   ....:                                            include_unknown=False)
   ....:

In [12]: f_encoded
Out[12]:
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id = toothpaste>,
 <Feature: purchased>,
 <Feature: value>]

In [13]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
   ....:                                            to_encode=['purchased'])
   ....:

In [14]: f_encoded
Out[14]: [<Feature: product_id>, <Feature: purchased>, <Feature: value>]

In [15]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
   ....:                                            drop_first=True)
   ....:

In [16]: f_encoded
Out[16]:
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id is unknown>,
 <Feature: purchased>,
 <Feature: value>]
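The f_encoded lists in the session follow a simple naming pattern: each encoded categorical column expands into one "name = value" entry per kept value plus a "name is unknown" entry, while columns that are not encoded keep their own name. The dependency-free sketch below reproduces that pattern on a plain dict of columns. It is an illustration, not the library's code: encode_matrix and its categorical argument are hypothetical (categorical stands in for however featuretools decides which features need encoding), the column values are made up, and building a new dict only mimics inplace=False.

```python
from collections import Counter


def encode_matrix(matrix, categorical, to_encode=None, top_n=10):
    """Hypothetical sketch of matrix-level one-hot encoding.

    matrix maps column name -> list of values; categorical is the set of
    columns eligible for encoding. Returns (encoded, names).
    """
    if to_encode is None:
        to_encode = categorical  # default: encode every categorical column
    encoded, names = {}, []  # new dict, so the input is never modified
    for col, values in matrix.items():
        if col not in categorical or col not in to_encode:
            encoded[col] = list(values)  # passed through unencoded
            names.append(col)
            continue
        # Keep the top_n most frequent values, most frequent first.
        top = [v for v, _ in Counter(values).most_common(top_n)]
        for v in top:
            encoded[f"{col} = {v}"] = [int(x == v) for x in values]
            names.append(f"{col} = {v}")
        encoded[f"{col} is unknown"] = [int(x not in top) for x in values]
        names.append(f"{col} is unknown")
    return encoded, names


# Made-up toy values; only the column names echo the session above.
matrix = {
    "product_id": ["a", "a", "b", "c", "a", "b"],
    "purchased": [True, False, True, True, False, True],
    "value": [0.0, 5.0, 10.0, 15.0, 20.0, 1.0],
}
encoded, names = encode_matrix(matrix, categorical={"product_id"}, top_n=2)
print(names)
# ['product_id = a', 'product_id = b', 'product_id is unknown', 'purchased', 'value']
print(encode_matrix(matrix, {"product_id"}, to_encode=["purchased"])[1])
# ['product_id', 'purchased', 'value']
```

As in Out[14], restricting to_encode to a column that is not categorical leaves every column under its original name.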