Tuning Deep Feature Synthesis

Several parameters can be tuned to change the output of DFS. We will explore them using the following transactions EntitySet.

[1]:
import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)
es

[1]:
Entityset: transactions
  DataFrames:
    transactions [Rows: 500, Columns: 6]
    products [Rows: 5, Columns: 3]
    sessions [Rows: 35, Columns: 5]
    customers [Rows: 5, Columns: 5]
  Relationships:
    transactions.product_id -> products.product_id
    transactions.session_id -> sessions.session_id
    sessions.customer_id -> customers.customer_id

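Using "Seed Features"

Seed features are manually defined, problem-specific features that a user provides to DFS. Where possible, Deep Feature Synthesis will automatically stack new features on top of them. Seed features let us bring domain-specific knowledge into automated feature engineering. For the seed feature below, the domain knowledge might be that, for a particular retailer, a transaction above $125 counts as an expensive purchase.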

[2]:
expensive_purchase = ft.Feature(es["transactions"].ww["amount"]) > 125

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["percent_true"],
    seed_features=[expensive_purchase],
)
feature_matrix[["PERCENT_TRUE(transactions.amount > 125)"]]

[2]:
PERCENT_TRUE(transactions.amount > 125)
customer_id
5 0.227848
4 0.220183
1 0.119048
3 0.182796
2 0.129032

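We can now see that the PERCENT_TRUE primitive was automatically applied to the Boolean expensive_purchase feature from the transactions table. The resulting feature can be read as the share of a customer's transactions that are considered expensive, expressed as a fraction between 0 and 1.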
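
To make the stacking concrete, here is a minimal plain-Python sketch of what a "percent true over a Boolean seed feature" computes. The toy rows and the helper name percent_expensive are hypothetical and are not part of Featuretools or the demo dataset; only the 125 threshold comes from the example above.

```python
from collections import defaultdict

# Hypothetical toy data: (customer_id, amount) pairs, not the demo dataset.
transactions = [
    (1, 50.0),
    (1, 130.0),
    (1, 20.0),
    (1, 140.0),
    (2, 10.0),
    (2, 126.0),
    (2, 30.0),
    (2, 40.0),
]

THRESHOLD = 125  # the domain-specific cutoff used by the seed feature


def percent_expensive(rows, threshold=THRESHOLD):
    """Return {customer_id: fraction of transactions with amount > threshold}."""
    flags = defaultdict(list)
    for customer_id, amount in rows:
        # Seed feature: one Boolean per transaction.
        flags[customer_id].append(amount > threshold)
    # Aggregation: share of True values per customer.
    return {cid: sum(vals) / len(vals) for cid, vals in flags.items()}


result = percent_expensive(transactions)
print(result)
```

The seed feature supplies the per-transaction Boolean, and the aggregation step is what DFS stacks on top of it when moving from transactions up to customers.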

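Add "interesting" values to columns

Sometimes we want to create features that are conditioned on a second value before the calculation is performed. We call this extra filter a "where clause". Where clauses are used in Deep Feature Synthesis by passing primitives in the where_primitives parameter of DFS. By default, where clauses are built from the interesting_values of a column. Interesting values can be determined and added automatically for each DataFrame in a pandas EntitySet by calling es.add_interesting_values(). They can also be set manually by passing a values dictionary.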

[3]:
values_dict = {"device": ["desktop", "mobile", "tablet"]}
es.add_interesting_values(dataframe_name="sessions", values=values_dict)

Interesting values are stored in the DataFrame's Woodwork typing information.

[4]:
es["sessions"].ww.columns["device"].metadata

[4]:
{'dataframe_name': 'sessions',
 'entityset_id': 'transactions',
 'interesting_values': ['desktop', 'mobile', 'tablet']}

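Now that interesting values are set for the device column in the sessions table, we can use the where_primitives parameter to specify which aggregation primitives should get where clauses.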

[5]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["count", "avg_time_between"],
    where_primitives=["count", "avg_time_between"],
    trans_primitives=[],
)
feature_matrix

[5]:
zip_code AVG_TIME_BETWEEN(sessions.session_start) COUNT(sessions) AVG_TIME_BETWEEN(transactions.transaction_time) COUNT(transactions) AVG_TIME_BETWEEN(sessions.session_start WHERE device = mobile) AVG_TIME_BETWEEN(sessions.session_start WHERE device = tablet) AVG_TIME_BETWEEN(sessions.session_start WHERE device = desktop) COUNT(sessions WHERE device = mobile) COUNT(sessions WHERE device = tablet) ... AVG_TIME_BETWEEN(transactions.sessions.session_start) AVG_TIME_BETWEEN(transactions.sessions.session_start WHERE sessions.device = mobile) AVG_TIME_BETWEEN(transactions.sessions.session_start WHERE sessions.device = desktop) AVG_TIME_BETWEEN(transactions.sessions.session_start WHERE sessions.device = tablet) AVG_TIME_BETWEEN(transactions.transaction_time WHERE sessions.device = mobile) AVG_TIME_BETWEEN(transactions.transaction_time WHERE sessions.device = desktop) AVG_TIME_BETWEEN(transactions.transaction_time WHERE sessions.device = tablet) COUNT(transactions WHERE sessions.device = mobile) COUNT(transactions WHERE sessions.device = desktop) COUNT(transactions WHERE sessions.device = tablet)
customer_id
5 60091 5577.000000 6 363.333333 79 13942.500000 NaN 9685.0 3 1 ... 357.500000 796.714286 345.892857 0.000000 809.714286 376.071429 65.000000 36 29 14
4 60091 2516.428571 8 168.518519 109 3336.666667 NaN 4127.5 4 1 ... 163.101852 192.500000 223.108108 0.000000 206.250000 238.918919 65.000000 53 38 18
1 60091 3305.714286 8 192.920000 126 11570.000000 8807.5 7150.0 3 3 ... 185.120000 420.727273 275.000000 419.404762 438.454545 302.500000 442.619048 56 27 43
3 13244 5096.000000 6 287.554348 93 NaN NaN 4745.0 1 1 ... 276.956522 0.000000 233.360656 0.000000 65.000000 251.475410 65.000000 16 62 15
2 13244 4907.500000 7 328.532609 93 1690.000000 5330.0 6890.0 2 2 ... 320.054348 56.333333 417.575758 197.407407 82.333333 435.303030 226.296296 31 34 28

5 rows × 21 columns

We now have several new, potentially useful features. Here are two of them, both built on the where clause "where the device used was a tablet":

[6]:
feature_matrix[
    [
        "COUNT(sessions WHERE device = tablet)",
        "AVG_TIME_BETWEEN(sessions.session_start WHERE device = tablet)",
    ]
]

[6]:
COUNT(sessions WHERE device = tablet) AVG_TIME_BETWEEN(sessions.session_start WHERE device = tablet)
customer_id
5 1 NaN
4 1 NaN
1 3 8807.5
3 1 NaN
2 2 5330.0

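The first feature, COUNT(sessions WHERE device = tablet), indicates how many sessions a customer completed on a tablet. The second feature, AVG_TIME_BETWEEN(sessions.session_start WHERE device = tablet), calculates the time between those sessions. We can see that customers who completed only 0 or 1 sessions on a tablet have NaN values for the average time between those sessions.

Encoding categorical features

Machine learning algorithms typically expect all data to be numeric, or to have a well-defined numeric representation, such as Boolean values corresponding to 0 and 1. When Deep Feature Synthesis generates categorical features, we can encode them using Featuretools.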
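
The NaN behavior is easier to see in a small plain-Python sketch. It assumes that "average time between" means the mean gap, in seconds, between consecutive timestamps; that is an assumption about the primitive, not a statement of its implementation. The toy sessions and the helper where_features are hypothetical.

```python
from datetime import datetime

# Hypothetical toy sessions: (customer_id, device, session_start).
sessions = [
    (1, "tablet", datetime(2014, 1, 1, 0, 0, 0)),
    (1, "mobile", datetime(2014, 1, 1, 0, 30, 0)),
    (1, "tablet", datetime(2014, 1, 1, 1, 0, 0)),
    (1, "tablet", datetime(2014, 1, 1, 3, 0, 0)),
    (2, "tablet", datetime(2014, 1, 1, 2, 0, 0)),
    (2, "desktop", datetime(2014, 1, 1, 4, 0, 0)),
]


def where_features(rows, customer_id, device):
    """Count sessions and average gap (seconds) for one customer and device."""
    # Where clause: keep only matching rows before aggregating.
    starts = sorted(
        t for cid, dev, t in rows if cid == customer_id and dev == device
    )
    count = len(starts)
    if count < 2:
        # Fewer than two timestamps means there is no gap to average.
        return count, float("nan")
    # Mean of consecutive gaps equals (last - first) / (count - 1).
    avg_gap = (starts[-1] - starts[0]).total_seconds() / (count - 1)
    return count, avg_gap


count_1, gap_1 = where_features(sessions, 1, "tablet")
count_2, gap_2 = where_features(sessions, 2, "tablet")
print(count_1, gap_1)
print(count_2, gap_2)
```

The filter runs first and the aggregation second, which is why a customer with a single tablet session still gets a count of 1 but no defined average gap.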

[7]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["mode"],
    trans_primitives=["time_since"],
    max_depth=1,
)

feature_matrix

[7]:
zip_code MODE(sessions.device) TIME_SINCE(birthday) TIME_SINCE(join_date)
customer_id
5 60091 mobile 1.268837e+09 4.493137e+08
4 60091 mobile 5.730582e+08 4.263649e+08
1 60091 mobile 9.541686e+08 4.256209e+08
3 13244 desktop 6.592854e+08 4.154081e+08
2 13244 desktop 1.203951e+09 3.941255e+08

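This feature matrix contains two categorical columns, zip_code and MODE(sessions.device). We can use the feature matrix and the feature definitions to encode these categorical values as Boolean values. Featuretools provides functionality to apply one-hot encoding to the output of DFS.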

[8]:
feature_matrix_enc, features_enc = ft.encode_features(feature_matrix, feature_defs)
feature_matrix_enc

[8]:
TIME_SINCE(birthday) TIME_SINCE(join_date) zip_code = 60091 zip_code = 13244 zip_code is unknown MODE(sessions.device) = mobile MODE(sessions.device) = desktop MODE(sessions.device) is unknown
customer_id
5 1.268837e+09 4.493137e+08 True False False True False False
4 5.730582e+08 4.263649e+08 True False False True False False
1 9.541686e+08 4.256209e+08 True False False True False False
3 6.592854e+08 4.154081e+08 False True False False True False
2 1.203951e+09 3.941255e+08 False True False False True False

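The returned feature matrix is now encoded in a way that machine learning algorithms can interpret. Note that the columns that did not need encoding are still included. In addition, we get a new set of feature definitions that contain the encoded values.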

[9]:
features_enc

[9]:
[<Feature: zip_code = 60091>,
 <Feature: zip_code = 13244>,
 <Feature: zip_code is unknown>,
 <Feature: MODE(sessions.device) = mobile>,
 <Feature: MODE(sessions.device) = desktop>,
 <Feature: MODE(sessions.device) is unknown>,
 <Feature: TIME_SINCE(birthday)>,
 <Feature: TIME_SINCE(join_date)>]

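These features can be used to compute the same encoded values on new data. For more information on feature engineering in production, see the Deployment guide.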
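
As a rough illustration of why a fixed set of encoded feature definitions is useful on new data, the plain-Python sketch below applies a fixed category list to values that include one unseen category. The helper one_hot, the "device" column prefix, and the sample values are hypothetical; this mirrors the column naming seen in the encoded output above but is not the Featuretools implementation.

```python
def one_hot(values, categories, name="device"):
    """Encode each value as Boolean columns, plus an 'is unknown' column."""
    rows = []
    for value in values:
        # One Boolean column per known category.
        row = {f"{name} = {cat}": value == cat for cat in categories}
        # Anything outside the known categories is flagged as unknown.
        row[f"{name} is unknown"] = value not in categories
        rows.append(row)
    return rows


# Categories fixed at training time; "tv" only appears in the new data.
known_categories = ["mobile", "desktop"]
encoded = one_hot(["mobile", "desktop", "tv"], known_categories)
for row in encoded:
    print(row)
```

Because the category list is fixed up front, new data always yields the same columns, and unseen values fall into the "is unknown" column instead of creating new ones.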