Frequently Asked Questions#

Here we try to answer some of the questions that come up most often on GitHub and Stack Overflow.

[1]:
import pandas as pd

import woodwork as ww

import featuretools as ft


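EntitySet#

An EntitySet is a core concept in Featuretools: it holds the tables of a dataset together with the relationships between them. Each table is stored as a DataFrame (Featuretools releases before 1.0 called these "entities"), and each link between two DataFrames is a relationship. Organizing the data this way is what lets Featuretools automate feature engineering across multiple tables.

How do I get a list of column names and types in an EntitySet?#

After you create your EntitySet, you may wish to view the column names. An EntitySet contains multiple DataFrames, one for each table in the EntitySet.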

[2]:
es = ft.demo.load_mock_customer(return_entityset=True)

es

[2]:
Entityset: transactions
  DataFrames:
    transactions [Rows: 500, Columns: 6]
    products [Rows: 5, Columns: 3]
    sessions [Rows: 35, Columns: 5]
    customers [Rows: 5, Columns: 5]
  Relationships:
    transactions.product_id -> products.product_id
    transactions.session_id -> sessions.session_id
    sessions.customer_id -> customers.customer_id

If you want to view the underlying DataFrame, you can do the following:

[3]:
es["transactions"].head()

[3]:
transaction_id session_id transaction_time product_id amount _ft_last_time
298 298 1 2014-01-01 00:00:00 5 127.64 2014-01-01 00:00:00
2 2 1 2014-01-01 00:01:05 2 109.48 2014-01-01 00:01:05
308 308 1 2014-01-01 00:02:10 3 95.06 2014-01-01 00:02:10
116 116 1 2014-01-01 00:03:15 4 78.92 2014-01-01 00:03:15
371 371 1 2014-01-01 00:04:20 3 31.54 2014-01-01 00:04:20

If you want to view the columns and types for the "transactions" DataFrame, you can do the following:

[4]:
es["transactions"].ww

[4]:
Physical Type Logical Type Semantic Tag(s)
Column
transaction_id int64 Integer ['index']
session_id int64 Integer ['foreign_key', 'numeric']
transaction_time datetime64[ns] Datetime ['time_index']
product_id category Categorical ['category', 'foreign_key']
amount float64 Double ['numeric']
_ft_last_time datetime64[ns] Datetime ['last_time_index']

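What is the difference between copy_columns and additional_columns?#

The function normalize_dataframe creates a new DataFrame, built from the unique values of a column in an existing DataFrame, along with a relationship between the two. It takes two similar arguments:

  • additional_columns removes columns from the base DataFrame and moves them to the new DataFrame.

  • copy_columns keeps the given columns in the base DataFrame, but also copies them to the new DataFrame.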
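
The difference between the two arguments can be sketched with plain pandas. This is only an illustration of the semantics described above, not how Featuretools implements normalize_dataframe; the helper name split_columns and the toy data are assumptions made for this example.

```python
import pandas as pd


def split_columns(base, index, additional_columns=(), copy_columns=()):
    """Hypothetical helper: return (new_base, new_parent) DataFrames."""
    moved = list(additional_columns)
    copied = list(copy_columns)
    # The new parent DataFrame has one row per unique value of the index column.
    parent = (
        base[[index] + moved + copied]
        .drop_duplicates(subset=index)
        .reset_index(drop=True)
    )
    # Moved columns leave the base DataFrame; copied columns stay in it.
    new_base = base.drop(columns=moved)
    return new_base, parent


# Toy data (not the demo dataset).
base = pd.DataFrame(
    {
        "transaction_id": [1, 2, 3],
        "session_id": [10, 10, 20],
        "device": ["desktop", "desktop", "mobile"],
        "join_date": ["2012-04-15", "2012-04-15", "2010-07-17"],
    }
)

new_base, sessions = split_columns(
    base,
    index="session_id",
    additional_columns=["join_date"],
    copy_columns=["device"],
)
print(list(new_base.columns))
print(list(sessions.columns))
```

In this sketch join_date ends up only in the new DataFrame, while device is present in both.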

[5]:
data = ft.demo.load_mock_customer()
transactions_df = data["transactions"].merge(data["sessions"]).merge(data["customers"])
products_df = data["products"]

es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
    dataframe_name="transactions",
    dataframe=transactions_df,
    index="transaction_id",
    time_index="transaction_time",
)

es = es.add_dataframe(
    dataframe_name="products", dataframe=products_df, index="product_id"
)

es = es.add_relationship("products", "product_id", "transactions", "product_id")


Before we normalize to create a new DataFrame, let's look at the base DataFrame.

[6]:
es["transactions"].head()

[6]:
transaction_id session_id transaction_time product_id amount customer_id device session_start zip_code join_date birthday
298 298 1 2014-01-01 00:00:00 5 127.64 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18
2 2 1 2014-01-01 00:01:05 2 109.48 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18
308 308 1 2014-01-01 00:02:10 3 95.06 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18
116 116 1 2014-01-01 00:03:15 4 78.92 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18
371 371 1 2014-01-01 00:04:20 3 31.54 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18

Notice the columns session_id, session_start, join_date, device, customer_id, and zip_code.

[7]:
es = es.normalize_dataframe(
    base_dataframe_name="transactions",
    new_dataframe_name="sessions",
    index="session_id",
    make_time_index="session_start",
    additional_columns=["join_date"],
    copy_columns=["device", "customer_id", "zip_code", "session_start"],
)

Above, we normalized the columns to create a new DataFrame.

  • For additional_columns, the column ['join_date'] will be removed from the transactions DataFrame and moved to the new sessions DataFrame.

  • For copy_columns, the columns ['device', 'customer_id', 'zip_code', 'session_start'] will be copied from the transactions DataFrame to the new sessions DataFrame.

Let's see this in the actual EntitySet.

[8]:
es["transactions"].head()

[8]:
transaction_id session_id transaction_time product_id amount customer_id device session_start zip_code birthday
298 298 1 2014-01-01 00:00:00 5 127.64 2 desktop 2014-01-01 13244 1986-08-18
2 2 1 2014-01-01 00:01:05 2 109.48 2 desktop 2014-01-01 13244 1986-08-18
308 308 1 2014-01-01 00:02:10 3 95.06 2 desktop 2014-01-01 13244 1986-08-18
116 116 1 2014-01-01 00:03:15 4 78.92 2 desktop 2014-01-01 13244 1986-08-18
371 371 1 2014-01-01 00:04:20 3 31.54 2 desktop 2014-01-01 13244 1986-08-18

Notice that ['device', 'customer_id', 'zip_code', 'session_start'] are still in the transactions DataFrame, while ['join_date'] is not. All five columns are now present in the sessions DataFrame, as seen below.

[9]:
es["sessions"].head()

[9]:
session_id join_date device customer_id zip_code session_start
1 1 2012-04-15 23:31:04 desktop 2 13244 2014-01-01 00:00:00
2 2 2010-07-17 05:27:50 mobile 5 60091 2014-01-01 00:17:20
3 3 2011-04-08 20:08:14 mobile 4 60091 2014-01-01 00:28:10
4 4 2011-04-17 10:48:33 mobile 1 60091 2014-01-01 00:44:25
5 5 2011-04-08 20:08:14 mobile 4 60091 2014-01-01 01:11:30

Why did my columns get new semantic tags?#

During the creation of your EntitySet, you might be wondering why the semantic tags on your columns changed.

[10]:
data = ft.demo.load_mock_customer()
transactions_df = data["transactions"].merge(data["sessions"]).merge(data["customers"])
products_df = data["products"]

es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
    dataframe_name="transactions",
    dataframe=transactions_df,
    index="transaction_id",
    time_index="transaction_time",
)
es.plot()

[10]:
../_images/resources_frequently_asked_questions_20_1.svg

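If a column has semantic tags, they appear to the right of the semicolon in the diagram above. Notice that session_id and session_start do not currently have any semantic tags associated with them.

Now, let's normalize the transactions DataFrame to create a new DataFrame.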

[11]:
es = es.normalize_dataframe(
    base_dataframe_name="transactions",
    new_dataframe_name="sessions",
    index="session_id",
    make_time_index="session_start",
    additional_columns=["session_start"],
)
es.plot()

[11]:
../_images/resources_frequently_asked_questions_22_0.svg

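The session_id column now has the semantic tag foreign_key in the transactions DataFrame, and index in the new sessions DataFrame. This is because normalizing the DataFrame created a new relationship between transactions and sessions: a one-to-many relationship between the parent DataFrame, sessions, and the child DataFrame, transactions.

Therefore, session_id has the semantic tag foreign_key in transactions because it refers to the index of another DataFrame. The same thing happens if we add another DataFrame with add_dataframe and then link it with add_relationship.

In addition, when we created the new DataFrame, we set session_start as its time_index. This adds the semantic tag time_index to the session_start column in the new sessions DataFrame, because that column now serves as the time index.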
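
What the index and foreign_key tags assert about the data can be checked by hand with pandas. The following is a minimal sketch on toy data (the values are made up for illustration) and does not use Featuretools itself.

```python
import pandas as pd

# Toy parent and child tables (illustrative values only).
sessions = pd.DataFrame(
    {
        "session_id": [1, 2],
        "session_start": pd.to_datetime(
            ["2014-01-01 00:00:00", "2014-01-01 00:17:20"]
        ),
    }
)
transactions = pd.DataFrame(
    {"transaction_id": [298, 2, 308], "session_id": [1, 1, 2]}
)

# "index": every parent row is identified by a unique session_id.
parent_key_is_unique = bool(sessions["session_id"].is_unique)

# "foreign_key": every child value points at an existing parent row.
child_keys_resolve = bool(
    transactions["session_id"].isin(sessions["session_id"]).all()
)

# One-to-many: a single parent row may be referenced by several child rows.
children_per_parent = transactions.groupby("session_id").size()

print(parent_key_is_unique, child_keys_resolve)
print(children_per_parent)
```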

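How do I update a column's description or metadata?#

You can directly update the description or metadata attributes of the column schema. However, you must use the column schema returned by DataFrame.ww.columns['col_name'], not the one returned by DataFrame.ww['col_name'].ww.schema. The column schema from DataFrame.ww.columns['col_name'] is still associated with the EntitySet and propagates any attribute updates, whereas the other does not. As an example, this is how to update a column's description or metadata: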

column_schema = df.ww.columns['col_name']

column_schema.description = 'my description'

column_schema.metadata.update(key='value')

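How do I combine two or more interesting values?#

You might want to create features that are conditioned on multiple values before they are calculated. This requires the use of interesting_values. However, since we want features with multiple conditions, we will need to modify the DataFrame before we create the EntitySet.

Let's look at how you might accomplish this.

First, let's create our DataFrames.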

[12]:
data = ft.demo.load_mock_customer()
transactions_df = data["transactions"].merge(data["sessions"]).merge(data["customers"])
products_df = data["products"]

[13]:
transactions_df.head()

[13]:
transaction_id session_id transaction_time product_id amount customer_id device session_start zip_code join_date birthday
0 298 1 2014-01-01 00:00:00 5 127.64 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18
1 2 1 2014-01-01 00:01:05 2 109.48 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18
2 308 1 2014-01-01 00:02:10 3 95.06 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18
3 116 1 2014-01-01 00:03:15 4 78.92 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18
4 371 1 2014-01-01 00:04:20 3 31.54 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18
[14]:
products_df.head()

[14]:
product_id brand
0 1 B
1 2 B
2 3 B
3 4 B
4 5 A

Now, let's modify our transactions DataFrame to add a column that represents multiple conditions at once.

[15]:
transactions_df["product_id_device"] = (
    transactions_df["product_id"].astype(str) + " and " + transactions_df["device"]
)

Here, we created a new column called product_id_device, which simply concatenates the product_id column and the device column.

Now let's create our EntitySet.

[16]:
es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
    dataframe_name="transactions",
    dataframe=transactions_df,
    index="transaction_id",
    time_index="transaction_time",
    logical_types={
        "product_id": ww.logical_types.Categorical,
        "product_id_device": ww.logical_types.Categorical,
        "zip_code": ww.logical_types.PostalCode,
    },
)

es = es.add_dataframe(
    dataframe_name="products", dataframe=products_df, index="product_id"
)

es = es.normalize_dataframe(
    base_dataframe_name="transactions",
    new_dataframe_name="sessions",
    index="session_id",
    additional_columns=["device", "product_id_device", "customer_id"],
)

es = es.normalize_dataframe(
    base_dataframe_name="sessions", new_dataframe_name="customers", index="customer_id"
)
es

[16]:
Entityset: customer_data
  DataFrames:
    transactions [Rows: 500, Columns: 9]
    products [Rows: 5, Columns: 2]
    sessions [Rows: 35, Columns: 5]
    customers [Rows: 5, Columns: 2]
  Relationships:
    transactions.session_id -> sessions.session_id
    sessions.customer_id -> customers.customer_id

Now, we are ready to add our interesting values.

First, let's view the interesting values that are available to choose from.

[17]:
interesting_values = transactions_df["product_id_device"].unique().tolist()
interesting_values

[17]:
['5 and desktop',
 '2 and desktop',
 '3 and desktop',
 '4 and desktop',
 '1 and desktop',
 '4 and mobile',
 '5 and mobile',
 '1 and mobile',
 '3 and mobile',
 '2 and mobile',
 '4 and tablet',
 '3 and tablet',
 '2 and tablet',
 '1 and tablet',
 '5 and tablet']

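If you wish, you can pick a subset of these values, and the where features created will use only those conditions. In our example, we will use all the possible interesting values.

Here, we set all of these values as the interesting values for this specific DataFrame and column. We could set interesting values for multiple columns in the same way, but we will use only this one in this example.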
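
If you do want only a subset, one possible approach is to keep the most frequent combinations. The sketch below uses plain pandas on a short made-up series; the cutoff of two values is an arbitrary assumption, and the resulting dictionary has the same shape as the one passed to add_interesting_values in this section.

```python
import pandas as pd

# Toy stand-in for the product_id_device column (illustrative values only).
combined = pd.Series(
    [
        "5 and desktop",
        "2 and desktop",
        "5 and desktop",
        "4 and mobile",
        "5 and desktop",
        "4 and mobile",
    ],
    name="product_id_device",
)

# value_counts sorts by frequency, so head(2) keeps the two most common values.
top_values = combined.value_counts().head(2).index.tolist()

# Same structure as the `values` dictionary used for interesting values.
values = {"product_id_device": top_values}
print(values)
```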

[18]:
values = {"product_id_device": interesting_values}
es.add_interesting_values(dataframe_name="sessions", values=values)

Now we can run DFS.

[19]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["count"],
    where_primitives=["count"],
    trans_primitives=[],
)
feature_matrix.head()

[19]:
COUNT(sessions) COUNT(transactions) COUNT(sessions WHERE product_id_device = 4 and desktop) COUNT(sessions WHERE product_id_device = 1 and tablet) COUNT(sessions WHERE product_id_device = 3 and desktop) COUNT(sessions WHERE product_id_device = 4 and mobile) COUNT(sessions WHERE product_id_device = 2 and mobile) COUNT(sessions WHERE product_id_device = 5 and tablet) COUNT(sessions WHERE product_id_device = 5 and mobile) COUNT(sessions WHERE product_id_device = 3 and mobile) ... COUNT(transactions WHERE sessions.product_id_device = 4 and mobile) COUNT(transactions WHERE sessions.product_id_device = 2 and mobile) COUNT(transactions WHERE sessions.product_id_device = 5 and mobile) COUNT(transactions WHERE sessions.product_id_device = 4 and desktop) COUNT(transactions WHERE sessions.product_id_device = 3 and mobile) COUNT(transactions WHERE sessions.product_id_device = 4 and tablet) COUNT(transactions WHERE sessions.product_id_device = 5 and tablet) COUNT(transactions WHERE sessions.product_id_device = 1 and tablet) COUNT(transactions WHERE sessions.product_id_device = 1 and desktop) COUNT(transactions WHERE sessions.product_id_device = 2 and desktop)
customer_id
2 7 93 1 1 0 1 1 1 0 0 ... 18 13 0 10 0 0 13 15 8 0
5 6 79 1 0 0 1 0 0 0 1 ... 10 0 0 14 8 14 0 0 0 0
4 8 109 1 0 0 0 2 1 0 1 ... 0 23 0 18 15 0 18 0 0 10
1 8 126 0 0 0 3 0 0 0 0 ... 56 0 0 0 0 27 0 0 0 15
3 6 93 0 0 0 0 0 0 0 1 ... 0 0 0 0 16 0 0 0 33 0

5 rows × 32 columns

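To better understand the where clause features, let's examine one of them.

The feature COUNT(sessions WHERE product_id_device = 5 and tablet) tells us how many sessions the customer had in which they purchased product_id 5 while on a tablet. Notice that the feature depends on multiple conditions (product_id = 5 & device = tablet).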
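
A where feature of this kind can be reproduced by hand: filter the child rows on the condition, then count per parent. The following pandas sketch uses a small made-up sessions table (not the demo data) and is only meant to show what the feature measures.

```python
import pandas as pd

# Toy sessions table (illustrative values only).
sessions = pd.DataFrame(
    {
        "session_id": [1, 2, 3, 4],
        "customer_id": [2, 2, 5, 4],
        "product_id_device": [
            "5 and tablet",
            "2 and mobile",
            "4 and desktop",
            "5 and tablet",
        ],
    }
)

# Step 1: keep only the rows that satisfy the where condition.
matches = sessions[sessions["product_id_device"] == "5 and tablet"]

# Step 2: count matching rows per customer; customers with no match get 0.
counts = (
    matches.groupby("customer_id")
    .size()
    .reindex(sessions["customer_id"].unique(), fill_value=0)
    .rename("COUNT(sessions WHERE product_id_device = 5 and tablet)")
)
print(counts)
```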

[20]:
feature_matrix[["COUNT(sessions WHERE product_id_device = 5 and tablet)"]]

[20]:
COUNT(sessions WHERE product_id_device = 5 and tablet)
customer_id
2 1
5 0
4 1
1 0
3 0

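Deep Feature Synthesis (DFS)#

Why is DFS not creating aggregation features?#

You may have created your EntitySet and then applied DFS to create features, only to find that no aggregation features were created.

  • This is most likely because your EntitySet contains only one DataFrame, and DFS cannot create aggregation features with fewer than two DataFrames. Featuretools looks for a relationship and aggregates based on that relationship.

Let's look at a simple example.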

[21]:
data = ft.demo.load_mock_customer()

transactions_df = data["transactions"].merge(data["sessions"]).merge(data["customers"])


es = ft.EntitySet(id="customer_data")

es = es.add_dataframe(
    dataframe_name="transactions", dataframe=transactions_df, index="transaction_id"
)

es

[21]:
Entityset: customer_data
  DataFrames:
    transactions [Rows: 500, Columns: 11]
  Relationships:
    No relationships

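Notice that our EntitySet contains only one DataFrame. We cannot create aggregation features on this EntitySet, because DFS needs at least two related DataFrames to generate aggregation features.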

[22]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es, target_dataframe_name="transactions"
)
feature_defs

/Users/code/fin_tool/github/featuretools/featuretools/synthesis/deep_feature_synthesis.py:154: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
  warnings.warn(
[22]:
[<Feature: session_id>,
 <Feature: product_id>,
 <Feature: amount>,
 <Feature: customer_id>,
 <Feature: device>,
 <Feature: zip_code>,
 <Feature: DAY(birthday)>,
 <Feature: DAY(join_date)>,
 <Feature: DAY(session_start)>,
 <Feature: DAY(transaction_time)>,
 <Feature: MONTH(birthday)>,
 <Feature: MONTH(join_date)>,
 <Feature: MONTH(session_start)>,
 <Feature: MONTH(transaction_time)>,
 <Feature: WEEKDAY(birthday)>,
 <Feature: WEEKDAY(join_date)>,
 <Feature: WEEKDAY(session_start)>,
 <Feature: WEEKDAY(transaction_time)>,
 <Feature: YEAR(birthday)>,
 <Feature: YEAR(join_date)>,
 <Feature: YEAR(session_start)>,
 <Feature: YEAR(transaction_time)>]

None of the features above are aggregation features. To fix this, you can add another DataFrame to your EntitySet.

Solution #1 - You can add new DataFrames if you have additional data.

[23]:
products_df = data["products"]
es = es.add_dataframe(
    dataframe_name="products", dataframe=products_df, index="product_id"
)
es

[23]:
Entityset: customer_data
  DataFrames:
    transactions [Rows: 500, Columns: 11]
    products [Rows: 5, Columns: 2]
  Relationships:
    No relationships

Notice that we now have an additional DataFrame in our EntitySet, called products.

Solution #2 - You can normalize an existing DataFrame.

[24]:
es = es.normalize_dataframe(
    base_dataframe_name="transactions",
    new_dataframe_name="sessions",
    index="session_id",
    make_time_index="session_start",
    additional_columns=["device", "customer_id", "zip_code", "join_date"],
    copy_columns=["session_start"],
)
es

[24]:
Entityset: customer_data
  DataFrames:
    transactions [Rows: 500, Columns: 7]
    products [Rows: 5, Columns: 2]
    sessions [Rows: 35, Columns: 6]
  Relationships:
    transactions.session_id -> sessions.session_id

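Notice that we now have an additional DataFrame in our EntitySet, called sessions. Here, the normalization created a relationship between transactions and sessions. However, we could also have specified a relationship between transactions and products if we had used only Solution #1.

Now, we can generate aggregation features.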

[25]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es, target_dataframe_name="transactions"
)
feature_defs[:-10]

[25]:
[<Feature: session_id>,
 <Feature: product_id>,
 <Feature: amount>,
 <Feature: DAY(birthday)>,
 <Feature: DAY(session_start)>,
 <Feature: DAY(transaction_time)>,
 <Feature: MONTH(birthday)>,
 <Feature: MONTH(session_start)>,
 <Feature: MONTH(transaction_time)>,
 <Feature: WEEKDAY(birthday)>,
 <Feature: WEEKDAY(session_start)>,
 <Feature: WEEKDAY(transaction_time)>,
 <Feature: YEAR(birthday)>,
 <Feature: YEAR(session_start)>,
 <Feature: YEAR(transaction_time)>,
 <Feature: sessions.device>,
 <Feature: sessions.customer_id>,
 <Feature: sessions.zip_code>,
 <Feature: sessions.COUNT(transactions)>,
 <Feature: sessions.MAX(transactions.amount)>,
 <Feature: sessions.MEAN(transactions.amount)>,
 <Feature: sessions.MIN(transactions.amount)>,
 <Feature: sessions.MODE(transactions.product_id)>,
 <Feature: sessions.NUM_UNIQUE(transactions.product_id)>,
 <Feature: sessions.SKEW(transactions.amount)>]

A few of the aggregation features are:

  • <Feature: sessions.MAX(transactions.amount)>

  • <Feature: sessions.SKEW(transactions.amount)>

  • <Feature: sessions.MIN(transactions.amount)>

  • <Feature: sessions.MEAN(transactions.amount)>

  • <Feature: sessions.COUNT(transactions)>
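
To see why a relationship is required, it can help to compute a few of these features by hand. The sketch below uses plain pandas on made-up data: it aggregates the child rows per session_id and then carries the result back to each transaction through the foreign key. It illustrates the idea only and is not Featuretools' own computation path.

```python
import pandas as pd

# Toy child table (illustrative values only).
transactions = pd.DataFrame(
    {
        "transaction_id": [1, 2, 3, 4],
        "session_id": [1, 1, 2, 2],
        "amount": [10.0, 30.0, 5.0, 15.0],
    }
)

# Aggregate the child rows once per parent key.
per_session = transactions.groupby("session_id")["amount"].agg(
    ["max", "mean", "count"]
)
per_session.columns = [
    "sessions.MAX(transactions.amount)",
    "sessions.MEAN(transactions.amount)",
    "sessions.COUNT(transactions)",
]

# Carry the per-session values back to each transaction via session_id.
features = transactions.merge(per_session.reset_index(), on="session_id")
print(features)
```

Without a second DataFrame keyed by session_id there is nothing to group by, which is why a single-DataFrame EntitySet yields only transform features.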

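How do I speed up the runtime of DFS?#

One issue you may encounter while running ft.dfs is slow performance. While Featuretools generally uses sensible default settings for calculating features, you may want to improve performance when you are calculating a large number of features.

One quick way to speed up performance is to adjust the n_jobs setting of ft.dfs or ft.calculate_feature_matrix.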

# setting n_jobs to -1 will use all cores

feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_dataframe_name="customers",
                                      n_jobs=-1)

# calculate_feature_matrix returns only the feature matrix
feature_matrix = ft.calculate_feature_matrix(entityset=es,
                                             features=feature_defs,
                                             n_jobs=-1)

For more ways to speed up performance, please visit:

  • Improving Computational Performance

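How do I include only certain features when running DFS?#

When using DFS to generate features, you might want to include only certain features. There are multiple ways to do this:

  • Use ignore_columns to specify columns in a DataFrame that should not be used to create features. It is a dictionary mapping DataFrame names to a list of column names to ignore.

  • Use drop_contains to drop features that contain any of the strings listed in this parameter.

  • Use drop_exact to drop features that exactly match any of the strings listed in this parameter.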
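
The difference between the two drop parameters is substring matching versus whole-name matching. The following pure-Python sketch mirrors the rules as described in the list above; the helper filter_feature_names is hypothetical and is not part of Featuretools.

```python
def filter_feature_names(names, drop_contains=(), drop_exact=()):
    """Hypothetical helper mirroring the documented matching rules."""
    kept = []
    for name in names:
        if name in drop_exact:
            continue  # the whole feature name matches an entry
        if any(fragment in name for fragment in drop_contains):
            continue  # the feature name contains one of the fragments
        kept.append(name)
    return kept


names = [
    "COUNT(sessions)",
    "SUM(transactions.amount)",
    "STD(transactions.amount)",
    "MODE(sessions.device)",
]

kept = filter_feature_names(
    names,
    drop_contains=["SUM("],
    drop_exact=["STD(transactions.amount)"],
)
print(kept)
```

Note that in this sketch a fragment such as "STD(" placed in drop_exact removes nothing, because no feature name is exactly equal to it.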

Here is an example of using all three parameters:

[26]:
es = ft.demo.load_mock_customer(return_entityset=True)

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    ignore_columns={
        "transactions": ["amount"],
        "customers": ["age", "gender", "birthday"],
    },  # ignore these columns
    drop_contains=["customers.SUM("],  # drop features that contain these strings
    drop_exact=["STD(transactions.quanity)"],  # drop features that exactly match
)


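How do I specify primitives on a per-column or per-DataFrame basis?#

When using DFS to generate features, you may wish to use only certain columns or DataFrames for specific primitives. This can be done through the primitive_options parameter. The primitive_options parameter is a dictionary that maps a primitive or a tuple of primitives to a dictionary containing options for the primitive(s). A primitive or tuple of primitives can also be mapped to a list of option dictionaries if the primitive(s) takes multiple inputs. Primitive keys can be the string name of the primitive, the primitive class, or a specific instance of the primitive. Each dictionary supplies options for its respective input column. There are multiple ways to control how primitives get applied through these options:

  • Use ignore_dataframes to specify DataFrames that should not be used to create features for that primitive. It is a list of DataFrame names to ignore.

  • Use include_dataframes to specify the only DataFrames that should be used to create features for that primitive. It is a list of DataFrame names to include.

  • Use ignore_columns to specify columns in a DataFrame that should not be used to create features for that primitive. It is a dictionary mapping DataFrame names to a list of column names to ignore.

  • Use include_columns to specify the only columns in a DataFrame that should be used to create features for that primitive. It is a dictionary mapping DataFrame names to a list of column names to include.

You can also use primitive_options to specify which DataFrames or columns you wish to use as groupbys for groupby transformation primitives:

  • Use ignore_groupby_dataframes to specify DataFrames that should not be used to get groupbys for that primitive. It is a list of DataFrame names to ignore.

  • Use include_groupby_dataframes to specify the only DataFrames that should be used to get groupbys for that primitive. It is a list of DataFrame names to include.

  • Use ignore_groupby_columns to specify columns in a DataFrame that should not be used as groupbys for that primitive. It is a dictionary mapping DataFrame names to a list of column names to ignore.

  • Use include_groupby_columns to specify the only columns in a DataFrame that should be used as groupbys for that primitive. It is a dictionary mapping DataFrame names to a list of column names to include.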

以下是使用其中一些选项的示例:

[27]:
es = ft.demo.load_mock_customer(return_entityset=True)

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    primitive_options={
        "mode": {
            "ignore_dataframes": ["sessions"],
            "include_columns": {"products": ["brand"], "transactions": ["product_id"]},
        },
        # For mode, ignore the "sessions" DataFrame and only include "brand" in the
        # "products" DataFrame and "product_id" in the "transactions" DataFrame
        ("count", "mean"): {"include_dataframes": ["sessions", "transactions"]},
        # For count and mean, only include the dataframes "sessions" and "transactions"
    },
)
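The list form for multi-input primitives and the groupby options described above are not used in the example. A minimal sketch of what such a dictionary might look like is shown below. The column names are taken from the mock customer demo data, the choice of primitives ("trend", "cum_sum", "cum_count") is only illustrative, and the sketch builds the dictionary without running DFS.

```python
# Illustrative options dictionary; it is only constructed here, not passed to ft.dfs.
primitive_options = {
    # "trend" takes two inputs, so it maps to a list with one
    # options dictionary per input, in input order.
    "trend": [
        {"include_columns": {"transactions": ["amount"]}},
        {"include_columns": {"transactions": ["transaction_time"]}},
    ],
    # Groupby transform primitives: control which columns or
    # DataFrames may be used as the groupby.
    "cum_sum": {"include_groupby_columns": {"transactions": ["session_id"]}},
    "cum_count": {"ignore_groupby_dataframes": ["sessions"]},
}

# Report which primitives use per-input option lists.
per_input = [name for name, opts in primitive_options.items() if isinstance(opts, list)]
print(per_input)  # ['trend']
```

A dictionary like this would be passed as primitive_options=primitive_options to ft.dfs, with the groupby primitives listed in groupby_trans_primitives.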


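Note that if options are given for a specific instance of a primitive and for the primitive in general (either by string name or class), the instance with its own options will not use the generic options. For example, in this case:

from featuretools.primitives import Mean

special_mean = Mean()

options = {
    special_mean: {'include_dataframes': ['customers']},
    'mean': {'include_dataframes': ['sessions']},
}

The primitive special_mean will not use the DataFrame sessions, because its options only include customers. Every other instance of the Mean primitive will use the 'mean' options.

For more examples of specifying options for DFS, please visit: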
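The precedence rule can be pictured with a small stand-alone model. This is a toy sketch and not how Featuretools resolves options internally: the Mean class below is a stand-in rather than featuretools.primitives.Mean, and resolve_options is a hypothetical helper introduced only for illustration.

```python
class Mean:
    """Stand-in for a primitive class (not featuretools.primitives.Mean)."""

    name = "mean"


def resolve_options(primitive, options):
    """Return the instance's own options if present, else those keyed by name."""
    if primitive in options:
        return options[primitive]
    return options.get(primitive.name, {})


special_mean = Mean()
other_mean = Mean()

options = {
    special_mean: {"include_dataframes": ["customers"]},
    "mean": {"include_dataframes": ["sessions"]},
}

print(resolve_options(special_mean, options))  # {'include_dataframes': ['customers']}
print(resolve_options(other_mean, options))    # {'include_dataframes': ['sessions']}
```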

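If I didn't specify the cutoff_time, what date will be used for the feature calculations?#

The feature calculations will use the current time as the cutoff time, i.e. cutoff_time = datetime.now().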

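How do I select a certain amount of past data when calculating features?#

When calculating features, you may run into a situation where you only want to use a certain amount of past data to make predictions. You can do this with the training_window parameter in ft.dfs. When you use training_window, Featuretools will use the historical data between cutoff_time - training_window and cutoff_time.

To make the calculation, Featuretools will check the times in the time_index column of the target_dataframe.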
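The window described above can be pictured with a small stand-alone filter. This is a toy model using only the standard library, not Featuretools code: rows_in_training_window and the sample rows are hypothetical, and the sketch includes both ends of the window, which may not match how Featuretools treats rows that fall exactly on a boundary.

```python
from datetime import datetime, timedelta


def rows_in_training_window(rows, cutoff_time, training_window):
    """Keep rows timestamped between cutoff_time - training_window and cutoff_time.

    Assumption: both boundaries are treated as inclusive in this toy model.
    """
    window_start = cutoff_time - training_window
    return [row for row in rows if window_start <= row["time"] <= cutoff_time]


# Hypothetical transaction rows for a single customer.
rows = [
    {"amount": 10.0, "time": datetime(2014, 1, 1, 2, 30)},
    {"amount": 20.0, "time": datetime(2014, 1, 1, 3, 30)},
    {"amount": 30.0, "time": datetime(2014, 1, 1, 4, 30)},
]

selected = rows_in_training_window(
    rows,
    cutoff_time=datetime(2014, 1, 1, 4, 0),
    training_window=timedelta(hours=1),
)
print([row["amount"] for row in selected])  # [20.0]
```

Only the 03:30 row is kept: the 02:30 row is older than the one-hour window, and the 04:30 row lies after the cutoff time.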

[28]:
es = ft.demo.load_mock_customer(return_entityset=True)
es["customers"].ww.time_index

[28]:
'join_date'

Our target_dataframe has a time_index, which is needed for the training_window calculation. Here, we are creating a cutoff time DataFrame so that we can have a unique training window for each customer.

[29]:
cutoff_times = pd.DataFrame()
cutoff_times["customer_id"] = [1, 2, 3, 1]
cutoff_times["time"] = pd.to_datetime(
    ["2014-1-1 04:00", "2014-1-1 05:00", "2014-1-1 06:00", "2014-1-1 08:00"]
)
cutoff_times["label"] = [True, True, False, True]

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
    training_window="1 hour",
)
feature_matrix.head()

  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
[29]:
zip_code COUNT(sessions) MODE(sessions.device) NUM_UNIQUE(sessions.device) COUNT(transactions) MAX(transactions.amount) MEAN(transactions.amount) MIN(transactions.amount) MODE(transactions.product_id) NUM_UNIQUE(transactions.product_id) ... STD(sessions.SUM(transactions.amount)) SUM(sessions.MAX(transactions.amount)) SUM(sessions.MEAN(transactions.amount)) SUM(sessions.MIN(transactions.amount)) SUM(sessions.NUM_UNIQUE(transactions.product_id)) SUM(sessions.SKEW(transactions.amount)) SUM(sessions.STD(transactions.amount)) MODE(transactions.sessions.device) NUM_UNIQUE(transactions.sessions.device) label
customer_id time
1 2014-01-01 04:00:00 60091 1 tablet 1 12 139.09 85.469167 6.78 4 5 ... NaN 139.09 85.469167 6.78 5.0 -0.830975 39.825249 tablet 1 True
2 2014-01-01 05:00:00 13244 1 tablet 1 13 118.85 77.304615 21.82 1 5 ... NaN 118.85 77.304615 21.82 5.0 -0.314918 33.725036 tablet 1 True
3 2014-01-01 06:00:00 13244 2 desktop 1 12 128.26 81.747500 20.06 3 5 ... 563.882303 220.02 172.597273 111.82 6.0 -0.289466 35.704680 desktop 1 False
1 2014-01-01 08:00:00 60091 1 mobile 1 16 126.11 88.755625 11.62 4 5 ... NaN 126.11 88.755625 11.62 5.0 -1.038434 32.324534 mobile 1 True

4 rows × 76 columns

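In the code above, we ran DFS with training_window set to 1 hour, so each feature is computed only from customer data collected during the last hour before the cutoff time we provided.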
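
The effect of a training window can be pictured as a simple time filter. The sketch below is plain Python and is not how Featuretools implements it; the helper name `rows_in_training_window` and the sample rows are hypothetical, and treating the window as open at its start and closed at the cutoff time is an assumption made for this illustration.

```python
from datetime import datetime, timedelta


def rows_in_training_window(rows, cutoff_time, training_window):
    """Keep rows whose timestamp lies in (cutoff_time - training_window, cutoff_time]."""
    window_start = cutoff_time - training_window
    return [row for row in rows if window_start < row["time"] <= cutoff_time]


# Hypothetical transactions for a single customer.
rows = [
    {"amount": 10.0, "time": datetime(2014, 1, 1, 2, 30)},
    {"amount": 20.0, "time": datetime(2014, 1, 1, 3, 15)},
    {"amount": 30.0, "time": datetime(2014, 1, 1, 3, 45)},
    {"amount": 40.0, "time": datetime(2014, 1, 1, 4, 30)},
]

cutoff = datetime(2014, 1, 1, 4, 0)
recent = rows_in_training_window(rows, cutoff, timedelta(hours=1))
recent_amounts = [row["amount"] for row in recent]
print(recent_amounts)
```

Only the two rows between 03:00 and 04:00 survive the filter: the 02:30 row is older than the window, and the 04:30 row lies after the cutoff time.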

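Can I run DFS on a single table?#

It is possible, but running DFS on a single table does not take full advantage of what DFS can do. First, DFS cannot use any aggregation primitives, because those require at least two tables; only transform primitives are available. This limits the complexity of the features DFS can build through feature stacking. In addition, running single-table DFS on data with time columns can, in some cases, cause label leakage. When the data is split into multiple tables, Featuretools can filter it by cutoff time instead of assuming it has already been flattened appropriately, which it cannot do with only one table.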
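
The difference between transform and aggregation primitives described above can be illustrated in plain Python. This is a conceptual sketch only, not Featuretools code: the function names and the sample transactions are hypothetical. A transform-style feature yields one value per row of the table it is applied to, while an aggregation-style feature needs a parent key to group by and yields one value per parent.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child table: each transaction points at a parent customer.
transactions = [
    {"transaction_id": 1, "customer_id": 1, "amount": 20.0},
    {"transaction_id": 2, "customer_id": 1, "amount": 40.0},
    {"transaction_id": 3, "customer_id": 2, "amount": 90.0},
]


def amount_over(rows, threshold):
    """Transform-style feature: one value per input row."""
    return [row["amount"] > threshold for row in rows]


def mean_amount_by_customer(rows):
    """Aggregation-style feature: one value per parent (customer) key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["customer_id"]].append(row["amount"])
    return {customer_id: mean(amounts) for customer_id, amounts in grouped.items()}


row_level = amount_over(transactions, 50.0)
customer_level = mean_amount_by_customer(transactions)
print(row_level)
print(customer_level)
```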

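If you only have a single table, DFS can still be useful. There are two main ways to pass a single table to DFS.

The first is to create an EntitySet that contains just that one table.

For example: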

[30]:
transactions_df = ft.demo.load_mock_customer(return_single_table=True)

es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
    dataframe_name="transactions",
    dataframe=transactions_df,
    index="transaction_id",
    time_index="transaction_time",
)

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="transactions",
    trans_primitives=[
        "time_since",
        "day",
        "is_weekend",
        "cum_min",
        "minute",
        "weekday",
        "percentile",
        "year",
        "week",
        "cum_mean",
    ],
)

/Users/code/fin_tool/github/featuretools/featuretools/synthesis/deep_feature_synthesis.py:154: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
  warnings.warn(

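The second is to put the DataFrame into a dictionary that maps its name to a tuple of information about that DataFrame. We then pass the dictionary to the dataframes parameter of DFS.

Here, the dictionary value is a tuple containing the DataFrame, its index column, and its time index. More information on the accepted arguments can be found in the DFS documentation.

For example: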

[31]:
transactions_df = ft.demo.load_mock_customer(return_single_table=True)

dataframes = {"transactions": (transactions_df, "transaction_id", "transaction_time")}

feature_matrix, feature_defs = ft.dfs(
    dataframes=dataframes,
    target_dataframe_name="transactions",
    trans_primitives=[
        "time_since",
        "day",
        "is_weekend",
        "cum_min",
        "minute",
        "weekday",
        "percentile",
        "year",
        "week",
        "cum_mean",
    ],
)

/Users/code/fin_tool/github/featuretools/featuretools/synthesis/deep_feature_synthesis.py:154: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
  warnings.warn(

Before looking at the output, let's first look at the original single table.

[32]:
transactions_df.head()

[32]:
transaction_id session_id transaction_time product_id amount customer_id device session_start zip_code join_date birthday brand
298 298 1 2014-01-01 00:00:00 5 127.64 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18 A
2 2 1 2014-01-01 00:01:05 2 109.48 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18 B
308 308 1 2014-01-01 00:02:10 3 95.06 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18 B
116 116 1 2014-01-01 00:03:15 4 78.92 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18 B
371 371 1 2014-01-01 00:04:20 3 31.54 2 desktop 2014-01-01 13244 2012-04-15 23:31:04 1986-08-18 B

Now we can look at the transformations Featuretools was able to apply to this single DataFrame to create the feature matrix.

[33]:
feature_matrix.head()

[33]:
session_id product_id amount customer_id device zip_code brand CUM_MEAN(amount) CUM_MEAN(customer_id) CUM_MEAN(session_id) ... WEEK(session_start) WEEK(transaction_time) WEEKDAY(birthday) WEEKDAY(join_date) WEEKDAY(session_start) WEEKDAY(transaction_time) YEAR(birthday) YEAR(join_date) YEAR(session_start) YEAR(transaction_time)
transaction_id
298 1 5 127.64 2 desktop 13244 A 127.640000 2.0 1.0 ... 1 1 0 6 2 2 1986 2012 2014 2014
2 1 2 109.48 2 desktop 13244 B 118.560000 2.0 1.0 ... 1 1 0 6 2 2 1986 2012 2014 2014
308 1 3 95.06 2 desktop 13244 B 110.726667 2.0 1.0 ... 1 1 0 6 2 2 1986 2012 2014 2014
116 1 4 78.92 2 desktop 13244 B 102.775000 2.0 1.0 ... 1 1 0 6 2 2 1986 2012 2014 2014
371 1 3 31.54 2 desktop 13244 B 88.528000 2.0 1.0 ... 1 1 0 6 2 2 1986 2012 2014 2014

5 rows × 44 columns
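
To make one of these transform features concrete, the CUM_MEAN(amount) column above can be recomputed by hand from the first five amount values shown in the table. This is a plain-Python sketch of a running mean, not the primitive's actual implementation, and the helper name `cumulative_mean` is hypothetical.

```python
from itertools import accumulate

# The first five "amount" values from the table above.
amounts = [127.64, 109.48, 95.06, 78.92, 31.54]


def cumulative_mean(values):
    """Return the running mean: element i is the mean of values[0..i]."""
    running_totals = accumulate(values)
    return [total / count for count, total in enumerate(running_totals, start=1)]


cum_mean = cumulative_mean(amounts)
print([round(value, 6) for value in cum_mean])
```

Up to floating-point rounding, the results agree with the CUM_MEAN(amount) values in the feature matrix; for example, the second entry is (127.64 + 109.48) / 2 = 118.56.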

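How do I prevent label leakage with DFS?#

One concern when using DFS is label leakage. You want to make sure that the labels in your data are not mistakenly used to create features and the feature matrix.

Featuretools puts particular emphasis on helping users avoid label leakage.

There are two ways to prevent label leakage, depending on whether your data has timestamps.

1. Data without timestamps#

If your data has no timestamps, you can build an EntitySet that contains only the training data and run ft.dfs. This creates a feature matrix from the training data alone and also returns a list of feature definitions. You can then build an EntitySet from the test data and recompute the same features by calling ft.calculate_feature_matrix with that list of feature definitions.

Here is an example of this workflow:

First, let's create our training data.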

[34]:
train_data = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "age": [40, 50, 10, 20, 30],
        "gender": ["m", "f", "m", "f", "f"],
        "signup_date": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
        "labels": [True, False, True, False, True],
    }
)
train_data.head()

[34]:
customer_id age gender signup_date labels
0 1 40 m 2014-01-01 01:41:50 True
1 2 50 f 2014-01-01 02:06:50 False
2 3 10 m 2014-01-01 02:31:50 True
3 4 20 f 2014-01-01 02:56:50 False
4 5 30 f 2014-01-01 03:21:50 True

Now we can create an EntitySet for our training data.

[35]:
es_train_data = ft.EntitySet(id="customer_train_data")
es_train_data = es_train_data.add_dataframe(
    dataframe_name="customers", dataframe=train_data, index="customer_id"
)
es_train_data

[35]:
Entityset: customer_train_data
  DataFrames:
    customers [Rows: 5, Columns: 5]
  Relationships:
    No relationships

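Next, we are ready to create features and a feature matrix for the training data. We don't want Featuretools to use the labels column to build new features, so we exclude it with the ignore_columns option. That also removes the labels column from the feature matrix, so we tell DFS to include it as a seed feature.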

[36]:
labels_feature = ft.Feature(es_train_data["customers"].ww["labels"])
feature_matrix_train, feature_defs = ft.dfs(
    entityset=es_train_data,
    target_dataframe_name="customers",
    ignore_columns={"customers": ["labels"]},
    seed_features=[labels_feature],
)
feature_matrix_train

/Users/code/fin_tool/github/featuretools/featuretools/synthesis/deep_feature_synthesis.py:154: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
  warnings.warn(
[36]:
age labels DAY(signup_date) MONTH(signup_date) WEEKDAY(signup_date) YEAR(signup_date)
customer_id
1 40 True 1 1 2 2014
2 50 False 1 1 2 2014
3 10 True 1 1 2 2014
4 20 False 1 1 2 2014
5 30 True 1 1 2 2014

We will also encode the feature matrix so that it is compatible with machine learning algorithms.

[37]:
feature_matrix_train_enc, features_enc = ft.encode_features(
    feature_matrix_train, feature_defs
)
feature_matrix_train_enc.head()

[37]:
age labels DAY(signup_date) = 1 DAY(signup_date) is unknown MONTH(signup_date) = 1 MONTH(signup_date) is unknown WEEKDAY(signup_date) = 2 WEEKDAY(signup_date) is unknown YEAR(signup_date) = 2014 YEAR(signup_date) is unknown
customer_id
1 40 True True False True False True False True False
2 50 False True False True False True False True False
3 10 True True False True False True False True False
4 20 False True False True False True False True False
5 30 True True False True False True False True False

Notice that the whole feature matrix now contains only numeric and boolean values.
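
The column names in the encoded matrix ("= value" and "is unknown") suggest a one-hot scheme whose categories are fixed from the training data. The sketch below illustrates that idea in plain Python; it is a simplification and not the logic of ft.encode_features, and the helpers `fit_one_hot` and `apply_one_hot` as well as the sample weekday values are hypothetical.

```python
def fit_one_hot(values):
    """Record the categories seen in the training data, in first-seen order."""
    return list(dict.fromkeys(values))


def apply_one_hot(values, categories, name):
    """Encode each value as booleans: one column per known category plus 'is unknown'."""
    encoded = []
    for value in values:
        row = {f"{name} = {category}": value == category for category in categories}
        row[f"{name} is unknown"] = value not in categories
        encoded.append(row)
    return encoded


# Hypothetical weekday values for training and test rows.
train_weekdays = [2, 2, 4]
test_weekdays = [2, 6]

categories = fit_one_hot(train_weekdays)
encoded_test = apply_one_hot(test_weekdays, categories, "WEEKDAY(signup_date)")
for row in encoded_test:
    print(row)
```

Because the categories come from the training values only, a test value that was never seen during training (6 here) sets the "is unknown" column instead of creating a new column, so the training and test matrices keep the same columns.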

Now we can use the feature definitions to compute the feature matrix for the test data while avoiding label leakage.

[38]:
test_train = pd.DataFrame(
    {
        "customer_id": [6, 7, 8, 9, 10],
        "age": [20, 25, 55, 22, 35],
        "gender": ["f", "m", "m", "m", "m"],
        "signup_date": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
        "labels": [True, False, False, True, True],
    }
)

es_test_data = ft.EntitySet(id="customer_test_data")
es_test_data = es_test_data.add_dataframe(
    dataframe_name="customers",
    dataframe=test_train,
    index="customer_id",
    time_index="signup_date",
)

# Reuse the feature definitions from the training run
feature_matrix_enc_test = ft.calculate_feature_matrix(
    features=features_enc, entityset=es_test_data
)

feature_matrix_enc_test.head()

[38]:
age labels DAY(signup_date) = 1 DAY(signup_date) is unknown MONTH(signup_date) = 1 MONTH(signup_date) is unknown WEEKDAY(signup_date) = 2 WEEKDAY(signup_date) is unknown YEAR(signup_date) = 2014 YEAR(signup_date) is unknown
customer_id
6 20 True True False True False True False True False
7 25 False True False True False True False True False
8 55 False True False True False True False True False
9 22 True True False True False True False True False
10 35 True True False True False True False True False

See the modeling section for an example of using the encoded feature matrix with sklearn.

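2. Data with timestamps#

If your data has timestamps, the best way to prevent label leakage is to use a list of cutoff times, which specifies, for each row of the resulting feature matrix, the last point in time from which data may be used. To use cutoff times, you need to set a time index for every time-sensitive DataFrame in the EntitySet.

Tip: Even if your data has no timestamps, you can add a column of dummy timestamps that Featuretools can use as the time index.

When you call ft.dfs, you can provide a DataFrame of cutoff times like this: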

[39]:
cutoff_times = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "time": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
    }
)
cutoff_times.head()

[39]:
customer_id time
0 1 2014-01-01 01:41:50
1 2 2014-01-01 02:06:50
2 3 2014-01-01 02:31:50
3 4 2014-01-01 02:56:50
4 5 2014-01-01 03:21:50
[40]:
train_test_data = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "age": [20, 25, 55, 22, 35],
        "gender": ["f", "m", "m", "m", "m"],
        "signup_date": pd.date_range("2010-01-01 01:41:50", periods=5, freq="25min"),
    }
)

es_train_test_data = ft.EntitySet(id="customer_train_test_data")
es_train_test_data = es_train_test_data.add_dataframe(
    dataframe_name="customers",
    dataframe=train_test_data,
    index="customer_id",
    time_index="signup_date",
)

feature_matrix_train_test, features = ft.dfs(
    entityset=es_train_test_data,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
)
feature_matrix_train_test.head()

/Users/code/fin_tool/github/featuretools/featuretools/synthesis/deep_feature_synthesis.py:154: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
  warnings.warn(
[40]:
age DAY(signup_date) MONTH(signup_date) WEEKDAY(signup_date) YEAR(signup_date)
customer_id time
1 2014-01-01 01:41:50 20 1 1 4 2010
2 2014-01-01 02:06:50 25 1 1 4 2010
3 2014-01-01 02:31:50 55 1 1 4 2010
4 2014-01-01 02:56:50 22 1 1 4 2010
5 2014-01-01 03:21:50 35 1 1 4 2010

Above, we created a feature matrix that uses cutoff times to avoid label leakage. We could also encode this feature matrix with ft.encode_features.
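
The way a cutoff time guards against leakage can be sketched without Featuretools: for each instance, only rows with a timestamp at or before that instance's cutoff contribute to its features. The example below is a plain-Python illustration under that assumption, including the choice to count rows that fall exactly at the cutoff; the event data and the function `total_amount_at_cutoff` are hypothetical.

```python
from datetime import datetime

# Hypothetical time-stamped events for two customers.
events = [
    {"customer_id": 1, "time": datetime(2014, 1, 1, 1, 0), "amount": 10.0},
    {"customer_id": 1, "time": datetime(2014, 1, 1, 2, 0), "amount": 99.0},
    {"customer_id": 2, "time": datetime(2014, 1, 1, 1, 30), "amount": 5.0},
    {"customer_id": 2, "time": datetime(2014, 1, 1, 2, 0), "amount": 7.0},
]

# One cutoff time per customer, as in the cutoff_times DataFrame above.
cutoffs = {
    1: datetime(2014, 1, 1, 1, 41, 50),
    2: datetime(2014, 1, 1, 2, 6, 50),
}


def total_amount_at_cutoff(events, cutoffs):
    """Sum each customer's amounts using only events at or before their cutoff."""
    totals = {customer_id: 0.0 for customer_id in cutoffs}
    for event in events:
        cutoff = cutoffs.get(event["customer_id"])
        if cutoff is not None and event["time"] <= cutoff:
            totals[event["customer_id"]] += event["amount"]
    return totals


totals = total_amount_at_cutoff(events, cutoffs)
print(totals)
```

Customer 1's 02:00 event happens after the 01:41:50 cutoff, so it is left out of that customer's total, whereas both of customer 2's events precede the 02:06:50 cutoff and are included.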

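What is the difference between passing a primitive object and a string to DFS?#

There are two ways to pass primitives to DFS: the primitive object itself, or the primitive's name as a string.

We will use the transform primitive TimeSincePrevious to show the difference between the two.

First, let's use the string name of the primitive.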

[41]:
es = ft.demo.load_mock_customer(return_entityset=True)

[42]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=[],
    trans_primitives=["time_since_previous"],
)
feature_matrix

[42]:
zip_code TIME_SINCE_PREVIOUS(join_date)
customer_id
5 60091 NaN
4 60091 22948824.0
1 60091 744019.0
3 13244 10212841.0
2 13244 21282510.0

Now, let's use the primitive object.

[43]:
from featuretools.primitives import TimeSincePrevious

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=[],
    trans_primitives=[TimeSincePrevious],
)
feature_matrix

[43]:
zip_code TIME_SINCE_PREVIOUS(join_date)
customer_id
5 60091 NaN
4 60091 22948824.0
1 60091 744019.0
3 13244 10212841.0
2 13244 21282510.0

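As we can see above, the feature matrix is the same.

However, if we need to modify the controllable parameters of a primitive, we should use the primitive object.

For instance, let's make TimeSincePrevious return its result in hours (the default is seconds).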

[44]:
from featuretools.primitives import TimeSincePrevious

time_since_previous_in_hours = TimeSincePrevious(unit="hours")

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=[],
    trans_primitives=[time_since_previous_in_hours],
)
feature_matrix

[44]:
zip_code TIME_SINCE_PREVIOUS(join_date, unit=hours)
customer_id
5 60091 NaN
4 60091 6374.673333
1 60091 206.671944
3 13244 2836.900278
2 13244 5911.808333
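
As a quick sanity check on the two outputs above, dividing the values reported in seconds by 3600 reproduces the values reported in hours. The snippet below is plain Python arithmetic on numbers copied from the tables; it does not call Featuretools.

```python
# Values of TIME_SINCE_PREVIOUS(join_date) in seconds, copied from the output above
# and keyed by customer_id.
seconds = {4: 22948824.0, 1: 744019.0, 3: 10212841.0, 2: 21282510.0}

SECONDS_PER_HOUR = 3600

# Convert each value to hours and round to six decimals, as displayed in the table.
hours = {cid: round(value / SECONDS_PER_HOUR, 6) for cid, value in seconds.items()}
print(hours)
```

The rounded results agree with the unit=hours column, which suggests the unit argument only rescales the same underlying time difference.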

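Features#

How can I select features based on some attributes (a specific string, an explicit primitive type, a return type, a given depth)?#

You may wish to select a subset of your features based on some attributes.

Say you want to select features that have the string amount in their name. You can check for this by using the get_name function on the feature definitions.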

[45]:
es = ft.demo.load_mock_customer(return_entityset=True)

feature_defs = ft.dfs(
    entityset=es, target_dataframe_name="customers", features_only=True
)

features_with_amount = []
for x in feature_defs:
    if "amount" in x.get_name():
        features_with_amount.append(x)
features_with_amount[0:5]

[45]:
[<Feature: MAX(transactions.amount)>,
 <Feature: MEAN(transactions.amount)>,
 <Feature: MIN(transactions.amount)>,
 <Feature: SKEW(transactions.amount)>,
 <Feature: STD(transactions.amount)>]

You might also want to select only aggregation features.

[46]:
from featuretools import AggregationFeature

features_only_aggregations = []
for x in feature_defs:
    if type(x) == AggregationFeature:
        features_only_aggregations.append(x)
features_only_aggregations[0:5]

[46]:
[<Feature: COUNT(sessions)>,
 <Feature: MODE(sessions.device)>,
 <Feature: NUM_UNIQUE(sessions.device)>,
 <Feature: COUNT(transactions)>,
 <Feature: MAX(transactions.amount)>]

Also, you might want to select only features that are calculated at a certain depth. You can do this by using the get_depth function.

[47]:
features_only_depth_2 = []
for x in feature_defs:
    if x.get_depth() == 2:
        features_only_depth_2.append(x)
features_only_depth_2[0:5]

[47]:
[<Feature: MAX(sessions.COUNT(transactions))>,
 <Feature: MAX(sessions.MEAN(transactions.amount))>,
 <Feature: MAX(sessions.MIN(transactions.amount))>,
 <Feature: MAX(sessions.NUM_UNIQUE(transactions.product_id))>,
 <Feature: MAX(sessions.SKEW(transactions.amount))>]

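Finally, you might want to return only features of a certain type. You can do this by using the column_schema attribute. For more information on working with column schemas, see Transitioning from Variables to Woodwork.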

[48]:
features_only_numeric = []
for x in feature_defs:
    if "numeric" in x.column_schema.semantic_tags:
        features_only_numeric.append(x)
features_only_numeric[0:5]

[48]:
[<Feature: COUNT(sessions)>,
 <Feature: NUM_UNIQUE(sessions.device)>,
 <Feature: COUNT(transactions)>,
 <Feature: MAX(transactions.amount)>,
 <Feature: MEAN(transactions.amount)>]

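Once you have your specific feature list, you can use ft.calculate_feature_matrix to generate a feature matrix for only those features.

For our example, let's use the features that have the string amount in their name.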

[49]:
feature_matrix = ft.calculate_feature_matrix(
    entityset=es, features=features_with_amount
)  # change to your specific feature list
feature_matrix.head()

[49]:
MAX(transactions.amount) MEAN(transactions.amount) MIN(transactions.amount) SKEW(transactions.amount) STD(transactions.amount) SUM(transactions.amount) MAX(sessions.MEAN(transactions.amount)) MAX(sessions.MIN(transactions.amount)) MAX(sessions.SKEW(transactions.amount)) MAX(sessions.STD(transactions.amount)) ... STD(sessions.MAX(transactions.amount)) STD(sessions.MEAN(transactions.amount)) STD(sessions.MIN(transactions.amount)) STD(sessions.SKEW(transactions.amount)) STD(sessions.SUM(transactions.amount)) SUM(sessions.MAX(transactions.amount)) SUM(sessions.MEAN(transactions.amount)) SUM(sessions.MIN(transactions.amount)) SUM(sessions.SKEW(transactions.amount)) SUM(sessions.STD(transactions.amount))
customer_id
5 149.02 80.375443 7.55 -0.025941 44.095630 6349.66 94.481667 20.65 0.602209 51.149250 ... 7.928001 11.007471 4.961414 0.415426 402.775486 839.76 472.231119 86.49 0.014384 259.873954
4 149.95 80.070459 5.73 -0.036348 45.068765 8727.68 110.450000 54.83 0.382868 54.293903 ... 3.514421 13.027258 16.960575 0.387884 235.992478 1157.99 649.657515 131.51 0.002764 356.125829
1 139.43 71.631905 5.81 0.019698 40.442059 9025.62 88.755625 26.36 0.640252 46.905665 ... 7.322191 13.759314 6.954507 0.589386 279.510713 1057.97 582.193117 78.59 -0.476122 312.745952
3 149.15 67.060430 5.89 0.418230 43.683296 6236.62 82.109444 20.06 0.854976 50.110120 ... 10.724241 11.174282 5.424407 0.429374 219.021420 847.63 405.237462 66.21 2.286086 257.299895
2 146.81 77.422366 8.73 0.098259 37.705178 7200.28 96.581000 56.46 0.755711 47.935920 ... 17.221593 11.477071 15.874374 0.509798 251.609234 931.63 548.905851 154.60 -0.277640 258.700528

5 rows × 37 columns

Notice that every column name in the feature matrix above contains the string amount.
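
The selection loops above all follow the same pattern, so they can be condensed into one helper that takes a predicate. The sketch below is only illustrative: select_features and StubFeature are hypothetical names introduced here, and StubFeature is a stand-in that mimics the get_name and get_depth methods so the snippet runs without an EntitySet.

```python
from typing import Callable, Iterable, List


class StubFeature:
    """Hypothetical stand-in exposing the two methods used in the cells above."""

    def __init__(self, name: str, depth: int):
        self._name = name
        self._depth = depth

    def get_name(self) -> str:
        return self._name

    def get_depth(self) -> int:
        return self._depth


def select_features(features: Iterable, predicate: Callable[[object], bool]) -> List:
    """Return the features for which predicate(feature) is true, keeping order."""
    return [f for f in features if predicate(f)]


# Made-up feature list; the names mirror the style of the outputs above.
feature_defs_stub = [
    StubFeature("COUNT(sessions)", 1),
    StubFeature("MAX(transactions.amount)", 1),
    StubFeature("MAX(sessions.MEAN(transactions.amount))", 2),
    StubFeature("MAX(sessions.COUNT(transactions))", 2),
]

# Combine two of the conditions from above: name contains "amount" AND depth is 2.
selected = select_features(
    feature_defs_stub,
    lambda f: "amount" in f.get_name() and f.get_depth() == 2,
)
print([f.get_name() for f in selected])
```

Because the helper only calls methods that the cells above already use, passing the real feature_defs list in place of the stub list should work the same way.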

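How do I create where features?#

Sometimes, you might want to create features that are conditioned on a second value before the calculation is performed. This extra filter is called a "where clause". You can create these features using the interesting_values of a column.

If you have categorical columns in your EntitySet, you can use add_interesting_values. This function finds interesting values for your categorical columns, which can then be used to generate "where" clauses.

First, let's create our EntitySet.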

[50]:
es = ft.demo.load_mock_customer(return_entityset=True)
es

[50]:
Entityset: transactions
  DataFrames:
    transactions [Rows: 500, Columns: 6]
    products [Rows: 5, Columns: 3]
    sessions [Rows: 35, Columns: 5]
    customers [Rows: 5, Columns: 5]
  Relationships:
    transactions.product_id -> products.product_id
    transactions.session_id -> sessions.session_id
    sessions.customer_id -> customers.customer_id

Now we can add the interesting values for the categorical columns.

[51]:
es.add_interesting_values()

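Now we can run DFS with the where_primitives argument to define which primitives are applied with where clauses. In this case, let's use the primitive count. For this to work, count must be present in both agg_primitives and where_primitives.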

[52]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["count"],
    where_primitives=["count"],
    trans_primitives=[],
)
feature_matrix.head()

[52]:
zip_code COUNT(sessions) COUNT(transactions) COUNT(sessions WHERE device = mobile) COUNT(sessions WHERE device = desktop) COUNT(sessions WHERE device = tablet) COUNT(sessions WHERE customers.zip_code = 13244) COUNT(sessions WHERE customers.zip_code = 60091) COUNT(transactions WHERE sessions.device = mobile) COUNT(transactions WHERE sessions.device = tablet) COUNT(transactions WHERE sessions.device = desktop)
customer_id
5 60091 6 79 3 2 1 0 6 36 14 29
4 60091 8 109 4 3 1 0 8 53 18 38
1 60091 8 126 3 2 3 0 8 56 43 27
3 13244 6 93 1 4 1 6 0 16 15 62
2 13244 7 93 2 3 2 7 0 31 28 34

We have now created some useful features. One example is COUNT(sessions WHERE device = tablet), which tells us how many sessions a customer completed on a tablet.

[53]:
feature_matrix[["COUNT(sessions WHERE device = tablet)"]]

[53]:
COUNT(sessions WHERE device = tablet)
customer_id
5 1
4 1
1 3
3 1
2 2
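
To make the where clause concrete, the sketch below reproduces the idea behind COUNT(sessions WHERE device = tablet) in plain Python: filter the child rows first, then count per parent. The session rows are made up for illustration and are not the mock customer data.

```python
from collections import Counter

# Hypothetical (customer_id, device) pairs.
sessions = [
    (1, "tablet"), (1, "mobile"), (1, "tablet"),
    (2, "desktop"), (2, "tablet"),
    (3, "mobile"),
]

# Keep only the rows that satisfy the where clause, then count per customer.
tablet_counts = Counter(cid for cid, device in sessions if device == "tablet")

# Customers without a matching session get a count of 0.
customers = sorted({cid for cid, _ in sessions})
result = {cid: tablet_counts.get(cid, 0) for cid in customers}
print(result)  # {1: 2, 2: 1, 3: 0}
```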

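Primitives#

What is the difference between the primitive types (Transform, GroupBy Transform, and Aggregation)?#

You might be curious to know the difference between the primitive types.

Let's look at the differences between transform, groupby transform, and aggregation primitives.

First, let's create a simple EntitySet.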

[54]:
import pandas as pd

import featuretools as ft

df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "time_index": pd.date_range("1/1/2019", periods=6, freq="D"),
        "group": ["a", "a", "a", "a", "a", "a"],
        "val": [5, 1, 10, 20, 6, 23],
    }
)
es = ft.EntitySet()
es = es.add_dataframe(
    dataframe_name="observations", dataframe=df, index="id", time_index="time_index"
)

es = es.normalize_dataframe(
    base_dataframe_name="observations", new_dataframe_name="groups", index="group"
)

es.plot()

[54]:
../_images/resources_frequently_asked_questions_118_1.svg

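After calling normalize_dataframe, the column "group" has the semantic tag "foreign_key" because it identifies another dataframe. Alternatively, this could be set with the semantic_tags parameter when we first call es.add_dataframe().

Transform Primitive#

The cum_sum primitive calculates the running sum of a list of numbers.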

[55]:
from featuretools.primitives import CumSum

cum_sum = CumSum()
cum_sum([1, 2, 3, 4, 5]).tolist()

[55]:
[1, 3, 6, 10, 15]

If we apply it using the trans_primitives argument, it is calculated over the entire observations dataframe, like this:

[56]:
feature_matrix, feature_defs = ft.dfs(
    target_dataframe_name="observations",
    entityset=es,
    agg_primitives=[],
    trans_primitives=["cum_sum"],
    groupby_trans_primitives=[],
)

feature_matrix

[56]:
group val CUM_SUM(val)
id
1 a 5 5.0
2 a 1 6.0
3 a 10 16.0
4 a 20 36.0
5 a 6 42.0
6 a 23 65.0

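Groupby Transform Primitive#

If we apply it using groupby_trans_primitives, then DFS first groups by any foreign key columns before applying the transform primitive. As a result, we get the cumulative sum by group.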

[57]:
feature_matrix, feature_defs = ft.dfs(
    target_dataframe_name="observations",
    entityset=es,
    agg_primitives=[],
    trans_primitives=[],
    groupby_trans_primitives=["cum_sum"],
)

feature_matrix

[57]:
group val CUM_SUM(val) by group
id
1 a 5 5.0
2 a 1 6.0
3 a 10 16.0
4 a 20 36.0
5 a 6 42.0
6 a 23 65.0
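
In the example above every row belongs to group "a", so the transform and groupby transform outputs are identical. The plain-Python sketch below uses a hypothetical second group "b" to show where the two diverge; it illustrates the grouping idea only and does not call Featuretools.

```python
from collections import defaultdict
from itertools import accumulate

vals = [5, 1, 10, 20, 6, 23]  # same values as the example above
groups = ["a", "b", "a", "b", "a", "b"]  # hypothetical: two groups instead of one

# Transform: cumulative sum over all rows, ignoring the group.
cum_sum_all = list(accumulate(vals))

# Groupby transform: a separate running total per group.
running = defaultdict(int)
cum_sum_by_group = []
for group, val in zip(groups, vals):
    running[group] += val
    cum_sum_by_group.append(running[group])

print(cum_sum_all)       # [5, 6, 16, 36, 42, 65]
print(cum_sum_by_group)  # [5, 1, 15, 21, 21, 44]
```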

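Aggregation Primitive#

Finally, there is also the aggregation primitive "sum". If we use sum, it calculates the sum for the group at the cutoff time of each row. Because we did not specify a cutoff time, it uses all the data of each group for every row.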

[58]:
feature_matrix, feature_defs = ft.dfs(
    target_dataframe_name="observations",
    entityset=es,
    agg_primitives=["sum"],
    trans_primitives=[],
    cutoff_time_in_index=True,
    groupby_trans_primitives=[],
)

feature_matrix

[58]:
group val groups.SUM(observations.val)
id time
1 2024-10-11 14:50:23.002623 a 5 65.0
2 2024-10-11 14:50:23.002623 a 1 65.0
3 2024-10-11 14:50:23.002623 a 10 65.0
4 2024-10-11 14:50:23.002623 a 20 65.0
5 2024-10-11 14:50:23.002623 a 6 65.0
6 2024-10-11 14:50:23.002623 a 23 65.0

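If we set the cutoff time of each row to its time index and then use sum as an aggregation primitive, the result is the same as cum_sum. (The row order in the displayed dataframe may differ.)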

[59]:
cutoff_time = df[["id", "time_index"]]
cutoff_time

[59]:
id time_index
1 1 2019-01-01
2 2 2019-01-02
3 3 2019-01-03
4 4 2019-01-04
5 5 2019-01-05
6 6 2019-01-06
[60]:
feature_matrix, feature_defs = ft.dfs(
    target_dataframe_name="observations",
    entityset=es,
    agg_primitives=["sum"],
    trans_primitives=[],
    groupby_trans_primitives=[],
    cutoff_time_in_index=True,
    cutoff_time=cutoff_time,
)

feature_matrix

[60]:
group val groups.SUM(observations.val)
id time
1 2019-01-01 a 5 5.0
2 2019-01-02 a 1 6.0
3 2019-01-03 a 10 16.0
4 2019-01-04 a 20 36.0
5 2019-01-05 a 6 42.0
6 2019-01-06 a 23 65.0
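
The equivalence between a sum taken at each row's cutoff time and a cumulative sum can be checked in plain Python. The sketch below assumes that rows whose time index is at or before the cutoff are included, which matches the output above; it mirrors the idea rather than Featuretools' implementation.

```python
from datetime import date, timedelta
from itertools import accumulate

vals = [5, 1, 10, 20, 6, 23]
times = [date(2019, 1, 1) + timedelta(days=i) for i in range(6)]

# Aggregation with a cutoff: for each row, sum every value observed at or before
# that row's own time index.
sum_at_cutoff = [
    sum(v for v, t in zip(vals, times) if t <= cutoff) for cutoff in times
]

print(sum_at_cutoff)            # [5, 6, 16, 36, 42, 65]
print(list(accumulate(vals)))   # [5, 6, 16, 36, 42, 65]
```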

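How do I get a list of all aggregation and transform primitives?#

You can use featuretools.list_primitives() to get all the primitives in Featuretools. It returns a DataFrame with the name, type, and description of each primitive, along with its valid inputs and return type.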

[61]:
df_primitives = ft.list_primitives()
df_primitives.head()

[61]:
name type description valid_inputs return_type
0 is_monotonically_increasing aggregation Determines if a series is monotonically increasing. <ColumnSchema (Semantic Tags = ['numeric'])> <ColumnSchema (Logical Type = BooleanNullable)>
1 max_consecutive_positives aggregation Determines the maximum number of consecutive positive values in the input <ColumnSchema (Logical Type = Double)>, <Colum... <ColumnSchema (Logical Type = Integer) (Semant...
2 count_outside_nth_std aggregation Determines the number of observations that lie outside the first N standard deviations. <ColumnSchema (Semantic Tags = ['numeric'])> <ColumnSchema (Logical Type = Integer) (Semant...
3 num_peaks aggregation Determines the number of peaks in a list of numbers. <ColumnSchema (Semantic Tags = ['numeric'])> <ColumnSchema (Logical Type = Integer) (Semant...
4 first aggregation Determines the first value in a list. <ColumnSchema> None
[62]:
df_primitives.tail()

[62]:
name type description valid_inputs return_type
220 upper_case_word_count transform Determines the number of words in a string that are entirely capitalized. <ColumnSchema (Logical Type = NaturalLanguage)> <ColumnSchema (Logical Type = IntegerNullable)...
221 days_in_month transform Determines the number of days in the month of the given datetime. <ColumnSchema (Logical Type = Datetime)> <ColumnSchema (Logical Type = Ordinal: [1, 2, ...
222 is_null transform Determines if a value is null. <ColumnSchema> <ColumnSchema (Logical Type = Boolean)>
223 add_numeric transform Performs element-wise addition of two lists. <ColumnSchema (Semantic Tags = ['numeric'])> <ColumnSchema (Semantic Tags = ['numeric'])>
224 expanding_min transform Computes the expanding minimum of events over a given window. <ColumnSchema (Semantic Tags = ['numeric'])>, ... <ColumnSchema (Semantic Tags = ['numeric'])>

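How do I change the units of the TimeSince primitives?#

There are a few primitives in Featuretools that make time-based calculations. These include TimeSince, TimeSincePrevious, TimeSinceLast, and TimeSinceFirst.

You can change the units from the default of seconds to any valid time unit, as follows: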

[63]:
from featuretools.primitives import (
    TimeSince,
    TimeSinceFirst,
    TimeSinceLast,
    TimeSincePrevious,
)

time_since = TimeSince(unit="minutes")
time_since_previous = TimeSincePrevious(unit="hours")
time_since_last = TimeSinceLast(unit="days")
time_since_first = TimeSinceFirst(unit="years")

es = ft.demo.load_mock_customer(return_entityset=True)

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=[time_since_last, time_since_first],
    trans_primitives=[time_since, time_since_previous],
)


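Above, we changed the units to the following:

  • minutes for TimeSince

  • hours for TimeSincePrevious

  • days for TimeSinceLast

  • years for TimeSinceFirst

Now we can see that our feature matrix contains several features in which the units of the time-since primitives have changed.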

[64]:
feature_matrix.head()

[64]:
zip_code TIME_SINCE_FIRST(sessions.session_start, unit=years) TIME_SINCE_LAST(sessions.session_start, unit=days) TIME_SINCE_FIRST(transactions.transaction_time, unit=years) TIME_SINCE_LAST(transactions.transaction_time, unit=days) TIME_SINCE(birthday, unit=minutes) TIME_SINCE(join_date, unit=minutes) TIME_SINCE_PREVIOUS(join_date, unit=hours) TIME_SINCE_FIRST(transactions.sessions.session_start, unit=years) TIME_SINCE_LAST(transactions.sessions.session_start, unit=days)
customer_id
5 60091 10.783855 3936.283543 10.783855 3936.278277 2.114729e+07 7.488563e+06 NaN 10.783855 3936.283543
4 60091 10.783834 3936.394885 10.783834 3936.388114 9.550970e+06 7.106082e+06 6374.673333 10.783834 3936.394885
1 60091 10.783803 3936.319654 10.783803 3936.308369 1.590281e+07 7.093682e+06 206.671944 10.783803 3936.319654
3 13244 10.783698 3936.254202 10.783698 3936.242918 1.098809e+07 6.923468e+06 2836.900278 10.783698 3936.254202
2 13244 10.783888 3936.277524 10.783888 3936.268496 2.006585e+07 6.568759e+06 5911.808333 10.783888 3936.277524

There are now features whose time units differ from the default of seconds, such as TIME_SINCE_LAST(sessions.session_start, unit=days) and TIME_SINCE_FIRST(sessions.session_start, unit=years).

Modeling#

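How can I use my train and test data with Featuretools and sklearn's train_test_split?#

You might be wondering how to properly use your train and test data with Featuretools and sklearn's train_test_split. There are a few steps you need to follow to keep this workflow correct.

Let's imagine we have a dataframe of training data that includes the labels.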

[65]:
train_data = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "age": [20, 25, 55, 22, 35],
        "gender": ["f", "m", "m", "m", "m"],
        "signup_date": pd.date_range("2010-01-01 01:41:50", periods=5, freq="25min"),
        "labels": [False, True, True, False, False],
    }
)
train_data.head()

[65]:
customer_id age gender signup_date labels
0 1 20 f 2010-01-01 01:41:50 False
1 2 25 m 2010-01-01 02:06:50 True
2 3 55 m 2010-01-01 02:31:50 True
3 4 22 m 2010-01-01 02:56:50 False
4 5 35 m 2010-01-01 03:21:50 False

Now we can create our EntitySet for the training data and create our features. To prevent label leakage, we will use cutoff times (see the earlier question).

[66]:
es_train_data = ft.EntitySet(id="customer_data")
es_train_data = es_train_data.add_dataframe(
    dataframe_name="customers", dataframe=train_data, index="customer_id"
)

cutoff_times = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "time": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
    }
)

feature_matrix_train, features = ft.dfs(
    entityset=es_train_data,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
)
feature_matrix_train.head()

[66]:
age labels DAY(signup_date) MONTH(signup_date) WEEKDAY(signup_date) YEAR(signup_date)
customer_id time
1 2014-01-01 01:41:50 20 False 1 1 4 2010
2 2014-01-01 02:06:50 25 True 1 1 4 2010
3 2014-01-01 02:31:50 55 True 1 1 4 2010
4 2014-01-01 02:56:50 22 False 1 1 4 2010
5 2014-01-01 03:21:50 35 False 1 1 4 2010

We will also encode our feature matrix so that it is compatible with machine learning algorithms.

[67]:
feature_matrix_train_enc, feature_enc = ft.encode_features(
    feature_matrix_train, features
)
feature_matrix_train_enc.head()

[67]:
age labels DAY(signup_date) = 1 DAY(signup_date) is unknown MONTH(signup_date) = 1 MONTH(signup_date) is unknown WEEKDAY(signup_date) = 4 WEEKDAY(signup_date) is unknown YEAR(signup_date) = 2010 YEAR(signup_date) is unknown
customer_id time
1 2014-01-01 01:41:50 20 False True False True False True False True False
2 2014-01-01 02:06:50 25 True True False True False True False True False
3 2014-01-01 02:31:50 55 True True False True False True False True False
4 2014-01-01 02:56:50 22 False True False True False True False True False
5 2014-01-01 03:21:50 35 False True False True False True False True False
[68]:
from sklearn.model_selection import train_test_split

X = feature_matrix_train_enc.drop(["labels"], axis=1)
y = feature_matrix_train_enc["labels"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

Now you can use the encoded feature matrix with sklearn's train_test_split. This allows you to train your model and tune your parameters.

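How are categorical columns encoded when splitting train and test data?#

You might be wondering what happens when the train and test data are encoded. In particular, you might be curious what happens if a categorical value that is present in the training data does not appear in the test data.

Let's walk through a simple example to see what happens during encoding.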

[69]:
train_data = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "product_purchased": ["coke zero", "car", "toothpaste", "coke zero", "car"],
    }
)
es_train = ft.EntitySet(id="customer_data")
es_train = es_train.add_dataframe(
    dataframe_name="customers",
    dataframe=train_data,
    index="customer_id",
    logical_types={"product_purchased": ww.logical_types.Categorical},
)
feature_matrix_train, features = ft.dfs(
    entityset=es_train, target_dataframe_name="customers"
)
feature_matrix_train

[69]:
product_purchased
customer_id
1 coke zero
2 car
3 toothpaste
4 coke zero
5 car

We will use ft.encode_features to properly encode the product_purchased column.

[70]:
feature_matrix_train_encoded, features_encoded = ft.encode_features(
    feature_matrix_train, features
)
feature_matrix_train_encoded.head()

[70]:
product_purchased = coke zero product_purchased = car product_purchased = toothpaste product_purchased is unknown
customer_id
1 True False False False
2 False True False False
3 False False True False
4 True False False False
5 False True False False

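Now let's imagine we have some test data that lacks one of the categorical values (toothpaste). The test data also contains a value that was not present in the training data (water).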

[71]:
test_data = pd.DataFrame(
    {
        "customer_id": [6, 7, 8, 9, 10],
        "product_purchased": ["coke zero", "car", "coke zero", "coke zero", "water"],
    }
)

es_test = ft.EntitySet(id="customer_data")
es_test = es_test.add_dataframe(
    dataframe_name="customers", dataframe=test_data, index="customer_id"
)

feature_matrix_test = ft.calculate_feature_matrix(
    entityset=es_test, features=features_encoded
)
feature_matrix_test.head()

/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/woodwork/type_sys/utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
[71]:
product_purchased = coke zero product_purchased = car product_purchased = toothpaste product_purchased is unknown
customer_id
6 True False False False
7 False True False False
8 True False False False
9 True False False False
10 False False False True

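As seen above, the encoding was handled successfully, including the following complications:

  • toothpaste was present in the training data but not in the testing data.

  • water was present in the testing data but not in the training data.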
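To make the behavior above more concrete, here is a minimal pandas sketch of encoding against a fixed list of categories learned from the training data. It is not Featuretools' implementation; the helper name encode_with_fixed_categories is hypothetical, and the column naming simply mimics the output shown above.

```python
import pandas as pd


def encode_with_fixed_categories(values, categories, name):
    """Hypothetical helper: one-hot encode `values` against a fixed category list."""
    series = pd.Series(values)
    # One boolean column per category seen at training time.
    encoded = pd.DataFrame(
        {f"{name} = {category}": series == category for category in categories}
    )
    # Anything outside the training categories is flagged as unknown.
    encoded[f"{name} is unknown"] = ~series.isin(categories)
    return encoded


# Categories observed in the training data.
train_categories = ["coke zero", "car", "toothpaste"]
# Test data: no "toothpaste", plus an unseen "water".
test_values = ["coke zero", "car", "coke zero", "coke zero", "water"]

encoded_test = encode_with_fixed_categories(
    test_values, train_categories, "product_purchased"
)
print(encoded_test)
```

Because the column list comes from the training categories rather than from the test data, the toothpaste column is still produced (all False) and water falls into the unknown column, so the training and testing matrices keep the same shape.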

Errors and Warnings#

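Why am I getting this error 'Index is not unique on dataframe'?#

You may be trying to create your EntitySet and run into this error.

IndexError: Index column must be unique

This is because each dataframe in your EntitySet needs a unique index.

Let's look at a simple example.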

[72]:
product_df = pd.DataFrame({"id": [1, 2, 3, 4, 4], "rating": [3.5, 4.0, 4.5, 1.5, 5.0]})
product_df

[72]:
id rating
0 1 3.5
1 2 4.0
2 3 4.5
3 4 1.5
4 4 5.0

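Notice that the id column contains the duplicate value 4. If you try to add this dataframe to the EntitySet, you will run into the following error.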

es = ft.EntitySet(id="product_data")
es = es.add_dataframe(dataframe_name="products",
                      dataframe=product_df,
                      index="id")
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-78-854fbaf207f8> in <module>
      1 es = ft.EntitySet(id="product_data")
----> 2 es = es.add_dataframe(dataframe_name="products",
      3                       dataframe=product_df,
      4                       index="id")

~/Code/featuretools/featuretools/entityset/entityset.py in add_dataframe(self, dataframe, dataframe_name, index, logical_types, semantic_tags, make_index, time_index, secondary_time_index, already_sorted)
    625             index_was_created, index, dataframe = _get_or_create_index(index, make_index, dataframe)
    626
--> 627             dataframe.ww.init(name=dataframe_name,
    628                               index=index,
    629                               time_index=time_index,

/usr/local/Caskroom/miniconda/base/envs/featuretools/lib/python3.8/site-packages/woodwork/table_accessor.py in init(self, index, time_index, logical_types, already_sorted, schema, validate, use_standard_tags, **kwargs)
     94         """
     95         if validate:
---> 96             _validate_accessor_params(self._dataframe, index, time_index, logical_types, schema, use_standard_tags)
     97         if schema is not None:
     98             self._schema = schema

/usr/local/Caskroom/miniconda/base/envs/featuretools/lib/python3.8/site-packages/woodwork/table_accessor.py in _validate_accessor_params(dataframe, index, time_index, logical_types, schema, use_standard_tags)
    877         # We ignore these parameters if a schema is passed
    878         if index is not None:
--> 879             _check_index(dataframe, index)
    880         if logical_types:
    881             _check_logical_types(dataframe.columns, logical_types)

/usr/local/Caskroom/miniconda/base/envs/featuretools/lib/python3.8/site-packages/woodwork/table_accessor.py in _check_index(dataframe, index)
    903         # User specifies an index that is in the dataframe but not unique
--> 904         raise IndexError('Index column must be unique')
    905
    906

IndexError: Index column must be unique
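Before adding a dataframe to an EntitySet, it can help to check the intended index column yourself. The following is a small sketch using standard pandas calls (Series.is_unique and Series.duplicated); it is not part of Featuretools, and the variable names are only illustrative.

```python
import pandas as pd

product_df = pd.DataFrame(
    {"id": [1, 2, 3, 4, 4], "rating": [3.5, 4.0, 4.5, 1.5, 5.0]}
)

# True only when every value in the candidate index column is distinct.
index_is_unique = product_df["id"].is_unique

# keep=False marks every row that shares its id with another row,
# so both offending rows are shown rather than only the later one.
duplicate_rows = product_df[product_df["id"].duplicated(keep=False)]

print(index_is_unique)
print(duplicate_rows)
```

Inspecting duplicate_rows shows which rows collide, which helps you decide whether the data should be corrected or a new index should be generated.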

To fix the above error, you can use one of the following solutions:

Solution #1 - Create a unique index on your dataframe.

[73]:
product_df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "rating": [3.5, 4.0, 4.5, 1.5, 5.0]})
product_df

[73]:
id rating
0 1 3.5
1 2 4.0
2 3 4.5
3 4 1.5
4 5 5.0

Notice that we now have a unique index column called id.

[74]:
es = es.add_dataframe(dataframe_name="products", dataframe=product_df, index="id")
es

[74]:
Entityset: transactions
  DataFrames:
    transactions [Rows: 500, Columns: 6]
    products [Rows: 5, Columns: 2]
    sessions [Rows: 35, Columns: 5]
    customers [Rows: 5, Columns: 5]
  Relationships:
    transactions.product_id -> products.product_id
    transactions.session_id -> sessions.session_id
    sessions.customer_id -> customers.customer_id

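As seen above, we can now add the DataFrame to our EntitySet without an error because it has a unique index.

Solution #2 - Set make_index to True in your call to add_dataframe to create a new index on that data

  • make_index creates a unique index for each row based on the row's position relative to all the other rows.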

[75]:
product_df = pd.DataFrame({"id": [1, 2, 3, 4, 4], "rating": [3.5, 4.0, 4.5, 1.5, 5.0]})

es = ft.EntitySet(id="product_data")
es = es.add_dataframe(
    dataframe_name="products", dataframe=product_df, index="product_id", make_index=True
)

es["products"]

[75]:
product_id id rating
0 0 1 3.5
1 1 2 4.0
2 2 3 4.5
3 3 4 1.5
4 4 4 5.0

As seen above, we created our EntitySet without an error by using the make_index argument.
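The effect of make_index can be approximated in plain pandas by numbering the rows by position. This is only a sketch of the resulting column, assuming a zero-based positional index as in the output above; it does not reproduce what Featuretools does internally.

```python
import pandas as pd

product_df = pd.DataFrame(
    {"id": [1, 2, 3, 4, 4], "rating": [3.5, 4.0, 4.5, 1.5, 5.0]}
)

# Work on a copy so the original dataframe is left untouched.
indexed_df = product_df.copy()

# Number the rows 0..n-1 and place the new column first.
indexed_df.insert(0, "product_id", list(range(len(indexed_df))))

print(indexed_df)
```

The original id column is kept as an ordinary column, duplicates included, while product_id is guaranteed to be unique.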

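Why am I getting the following warning 'Using training_window but last_time_index is not set'?#

You will get this warning if you are using a training window and your dataframes do not have a last_time_index set.

The training window attribute in Featuretools limits the amount of past data that can be used when calculating a particular feature vector.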
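As a rough illustration of what a training window means, the pandas sketch below keeps only the rows that fall within one hour before a cutoff time. The data is made up for this example, and the boundary handling (excluding the window start, including the cutoff time) is an assumption made here for illustration; check the Featuretools documentation for the exact behavior.

```python
import pandas as pd

# Made-up transactions for a single customer.
transactions = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 1],
        "transaction_time": pd.to_datetime(
            [
                "2014-01-01 02:30",
                "2014-01-01 03:10",
                "2014-01-01 03:50",
                "2014-01-01 04:20",
            ]
        ),
        "amount": [20.0, 35.0, 15.0, 50.0],
    }
)

cutoff_time = pd.Timestamp("2014-01-01 04:00")
training_window = pd.Timedelta("1 hour")

# Assumed boundaries: after (cutoff - window), up to and including the cutoff.
window_start = cutoff_time - training_window
in_window = (transactions["transaction_time"] > window_start) & (
    transactions["transaction_time"] <= cutoff_time
)
usable = transactions[in_window]

print(usable)
```

Only the 03:10 and 03:50 rows remain: the 02:30 row is older than the window allows, and the 04:20 row occurs after the cutoff time.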

After creating your EntitySet, you can call your_entityset.add_last_time_indexes() to automatically add a last_time_index to every dataframe. This will remove the warning.

[76]:
es = ft.demo.load_mock_customer(return_entityset=True)
es.add_last_time_indexes()

/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/woodwork/type_sys/utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(

Now we can run Deep Feature Synthesis (DFS) without getting the warning.

[77]:
cutoff_times = pd.DataFrame()
cutoff_times["customer_id"] = [1, 2, 3, 1]
cutoff_times["time"] = pd.to_datetime(
    ["2014-1-1 04:00", "2014-1-1 05:00", "2014-1-1 06:00", "2014-1-1 08:00"]
)
cutoff_times["label"] = [True, True, False, True]

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
    training_window="1 hour",
)

/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function min at 0x1112a5260> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function max at 0x1112a5120> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function sum at 0x1112a4a40> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function std at 0x1112a5c60> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  ).agg(to_agg)
/Users/code/fin_tool/github/featuretools/featuretools/computational_backends/feature_set_calculator.py:756: FutureWarning: The provided callable <function mean at 0x1112a5b20> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  ).agg(to_agg)

last_time_index vs. time_index

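  • The time_index is the time at which an instance was first known.

  • The last_time_index is the time at which an instance appears for the last time.

  • For example, a customer's session can contain several transactions that happen at different points in time. If we want to count the sessions a user has in a given time period, we usually want every session that had any transaction during the training window. To do that, we need to know not only when a session starts (time_index) but also when it ends (last_time_index). The last time an instance appears in the data is stored as the last_time_index of its dataframe.

  • Once the last_time_index has been set, Featuretools checks whether it falls after the start of the training window. That check, combined with the cutoff time, lets DFS discover which data is relevant for a given training window.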
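The check described above can be sketched in plain Python. This is only an illustration of the idea, not Featuretools' actual implementation: the transaction data, the `sessions_in_window` helper, and the exact boundary handling (inclusive cutoff, exclusive window start) are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical transactions: (session_id, transaction_time)
transactions = [
    (1, datetime(2024, 1, 1, 9, 0)),
    (1, datetime(2024, 1, 1, 9, 30)),
    (2, datetime(2024, 1, 1, 10, 0)),
    (2, datetime(2024, 1, 1, 12, 15)),
    (3, datetime(2024, 1, 1, 13, 0)),
]

# time_index: first time a session is seen.
# last_time_index: last time a session appears in the data.
time_index = {}
last_time_index = {}
for session_id, ts in transactions:
    time_index[session_id] = min(ts, time_index.get(session_id, ts))
    last_time_index[session_id] = max(ts, last_time_index.get(session_id, ts))


def sessions_in_window(cutoff_time, training_window):
    """Return sessions known by the cutoff that are still active in the window.

    Boundary handling here is an assumption of this sketch.
    """
    window_start = cutoff_time - training_window
    return sorted(
        sid
        for sid in time_index
        if time_index[sid] <= cutoff_time and last_time_index[sid] > window_start
    )


cutoff = datetime(2024, 1, 1, 12, 30)
relevant = sessions_in_window(cutoff, timedelta(hours=1))
print(relevant)  # [2]
```

Session 2 starts at 10:00, before the one-hour window opens at 11:30, so a filter on time_index alone would miss it. Its last transaction at 12:15 falls inside the window, which is why the last_time_index is needed to count it.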

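Why am I getting errors with Featuretools on Google Colab?

By default, Google Colab has Featuretools 0.4.1 installed. With a version that old, you may run into problems when following our newest guides or the latest documentation. We therefore suggest upgrading to the latest Featuretools release by running the following in your Google Colab notebook: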

!pip install -U featuretools

You may need to restart the runtime afterwards via Runtime -> Restart Runtime.

You can check which Featuretools version is installed as follows:

import featuretools as ft

print(ft.__version__)

You should see a version number greater than 0.4.1.
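When checking "greater than 0.4.1" in code, compare version numbers numerically rather than as strings. The sketch below uses only the standard library; `version_tuple` and `is_newer_than` are hypothetical helper names introduced here, and the parsing only handles simple dotted numeric versions.

```python
def version_tuple(version):
    """Turn '1.31.0' into (1, 31, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_newer_than(installed, baseline="0.4.1"):
    """Return True if `installed` is numerically greater than `baseline`."""
    return version_tuple(installed) > version_tuple(baseline)


print(is_newer_than("0.10.0"))  # True
print("0.10.0" > "0.4.1")       # False: plain string comparison is misleading
```

In a notebook, the installed version could be checked with `is_newer_than(ft.__version__)`.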