Feature Selection#
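Featuretools provides functionality that lets users remove features that are unlikely to be useful in building an effective machine learning model. Reducing the number of features in the feature matrix can both produce better model results and reduce the computational cost of prediction. Featuretools lets users perform feature selection on the results of Deep Feature Synthesis with three functions:
- ft.selection.remove_highly_null_features
- ft.selection.remove_single_value_features
- ft.selection.remove_highly_correlated_features
We will describe each of these functions in detail, but first we must create an entity set on which we can run ft.dfs.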
[1]:
import pandas as pd
import featuretools as ft
from featuretools.demo.flight import load_flight
from featuretools.selection import (
    remove_highly_correlated_features,
    remove_highly_null_features,
    remove_single_value_features,
)
es = load_flight(nrows=50)
es
[1]:
Entityset: Flight Data
DataFrames:
trip_logs [Rows: 50, Columns: 21]
flights [Rows: 6, Columns: 9]
airlines [Rows: 1, Columns: 1]
airports [Rows: 4, Columns: 3]
Relationships:
trip_logs.flight_id -> flights.flight_id
flights.carrier -> airlines.carrier
flights.dest -> airports.dest
Remove Highly Null Features#
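We might have a dataset with columns that contain many null values. Deep Feature Synthesis may build features on top of those null columns, creating even more highly null features. In this case, we might want to remove any feature whose share of null values exceeds a certain threshold. Below is a feature matrix that shows such a situation: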
[2]:
fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="trip_logs",
    cutoff_time=pd.DataFrame(
        {
            "trip_log_id": [30, 1, 2, 3, 4],
            "time": pd.to_datetime(["2016-09-22 00:00:00"] * 5),
        }
    ),
    trans_primitives=[],
    agg_primitives=[],
    max_depth=2,
)
fm
[2]:
flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | distance | carrier_delay | weather_delay | national_airspace_delay | security_delay | late_aircraft_delay | canceled | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
trip_log_id | |||||||||||||||||||||||
30 | AA-494:RSW->CLT | NaN | NaN | NaN | NaN | <NA> | NaN | 600.0 | NaN | NaN | NaN | NaN | NaN | <NA> | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
1 | AA-494:CLT->PHX | NaN | NaN | NaN | NaN | <NA> | NaN | 1773.0 | NaN | NaN | NaN | NaN | NaN | <NA> | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
2 | AA-494:CLT->PHX | NaN | NaN | NaN | NaN | <NA> | NaN | 1773.0 | NaN | NaN | NaN | NaN | NaN | <NA> | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
3 | AA-494:CLT->PHX | NaN | NaN | NaN | NaN | <NA> | NaN | 1773.0 | NaN | NaN | NaN | NaN | NaN | <NA> | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
4 | NaN | NaN | NaN | NaN | NaN | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
We look at the feature matrix above and decide to remove the highly null features.
[3]:
ft.selection.remove_highly_null_features(fm)
[3]:
flight_id | distance | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state | |
---|---|---|---|---|---|---|---|---|---|---|---|
trip_log_id | |||||||||||
30 | AA-494:RSW->CLT | 600.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
1 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
2 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
3 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
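Note that calling remove_highly_null_features did not remove every feature that contains a null value. By default, we only remove features whose percentage of null values in the calculated feature matrix is above 95%. If we want to lower that threshold, we can set the pct_null_threshold parameter ourselves.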
[4]:
remove_highly_null_features(fm, pct_null_threshold=0.2)
[4]:
trip_log_id |
---|
30 |
1 |
2 |
3 |
4 |
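To make the threshold behaviour concrete, here is a minimal pandas-only sketch of the same idea. The helper name drop_highly_null and the toy data are hypothetical and this is not the Featuretools implementation. The comparison is written as inclusive (a column is dropped when its null fraction reaches the threshold) because the output above drops columns that are exactly 20% null at pct_null_threshold=0.2. Treat that as an assumption drawn from this example rather than a documented guarantee.

```python
import numpy as np
import pandas as pd


def drop_highly_null(df: pd.DataFrame, pct_null_threshold: float = 0.95) -> pd.DataFrame:
    """Hypothetical helper: keep columns whose null fraction is below the threshold."""
    # Fraction of null entries per column, e.g. 0.2 when 1 of 5 rows is null.
    pct_null = df.isnull().mean()
    # Assumption: a column whose null fraction equals the threshold is also dropped.
    return df.loc[:, pct_null < pct_null_threshold]


# Toy data loosely mirroring the feature matrix above.
toy = pd.DataFrame(
    {
        "distance": [600.0, 1773.0, 1773.0, 1773.0, np.nan],  # 20% null
        "dep_delay": [np.nan] * 5,  # 100% null
    }
)

default_result = drop_highly_null(toy)  # threshold 0.95
strict_result = drop_highly_null(toy, pct_null_threshold=0.2)

print(list(default_result.columns))
print(list(strict_result.columns))
```

With the default threshold only the fully null column is dropped. With the threshold lowered to 0.2, the column with a single missing row is dropped as well, which matches the empty result shown above.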
Remove Single Value Features#
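Another situation we might run into is calculated features that have no variance at all. In those cases, we are likely to want to remove the uninformative features. For that, we use remove_single_value_features. Let's see what happens when we remove the single value features from the feature matrix below.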
[5]:
fm
[5]:
flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | distance | carrier_delay | weather_delay | national_airspace_delay | security_delay | late_aircraft_delay | canceled | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
trip_log_id | |||||||||||||||||||||||
30 | AA-494:RSW->CLT | NaN | NaN | NaN | NaN | <NA> | NaN | 600.0 | NaN | NaN | NaN | NaN | NaN | <NA> | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
1 | AA-494:CLT->PHX | NaN | NaN | NaN | NaN | <NA> | NaN | 1773.0 | NaN | NaN | NaN | NaN | NaN | <NA> | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
2 | AA-494:CLT->PHX | NaN | NaN | NaN | NaN | <NA> | NaN | 1773.0 | NaN | NaN | NaN | NaN | NaN | <NA> | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
3 | AA-494:CLT->PHX | NaN | NaN | NaN | NaN | <NA> | NaN | 1773.0 | NaN | NaN | NaN | NaN | NaN | <NA> | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
4 | NaN | NaN | NaN | NaN | NaN | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Note
A list of feature definitions such as those created by dfs can be provided to the feature selection functions. Doing this will change the outputs to include an updated list of feature definitions.
[6]:
new_fm, new_features = remove_single_value_features(fm, features=features)
new_fm
[6]:
flight_id | distance | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.airports.dest_city | flights.airports.dest_state | |
---|---|---|---|---|---|---|---|---|---|
trip_log_id | |||||||||
30 | AA-494:RSW->CLT | 600.0 | RSW | Fort Myers, FL | FL | CLT | 3 | Charlotte, NC | NC |
1 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | Phoenix, AZ | AZ |
2 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | Phoenix, AZ | AZ |
3 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | Phoenix, AZ | AZ |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Now that we have the feature definitions for the updated feature matrix, we can see that the features that were removed are:
[7]:
set(features) - set(new_features)
[7]:
{<Feature: air_time>,
<Feature: arr_delay>,
<Feature: canceled>,
<Feature: carrier_delay>,
<Feature: dep_delay>,
<Feature: diverted>,
<Feature: flights.carrier>,
<Feature: flights.flight_num>,
<Feature: late_aircraft_delay>,
<Feature: national_airspace_delay>,
<Feature: security_delay>,
<Feature: taxi_in>,
<Feature: taxi_out>,
<Feature: weather_delay>}
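With the function used as shown above, null values are not counted when computing the unique values of a feature. If we would like to treat NaN as a value of its own, we can set count_nan_as_value to True, and we will see flights.carrier and flights.flight_num back in the matrix.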
[8]:
new_fm, new_features = remove_single_value_features(
    fm, features=features, count_nan_as_value=True
)
new_fm
[8]:
flight_id | distance | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state | |
---|---|---|---|---|---|---|---|---|---|---|---|
trip_log_id | |||||||||||
30 | AA-494:RSW->CLT | 600.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
1 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
2 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
3 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
The features that were removed are:
[9]:
set(features) - set(new_features)
[9]:
{<Feature: air_time>,
<Feature: arr_delay>,
<Feature: canceled>,
<Feature: carrier_delay>,
<Feature: dep_delay>,
<Feature: diverted>,
<Feature: late_aircraft_delay>,
<Feature: national_airspace_delay>,
<Feature: security_delay>,
<Feature: taxi_in>,
<Feature: taxi_out>,
<Feature: weather_delay>}
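The effect of count_nan_as_value can be illustrated with a short pandas-only sketch. The helper drop_single_value and the toy data are hypothetical and are not the Featuretools implementation. The sketch assumes the check amounts to counting distinct values per column, with or without NaN, and keeping columns that have more than one.

```python
import numpy as np
import pandas as pd


def drop_single_value(df: pd.DataFrame, count_nan_as_value: bool = False) -> pd.DataFrame:
    """Hypothetical helper: keep columns that have more than one distinct value."""
    # nunique(dropna=True) ignores NaN; dropna=False counts NaN as one more value.
    n_unique = df.nunique(dropna=not count_nan_as_value)
    return df.loc[:, n_unique > 1]


# Toy data loosely mirroring the feature matrix above.
toy = pd.DataFrame(
    {
        "flights.carrier": ["AA", "AA", "AA", "AA", np.nan],  # one value plus NaN
        "distance": [600.0, 1773.0, 1773.0, 1773.0, np.nan],  # two values plus NaN
        "dep_delay": [np.nan] * 5,  # only NaN
    }
)

ignoring_nan = drop_single_value(toy)
counting_nan = drop_single_value(toy, count_nan_as_value=True)

print(list(ignoring_nan.columns))
print(list(counting_nan.columns))
```

A column such as flights.carrier, which holds one real value and one NaN, is dropped when NaN is ignored and kept when NaN counts as a value. A column that is entirely NaN is dropped either way, just as the all-null delay features are in both results above.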
Remove Highly Correlated Features#
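The last feature selection function lets us remove features that are likely to be redundant for the model we are trying to build, by considering the correlation between pairs of calculated features. When two features are determined to be highly correlated, we remove the more complex of the two. For example, say we have two features: col and -(col). We can see that -(col) is just the negation of col, so we can guess that the two features will be highly correlated. -(col) has the Negate primitive applied to it, so it is more complex than the identity feature col. Therefore, if we only want to keep one of col and -(col), we should keep the identity feature. For features with no obvious difference in complexity, we discard the feature that appears later in the feature matrix. Let's try this out on our data: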
[10]:
fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="trip_logs",
    trans_primitives=["negate"],
    agg_primitives=[],
    max_depth=3,
)
fm.head()
[10]:
flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | distance | carrier_delay | weather_delay | national_airspace_delay | security_delay | late_aircraft_delay | canceled | -(air_time) | -(arr_delay) | -(carrier_delay) | -(dep_delay) | -(distance) | -(late_aircraft_delay) | -(national_airspace_delay) | -(security_delay) | -(taxi_in) | -(taxi_out) | -(weather_delay) | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
trip_log_id | ||||||||||||||||||||||||||||||||||
30 | AA-494:RSW->CLT | -11.0 | 12.0 | 10.0 | -12.0 | False | 88.0 | 600.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -88.0 | 12.0 | -0.0 | 11.0 | -600.0 | -0.0 | -0.0 | -0.0 | -10.0 | -12.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
38 | AA-495:ATL->PHX | -6.0 | 28.0 | 5.0 | 1.0 | False | 224.0 | 1587.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -224.0 | -1.0 | -0.0 | 6.0 | -1587.0 | -0.0 | -0.0 | -0.0 | -5.0 | -28.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
46 | AA-495:CLT->ATL | -2.0 | 18.0 | 8.0 | -3.0 | False | 50.0 | 226.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -50.0 | 3.0 | -0.0 | 2.0 | -226.0 | -0.0 | -0.0 | -0.0 | -8.0 | -18.0 | -0.0 | CLT | Charlotte, NC | NC | ATL | 1 | AA | 495 | Atlanta, GA | GA |
31 | AA-494:RSW->CLT | 0.0 | 11.0 | 10.0 | -3.0 | False | 87.0 | 600.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -87.0 | 3.0 | -0.0 | -0.0 | -600.0 | -0.0 | -0.0 | -0.0 | -10.0 | -11.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
39 | AA-495:ATL->PHX | -4.0 | 26.0 | 3.0 | 10.0 | False | 235.0 | 1587.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -235.0 | -10.0 | -0.0 | 4.0 | -1587.0 | -0.0 | -0.0 | -0.0 | -3.0 | -26.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
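Note that there are some very clear correlations between all of the features and their negations. Now, using remove_highly_correlated_features with its default correlation threshold of 95%, we will remove all of the features that are clearly correlated, keeping only the less complex ones.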
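As a rough illustration of the rule, the pandas-only sketch below drops the later column of every highly correlated pair. It is a simplification under stated assumptions, not the Featuretools implementation. The helper name drop_later_of_correlated_pairs is hypothetical. Column order stands in for feature complexity, so the simpler feature is assumed to come first. Correlation is compared by absolute value, so that a column and its negation count as correlated.

```python
import pandas as pd


def drop_later_of_correlated_pairs(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Hypothetical helper: for each highly correlated pair, drop the later column."""
    # Only numeric columns take part in the correlation check.
    corr = df.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    dropped = set()
    for i, earlier in enumerate(cols):
        if earlier in dropped:
            continue
        for later in cols[i + 1:]:
            # NaN correlations (e.g. constant columns) compare as False and are skipped.
            if corr.loc[earlier, later] >= threshold:
                dropped.add(later)
    # Preserve the original column order when dropping.
    return df.drop(columns=[c for c in df.columns if c in dropped])


# Toy data using a few values from the feature matrix above.
toy = pd.DataFrame(
    {
        "distance": [600.0, 1587.0, 226.0, 600.0],
        "taxi_out": [12.0, 28.0, 18.0, 11.0],
        "-(distance)": [-600.0, -1587.0, -226.0, -600.0],
        "-(taxi_out)": [-12.0, -28.0, -18.0, -11.0],
    }
)

result = drop_later_of_correlated_pairs(toy)
print(list(result.columns))
```

In this toy data each negated column is perfectly anti-correlated with its source column, so both negations are dropped, while distance and taxi_out are only moderately correlated and both remain.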
[11]:
new_fm, new_features = remove_highly_correlated_features(fm, features=features)
new_fm.head()
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
[11]:
trip_log_id | flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | carrier_delay | weather_delay | national_airspace_delay | security_delay | late_aircraft_delay | canceled | -(security_delay) | -(weather_delay) | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | AA-494:RSW->CLT | -11.0 | 12.0 | 10.0 | -12.0 | False | 88.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
38 | AA-495:ATL->PHX | -6.0 | 28.0 | 5.0 | 1.0 | False | 224.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
46 | AA-495:CLT->ATL | -2.0 | 18.0 | 8.0 | -3.0 | False | 50.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | CLT | Charlotte, NC | NC | ATL | 1 | AA | 495 | Atlanta, GA | GA |
31 | AA-494:RSW->CLT | 0.0 | 11.0 | 10.0 | -3.0 | False | 87.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
39 | AA-495:ATL->PHX | -4.0 | 26.0 | 3.0 | 10.0 | False | 235.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
The removed features are:
[12]:
set(features) - set(new_features)
[12]:
{<Feature: -(carrier_delay)>,
<Feature: -(arr_delay)>,
<Feature: distance>,
<Feature: -(taxi_in)>,
<Feature: -(distance)>,
<Feature: -(national_airspace_delay)>,
<Feature: -(late_aircraft_delay)>,
<Feature: -(dep_delay)>,
<Feature: -(air_time)>,
<Feature: -(taxi_out)>}
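It may help to see why most negated features are dropped while -(security_delay) and -(weather_delay) remain. The sketch below is plain Python, not Featuretools code. It uses the air_time and security_delay values from the five rows displayed above; the `pearson` helper and the comparison against 0.95 are illustrative assumptions, not the library's implementation. The idea is that a negated copy of a varying column has correlation -1, whereas a column that is constant (as security_delay appears to be in the rows shown) has zero variance, so its correlation is undefined (NaN) and never satisfies the threshold comparison.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


# Values copied from the five rows displayed above (not the full sample).
air_time = [88.0, 224.0, 50.0, 87.0, 235.0]
neg_air_time = [-x for x in air_time]
security_delay = [0.0, 0.0, 0.0, 0.0, 0.0]
neg_security_delay = [-x for x in security_delay]

threshold = 0.95  # assumed threshold for this illustration

r_air = pearson(air_time, neg_air_time)
r_sec = pearson(security_delay, neg_security_delay)

# A perfectly anti-correlated pair meets the threshold on absolute value.
flag_air = abs(r_air) >= threshold
# Any comparison with NaN is False, so the constant pair is never flagged.
flag_sec = abs(r_sec) >= threshold

print(r_air, flag_air)
print(r_sec, flag_sec)
```

Under these assumptions, a constant column can only be removed by a different selection step, such as remove_single_value_features.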
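Changing the correlation threshold#
We can lower the threshold at which correlated features are removed, making the selection more restrictive, by setting the pct_corr_threshold parameter (0.95 by default).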
[13]:
new_fm, new_features = remove_highly_correlated_features(
    fm, features=features, pct_corr_threshold=0.9
)
new_fm.head()
[13]:
trip_log_id | flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | carrier_delay | weather_delay | security_delay | late_aircraft_delay | canceled | -(security_delay) | -(weather_delay) | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | AA-494:RSW->CLT | -11.0 | 12.0 | 10.0 | -12.0 | False | 88.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
38 | AA-495:ATL->PHX | -6.0 | 28.0 | 5.0 | 1.0 | False | 224.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
46 | AA-495:CLT->ATL | -2.0 | 18.0 | 8.0 | -3.0 | False | 50.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | CLT | Charlotte, NC | NC | ATL | 1 | AA | 495 | Atlanta, GA | GA |
31 | AA-494:RSW->CLT | 0.0 | 11.0 | 10.0 | -3.0 | False | 87.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
39 | AA-495:ATL->PHX | -4.0 | 26.0 | 3.0 | 10.0 | False | 235.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
The removed features are:
[14]:
set(features) - set(new_features)
[14]:
{<Feature: -(carrier_delay)>,
<Feature: -(arr_delay)>,
<Feature: -(taxi_in)>,
<Feature: distance>,
<Feature: -(distance)>,
<Feature: -(national_airspace_delay)>,
<Feature: -(late_aircraft_delay)>,
<Feature: -(dep_delay)>,
<Feature: -(air_time)>,
<Feature: national_airspace_delay>,
<Feature: -(taxi_out)>}
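If we only want to check a subset of the features, we can set features_to_check to the list of features whose correlation we want to check; no features outside that list will be removed.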
[15]:
new_fm, new_features = remove_highly_correlated_features(
    fm,
    features=features,
    features_to_check=["air_time", "distance", "flights.distance_group"],
)
new_fm.head()
[15]:
trip_log_id | flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | carrier_delay | weather_delay | national_airspace_delay | security_delay | late_aircraft_delay | canceled | -(air_time) | -(arr_delay) | -(carrier_delay) | -(dep_delay) | -(distance) | -(late_aircraft_delay) | -(national_airspace_delay) | -(security_delay) | -(taxi_in) | -(taxi_out) | -(weather_delay) | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | AA-494:RSW->CLT | -11.0 | 12.0 | 10.0 | -12.0 | False | 88.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -88.0 | 12.0 | -0.0 | 11.0 | -600.0 | -0.0 | -0.0 | -0.0 | -10.0 | -12.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
38 | AA-495:ATL->PHX | -6.0 | 28.0 | 5.0 | 1.0 | False | 224.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -224.0 | -1.0 | -0.0 | 6.0 | -1587.0 | -0.0 | -0.0 | -0.0 | -5.0 | -28.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
46 | AA-495:CLT->ATL | -2.0 | 18.0 | 8.0 | -3.0 | False | 50.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -50.0 | 3.0 | -0.0 | 2.0 | -226.0 | -0.0 | -0.0 | -0.0 | -8.0 | -18.0 | -0.0 | CLT | Charlotte, NC | NC | ATL | 1 | AA | 495 | Atlanta, GA | GA |
31 | AA-494:RSW->CLT | 0.0 | 11.0 | 10.0 | -3.0 | False | 87.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -87.0 | 3.0 | -0.0 | -0.0 | -600.0 | -0.0 | -0.0 | -0.0 | -10.0 | -11.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
39 | AA-495:ATL->PHX | -4.0 | 26.0 | 3.0 | 10.0 | False | 235.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -235.0 | -10.0 | -0.0 | 4.0 | -1587.0 | -0.0 | -0.0 | -0.0 | -3.0 | -26.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
The removed features are:
[16]:
set(features) - set(new_features)
[16]:
{<Feature: distance>}
To protect specific features from being removed from the feature matrix, we can pass a features_to_keep list; the features in it will not be removed.
[17]:
new_fm, new_features = remove_highly_correlated_features(
    fm,
    features=features,
    features_to_keep=["air_time", "distance", "flights.distance_group"],
)
new_fm.head()
[17]:
trip_log_id | flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | distance | carrier_delay | weather_delay | national_airspace_delay | security_delay | late_aircraft_delay | canceled | -(security_delay) | -(weather_delay) | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | AA-494:RSW->CLT | -11.0 | 12.0 | 10.0 | -12.0 | False | 88.0 | 600.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
38 | AA-495:ATL->PHX | -6.0 | 28.0 | 5.0 | 1.0 | False | 224.0 | 1587.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
46 | AA-495:CLT->ATL | -2.0 | 18.0 | 8.0 | -3.0 | False | 50.0 | 226.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | CLT | Charlotte, NC | NC | ATL | 1 | AA | 495 | Atlanta, GA | GA |
31 | AA-494:RSW->CLT | 0.0 | 11.0 | 10.0 | -3.0 | False | 87.0 | 600.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
39 | AA-495:ATL->PHX | -4.0 | 26.0 | 3.0 | 10.0 | False | 235.0 | 1587.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
The removed features are:
[18]:
set(features) - set(new_features)
[18]:
{<Feature: -(carrier_delay)>,
<Feature: -(arr_delay)>,
<Feature: -(taxi_in)>,
<Feature: -(distance)>,
<Feature: -(national_airspace_delay)>,
<Feature: -(late_aircraft_delay)>,
<Feature: -(dep_delay)>,
<Feature: -(air_time)>,
<Feature: -(taxi_out)>}