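Featuretools lets users remove features that are unlikely to be useful in building an effective machine learning model. Reducing the number of features in the feature matrix can both produce better model results and reduce the computational cost of prediction. Feature selection can be performed on the output of Deep Feature Synthesis with three functions:
- ft.selection.remove_highly_null_features
- ft.selection.remove_single_value_features
- ft.selection.remove_highly_correlated_features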
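import pandas as pd
import featuretools as ft
from featuretools.demo.flight import load_flight
from featuretools.selection import (
    remove_highly_correlated_features,
    remove_highly_null_features,
    remove_single_value_features,
)
es = load_flight(nrows=50)  # 50-row sample of the flight demo data
es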
Entityset: Flight Data
  DataFrames:
    trip_logs [Rows: 50, Columns: 21]
    flights [Rows: 6, Columns: 9]
    airlines [Rows: 1, Columns: 1]
    airports [Rows: 4, Columns: 3]
  Relationships:
    trip_logs.flight_id -> flights.flight_id
    flights.carrier -> airlines.carrier
    flights.dest -> airports.dest
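# Only the cutoff-time values survive in the source; the remaining arguments
# are inferred from the feature matrix shown below (no transform or
# aggregation features, target index trip_log_id).
fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="trip_logs",
    cutoff_time=pd.DataFrame(
        {
            "trip_log_id": [30, 1, 2, 3, 4],
            "time": pd.to_datetime(["2016-09-22 00:00:00"] * 5),
        }
    ),
    trans_primitives=[],
    agg_primitives=[],
)
fm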
trip_log_id | flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | distance | carrier_delay | weather_delay | national_airspace_delay | security_delay | late_aircraft_delay | canceled | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
30 | AA-494:RSW->CLT | NaN | NaN | NaN | NaN | <NA> | NaN | 600.0 | NaN | NaN | NaN | NaN | NaN | <NA> | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
1 | AA-494:CLT->PHX | NaN | NaN | NaN | NaN | <NA> | NaN | 1773.0 | NaN | NaN | NaN | NaN | NaN | <NA> | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
2 | AA-494:CLT->PHX | NaN | NaN | NaN | NaN | <NA> | NaN | 1773.0 | NaN | NaN | NaN | NaN | NaN | <NA> | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
3 | AA-494:CLT->PHX | NaN | NaN | NaN | NaN | <NA> | NaN | 1773.0 | NaN | NaN | NaN | NaN | NaN | <NA> | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
4 | NaN | NaN | NaN | NaN | NaN | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
remove_highly_null_features(fm)
trip_log_id | flight_id | distance | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
30 | AA-494:RSW->CLT | 600.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
1 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
2 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
3 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
remove_highly_null_features(fm, pct_null_threshold=0.2)
trip_log_id |
30 |
1 |
2 |
3 |
4 |
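The effect of pct_null_threshold can be illustrated with plain pandas. The sketch below is not Featuretools' implementation: drop_highly_null is a hypothetical helper, its 0.95 default is an assumption, and it assumes a column is dropped once its null fraction reaches the threshold, which is consistent with the outputs shown here. The toy frame reuses two columns from the feature matrix above.

```python
import numpy as np
import pandas as pd


def drop_highly_null(df, pct_null_threshold=0.95):
    """Keep only columns whose fraction of null values is below the threshold.

    Hypothetical helper for illustration; not part of Featuretools.
    """
    null_fraction = df.isnull().mean()  # per-column share of missing values
    return df.loc[:, null_fraction < pct_null_threshold]


toy = pd.DataFrame(
    {
        "distance": [600.0, 1773.0, 1773.0, 1773.0, np.nan],  # 20% null
        "dep_delay": [np.nan] * 5,  # 100% null
    },
    index=pd.Index([30, 1, 2, 3, 4], name="trip_log_id"),
)

default_result = drop_highly_null(toy)  # only the all-null column goes
strict_result = drop_highly_null(toy, pct_null_threshold=0.2)  # both go

print(list(default_result.columns))
print(list(strict_result.columns))
```

Because every row-4 value is missing, each column is at least 20% null, so a threshold of 0.2 leaves only the index, as in the output above.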
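Another situation we may run into is one where the calculated features have no variance. In that case, we will likely want to remove these uninformative features. For that, we use remove_single_value_features.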
A list of feature definitions such as those created by dfs can be provided to the feature selection functions. Doing this will change the outputs to include an updated list of feature definitions.
new_fm, new_features = remove_single_value_features(fm, features=features)
trip_log_id | flight_id | distance | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.airports.dest_city | flights.airports.dest_state |
30 | AA-494:RSW->CLT | 600.0 | RSW | Fort Myers, FL | FL | CLT | 3 | Charlotte, NC | NC |
1 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | Phoenix, AZ | AZ |
2 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | Phoenix, AZ | AZ |
3 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | Phoenix, AZ | AZ |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
set(features) - set(new_features)
{<Feature: air_time>,
<Feature: arr_delay>,
<Feature: canceled>,
<Feature: carrier_delay>,
<Feature: dep_delay>,
<Feature: diverted>,
<Feature: flights.carrier>,
<Feature: flights.flight_num>,
<Feature: late_aircraft_delay>,
<Feature: national_airspace_delay>,
<Feature: security_delay>,
<Feature: taxi_in>,
<Feature: taxi_out>,
<Feature: weather_delay>}
new_fm, new_features = remove_single_value_features(
    fm, features=features, count_nan_as_value=True
)
trip_log_id | flight_id | distance | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
30 | AA-494:RSW->CLT | 600.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
1 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
2 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
3 | AA-494:CLT->PHX | 1773.0 | CLT | Charlotte, NC | NC | PHX | 8 | AA | 494 | Phoenix, AZ | AZ |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
set(features) - set(new_features)
{<Feature: air_time>,
<Feature: arr_delay>,
<Feature: canceled>,
<Feature: carrier_delay>,
<Feature: dep_delay>,
<Feature: diverted>,
<Feature: late_aircraft_delay>,
<Feature: national_airspace_delay>,
<Feature: security_delay>,
<Feature: taxi_in>,
<Feature: taxi_out>,
<Feature: weather_delay>}
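The difference between the two calls above comes down to whether a missing value counts as a distinct value. A minimal pandas sketch of that idea follows; drop_single_value is a hypothetical helper written for illustration, not the Featuretools function, and the toy columns are taken from the feature matrix shown earlier.

```python
import numpy as np
import pandas as pd


def drop_single_value(df, count_nan_as_value=False):
    """Keep only columns that hold more than one distinct value.

    Hypothetical helper for illustration; not part of Featuretools.
    """
    # dropna=False makes NaN count as one more distinct value.
    unique_counts = df.nunique(dropna=not count_nan_as_value)
    return df.loc[:, unique_counts > 1]


toy = pd.DataFrame(
    {
        "distance": [600.0, 1773.0, 1773.0, 1773.0, np.nan],  # two values + NaN
        "flights.carrier": ["AA", "AA", "AA", "AA", np.nan],  # one value + NaN
        "dep_delay": [np.nan] * 5,  # NaN only
    },
    index=pd.Index([30, 1, 2, 3, 4], name="trip_log_id"),
)

ignoring_nan = drop_single_value(toy)
counting_nan = drop_single_value(toy, count_nan_as_value=True)

print(list(ignoring_nan.columns))
print(list(counting_nan.columns))
```

In this sketch flights.carrier survives only when NaN is counted as a value, which mirrors why flights.carrier and flights.flight_num drop out of the removed set in the second call, while the all-null columns are removed either way.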
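Suppose we have two features, col and -(col). We can see that -(col) is simply col with the Negate primitive applied, so it is more complex than the identity feature col. Therefore, if we only want to keep one of col and -(col), we should keep the identity feature col.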
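# The arguments are missing from the source; these are inferred from the
# -(col) features in the output below.
fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="trip_logs",
    trans_primitives=["negate"],
    agg_primitives=[],
)
fm.head()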
trip_log_id | flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | distance | carrier_delay | weather_delay | national_airspace_delay | security_delay | late_aircraft_delay | canceled | -(air_time) | -(arr_delay) | -(carrier_delay) | -(dep_delay) | -(distance) | -(late_aircraft_delay) | -(national_airspace_delay) | -(security_delay) | -(taxi_in) | -(taxi_out) | -(weather_delay) | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
30 | AA-494:RSW->CLT | -11.0 | 12.0 | 10.0 | -12.0 | False | 88.0 | 600.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -88.0 | 12.0 | -0.0 | 11.0 | -600.0 | -0.0 | -0.0 | -0.0 | -10.0 | -12.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
38 | AA-495:ATL->PHX | -6.0 | 28.0 | 5.0 | 1.0 | False | 224.0 | 1587.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -224.0 | -1.0 | -0.0 | 6.0 | -1587.0 | -0.0 | -0.0 | -0.0 | -5.0 | -28.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
46 | AA-495:CLT->ATL | -2.0 | 18.0 | 8.0 | -3.0 | False | 50.0 | 226.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -50.0 | 3.0 | -0.0 | 2.0 | -226.0 | -0.0 | -0.0 | -0.0 | -8.0 | -18.0 | -0.0 | CLT | Charlotte, NC | NC | ATL | 1 | AA | 495 | Atlanta, GA | GA |
31 | AA-494:RSW->CLT | 0.0 | 11.0 | 10.0 | -3.0 | False | 87.0 | 600.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -87.0 | 3.0 | -0.0 | -0.0 | -600.0 | -0.0 | -0.0 | -0.0 | -10.0 | -11.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
39 | AA-495:ATL->PHX | -4.0 | 26.0 | 3.0 | 10.0 | False | 235.0 | 1587.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -235.0 | -10.0 | -0.0 | 4.0 | -1587.0 | -0.0 | -0.0 | -0.0 | -3.0 | -26.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
new_fm, new_features = remove_highly_correlated_features(fm, features=features)
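To make the col versus -(col) reasoning concrete, here is a rough pandas sketch of correlation-based pruning. It is not Featuretools' implementation: drop_highly_correlated and its pct_corr_threshold default of 0.95 are assumptions made for illustration, and column order stands in for feature complexity (simpler columns are listed first, so the later column of a correlated pair is the one dropped). The values come from the air_time and taxi_out columns in the table above.

```python
import pandas as pd


def drop_highly_correlated(df, pct_corr_threshold=0.95):
    """Drop the later column of each pair whose |correlation| reaches the threshold.

    Hypothetical helper for illustration; not part of Featuretools.
    """
    corr = df.corr().abs()  # absolute Pearson correlation between columns
    columns = list(df.columns)
    dropped = set()
    for i, first in enumerate(columns):
        if first in dropped:
            continue
        for second in columns[i + 1:]:
            if second not in dropped and corr.loc[first, second] >= pct_corr_threshold:
                dropped.add(second)  # keep the earlier (assumed simpler) column
    return df[[c for c in columns if c not in dropped]]


toy = pd.DataFrame(
    {
        "air_time": [88.0, 224.0, 50.0, 87.0, 235.0],
        "-(air_time)": [-88.0, -224.0, -50.0, -87.0, -235.0],
        "taxi_out": [12.0, 28.0, 18.0, 11.0, 26.0],
    }
)

result = drop_highly_correlated(toy)
print(list(result.columns))
```

Here -(air_time) is perfectly anti-correlated with air_time and is removed, while taxi_out, whose correlation with air_time is about 0.85, stays under the assumed threshold and is kept.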
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
c /= stddev[:, None]
/Users/code/fin_tool/github/featuretools/venv/lib/python3.11/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
c /= stddev[None, :]
trip_log_id | flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | carrier_delay | weather_delay | national_airspace_delay | security_delay | late_aircraft_delay | canceled | -(security_delay) | -(weather_delay) | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
30 | AA-494:RSW->CLT | -11.0 | 12.0 | 10.0 | -12.0 | False | 88.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
38 | AA-495:ATL->PHX | -6.0 | 28.0 | 5.0 | 1.0 | False | 224.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
46 | AA-495:CLT->ATL | -2.0 | 18.0 | 8.0 | -3.0 | False | 50.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | CLT | Charlotte, NC | NC | ATL | 1 | AA | 495 | Atlanta, GA | GA |
31 | AA-494:RSW->CLT | 0.0 | 11.0 | 10.0 | -3.0 | False | 87.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
39 | AA-495:ATL->PHX | -4.0 | 26.0 | 3.0 | 10.0 | False | 235.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
set(features) - set(new_features)
{<Feature: -(carrier_delay)>,
<Feature: -(arr_delay)>,
<Feature: distance>,
<Feature: -(taxi_in)>,
<Feature: -(distance)>,
<Feature: -(national_airspace_delay)>,
<Feature: -(late_aircraft_delay)>,
<Feature: -(dep_delay)>,
<Feature: -(air_time)>,
<Feature: -(taxi_out)>}
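Most of the features removed above are negations such as -(dep_delay). A negated column is an exact linear function of its source, so the absolute Pearson correlation between the two is 1, which exceeds any threshold below 1. The short check below illustrates this with pandas, using the dep_delay values from the five rows displayed above. It is a sketch of the idea only and does not reproduce Featuretools' internal code.

```python
import pandas as pd

# dep_delay values taken from the five rows shown in the output above
dep_delay = pd.Series([-11.0, -6.0, -2.0, 0.0, -4.0], name="dep_delay")
negated = -dep_delay  # what a NEGATE-style feature would contain

# Pearson correlation between a column and its negation
r = dep_delay.corr(negated)
print(round(r, 6))
```

Because the correlation is perfect in magnitude, only one column of each such pair carries independent information.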
new_fm, new_features = remove_highly_correlated_features(
    fm, features=features, pct_corr_threshold=0.9
)
new_fm.head()
trip_log_id | flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | carrier_delay | weather_delay | security_delay | late_aircraft_delay | canceled | -(security_delay) | -(weather_delay) | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
30 | AA-494:RSW->CLT | -11.0 | 12.0 | 10.0 | -12.0 | False | 88.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
38 | AA-495:ATL->PHX | -6.0 | 28.0 | 5.0 | 1.0 | False | 224.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
46 | AA-495:CLT->ATL | -2.0 | 18.0 | 8.0 | -3.0 | False | 50.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | CLT | Charlotte, NC | NC | ATL | 1 | AA | 495 | Atlanta, GA | GA |
31 | AA-494:RSW->CLT | 0.0 | 11.0 | 10.0 | -3.0 | False | 87.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
39 | AA-495:ATL->PHX | -4.0 | 26.0 | 3.0 | 10.0 | False | 235.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
set(features) - set(new_features)
{<Feature: -(carrier_delay)>,
<Feature: -(arr_delay)>,
<Feature: -(taxi_in)>,
<Feature: distance>,
<Feature: -(distance)>,
<Feature: -(national_airspace_delay)>,
<Feature: -(late_aircraft_delay)>,
<Feature: -(dep_delay)>,
<Feature: -(air_time)>,
<Feature: national_airspace_delay>,
<Feature: -(taxi_out)>}
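With pct_corr_threshold=0.9 the removed set above gains one feature (national_airspace_delay) compared with the earlier run. The toy example below sketches why a lower threshold removes more columns. The helper `drop_correlated` and the data in `toy` are hypothetical, made up for illustration, and the greedy rule used here is an assumption rather than the algorithm Featuretools actually applies.

```python
import pandas as pd


def drop_correlated(df, threshold):
    """Hypothetical helper: keep a column only if its absolute Pearson
    correlation with every already-kept column is at most `threshold`."""
    corr = df.corr().abs()
    kept = []
    for col in df.columns:
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    return df[kept]


# Illustrative data, not taken from the flight dataset
toy = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [1, 3, 2, 5, 6],  # correlation with "a" is roughly 0.915
        "c": [5, 1, 4, 1, 5],  # essentially uncorrelated with "a"
    }
)

at_095 = list(drop_correlated(toy, threshold=0.95).columns)
at_090 = list(drop_correlated(toy, threshold=0.9).columns)
print(at_095, at_090)
```

At 0.95 all three columns survive, while at 0.9 the pair ("a", "b") crosses the threshold and "b" is dropped.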
new_fm, new_features = remove_highly_correlated_features(
    fm,
    features=features,
    features_to_check=["air_time", "distance", "flights.distance_group"],
)
new_fm.head()
trip_log_id | flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | carrier_delay | weather_delay | national_airspace_delay | security_delay | late_aircraft_delay | canceled | -(air_time) | -(arr_delay) | -(carrier_delay) | -(dep_delay) | -(distance) | -(late_aircraft_delay) | -(national_airspace_delay) | -(security_delay) | -(taxi_in) | -(taxi_out) | -(weather_delay) | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
30 | AA-494:RSW->CLT | -11.0 | 12.0 | 10.0 | -12.0 | False | 88.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -88.0 | 12.0 | -0.0 | 11.0 | -600.0 | -0.0 | -0.0 | -0.0 | -10.0 | -12.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
38 | AA-495:ATL->PHX | -6.0 | 28.0 | 5.0 | 1.0 | False | 224.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -224.0 | -1.0 | -0.0 | 6.0 | -1587.0 | -0.0 | -0.0 | -0.0 | -5.0 | -28.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
46 | AA-495:CLT->ATL | -2.0 | 18.0 | 8.0 | -3.0 | False | 50.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -50.0 | 3.0 | -0.0 | 2.0 | -226.0 | -0.0 | -0.0 | -0.0 | -8.0 | -18.0 | -0.0 | CLT | Charlotte, NC | NC | ATL | 1 | AA | 495 | Atlanta, GA | GA |
31 | AA-494:RSW->CLT | 0.0 | 11.0 | 10.0 | -3.0 | False | 87.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -87.0 | 3.0 | -0.0 | -0.0 | -600.0 | -0.0 | -0.0 | -0.0 | -10.0 | -11.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
39 | AA-495:ATL->PHX | -4.0 | 26.0 | 3.0 | 10.0 | False | 235.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -235.0 | -10.0 | -0.0 | 4.0 | -1587.0 | -0.0 | -0.0 | -0.0 | -3.0 | -26.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
set(features) - set(new_features)
{<Feature: distance>}
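In this run only the three columns listed in features_to_check were eligible for removal, which is consistent with distance being the only feature dropped while the negated features remain. As a rough plausibility check, the snippet below computes the pairwise correlations of those three columns from the five displayed rows (distance is recovered by negating the -(distance) column shown above). The full feature matrix has 50 rows, so the exact values Featuretools sees may differ from this five-row estimate.

```python
import pandas as pd

# Values copied from the five rows displayed above
rows = pd.DataFrame(
    {
        "air_time": [88.0, 224.0, 50.0, 87.0, 235.0],
        "distance": [600.0, 1587.0, 226.0, 600.0, 1587.0],
        "flights.distance_group": [3, 7, 1, 3, 7],
    }
)

# Absolute pairwise Pearson correlations among the checked columns
corr = rows.corr().abs()
print(corr.round(3))
```

On these rows every pair is correlated above 0.95, so at least one of the three columns is redundant with the others.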
new_fm, new_features = remove_highly_correlated_features(
    fm,
    features=features,
    features_to_keep=["air_time", "distance", "flights.distance_group"],
)
new_fm.head()
trip_log_id | flight_id | dep_delay | taxi_out | taxi_in | arr_delay | diverted | air_time | distance | carrier_delay | weather_delay | national_airspace_delay | security_delay | late_aircraft_delay | canceled | -(security_delay) | -(weather_delay) | flights.origin | flights.origin_city | flights.origin_state | flights.dest | flights.distance_group | flights.carrier | flights.flight_num | flights.airports.dest_city | flights.airports.dest_state |
30 | AA-494:RSW->CLT | -11.0 | 12.0 | 10.0 | -12.0 | False | 88.0 | 600.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
38 | AA-495:ATL->PHX | -6.0 | 28.0 | 5.0 | 1.0 | False | 224.0 | 1587.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
46 | AA-495:CLT->ATL | -2.0 | 18.0 | 8.0 | -3.0 | False | 50.0 | 226.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | CLT | Charlotte, NC | NC | ATL | 1 | AA | 495 | Atlanta, GA | GA |
31 | AA-494:RSW->CLT | 0.0 | 11.0 | 10.0 | -3.0 | False | 87.0 | 600.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | RSW | Fort Myers, FL | FL | CLT | 3 | AA | 494 | Charlotte, NC | NC |
39 | AA-495:ATL->PHX | -4.0 | 26.0 | 3.0 | 10.0 | False | 235.0 | 1587.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | -0.0 | -0.0 | ATL | Atlanta, GA | GA | PHX | 7 | AA | 495 | Phoenix, AZ | AZ |
set(features) - set(new_features)
{<Feature: -(carrier_delay)>,
<Feature: -(arr_delay)>,
<Feature: -(taxi_in)>,
<Feature: -(distance)>,
<Feature: -(national_airspace_delay)>,
<Feature: -(late_aircraft_delay)>,
<Feature: -(dep_delay)>,
<Feature: -(air_time)>,
<Feature: -(taxi_out)>}
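With features_to_keep, air_time, distance, and flights.distance_group are all retained even though they correlate with one another, while features correlated with them, such as -(air_time) and -(distance), are still removed. The sketch below mimics that behavior on a tiny frame. The helper `drop_correlated` and its `keep` argument are hypothetical names, and the greedy rule is an assumption for illustration, not Featuretools' implementation.

```python
import pandas as pd


def drop_correlated(df, threshold=0.95, keep=()):
    """Hypothetical helper: columns in `keep` are never dropped; any other
    column is dropped if it is highly correlated with a retained column."""
    corr = df.corr().abs()
    kept = list(keep)  # protected columns are retained unconditionally
    for col in df.columns:
        if col in kept:
            continue
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    # Preserve the original column order in the result
    return df[[c for c in df.columns if c in kept]]


# air_time and distance values come from the five rows displayed above
df = pd.DataFrame(
    {
        "air_time": [88.0, 224.0, 50.0, 87.0, 235.0],
        "distance": [600.0, 1587.0, 226.0, 600.0, 1587.0],
    }
)
df["neg_air_time"] = -df["air_time"]

protected = list(drop_correlated(df, keep=["air_time", "distance"]).columns)
unprotected = list(drop_correlated(df).columns)
print(protected, unprotected)
```

Without protection, distance is dropped for being correlated with air_time; with protection both stay, and only the negated column is removed.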