featuretools.primitives.RollingOutlierCount#

class featuretools.primitives.RollingOutlierCount(window_length=3, gap=1, min_periods=0)[source]#

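Determines how many values are outliers over a given window.

Description: Given a list of numbers and a corresponding list of datetimes, return a rolling count of outliers within the numeric values. For each row, the calculation starts at the current row, skips back gap rows, and then looks backward over the specified window (defined by window_length and gap). Values are classified as outliers using the IQR method, computed over the whole series. Input datetimes should be monotonically increasing.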
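The description above can be approximated in plain pandas. The following is a minimal sketch, not the library's implementation: the helper name `rolling_outlier_count_sketch` is hypothetical, the 1.5 × IQR fences are the conventional choice and are assumed here, and only integer `window_length` and `gap` values are handled.

```python
import pandas as pd


def rolling_outlier_count_sketch(values, window_length=3, gap=1, min_periods=1):
    """Illustrative only: count IQR outliers in a trailing, gapped row window."""
    series = pd.Series(values, dtype="float64")

    # Flag outliers with 1.5 * IQR fences computed on the whole series
    # (assumed convention; the primitive's exact rule may differ).
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    is_outlier = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)

    # Shift by `gap` rows so each window ends `gap` rows before the current row,
    # then sum the flags over the trailing `window_length` rows.
    shifted = is_outlier.astype("float64").shift(gap)
    return shifted.rolling(window_length, min_periods=min_periods).sum()


counts = rolling_outlier_count_sketch([0, 0, 0, 0, 10, 0], window_length=4)
print(counts.tolist())  # [nan, 0.0, 0.0, 0.0, 0.0, 1.0]
```

With `gap=1` the first row has no earlier data, so its count is NaN, and the outlier at row 4 is only counted once row 5 looks back over rows 1 to 4.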

Parameters:

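window_length (int, string, optional) – Specifies the amount of data included in each window. If an integer is provided, it corresponds to a number of rows. For data with a uniform sampling frequency, for example once per day, window_length corresponds to a period of time; a window_length of 7 then corresponds to 7 days. If a string is provided, it must be a pandas offset alias string ('1D', '1H', etc.), and it indicates the length of time each window should span. The list of available offset aliases can be found at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases. Defaults to 3.
gap (int, string, optional) – Specifies a gap backward from each instance before the window of usable data begins. If an integer is provided, it corresponds to a number of rows. If a string is provided, it must be a pandas offset alias string ('1D', '1H', etc.), and it indicates the length of time between a target instance and the beginning of its window. Defaults to 1, which excludes the target instance from the window.
min_periods (int, optional) – Minimum number of observations required to perform the calculation. When window_length is an integer, min_periods can be at most as large as window_length. When window_length is an offset alias string, this limitation does not exist, but take care not to choose a min_periods that will always exceed the number of observations in a window. Defaults to 1.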
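To make the interaction between integer `window_length` and `gap` concrete, the sketch below lists which row positions fall inside the window for each target row. The helper `window_rows` is hypothetical and written only for illustration; it assumes row-based (integer) parameters and is not part of featuretools.

```python
def window_rows(target_row, window_length=3, gap=1):
    """Return the row positions covered by the window for `target_row`."""
    last = target_row - gap            # most recent row included in the window
    first = last - window_length + 1   # oldest row included in the window
    # Rows before the start of the data do not exist, so clip at 0.
    return list(range(max(first, 0), last + 1))


for row in range(6):
    print(row, window_rows(row, window_length=4, gap=1))
# 0 []
# 1 [0]
# 2 [0, 1]
# 3 [0, 1, 2]
# 4 [0, 1, 2, 3]
# 5 [1, 2, 3, 4]
```

A row whose window holds fewer than `min_periods` observations produces NaN, which is why the first row is NaN whenever `gap` is at least 1.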

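Note: Only offset aliases with fixed frequencies can be used to define gap and window_length. This means that aliases such as M or W cannot be used, as they can represent different numbers of days ('M' because different months have different numbers of days; 'W' because a week is anchored to a particular day of the week, such as W-Wed, and so spans a different number of days depending on the anchoring date).
Note: When using an offset alias to define gap, an offset alias must also be used to define window_length. This limitation does not exist when using an offset alias to define window_length. In fact, if the data has a uniform sampling frequency, it is preferable to use a numeric gap, as it is more efficient.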

Examples

>>> import pandas as pd
>>> from featuretools.primitives import RollingOutlierCount
>>> rolling_outlier_count = RollingOutlierCount(window_length=4)
>>> times = pd.date_range(start='2019-01-01', freq='1min', periods=6)
>>> rolling_outlier_count(times, [0, 0, 0, 0, 10, 0]).tolist()
[nan, 0.0, 0.0, 0.0, 0.0, 1.0]

We can also control the gap before the rolling calculation.

>>> import pandas as pd
>>> rolling_outlier_count = RollingOutlierCount(window_length=4, gap=0)
>>> times = pd.date_range(start='2019-01-01', freq='1min', periods=6)
>>> rolling_outlier_count(times, [0, 0, 0, 0, 10, 0]).tolist()
[0.0, 0.0, 0.0, 0.0, 1.0, 1.0]

We can also control the minimum number of periods required for the rolling calculation.

>>> import pandas as pd
>>> rolling_outlier_count = RollingOutlierCount(window_length=4, min_periods=3)
>>> times = pd.date_range(start='2019-01-01', freq='1min', periods=6)
>>> rolling_outlier_count(times, [0, 0, 0, 0, 10, 0]).tolist()
[nan, nan, nan, 0.0, 0.0, 1.0]

We can also set window_length and gap using offset alias strings.

>>> import pandas as pd
>>> rolling_outlier_count = RollingOutlierCount(window_length='4min', gap='1min')
>>> times = pd.date_range(start='2019-01-01', freq='1min', periods=6)
>>> rolling_outlier_count(times, [0, 0, 0, 0, 10, 0]).tolist()
[nan, 0.0, 0.0, 0.0, 0.0, 1.0]

__init__(window_length=3, gap=1, min_periods=0)[source]#

Methods

`__init__`([window_length, gap, min_periods])
`flatten_nested_input_types`(input_types)	Flattens nested column schema inputs into a single list.
`generate_name`(base_feature_names)
`generate_names`(base_feature_names)
`get_args_string`()
`get_arguments`()
`get_description`(input_column_descriptions[, ...])
`get_filepath`(filename)
`get_function`()
`get_outliers_count`(numeric_series)

Attributes

`base_of`
`base_of_exclude`
`commutative`
`default_value`	Default value this feature returns if no data found.
`description_template`
`input_types`	woodwork.ColumnSchema types of inputs
`max_stack_depth`
`name`	Name of the primitive
`number_output_features`	Number of columns in feature matrix associated with this feature
`return_type`	ColumnSchema type of return
`stack_on`
`stack_on_exclude`
`stack_on_self`
`uses_calc_time`
`uses_full_dataframe`
