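Version 0.21.0 (October 27, 2017)#

This is a major release from 0.20.3 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements, along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:

Integration with Apache Parquet, including a new top-level read_parquet() function and DataFrame.to_parquet() method; see the corresponding section below.
New user-facing pandas.api.types.CategoricalDtype for specifying categoricals independent of the data; see the corresponding section below.
The behavior of sum and prod on all-NaN Series/DataFrames is now consistent and no longer depends on whether bottleneck is installed, and sum and prod on empty Series now return NaN instead of 0; see the corresponding section below.
Compatibility fixes for pypy; see the corresponding section below.
Additions to the drop, reindex and rename API to make them more consistent; see the corresponding sections below.
Addition of the new methods DataFrame.infer_objects and GroupBy.pipe; see the corresponding sections below.
Indexing with a list of labels, where one or more of the labels is missing, is deprecated and will raise a KeyError in a future version; see the corresponding section below.

Check the API changes and deprecations before updating.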

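New features#

Integration with Apache Parquet file format#

Integration with Apache Parquet, including a new top-level read_parquet() and DataFrame.to_parquet() method (GH 15838, GH 17438).

Apache Parquet provides a cross-language, binary file format for reading and writing data frames efficiently. Parquet is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas dtypes, including extension dtypes such as datetime with timezones.

This functionality depends on either the pyarrow or fastparquet library. For more details, see the IO docs on Parquet.

Method `infer_objects` type conversion#

The DataFrame.infer_objects() and Series.infer_objects() methods have been added to perform dtype inference on object columns, replacing some of the functionality of the deprecated convert_objects method. See the documentation for more details. (GH 11221)

This method only performs soft conversions on object columns, converting Python objects to native types, but not any coercive conversions. For example: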

In [1]: df = pd.DataFrame({'A': [1, 2, 3],
   ...:                    'B': np.array([1, 2, 3], dtype='object'),
   ...:                    'C': ['1', '2', '3']})
   ...: 

In [2]: df.dtypes
Out[2]: 
A     int64
B    object
C    object
Length: 3, dtype: object

In [3]: df.infer_objects().dtypes
Out[3]: 
A     int64
B     int64
C    object
Length: 3, dtype: object

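Note that column 'C' was not converted: only scalar numeric types are converted to a new type. Other types of conversion should be accomplished using the to_numeric() function (or to_datetime(), to_timedelta()).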

In [4]: df = df.infer_objects()

In [5]: df['C'] = pd.to_numeric(df['C'], errors='coerce')

In [6]: df.dtypes
Out[6]: 
A    int64
B    int64
C    int64
Length: 3, dtype: object
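
The split between soft and coercive conversion described above can be sketched as follows. The sample values, including the unparseable string, are made up for illustration; this is only a minimal sketch, not part of the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: an object column of Python ints and a column of strings.
df = pd.DataFrame({
    "B": np.array([1, 2, 3], dtype="object"),
    "C": ["1", "2", "oops"],
})

# Soft conversion: 'B' becomes an integer column, the strings in 'C' are left alone.
soft = df.infer_objects()
b_is_int = pd.api.types.is_integer_dtype(soft["B"])
c_is_numeric = pd.api.types.is_numeric_dtype(soft["C"])

# Coercive conversion: strings are parsed, anything unparseable becomes NaN.
coerced = pd.to_numeric(soft["C"], errors="coerce")
```

Here errors='coerce' trades an exception for a missing value, so it is worth checking coerced.isna() afterwards if silent data loss matters.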

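Improved warnings when attempting to create columns#

New users are often puzzled by the relationship between column operations and attribute access on DataFrame instances (GH 7175). One specific instance of this confusion is attempting to create a new column by setting an attribute on the DataFrame: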

In [1]: df = pd.DataFrame({'one': [1., 2., 3.]})
In [2]: df.two = [4, 5, 6]

This does not raise any obvious exceptions, but also does not create a new column:

In [3]: df
Out[3]:
    one
0  1.0
1  2.0
2  3.0

Setting a list-like data structure into a new attribute now raises a UserWarning about the potential for unexpected behavior. See Attribute Access.
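
A minimal sketch of the difference between attribute assignment and item assignment, assuming nothing beyond the behavior described above (the column name "three" is made up for illustration):

```python
import warnings

import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0]})

# Attribute assignment: pandas warns and stores a plain Python attribute.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df.two = [4, 5, 6]

warned = any(issubclass(w.category, UserWarning) for w in caught)
created_by_attribute = "two" in df.columns  # no column was added

# Item assignment is the supported way to add a column.
df["three"] = [7, 8, 9]
created_by_item = "three" in df.columns
```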

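Method `drop` now also accepts index/columns keywords#

The drop() method has gained index/columns keywords as an alternative to specifying the axis. This is similar to the behavior of reindex (GH 12392).

For example: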

In [7]: df = pd.DataFrame(np.arange(8).reshape(2, 4),
   ...:                   columns=['A', 'B', 'C', 'D'])
   ...: 

In [8]: df
Out[8]: 
   A  B  C  D
0  0  1  2  3
1  4  5  6  7

[2 rows x 4 columns]

In [9]: df.drop(['B', 'C'], axis=1)
Out[9]: 
   A  D
0  0  3
1  4  7

[2 rows x 2 columns]

# the following is now equivalent
In [10]: df.drop(columns=['B', 'C'])
Out[10]: 
   A  D
0  0  3
1  4  7

[2 rows x 2 columns]
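
One practical consequence, sketched here on the same small frame, is that rows and columns can be dropped in a single call. This example is an illustration added for clarity rather than part of the original notes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=["A", "B", "C", "D"])

# Drop row label 0 and columns 'B' and 'C' in one call.
trimmed = df.drop(index=[0], columns=["B", "C"])

# The keyword form and the axis form give the same result.
same_as_axis_form = df.drop(columns=["B", "C"]).equals(df.drop(["B", "C"], axis=1))
```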

Methods `rename`, `reindex` now also accept axis keyword#

The DataFrame.rename() and DataFrame.reindex() methods have gained the axis keyword to specify the axis to target with the operation (GH 12392).

Here's rename:

In [11]: df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

In [12]: df.rename(str.lower, axis='columns')
Out[12]: 
   a  b
0  1  4
1  2  5
2  3  6

[3 rows x 2 columns]

In [13]: df.rename(id, axis='index')
Out[13]: 
                 A  B
281473167261904  1  4
281473167261936  2  5
281473167261968  3  6

[3 rows x 2 columns]

And reindex:

In [14]: df.reindex(['A', 'B', 'C'], axis='columns')
Out[14]: 
   A  B   C
0  1  4 NaN
1  2  5 NaN
2  3  6 NaN

[3 rows x 3 columns]

In [15]: df.reindex([0, 1, 3], axis='index')
Out[15]: 
     A    B
0  1.0  4.0
1  2.0  5.0
3  NaN  NaN

[3 rows x 2 columns]

The "index, columns" style continues to work as before.

In [16]: df.rename(index=id, columns=str.lower)
Out[16]: 
                 a  b
281473167261904  1  4
281473167261936  2  5
281473167261968  3  6

[3 rows x 2 columns]

In [17]: df.reindex(index=[0, 1, 3], columns=['A', 'B', 'C'])
Out[17]: 
     A    B   C
0  1.0  4.0 NaN
1  2.0  5.0 NaN
3  NaN  NaN NaN

[3 rows x 3 columns]

We *highly* encourage using named arguments to avoid confusion when using either style.
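
As a quick check that the two calling styles agree, the following sketch compares them directly on the frame used above:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# rename: mapper plus axis vs. the columns= keyword
renamed_axis = df.rename(str.lower, axis="columns")
renamed_kw = df.rename(columns=str.lower)

# reindex: labels plus axis vs. the index= keyword
reindexed_axis = df.reindex([0, 1, 3], axis="index")
reindexed_kw = df.reindex(index=[0, 1, 3])

rename_matches = renamed_axis.equals(renamed_kw)
reindex_matches = reindexed_axis.equals(reindexed_kw)
```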

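`CategoricalDtype` for specifying categoricals#

pandas.api.types.CategoricalDtype has been added to the public API and expanded to include the categories and ordered attributes. A CategoricalDtype can be used to specify the set of categories and orderedness of an array, independent of the data. This can be useful, for example, when converting string data to a Categorical (GH 14711, GH 15078, GH 16015, GH 17643):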

In [18]: from pandas.api.types import CategoricalDtype

In [19]: s = pd.Series(['a', 'b', 'c', 'a'])  # strings

In [20]: dtype = CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)

In [21]: s.astype(dtype)
Out[21]: 
0    a
1    b
2    c
3    a
Length: 4, dtype: category
Categories (4, object): ['a' < 'b' < 'c' < 'd']

One place that deserves special mention is read_csv(). Previously, with dtype={'col': 'category'}, the returned values and categories would always be strings.

In [22]: data = 'A,B\na,1\nb,2\nc,3'

In [23]: pd.read_csv(StringIO(data), dtype={'B': 'category'}).B.cat.categories
Out[23]: Index(['1', '2', '3'], dtype='object')

Notice the "object" dtype.

With a CategoricalDtype of all numerics, datetimes, or timedeltas, we can automatically convert to the correct type:

In [24]: dtype = {'B': CategoricalDtype([1, 2, 3])}

In [25]: pd.read_csv(StringIO(data), dtype=dtype).B.cat.categories
Out[25]: Index([1, 2, 3], dtype='int64')

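The values have been correctly interpreted as integers.

The .dtype property of a Categorical, CategoricalIndex or a Series with categorical type will now return an instance of CategoricalDtype. While the repr has changed, str(CategoricalDtype()) is still the string 'category'. We'll take this time to remind users that the *preferred* way to detect categorical data is to use pandas.api.types.is_categorical_dtype(), and not str(dtype) == 'category'.

See the CategoricalDtype docs for more.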
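
The sketch below shows a dtype declared independently of the data and then applied to it. The category labels are made up, and the isinstance check is just one way to inspect the resulting dtype, used here instead of the is_categorical_dtype() helper named above.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical ordered categories, declared before any data is seen.
dtype = CategoricalDtype(categories=["low", "mid", "high"], ordered=True)

# "unknown" is not one of the declared categories, so it becomes missing.
s = pd.Series(["low", "high", "unknown", "mid"]).astype(dtype)

is_categorical = isinstance(s.dtype, CategoricalDtype)
same_dtype = s.dtype == dtype
missing_mask = s.isna().tolist()

# Comparisons against a category are allowed because ordered=True.
above_low = (s > "low").tolist()
```

Declaring the dtype once and reusing it keeps the category set and ordering identical across several Series, even when some of them do not contain every category.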

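`GroupBy` objects now have a `pipe` method#

GroupBy objects now have a pipe method, similar to the one on DataFrame and Series, that allows functions that take a GroupBy to be composed in a clean, readable syntax. (GH 17871)

For a concrete example of combining .groupby and .pipe, imagine having a DataFrame with columns for stores, products, revenue and sold quantity. We'd like to do a groupwise calculation of *prices* (i.e. revenue/quantity) per store and per product. We could do this in a multi-step operation, but expressing it in terms of piping can make the code more readable.

First we set up the data: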

In [26]: import numpy as np

In [27]: n = 1000

In [28]: df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
   ....:                    'Product': np.random.choice(['Product_1',
   ....:                                                 'Product_2',
   ....:                                                 'Product_3'
   ....:                                                 ], n),
   ....:                    'Revenue': (np.random.random(n) * 50 + 10).round(2),
   ....:                    'Quantity': np.random.randint(1, 10, size=n)})
   ....: 

In [29]: df.head(2)
Out[29]: 
     Store    Product  Revenue  Quantity
0  Store_2  Product_2    32.09         7
1  Store_1  Product_3    14.20         1

[2 rows x 4 columns]

Now, to find prices per store/product, we can simply do:

In [30]: (df.groupby(['Store', 'Product'])
   ....:    .pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())
   ....:    .unstack().round(2))
   ....: 
Out[30]: 
Product  Product_1  Product_2  Product_3
Store                                   
Store_1       6.73       6.72       7.14
Store_2       7.59       6.98       7.23

[2 rows x 3 columns]

See the documentation for more.
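
Because the example above uses random data, its output changes from run to run. The following sketch repeats the idea on a small hand-written frame so the result can be checked by hand. The data and the helper name mean_price are made up for illustration.

```python
import pandas as pd

# Small, hand-written sample so the result is easy to verify.
df = pd.DataFrame({
    "Store": ["S1", "S1", "S2", "S2"],
    "Product": ["P1", "P1", "P1", "P2"],
    "Revenue": [10.0, 30.0, 12.0, 9.0],
    "Quantity": [1, 3, 4, 3],
})


def mean_price(grouped):
    """Total revenue divided by total quantity for each group."""
    return grouped["Revenue"].sum() / grouped["Quantity"].sum()


# pipe passes the GroupBy object itself to the function.
prices = df.groupby(["Store", "Product"]).pipe(mean_price)
```

A named function such as mean_price can be reused with different groupings, which is where pipe tends to pay off over an inline lambda.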

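`Categorical.rename_categories` accepts a dict-like#

rename_categories() now accepts a dict-like argument for new_categories. The previous categories are looked up in the dictionary's keys and replaced if found. The behavior of missing and extra keys is the same as in DataFrame.rename().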

In [31]: c = pd.Categorical(['a', 'a', 'b'])

In [32]: c.rename_categories({"a": "eh", "b": "bee"})
Out[32]: 
['eh', 'eh', 'bee']
Categories (2, object): ['eh', 'bee']

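Warning

To assist with upgrading pandas, rename_categories treats Series as list-like. Typically, Series are considered to be dict-like (e.g. in .rename, .map). In a future version of pandas rename_categories will change to treat them as dict-like. Follow the warning message's recommendations for writing future-proof code.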

In [33]: c.rename_categories(pd.Series([0, 1], index=['a', 'c']))
FutureWarning: Treating Series 'new_categories' as a list-like and using the values.
In a future version, 'rename_categories' will treat Series like a dictionary.
For dict-like, use 'new_categories.to_dict()'
For list-like, use 'new_categories.values'.
Out[33]:
[0, 0, 1]
Categories (2, int64): [0, 1]
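
The handling of missing and extra keys, and the to_dict() route suggested by the warning message, can be sketched as follows (the replacement labels are made up for illustration):

```python
import pandas as pd

c = pd.Categorical(["a", "a", "b"])

# 'b' has no entry in the mapping and is kept; 'z' is not a category and is ignored.
renamed = c.rename_categories({"a": "eh", "z": "zed"})

# For a Series, converting to a dict makes the dict-like intent explicit.
mapping = pd.Series(["eh", "bee"], index=["a", "b"])
renamed_from_series = c.rename_categories(mapping.to_dict())
```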

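Other enhancements#

New functions or methods#

Resampler.nearest() is added to support nearest-neighbor upsampling (GH 17496).
Index has added support for a to_frame method (GH 15230).

New keywords#

Added a skipna parameter to infer_dtype() to support type inference in the presence of missing values (GH 17059).
Series.to_dict() and DataFrame.to_dict() now support an into keyword which allows you to specify the collections.Mapping subclass that you would like returned. The default is dict, which is backwards compatible. (GH 16122)
Series.set_axis() and DataFrame.set_axis() now support the inplace parameter. (GH 14636)
Series.to_pickle() and DataFrame.to_pickle() have gained a protocol parameter (GH 16252). By default, this parameter is set to HIGHEST_PROTOCOL.
read_feather() has gained the nthreads parameter for multi-threaded operations (GH 16359)
DataFrame.clip() and Series.clip() have gained an inplace argument. (GH 15388)
crosstab() has gained a margins_name parameter to define the name of the row/column that will contain the totals when margins=True. (GH 15972)
read_json() now accepts a chunksize parameter that can be used when lines=True. If chunksize is passed, read_json now returns an iterator which reads in chunksize lines with each iteration. (GH 17048)
read_json() and to_json() now accept a compression argument which allows them to transparently handle compressed files. (GH 17798)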
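
As an illustration of the chunksize keyword for read_json() listed above, here is a minimal sketch that reads line-delimited JSON from an in-memory buffer. The records are made up; a real use would more likely pass a file path.

```python
from io import StringIO

import pandas as pd

# Three JSON records, one per line (the layout expected by lines=True).
data = '{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n{"a": 3, "b": "z"}'

# With chunksize set, read_json returns an iterator of DataFrames.
chunks = list(pd.read_json(StringIO(data), lines=True, chunksize=2))

chunk_lengths = [len(chunk) for chunk in chunks]
combined = pd.concat(chunks, ignore_index=True)
```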

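Various enhancements#

Improved the import time of pandas by about 2.25x. (GH 16764)
Support for PEP 519 (adding a file system path protocol) on most readers (e.g. read_csv()) and writers (e.g. DataFrame.to_csv()) (GH 13823).
Added a __fspath__ method to pd.HDFStore, pd.ExcelFile, and pd.ExcelWriter to work properly with the file system path protocol (GH 13823).
The validate argument for merge() now checks whether a merge is one-to-one, one-to-many, many-to-one, or many-to-many. If a merge is found to not be an example of the specified merge type, an exception of type MergeError will be raised. (GH 16270)
Added support for PEP 518 (pyproject.toml) to the build system (GH 16745)
RangeIndex.append() now returns a RangeIndex object when possible (GH 16212)
Series.rename_axis() and DataFrame.rename_axis() with inplace=True now return None while renaming the axis inplace. (GH 15704)
api.types.infer_dtype() now infers decimals. (GH 15690)
DataFrame.select_dtypes() now accepts scalar values for include/exclude as well as list-like. (GH 16855)
date_range() now accepts 'YS' in addition to 'AS' as an alias for start of year. (GH 9313)
date_range() now accepts 'Y' in addition to 'A' as an alias for end of year. (GH 9313)
DataFrame.add_prefix() and DataFrame.add_suffix() now accept strings containing the '%' character. (GH 17151)
Read/write methods that infer compression (read_csv(), read_table(), read_pickle(), and to_pickle()) can now infer from path-like objects, such as pathlib.Path. (GH 17206)
read_sas() now recognizes much more of the most frequently used date (datetime) formats in SAS7BDAT files. (GH 15871)
DataFrame.items() and Series.items() are now present in both Python 2 and 3 and are lazy in all cases. (GH 13918, GH 17213)
pandas.io.formats.style.Styler.where() has been implemented as a convenience for pandas.io.formats.style.Styler.applymap(). (GH 17474)
MultiIndex.is_monotonic_decreasing() has been implemented. Previously it returned False in all cases. (GH 16554)
read_excel() raises ImportError with a better message if xlrd is not installed. (GH 17613)
DataFrame.assign() will preserve the original order of **kwargs for Python 3.6+ users instead of sorting the column names. (GH 14207)
Series.reindex(), DataFrame.reindex(), Index.get_indexer() now support a list-like argument for tolerance. (GH 17367)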
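
The validate argument for merge() mentioned in the list above can be sketched like this. The frames are made up for illustration; MergeError is imported from pandas.errors, where these notes say its definition now lives.

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"key": [1, 2, 2], "lval": ["a", "b", "c"]})
right = pd.DataFrame({"key": [1, 2], "rval": ["x", "y"]})

# 'key' repeats on the left only, so a many-to-one check passes.
merged = pd.merge(left, right, on="key", validate="many_to_one")

# The same merge fails a one-to-one check.
try:
    pd.merge(left, right, on="key", validate="one_to_one")
    raised = False
except MergeError:
    raised = True
```

Stating the expected relationship up front turns an accidental row explosion from duplicated keys into an immediate error.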

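Backwards incompatible API changes#

Dependencies have increased minimum versions#

We have updated our minimum supported versions of dependencies (GH 15206, GH 15543, GH 15214). If installed, we now require:

Package      Minimum Version   Required
Numpy        1.9.0             X
Matplotlib   1.4.3
Scipy        0.14.0
Bottleneck   1.0.0

Additionally, support has been dropped for Python 3.4 (GH 15251).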

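Sum/prod of all-NaN or empty Series/DataFrames is now consistently NaN#

Note

The changes described here have been partially reverted. See the v0.22.0 Whatsnew for more.

The behavior of sum and prod on all-NaN Series/DataFrames no longer depends on whether bottleneck is installed, and the return value of sum and prod on an empty Series has changed (GH 9422, GH 15507).

Calling sum or prod on an empty or all-NaN Series, or on columns of a DataFrame, will result in NaN. See the docs.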

In [33]: s = pd.Series([np.nan])

Previously WITHOUT bottleneck installed:

In [2]: s.sum()
Out[2]: np.nan

Previously WITH bottleneck:

In [2]: s.sum()
Out[2]: 0.0

New behavior in 0.21.0, without regard to the bottleneck installation:

In [34]: s.sum()
Out[34]: nan

Note that this also changes the sum of an empty Series. Previously this always returned 0 regardless of a bottleneck installation:

In [1]: pd.Series([]).sum()
Out[1]: 0

but for consistency with the all-NaN case, this was changed in 0.21.0 to return NaN as well:

In [2]: pd.Series([]).sum()
Out[2]: nan

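Indexing with a list with missing labels is deprecated#

Previously, selecting with a list of labels, where one or more labels were missing, would always succeed, returning NaN for the missing labels. This will now show a FutureWarning. In the future this will raise a KeyError (GH 15747). The warning is triggered on a DataFrame or a Series when using .loc[] or [[]] with a list of labels that contains at least one missing label.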

In [35]: s = pd.Series([1, 2, 3])

In [36]: s
Out[36]: 
0    1
1    2
2    3
Length: 3, dtype: int64

Previous behavior

In [4]: s.loc[[1, 2, 3]]
Out[4]:
1    2.0
2    3.0
3    NaN
dtype: float64

Current behavior

In [4]: s.loc[[1, 2, 3]]
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike

Out[4]:
1    2.0
2    3.0
3    NaN
dtype: float64

The idiomatic way to select elements that may not be found is via .reindex().

In [37]: s.reindex([1, 2, 3])
Out[37]: 
1    2.0
2    3.0
3    NaN
Length: 3, dtype: float64

Selection with all keys found is unchanged.

In [38]: s.loc[[1, 2]]
Out[38]: 
1    2
2    3
Length: 2, dtype: int64
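
When migrating code that relied on the old behavior, there are roughly two intents to distinguish: keep every requested label (missing ones become NaN) or keep only the labels that exist. A minimal sketch of both, where the Index.intersection route is one possible approach rather than something prescribed by these notes:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
wanted = [1, 2, 3]  # label 3 does not exist in s

# Keep every requested label; missing ones become NaN and the dtype becomes float.
with_missing = s.reindex(wanted)

# Keep only the requested labels that actually exist.
only_present = s.loc[s.index.intersection(wanted)]
```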

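NA naming changes#

In order to promote more consistency among the pandas API, we have added additional top-level functions isna() and notna() that are aliases for isnull() and notnull(). The naming scheme is now more consistent with methods like .dropna() and .fillna(). Furthermore, in all cases where .isnull() and .notnull() methods are defined, the classes have gained additional methods named .isna() and .notna(); this covers Categorical, Index, Series, and DataFrame. (GH 15001).

The configuration option pd.options.mode.use_inf_as_null is deprecated, and pd.options.mode.use_inf_as_na is added as a replacement.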
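
A short sketch showing that the new names are plain aliases of the old ones:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# New and old method names return the same masks.
same_isna = s.isna().equals(s.isnull())
same_notna = s.notna().equals(s.notnull())

# The top-level functions work on scalars as well.
scalar_is_na = bool(pd.isna(np.nan))
kept = s[s.notna()].tolist()
```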

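Iteration of Series/Index will now return Python scalars#

Previously, when using certain iteration methods for a Series with dtype int or float, you would receive a numpy scalar, e.g. a np.int64, rather than a Python int. Issue (GH 10904) corrected this for Series.tolist() and list(Series). This change makes all iteration methods consistent, in particular for __iter__() and .map(); note that this only affects int/float dtypes. (GH 13236, GH 13258, GH 14216).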

In [39]: s = pd.Series([1, 2, 3])

In [40]: s
Out[40]: 
0    1
1    2
2    3
Length: 3, dtype: int64

Previously:

In [2]: type(list(s)[0])
Out[2]: numpy.int64

New behavior:

In [41]: type(list(s)[0])
Out[41]: int

Furthermore, this will now correctly box the results of iteration for DataFrame.to_dict() as well.

In [42]: d = {'a': [1], 'b': ['b']}

In [43]: df = pd.DataFrame(d)

Previously:

In [8]: type(df.to_dict()['a'][0])
Out[8]: numpy.int64

New behavior:

In [44]: type(df.to_dict()['a'][0])
Out[44]: int

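Indexing with a Boolean Index#

Previously, when passing a boolean Index to .loc, if the index of the Series/DataFrame had boolean labels, you would get a label-based selection, potentially duplicating result labels, rather than a boolean indexing selection (where True selects elements). This was inconsistent with how a boolean numpy array indexes. The new behavior is to act like a boolean numpy array indexer. (GH 17738)

Previous behavior: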

In [45]: s = pd.Series([1, 2, 3], index=[False, True, False])

In [46]: s
Out[46]: 
False    1
True     2
False    3
Length: 3, dtype: int64

In [59]: s.loc[pd.Index([True, False, True])]
Out[59]:
True     2
False    1
False    3
True     2
dtype: int64

Current behavior

In [47]: s.loc[pd.Index([True, False, True])]
Out[47]: 
False    1
False    3
Length: 2, dtype: int64

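Furthermore, previously if you had an index that was non-numeric (e.g. strings), then a boolean Index would raise a KeyError. This will now be treated as a boolean indexer.

Previous behavior: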

In [48]: s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

In [49]: s
Out[49]: 
a    1
b    2
c    3
Length: 3, dtype: int64

In [39]: s.loc[pd.Index([True, False, True])]
KeyError: "None of [Index([True, False, True], dtype='object')] are in the [index]"

Current behavior

In [50]: s.loc[pd.Index([True, False, True])]
Out[50]: 
a    1
c    3
Length: 2, dtype: int64

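`PeriodIndex` resampling#

In previous versions of pandas, resampling a Series/DataFrame indexed by a PeriodIndex returned a DatetimeIndex in some cases (GH 12884). Resampling to a multiplied frequency now returns a PeriodIndex (GH 15944). As a minor enhancement, resampling a PeriodIndex can now handle NaT values (GH 13224)

Previous behavior: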

In [1]: pi = pd.period_range('2017-01', periods=12, freq='M')

In [2]: s = pd.Series(np.arange(12), index=pi)

In [3]: resampled = s.resample('2Q').mean()

In [4]: resampled
Out[4]:
2017-03-31     1.0
2017-09-30     5.5
2018-03-31    10.0
Freq: 2Q-DEC, dtype: float64

In [5]: resampled.index
Out[5]: DatetimeIndex(['2017-03-31', '2017-09-30', '2018-03-31'], dtype='datetime64[ns]', freq='2Q-DEC')

New behavior:

In [1]: pi = pd.period_range('2017-01', periods=12, freq='M')

In [2]: s = pd.Series(np.arange(12), index=pi)

In [3]: resampled = s.resample('2Q').mean()

In [4]: resampled
Out[4]:
2017Q1    2.5
2017Q3    8.5
Freq: 2Q-DEC, dtype: float64

In [5]: resampled.index
Out[5]: PeriodIndex(['2017Q1', '2017Q3'], dtype='period[2Q-DEC]')

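Upsampling and calling .ohlc() previously returned a Series, basically identical to calling .asfreq(). OHLC upsampling now returns a DataFrame with columns open, high, low and close (GH 13083). This is consistent with downsampling and DatetimeIndex behavior.

Previous behavior: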

In [1]: pi = pd.period_range(start='2000-01-01', freq='D', periods=10)

In [2]: s = pd.Series(np.arange(10), index=pi)

In [3]: s.resample('H').ohlc()
Out[3]:
2000-01-01 00:00    0.0
                ...
2000-01-10 23:00    NaN
Freq: H, Length: 240, dtype: float64

In [4]: s.resample('M').ohlc()
Out[4]:
         open  high  low  close
2000-01     0     9    0      9

New behavior:

In [56]: pi = pd.period_range(start='2000-01-01', freq='D', periods=10)

In [57]: s = pd.Series(np.arange(10), index=pi)

In [58]: s.resample('H').ohlc()
Out[58]:
                  open  high  low  close
2000-01-01 00:00   0.0   0.0  0.0    0.0
2000-01-01 01:00   NaN   NaN  NaN    NaN
2000-01-01 02:00   NaN   NaN  NaN    NaN
2000-01-01 03:00   NaN   NaN  NaN    NaN
2000-01-01 04:00   NaN   NaN  NaN    NaN
...                ...   ...  ...    ...
2000-01-10 19:00   NaN   NaN  NaN    NaN
2000-01-10 20:00   NaN   NaN  NaN    NaN
2000-01-10 21:00   NaN   NaN  NaN    NaN
2000-01-10 22:00   NaN   NaN  NaN    NaN
2000-01-10 23:00   NaN   NaN  NaN    NaN

[240 rows x 4 columns]

In [59]: s.resample('M').ohlc()
Out[59]:
         open  high  low  close
2000-01     0     9    0      9

[1 rows x 4 columns]

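Improved error handling during item assignment in pd.eval#

eval() will now raise a ValueError when item assignment malfunctions, or when inplace operations are specified but there is no item assignment in the expression (GH 16732)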

In [51]: arr = np.array([1, 2, 3])

Previously, if you attempted the following expression, you would get a not very helpful error message:

In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`)
and integer or boolean arrays are valid indices

This is a very long way of saying numpy arrays don't support string-item indexing. With this change, the error message is now this:

In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
ValueError: Cannot assign expression output to target

It also used to be possible to evaluate expressions inplace, even if there was no item assignment:

In [4]: pd.eval("1 + 2", target=arr, inplace=True)
Out[4]: 3

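However, this input does not make much sense because the output is not being assigned to the target. Now, a ValueError will be raised when such an input is passed in: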

In [4]: pd.eval("1 + 2", target=arr, inplace=True)
...
ValueError: Cannot operate inplace if there is no assignment
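
For contrast with the failing calls above, this sketch shows an assignment expression against a target that does support item assignment (a DataFrame), followed by the rejected no-assignment case. The column names are made up, and local_dict is passed explicitly only so that the snippet does not depend on the calling scope.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# An assignment expression with a DataFrame target returns a new frame
# (inplace defaults to False), leaving the original untouched.
result = pd.eval("b = df.a * 2", target=df, local_dict={"df": df})

# inplace=True without any assignment in the expression is rejected.
try:
    pd.eval("1 + 2", target=np.array([1, 2, 3]), inplace=True)
    message = None
except ValueError as err:
    message = str(err)
```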

Dtype conversions#

Previously, assignments, .where() and .fillna() with a bool value would coerce to the same type (e.g. int / float), or raise for datetimelikes. These will now preserve the bools with object dtypes. (GH 16821).

In [52]: s = pd.Series([1, 2, 3])

In [5]: s[1] = True

In [6]: s
Out[6]:
0    1
1    1
2    3
dtype: int64

New behavior

In [7]: s[1] = True

In [8]: s
Out[8]:
0       1
1    True
2       3
Length: 3, dtype: object

Previously, assigning a non-datetimelike value to a datetimelike Series would coerce the non-datetimelike item being assigned (GH 14145).

In [53]: s = pd.Series([pd.Timestamp('2011-01-01'), pd.Timestamp('2012-01-01')])

In [1]: s[1] = 1

In [2]: s
Out[2]:
0   2011-01-01 00:00:00.000000000
1   1970-01-01 00:00:00.000000001
dtype: datetime64[ns]

These now coerce to object dtype.

In [1]: s[1] = 1

In [2]: s
Out[2]:
0    2011-01-01 00:00:00
1                      1
dtype: object

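Inconsistent behavior in .where() with datetimelikes, which would raise rather than coerce to object (GH 16402)
Bug in assignment against int64 data with np.ndarray with float64 dtype may keep int64 dtype (GH 14001)

MultiIndex constructor with a single level#

The MultiIndex constructors no longer squeeze a MultiIndex with all length-one levels down to a regular Index. This affects all the MultiIndex constructors. (GH 17178)

Previous behavior: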

In [2]: pd.MultiIndex.from_tuples([('a',), ('b',)])
Out[2]: Index(['a', 'b'], dtype='object')

Length 1 levels are no longer special-cased. They behave exactly as if you had length 2+ levels, so a MultiIndex is always returned from all of the MultiIndex constructors:

In [54]: pd.MultiIndex.from_tuples([('a',), ('b',)])
Out[54]: 
MultiIndex([('a',),
            ('b',)],
           )
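
Code that relied on the old squeezing behavior can recover a flat Index explicitly. A minimal sketch, using get_level_values as one way to do so:

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([("a",), ("b",)])

is_multi = isinstance(mi, pd.MultiIndex)
n_levels = mi.nlevels

# Pull the single level out explicitly to get a regular Index.
flat = mi.get_level_values(0)
flat_is_multi = isinstance(flat, pd.MultiIndex)
```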

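UTC localization with Series#

Previously, to_datetime() did not localize datetime Series data when utc=True was passed. Now, to_datetime() will correctly localize Series with a datetime64[ns, UTC] dtype to be consistent with how list-like and Index data are handled. (GH 6415).

Previous behavior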

In [55]: s = pd.Series(['20130101 00:00:00'] * 3)

In [12]: pd.to_datetime(s, utc=True)
Out[12]:
0   2013-01-01
1   2013-01-01
2   2013-01-01
dtype: datetime64[ns]

New behavior

In [56]: pd.to_datetime(s, utc=True)
Out[56]: 
0   2013-01-01 00:00:00+00:00
1   2013-01-01 00:00:00+00:00
2   2013-01-01 00:00:00+00:00
Length: 3, dtype: datetime64[ns, UTC]

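Additionally, DataFrames with datetime columns that were parsed by read_sql_table() and read_sql_query() will also be localized to UTC only if the original SQL columns were timezone-aware datetime columns.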
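
A small sketch for checking the localization programmatically instead of reading the repr. It inspects the time zone through the .dt accessor, which avoids depending on the exact dtype string.

```python
import pandas as pd

s = pd.Series(["20130101 00:00:00"] * 3)
localized = pd.to_datetime(s, utc=True)

# The time zone attached to the values, rendered as text.
tz_name = str(localized.dt.tz)
first = localized.iloc[0]
```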

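Consistency of range functions#

In previous versions, there were some inconsistencies between the various range functions: date_range(), bdate_range(), period_range(), timedelta_range(), and interval_range(). (GH 17471).

One of the inconsistent behaviors occurred when the start, end and period parameters were all specified, potentially leading to ambiguous ranges. When all three parameters were passed, interval_range ignored the period parameter, period_range ignored the end parameter, and the other range functions raised. To promote consistency among the range functions, and avoid potentially ambiguous ranges, interval_range and period_range will now raise when all three parameters are passed.

Previous behavior: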

 In [2]: pd.interval_range(start=0, end=4, periods=6)
 Out[2]:
 IntervalIndex([(0, 1], (1, 2], (2, 3]]
               closed='right',
               dtype='interval[int64]')

In [3]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q')
Out[3]: PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1', '2018Q2'], dtype='period[Q-DEC]', freq='Q-DEC')

New behavior:

In [2]: pd.interval_range(start=0, end=4, periods=6)
---------------------------------------------------------------------------
ValueError: Of the three parameters: start, end, and periods, exactly two must be specified

In [3]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q')
---------------------------------------------------------------------------
ValueError: Of the three parameters: start, end, and periods, exactly two must be specified

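Additionally, the endpoint parameter end was not included in the intervals produced by interval_range. However, all other range functions include end in their output. To promote consistency among the range functions, interval_range will now include end as the right endpoint of the final interval, except if freq is specified in a way which skips end.

Previous behavior: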

In [4]: pd.interval_range(start=0, end=4)
Out[4]:
IntervalIndex([(0, 1], (1, 2], (2, 3]]
              closed='right',
              dtype='interval[int64]')

New behavior:

In [57]: pd.interval_range(start=0, end=4)
Out[57]: IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4]], dtype='interval[int64, right]')
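
The endpoint convention can be checked directly. This sketch compares interval_range with date_range, both of which are expected to include end (the dates are made up for illustration):

```python
import pandas as pd

intervals = pd.interval_range(start=0, end=4)
dates = pd.date_range(start="2017-01-01", end="2017-01-04")

n_intervals = len(intervals)
last_right = intervals[-1].right  # right edge of the final interval
last_date = dates[-1]
```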

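No automatic Matplotlib converters#

pandas no longer registers our date, time, datetime, datetime64, and Period converters with matplotlib when pandas is imported. Matplotlib plot methods (plt.plot, ax.plot, ...) will not nicely format the x-axis for DatetimeIndex or PeriodIndex values. You must register these converters explicitly.

pandas built-in Series.plot and DataFrame.plot will register these converters on first use (GH 17710).

Note

This change has been temporarily reverted in pandas 0.21.1; see the v0.21.1 release notes for more details.

Other API changes#

The Categorical constructor no longer accepts a scalar for the categories keyword. (GH 16022)
Accessing a non-existent attribute on a closed HDFStore will now raise an AttributeError rather than a ClosedFileError (GH 16301)
read_csv() now issues a UserWarning if the names parameter contains duplicates (GH 17095)
read_csv() now treats 'null' and 'n/a' strings as missing values by default (GH 16471, GH 16078)
pandas.HDFStore's string representation is now faster and less detailed. For the previous behavior, use pandas.HDFStore.info(). (GH 16503).
Compression defaults in HDF stores now follow pytables standards. The default is no compression, and if complib is missing and complevel > 0, zlib is used (GH 15943)
Index.get_indexer_non_unique() now returns an ndarray indexer rather than an Index; this is consistent with Index.get_indexer() (GH 16819)
Removed the @slow decorator from pandas._testing, which caused issues for some downstream packages' test suites. Use @pytest.mark.slow instead, which achieves the same thing (GH 16850)
Moved the definition of MergeError to the pandas.errors module.
The signature of Series.set_axis() and DataFrame.set_axis() has been changed from set_axis(axis, labels) to set_axis(labels, axis=0), for consistency with the rest of the API. The old signature is deprecated and will show a FutureWarning (GH 14636)
Series.argmin() and Series.argmax() will now raise a TypeError when used with object dtypes, instead of a ValueError (GH 13595)
Period is now immutable, and will now raise an AttributeError when a user tries to assign a new value to the ordinal or freq attributes (GH 17116).
to_datetime() when passed a tz-aware origin= kwarg will now raise a more informative ValueError rather than a TypeError (GH 16842)
to_datetime() now raises a ValueError when format includes %W or %U without also including day of the week and calendar year (GH 16774)
Renamed the non-functional index to index_col in read_stata() to improve API consistency (GH 16342)
Bug in DataFrame.drop() caused boolean labels False and True to be treated as labels 0 and 1 respectively when dropping indices from a numeric index. This will now raise a ValueError (GH 16877)
Restricted DateOffset keyword arguments. Previously, DateOffset subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (GH 17176).

Deprecations#

DataFrame.from_csv() and Series.from_csv() have been deprecated in favor of read_csv() (GH 4191)
read_excel() has deprecated sheetname in favor of sheet_name for consistency with .to_excel() (GH 10559).
read_excel() has deprecated parse_cols in favor of usecols for consistency with read_csv() (GH 4988)
read_csv() has deprecated the tupleize_cols argument. Column tuples will always be converted to a MultiIndex (GH 17060)
DataFrame.to_csv() has deprecated the tupleize_cols argument. MultiIndex columns will always be written as rows in the CSV file (GH 17060)
The convert parameter has been deprecated in the .take() method, as it was not being respected (GH 16948)
pd.options.html.border has been deprecated in favor of pd.options.display.html.border (GH 15793).
SeriesGroupBy.nth() has deprecated True in favor of 'all' for its kwarg dropna (GH 11038).
DataFrame.as_blocks() is deprecated, as this is exposing the internal implementation (GH 17302)
pd.TimeGrouper is deprecated in favor of pandas.Grouper (GH 16747)
cdate_range has been deprecated in favor of bdate_range(), which has gained weekmask and holidays parameters for building custom frequency date ranges. See the documentation for more details (GH 17596)
Passing categories or ordered kwargs to Series.astype() is deprecated, in favor of passing a CategoricalDtype (GH 17636)
.get_value and .set_value on Series, DataFrame, Panel, SparseSeries, and SparseDataFrame are deprecated in favor of using the .iat[] or .at[] accessors (GH 15269)
Passing a non-existent column in .to_excel(..., columns=) is deprecated and will raise a KeyError in the future (GH 17295)
The raise_on_error parameter to Series.where(), Series.mask(), DataFrame.where(), DataFrame.mask() is deprecated, in favor of errors= (GH 14968)
Using DataFrame.rename_axis() and Series.rename_axis() to alter index or column labels is now deprecated in favor of using .rename. rename_axis may still be used to alter the name of the index or columns (GH 17833).
reindex_axis() has been deprecated in favor of reindex(). See the reindex changes described above for more (GH 17833).

Series.select and DataFrame.select#

The Series.select() and DataFrame.select() methods are deprecated in favor of using df.loc[labels.map(crit)] (GH 12401)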

In [58]: df = pd.DataFrame({'A': [1, 2, 3]}, index=['foo', 'bar', 'baz'])

In [3]: df.select(lambda x: x in ['bar', 'baz'])
FutureWarning: select is deprecated and will be removed in a future release. You can use .loc[crit] as a replacement
Out[3]:
     A
bar  2
baz  3

In [59]: df.loc[df.index.map(lambda x: x in ['bar', 'baz'])]
Out[59]: 
     A
bar  2
baz  3

[2 rows x 1 columns]

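Series.argmax and Series.argmin#

The behavior of Series.argmax() and Series.argmin() has been deprecated in favor of Series.idxmax() and Series.idxmin(), respectively (GH 16830).

For compatibility with NumPy arrays, pd.Series implements argmax and argmin. Since pandas 0.13.0, argmax has been an alias for pandas.Series.idxmax(), and argmin has been an alias for pandas.Series.idxmin(). They return the *label* of the maximum or minimum, rather than the *position*.

We've deprecated the current behavior of Series.argmax and Series.argmin. Using either of these will emit a FutureWarning. Use Series.idxmax() if you want the label of the maximum. Use Series.values.argmax() if you want the position of the maximum. Likewise for the minimum. In a future release Series.argmax and Series.argmin will return the position of the maximum or minimum.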
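
The label/position distinction is easiest to see on a Series with a non-integer index. A minimal sketch with made-up values, using only the two spellings recommended above:

```python
import pandas as pd

s = pd.Series([10, 30, 20], index=["x", "y", "z"])

label_of_max = s.idxmax()             # index label of the largest value
position_of_max = s.values.argmax()   # integer position of the largest value
label_of_min = s.idxmin()
position_of_min = s.values.argmin()
```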

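Removal of prior version deprecations/changes#

read_excel() has dropped the has_index_names parameter (GH 10967)
The pd.options.display.height configuration has been dropped (GH 3663)
The pd.options.display.line_width configuration has been dropped (GH 2881)
The pd.options.display.mpl_style configuration has been dropped (GH 12190)
Index has dropped the .sym_diff() method in favor of .symmetric_difference() (GH 12591)
Categorical has dropped the .order() and .sort() methods in favor of .sort_values() (GH 12882)
eval() and DataFrame.eval() have changed the default of inplace from None to False (GH 11149)
The function get_offset_name has been dropped in favor of the .freqstr attribute for an offset (GH 11834)
pandas no longer tests for compatibility with hdf5 files created with pandas < 0.11 (GH 17404).

Performance improvements#

Improved performance of instantiating SparseDataFrame (GH 16773)
Series.dt no longer performs frequency inference, yielding a large speedup when accessing the attribute (GH 17210)
Improved performance of set_categories() by not materializing the values (GH 17508)
Timestamp.microsecond no longer re-computes on attribute access (GH 17331)
Improved performance of the CategoricalIndex for data that is already categorical dtype (GH 17513)
Improved performance of RangeIndex.min() and RangeIndex.max() by using RangeIndex properties to perform the computations (GH 17607)

Documentation changes#

Several NaT method docstrings (e.g. NaT.ctime()) were incorrect (GH 17327)
The documentation has had references to versions < v0.17 removed and cleaned up (GH 17442, GH 17404, GH 17504)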

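Bug fixes#

Conversion#

Bug in assignment against datetime-like data with int may incorrectly convert to datetime-like (GH 14145)
Bug in assignment against int64 data with np.ndarray with float64 dtype may keep int64 dtype (GH 14001)
Fixed the return type of IntervalIndex.is_non_overlapping_monotonic to be a Python bool for consistency with similar attributes/methods. Previously it returned a numpy.bool_. (GH 17237)
Bug in IntervalIndex.is_non_overlapping_monotonic when intervals are closed on both sides and overlap at a point (GH 16560)
Bug in Series.fillna() returning a frame when inplace=True and value is a dict (GH 16156)
Bug in Timestamp.weekday_name returning a UTC-based weekday name when localized to a timezone (GH 17354)
Bug in Timestamp.replace when replacing tzinfo around DST changes (GH 15683)
Bug in Timedelta construction and arithmetic that would not propagate the Overflow exception (GH 17367)
Bug in astype() converting to object dtype when passed extension type classes (DatetimeTZDtype, CategoricalDtype) rather than instances. Now a TypeError is raised when a class is passed (GH 17780).
Bug in to_numeric() in which elements were not always being coerced to numeric when errors='coerce' (GH 17007, GH 17125)
Bug in DataFrame and Series constructors where range objects are converted to int32 dtype on Windows instead of int64 (GH 16804)

Indexing#

When called with a null slice (e.g. df.iloc[:]), the .iloc and .loc indexers return a shallow copy of the original object. Previously they returned the original object. (GH 13873).
When called on an unsorted MultiIndex, the loc indexer now will raise UnsortedIndexError only if proper slicing is used on non-sorted levels (GH 16734).
Fixes regression in 0.20.3 when indexing with a string on a TimedeltaIndex (GH 16896).
Fixed TimedeltaIndex.get_loc() handling of np.timedelta64 inputs (GH 16909).
Fix MultiIndex.sort_index() ordering when the ascending argument is a list, but not all levels are specified, or they are in a different order (GH 16934).
Fixes bug where indexing with np.inf caused an OverflowError to be raised (GH 16957)
Bug in reindexing on an empty CategoricalIndex (GH 16770)
Fixes DataFrame.loc for setting with alignment and tz-aware DatetimeIndex (GH 16889)
Avoids IndexError when passing an Index or Series to .iloc with older numpy (GH 17193)
Allow unicode empty strings as placeholders in multilevel columns in Python 2 (GH 17099)
Bug in .iloc when used with inplace addition or assignment and an int indexer on a MultiIndex, causing the wrong indexes to be read from and written to (GH 17148)
Bug in .isin() in which checking membership in empty Series objects raised an error (GH 16991)
Bug in CategoricalIndex reindexing in which specified indices containing duplicates were not being respected (GH 17323)
Bug in intersection of RangeIndex with negative step (GH 17296)
Bug in IntervalIndex where performing a scalar lookup fails for included right endpoints of non-overlapping monotonic decreasing indexes (GH 16417, GH 17271)
Bug in DataFrame.first_valid_index() and DataFrame.last_valid_index() when there is no valid entry (GH 17400)
Bug in Series.rename() when called with a callable, which incorrectly altered the name of the Series rather than the name of the Index. (GH 17407)
Bug in String.str_get() raising IndexError instead of inserting NaNs when using a negative index. (GH 17704)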

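IO#

Bug in read_hdf() when reading a timezone-aware index from a fixed format HDFStore (GH 17618)
Bug in read_csv() in which columns were not being thoroughly de-duplicated (GH 17060)
Bug in read_csv() in which specified column names were not being thoroughly de-duplicated (GH 17095)
Bug in read_csv() in which non-integer values for the header argument generated an unhelpful / unrelated error message (GH 16338)
Bug in read_csv() in which memory management issues in exception handling, under certain conditions, would cause the interpreter to segfault (GH 14696, GH 16798).
Bug in read_csv() when called with low_memory=False in which a CSV with at least one column > 2GB in size would incorrectly raise a MemoryError (GH 16798).
Bug in read_csv() when called with a single-element list header would return a DataFrame of all NaN values (GH 7757)
Bug in DataFrame.to_csv() defaulting to 'ascii' encoding in Python 3, instead of 'utf-8' (GH 17097)
Bug in read_stata() where value labels could not be read when using an iterator (GH 16923)
Bug in read_stata() where the index was not set (GH 16342)
Bug in read_html() where the import check fails when run in multiple threads (GH 16928)
Bug in read_csv() where automatic delimiter detection caused a TypeError to be thrown when a bad line was encountered, rather than the correct error message (GH 13374)
Bug in DataFrame.to_html() with notebook=True where DataFrames with named indices or non-MultiIndex indices had undesired horizontal or vertical alignment for column or row labels, respectively (GH 16792)
Bug in DataFrame.to_html() in which there was no validation of the justify parameter (GH 17527)
Bug in HDFStore.select() when reading a contiguous mixed-data table featuring VLArray (GH 17021)
Bug in to_json() where several conditions (including objects with unprintable symbols, objects with deep recursion, overlong labels) caused segfaults instead of raising the appropriate exception (GH 14256)

Plotting#

Bug in plotting methods using secondary_y and fontsize not setting the secondary axis font size (GH 12565)
Bug when plotting timedelta and datetime dtypes on the y-axis (GH 16953)
Line plots no longer assume monotonic x data when calculating xlims; they now show the entire lines even for unsorted x data. (GH 11310, GH 11471)
With matplotlib 2.0.0 and above, calculation of x limits for line plots is left to matplotlib, so that its new default settings are applied. (GH 15495)
Bug in Series.plot.bar or DataFrame.plot.bar with y not respecting user-passed color (GH 16822)
Bug causing plotting.parallel_coordinates to reset the random seed when using random colors (GH 17525)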

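GroupBy/resample/rolling#

Bug in DataFrame.resample(...).size() where an empty DataFrame did not return a Series (GH 14962)
Bug in infer_freq() causing indices with 2-day gaps during the working week to be wrongly inferred as business daily (GH 16624)
Bug in .rolling(...).quantile() which incorrectly used different defaults than Series.quantile() and DataFrame.quantile() (GH 9413, GH 16211)
Bug in groupby.transform() that would coerce boolean dtypes back to float (GH 16875)
Bug in Series.resample(...).apply() where an empty Series modified the source index and did not return the name of a Series (GH 14313)
Bug in .rolling(...).apply(...) with a DataFrame with a DatetimeIndex, a window of a timedelta-convertible and min_periods >= 1 (GH 15305)
Bug in DataFrame.groupby where index and column keys were not recognized correctly when the number of keys equaled the number of elements on the groupby axis (GH 16859)
Bug in groupby.nunique() with TimeGrouper which could not handle NaT correctly (GH 17575)
Bug in DataFrame.groupby where a single level selection from a MultiIndex unexpectedly sorts (GH 17537)
Bug in DataFrame.groupby where a spurious warning is raised when a Grouper object is used to override an ambiguous column name (GH 17383)
Bug in TimeGrouper behaving differently when passed as a list and as a scalar (GH 17530)

Sparse#

Bug in SparseSeries raising AttributeError when a dictionary is passed in as data (GH 16905)
Bug in SparseDataFrame.fillna() not filling all NaNs when the frame was instantiated from a SciPy sparse matrix (GH 16112)
Bug in SparseSeries.unstack() and SparseDataFrame.stack() (GH 16614, GH 15045)
Bug in make_sparse() treating two numeric/boolean values that have the same bits as the same when the array dtype is object (GH 17574)
SparseArray.all() and SparseArray.any() are now implemented to handle SparseArray; these were used but not implemented (GH 17570)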

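Reshaping#

Joining/merging with a non-unique PeriodIndex raised a TypeError (GH 16871)
Bug in crosstab() where non-aligned series of integers were cast to float (GH 17005)
Bug in merging with categorical dtypes with datetimelikes incorrectly raised a TypeError (GH 16900)
Bug when using isin() on a large object series and large comparison array (GH 16012)
Fixes regression from 0.20: Series.aggregate() and DataFrame.aggregate() allow dictionaries as return values again (GH 16741)
Fixes dtype of result with integer dtype input, from pivot_table() when called with margins=True (GH 17013)
Bug in crosstab() where passing two Series with the same name raised a KeyError (GH 13279)
Series.argmin(), Series.argmax(), and their counterparts on DataFrame and groupby objects work correctly with floating point data that contains infinite values (GH 13595).
Bug in unique() where checking a tuple of strings raised a TypeError (GH 17108)
Bug in concat() where the order of the result index was unpredictable if it contained non-comparable elements (GH 17344)
Fixes regression when sorting by multiple columns on a datetime64 dtype Series with NaT values (GH 16836)
Bug in pivot_table() where the result's columns did not preserve the categorical dtype of columns when dropna was False (GH 17842)
Bug in DataFrame.drop_duplicates where dropping with non-unique column names raised a ValueError (GH 17836)
Bug in unstack() which, when called on a list of levels, would discard the fillna argument (GH 13971)
Bug in the alignment of range objects and other list-likes with DataFrame leading to operations being performed row-wise instead of column-wise (GH 17901)

Numeric#

Bug in .clip() with axis=1 and a list-like passed for threshold; previously this raised ValueError (GH 15390)
Series.clip() and DataFrame.clip() now treat NA values for upper and lower arguments as None instead of raising ValueError (GH 17276).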

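Categorical#

Bug in Series.isin() when called with a categorical (GH 16639)
Bug in the categorical constructor with empty values and categories causing the .categories to be an empty Float64Index rather than an empty Index with object dtype (GH 17248)
Bug in categorical operations with Series.cat not preserving the original Series' name (GH 17509)
Bug in DataFrame.merge() failing for categorical columns with boolean/int data types (GH 17187)
Bug in constructing a Categorical/CategoricalDtype when the specified categories are of categorical type (GH 17884).

PyPy#

Compatibility with PyPy in read_csv() with usecols=[<unsorted ints>] and read_json() (GH 17351)
Split tests into cases for CPython and PyPy where needed, which highlights the fragility of index matching with float('nan'), np.nan and NAT (GH 17351)
Fix DataFrame.memory_usage() to support PyPy. Objects on PyPy do not have a fixed size, so an approximation is used instead (GH 17228)

Other#

Bug where some inplace operators were not being wrapped and produced a copy when invoked (GH 12962)
Bug in eval() where the inplace parameter was being incorrectly handled (GH 16732)

Contributors#

A total of 206 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.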

3553x +
Aaron Barber
Adam Gleave +
Adam Smith +
AdamShamlian +
Adrian Liaw +
Alan Velasco +
Alan Yee +
Alex B +
Alex Lubbock +
Alex Marchenko +
Alex Rychyk +
Amol K +
Andreas Winkler
Andrew +
Andrew 亮
André Jonasson +
Becky Sweger
Berkay +
Bob Haffner +
Bran Yang
Brian Tu +
Brock Mendel +
Carol Willing +
Carter Green +
Chankey Pathak +
Chris
Chris Billington
Chris Filo Gorgolewski +
Chris Kerr
Chris M +
Chris Mazzullo +
Christian Prinoth
Christian Stade-Schuldt
Christoph Moehl +
DSM
Daniel Chen +
Daniel Grady
Daniel Himmelstein
Dave Willmer
David Cook
David Gwynne
David Read +
Dillon Niederhut +
Douglas Rudd
Eric Stein +
Eric Wieser +
Erik Fredriksen
Florian Wilhelm +
Floris Kint +
Forbidden Donut
Gabe F +
Giftlin +
Giftlin Rajaiah +
Giulio Pepe +
Guilherme Beltramini
Guillem Borrell +
Hanmin Qin +
Hendrik Makait +
Hugues Valois
Hussain Tamboli +
Iva Miholic +
Jan Novotný +
Jan Rudolph
Jean Helie +
Jean-Baptiste Schiratti +
Jean-Mathieu Deschenes
Jeff Knupp +
Jeff Reback
Jeff Tratner
JennaVergeynst
JimStearns206
Joel Nothman
John W. O’Brien
Jon Crall +
Jon Mease
Jonathan J. Helmus +
Joris Van den Bossche
JosephWagner
Juarez Bochi
Julian Kuhlmann +
Karel De Brabandere
Kassandra Keeton +
Keiron Pizzey +
Keith Webber
Kernc
Kevin Sheppard
Kirk Hansen +
Licht Takeuchi +
Lucas Kushner +
Mahdi Ben Jelloul +
Makarov Andrey +
Malgorzata Turzanska +
Marc Garcia +
Margaret Sy +
MarsGuy +
Matt Bark +
Matthew Roeschke
Matti Picus
Mehmet Ali “Mali” Akmanalp
Michael Gasvoda +
Michael Penkov +
Milo +
Morgan Stuart +
Morgan243 +
Nathan Ford +
Nick Eubank
Nick Garvey +
Oleg Shteynbuk +
P-Tillmann +
Pankaj Pandey
Patrick Luo
Patrick O’Melveny
Paul Reidy +
Paula +
Peter Quackenbush
Peter Yanovich +
Phillip Cloud
Pierre Haessig
Pietro Battiston
Pradyumna Reddy Chinthala
Prasanjit Prakash
RobinFiveWords
Ryan Hendrickson
Sam Foo
Sangwoong Yoon +
Simon Gibbons +
SimonBaron
Steven Cutting +
Sudeep +
Sylvia +
T N +
Telt
Thomas A Caswell
Tim Swast +
Tom Augspurger
Tong SHEN
Tuan +
Utkarsh Upadhyay +
Vincent La +
Vivek +
WANG Aiyong
WBare
Wes McKinney
XF +
Yi Liu +
Yosuke Nakabayashi +
aaron315 +
abarber4gh +
aernlund +
agustín méndez +
andymaheshw +
ante328 +
aviolov +
bpraggastis
cbertinato +
cclauss +
chernrick
chris-b1
dkamm +
dwkenefick
economy
faic +
fding253 +
gfyoung
guygoldberg +
hhuuggoo +
huashuai +
ian
iulia +
jaredsnyder
jbrockmendel +
jdeschenes
jebob +
jschendel +
keitakurita
kernc +
kiwirob +
kjford
linebp
lloydkirk
louispotok +
majiang +
manikbhandari +
margotphoenix +
matthiashuschle +
mattip
mjlove12 +
nmartensen +
pandas-docs-bot +
parchd-1 +
philipphanemann +
rdk1024 +
reidy-p +
ri938
ruiann +
rvernica +
s-weigand +
scotthavard92 +
skwbc +
step4me +
tobycheese +
topper-123 +
tsdlovell
ysau +
zzgao +
