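What's new in 1.0.0 (January 29, 2020)#

These are the changes in pandas 1.0.0. See the release notes for a full changelog including other versions of pandas.

Note

The pandas 1.0 release removed a lot of functionality that was deprecated in previous releases (see below for an overview). It is recommended to first upgrade to pandas 0.25 and to ensure your code works without warnings before upgrading to pandas 1.0.

New deprecation policy#

Starting with pandas 1.0.0, pandas will adopt a variant of SemVer to version releases. Briefly,

Deprecations will be introduced in minor releases (e.g. 1.1.0, 1.2.0, 2.1.0, …)
Deprecations will be enforced in major releases (e.g. 1.0.0, 2.0.0, 3.0.0, …)
API-breaking changes will be made only in major releases (except for experimental features)

See the version policy for more.

Enhancements#

Using Numba in `rolling.apply` and `expanding.apply`#

We've added an engine keyword to Rolling.apply() and Expanding.apply() that allows the user to execute the routine using Numba instead of Cython. Using the Numba engine can yield significant performance gains if the apply function can operate on numpy arrays and the data set is large (1 million rows or more). For more details, see the rolling apply documentation (GH 28987, GH 30936)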

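Defining custom windows for rolling operations#

We've added a pandas.api.indexers.BaseIndexer() class that allows users to define how window bounds are created during rolling operations. Users can define their own get_window_bounds method on a pandas.api.indexers.BaseIndexer() subclass that will generate the start and end indices used for each window during the rolling aggregation. For more details and example usage, see the custom window rolling documentation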
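A minimal sketch of such a subclass follows. The class name ForwardWindowIndexer is hypothetical, and the extra step parameter is an assumption added so the signature also tolerates later pandas versions that pass it. Each window starts at the current row and looks forward window_size rows.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class ForwardWindowIndexer(BaseIndexer):
    """Hypothetical indexer: window i covers rows i .. i + window_size - 1."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        # One start/end pair per row; ends are clipped at the last row.
        start = np.arange(num_values, dtype=np.int64)
        end = np.minimum(start + self.window_size, num_values).astype(np.int64)
        return start, end


s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
indexer = ForwardWindowIndexer(window_size=2)
result = s.rolling(indexer, min_periods=1).sum()
```

With min_periods=1 the last row, whose window contains only itself, still produces a value.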

Converting to markdown#

We've added to_markdown() for creating a markdown table (GH 11052)

In [1]: df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])

In [2]: print(df.to_markdown())
|    |   A |   B |
|:---|----:|----:|
| a  |   1 |   1 |
| a  |   2 |   2 |
| b  |   3 |   3 |

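Experimental new features#

Experimental `NA` scalar to denote missing values#

A new pd.NA value (singleton) is introduced to represent scalar missing values. Up to now, pandas used several values to represent missing data: np.nan for float data, np.nan or None for object-dtype data, and pd.NaT for datetime-like data. The goal of pd.NA is to provide a "missing" indicator that can be used consistently across data types. pd.NA is currently used by the nullable integer and boolean data types and the new string data type (GH 28095).

Warning

Experimental: the behaviour of pd.NA can still change without warning.

For example, creating a Series using the nullable integer dtype: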

In [3]: s = pd.Series([1, 2, None], dtype="Int64")

In [4]: s
Out[4]: 
0       1
1       2
2    <NA>
dtype: Int64

In [5]: s[2]
Out[5]: <NA>

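Compared to np.nan, pd.NA behaves differently in certain operations. In addition to arithmetic operations, pd.NA also propagates as "missing" or "unknown" in comparison operations: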

In [6]: np.nan > 1
Out[6]: False

In [7]: pd.NA > 1
Out[7]: <NA>

For logical operations, pd.NA follows the rules of three-valued logic (or Kleene logic). For example:

In [8]: pd.NA | True
Out[8]: True

For more, see the NA section in the user guide on missing data.
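The Kleene rule can be summarised as: the result is only missing when it actually depends on the missing operand. A small sketch (the variable names are illustrative only):

```python
import pandas as pd

# The outcome is already determined by the known operand.
known_or = pd.NA | True     # True
known_and = pd.NA & False   # False

# The outcome depends on the missing operand, so it stays missing.
unknown_or = pd.NA | False  # <NA>
unknown_and = pd.NA & True  # <NA>
```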

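Dedicated string data type#

We've added StringDtype, an extension type dedicated to string data. Previously, strings were typically stored in object-dtype NumPy arrays. (GH 29975)

Warning

StringDtype is currently considered experimental. The implementation and parts of the API may change without warning.

The 'string' extension type solves several issues with object-dtype NumPy arrays:

You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.
object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn't a clear way to select just text while excluding columns that are non-text but still object-dtype.
When reading code, the contents of an object dtype array are less clear than string.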

In [9]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype())
Out[9]: 
0     abc
1    <NA>
2     def
dtype: string

You can use the alias "string" as well.

In [10]: s = pd.Series(['abc', None, 'def'], dtype="string")

In [11]: s
Out[11]: 
0     abc
1    <NA>
2     def
dtype: string

The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.

In [12]: s.str.upper()
Out[12]: 
0     ABC
1    <NA>
2     DEF
dtype: string

In [13]: s.str.split('b', expand=True).dtypes
Out[13]: 
0    string[python]
1    string[python]
dtype: object

String accessor methods returning integers will return a value with Int64Dtype

In [14]: s.str.count("a")
Out[14]: 
0       1
1    <NA>
2       0
dtype: Int64

We recommend explicitly using the string data type when working with strings. See Text data types for more.

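Boolean data type with missing values support#

We've added BooleanDtype / BooleanArray, an extension type dedicated to boolean data that can hold missing values. With the default bool data type, which is based on a bool-dtype NumPy array, a column can only hold True or False, and not missing values. The new BooleanArray can store missing values as well by keeping track of them in a separate mask. (GH 29555, GH 30095, GH 31131)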

In [15]: pd.Series([True, False, None], dtype=pd.BooleanDtype())
Out[15]: 
0     True
1    False
2     <NA>
dtype: boolean

You can use the alias "boolean" as well.

In [16]: s = pd.Series([True, False, None], dtype="boolean")

In [17]: s
Out[17]: 
0     True
1    False
2     <NA>
dtype: boolean
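As a brief sketch of how the mask-backed array behaves in logical operations, and of one way to get back to a plain NumPy mask: treating missing entries as False here is an assumption of this example, not a pandas default.

```python
import pandas as pd

flags = pd.array([True, False, None], dtype="boolean")

or_true = flags | True   # the missing entry becomes True
and_true = flags & True  # the missing entry stays <NA>

# Convert to a plain NumPy bool mask, choosing explicitly that
# missing entries count as False (an assumption for this sketch).
mask = flags.to_numpy(dtype=bool, na_value=False)
```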

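Method `convert_dtypes` to ease use of supported extension dtypes#

In order to encourage use of the extension dtypes StringDtype, BooleanDtype, Int64Dtype, Int32Dtype, etc., that support pd.NA, the methods DataFrame.convert_dtypes() and Series.convert_dtypes() have been introduced. (GH 29752) (GH 30929)

Example: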

In [18]: df = pd.DataFrame({'x': ['abc', None, 'def'],
   ....:                    'y': [1, 2, np.nan],
   ....:                    'z': [True, False, True]})
   ....: 

In [19]: df
Out[19]: 
      x    y      z
0   abc  1.0   True
1  None  2.0  False
2   def  NaN   True

In [20]: df.dtypes
Out[20]: 
x     object
y    float64
z       bool
dtype: object

In [21]: converted = df.convert_dtypes()

In [22]: converted
Out[22]: 
      x     y      z
0   abc     1   True
1  <NA>     2  False
2   def  <NA>   True

In [23]: converted.dtypes
Out[23]: 
x    string[python]
y             Int64
z           boolean
dtype: object

This is especially useful after reading in data using readers such as read_csv() and read_excel(). See here for a description.
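A small sketch of that reader workflow, using an in-memory CSV (the sample text and variable names are made up for illustration). The integer column is read as float64 because of the missing value, and convert_dtypes() then moves it to the nullable Int64 dtype.

```python
import io

import pandas as pd

# Hypothetical sample data with one missing value in x and one in y.
csv_text = "x,y,z\nabc,1,True\n,2,False\ndef,,True\n"

raw = pd.read_csv(io.StringIO(csv_text))
converted = raw.convert_dtypes()

raw_y_dtype = str(raw["y"].dtype)
converted_y_dtype = str(converted["y"].dtype)
converted_z_dtype = str(converted["z"].dtype)
```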

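Other enhancements#

DataFrame.to_string() added the max_colwidth parameter to control when wide columns are truncated (GH 9784)
Added the na_value argument to Series.to_numpy(), Index.to_numpy() and DataFrame.to_numpy() to control the value used for missing data (GH 30322)
MultiIndex.from_product() infers level names from inputs if not explicitly provided (GH 27292)
DataFrame.to_latex() now accepts caption and label arguments (GH 25436)
DataFrames with nullable integer, the new string dtype and period data type can now be converted to pyarrow (>=0.15.0), which means that writing to the Parquet file format is supported when using the pyarrow engine (GH 28368). Full roundtrip to parquet (writing and reading back in with to_parquet() / read_parquet()) is supported starting with pyarrow >= 0.16 (GH 20612).
to_parquet() now appropriately handles the schema argument for user-defined schemas in the pyarrow engine. (GH 30270)
DataFrame.to_json() now accepts an indent integer argument to enable pretty printing of JSON output (GH 12004)
read_stata() can read Stata 119 dta files. (GH 28250)
Implemented Window.var() and Window.std() functions (GH 26597)
Added encoding argument to DataFrame.to_string() for non-ASCII text (GH 28766)
Added encoding argument to DataFrame.to_html() for non-ASCII text (GH 28663)
Styler.background_gradient() now accepts vmin and vmax arguments (GH 12145)
Styler.format() added the na_rep parameter to help format missing values (GH 21527, GH 28358)
read_excel() can now read binary Excel (.xlsb) files by passing engine='pyxlsb'. For more details and example usage, see the Binary Excel files documentation. Closes GH 8540.
The partition_cols argument in DataFrame.to_parquet() now accepts a string (GH 27117)
pandas.read_json() now parses NaN, Infinity and -Infinity (GH 12213)
The DataFrame constructor preserves the ExtensionArray dtype when given an ExtensionArray (GH 11363)
DataFrame.sort_values() and Series.sort_values() have gained the ignore_index keyword to be able to reset the index after sorting (GH 30114)
DataFrame.sort_index() and Series.sort_index() have gained the ignore_index keyword to reset the index (GH 30114)
DataFrame.drop_duplicates() has gained the ignore_index keyword to reset the index (GH 30114)
Added a new writer for exporting Stata dta files in versions 118 and 119, StataWriterUTF8. These file formats support exporting strings containing Unicode characters. Format 119 supports data sets with more than 32,767 variables (GH 23573, GH 30959)
Series.map() now accepts collections.abc.Mapping subclasses as a mapper (GH 29733)
Added an experimental attrs for storing global metadata about a dataset (GH 29062)
Timestamp.fromisocalendar() is now compatible with Python 3.8 and above (GH 28115)
DataFrame.to_pickle() and read_pickle() now accept URLs (GH 30163)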
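A few of the smaller additions listed above can be sketched together as follows. The data and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})

# ignore_index=True relabels the sorted rows 0..n-1.
sorted_df = df.sort_values("a", ignore_index=True)

# na_value controls what missing entries become in the NumPy result.
arr = pd.Series([1, None], dtype="Int64").to_numpy(dtype="float64", na_value=np.nan)

# indent enables pretty-printed JSON output.
text = df.to_json(indent=2)

# Level names are inferred from the named inputs.
mi = pd.MultiIndex.from_product(
    [pd.Index([1, 2], name="num"), pd.Index(["a", "b"], name="letter")]
)
```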

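Backwards incompatible API changes#

Avoid using names from `MultiIndex.levels`#

As part of a larger refactor to MultiIndex, the level names are now stored separately from the levels (GH 27242). We recommend using MultiIndex.names to access the names, and Index.set_names() to update the names.

For backwards compatibility, you can still access the names via the levels.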

In [24]: mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])

In [25]: mi.levels[0].name
Out[25]: 'x'

However, it is no longer possible to update the names of the MultiIndex via the level.

In [26]: mi.levels[0].name = "new name"
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[26], line 1
----> 1 mi.levels[0].name = "new name"

File /home/pandas/pandas/core/indexes/base.py:1743, in Index.name(self, value)
   1739 @name.setter
   1740 def name(self, value: Hashable) -> None:
   1741     if self._no_setting_name:
   1742         # Used in MultiIndex.levels to avoid silently ignoring name updates.
-> 1743         raise RuntimeError(
   1744             "Cannot set name on a level of a MultiIndex. Use "
   1745             "'MultiIndex.set_names' instead."
   1746         )
   1747     maybe_extract_name(value, None, type(self))
   1748     self._name = value

RuntimeError: Cannot set name on a level of a MultiIndex. Use 'MultiIndex.set_names' instead.

In [27]: mi.names
Out[27]: FrozenList(['x', 'y'])

To update, use MultiIndex.set_names, which returns a new MultiIndex.

In [28]: mi2 = mi.set_names("new name", level=0)

In [29]: mi2.names
Out[29]: FrozenList(['new name', 'y'])

New repr for `IntervalArray`#

pandas.arrays.IntervalArray adopts a new __repr__ in accordance with other array classes (GH 25022)

pandas 0.25.x

In [1]: pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])
Out[2]:
IntervalArray([(0, 1], (2, 3]],
              closed='right',
              dtype='interval[int64]')

pandas 1.0.0

In [30]: pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])
Out[30]: 
<IntervalArray>
[(0, 1], (2, 3]]
Length: 2, dtype: interval[int64, right]

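`DataFrame.rename` now only accepts one positional argument#

DataFrame.rename() would previously accept positional arguments that would lead to ambiguous or undefined behavior. From pandas 1.0, only the very first argument, which maps labels to their new names along the default axis, is allowed to be passed by position (GH 29136).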

pandas 0.25.x

In [1]: df = pd.DataFrame([[1]])
In [2]: df.rename({0: 1}, {0: 2})
Out[2]:
FutureWarning: ...Use named arguments to resolve ambiguity...
   2
1  1

pandas 1.0.0

In [3]: df.rename({0: 1}, {0: 2})
Traceback (most recent call last):
...
TypeError: rename() takes from 1 to 2 positional arguments but 3 were given

Note that errors will now be raised when conflicting or potentially ambiguous arguments are provided.

pandas 0.25.x

In [4]: df.rename({0: 1}, index={0: 2})
Out[4]:
   0
1  1

In [5]: df.rename(mapper={0: 1}, index={0: 2})
Out[5]:
   0
2  1

pandas 1.0.0

In [6]: df.rename({0: 1}, index={0: 2})
Traceback (most recent call last):
...
TypeError: Cannot specify both 'mapper' and any of 'index' or 'columns'

In [7]: df.rename(mapper={0: 1}, index={0: 2})
Traceback (most recent call last):
...
TypeError: Cannot specify both 'mapper' and any of 'index' or 'columns'

You can still change the axis to which the first positional argument is applied by supplying the axis keyword argument.

In [31]: df.rename({0: 1})
Out[31]: 
   0
1  1

In [32]: df.rename({0: 1}, axis=1)
Out[32]: 
   1
0  1

If you would like to update both the index and column labels, be sure to use the respective keywords.

In [33]: df.rename(index={0: 1}, columns={0: 2})
Out[33]: 
   2
1  1

Extended verbose info output for `DataFrame`#

DataFrame.info() now shows line numbers for the columns summary (GH 17304)

pandas 0.25.x

In [1]: df = pd.DataFrame({"int_col": [1, 2, 3],
...                    "text_col": ["a", "b", "c"],
...                    "float_col": [0.0, 0.1, 0.2]})
In [2]: df.info(verbose=True)
<class 'pandas.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
int_col      3 non-null int64
text_col     3 non-null object
float_col    3 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 152.0+ bytes

pandas 1.0.0

In [34]: df = pd.DataFrame({"int_col": [1, 2, 3],
   ....:                    "text_col": ["a", "b", "c"],
   ....:                    "float_col": [0.0, 0.1, 0.2]})
   ....: 

In [35]: df.info(verbose=True)
<class 'pandas.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   int_col    3 non-null      int64  
 1   text_col   3 non-null      object 
 2   float_col  3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes

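`pandas.array()` inference changes#

pandas.array() now infers pandas' new extension types in several cases (GH 29791):

String data (including missing values) now returns an arrays.StringArray.
Integer data (including missing values) now returns an arrays.IntegerArray.
Boolean data (including missing values) now returns the new arrays.BooleanArray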

pandas 0.25.x

In [1]: pd.array(["a", None])
Out[1]:
<PandasArray>
['a', None]
Length: 2, dtype: object

In [2]: pd.array([1, None])
Out[2]:
<PandasArray>
[1, None]
Length: 2, dtype: object

pandas 1.0.0

In [36]: pd.array(["a", None])
Out[36]: 
<StringArray>
['a', <NA>]
Length: 2, dtype: string

In [37]: pd.array([1, None])
Out[37]: 
<IntegerArray>
[1, <NA>]
Length: 2, dtype: Int64

As a reminder, you can specify the dtype to disable all inference.
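For instance, a minimal sketch contrasting the inferred and the explicit case (variable names are illustrative):

```python
import pandas as pd

inferred = pd.array([1, None])                # inferred as nullable Int64
explicit = pd.array([1, None], dtype=object)  # inference disabled

inferred_name = str(inferred.dtype)
explicit_name = str(explicit.dtype)
```

With dtype=object the None is kept as-is instead of being converted to pd.NA.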

`arrays.IntegerArray` now uses `pandas.NA`#

arrays.IntegerArray now uses pandas.NA rather than numpy.nan as its missing value marker (GH 29964).

pandas 0.25.x

In [1]: a = pd.array([1, 2, None], dtype="Int64")
In [2]: a
Out[2]:
<IntegerArray>
[1, 2, NaN]
Length: 3, dtype: Int64

In [3]: a[2]
Out[3]:
nan

pandas 1.0.0

In [38]: a = pd.array([1, 2, None], dtype="Int64")

In [39]: a
Out[39]: 
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

In [40]: a[2]
Out[40]: <NA>

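This has a few API-breaking consequences.

Converting to a NumPy ndarray

When converting to a NumPy array, missing values will be pd.NA, which cannot be converted to a float. So calling np.asarray(integer_array, dtype="float") will now raise.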

pandas 0.25.x

In [1]: np.asarray(a, dtype="float")
Out[1]:
array([ 1.,  2., nan])

pandas 1.0.0

In [41]: np.asarray(a, dtype="float")
Traceback (most recent call last):
...
TypeError: float() argument must be a string or a number, not 'NAType'

Use arrays.IntegerArray.to_numpy() with an explicit na_value instead.

In [42]: a.to_numpy(dtype="float", na_value=np.nan)
Out[42]: array([ 1.,  2., nan])

Reductions can return pd.NA

When performing a reduction such as a sum with skipna=False, the result will now be pd.NA instead of np.nan in the presence of missing values (GH 30958).

pandas 0.25.x

In [1]: pd.Series(a).sum(skipna=False)
Out[1]:
nan

pandas 1.0.0

In [43]: pd.Series(a).sum(skipna=False)
Out[43]: <NA>

value_counts returns a nullable integer dtype

Series.value_counts() with a nullable integer dtype now returns a nullable integer dtype for the values.

pandas 0.25.x

In [1]: pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype
Out[1]:
dtype('int64')

pandas 1.0.0

In [44]: pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype
Out[44]: Int64Dtype()

See NA semantics for more on the differences between pandas.NA and numpy.nan.

`arrays.IntegerArray` comparisons return `arrays.BooleanArray`#

Comparison operations on an arrays.IntegerArray now return an arrays.BooleanArray rather than a NumPy array (GH 29964).

pandas 0.25.x

In [1]: a = pd.array([1, 2, None], dtype="Int64")
In [2]: a
Out[2]:
<IntegerArray>
[1, 2, NaN]
Length: 3, dtype: Int64

In [3]: a > 1
Out[3]:
array([False,  True, False])

pandas 1.0.0

In [45]: a = pd.array([1, 2, None], dtype="Int64")

In [46]: a > 1
Out[46]: 
<BooleanArray>
[False, True, <NA>]
Length: 3, dtype: boolean

Note that missing values now propagate, rather than always comparing unequal like numpy.nan. See NA semantics for more.
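Because the comparison result can now contain missing values, code that needs a plain boolean mask has to decide how to treat them. One possible approach is sketched below; counting a missing comparison as False is an assumption of this example.

```python
import pandas as pd

a = pd.array([1, 2, None], dtype="Int64")
mask = a > 1  # BooleanArray: [False, True, <NA>]

# Choose explicitly that a missing comparison counts as False.
as_false = mask.to_numpy(dtype=bool, na_value=False)

# Or inspect how many comparisons were undetermined.
n_missing = int(mask.isna().sum())
```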

By default `Categorical.min()` now returns the minimum instead of np.nan#

When a Categorical contains np.nan, Categorical.min() no longer returns np.nan by default (skipna=True) (GH 25303)

pandas 0.25.x

In [1]: pd.Categorical([1, 2, np.nan], ordered=True).min()
Out[1]: nan

pandas 1.0.0

In [47]: pd.Categorical([1, 2, np.nan], ordered=True).min()
Out[47]: 1

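Default dtype of empty `pandas.Series`#

Initialising an empty pandas.Series without specifying a dtype will now raise a DeprecationWarning (GH 17261). The default dtype will change from float64 to object in future releases so that it is consistent with the behaviour of DataFrame and Index.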

pandas 1.0.0

In [1]: pd.Series()
Out[2]:
DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
Series([], dtype: float64)
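Following the advice in the warning message, passing the dtype explicitly avoids both the warning and any dependence on the future default. A minimal sketch:

```python
import pandas as pd

# An explicit dtype silences the DeprecationWarning and pins the result.
empty_float = pd.Series(dtype="float64")
empty_object = pd.Series(dtype="object")
```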

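Result dtype inference changes for resample operations#

The rules for the result dtype in DataFrame.resample() aggregations have changed for extension types (GH 31359). Previously, pandas would attempt to convert the result back to the original dtype, falling back to the usual inference rules if that was not possible. Now, pandas will only return a result of the original dtype if the scalar values in the result are instances of the extension dtype's scalar type.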

In [48]: df = pd.DataFrame({"A": ['a', 'b']}, dtype='category',
   ....:                   index=pd.date_range('2000', periods=2))
   ....: 

In [49]: df
Out[49]: 
            A
2000-01-01  a
2000-01-02  b

pandas 0.25.x

In [1]: df.resample("2D").agg(lambda x: 'a').A.dtype
Out[1]:
CategoricalDtype(categories=['a', 'b'], ordered=False)

pandas 1.0.0

In [50]: df.resample("2D").agg(lambda x: 'a').A.dtype
Out[50]: CategoricalDtype(categories=['a', 'b'], ordered=False, categories_dtype=object)

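This fixes an inconsistency between resample and groupby. It also fixes a potential bug where the values of the result might change depending on how the results are cast back to the original dtype.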

pandas 0.25.x

In [1]: df.resample("2D").agg(lambda x: 'c')
Out[1]:

     A
0  NaN

pandas 1.0.0

In [51]: df.resample("2D").agg(lambda x: 'c')
Out[51]: 
            A
2000-01-01  c

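Increased minimum version for Python#

pandas 1.0.0 supports Python 3.6.1 and higher (GH 29212).

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated (GH 29766, GH 29723). If installed, we now require:

Package	Minimum Version	Required
numpy	1.13.3	X
pytz	2015.4	X
python-dateutil	2.6.1	X
bottleneck	1.2.1
numexpr	2.6.2
pytest (dev)	4.0.2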

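For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed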
beautifulsoup4	4.6.0
fastparquet	0.3.2	X
gcsfs	0.2.2
lxml	3.8.0
matplotlib	2.2.2
numba	0.46.0	X
openpyxl	2.5.7	X
pyarrow	0.13.0	X
pymysql	0.7.1
pytables	3.4.2
s3fs	0.3.0	X
scipy	0.19.0
sqlalchemy	1.1.4
xarray	0.8.2
xlrd	1.1.0
xlsxwriter	0.9.8
xlwt	1.2.0

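See Dependencies and Optional dependencies for more.

Build changes#

pandas has added a pyproject.toml file and will no longer include cythonized files in the source distribution uploaded to PyPI (GH 28341, GH 20775). If you're installing a built distribution (wheel) or via conda, this shouldn't have any effect on you. If you're building pandas from source, you should no longer need to install Cython into your build environment before calling pip install pandas.

Other API changes#

DataFrameGroupBy.transform() and SeriesGroupBy.transform() now raise on invalid operation names (GH 27489)
pandas.api.types.infer_dtype() will now return "integer-na" for a mix of integers and np.nan (GH 27283)
MultiIndex.from_arrays() will no longer infer names from arrays if names=None is explicitly provided (GH 27292)
In order to improve tab-completion, pandas does not include most deprecated attributes when introspecting a pandas object using dir (e.g. dir(df)). To see which attributes are excluded, see an object's _deprecations attribute, for example pd.DataFrame._deprecations (GH 28805).
The returned dtype of unique() now matches the input dtype. (GH 27874)
Changed the default configuration value for options.matplotlib.register_converters from True to "auto" (GH 18720). Now, pandas custom formatters will only be applied to plots created by pandas, through plot(). Previously, pandas' formatters would be applied to all plots created after a plot(). See units registration for more.
Series.dropna() has dropped its **kwargs argument in favor of a single how parameter. Supplying anything other than how to **kwargs raised a TypeError previously (GH 29388)
When testing pandas, the new minimum required version of pytest is 5.0.1 (GH 29664)
Series.str.__iter__() was deprecated and will be removed in future releases (GH 28277).
Added <NA> to the list of default NA values for read_csv() (GH 30821)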

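Documentation improvements#

Added a new section on Scaling to large datasets (GH 28315).
Added a sub-section on Query MultiIndex for HDF5 datasets (GH 28791).

Deprecations#

Series.item() and Index.item() have been undeprecated (GH 29250)
Index.set_value has been deprecated. For a given index idx, array arr, value in idx of idx_val and a new value of val, idx.set_value(arr, idx_val, val) is equivalent to arr[idx.get_loc(idx_val)] = val, which should be used instead (GH 28621).
is_extension_type() is deprecated, is_extension_array_dtype() should be used instead (GH 29457)
eval() keyword argument "truediv" is deprecated and will be removed in a future version (GH 29812)
DateOffset.isAnchored() and DateOffset.onOffset() are deprecated and will be removed in a future version, use DateOffset.is_anchored() and DateOffset.is_on_offset() instead (GH 30340)
pandas.tseries.frequencies.get_offset is deprecated and will be removed in a future version, use pandas.tseries.frequencies.to_offset instead (GH 4205)
Categorical.take_nd() and CategoricalIndex.take_nd() are deprecated, use Categorical.take() and CategoricalIndex.take() instead (GH 27745)
The parameter numeric_only of Categorical.min() and Categorical.max() is deprecated and replaced with skipna (GH 25303)
The parameter label in lreshape() has been deprecated and will be removed in a future version (GH 29742)
pandas.core.index has been deprecated and will be removed in a future version, the public classes are available in the top-level namespace (GH 19711)
pandas.json_normalize() is now exposed in the top-level namespace. Usage of json_normalize as pandas.io.json.json_normalize is now deprecated and it is recommended to use json_normalize as pandas.json_normalize() instead (GH 27586).
The numpy argument of pandas.read_json() is deprecated (GH 28512).
DataFrame.to_stata(), DataFrame.to_feather(), and DataFrame.to_parquet() argument "fname" is deprecated, use "path" instead (GH 23574)
The deprecated internal attributes _start, _stop and _step of RangeIndex now raise a FutureWarning instead of a DeprecationWarning (GH 26581)
The pandas.util.testing module has been deprecated. Use the public API in pandas.testing documented at Assertion functions (GH 16232).
pandas.SparseArray has been deprecated. Use pandas.arrays.SparseArray (arrays.SparseArray) instead. (GH 30642)
The parameter is_copy of Series.take() and DataFrame.take() has been deprecated and will be removed in a future version. (GH 27357)
Support for multi-dimensional indexing (e.g. index[:, None]) on an Index is deprecated and will be removed in a future version, convert to a numpy array before indexing instead (GH 30588)
The pandas.np submodule is now deprecated. Import numpy directly instead (GH 30296)
The pandas.datetime class is now deprecated. Import from datetime instead (GH 30610)
diff will raise a TypeError in the future rather than implicitly losing the dtype of extension types. Convert to the correct dtype before calling diff instead (GH 31025)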
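Several of the replacements named in the list above can be sketched side by side. The sample data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype

# Index.set_value(arr, idx_val, val) -> assign through get_loc
idx = pd.Index(["a", "b", "c"])
arr = np.array([1, 2, 3])
arr[idx.get_loc("b")] = 20

# pandas.io.json.json_normalize -> pandas.json_normalize
flat = pd.json_normalize([{"id": 1, "info": {"name": "x"}}])

# pandas.util.testing -> pandas.testing
pd.testing.assert_frame_equal(flat, flat.copy())

# is_extension_type -> is_extension_array_dtype
is_ext = is_extension_array_dtype(pd.array([1, None], dtype="Int64"))
```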

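Selecting Columns from a Grouped DataFrame

When selecting columns from a DataFrameGroupBy object, passing individual keys (or a tuple of keys) inside single brackets is deprecated; a list of items should be used instead. (GH 23566) For example: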

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": np.random.randn(8),
    "C": np.random.randn(8),
})
g = df.groupby('A')

# single key, returns SeriesGroupBy
g['B']

# tuple of single key, returns SeriesGroupBy
g[('B',)]

# tuple of multiple keys, returns DataFrameGroupBy, raises FutureWarning
g[('B', 'C')]

# multiple keys passed directly, returns DataFrameGroupBy, raises FutureWarning
# (implicitly converts the passed strings into a single tuple)
g['B', 'C']

# proper way, returns DataFrameGroupBy
g[['B', 'C']]

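Removal of prior version deprecations/changes#

Removed SparseSeries and SparseDataFrame

SparseSeries, SparseDataFrame and the DataFrame.to_sparse method have been removed (GH 28425). We recommend using a Series or DataFrame with sparse values instead.

Matplotlib unit registration

Previously, pandas would register converters with matplotlib as a side effect of importing pandas (GH 18720). This changed the output of plots made via matplotlib after pandas was imported, even if you were using matplotlib directly rather than plot().

To use pandas formatters with a matplotlib plot, specify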

In [1]: import pandas as pd
In [2]: pd.options.plotting.matplotlib.register_converters = True

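Note that plots created by DataFrame.plot() and Series.plot() do register the converters automatically. The only behavior change is when plotting a date-like object via matplotlib.pyplot.plot or matplotlib.Axes.plot. See Custom formatters for timeseries plots for more.

Other removals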

Removed the previously deprecated keyword "index" from read_stata(), StataReader, and StataReader.read(), use "index_col" instead (GH 17328)
Removed the StataReader.data method, use StataReader.read() instead (GH 9493)
Removed pandas.plotting._matplotlib.tsplot, use Series.plot() instead (GH 19980)
pandas.tseries.converter.register has been moved to pandas.plotting.register_matplotlib_converters() (GH 18307)
Series.plot() no longer accepts positional arguments, pass keyword arguments instead (GH 30003)
DataFrame.hist() and Series.hist() no longer allow figsize="default", specify the figure size by passing a tuple instead (GH 30003)
Floordiv of an integer-dtyped array by Timedelta now raises TypeError (GH 21036)
TimedeltaIndex and DatetimeIndex no longer accept non-nanosecond dtype strings like "timedelta64" or "datetime64", use "timedelta64[ns]" and "datetime64[ns]" instead (GH 24806)
Changed the default "skipna" argument in pandas.api.types.infer_dtype() from False to True (GH 24050)
Removed Series.ix and DataFrame.ix (GH 26438)
Removed Index.summary (GH 18217)
Removed the previously deprecated keyword "fastpath" from the Index constructor (GH 23110)
Removed Series.get_value, Series.set_value, DataFrame.get_value, DataFrame.set_value (GH 17739)
Removed Series.compound and DataFrame.compound (GH 26405)
Changed the default "inplace" argument in DataFrame.set_index() and Series.set_axis() from None to False (GH 27600)
Removed Series.cat.categorical, Series.cat.index, Series.cat.name (GH 24751)
Removed the previously deprecated keyword "box" from to_datetime() and to_timedelta(); in addition these now always return DatetimeIndex, TimedeltaIndex, Index, Series, or DataFrame (GH 24486)
to_timedelta(), Timedelta, and TimedeltaIndex no longer allow "M", "y", or "Y" for the "unit" argument (GH 23264)
Removed the previously deprecated keyword "time_rule" from (non-public) offsets.generate_range, which has been moved to core.arrays._ranges.generate_range() (GH 24157)
DataFrame.loc() or Series.loc() with listlike indexers and missing labels will no longer reindex (GH 17295)
DataFrame.to_excel() and Series.to_excel() with non-existent columns will no longer reindex (GH 17295)
Removed the previously deprecated keyword "join_axes" from concat(); use reindex_like on the result instead (GH 22318)
Removed the previously deprecated keyword "by" from DataFrame.sort_index(), use DataFrame.sort_values() instead (GH 10726)
Removed support for nested renaming in DataFrame.aggregate(), Series.aggregate(), core.groupby.DataFrameGroupBy.aggregate(), core.groupby.SeriesGroupBy.aggregate(), core.window.rolling.Rolling.aggregate() (GH 18529)
Passing datetime64 data to TimedeltaIndex or timedelta64 data to DatetimeIndex now raises TypeError (GH 23539, GH 23937)
Passing int64 values to DatetimeIndex and a timezone now interprets the values as nanosecond timestamps in UTC, not wall times in the given timezone (GH 24559)
A tuple passed to DataFrame.groupby() is now exclusively treated as a single key (GH 18314)
Removed Index.contains, use key in index instead (GH 30103)
Addition and subtraction of int or integer-arrays is no longer allowed in Timestamp, DatetimeIndex, TimedeltaIndex, use obj + n * obj.freq instead of obj + n (GH 22535)
Removed Series.ptp (GH 21614)
Removed Series.from_array (GH 18258)
Removed DataFrame.from_items (GH 18458)
Removed DataFrame.as_matrix, Series.as_matrix (GH 18458)
Removed Series.asobject (GH 18477)
Removed DataFrame.as_blocks, Series.as_blocks, DataFrame.blocks, Series.blocks (GH 17656)
pandas.Series.str.cat() now defaults to aligning others, using join='left' (GH 27611)
pandas.Series.str.cat() no longer accepts list-likes within list-likes (GH 27611)
Series.where() with Categorical dtype (or DataFrame.where() with a Categorical column) no longer allows setting new categories (GH 24114)
Removed the previously deprecated keywords "start", "end", and "periods" from the DatetimeIndex, TimedeltaIndex, and PeriodIndex constructors; use date_range(), timedelta_range(), and period_range() instead (GH 23919)
Removed the previously deprecated keyword "verify_integrity" from the DatetimeIndex and TimedeltaIndex constructors (GH 23919)
Removed the previously deprecated keyword "fastpath" from pandas.core.internals.blocks.make_block (GH 19265)
Removed the previously deprecated keyword "dtype" from Block.make_block_same_class() (GH 19434)
Removed ExtensionArray._formatting_values. Use ExtensionArray._formatter instead. (GH 23601)
Removed MultiIndex.to_hierarchical (GH 21613)
Removed MultiIndex.labels, use MultiIndex.codes instead (GH 23752)
Removed the previously deprecated keyword "labels" from the MultiIndex constructor, use "codes" instead (GH 23752)
Removed MultiIndex.set_labels, use MultiIndex.set_codes() instead (GH 23752)
Removed the previously deprecated keyword "labels" from MultiIndex.set_codes(), MultiIndex.copy(), MultiIndex.drop(), use "codes" instead (GH 23752)
Removed support for legacy HDF5 formats (GH 29787)
Passing a dtype alias (e.g. 'datetime64[ns, UTC]') to DatetimeTZDtype is no longer allowed, use DatetimeTZDtype.construct_from_string() instead (GH 23990)
Removed the previously deprecated keyword "skip_footer" from read_excel(); use "skipfooter" instead (GH 18836)
read_excel() no longer allows an integer value for the parameter usecols, instead pass a list of integers from 0 to usecols inclusive (GH 23635)
Removed the previously deprecated keyword "convert_datetime64" from DataFrame.to_records() (GH 18902)
Removed IntervalIndex.from_intervals in favor of the IntervalIndex constructor (GH 19263)
Changed the default "keep_tz" argument in DatetimeIndex.to_series() from None to True (GH 23739)
Removed api.types.is_period and api.types.is_datetimetz (GH 23917)
The ability to read pickles containing Categorical instances created with pre-0.16 versions of pandas has been removed (GH 27538)
Removed pandas.tseries.plotting.tsplot (GH 18627)
Removed the previously deprecated keywords "reduce" and "broadcast" from DataFrame.apply() (GH 18577)
Removed the previously deprecated assert_raises_regex function in pandas._testing (GH 29174)
Removed the previously deprecated FrozenNDArray class in pandas.core.indexes.frozen (GH 29335)
Removed the previously deprecated keyword "nthreads" from read_feather(), use "use_threads" instead (GH 23053)
Removed Index.is_lexsorted_for_tuple (GH 29305)
Removed support for nested renaming in DataFrame.aggregate(), Series.aggregate(), core.groupby.DataFrameGroupBy.aggregate(), core.groupby.SeriesGroupBy.aggregate(), core.window.rolling.Rolling.aggregate() (GH 29608)
Removed Series.valid; use Series.dropna() instead (GH 18800)
Removed DataFrame.is_copy, Series.is_copy (GH 18812)
Removed DataFrame.get_ftype_counts, Series.get_ftype_counts (GH 18243)
Removed DataFrame.ftypes, Series.ftypes, Series.ftype (GH 26744)
Removed Index.get_duplicates, use idx[idx.duplicated()].unique() instead (GH 20239)
Removed Series.clip_upper, Series.clip_lower, DataFrame.clip_upper, DataFrame.clip_lower (GH 24203)
Removed the ability to alter DatetimeIndex.freq, TimedeltaIndex.freq, or PeriodIndex.freq (GH 20772)
Removed DatetimeIndex.offset (GH 20730)
Removed DatetimeIndex.asobject, TimedeltaIndex.asobject, PeriodIndex.asobject, use astype(object) instead (GH 29801)
Removed the previously deprecated keyword "order" from factorize() (GH 19751)
Removed the previously deprecated keyword "encoding" from read_stata() and DataFrame.to_stata() (GH 21400)
Changed the default "sort" argument in concat() from None to False (GH 20613)
Removed the previously deprecated keyword "raise_conflict" from DataFrame.update(), use "errors" instead (GH 23585)
Removed the previously deprecated keyword "n" from DatetimeIndex.shift(), TimedeltaIndex.shift(), PeriodIndex.shift(), use "periods" instead (GH 22458)
Removed the previously deprecated keywords "how", "fill_method", and "limit" from DataFrame.resample() (GH 30139)
Passing an integer to Series.fillna() or DataFrame.fillna() with timedelta64[ns] dtype now raises TypeError (GH 24694)
Passing multiple axes to DataFrame.dropna() is no longer supported (GH 20995)
Removed Series.nonzero, use to_numpy().nonzero() instead (GH 24048)
Passing floating dtype codes to Categorical.from_codes() is no longer supported, pass codes.astype(np.int64) instead (GH 21775)
Removed the previously deprecated keyword "pat" from Series.str.partition() and Series.str.rpartition(), use "sep" instead (GH 23767)
Removed Series.put (GH 27106)
Removed Series.real, Series.imag (GH 27106)
Removed Series.to_dense, DataFrame.to_dense (GH 26684)
Removed Index.dtype_str, use str(index.dtype) instead (GH 27106)
Categorical.ravel() returns a Categorical instead of an ndarray (GH 27199)
The 'outer' method on NumPy ufuncs, e.g. np.subtract.outer operating on Series objects, is no longer supported and will raise NotImplementedError (GH 27198)
Removed Series.get_dtype_counts and DataFrame.get_dtype_counts (GH 27145)
Changed the default "fill_value" argument in Categorical.take() from True to False (GH 20841)
Changed the default value for the raw argument in Series.rolling().apply(), DataFrame.rolling().apply(), Series.expanding().apply(), and DataFrame.expanding().apply() from None to False (GH 20584)
Removed the deprecated behavior of Series.argmin() and Series.argmax(), use Series.idxmin() and Series.idxmax() for the old behavior (GH 16955)
Passing a tz-aware datetime.datetime or Timestamp into the Timestamp constructor with the tz argument now raises a ValueError (GH 23621)
Removed Series.base, Index.base, Categorical.base, Series.flags, Index.flags, PeriodArray.flags, Series.strides, Index.strides, Series.itemsize, Index.itemsize, Series.data, Index.data (GH 20721)
Changed Timedelta.resolution() to match the behavior of the standard library datetime.timedelta.resolution, for the old behavior, use Timedelta.resolution_string() (GH 26839)
Removed Timestamp.weekday_name, DatetimeIndex.weekday_name, and Series.dt.weekday_name (GH 18164)
Removed the previously deprecated keyword "errors" in Timestamp.tz_localize(), DatetimeIndex.tz_localize(), and Series.tz_localize() (GH 22644)
Changed the default "ordered" argument in CategoricalDtype from None to False (GH 26336)
Series.set_axis() and DataFrame.set_axis() now require "labels" as the first argument and "axis" as an optional named parameter (GH 30089)
Removed to_msgpack, read_msgpack, DataFrame.to_msgpack, Series.to_msgpack (GH 27103)
Removed Series.compress (GH 21930)
Removed the previously deprecated keyword "fill_value" from Categorical.fillna(), use "value" instead (GH 19269)
Removed the previously deprecated keyword "data" from andrews_curves(), use "frame" instead (GH 6956)
Removed the previously deprecated keyword "data" from parallel_coordinates(), use "frame" instead (GH 6956)
Removed the previously deprecated keyword "colors" from parallel_coordinates(), use "color" instead (GH 6956)
Removed the previously deprecated keywords "verbose" and "private_key" from read_gbq() (GH 30200)
Calling np.array and np.asarray on tz-aware Series and DatetimeIndex will now return an object array of tz-aware Timestamp (GH 24596)
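For code being migrated, a few of the removed accessors and their replacements can be sketched as follows. The replacements for .ix and get_value are the standard .loc and .at indexers, and np.ptp for the removed Series.ptp is a suggestion of this example rather than something stated in the list above; the data is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
s = df["a"]

by_label = df.loc["y", "a"]            # instead of df.ix["y", "a"]
scalar = df.at["z", "a"]               # instead of df.get_value("z", "a")
values = df.to_numpy()                 # instead of df.as_matrix()
has_y = "y" in df.index                # instead of df.index.contains("y")
positions = s.to_numpy().nonzero()[0]  # instead of s.nonzero()
spread = np.ptp(s.to_numpy())          # one option in place of s.ptp()
```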

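Performance improvements#

Performance improvement in DataFrame arithmetic and comparison operations with scalars (GH 24990, GH 29853)
Performance improvement in indexing with a non-unique IntervalIndex (GH 27489)
Performance improvement in MultiIndex.is_monotonic (GH 27495)
Performance improvement in cut() when bins is an IntervalIndex (GH 27668)
Performance improvement when initializing a DataFrame using a range (GH 30171)
Performance improvement in DataFrame.corr() when method is "spearman" (GH 28139)
Performance improvement in DataFrame.replace() when provided a list of values to replace (GH 28099)
Performance improvement in DataFrame.select_dtypes() by using vectorization instead of iterating over a loop (GH 28317)
Performance improvement in Categorical.searchsorted() and CategoricalIndex.searchsorted() (GH 28795)
Performance improvement when comparing a Categorical with a scalar and the scalar is not found in the categories (GH 29750)
Performance improvement when checking if values in a Categorical are equal to, greater than or equal to, or greater than a given scalar. The improvement is not present when checking if the Categorical is less than, or less than or equal to, the scalar (GH 29820)
Performance improvement in Index.equals() and MultiIndex.equals() (GH 29134)
Performance improvement in infer_dtype() when skipna is True (GH 28814)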

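Bug fixes#

Categorical#

Added a test to assert that fillna() raises the correct ValueError message when the value isn't a value from categories (GH 13628)
Bug in Categorical.astype() where NaN values were handled incorrectly when casting to int (GH 28406)
DataFrame.reindex() with a CategoricalIndex would fail when the targets contained duplicates, and wouldn't fail if the source contained duplicates (GH 28107)
Bug in Categorical.astype() not allowing for casting to extension dtypes (GH 28668)
Bug where merge() was unable to join on categorical and extension dtype columns (GH 28668)
Categorical.searchsorted() and CategoricalIndex.searchsorted() now work on unordered categoricals also (GH 21667)
Added a test to assert that roundtripping to parquet with DataFrame.to_parquet() or read_parquet() will preserve Categorical dtypes for string types (GH 27955)
Changed the error message in Categorical.remove_categories() to always show the invalid removals as a set (GH 28669)
Using date accessors on a categorical dtyped Series of datetimes was not returning an object of the same type as if one used the str.() / dt.() on a Series of that type. E.g. when accessing Series.dt.tz_localize() on a Categorical with duplicate entries, the accessor was skipping duplicates (GH 27952)
Bug in DataFrame.replace() and Series.replace() that would give incorrect results on categorical data (GH 26988)
Bug where calling Categorical.min() or Categorical.max() on an empty Categorical would raise a numpy exception (GH 30227)
The following methods now also correctly output values for unobserved categories when called through groupby(..., observed=False) (GH 17605): core.groupby.SeriesGroupBy.count(), core.groupby.SeriesGroupBy.size(), core.groupby.SeriesGroupBy.nunique(), core.groupby.SeriesGroupBy.nth()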

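Datetimelike#

Bug in Series.__setitem__() incorrectly casting np.timedelta64("NaT") to np.datetime64("NaT") when inserting into a Series with datetime64 dtype (GH 27311)
Bug in Series.dt() property lookups when the underlying data is read-only (GH 27529)
Bug in HDFStore.__getitem__ incorrectly reading the tz attribute created in Python 2 (GH 26443)
Bug in to_datetime() where passing arrays of malformed str with errors="coerce" could incorrectly lead to raising ValueError (GH 28299)
Bug in core.groupby.SeriesGroupBy.nunique() where NaT values were interfering with the count of unique values (GH 27951)
Bug in Timestamp subtraction when subtracting a Timestamp from a np.datetime64 object incorrectly raising TypeError (GH 28286)
Addition and subtraction of integer or integer-dtype arrays with Timestamp will now raise NullFrequencyError instead of ValueError (GH 28268)
Bug in Series and DataFrame with integer dtype failing to raise TypeError when adding or subtracting a np.datetime64 object (GH 28080)
Bug in Series.astype(), Index.astype(), and DataFrame.astype() failing to handle NaT when casting to an integer dtype (GH 28492)
Bug in Week with weekday incorrectly raising AttributeError instead of TypeError when adding or subtracting an invalid type (GH 28530)
Bug in DataFrame arithmetic operations when operating with a Series with dtype 'timedelta64[ns]' (GH 28049)
Bug in core.groupby.generic.SeriesGroupBy.apply() raising ValueError when a column in the original DataFrame is a datetime and the column labels are not standard integers (GH 28247)
Bug in pandas._config.localization.get_locales() where locales -a encodes the locales list as windows-1252 (GH 23638, GH 24760, GH 27368)
Bug in Series.var() failing to raise TypeError when called with timedelta64[ns] dtype (GH 28289)
Bug in DatetimeIndex.strftime() and Series.dt.strftime() where NaT was converted to the string 'NaT' instead of np.nan (GH 29578)
Bug in masking datetime-like arrays with a boolean mask of an incorrect length not raising an IndexError (GH 30308)
Bug in Timestamp.resolution being a property instead of a class attribute (GH 29910)
Bug in pandas.to_datetime() when called with None raising TypeError instead of returning NaT (GH 30011)
Bug in pandas.to_datetime() failing for deque objects when using cache=True (the default) (GH 29403)
Bug in Series.item() with datetime64 or timedelta64 dtype, DatetimeIndex.item(), and TimedeltaIndex.item() returning an integer instead of a Timestamp or Timedelta (GH 30175)
Bug in DatetimeIndex addition when adding a non-optimized DateOffset incorrectly dropping timezone information (GH 30336)
Bug in DataFrame.drop() where attempting to drop non-existent values from a DatetimeIndex would yield a confusing error message (GH 30399)
Bug in DataFrame.append() that would remove the timezone-awareness of new data (GH 30238)
Bug in Series.cummin() and Series.cummax() with timezone-aware dtype incorrectly dropping its timezone (GH 15553)
Bug in DatetimeArray, TimedeltaArray, and PeriodArray where inplace addition and subtraction did not actually operate inplace (GH 24115)
Bug in pandas.to_datetime() when called with a Series storing an IntegerArray raising TypeError instead of returning a Series (GH 30050)
Bug in date_range() with custom business hours as freq and a given number of periods (GH 30593)
Bug in PeriodIndex comparisons incorrectly casting integers to Period objects, inconsistent with the Period comparison behavior (GH 30722)
Bug in DatetimeIndex.insert() raising a ValueError instead of a TypeError when trying to insert a timezone-aware Timestamp into a timezone-naive DatetimeIndex, or vice-versa (GH 30806)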

Timedelta#

在从一个 np.datetime64 对象中减去一个 TimedeltaIndex 或 TimedeltaArray 时出现的错误 (GH 29558)

时区#

Numeric#

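Bug in DataFrame.quantile() when used on a zero-column DataFrame (GH 23925)
DataFrame flexible inequality comparison methods (DataFrame.lt(), DataFrame.le(), DataFrame.gt(), DataFrame.ge()) with object-dtype and complex entries failing to raise TypeError like their Series counterparts (GH 28079)
Bug in DataFrame logical operations (&, |, ^) not matching Series behavior by filling NA values (GH 28741)
Bug in DataFrame.interpolate() where specifying axis by name references a variable before it is assigned (GH 29142)
Bug in Series.var() not computing the right value with a nullable integer dtype series and not passing through the ddof argument (GH 29128)
Improved error message when using frac > 1 and replace = False (GH 27451)
Bug in numeric indexes that made it possible to instantiate an Int64Index, UInt64Index, or Float64Index with an invalid dtype (e.g. datetime-like) (GH 29539)
Bug in UInt64Index precision loss while constructing from a list with values in the np.uint64 range (GH 29526)
Bug in NumericIndex construction that caused indexing to fail when integers in the np.uint64 range were used (GH 28023)
Bug in NumericIndex construction that caused UInt64Index to be cast to Float64Index when integers in the np.uint64 range were used to index a DataFrame (GH 28279)
Bug in Series.interpolate() when using method="index" with an unsorted index, which would previously return incorrect results. (GH 21037)
Bug in DataFrame.round() where a DataFrame with a CategoricalIndex of IntervalIndex columns would incorrectly raise a TypeError (GH 30063)
Bug in Series.pct_change() and DataFrame.pct_change() when there are duplicated indices (GH 30463)
Bug in DataFrame cumulative operations (e.g. cumsum, cummax) incorrectly casting to object-dtype (GH 19296)
Bug in diff losing the dtype for extension types (GH 30889)
Bug in DataFrame.diff raising an IndexError when one of the columns was a nullable integer dtype (GH 30967)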
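
To illustrate the nullable-integer entries above (GH 29128, GH 30967), here is a small sketch, assuming pandas 1.0 or later; the data and names are made up for the example.

```python
import pandas as pd

# Series.var() on a nullable integer Series is expected to honour ddof (GH 29128).
s = pd.Series([1, 2, 3, 4], dtype="Int64")
population_var = s.var(ddof=0)  # squared deviations sum to 5, divided by 4
sample_var = s.var(ddof=1)      # same sum, divided by 3

# DataFrame.diff() on a nullable integer column is expected to work
# instead of raising IndexError (GH 30967).
df = pd.DataFrame({"a": pd.array([1, 3, 6], dtype="Int64")})
deltas = df.diff()

print(population_var, sample_var)
print(deltas)
```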

Conversion#

Strings#

Calling Series.str.isalnum() (and other "ismethods") on an empty Series would return an object dtype instead of bool (GH 29624)
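
A quick way to see the fixed behavior, sketched under the assumption that pandas 1.0 or later is installed (the variable names are illustrative):

```python
import pandas as pd

# An empty object-dtype Series; the "is*" string methods are expected
# to return a bool result even though there is nothing to inspect.
empty = pd.Series([], dtype=object)
result = empty.str.isalnum()

print(result.dtype)
```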

Interval#

Bug in IntervalIndex.get_indexer() where a Categorical or CategoricalIndex target would incorrectly raise a TypeError (GH 30063)
Bug in pandas.core.dtypes.cast.infer_dtype_from_scalar where passing pandas_dtype=True did not infer IntervalDtype (GH 30337)
Bug in Series constructor where constructing a Series from a list of Interval objects resulted in object dtype instead of IntervalDtype (GH 23563)
Bug in IntervalDtype where the kind attribute was incorrectly set as None instead of "O" (GH 30568)
Bug in IntervalIndex, IntervalArray, and Series with interval data where equality comparisons were incorrect (GH 24112)
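
The constructor, kind, and equality fixes above can be sketched as follows, assuming pandas 1.0 or later; the intervals are arbitrary sample values.

```python
import pandas as pd

# A list of Interval objects is expected to infer an interval dtype (GH 23563).
intervals = [pd.Interval(0, 1), pd.Interval(1, 2)]
ser = pd.Series(intervals)

# IntervalDtype.kind is expected to be "O" (GH 30568).
kind = ser.dtype.kind

# Element-wise equality against a single Interval (GH 24112).
matches = ser == pd.Interval(0, 1)

print(ser.dtype, kind, matches.tolist())
```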

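Indexing#

Bug in assignment using a reverse slicer (GH 26939)
Bug in DataFrame.explode() that would duplicate the frame in the presence of duplicates in the index (GH 28010)
Bug in PeriodIndex() when reindexing another type of index containing a Period (GH 28323) (GH 28337)
Fix assignment of a column via .loc with numpy non-ns datetime type (GH 27395)
Bug in Float64Index.astype() where np.inf was not handled properly when casting to an integer dtype (GH 28475)
Index.union() could fail when the left contained duplicates (GH 28257)
Bug when indexing with .loc where the index was a CategoricalIndex with non-string categories did not work (GH 17569, GH 30225)
Index.get_indexer_non_unique() could fail with TypeError in some cases, such as when searching for ints in a string index (GH 28257)
Bug in Float64Index.get_loc() incorrectly raising TypeError instead of KeyError (GH 29189)
Bug in DataFrame.loc() with incorrect dtype when setting a Categorical value in a 1-row DataFrame (GH 25495)
MultiIndex.get_loc() could not find missing values when the input includes missing values (GH 19132)
Bug in Series.__setitem__() incorrectly assigning values with a boolean indexer when the length of the new data matches the number of True values and the new data is not a Series or an np.array (GH 30567)
Bug in indexing with a PeriodIndex incorrectly accepting integers representing years; use e.g. ser.loc["2007"] instead of ser.loc[2007] (GH 30763)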
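
The last entry above (GH 30763) can be sketched like this, assuming pandas 1.0 or later. The monthly index and the `ser` contents are illustrative; only the `ser.loc["2007"]` versus `ser.loc[2007]` contrast comes from the note.

```python
import pandas as pd

# Three years of monthly periods starting in January 2006.
idx = pd.period_range("2006-01", periods=36, freq="M")
ser = pd.Series(range(36), index=idx)

# A year given as a string selects every period inside that year.
year_2007 = ser.loc["2007"]

# A bare integer is no longer interpreted as a year.
try:
    ser.loc[2007]
    integer_rejected = False
except KeyError:
    integer_rejected = True

print(len(year_2007), integer_rejected)
```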

Missing#

MultiIndex#

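Constructor for MultiIndex verifies that the given sortorder is compatible with the actual lexsort_depth if the verify_integrity parameter is True (the default) (GH 28735)
Series and MultiIndex .drop with a MultiIndex raise an exception if the labels are not in the given level (GH 8594)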

IO#

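read_csv() now accepts binary mode file buffers when using the Python csv engine (GH 23779)
Bug in DataFrame.to_json() where using a tuple as a column or index value and using orient="columns" or orient="index" would produce invalid JSON (GH 20500)
Improve infinity parsing. read_csv() now interprets Infinity, +Infinity, -Infinity as floating point values (GH 10065)
Bug in DataFrame.to_csv() where values were truncated when the length of na_rep was shorter than the text input data. (GH 25099)
Bug in DataFrame.to_string() where values were truncated using display options instead of outputting the full content (GH 9784)
Bug in DataFrame.to_json() where a datetime column label would not be written out in ISO format with orient="table" (GH 28130)
Bug in DataFrame.to_parquet() where writing to GCS would fail with engine='fastparquet' if the file did not already exist (GH 28326)
Bug in read_hdf() closing stores that it did not open when exceptions are raised (GH 28699)
Bug in DataFrame.read_json() where using orient="index" would not maintain the order (GH 28557)
Bug in DataFrame.to_html() where the length of the formatters argument was not verified (GH 28469)
Bug in DataFrame.read_excel() with engine='ods' when the sheet_name argument references a non-existent sheet (GH 27676)
Bug in pandas.io.formats.style.Styler() formatting for floating values not displaying decimals correctly (GH 13257)
Bug in DataFrame.to_html() when using formatters=<list> and max_cols together. (GH 25955)
Bug in Styler.background_gradient() not able to work with dtype Int64 (GH 28869)
Bug in DataFrame.to_clipboard() which did not work reliably in ipython (GH 22707)
Bug in read_json() where the default encoding was not set to utf-8 (GH 29565)
Bug in PythonParser where str and bytes were being mixed when dealing with the decimal field (GH 29650)
read_gbq() now accepts progress_bar_type to display a progress bar while the data downloads. (GH 29857)
Bug in pandas.io.json.json_normalize() where a missing value in the location specified by record_path would raise a TypeError (GH 30148)
read_excel() now accepts binary data (GH 15914)
Bug in read_csv() in which encoding handling was limited to just the string utf-16 for the C engine (GH 24130)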
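
Two of the read_csv() changes above (GH 10065, GH 23779) are easy to try with in-memory buffers. This is a minimal sketch, assuming pandas 1.0 or later; the CSV contents are invented for the example.

```python
import io

import pandas as pd

# "Infinity", "+Infinity" and "-Infinity" are expected to parse as floats (GH 10065).
text = "x\nInfinity\n+Infinity\n-Infinity\n"
parsed = pd.read_csv(io.StringIO(text))

# A binary-mode buffer is expected to be accepted by the Python engine (GH 23779).
binary_buffer = io.BytesIO(b"a,b\n1,2\n3,4\n")
from_bytes = pd.read_csv(binary_buffer, engine="python")

print(parsed["x"].tolist())
print(from_bytes)
```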

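Plotting#

Bug in Series.plot() not able to plot boolean values (GH 23719)
Bug in DataFrame.plot() not able to plot when there are no rows (GH 27758)
Bug in DataFrame.plot() producing incorrect legend markers when plotting multiple series on the same axis (GH 18222)
Bug in DataFrame.plot() when kind='box' and the data contains datetime or timedelta data. These types are now automatically dropped (GH 22799)
Bug in DataFrame.plot.line() and DataFrame.plot.area() producing the wrong xlim on the x-axis (GH 27686, GH 25160, GH 24784)
Bug where DataFrame.boxplot() would not accept a color parameter like DataFrame.plot.box() (GH 26214)
Bug in the xticks argument being ignored for DataFrame.plot.bar() (GH 14119)
set_option() now validates that the plot backend provided to 'plotting.backend' implements the backend when the option is set, rather than when a plot is created (GH 28163)
DataFrame.plot() now allows a backend keyword argument to allow changing between backends in one session (GH 28619).
Bug in color validation incorrectly raising for non-color styles (GH 29122).
Allow DataFrame.plot.scatter() to plot objects and datetime type data (GH 18755, GH 30391)
Bug in DataFrame.hist() where xrot=0 does not work with by and subplots (GH 30288).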

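GroupBy/resample/rolling#

Bug in core.groupby.DataFrameGroupBy.apply() only showing output from a single group when the function returns an Index (GH 28652)
Bug in DataFrame.groupby() with multiple groups where an IndexError would be raised if any group contained all NA values (GH 20519)
Bug in Resampler.size() and Resampler.count() returning the wrong dtype when used with an empty Series or DataFrame (GH 28427)
Bug in DataFrame.rolling() not allowing for rolling over datetimes when axis=1 (GH 28192)
Bug in DataFrame.rolling() not allowing rolling over multi-index levels (GH 15584).
Bug in DataFrame.rolling() not allowing rolling on monotonic decreasing time indexes (GH 19248).
Bug in DataFrame.groupby() not offering selection by column name when axis=1 (GH 27614)
Bug in core.groupby.DataFrameGroupby.agg() not able to use a lambda function with named aggregation (GH 27519)
Bug in DataFrame.groupby() losing column name information when grouping by a categorical column (GH 28787)
Remove the error raised due to duplicated input functions in named aggregation in DataFrame.groupby() and Series.groupby(). Previously an error was raised if the same function was applied on the same column; this is now allowed if the newly assigned names are different. (GH 28426)
core.groupby.SeriesGroupBy.value_counts() can now handle the case where the Grouper makes empty groups (GH 28479)
Bug in core.window.rolling.Rolling.quantile() ignoring the interpolation keyword argument when used within a groupby (GH 28779)
Bug in DataFrame.groupby() where any, all, nunique and transform functions would incorrectly handle duplicate column labels (GH 21668)
Bug in core.groupby.DataFrameGroupBy.agg() with a timezone-aware datetime64 column incorrectly casting results to the original dtype (GH 29641)
Bug in DataFrame.groupby() when using axis=1 and having a single level columns index (GH 30208)
Bug in DataFrame.groupby() when using nunique on axis=1 (GH 30253)
Bug in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() with multiple list-like q values and integer column names (GH 30289)
Bug in DataFrameGroupBy.pct_change() and SeriesGroupBy.pct_change() causing a TypeError when fill_method is None (GH 30463)
Bug in Rolling.count() and Expanding.count() where the min_periods argument was ignored (GH 26996)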
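
The named-aggregation entries above (GH 27519, GH 28426) can be sketched as follows, assuming pandas 1.0 or later. The frame, the column names, and the output names `total`, `total_again` and `spread` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

out = df.groupby("key").agg(
    # The same function applied twice to the same column is allowed
    # as long as the output names differ (GH 28426).
    total=("val", "sum"),
    total_again=("val", "sum"),
    # A lambda can be used in a named aggregation (GH 27519).
    spread=("val", lambda s: s.max() - s.min()),
)

print(out)
```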

Reshaping#

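Bug in DataFrame.apply() that caused incorrect output with an empty DataFrame (GH 28202, GH 21959)
Bug in DataFrame.stack() not handling non-unique indexes correctly when creating a MultiIndex (GH 28301)
Bug in pivot_table() not returning the correct type float when margins=True and aggfunc='mean' (GH 24893)
Bug where merge_asof() could not use datetime.timedelta for the tolerance keyword argument (GH 28098)
Bug in merge() not appending suffixes correctly with a MultiIndex (GH 28518)
qcut() and cut() now handle boolean input (GH 20303)
Fix to ensure all int dtypes can be used in merge_asof() when using a tolerance value. Previously every non-int64 type would raise an erroneous MergeError (GH 28870).
Better error message in get_dummies() when columns is not a list-like value (GH 28383)
Bug in Index.join() that caused an infinite recursion error for mismatched MultiIndex name orders. (GH 25760, GH 28956)
Bug in Series.pct_change() where supplying an anchored frequency would throw a ValueError (GH 28664)
Bug where DataFrame.equals() returned True incorrectly in some cases when two DataFrames had the same columns in different orders (GH 28839)
Bug in DataFrame.replace() that caused a non-numeric replacer's dtype not to be respected (GH 26632)
Bug in melt() where supplying mixed strings and numeric values for id_vars or value_vars would incorrectly raise a ValueError (GH 29718)
Dtypes are now preserved when transposing a DataFrame where each column is the same extension dtype (GH 30091)
Bug in merge_asof() when merging on a tz-aware left_index and right_on a tz-aware column (GH 29864)
Improved error message and docstring in cut() and qcut() when labels=True (GH 13318)
Bug in the missing fill_na parameter to DataFrame.unstack() with a list of levels (GH 30740)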
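
Two of the reshaping items above (GH 30091, GH 28098) can be illustrated with a short sketch, assuming pandas 1.0 or later. All data and names below are made up for the example.

```python
import datetime

import pandas as pd

# Transposing a frame whose columns share one extension dtype is
# expected to preserve that dtype (GH 30091).
df = pd.DataFrame({
    "a": pd.array([1, 2], dtype="Int64"),
    "b": pd.array([3, None], dtype="Int64"),
})
transposed = df.T

# merge_asof() is expected to accept datetime.timedelta as tolerance (GH 28098).
left = pd.DataFrame({
    "t": pd.to_datetime(["2020-01-01 00:00:05", "2020-01-01 00:00:20"]),
    "l": [1, 2],
})
right = pd.DataFrame({
    "t": pd.to_datetime(["2020-01-01 00:00:03", "2020-01-01 00:00:04"]),
    "r": [10, 20],
})
# Backward match within 2 seconds: the first row finds 00:00:04,
# the second row has no candidate that is close enough.
merged = pd.merge_asof(left, right, on="t", tolerance=datetime.timedelta(seconds=2))

print(transposed.dtypes)
print(merged)
```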

Sparse#

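Bug in SparseDataFrame arithmetic operations incorrectly casting inputs to float (GH 28107)
Bug in DataFrame.sparse returning a Series when there was a column named sparse rather than the accessor (GH 30758)
Fixed operator.xor() with a boolean-dtype SparseArray. Now returns a sparse result, rather than object dtype (GH 31025)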
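
The operator.xor() fix above (GH 31025) can be sketched like this, assuming pandas 1.0 or later and NumPy; the input arrays are arbitrary.

```python
import operator

import numpy as np
import pandas as pd

a = pd.arrays.SparseArray([True, False, True, False])
b = pd.arrays.SparseArray([True, True, False, False])

# The result is expected to stay sparse instead of falling back to object dtype.
result = operator.xor(a, b)
dense = np.asarray(result)

print(result.dtype)
print(dense.tolist())
```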

ExtensionArray#

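Bug in arrays.PandasArray when setting a scalar string (GH 28118, GH 28150).
Bug where nullable integers could not be compared to strings (GH 28930)
Bug where the DataFrame constructor raised ValueError with list-like data and dtype specified (GH 30280)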

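Other#

Trying to set display.precision, display.max_rows or display.max_columns using set_option() to anything but None or a positive int will raise a ValueError (GH 23348)
Using DataFrame.replace() with overlapping keys in a nested dictionary will no longer raise, now matching the behavior of a flat dictionary (GH 27660)
DataFrame.to_csv() and Series.to_csv() now support dicts as the compression argument, with the key 'method' being the compression method and the other keys used as additional compression options when the compression method is 'zip'. (GH 26023)
Bug in Series.diff() where a boolean series would incorrectly raise a TypeError (GH 17294)
Series.append() will no longer raise a TypeError when passed a tuple of Series (GH 28410)
Fix corrupted error message when calling pandas.libs._json.encode() on a 0d array (GH 18878)
Backtick quoting in DataFrame.query() and DataFrame.eval() can now also be used for invalid identifiers, such as names that start with a digit, are Python keywords, or use single-character operators. (GH 27017)
Bug in pd.core.util.hashing.hash_pandas_object where arrays containing tuples were incorrectly treated as non-hashable (GH 28969)
Bug in DataFrame.append() that raised IndexError when appending with an empty list (GH 28769)
Fix AbstractHolidayCalendar to return correct results for years after 2030 (now goes up to 2200) (GH 27790)
Fixed IntegerArray returning inf rather than NaN for operations dividing by 0 (GH 27398)
Fixed pow operations for IntegerArray when the other value is 0 or 1 (GH 29997)
Bug in Series.count() raising if use_inf_as_na is enabled (GH 29478)
Bug in Index where a non-hashable name could be set without raising TypeError (GH 29069)
Bug in DataFrame constructor when passing a 2D ndarray and an extension dtype (GH 12513)
Bug in DataFrame.to_csv() when supplied a series with dtype="string" and a na_rep, where the na_rep was being truncated to 2 characters. (GH 29975)
Bug in DataFrame.itertuples() incorrectly determining whether namedtuples could be used for dataframes of 255 columns (GH 28282)
Handle nested NumPy object arrays in testing.assert_series_equal() for ExtensionArray implementations (GH 30841)
Bug in Index constructor incorrectly allowing 2-dimensional input arrays (GH 13601, GH 27125)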
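
The backtick-quoting and compression-dict entries above (GH 27017, GH 26023) can be sketched as follows, assuming pandas 1.0 or later. The column names, the file name `frame.zip`, and the `archive_name` value are illustrative choices.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Column names that are not valid Python identifiers: one starts with a digit,
# one is a keyword, one contains a single-character operator.
df = pd.DataFrame({"1st": [1, 2, 3], "class": [0, 1, 1], "a+b": [5, 6, 7]})
selected = df.query("`1st` > 1 and `class` == 1 and `a+b` < 7")

# A dict as the compression argument: 'method' selects zip, and the extra
# 'archive_name' key names the file stored inside the archive.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.zip")
    df.to_csv(
        path,
        index=False,
        compression={"method": "zip", "archive_name": "frame.csv"},
    )
    with zipfile.ZipFile(path) as archive:
        members = archive.namelist()

print(selected)
print(members)
```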

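Contributors#

A total of 308 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.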

Aaditya Panikath +
Abdullah İhsan Seçer
Abhijeet Krishnan +
Adam J. Stewart
Adam Klaum +
Addison Lynch
Aivengoe +
Alastair James +
Albert Villanova del Moral
Alex Kirko +
Alfredo Granja +
Allen Downey
Alp Arıbal +
Andreas Buhr +
Andrew Munch +
Andy
Angela Ambroz +
Aniruddha Bhattacharjee +
Ankit Dhankhar +
Antonio Andraues Jr +
Arda Kosar +
Asish Mahapatra +
Austin Hackett +
Avi Kelman +
AyowoleT +
Bas Nijholt +
Ben Thayer
Bharat Raghunathan
Bhavani Ravi
Bhuvana KA +
Big Head
Blake Hawkins +
Bobae Kim +
Brett Naul
Brian Wignall
Bruno P. Kinoshita +
Bryant Moscon +
Cesar H +
Chris Stadler
Chris Zimmerman +
Christopher Whelan
Clemens Brunner
Clemens Tolboom +
Connor Charles +
Daniel Hähnke +
Daniel Saxton
Darin Plutchok +
Dave Hughes
David Stansby
DavidRosen +
Dean +
Deepan Das +
Deepyaman Datta
DorAmram +
Dorothy Kabarozi +
Drew Heenan +
Eliza Mae Saret +
Elle +
Endre Mark Borza +
Eric Brassell +
Eric Wong +
Eunseop Jeong +
Eyden Villanueva +
Felix Divo
ForTimeBeing +
Francesco Truzzi +
Gabriel Corona +
Gabriel Monteiro +
Galuh Sahid +
Georgi Baychev +
Gina
GiuPassarelli +
Grigorios Giannakopoulos +
Guilherme Leite +
Guilherme Salomé +
Gyeongjae Choi +
Harshavardhan Bachina +
Harutaka Kawamura +
Hassan Kibirige
Hielke Walinga
Hubert
Hugh Kelley +
Ian Eaves +
Ignacio Santolin +
Igor Filippov +
Irv Lustig
Isaac Virshup +
Ivan Bessarabov +
JMBurley +
Jack Bicknell +
Jacob Buckheit +
Jan Koch
Jan Pipek +
Jan Škoda +
Jan-Philip Gehrcke
Jasper J.F. van den Bosch +
Javad +
Jeff Reback
Jeremy Schendel
Jeroen Kant +
Jesse Pardue +
Jethro Cao +
Jiang Yue
Jiaxiang +
Jihyung Moon +
Jimmy Callin
Jinyang Zhou +
Joao Victor Martinelli +
Joaq Almirante +
John G Evans +
John Ward +
Jonathan Larkin +
Joris Van den Bossche
Josh Dimarsky +
Joshua Smith +
Josiah Baker +
Julia Signell +
Jung Dong Ho +
Justin Cole +
Justin Zheng
Kaiqi Dong
Karthigeyan +
Katherine Younglove +
Katrin Leinweber
Kee Chong Tan +
Keith Kraus +
Kevin Nguyen +
Kevin Sheppard
Kisekka David +
Koushik +
Kyle Boone +
Kyle McCahill +
Laura Collard, PhD +
LiuSeeker +
Louis Huynh +
Lucas Scarlato Astur +
Luiz Gustavo +
Luke +
Luke Shepard +
MKhalusova +
Mabel Villalba
Maciej J +
Mak Sze Chun
Manu NALEPA +
Marc
Marc Garcia
Marco Gorelli +
Marco Neumann +
Martin Winkel +
Martina G. Vilas +
Mateusz +
Matthew Roeschke
Matthew Tan +
Max Bolingbroke
Max Chen +
MeeseeksMachine
Miguel +
MinGyo Jung +
Mohamed Amine ZGHAL +
Mohit Anand +
MomIsBestFriend +
Naomi Bonnin +
Nathan Abel +
Nico Cernek +
Nigel Markey +
Noritada Kobayashi +
Oktay Sabak +
Oliver Hofkens +
Oluokun Adedayo +
Osman +
Oğuzhan Öğreden +
Pandas Development Team +
Patrik Hlobil +
Paul Lee +
Paul Siegel +
Petr Baev +
Pietro Battiston
Prakhar Pandey +
Puneeth K +
Raghav +
Rajat +
Rajhans Jadhao +
Rajiv Bharadwaj +
Rik-de-Kort +
Roei.r
Rohit Sanjay +
Ronan Lamy +
Roshni +
Roymprog +
Rushabh Vasani +
Ryan Grout +
Ryan Nazareth
Samesh Lakhotia +
Samuel Sinayoko
Samyak Jain +
Sarah Donehower +
Sarah Masud +
Saul Shanabrook +
Scott Cole +
SdgJlbl +
Seb +
Sergei Ivko +
Shadi Akiki
Shorokhov Sergey
Siddhesh Poyarekar +
Sidharthan Nair +
Simon Gibbons
Simon Hawkins
Simon-Martin Schröder +
Sofiane Mahiou +
Sourav kumar +
Souvik Mandal +
Soyoun Kim +
Sparkle Russell-Puleri +
Srinivas Reddy Thatiparthy (శ్రీనివాస్ రెడ్డి తాటిపర్తి)
Stuart Berg +
Sumanau Sareen
Szymon Bednarek +
Tambe Tabitha Achere +
Tan Tran
Tang Heyi +
Tanmay Daripa +
Tanya Jain
Terji Petersen
Thomas Li +
Tirth Jain +
Tola A +
Tom Augspurger
Tommy Lynch +
Tomoyuki Suzuki +
Tony Lorenzo
Unprocessable +
Uwe L. Korn
Vaibhav Vishal
Victoria Zdanovskaya +
Vijayant +
Vishwak Srinivasan +
WANG Aiyong
Wenhuan
Wes McKinney
Will Ayd
Will Holmgren
William Ayd
William Blan +
Wouter Overmeire
Wuraola Oyewusi +
YaOzI +
Yash Shukla +
Yu Wang +
Yusei Tahara +
alexander135 +
alimcmaster1
avelineg +
bganglia +
bolkedebruin
bravech +
chinhwee +
cruzzoe +
dalgarno +
daniellebrown +
danielplawrence
est271 +
francisco souza +
ganevgv +
garanews +
gfyoung
h-vetinari
hasnain2808 +
ianzur +
jalbritt +
jbrockmendel
jeschwar +
jlamborn324 +
joy-rosie +
kernc
killerontherun1
krey +
lexy-lixinyu +
lucyleeow +
lukasbk +
maheshbapatu +
mck619 +
nathalier
naveenkaushik2504 +
nlepleux +
nrebena
ohad83 +
pilkibun
pqzx +
proost +
pv8493013j +
qudade +
rhstanton +
rmunjal29 +
sangarshanan +
sardonick +
saskakarsi +
shaido987 +
ssikdar1
steveayers124 +
tadashigaki +
timcera +
tlaytongoogle +
tobycheese
tonywu1999 +
tsvikas +
yogendrasoni +
zys5945 +
