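What's new in 2.0.0 (April 3, 2023)#

These are the changes in pandas 2.0.0. See the release notes for a full changelog including other versions of pandas.

Enhancements#

Installing optional dependencies with pip extras#

When installing pandas using pip, sets of optional dependencies can also be installed by specifying extras.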

pip install "pandas[performance, aws]>=2.0.0"

The available extras, listed in the installation guide, are [all, performance, computation, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql, sql-other, html, xml, plot, output_formatting, clipboard, compression, test] (GH 39164).

`Index` can now hold numpy numeric dtypes#

It is now possible to use any numpy numeric dtype in an Index (GH 42717).

Previously it was only possible to use int64, uint64 and float64 dtypes:

In [1]: pd.Index([1, 2, 3], dtype=np.int8)
Out[1]: Int64Index([1, 2, 3], dtype="int64")
In [2]: pd.Index([1, 2, 3], dtype=np.uint16)
Out[2]: UInt64Index([1, 2, 3], dtype="uint64")
In [3]: pd.Index([1, 2, 3], dtype=np.float32)
Out[3]: Float64Index([1.0, 2.0, 3.0], dtype="float64")

Int64Index, UInt64Index and Float64Index were deprecated in pandas version 1.4 and have now been removed. Index should be used directly instead; it can now take all numpy numeric dtypes, i.e. int8/int16/int32/int64/uint8/uint16/uint32/uint64/float32/float64:

In [1]: pd.Index([1, 2, 3], dtype=np.int8)
Out[1]: Index([1, 2, 3], dtype='int8')

In [2]: pd.Index([1, 2, 3], dtype=np.uint16)
Out[2]: Index([1, 2, 3], dtype='uint16')

In [3]: pd.Index([1, 2, 3], dtype=np.float32)
Out[3]: Index([1.0, 2.0, 3.0], dtype='float32')

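The ability of Index to hold numpy numeric dtypes has meant some changes in pandas functionality. In particular, operations that were previously forced to create 64-bit indexes can now create indexes with lower bit sizes, e.g. 32-bit indexes.

Below is a possibly non-exhaustive list of changes:

Instantiating using a numpy numeric array now follows the dtype of the numpy array. Previously, all indexes created from numpy numeric arrays were forced to 64-bit. Now, for example, Index(np.array([1, 2, 3])) will be int32 on 32-bit systems, where it previously would have been int64 even on 32-bit systems. Instantiating Index using a list of numbers will still return 64-bit dtypes, e.g. Index([1, 2, 3]) will have an int64 dtype, which is the same as before.

The various numeric datetime attributes of DatetimeIndex (day, month, year etc.) were previously of dtype int64, while they were int32 for arrays.DatetimeArray. They are now int32 on DatetimeIndex as well: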

In [4]: idx = pd.date_range(start='1/1/2018', periods=3, freq='ME')

In [5]: idx.array.year
Out[5]: array([2018, 2018, 2018], dtype=int32)

In [6]: idx.year
Out[6]: Index([2018, 2018, 2018], dtype='int32')

Level dtypes on indexes from Series.sparse.from_coo() are now of dtype int32, the same as the rows/cols of a scipy sparse matrix. Previously they were of dtype int64.

In [7]: from scipy import sparse

In [8]: A = sparse.coo_matrix(
   ...:     ([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4)
   ...: )
   ...: 

In [9]: ser = pd.Series.sparse.from_coo(A)

In [10]: ser.index.dtypes
Out[10]: 
level_0    int32
level_1    int32
dtype: object

Index cannot be instantiated using a float16 dtype. Previously, instantiating an Index with dtype float16 resulted in a Float64Index with a float64 dtype. It now raises a NotImplementedError:

In [11]: pd.Index([1, 2, 3], dtype=np.float16)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[11], line 1
----> 1 pd.Index([1, 2, 3], dtype=np.float16)

File /home/pandas/pandas/core/indexes/base.py:576, in Index.__new__(cls, data, dtype, copy, name, tupleize_cols)
    572 arr = ensure_wrapped_if_datetimelike(arr)
    574 klass = cls._dtype_to_subclass(arr.dtype)
--> 576 arr = klass._ensure_array(arr, arr.dtype, copy=False)
    577 return klass._simple_new(arr, name, refs=refs)

File /home/pandas/pandas/core/indexes/base.py:589, in Index._ensure_array(cls, data, dtype, copy)
    586     raise ValueError("Index data must be 1-dimensional")
    587 elif dtype == np.float16:
    588     # float16 not supported (no indexing engine)
--> 589     raise NotImplementedError("float16 indexes are not supported")
    591 if copy:
    592     # asarray_tuplesafe does not always copy underlying data,
    593     #  so need to make sure that this happens
    594     data = data.copy()

NotImplementedError: float16 indexes are not supported
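The dtype rules listed above can be checked with a short sketch. The variable names are illustrative only, and the snippet assumes pandas 2.0 or later with NumPy installed.

```python
import numpy as np
import pandas as pd

# An Index built from a NumPy array keeps the array's dtype.
idx_from_int32 = pd.Index(np.array([1, 2, 3], dtype=np.int32))
idx_from_float32 = pd.Index(np.array([1.0, 2.0], dtype=np.float32))

# A plain Python list carries no dtype, so the result is still 64-bit.
idx_from_list = pd.Index([1, 2, 3])

print(idx_from_int32.dtype, idx_from_float32.dtype, idx_from_list.dtype)
```

Code that relied on every numeric index being 64-bit may need an explicit astype("int64") after this change.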

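Argument `dtype_backend`, to return pyarrow-backed or numpy-backed nullable dtypes#

A number of functions, among them read_csv() (used in the example below), gained a new keyword dtype_backend (GH 36712)

When this keyword is set to "numpy_nullable", these functions return a DataFrame backed by nullable dtypes.

When this keyword is set to "pyarrow", these functions return a DataFrame backed by pyarrow nullable ArrowDtype columns (GH 48957, GH 49997):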

In [12]: import io

In [13]: data = io.StringIO("""a,b,c,d,e,f,g,h,i
   ....:     1,2.5,True,a,,,,,
   ....:     3,4.5,False,b,6,7.5,True,a,
   ....: """)
   ....: 

In [14]: df = pd.read_csv(data, dtype_backend="pyarrow")

In [15]: df.dtypes
Out[15]: 
a     int64[pyarrow]
b    double[pyarrow]
c      bool[pyarrow]
d    string[pyarrow]
e     int64[pyarrow]
f    double[pyarrow]
g      bool[pyarrow]
h    string[pyarrow]
i      null[pyarrow]
dtype: object

In [16]: data.seek(0)
Out[16]: 0

In [17]: df_pyarrow = pd.read_csv(data, dtype_backend="pyarrow", engine="pyarrow")

In [18]: df_pyarrow.dtypes
Out[18]: 
a     int64[pyarrow]
b    double[pyarrow]
c      bool[pyarrow]
d    string[pyarrow]
e     int64[pyarrow]
f    double[pyarrow]
g      bool[pyarrow]
h    string[pyarrow]
i      null[pyarrow]
dtype: object
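For comparison, here is a minimal sketch of the "numpy_nullable" backend, which does not need pyarrow. The CSV text and variable names are made up for illustration.

```python
import io

import pandas as pd

csv_text = "a,b\n1,2.5\n,4.5\n"

# Default backend: the missing integer forces column "a" to float64.
df_default = pd.read_csv(io.StringIO(csv_text))

# Nullable backend: column "a" stays an integer column with a missing value.
df_nullable = pd.read_csv(io.StringIO(csv_text), dtype_backend="numpy_nullable")

print(df_default.dtypes)
print(df_nullable.dtypes)
```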

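Copy-on-Write improvements#

A new lazy copy mechanism that defers the copy until the object in question is modified was added to the methods listed in Copy-on-Write optimizations. These methods return views when Copy-on-Write is enabled, which provides a significant performance improvement compared to regular execution (GH 49473).
Accessing a single column of a DataFrame as a Series (e.g. df["col"]) now always returns a new object every time it is constructed when Copy-on-Write is enabled (instead of returning the same cached Series object multiple times). This ensures that those Series objects correctly follow the Copy-on-Write rules (GH 49450)
The Series constructor will now create a lazy copy (deferring the copy until a modification to the data happens) when constructing a Series from an existing Series with the default of copy=False (GH 50471)
The DataFrame constructor will now create a lazy copy (deferring the copy until a modification to the data happens) when constructing from an existing DataFrame with the default of copy=False (GH 51239)
The DataFrame constructor, when constructing a DataFrame from a dictionary of Series objects and specifying copy=False, will now use a lazy copy of those Series objects for the columns of the DataFrame (GH 50777)
The DataFrame constructor, when constructing a DataFrame from a Series or Index and specifying copy=False, will now respect Copy-on-Write.
The DataFrame and Series constructors, when constructing from a NumPy array, will now copy the array by default to avoid mutating the DataFrame / Series when mutating the array. Specify copy=False to get the old behavior. When setting copy=False, pandas does not guarantee correct Copy-on-Write behavior when the NumPy array is modified after creation of the DataFrame / Series.
DataFrame.from_records() will now respect Copy-on-Write when called with a DataFrame.
Trying to set values using chained assignment (for example, df["a"][1:3] = 0) will now always raise a warning when Copy-on-Write is enabled. In this mode, chained assignment can never work because we are always setting into a temporary object that is the result of an indexing operation (getitem), which under Copy-on-Write always behaves as a copy. Thus, assigning through a chain can never update the original Series or DataFrame. Therefore, an informative warning is raised to the user to avoid silently doing nothing (GH 49467).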
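DataFrame.replace() will now respect the Copy-on-Write mechanism when inplace=True.
DataFrame.transpose() will now respect the Copy-on-Write mechanism.
Arithmetic operations that can be done inplace, e.g. ser *= 2, will now respect the Copy-on-Write mechanism.
DataFrame.__getitem__() will now respect the Copy-on-Write mechanism when the DataFrame has MultiIndex columns.
Series.__getitem__() will now respect the Copy-on-Write mechanism when the Series has a MultiIndex.
Series.view() will now respect the Copy-on-Write mechanism.

Copy-on-Write can be enabled through one of: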

pd.set_option("mode.copy_on_write", True)

pd.options.mode.copy_on_write = True

Alternatively, Copy-on-Write can be enabled locally through:

with pd.option_context("mode.copy_on_write", True):
    ...
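A small sketch of what Copy-on-Write means in practice: writing to a column taken from a DataFrame leaves the parent untouched. The names are illustrative; on pandas versions where Copy-on-Write is always on, setting the option may only emit a deprecation warning.

```python
import pandas as pd

with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3]})
    col = df["a"]        # no data is copied yet
    col.iloc[0] = 100    # the write triggers the copy; df is left untouched
    parent_first = int(df["a"].iloc[0])
    child_first = int(col.iloc[0])

print(parent_first, child_first)
```

To change the parent, write to it directly, e.g. df.loc[0, "a"] = 100.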

Other enhancements#

Added support for str accessor methods when using ArrowDtype with a pyarrow.string type (GH 50325)
Added support for dt accessor methods when using ArrowDtype with a pyarrow.timestamp type (GH 50954)
read_sas() now supports using encoding='infer' to correctly read and use the encoding specified by the sas file. (GH 48048)
DataFrameGroupBy.quantile(), SeriesGroupBy.quantile() and DataFrameGroupBy.std() now preserve nullable dtypes instead of casting to numpy dtypes (GH 37493)
DataFrameGroupBy.std() and SeriesGroupBy.std() now support datetime64, timedelta64, and DatetimeTZDtype dtypes (GH 48481)
Series.add_suffix(), DataFrame.add_suffix(), Series.add_prefix() and DataFrame.add_prefix() support an axis argument. If axis is set, it overrides the default choice of which axis to act on (GH 47819)
testing.assert_frame_equal() now shows the first element where the DataFrames differ, analogously to pytest's output (GH 47910)
Added index parameter to DataFrame.to_dict() (GH 46398)
Added support for extension array dtypes in merge() (GH 44240)
Added metadata propagation for binary operators on DataFrame (GH 28283)
Added cumsum, cumprod, cummin and cummax to the ExtensionArray interface via _accumulate (GH 28385)
CategoricalConversionWarning, InvalidComparison, InvalidVersion, LossySetitemError, and NoBufferPresent are now exposed in pandas.errors (GH 27656)
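Fixed the test optional_extra by adding the missing test package pytest-asyncio (GH 48361)
The DataFrame.astype() exception message was improved to include the column name when type conversion is not possible. (GH 47571)
date_range() now supports a unit keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index (GH 49106)
timedelta_range() now supports a unit keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index (GH 49824)
DataFrame.to_json() now supports a mode keyword with supported inputs 'w' and 'a'. The default is 'w'; 'a' can be used when lines=True and orient='records' to append record-oriented json lines to an existing json file. (GH 35849)
Added name parameter to IntervalIndex.from_breaks(), IntervalIndex.from_arrays() and IntervalIndex.from_tuples() (GH 48911)
Improved the exception message when using testing.assert_frame_equal() on a DataFrame to include the column being compared (GH 50323)
Improved the error message for merge_asof() when join columns are duplicated (GH 50102)
Added support for extension array dtypes to get_dummies() (GH 32430)
Added Index.infer_objects(), analogous to Series.infer_objects() (GH 50034)
Added copy parameter to Series.infer_objects() and DataFrame.infer_objects(); passing False avoids making copies for series or columns that are already non-object or where no better dtype can be inferred (GH 50096)
DataFrame.plot.hist() now recognizes xlabel and ylabel arguments (GH 49793)
Series.drop_duplicates() has gained an ignore_index keyword to reset the index (GH 48304)
Series.dropna() and DataFrame.dropna() have gained an ignore_index keyword to reset the index (GH 31725)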
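Improved the error message in to_datetime() for non-ISO8601 formats, informing users about the position of the first error (GH 50361)
Improved the error message when trying to align DataFrame objects (for example, in DataFrame.compare()) to clarify that "identically labelled" refers to both index and columns (GH 50083)
Added support for Index.min() and Index.max() for pyarrow string dtypes (GH 51397)
Added DatetimeIndex.as_unit() and TimedeltaIndex.as_unit() to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (GH 50616)
Added Series.dt.unit() and Series.dt.as_unit() to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (GH 51223)
Added new argument dtype to read_sql() to be consistent with read_sql_query() (GH 50797)
read_csv(), read_table(), read_fwf() and read_excel() now accept date_format (GH 50601)
to_datetime() now accepts "ISO8601" as an argument to format, which will match any ISO8601 string (but possibly not identically-formatted) (GH 50411)
to_datetime() now accepts "mixed" as an argument to format, which will infer the format for each element individually (GH 50972)
Added new argument engine to read_json() to support parsing JSON with pyarrow by specifying engine="pyarrow" (GH 48893)
Added support for SQLAlchemy 2.0 (GH 40686)
Added support for the decimal parameter in read_csv() when engine="pyarrow" (GH 51302)
Index set operations Index.union(), Index.intersection(), Index.difference(), and Index.symmetric_difference() now support sort=True, which will always return a sorted result, unlike the default sort=None which does not sort in some cases (GH 25151)
Added new escape mode "latex-math" to avoid escaping "$" in formatter (GH 50040)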
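A few of the enhancements above can be tried in a short sketch. The data and variable names are invented for illustration; only keywords named in the list are used.

```python
import pandas as pd

ser = pd.Series([1, 1, 2])
deduped = ser.drop_duplicates(ignore_index=True)    # index reset to 0..n-1

df = pd.DataFrame({"x": [1, 2]}, index=["r1", "r2"])
suffixed = df.add_suffix("_new", axis="index")       # rename row labels, not columns
records = df.to_dict(orient="split", index=False)    # omit the index entry

seconds = pd.date_range("2020-01-01", periods=2, unit="s")       # second resolution
union_sorted = pd.Index([3, 1]).union(pd.Index([2]), sort=True)  # always sorted

print(deduped.index.tolist(), suffixed.index.tolist(), records, seconds.dtype)
```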

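Notable bug fixes#

These are bug fixes that might have notable behavior changes.

`DataFrameGroupBy.cumsum()` and `DataFrameGroupBy.cumprod()` overflow instead of lossy casting to float#

In previous versions we cast to float when applying cumsum and cumprod, which led to incorrect results even when the result could be held by the int64 dtype. Additionally, the aggregation now overflows when the limit of int64 is reached, consistent with numpy and the regular DataFrame.cumprod() and DataFrame.cumsum() methods (GH 37493).

Old Behavior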

In [1]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
In [2]: df.groupby("key")["value"].cumprod()[5]
Out[2]: 5.960464477539062e+16

We returned an incorrect result for the 6th value.

New Behavior

In [19]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})

In [20]: df.groupby("key")["value"].cumprod()
Out[20]: 
0                   625
1                390625
2             244140625
3          152587890625
4        95367431640625
5     59604644775390625
6    359414837200037393
Name: value, dtype: int64

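We overflow with the 7th value, but the 6th value is still correct.

`DataFrameGroupBy.nth()` and `SeriesGroupBy.nth()` now behave as filtrations#

In previous versions of pandas, DataFrameGroupBy.nth() and SeriesGroupBy.nth() acted as if they were aggregations. However, for most inputs n, they may return either zero or multiple rows per group. This means that they are filtrations, similar to e.g. DataFrameGroupBy.head(). pandas now treats them as filtrations (GH 13666).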

In [21]: df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [np.nan, 2.0, 3.0, 4.0, 5.0]})

In [22]: gb = df.groupby("a")

Old Behavior

In [5]: gb.nth(n=1)
Out[5]:
   A    B
1  1  2.0
4  2  5.0

New Behavior

In [23]: gb.nth(n=1)
Out[23]: 
   a    b
1  1  2.0
4  2  5.0

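In particular, the index of the result is derived from the input by selecting the appropriate rows. Also, when n is larger than the group, no rows are returned instead of NaN.

Old Behavior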

In [5]: gb.nth(n=3, dropna="any")
Out[5]:
    B
A
1 NaN
2 NaN

New Behavior

In [24]: gb.nth(n=3, dropna="any")
Out[24]: 
Empty DataFrame
Columns: [a, b]
Index: []

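Backwards incompatible API changes#

Construction with datetime64 or timedelta64 dtype with unsupported resolution#

In past versions, when constructing a Series or DataFrame and passing a "datetime64" or "timedelta64" dtype with unsupported resolution (i.e. anything other than "ns"), pandas would silently replace the given dtype with its nanosecond analogue:

Previous behavior: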

In [5]: pd.Series(["2016-01-01"], dtype="datetime64[s]")
Out[5]:
0   2016-01-01
dtype: datetime64[ns]

In [6]: pd.Series(["2016-01-01"], dtype="datetime64[D]")
Out[6]:
0   2016-01-01
dtype: datetime64[ns]

In pandas 2.0 we support resolutions "s", "ms", "us", and "ns". When passing a supported dtype (e.g. "datetime64[s]"), the result now has exactly the requested dtype:

New behavior:

In [25]: pd.Series(["2016-01-01"], dtype="datetime64[s]")
Out[25]: 
0   2016-01-01
dtype: datetime64[s]

With an unsupported dtype, pandas now raises instead of silently swapping in a supported dtype:

New behavior:

In [26]: pd.Series(["2016-01-01"], dtype="datetime64[D]")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[26], line 1
----> 1 pd.Series(["2016-01-01"], dtype="datetime64[D]")

File /home/pandas/pandas/core/series.py:503, in Series.__init__(self, data, index, dtype, name, copy)
    501         data = data.copy()
    502 else:
--> 503     data = sanitize_array(data, index, dtype, copy)
    504     data = SingleBlockManager.from_array(data, index, refs=refs)
    506 NDFrame.__init__(self, data)

File /home/pandas/pandas/core/construction.py:645, in sanitize_array(data, index, dtype, copy, allow_2d)
    642     subarr = np.array([], dtype=np.float64)
    644 elif dtype is not None:
--> 645     subarr = _try_cast(data, dtype, copy)
    647 else:
    648     subarr = maybe_convert_platform(data)

File /home/pandas/pandas/core/construction.py:805, in _try_cast(arr, dtype, copy)
    800     return lib.ensure_string_array(arr, convert_na_value=False, copy=copy).reshape(
    801         shape
    802     )
    804 elif dtype.kind in "mM":
--> 805     return maybe_cast_to_datetime(arr, dtype)
    807 # GH#15832: Check if we are requesting a numeric dtype and
    808 # that we can convert the data to the requested dtype.
    809 elif dtype.kind in "iu":
    810     # this will raise if we have e.g. floats

File /home/pandas/pandas/core/dtypes/cast.py:1225, in maybe_cast_to_datetime(value, dtype)
   1221     raise TypeError("value must be listlike")
   1223 # TODO: _from_sequence would raise ValueError in cases where
   1224 #  _ensure_nanosecond_dtype raises TypeError
-> 1225 _ensure_nanosecond_dtype(dtype)
   1227 if lib.is_np_dtype(dtype, "m"):
   1228     res = TimedeltaArray._from_sequence(value, dtype=dtype)

File /home/pandas/pandas/core/dtypes/cast.py:1282, in _ensure_nanosecond_dtype(dtype)
   1279     raise ValueError(msg)
   1280 # TODO: ValueError or TypeError? existing test
   1281 #  test_constructor_generic_timestamp_bad_frequency expects TypeError
-> 1282 raise TypeError(
   1283     f"dtype={dtype} is not supported. Supported resolutions are 's', "
   1284     "'ms', 'us', and 'ns'"
   1285 )

TypeError: dtype=datetime64[D] is not supported. Supported resolutions are 's', 'ms', 'us', and 'ns'

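Value counts sets the resulting name to `count`#

In past versions, when running Series.value_counts(), the result would inherit the original object's name, and the result index would be nameless. This caused confusion when resetting the index, because the column names would not correspond to the column values. Now, the result name will be 'count' (or 'proportion' if normalize=True was passed), and the index will be named after the original object (GH 49497).

Previous behavior: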

In [8]: pd.Series(['quetzal', 'quetzal', 'elk'], name='animal').value_counts()
Out[8]:
quetzal    2
elk        1
Name: animal, dtype: int64

New behavior:

In [27]: pd.Series(['quetzal', 'quetzal', 'elk'], name='animal').value_counts()
Out[27]: 
animal
quetzal    2
elk        1
Name: count, dtype: int64

Likewise for other value_counts methods (for example, DataFrame.value_counts()).
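The effect on reset_index() can be seen in a short sketch (variable names are illustrative): the columns now line up with their contents.

```python
import pandas as pd

animals = pd.Series(["quetzal", "quetzal", "elk"], name="animal")

counts = animals.value_counts()                 # named "count", index named "animal"
shares = animals.value_counts(normalize=True)   # named "proportion"
table = counts.reset_index()                    # columns: animal, count

print(table)
```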

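Disallow astype conversion to non-supported datetime64/timedelta64 dtypes#

In previous versions, converting a Series or DataFrame from datetime64[ns] to a different datetime64[X] dtype would return with datetime64[ns] dtype instead of the requested dtype. In pandas 2.0, support is added for "datetime64[s]", "datetime64[ms]", and "datetime64[us]" dtypes, so converting to those dtypes gives exactly the requested dtype: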

Given the following Series:

In [28]: idx = pd.date_range("2016-01-01", periods=3)

In [29]: ser = pd.Series(idx)

Previous behavior:

In [4]: ser.astype("datetime64[s]")
Out[4]:
0   2016-01-01
1   2016-01-02
2   2016-01-03
dtype: datetime64[ns]

With the new behavior, we get exactly the requested dtype:

New behavior:

In [30]: ser.astype("datetime64[s]")
Out[30]: 
0   2016-01-01
1   2016-01-02
2   2016-01-03
dtype: datetime64[s]

For non-supported resolutions, e.g. "datetime64[D]", we raise instead of silently ignoring the requested dtype:

New behavior:

In [31]: ser.astype("datetime64[D]")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[31], line 1
----> 1 ser.astype("datetime64[D]")

File /home/pandas/pandas/core/generic.py:6401, in NDFrame.astype(self, dtype, copy, errors)
   6397     results = [ser.astype(dtype, errors=errors) for _, ser in self.items()]
   6399 else:
   6400     # else, only a single dtype is given
-> 6401     new_data = self._mgr.astype(dtype=dtype, errors=errors)
   6402     res = self._constructor_from_mgr(new_data, axes=new_data.axes)
   6403     return res.__finalize__(self, method="astype")

File /home/pandas/pandas/core/internals/managers.py:588, in BaseBlockManager.astype(self, dtype, errors)
    587 def astype(self, dtype, errors: str = "raise") -> Self:
--> 588     return self.apply("astype", dtype=dtype, errors=errors)

File /home/pandas/pandas/core/internals/managers.py:438, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    436         applied = b.apply(f, **kwargs)
    437     else:
--> 438         applied = getattr(b, f)(**kwargs)
    439     result_blocks = extend_blocks(applied, result_blocks)
    441 out = type(self).from_blocks(result_blocks, self.axes)

File /home/pandas/pandas/core/internals/blocks.py:609, in Block.astype(self, dtype, errors, squeeze)
    606         raise ValueError("Can not squeeze with more than one column.")
    607     values = values[0, :]  # type: ignore[call-overload]
--> 609 new_values = astype_array_safe(values, dtype, errors=errors)
    611 new_values = maybe_coerce_values(new_values)
    613 refs = None

File /home/pandas/pandas/core/dtypes/astype.py:234, in astype_array_safe(values, dtype, copy, errors)
    231     dtype = dtype.numpy_dtype
    233 try:
--> 234     new_values = astype_array(values, dtype, copy=copy)
    235 except (ValueError, TypeError):
    236     # e.g. _astype_nansafe can fail on object-dtype of strings
    237     #  trying to convert to float
    238     if errors == "ignore":

File /home/pandas/pandas/core/dtypes/astype.py:176, in astype_array(values, dtype, copy)
    172     return values
    174 if not isinstance(values, np.ndarray):
    175     # i.e. ExtensionArray
--> 176     values = values.astype(dtype, copy=copy)
    178 else:
    179     values = _astype_nansafe(values, dtype, copy=copy)

File /home/pandas/pandas/core/arrays/datetimes.py:754, in DatetimeArray.astype(self, dtype, copy)
    752 elif isinstance(dtype, PeriodDtype):
    753     return self.to_period(freq=dtype.freq)
--> 754 return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)

File /home/pandas/pandas/core/arrays/datetimelike.py:495, in DatetimeLikeArrayMixin.astype(self, dtype, copy)
    491 elif (dtype.kind in "mM" and self.dtype != dtype) or dtype.kind == "f":
    492     # disallow conversion between datetime/timedelta,
    493     # and conversions for any datetimelike to float
    494     msg = f"Cannot cast {type(self).__name__} to dtype {dtype}"
--> 495     raise TypeError(msg)
    496 else:
    497     return np.asarray(self, dtype=dtype)

TypeError: Cannot cast DatetimeArray to dtype datetime64[D]

For conversion from timedelta64[ns] dtypes, the old behavior converted to a floating point format.

Given the following Series:

In [32]: idx = pd.timedelta_range("1 Day", periods=3)

In [33]: ser = pd.Series(idx)

Previous behavior:

In [7]: ser.astype("timedelta64[s]")
Out[7]:
0     86400.0
1    172800.0
2    259200.0
dtype: float64

In [8]: ser.astype("timedelta64[D]")
Out[8]:
0    1.0
1    2.0
2    3.0
dtype: float64

The new behavior, as for datetime64, either gives exactly the requested dtype or raises:

New behavior:

In [34]: ser.astype("timedelta64[s]")
Out[34]: 
0   1 days
1   2 days
2   3 days
dtype: timedelta64[s]

In [35]: ser.astype("timedelta64[D]")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[35], line 1
----> 1 ser.astype("timedelta64[D]")

File /home/pandas/pandas/core/generic.py:6401, in NDFrame.astype(self, dtype, copy, errors)
   6397     results = [ser.astype(dtype, errors=errors) for _, ser in self.items()]
   6399 else:
   6400     # else, only a single dtype is given
-> 6401     new_data = self._mgr.astype(dtype=dtype, errors=errors)
   6402     res = self._constructor_from_mgr(new_data, axes=new_data.axes)
   6403     return res.__finalize__(self, method="astype")

File /home/pandas/pandas/core/internals/managers.py:588, in BaseBlockManager.astype(self, dtype, errors)
    587 def astype(self, dtype, errors: str = "raise") -> Self:
--> 588     return self.apply("astype", dtype=dtype, errors=errors)

File /home/pandas/pandas/core/internals/managers.py:438, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    436         applied = b.apply(f, **kwargs)
    437     else:
--> 438         applied = getattr(b, f)(**kwargs)
    439     result_blocks = extend_blocks(applied, result_blocks)
    441 out = type(self).from_blocks(result_blocks, self.axes)

File /home/pandas/pandas/core/internals/blocks.py:609, in Block.astype(self, dtype, errors, squeeze)
    606         raise ValueError("Can not squeeze with more than one column.")
    607     values = values[0, :]  # type: ignore[call-overload]
--> 609 new_values = astype_array_safe(values, dtype, errors=errors)
    611 new_values = maybe_coerce_values(new_values)
    613 refs = None

File /home/pandas/pandas/core/dtypes/astype.py:234, in astype_array_safe(values, dtype, copy, errors)
    231     dtype = dtype.numpy_dtype
    233 try:
--> 234     new_values = astype_array(values, dtype, copy=copy)
    235 except (ValueError, TypeError):
    236     # e.g. _astype_nansafe can fail on object-dtype of strings
    237     #  trying to convert to float
    238     if errors == "ignore":

File /home/pandas/pandas/core/dtypes/astype.py:176, in astype_array(values, dtype, copy)
    172     return values
    174 if not isinstance(values, np.ndarray):
    175     # i.e. ExtensionArray
--> 176     values = values.astype(dtype, copy=copy)
    178 else:
    179     values = _astype_nansafe(values, dtype, copy=copy)

File /home/pandas/pandas/core/arrays/timedeltas.py:356, in TimedeltaArray.astype(self, dtype, copy)
    352         return type(self)._simple_new(
    353             res_values, dtype=res_values.dtype, freq=self.freq
    354         )
    355     else:
--> 356         raise ValueError(
    357             f"Cannot convert from {self.dtype} to {dtype}. "
    358             "Supported resolutions are 's', 'ms', 'us', 'ns'"
    359         )
    361 return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy=copy)

ValueError: Cannot convert from timedelta64[ns] to timedelta64[D]. Supported resolutions are 's', 'ms', 'us', 'ns'
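Code that relied on the old float result can divide by a Timedelta instead. This is a suggestion rather than part of the original notes, and the variable names are illustrative.

```python
import pandas as pd

ser = pd.Series(pd.timedelta_range("1 Day", periods=3))

# Dividing by a Timedelta gives the count of that unit as float64.
as_seconds = ser / pd.Timedelta(seconds=1)
as_days = ser / pd.Timedelta(days=1)

print(as_seconds.tolist(), as_days.tolist())
```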

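UTC and fixed-offset timezones default to standard-library tzinfo objects#

In previous versions, the default tzinfo object used to represent UTC was pytz.UTC. In pandas 2.0, we default to datetime.timezone.utc instead. Similarly, for timezones that represent fixed UTC offsets, we use datetime.timezone objects instead of pytz.FixedOffset objects. See (GH 34916)

Previous behavior: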

In [2]: ts = pd.Timestamp("2016-01-01", tz="UTC")
In [3]: type(ts.tzinfo)
Out[3]: pytz.UTC

In [4]: ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")
In [5]: type(ts2.tzinfo)
Out[5]: pytz._FixedOffset

New behavior:

In [36]: ts = pd.Timestamp("2016-01-01", tz="UTC")

In [37]: type(ts.tzinfo)
Out[37]: datetime.timezone

In [38]: ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")

In [39]: type(ts2.tzinfo)
Out[39]: datetime.timezone

For timezones that are neither UTC nor fixed offsets, e.g. "US/Pacific", we continue to default to pytz objects.
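Code that compared tzinfo objects against pytz may need adjusting. A minimal sketch of checks that rely only on the standard library (variable names are illustrative):

```python
import datetime

import pandas as pd

ts_utc = pd.Timestamp("2016-01-01", tz="UTC")
ts_fixed = pd.Timestamp("2016-01-01 04:05:06-07:00")

# UTC is now the standard-library singleton.
is_stdlib_utc = ts_utc.tzinfo is datetime.timezone.utc

# Fixed offsets are datetime.timezone instances carrying the offset.
offset = ts_fixed.utcoffset()

print(is_stdlib_utc, offset)
```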

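Empty DataFrames/Series will now default to have a `RangeIndex`#

Before, constructing an empty (where data is None or an empty list-like argument) Series or DataFrame without specifying the axes (index=None, columns=None) would return the axes as an empty Index with object dtype.

Now, the axes return an empty RangeIndex (GH 49572).

Previous behavior: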

In [8]: pd.Series().index
Out[8]:
Index([], dtype='object')

In [9]: pd.DataFrame().axes
Out[9]:
[Index([], dtype='object'), Index([], dtype='object')]

New behavior:

In [40]: pd.Series().index
Out[40]: RangeIndex(start=0, stop=0, step=1)

In [41]: pd.DataFrame().axes
Out[41]: [RangeIndex(start=0, stop=0, step=1), RangeIndex(start=0, stop=0, step=1)]

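DataFrame to LaTeX has a new render engine#

The existing DataFrame.to_latex() has been restructured to utilise the extended implementation previously available under Styler.to_latex(). The argument signature is similar, although col_space has been removed since it is ignored by LaTeX engines. This render engine also requires jinja2 to be installed, since rendering is based on jinja2 templates.

The pandas latex options below are no longer used and have been removed. The generic max rows and columns arguments remain, but for this functionality should be replaced by the Styler equivalents. The alternative options giving similar functionality are indicated below:

display.latex.escape: replaced with styler.format.escape,
display.latex.longtable: replaced with styler.latex.environment,
display.latex.multicolumn, display.latex.multicolumn_format and display.latex.multirow: replaced with styler.sparse.rows, styler.sparse.columns, styler.latex.multirow_align and styler.latex.multicol_align,
display.latex.repr: replaced with styler.render.repr,
display.max_rows and display.max_columns: replaced with styler.render.max_rows, styler.render.max_columns and styler.render.max_elements.

Note that due to this change some defaults have also changed:

multirow now defaults to True.
multirow_align defaults to "r" instead of "l".
multicol_align defaults to "r" instead of "l".
escape now defaults to False.

Note that the behaviour of _repr_latex_ has also changed. Previously, setting display.latex.repr would generate LaTeX only when using nbconvert for a JupyterNotebook, and not when the user was running the notebook. Now the styler.render.repr option allows control of the specific output within JupyterNotebooks for operations (not just on nbconvert). See GH 39911.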

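Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package	Minimum Version	Required	Changed
mypy (dev)	1.0		X
pytest (dev)	7.0.0		X
pytest-xdist (dev)	2.2.0		X
hypothesis (dev)	6.34.2		X
python-dateutil	2.8.2	X	X
tzdata	2022.1	X	X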

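For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed
pyarrow	7.0.0	X
matplotlib	3.6.1	X
fastparquet	0.6.3	X
xarray	0.21.0	X

See Dependencies and Optional dependencies for more.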

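Datetimes are now parsed with a consistent format#

In the past, to_datetime() guessed the format for each element independently. This was appropriate for some cases where elements had mixed date formats. However, it would regularly cause problems when users expected a consistent format but the function switched formats between elements. As of version 2.0.0, parsing will use a consistent format, determined by the first non-NA value (unless the user specifies a format, in which case that is used).

Old behavior: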

In [1]: ser = pd.Series(['13-01-2000', '12-01-2000'])
In [2]: pd.to_datetime(ser)
Out[2]:
0   2000-01-13
1   2000-12-01
dtype: datetime64[ns]

New behavior:

In [42]: ser = pd.Series(['13-01-2000', '12-01-2000'])

In [43]: pd.to_datetime(ser)
Out[43]: 
0   2000-01-13
1   2000-01-12
dtype: datetime64[s]

Note that this affects read_csv() as well.

If you still need to parse dates with inconsistent formats, you can use format='mixed' (possibly alongside dayfirst)

ser = pd.Series(['13-01-2000', '12 January 2000'])
pd.to_datetime(ser, format='mixed', dayfirst=True)

or, if your formats are all ISO8601 (but possibly not identically-formatted)

ser = pd.Series(['2020-01-01', '2020-01-01 03:00'])
pd.to_datetime(ser, format='ISO8601')
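When the layout of the strings is known up front, passing it explicitly avoids inference altogether, as noted above. A minimal sketch, assuming day-first strings as in the earlier example:

```python
import pandas as pd

ser = pd.Series(["13-01-2000", "12-01-2000"])

# An explicit format is applied to every element; nothing is guessed.
parsed = pd.to_datetime(ser, format="%d-%m-%Y")

print(parsed.dt.day.tolist(), parsed.dt.month.tolist())
```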

Other API changes#

The tz, nanosecond, and unit keywords in the Timestamp constructor are now keyword-only (GH 45307, GH 32526)
Passing nanoseconds greater than 999 or less than 0 in Timestamp now raises a ValueError (GH 48538, GH 48255)
read_csv(): specifying an incorrect number of columns with index_col now raises ParserError instead of IndexError when using the c parser.
Default value of dtype in get_dummies() is changed to bool from uint8 (GH 45848)
DataFrame.astype(), Series.astype(), and DatetimeIndex.astype() casting datetime64 data to any of "datetime64[s]", "datetime64[ms]", "datetime64[us]" will return an object with the given resolution instead of coercing back to "datetime64[ns]" (GH 48928)
DataFrame.astype(), Series.astype(), and DatetimeIndex.astype() casting timedelta64 data to any of "timedelta64[s]", "timedelta64[ms]", "timedelta64[us]" will return an object with the given resolution instead of coercing to "float64" dtype (GH 48963)
DatetimeIndex.astype(), TimedeltaIndex.astype(), PeriodIndex.astype(), Series.astype(), DataFrame.astype() with datetime64, timedelta64 or PeriodDtype dtypes no longer allow converting to integer dtypes other than "int64"; do obj.astype('int64', copy=False).astype(dtype) instead (GH 49715)
Index.astype() now allows casting from float64 dtype to datetime-like dtypes, matching Series behavior (GH 49660)
Passing data with dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to the TimedeltaIndex, Series, or DataFrame constructors will now retain that dtype instead of casting to "timedelta64[ns]"; timedelta64 data with lower resolution will be cast to the lowest supported resolution "timedelta64[s]" (GH 49014)
Passing dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to the TimedeltaIndex, Series, or DataFrame constructors will now retain that dtype instead of casting to "timedelta64[ns]"; passing a dtype with lower resolution for Series or DataFrame will be cast to the lowest supported resolution "timedelta64[s]" (GH 49014)
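Passing a np.datetime64 object with non-nanosecond resolution to Timestamp will retain the input resolution if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (GH 49008)
Passing datetime64 values with resolution other than nanosecond to to_datetime() will retain the input resolution if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (GH 50369)
Passing integer values and a non-nanosecond datetime64 dtype (e.g. "datetime64[s]") to DataFrame, Series, or Index will treat the values as multiples of the dtype's unit, matching the behavior of e.g. Series(np.array(values, dtype="M8[s]")) (GH 51092)
Passing a string in ISO-8601 format to Timestamp will retain the resolution of the parsed input if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (GH 49737)
The other argument in DataFrame.mask() and Series.mask() now defaults to no_default instead of np.nan, consistent with DataFrame.where() and Series.where(). Entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes). (GH 49111)
Changed behavior of Series.quantile() and DataFrame.quantile() with SparseDtype to retain sparse dtype (GH 49583)
When creating a Series with an object-dtype Index of datetime objects, pandas no longer silently converts the index to a DatetimeIndex (GH 39307, GH 23598)
pandas.testing.assert_index_equal() with parameter exact="equiv" now considers two indexes equal when both are either a RangeIndex or an Index with an int64 dtype. Previously it meant either a RangeIndex or an Int64Index (GH 51098)
Series.unique() with dtype "timedelta64[ns]" or "datetime64[ns]" now returns TimedeltaArray or DatetimeArray instead of numpy.ndarray (GH 49176)
to_datetime() and DatetimeIndex now allow sequences containing both datetime objects and numeric entries, matching Series behavior (GH 49037, GH 50453)
pandas.api.types.is_string_dtype() now only returns True for array-likes with dtype=object when the elements are inferred to be strings (GH 15585)
Passing a sequence containing datetime objects and date objects to the Series constructor will return object dtype instead of datetime64[ns] dtype, consistent with Index behavior (GH 49341)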
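Passing strings that cannot be parsed as datetimes to Series or DataFrame with dtype="datetime64[ns]" will raise instead of silently ignoring the keyword and returning object dtype (GH 24435)
Passing a sequence containing a type that cannot be converted to Timedelta to to_timedelta(), to the Series or DataFrame constructor with dtype="timedelta64[ns]", or to TimedeltaIndex now raises TypeError instead of ValueError (GH 49525)
Changed behavior of the Index constructor with a sequence containing at least one NaT and everything else either None or NaN to infer datetime64[ns] dtype instead of object, matching Series behavior (GH 49340)
read_stata() with parameter index_col set to None (the default) will now set the index on the returned DataFrame to a RangeIndex instead of an Int64Index (GH 49745)
Changed behavior of Index, Series, and DataFrame arithmetic methods when working with object dtypes: the results no longer do type inference on the result of the array operations; use result.infer_objects(copy=False) to do type inference on the result (GH 49999, GH 49714)
Changed behavior of the Index constructor with an object-dtype numpy.ndarray containing all-bool values or all-complex values: this will now retain object dtype, consistent with the Series behavior (GH 49594)
Changed behavior of Series.astype() from object dtype containing bytes objects to string dtypes; this now does val.decode() on bytes objects instead of str(val), matching Index.astype() behavior (GH 45326)
Added "None" to default na_values in read_csv() (GH 50286)
Changed behavior of the Series and DataFrame constructors when given an integer dtype and floating-point data that is not round numbers: this now raises ValueError instead of silently retaining the float dtype; do Series(data) or DataFrame(data) to get the old behavior, and Series(data).astype(dtype) or DataFrame(data).astype(dtype) to get the specified dtype (GH 49599)
Changed behavior of DataFrame.shift() with axis=1, an integer fill_value, and homogeneous datetime-like dtype: this now fills new columns with integer dtypes instead of casting to datetimelike (GH 49842)
Files are now closed when encountering an exception in read_json() (GH 49921)
Changed behavior of read_csv(), read_json() and read_fwf(): the index will now always be a RangeIndex when no index is specified. Previously the index would be an Index with dtype object if the new DataFrame/Series had length 0 (GH 49572)
DataFrame.values(), DataFrame.to_numpy(), DataFrame.xs(), DataFrame.reindex(), DataFrame.fillna(), and DataFrame.replace() no longer silently consolidate the underlying arrays; do df = df.copy() to ensure consolidation (GH 49356)
Creating a new DataFrame using a full slice on both axes with loc or iloc (thus, df.loc[:, :] or df.iloc[:, :]) now returns a new DataFrame (shallow copy) instead of the original DataFrame, consistent with other methods to get a full slice (for example df.loc[:] or df[:]) (GH 49469)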
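The Series and DataFrame constructors will now return a shallow copy (i.e. share data, but not attributes) when passed a Series and DataFrame, respectively, and with the default of copy=False (and if no other keyword triggers a copy). Previously, the new Series or DataFrame would share the index attribute (e.g. df.index = ... would also update the index of the parent or child) (GH 49523)
Disallow computing cumprod for Timedelta objects; previously this returned incorrect values (GH 50246)
DataFrame objects read from a HDFStore file without an index now have a RangeIndex instead of an int64 index (GH 51076)
Instantiating an Index with a numeric numpy dtype with data containing NA and/or NaT now raises a ValueError. Previously a TypeError was raised (GH 51050)
Loading a JSON file with duplicate columns using read_json(orient='split') renames columns to avoid duplicates, as read_csv() and the other readers do (GH 50370)
The levels of the index of the Series returned from Series.sparse.from_coo now always have dtype int32. Previously they had dtype int64 (GH 50926)
to_datetime() with unit of either "Y" or "M" will now raise if a sequence contains a non-round float value, matching the Timestamp behavior (GH 50301)
The methods Series.round(), DataFrame.__invert__(), Series.__invert__(), DataFrame.swapaxes(), DataFrame.first(), DataFrame.last(), Series.first(), Series.last() and DataFrame.align() will now always return new objects (GH 51032)
DataFrame and DataFrameGroupBy aggregations (e.g. "sum") with object-dtype columns no longer infer non-object dtypes for their results; explicitly call result.infer_objects(copy=False) on the result to obtain the old behavior (GH 51205, GH 49603)
Division by zero with ArrowDtype dtypes returns -inf, nan, or inf depending on the numerator, instead of raising (GH 51541)
Added pandas.api.types.is_any_real_numeric_dtype() to check for real numeric dtypes (GH 51152)
value_counts() now returns data with ArrowDtype with pyarrow.int64 type instead of "Int64" type (GH 51462)
factorize() and unique() preserve the original dtype when passed numpy timedelta64 or datetime64 with non-nanosecond resolution (GH 48670)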
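A few of the API changes above, sketched with made-up data. The variable names are illustrative, and the errors are caught only so that the snippet runs to completion.

```python
import pandas as pd

# get_dummies() now produces bool columns by default.
dummies = pd.get_dummies(pd.Series(["a", "b", "a"]))

# An integer dtype with non-round float data now raises ValueError.
try:
    pd.Series([1.5, 2.0], dtype="int64")
    int_cast_error = None
except ValueError as exc:
    int_cast_error = exc

# nanosecond must lie between 0 and 999.
try:
    pd.Timestamp(2020, 1, 1, nanosecond=1000)
    nanosecond_error = None
except ValueError as exc:
    nanosecond_error = exc

# A full slice on both axes returns a new object, not the original.
df = pd.DataFrame({"x": [1, 2]})
full_slice = df.loc[:, :]

print(dummies.dtypes.tolist(), int_cast_error, nanosecond_error, full_slice is df)
```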

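Note

A current PDEP proposes the deprecation and removal of the keywords inplace and copy for all but a small subset of methods from the pandas API. The current discussion takes place here. The keywords won't be necessary anymore in the context of Copy-on-Write. If this proposal is accepted, both keywords would be deprecated in the next release of pandas and removed in pandas 3.0.

Deprecations#

Deprecated parsing datetime strings with system-local timezone to tzlocal; pass a tz keyword or explicitly call tz_localize instead (GH 50791)
Deprecated argument infer_datetime_format in to_datetime() and read_csv(), as a strict version of it is now the default (GH 48621)
Deprecated behavior of to_datetime() with unit when parsing strings; in a future version these will be parsed as datetimes (matching unit-less behavior) instead of cast to floats. To retain the old behavior, cast strings to numeric types before calling to_datetime() (GH 50735)
Deprecated pandas.io.sql.execute() (GH 50185)
Index.is_boolean() has been deprecated. Use pandas.api.types.is_bool_dtype() instead (GH 50042)
Index.is_integer() has been deprecated. Use pandas.api.types.is_integer_dtype() instead (GH 50042)
Index.is_floating() has been deprecated. Use pandas.api.types.is_float_dtype() instead (GH 50042)
Index.holds_integer() has been deprecated. Use pandas.api.types.infer_dtype() instead (GH 50243)
Index.is_numeric() has been deprecated. Use pandas.api.types.is_any_real_numeric_dtype() instead (GH 50042, GH 51152)
Index.is_categorical() has been deprecated. Use pandas.api.types.is_categorical_dtype() instead (GH 50042)
Index.is_object() has been deprecated. Use pandas.api.types.is_object_dtype() instead (GH 50042)
Index.is_interval() has been deprecated. Use pandas.api.types.is_interval_dtype() instead (GH 50042)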
弃用的参数 date_parser 在 read_csv(), read_table(), read_fwf(), 和 read_excel() 中，改为使用 date_format (GH 50601)
已弃用 datetime64 和 DatetimeTZDtype 数据类型的 all 和 any 归约，请使用例如 (obj != pd.Timestamp(0), tz=obj.tz).all() 代替 (GH 34479)
Deprecated unused arguments *args and **kwargs in Resampler (GH 50977)
Deprecated calling float or int on a single-element Series to return a float or int respectively. Extract the element before calling float or int instead (GH 51101)
Deprecated Grouper.groups(), use Groupby.groups() instead (GH 51182)
Deprecated Grouper.grouper(), use Groupby.grouper() instead (GH 51182)
Deprecated Grouper.obj(), use Groupby.obj() instead (GH 51206)
Deprecated Grouper.indexer(), use Resampler.indexer() instead (GH 51206)
Deprecated Grouper.ax(), use Resampler.ax() instead (GH 51206)
Deprecated keyword use_nullable_dtypes in read_parquet(), use dtype_backend instead (GH 51853)
Deprecated Series.pad() in favor of Series.ffill() (GH 33396)
Deprecated Series.backfill() in favor of Series.bfill() (GH 33396)
Deprecated DataFrame.pad() in favor of DataFrame.ffill() (GH 33396)
Deprecated DataFrame.backfill() in favor of DataFrame.bfill() (GH 33396)
Deprecated close(). Use StataReader as a context manager instead (GH 49228)
Deprecated producing a scalar when iterating over a DataFrameGroupBy or a SeriesGroupBy that has been grouped by a level parameter that is a list of length 1; a tuple of length one will be returned instead (GH 51583)
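
The suggested replacements for several of the deprecations above can be sketched as follows. This is a minimal illustration, not an exhaustive migration guide; the variable names are made up for the example.

```python
import pandas as pd
from pandas.api import types as ptypes

idx = pd.Index([1, 2, 3])

# Instead of idx.is_integer(), idx.is_numeric(), idx.is_boolean():
is_int = ptypes.is_integer_dtype(idx)
is_real = ptypes.is_any_real_numeric_dtype(idx)
is_bool = ptypes.is_bool_dtype(idx)

# Instead of Series.pad(), use Series.ffill().
filled = pd.Series([1.0, None, 3.0]).ffill()

# Instead of float(ser) on a single-element Series, extract the element first.
single = pd.Series([2.5])
value = float(single.iloc[0])

# Instead of all()/any() on datetime data, compare against the epoch explicitly.
dti = pd.DatetimeIndex(["1970-01-01", "2020-01-01"], tz="UTC")
all_nonzero = bool((dti != pd.Timestamp(0, tz=dti.tz)).all())

print(is_int, is_real, is_bool, filled.tolist(), value, all_nonzero)
```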

Removal of prior version deprecations/changes#

Removed Int64Index, UInt64Index and Float64Index. See the section above on Index holding numpy numeric dtypes for more information (GH 42717)
Removed deprecated Timestamp.freq, Timestamp.freqstr and argument freq from the Timestamp constructor and Timestamp.fromordinal() (GH 14146)
Removed deprecated CategoricalBlock, Block.is_categorical(), require datetime64 and timedelta64 values to be wrapped in DatetimeArray or TimedeltaArray before passing to Block.make_block_same_class(), require DatetimeTZBlock.values to have the correct ndim when passing to the BlockManager constructor, and removed the "fastpath" keyword from the SingleBlockManager constructor (GH 40226, GH 40571)
Removed deprecated global option use_inf_as_null in favor of use_inf_as_na (GH 17126)
Removed deprecated module pandas.core.index (GH 30193)
Removed deprecated alias pandas.core.tools.datetimes.to_time, import the function directly from pandas.core.tools.times instead (GH 34145)
Removed deprecated alias pandas.io.json.json_normalize, import the function directly from pandas.json_normalize instead (GH 27615)
Removed deprecated Categorical.to_dense(), use np.asarray(cat) instead (GH 32639)
Removed deprecated Categorical.take_nd() (GH 27745)
Removed deprecated Categorical.mode(), use Series(cat).mode() instead (GH 45033)
Removed deprecated Categorical.is_dtype_equal() and CategoricalIndex.is_dtype_equal() (GH 37545)
Removed deprecated CategoricalIndex.take_nd() (GH 30702)
Removed deprecated Index.is_type_compatible() (GH 42113)
Removed deprecated Index.is_mixed(), check index.inferred_type directly instead (GH 32922)
Removed deprecated pandas.api.types.is_categorical(); use pandas.api.types.is_categorical_dtype() instead (GH 33385)
Removed deprecated Index.asi8() (GH 37877)
Enforced deprecation changing behavior when passing datetime64[ns] dtype data and a timezone-aware dtype to Series, interpreting the values as wall-times instead of UTC times, matching DatetimeIndex behavior (GH 41662)
Enforced deprecation changing behavior when applying a numpy ufunc on multiple DataFrame objects that are not aligned on the index or columns; the inputs are now aligned first (GH 39239)
Removed deprecated DataFrame._AXIS_NUMBERS(), DataFrame._AXIS_NAMES(), Series._AXIS_NUMBERS(), Series._AXIS_NAMES() (GH 33637)
Removed deprecated Index.to_native_types(), use obj.astype(str) instead (GH 36418)
Removed deprecated Series.iteritems(), DataFrame.iteritems(), use obj.items instead (GH 45321)
Removed deprecated DataFrame.lookup() (GH 35224)
Removed deprecated Series.append(), DataFrame.append(), use concat() instead (GH 35407)
Removed deprecated Series.iteritems(), DataFrame.iteritems() and HDFStore.iteritems(), use obj.items instead (GH 45321)
Removed deprecated DatetimeIndex.union_many() (GH 45018)
Removed deprecated weekofyear and week attributes of DatetimeArray, DatetimeIndex and the dt accessor in favor of isocalendar().week (GH 33595)
Removed deprecated RangeIndex._start(), RangeIndex._stop(), RangeIndex._step(), use start, stop, step instead (GH 30482)
Removed deprecated DatetimeIndex.to_perioddelta(), use dtindex - dtindex.to_period(freq).to_timestamp() instead (GH 34853)
Removed deprecated Styler.hide_index() and Styler.hide_columns() (GH 49397)
Removed deprecated Styler.set_na_rep() and Styler.set_precision() (GH 49397)
Removed deprecated Styler.where() (GH 49397)
Removed deprecated Styler.render() (GH 49397)
Removed deprecated argument col_space in DataFrame.to_latex() (GH 47970)
Removed deprecated argument null_color in Styler.highlight_null() (GH 49397)
Removed deprecated argument check_less_precise in testing.assert_frame_equal(), testing.assert_extension_array_equal(), testing.assert_series_equal(), testing.assert_index_equal() (GH 30562)
Removed deprecated null_counts argument in DataFrame.info(). Use show_counts instead (GH 37999)
Removed deprecated Index.is_monotonic() and Series.is_monotonic(); use obj.is_monotonic_increasing instead (GH 45422)
Removed deprecated Index.is_all_dates() (GH 36697)
Enforced deprecation disallowing passing a timezone-aware Timestamp and dtype="datetime64[ns]" to Series or DataFrame constructors (GH 41555)
Enforced deprecation disallowing passing a sequence of timezone-aware values and dtype="datetime64[ns]" to Series or DataFrame constructors (GH 41555)
Enforced deprecation disallowing numpy.ma.mrecords.MaskedRecords in the DataFrame constructor; pass {name: data[name] for name in data.dtype.names} instead (GH 40363)
Enforced deprecation disallowing unit-less "datetime64" dtype in Series.astype() and DataFrame.astype() (GH 47844)
Enforced deprecation disallowing using .astype to convert a datetime64[ns] Series, DataFrame, or DatetimeIndex to a timezone-aware dtype, use obj.tz_localize or ser.dt.tz_localize instead (GH 39258)
Enforced deprecation disallowing using .astype to convert a timezone-aware Series, DataFrame, or DatetimeIndex to timezone-naive datetime64[ns] dtype, use obj.tz_localize(None) or obj.tz_convert("UTC").tz_localize(None) instead (GH 39258)
Enforced deprecation disallowing passing a non-boolean argument to sort in concat() (GH 44629)
Removed date parser functions parse_date_time(), parse_date_fields(), parse_all_fields() and generic_parser() (GH 24518)
Removed argument index from the core.arrays.SparseArray constructor (GH 43523)
Removed argument squeeze from DataFrame.groupby() and Series.groupby() (GH 32380)
Removed deprecated apply, apply_index, __call__, onOffset, and isAnchored attributes from DateOffset (GH 34171)
Removed keep_tz argument in DatetimeIndex.to_series() (GH 29731)
Removed arguments names and dtype from Index.copy() and levels and codes from MultiIndex.copy() (GH 35853, GH 36685)
Removed argument inplace from MultiIndex.set_levels() and MultiIndex.set_codes() (GH 35626)
Removed arguments verbose and encoding from DataFrame.to_excel() and Series.to_excel() (GH 47912)
Removed argument line_terminator from DataFrame.to_csv() and Series.to_csv(), use lineterminator instead (GH 45302)
Removed argument inplace from DataFrame.set_axis() and Series.set_axis(), use obj = obj.set_axis(..., copy=False) instead (GH 48130)
Disallow passing positional arguments to MultiIndex.set_levels() and MultiIndex.set_codes() (GH 41485)
Disallow parsing to Timedelta strings with components with units "Y", "y", or "M", as these do not represent unambiguous durations (GH 36838)
Removed MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth() (GH 38701)
Removed argument how from PeriodIndex.astype(), use PeriodIndex.to_timestamp() instead (GH 37982)
Removed argument try_cast from DataFrame.mask(), DataFrame.where(), Series.mask() and Series.where() (GH 38836)
Removed argument tz from Period.to_timestamp(), use obj.to_timestamp(...).tz_localize(tz) instead (GH 34522)
Removed argument sort_columns in DataFrame.plot() and Series.plot() (GH 47563)
Removed argument is_copy from DataFrame.take() and Series.take() (GH 30615)
Removed argument kind from Index.get_slice_bound(), Index.slice_indexer() and Index.slice_locs() (GH 41378)
Removed arguments prefix, squeeze, error_bad_lines and warn_bad_lines from read_csv() (GH 40413, GH 43427)
Removed argument squeeze from read_excel() (GH 43427)
Removed argument datetime_is_numeric from DataFrame.describe() and Series.describe(), as datetime data will always be summarized as numeric data (GH 34798)
Disallow passing a list key to Series.xs() and DataFrame.xs(), pass a tuple instead (GH 41789)
Disallow subclass-specific keywords (e.g. "freq", "tz", "names", "closed") in the Index constructor (GH 38597)
Removed argument inplace from Categorical.remove_unused_categories() (GH 37918)
Disallow passing non-round floats to Timestamp with unit="M" or unit="Y" (GH 47266)
Removed keywords convert_float and mangle_dupe_cols from read_excel() (GH 41176)
Removed keyword mangle_dupe_cols from read_csv() and read_table() (GH 48137)
Removed errors keyword from DataFrame.where(), Series.where(), DataFrame.mask() and Series.mask() (GH 47728)
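Disallow passing non-keyword arguments to read_excel() except io and sheet_name (GH 34418)
Disallow passing non-keyword arguments to DataFrame.drop() and Series.drop() except labels (GH 41486)
Disallow passing non-keyword arguments to DataFrame.fillna() and Series.fillna() except value (GH 41485)
Disallow passing non-keyword arguments to StringMethods.split() and StringMethods.rsplit() except for pat (GH 47448)
Disallow passing non-keyword arguments to DataFrame.set_index() except keys (GH 41495)
Disallow passing non-keyword arguments to Resampler.interpolate() except method (GH 41699)
Disallow passing non-keyword arguments to DataFrame.reset_index() and Series.reset_index() except level (GH 41496)
Disallow passing non-keyword arguments to DataFrame.dropna() and Series.dropna() (GH 41504)
Disallow passing non-keyword arguments to ExtensionArray.argsort() (GH 46134)
Disallow passing non-keyword arguments to Categorical.sort_values() (GH 47618)
Disallow passing non-keyword arguments to Index.drop_duplicates() and Series.drop_duplicates() (GH 41485)
Disallow passing non-keyword arguments to DataFrame.drop_duplicates() except for subset (GH 41485)
Disallow passing non-keyword arguments to DataFrame.sort_index() and Series.sort_index() (GH 41506)
Disallow passing non-keyword arguments to DataFrame.interpolate() and Series.interpolate() except for method (GH 41510)
Disallow passing non-keyword arguments to DataFrame.any() and Series.any() (GH 44896)
Disallow passing non-keyword arguments to Index.set_names() except for names (GH 41551)
Disallow passing non-keyword arguments to Index.join() except for other (GH 46518)
Disallow passing non-keyword arguments to concat() except for objs (GH 41485)
Disallow passing non-keyword arguments to pivot() except for data (GH 48301)
Disallow passing non-keyword arguments to DataFrame.pivot() (GH 48301)
Disallow passing non-keyword arguments to read_html() except for io (GH 27573)
Disallow passing non-keyword arguments to read_json() except for path_or_buf (GH 27573)
Disallow passing non-keyword arguments to read_sas() except for filepath_or_buffer (GH 47154)
Disallow passing non-keyword arguments to read_stata() except for filepath_or_buffer (GH 48128)
Disallow passing non-keyword arguments to read_csv() except filepath_or_buffer (GH 41485)
Disallow passing non-keyword arguments to read_table() except filepath_or_buffer (GH 41485)
Disallow passing non-keyword arguments to read_fwf() except filepath_or_buffer (GH 44710)
Disallow passing non-keyword arguments to read_xml() except for path_or_buffer (GH 45133)
Disallow passing non-keyword arguments to Series.mask() and DataFrame.mask() except cond and other (GH 41580)
Disallow passing non-keyword arguments to DataFrame.to_stata() except for path (GH 48128)
Disallow passing non-keyword arguments to DataFrame.where() and Series.where() except for cond and other (GH 41523)
Disallow passing non-keyword arguments to Series.set_axis() and DataFrame.set_axis() except for labels (GH 41491)
Disallow passing non-keyword arguments to Series.rename_axis() and DataFrame.rename_axis() except for mapper (GH 47587)
Disallow passing non-keyword arguments to Series.clip() and DataFrame.clip() except lower and upper (GH 41511)
Disallow passing non-keyword arguments to Series.bfill(), Series.ffill(), DataFrame.bfill() and DataFrame.ffill() (GH 41508)
Disallow passing non-keyword arguments to DataFrame.replace() and Series.replace() except for to_replace and value (GH 47587)
Disallow passing non-keyword arguments to DataFrame.sort_values() except for by (GH 41505)
Disallow passing non-keyword arguments to Series.sort_values() (GH 41505)
Disallow passing non-keyword arguments to DataFrame.reindex() except for labels (GH 17966)
Disallow Index.reindex() with non-unique Index objects (GH 42568)
Disallowed constructing Categorical with scalar data (GH 38433)
Disallowed constructing CategoricalIndex without passing data (GH 38944)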
Removed Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate() (GH 43665)
Removed Rolling.win_type returning "freq" (GH 38963)
Removed Rolling.is_datetimelike (GH 38963)
Removed the level keyword in DataFrame and Series aggregations; use groupby instead (GH 39983)
Removed deprecated Timedelta.delta(), Timedelta.is_populated(), and Timedelta.freq (GH 46430, GH 46476)
Removed deprecated NaT.freq (GH 45071)
Removed deprecated Categorical.replace(), use Series.replace() instead (GH 44929)
Removed the numeric_only keyword from Categorical.min() and Categorical.max() in favor of skipna (GH 48821)
Changed behavior of DataFrame.median() and DataFrame.mean() with numeric_only=None to not exclude datetime-like columns; this note becomes irrelevant once the numeric_only=None deprecation is enforced (GH 29941)
Removed is_extension_type() in favor of is_extension_array_dtype() (GH 29457)
Removed .ExponentialMovingWindow.vol (GH 39220)
Removed Index.get_value() and Index.set_value() (GH 33907, GH 28621)
Removed Series.slice_shift() and DataFrame.slice_shift() (GH 37601)
Removed DataFrameGroupBy.pad() and DataFrameGroupBy.backfill() (GH 45076)
Removed numpy argument from read_json() (GH 30636)
Disallow passing abbreviations for orient in DataFrame.to_dict() (GH 32516)
Disallow partial slicing on a non-monotonic DatetimeIndex with keys that are not in the index. This now raises a KeyError (GH 18531)
Removed get_offset in favor of to_offset() (GH 30340)
Removed the warn keyword in infer_freq() (GH 45947)
Removed the include_start and include_end arguments in DataFrame.between_time() in favor of inclusive (GH 43248)
Removed the closed argument in date_range() and bdate_range() in favor of the inclusive argument (GH 40245)
Removed the center keyword in DataFrame.expanding() (GH 20647)
Removed the truediv keyword from eval() (GH 29812)
Removed the method and tolerance arguments in Index.get_loc(). Use index.get_indexer([label], method=..., tolerance=...) instead (GH 42269)
Removed the pandas.datetime submodule (GH 30489)
Removed the pandas.np submodule (GH 30296)
Removed pandas.util.testing in favor of pandas.testing (GH 30745)
Removed Series.str.__iter__() (GH 28277)
Removed pandas.SparseArray in favor of arrays.SparseArray (GH 30642)
Removed pandas.SparseSeries and pandas.SparseDataFrame, including pickle support (GH 30642)
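Enforced disallowing passing an integer fill_value to DataFrame.shift() and Series.shift() with datetime64, timedelta64, or period dtypes (GH 32591)
Enforced disallowing a string column label into times in DataFrame.ewm() (GH 43265)
Enforced disallowing passing True and False into inclusive in Series.between() in favor of "both" and "neither" respectively (GH 40628)
Enforced disallowing using usecols with out-of-bounds indices for read_csv with engine="c" (GH 25623)
Enforced disallowing the use of **kwargs in ExcelWriter; use the keyword argument engine_kwargs instead (GH 40430)
Enforced disallowing a tuple of column labels into DataFrameGroupBy.__getitem__() (GH 30546)
Enforced disallowing missing labels when indexing with a sequence of labels on a level of a MultiIndex. This now raises a KeyError (GH 42351)
Enforced disallowing setting values with .loc using a positional slice. Use .loc with labels or .iloc with positions instead (GH 31840)
Enforced disallowing positional indexing with a float key even if that key is a round number; manually cast to integer instead (GH 34193)
Enforced disallowing using a DataFrame indexer with .iloc, use .loc instead for automatic alignment (GH 39022)
Enforced disallowing set or dict indexers in __getitem__ and __setitem__ methods (GH 42825)
Enforced disallowing indexing on an Index or positional indexing on a Series producing multi-dimensional objects, e.g. obj[:, None]; convert to numpy before indexing instead (GH 35141)
Enforced disallowing dict or set objects in suffixes in merge() (GH 34810)
Enforced disallowing merge() to produce duplicated columns through the suffixes keyword and already existing columns (GH 22818)
Enforced disallowing using merge() or join() on a different number of levels (GH 34862)
Enforced disallowing the value_name argument in DataFrame.melt() to match an element in the DataFrame columns (GH 35003)
Enforced disallowing passing showindex into **kwargs in DataFrame.to_markdown() and Series.to_markdown() in favor of index (GH 33091)
Removed setting Categorical._codes directly (GH 41429)
Removed setting Categorical.categories directly (GH 47834)
Removed argument inplace from Categorical.add_categories(), Categorical.remove_categories(), Categorical.set_categories(), Categorical.rename_categories(), Categorical.reorder_categories(), Categorical.set_ordered(), Categorical.as_ordered(), Categorical.as_unordered() (GH 37981, GH 41118, GH 41133, GH 47834)
Enforced Rolling.count() with min_periods=None to default to the size of the window (GH 31302)
Renamed fname to path in DataFrame.to_parquet(), DataFrame.to_stata() and DataFrame.to_feather() (GH 30338)
Enforced disallowing indexing a Series with a single-item list containing a slice (e.g. ser[[slice(0, 2)]]). Either convert the list to a tuple, or pass the slice directly instead (GH 31333)
Changed behavior of indexing on a DataFrame with a DatetimeIndex index using a string indexer; previously this operated as a slice on rows, now it operates like any other column key; use frame.loc[key] for the old behavior (GH 36179)
Enforced the display.max_colwidth option to not accept negative integers (GH 31569)
Removed the display.column_space option in favor of df.to_string(col_space=...) (GH 47280)
Removed the deprecated method mad from pandas classes (GH 11787)
Removed the deprecated method tshift from pandas classes (GH 11631)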
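Changed behavior of empty data passed into Series; the default dtype will be object instead of float64 (GH 29405)
Changed the behavior of DatetimeIndex.union(), DatetimeIndex.intersection(), and DatetimeIndex.symmetric_difference() with mismatched timezones to convert to UTC instead of casting to object dtype (GH 39328)
Changed the behavior of to_datetime() with argument "now" with utc=False to match Timestamp("now") (GH 18705)
Changed the behavior of indexing on a timezone-aware DatetimeIndex with a timezone-naive datetime object or vice-versa; these now behave like any other non-comparable type by raising KeyError (GH 36148)
Changed the behavior of Index.reindex(), Series.reindex(), and DataFrame.reindex() with a datetime64 dtype and a datetime.date object for fill_value; these are no longer considered equivalent to datetime.datetime objects, so the reindex casts to object dtype (GH 39767)
Changed behavior of SparseArray.astype() when given a dtype that is not explicitly SparseDtype: it now casts to the exact requested dtype rather than silently using a SparseDtype instead (GH 34457)
Changed behavior of Index.ravel() to return a view on the original Index instead of a np.ndarray (GH 36900)
Changed behavior of Series.to_frame() and Index.to_frame() with explicit name=None to use None for the column name instead of the index's name or default 0 (GH 45523)
Changed behavior of concat() with one array of bool-dtype and another of integer dtype; this now returns object dtype instead of integer dtype. Explicitly cast the bool object to integer before concatenating to get the old behavior (GH 45101)
When given floating-point data and an integer dtype, the floating-point dtype is now retained when the data cannot be cast losslessly, matching Series behavior (GH 41170)
Changed behavior of the Index constructor when given a np.ndarray with object dtype containing numeric entries; this now retains object dtype rather than inferring a numeric dtype, consistent with Series behavior (GH 42870)
Changed behavior of Index.__and__(), Index.__or__() and Index.__xor__() to behave as logical operations (matching Series behavior) instead of aliases for set operations (GH 37374)
Changed behavior of the DataFrame constructor when passed a list whose first element is a Categorical; this now treats the elements as rows, casting to object dtype, consistent with behavior for other types (GH 38845)
Changed behavior of the DataFrame constructor when passed a dtype (other than int) that the data cannot be cast to; it now raises instead of silently ignoring the dtype (GH 41733)
Changed the behavior of the Series constructor; it will no longer infer a datetime64 or timedelta64 dtype from string entries (GH 41731)
Changed behavior of the Timestamp constructor with a np.datetime64 object and a tz passed, to interpret the input as a wall-time as opposed to a UTC time (GH 42288)
Changed behavior of Timestamp.utcfromtimestamp() to return a timezone-aware object satisfying Timestamp.utcfromtimestamp(val).timestamp() == val (GH 45083)
Changed behavior of the Index constructor when passed a SparseArray or SparseDtype, to retain that dtype instead of casting to numpy.ndarray (GH 43930)
Changed behavior of setitem-like operations (__setitem__, fillna, where, mask, replace, insert, fill_value for shift) on an object with DatetimeTZDtype when using a value with a non-matching timezone; the value will be cast to the object's timezone instead of casting both to object dtype (GH 44243)
Changed behavior of Index, Series, DataFrame constructors with floating-dtype data and a DatetimeTZDtype; the data are now interpreted as UTC times instead of wall-times, consistent with how integer-dtype data are treated (GH 45573)
Changed behavior of Series and DataFrame constructors with integer dtype and floating-point data containing NaN; this now raises IntCastingNaNError (GH 40110)
Changed behavior of Series and DataFrame constructors with an integer dtype and values that are too large to losslessly cast to this dtype; this now raises ValueError (GH 41734)
Changed behavior of Series and DataFrame constructors with an integer dtype and values having either datetime64 or timedelta64 dtypes; this now raises TypeError, use values.view("int64") instead (GH 41770)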
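Removed the deprecated base and loffset arguments from pandas.DataFrame.resample(), pandas.Series.resample() and pandas.Grouper. Use offset or origin instead (GH 31809)
Changed behavior of Series.fillna() and DataFrame.fillna() with timedelta64[ns] dtype and an incompatible fill_value; this now casts to object dtype instead of raising, consistent with the behavior with other dtypes (GH 45746)
Changed the default argument of regex for Series.str.replace() from True to False. Additionally, a single-character pat with regex=True is now treated as a regular expression instead of a string literal (GH 36695, GH 24804)
Changed behavior of DataFrame.any() and DataFrame.all() with bool_only=True; object-dtype columns with all-bool values will no longer be included, manually cast to bool dtype first (GH 46188)
Changed behavior of DataFrame.max(), DataFrame.min, DataFrame.mean, DataFrame.median, DataFrame.skew, DataFrame.kurt with axis=None to return a scalar applying the aggregation across both axes (GH 45072)
Changed behavior of comparison of a Timestamp with a datetime.date object; these now compare as un-equal and raise on inequality comparisons, matching the datetime.datetime behavior (GH 36131)
Changed behavior of comparison of NaT with a datetime.date object; these now raise on inequality comparisons (GH 39196)
Enforced deprecation of silently dropping columns that raised a TypeError in Series.transform and DataFrame.transform when used with a list or dictionary (GH 43740)
Changed behavior of DataFrame.apply() with list-like so that any partial failure will raise an error (GH 43740)
Changed behavior of DataFrame.to_latex() to now use the Styler implementation via Styler.to_latex() (GH 47970)
Changed behavior of Series.__setitem__() with an integer key and a Float64Index when the key is not present in the index; previously the key was treated as positional (behaving like series.iloc[key] = val), now it is treated as a label (behaving like series.loc[key] = val), consistent with Series.__getitem__() behavior (GH 33469)
Removed the na_sentinel argument from factorize(), Index.factorize(), and ExtensionArray.factorize() (GH 47157)
Changed behavior of Series.diff() and DataFrame.diff() with ExtensionDtype dtypes whose arrays do not implement diff; these now raise TypeError rather than casting to numpy (GH 31025)
Enforced deprecation of calling a numpy "ufunc" on a DataFrame with method="outer"; this now raises NotImplementedError (GH 36955)
Enforced deprecation disallowing passing numeric_only=True to Series reductions (rank, any, all, ...) with non-numeric dtype (GH 47500)
Changed behavior of DataFrameGroupBy.apply() and SeriesGroupBy.apply() so that group_keys is respected even if a transformer is detected (GH 34998)
Comparisons between a DataFrame and a Series where the frame's columns do not match the series's index raise ValueError instead of automatically aligning; do left, right = left.align(right, axis=1, copy=False) before comparing (GH 36795)
Enforced deprecation of numeric_only=None (the default) in DataFrame reductions that would silently drop columns that raised; numeric_only now defaults to False (GH 41480)
Changed the default of numeric_only to False in all DataFrame methods with that argument (GH 46096, GH 46906)
Changed the default of numeric_only to False in Series.rank() (GH 47561)
Enforced deprecation of silently dropping nuisance columns in groupby and resample operations when numeric_only=False (GH 41475)
Enforced deprecation of silently dropping nuisance columns in Rolling, Expanding, and ExponentialMovingWindow operations. This will now raise an errors.DataError (GH 42834)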
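Changed behavior in setting values with df.loc[:, foo] = bar or df.iloc[:, foo] = bar; these now always attempt to set values in place before falling back to casting (GH 45333)
Changed the default of numeric_only in various DataFrameGroupBy methods; all methods now default to numeric_only=False (GH 46072)
Changed the default of numeric_only to False in Resampler methods (GH 47177)
Using the method DataFrameGroupBy.transform() with a callable that returns DataFrames will align to the input's index (GH 47244)
When providing a list of columns of length one to DataFrame.groupby(), the keys that are returned by iterating over the resulting DataFrameGroupBy object will now be tuples of length one (GH 47761)
Removed deprecated methods ExcelWriter.write_cells(), ExcelWriter.save(), ExcelWriter.cur_sheet(), ExcelWriter.handles(), ExcelWriter.path() (GH 45795)
The ExcelWriter attribute book can no longer be set; it is still available to be accessed and mutated (GH 48943)
Removed unused *args and **kwargs in Rolling, Expanding, and ExponentialMovingWindow ops (GH 47851)
Removed the deprecated argument line_terminator from DataFrame.to_csv() (GH 45302)
Removed the deprecated argument label from lreshape() (GH 30219)
Arguments after expr in DataFrame.eval() and DataFrame.query() are keyword-only (GH 47587)
Removed Index._get_attributes_dict() (GH 50648)
Removed Series.__array_wrap__() (GH 50648)
Changed behavior of DataFrame.value_counts() to return a Series with MultiIndex for any list-like (one element or not) but an Index for a single label (GH 50829)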
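
Several of the removals above name a direct replacement. The following minimal sketch shows a few of them side by side; the data and variable names are illustrative only.

```python
import pandas as pd

# DataFrame.append() was removed: use concat() instead.
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3]})
combined = pd.concat([df1, df2], ignore_index=True)

# Series.iteritems() was removed: use items() instead.
pairs = list(pd.Series([10, 20], index=["x", "y"]).items())

# DatetimeIndex.week / weekofyear were removed: use isocalendar().week.
dti = pd.date_range("2023-01-02", periods=3, freq="D")
weeks = dti.isocalendar().week

# Index.get_loc(method=...) was removed: use get_indexer([label], method=...).
idx = pd.Index([10, 20, 30])
nearest = idx.get_indexer([19], method="nearest")

# Index.is_monotonic was removed: use is_monotonic_increasing.
monotonic = idx.is_monotonic_increasing

print(combined["a"].tolist(), pairs, weeks.tolist(), nearest, monotonic)
```

Note that get_indexer() returns an array of positions (with -1 for labels that are not found) rather than a single position, so callers migrating from get_loc() typically index the result with [0].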

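Performance improvements#

Performance improvement in DataFrameGroupBy.median() and SeriesGroupBy.median() and DataFrameGroupBy.cumprod() for nullable dtypes (GH 37493)
Performance improvement in DataFrameGroupBy.all(), DataFrameGroupBy.any(), SeriesGroupBy.all(), and SeriesGroupBy.any() for object dtype (GH 50623)
Performance improvement in MultiIndex.argsort() and MultiIndex.sort_values() (GH 48406)
Performance improvement in MultiIndex.size() (GH 48723)
Performance improvement in MultiIndex.union() without missing values and without duplicates (GH 48505, GH 48752)
Performance improvement in MultiIndex.difference() (GH 48606)
Performance improvement in MultiIndex set operations with sort=None (GH 49010)
Performance improvement in DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.var(), and SeriesGroupBy.var() for extension array dtypes (GH 37493)
Performance improvement in MultiIndex.isin() when level=None (GH 48622, GH 49577)
Performance improvement in MultiIndex.putmask() (GH 49830)
Performance improvement in Index.union() and MultiIndex.union() when the index contains duplicates (GH 48900)
Performance improvement in Series.rank() for pyarrow-backed dtypes (GH 50264)
Performance improvement in Series.searchsorted() for pyarrow-backed dtypes (GH 50447)
Performance improvement in Series.fillna() for extension array dtypes (GH 49722, GH 50078)
Performance improvement in Index.join(), Index.intersection() and Index.union() for masked and arrow dtypes when the Index is monotonic (GH 50310, GH 51365)
Performance improvement for Series.value_counts() with nullable dtype (GH 48338)
Performance improvement for the Series constructor passing an integer numpy array with nullable dtype (GH 48338)
Performance improvement for the DatetimeIndex constructor passing a list (GH 48609)
Performance improvement in merge() and DataFrame.join() when joining on a sorted MultiIndex (GH 48504)
Performance improvement in to_datetime() when parsing strings with timezone offsets (GH 50107)
Performance improvement in DataFrame.loc() and Series.loc() for tuple-based indexing of a MultiIndex (GH 48384)
Performance improvement for Series.replace() with categorical dtype (GH 49404)
Performance improvement for MultiIndex.unique() (GH 48335)
Performance improvement for indexing operations with nullable and arrow dtypes (GH 49420, GH 51316)
Performance improvement for concat() with extension array backed indexes (GH 49128, GH 49178)
Performance improvement for api.types.infer_dtype() (GH 51054)
Reduced memory usage of DataFrame.to_pickle()/Series.to_pickle() when using BZ2 or LZMA (GH 49068)
Performance improvement for the StringArray constructor passing a numpy array with type np.str_ (GH 49109)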
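Performance improvement in from_tuples() (GH 50620)
Performance improvement in factorize() (GH 49177)
Performance improvement in __setitem__() (GH 50248, GH 50632)
Performance improvement in ArrowExtensionArray comparison methods when the array contains NA (GH 50524)
Performance improvement in to_numpy() (GH 49973, GH 51227)
Performance improvement when parsing strings to BooleanDtype (GH 50613)
Performance improvement in DataFrame.join() when joining on a subset of a MultiIndex (GH 48611)
Performance improvement for MultiIndex.intersection() (GH 48604)
Performance improvement in DataFrame.__setitem__() (GH 46267)
Performance improvement in var and std for nullable dtypes (GH 48379)
Performance improvement when iterating over pyarrow and nullable dtypes (GH 49825, GH 49851)
Performance improvements to read_sas() (GH 47403, GH 47405, GH 47656, GH 48502)
Memory improvement in RangeIndex.sort_values() (GH 48801)
Performance improvement in Series.to_numpy() if copy=True by avoiding copying twice (GH 24345)
Performance improvement in Series.rename() with MultiIndex (GH 21055)
Performance improvement in DataFrameGroupBy and SeriesGroupBy when by is a categorical type and sort=False (GH 48976)
Performance improvement in DataFrameGroupBy and SeriesGroupBy when by is a categorical type and observed=False (GH 49596)
Performance improvement in read_stata() with parameter index_col set to None (the default). Now the index will be a RangeIndex instead of Int64Index (GH 49745)
Performance improvement in merge() when not merging on the index; the new index will now be a RangeIndex instead of Int64Index (GH 49478)
Performance improvement in DataFrame.to_dict() and Series.to_dict() when using any non-object dtypes (GH 46470)
Performance improvement in read_html() when there are multiple tables (GH 49929)
Performance improvement in the Period constructor when constructing from a string or integer (GH 38312)
Performance improvement in to_datetime() when using the '%Y%m%d' format (GH 17410)
Performance improvement in to_datetime() when a format is given or can be inferred (GH 50465)
Performance improvement in Series.median() for nullable dtypes (GH 50838)
Performance improvement in read_csv() when passing a to_datetime() lambda function to date_parser and the inputs have mixed timezone offsets (GH 35296)
Performance improvement in isna() and isnull() (GH 50658)
Performance improvement in SeriesGroupBy.value_counts() with categorical dtype (GH 46202)
Fixed a reference leak in read_hdf() (GH 37441)
Fixed a memory leak in DataFrame.to_json() and Series.to_json() when serializing datetimes and timedeltas (GH 40443)
Decreased memory usage in many DataFrameGroupBy methods (GH 51090)
Performance improvement in DataFrame.round() for an integer decimal parameter (GH 17254)
Performance improvement in DataFrame.replace() and Series.replace() when using a large dict for to_replace (GH 6697)
Memory improvement in StataReader when reading seekable files (GH 48922)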
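
One user-visible side effect of the merge() change above is the type of the resulting index. The following is a minimal sketch with made-up frames, assuming pandas 2.0 or later.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r": [20, 30, 40]})

# Merging on a column (not on the index) yields a fresh default index.
merged = left.merge(right, on="key")

print(type(merged.index).__name__)
print(merged)
```

A RangeIndex stores only start, stop and step, which is why it is cheaper than materializing an integer index for every merge result.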

Bug fixes#

Categorical#

Bug in Categorical.set_categories() losing dtype information (GH 48812)
Bug in Series.replace() with categorical dtype when to_replace values overlap with new values (GH 49404)
Bug in Series.replace() with categorical dtype losing nullable dtypes of the underlying categories (GH 49404)
Bug in DataFrame.groupby() and Series.groupby() that would reorder categories when used as a grouper (GH 48749)
Bug in the Categorical constructor when constructing from a Categorical object and dtype="category" losing ordered-ness (GH 49309)
Bug in SeriesGroupBy.min(), SeriesGroupBy.max(), DataFrameGroupBy.min(), and DataFrameGroupBy.max() with an unordered CategoricalDtype and no groups failing to raise TypeError (GH 51034)

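Datetimelike#

Bug in pandas.infer_freq() raising TypeError when inferred on a RangeIndex (GH 47084)
Bug in to_datetime() incorrectly raising OverflowError with string arguments corresponding to large integers (GH 50533)
Bug in to_datetime() raising on invalid offsets with errors='coerce' and infer_datetime_format=True (GH 48633)
Bug in the DatetimeIndex constructor failing to raise when tz=None is explicitly specified in conjunction with a timezone-aware dtype or data (GH 48659)
Bug in subtracting a datetime scalar from a DatetimeIndex failing to retain the original freq attribute (GH 48818)
Bug in pandas.tseries.holiday.Holiday where a half-open date interval caused inconsistent return types from USFederalHolidayCalendar.holidays() (GH 49075)
Bug in rendering DatetimeIndex, Series and DataFrame with timezone-aware dtypes with dateutil or zoneinfo timezones near daylight-savings transitions (GH 49684)
Bug in to_datetime() raising ValueError when parsing Timestamp, datetime.datetime, datetime.date, or np.datetime64 objects when a non-ISO8601 format was passed (GH 49298, GH 50036)
Bug in to_datetime() raising ValueError when parsing an empty string and a non-ISO8601 format was passed. Empty strings are now parsed as NaT, for compatibility with how ISO8601 formats are handled (GH 50251)
Bug in Timestamp showing a UserWarning, which was not actionable by users, when parsing non-ISO8601 delimited date strings (GH 50232)
Bug in to_datetime() showing a misleading ValueError when parsing dates with a format containing an ISO week directive and an ISO weekday directive (GH 50308)
Bug in Timestamp.round() returning incorrect results instead of raising when the freq argument has zero duration (e.g. "0ns") (GH 49737)
Bug in to_datetime() not raising ValueError when an invalid format was passed and errors was 'ignore' or 'coerce' (GH 50266)
Bug in DateOffset throwing TypeError when constructing with milliseconds and another super-daily argument (GH 49897)
Bug in to_datetime() not raising ValueError when parsing a string with a decimal date with format '%Y%m%d' (GH 50051)
Bug in to_datetime() not converting None to NaT when parsing mixed-offset date strings with an ISO8601 format (GH 50071)
Bug in to_datetime() not returning the input when parsing an out-of-bounds date string with errors='ignore' and format='%Y%m%d' (GH 14487)
Bug in to_datetime() converting timezone-naive datetime.datetime to timezone-aware when parsing with timezone-aware strings, an ISO8601 format, and utc=False (GH 50254)
Bug in to_datetime() throwing ValueError when parsing dates with an ISO8601 format where some values were not zero-padded (GH 21422)
Bug in to_datetime() giving incorrect results when using format='%Y%m%d' and errors='ignore' (GH 26493)
Bug in to_datetime() failing to parse the date strings 'today' and 'now' if format was not ISO8601 (GH 50359)
Bug in Timestamp.utctimetuple() raising a TypeError (GH 32174)
Bug in to_datetime() raising ValueError when parsing mixed-offset Timestamp with errors='ignore' (GH 50585)
Bug in to_datetime() incorrectly handling floating-point inputs close to the overflow boundaries (GH 50183)
Bug in to_datetime() with a unit of "Y" or "M" giving incorrect results, not matching pointwise Timestamp results (GH 50870)
Bug in Series.interpolate() and DataFrame.interpolate() with datetime or timedelta dtypes incorrectly raising ValueError (GH 11312)
Bug in to_datetime() not returning the input with errors='ignore' when the input was out-of-bounds (GH 50587)
Bug in DataFrame.from_records() when given a DataFrame input with timezone-aware datetime64 columns incorrectly dropping the timezone-awareness (GH 51162)
Bug in to_datetime() raising decimal.InvalidOperation when parsing date strings with errors='coerce' (GH 51084)
Bug in to_datetime() with both unit and origin specified returning incorrect results (GH 42624)
Bug in Series.astype() and DataFrame.astype() when converting an object-dtype object containing timezone-aware datetimes or strings to datetime64[ns] incorrectly localizing as UTC instead of raising TypeError (GH 50140)
Bug in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() with datetime or timedelta dtypes giving incorrect results for groups containing NaT (GH 51373)
Bug in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() incorrectly raising with PeriodDtype or DatetimeTZDtype (GH 51373)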
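
As an illustration of the empty-string fix (GH 50251) listed above, the following minimal sketch parses a list containing an empty string with a non-ISO8601 format. The input values are made up for the example, and the behavior assumes pandas 2.0 or later.

```python
import pandas as pd

# "%d/%m/%Y" is a non-ISO8601 format; the empty string becomes NaT
# instead of raising ValueError.
parsed = pd.to_datetime(["01/02/2020", ""], format="%d/%m/%Y")

print(parsed)
```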

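Timedelta#

Bug in to_timedelta() raising an error when the input has nullable dtype Float64 (GH 48796)
Bug in the Timedelta constructor incorrectly raising instead of returning NaT when given a np.timedelta64("nat") (GH 48898)
Bug in the Timedelta constructor failing to raise when passed both a Timedelta object and keywords (e.g. days, seconds) (GH 48898)
Bug in Timedelta comparisons with very large datetime.timedelta objects incorrectly raising OutOfBoundsTimedelta (GH 49021)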
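
A minimal sketch of two of the fixed Timedelta behaviors, assuming pandas 2.0 or later; the variable names are illustrative.

```python
import datetime

import numpy as np
import pandas as pd

# A numpy "not a time" timedelta now maps to pandas' NaT instead of raising.
missing = pd.Timedelta(np.timedelta64("NaT"))

# datetime.timedelta.max is outside the nanosecond range that Timedelta
# can represent, but the comparison itself no longer raises.
is_smaller = pd.Timedelta(days=1) < datetime.timedelta.max

print(missing, is_smaller)
```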

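Timezones#

Bug in Series.astype() and DataFrame.astype() incorrectly raising when converting object-dtype data containing multiple timezone-aware datetime objects with heterogeneous timezones to a DatetimeTZDtype (GH 32581)
Bug in to_datetime() failing to parse date strings with a timezone name when format was specified with %Z (GH 49748)
Better error message when passing invalid values to the ambiguous parameter in Timestamp.tz_localize() (GH 49565)
Bug in string parsing incorrectly allowing a Timestamp to be constructed with an invalid timezone, which would raise when trying to print (GH 50668)
Corrected the TypeError message in objects_to_datetime64ns() to inform that the DatetimeIndex has mixed timezones (GH 50974)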

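Numeric#

Bug in DataFrame.add() being unable to apply a ufunc when the inputs contain mixed DataFrame and Series types (GH 39853)
Bug in arithmetic operations on Series not propagating the mask when combining masked dtypes and numpy dtypes (GH 45810, GH 42630)
Bug in DataFrame.sem() and Series.sem() where an erroneous TypeError would always be raised when using data backed by an ArrowDtype (GH 49759)
Bug in Series.__add__() casting to object for a list and a masked Series (GH 22962)
Bug in mode() where dropna=False was not respected when there were NA values (GH 50982)
Bug in DataFrame.query() with engine="numexpr" raising a TypeError when column names are min or max (GH 50937)
Bug in DataFrame.min() and DataFrame.max() with tz-aware data containing pd.NaT and axis=1 returning incorrect results (GH 51242)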

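Conversion#

Bug in constructing a Series with int64 dtype from a list of strings raising instead of casting (GH 44923)
Bug in constructing a Series with a masked dtype and boolean values with NA raising (GH 42137)
Bug in DataFrame.eval() incorrectly raising an AttributeError when there are negative values in a function call (GH 46471)
Bug in Series.convert_dtypes() not converting the dtype to a nullable dtype when the Series contains NA and has dtype object (GH 48791)
Bug where any ExtensionDtype subclass with kind="M" would be interpreted as a timezone type (GH 34986)
Bug in arrays.ArrowExtensionArray that would raise NotImplementedError when passed a sequence of strings or binary (GH 49172)
Bug in Series.astype() raising pyarrow.ArrowInvalid when converting from a non-pyarrow string dtype to a pyarrow numeric type (GH 50430)
Bug in DataFrame.astype() modifying the input array in place when converting to string with copy=False (GH 51073)
Bug in Series.to_numpy() converting to a NumPy array before applying na_value (GH 48951)
Bug in DataFrame.astype() not copying data when converting to a pyarrow dtype (GH 50984)
Bug in to_datetime() not respecting the exact argument when format was an ISO8601 format (GH 12649)
Bug in TimedeltaArray.astype() raising TypeError when converting to a pyarrow duration type (GH 49795)
Bug in DataFrame.eval() and DataFrame.query() raising for extension array dtypes (GH 29618, GH 50261, GH 31913)
Bug in Series() not copying data when created from an Index and dtype is equal to the dtype of the Index (GH 52008)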
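
Two of the conversion fixes above can be sketched as follows. The inputs are made up for illustration, and the behavior assumes pandas 2.0 or later.

```python
import pandas as pd

# Strings that represent integers are now cast when int64 is requested.
from_strings = pd.Series(["1", "2", "3"], dtype="int64")

# An object-dtype Series containing NA is now converted to a nullable dtype.
converted = pd.Series([1, 2, pd.NA], dtype=object).convert_dtypes()

print(from_strings.dtype, converted.dtype)
```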

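Strings#

Bug in pandas.api.types.is_string_dtype() that would not return True for StringDtype or ArrowDtype with pyarrow.string() (GH 15585)
Bug in converting string dtypes to "datetime64[ns]" or "timedelta64[ns]" incorrectly raising TypeError (GH 36153)
Bug in setting values in a string-dtype column with an array, mutating the array as a side effect when it contains missing values (GH 51299)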
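
The is_string_dtype() fix can be checked with a short sketch. It uses only the default StringDtype, so pyarrow is not required; the variable names are illustrative.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

text = pd.Series(["a", "b", None], dtype="string")
numbers = pd.Series([1, 2, 3])

text_is_string = is_string_dtype(text)
dtype_is_string = is_string_dtype(pd.StringDtype())
numbers_is_string = is_string_dtype(numbers)

print(text_is_string, dtype_is_string, numbers_is_string)
```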

Interval#

Bug in IntervalIndex.is_overlapping() returning incorrect output if the intervals have duplicate left boundaries (GH 49581)
Bug in Series.infer_objects() failing to infer IntervalDtype for an object Series of Interval objects (GH 50090)
Bug in Series.shift() with IntervalDtype and an invalid null fill_value failing to raise TypeError (GH 51258)
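
A minimal sketch of the first two Interval fixes, assuming pandas 2.0 or later. The intervals are made up for illustration.

```python
import pandas as pd

# Two right-closed intervals sharing the left boundary 0 do overlap.
overlapping = pd.IntervalIndex.from_tuples([(0, 1), (0, 2)]).is_overlapping

# An object-dtype Series of Interval objects is inferred as IntervalDtype.
as_objects = pd.Series([pd.Interval(0, 1), pd.Interval(1, 2)], dtype=object)
inferred = as_objects.infer_objects()

print(overlapping, inferred.dtype)
```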

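Indexing#

Bug in DataFrame.__setitem__() raising when the indexer is a DataFrame with boolean dtype (GH 47125)
Bug in DataFrame.reindex() filling with wrong values when indexing columns and index for uint dtypes (GH 48184)
Bug in DataFrame.loc() when setting a DataFrame with different dtypes coercing values to a single dtype (GH 50467)
Bug in DataFrame.sort_values() where None was not returned when by is an empty list and inplace=True (GH 50643)
Bug in DataFrame.loc() coercing dtypes when setting values with a list indexer (GH 49159)
Bug in Series.loc() raising an error for an out-of-bounds end of a slice indexer (GH 50161)
Bug in DataFrame.loc() raising ValueError with an all-False bool indexer and an empty object (GH 51450)
Bug in DataFrame.loc() raising ValueError with a bool indexer and MultiIndex (GH 47687)
Bug in DataFrame.loc() raising IndexError when setting values for a pyarrow-backed column with a non-scalar indexer (GH 50085)
Bug in DataFrame.__getitem__(), Series.__getitem__(), DataFrame.__setitem__() and Series.__setitem__() when indexing on indexes with extension float dtypes (Float32 and Float64) or complex dtypes using integers (GH 51053)
Bug in DataFrame.loc() modifying the object when setting an incompatible value with an empty indexer (GH 45981)
Bug in DataFrame.__setitem__() raising ValueError when the right-hand side is a DataFrame with MultiIndex columns (GH 49121)
Bug in DataFrame.reindex() casting the dtype to object when the DataFrame has a single extension array column and both columns and index are re-indexed (GH 48190)
Bug in DataFrame.iloc() raising IndexError when the indexer is a Series with a numeric extension array dtype (GH 49521)
Bug in describe() showing more decimals than needed when formatting percentiles in the resulting index (GH 46362)
Bug in DataFrame.compare() not recognizing differences when comparing NA with a value in nullable dtypes (GH 48939)
Bug in Series.rename() with MultiIndex losing extension array dtypes (GH 21055)
Bug in DataFrame.isetitem() coercing extension array dtypes in a DataFrame to object (GH 49922)
Bug in Series.__getitem__() returning a corrupt object when selecting from an empty pyarrow-backed object (GH 51734)
Bug in BusinessHour causing creation of a DatetimeIndex to fail when no opening hour was included in the index (GH 49835)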
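
As a small illustration of the sort_values() fix (GH 50643) listed above, the sketch below calls it with an empty by list and inplace=True. The frame is made up for the example, and the behavior assumes pandas 2.0 or later.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# With inplace=True the method returns None, also when `by` is empty;
# the frame itself is left in its original order.
result = df.sort_values(by=[], inplace=True)

print(result, df["a"].tolist())
```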

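Missing#

Bug in Index.equals() raising TypeError when the Index consists of tuples that contain NA (GH 48446)
Bug in Series.map() causing incorrect results when the data has NaNs and a defaultdict mapping was used (GH 48813)
Bug in NA raising a TypeError instead of returning NA when performing a binary operation with a bytes object (GH 49108)
Bug in DataFrame.update() with overwrite=False raising TypeError when self has a column with NaT values and the column is not present in other (GH 16713)
Bug in Series.replace() raising RecursionError when replacing a value in an object-dtype Series containing NA (GH 47480)
Bug in Series.replace() raising RecursionError when replacing a value in a numeric Series with NA (GH 50758)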

MultiIndex#

Bug in MultiIndex.get_indexer() not matching NaN values (GH 29252, GH 37222, GH 38623, GH 42883, GH 43222, GH 46173, GH 48905)
Bug in MultiIndex.argsort() raising TypeError when the index contains NA (GH 48495)
Bug in MultiIndex.difference() losing extension array dtype (GH 48606)
Bug in MultiIndex.set_levels raising IndexError when setting an empty level (GH 48636)
Bug in MultiIndex.unique() losing extension array dtype (GH 48335)
Bug in MultiIndex.intersection() losing extension array (GH 48604)
Bug in MultiIndex.union() losing extension array (GH 48498, GH 48505, GH 48900)
Bug in MultiIndex.union() not sorting when sort=None and the index contains missing values (GH 49010)
Bug in MultiIndex.append() not checking names for equality (GH 48288)
Bug in MultiIndex.symmetric_difference() losing extension array (GH 48607)
Bug in MultiIndex.join() losing dtypes when the MultiIndex has duplicates (GH 49830)
Bug in MultiIndex.putmask() losing extension array (GH 49830)
Bug in MultiIndex.value_counts() returning a Series indexed by a flat index of tuples instead of a MultiIndex (GH 49558)
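
The value_counts() fix can be observed with a short sketch, assuming pandas 2.0 or later. The level names `letter` and `number` are hypothetical.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 1), ("b", 2)], names=["letter", "number"]
)

# The result is indexed by a MultiIndex, so level names are preserved.
counts = mi.value_counts()

print(type(counts.index).__name__, list(counts.index.names))
print(counts)
```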

I/O#

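Bug in read_sas() caused fragmentation of DataFrame and raised errors.PerformanceWarning (GH 48595)
Improved error message in read_excel() by including the offending sheet name when an exception is raised while reading a file (GH 48706)
Bug when pickling a subset of PyArrow-backed data that would serialize the entire data instead of the subset (GH 42600)
Bug in read_sql_query() ignoring the dtype argument when chunksize is specified and the result is empty (GH 50245)
Bug in read_csv() for a single-line csv with fewer columns than names raising errors.ParserError with engine="c" (GH 47566)
Bug in read_json() raising with orient="table" and NA value (GH 40255)
Bug in displaying string dtypes not showing the storage option (GH 50099)
Bug in DataFrame.to_string() with header=False that printed the index name on the same line as the first row of the data (GH 49230)
Bug in DataFrame.to_string() ignoring the float formatter for extension arrays (GH 39336)
Fixed memory leak which stemmed from the initialization of the internal JSON module (GH 49222)
Fixed issue where json_normalize() would incorrectly remove leading characters from column names that matched the sep argument (GH 49861)
Bug in read_csv() unnecessarily overflowing for extension array dtype when containing NA (GH 32134)
Bug in DataFrame.to_dict() not converting NA to None (GH 50795)
Bug in DataFrame.to_json() where it would segfault when failing to encode a string (GH 50307)
Bug in DataFrame.to_html() with na_rep set when the DataFrame contains non-scalar data (GH 47103)
Bug in read_xml() where file-like objects failed when iterparse is used (GH 50641)
Bug in read_csv() when engine="pyarrow" where the encoding parameter was not handled correctly (GH 51302)
Bug in read_xml() ignoring repeated elements when iterparse is used (GH 51183)
Bug in ExcelWriter leaving file handles open if an exception occurred during instantiation (GH 51443)
Bug in DataFrame.to_parquet() where non-string index or columns were raising a ValueError when engine="pyarrow" (GH 52036)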
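
The DataFrame.to_dict() entry (GH 50795) can be sketched as follows, assuming pandas 2.0 or later and a nullable Int64 column. The column name "a" is arbitrary.

```python
import pandas as pd

# A nullable integer column holding one missing value.
df = pd.DataFrame({"a": pd.array([1, None], dtype="Int64")})

# Default orient="dict": {column -> {index -> value}}.
as_dict = df.to_dict()

# The missing entry is expected to come back as Python None, not pd.NA.
print(as_dict)
```

Returning None keeps the output made of plain Python objects, which is usually what callers want when handing the dict to non-pandas code.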

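Period#

Bug in Period.strftime() and PeriodIndex.strftime() raising UnicodeDecodeError when a locale-specific directive was passed (GH 46319)
Bug in adding a Period object to an array of DateOffset objects incorrectly raising TypeError (GH 50162)
Bug in Period where passing a string with finer resolution than nanosecond would result in a KeyError instead of dropping the extra precision (GH 50417)
Bug in parsing strings representing week periods, e.g. "2017-01-23/2017-01-29", as minute-frequency instead of week-frequency (GH 50803)
Bug in DataFrameGroupBy.sum(), DataFrameGroupBy.cumsum(), DataFrameGroupBy.prod(), DataFrameGroupBy.cumprod() with PeriodDtype failing to raise TypeError (GH 51040)
Bug in parsing an empty string with Period incorrectly raising ValueError instead of returning NaT (GH 51349)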
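
Two of the parsing fixes above (GH 50803 and GH 51349) can be checked with a short sketch. It assumes pandas 2.0 or later; the exact frequency alias printed may differ between versions, so only its weekly nature is relied on here.

```python
import pandas as pd

# A "start/end" string is expected to parse as a weekly period.
week = pd.Period("2017-01-23/2017-01-29")
print(week.freqstr)      # a weekly alias, anchored on the end day
print(week.start_time)   # the first instant of the week

# An empty string is expected to yield NaT rather than raise ValueError.
empty = pd.Period("")
print(empty is pd.NaT)
```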

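Plotting#

Bug in DataFrame.plot.hist() not dropping elements of weights corresponding to NaN values in data (GH 48884)
ax.set_xlim was sometimes raising UserWarning which users couldn't address due to set_xlim not accepting parsing arguments; the converter now uses Timestamp() instead (GH 49148)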

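Groupby/resample/rolling#

Bug in ExponentialMovingWindow with online not raising a NotImplementedError for unsupported operations (GH 48834)
Bug in DataFrameGroupBy.sample() raising ValueError when the object is empty (GH 48459)
Bug in Series.groupby() raising ValueError when an entry of the index is equal to the name of the index (GH 48567)
Bug in DataFrameGroupBy.resample() producing inconsistent results when passing an empty DataFrame (GH 47705)
Bug in DataFrameGroupBy and SeriesGroupBy would not include unobserved categories in the result when grouping by categorical indexes (GH 49354)
Bug in DataFrameGroupBy and SeriesGroupBy would change result order depending on the input index when grouping by categoricals (GH 49223)
Bug in DataFrameGroupBy and SeriesGroupBy when grouping on categorical data would sort result values even when used with sort=False (GH 42482)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply with as_index=False would not attempt the computation without using the grouping keys when using them failed with a TypeError (GH 49256)
Bug in DataFrameGroupBy.describe() would describe the group keys (GH 49256)
Bug in SeriesGroupBy.describe() with as_index=False would have the incorrect shape (GH 49256)
Bug in DataFrameGroupBy and SeriesGroupBy with dropna=False would drop NA values when the grouper was categorical (GH 36327)
Bug in SeriesGroupBy.nunique() would incorrectly raise when the grouper was an empty categorical and observed=True (GH 21334)
Bug in SeriesGroupBy.nth() would raise when the grouper contained NA values after subsetting from a DataFrameGroupBy (GH 26454)
Bug in DataFrame.groupby() would not include a Grouper specified by key in the result when as_index=False (GH 50413)
Bug in DataFrameGroupBy.value_counts() would raise when used with a TimeGrouper (GH 50486)
Bug in Resampler.size() caused a wide DataFrame to be returned instead of a Series with MultiIndex (GH 46826)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() would raise incorrectly when the grouper had axis=1 for "idxmin" and "idxmax" arguments (GH 45986)
Bug in DataFrameGroupBy would raise when used with an empty DataFrame, categorical grouper, and dropna=False (GH 50634)
Bug in SeriesGroupBy.value_counts() did not respect sort=False (GH 50482)
Bug in DataFrameGroupBy.resample() raising KeyError when getting the result from a key list when resampling on a time index (GH 50840)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() would raise incorrectly when the grouper had axis=1 for the "ngroup" argument (GH 45986)
Bug in DataFrameGroupBy.describe() produced incorrect results when data had duplicate columns (GH 50806)
Bug in DataFrameGroupBy.agg() with engine="numba" failing to respect as_index=False (GH 51228)
Bug in DataFrameGroupBy.agg(), SeriesGroupBy.agg(), and Resampler.agg() would ignore arguments when passed a list of functions (GH 50863)
Bug in DataFrameGroupBy.ohlc() ignoring as_index=False (GH 51413)
Bug in DataFrameGroupBy.agg() after subsetting columns (e.g. .groupby(...)[["a", "b"]]) would not include groupings in the result (GH 51186)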
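
The categorical sort=False entry above (GH 42482) is easiest to see with a tiny frame. The sketch below assumes pandas 2.0 or later; the column names "key" and "val" are invented, and observed=True is passed explicitly so the outcome does not depend on the version's default.

```python
import pandas as pd

df = pd.DataFrame(
    {
        # Category order is a, b, but the data starts with b.
        "key": pd.Categorical(["b", "a", "b"], categories=["a", "b"]),
        "val": [1, 2, 3],
    }
)

# With sort=False the groups are expected in order of first appearance
# (b, then a) instead of being sorted by category order.
result = df.groupby("key", sort=False, observed=True)["val"].sum()
print(result)
```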

Reshaping#

Bug in DataFrame.pivot_table() raising TypeError for nullable dtype and margins=True (GH 48681)
Bug in DataFrame.unstack() and Series.unstack() unstacking the wrong level of MultiIndex when MultiIndex has mixed names (GH 48763)
Bug in DataFrame.melt() losing extension array dtype (GH 41570)
Bug in DataFrame.pivot() not respecting None as column name (GH 48293)
Bug in DataFrame.join() when left_on or right_on is or includes a CategoricalIndex incorrectly raising AttributeError (GH 48464)
Bug in DataFrame.pivot_table() raising ValueError with parameter margins=True when the result is an empty DataFrame (GH 49240)
Clarified error message in merge() when passing an invalid validate option (GH 49417)
Bug in DataFrame.explode() raising ValueError on multiple columns with NaN values or empty lists (GH 46084)
Bug in DataFrame.transpose() with IntervalDtype column with timedelta64[ns] endpoints (GH 44917)
Bug in DataFrame.agg() and Series.agg() would ignore arguments when passed a list of functions (GH 50863)
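
For the DataFrame.explode() entry (GH 46084), the following sketch shows a multi-column explode over rows holding a list, a NaN, and an empty list. It assumes pandas 2.0 or later; the frame is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [[1, 2], np.nan, []],
        "b": [[3, 4], np.nan, []],
    }
)

# NaN and empty-list rows each count as a single output row, so the two
# columns have matching element counts and no ValueError is expected.
exploded = df.explode(["a", "b"])
print(exploded)
```

The list row expands to two rows that share index label 0, while the NaN and empty-list rows each stay a single row of missing values.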

Sparse#

Bug in Series.astype() when converting a SparseDtype with datetime64[ns] subtype to int64 dtype raising, inconsistent with the non-sparse behavior (GH 49631, GH 50087)
Bug in Series.astype() when converting from datetime64[ns] to Sparse[datetime64[ns]] incorrectly raising (GH 50082)
Bug in Series.sparse.to_coo() raising SystemError when MultiIndex contains an ExtensionArray (GH 50996)

ExtensionArray#

Bug in Series.mean() overflowing unnecessarily with nullable integers (GH 48378)
Bug in Series.tolist() for nullable dtypes returning numpy scalars instead of python scalars (GH 49890)
Bug in Series.round() for pyarrow-backed dtypes raising AttributeError (GH 50437)
Bug when concatenating an empty DataFrame with an ExtensionDtype to another DataFrame with the same ExtensionDtype, the resulting dtype turned into object (GH 48510)
Bug in array.PandasArray.to_numpy() raising with NA value when na_value is specified (GH 40638)
Bug in api.types.is_numeric_dtype() where a custom ExtensionDtype would not return True if _is_numeric returned True (GH 50563)
Bug in api.types.is_integer_dtype(), api.types.is_unsigned_integer_dtype(), api.types.is_signed_integer_dtype(), api.types.is_float_dtype() where a custom ExtensionDtype would not return True if kind returned the corresponding NumPy type (GH 50667)
Bug in Series constructor unnecessarily overflowing for nullable unsigned integer dtypes (GH 38798, GH 25880)
Bug in setting a non-string value into StringArray raising ValueError instead of TypeError (GH 49632)
Bug in DataFrame.reindex() not honoring the default copy=True keyword in case of columns with ExtensionDtype (and as a result also selecting multiple columns with getitem ([]) didn't correctly result in a copy) (GH 51197)
Bug in ArrowExtensionArray logical operations & and | raising KeyError (GH 51688)
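
Two of the entries above, Series.tolist() on nullable dtypes (GH 49890) and setting a non-string value into a string array (GH 49632), can be sketched together. This assumes pandas 2.0 or later; the variable names are illustrative only.

```python
import pandas as pd

# tolist() on a nullable integer Series is expected to return plain
# Python ints rather than numpy scalars.
values = pd.Series([1, 2, 3], dtype="Int64").tolist()
print([type(v).__name__ for v in values])

# Assigning a non-string into a string array is expected to raise
# TypeError (it used to raise ValueError).
strings = pd.array(["a", "b"], dtype="string")
error_name = None
try:
    strings[0] = 1
except (TypeError, ValueError) as exc:
    error_name = type(exc).__name__
print(error_name)
```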

Styler#

Fixed background_gradient() for nullable dtype Series with NA values (GH 50712)

Metadata#

Fixed metadata propagation in DataFrame.corr() and DataFrame.cov() (GH 28283)

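Other#

Bug in incorrectly accepting dtype strings containing "[pyarrow]" more than once (GH 51548)
Bug in Series.searchsorted() behaving inconsistently when accepting a DataFrame as parameter value (GH 49620)
Bug in array() failing to raise on DataFrame inputs (GH 51167)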

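Contributors#

A total of 260 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.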

5j9 +
ABCPAN-rank +
Aarni Koskela +
Aashish KC +
Abubeker Mohammed +
Adam Mróz +
Adam Ormondroyd +
Aditya Anulekh +
Ahmed Ibrahim
Akshay Babbar +
Aleksa Radojicic +
Alex +
Alex Buzenet +
Alex Kirko
Allison Kwan +
Amay Patel +
Ambuj Pawar +
Amotz +
Andreas Schwab +
Andrew Chen +
Anton Shevtsov
Antonio Ossa Guerra +
Antonio Ossa-Guerra +
Anushka Bishnoi +
Arda Kosar
Armin Berres
Asadullah Naeem +
Asish Mahapatra
Bailey Lissington +
BarkotBeyene
Ben Beasley
Bhavesh Rajendra Patil +
Bibek Jha +
Bill +
Bishwas +
CarlosGDCJ +
Carlotta Fabian +
Chris Roth +
Chuck Cadman +
Corralien +
DG +
Dan Hendry +
Daniel Isaac
David Kleindienst +
David Poznik +
David Rudel +
DavidKleindienst +
Dea María Léon +
Deepak Sirohiwal +
Dennis Chukwunta
Douglas Lohmann +
Dries Schaumont
Dustin K +
Edoardo Abati +
Eduardo Chaves +
Ege Özgüroğlu +
Ekaterina Borovikova +
Eli Schwartz +
Elvis Lim +
Emily Taylor +
Emma Carballal Haire +
Erik Welch +
Fangchen Li
Florian Hofstetter +
Flynn Owen +
Fredrik Erlandsson +
Gaurav Sheni
Georeth Chow +
George Munyoro +
Guilherme Beltramini
Gulnur Baimukhambetova +
H L +
Hans
Hatim Zahid +
HighYoda +
Hiki +
Himanshu Wagh +
Hugo van Kemenade +
Idil Ismiguzel +
Irv Lustig
Isaac Chung
Isaac Virshup
JHM Darbyshire
JHM Darbyshire (iMac)
JMBurley
Jaime Di Cristina
Jan Koch
JanVHII +
Janosh Riebesell
JasmandeepKaur +
Jeremy Tuloup
Jessica M +
Jonas Haag
Joris Van den Bossche
João Meirelles +
Julia Aoun +
Justus Magin +
Kang Su Min +
Kevin Sheppard
Khor Chean Wei
Kian Eliasi
Kostya Farber +
KotlinIsland +
Lakmal Pinnaduwage +
Lakshya A Agrawal +
Lawrence Mitchell +
Levi Ob +
Loic Diridollou
Lorenzo Vainigli +
Luca Pizzini +
Lucas Damo +
Luke Manley
Madhuri Patil +
Marc Garcia
Marco Edward Gorelli
Marco Gorelli
MarcoGorelli
Maren Westermann +
Maria Stazherova +
Marie K +
Marielle +
Mark Harfouche +
Marko Pacak +
Martin +
Matheus Cerqueira +
Matheus Pedroni +
Matteo Raso +
Matthew Roeschke
MeeseeksMachine +
Mehdi Mohammadi +
Michael Harris +
Michael Mior +
Natalia Mokeeva +
Neal Muppidi +
Nick Crews
Nishu Choudhary +
Noa Tamir
Noritada Kobayashi
Omkar Yadav +
P. Talley +
Pablo +
Pandas Development Team
Parfait Gasana
Patrick Hoefler
Pedro Nacht +
Philip +
Pietro Battiston
Pooja Subramaniam +
Pranav Saibhushan Ravuri +
Pranav. P. A +
Ralf Gommers +
RaphSku +
Richard Shadrach
Robsdedude +
Roger
Roger Thomas
RogerThomas +
SFuller4 +
Salahuddin +
Sam Rao
Sean Patrick Malloy +
Sebastian Roll +
Shantanu
Shashwat +
Shashwat Agrawal +
Shiko Wamwea +
Shoham Debnath
Shubhankar Lohani +
Siddhartha Gandhi +
Simon Hawkins
Soumik Dutta +
Sowrov Talukder +
Stefanie Molin
Stefanie Senger +
Stepfen Shawn +
Steven Rotondo
Stijn Van Hoey
Sudhansu +
Sven
Sylvain MARIE
Sylvain Marié
Tabea Kossen +
Taylor Packard
Terji Petersen
Thierry Moisan
Thomas H +
Thomas Li
Torsten Wörtwein
Tsvika S +
Tsvika Shapira +
Vamsi Verma +
Vinicius Akira +
William Andrea
William Ayd
William Blum +
Wilson Xing +
Xiao Yuan +
Xnot +
Yasin Tatar +
Yuanhao Geng
Yvan Cywan +
Zachary Moon +
Zhengbo Wang +
abonte +
adrienpacifico +
alm
amotzop +
andyjessen +
anonmouse1 +
bang128 +
bishwas jha +
calhockemeyer +
carla-alves-24 +
carlotta +
casadipietra +
catmar22 +
cfabian +
codamuse +
dataxerik
davidleon123 +
dependabot[bot] +
fdrocha +
github-actions[bot]
himanshu_wagh +
iofall +
jakirkham +
jbrockmendel
jnclt +
joelchen +
joelsonoda +
joshuabello2550
joycewamwea +
kathleenhang +
krasch +
ltoniazzi +
luke396 +
milosz-martynow +
minat-hub +
mliu08 +
monosans +
nealxm
nikitaved +
paradox-lab +
partev
raisadz +
ram vikram singh +
rebecca-palmer
sarvaSanjay +
seljaks +
silviaovo +
smij720 +
soumilbaldota +
stellalin7 +
strawberry beach sandals +
tmoschou +
uzzell +
yqyqyq-W +
yun +
Ádám Lippai
김동현 (Daniel Donghyun Kim) +
