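What's new in 2.2.0 (January 19, 2024)#

These are the changes in pandas 2.2.0. See Release notes for a full changelog including other versions of pandas.

Upcoming changes in pandas 3.0#

pandas 3.0 will bring two bigger changes to the default behavior of pandas.

Copy-on-Write#

The currently optional Copy-on-Write mode will be enabled by default in pandas 3.0. There will be no option to keep the current behavior. The new behavioral semantics are explained in the user guide about Copy-on-Write.

The new behavior can be enabled since pandas 2.0 with the following option: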

pd.options.mode.copy_on_write = True

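This change alters how pandas operates with respect to copies and views. Some of these changes allow a clear deprecation, such as the changes in chained assignment. Other changes are more subtle, so their warnings are hidden behind an option that can be enabled in pandas 2.2.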

pd.options.mode.copy_on_write = "warn"

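This mode warns in many different scenarios that are not actually relevant to most queries. We recommend exploring this mode, but it is not necessary to get rid of all of these warnings. The migration guide explains the upgrade process in more detail.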
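As a minimal sketch of what the new semantics mean in practice (assuming pandas 2.0 or later, and using `pd.option_context` so the option is enabled only inside the block), modifying an object derived from a DataFrame no longer modifies the parent. The variable names are illustrative only.

```python
import pandas as pd

# Enable Copy-on-Write only for this block (option name taken from the text above).
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

    subset = df["foo"]      # under Copy-on-Write this behaves as a copy
    subset.iloc[0] = 100    # the write affects `subset` only, never `df`

    parent_values = df["foo"].tolist()
    child_values = subset.tolist()

print(parent_values)
print(child_values)
```

The parent keeps its original values while the derived Series carries the update, which is the behavior that becomes the default in pandas 3.0.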

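Dedicated string data type (backed by Arrow) by default#

Historically, pandas represented string columns with the NumPy object data type. This representation has numerous problems, including slow performance and a large memory footprint. This will change in pandas 3.0. pandas will start inferring string columns as a new string data type, backed by Arrow, which represents strings contiguously in memory. This brings a large performance and memory improvement.

Old behavior: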

In [1]: ser = pd.Series(["a", "b"])
Out[1]:
0    a
1    b
dtype: object

New behavior:

In [1]: ser = pd.Series(["a", "b"])
Out[1]:
0    a
1    b
dtype: string

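The string data type used in these scenarios will mostly behave as NumPy object would, including missing value semantics and general operations on these columns.

This change includes a few additional changes across the API:

Currently, specifying dtype="string" creates a dtype that is backed by Python strings stored in a NumPy array. In pandas 3.0, this dtype will create an Arrow-backed string column.
Column names and the Index will also be backed by Arrow strings.
PyArrow will become a required dependency of pandas 3.0 to accommodate this change.

This future dtype inference logic can be enabled with: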

pd.options.future.infer_string = True

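Enhancements#

ADBC Driver support in to_sql and read_sql#

read_sql() and to_sql() now work with Apache Arrow ADBC drivers. Compared to traditional drivers used via SQLAlchemy, ADBC drivers should provide significant performance improvements, better type support and cleaner nullability handling.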

import adbc_driver_postgresql.dbapi as pg_dbapi

df = pd.DataFrame(
    [
        [1, 2, 3],
        [4, 5, 6],
    ],
    columns=['a', 'b', 'c']
)
uri = "postgresql://postgres:postgres@localhost/postgres"
with pg_dbapi.connect(uri) as conn:
    df.to_sql("pandas_table", conn, index=False)

# for round-tripping
with pg_dbapi.connect(uri) as conn:
    df2 = pd.read_sql("pandas_table", conn)

The Arrow type system offers a wider array of types that can more closely match what databases like PostgreSQL can offer. To illustrate, note this (non-exhaustive) listing of types available in different databases and pandas backends:

numpy/pandas	arrow	postgres	sqlite
int16/Int16	int16	SMALLINT	INTEGER
int32/Int32	int32	INTEGER	INTEGER
int64/Int64	int64	BIGINT	INTEGER
float32	float32	REAL	REAL
float64	float64	DOUBLE PRECISION	REAL
object	string	TEXT	TEXT
bool	`bool_`	BOOLEAN
datetime64[ns]	timestamp(us)	TIMESTAMP
datetime64[ns,tz]	timestamp(us,tz)	TIMESTAMPTZ
	date32	DATE
	month_day_nano_interval	INTERVAL
	binary	BINARY	BLOB
	decimal128	DECIMAL [1]
	list	ARRAY [1]
	struct	COMPOSITE TYPE [1]

Footnotes

[1] Not implemented as of writing, but theoretically possible

If you are interested in preserving database types as faithfully as possible throughout the lifecycle of your DataFrame, use the dtype_backend="pyarrow" argument of read_sql():

# for round-tripping
with pg_dbapi.connect(uri) as conn:
    df2 = pd.read_sql("pandas_table", conn, dtype_backend="pyarrow")

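This prevents your data from being converted to the traditional pandas/NumPy type system, which often converts SQL types in ways that make them impossible to round-trip.

For a full list of ADBC drivers and their development status, see the ADBC Driver Implementation Status documentation.

Create a pandas Series based on one or more conditions#

The Series.case_when() function has been added to create a Series object based on one or more conditions. (GH 39154)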

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))

In [3]: default=pd.Series('default', index=df.index)

In [4]: default.case_when(
   ...:      caselist=[
   ...:          (df.a == 1, 'first'),                              # condition, replacement
   ...:          (df.a.gt(1) & df.b.eq(5), 'second'),  # condition, replacement
   ...:      ],
   ...: )
   ...: 
Out[4]: 
0      first
1     second
2    default
dtype: object

to_numpy for NumPy nullable and Arrow types converts to suitable NumPy dtype#

to_numpy for NumPy nullable and Arrow types will now convert to a suitable NumPy dtype instead of object dtype for nullable and PyArrow-backed extension dtypes.

Old behavior:

In [1]: ser = pd.Series([1, 2, 3], dtype="Int64")
In [2]: ser.to_numpy()
Out[2]: array([1, 2, 3], dtype=object)

New behavior:

In [5]: ser = pd.Series([1, 2, 3], dtype="Int64")

In [6]: ser.to_numpy()
Out[6]: array([1, 2, 3])

In [7]: ser = pd.Series([1, 2, 3], dtype="timestamp[ns][pyarrow]")

In [8]: ser.to_numpy()
Out[8]: 
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002',
       '1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')

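The default NumPy dtype (without any arguments) is determined as follows:

float dtypes are cast to NumPy floats
integer dtypes without missing values are cast to NumPy integer dtypes
integer dtypes with missing values are cast to NumPy float dtypes and NaN is used as the missing value indicator
boolean dtypes without missing values are cast to NumPy bool dtype
boolean dtypes with missing values keep object dtype
datetime and timedelta types are cast to NumPy datetime64 and timedelta64 types respectively and NaT is used as the missing value indicator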

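Series.struct accessor for PyArrow structured data#

The Series.struct accessor provides attributes and methods for processing data with struct[pyarrow] dtype Series. For example, Series.struct.explode() converts PyArrow structured data to a pandas DataFrame. (GH 54938)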

In [9]: import pyarrow as pa

In [10]: series = pd.Series(
   ....:     [
   ....:         {"project": "pandas", "version": "2.2.0"},
   ....:         {"project": "numpy", "version": "1.25.2"},
   ....:         {"project": "pyarrow", "version": "13.0.0"},
   ....:     ],
   ....:     dtype=pd.ArrowDtype(
   ....:         pa.struct([
   ....:             ("project", pa.string()),
   ....:             ("version", pa.string()),
   ....:         ])
   ....:     ),
   ....: )
   ....: 

In [11]: series.struct.explode()
Out[11]: 
   project version
0   pandas   2.2.0
1    numpy  1.25.2
2  pyarrow  13.0.0

Use Series.struct.field() to index into a (possibly nested) struct field.

In [12]: series.struct.field("project")
Out[12]: 
0     pandas
1      numpy
2    pyarrow
Name: project, dtype: string[pyarrow]

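Series.list accessor for PyArrow list data#

The Series.list accessor provides attributes and methods for processing data with list[pyarrow] dtype Series. For example, Series.list.__getitem__() allows indexing pyarrow lists in a Series. (GH 55323)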

In [13]: import pyarrow as pa

In [14]: series = pd.Series(
   ....:     [
   ....:         [1, 2, 3],
   ....:         [4, 5],
   ....:         [6],
   ....:     ],
   ....:     dtype=pd.ArrowDtype(
   ....:         pa.list_(pa.int64())
   ....:     ),
   ....: )
   ....: 

In [15]: series.list[0]
Out[15]: 
0    1
1    4
2    6
dtype: int64[pyarrow]

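Calamine engine for read_excel()#

The calamine engine was added to read_excel(). It uses python-calamine, which provides Python bindings for the Rust library calamine. This engine supports Excel files (.xlsx, .xlsm, .xls, .xlsb) and OpenDocument spreadsheets (.ods) (GH 50395).

There are two advantages of this engine:

Calamine is often faster than other engines; some benchmarks show results up to 5x faster than 'openpyxl', 20x faster than 'odf', 4x faster than 'pyxlsb', and 1.5x faster than 'xlrd'. However, 'openpyxl' and 'pyxlsb' are faster at reading a few rows from large files because they iterate lazily over rows.
Calamine supports the recognition of datetime in .xlsb files, unlike 'pyxlsb', which is the only other engine in pandas that can read .xlsb files.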

pd.read_excel("path_to_file.xlsb", engine="calamine")

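For more, see Calamine (Excel and ODS files) in the user guide on IO tools.

Other enhancements#

to_sql() with the method parameter set to multi now works with Oracle on the backend
Series.attrs / DataFrame.attrs now uses a deepcopy for propagating attrs (GH 54134).
get_dummies() now returns extension dtypes boolean or bool[pyarrow] that are compatible with the input dtype (GH 56273)
read_csv() now supports the on_bad_lines parameter with engine="pyarrow" (GH 54480)
read_sas() returns datetime64 dtypes with resolutions better matching those stored natively in SAS, and avoids returning object-dtype in cases that cannot be stored with datetime64[ns] dtype (GH 56127)
read_spss() now returns a DataFrame that stores the metadata in DataFrame.attrs (GH 54264)
tseries.api.guess_datetime_format() is now part of the public API (GH 54727)
DataFrame.apply() now allows the usage of numba (via engine="numba") to JIT compile the passed function, allowing for potential speedups (GH 54666)
ExtensionArray._explode() interface method added to allow extension type implementations of the explode method (GH 54833)
ExtensionArray.duplicated() added to allow extension type implementations of the duplicated method (GH 55255)
Series.ffill(), Series.bfill(), DataFrame.ffill(), and DataFrame.bfill() have gained the argument limit_area; third-party ExtensionArray authors need to add this argument to the method _pad_or_backfill (GH 56492)
Allow passing read_only, data_only and keep_links arguments to openpyxl using engine_kwargs of read_excel() (GH 55027)
Implement Series.interpolate() and DataFrame.interpolate() for ArrowDtype and masked dtypes (GH 56267)
Implement masked algorithms for Series.value_counts() (GH 54984)
Implemented Series.dt() methods and attributes for ArrowDtype with pyarrow.duration type (GH 52284)
Implemented Series.str.extract() for ArrowDtype (GH 56268)
Improved error message that appears in DatetimeIndex.to_period() with frequencies which are not supported as period frequencies, such as "BMS" (GH 56243)
Improved error message when constructing Period with invalid offsets such as "QS" (GH 55785)
The dtypes string[pyarrow] and string[pyarrow_numpy] now both utilize the large_string type from PyArrow to avoid overflow for long columns (GH 56259)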

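Notable bug fixes#

These are bug fixes that might have notable behavior changes.

merge() and DataFrame.join() now consistently follow documented sort behavior#

In previous versions of pandas, merge() and DataFrame.join() did not always return a result that followed the documented sort behavior. pandas now follows the documented sort behavior in merge and join operations (GH 54611, GH 56426, GH 56443).

As documented, sort=True sorts the join keys lexicographically in the resulting DataFrame. With sort=False, the order of the join keys depends on the join type (how keyword):

how="left": preserve the order of the left keys
how="right": preserve the order of the right keys
how="inner": preserve the order of the left keys
how="outer": sort keys lexicographically

One example with changing behavior is inner joins with non-unique left join keys and sort=False: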

In [16]: left = pd.DataFrame({"a": [1, 2, 1]})

In [17]: right = pd.DataFrame({"a": [1, 2]})

In [18]: result = pd.merge(left, right, how="inner", on="a", sort=False)

Old behavior

In [5]: result
Out[5]:
   a
0  1
1  1
2  2

New behavior

In [19]: result
Out[19]: 
   a
0  1
1  2
2  1
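The key-order rules listed above for sort=False and sort=True can be illustrated with a small sketch. The frames and column names here are made up for the example; the result variables hold the join keys in the order pandas returns them.

```python
import pandas as pd

left = pd.DataFrame({"a": [3, 1, 2]})
right = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# how="left" with sort=False keeps the order of the left keys
left_order = pd.merge(left, right, how="left", on="a", sort=False)["a"].tolist()

# how="right" with sort=False keeps the order of the right keys
right_order = pd.merge(left, right, how="right", on="a", sort=False)["a"].tolist()

# sort=True sorts the join keys regardless of the join type
sorted_order = pd.merge(left, right, how="inner", on="a", sort=True)["a"].tolist()

print(left_order, right_order, sorted_order)
```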

merge() and DataFrame.join() no longer reorder levels when levels differ#

In previous versions of pandas, merge() and DataFrame.join() would reorder index levels when joining on two indexes with different levels (GH 34133).

In [20]: left = pd.DataFrame({"left": 1}, index=pd.MultiIndex.from_tuples([("x", 1), ("x", 2)], names=["A", "B"]))

In [21]: right = pd.DataFrame({"right": 2}, index=pd.MultiIndex.from_tuples([(1, 1), (2, 2)], names=["B", "C"]))

In [22]: left
Out[22]: 
     left
A B      
x 1     1
  2     1

In [23]: right
Out[23]: 
     right
B C       
1 1      2
2 2      2

In [24]: result = left.join(right)

Old behavior

In [5]: result
Out[5]:
       left  right
B A C
1 x 1     1      2
2 x 2     1      2

New behavior

In [25]: result
Out[25]: 
       left  right
A B C             
x 1 1     1      2
  2 2     1      2

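Increased minimum versions for dependencies#

For optional dependencies the general recommendation is to use the latest version. Optional dependencies below the lowest tested version may still work but are not considered supported. The following table lists the optional dependencies that have had their minimum tested version increased.

Package	New Minimum Version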
beautifulsoup4	4.11.2
blosc	1.21.3
bottleneck	1.3.6
fastparquet	2022.12.0
fsspec	2022.11.0
gcsfs	2022.11.0
lxml	4.9.2
matplotlib	3.6.3
numba	0.56.4
numexpr	2.8.4
qtpy	2.3.0
openpyxl	3.1.0
psycopg2	2.9.6
pyreadstat	1.2.0
pytables	3.8.0
pyxlsb	1.0.10
s3fs	2022.11.0
scipy	1.10.0
sqlalchemy	2.0.0
tabulate	0.9.0
xarray	2022.12.0
xlsxwriter	3.0.5
zstandard	0.19.0
pyqt5	5.15.8
tzdata	2022.7

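See Dependencies and Optional dependencies for more.

Other API changes#

The hash values of nullable extension dtypes changed to improve the performance of the hashing operation (GH 56507)
check_exact now only takes effect for floating-point dtypes in testing.assert_frame_equal() and testing.assert_series_equal(). In particular, integer dtypes are always checked exactly (GH 55882)

Deprecations#

Chained assignment#

In preparation for the larger upcoming changes to the copy / view behavior in pandas 3.0 (Copy-on-Write (CoW), PDEP-7), we started deprecating chained assignment.

Chained assignment occurs when you try to update a pandas DataFrame or Series through two subsequent indexing operations. Depending on the type and order of those operations, this currently may or may not work.

A typical example is as follows: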

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# first selecting rows with a mask, then assigning values to a column
# -> this has never worked and raises a SettingWithCopyWarning
df[df["bar"] > 5]["foo"] = 100

# first selecting the column, and then assigning to a subset of that column
# -> this currently works
df["foo"][df["bar"] > 5] = 100

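The second example of chained assignment currently works to update the original df. This will no longer work in pandas 3.0, and therefore we started deprecating this: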

>>> df["foo"][df["bar"] > 5] = 100
FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

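You can fix this warning and ensure your code is ready for pandas 3.0 by removing the usage of chained assignment. Typically, this can be done by doing the assignment in a single step, for example with .loc. For the example above, we can do:

df.loc[df["bar"] > 5, "foo"] = 100

The same deprecation applies to inplace methods that are called in a chained manner, such as: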

>>> df["foo"].fillna(0, inplace=True)
FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

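When the goal is to update the column in the DataFrame df, the alternative here is to call the method on df itself, such as df.fillna({"foo": 0}, inplace=True).

See more details in the migration guide.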
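Both recommended replacements can be combined in one short sketch. The data below is made up for illustration; the two statements are the single-step forms suggested by the warning messages quoted above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"foo": [1.0, np.nan, 3.0], "bar": [4, 5, 6]})

# Instead of df["foo"][df["bar"] > 5] = 100: assign in a single step with .loc
df.loc[df["bar"] > 5, "foo"] = 100.0

# Instead of df["foo"].fillna(0, inplace=True): call the method on df itself
df.fillna({"foo": 0.0}, inplace=True)

print(df["foo"].tolist())
```

Because each statement operates directly on `df`, the result does not depend on whether an intermediate object is a view or a copy.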

Deprecate aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets#

Deprecated the following frequency aliases (GH 9586):

offsets	Deprecated aliases	New aliases
`MonthEnd`	`M`	`ME`
`BusinessMonthEnd`	`BM`	`BME`
`SemiMonthEnd`	`SM`	`SME`
`CustomBusinessMonthEnd`	`CBM`	`CBME`
`QuarterEnd`	`Q`	`QE`
`BQuarterEnd`	`BQ`	`BQE`
`YearEnd`	`Y`	`YE`
`BYearEnd`	`BY`	`BYE`

For example:

Previous behavior:

In [8]: pd.date_range('2020-01-01', periods=3, freq='Q-NOV')
Out[8]:
DatetimeIndex(['2020-02-29', '2020-05-31', '2020-08-31'],
              dtype='datetime64[ns]', freq='Q-NOV')

Future behavior:

In [26]: pd.date_range('2020-01-01', periods=3, freq='QE-NOV')
Out[26]: DatetimeIndex(['2020-02-29', '2020-05-31', '2020-08-31'], dtype='datetime64[ns]', freq='QE-NOV')

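Deprecated automatic downcasting#

Deprecated the automatic downcasting of object dtype results in a number of methods. These would silently change the dtype in a hard-to-predict manner since the behavior was value dependent. Additionally, pandas is moving away from silent dtype changes (GH 54710, GH 54261).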

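These methods are:

Series.replace() and DataFrame.replace()
DataFrame.fillna(), Series.fillna()
DataFrame.ffill(), Series.ffill()
DataFrame.bfill(), Series.bfill()
DataFrame.mask(), Series.mask()
DataFrame.where(), Series.where()
DataFrame.clip(), Series.clip()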

Explicitly call DataFrame.infer_objects() to replicate the current behavior in the future.

result = result.infer_objects(copy=False)

Or explicitly cast all-round floats to ints with astype.

Set the following option to opt into the future behavior:

In [9]: pd.set_option("future.no_silent_downcasting", True)
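A minimal sketch of the explicit conversion, using made-up data: the object-dtype Series is filled and then converted with infer_objects(), so the resulting dtype no longer depends on silent downcasting. On pandas 2.2 without the option above, the fillna call may still emit the deprecation warning.

```python
import pandas as pd

ser = pd.Series([1, None, 3], dtype=object)

# Fill the gap, then request the dtype inference explicitly.
filled = ser.fillna(0).infer_objects()

# All-round floats can be cast to integers explicitly with astype.
as_int = pd.Series([1.0, 2.0, 3.0]).astype("int64")

print(filled.dtype, as_int.dtype)
```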

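Other Deprecations#

Changed Timedelta.resolution_string() to return h, min, s, ms, us, and ns instead of H, T, S, L, U, and N, for compatibility with respective deprecations in frequency aliases (GH 52536)
Deprecated offsets.Day.delta, offsets.Hour.delta, offsets.Minute.delta, offsets.Second.delta, offsets.Milli.delta, offsets.Micro.delta, offsets.Nano.delta, use pd.Timedelta(obj) instead (GH 55498)
Deprecated pandas.api.types.is_interval() and pandas.api.types.is_period(), use isinstance(obj, pd.Interval) and isinstance(obj, pd.Period) instead (GH 55264)
Deprecated read_gbq() and DataFrame.to_gbq(). Use pandas_gbq.read_gbq and pandas_gbq.to_gbq instead https://pandas-gbq.readthedocs.io/en/latest/api.html (GH 55525)
Deprecated DataFrameGroupBy.fillna() and SeriesGroupBy.fillna(); use DataFrameGroupBy.ffill(), DataFrameGroupBy.bfill() for forward and backward filling or DataFrame.fillna() to fill with a single value (or the Series equivalents) (GH 55718)
Deprecated DateOffset.is_anchored(), use obj.n == 1 for non-Tick subclasses (for Tick this was always False) (GH 55388)
Deprecated DatetimeArray.__init__() and TimedeltaArray.__init__(), use array() instead (GH 55623)
Deprecated Index.format(), use index.astype(str) or index.map(formatter) instead (GH 55413)
Deprecated Series.ravel(), the underlying array is already 1D, so ravel is not necessary (GH 52511)
Deprecated Series.resample() and DataFrame.resample() with a PeriodIndex (and the 'convention' keyword), convert to DatetimeIndex (with .to_timestamp()) before resampling instead (GH 53481)
Deprecated Series.view(), use Series.astype() instead to change the dtype (GH 20251)
Deprecated offsets.Tick.is_anchored(), use False instead (GH 55388)
Deprecated core.internals members Block, ExtensionBlock, and DatetimeTZBlock, use public APIs instead (GH 55139)
Deprecated year, month, quarter, day, hour, minute, and second keywords in the PeriodIndex constructor, use PeriodIndex.from_fields() instead (GH 55960)
Deprecated accepting a type as an argument in Index.view(), call without any arguments instead (GH 55709)
Deprecated allowing non-integer periods argument in date_range(), timedelta_range(), period_range(), and interval_range() (GH 56036)
Deprecated allowing non-keyword arguments in DataFrame.to_clipboard() (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_csv() except path_or_buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_dict() (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_excel() except excel_writer (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_gbq() except destination_table (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_hdf() except path_or_buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_html() except buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_json() except path_or_buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_latex() except buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_markdown() except buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_parquet() except path (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_pickle() except path (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_string() except buf (GH 54229)
Deprecated allowing non-keyword arguments in DataFrame.to_xml() except path_or_buffer (GH 54229)
Deprecated allowing passing BlockManager objects to DataFrame or SingleBlockManager objects to Series (GH 52419)
Deprecated behavior of Index.insert() with an object-dtype index silently performing type inference on the result, explicitly call result.infer_objects(copy=False) for the old behavior instead (GH 51363)
Deprecated casting non-datetimelike values (mainly strings) in Series.isin() and Index.isin() with datetime64, timedelta64, and PeriodDtype dtypes (GH 53111)
Deprecated dtype inference in Index, Series and DataFrame constructors when given a pandas input, call .infer_objects on the input to keep the current behavior (GH 56012)
Deprecated dtype inference when setting an Index into a DataFrame, cast explicitly instead (GH 56102)
Deprecated including the groups in computations when using DataFrameGroupBy.apply() and DataFrameGroupBy.resample(); pass include_groups=False to exclude the groups (GH 7155)
Deprecated indexing an Index with a boolean indexer of length zero (GH 55820)
Deprecated not passing a tuple to DataFrameGroupBy.get_group or SeriesGroupBy.get_group when grouping by a length-1 list-like (GH 25971)
Deprecated string AS denoting frequency in YearBegin and strings AS-DEC, AS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 54275)
Deprecated string A denoting frequency in YearEnd and strings A-DEC, A-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 54275)
Deprecated string BAS denoting frequency in BYearBegin and strings BAS-DEC, BAS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 54275)
Deprecated string BA denoting frequency in BYearEnd and strings BA-DEC, BA-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 54275)
Deprecated strings H, BH, and CBH denoting frequencies in Hour, BusinessHour, CustomBusinessHour (GH 52536)
Deprecated strings H, S, U, and N denoting units in to_timedelta() (GH 52536)
Deprecated strings H, T, S, L, U, and N denoting units in Timedelta (GH 52536)
Deprecated strings T, S, L, U, and N denoting frequencies in Minute, Second, Milli, Micro, Nano (GH 52536)
Deprecated support for combining parsed datetime columns in read_csv() along with the keep_date_col keyword (GH 55569)
Deprecated DataFrameGroupBy.grouper and SeriesGroupBy.grouper; these attributes will be removed in a future version of pandas (GH 56521)
Deprecated the Grouping attributes group_index, result_index, and group_arraylike; these will be removed in a future version of pandas (GH 56148)
Deprecated the delim_whitespace keyword in read_csv() and read_table(), use sep="\\s+" instead (GH 55569)
Deprecated the errors="ignore" option in to_datetime(), to_timedelta(), and to_numeric(); explicitly catch exceptions instead (GH 54467)
Deprecated the fastpath keyword in the Series constructor (GH 20110)
Deprecated the kind keyword in Series.resample() and DataFrame.resample(), explicitly cast the object's index instead (GH 55895)
Deprecated the ordinal keyword in PeriodIndex, use PeriodIndex.from_ordinals() instead (GH 55960)
Deprecated the unit keyword in TimedeltaIndex construction, use to_timedelta() instead (GH 55499)
Deprecated the verbose keyword in read_csv() and read_table() (GH 55569)
Deprecated the behavior of DataFrame.replace() and Series.replace() with CategoricalDtype; in a future version replace will change the values while preserving the categories. To change the categories, use ser.cat.rename_categories instead (GH 55147)
Deprecated the behavior of Series.value_counts() and Index.value_counts() with object dtypes; in a future version these will not perform dtype inference on the resulting Index, do result.index = result.index.infer_objects() to retain the old behavior (GH 56161)
Deprecated the default of observed=False in DataFrame.pivot_table(); will be True in a future version (GH 56236)
Deprecated the extension test classes BaseNoReduceTests, BaseBooleanReduceTests, and BaseNumericReduceTests, use BaseReduceTests instead (GH 54663)
Deprecated the option mode.data_manager and the ArrayManager; only the BlockManager will be available in future versions (GH 55043)
Deprecated the previous implementation of DataFrame.stack; specify future_stack=True to adopt the future version (GH 53515)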
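Several of the replacements named in this list are short enough to show together. The sketch below uses made-up input data and a hypothetical helper, `parse_or_keep`, which is not part of pandas; it only illustrates one way to catch the exception explicitly in place of errors="ignore".

```python
import io

import pandas as pd

# delim_whitespace=True  ->  sep=r"\s+"
text = "a b  c\n1 2  3\n4 5  6\n"
df = pd.read_csv(io.StringIO(text), sep=r"\s+")


# errors="ignore"  ->  catch the exception explicitly
def parse_or_keep(value):
    """Hypothetical helper: return a Timestamp, or the input if parsing fails."""
    try:
        return pd.to_datetime(value)
    except (ValueError, TypeError):
        return value


parsed = parse_or_keep("2024-01-19")
kept = parse_or_keep("not a date")

# pandas.api.types.is_interval(obj)  ->  isinstance(obj, pd.Interval)
is_interval = isinstance(pd.Interval(0, 1), pd.Interval)

print(df.shape, parsed, kept, is_interval)
```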

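Performance improvements#

Performance improvement in testing.assert_frame_equal() and testing.assert_series_equal() (GH 55949, GH 55971)
Performance improvement in concat() with axis=1 and objects with unaligned indexes (GH 55084)
Performance improvement in get_dummies() (GH 56089)
Performance improvement in merge() and merge_ordered() when joining on sorted ascending keys (GH 56115)
Performance improvement in merge_asof() when by is not None (GH 55580, GH 55678)
Performance improvement in read_stata() for files with many variables (GH 55515)
Performance improvement in DataFrame.groupby() when aggregating pyarrow timestamp and duration dtypes (GH 55031)
Performance improvement in DataFrame.join() when joining on unordered categorical indexes (GH 56345)
Performance improvement in DataFrame.loc() and Series.loc() when indexing with a MultiIndex (GH 56062)
Performance improvement in DataFrame.sort_index() and Series.sort_index() when indexed by a MultiIndex (GH 54835)
Performance improvement in DataFrame.to_dict() on converting DataFrame to dictionary (GH 50990)
Performance improvement in Index.difference() (GH 55108)
Performance improvement in Index.sort_values() when the index is already sorted (GH 56128)
Performance improvement in MultiIndex.get_indexer() when method is not None (GH 55839)
Performance improvement in Series.duplicated() for pyarrow dtypes (GH 55255)
Performance improvement in Series.str.get_dummies() when dtype is "string[pyarrow]" or "string[pyarrow_numpy]" (GH 56110)
Performance improvement in Series.str() methods (GH 55736)
Performance improvement in Series.value_counts() and Series.mode() for masked dtypes (GH 54984, GH 55340)
Performance improvement in DataFrameGroupBy.nunique() and SeriesGroupBy.nunique() (GH 55972)
Performance improvement in SeriesGroupBy.idxmax(), SeriesGroupBy.idxmin(), DataFrameGroupBy.idxmax(), DataFrameGroupBy.idxmin() (GH 54234)
Performance improvement when hashing a nullable extension array (GH 56507)
Performance improvement when indexing into a non-unique index (GH 55816)
Performance improvement when indexing with more than 4 keys (GH 54550)
Performance improvement when localizing time to UTC (GH 55241)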

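Bug fixes#

Categorical#

Categorical.isin() raising InvalidIndexError for categoricals containing overlapping Interval values (GH 34974)
Bug in CategoricalDtype.__eq__() returning False for unordered categorical data with mixed types (GH 55468)
Bug when casting pa.dictionary to CategoricalDtype using a pa.DictionaryArray as categories (GH 56672)

Datetimelike#

Bug in DatetimeIndex construction when passing both a tz and either dayfirst or yearfirst ignoring dayfirst/yearfirst (GH 55813)
Bug in DatetimeIndex when passing an object-dtype ndarray of float objects and a tz incorrectly localizing the result (GH 55780)
Bug in Series.isin() with DatetimeTZDtype dtype and comparison values that are all NaT incorrectly returning all-False even if the series contains NaT entries (GH 56427)
Bug in concat() raising AttributeError when concatenating an all-NA DataFrame with a DatetimeTZDtype dtype DataFrame (GH 52093)
Bug in testing.assert_extension_array_equal() that could use the wrong unit when comparing resolutions (GH 55730)
Bug in to_datetime() and DatetimeIndex when passing a list of mixed-string-and-numeric types incorrectly raising (GH 55780)
Bug in to_datetime() and DatetimeIndex when passing mixed-type objects with a mix of timezones or mix of timezone-awareness failing to raise ValueError (GH 55693)
Bug in Tick.delta() with very large ticks raising OverflowError instead of OutOfBoundsTimedelta (GH 55503)
Bug in DatetimeIndex.shift() with non-nanosecond resolution incorrectly returning with nanosecond resolution (GH 56117)
Bug in DatetimeIndex.union() returning object dtype for tz-aware indexes with the same timezone but different units (GH 55238)
Bug in Index.is_monotonic_increasing() and Index.is_monotonic_decreasing() always caching Index.is_unique() as True when the first value in the index is NaT (GH 55755)
Bug in Index.view() to a datetime64 dtype with non-supported resolution incorrectly raising (GH 55710)
Bug in Series.dt.round() with non-nanosecond resolution and NaT entries incorrectly raising OverflowError (GH 56158)
Bug in Series.fillna() with non-nanosecond resolution dtypes and higher-resolution vector values returning incorrect (internally-corrupted) results (GH 56410)
Bug in Timestamp.unit() being inferred incorrectly from an ISO8601 format string with minute or hour resolution and a timezone offset (GH 56208)
Bug in .astype converting from a higher-resolution datetime64 dtype to a lower-resolution datetime64 dtype (e.g. datetime64[us]->datetime64[ms]) silently overflowing with values near the lower implementation bound (GH 55979)
Bug in adding or subtracting a Week offset to a datetime64 Series, Index, or DataFrame column with non-nanosecond resolution returning incorrect results (GH 55583)
Bug in addition or subtraction of BusinessDay offset with offset attribute to non-nanosecond Index, Series, or DataFrame column giving incorrect results (GH 55608)
Bug in addition or subtraction of DateOffset objects with microsecond components to datetime64 Index, Series, or DataFrame columns with non-nanosecond resolution (GH 55595)
Bug in addition or subtraction of very large Tick objects with Timestamp or Timedelta objects raising OverflowError instead of OutOfBoundsTimedelta (GH 55503)
Bug in creating an Index, Series, or DataFrame with a non-nanosecond DatetimeTZDtype and inputs that would be out of bounds with nanosecond resolution incorrectly raising OutOfBoundsDatetime (GH 54620)
Bug in creating an Index, Series, or DataFrame with a non-nanosecond datetime64 (or DatetimeTZDtype) from mixed-numeric inputs treating those as nanoseconds instead of as multiples of the dtype's unit (which would happen with non-mixed numeric inputs) (GH 56004)
Bug in creating an Index, Series, or DataFrame with a non-nanosecond datetime64 dtype and inputs that would be out of bounds for a datetime64[ns] incorrectly raising OutOfBoundsDatetime (GH 55756)
Bug in parsing datetime strings with nanosecond resolution with non-ISO8601 formats incorrectly truncating sub-microsecond components (GH 56051)
Bug in parsing datetime strings with sub-second resolution and trailing zeros incorrectly inferring second or millisecond resolution (GH 55737)
Bug in the results of to_datetime() with a floating-dtype argument with unit not matching the pointwise results of Timestamp (GH 56037)
Fixed regression where concat() would raise an error when concatenating datetime64 columns with differing resolutions (GH 53641)

Timedelta#

Bug in Timedelta construction raising OverflowError instead of OutOfBoundsTimedelta (GH 55503)
Bug in rendering (__repr__) of TimedeltaIndex and Series with timedelta64 values with non-nanosecond resolution entries that are all multiples of 24 hours failing to use the compact representation used in the nanosecond cases (GH 55405)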

Timezones#

Bug in AbstractHolidayCalendar where timezone data was not propagated when computing holiday observances (GH 54580)
Bug in Timestamp construction with an ambiguous value and a pytz timezone failing to raise pytz.AmbiguousTimeError (GH 55657)
Bug in Timestamp.tz_localize() with nonexistent="shift_forward" around UTC+0 during DST (GH 51501)

Numeric#

Bug in read_csv() with engine="pyarrow" causing rounding errors for large integers (GH 52505)
Bug in Series.__floordiv__() and Series.__truediv__() for ArrowDtype with integral dtypes raising for large divisors (GH 56706)
Bug in Series.__floordiv__() for ArrowDtype with integral dtypes raising for large values (GH 56645)
Bug in Series.pow() not filling missing values correctly (GH 55512)
Bug in Series.replace() and DataFrame.replace() matching float 0.0 with False and vice versa (GH 55398)
Bug in Series.round() raising for nullable boolean dtype (GH 55936)

Conversion#

Bug in DataFrame.astype() when called with str on an unpickled array - the array might change in-place (GH 54654)
Bug in DataFrame.astype() where errors="ignore" had no effect for extension types (GH 54654)
Bug in Series.convert_dtypes() not converting all NA columns to null[pyarrow] (GH 55346)
Bug in DataFrame.loc() not throwing an "incompatible dtype warning" (see PDEP6) when assigning a Series with a different dtype using a full column setter (e.g. df.loc[:, 'a'] = incompatible_value) (GH 39584)

Strings#

Bug in pandas.api.types.is_string_dtype() while checking whether an object array with no elements is of the string dtype (GH 54661)
Bug in DataFrame.apply() failing when engine="numba" and columns or index have StringDtype (GH 56189)
Bug in DataFrame.reindex() not matching Index with string[pyarrow_numpy] dtype (GH 56106)
Bug in Index.str.cat() always casting the result to object dtype (GH 56157)
Bug in Series.__mul__() for ArrowDtype with pyarrow.string dtype and string[pyarrow] for the pyarrow backend (GH 51970)
Bug in Series.str.find() when start < 0 for ArrowDtype with pyarrow.string (GH 56411)
Bug in Series.str.fullmatch() when dtype=pandas.ArrowDtype(pyarrow.string())) allows partial matches when the regex ends in literal //$ (GH 56652)
Bug in Series.str.replace() when n < 0 for ArrowDtype with pyarrow.string (GH 56404)
Bug in Series.str.startswith() and Series.str.endswith() with arguments of type tuple[str, ...] for ArrowDtype with pyarrow.string dtype (GH 56579)
Bug in Series.str.startswith() and Series.str.endswith() with arguments of type tuple[str, ...] for string[pyarrow] (GH 54942)
Bug in comparison operations for dtype="string[pyarrow_numpy]" raising if dtypes can't be compared (GH 56008)

Interval#

Bug in Interval __repr__ not displaying UTC offsets for Timestamp bounds. Additionally, the hour, minute and second components will now be shown (GH 55015)
Bug in IntervalIndex.factorize() and Series.factorize() with IntervalDtype with datetime64 or timedelta64 intervals not preserving non-nanosecond units (GH 56099)
Bug in IntervalIndex.from_arrays() when passed datetime64 or timedelta64 arrays with mismatched resolutions constructing an invalid IntervalArray object (GH 55714)
Bug in IntervalIndex.from_tuples() raising if the subtype is a nullable extension dtype (GH 56765)
Bug in IntervalIndex.get_indexer() with datetime or timedelta intervals incorrectly matching on integer targets (GH 47772)
Bug in IntervalIndex.get_indexer() with timezone-aware datetime intervals incorrectly matching on a sequence of timezone-naive targets (GH 47772)
Bug in setting values on a Series with an IntervalIndex using a slice incorrectly raising (GH 54722)

Indexing#

Bug in DataFrame.loc() mutating a boolean indexer when the DataFrame has a MultiIndex (GH 56635)
Bug in DataFrame.loc() when setting a Series with extension dtype into a NumPy dtype (GH 55604)
Bug in Index.difference() not returning a unique set of values when other is empty or other is considered non-comparable (GH 55113)
Bug in setting Categorical values into a DataFrame with numpy dtypes raising RecursionError (GH 52927)
Fixed bug when creating a new column with missing values when setting a single string value (GH 56204)

Missing#

Bug in DataFrame.update() not updating in-place for tz-aware datetime64 dtypes (GH 56227)

MultiIndex#

Bug in MultiIndex.get_indexer() not raising ValueError when method is provided and the index is non-monotonic (GH 53452)

I/O#

Bug in read_csv() where engine="python" did not respect chunksize arg when skiprows was specified (GH 56323)
Bug in read_csv() where engine="python" was causing a TypeError when a callable skiprows and a chunk size was specified (GH 55677)
Bug in read_csv() where on_bad_lines="warn" would write to stderr instead of raising a Python warning; this now yields an errors.ParserWarning (GH 54296)
Bug in read_csv() with engine="pyarrow" where quotechar was ignored (GH 52266)
Bug in read_csv() with engine="pyarrow" where usecols wasn't working with a CSV with no headers (GH 54459)
Bug in read_excel(), with engine="xlrd" (xls files) erroring when the file contains NaN or Inf (GH 54564)
Bug in read_json() not handling dtype conversion properly if infer_string is set (GH 56195)
Bug in DataFrame.to_excel(), with OdsWriter (ods files) writing Boolean/string values (GH 54994)
Bug in DataFrame.to_hdf() and read_hdf() with datetime64 dtypes with non-nanosecond resolution failing to round-trip correctly (GH 55622)
Bug in DataFrame.to_stata() raising for extension dtypes (GH 54671)
Bug in read_excel() with engine="odf" (ods files) when a string cell contains an annotation (GH 55200)
Bug in read_excel() with an ODS file without cached formatted cell for float values (GH 55219)
Bug where DataFrame.to_json() would raise an OverflowError instead of a TypeError with unsupported NumPy types (GH 55403)

Period#

Bug in PeriodIndex construction when more than one of data, ordinal and **fields are passed failing to raise ValueError (GH 55961)
Bug in Period addition silently wrapping around instead of raising OverflowError (GH 55503)
Bug in casting from PeriodDtype with astype to datetime64 or DatetimeTZDtype with non-nanosecond unit incorrectly returning with nanosecond unit (GH 55958)

Plotting#

Bug in DataFrame.plot.box() with vert=False and a Matplotlib Axes created with sharey=True (GH 54941)
Bug in DataFrame.plot.scatter() discarding string columns (GH 56142)
Bug in Series.plot() when reusing an ax object failing to raise when a how keyword is passed (GH 55953)

Groupby/resample/rolling#

Bug in DataFrameGroupBy.idxmin(), DataFrameGroupBy.idxmax(), SeriesGroupBy.idxmin(), and SeriesGroupBy.idxmax() not retaining Categorical dtype when the index was a CategoricalIndex that contained NA values (GH 54234)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() when observed=False and f="idxmin" or f="idxmax" would incorrectly raise on unobserved categories (GH 54234)
Bug in DataFrameGroupBy.value_counts() and SeriesGroupBy.value_counts() that could result in incorrect sorting if the columns of the DataFrame or the name of the Series are integers (GH 55951)
Bug in DataFrameGroupBy.value_counts() and SeriesGroupBy.value_counts() not respecting sort=False in DataFrame.groupby() and Series.groupby() (GH 55951)
Bug in DataFrameGroupBy.value_counts() and SeriesGroupBy.value_counts() sorting by proportions rather than frequencies when sort=True and normalize=True (GH 55951)
Bug in DataFrame.asfreq() and Series.asfreq() with a DatetimeIndex with non-nanosecond resolution incorrectly converting to nanosecond resolution (GH 55958)
Bug in DataFrame.ewm() when passed times with non-nanosecond datetime64 or DatetimeTZDtype dtype (GH 56262)
Bug in DataFrame.groupby() and Series.groupby() where grouping by a combination of Decimal and NA values would fail when sort=True (GH 54847)
Bug in DataFrame.groupby() for DataFrame subclasses when selecting a subset of columns to apply the function to (GH 56761)
Bug in DataFrame.resample() not respecting closed and label arguments for BusinessDay (GH 55282)
Bug in DataFrame.resample() when resampling on an ArrowDtype of pyarrow.timestamp or pyarrow.duration type (GH 55989)
Bug in DataFrame.resample() where bin edges were not correct for BusinessDay (GH 55281)
Bug in DataFrame.resample() where bin edges were not correct for MonthBegin (GH 55271)
Bug in DataFrame.rolling() and Series.rolling() where duplicate datetimelike indexes are treated as consecutive rather than equal with closed='left' and closed='neither' (GH 20712)
Bug in DataFrame.rolling() and Series.rolling() where either the index or on column was ArrowDtype with pyarrow.timestamp type (GH 55849)

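Reshaping#

Bug in concat() ignoring the sort parameter when passed DatetimeIndex indexes (GH 54769)
Bug in concat() renaming Series when ignore_index=False (GH 15047)
Bug in merge_asof() raising TypeError when by dtype is not object, int64, or uint64 (GH 22794)
Bug in merge_asof() raising an incorrect error for string dtype (GH 56444)
Bug in merge_asof() when using a Timedelta tolerance on an ArrowDtype column (GH 56486)
Bug in merge() not raising when merging datetime columns with timedelta columns (GH 56455)
Bug in merge() not raising when merging string columns with numeric columns (GH 56441)
Bug in merge() not sorting for the new string dtype (GH 56442)
Bug in merge() returning columns in incorrect order when left and/or right is empty (GH 51929)
Bug in DataFrame.melt() where an exception was raised if var_name was not a string (GH 55948)
Bug in DataFrame.melt() where it would not preserve the datetime (GH 55254)
Bug in DataFrame.pivot_table() where the row margin is incorrect when the columns have numeric names (GH 26568)
Bug in DataFrame.pivot() with numeric columns and extension dtype for data (GH 56528)
Bug in DataFrame.stack() with future_stack=True not preserving NA values in the index (GH 56573)

Sparse#

Bug in arrays.SparseArray.take() when using a different fill value than the array's fill value (GH 55181)

Other#

DataFrame.__dataframe__() did not support pyarrow large strings (GH 56702)
Bug in DataFrame.describe() when formatting percentiles in the resulting percentile 99.999% is rounded to 100% (GH 55765)
Bug in api.interchange.from_dataframe() where it raised NotImplementedError when handling empty string columns (GH 56703)
Bug in cut() and qcut() with datetime64 dtype values with non-nanosecond units incorrectly returning nanosecond-unit bins (GH 56101)
Bug in cut() incorrectly allowing cutting of timezone-aware datetimes with timezone-naive bins (GH 54964)
Bug in infer_freq() and DatetimeIndex.inferred_freq() with weekly frequencies and non-nanosecond resolutions (GH 55609)
Bug in DataFrame.apply() where passing raw=True ignored args passed to the applied function (GH 55009)
Bug in DataFrame.from_dict() which would always sort the rows of the created DataFrame (GH 55683)
Bug in DataFrame.sort_index() when passing axis="columns" and ignore_index=True raising a ValueError (GH 56478)
Bug in rendering inf values inside a DataFrame with the use_inf_as_na option enabled (GH 55483)
Bug in rendering a Series with a MultiIndex when one of the index levels' names is 0 not having that name displayed (GH 55415)
Bug in the error message when assigning an empty DataFrame to a column (GH 55956)
Bug when time-like strings were being cast to ArrowDtype with pyarrow.time64 type (GH 56463)
Fixed a spurious deprecation warning from numba >= 0.58.0 when passing a numpy ufunc in core.window.Rolling.apply with engine="numba" (GH 55247)

Contributors#

A total of 162 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.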

AG
Aaron Rahman +
Abdullah Ihsan Secer +
Abhijit Deo +
Adrian D’Alessandro
Ahmad Mustafa Anis +
Amanda Bizzinotto
Amith KK +
Aniket Patil +
Antonio Fonseca +
Artur Barseghyan
Ben Greiner
Bill Blum +
Boyd Kane
Damian Kula
Dan King +
Daniel Weindl +
Daniele Nicolodi
David Poznik
David Toneian +
Dea María Léon
Deepak George +
Dmitriy +
Dominique Garmier +
Donald Thevalingam +
Doug Davis +
Dukastlik +
Elahe Sharifi +
Eric Han +
Fangchen Li
Francisco Alfaro +
Gadea Autric +
Guillaume Lemaitre
Hadi Abdi Khojasteh
Hedeer El Showk +
Huanghz2001 +
Isaac Virshup
Issam +
Itay Azolay +
Itayazolay +
Jaca +
Jack McIvor +
JackCollins91 +
James Spencer +
Jay
Jessica Greene
Jirka Borovec +
JohannaTrost +
John C +
Joris Van den Bossche
José Lucas Mayer +
José Lucas Silva Mayer +
João Andrade +
Kai Mühlbauer
Katharina Tielking, MD +
Kazuto Haruguchi +
Kevin
Lawrence Mitchell
Linus +
Linus Sommer +
Louis-Émile Robitaille +
Luke Manley
Lumberbot (aka Jack)
Maggie Liu +
MainHanzo +
Marc Garcia
Marco Edward Gorelli
MarcoGorelli
Martin Šícho +
Mateusz Sokół
Matheus Felipe +
Matthew Roeschke
Matthias Bussonnier
Maxwell Bileschi +
Michael Tiemann
Michał Górny
Molly Bowers +
Moritz Schubert +
NNLNR +
Natalia Mokeeva
Nils Müller-Wendt +
Omar Elbaz
Pandas Development Team
Paras Gupta +
Parthi
Patrick Hoefler
Paul Pellissier +
Paul Uhlenbruck +
Philip Meier
Philippe THOMY +
Quang Nguyễn
Raghav
Rajat Subhra Mukherjee
Ralf Gommers
Randolf Scholz +
Richard Shadrach
Rob +
Rohan Jain +
Ryan Gibson +
Sai-Suraj-27 +
Samuel Oranyeli +
Sara Bonati +
Sebastian Berg
Sergey Zakharov +
Shyamala Venkatakrishnan +
StEmGeo +
Stefanie Molin
Stijn de Gooijer +
Thiago Gariani +
Thomas A Caswell
Thomas Baumann +
Thomas Guillet +
Thomas Lazarus +
Thomas Li
Tim Hoffmann
Tim Swast
Tom Augspurger
Toro +
Torsten Wörtwein
Ville Aikas +
Vinita Parasrampuria +
Vyas Ramasubramani +
William Andrea
William Ayd
Willian Wang +
Xiao Yuan
Yao Xiao
Yves Delley
Zemux1613 +
Ziad Kermadi +
aaron-robeson-8451 +
aram-cinnamon +
caneff +
ccccjone +
chris-caballero +
cobalt
color455nm +
denisrei +
dependabot[bot]
jbrockmendel
jfadia +
johanna.trost +
kgmuzungu +
mecopur +
mhb143 +
morotti +
mvirts +
omar-elbaz
paulreece
pre-commit-ci[bot]
raj-thapa
rebecca-palmer
rmhowe425
rohanjain101
shiersansi +
smij720
srkds +
taytzehao
torext
vboxuser +
xzmeng +
yashb +