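What's new in 1.4.0 (January 22, 2022)

These are the changes in pandas 1.4.0. See Release notes for a full changelog including other versions of pandas.

Enhancements

Improved warning messages

Previously, warning messages may have pointed to lines within the pandas library. Consider running the script setting_with_copy_warning.py: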

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
df[:2].loc[:, 'a'] = 5

With pandas 1.3, this resulted in:

.../site-packages/pandas/core/indexing.py:1951: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.

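This made it difficult to determine where the warning was being generated from. Now pandas inspects the call stack and reports the first line outside of the pandas library that gave rise to the warning. The output of the above script is now: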

setting_with_copy_warning.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
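
The call-stack inspection described above can be sketched in plain Python. This is only an illustration of the idea, not pandas' actual implementation: the function names and the LIBRARY_FUNCTIONS set are hypothetical, and "library" frames are identified by function name so the sketch fits in one file (a real package would more likely compare file paths).

```python
import inspect
import warnings

# Hypothetical stand-in for "frames that belong to the library".
# Function names are used here only to keep the sketch in a single file.
LIBRARY_FUNCTIONS = {"_stack_level_outside_library", "_set_value", "library_entry_point"}


def _stack_level_outside_library() -> int:
    """Count frames from this one up to the first frame outside the 'library'."""
    frame = inspect.currentframe()
    level = 0
    while frame is not None and frame.f_code.co_name in LIBRARY_FUNCTIONS:
        frame = frame.f_back
        level += 1
    return level


def _set_value() -> None:
    # Deep inside the "library": attribute the warning to the user's code.
    warnings.warn(
        "A value is trying to be set on a copy",
        UserWarning,
        stacklevel=_stack_level_outside_library(),
    )


def library_entry_point() -> None:
    _set_value()


def user_code() -> None:
    library_entry_point()  # the warning is reported against this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()

print(caught[0].filename, caught[0].lineno)
```

The recorded warning carries the line number of the call inside user_code rather than the line of warnings.warn itself.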

Index can hold arbitrary ExtensionArrays

Until now, passing a custom ExtensionArray to pd.Index would cast the array to object dtype. Now Index can directly hold arbitrary ExtensionArrays (GH 43930).

Previous behavior:

In [1]: arr = pd.array([1, 2, pd.NA])

In [2]: idx = pd.Index(arr)

In the old behavior, idx would be object-dtype:

Previous behavior:

In [1]: idx
Out[1]: Index([1, 2, <NA>], dtype='object')

With the new behavior, the original dtype is kept:

New behavior:

In [3]: idx
Out[3]: Index([1, 2, <NA>], dtype='Int64')

One exception to this is SparseArray, which will continue to cast to numpy dtype until pandas 2.0. At that point it will retain its dtype like other ExtensionArrays.

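Styler

Styler has been further developed in 1.4.0. The following general enhancements have been made:

Styling and formatting of indexes has been added, with Styler.apply_index(), Styler.applymap_index() and Styler.format_index(). These mirror the signatures of the methods already used to style and format data values, and work with HTML, LaTeX and Excel format (GH 41893, GH 43101, GH 41993, GH 41995)

The new method Styler.hide() deprecates Styler.hide_index() and Styler.hide_columns() (GH 43758)

The keyword arguments level and names have been added to Styler.hide() (and implicitly to the deprecated methods Styler.hide_index() and Styler.hide_columns()) for additional control over the visibility of MultiIndexes and of Index names (GH 25475, GH 43404, GH 43346)

Styler.export() and Styler.use() have been updated to address all of the functionality added in v1.2.0 and v1.3.0 (GH 40675)

Global options under the category pd.options.styler have been extended to configure default Styler properties which address formatting, encoding, and HTML and LaTeX rendering. Note that formerly Styler relied on display.html.use_mathjax, which has now been replaced by styler.html.mathjax (GH 41395)

Validation of certain keyword arguments, e.g. caption (GH 43368)

Various bug fixes as recorded below

Additionally there are specific enhancements to the HTML specific rendering:

Styler.bar() introduces additional arguments to control alignment and display (GH 26070, GH 36419), and it also validates the input arguments width and height (GH 42511)

Styler.to_html() introduces the keyword arguments sparse_index, sparse_columns, bold_headers, caption, max_rows and max_columns (GH 41946, GH 43149, GH 42972)

Styler.to_html() omits CSSStyle rules for hidden table elements as a performance enhancement (GH 43619)

Custom CSS classes can now be specified directly without string replacement (GH 43686)

Hyperlinks can be rendered automatically via a new hyperlinks formatting keyword argument (GH 45058)

There are also some LaTeX specific enhancements:

Styler.to_latex() introduces the keyword argument environment, which also allows a specific "longtable" entry through a separate jinja2 template (GH 41866)

Naive sparsification is now possible for LaTeX without the necessity of including the multirow package (GH 43369)

cline support has been added for MultiIndex row sparsification through a keyword argument (GH 45138)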

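Multi-threaded CSV reading with a new CSV engine based on pyarrow

pandas.read_csv() now accepts engine="pyarrow" (requires at least pyarrow 1.0.1) as an argument, allowing for faster CSV parsing on multicore machines with pyarrow installed. See the I/O docs for more info. (GH 23697, GH 43706)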
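
A minimal sketch of selecting the new engine follows. The inline CSV data is made up for illustration, and the fallback to the default C engine when pyarrow cannot be imported is an assumption of this example, not something the release notes prescribe.

```python
import io

import pandas as pd

csv_bytes = b"a,b\n1,2\n3,4\n"

try:
    # Multi-threaded parser; needs the optional pyarrow dependency.
    df = pd.read_csv(io.BytesIO(csv_bytes), engine="pyarrow")
except ImportError:
    # pyarrow is missing (or too old for this pandas): use the default C engine.
    df = pd.read_csv(io.BytesIO(csv_bytes), engine="c")

print(df)
```

The speed-up is only likely to be noticeable on large files; for a tiny input like this one the engines behave the same from the caller's point of view.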

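Rank function for rolling and expanding windows

Added a rank function to Rolling and Expanding. The new function supports the method, ascending, and pct flags of DataFrame.rank(). The method argument supports the min, max, and average ranking methods. Example: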

In [4]: s = pd.Series([1, 4, 2, 3, 5, 3])

In [5]: s.rolling(3).rank()
Out[5]: 
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    1.5
dtype: float64

In [6]: s.rolling(3).rank(method="max")
Out[6]: 
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    2.0
dtype: float64
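
The examples above only use rolling windows. As a short sketch under the same assumptions, the expanding variant and the pct flag can be exercised like this (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series([1, 4, 2, 3, 5, 3])

# Rank of each value among all values seen so far (default method: average).
expanding_avg = s.expanding().rank()

# Ties receive the lowest rank of the tied group.
expanding_min = s.expanding().rank(method="min")

# Ranks expressed as a fraction of the window instead of an absolute position.
expanding_pct = s.expanding().rank(pct=True)

print(expanding_avg)
print(expanding_min)
print(expanding_pct)
```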

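Groupby positional indexing

It is now possible to specify positional ranges relative to the ends of each group.

Negative arguments for DataFrameGroupBy.head(), SeriesGroupBy.head(), DataFrameGroupBy.tail() and SeriesGroupBy.tail() now work correctly and result in ranges relative to the end and start of each group, respectively. Previously, negative arguments returned empty frames.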

In [7]: df = pd.DataFrame([["g", "g0"], ["g", "g1"], ["g", "g2"], ["g", "g3"],
   ...:                    ["h", "h0"], ["h", "h1"]], columns=["A", "B"])
   ...: 

In [8]: df.groupby("A").head(-1)
Out[8]: 
   A   B
0  g  g0
1  g  g1
2  g  g2
4  h  h0
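
For completeness, here is a sketch of the tail counterpart using the same frame: a negative argument to tail drops rows from the start of each group.

```python
import pandas as pd

df = pd.DataFrame(
    [["g", "g0"], ["g", "g1"], ["g", "g2"], ["g", "g3"], ["h", "h0"], ["h", "h1"]],
    columns=["A", "B"],
)

# Everything except the first row of each group.
all_but_first = df.groupby("A").tail(-1)
print(all_but_first)
```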

DataFrameGroupBy.nth() and SeriesGroupBy.nth() now accept a slice or a list of integers and slices.

In [9]: df.groupby("A").nth(slice(1, -1))
Out[9]: 
   A   B
1  g  g1
2  g  g2

In [10]: df.groupby("A").nth([slice(None, 1), slice(-1, None)])
Out[10]: 
   A   B
0  g  g0
3  g  g3
4  h  h0
5  h  h1

DataFrameGroupBy.nth() and SeriesGroupBy.nth() now accept index notation.

In [11]: df.groupby("A").nth[1, -1]
Out[11]: 
   A   B
1  g  g1
3  g  g3
5  h  h1

In [12]: df.groupby("A").nth[1:-1]
Out[12]: 
   A   B
1  g  g1
2  g  g2

In [13]: df.groupby("A").nth[:1, -1:]
Out[13]: 
   A   B
0  g  g0
3  g  g3
4  h  h0
5  h  h1

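DataFrame.from_dict and DataFrame.to_dict have a new 'tight' option

A new 'tight' dictionary format that preserves MultiIndex entries and names is now available with the DataFrame.from_dict() and DataFrame.to_dict() methods. It can be used with the standard json library to produce a tight representation of DataFrame objects (GH 4889).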

In [14]: df = pd.DataFrame.from_records(
   ....:     [[1, 3], [2, 4]],
   ....:     index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")],
   ....:                                     names=["n1", "n2"]),
   ....:     columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)],
   ....:                                       names=["z1", "z2"]),
   ....: )
   ....: 

In [15]: df
Out[15]: 
z1     x  y
z2     1  2
n1 n2      
a  b   1  3
   c   2  4

In [16]: df.to_dict(orient='tight')
Out[16]: 
{'index': [('a', 'b'), ('a', 'c')],
 'columns': [('x', 1), ('y', 2)],
 'data': [[1, 3], [2, 4]],
 'index_names': ['n1', 'n2'],
 'column_names': ['z1', 'z2']}
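
As a sketch of the reverse direction, the 'tight' dictionary can be passed back to DataFrame.from_dict() to rebuild the frame, including both MultiIndexes and their names. The frame mirrors the one above but is built with the plain constructor; variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3], [2, 4]],
    index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")], names=["n1", "n2"]),
    columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)], names=["z1", "z2"]),
)

# Serialize to the 'tight' layout, then rebuild the frame from it.
tight = df.to_dict(orient="tight")
restored = pd.DataFrame.from_dict(tight, orient="tight")

print(restored)
```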

Other enhancements

concat() will preserve the attrs when they are the same for all objects and discard the attrs when they differ (GH 41828)
DataFrameGroupBy operations with as_index=False now correctly retain ExtensionDtype dtypes for the columns being grouped on (GH 41373)
Added support for assigning values to the by argument in DataFrame.plot.hist() and DataFrame.plot.box() (GH 15079)
Series.sample(), DataFrame.sample(), DataFrameGroupBy.sample(), and SeriesGroupBy.sample() now accept a np.random.Generator as input to random_state. A generator will be more performant with replace=False (GH 38100)
Series.ewm() and DataFrame.ewm() now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See the Window Overview for performance and functional benefits (GH 42273)
DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax(), and SeriesGroupBy.cummax() now support the argument skipna (GH 34047)
read_table() now supports the argument storage_options (GH 39167)
DataFrame.to_stata() and StataWriter() now accept the keyword-only argument value_labels to save labels for non-categorical columns (GH 38454)
Methods that relied on hashmap-based algorithms, such as DataFrameGroupBy.value_counts(), DataFrameGroupBy.count() and factorize(), ignored the imaginary component of complex numbers (GH 17927)
Added Series.str.removeprefix() and Series.str.removesuffix(), introduced in Python 3.9, to remove a prefix/suffix from string-type Series (GH 36944)
Attempting to write to a file in a missing parent directory with DataFrame.to_csv(), DataFrame.to_html(), DataFrame.to_excel(), DataFrame.to_feather(), DataFrame.to_parquet(), DataFrame.to_stata(), DataFrame.to_json(), DataFrame.to_pickle(), and DataFrame.to_xml() now explicitly mentions the missing parent directory; the same is true for the Series counterparts (GH 24306)
Indexing with .loc and .iloc now supports Ellipsis (GH 37750)
IntegerArray.all(), IntegerArray.any(), FloatingArray.any(), and FloatingArray.all() use Kleene logic (GH 41967)
Added support for nullable boolean and integer types in DataFrame.to_stata(), StataWriter, StataWriter117, and StataWriterUTF8 (GH 40855)
DataFrame.__pos__() and DataFrame.__neg__() now retain ExtensionDtype dtypes (GH 43883)
The error raised when an optional dependency can't be imported now includes the original exception, for easier investigation (GH 43882)
Added ExponentialMovingWindow.sum() (GH 13297)
Series.str.split() now supports a regex argument that explicitly specifies whether the pattern is a regular expression. The default is None (GH 43563, GH 32835, GH 25549)
DataFrame.dropna() now accepts a single label as subset, along with array-like (GH 41021)
Added DataFrameGroupBy.value_counts() (GH 43564)
read_csv() now accepts a callable in on_bad_lines when engine="python" for custom handling of bad lines (GH 5686)
Added the option if_sheet_exists="overlay" to the ExcelWriter argument if_sheet_exists (GH 40231)
read_excel() now accepts a decimal argument that allows the user to specify the decimal point when parsing string columns to numeric (GH 14403)
DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.std(), SeriesGroupBy.std(), DataFrameGroupBy.var(), SeriesGroupBy.var(), DataFrameGroupBy.sum(), and SeriesGroupBy.sum() now support Numba execution using the engine keyword (GH 43731, GH 44862, GH 44939)
Timestamp.isoformat() now handles the timespec argument from the base datetime class (GH 26131)
The dtype argument of NaT.to_numpy() is now respected, so np.timedelta64 can be returned (GH 44460)
The new option display.max_dir_items customizes the number of columns added to DataFrame.__dir__() and suggested for tab completion (GH 37996)
Added "Juneteenth National Independence Day" to USFederalHolidayCalendar (GH 44574)
Rolling.var(), Expanding.var(), Rolling.std(), and Expanding.std() now support Numba execution using the engine keyword (GH 44461)
Series.info() has been added, for compatibility with DataFrame.info() (GH 5167)
Implemented IntervalArray.min() and IntervalArray.max(), as a result of which min and max now work for IntervalIndex, Series and DataFrame with IntervalDtype (GH 44746)
UInt64Index.map() now retains the dtype where possible (GH 44609)
read_json() can now parse unsigned long long integers (GH 26068)
DataFrame.take() now raises a TypeError when passed a scalar for the indexer (GH 42875)
is_list_like() now identifies duck-arrays as list-like unless .ndim == 0 (GH 35131)
ExtensionDtype and ExtensionArray are now (de)serialized when exporting a DataFrame with DataFrame.to_json() using orient='table' (GH 20612, GH 44705)
Added support for Zstandard compression to DataFrame.to_pickle()/read_pickle() and friends (GH 43925)
DataFrame.to_sql() now returns an int of the number of written rows (GH 23998)
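
A few of the enhancements listed above can be tried in a short sketch. The sample data and variable names are invented for illustration, and the choice to truncate an over-long CSV row in the on_bad_lines callable is just one possible policy.

```python
import io

import pandas as pd

# Series.str.removeprefix(): strings without the prefix are left unchanged.
labels = pd.Series(["str_foo", "str_bar", "no_prefix"])
stripped = labels.str.removeprefix("str_")

# DataFrame.dropna() with a single label passed as subset.
frame = pd.DataFrame({"a": [1.0, None, 3.0], "b": [None, 5.0, 6.0]})
rows_with_a = frame.dropna(subset="a")

# read_csv() with a callable for on_bad_lines (python engine only).
# The callable receives the split fields of the bad row and returns the
# fields to keep; here the surplus trailing field is simply discarded.
raw = "a,b\n1,2\n3,4,5\n6,7\n"
parsed = pd.read_csv(
    io.StringIO(raw),
    engine="python",
    on_bad_lines=lambda fields: fields[:2],
)

print(stripped.tolist())
print(rows_with_a)
print(parsed)
```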

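Notable bug fixes

These are bug fixes that might have notable behavior changes.

Inconsistent date string parsing

The dayfirst option of to_datetime() isn't strict, and this can lead to surprising behavior: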

In [17]: pd.to_datetime(["31-12-2021"], dayfirst=False)
Out[17]: DatetimeIndex(['2021-12-31'], dtype='datetime64[s]', freq=None)

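Now, a warning will be raised if a date string cannot be parsed in accordance with the given dayfirst value when the value is a delimited date string (e.g. 31-12-2012).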
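
One way to avoid the ambiguity altogether is to state the intended order explicitly, either through dayfirst=True or through an explicit format string. The sketch below shows both; it is a suggestion, not part of the change itself.

```python
import pandas as pd

# State that the day comes first ...
day_first = pd.to_datetime(["31-12-2021"], dayfirst=True)

# ... or spell out the exact layout of the string.
explicit = pd.to_datetime(["31-12-2021"], format="%d-%m-%Y")

print(day_first[0], explicit[0])
```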

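Ignoring dtypes in concat with empty or all-NA columns

Note

This behaviour change has been reverted in pandas 1.4.3.

When using concat() to concatenate two or more DataFrame objects, if one of the DataFrames was empty or had all-NA values, its dtype was sometimes ignored when finding the concatenated dtype. These are now consistently not ignored (GH 43507).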

In [3]: df1 = pd.DataFrame({"bar": [pd.Timestamp("2013-01-01")]}, index=range(1))
In [4]: df2 = pd.DataFrame({"bar": np.nan}, index=range(1, 2))
In [5]: res = pd.concat([df1, df2])

Previously, the float-dtype in df2 would be ignored, so the result dtype would be datetime64[ns]. As a result, the np.nan would be cast to NaT.

Previous behavior:

In [6]: res
Out[6]:
         bar
0 2013-01-01
1        NaT

Now the float-dtype is respected. Since the common dtype for these DataFrames is object, the np.nan is retained.

New behavior:

In [6]: res
Out[6]:
                   bar
0  2013-01-01 00:00:00
1                  NaN

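Null-values are no longer coerced to NaN-value in value_counts and mode

Series.value_counts() and Series.mode() no longer coerce None, NaT and other null-values to a NaN-value for np.object_-dtype. This behavior is now consistent with unique, isin and others (GH 42688).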

In [18]: s = pd.Series([True, None, pd.NaT, None, pd.NaT, None])

In [19]: res = s.value_counts(dropna=False)

Previously, all null-values were replaced by a NaN-value.

Previous behavior:

In [3]: res
Out[3]:
NaN     5
True    1
dtype: int64

Now null-values are no longer mangled.

New behavior:

In [20]: res
Out[20]: 
None    3
NaT     2
True    1
Name: count, dtype: int64

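mangle_dupe_cols in read_csv no longer renames unique columns conflicting with target names

read_csv() no longer renames unique column labels which conflict with the target names of duplicated columns. Already existing columns are skipped, i.e. the next available index is used for the target column name (GH 14704).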

In [21]: import io

In [22]: data = "a,a,a.1\n1,2,3"

In [23]: res = pd.read_csv(io.StringIO(data))

Previously, the second column was called a.1, while the third column was also renamed, to a.1.1.

Previous behavior:

In [3]: res
Out[3]:
    a  a.1  a.1.1
0   1    2      3

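Now, when changing the name of the second column, the renaming checks whether a.1 already exists and skips this index. The second column is instead renamed to a.2.

New behavior: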

In [24]: res
Out[24]: 
   a  a.2  a.1
0  1    2    3

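unstack and pivot_table no longer raise ValueError for results that would exceed the int32 limit

Previously DataFrame.pivot_table() and DataFrame.unstack() would raise a ValueError if the operation could produce a result with more than 2**31 - 1 elements. This operation now raises an errors.PerformanceWarning instead (GH 26314).

Previous behavior: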

In [3]: df = pd.DataFrame({"ind1": np.arange(2 ** 16), "ind2": np.arange(2 ** 16), "count": 0})
In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
ValueError: Unstacked DataFrame is too big, causing int32 overflow

New behavior:

In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
PerformanceWarning: The following operation may generate 4294967296 cells in the resulting pandas object.

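groupby.apply consistent transform detection

DataFrameGroupBy.apply() and SeriesGroupBy.apply() are designed to be flexible, allowing users to perform aggregations, transformations and filters, and to use them with user-defined functions that might not fall into any of these categories. As part of this, apply will attempt to detect when an operation is a transform, and in such a case, the result will have the same index as the input. In order to determine if the operation is a transform, pandas compares the input's index to the result's and determines if it has been mutated. Previously, in pandas 1.3, different code paths used different definitions of "mutated": some would use Python's is, whereas others would test only up to equality.

This inconsistency has been removed; pandas now tests up to equality.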

In [25]: def func(x):
   ....:     return x.copy()
   ....: 

In [26]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

In [27]: df
Out[27]: 
   a  b  c
0  1  3  5
1  2  4  6

Previous behavior:

In [3]: df.groupby(['a']).apply(func)
Out[3]:
     a  b  c
a
1 0  1  3  5
2 1  2  4  6

In [4]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[4]:
     c
a b
1 3  5
2 4  6

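In the examples above, the first uses a code path where pandas uses is and determines that func is not a transform, whereas the second tests up to equality and determines that func is a transform. In the first case, the result's index is not the same as the input's.

New behavior: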

In [5]: df.groupby(['a']).apply(func)
Out[5]:
   a  b  c
0  1  3  5
1  2  4  6

In [6]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[6]:
     c
a b
1 3  5
2 4  6

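Now in both cases it is determined that func is a transform. In each case, the result has the same index as the input.

Backwards incompatible API changes

Increased minimum version for Python

pandas 1.4.0 supports Python 3.8 and higher.

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require: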

Package	Minimum Version	Required	Changed
numpy	1.18.5	X	X
pytz	2020.1	X	X
python-dateutil	2.8.1	X	X
bottleneck	1.3.1		X
numexpr	2.7.1		X
pytest (dev)	6.0
mypy (dev)	0.930		X

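For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed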
beautifulsoup4	4.8.2	X
fastparquet	0.4.0
fsspec	0.7.4
gcsfs	0.6.0
lxml	4.5.0	X
matplotlib	3.3.2	X
numba	0.50.1	X
openpyxl	3.0.3	X
pandas-gbq	0.14.0	X
pyarrow	1.0.1	X
pymysql	0.10.1	X
pytables	3.6.1	X
s3fs	0.4.0
scipy	1.4.1	X
sqlalchemy	1.4.0	X
tabulate	0.8.7
xarray	0.15.1	X
xlrd	2.0.1	X
xlsxwriter	1.2.2	X
xlwt	1.3.0

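See Dependencies and Optional dependencies for more.

Other API changes

Index.get_indexer_for() no longer accepts keyword arguments (other than target); in the past these would be silently ignored if the index was not unique (GH 42310)
Changed the position of the min_rows argument in DataFrame.to_string() due to a change in the docstring (GH 44304)
Reduction operations for DataFrame or Series now raise a ValueError when None is passed for skipna (GH 44178)
read_csv() and read_html() no longer raise an error when one of the header rows consists only of Unnamed: columns (GH 13054)
Changed the name attribute of several holidays in USFederalHolidayCalendar to match official federal holiday names, specifically:
- "New Year's Day" gains the possessive apostrophe
- "Presidents Day" becomes "Washington's Birthday"
- "Martin Luther King Jr. Day" is now "Birthday of Martin Luther King, Jr."
- "July 4th" is now "Independence Day"
- "Thanksgiving" is now "Thanksgiving Day"
- "Christmas" is now "Christmas Day"
- Added "Juneteenth National Independence Day"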
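
The renamed holidays can be inspected directly from the calendar. The sketch below lists the 2022 holidays with their names; the date range is chosen arbitrarily for illustration.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

calendar = USFederalHolidayCalendar()

# return_name=True yields a Series of holiday names indexed by observed date.
holidays_2022 = calendar.holidays(
    start="2022-01-01", end="2022-12-31", return_name=True
)
names = set(holidays_2022.tolist())

print(holidays_2022)
```

Code that looks holidays up by name (for example by the old "July 4th" label) needs to be updated to the new names.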

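Deprecations

Deprecated Int64Index, UInt64Index & Float64Index

Int64Index, UInt64Index and Float64Index have been deprecated in favor of the base Index class and will be removed in pandas 2.0 (GH 43028).

For constructing a numeric index, you can use the base Index class instead, specifying the data type (which will also work on older pandas releases):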

# replace
pd.Int64Index([1, 2, 3])
# with
pd.Index([1, 2, 3], dtype="int64")

For checking the data type of an index object, you can replace isinstance checks with checking the dtype:

# replace
isinstance(idx, pd.Int64Index)
# with
idx.dtype == "int64"

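Currently, in order to maintain backward compatibility, calls to Index will continue to return Int64Index, UInt64Index and Float64Index when given numeric data, but in the future, an Index will be returned.

Current behavior: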

In [1]: pd.Index([1, 2, 3], dtype="int32")
Out [1]: Int64Index([1, 2, 3], dtype='int64')
In [2]: pd.Index([1, 2, 3], dtype="uint64")
Out [2]: UInt64Index([1, 2, 3], dtype='uint64')

Future behavior:

In [3]: pd.Index([1, 2, 3], dtype="int32")
Out [3]: Index([1, 2, 3], dtype='int32')
In [4]: pd.Index([1, 2, 3], dtype="uint64")
Out [4]: Index([1, 2, 3], dtype='uint64')

Deprecated DataFrame.append and Series.append

DataFrame.append() and Series.append() have been deprecated and will be removed in a future version. Use pandas.concat() instead (GH 35407).

Deprecated syntax

In [1]: pd.Series([1, 2]).append(pd.Series([3, 4]))
Out [1]:
<stdin>:1: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
0    1
1    2
0    3
1    4
dtype: int64

In [2]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
In [3]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
In [4]: df1.append(df2)
Out [4]:
<stdin>:1: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

Recommended syntax

In [28]: pd.concat([pd.Series([1, 2]), pd.Series([3, 4])])
Out[28]: 
0    1
1    2
0    3
1    4
dtype: int64

In [29]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))

In [30]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

In [31]: pd.concat([df1, df2])
Out[31]: 
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

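Other Deprecations

Deprecated Index.is_type_compatible() (GH 42113)
Deprecated the method argument in Index.get_loc(); use index.get_indexer([label], method=...) instead (GH 42269)
Deprecated treating integer keys in Series.__setitem__() as positional when the index is a Float64Index not containing the key, an IntervalIndex with no entries containing the key, or a MultiIndex with a leading Float64Index level not containing the key (GH 33469)
Deprecated treating numpy.datetime64 objects as UTC times when passed to the Timestamp constructor along with a timezone. In a future version, these will be treated as wall-times. To retain the old behavior, use Timestamp(dt64).tz_localize("UTC").tz_convert(tz) (GH 24559)
Deprecated ignoring missing labels when indexing with a sequence of labels on a level of a MultiIndex (GH 42351)
Creating an empty Series without a dtype will now raise a more visible FutureWarning instead of a DeprecationWarning (GH 30017)
Deprecated the kind argument in Index.get_slice_bound(), Index.slice_indexer(), and Index.slice_locs(); in a future version passing kind will raise (GH 42857)
Deprecated dropping of nuisance columns in Rolling, Expanding, and EWM aggregations (GH 42738)
Deprecated Index.reindex() with a non-unique Index (GH 42568)
Deprecated Styler.render() in favor of Styler.to_html() (GH 42140)
Deprecated Styler.hide_index() and Styler.hide_columns() in favor of Styler.hide() (GH 43758)
Deprecated passing a string column label into times in DataFrame.ewm() (GH 43265)
Deprecated the include_start and include_end arguments in DataFrame.between_time(); in a future version passing include_start or include_end will raise (GH 40245)
Deprecated the squeeze argument to read_csv(), read_table(), and read_excel(). Users should squeeze the DataFrame afterwards with .squeeze("columns") instead (GH 43242)
Deprecated the index argument to SparseArray construction (GH 23089)
Deprecated the closed argument in date_range() and bdate_range() in favor of the inclusive argument; in a future version passing closed will raise (GH 40245)
Deprecated Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate() (GH 43665)
Deprecated silent dropping of columns that raised a TypeError in Series.transform and DataFrame.transform when used with a dictionary (GH 43740)
Deprecated silent dropping of columns that raised a TypeError, DataError, and some cases of ValueError in Series.aggregate(), DataFrame.aggregate(), Series.groupby.aggregate(), and DataFrame.groupby.aggregate() when used with a list (GH 43740)
Deprecated the casting behavior when setting timezone-aware value(s) into a timezone-aware Series or DataFrame column when the timezones do not match. Previously this cast to object dtype. In a future version, the values being inserted will be converted to the series or column's existing timezone (GH 37605)
Deprecated the casting behavior when passing an item with a mismatched timezone to DatetimeIndex.insert(), DatetimeIndex.putmask(), DatetimeIndex.where(), DatetimeIndex.fillna(), Series.mask(), Series.where(), Series.fillna(), Series.shift(), Series.replace(), Series.reindex() (and DataFrame column analogues). In the past this has cast to object dtype. In a future version, these will cast the passed item to the index or series's timezone (GH 37605, GH 44940)
Deprecated the prefix keyword argument in read_csv() and read_table(); in a future version the argument will be removed (GH 43396)
Deprecated passing a non-boolean argument to sort in concat() (GH 41518)
Deprecated passing arguments as positional for read_fwf() other than filepath_or_buffer (GH 41485)
Deprecated passing arguments as positional for read_xml() other than path_or_buffer (GH 45133)
Deprecated passing skipna=None for DataFrame.mad() and Series.mad(); pass skipna=True instead (GH 44580)
Deprecated the behavior of to_datetime() with the string "now" with utc=False; in a future version this will match Timestamp("now"), which in turn matches Timestamp.now() returning the local time (GH 18705)
Deprecated DateOffset.apply(); use offset + other instead (GH 44522)
Deprecated the parameter names in Index.copy() (GH 44916)
A deprecation warning is now shown for DataFrame.to_latex() indicating that the arguments signature may change and more closely emulate the arguments of Styler.to_latex() in future versions (GH 44411)
Deprecated the behavior of concat() between objects with bool-dtype and numeric-dtypes; in a future version these will cast to object dtype instead of coercing bools to numeric values (GH 39817)
Deprecated Categorical.replace(); use Series.replace() instead (GH 44929)
Deprecated passing a set or dict as indexer for DataFrame.loc.__setitem__(), DataFrame.loc.__getitem__(), Series.loc.__setitem__(), Series.loc.__getitem__(), DataFrame.__getitem__(), Series.__getitem__() and Series.__setitem__() (GH 42825)
Deprecated Index.__getitem__() with a bool key; use index.values[key] to get the old behavior (GH 44051)
Deprecated downcasting column-by-column in DataFrame.where() with integer-dtypes (GH 44597)
Deprecated DatetimeIndex.union_many(); use DatetimeIndex.union() instead (GH 44091)
Deprecated Groupby.pad() in favor of Groupby.ffill() (GH 33396)
Deprecated Groupby.backfill() in favor of Groupby.bfill() (GH 33396)
Deprecated Resample.pad() in favor of Resample.ffill() (GH 33396)
Deprecated Resample.backfill() in favor of Resample.bfill() (GH 33396)
Deprecated numeric_only=None in DataFrame.rank(); in a future version numeric_only must be either True or False (the default) (GH 45036)
Deprecated the behavior of Timestamp.utcfromtimestamp(); in the future it will return a timezone-aware UTC Timestamp (GH 22451)
Deprecated NaT.freq() (GH 45071)
Deprecated the behavior of Series and DataFrame construction when passed float-dtype data containing NaN and an integer dtype, ignoring the dtype argument; in a future version this will raise (GH 40110)
Deprecated the behaviour of Series.to_frame() and Index.to_frame() to ignore the name argument when name=None. Currently, this means the existing name is preserved, but in the future explicitly passing name=None will set None as the name of the column in the resulting DataFrame (GH 44212)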
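
Several of the replacements named in the list above can be written out as a short migration sketch. The sample values are invented for illustration; each comment names the deprecated spelling that the line replaces.

```python
import io

import pandas as pd

# read_csv(..., squeeze=True)  ->  squeeze the resulting DataFrame afterwards
single_column = pd.read_csv(io.StringIO("x\n1\n2\n")).squeeze("columns")

# date_range(..., closed="left")  ->  inclusive="left"
days = pd.date_range("2022-01-01", "2022-01-03", inclusive="left")

# Index.get_loc(label, method="nearest")  ->  get_indexer([label], method=...)
idx = pd.Index([10, 20, 30])
position = idx.get_indexer([19], method="nearest")[0]

# DateOffset.apply(other)  ->  offset + other
month_later = pd.DateOffset(months=1) + pd.Timestamp("2022-01-31")

print(type(single_column).__name__, list(days), position, month_later)
```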

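Performance improvements

Performance improvement in DataFrameGroupBy.sample() and SeriesGroupBy.sample(), especially when the weights argument is provided (GH 34483)
Performance improvement when converting non-string arrays to string arrays (GH 34483)
Performance improvement in DataFrameGroupBy.transform() and SeriesGroupBy.transform() for user-defined functions (GH 41598)
Performance improvement in constructing DataFrame objects (GH 42631, GH 43142, GH 43147, GH 43307, GH 43144, GH 44826)
Performance improvement in DataFrameGroupBy.shift() and SeriesGroupBy.shift() when the fill_value argument is provided (GH 26615)
Performance improvement in DataFrame.corr() for method=pearson on data without missing values (GH 40956)
Performance improvement in some DataFrameGroupBy.apply() and SeriesGroupBy.apply() operations (GH 42992, GH 43578)
Performance improvement in read_stata() (GH 43059, GH 43227)
Performance improvement in read_sas() (GH 43333)
Performance improvement in to_datetime() with uint dtypes (GH 42606)
Performance improvement in to_datetime() with infer_datetime_format set to True (GH 43901)
Performance improvement in Series.sparse.to_coo() (GH 42880)
Performance improvement in indexing with a UInt64Index (GH 43862)
Performance improvement in indexing with a Float64Index (GH 43705)
Performance improvement in indexing with a non-unique Index (GH 43792)
Performance improvement in indexing with a listlike indexer on a MultiIndex (GH 43370)
Performance improvement in indexing with a MultiIndex indexer on another MultiIndex (GH 43370)
Performance improvement in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() (GH 43469, GH 43725)
Performance improvement in DataFrameGroupBy.count() and SeriesGroupBy.count() (GH 43730, GH 43694)
Performance improvement in DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() (GH 43675, GH 42841)
Performance improvement in DataFrameGroupBy.std() and SeriesGroupBy.std() (GH 43115, GH 43576)
Performance improvement in DataFrameGroupBy.cumsum() and SeriesGroupBy.cumsum() (GH 43309)
SparseArray.min() and SparseArray.max() no longer require converting to a dense array (GH 43526)
Indexing into a SparseArray with a slice with step=1 no longer requires converting to a dense array (GH 43777)
Performance improvement in SparseArray.take() with allow_fill=False (GH 43654)
Performance improvement in Rolling.mean(), Expanding.mean(), Rolling.sum(), Expanding.sum(), Rolling.max(), Expanding.max(), Rolling.min() and Expanding.min() with engine="numba" (GH 43612, GH 44176, GH 45170)
Improved performance of pandas.read_csv() with memory_map=True when the file encoding is UTF-8 (GH 43787)
Performance improvement in RangeIndex.sort_values() overriding Index.sort_values() (GH 43666)
Performance improvement in RangeIndex.insert() (GH 43988)
Performance improvement in Index.insert() (GH 43953)
Performance improvement in DatetimeIndex.tolist() (GH 43823)
Performance improvement in DatetimeIndex.union() (GH 42353)
Performance improvement in Series.nsmallest() (GH 43696)
Performance improvement in DataFrame.insert() (GH 42998)
Performance improvement in DataFrame.dropna() (GH 43683)
Performance improvement in DataFrame.fillna() (GH 43316)
Performance improvement in DataFrame.values() (GH 43160)
Performance improvement in DataFrame.select_dtypes() (GH 42611)
Performance improvement in DataFrame reductions (GH 43185, GH 43243, GH 43311, GH 43609)
Performance improvement in Series.unstack() and DataFrame.unstack() (GH 43335, GH 43352, GH 42704, GH 43025)
Performance improvement in Series.to_frame() (GH 43558)
Performance improvement in Series.mad() (GH 43010)
Performance improvement in merge() (GH 43332)
Performance improvement in to_csv() when the index column is a datetime and is formatted (GH 39413)
Performance improvement in to_csv() when a MultiIndex contains a lot of unused levels (GH 37484)
Performance improvement in read_csv() when index_col is set to a numeric column (GH 44158)
Performance improvement in concat() (GH 43354)
Performance improvement in SparseArray.__getitem__() (GH 23122)
Performance improvement in constructing a DataFrame from array-like objects such as a Pytorch tensor (GH 44616)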

Bug fixes

Categorical

Bug in setting dtype-incompatible values into a Categorical (or a Series or DataFrame backed by Categorical) raising ValueError instead of TypeError (GH 41919)
Bug in Categorical.searchsorted() when passing a dtype-incompatible value raising KeyError instead of TypeError (GH 41919)
Bug in Categorical.astype() casting datetimes and Timestamp to int for dtype object (GH 44930)
Bug in Series.where() with CategoricalDtype when passing a dtype-incompatible value raising ValueError instead of TypeError (GH 41919)
Bug in Categorical.fillna() when passing a dtype-incompatible value raising ValueError instead of TypeError (GH 41919)
Bug in Categorical.fillna() with a tuple-like category raising ValueError instead of TypeError when filling with a non-category tuple (GH 41919)

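Datetimelike

Bug in the DataFrame constructor unnecessarily copying non-datetimelike 2D object arrays (GH 39272)
Bug in to_datetime() with format and pandas.NA raising ValueError (GH 42957)
to_datetime() would silently swap MM/DD/YYYY and DD/MM/YYYY formats if the given dayfirst option could not be respected; now, a warning is raised in the case of delimited date strings (e.g. 31-12-2012) (GH 12585)
Bug in date_range() and bdate_range() not returning the right bound when start = end and the set is closed on one side (GH 43394)
Bug in inplace addition and subtraction of DatetimeIndex or TimedeltaIndex with DatetimeArray or TimedeltaArray (GH 43904)
Bug in calling np.isnan, np.isfinite, or np.isinf on a timezone-aware DatetimeIndex incorrectly raising TypeError (GH 43917)
Bug in constructing a Series from datetime-like strings with mixed timezones incorrectly partially inferring datetime values (GH 40111)
Bug in addition of a Tick object and a np.timedelta64 object incorrectly raising instead of returning Timedelta (GH 44474)
np.maximum.reduce and np.minimum.reduce now correctly return Timestamp and Timedelta objects when operating on Series, DataFrame, or Index with datetime64[ns] or timedelta64[ns] dtype (GH 43923)
Bug in adding a np.timedelta64 object to a BusinessDay or CustomBusinessDay object incorrectly raising (GH 44532)
Bug in Index.insert() for inserting np.datetime64, np.timedelta64 or tuple into an Index with dtype='object' with a negative loc adding None and replacing the existing value (GH 44509)
Bug in Timestamp.to_pydatetime() failing to retain the fold attribute (GH 45087)
Bug in Series.mode() with DatetimeTZDtype incorrectly returning timezone-naive results and with PeriodDtype incorrectly raising (GH 41927)
Fixed reindex() raising an error when using an incompatible fill value with a datetime-like dtype (or not raising a deprecation warning when using a datetime.date as fill value) (GH 42921)
Bug in DateOffset addition with Timestamp where offset.nanoseconds would not be included in the result (GH 43968, GH 36589)
Bug in Timestamp.fromtimestamp() not supporting the tz argument (GH 45083)
Bug in DataFrame construction from a dict of Series with mismatched index dtypes sometimes raising depending on the ordering of the passed dict (GH 44091)
Bug in Timestamp hashing during some DST transitions causing a segmentation fault (GH 33931 and GH 40817)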

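Timedelta

Bug in division of an all-NaT TimeDeltaIndex, Series or DataFrame column by an object-dtype array-like of numbers failing to infer the result as timedelta64-dtype (GH 39750)
Bug in floor division of timedelta64[ns] data by a scalar returning garbage values (GH 44466)
Bug in Timedelta; it now properly takes into account any nanoseconds contribution of any kwarg (GH 43764, GH 45227)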

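Time Zones

Bug in to_datetime() with infer_datetime_format=True failing to parse a zero UTC offset (Z) correctly (GH 41047)
Bug in Series.dt.tz_convert() resetting the index in a Series with CategoricalIndex (GH 43080)
Bug in Timestamp and DatetimeIndex incorrectly raising a TypeError when subtracting two timezone-aware objects with mismatched timezones (GH 31793)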

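Numeric

Bug in floor-dividing a list or tuple of integers by a Series incorrectly raising (GH 44674)
Bug in DataFrame.rank() raising ValueError with object columns and method="first" (GH 41931)
Bug in DataFrame.rank() treating missing values and extreme values as equal (for example np.nan and np.inf), causing incorrect results when na_option="bottom" or na_option="top" was used (GH 41931)
Bug in the numexpr engine still being used when the option compute.use_numexpr is set to False (GH 32556)
Bug in DataFrame arithmetic ops with a subclass whose _constructor() attribute is a callable other than the subclass itself (GH 43201)
Bug in arithmetic operations involving RangeIndex where the result would have the incorrect name (GH 43962)
Bug in arithmetic operations involving Series where the result could have the incorrect name when the operands have matching NA or matching tuple names (GH 44459)
Bug in division with an IntegerDtype or BooleanDtype array and an NA scalar incorrectly raising (GH 44685)
Bug in multiplying a Series with FloatingDtype by a timedelta-like scalar incorrectly raising (GH 44772)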

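Conversion

Bug in the UInt64Index constructor when passing a list containing both positive integers small enough to cast to int64 and integers too large to hold in int64 (GH 42201)
Bug in the Series constructor returning 0 for missing values with dtype int64 and False for dtype bool (GH 43017, GH 43018)
Bug in constructing a DataFrame from a PandasArray containing Series objects behaving differently than an equivalent np.ndarray (GH 43986)
Bug in IntegerDtype not allowing coercion from string dtype (GH 25472)
Bug in to_datetime() with arg:xr.DataArray and unit="ns" specified raising TypeError (GH 44053)
Bug in DataFrame.convert_dtypes() not returning the correct type when a subclass does not overload _constructor_sliced() (GH 43201)
Bug in DataFrame.astype() not propagating attrs from the original DataFrame (GH 44414)
Bug in DataFrame.convert_dtypes() result losing columns.names (GH 41435)
Bug in constructing an IntegerArray from pyarrow data failing to validate dtypes (GH 44891)
Bug in Series.astype() not allowing conversion from a PeriodDtype to datetime64 dtype, inconsistent with the PeriodIndex behavior (GH 45038)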

Strings

Bug in checking for string[pyarrow] dtype incorrectly raising an ImportError when pyarrow is not installed (GH 44276)

Interval

Bug in Series.where() with IntervalDtype incorrectly raising when the where call should not replace anything (GH 44181)

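Indexing

Bug in Series.rename() with MultiIndex when level is provided (GH 43659)
Bug in DataFrame.truncate() and Series.truncate() when the object's Index has a length greater than one but only one unique value (GH 42365)
Bug in Series.loc() and DataFrame.loc() with a MultiIndex when indexing with a tuple in which one of the levels is also a tuple (GH 27591)
Bug in Series.loc() with a MultiIndex whose first level contains only np.nan values (GH 42055)
Bug in indexing on a Series or DataFrame with a DatetimeIndex when passing a string, where the return type depended on whether the index was monotonic (GH 24892)
Bug in indexing on a MultiIndex failing to drop scalar levels when the indexer is a tuple containing a datetime-like string (GH 42476)
Bug in DataFrame.sort_values() and Series.sort_values() when passing an ascending value, failing to raise or incorrectly raising ValueError (GH 41634)
Bug in updating values of a pandas.Series using a boolean index created with pandas.DataFrame.pop() (GH 42530)
Bug in Index.get_indexer_non_unique() when the index contains multiple np.nan (GH 35392)
Bug in DataFrame.query() not handling the degree sign in a backticked column name, such as `Temp(°C)`, used in an expression to query a DataFrame (GH 42826)
Bug in DataFrame.drop() where the error message did not show missing labels with commas when raising KeyError (GH 42881)
Bug in DataFrame.query() where method calls in query strings led to errors when the numexpr package was installed (GH 22435)
Bug in DataFrame.nlargest() and Series.nlargest() where the sorted result did not count indexes containing np.nan (GH 28984)
Bug in indexing on a non-unique object-dtype Index with an NA scalar (e.g. np.nan) (GH 43711)
Bug in DataFrame.__setitem__() incorrectly writing into an existing column's array rather than setting a new array when the new dtype and the old dtype match (GH 43406)
Bug in setting floating-dtype values into a Series with integer dtype failing to set inplace when those values can be losslessly converted to integers (GH 44316)
Bug in Series.__setitem__() with object dtype when setting an array with matching size and dtype='datetime64[ns]' or dtype='timedelta64[ns]' incorrectly converting the datetimes/timedeltas to integers (GH 43868)
Bug in DataFrame.sort_index() where ignore_index=True was not being respected when the index was already sorted (GH 43591)
Bug in Index.get_indexer_non_unique() when the index contains multiple np.datetime64("NaT") and np.timedelta64("NaT") (GH 43869)
Bug in setting a scalar Interval value into a Series with IntervalDtype when the scalar's sides are floats and the values' sides are integers (GH 44201)
Bug when setting string-backed Categorical values that can be parsed to datetimes into a DatetimeArray, or a Series or DataFrame column backed by DatetimeArray, failing to parse these strings (GH 44236)
Bug in Series.__setitem__() with an integer dtype other than int64 setting with a range object unnecessarily upcasting to int64 (GH 44261)
Bug in Series.__setitem__() with a boolean mask indexer setting a listlike value of length 1 incorrectly broadcasting that value (GH 44265)
Bug in Series.reset_index() not ignoring the name argument when drop and inplace are set to True (GH 44575)
Bug in DataFrame.loc.__setitem__() and DataFrame.iloc.__setitem__() with mixed dtypes sometimes failing to operate in-place (GH 44345)
Bug in DataFrame.loc.__getitem__() incorrectly raising KeyError when selecting a single column with a boolean key (GH 44322)
Bug in setting DataFrame.iloc() with a single ExtensionDtype column and setting 2D values, e.g. df.iloc[:] = df.values, incorrectly raising (GH 44514)
Bug in setting values with DataFrame.iloc() with a single ExtensionDtype column and a tuple of arrays as the indexer (GH 44703)
Bug in indexing on columns with loc or iloc using a slice with a negative step with ExtensionDtype columns incorrectly raising (GH 44551)
Bug in DataFrame.loc.__setitem__() changing dtype when the indexer was completely False (GH 37550)
Bug in IntervalIndex.get_indexer_non_unique() returning a boolean mask instead of an array of integers for a non-unique and non-monotonic index (GH 44084)
Bug in IntervalIndex.get_indexer_non_unique() not handling targets of dtype 'object' with NaNs correctly (GH 44482)
Fixed regression where a single-column np.matrix was no longer coerced to a 1d np.ndarray when added to a DataFrame (GH 42376)
Bug in Series.__getitem__() with a CategoricalIndex of integers treating lists of integers as positional indexers, inconsistent with the behavior with a single scalar integer (GH 15470, GH 14865)
Bug in Series.__setitem__() when setting floats or integers into an integer-dtype Series failing to upcast when necessary to retain precision (GH 45121)
Bug in DataFrame.iloc.__setitem__() ignoring the axis argument (GH 45032)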

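Missing

Bug in DataFrame.fillna() with limit and no method ignoring axis='columns' or axis = 1 (GH 40989, GH 17399)
Bug in DataFrame.fillna() not replacing missing values when using a dict-like value and duplicate column names (GH 43476)
Bug in constructing a DataFrame with a dictionary np.datetime64 as a value and dtype='timedelta64[ns]', or vice versa, incorrectly casting instead of raising (GH 44428)
Bug in Series.interpolate() and DataFrame.interpolate() with inplace=True not writing to the underlying array(s) in-place (GH 44749)
Bug in Index.fillna() incorrectly returning an unfilled Index when NA values are present and the downcast argument is specified. This now raises NotImplementedError instead; do not pass the downcast argument (GH 44873)
Bug in DataFrame.dropna() changing the Index even if no entries were dropped (GH 41965)
Bug in Series.fillna() with an object-dtype incorrectly ignoring downcast="infer" (GH 44241)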

MultiIndex

Bug in MultiIndex.get_loc() where the first level is a DatetimeIndex and a string key is passed (GH 42465)
Bug in MultiIndex.reindex() when passing a level that corresponds to an ExtensionDtype level (GH 42043)
Bug in MultiIndex.get_loc() raising TypeError instead of KeyError on a nested tuple (GH 42440)
Bug in MultiIndex.union() setting the wrong sortorder, causing errors in subsequent indexing operations with slices (GH 44752)
Bug in MultiIndex.putmask() where the other value was also a MultiIndex (GH 43212)
Bug in MultiIndex.dtypes() where duplicate level names returned only one dtype per name (GH 45174)

I/O#

在尝试从 .xlsx 文件中读取图表工作表时，read_excel() 中的错误 (GH 41448)
在 json_normalize() 中的一个错误，当 record_path 的长度大于一时，errors=ignore 可能无法忽略 meta 的缺失值 (GH 41876)
在具有多标题输入和引用列名作为元组的参数的 read_csv() 中存在错误 (GH 42446)
在 read_fwf() 中的错误，其中 colspecs 和 names 的长度差异没有引发 ValueError (GH 40830)
在 Series.to_json() 和 DataFrame.to_json() 中的一个错误，当将纯Python对象序列化为JSON时，某些属性被跳过 (GH 42768, GH 33043)
从 sqlalchemy 的 Row 对象构造 DataFrame 时，列标题会被丢弃 (GH 40682)
在解封装带有对象dtype的 Index 时，错误地推断数值类型的问题 (GH 43188)
在 read_csv() 中读取多标题输入且长度不均时，错误地引发了 IndexError 的 Bug (GH 43102)
Bug in read_csv() raising ParserError when reading file in chunks and some chunk blocks have fewer columns than header for engine="c" (GH 21211)
在 read_csv() 中的错误，当期望文件路径名或类文件对象时，异常类从 OSError 改为 TypeError (GH 43366)
在指定 engine='python' 时，read_csv() 和 read_fwf() 中的错误会忽略所有 skiprows 除了第一个，当 nrows 被指定时 (GH 44021, GH 10261)
在设置 keep_date_col=True 时，read_csv() 中的错误保留了原始列的对象格式 (GH 13378)
read_json() 中的错误未能正确处理非 numpy 数据类型（尤其是 category）（GH 21892, GH 33205）
在 json_normalize() 中的错误，其中多字符 sep 参数被错误地添加到每个键的前缀中 (GH 43831)
Bug in json_normalize() 读取缺少多级元数据的数据时，不尊重 errors="ignore" (GH 44312)
在 read_csv() 中的错误，如果 header 设置为 None 并且 engine 设置为 python，则使用第二行来猜测隐式索引 (GH 22144)
Bug in read_csv() not recognizing bad lines when names were given for engine="c" (GH 22144)
在 read_csv() 中使用 float_precision="round_trip" 时没有跳过初始/尾随空白的错误 (GH 43713)
当 Python 在没有 lzma 模块的情况下构建时出现错误：即使在未使用 lzma 功能的情况下，也会在 pandas 导入时引发警告 (GH 43495)
在 read_csv() 中未应用 index_col 的 dtype 错误 (GH 9435)
在通过 yaml.dump(frame) 转储/加载 DataFrame 时出现的错误 (GH 42748)
Bug in read_csv() raising ValueError when names was longer than header but equal to data rows for engine="python" (GH 38453)
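Bug in ExcelWriter where engine_kwargs were not passed through to all engines (GH 43442)
Bug in read_csv() raising ValueError when parse_dates was used with MultiIndex columns (GH 8991)
Bug in read_csv() not raising a ValueError when \n was specified as delimiter or sep, which conflicts with lineterminator (GH 43528)
Bug in to_csv() converting datetimes in categorical Series to integers (GH 40754)
Bug in read_csv() converting columns to numeric after date parsing failed (GH 11019)
Bug in read_csv() not replacing NaN values with np.nan before attempting date conversion (GH 26203)
Bug in read_csv() raising AttributeError when attempting to read a .csv file and infer the index column dtype from a nullable integer type (GH 44079)
Bug in to_csv() always coercing datetime columns with different formats to the same format (GH 21734)
DataFrame.to_csv() and Series.to_csv() with compression set to 'zip' no longer create a zip file containing a file ending with ".zip". Instead, they try to infer the inner file name more smartly (GH 39465)
Bug in read_csv() where reading a mixed column of bool values and missing values into a float type resulted in the missing values becoming 1.0 rather than NaN (GH 42808, GH 34120)
Bug in to_xml() raising an error for pd.NA with extension array dtype (GH 43903)
Bug in read_csv() where parsing was still invoked when a parser was passed in date_parser together with parse_dates=False (GH 44366)
Bug in read_csv() not setting the names of MultiIndex columns correctly when index_col is not the first column (GH 38549)
Bug in read_csv() silently ignoring errors when failing to create a memory-mapped file (GH 44766)
Bug in read_csv() when passing a tempfile.SpooledTemporaryFile opened in binary mode (GH 44748)
Bug in read_json() raising ValueError when attempting to parse JSON strings containing "://" (GH 36271)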
Bug in read_csv() when engine="c" and encoding_errors=None which caused a segfault (GH 45180)
Bug in read_csv() where an invalid value of usecols led to an unclosed file handle (GH 45384)
Fixed a memory leak in DataFrame.to_json() (GH 43877)
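
As a rough illustration of the to_csv() zip change listed above (GH 39465), the sketch below writes a frame to a zip archive and inspects the name of the file stored inside it. The file name data.csv.zip and the variable names are illustrative assumptions, and the exact inner name may depend on the installed pandas version, so it is printed rather than asserted here.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp:
    # "data.csv.zip" is an arbitrary, illustrative file name.
    path = Path(tmp) / "data.csv.zip"
    df.to_csv(path, index=False, compression="zip")

    # List the archive members to see which inner file name pandas inferred.
    with zipfile.ZipFile(path) as archive:
        inner_names = archive.namelist()

    # Reading the archive back should reproduce the original values.
    roundtrip = pd.read_csv(path, compression="zip")

print(inner_names)
print(roundtrip)
```

On pandas 1.4 or later the archive is expected to hold a single member whose name no longer ends in ".zip".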

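Period#

Bug in adding a Period object to a np.timedelta64 object incorrectly raising TypeError (GH 44182)
Bug in PeriodIndex.to_timestamp() when the index has freq="B" inferring freq="D" for its result instead of freq="B" (GH 44105)
Bug in the Period constructor incorrectly allowing np.timedelta64("NaT") (GH 44507)
Bug in PeriodIndex.to_timestamp() giving incorrect values for indexes with non-contiguous data (GH 44100)
Bug in Series.where() with PeriodDtype incorrectly raising when the where call should not replace anything (GH 45135)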
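
A minimal sketch of Period arithmetic with np.timedelta64, related to the first item above. It only shows the Period-on-the-left form, which is assumed to work on pandas 1.4 and later; the date and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# A daily period; the date itself is arbitrary.
period = pd.Period("2022-01-22", freq="D")

# Adding a timedelta that is a whole multiple of the period's frequency
# moves the period forward by that many days.
shifted = period + np.timedelta64(2, "D")

print(shifted)
```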

Plotting#

When given non-numeric data, DataFrame.boxplot() now raises a ValueError rather than a cryptic KeyError or ZeroDivisionError, in line with other plotting functions like DataFrame.hist() (GH 43480)

Groupby/resample/rolling#

Bug in SeriesGroupBy.apply() where passing an unrecognized string argument failed to raise TypeError when the underlying Series is empty (GH 42021)
Bug in Series.rolling.apply(), DataFrame.rolling.apply(), Series.expanding.apply() and DataFrame.expanding.apply() with engine="numba" where *args were being cached with the user-passed function (GH 42287)
Bug in DataFrameGroupBy.max(), SeriesGroupBy.max(), DataFrameGroupBy.min(), and SeriesGroupBy.min() with nullable integer dtypes losing precision (GH 41743)
Bug in DataFrame.groupby.rolling.var() calculating the rolling variance only on the first group (GH 42442)
Bug in DataFrameGroupBy.shift() and SeriesGroupBy.shift() returning the grouping columns if fill_value was not None (GH 41556)
Bug in SeriesGroupBy.nlargest() and SeriesGroupBy.nsmallest() producing an inconsistent index when the input Series was sorted and n was greater than or equal to all group sizes (GH 15272, GH 16345, GH 29129)
Bug in pandas.DataFrame.ewm() where non-float64 dtypes were silently failing (GH 42452)
Bug in pandas.DataFrame.rolling() operations along rows (axis=1) incorrectly omitting columns containing float16 and float32 (GH 41779)
Bug in Resampler.aggregate() not allowing the use of named aggregation (GH 32803)
Bug in Series.rolling() when the Series dtype was Int64 (GH 43016)
Bug in DataFrame.rolling.corr() when the DataFrame columns were a MultiIndex (GH 21157)
Bug in DataFrame.groupby.rolling() where specifying on and calling __getitem__ would subsequently return incorrect results (GH 43355)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() with time-based Grouper objects incorrectly raising ValueError in corner cases where the grouping vector contains a NaT (GH 43500, GH 43515)
Bug in DataFrameGroupBy.mean() and SeriesGroupBy.mean() failing with complex dtype (GH 43701)
Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly for the first row when center=True and the index is decreasing (GH 43927)
Bug in Series.rolling() and DataFrame.rolling() for centered datetimelike windows with uneven nanoseconds (GH 43997)
Bug in DataFrameGroupBy.mean() and SeriesGroupBy.mean() raising KeyError when a column was selected at least twice (GH 44924)
Bug in DataFrameGroupBy.nth() and SeriesGroupBy.nth() failing on axis=1 (GH 43926)
Bug in Series.rolling() and DataFrame.rolling() not respecting the right bound on centered datetime-like windows if the index contains duplicates (GH 3944)
Bug in Series.rolling() and DataFrame.rolling() where using a pandas.api.indexers.BaseIndexer subclass that returned unequal start and end arrays would segfault instead of raising a ValueError (GH 44470)
Bug in Groupby.nunique() not respecting observed=True for categorical grouping columns (GH 45128)
Bug in DataFrameGroupBy.head(), SeriesGroupBy.head(), DataFrameGroupBy.tail() and SeriesGroupBy.tail() not dropping groups with NaN when dropna=True (GH 45089)
Bug in GroupBy.__iter__() after selecting a subset of columns in a GroupBy object, which returned all columns instead of the chosen subset (GH 44821)
Bug in Groupby.rolling() failing to correctly raise ValueError when non-monotonic data is passed (GH 43909)
Bug where grouping by a Series that has a categorical data type and a length unequal to the axis of grouping raised ValueError (GH 44179)
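
To make the nullable-integer precision fix for groupby max/min (GH 41743) concrete, here is a small sketch. It assumes pandas 1.4 or later; the value 2**53 + 1 is chosen because it cannot be represented exactly as a float64, so any round trip through floats would change it. The variable names are illustrative.

```python
import pandas as pd

# 2**53 + 1 is not exactly representable as float64.
big = 2**53 + 1

values = pd.Series([big, 5, 7], dtype="Int64")
keys = ["a", "a", "b"]

# With the fix, the maximum is computed without a lossy float conversion
# and the nullable Int64 dtype is kept.
result = values.groupby(keys).max()

print(result)
print(result.dtype)
```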

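Reshaping#

Improved error message when creating a DataFrame column from a multi-dimensional numpy.ndarray (GH 42463)
Bug in concat() creating a MultiIndex with duplicate level entries when concatenating a DataFrame with duplicates in Index and multiple keys (GH 42651)
Bug in pandas.cut() on Series with duplicate indices and non-exact pandas.CategoricalIndex() (GH 42185, GH 42425)
Bug in DataFrame.append() failing to retain dtypes when the appended columns do not match (GH 43392)
Bug in concat() of bool and boolean dtypes resulting in object dtype instead of boolean dtype (GH 42800)
Bug in crosstab() when the inputs are categorical Series, there are categories that are not present in one or both of the Series, and margins=True. Previously the margin value for missing categories was NaN. It is now correctly reported as 0 (GH 43505)
Bug in concat() failing when the objs argument all had the same index and the keys argument contained duplicates (GH 43595)
Bug in concat() which ignored the sort parameter (GH 43375)
Bug in merge() with a MultiIndex as column index for the on argument returning an error when assigning a column internally (GH 43734)
Bug in crosstab() failing when the inputs are lists or tuples (GH 44076)
Bug in DataFrame.append() failing to retain index.name when appending a list of Series objects (GH 44109)
Fixed metadata propagation in the DataFrame.apply() method, consequently fixing the same issue for DataFrame.transform(), DataFrame.nunique() and DataFrame.mode() (GH 28283)
Bug in concat() casting levels of a MultiIndex to float if all levels consist only of missing values (GH 44900)
Bug in DataFrame.stack() with ExtensionDtype columns incorrectly raising (GH 43561)
Bug in merge() raising KeyError when joining over differently named indexes with on keywords (GH 45094)
Bug in Series.unstack() with object dtype doing unwanted type inference on the resulting columns (GH 44595)
Bug in MultiIndex.join() with overlapping IntervalIndex levels (GH 44096)
Bug in DataFrame.replace() and Series.replace() where the result dtype differed based on the regex parameter (GH 44864)
Bug in DataFrame.pivot() with index=None when the DataFrame index was a MultiIndex (GH 23955)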
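
The concat() fix for mixed bool and boolean inputs (GH 42800) can be sketched as follows, assuming pandas 1.4 or later. The variable names are illustrative.

```python
import pandas as pd

# A plain NumPy-backed bool Series and a nullable boolean Series.
plain = pd.Series([True, False], dtype="bool")
nullable = pd.Series([True, pd.NA], dtype="boolean")

# The combined result keeps the nullable boolean dtype instead of
# falling back to object.
combined = pd.concat([plain, nullable], ignore_index=True)

print(combined)
print(combined.dtype)
```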

Sparse#

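Bug in DataFrame.sparse.to_coo() raising AttributeError when column names are not unique (GH 29564)
Bug in SparseArray.max() and SparseArray.min() raising ValueError for arrays with 0 non-null elements (GH 43527)
Bug in DataFrame.sparse.to_coo() silently converting non-zero fill values to zero (GH 24817)
Bug in SparseArray comparison methods with an array-like operand of mismatched length raising AssertionError or an unclear ValueError depending on the input (GH 43863)
Bug in SparseArray arithmetic methods floordiv and mod where the behavior when dividing by zero did not match the non-sparse Series behavior (GH 38172)
Bug in SparseArray unary methods as well as SparseArray.isna() not recalculating indexes (GH 44955)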

ExtensionArray#

Bug in array() failing to preserve PandasArray (GH 43887)
NumPy ufuncs np.abs, np.positive, np.negative now correctly preserve dtype when called on ExtensionArrays that implement __abs__, __pos__, __neg__. In particular this is fixed for TimedeltaArray (GH 43899, GH 23316)
NumPy ufuncs np.minimum.reduce, np.maximum.reduce, np.add.reduce, and np.prod.reduce now work correctly instead of raising NotImplementedError on Series with IntegerDtype or FloatDtype (GH 43923, GH 44793)
NumPy ufuncs with the out keyword are now supported by arrays with IntegerDtype and FloatingDtype (GH 45122)
Avoid raising PerformanceWarning about a fragmented DataFrame when using many columns with an extension dtype (GH 44098)
Bug in IntegerArray and FloatingArray construction incorrectly coercing mismatched NA values (e.g. np.timedelta64("NaT")) to numeric NA (GH 44514)
Bug in BooleanArray.__eq__() and BooleanArray.__ne__() raising TypeError on comparison with an incompatible type (like a string). This caused DataFrame.replace() to sometimes raise a TypeError if a nullable boolean column was included (GH 44499)
Bug in array() incorrectly raising when passed an ndarray with float16 dtype (GH 44715)
Bug in calling np.sqrt on BooleanArray returning a malformed FloatingArray (GH 44715)
Bug in Series.where() with ExtensionDtype when other is an NA scalar incompatible with the Series dtype (e.g. NaT with a numeric dtype) incorrectly casting to a compatible NA value (GH 44697)
Bug in Series.replace() where explicitly passing value=None was treated as if no value was passed, and None was not in the result (GH 36984, GH 19998)
Bug in Series.replace() with unwanted downcasting being done in no-op replacements (GH 44498)
Bug in Series.replace() with FloatDtype, string[python], or string[pyarrow] dtype not being preserved when possible (GH 33484, GH 40732, GH 31644, GH 41215, GH 25438)
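
The NumPy ufunc items above (GH 43899, GH 43923) can be illustrated with a short sketch, assuming pandas 1.4 or later. The sample values and variable names are illustrative, and the reduction is shown on data without missing values to keep the result unambiguous.

```python
import numpy as np
import pandas as pd

# np.abs keeps the nullable Int64 dtype and leaves the missing value in place.
ints = pd.Series([-1, pd.NA, 3], dtype="Int64")
abs_ints = np.abs(ints)

# np.abs on timedelta data returns timedelta data as well.
deltas = pd.Series(pd.to_timedelta([-1, 2], unit="D"))
abs_deltas = np.abs(deltas)

# Reduction ufuncs work on a Series with a nullable integer dtype.
total = np.add.reduce(pd.Series([1, 2, 3], dtype="Int64"))

print(abs_ints)
print(abs_deltas)
print(total)
```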

Styler#

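Bug in Styler where the uuid at initialization maintained a floating underscore (GH 43037)
Bug in Styler.to_html() where the Styler object was updated if the to_html method was called with some arguments (GH 43034)
Bug in Styler.copy() where uuid was not previously copied (GH 40675)
Bug in Styler.apply() where functions which returned Series objects were not correctly handled in terms of aligning their index labels (GH 13657, GH 42014)
Bug when rendering an empty DataFrame with a named Index (GH 43305)
Bug when rendering a single-level MultiIndex (GH 43383)
Bug when combining non-sparse rendering and Styler.hide_columns() or Styler.hide_index() (GH 43464)
Bug setting a table style when using multiple selectors in Styler (GH 44011)
Bugs where row trimming and column trimming failed to reflect hidden rows (GH 43703, GH 44247)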

Other#

Bug in DataFrame.astype() with non-unique columns and a Series dtype argument (GH 44417)
Bug in CustomBusinessMonthBegin.__add__() (CustomBusinessMonthEnd.__add__()) not applying the extra offset parameter when the beginning (end) of the target month is already a business day (GH 41356)
Bug in RangeIndex.union() with another RangeIndex with matching (even) step and starts differing by strictly less than step / 2 (GH 44019)
Bug in RangeIndex.difference() with sort=None and step<0 failing to sort (GH 44085)
Bug in Series.replace() and DataFrame.replace() with value=None and ExtensionDtypes (GH 44270, GH 37899)
Bug in FloatingArray.equals() failing to consider two arrays equal if they contain np.nan values (GH 44382)
Bug in DataFrame.shift() with axis=1 and ExtensionDtype columns incorrectly raising when an incompatible fill_value is passed (GH 44564)
Bug in DataFrame.shift() with axis=1 and periods larger than len(frame.columns) producing an invalid DataFrame (GH 44978)
Bug in DataFrame.diff() when passing a NumPy integer object instead of an int object (GH 44572)
Bug in Series.replace() raising ValueError when using regex=True with a Series containing np.nan values (GH 43344)
Bug in DataFrame.to_records() where an incorrect n was used when missing names were replaced by level_n (GH 44818)
Bug in DataFrame.eval() where the resolvers argument was overriding the default resolvers (GH 34966)
Series.__repr__() and DataFrame.__repr__() no longer replace all null values in indexes with "NaN" but use their real string representations. "NaN" is used only for float("nan") (GH 45263)
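
As a small illustration of the DataFrame.diff() item above (GH 44572), the sketch below passes a NumPy integer as periods and compares the result with the plain int call. It assumes pandas 1.4 or later; the sample data and variable names are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"a": [1, 3, 6]})

# periods given as a NumPy integer rather than a built-in int.
periods = np.int64(1)
diffed = frame.diff(periods)

# The same call with a plain int, for comparison.
expected = frame.diff(1)

print(diffed)
```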

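Contributors#

A total of 275 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.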

Abhishek R
Albert Villanova del Moral
Alessandro Bisiani +
Alex Lim
Alex-Gregory-1 +
Alexander Gorodetsky
Alexander Regueiro +
Alexey Györi
Alexis Mignon
Aleš Erjavec
Ali McMaster
Alibi +
Andrei Batomunkuev +
Andrew Eckart +
Andrew Hawyrluk
Andrew Wood
Anton Lodder +
Armin Berres +
Arushi Sharma +
Benedikt Heidrich +
Beni Bienz +
Benoît Vinot
Bert Palm +
Boris Rumyantsev +
Brian Hulette
Brock
Bruno Costa +
Bryan Racic +
Caleb Epstein
Calvin Ho
ChristofKaufmann +
Christopher Yeh +
Chuliang Xiao +
ClaudiaSilver +
DSM
Daniel Coll +
Daniel Schmidt +
Dare Adewumi
David +
David Sanders +
David Wales +
Derzan Chiang +
DeviousLab +
Dhruv B Shetty +
Digres45 +
Dominik Kutra +
Drew Levitt +
DriesS
EdAbati
Elle
Elliot Rampono
Endre Mark Borza
Erfan Nariman
Evgeny Naumov +
Ewout ter Hoeven +
Fangchen Li
Felix Divo
Felix Dulys +
Francesco Andreuzzi +
Francois Dion +
Frans Larsson +
Fred Reiss
GYvan
Gabriel Di Pardi Arruda +
Gesa Stupperich
Giacomo Caria +
Greg Siano +
Griffin Ansel
Hiroaki Ogasawara +
Horace +
Horace Lai +
Irv Lustig
Isaac Virshup
JHM Darbyshire (MBP)
JHM Darbyshire (iMac)
JHM Darbyshire +
Jack Liu
Jacob Skwirsk +
Jaime Di Cristina +
James Holcombe +
Janosh Riebesell +
Jarrod Millman
Jason Bian +
Jeff Reback
Jernej Makovsek +
Jim Bradley +
Joel Gibson +
Joeperdefloep +
Johannes Mueller +
John S Bogaardt +
John Zangwill +
Jon Haitz Legarreta Gorroño +
Jon Wiggins +
Jonas Haag +
Joris Van den Bossche
Josh Friedlander
José Duarte +
Julian Fleischer +
Julien de la Bruère-T
Justin McOmie
Kadatatlu Kishore +
Kaiqi Dong
Kashif Khan +
Kavya9986 +
Kendall +
Kevin Sheppard
Kiley Hewitt
Koen Roelofs +
Krishna Chivukula
KrishnaSai2020
Leonardo Freua +
Leonardus Chen
Liang-Chi Hsieh +
Loic Diridollou +
Lorenzo Maffioli +
Luke Manley +
LunarLanding +
Marc Garcia
Marcel Bittar +
Marcel Gerber +
Marco Edward Gorelli
Marco Gorelli
MarcoGorelli
Marvin +
Mateusz Piotrowski +
Mathias Hauser +
Matt Richards +
Matthew Davis +
Matthew Roeschke
Matthew Zeitlin
Matthias Bussonnier
Matti Picus
Mauro Silberberg +
Maxim Ivanov
Maximilian Carr +
MeeseeksMachine
Michael Sarrazin +
Michael Wang +
Michał Górny +
Mike Phung +
Mike Taves +
Mohamad Hussein Rkein +
NJOKU OKECHUKWU VALENTINE +
Neal McBurnett +
Nick Anderson +
Nikita Sobolev +
Olivier Cavadenti +
PApostol +
Pandas Development Team
Patrick Hoefler
Peter
Peter Tillmann +
Prabha Arivalagan +
Pradyumna Rahul
Prerana Chakraborty
Prithvijit +
Rahul Gaikwad +
Ray Bell
Ricardo Martins +
Richard Shadrach
Robbert-jan ‘t Hoen +
Robert Voyer +
Robin Raymond +
Rohan Sharma +
Rohan Sirohia +
Roman Yurchak
Ruan Pretorius +
Sam James +
Scott Talbert
Shashwat Sharma +
Sheogorath27 +
Shiv Gupta
Shoham Debnath
Simon Hawkins
Soumya +
Stan West +
Stefanie Molin +
Stefano Alberto Russo +
Stephan Heßelmann
Stephen
Suyash Gupta +
Sven
Swanand01 +
Sylvain Marié +
TLouf
Tania Allard +
Terji Petersen
TheDerivator +
Thomas Dickson
Thomas Kastl +
Thomas Kluyver
Thomas Li
Thomas Smith
Tim Swast
Tim Tran +
Tobias McNulty +
Tobias Pitters
Tomoki Nakagawa +
Tony Hirst +
Torsten Wörtwein
V.I. Wood +
Vaibhav K +
Valentin Oliver Loftsson +
Varun Shrivastava +
Vivek Thazhathattil +
Vyom Pathak
Wenjun Si
William Andrea +
William Bradley +
Wojciech Sadowski +
Yao-Ching Huang +
Yash Gupta +
Yiannis Hadjicharalambous +
Yoshiki Vázquez Baeza
Yuanhao Geng
Yury Mikhaylov
Yvan Gatete +
Yves Delley +
Zach Rait
Zbyszek Królikowski +
Zero +
Zheyuan
Zhiyi Wu +
aiudirog
ali sayyah +
aneesh98 +
aptalca
arw2019 +
attack68
brendandrury +
bubblingoak +
calvinsomething +
claws +
deponovo +
dicristina
el-g-1 +
evensure +
fotino21 +
fshi01 +
gfkang +
github-actions[bot]
i-aki-y
jbrockmendel
jreback
juliandwain +
jxb4892 +
kendall smith +
lmcindewar +
lrepiton
maximilianaccardo +
michal-gh
neelmraman
partev
phofl +
pratyushsharan +
quantumalaviya +
rafael +
realead
rocabrera +
rosagold
saehuihwang +
salomondush +
shubham11941140 +
srinivasan +
stphnlyd
suoniq
trevorkask +
tushushu
tyuyoshi +
usersblock +
vernetya +
vrserpa +
willie3838 +
zeitlinv +
zhangxiaoxing +
