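What's new in 0.24.0 (January 25, 2019)

Warning

The 0.24.x series of releases will be the last to support Python 2. Future feature releases will support Python 3 only. See Dropping Python 2.7 for more details.

This is a major release from 0.23.4 and includes a number of API changes, new features, enhancements, and performance improvements, along with a large number of bug fixes.

Highlights include:

Optional integer NA support
New APIs for accessing the array backing a Series or Index
A new top-level method for creating arrays
Store Interval and Period data in a Series or DataFrame
Support for joining on two MultiIndexes

Check the API changes and deprecations before updating.

These are the changes in pandas 0.24.0. See the release notes for a full changelog including other versions of pandas.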

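Enhancements

Optional integer NA support

pandas has gained the ability to hold integer dtypes with missing values. This long-requested feature is enabled through the use of extension types.

Note

IntegerArray is currently experimental. Its API or implementation may change without warning.

We can construct a Series with the specified dtype. The dtype string Int64 is a pandas ExtensionDtype. Specifying a list or array using the traditional missing value marker np.nan will infer to integer dtype. The string representation of the Series also marks missing values, shown as <NA> in the output below. (GH 20700, GH 20747, GH 22441, GH 21789, GH 22346)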

In [1]: s = pd.Series([1, 2, np.nan], dtype='Int64')

In [2]: s
Out[2]: 
0       1
1       2
2    <NA>
Length: 3, dtype: Int64

Operations on these dtypes propagate missing values, as other pandas operations do.

# arithmetic
In [3]: s + 1
Out[3]: 
0       2
1       3
2    <NA>
Length: 3, dtype: Int64

# comparison
In [4]: s == 1
Out[4]: 
0     True
1    False
2     <NA>
Length: 3, dtype: boolean

# indexing
In [5]: s.iloc[1:3]
Out[5]: 
1       2
2    <NA>
Length: 2, dtype: Int64

# operate with other dtypes
In [6]: s + s.iloc[1:3].astype('Int8')
Out[6]: 
0    <NA>
1       4
2    <NA>
Length: 3, dtype: Int64

# coerce when needed
In [7]: s + 0.01
Out[7]: 
0    1.01
1    2.01
2    <NA>
Length: 3, dtype: Float64

These dtypes can operate as part of a DataFrame.

In [8]: df = pd.DataFrame({'A': s, 'B': [1, 1, 3], 'C': list('aab')})

In [9]: df
Out[9]: 
      A  B  C
0     1  1  a
1     2  1  a
2  <NA>  3  b

[3 rows x 3 columns]

In [10]: df.dtypes
Out[10]: 
A     Int64
B     int64
C    object
Length: 3, dtype: object

These dtypes can be merged, reshaped, and cast.

In [11]: pd.concat([df[['A']], df[['B', 'C']]], axis=1).dtypes
Out[11]: 
A     Int64
B     int64
C    object
Length: 3, dtype: object

In [12]: df['A'].astype(float)
Out[12]: 
0    1.0
1    2.0
2    NaN
Name: A, Length: 3, dtype: float64

Reduction and groupby operations such as sum work.

In [13]: df.sum()
Out[13]: 
A      3
B      5
C    aab
Length: 3, dtype: object

In [14]: df.groupby('B').A.sum()
Out[14]: 
B
1    3
3    0
Name: A, Length: 2, dtype: Int64

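Warning

The integer NA support currently uses the capitalized dtype names, e.g. Int8 rather than the traditional int8. This may change at a future date.

See Nullable integer data type for more.

Accessing the values in a Series or Index

Series.array and Index.array have been added for extracting the array backing a Series or Index. (GH 19954, GH 23623)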

In [15]: idx = pd.period_range('2000', periods=4)

In [16]: idx.array
Out[16]: 
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04']
Length: 4, dtype: period[D]

In [17]: pd.Series(idx).array
Out[17]: 
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04']
Length: 4, dtype: period[D]

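Historically, this would have been done with series.values, but with .values it was unclear whether the returned value would be the actual array, some transformation of it, or one of pandas' custom arrays (like Categorical). For example, with PeriodIndex, .values generates a new ndarray of period objects each time.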

In [18]: idx.values
Out[18]: 
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)

In [19]: id(idx.values)
Out[19]: 281469640614096

In [20]: id(idx.values)
Out[20]: 281469653938416

If you need an actual NumPy array, use Series.to_numpy() or Index.to_numpy().

In [21]: idx.to_numpy()
Out[21]: 
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)

In [22]: pd.Series(idx).to_numpy()
Out[22]: 
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)

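For Series and Indexes backed by normal NumPy arrays, Series.array will return a new arrays.PandasArray, which is a thin (no-copy) wrapper around a numpy.ndarray. PandasArray isn't especially useful on its own, but it does provide the same interface as any extension array defined in pandas or by a third-party library.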

In [23]: ser = pd.Series([1, 2, 3])

In [24]: ser.array
Out[24]: 
<NumpyExtensionArray>
[1, 2, 3]
Length: 3, dtype: int64

In [25]: ser.to_numpy()
Out[25]: array([1, 2, 3])

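We haven't removed or deprecated Series.values or DataFrame.values, but we highly recommend using .array or .to_numpy() instead.

See Dtypes and Attributes and underlying data for more.

`pandas.array`: a new top-level method for creating arrays

A new top-level method array() has been added for creating 1-dimensional arrays (GH 22860). This can be used to create any extension array, including extension arrays registered by third-party libraries. See the dtypes docs for more on extension arrays.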

In [26]: pd.array([1, 2, np.nan], dtype='Int64')
Out[26]: 
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

In [27]: pd.array(['a', 'b', 'c'], dtype='category')
Out[27]: 
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']

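Passing data for which there isn't a dedicated extension type (e.g. float, integer, etc.) will return a new arrays.PandasArray, which is just a thin (no-copy) wrapper around a numpy.ndarray that satisfies the pandas extension array interface.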

In [28]: pd.array([1, 2, 3])
Out[28]: 
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64

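On their own, a PandasArray isn't a very useful object. But if you need to write low-level code that works generically for any ExtensionArray, PandasArray satisfies that need.

Notice that by default, if no dtype is specified, the dtype of the returned array is inferred from the data. In particular, note that the first example of [1, 2, np.nan] would have returned a floating-point array, since NaN is a float.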

In [29]: pd.array([1, 2, np.nan])
Out[29]: 
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

Storing Interval and Period data in Series and DataFrame

Interval and Period data may now be stored in a Series or DataFrame, in addition to an IntervalIndex and PeriodIndex as before (GH 19453, GH 22862).

In [30]: ser = pd.Series(pd.interval_range(0, 5))

In [31]: ser
Out[31]: 
0    (0, 1]
1    (1, 2]
2    (2, 3]
3    (3, 4]
4    (4, 5]
Length: 5, dtype: interval

In [32]: ser.dtype
Out[32]: interval[int64, right]

For periods:

In [33]: pser = pd.Series(pd.period_range("2000", freq="D", periods=5))

In [34]: pser
Out[34]: 
0    2000-01-01
1    2000-01-02
2    2000-01-03
3    2000-01-04
4    2000-01-05
Length: 5, dtype: period[D]

In [35]: pser.dtype
Out[35]: period[D]

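Previously, these would be cast to a NumPy array with object dtype. In general, this should result in better performance when storing an array of intervals or periods in a Series or a column of a DataFrame.

Use Series.array to extract the underlying array of intervals or periods from the Series: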

In [36]: ser.array
Out[36]: 
<IntervalArray>
[(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]]
Length: 5, dtype: interval[int64, right]

In [37]: pser.array
Out[37]: 
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04', '2000-01-05']
Length: 5, dtype: period[D]

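These return an instance of arrays.IntervalArray or arrays.PeriodArray, the new extension arrays that back interval and period data.

Warning

For backwards compatibility, Series.values continues to return a NumPy array of objects for Interval and Period data. We recommend using Series.array when you need the array of data stored in the Series, and Series.to_numpy() when you know you need a NumPy array.

See Dtypes and Attributes and underlying data for more.

Joining with two multi-indexes

DataFrame.merge() and DataFrame.join() can now be used to join multi-indexed DataFrame instances on the overlapping index levels (GH 6360)

See the Merge, join, and concatenate documentation section.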

In [38]: index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
   ....:                                        ('K1', 'X2')],
   ....:                                        names=['key', 'X'])
   ....: 

In [39]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
   ....:                      'B': ['B0', 'B1', 'B2']}, index=index_left)
   ....: 

In [40]: index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
   ....:                                         ('K2', 'Y2'), ('K2', 'Y3')],
   ....:                                         names=['key', 'Y'])
   ....: 

In [41]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
   ....:                       'D': ['D0', 'D1', 'D2', 'D3']}, index=index_right)
   ....: 

In [42]: left.join(right)
Out[42]: 
            A   B   C   D
key X  Y                 
K0  X0 Y0  A0  B0  C0  D0
    X1 Y0  A1  B1  C0  D0
K1  X2 Y1  A2  B2  C1  D1

[3 rows x 4 columns]

For earlier versions this can be done using the following.

In [43]: pd.merge(left.reset_index(), right.reset_index(),
   ....:          on=['key'], how='inner').set_index(['key', 'X', 'Y'])
   ....: 
Out[43]: 
            A   B   C   D
key X  Y                 
K0  X0 Y0  A0  B0  C0  D0
    X1 Y0  A1  B1  C0  D0
K1  X2 Y1  A2  B2  C1  D1

[3 rows x 4 columns]

Function `read_html` enhancements

read_html() previously ignored colspan and rowspan attributes. Now it understands them, treating them as sequences of cells with the same value. (GH 17054)

In [44]: from io import StringIO

In [45]: result = pd.read_html(StringIO("""
   ....:   <table>
   ....:     <thead>
   ....:       <tr>
   ....:         <th>A</th><th>B</th><th>C</th>
   ....:       </tr>
   ....:     </thead>
   ....:     <tbody>
   ....:       <tr>
   ....:         <td colspan="2">1</td><td>2</td>
   ....:       </tr>
   ....:     </tbody>
   ....:   </table>"""))
   ....: 

Previous behavior:

In [13]: result
Out [13]:
[   A  B   C
 0  1  2 NaN]

New behavior:

In [46]: result
Out[46]: 
[   A  B  C
 0  1  1  2
 
 [1 rows x 3 columns]]

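New `Styler.pipe()` method

The Styler class has gained a pipe() method. This provides a convenient way to apply users' predefined styling functions, and can help reduce "boilerplate" when using DataFrame styling functionality repeatedly within a notebook. (GH 23229)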

In [47]: df = pd.DataFrame({'N': [1250, 1500, 1750], 'X': [0.25, 0.35, 0.50]})

In [48]: def format_and_align(styler):
   ....:     return (styler.format({'N': '{:,}', 'X': '{:.1%}'})
   ....:                   .set_properties(**{'text-align': 'right'}))
   ....: 

In [49]: df.style.pipe(format_and_align).set_caption('Summary of results.')
Out[49]: <pandas.io.formats.style.Styler at 0xfffeaf27b280>

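Similar methods already exist for other classes in pandas, including DataFrame.pipe(), GroupBy.pipe(), and Resampler.pipe().

Renaming names in a MultiIndex

DataFrame.rename_axis() now supports index and columns arguments and Series.rename_axis() supports the index argument (GH 19978).

This change allows a dictionary to be passed so that some of the names of a MultiIndex can be changed.

Example: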

In [50]: mi = pd.MultiIndex.from_product([list('AB'), list('CD'), list('EF')],
   ....:                                 names=['AB', 'CD', 'EF'])
   ....: 

In [51]: df = pd.DataFrame(list(range(len(mi))), index=mi, columns=['N'])

In [52]: df
Out[52]: 
          N
AB CD EF   
A  C  E   0
      F   1
   D  E   2
      F   3
B  C  E   4
      F   5
   D  E   6
      F   7

[8 rows x 1 columns]

In [53]: df.rename_axis(index={'CD': 'New'})
Out[53]: 
           N
AB New EF   
A  C   E   0
       F   1
   D   E   2
       F   3
B  C   E   4
       F   5
   D   E   6
       F   7

[8 rows x 1 columns]

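See the Advanced documentation on renaming for more details.

Other enhancements

merge() now directly allows merging between objects of type DataFrame and named Series, without the need to convert the Series object into a DataFrame beforehand (GH 21220)
ExcelWriter now accepts mode as a keyword argument, enabling append to existing workbooks when using the openpyxl engine (GH 3441)
FrozenList has gained the .union() and .difference() methods. This functionality greatly simplifies groupby operations that rely on explicitly excluding certain columns. See Splitting an object into groups for more information (GH 15475, GH 15506).
DataFrame.to_parquet() now accepts index as an argument, allowing the user to override the engine's default behavior to include or omit the DataFrame's indexes from the resulting Parquet file. (GH 20768)
read_feather() now accepts columns as an argument, allowing the user to specify which columns should be read. (GH 24025)
DataFrame.corr() and Series.corr() now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (GH 22684)
DataFrame.to_string() now accepts decimal as an argument, allowing the user to specify which decimal separator should be used in the output. (GH 23614)
DataFrame.to_html() now accepts render_links as an argument, allowing the user to generate HTML with links to any URLs that appear in the DataFrame. See the section on writing HTML in the IO docs for example usage. (GH 2679)
pandas.read_csv() now supports pandas extension types as an argument to dtype, allowing the user to use pandas extension types when reading CSVs. (GH 23228)
The shift() method now accepts fill_value as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (GH 15486)
to_datetime() now supports the %Z and %z directives when passed into format (GH 13486)
Series.mode() and DataFrame.mode() now support the dropna parameter, which can be used to specify whether NaN/NaT values should be considered (GH 17534)
DataFrame.to_csv() and Series.to_csv() now support the compression keyword when a file handle is passed. (GH 21227)
Index.droplevel() is now implemented also for flat indexes, for compatibility with MultiIndex (GH 21115)
Series.droplevel() and DataFrame.droplevel() are now implemented (GH 20342)
Added support for reading from and writing to Google Cloud Storage via the gcsfs library (GH 19454, GH 23094)
The signature and documentation of DataFrame.to_gbq() and read_gbq() have been updated to reflect changes from pandas-gbq library version 0.8.0. A credentials argument was added, which enables the use of any kind of google-auth credentials. (GH 21627, GH 22557, GH 23662)
New method HDFStore.walk() will recursively walk the group hierarchy of an HDF5 file (GH 10932)
read_html() copies cell data across colspan and rowspan, and it treats all-th table rows as headers if the header kwarg is not given and there is no thead (GH 17054)
Series.nlargest(), Series.nsmallest(), DataFrame.nlargest(), and DataFrame.nsmallest() now accept the value "all" for the keep argument. This keeps all ties for the nth largest/smallest value (GH 16818)
IntervalIndex has gained the set_closed() method to change the existing closed value (GH 21670)
DataFrame.to_csv(), Series.to_csv(), DataFrame.to_json(), and Series.to_json() now support compression='infer' to infer compression based on the filename extension (GH 15008). The default compression for the to_csv, to_json, and to_pickle methods has been updated to 'infer' (GH 22004).
DataFrame.to_sql() now supports writing TIMESTAMP WITH TIME ZONE types for supported databases. For databases that don't support timezones, datetime data will be stored as timezone-unaware local timestamps. See Datetime data types for implications (GH 9086).
to_timedelta() now supports ISO-formatted timedelta strings (GH 21877)
Series and DataFrame now support Iterable objects in the constructor (GH 2193)
DatetimeIndex has gained the DatetimeIndex.timetz attribute. This returns the local time with timezone information. (GH 21358)
round(), ceil(), and floor() for DatetimeIndex and Timestamp now support an ambiguous argument for handling datetimes that are rounded to ambiguous times (GH 18946) and a nonexistent argument for handling datetimes that are rounded to nonexistent times. See Nonexistent times when localizing (GH 22647)
The result of resample() is now iterable, similar to groupby() (GH 15314).
Series.resample() and DataFrame.resample() have gained Resampler.quantile() (GH 15023).
DataFrame.resample() and Series.resample() with a PeriodIndex will now respect the base argument in the same fashion as with a DatetimeIndex. (GH 23882)
pandas.api.types.is_list_like() has gained a keyword allow_sets, which is True by default; if False, instances of set will no longer be considered "list-like" (GH 23061)
Index.to_frame() now supports overriding column name(s) (GH 22580).
Categorical.from_codes() can now take a dtype parameter as an alternative to passing categories and ordered (GH 24398).
New attribute __git_version__ will return the git commit sha of the current build (GH 21295).
Compatibility with Matplotlib 3.0 (GH 22790).
Added Interval.overlaps(), arrays.IntervalArray.overlaps(), and IntervalIndex.overlaps() for determining overlaps between interval-like objects (GH 21998)
read_fwf() now accepts the keyword infer_nrows (GH 15138).
to_parquet() now supports writing a DataFrame as a directory of parquet files partitioned by a subset of the columns when engine = 'pyarrow' (GH 23283)
Timestamp.tz_localize(), DatetimeIndex.tz_localize(), and Series.tz_localize() have gained the nonexistent argument for alternative handling of nonexistent times. See Nonexistent times when localizing (GH 8917, GH 24466)
Index.difference(), Index.intersection(), Index.union(), and Index.symmetric_difference() now have an optional sort parameter to control whether the results should be sorted if possible (GH 17839, GH 24471)
read_excel() now accepts usecols as a list of column names or a callable (GH 18273)
MultiIndex.to_flat_index() has been added to flatten multiple levels into a single-level Index object.
DataFrame.to_stata() and pandas.io.stata.StataWriter117 can write mixed string columns to the Stata strl format (GH 23633)
DataFrame.between_time() and DataFrame.at_time() have gained the axis parameter (GH 8839)
DataFrame.to_records() now accepts index_dtypes and column_dtypes parameters to allow different data types in stored column and index records (GH 18146)
IntervalIndex has gained the is_overlapping attribute to indicate whether the IntervalIndex contains any overlapping intervals (GH 23309)
pandas.DataFrame.to_sql() has gained the method argument to control the SQL insertion clause. See the insertion method section in the documentation. (GH 8953)
DataFrame.corrwith() now supports Spearman's rank correlation, Kendall's tau, and callable correlation methods. (GH 21925)
DataFrame.to_json(), DataFrame.to_csv(), DataFrame.to_pickle(), and other export methods now support tilde (~) in the path argument. (GH 23473)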
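
A few of the enhancements listed above can be tried out in a short session. The following is a minimal sketch, not taken from the release notes themselves; the frames, column names, and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.Series([10, 20], index=pd.Index(["a", "b"], name="key"), name="y")

# Merge a DataFrame with a named Series directly (no to_frame() needed).
merged = left.merge(right, left_on="key", right_index=True)

# shift() with fill_value: the vacated slot gets 0 instead of NaN.
shifted = pd.Series([1, 2, 3]).shift(1, fill_value=0)

# nlargest(keep="all") keeps every value tied with the n-th largest.
top = pd.Series([3, 2, 2, 1]).nlargest(2, keep="all")

# to_flat_index() turns a MultiIndex into a single-level Index of tuples.
flat = pd.MultiIndex.from_product([["a", "b"], [1, 2]]).to_flat_index()

print(merged)
print(shifted.tolist())
print(top.tolist())
print(list(flat))
```

Note that keep="all" may return more than n rows, which is the point of the option.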

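Backwards incompatible API changes

pandas 0.24.0 includes a number of API breaking changes.

Increased minimum versions for dependencies

We have updated our minimum supported versions of dependencies (GH 21242, GH 18742, GH 23774, GH 24767). If installed, we now require:

Package	Minimum Version	Required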
numpy	1.12.0	X
bottleneck	1.2.0
fastparquet	0.2.1
matplotlib	2.0.0
numexpr	2.6.1
pandas-gbq	0.8.0
pyarrow	0.9.0
pytables	3.4.2
scipy	0.18.1
xlrd	1.0.0
pytest (dev)	3.6

Additionally, we no longer depend on feather-format for feather-based storage and have replaced it with references to pyarrow (GH 21639 and GH 23053).

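`os.linesep` is used for `line_terminator` of `DataFrame.to_csv`

DataFrame.to_csv() now uses os.linesep rather than '\n' for the default line terminator (GH 20353). This change only has an effect when running on Windows, where '\r\n' was used as the line terminator even when '\n' was passed in line_terminator.

Previous behavior on Windows: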

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: # When passing file PATH to to_csv,
   ...: # line_terminator does not work, and csv is saved with '\r\n'.
   ...: # Also, this converts all '\n's in the data to '\r\n'.
   ...: data.to_csv("test.csv", index=False, line_terminator='\n')

In [3]: with open("test.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n'

In [4]: # When passing file OBJECT with newline option to
   ...: # to_csv, line_terminator works.
   ...: with open("test2.csv", mode='w', newline='\n') as f:
   ...:     data.to_csv(f, index=False, line_terminator='\n')

In [5]: with open("test2.csv", mode='rb') as f:
   ...:     print(f.read())
Out[5]: b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'

New behavior on Windows:

Passing line_terminator explicitly sets the line terminator to that character.

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: data.to_csv("test.csv", index=False, line_terminator='\n')

In [3]: with open("test.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'

On Windows, the value of os.linesep is '\r\n', so if line_terminator is not set, '\r\n' is used as the line terminator.

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: data.to_csv("test.csv", index=False)

In [3]: with open("test.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'

For file objects, specifying newline is not sufficient to set the line terminator. You must pass line_terminator explicitly, even in this case.

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: with open("test2.csv", mode='w', newline='\n') as f:
   ...:     data.to_csv(f, index=False)

In [3]: with open("test2.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'

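Proper handling of `np.nan` in a string data-typed column with the Python engine

There was a bug in read_excel() and read_csv() with the Python engine, where missing values turned into the string 'nan' with dtype=str and na_filter=True. Now, these missing values are converted to the string missing indicator, np.nan. (GH 20377)

Previous behavior: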

In [5]: data = 'a,b,c\n1,,3\n4,5,6'
In [6]: df = pd.read_csv(StringIO(data), engine='python', dtype=str, na_filter=True)
In [7]: df.loc[0, 'b']
Out[7]:
'nan'

New behavior:

In [54]: data = 'a,b,c\n1,,3\n4,5,6'

In [55]: df = pd.read_csv(StringIO(data), engine='python', dtype=str, na_filter=True)

In [56]: df.loc[0, 'b']
Out[56]: nan

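Notice how we now output np.nan itself instead of a stringified form of it.

Parsing datetime strings with timezone offsets

Previously, parsing datetime strings with UTC offsets with to_datetime() or DatetimeIndex would automatically convert the datetime to UTC without timezone localization. This is inconsistent with parsing the same datetime string with Timestamp, which would preserve the UTC offset in the tz attribute. Now, to_datetime() preserves the UTC offset in the tz attribute when all the datetime strings have the same UTC offset (GH 17697, GH 11736, GH 22457)

Previous behavior: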

In [2]: pd.to_datetime("2015-11-18 15:30:00+05:30")
Out[2]: Timestamp('2015-11-18 10:00:00')

In [3]: pd.Timestamp("2015-11-18 15:30:00+05:30")
Out[3]: Timestamp('2015-11-18 15:30:00+0530', tz='pytz.FixedOffset(330)')

# Different UTC offsets would automatically convert the datetimes to UTC (without a UTC timezone)
In [4]: pd.to_datetime(["2015-11-18 15:30:00+05:30", "2015-11-18 16:30:00+06:30"])
Out[4]: DatetimeIndex(['2015-11-18 10:00:00', '2015-11-18 10:00:00'], dtype='datetime64[ns]', freq=None)

New behavior:

In [57]: pd.to_datetime("2015-11-18 15:30:00+05:30")
Out[57]: Timestamp('2015-11-18 15:30:00+0530', tz='UTC+05:30')

In [58]: pd.Timestamp("2015-11-18 15:30:00+05:30")
Out[58]: Timestamp('2015-11-18 15:30:00+0530', tz='UTC+05:30')

Parsing datetime strings with the same UTC offset will preserve the UTC offset in the tz attribute

In [59]: pd.to_datetime(["2015-11-18 15:30:00+05:30"] * 2)
Out[59]: DatetimeIndex(['2015-11-18 15:30:00+05:30', '2015-11-18 15:30:00+05:30'], dtype='datetime64[s, UTC+05:30]', freq=None)

Parsing datetime strings with different UTC offsets will now create an Index of datetime.datetime objects with different UTC offsets

In [59]: idx = pd.to_datetime(["2015-11-18 15:30:00+05:30",
   ....:                       "2015-11-18 16:30:00+06:30"])
   ....: 

In [60]: idx
Out[60]: Index([2015-11-18 15:30:00+05:30, 2015-11-18 16:30:00+06:30], dtype='object')

In [61]: idx[0]
Out[61]: Timestamp('2015-11-18 15:30:00+0530', tz='UTC+05:30')

In [62]: idx[1]
Out[62]: Timestamp('2015-11-18 16:30:00+0630', tz='UTC+06:30')

Passing utc=True will mimic the previous behavior but will correctly indicate that the dates have been converted to UTC

In [60]: pd.to_datetime(["2015-11-18 15:30:00+05:30",
   ....:                 "2015-11-18 16:30:00+06:30"], utc=True)
   ....: 
Out[60]: DatetimeIndex(['2015-11-18 10:00:00+00:00', '2015-11-18 10:00:00+00:00'], dtype='datetime64[s, UTC]', freq=None)

Parsing mixed-timezones with `read_csv()`

read_csv() no longer silently converts mixed-timezone columns to UTC (GH 24987).

Previous behavior

>>> import io
>>> content = """\
... a
... 2000-01-01T00:00:00+05:00
... 2000-01-01T00:00:00+06:00"""
>>> df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
>>> df.a
0   1999-12-31 19:00:00
1   1999-12-31 18:00:00
Name: a, dtype: datetime64[ns]

New behavior

In[64]: import io

In[65]: content = """\
   ...: a
   ...: 2000-01-01T00:00:00+05:00
   ...: 2000-01-01T00:00:00+06:00"""

In[66]: df = pd.read_csv(io.StringIO(content), parse_dates=['a'])

In[67]: df.a
Out[67]:
0   2000-01-01 00:00:00+05:00
1   2000-01-01 00:00:00+06:00
Name: a, Length: 2, dtype: object

As can be seen, the dtype is object; each value in the column is a string. To convert the strings to an array of datetimes, use the date_parser argument:

In [3]: df = pd.read_csv(
   ...:     io.StringIO(content),
   ...:     parse_dates=['a'],
   ...:     date_parser=lambda col: pd.to_datetime(col, utc=True),
   ...: )

In [4]: df.a
Out[4]:
0   1999-12-31 19:00:00+00:00
1   1999-12-31 18:00:00+00:00
Name: a, dtype: datetime64[ns, UTC]

See Parsing datetime strings with timezone offsets for more.
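
As an alternative to a custom parser, the conversion can also be done after reading. This is a minimal sketch under the assumption that converting everything to UTC is acceptable; it avoids date_parser, which later pandas releases deprecate, and the column content is the same illustrative data used above.

```python
import io

import pandas as pd

content = "a\n2000-01-01T00:00:00+05:00\n2000-01-01T00:00:00+06:00"

# Read the column as plain strings first (no parse_dates).
df = pd.read_csv(io.StringIO(content))

# Convert the mixed-offset strings to a single UTC-aware datetime column.
df["a"] = pd.to_datetime(df["a"], utc=True)

print(df["a"])
```

Each value keeps its instant in time; only the offset information is normalized to UTC.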

Time values in `dt.end_time` and `to_timestamp(how='end')`

The time values in Period and PeriodIndex objects are now set to '23:59:59.999999999' when calling Series.dt.end_time, Period.end_time, PeriodIndex.end_time, Period.to_timestamp() with how='end', or PeriodIndex.to_timestamp() with how='end' (GH 17157)

Previous behavior:

In [2]: p = pd.Period('2017-01-01', 'D')
In [3]: pi = pd.PeriodIndex([p])

In [4]: pd.Series(pi).dt.end_time[0]
Out[4]: Timestamp('2017-01-01 00:00:00')

In [5]: p.end_time
Out[5]: Timestamp('2017-01-01 23:59:59.999999999')

New behavior:

Calling Series.dt.end_time will now result in a time of '23:59:59.999999999', as is already the case with Period.end_time, for example.

In [61]: p = pd.Period('2017-01-01', 'D')

In [62]: pi = pd.PeriodIndex([p])

In [63]: pd.Series(pi).dt.end_time[0]
Out[63]: Timestamp('2017-01-01 23:59:59.999999999')

In [64]: p.end_time
Out[64]: Timestamp('2017-01-01 23:59:59.999999999')

Series.unique for timezone-aware data

The return type of Series.unique() for datetime with timezone values has changed from a numpy.ndarray of Timestamp objects to an arrays.DatetimeArray (GH 24024).

In [65]: ser = pd.Series([pd.Timestamp('2000', tz='UTC'),
   ....:                  pd.Timestamp('2000', tz='UTC')])
   ....: 

Previous behavior:

In [3]: ser.unique()
Out[3]: array([Timestamp('2000-01-01 00:00:00+0000', tz='UTC')], dtype=object)

New behavior:

In [66]: ser.unique()
Out[66]: 
<DatetimeArray>
['2000-01-01 00:00:00+00:00']
Length: 1, dtype: datetime64[s, UTC]

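Sparse data structure refactor

SparseArray, the array backing SparseSeries and the columns in a SparseDataFrame, is now an extension array (GH 21978, GH 19056, GH 22835). To conform to this interface and for consistency with the rest of pandas, some API breaking changes were made:

SparseArray is no longer a subclass of numpy.ndarray. To convert a SparseArray to a NumPy array, use numpy.asarray().
SparseArray.dtype and SparseSeries.dtype are now instances of SparseDtype, rather than np.dtype. Access the underlying dtype with SparseDtype.subtype.
numpy.asarray(sparse_array) now returns a dense array with all the values, not just the non-fill-value values (GH 14167)
SparseArray.take now matches the API of pandas.api.extensions.ExtensionArray.take() (GH 19506):
- The default value of allow_fill has changed from False to True.
- The out and mode parameters are no longer accepted (previously, an error was raised if they were specified).
- Passing a scalar for indices is no longer allowed.
The result of mixing sparse and dense Series with concat() is a Series with sparse values, rather than a SparseSeries.
SparseDataFrame.combine and DataFrame.combine_first no longer support combining a sparse column with a dense column while preserving the sparse subtype. The result will be an object-dtype SparseArray.
Setting SparseArray.fill_value to a fill value with a different dtype is now allowed.
DataFrame[column] is now a Series with sparse values, rather than a SparseSeries, when slicing a single column with sparse values (GH 23559).
The result of Series.where() is now a Series with sparse values, as with other extension arrays (GH 24077)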
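
The dtype-related points in the list above can be checked interactively. The following is a small sketch assuming a pandas version where the "Sparse[int]" dtype string is available (as in the accessor example later in this section); the values are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 0, 1, 1, 1], dtype="Sparse[int]")

# The dtype is a SparseDtype instance rather than a plain np.dtype.
sparse_dtype = s.dtype

# The underlying NumPy dtype is exposed through .subtype.
subtype = sparse_dtype.subtype

# np.asarray() densifies: fill values are included in the result.
dense = np.asarray(s.array)

print(type(sparse_dtype).__name__, subtype, dense)
```

Densifying allocates memory for every element, so it is best reserved for data that comfortably fits in dense form.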

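Some new warnings are issued for operations that require or are likely to materialize a large dense array:

An errors.PerformanceWarning is issued when using fillna with a method, because a dense array is constructed to create the filled array. Filling with a value is the efficient way to fill a sparse array.
An errors.PerformanceWarning is now issued when concatenating sparse Series with differing fill values. The fill value from the first sparse array continues to be used.

In addition to these API breaking changes, many performance improvements and bug fixes have been made.

Finally, a Series.sparse accessor was added to provide sparse-specific methods like Series.sparse.from_coo().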

In [67]: s = pd.Series([0, 0, 1, 1, 1], dtype='Sparse[int]')

In [68]: s.sparse.density
Out[68]: 0.6

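`get_dummies()` always returns a DataFrame

Previously, when sparse=True was passed to get_dummies(), the return value could be either a DataFrame or a SparseDataFrame, depending on whether all or just a subset of the columns were dummy-encoded. Now, a DataFrame is always returned (GH 24284).

Previous behavior

The first get_dummies() returns a DataFrame because column A is not dummy-encoded. When just ["B", "C"] are passed to get_dummies, all the columns are dummy-encoded, and a SparseDataFrame was returned.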

In [2]: df = pd.DataFrame({"A": [1, 2], "B": ['a', 'b'], "C": ['a', 'a']})

In [3]: type(pd.get_dummies(df, sparse=True))
Out[3]: pandas.DataFrame

In [4]: type(pd.get_dummies(df[['B', 'C']], sparse=True))
Out[4]: pandas.core.sparse.frame.SparseDataFrame

New behavior

Now, the return type is consistently a DataFrame.

In [69]: type(pd.get_dummies(df, sparse=True))
Out[69]: pandas.DataFrame

In [70]: type(pd.get_dummies(df[['B', 'C']], sparse=True))
Out[70]: pandas.DataFrame

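Note

There is no difference in memory usage between a SparseDataFrame and a DataFrame with sparse values. The memory usage will be the same as in the previous version of pandas.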
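
To confirm that the returned DataFrame really holds sparse columns, the column dtypes can be inspected. This is a minimal sketch reusing the example frame from above; the check via pd.SparseDtype is an illustration, not part of the original notes.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": ["a", "b"], "C": ["a", "a"]})

# All selected columns are dummy-encoded; the result is still a DataFrame.
dummies = pd.get_dummies(df[["B", "C"]], sparse=True)

# Each dummy column is backed by a sparse extension array.
all_sparse = all(isinstance(dt, pd.SparseDtype) for dt in dummies.dtypes)

print(type(dummies).__name__, all_sparse)
```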

Raise ValueError in `DataFrame.to_dict(orient='index')`

DataFrame.to_dict() now raises ValueError when used with orient='index' and a non-unique index, instead of silently losing data (GH 22801)

In [71]: df = pd.DataFrame({'a': [1, 2], 'b': [0.5, 0.75]}, index=['A', 'A'])

In [72]: df
Out[72]: 
   a     b
A  1  0.50
A  2  0.75

[2 rows x 2 columns]

In [73]: df.to_dict(orient='index')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[73], line 1
----> 1 df.to_dict(orient='index')

File /home/pandas/pandas/core/frame.py:2097, in DataFrame.to_dict(self, orient, into, index)
   1992 """
   1993 Convert the DataFrame to a dictionary.
   1994 
   (...)
   2093  defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]
   2094 """
   2095 from pandas.core.methods.to_dict import to_dict
-> 2097 return to_dict(self, orient, into=into, index=index)

File /home/pandas/pandas/core/methods/to_dict.py:259, in to_dict(df, orient, into, index)
    257 elif orient == "index":
    258     if not df.index.is_unique:
--> 259         raise ValueError("DataFrame index must be unique for orient='index'.")
    260     columns = df.columns.tolist()
    261     if are_all_object_dtype_cols:

ValueError: DataFrame index must be unique for orient='index'.
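
When the index is known to contain duplicates, one way to export every row is to move the index into a column first. The following is a sketch of that workaround under the assumption that a list of records is an acceptable output shape; it is not prescribed by the release notes.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [0.5, 0.75]}, index=["A", "A"])

try:
    df.to_dict(orient="index")
    raised = False
except ValueError:
    # Duplicate index labels cannot be dictionary keys without losing rows.
    raised = True

# Workaround: keep the labels as a regular column and export records.
records = df.reset_index().to_dict(orient="records")

print(raised)
print(records)
```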

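Tick DateOffset normalize restrictions

Creating a Tick object (Day, Hour, Minute, Second, Milli, Micro, Nano) with normalize=True is no longer supported. This prevents unexpected behavior where addition could fail to be monotone or associative. (GH 21427)

Previous behavior: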

In [2]: ts = pd.Timestamp('2018-06-11 18:01:14')

In [3]: ts
Out[3]: Timestamp('2018-06-11 18:01:14')

In [4]: tic = pd.offsets.Hour(n=2, normalize=True)
   ...:

In [5]: tic
Out[5]: <2 * Hours>

In [6]: ts + tic
Out[6]: Timestamp('2018-06-11 00:00:00')

In [7]: ts + tic + tic + tic == ts + (tic + tic + tic)
Out[7]: False

New behavior:

In [74]: ts = pd.Timestamp('2018-06-11 18:01:14')

In [75]: tic = pd.offsets.Hour(n=2)

In [76]: ts + tic + tic + tic == ts + (tic + tic + tic)
Out[76]: True
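
Code that relied on normalize=True can apply the normalization as an explicit step instead. This is a minimal sketch, assuming midnight truncation after the shift is the intended result; Timestamp.normalize() is used here as a replacement and is not mentioned in the original notes.

```python
import pandas as pd

ts = pd.Timestamp("2018-06-11 18:01:14")

try:
    pd.offsets.Hour(n=2, normalize=True)
    raised = False
except ValueError:
    # Tick offsets reject normalize=True.
    raised = True

# Shift first, then truncate to midnight explicitly.
shifted_then_normalized = (ts + pd.offsets.Hour(n=2)).normalize()

print(raised, shifted_then_normalized)
```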

Period subtraction

Subtraction of a Period from another Period will give a DateOffset instead of an integer (GH 21314)

Previous behavior:

In [2]: june = pd.Period('June 2018')

In [3]: april = pd.Period('April 2018')

In [4]: june - april
Out [4]: 2

New behavior:

In [77]: june = pd.Period('June 2018')

In [78]: april = pd.Period('April 2018')

In [79]: june - april
Out[79]: <2 * MonthEnds>

Similarly, subtraction of a Period from a PeriodIndex will now return an Index of DateOffset objects instead of an Int64Index

Previous behavior:

In [2]: pi = pd.period_range('June 2018', freq='M', periods=3)

In [3]: pi - pi[0]
Out[3]: Int64Index([0, 1, 2], dtype='int64')

New behavior:

In [80]: pi = pd.period_range('June 2018', freq='M', periods=3)

In [81]: pi - pi[0]
Out[81]: Index([<0 * MonthEnds>, <MonthEnd>, <2 * MonthEnds>], dtype='object')
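
Code that needs the old integer result can read it from the n attribute of the returned offset. This is a short sketch with illustrative periods; the variable names are hypothetical.

```python
import pandas as pd

june = pd.Period("2018-06", freq="M")
april = pd.Period("2018-04", freq="M")

# The difference is a DateOffset; .n holds the number of periods.
months_apart = (june - april).n

pi = pd.period_range("2018-06", freq="M", periods=3)

# Element-wise: extract the integer count from each offset.
steps = [offset.n for offset in (pi - pi[0])]

print(months_apart, steps)
```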

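Addition/subtraction of `NaN` from `DataFrame`

Adding or subtracting NaN from a DataFrame column with timedelta64[ns] dtype will now raise a TypeError instead of returning all-NaT. This is for compatibility with TimedeltaIndex and Series behavior (GH 22163)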

In [82]: df = pd.DataFrame([pd.Timedelta(days=1)])

In [83]: df
Out[83]: 
       0
0 1 days

[1 rows x 1 columns]

Previous behavior:

In [4]: df = pd.DataFrame([pd.Timedelta(days=1)])

In [5]: df - np.nan
Out[5]:
    0
0 NaT

New behavior:

In [2]: df - np.nan
...
TypeError: unsupported operand type(s) for -: 'TimedeltaIndex' and 'float'

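DataFrame comparison operations broadcasting changes

Previously, the broadcasting behavior of DataFrame comparison operations (==, !=, …) was inconsistent with the behavior of arithmetic operations (+, -, …). The behavior of the comparison operations has been changed to match the arithmetic operations in these cases. (GH 22880)

The affected cases are:

Operating against a 2-dimensional np.ndarray with either 1 row or 1 column will now broadcast the same way an np.ndarray would (GH 23000).
A list or tuple with length matching the number of rows in the DataFrame will now raise ValueError instead of operating column-by-column (GH 22880).
A list or tuple with length matching the number of columns in the DataFrame will now operate row-by-row instead of raising ValueError (GH 22880).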

In [84]: arr = np.arange(6).reshape(3, 2)

In [85]: df = pd.DataFrame(arr)

In [86]: df
Out[86]: 
   0  1
0  0  1
1  2  3
2  4  5

[3 rows x 2 columns]

Previous behavior:

In [5]: df == arr[[0], :]
    ...: # comparison previously broadcast where arithmetic would raise
Out[5]:
       0      1
0   True   True
1  False  False
2  False  False
In [6]: df + arr[[0], :]
...
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (1, 2)

In [7]: df == (1, 2)
    ...: # length matches number of columns;
    ...: # comparison previously raised where arithmetic would broadcast
...
ValueError: Invalid broadcasting comparison [(1, 2)] with block values
In [8]: df + (1, 2)
Out[8]:
   0  1
0  1  3
1  3  5
2  5  7

In [9]: df == (1, 2, 3)
    ...:  # length matches number of rows
    ...:  # comparison previously broadcast where arithmetic would raise
Out[9]:
       0      1
0  False   True
1   True  False
2  False  False
In [10]: df + (1, 2, 3)
...
ValueError: Unable to coerce to Series, length must be 2: given 3

New behavior:

# Comparison operations and arithmetic operations both broadcast.
In [87]: df == arr[[0], :]
Out[87]: 
       0      1
0   True   True
1  False  False
2  False  False

[3 rows x 2 columns]

In [88]: df + arr[[0], :]
Out[88]: 
   0  1
0  0  2
1  2  4
2  4  6

[3 rows x 2 columns]

# Comparison operations and arithmetic operations both broadcast.
In [89]: df == (1, 2)
Out[89]: 
       0      1
0  False  False
1  False  False
2  False  False

[3 rows x 2 columns]

In [90]: df + (1, 2)
Out[90]: 
   0  1
0  1  3
1  3  5
2  5  7

[3 rows x 2 columns]

# Comparison operations and arithmetic operations both raise ValueError.
In [6]: df == (1, 2, 3)
...
ValueError: Unable to coerce to Series, length must be 2: given 3

In [7]: df + (1, 2, 3)
...
ValueError: Unable to coerce to Series, length must be 2: given 3

DataFrame arithmetic operations broadcasting changes

DataFrame arithmetic operations with 2-dimensional np.ndarray objects now broadcast in the same way as np.ndarray broadcasting. (GH 23000)

In [91]: arr = np.arange(6).reshape(3, 2)

In [92]: df = pd.DataFrame(arr)

In [93]: df
Out[93]: 
   0  1
0  0  1
1  2  3
2  4  5

[3 rows x 2 columns]

Previous behavior:

In [5]: df + arr[[0], :]   # 1 row, 2 columns
...
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (1, 2)
In [6]: df + arr[:, [1]]   # 1 column, 3 rows
...
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (3, 1)

New behavior:

In [94]: df + arr[[0], :]   # 1 row, 2 columns
Out[94]: 
   0  1
0  0  2
1  2  4
2  4  6

[3 rows x 2 columns]

In [95]: df + arr[:, [1]]   # 1 column, 3 rows
Out[95]: 
   0   1
0  1   2
1  5   6
2  9  10

[3 rows x 2 columns]

Series and Index data-dtype incompatibilities

Series and Index constructors now raise when the data is incompatible with a passed dtype= (GH 15832)

Previous behavior:

In [4]: pd.Series([-1], dtype="uint64")
Out [4]:
0    18446744073709551615
dtype: uint64

New behavior:

In [4]: pd.Series([-1], dtype="uint64")
...
OverflowError: Trying to coerce negative values to unsigned integers

Concatenation changes

Calling pandas.concat() on a Categorical of ints with NA values now causes them to be processed as objects when concatenating with anything other than another Categorical of ints (GH 19214)

In [96]: s = pd.Series([0, 1, np.nan])

In [97]: c = pd.Series([0, 1, np.nan], dtype="category")

Previous behavior

In [3]: pd.concat([s, c])
Out[3]:
0    0.0
1    1.0
2    NaN
0    0.0
1    1.0
2    NaN
dtype: float64

New behavior

In [98]: pd.concat([s, c])
Out[98]: 
0    0.0
1    1.0
2    NaN
0    0.0
1    1.0
2    NaN
Length: 6, dtype: float64

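Datetimelike API changes

For DatetimeIndex and TimedeltaIndex with a non-None freq attribute, addition or subtraction of an integer-dtyped array or Index will return an object of the same class (GH 19959)
DateOffset objects are now immutable. Attempting to alter one of these will now raise AttributeError (GH 21341)
PeriodIndex subtraction of another PeriodIndex will now return an object-dtype Index of DateOffset objects instead of raising a TypeError (GH 20049)
cut() and qcut() now return DatetimeIndex or TimedeltaIndex bins when the input is of datetime or timedelta dtype respectively and retbins=True (GH 19891)
DatetimeIndex.to_period() and Timestamp.to_period() will issue a warning when timezone information will be lost (GH 21333)
PeriodIndex.tz_convert() and PeriodIndex.tz_localize() have been removed (GH 21781)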

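Other API changes#

A newly constructed empty DataFrame with integer as the dtype will now only be cast to float64 if index is specified (GH 22858)
Series.str.cat() will now raise if others is a set (GH 23009)
Passing scalar values to DatetimeIndex or TimedeltaIndex will now raise TypeError instead of ValueError (GH 23539)
The max_rows and max_cols parameters have been removed from HTMLFormatter since truncation is handled by DataFrameFormatter (GH 23818)
read_csv() will now raise a ValueError if a column with missing values is declared as having dtype bool (GH 20591)
The column order of the resultant DataFrame from MultiIndex.to_frame() is now guaranteed to match the MultiIndex.names order. (GH 22420)
Incorrectly passing a DatetimeIndex to MultiIndex.from_tuples(), rather than a sequence of tuples, now raises a TypeError rather than a ValueError (GH 24024)
The pd.offsets.generate_range() argument time_rule has been removed; use offset instead (GH 24157)
In 0.23.x, pandas would raise a ValueError on a merge of a numeric column (e.g. an int dtyped column) and an object dtyped column (GH 9780). We have re-enabled the ability to merge object and other dtypes; pandas will still raise on a merge between a numeric column and an object dtyped column that is composed only of strings (GH 21681)
Accessing a level of a MultiIndex with a duplicate name (e.g. in get_level_values()) now raises a ValueError instead of a KeyError (GH 21678).
Invalid construction of IntervalDtype will now always raise a TypeError rather than a ValueError if the subdtype is invalid (GH 21185)
Trying to reindex a DataFrame with a non-unique MultiIndex now raises a ValueError instead of an Exception (GH 21770)
Index subtraction will attempt to operate element-wise instead of raising TypeError (GH 19369)
pandas.io.formats.style.Styler supports a number-format property when using to_excel() (GH 22015)
DataFrame.corr() and Series.corr() now raise a ValueError along with a helpful error message instead of a KeyError when supplied with an invalid method (GH 22298)
shift() will now always return a copy, instead of the previous behaviour of returning self when shifting by 0 (GH 22397)
DataFrame.set_index() now gives a better (and less frequent) KeyError, raises a ValueError for incorrect types, and will not fail on duplicate column names with drop=True. (GH 22484)
Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (GH 22784)
The DateOffset attribute _cacheable and method _should_cache have been removed (GH 23118)
Series.searchsorted(), when supplied a scalar value to search for, now returns a scalar instead of an array (GH 23801).
Categorical.searchsorted(), when supplied a scalar value to search for, now returns a scalar instead of an array (GH 23466).
Categorical.searchsorted() now raises a KeyError rather than a ValueError if a searched-for key is not found in its categories (GH 23466).
Index.hasnans() and Series.hasnans() now always return a Python boolean. Previously, a Python or a NumPy boolean could be returned, depending on circumstances (GH 23294).
The order of the arguments of DataFrame.to_html() and DataFrame.to_string() has been rearranged to be consistent with each other. (GH 23614)
CategoricalIndex.reindex() now raises a ValueError if the target index is non-unique and not equal to the current index. It previously only raised if the target index was not of a categorical dtype (GH 23963).
Series.to_list() and Index.to_list() are now aliases of Series.tolist and Index.tolist respectively (GH 8826)
The result of SparseSeries.unstack is now a DataFrame with sparse values, rather than a SparseDataFrame (GH 24372).
DatetimeIndex and TimedeltaIndex no longer ignore the dtype precision. Passing a non-nanosecond resolution dtype will raise a ValueError (GH 24753)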
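
A few of these changes are easy to check interactively. The sketch below, using an arbitrary three-element Series, exercises the scalar return of searchsorted(), the copy returned by shift(0), the to_list() alias and the Python bool returned by hasnans.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

pos = s.searchsorted(2)   # scalar in, scalar out
shifted = s.shift(0)      # a new object, not s itself
as_list = s.to_list()     # alias of s.tolist()
flag = s.hasnans          # plain Python bool
```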

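Extension type changes#

Equality and hashability

pandas now requires that extension dtypes be hashable (i.e. the respective ExtensionDtype objects; hashability is not a requirement for the values of the corresponding ExtensionArray). The base class implements a default __eq__ and __hash__. If you have a parametrized dtype, you should update the ExtensionDtype._metadata tuple to match the signature of your __init__ method. See pandas.api.extensions.ExtensionDtype for more (GH 22476).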
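
To illustrate what hashability buys in practice, here is a minimal sketch using dtypes that ship with pandas rather than a custom ExtensionDtype. Two equal parametrized dtypes hash identically, so either can be used to look up the same dictionary entry. The dictionary and its values are purely illustrative.

```python
import pandas as pd

cat_a = pd.CategoricalDtype(["a", "b"], ordered=True)
cat_b = pd.CategoricalDtype(["a", "b"], ordered=True)

# Equal dtypes must hash equally, so they are interchangeable as dict keys.
registry = {cat_a: "ordered categories", pd.Int64Dtype(): "nullable integer"}
lookup = registry[cat_b]
```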

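New and changed methods

dropna() has been added (GH 21185)
repeat() has been added (GH 24349)
The ExtensionArray constructor, _from_sequence, now takes the keyword argument copy=False (GH 21185)
pandas.api.extensions.ExtensionArray.shift() has been added as part of the basic ExtensionArray interface (GH 22387).
searchsorted() has been added (GH 24350)
Support for reduction operations such as sum and mean via opt-in base class method override (GH 22762)
ExtensionArray.isna() is allowed to return an ExtensionArray (GH 22325).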

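Dtype changes

ExtensionDtype has gained the ability to instantiate from string dtypes, e.g. decimal would instantiate a registered DecimalDtype; furthermore ExtensionDtype has gained the method construct_array_type (GH 21185)
Added ExtensionDtype._is_numeric for controlling whether an extension dtype is considered numeric (GH 22290).
Added pandas.api.types.register_extension_dtype() to register an extension type with pandas (GH 22664)
Updated the .type attribute of PeriodDtype, DatetimeTZDtype and IntervalDtype to be the scalar types of the dtype (Period, Timestamp and Interval respectively) (GH 22938)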

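Operator support

A Series based on an ExtensionArray now supports arithmetic and comparison operators (GH 19577). There are two approaches for providing operator support for an ExtensionArray:

Define each of the operators on your ExtensionArray subclass.
Use an operator implementation from pandas that depends on operators that are already defined on the underlying elements (scalars) of the ExtensionArray.

See the ExtensionArray operator support documentation section for details on both ways of adding operator support.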

Other changes

A default repr for pandas.api.extensions.ExtensionArray is now provided (GH 23601).
ExtensionArray._formatting_values() is deprecated. Use ExtensionArray._formatter instead. (GH 23601)
An ExtensionArray with a boolean dtype now works correctly as a boolean indexer. pandas.api.types.is_bool_dtype() now properly considers them boolean (GH 22326)

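Bug fixes

Bug in Series.get() for a Series using an ExtensionArray and an integer index (GH 21257)
shift() now dispatches to ExtensionArray.shift() (GH 22386)
Series.combine() works correctly with an ExtensionArray inside of a Series (GH 20825)
Series.combine() with a scalar argument now works for any function type (GH 21248)
Series.astype() and DataFrame.astype() now dispatch to ExtensionArray.astype() (GH 21185).
Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (GH 22784)
Bug when concatenating multiple Series with different extension dtypes not casting to object dtype (GH 22994)
A Series backed by an ExtensionArray now works with util.hash_pandas_object() (GH 23066)
DataFrame.stack() no longer converts to object dtype for DataFrames where each column has the same extension dtype. The output Series will have the same dtype as the columns (GH 23077).
Series.unstack() and DataFrame.unstack() no longer convert extension arrays to object-dtype ndarrays. Each column in the output DataFrame will now have the same dtype as the input (GH 23077).
Bug when grouping with DataFrame.groupby() and aggregating on an ExtensionArray, where the actual ExtensionArray dtype was not returned (GH 23227).
Bug in pandas.merge() when merging on an extension array-backed column (GH 23020).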

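Deprecations#

MultiIndex.labels has been deprecated and replaced by MultiIndex.codes. The functionality is unchanged. The new name better reflects the nature of these codes and makes the MultiIndex API more similar to the API for CategoricalIndex (GH 13443). As a consequence, other uses of the name labels in MultiIndex have also been deprecated and replaced with codes:
- You should initialize a MultiIndex instance using a parameter named codes rather than labels.
- MultiIndex.set_labels has been deprecated in favor of MultiIndex.set_codes().
- For the method MultiIndex.copy(), the labels parameter has been deprecated and replaced by a codes parameter.
DataFrame.to_stata(), read_stata(), StataReader and StataWriter have deprecated the encoding argument. The encoding of a Stata dta file is determined by the file type and cannot be changed (GH 21244)
MultiIndex.to_hierarchical() is deprecated and will be removed in a future version (GH 21613)
Series.ptp() is deprecated. Use numpy.ptp instead (GH 21614)
Series.compress() is deprecated. Use Series[condition] instead (GH 18262)
The signature of Series.to_csv() has been made uniform with that of DataFrame.to_csv(): the name of the first argument is now path_or_buf, the order of subsequent arguments has changed, and the header argument now defaults to True. (GH 19715)
Categorical.from_codes() has deprecated providing float values for the codes argument. (GH 21767)
pandas.read_table() is deprecated. Instead, use read_csv(), passing sep='\t' if necessary. This deprecation has been removed in 0.25.0. (GH 21948)
Series.str.cat() has deprecated using arbitrary list-likes within list-likes. A list-like container may still contain many Series, Index or 1-dimensional np.ndarray, or alternatively, only scalar values. (GH 21950)
FrozenNDArray.searchsorted() has deprecated the v parameter in favor of value (GH 14645)
DatetimeIndex.shift() and PeriodIndex.shift() now accept a periods argument instead of n for consistency with Index.shift() and Series.shift(). Using n throws a deprecation warning (GH 22458, GH 22912)
The fastpath keyword of the different Index constructors is deprecated (GH 23110).
Timestamp.tz_localize(), DatetimeIndex.tz_localize() and Series.tz_localize() have deprecated the errors argument in favor of the nonexistent argument (GH 8917)
The class FrozenNDArray has been deprecated. When unpickling, FrozenNDArray will be unpickled to np.ndarray once this class is removed (GH 9031)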
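The methods DataFrame.update() and Panel.update() have deprecated the raise_conflict=False|True keyword in favor of errors='ignore'|'raise' (GH 23585)
The methods Series.str.partition() and Series.str.rpartition() have deprecated the pat keyword in favor of sep (GH 22676)
Deprecated the nthreads keyword of pandas.read_feather() in favor of use_threads to reflect the changes in pyarrow>=0.11.0. (GH 23053)
pandas.read_excel() has deprecated accepting usecols as an integer. Please pass in a list of ints from 0 to usecols inclusive instead (GH 23527)
Constructing a TimedeltaIndex from data with datetime64-dtyped data is deprecated and will raise TypeError in a future version (GH 23539)
Constructing a DatetimeIndex from data with timedelta64-dtyped data is deprecated and will raise TypeError in a future version (GH 23675)
The keep_tz=False option (the default) of the keep_tz keyword of DatetimeIndex.to_series() is deprecated (GH 17832).
Timezone converting a tz-aware datetime.datetime or Timestamp with Timestamp and the tz argument is now deprecated. Instead, use Timestamp.tz_convert() (GH 23579)
pandas.api.types.is_period() is deprecated in favor of pandas.api.types.is_period_dtype (GH 23917)
pandas.api.types.is_datetimetz() is deprecated in favor of pandas.api.types.is_datetime64tz_dtype (GH 23917)
Creating a TimedeltaIndex, DatetimeIndex or PeriodIndex by passing the range arguments start, end and periods is deprecated in favor of timedelta_range(), date_range() or period_range() (GH 23919)
Passing a string alias like 'datetime64[ns, UTC]' as the unit parameter to DatetimeTZDtype is deprecated. Use DatetimeTZDtype.construct_from_string instead (GH 23990).
The default value of the skipna parameter of infer_dtype() will switch to True in a future version of pandas (GH 17066, GH 24050)
In Series.where() with Categorical data, providing an other that is not present in the categories is deprecated. Convert the categorical to a different dtype or add the other to the categories first (GH 24077).
Series.clip_lower(), Series.clip_upper(), DataFrame.clip_lower() and DataFrame.clip_upper() are deprecated and will be removed in a future version. Use Series.clip(lower=threshold), Series.clip(upper=threshold) and the equivalent DataFrame methods (GH 24203)
Series.nonzero() is deprecated and will be removed in a future version (GH 18262)
Passing an integer to Series.fillna() and DataFrame.fillna() with timedelta64[ns] dtypes is deprecated and will raise TypeError in a future version. Use obj.fillna(pd.Timedelta(...)) instead (GH 24694)
Series.cat.categorical, Series.cat.name and Series.cat.index have been deprecated. Use the attributes on Series.cat or Series directly. (GH 24751).
Passing a dtype without a precision, like np.dtype('datetime64') or timedelta64, to Index, DatetimeIndex and TimedeltaIndex is now deprecated. Use the nanosecond-precision dtype instead (GH 24753).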
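
The replacements for two of the deprecated spellings can be sketched as follows: a MultiIndex built and modified through codes instead of labels, and Series.clip() with lower or upper in place of clip_lower() and clip_upper(). The data values are arbitrary.

```python
import pandas as pd

# codes replaces labels in the constructor and in set_codes().
mi = pd.MultiIndex(levels=[["a", "b"], [1, 2]], codes=[[0, 0, 1], [0, 1, 1]])
mi2 = mi.set_codes([1, 1, 0], level=0)

# clip(lower=...) and clip(upper=...) replace clip_lower() and clip_upper().
s = pd.Series([1, 5, 10])
floored = s.clip(lower=3)
capped = s.clip(upper=6)
```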

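Integer addition/subtraction with datetimes and timedeltas is deprecated#

In the past, users could, in some cases, add or subtract integers or integer-dtype arrays from Timestamp, DatetimeIndex and TimedeltaIndex.

This usage is now deprecated. Instead, add or subtract integer multiples of the object's freq attribute (GH 21939, GH 23878).

Previous behavior: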

In [5]: ts = pd.Timestamp('1994-05-06 12:15:16', freq=pd.offsets.Hour())
In [6]: ts + 2
Out[6]: Timestamp('1994-05-06 14:15:16', freq='H')

In [7]: tdi = pd.timedelta_range('1D', periods=2)
In [8]: tdi - np.array([2, 1])
Out[8]: TimedeltaIndex(['-1 days', '1 days'], dtype='timedelta64[ns]', freq=None)

In [9]: dti = pd.date_range('2001-01-01', periods=2, freq='7D')
In [10]: dti + pd.Index([1, 2])
Out[10]: DatetimeIndex(['2001-01-08', '2001-01-22'], dtype='datetime64[ns]', freq=None)

New behavior:

In [108]: ts = pd.Timestamp('1994-05-06 12:15:16', freq=pd.offsets.Hour())

In [109]: ts + 2 * ts.freq
Out[109]: Timestamp('1994-05-06 14:15:16', freq='H')

In [110]: tdi = pd.timedelta_range('1D', periods=2)

In [111]: tdi - np.array([2 * tdi.freq, 1 * tdi.freq])
Out[111]: TimedeltaIndex(['-1 days', '1 days'], dtype='timedelta64[ns]', freq=None)

In [112]: dti = pd.date_range('2001-01-01', periods=2, freq='7D')

In [113]: dti + pd.Index([1 * dti.freq, 2 * dti.freq])
Out[113]: DatetimeIndex(['2001-01-08', '2001-01-22'], dtype='datetime64[ns]', freq=None)
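
The same idea can be written without relying on a freq attribute attached to the Timestamp, which may be useful because later pandas releases no longer accept freq in the Timestamp constructor. This is a minimal sketch in which the hourly offset is supplied explicitly, an assumption of the example rather than something the text above prescribes.

```python
import pandas as pd

ts = pd.Timestamp("1994-05-06 12:15:16")
later = ts + 2 * pd.offsets.Hour()   # explicit offset instead of ts + 2

dti = pd.date_range("2001-01-01", periods=2, freq="7D")
shifted = dti + 2 * dti.freq         # every element moves by 14 days
```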

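Passing integer data and a timezone to DatetimeIndex#

The behavior of DatetimeIndex when passed integer data and a timezone is changing in a future version of pandas. Previously, these were interpreted as wall times in the desired timezone. In the future, these will be interpreted as wall times in UTC, which are then converted to the desired timezone (GH 24559).

The default behavior remains the same, but issues a warning: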

In [3]: pd.DatetimeIndex([946684800000000000], tz="US/Central")
/bin/ipython:1: FutureWarning:
    Passing integer-dtype data and a timezone to DatetimeIndex. Integer values
    will be interpreted differently in a future version of pandas. Previously,
    these were viewed as datetime64[ns] values representing the wall time
    *in the specified timezone*. In the future, these will be viewed as
    datetime64[ns] values representing the wall time *in UTC*. This is similar
    to a nanosecond-precision UNIX epoch. To accept the future behavior, use

        pd.to_datetime(integer_data, utc=True).tz_convert(tz)

    To keep the previous behavior, use

        pd.to_datetime(integer_data).tz_localize(tz)

 #!/bin/python3
 Out[3]: DatetimeIndex(['2000-01-01 00:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

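As the warning message explains, opt in to the future behavior by specifying that the integer values are UTC, and then converting to the final timezone: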

In [99]: pd.to_datetime([946684800000000000], utc=True).tz_convert('US/Central')
Out[99]: DatetimeIndex(['1999-12-31 18:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

The old behavior can be retained by localizing directly to the final timezone:

In [100]: pd.to_datetime([946684800000000000]).tz_localize('US/Central')
Out[100]: DatetimeIndex(['2000-01-01 00:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

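Converting timezone-aware Series and Index to NumPy arrays#

The conversion from a Series or Index with timezone-aware datetime data will change to preserve timezones by default (GH 23569).

NumPy does not have a dedicated dtype for timezone-aware datetimes. In the past, converting a Series or DatetimeIndex with timezone-aware datetimes to a NumPy array would:

convert the tz-aware data to UTC
strip the timezone information
return a numpy.ndarray with datetime64[ns] dtype

Future versions of pandas will preserve the timezone information by returning an object-dtype NumPy array where each value is a Timestamp with the correct timezone attached.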

In [101]: ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))

In [102]: ser
Out[102]: 
0   2000-01-01 00:00:00+01:00
1   2000-01-02 00:00:00+01:00
Length: 2, dtype: datetime64[ns, CET]

The default behavior remains the same, but issues a warning:

In [8]: np.asarray(ser)
/bin/ipython:1: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive
      ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray
      with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.

        To accept the future behavior, pass 'dtype=object'.
        To keep the old behavior, pass 'dtype="datetime64[ns]"'.
  #!/bin/python3
Out[8]:
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

The previous or future behavior can be obtained, without any warnings, by specifying the dtype.

Previous behavior

In [103]: np.asarray(ser, dtype='datetime64[ns]')
Out[103]: 
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

Future behavior

# New behavior
In [104]: np.asarray(ser, dtype=object)
Out[104]: 
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
       Timestamp('2000-01-02 00:00:00+0100', tz='CET')], dtype=object)

Or by using Series.to_numpy()

In [105]: ser.to_numpy()
Out[105]: 
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
       Timestamp('2000-01-02 00:00:00+0100', tz='CET')], dtype=object)

In [106]: ser.to_numpy(dtype="datetime64[ns]")
Out[106]: 
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

All the above applies to a DatetimeIndex with tz-aware values as well.
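
For completeness, a short sketch of the DatetimeIndex case: passing an explicit dtype selects either representation unambiguously. The index below mirrors the CET example used for the Series.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000", periods=2, tz="CET")

# Object dtype keeps each value as a tz-aware Timestamp.
as_objects = np.asarray(idx, dtype=object)

# datetime64[ns] converts to UTC and drops the timezone.
as_utc = idx.to_numpy(dtype="datetime64[ns]")
```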

Removal of prior version deprecations/changes#

The LongPanel and WidePanel classes have been removed (GH 10892)
Series.repeat() has renamed the reps argument to repeats (GH 14645)
Several private functions were removed from the (non-public) module pandas.core.common (GH 22001)
Removal of the previously deprecated module pandas.core.datetools (GH 14105, GH 14094)
Strings passed into DataFrame.groupby() that refer to both column and index levels will raise a ValueError (GH 14432)
Index.repeat() and MultiIndex.repeat() have renamed the n argument to repeats (GH 14645)
The Series constructor and .astype method will now raise a ValueError if timestamp dtypes are passed in without a unit (e.g. np.datetime64) for the dtype parameter (GH 15987)
Removal of the previously deprecated as_indexer keyword completely from str.match() (GH 22356, GH 6581)
The modules pandas.types, pandas.computation and pandas.util.decorators have been removed (GH 16157, GH 16250)
Removed the pandas.formats.style shim for pandas.io.formats.style.Styler (GH 16059)
pandas.pnow, pandas.match, pandas.groupby, pd.get_store, pd.Expr and pd.Term have been removed (GH 15538, GH 15940)
Categorical.searchsorted() and Series.searchsorted() have renamed the v argument to value (GH 14645)
pandas.parser, pandas.lib and pandas.tslib have been removed (GH 15537)
Index.searchsorted() has renamed the key argument to value (GH 14645)
DataFrame.consolidate and Series.consolidate have been removed (GH 15501)
Removal of the previously deprecated module pandas.json (GH 19944)
The module pandas.tools has been removed (GH 15358, GH 16005)
SparseArray.get_values() and SparseArray.to_dense() have dropped the fill parameter (GH 14686)
DataFrame.sortlevel and Series.sortlevel have been removed (GH 15099)
SparseSeries.to_dense() has dropped the sparse_only parameter (GH 14686)
DataFrame.astype() and Series.astype() have renamed the raise_on_error argument to errors (GH 14967)
is_sequence, is_any_int_dtype and is_floating_dtype have been removed from pandas.api.types (GH 16163, GH 16189)
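
The renamed keyword arguments listed above can be exercised as in the following sketch. The Series is arbitrary and the calls simply use the new names explicitly.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

doubled = s.repeat(repeats=2)                      # formerly reps
pos = s.searchsorted(value=2)                      # formerly v
idx_pos = pd.Index([1, 2, 3]).searchsorted(value=3)  # formerly key
as_float = s.astype("float64", errors="raise")     # formerly raise_on_error
```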

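Performance improvements#

Slicing Series and DataFrames with a monotonically increasing CategoricalIndex is now very fast and has speed comparable to slicing with an Int64Index. The speed increase applies both when indexing by label (using .loc) and by position (.iloc) (GH 20395). Slicing a monotonically increasing CategoricalIndex itself (i.e. ci[1000:2000]) shows similar speed improvements (GH 21659)
Improved performance of CategoricalIndex.equals() when comparing to another CategoricalIndex (GH 24023)
Improved performance of Series.describe() in case of numeric dtypes (GH 21274)
Improved performance of GroupBy.rank() when dealing with tied rankings (GH 21237)
Improved performance of DataFrame.set_index() with columns consisting of Period objects (GH 21582, GH 21606)
Improved performance of Series.at() and Index.get_value() for extension array values (e.g. Categorical) (GH 24204)
Improved performance of membership checks in Categorical and CategoricalIndex (i.e. x in cat-style checks are much faster). CategoricalIndex.contains() is likewise much faster (GH 21369, GH 21508)
Improved performance of HDFStore.groups() (and dependent functions like HDFStore.keys(), i.e. x in store checks are much faster) (GH 21372)
Improved the performance of pandas.get_dummies() with sparse=True (GH 21997)
Improved performance of IndexEngine.get_indexer_non_unique() for sorted, non-unique indexes (GH 9466)
Improved performance of PeriodIndex.unique() (GH 23083)
Improved performance of concat() for Series objects (GH 23404)
Improved performance of DatetimeIndex.normalize() and Timestamp.normalize() for timezone-naive or UTC datetimes (GH 23634)
Improved performance of DatetimeIndex.tz_localize() and various DatetimeIndex attributes with dateutil UTC timezone (GH 23772)
Fixed a performance regression on Windows with Python 3.7 of read_csv() (GH 23516)
Improved performance of the Categorical constructor for Series objects (GH 23814)
Improved performance of where() for Categorical data (GH 24077)
Improved performance of iterating over a Series. Using DataFrame.itertuples() now creates iterators without internally allocating lists of all elements (GH 20783)
Improved performance of the Period constructor, additionally benefitting PeriodArray and PeriodIndex creation (GH 24084, GH 24118)
Improved performance of tz-aware DatetimeArray binary operations (GH 24491)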

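Bug fixes#

Categorical#

Bug in Categorical.from_codes() where NaN values in codes were silently converted to 0 (GH 21767). In the future this will raise a ValueError. Also changes the behavior of .from_codes([1.1, 2.0]).
Bug in Categorical.sort_values() where NaN values were always positioned in front regardless of the na_position value. (GH 22556).
Bug when indexing with a boolean-valued Categorical. Now a boolean-valued Categorical is treated as a boolean mask (GH 22665)
Constructing a CategoricalIndex with empty values and boolean categories was raising a ValueError after a change to dtype coercion (GH 22702).
Bug in Categorical.take() with a user-provided fill_value not encoding the fill_value, which could result in a ValueError, incorrect results, or a segmentation fault (GH 23296).
In Series.unstack(), specifying a fill_value not present in the categories now raises a TypeError rather than ignoring the fill_value (GH 23284)
Bug when resampling with DataFrame.resample() and aggregating on categorical data, where the categorical dtype was getting lost. (GH 23227)
Bug in many methods of the .str accessor, which always failed on calling the CategoricalIndex.str constructor (GH 23555, GH 23556)
Bug in Series.where() losing the categorical dtype for categorical data (GH 24077)
Bug in Categorical.apply() where NaN values could be handled unpredictably. They now remain unchanged (GH 24241)
Bug in Categorical comparison methods incorrectly raising ValueError when operating against a DataFrame (GH 24630)
Bug in Categorical.set_categories() where setting fewer new categories with rename=True caused a segmentation fault (GH 24675)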
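
The na_position fix for Categorical.sort_values() can be seen with a three-element example. This is only a sketch of the corrected behavior, with arbitrary category values.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", np.nan, "a"])

first = cat.sort_values(na_position="first")  # missing value leads
last = cat.sort_values(na_position="last")    # missing value trails
```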

Datetimelike#

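Fixed bug where two DateOffset objects with different normalize attributes could evaluate as equal (GH 21404)
Fixed bug where Timestamp.resolution() incorrectly returned a 1-microsecond timedelta instead of a 1-nanosecond Timedelta (GH 21336, GH 21365)
Bug in to_datetime() that did not consistently return an Index when box=True was specified (GH 21864)
Bug in DatetimeIndex comparisons where string comparisons incorrectly raised TypeError (GH 22074)
Bug in DatetimeIndex comparisons when comparing against timedelta64[ns] dtyped arrays; in some cases TypeError was incorrectly raised, in others it incorrectly failed to raise (GH 22074)
Bug in DatetimeIndex comparisons when comparing against object-dtyped arrays (GH 22074)
Bug in DataFrame with datetime64[ns] dtype addition and subtraction with Timedelta-like objects (GH 22005, GH 22163)
Bug in DataFrame with datetime64[ns] dtype addition and subtraction with DateOffset objects returning an object dtype instead of datetime64[ns] dtype (GH 21610, GH 22163)
Bug in DataFrame with datetime64[ns] dtype comparing against NaT incorrectly (GH 22242, GH 22163)
Bug in DataFrame with datetime64[ns] dtype subtracting a Timestamp-like object incorrectly returned datetime64[ns] dtype instead of timedelta64[ns] dtype (GH 8554, GH 22163)
Bug in DataFrame with datetime64[ns] dtype subtracting an np.datetime64 object with non-nanosecond unit failing to convert to nanoseconds (GH 18874, GH 22163)
Bug in DataFrame comparisons against Timestamp-like objects failing to raise TypeError for inequality checks with mismatched types (GH 8932, GH 22163)
Bug in DataFrame with mixed dtypes including datetime64[ns] incorrectly raising TypeError on equality comparisons (GH 13128, GH 22163)
Bug in DataFrame.values returning a DatetimeIndex for a single-column DataFrame with tz-aware datetime values. Now a 2-D numpy.ndarray of Timestamp objects is returned (GH 24024)
Bug in DataFrame.eq() comparison against NaT incorrectly returning True or NaN (GH 15697, GH 22163)
Bug in DatetimeIndex subtraction that incorrectly failed to raise OverflowError (GH 22492, GH 22508)
Bug in DatetimeIndex incorrectly allowing indexing with a Timedelta object (GH 20464)
Bug in DatetimeIndex where frequency was being set if the original frequency was None (GH 22150)
Bug in rounding methods of DatetimeIndex (round(), ceil(), floor()) and Timestamp (round(), ceil(), floor()) that could give rise to loss of precision (GH 22591)
Bug in to_datetime() with an Index argument that would drop the name from the result (GH 21697)
Bug in PeriodIndex where adding or subtracting a timedelta or Tick object produced incorrect results (GH 22988)
Bug in the Series repr with period-dtype data missing a space before the data (GH 23601)
Bug in date_range() when decrementing a start date to a past end date by a negative frequency (GH 23270)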
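Bug in Series.min() which would return NaN instead of NaT when called on a series of NaT (GH 23282)
Bug in Series.combine_first() not properly aligning categoricals, so that missing values in self were not filled by valid values from other (GH 24147)
Bug in DataFrame.combine() with datetimelike values raising a TypeError (GH 23079)
Bug in date_range() with frequency of Day or higher where dates sufficiently far in the future could wrap around to the past instead of raising OutOfBoundsDatetime (GH 14187)
Bug in period_range() ignoring the frequency of start and end when those are provided as Period objects (GH 20535).
Bug in PeriodIndex with attribute freq.n greater than 1 where adding a DateOffset object would return incorrect results (GH 23215)
Bug in Series that interpreted string indices as lists of characters when setting datetimelike values (GH 23451)
Bug in DataFrame when creating a new column from an ndarray of Timestamp objects with timezones creating an object-dtype column, rather than datetime with timezone (GH 23932)
Bug in the Timestamp constructor which would drop the frequency of an input Timestamp (GH 22311)
Bug in DatetimeIndex where calling np.array(dtindex, dtype=object) would incorrectly return an array of long objects (GH 23524)
Bug in Index where passing a timezone-aware DatetimeIndex and dtype=object would incorrectly raise a ValueError (GH 23524)
Bug in Index where calling np.array(dtindex, dtype=object) on a timezone-naive DatetimeIndex would return an array of datetime objects instead of Timestamp objects, potentially losing nanosecond portions of the timestamps (GH 23524)
Bug in Categorical.__setitem__ not allowing setting with another Categorical when both are unordered and have the same categories, but in a different order (GH 24142)
Bug in date_range() where using dates with millisecond resolution or higher could return incorrect values or the wrong number of values in the index (GH 24110)
Bug in DatetimeIndex where constructing a DatetimeIndex from a Categorical or CategoricalIndex would incorrectly drop timezone information (GH 18664)
Bug in DatetimeIndex and TimedeltaIndex where indexing with Ellipsis would incorrectly lose the index's freq attribute (GH 21282)
A clear error message is now produced when a wrong freq argument is passed to DatetimeIndex with NaT as the first entry in the passed data (GH 11587)
Bug in to_datetime() where the box and utc arguments were ignored when passing a DataFrame or dict of unit mappings (GH 23760)
Bug in Series.dt where the cache would not update properly after an in-place operation (GH 24408)
Bug in PeriodIndex where comparisons against an array-like object with length 1 failed to raise ValueError (GH 23078)
Bug in DatetimeIndex.astype(), PeriodIndex.astype() and TimedeltaIndex.astype() ignoring the sign of the dtype for unsigned integer dtypes (GH 24405).
Fixed bug in Series.max() with datetime64[ns]-dtype failing to return NaT when nulls are present and skipna=False is passed (GH 24265)
Bug in to_datetime() where arrays of datetime objects containing both timezone-aware and timezone-naive datetimes would fail to raise ValueError (GH 24569)
Bug in to_datetime() with an invalid datetime format that did not coerce input to NaT even if errors='coerce' (GH 24763)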
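
Three of the corrected behaviors above are sketched below: the minimum of an all-NaT Series, coercion of an unparseable string to NaT, and subtraction of a Timestamp from a datetime-typed DataFrame producing timedelta data. The sample dates are arbitrary.

```python
import pandas as pd

# min() of an all-NaT Series is NaT, not NaN.
all_nat = pd.Series([pd.NaT, pd.NaT])
smallest = all_nat.min()

# errors="coerce" turns an unparseable entry into NaT.
parsed = pd.to_datetime(["2019-01-01", "not a date"], errors="coerce")

# datetime DataFrame minus Timestamp gives timedelta data.
df = pd.DataFrame({"when": pd.to_datetime(["2019-01-02", "2019-01-03"])})
elapsed = df - pd.Timestamp("2019-01-01")
```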

Timedelta#

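Bug in DataFrame with timedelta64[ns] dtype division by a Timedelta-like scalar incorrectly returning timedelta64[ns] dtype instead of float64 dtype (GH 20088, GH 22163)
Bug in adding an Index with object dtype to a Series with timedelta64[ns] dtype incorrectly raising (GH 22390)
Bug in multiplying a Series with numeric dtype against a timedelta object (GH 22390)
Bug in Series with numeric dtype when adding or subtracting an array or Series with timedelta64 dtype (GH 22390)
Bug in Index with numeric dtype when multiplying or dividing an array with dtype timedelta64 (GH 22390)
Bug in TimedeltaIndex incorrectly allowing indexing with a Timestamp object (GH 20464)
Fixed bug where subtracting a Timedelta from an object-dtyped array would raise TypeError (GH 21980)
Fixed bug in adding a DataFrame with all-timedelta64[ns] dtypes to a DataFrame with all-integer dtypes returning incorrect results instead of raising TypeError (GH 22696)
Bug in TimedeltaIndex where adding a timezone-aware datetime scalar incorrectly returned a timezone-naive DatetimeIndex (GH 23215)
Bug in TimedeltaIndex where adding np.timedelta64('NaT') incorrectly returned an all-NaT DatetimeIndex instead of an all-NaT TimedeltaIndex (GH 23215)
Bug in Timedelta and to_timedelta() having inconsistencies in supported unit strings (GH 21762)
Bug in TimedeltaIndex division where dividing by another TimedeltaIndex raised TypeError instead of returning a Float64Index (GH 23829, GH 22631)
Bug in TimedeltaIndex comparison operations where comparing against non-Timedelta-like objects would raise TypeError instead of returning all-False for __eq__ and all-True for __ne__ (GH 24056)
Bug in Timedelta comparisons when comparing with a Tick object incorrectly raising TypeError (GH 24710)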
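
The division fixes can be checked with a short sketch: dividing one TimedeltaIndex by another, or a timedelta column by a Timedelta scalar, yields plain floats. The durations are arbitrary.

```python
import pandas as pd

num = pd.to_timedelta(["1D", "2D"])
den = pd.to_timedelta(["12h", "1D"])
ratio = num / den                  # float result, not a TypeError

df = pd.DataFrame({"span": num})
in_days = df / pd.Timedelta("1D")  # float64 column, not timedelta64[ns]
```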

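Timezones#

Bug in Index.shift() where an AssertionError would raise when shifting across DST (GH 8616)
Bug in the Timestamp constructor where passing an invalid timezone offset designator (Z) would not raise a ValueError (GH 8910)
Bug in Timestamp.replace() where replacing at a DST boundary would retain an incorrect offset (GH 7825)
Bug in Series.replace() with datetime64[ns, tz] data when replacing NaT (GH 11792)
Bug in Timestamp when passing different string date formats with a timezone offset would produce different timezone offsets (GH 12064)
Bug when comparing a tz-naive Timestamp to a tz-aware DatetimeIndex which would coerce the DatetimeIndex to tz-naive (GH 12601)
Bug in Series.truncate() with a tz-aware DatetimeIndex which would cause a core dump (GH 9243)
Bug in the Series constructor which would coerce tz-aware and tz-naive Timestamp to tz-aware (GH 13051)
Bug in Index with datetime64[ns, tz] dtype that did not localize integer data correctly (GH 20964)
Bug in DatetimeIndex where constructing with an integer and tz would not localize correctly (GH 12619)
Fixed bug where DataFrame.describe() and Series.describe() on tz-aware datetimes did not show first and last results (GH 21328)
Bug in DatetimeIndex comparisons failing to raise TypeError when comparing a timezone-aware DatetimeIndex against np.datetime64 (GH 22074)
Bug in DataFrame assignment with a timezone-aware scalar (GH 19843)
Bug in DataFrame.asof() that raised a TypeError when attempting to compare tz-naive and tz-aware timestamps (GH 21194)
Bug when constructing a DatetimeIndex with a Timestamp constructed with the replace method across DST (GH 18785)
Bug when setting a new value with DataFrame.loc() with a DatetimeIndex with a DST transition (GH 18308, GH 20724)
Bug in Index.unique() that did not re-localize tz-aware dates correctly (GH 21737)
Bug when indexing a Series with a DST transition (GH 21846)
Bug in DataFrame.resample() and Series.resample() where an AmbiguousTimeError or NonExistentTimeError would raise if a timezone-aware time series ended on a DST transition (GH 19375, GH 10117)
Bug in DataFrame.drop() and Series.drop() when specifying a tz-aware Timestamp key to drop from a DatetimeIndex with a DST transition (GH 21761)
Bug in the DatetimeIndex constructor where NaT and dateutil.tz.tzlocal would raise an OutOfBoundsDatetime error (GH 23807)
Bug in DatetimeIndex.tz_localize() and Timestamp.tz_localize() with dateutil.tz.tzlocal near a DST transition that would return an incorrectly localized datetime (GH 23807)
Bug in the Timestamp constructor where a dateutil.tz.tzutc timezone passed with a datetime.datetime argument would be converted to a pytz.UTC timezone (GH 23807)
Bug in to_datetime() where utc=True was not respected when specifying a unit and errors='ignore' (GH 23758)
Bug in to_datetime() where utc=True was not respected when passing a Timestamp (GH 24415)
Bug in DataFrame.any() returning the wrong value when axis=1 and the data is of datetimelike type (GH 23070)
Bug in DatetimeIndex.to_period() where a timezone-aware index was converted to UTC first before creating the PeriodIndex (GH 22905)
Bug in DataFrame.tz_localize(), DataFrame.tz_convert(), Series.tz_localize() and Series.tz_convert() where copy=False would mutate the original argument in place (GH 6326)
Bug in DataFrame.max() and DataFrame.min() with axis=1 where a Series with NaN would be returned when all columns contained the same timezone (GH 10390)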

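Offsets#

Bug in FY5253 where date offsets could incorrectly raise an AssertionError in arithmetic operations (GH 14774)
Bug in DateOffset where the keyword arguments week and milliseconds were accepted and ignored. Passing these will now raise ValueError (GH 19398)
Bug in adding a DateOffset with a DataFrame or PeriodIndex incorrectly raising TypeError (GH 23215)
Bug in comparing DateOffset objects with non-DateOffset objects, particularly strings, raising ValueError instead of returning False for equality checks and True for not-equal checks (GH 23524)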

Numeric#

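Bug in Series __rmatmul__ not supporting matrix vector multiplication (GH 21530)
Bug in factorize() failing with a read-only array (GH 12813)
Fixed bug in unique() handling signed zeros inconsistently: for some inputs 0.0 and -0.0 were treated as equal and for some inputs as different. Now they are treated as equal for all inputs (GH 21866)
Bug in DataFrame.agg(), DataFrame.transform() and DataFrame.apply() where, when supplied with a list of functions and axis=1 (e.g. df.apply(['sum', 'mean'], axis=1)), a TypeError was wrongly raised. For all three methods such a calculation is now done correctly. (GH 16679).
Bug in Series comparison against datetime-like scalars and arrays (GH 22074)
Bug in DataFrame multiplication between boolean dtype and integer returning object dtype instead of integer dtype (GH 22047, GH 22163)
Bug in DataFrame.apply() where, when supplied with a string argument and additional positional or keyword arguments (e.g. df.apply('sum', min_count=1)), a TypeError was wrongly raised (GH 22376)
Bug in DataFrame.astype() to an extension dtype that could raise AttributeError (GH 22578)
Bug in DataFrame with timedelta64[ns] dtype arithmetic operations with an ndarray of integer dtype incorrectly treating the ndarray as timedelta64[ns] dtype (GH 23114)
Bug in Series.rpow() with object dtype returning NaN for 1 ** NA instead of 1 (GH 22922).
Series.agg() can now handle NumPy NaN-aware methods like numpy.nansum() (GH 19629)
Bug in Series.rank() and DataFrame.rank() when pct=True and more than 2²⁴ rows are present resulting in percentages greater than 1.0 (GH 18271)
Calls such as DataFrame.round() with a non-unique CategoricalIndex() now return the expected data. Previously, data would be improperly duplicated (GH 21809).
Added log10, floor and ceil to the list of supported functions in DataFrame.eval() (GH 24139, GH 24353)
Logical operations &, |, ^ between a Series and an Index will no longer raise ValueError (GH 22092)
Checking PEP 3141 numbers with the is_scalar() function now returns True (GH 22903)
Reduction methods like Series.sum() now accept the default value of keepdims=False when called from a NumPy ufunc, rather than raising a TypeError. Full support for keepdims has not been implemented (GH 24356).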
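
As a sketch of two of the fixes above, the snippet below applies a list of functions row-wise and checks that unique() treats signed zeros as equal. The frame contents are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# A list of functions with axis=1 produces one column per function.
per_row = df.apply(["sum", "mean"], axis=1)

# 0.0 and -0.0 count as the same value.
zeros = pd.unique(np.array([0.0, -0.0]))
```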

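Conversion#

Bug in DataFrame.combine_first() in which column types were unexpectedly converted to float (GH 20699)
Bug in DataFrame.clip() in which column types were not preserved and were cast to float (GH 24162)
Bug in DataFrame.clip() where, when the order of the columns of the dataframes did not match, the numeric result was wrong (GH 20911)
Bug in DataFrame.astype() where converting to an extension dtype when duplicate column names are present caused a RecursionError (GH 24704)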
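
The dtype-preserving behavior of DataFrame.clip() can be sketched as follows, using an arbitrary integer frame and scalar bounds.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 10], "b": [2, 6, 12]})
clipped = df.clip(lower=2, upper=8)  # integer columns stay integer
```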

Strings#

Bug in Index.str.partition() was not nan-safe (GH 23558).
Bug in Index.str.split() was not nan-safe (GH 23677).
Bug in Series.str.contains() not respecting the na argument for a Categorical dtype Series (GH 22158)
Bug in Index.str.cat() when the result contained only NaN (GH 24044)

Interval#

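Bug in the IntervalIndex constructor where the closed parameter did not always override the inferred closed (GH 19370)
Bug in the IntervalIndex repr where a trailing comma was missing after the list of intervals (GH 20611)
Bug in Interval where scalar arithmetic operations did not retain the closed value (GH 22313)
Bug in IntervalIndex where indexing with datetime-like values raised a KeyError (GH 20636)
Bug in IntervalTree where data containing NaN triggered a warning and resulted in incorrect indexing queries with IntervalIndex (GH 23352)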
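
A minimal check of the Interval arithmetic fix: shifting or scaling an interval keeps its closed side.

```python
import pandas as pd

iv = pd.Interval(0, 1, closed="both")
moved = iv + 1    # endpoints shift, closed side is kept
scaled = iv * 2   # endpoints scale, closed side is kept
```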

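Indexing#

Bug in DataFrame.ne() failing if the columns contained the column name "dtype" (GH 22383)
The traceback from a KeyError when asking .loc for a single missing label is now shorter and clearer (GH 21557)
PeriodIndex now emits a KeyError when a malformed string is looked up, which is consistent with the behavior of DatetimeIndex (GH 22803)
When .ix is asked for a missing integer label in a MultiIndex with a first level of integer type, it now raises a KeyError, consistently with the case of a flat Int64Index, rather than falling back to positional indexing (GH 21593)
Bug in Index.reindex() when reindexing a tz-naive and tz-aware DatetimeIndex (GH 8306)
Bug in Series.reindex() when reindexing an empty series with a datetime64[ns, tz] dtype (GH 20869)
Bug in DataFrame when setting values with .loc and a timezone-aware DatetimeIndex (GH 11365)
DataFrame.__getitem__ now accepts dictionaries and dictionary keys as list-likes of labels, consistently with Series.__getitem__ (GH 21294)
Fixed DataFrame[np.nan] when columns are non-unique (GH 21428)
Bug when indexing a DatetimeIndex with nanosecond resolution dates and timezones (GH 11679)
Bug where indexing with a NumPy array containing negative values would mutate the indexer (GH 21867)
Bug where mixed indexes would not allow integers for .at (GH 19860)
Float64Index.get_loc now raises KeyError when a boolean key is passed. (GH 19087)
Bug in DataFrame.loc() when indexing with an IntervalIndex (GH 19977)
Index no longer mangles None, NaN and NaT, i.e. they are treated as three different keys. However, for a numeric Index all three are still coerced to a NaN (GH 22332)
Bug in scalar in Index if the scalar is a float while the Index is of integer dtype (GH 22085)
Bug in MultiIndex.set_levels() when the levels value is not subscriptable (GH 23273)
Bug where setting a timedelta column by Index caused it to be cast to double, and therefore lose precision (GH 23511)
Bug in Index.union() and Index.intersection() where the name of the Index of the result was not computed correctly for certain cases (GH 9943, GH 9862)
Bug in Index slicing with a boolean Index that could raise TypeError (GH 22533)
Bug in PeriodArray.__setitem__ when accepting slice and list-like values (GH 23978)
Bug in DatetimeIndex and TimedeltaIndex where indexing with Ellipsis would lose their freq attribute (GH 21282)
Bug in iat where using it to assign an incompatible value would create a new column (GH 23236)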

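Missing#

Bug in DataFrame.fillna() where a ValueError would raise when one column contained a datetime64[ns, tz] dtype (GH 15522)
Bug in Series.hasnans() that could be incorrectly cached and return incorrect answers if null elements are introduced after an initial call (GH 19700)
Series.isin() now treats all NaN-floats as equal also for np.object_-dtype. This behavior is consistent with the behavior for float64 (GH 22119)
unique() no longer mangles NaN-floats and the NaT-object for np.object_-dtype, i.e. NaT is no longer coerced to a NaN-value and is treated as a different entity. (GH 22295)
DataFrame and Series now properly handle NumPy masked arrays with hardened masks. Previously, constructing a DataFrame or Series from a masked array with a hard mask would create a pandas object containing the underlying value, rather than the expected NaN. (GH 24574)
Bug in the DataFrame constructor where the dtype argument was not honored when handling NumPy masked record arrays. (GH 24874)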
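
The following sketch illustrates two of the items above: a hard-masked NumPy array becomes NaN at the masked position, and isin() matches NaN in an object-dtype Series. The arrays are arbitrary examples.

```python
import numpy as np
import pandas as pd

# A hard-masked entry is converted to NaN instead of leaking the raw value.
masked = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
masked.harden_mask()
s = pd.Series(masked)

# NaN-floats are treated as equal for object dtype as well.
obj = pd.Series([np.nan, "a"], dtype=object)
hits = obj.isin([float("nan")])
```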

MultiIndex#

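Bug in io.formats.style.Styler.applymap() where subset= with a MultiIndex slice would reduce to a Series (GH 19861)
Removed compatibility for MultiIndex pickles prior to version 0.8.0; compatibility with MultiIndex pickles from version 0.13 forward is maintained (GH 21654)
MultiIndex.get_loc_level() (and as a consequence, .loc on a Series or DataFrame with a MultiIndex index) will now raise a KeyError, rather than returning an empty slice, if asked for a label which is present in the levels but is unused (GH 22221)
MultiIndex has gained MultiIndex.from_frame(), which allows constructing a MultiIndex object from a DataFrame (GH 22420)
Fixed a TypeError in Python 3 when creating a MultiIndex in which some levels have mixed types, e.g. when some labels are tuples (GH 15457)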
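
A brief sketch of MultiIndex.from_frame(): the column names of the frame become the level names, and to_frame() returns the columns in the same order. The column names and values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"state": ["HI", "NJ"], "year": [2013, 2014]})

mi = pd.MultiIndex.from_frame(df)       # levels named after the columns
round_trip = mi.to_frame(index=False)   # column order follows mi.names
```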

IO#

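Bug in read_csv() in which a column specified with a CategoricalDtype of boolean categories was not being correctly coerced from string values to booleans (GH 20498)
Bug in read_csv() in which unicode column names were not being properly recognized with Python 2.x (GH 13253)
Bug in DataFrame.to_sql() when writing timezone-aware data (datetime64[ns, tz] dtype) would raise a TypeError (GH 9086)
Bug in DataFrame.to_sql() where a naive DatetimeIndex would be written as TIMESTAMP WITH TIMEZONE type in supported databases, e.g. PostgreSQL (GH 23510)
Bug in read_excel() when parse_cols is specified with an empty dataset (GH 9208)
read_html() no longer ignores all-whitespace <tr> within <thead> when considering the skiprows and header arguments. Previously, users had to decrease their header and skiprows values on such tables to work around the issue. (GH 21641)
read_excel() will correctly show the deprecation warning for the previously deprecated sheetname (GH 17994)
read_csv() and read_table() will throw UnicodeError and not core dump on badly encoded strings (GH 22748)
read_csv() will correctly parse timezone-aware datetimes (GH 22256)
Bug in read_csv() in which memory management was prematurely optimized for the C engine when the data was being read in chunks (GH 23509)
Bug in read_csv() in which unnamed columns were being improperly identified when extracting a multi-index (GH 23687)
read_sas() will correctly parse numbers in sas7bdat files that have width less than 8 bytes. (GH 21616)
read_sas() will correctly parse sas7bdat files with many columns (GH 22628)
read_sas() will correctly parse sas7bdat files with data page types having also bit 7 set (so page type is 128 + 256 = 384) (GH 16615)
Bug in read_sas() in which an incorrect error was raised on an invalid file format. (GH 24548)
Bug in detect_client_encoding() where a potential IOError went unhandled when importing in a mod_wsgi process due to restricted access to stdout. (GH 21552)
Bug in DataFrame.to_html() with index=False missing the truncation indicators (…) on a truncated DataFrame (GH 15019, GH 22783)
Bug in DataFrame.to_html() with index=False when both columns and row index are MultiIndex (GH 22579)
Bug in DataFrame.to_html() with index_names=False displaying the index name (GH 22747)
Bug in DataFrame.to_html() with header=False not displaying row index names (GH 23788)
Bug in DataFrame.to_html() with sparsify=False that caused it to raise TypeError (GH 22887)
Bug in DataFrame.to_string() that broke column alignment when index=False and the width of the first column's values is greater than the width of the first column's header (GH 16839, GH 13032)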
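Bug in DataFrame.to_string() that caused representations of DataFrame to not take up the whole window (GH 22984)
Bug in DataFrame.to_csv() where a single-level MultiIndex incorrectly wrote a tuple. Now just the value of the index is written (GH 19589).
HDFStore will raise ValueError when the format kwarg is passed to the constructor (GH 13291)
Bug in HDFStore.append() when appending a DataFrame with an empty string column and min_itemsize < 8 (GH 12242)
Bug in read_csv() in which memory leaks occurred in the C engine when parsing NaN values due to insufficient cleanup on completion or error (GH 21353)
Bug in read_csv() in which incorrect error messages were being raised when skipfooter was passed in along with nrows, iterator or chunksize (GH 23711)
Bug in read_csv() in which MultiIndex index names were being improperly handled in the cases when they were not provided (GH 23484)
Bug in read_csv() in which unnecessary warnings were being raised when the dialect's values conflicted with the default arguments (GH 23761)
Bug in read_html() in which the error message was not displaying the valid flavors when an invalid one was provided (GH 23549)
Bug in read_excel() in which extraneous header names were extracted, even though none were specified (GH 11733)
Bug in read_excel() in which column names were not being properly converted to string sometimes in Python 2.x (GH 23874)
Bug in read_excel() in which index_col=None was not being respected and index columns were being parsed anyway (GH 18792, GH 20480)
Bug in read_excel() in which usecols was not being validated for proper column names when passed in as a string (GH 20480)
Bug in DataFrame.to_dict() when the resulting dict contains non-Python scalars in the case of numeric data (GH 23753)
DataFrame.to_string(), DataFrame.to_html() and DataFrame.to_latex() will correctly format output when a string is passed as the float_format argument (GH 21625, GH 22270)
Bug in read_csv() that caused it to raise OverflowError when trying to use ‘inf’ as na_value with an integer index column (GH 17128)
Bug in read_csv() that caused the C engine on Python 3.6+ on Windows to improperly read CSV filenames with accented or special characters (GH 15086)
Bug in read_fwf() in which the compression type of a file was not being properly inferred (GH 22199)
Bug in pandas.io.json.json_normalize() that caused it to raise TypeError when two consecutive elements of record_path are dicts (GH 22706)
Bug in DataFrame.to_stata(), pandas.io.stata.StataWriter and pandas.io.stata.StataWriter117 where an exception would leave a partially written and invalid dta file (GH 23573)
Bug in DataFrame.to_stata() and pandas.io.stata.StataWriter117 that produced invalid files when using strLs with non-ASCII characters (GH 23573)
Bug in HDFStore that caused it to raise ValueError when reading a DataFrame in Python 3 from fixed format written in Python 2 (GH 24510)
Bug in DataFrame.to_string() and more generally in the floating repr formatter. Zeros were not trimmed if inf was present in a column, while they were trimmed when NA values were present. Zeros are now trimmed as in the presence of NA (GH 24861).
Bug in the repr when truncating the number of columns and having a wide last column (GH 24849).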
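
As a hedged illustration of the read_csv() timezone item, the sketch below parses an in-memory CSV whose timestamps carry a fixed +01:00 offset. The column names and data are invented for the example, and the exact resulting dtype string may vary by pandas version.

```python
from io import StringIO

import pandas as pd

csv = "ts,value\n2019-01-01 00:00:00+01:00,1\n2019-01-02 00:00:00+01:00,2\n"

# parse_dates keeps the UTC offset, giving a tz-aware column.
df = pd.read_csv(StringIO(csv), parse_dates=["ts"])
```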

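Plotting#

Bug in DataFrame.plot.scatter() and DataFrame.plot.hexbin() that caused the x-axis label and ticklabels to disappear when the colorbar was on in the IPython inline backend (GH 10611, GH 10678 and GH 20455)
Bug in plotting a Series with datetimes using matplotlib.axes.Axes.scatter() (GH 22039)
Bug in DataFrame.plot.bar() that caused bars to use multiple colors instead of a single one (GH 20585)
Bug in validating the color parameter that caused an extra color to be appended to the given color array. This happened to multiple plotting functions using matplotlib. (GH 20726)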

GroupBy/重采样/滚动#

在 Rolling.min() 和 Rolling.max() 中存在一个错误，当 closed='left' 时，使用类似日期时间的索引并且序列中只有一个条目，导致段错误 (GH 24718)
在 as_index=False 的情况下，GroupBy.first() 和 GroupBy.last() 中的错误导致时区信息丢失 (GH 15884)
在跨越夏令时边界进行降采样时 DateFrame.resample() 中的错误 (GH 8531)
当 n > 1 时，使用偏移量 Day 的 DateFrame.resample() 的日期锚定存在错误 (GH 24127)
在调用 SeriesGroupBy 的 SeriesGroupBy.count() 方法时，如果分组变量仅包含 NaNs 且 numpy 版本 < 1.13，会错误地引发 ValueError 的 Bug (GH 21956)。
在使用 closed='left' 和类似日期时间的索引时，Rolling.min() 中的多个错误导致结果不正确，并且还会导致段错误。(GH 21704)
在传递位置参数给应用的函数时，Resampler.apply() 中的错误 (GH 14615)。
在传递 numpy.timedelta64 到 loffset 关键字参数时 Series.resample() 中的 Bug (GH 7687)。
当 TimedeltaIndex 的频率是新频率的子周期时，Resampler.asfreq() 中的错误 (GH 13022)。
当值是整数但不能适应int64时，SeriesGroupBy.mean() 中的错误，而是溢出。(GH 22487)
RollingGroupby.agg() 和 ExpandingGroupby.agg() 现在支持多个聚合函数作为参数 (GH 15072)
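Bug in DataFrame.resample() and Series.resample() when resampling by a weekly offset ('W') across a DST transition (GH 9119, GH 21459)
Bug in DataFrame.expanding() in which the axis argument was not respected during aggregations (GH 23372)
Bug in GroupBy.transform() which caused missing values when the input function can accept a DataFrame but renames it (GH 23455).
Bug in GroupBy.nth() where column order was not always preserved (GH 20760)
Bug in GroupBy.rank() with method='dense' and pct=True, which raised a ZeroDivisionError when a group had only one member (GH 23666).
Calling GroupBy.rank() with empty groups and pct=True raised a ZeroDivisionError (GH 22519)
Bug in DataFrame.resample() when resampling NaT in a TimedeltaIndex (GH 13223).
Bug in DataFrame.groupby() where the observed argument was not respected when selecting a column; observed=False was always used instead (GH 23970)
Bug in SeriesGroupBy.pct_change() and DataFrameGroupBy.pct_change(), which previously computed the percent change across groups; it now correctly works within each group (GH 21200, GH 21235).
Bug preventing hash tables from being created with a very large number (2^32) of rows (GH 22805)
Bug in groupby when grouping on a categorical, which caused a ValueError and incorrect grouping if observed=True and nan was present in the categorical column (GH 24740, GH 21151).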
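
As a rough illustration of the per-group pct_change() behavior noted above (GH 21200), the following sketch uses a small made-up frame. The column names key and val and the values are assumptions chosen for illustration only. With the fix, the first row of each group has no earlier value in the same group to compare against, so it should come out missing.

```python
import pandas as pd

# Hypothetical data: two groups, "a" and "b", stored one after the other.
df = pd.DataFrame(
    {"key": ["a", "a", "b", "b"], "val": [1.0, 2.0, 10.0, 30.0]}
)

# The percent change is computed within each group, so the first row of
# group "b" is not compared against the last row of group "a".
pct = df.groupby("key")["val"].pct_change()
print(pct)
```

If the change were computed across groups, the first row of group "b" would instead hold the change from 2.0 to 10.0.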

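Reshaping#

Bug in pandas.concat() when joining resampled DataFrames with a timezone-aware index (GH 13783)
Bug in pandas.concat() when joining only Series: the names argument is no longer ignored (GH 23490)
Bug in Series.combine_first() with datetime64[ns, tz] dtype, which would return a timezone-naive result (GH 21469)
Bug in Series.where() and DataFrame.where() with datetime64[ns, tz] dtype (GH 21546)
Bug in DataFrame.where() with an empty DataFrame and an empty cond of non-boolean dtype (GH 21947)
Bug in Series.mask() and DataFrame.mask() with list conditionals (GH 21891)
Bug in DataFrame.replace() raising a RecursionError when converting out-of-bounds datetime64[ns, tz] values (GH 20380)
GroupBy.rank() now raises a ValueError when an invalid value is passed for the na_option argument (GH 22124)
Bug in get_dummies() with Unicode attributes in Python 2 (GH 22084)
Bug in DataFrame.replace() raising a RecursionError when replacing empty lists (GH 22083)
Bug in Series.replace() and DataFrame.replace() when a dict is used as the to_replace value and one key in the dict is another key's value: results were inconsistent between integer keys and string keys (GH 20656)
Bug in DataFrame.drop_duplicates() for an empty DataFrame, which incorrectly raised an error (GH 20516)
Bug in pandas.wide_to_long() when a string is passed to the stubnames argument and a column name is a substring of that stubname (GH 22468)
Bug in merge() when merging datetime64[ns, tz] data that contained a DST transition (GH 18885)
Bug in merge_asof() when merging on float values within a defined tolerance (GH 22981)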
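Bug in pandas.concat() when concatenating a multi-column DataFrame with timezone-aware data against a DataFrame with a different number of columns (GH 22796)
Bug in merge_asof() where a confusing error message was raised when attempting to merge with missing values (GH 23189)
Bug in DataFrame.nsmallest() and DataFrame.nlargest() for DataFrames that have a MultiIndex for columns (GH 23033).
Bug in pandas.melt() when passing column names that are not present in the DataFrame (GH 23575)
Bug in DataFrame.append() with a Series with a dateutil timezone, which would raise a TypeError (GH 23682)
Bug in Series construction when passing no data and dtype=str (GH 22477)
Bug in cut() with bins as an overlapping IntervalIndex, where multiple bins were returned per item instead of raising a ValueError (GH 23980)
Bug in pandas.concat() when joining a Series of datetimetz dtype with a Series of category dtype, which would lose the timezone (GH 23816)
Bug in DataFrame.join() when joining on a partial MultiIndex, which would drop names (GH 20452).
DataFrame.nlargest() and DataFrame.nsmallest() now return the correct n values when keep != 'all', even when tied on the first column (GH 22752)
Bug where constructing a DataFrame with an index argument that was not already an instance of Index was broken (GH 22227).
Bug in DataFrame that prevented list subclasses from being used for construction (GH 21226)
Bug in DataFrame.unstack() and DataFrame.pivot_table() returning a misleading error message when the resulting DataFrame has more elements than int32 can handle. The error message has been improved and now points to the actual problem (GH 20601).
Bug in DataFrame.unstack() where a ValueError was raised when unstacking timezone-aware values (GH 18338)
Bug in DataFrame.stack() where timezone-aware values were converted to timezone-naive values (GH 19420)
Bug in merge_asof() where a TypeError was raised when by_col were timezone-aware values (GH 21184)
Bug showing an incorrect shape when an error is thrown during DataFrame construction. (GH 20742)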
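
To make the pandas.concat() names fix above (GH 23490) more concrete, here is a minimal sketch of concatenating only Series objects. The keys "x" and "y" and the level names "source" and "row" are hypothetical labels picked for this example; they do not come from the release notes.

```python
import pandas as pd

s1 = pd.Series([1, 2])
s2 = pd.Series([3, 4])

# keys adds an outer index level identifying each input Series;
# names labels the levels of the resulting hierarchical index.
combined = pd.concat([s1, s2], keys=["x", "y"], names=["source", "row"])
print(combined.index.names)
```

With names honored, the resulting MultiIndex levels can be referred to by label, for example in combined.reset_index() or combined.groupby(level="source").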

Sparse#

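Updating a boolean, datetime, or timedelta column to be sparse now works (GH 22367)
Bug in Series.to_sparse() with a Series already holding sparse data, which was not constructed properly (GH 22389)
Providing a sparse_index to the SparseArray constructor no longer defaults the na-value to np.nan for all dtypes. The correct na_value for data.dtype is now used.
Bug in SparseArray.nbytes under-reporting its memory usage by not including the size of its sparse index.
Improved performance of Series.shift() for non-NA fill_value, as values are no longer converted to a dense array.
Bug in DataFrame.groupby not including fill_value in the groups for non-NA fill_value when grouping by a sparse column (GH 5078)
Bug in the unary inversion operator (~) on a SparseSeries with boolean values. The performance of this operation has also been improved (GH 22835)
Bug in SparseArray.unique() not returning the unique values (GH 19595)
Bug in SparseArray.nonzero() and SparseDataFrame.dropna() returning shifted or incorrect results (GH 21172)
Bug in DataFrame.apply() where dtypes would lose sparseness (GH 23744)
Bug in concat() when concatenating a list of Series with all-sparse values, which changed the fill_value and converted the result to a dense Series (GH 24371)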
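
The SparseArray.nbytes fix above can be checked with a small sketch. This assumes a pandas version in which the class is importable as pandas.arrays.SparseArray (the location may differ in 0.24 itself), and the sample values are made up for illustration.

```python
import pandas as pd

# Assumption: SparseArray is exposed under pandas.arrays in the installed
# version. Only the single non-fill value (1) is physically stored.
arr = pd.arrays.SparseArray([0, 0, 1, 0], fill_value=0)

stored_values_bytes = arr.sp_values.nbytes  # bytes used by stored values
index_bytes = arr.sp_index.nbytes           # bytes used by the sparse index

# nbytes should account for both the stored values and the sparse index.
print(arr.nbytes, stored_values_bytes, index_bytes)
```

The sparse index records where the stored values sit, so leaving it out understates the real memory footprint.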

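Style#

background_gradient() now takes a text_color_threshold parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (GH 21258)
background_gradient() now also supports tablewise application (in addition to rowwise and columnwise) with axis=None (GH 15204)
bar() now also supports tablewise application (in addition to rowwise and columnwise) with axis=None, and the clipping range can be set with vmin and vmax (GH 21548 and GH 21526). NaN values are also handled properly.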

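Build changes#

Building pandas for development now requires cython >= 0.28.2 (GH 21688)
Testing pandas now requires hypothesis>=3.58. See the Hypothesis documentation, and the pandas-specific introduction in the contributing guide. (GH 22280)
Building pandas on macOS now targets macOS 10.9 as the minimum if run on macOS 10.9 or above (GH 23424)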

Other#

Bug where C variables were declared with external linkage, causing import errors if certain other C libraries were imported before pandas. (GH 24113)

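Contributors#

A total of 337 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.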

AJ Dyka +
AJ Pryor, Ph.D +
Aaron Critchley
Adam Hooper
Adam J. Stewart
Adam Kim
Adam Klimont +
Addison Lynch +
Alan Hogue +
Alex Radu +
Alex Rychyk
Alex Strick van Linschoten +
Alex Volkov +
Alexander Buchkovsky
Alexander Hess +
Alexander Ponomaroff +
Allison Browne +
Aly Sivji
Andrew
Andrew Gross +
Andrew Spott +
Andy +
Aniket uttam +
Anjali2019 +
Anjana S +
Antti Kaihola +
Anudeep Tubati +
Arjun Sharma +
Armin Varshokar
Artem Bogachev
ArtinSarraf +
Barry Fitzgerald +
Bart Aelterman +
Ben James +
Ben Nelson +
Benjamin Grove +
Benjamin Rowell +
Benoit Paquet +
Boris Lau +
Brett Naul
Brian Choi +
C.A.M. Gerlach +
Carl Johan +
Chalmer Lowe
Chang She
Charles David +
Cheuk Ting Ho
Chris
Chris Roberts +
Christopher Whelan
Chu Qing Hao +
Da Cheezy Mobsta +
Damini Satya
Daniel Himmelstein
Daniel Saxton +
Darcy Meyer +
DataOmbudsman
David Arcos
David Krych
Dean Langsam +
Diego Argueta +
Diego Torres +
Dobatymo +
Doug Latornell +
Dr. Irv
Dylan Dmitri Gray +
Eric Boxer +
Eric Chea
Erik +
Erik Nilsson +
Fabian Haase +
Fabian Retkowski
Fabien Aulaire +
Fakabbir Amin +
Fei Phoon +
Fernando Margueirat +
Florian Müller +
Fábio Rosado +
Gabe Fernando
Gabriel Reid +
Giftlin Rajaiah
Gioia Ballin +
Gjelt
Gosuke Shibahara +
Graham Inggs
Guillaume Gay
Guillaume Lemaitre +
Hannah Ferchland
Haochen Wu
Hubert +
HubertKl +
HyunTruth +
Iain Barr
Ignacio Vergara Kausel +
Irv Lustig +
IsvenC +
Jacopo Rota
Jakob Jarmar +
James Bourbeau +
James Myatt +
James Winegar +
Jan Rudolph
Jared Groves +
Jason Kiley +
Javad Noorbakhsh +
Jay Offerdahl +
Jeff Reback
Jeongmin Yu +
Jeremy Schendel
Jerod Estapa +
Jesper Dramsch +
Jim Jeon +
Joe Jevnik
Joel Nothman
Joel Ostblom +
Jordi Contestí
Jorge López Fueyo +
Joris Van den Bossche
Jose Quinones +
Jose Rivera-Rubio +
Josh
Jun +
Justin Zheng +
Kaiqi Dong +
Kalyan Gokhale
Kang Yoosam +
Karl Dunkle Werner +
Karmanya Aggarwal +
Kevin Markham +
Kevin Sheppard
Kimi Li +
Koustav Samaddar +
Krishna +
Kristian Holsheimer +
Ksenia Gueletina +
Kyle Prestel +
LJ +
LeakedMemory +
Li Jin +
Licht Takeuchi
Luca Donini +
Luciano Viola +
Mak Sze Chun +
Marc Garcia
Marius Potgieter +
Mark Sikora +
Markus Meier +
Marlene Silva Marchena +
Martin Babka +
MatanCohe +
Mateusz Woś +
Mathew Topper +
Matt Boggess +
Matt Cooper +
Matt Williams +
Matthew Gilbert
Matthew Roeschke
Max Kanter
Michael Odintsov
Michael Silverstein +
Michael-J-Ward +
Mickaël Schoentgen +
Miguel Sánchez de León Peque +
Ming Li
Mitar
Mitch Negus
Monson Shao +
Moonsoo Kim +
Mortada Mehyar
Myles Braithwaite
Nehil Jain +
Nicholas Musolino +
Nicolas Dickreuter +
Nikhil Kumar Mengani +
Nikoleta Glynatsi +
Ondrej Kokes
Pablo Ambrosio +
Pamela Wu +
Parfait G +
Patrick Park +
Paul
Paul Ganssle
Paul Reidy
Paul van Mulbregt +
Phillip Cloud
Pietro Battiston
Piyush Aggarwal +
Prabakaran Kumaresshan +
Pulkit Maloo
Pyry Kovanen
Rajib Mitra +
Redonnet Louis +
Rhys Parry +
Rick +
Robin
Roei.r +
RomainSa +
Roman Imankulov +
Roman Yurchak +
Ruijing Li +
Ryan +
Ryan Nazareth +
Rüdiger Busche +
SEUNG HOON, SHIN +
Sandrine Pataut +
Sangwoong Yoon
Santosh Kumar +
Saurav Chakravorty +
Scott McAllister +
Sean Chan +
Shadi Akiki +
Shengpu Tang +
Shirish Kadam +
Simon Hawkins +
Simon Riddell +
Simone Basso
Sinhrks
Soyoun(Rose) Kim +
Srinivas Reddy Thatiparthy (శ్రీనివాస్ రెడ్డి తాటిపర్తి) +
Stefaan Lippens +
Stefano Cianciulli
Stefano Miccoli +
Stephen Childs
Stephen Pascoe
Steve Baker +
Steve Cook +
Steve Dower +
Stéphan Taljaard +
Sumin Byeon +
Sören +
Tamas Nagy +
Tanya Jain +
Tarbo Fukazawa
Thein Oo +
Thiago Cordeiro da Fonseca +
Thierry Moisan
Thiviyan Thanapalasingam +
Thomas Lentali +
Tim D. Smith +
Tim Swast
Tom Augspurger
Tomasz Kluczkowski +
Tony Tao +
Triple0 +
Troels Nielsen +
Tuhin Mahmud +
Tyler Reddy +
Uddeshya Singh
Uwe L. Korn +
Vadym Barda +
Varad Gunjal +
Victor Maryama +
Victor Villas
Vincent La
Vitória Helena +
Vu Le
Vyom Jain +
Weiwen Gu +
Wenhuan
Wes Turner
Wil Tan +
William Ayd
Yeojin Kim +
Yitzhak Andrade +
Yuecheng Wu +
Yuliya Dovzhenko +
Yury Bayda +
Zac Hatfield-Dodds +
aberres +
aeltanawy +
ailchau +
alimcmaster1
alphaCTzo7G +
amphy +
araraonline +
azure-pipelines[bot] +
benarthur91 +
bk521234 +
cgangwar11 +
chris-b1
cxl923cc +
dahlbaek +
dannyhyunkim +
darke-spirits +
david-liu-brattle-1
davidmvalente +
deflatSOCO
doosik_bae +
dylanchase +
eduardo naufel schettino +
euri10 +
evangelineliu +
fengyqf +
fjdiod
fl4p +
fleimgruber +
gfyoung
h-vetinari
harisbal +
henriqueribeiro +
himanshu awasthi
hongshaoyang +
igorfassen +
jalazbe +
jbrockmendel
jh-wu +
justinchan23 +
louispotok
marcosrullan +
miker985
nicolab100 +
nprad
nsuresh +
ottiP
pajachiet +
raguiar2 +
ratijas +
realead +
robbuckley +
saurav2608 +
sideeye +
ssikdar1
svenharris +
syutbai +
testvinder +
thatneat
tmnhat2001
tomascassidy +
tomneep
topper-123
vkk800 +
winlu +
ym-pett +
yrhooke +
ywpark1 +
zertrin
zhezherun +
