Version 0.17.0 (October 9, 2015)

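This is a major release from 0.16.2 and includes a small number of API changes, several new features, enhancements, and performance improvements, along with a large number of bug fixes. We recommend that all users upgrade to this version.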

Warning

pandas >= 0.17.0 will no longer support compatibility with Python version 3.2 (GH 9118)

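Warning

The pandas.io.data package is deprecated and will be replaced by the pandas-datareader package. This will allow the data modules to be updated independently of your pandas installation. The API for pandas-datareader v0.1.1 is exactly the same as in pandas v0.17.0 (GH 8961, GH 10861).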

After installing pandas-datareader, you can easily change your imports:

from pandas.io import data, wb

to

from pandas_datareader import data, wb

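Highlights include:

Release of the Global Interpreter Lock (GIL) on some cython operations (see Releasing the GIL)
Plotting methods are now available as attributes of the .plot accessor (see Plot submethods)
The sorting API has been revamped to remove some long-standing inconsistencies (see Changes to sorting API)
Support for datetime64[ns] with timezones as a first-class dtype (see Datetime with TZ)
The default for to_datetime is now to raise when presented with unparsable formats; previously it returned the original input. Date parsing functions now also return consistent results (see Changes to to_datetime and to_timedelta)
The default for dropna in HDFStore has changed to False, so that all rows are stored by default, even if they are all NaN (see HDFStore dropna behavior)
The datetime accessor (dt) now supports Series.dt.strftime to generate formatted strings for datetime-like objects, and Series.dt.total_seconds to express each timedelta duration in seconds (see Additional methods for dt accessor)
Period and PeriodIndex can handle multiplied frequencies like 3D, which corresponds to a 3-day span (see Period frequency enhancement)
Development installed versions of pandas will now have PEP440-compliant version strings (GH 9518)
Development support for benchmarking with the Air Speed Velocity library (GH 8361)
Support for reading SAS xport files (see SAS XPORT file support)
Documentation comparing SAS to pandas (see the comparison with SAS in the pandas documentation)
Removal of the automatic TimeSeries broadcasting, deprecated since 0.8.0 (see Removal of prior version deprecations/changes)
Plain-text display formats can optionally align with Unicode East Asian width (see Display alignment with Unicode East Asian width)
Compatibility with Python 3.5 (GH 11097)
Compatibility with matplotlib 1.5.0 (GH 11111)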

Check the backwards incompatible API changes and deprecations before updating.

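New features

Datetime with TZ

We are adding an implementation that natively supports datetimes with timezones. A Series or a DataFrame column could previously be assigned a datetime with timezones, but it would work as an object dtype. This caused performance issues with a large number of rows. See the docs for more details. (GH 8260, GH 10763, GH 11034).

The new implementation uses a single timezone for all rows and operates on them efficiently.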

In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "A": pd.date_range("20130101", periods=3),
   ...:         "B": pd.date_range("20130101", periods=3, tz="US/Eastern"),
   ...:         "C": pd.date_range("20130101", periods=3, tz="CET"),
   ...:     }
   ...: )
   ...: 

In [2]: df
Out[2]: 
           A                         B                         C
0 2013-01-01 2013-01-01 00:00:00-05:00 2013-01-01 00:00:00+01:00
1 2013-01-02 2013-01-02 00:00:00-05:00 2013-01-02 00:00:00+01:00
2 2013-01-03 2013-01-03 00:00:00-05:00 2013-01-03 00:00:00+01:00

[3 rows x 3 columns]

In [3]: df.dtypes
Out[3]: 
A                datetime64[ns]
B    datetime64[ns, US/Eastern]
C           datetime64[ns, CET]
Length: 3, dtype: object

In [4]: df.B
Out[4]: 
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
Name: B, Length: 3, dtype: datetime64[ns, US/Eastern]

In [5]: df.B.dt.tz_localize(None)
Out[5]: 
0   2013-01-01
1   2013-01-02
2   2013-01-03
Name: B, Length: 3, dtype: datetime64[ns]

This also uses a new dtype representation that looks and feels very similar to its numpy cousin datetime64[ns].

In [6]: df["B"].dtype
Out[6]: datetime64[ns, US/Eastern]

In [7]: type(df["B"].dtype)
Out[7]: pandas.core.dtypes.dtypes.DatetimeTZDtype

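Note

Because of the dtype change, the string representation of the underlying DatetimeIndex is slightly different, but the two are functionally the same.

Previous behavior: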

In [1]: pd.date_range('20130101', periods=3, tz='US/Eastern')
Out[1]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
                       '2013-01-03 00:00:00-05:00'],
                      dtype='datetime64[ns]', freq='D', tz='US/Eastern')

In [2]: pd.date_range('20130101', periods=3, tz='US/Eastern').dtype
Out[2]: dtype('<M8[ns]')

New behavior:

In [8]: pd.date_range("20130101", periods=3, tz="US/Eastern")
Out[8]: 
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
               '2013-01-03 00:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq='D')

In [9]: pd.date_range("20130101", periods=3, tz="US/Eastern").dtype
Out[9]: datetime64[ns, US/Eastern]
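The following sketch assumes a recent pandas release, where the dtype class is exposed as `pd.DatetimeTZDtype`. It shows that a tz-aware column carries its zone in the dtype, and that `tz_convert` changes only how the same instants are displayed:

```python
import pandas as pd

# A tz-aware Series gets the dedicated datetime64[ns, tz] dtype, not object
s = pd.Series(pd.date_range("2013-01-01", periods=3, tz="US/Eastern"))
print(s.dtype)

# tz_convert keeps the same instants and only changes the display zone
utc = s.dt.tz_convert("UTC")
print(utc.iloc[0])  # midnight US/Eastern is 05:00 UTC
```

Because both Series hold the same instants, comparing their elements gives equality even though the wall-clock values differ.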

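Releasing the GIL

We are releasing the global interpreter lock (GIL) on some cython operations. This allows other threads to run simultaneously during computation, potentially allowing performance improvements from multi-threading. Notably, groupby, nsmallest, value_counts and some indexing operations benefit from this. (GH 8882)

For example, the groupby expression in the following code releases the GIL during the factorization step, i.e. df.groupby('key'), as well as during the .sum() operation.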

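import numpy as np
import pandas as pd

N = 1000000
ngroups = 10
df = pd.DataFrame(
    {"key": np.random.randint(0, ngroups, size=N), "data": np.random.randn(N)}
)
# The GIL is released while factorizing "key" and while computing the sums
df.groupby("key")["data"].sum()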

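Releasing the GIL can benefit applications that use threads for user interaction (e.g. QT) or that perform multi-threaded computations. The dask library is a good example of a library that can handle these kinds of parallel computations.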
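Here is a minimal sketch of the multi-threaded use case. It uses the standard-library `concurrent.futures` thread pool. The frame sizes and the helper `group_sum` are hypothetical, and any actual speedup depends on how much of the work runs with the GIL released:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Four independent frames, one per worker thread (sizes are illustrative)
frames = [
    pd.DataFrame(
        {"key": rng.integers(0, 10, size=100_000), "data": rng.standard_normal(100_000)}
    )
    for _ in range(4)
]


def group_sum(df):
    # Hypothetical helper: factorization and summation may release the GIL
    return df.groupby("key")["data"].sum()


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(group_sum, frames))
```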

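Plot submethods

The Series and DataFrame .plot() method lets you customize the plot type by supplying the kind keyword argument. Unfortunately, many of these plot types use different required and optional keyword arguments. This makes it difficult to discover which of the dozens of possible arguments a given plot kind actually uses.

To alleviate this issue, we have added a new, optional plotting interface that exposes each kind of plot as a method of the .plot attribute. Instead of writing series.plot(kind=<kind>, ...), you can now also use series.plot.<kind>(...):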

In [10]: df = pd.DataFrame(np.random.rand(10, 2), columns=['a', 'b'])

In [11]: df.plot.bar()

As a result of this change, these methods are now all discoverable via tab completion:

In [12]: df.plot.<TAB>
df.plot.area     df.plot.barh     df.plot.density  df.plot.hist     df.plot.line     df.plot.scatter
df.plot.bar      df.plot.box      df.plot.hexbin   df.plot.kde      df.plot.pie

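Each method signature includes only the relevant arguments. Currently these are limited to required arguments, but in the future they will include optional arguments as well. For an overview, see the new Plotting API documentation.

Additional methods for `dt` accessor

Series.dt.strftime

We now support a Series.dt.strftime method for datetime-likes to generate formatted strings (GH 10110). Examples: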

# DatetimeIndex
In [13]: s = pd.Series(pd.date_range("20130101", periods=4))

In [14]: s
Out[14]: 
0   2013-01-01
1   2013-01-02
2   2013-01-03
3   2013-01-04
Length: 4, dtype: datetime64[ns]

In [15]: s.dt.strftime("%Y/%m/%d")
Out[15]: 
0    2013/01/01
1    2013/01/02
2    2013/01/03
3    2013/01/04
Length: 4, dtype: object

# PeriodIndex
In [16]: s = pd.Series(pd.period_range("20130101", periods=4))

In [17]: s
Out[17]: 
0    2013-01-01
1    2013-01-02
2    2013-01-03
3    2013-01-04
Length: 4, dtype: period[D]

In [18]: s.dt.strftime("%Y/%m/%d")
Out[18]: 
0    2013/01/01
1    2013/01/02
2    2013/01/03
3    2013/01/04
Length: 4, dtype: object

The format codes are the same as in the Python standard library's strftime; details can be found in the Python documentation.
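As a small illustration, any standard strftime directive can be used. The day.month.year format below is just an example, and the result is a Series of strings:

```python
import pandas as pd

s = pd.Series(pd.date_range("2013-01-01", periods=3))
# European-style day.month.year labels built from standard strftime directives
labels = s.dt.strftime("%d.%m.%Y")
print(labels.tolist())
```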

Series.dt.total_seconds

A pd.Series of type timedelta64 has a new method, .dt.total_seconds(), which returns the duration of each timedelta in seconds (GH 10817)

# TimedeltaIndex
In [19]: s = pd.Series(pd.timedelta_range("1 minutes", periods=4))

In [20]: s
Out[20]: 
0   0 days 00:01:00
1   1 days 00:01:00
2   2 days 00:01:00
3   3 days 00:01:00
Length: 4, dtype: timedelta64[ns]

In [21]: s.dt.total_seconds()
Out[21]: 
0        60.0
1     86460.0
2    172860.0
3    259260.0
Length: 4, dtype: float64
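A short sketch with hand-picked durations: `total_seconds()` agrees, up to floating-point rounding, with dividing by a one-second `Timedelta`:

```python
import pandas as pd

s = pd.Series(
    [pd.Timedelta(minutes=1), pd.Timedelta(hours=1), pd.Timedelta(days=1, seconds=1.5)]
)
secs = s.dt.total_seconds()
# Equivalent formulation: divide by a one-second Timedelta
ratio = s / pd.Timedelta(seconds=1)
```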

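Period frequency enhancement

Period, PeriodIndex and period_range can now accept multiplied frequencies. Also, Period.freq and PeriodIndex.freq are now stored as a DateOffset instance, as in DatetimeIndex, rather than as a str (GH 7811)

A multiplied freq represents a span of the corresponding length. The example below creates a period of 3 days. Addition and subtraction shift the period by its span.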

In [22]: p = pd.Period("2015-08-01", freq="3D")

In [23]: p
Out[23]: Period('2015-08-01', '3D')

In [24]: p + 1
Out[24]: Period('2015-08-04', '3D')

In [25]: p - 2
Out[25]: Period('2015-07-26', '3D')

In [26]: p.to_timestamp()
Out[26]: Timestamp('2015-08-01 00:00:00')

In [27]: p.to_timestamp(how="E")
Out[27]: Timestamp('2015-08-03 23:59:59.999999999')

You can use multiplied frequencies in PeriodIndex and period_range.

In [28]: idx = pd.period_range("2015-08-01", periods=4, freq="2D")

In [29]: idx
Out[29]: PeriodIndex(['2015-08-01', '2015-08-03', '2015-08-05', '2015-08-07'], dtype='period[2D]')

In [30]: idx + 1
Out[30]: PeriodIndex(['2015-08-03', '2015-08-05', '2015-08-07', '2015-08-09'], dtype='period[2D]')
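A brief sketch, assuming a recent pandas release, of how the stored DateOffset exposes the multiple and how the span determines the period boundaries:

```python
import pandas as pd

p = pd.Period("2015-08-01", freq="3D")
# The multiplied freq is stored as a DateOffset, so its multiple is available as .n
print(p.freq.n)
# A 3-day period ends at the last nanosecond of its third day
print(p.end_time)
# Adding 1 moves the period forward by one full 3-day span
print((p + 1).start_time)
```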

SAS XPORT file support

read_sas() provides support for reading SAS XPORT format files. (GH 4052).

df = pd.read_sas("sas_xport.xpt")

It is also possible to obtain an iterator and read an XPORT file incrementally.

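def do_something(chunk):
    # Placeholder: process each chunk of at most 10000 rows here
    print(len(chunk))


for df in pd.read_sas("sas_xport.xpt", chunksize=10000):
    do_something(df)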

See the docs for more details.

Support for math functions in .eval()

eval() now supports calling math functions (GH 4893)

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.random.randn(10)})
df.eval("b = sin(a)")

The supported math functions are sin, cos, exp, log, expm1, log1p, sqrt, sinh, cosh, tanh, arcsin, arccos, arctan, arccosh, arcsinh, arctanh, abs and arctan2.

These functions map to the intrinsics of the NumExpr engine. With the Python engine, they map to NumPy calls.

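Changes to Excel with `MultiIndex`

In version 0.16.2, a DataFrame with MultiIndex columns could not be written to Excel via to_excel. That functionality has been added (GH 10564). read_excel has also been updated so that the data can be read back with no loss of information, by specifying in the header and index_col parameters which columns/rows make up the MultiIndex (GH 4679)

See the documentation for more details.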

In [31]: df = pd.DataFrame(
   ....:     [[1, 2, 3, 4], [5, 6, 7, 8]],
   ....:     columns=pd.MultiIndex.from_product(
   ....:         [["foo", "bar"], ["a", "b"]], names=["col1", "col2"]
   ....:     ),
   ....:     index=pd.MultiIndex.from_product([["j"], ["l", "k"]], names=["i1", "i2"]),
   ....: )
   ....: 

In [32]: df
Out[32]: 
col1  foo    bar   
col2    a  b   a  b
i1 i2              
j  l    1  2   3  4
   k    5  6   7  8

[2 rows x 4 columns]

In [33]: df.to_excel("test.xlsx")

In [34]: df = pd.read_excel("test.xlsx", header=[0, 1], index_col=[0, 1])

In [35]: df
Out[35]: 
col1  foo    bar   
col2    a  b   a  b
i1 i2              
j  l    1  2   3  4
   k    5  6   7  8

[2 rows x 4 columns]

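Previously, it was necessary to specify the has_index_names argument in read_excel if the serialized data had index names. For version 0.17.0, the output format of to_excel has been changed to make this keyword unnecessary; the change is shown below.

[Figure: Old vs. New to_excel output layout]

Warning

Excel files saved in version 0.16.2 or earlier that have index names can still be read, but the has_index_names argument must be set to True.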

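Google BigQuery enhancements

Added the ability to automatically create a table/dataset with the pandas.io.gbq.to_gbq() function if the destination table/dataset does not exist. (GH 8325, GH 11121).
Added the ability to replace an existing table and schema when calling the pandas.io.gbq.to_gbq() function, via the if_exists argument. See the docs for more details (GH 8325).
InvalidColumnOrder and InvalidPageToken in the gbq module will raise ValueError instead of IOError.
The generate_bq_schema() function is now deprecated and will be removed in a future version (GH 11121)
The gbq module now supports Python 3 (GH 11094).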

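Display alignment with Unicode East Asian width

Warning

Enabling this option will affect the printing performance of DataFrame and Series (about 2 times slower). Use it only when it is actually required.

Some East Asian countries use Unicode characters whose width corresponds to 2 alphabet characters. If a DataFrame or Series contains these characters, the default output cannot be aligned properly. The following options have been added to enable precise handling of these characters.

display.unicode.east_asian_width: Whether to use the Unicode East Asian Width to calculate the display text width. (GH 2612)
display.unicode.ambiguous_as_wide: Whether to handle Unicode characters belonging to the Ambiguous category as Wide. (GH 11102)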

In [36]: df = pd.DataFrame({u"国籍": ["UK", u"日本"], u"名前": ["Alice", u"しのぶ"]})

In [37]: df
Out[37]: 
   国籍     名前
0  UK  Alice
1  日本    しのぶ

[2 rows x 2 columns]

In [38]: pd.set_option("display.unicode.east_asian_width", True)

In [39]: df
Out[39]: 
   国籍    名前
0    UK   Alice
1  日本  しのぶ

[2 rows x 2 columns]

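For further details, see the options and settings documentation.

Other enhancements

Support for openpyxl >= 2.2. The API for style support is now stable (GH 10125)

merge now accepts the argument indicator, which adds a Categorical-type column (by default called _merge) to the output object. The column takes on the following values (GH 8790):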

Observation origin	`_merge` value
Merge key only in `'left'` frame	`left_only`
Merge key only in `'right'` frame	`right_only`
Merge key in both frames	`both`

In [40]: df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})

In [41]: df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]})

In [42]: pd.merge(df1, df2, on="col1", how="outer", indicator=True)
Out[42]: 
   col1 col_left  col_right      _merge
0     0        a        NaN   left_only
1     1        b        2.0        both
2     2      NaN        2.0  right_only
3     2      NaN        2.0  right_only

[4 rows x 4 columns]

For more, see the updated docs
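A common use of the indicator column is an "anti-join". This sketch uses two made-up frames and keeps only the rows whose key never appears on the right-hand side:

```python
import pandas as pd

left = pd.DataFrame({"key": [0, 1, 2], "lval": ["a", "b", "c"]})
right = pd.DataFrame({"key": [1, 2, 3], "rval": [10, 20, 30]})

merged = pd.merge(left, right, on="key", how="outer", indicator=True)
# Keep only rows whose key never appeared in `right` (an "anti-join")
left_only = merged.loc[merged["_merge"] == "left_only", ["key", "lval"]]
```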

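pd.to_numeric is a new function to convert strings to numbers (possibly with coercion) (GH 11133)
pd.merge now allows duplicate column names if they are not merged upon (GH 10639).
pd.pivot now allows passing index as None (GH 3962).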
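A minimal sketch of pd.to_numeric with coercion, using a made-up Series. Values that cannot be parsed become NaN instead of raising:

```python
import pandas as pd

raw = pd.Series(["1", "2.5", "apple", None])
# errors="coerce" turns anything unparsable into NaN instead of raising
nums = pd.to_numeric(raw, errors="coerce")
```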

pd.concat now uses existing Series names if provided (GH 10698).

In [43]: foo = pd.Series([1, 2], name="foo")

In [44]: bar = pd.Series([1, 2])

In [45]: baz = pd.Series([4, 5])

Previous behavior:

In [1]: pd.concat([foo, bar, baz], axis=1)
Out[1]:
      0  1  2
   0  1  1  4
   1  2  2  5

New behavior:

In [46]: pd.concat([foo, bar, baz], axis=1)
Out[46]: 
   foo  0  1
0    1  1  4
1    2  2  5

[2 rows x 3 columns]

DataFrame has gained the nlargest and nsmallest methods (GH 10393)
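For example, with a small made-up frame, the rows with the largest or smallest values in a column can be selected directly:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [3, 9, 1, 7]})
top2 = df.nlargest(2, "score")      # rows with the two highest scores
bottom1 = df.nsmallest(1, "score")  # row with the lowest score
```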

Added a limit_direction keyword argument that works with limit to let interpolate fill NaN values forward, backward, or in both directions (GH 9218, GH 10420, GH 11115)

In [47]: ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13])

In [48]: ser.interpolate(limit=1, limit_direction="both")
Out[48]: 
0     NaN
1     5.0
2     5.0
3     7.0
4     NaN
5    11.0
6    13.0
Length: 7, dtype: float64
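For comparison, here is the same Series with limit_direction="backward". Only the NaN immediately preceding each valid value is filled; the leading NaN takes the first valid value, because linear interpolation does not extrapolate:

```python
import numpy as np
import pandas as pd

ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13])
# Fill at most one NaN per gap, working backward from each valid value
back = ser.interpolate(limit=1, limit_direction="backward")
```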

Added a DataFrame.round method to round values to a variable number of decimal places (GH 10568).

In [49]: df = pd.DataFrame(
   ....:     np.random.random([3, 3]),
   ....:     columns=["A", "B", "C"],
   ....:     index=["first", "second", "third"],
   ....: )
   ....: 

In [50]: df
Out[50]: 
               A         B         C
first   0.126970  0.966718  0.260476
second  0.897237  0.376750  0.336222
third   0.451376  0.840255  0.123102

[3 rows x 3 columns]

In [51]: df.round(2)
Out[51]: 
           A     B     C
first   0.13  0.97  0.26
second  0.90  0.38  0.34
third   0.45  0.84  0.12

[3 rows x 3 columns]

In [52]: df.round({"A": 0, "C": 2})
Out[52]: 
          A         B     C
first   0.0  0.966718  0.26
second  1.0  0.376750  0.34
third   0.0  0.840255  0.12

[3 rows x 3 columns]

drop_duplicates and duplicated now accept a keep keyword to target the first, the last, or all duplicates. The take_last keyword is deprecated (see Deprecations) (GH 6511, GH 8505)

In [53]: s = pd.Series(["A", "B", "C", "A", "B", "D"])

In [54]: s.drop_duplicates()
Out[54]: 
0    A
1    B
2    C
5    D
Length: 4, dtype: object

In [55]: s.drop_duplicates(keep="last")
Out[55]: 
2    C
3    A
4    B
5    D
Length: 4, dtype: object

In [56]: s.drop_duplicates(keep=False)
Out[56]: 
2    C
5    D
Length: 2, dtype: object
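The same keep values apply to duplicated, which returns a boolean mask instead of dropping rows. With keep=False, every occurrence of a repeated value is flagged:

```python
import pandas as pd

s = pd.Series(["A", "B", "C", "A", "B", "D"])
mask_first = s.duplicated(keep="first")  # later repeats are flagged
mask_all = s.duplicated(keep=False)      # every repeated value is flagged
```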

Reindex now has a tolerance argument that allows finer control over the limits of filling while reindexing (GH 10411):

In [57]: df = pd.DataFrame({"x": range(5), "t": pd.date_range("2000-01-01", periods=5)})

In [58]: df.reindex([0.1, 1.9, 3.5], method="nearest", tolerance=0.2)
Out[58]: 
       x          t
0.1  0.0 2000-01-01
1.9  2.0 2000-01-03
3.5  NaN        NaT

[3 rows x 2 columns]

When used on a DatetimeIndex, TimedeltaIndex or PeriodIndex, tolerance will be coerced into a Timedelta if possible. This allows you to specify the tolerance with a string:

In [59]: df = df.set_index("t")

In [60]: df.reindex(pd.to_datetime(["1999-12-31"]), method="nearest", tolerance="1 day")
Out[60]: 
            x
1999-12-31  0

[1 rows x 1 columns]

tolerance is also exposed by the lower-level Index.get_indexer and Index.get_loc methods.
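A sketch of the lower-level call on a plain integer Index, mirroring the reindex example above. Targets with no label within the tolerance get position -1:

```python
import pandas as pd

idx = pd.Index([0, 1, 2, 3, 4])
# Positions of the nearest labels; -1 means no label within the tolerance
positions = idx.get_indexer([0.1, 1.9, 3.5], method="nearest", tolerance=0.2)
```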

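Added the ability to use the base argument when resampling a TimeDeltaIndex (GH 10530)
DatetimeIndex can be instantiated using strings that contain NaT (GH 7599)
to_datetime can now accept the yearfirst keyword (GH 7599)
pandas.tseries.offsets larger than the Day offset can now be used with a Series for addition/subtraction (GH 10699). See the docs for more details.
pd.Timedelta.total_seconds() now returns the Timedelta duration to ns precision (previously microsecond precision) (GH 10939)
PeriodIndex now supports arithmetic with np.ndarray (GH 10638)
Support pickling of Period objects (GH 10439)
.as_blocks now takes an optional copy argument to return a copy of the data; the default is to copy (no change in behavior from prior versions) (GH 9607)
The regex argument to DataFrame.filter now handles numeric column names instead of raising ValueError (GH 10384).
Enable reading gzip-compressed files via URL, either by explicitly setting the compression parameter or by inferring it from the HTTP Content-Encoding header in the response (GH 8685)
Enable writing Excel files in memory using StringIO/BytesIO (GH 7074)
Enable serialization of lists and dicts to strings in ExcelWriter (GH 8188)
SQL io functions now accept a SQLAlchemy connectable. (GH 7877)
pd.read_sql and to_sql can accept a database URI as the con parameter (GH 10214)
read_sql_table now allows reading from views (GH 10750).
Enable writing complex values to HDFStores when using the table format (GH 10447)
Enable pd.read_hdf to be used without specifying a key when the HDF file contains a single dataset (GH 10443)
pd.read_stata can now read Stata 118 type files. (GH 9882)
The msgpack submodule has been updated to 0.4.6, with backward compatibility (GH 10581)
DataFrame.to_dict now accepts the orient='index' keyword argument (GH 10844).
DataFrame.apply returns a Series of dicts if the passed function returns a dict and reduce=True (GH 8735).
Allow passing kwargs to the interpolation methods (GH 10378).
Improved the error message when concatenating an empty iterable of DataFrame objects (GH 9157)
pd.read_csv can now read bz2-compressed files incrementally, and the C parser can read bz2-compressed files from AWS S3 (GH 11070, GH 11072).
In pd.read_csv, recognize s3n:// and s3a:// URLs as designating S3 file storage (GH 11070, GH 11071).
Read CSV files from AWS S3 incrementally, instead of first downloading the entire file. (A full download is still required for compressed files in Python 2.) (GH 11070, GH 11073)
pd.read_csv is now able to infer the compression type of files read from AWS S3 storage (GH 11070, GH 11074).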

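Backwards incompatible API changes

Changes to sorting API

The sorting API has had some long-standing inconsistencies. (GH 9816, GH 8239).

Here is a summary of the API PRIOR to 0.17.0:

Series.sort is INPLACE, while DataFrame.sort returns a new object.
Series.order returns a new object.
It was possible to use Series/DataFrame.sort_index to sort by values by passing the by keyword.
Series/DataFrame.sortlevel worked only on a MultiIndex, for sorting by index.

To address these issues, we have revamped the API:

We have introduced a new method, DataFrame.sort_values(), which merges DataFrame.sort(), Series.sort(), and Series.order() to handle sorting by values.
The existing methods Series.sort(), Series.order(), and DataFrame.sort() have been deprecated and will be removed in a future version.
The by argument of DataFrame.sort_index() has been deprecated and will be removed in a future version.
The existing method .sort_index() has gained the level keyword to enable level sorting.

We now have two distinct and non-overlapping methods of sorting. Items marked with * will show a FutureWarning.

To sort by the values:

Previous	Replacement
* `Series.order()`	`Series.sort_values()`
* `Series.sort()`	`Series.sort_values(inplace=True)`
* `DataFrame.sort(columns=...)`	`DataFrame.sort_values(by=...)`

To sort by the index:

Previous	Replacement
`Series.sort_index()`	`Series.sort_index()`
`Series.sortlevel(level=...)`	`Series.sort_index(level=...)`
`DataFrame.sort_index()`	`DataFrame.sort_index()`
`DataFrame.sortlevel(level=...)`	`DataFrame.sort_index(level=...)`
* `DataFrame.sort()`	`DataFrame.sort_index()`

We have also deprecated and changed similar methods in two Series-like classes, Index and Categorical.

Previous	Replacement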
* `Index.order()`	`Index.sort_values()`
* `Categorical.order()`	`Categorical.sort_values()`
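A short sketch of the replacement calls on made-up data: sort_values for sorting by values, and sort_index(level=...) for sorting by a MultiIndex level. With the default sort_remaining=True, ties on the chosen level are broken by the remaining levels:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})
by_value = df.sort_values(by="a")  # replaces DataFrame.sort(columns="a")

mi = pd.MultiIndex.from_tuples([("b", 2), ("a", 1), ("b", 1)], names=["outer", "inner"])
s = pd.Series([1, 2, 3], index=mi)
by_level = s.sort_index(level="inner")  # replaces Series.sortlevel(level="inner")
```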

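Changes to to_datetime and to_timedelta

Error handling

The default error handling of pd.to_datetime has changed to errors='raise'. In prior versions it was errors='ignore'. Furthermore, the coerce argument has been deprecated in favor of errors='coerce'. This means that invalid parsing will raise rather than return the original input as in previous versions. (GH 10636)

Previous behavior: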

In [2]: pd.to_datetime(['2009-07-31', 'asd'])
Out[2]: array(['2009-07-31', 'asd'], dtype=object)

New behavior:

In [3]: pd.to_datetime(['2009-07-31', 'asd'])
ValueError: Unknown string format

Of course, you can coerce this as well.

In [61]: pd.to_datetime(["2009-07-31", "asd"], errors="coerce")
Out[61]: DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[s]', freq=None)

To keep the previous behavior, you can use errors='ignore':

In [4]: pd.to_datetime(["2009-07-31", "asd"], errors="ignore")
Out[4]: Index(['2009-07-31', 'asd'], dtype='object')

Furthermore, pd.to_timedelta has gained a similar API, errors='raise'|'ignore'|'coerce', and its coerce keyword has been deprecated in favor of errors='coerce'.
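A sketch of the three behaviors side by side, as observed in a recent pandas release. The default raises ValueError on the bad string, while errors="coerce" yields NaT for both to_datetime and to_timedelta:

```python
import pandas as pd

raw = ["2009-07-31", "asd"]
error = None
try:
    pd.to_datetime(raw)  # default errors="raise"
except ValueError as exc:
    error = exc

parsed = pd.to_datetime(raw, errors="coerce")  # unparsable -> NaT
deltas = pd.to_timedelta(["1 day", "bogus"], errors="coerce")  # same API for timedeltas
```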

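Consistent parsing

The string parsing of to_datetime, Timestamp and DatetimeIndex has been made consistent. (GH 7599)

Prior to v0.17.0, Timestamp and to_datetime could incorrectly parse a year-only datetime string using today's date, whereas DatetimeIndex used the beginning of the year. Timestamp and to_datetime could also raise ValueError on some kinds of datetime strings that DatetimeIndex could parse, such as quarterly strings.

Previous behavior: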

In [1]: pd.Timestamp('2012Q2')
Traceback
   ...
ValueError: Unable to parse 2012Q2

# Results in today's date.
In [2]: pd.Timestamp('2014')
Out [2]: 2014-08-12 00:00:00

v0.17.0 parses them as shown below. This also works for DatetimeIndex.

New behavior:

In [62]: pd.Timestamp("2012Q2")
Out[62]: Timestamp('2012-04-01 00:00:00')

In [63]: pd.Timestamp("2014")
Out[63]: Timestamp('2014-01-01 00:00:00')

In [64]: pd.DatetimeIndex(["2012Q2", "2014"])
Out[64]: DatetimeIndex(['2012-04-01', '2014-01-01'], dtype='datetime64[s]', freq=None)

Note

If you want to perform calculations based on today's date, use Timestamp.now() and pandas.tseries.offsets.

In [65]: import pandas.tseries.offsets as offsets

In [66]: pd.Timestamp.now()
Out[66]: Timestamp('2024-08-26 03:54:36.458413')

In [67]: pd.Timestamp.now() + offsets.DateOffset(years=1)
Out[67]: Timestamp('2025-08-26 03:54:36.459509')
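To keep the result reproducible, this sketch uses a fixed anchor date (an illustrative choice) in place of Timestamp.now(). The offset arithmetic is the same:

```python
import pandas as pd
import pandas.tseries.offsets as offsets

# A fixed anchor keeps the result reproducible; use pd.Timestamp.now() in practice
anchor = pd.Timestamp("2015-10-09")
next_year = anchor + offsets.DateOffset(years=1)
month_end = anchor + offsets.MonthEnd()  # roll forward to the end of the month
```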

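Changes to Index comparisons

The equality operator on Index should behave similarly to that on Series (GH 9947, GH 10637)

Starting in v0.17.0, comparing Index objects of different lengths will raise a ValueError. This is for consistency with the behavior of Series.

Previous behavior: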

In [2]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
Out[2]: array([ True, False, False], dtype=bool)

In [3]: pd.Index([1, 2, 3]) == pd.Index([2])
Out[3]: array([False,  True, False], dtype=bool)

In [4]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
Out[4]: False

New behavior:

In [8]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
Out[8]: array([ True, False, False], dtype=bool)

In [9]: pd.Index([1, 2, 3]) == pd.Index([2])
ValueError: Lengths must match to compare

In [10]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
ValueError: Lengths must match to compare

Note that this differs from numpy's behavior, where a comparison can be broadcast:

In [68]: np.array([1, 2, 3]) == np.array([1])
Out[68]: array([ True, False, False])

or it can return False if broadcasting cannot be done:

In [11]: np.array([1, 2, 3]) == np.array([1, 2])
Out[11]: False
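When the lengths may differ, whole-object or membership checks avoid the exception. This sketch contrasts `==` with `Index.equals` and `Index.isin`:

```python
import pandas as pd

a = pd.Index([1, 2, 3])
b = pd.Index([1, 2])

raised = False
try:
    a == b  # element-wise comparison requires equal lengths
except ValueError:
    raised = True

same = a.equals(b)  # whole-object equality: False, never raises
members = a.isin(b)  # membership test works for any lengths
```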

Changes to boolean comparisons vs. None

Boolean comparisons of a Series vs. None are now equivalent to comparing with np.nan, rather than raising TypeError. (GH 1079).

In [69]: s = pd.Series(range(3), dtype="float")

In [70]: s.iloc[1] = None

In [71]: s
Out[71]: 
0    0.0
1    NaN
2    2.0
Length: 3, dtype: float64

Previous behavior:

In [5]: s == None
TypeError: Could not compare <type 'NoneType'> type with Series

New behavior:

In [72]: s == None
Out[72]: 
0    False
1    False
2    False
Length: 3, dtype: bool

Usually you simply want to know which values are null.

In [73]: s.isnull()
Out[73]: 
0    False
1     True
2    False
Length: 3, dtype: bool

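Warning

You will generally want to use isnull/notnull for these types of comparisons, because isnull/notnull tells you which elements are null. Keep in mind that nan values do not compare equal, but None values do. Note that pandas/numpy relies on the fact that np.nan != np.nan, and treats None like np.nan.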

In [74]: None == None
Out[74]: True

In [75]: np.nan == np.nan
Out[75]: False

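HDFStore dropna behavior

For HDFStore write functions with format='table', the default behavior is now to keep rows that are all missing. Previously, rows that were all missing (apart from the index) were dropped. The previous behavior can be replicated with the dropna=True option. (GH 9382)

Previous behavior: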

In [76]: df_with_missing = pd.DataFrame(
   ....:     {"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]}
   ....: )
   ....: 

In [77]: df_with_missing
Out[77]: 
   col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN

[3 rows x 2 columns]

In [27]: df_with_missing.to_hdf('file.h5',
   ....:                        key='df_with_missing',
   ....:                        format='table',
   ....:                        mode='w')

In [28]: pd.read_hdf('file.h5', 'df_with_missing')

Out [28]:
      col1  col2
  0     0     1
  2     2   NaN

New behavior:

In [78]: df_with_missing.to_hdf("file.h5", key="df_with_missing", format="table", mode="w")

In [79]: pd.read_hdf("file.h5", "df_with_missing")
Out[79]: 
   col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN

[3 rows x 2 columns]

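See the docs for more details.

Changes to `display.precision` option

The display.precision option has been clarified to refer to decimal places (GH 10451).

Earlier versions of pandas formatted floating point numbers with one less decimal place than the value of display.precision.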

In [1]: pd.set_option('display.precision', 2)

In [2]: pd.DataFrame({'x': [123.456789]})
Out[2]:
       x
0  123.5

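Interpreting precision as "significant figures" did work for scientific notation, but the same interpretation did not work for values in standard formatting. It was also out of step with how numpy handles formatting.

Going forward, the value of display.precision directly controls the number of digits after the decimal point, for regular formatting as well as scientific notation, similar to how numpy's precision print option works.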

In [80]: pd.set_option("display.precision", 2)

In [81]: pd.DataFrame({"x": [123.456789]})
Out[81]: 
        x
0  123.46

[1 rows x 1 columns]

To preserve the output behavior of prior versions, the default value of display.precision has been reduced from 7 to 6.
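To change the precision only temporarily, the option can be scoped with `pd.option_context`. In this sketch the setting applies only inside the with-block:

```python
import pandas as pd

df = pd.DataFrame({"x": [123.456789]})
# option_context applies the setting only inside the with-block
with pd.option_context("display.precision", 2):
    text = df.to_string()
print(text)
```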

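Changes to `Categorical.unique`

Categorical.unique now returns a new Categorical with unique categories and codes, rather than returning an np.array (GH 10508)

Unordered category: values and categories are sorted in order of appearance.
Ordered category: values are sorted in order of appearance; categories keep their existing order.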

In [82]: cat = pd.Categorical(["C", "A", "B", "C"], categories=["A", "B", "C"], ordered=True)

In [83]: cat
Out[83]: 
['C', 'A', 'B', 'C']
Categories (3, object): ['A' < 'B' < 'C']

In [84]: cat.unique()
Out[84]: 
['C', 'A', 'B']
Categories (3, object): ['A' < 'B' < 'C']

In [85]: cat = pd.Categorical(["C", "A", "B", "C"], categories=["A", "B", "C"])

In [86]: cat
Out[86]: 
['C', 'A', 'B', 'C']
Categories (3, object): ['A', 'B', 'C']

In [87]: cat.unique()
Out[87]: 
['C', 'A', 'B']
Categories (3, object): ['A', 'B', 'C']
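A quick check of the ordered case, assuming a recent pandas release. The result is still a Categorical, values appear in order of first occurrence, and the ordered categories are preserved:

```python
import pandas as pd

cat = pd.Categorical(["C", "A", "B", "C"], categories=["A", "B", "C"], ordered=True)
uniq = cat.unique()
```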

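Changes to `bool` passed as `header` in parsers

In earlier versions of pandas, if a bool was passed as the header argument of read_csv, read_excel, or read_html, it was implicitly converted to an integer, resulting in header=0 for False and header=1 for True (GH 6113)

A bool input to header now raises a TypeError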

In [29]: df = pd.read_csv('data.csv', header=False)
TypeError: Passing a bool to header is invalid. Use header=None for no header or
header=int or list-like of ints to specify the row(s) making up the column names
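A sketch using an in-memory CSV (the data is made up). header=False now raises, while header=None is the explicit way to say there is no header row:

```python
import io

import pandas as pd

data = "a,b\n1,2\n"
raised = False
try:
    pd.read_csv(io.StringIO(data), header=False)
except TypeError:
    raised = True

# header=None means "no header row": every line becomes data
no_header = pd.read_csv(io.StringIO(data), header=None)
```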

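Other API changes

Line and kde plots with subplots=True now use default colors, not all black. Specify color='k' to draw all lines in black (GH 9894)
Calling the .value_counts() method on a Series with a categorical dtype now returns a Series with a CategoricalIndex (GH 10704)
The metadata properties of subclasses of pandas objects are now serialized (GH 10553).
groupby using Categorical follows the same rule as Categorical.unique described above (GH 10508)
When constructing a DataFrame with an array of complex64 dtype, the corresponding column was previously promoted automatically to the complex128 dtype. pandas now preserves the itemsize of the input for complex data (GH 10952)
Some numeric reduction operators now raise TypeError, rather than ValueError, on object types that include both strings and numbers (GH 11131)
Passing the currently unsupported chunksize argument to read_excel or ExcelFile.parse now raises NotImplementedError (GH 8011)
Allow an ExcelFile object to be passed to read_excel (GH 11198)
DatetimeIndex.union does not infer freq if self and the input both have None as freq (GH 11086)

NaT's methods now either raise ValueError, or return np.nan or NaT (GH 9513)

Behavior	Methods
return `np.nan`	`weekday`, `isoweekday`
return `NaT`	`date`, `now`, `replace`, `to_datetime`, `today`
return `np.datetime64('NaT')`	`to_datetime64` (unchanged)
raise `ValueError`	All other public methods (names not beginning with an underscore)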
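A few of these behaviors, checked against a recent pandas release (the method lists may have shifted since 0.17.0):

```python
import numpy as np
import pandas as pd

wd = pd.NaT.weekday()  # np.nan
d = pd.NaT.date()  # NaT
dt64 = pd.NaT.to_datetime64()  # numpy.datetime64('NaT')
```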

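Deprecations

For Series, the following indexing functions are deprecated (GH 10177).

Deprecated function	Replacement
`.irow(i)`	`.iloc[i]` or `.iat[i]`
`.iget(i)`	`.iloc[i]` or `.iat[i]`
`.iget_value(i)`	`.iloc[i]` or `.iat[i]`

For DataFrame, the following indexing functions are deprecated (GH 10177).

Deprecated function	Replacement
`.irow(i)`	`.iloc[i]`
`.iget_value(i, j)`	`.iloc[i, j]` or `.iat[i, j]`
`.icol(j)`	`.iloc[:, j]`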

Note

These indexing functions have been deprecated in the documentation since 0.11.0.
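The replacements on small made-up objects: iloc for positional rows and columns, iat for a single scalar:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

first = s.iloc[0]  # was s.irow(0) / s.iget(0)
cell = df.iat[1, 0]  # was df.iget_value(1, 0)
second_col = df.iloc[:, 1]  # was df.icol(1)
```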

Categorical.name is deprecated, to make Categorical more like numpy.ndarray. Use Series(cat, name="whatever") instead (GH 10482).
Setting missing values (NaN) in a Categorical's categories will issue a warning (GH 10748). You can still have missing values in the values.
The take_last keyword of drop_duplicates and duplicated is deprecated in favor of keep. (GH 6511, GH 8505)
The take_last keyword of Series.nsmallest and nlargest is deprecated in favor of keep. (GH 10792)
DataFrame.combineAdd and DataFrame.combineMult are deprecated. They can easily be replaced with the add and mul methods: DataFrame.add(other, fill_value=0) and DataFrame.mul(other, fill_value=1.) (GH 10735).
TimeSeries is deprecated in favor of Series (note that it has been an alias since 0.13.0) (GH 10890)
SparsePanel is deprecated and will be removed in a future version (GH 11157).
Series.is_time_series is deprecated in favor of Series.index.is_all_dates (GH 11135)
Legacy offsets (like 'A@JAN') are deprecated (note that they have been aliases since 0.8.0) (GH 10878)
WidePanel is deprecated in favor of Panel, and LongPanel in favor of DataFrame (note that these have been aliases since before 0.11.0) (GH 10892)
DataFrame.convert_objects is deprecated in favor of the type-specific functions pd.to_datetime, pd.to_timedelta and pd.to_numeric (new in 0.17.0) (GH 11133).
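A sketch of the suggested combineAdd/combineMult replacements on two partially overlapping frames (data made up). fill_value stands in for a value that is missing on only one side; positions missing on both sides would stay NaN:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan]}, index=[0, 1])
b = pd.DataFrame({"x": [10.0, 20.0]}, index=[1, 2])

summed = a.add(b, fill_value=0)  # replaces a.combineAdd(b)
product = a.mul(b, fill_value=1)  # replaces a.combineMult(b)
```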

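Removal of prior version deprecations/changes

Removal of the na_last parameter from Series.order() and Series.sort(), in favor of na_position. (GH 5231)
Removal of percentile_width from .describe(), in favor of percentiles. (GH 7088)
Removal of the colSpace parameter from DataFrame.to_string(), in favor of col_space, deprecated circa version 0.8.0.

Removal of automatic time series broadcasting (GH 2304)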

In [88]: np.random.seed(1234)

In [89]: df = pd.DataFrame(
   ....:     np.random.randn(5, 2),
   ....:     columns=list("AB"),
   ....:     index=pd.date_range("2013-01-01", periods=5),
   ....: )
   ....: 

In [90]: df
Out[90]: 
                   A         B
2013-01-01  0.471435 -1.190976
2013-01-02  1.432707 -0.312652
2013-01-03 -0.720589  0.887163
2013-01-04  0.859588 -0.636524
2013-01-05  0.015696 -2.242685

[5 rows x 2 columns]

Previously

In [3]: df + df.A
FutureWarning: TimeSeries broadcasting along DataFrame index by default is deprecated.
Please use DataFrame.<op> to explicitly broadcast arithmetic operations along the index

Out[3]:
                    A         B
2013-01-01  0.942870 -0.719541
2013-01-02  2.865414  1.120055
2013-01-03 -1.441177  0.166574
2013-01-04  1.719177  0.223065
2013-01-05  0.031393 -2.226989

Current

In [91]: df.add(df.A, axis="index")
Out[91]: 
                   A         B
2013-01-01  0.942870 -0.719541
2013-01-02  2.865414  1.120055
2013-01-03 -1.441177  0.166574
2013-01-04  1.719177  0.223065
2013-01-05  0.031393 -2.226989

[5 rows x 2 columns]
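The same explicit broadcasting on a small deterministic frame: axis="index" aligns the Series with the row labels and applies it to every column:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [10.0, 20.0]})
# Align df["A"] with the row index and add it to every column
added = df.add(df["A"], axis="index")
```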

Removal of the table keyword in HDFStore.put/append, in favor of using format= (GH 4645)
Removal of kind in read_excel/ExcelFile, as it was unused (GH 4712)
Removal of the infer_type keyword from pd.read_html, as it was unused (GH 4770, GH 7032)
Removal of the offset and timeRule keywords from Series.tshift/shift, in favor of freq (GH 4853, GH 4864)
Removal of the pd.load/pd.save aliases, in favor of pd.to_pickle/pd.read_pickle (GH 3787)
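A sketch of two of the replacements: a pickle round trip through a temporary file (the file name is arbitrary), and freq= in place of the removed shift keywords:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.pkl")
    df.to_pickle(path)  # replaces pd.save
    restored = pd.read_pickle(path)  # replaces pd.load

ts = pd.Series([1, 2], index=pd.date_range("2020-01-01", periods=2))
# freq= replaces the removed offset/timeRule keywords: shift the index by one day
shifted = ts.shift(1, freq="D")
```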

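Performance improvements

Development support for benchmarking with the Air Speed Velocity library (GH 8361)
Added vbench benchmarks for alternative ExcelWriter engines and for reading Excel files (GH 7171)
Performance improvements in Categorical.value_counts (GH 10804)
Performance improvements in SeriesGroupBy.nunique, SeriesGroupBy.value_counts and SeriesGroupby.transform (GH 10820, GH 11077)
Performance improvements in DataFrame.drop_duplicates with integer dtypes (GH 10917)
Performance improvements in DataFrame.duplicated with wide frames. (GH 10161, GH 11180)
4x improvement in timedelta string parsing (GH 6755, GH 10426)
8x improvement in timedelta64 and datetime64 ops (GH 6755)
Significantly improved performance of indexing a MultiIndex with slicers (GH 10287)
8x improvement in iloc with list-like input (GH 10791)
Improved performance of Series.isin for datetimelike/integer Series (GH 10287)
20x improvement in concat of Categoricals when the categories are identical (GH 10587)
Improved performance of to_datetime when the specified format string is ISO8601 (GH 10178)
2x improvement in Series.value_counts for float dtype (GH 10821)
Enable infer_datetime_format in to_datetime when date components are not zero-padded (GH 11142)
Regression from 0.16.1 in constructing a DataFrame from a nested dict (GH 11084)
Performance improvements in addition/subtraction operations of DateOffset with Series or DatetimeIndex (GH 10744, GH 11205)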

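Bug fixes

Bug in incorrect computation of .mean() on timedelta64[ns] because of overflow (GH 9442)
Bug in .isin on older numpy versions (GH 11232)
Bug in DataFrame.to_html(index=False) rendering an unnecessary name row (GH 10344)
Bug in DataFrame.to_latex() where the column_format argument could not be passed (GH 9402)
Bug in DatetimeIndex when localizing with NaT (GH 10477)
Bug in Series.dt ops in preserving metadata (GH 10477)
Bug in preserving NaT when passed to the to_datetime constructor (GH 10477)
Bug in DataFrame.apply when the function returns a categorical Series. (GH 9573)
Bug in to_datetime when invalid dates and formats are supplied (GH 10154)
Bug in Index.drop_duplicates dropping the name(s) (GH 10115)
Bug in Series.quantile dropping the name (GH 10881)
Bug in pd.Series when setting a value on an empty Series whose index has a frequency. (GH 10193)
Bug in pd.Series.interpolate with invalid order keyword values. (GH 10633)
Bug in DataFrame.plot raising ValueError when a color name consists of multiple characters (GH 10387)
Bug in Index construction with a mixed list of tuples (GH 10697)
Bug in DataFrame.reset_index when the index contains NaT. (GH 10388)
Bug in ExcelReader when the worksheet is empty (GH 6403)
Bug in BinGrouper.group_info where the returned values were not compatible with the base class (GH 10914)
Bug in clearing the cache on DataFrame.pop and a subsequent inplace op (GH 10912)
Bug in indexing with a mixed-integer Index causing an ImportError (GH 10610)
Bug in Series.count when the index has nulls (GH 10946)
Bug in pickling of a non-regular freq DatetimeIndex (GH 11002)
Bug causing DataFrame.where to not respect the axis parameter when the frame has a symmetric shape. (GH 9736)
Bug in Table.select_column where the name was not preserved (GH 10392)
Bug in offsets.generate_range where start and end have finer precision than offset (GH 9907)
Bug in pd.rolling_* where Series.name would be lost in the output (GH 10565)
Bug in stack when the index or columns are not unique. (GH 10417)
Bug in setting a Panel when an axis has a MultiIndex (GH 10360)
Bug in USFederalHolidayCalendar where USMemorialDay and USMartinLutherKingJr were incorrect (GH 10278 and GH 9760)
Bug in .sample() where the returned object, if set, gave an unnecessary SettingWithCopyWarning (GH 10738)
Bug in .sample() where weights passed as a Series were not aligned along the axis before being treated positionally, potentially causing problems if the weight indices were not aligned with the sampled object. (GH 10738)
Fixed a regression (GH 9311, GH 6620, GH 9345) where groupby with a datetime-like was converted to float with certain aggregators (GH 10979)
Bug in DataFrame.interpolate with axis=1 and inplace=True (GH 10395)
Bug in io.sql.get_schema when specifying multiple columns as the primary key (GH 10385).
Bug in groupby(sort=False) with a datetime-like Categorical raising ValueError (GH 10505)
Bug in groupby(axis=1) with filter() throwing IndexError (GH 11041)
Bug in test_categorical on big-endian builds (GH 10425)
Bug in Series.shift and DataFrame.shift not supporting categorical data (GH 9416)
Bug in Series.map using a categorical Series raising AttributeError (GH 10324)
Bug in MultiIndex.get_level_values including Categorical raising AttributeError (GH 10460)
Bug in pd.get_dummies with sparse=True not returning a SparseDataFrame (GH 10531)
Bug in Index subtypes (such as PeriodIndex) not returning their own type for the .drop and .insert methods (GH 10620)
Bug in algos.outer_join_indexer when the right array is empty (GH 10618)
Bug in filter (regression from 0.16.0) and transform when grouping on multiple keys, one of which is datetime-like (GH 10114)
Bug in to_datetime and to_timedelta causing the Index name to be lost (GH 10875)
Bug in len(DataFrame.groupby) raising IndexError when there is a column containing only NaNs (GH 11016)
Bug that caused a segfault when resampling an empty Series (GH 10228)
Bug in DatetimeIndex and PeriodIndex.value_counts resetting the name of the result, but retaining it in the result's Index. (GH 10150)
Bug in pd.eval using the numexpr engine coercing a 1-element numpy array to a scalar (GH 10546)
Bug in pd.concat with axis=0 when a column is of dtype category (GH 10177)
Bug in read_msgpack where the input type was not always checked (GH 10369, GH 10630)
Bug in pd.read_csv with the kwargs index_col=False, index_col=['a', 'b'] or dtype (GH 10413, GH 10467, GH 10577)
Bug in Series.from_csv with the header kwarg not setting Series.name or Series.index.name (GH 10483)
Bug in groupby.var causing the variance to be inaccurate for small float values (GH 10448)
Bug in Series.plot(kind='hist') where the Y label was not informative (GH 10485)
Bug in read_csv when using a converter that generates a uint8 type (GH 9266)
Bug causing a memory leak in time-series line and area plots (GH 9003)
Bug when setting a Panel sliced along the major or minor axes when the right-hand side is a DataFrame (GH 11014)
Bug that returned None and did not raise NotImplementedError when operator functions (e.g. .add) of Panel are not implemented (GH 7692)
Bug where line and kde plots could not accept multiple colors when subplots=True (GH 9894)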
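Bug in align of a Series with MultiIndex where left and right may be inverted (GH 10665)
Bug in join with MultiIndex where left and right may be inverted (GH 10741)
Bug in read_stata when reading a file with a different order set in columns (GH 10757)
Bug in Categorical where it may not be represented properly when the categories contain tz or Period (GH 10713)
Bug in Categorical.__iter__ where it may not return the correct datetime and Period (GH 10713)
Bug in indexing with a PeriodIndex on an object with a PeriodIndex (GH 4125)
Bug in read_csv with engine='c': an EOF preceded by a comment, blank line, etc. was not handled correctly (GH 10728, GH 10548)
Reading "famafrench" data via DataReader resulted in an HTTP 404 error because the website URL changed (GH 10591).
Bug in read_msgpack where the DataFrame to decode has duplicate column names (GH 9618)
Bug in io.common.get_filepath_or_buffer which caused reading of valid S3 files to fail if the bucket also contained keys for which the user does not have read permission (GH 10604)
Bug in vectorized setting of timestamp columns with python datetime.date and numpy datetime64 (GH 10408, GH 10412)
Bug in Index.take where it may add an unnecessary freq attribute (GH 10791)
Bug in merge with an empty DataFrame that may raise IndexError (GH 10824)
Bug in to_latex raising unexpected keyword argument errors for some documented arguments (GH 10888)
Bug in indexing of a large DataFrame where IndexError was uncaught (GH 10645 and GH 10692)
Bug in read_csv when using the nrows or chunksize parameters if the file contains only a header line (GH 9535)
Bug in serialization of category types in HDF5 in the presence of alternate encodings. (GH 10366)
Bug in pd.DataFrame when constructing an empty DataFrame with a string dtype (GH 9428)
Bug in pd.DataFrame.diff when the DataFrame is not consolidated (GH 10907)
Bug in pd.unique for arrays with the datetime64 or timedelta64 dtype, which returned an array with object dtype instead of the original dtype (GH 9431)
Bug in Timedelta raising an error when slicing from 0s (GH 10583)
Bug in DatetimeIndex.take and TimedeltaIndex.take where they may not raise IndexError for an invalid index (GH 10295)
Bug in Series([np.nan]).astype('M8[ms]'), which now returns Series([pd.NaT]) (GH 10747)
Bug in PeriodIndex.order resetting freq (GH 10295)
Bug in date_range when freq divides end as nanoseconds (GH 10885)
Bug in iloc allowing memory outside the bounds of a Series to be accessed with negative integers (GH 10779)
Bug in read_msgpack where the encoding was not respected (GH 10581)
Bug preventing access to the first index when using iloc with a list containing the appropriate negative integer (GH 10547, GH 10779)
Bug in the TimedeltaIndex formatter causing an error when saving a DataFrame with a TimedeltaIndex using to_csv (GH 10833)
Bug in DataFrame.where when handling Series slicing (GH 10218, GH 9558)
Bug where pd.read_gbq raised ValueError when BigQuery returned zero rows (GH 10273)
Bug in to_json causing a segmentation fault when serializing a 0-rank ndarray (GH 9576)
Bug in plotting functions that may raise IndexError when plotting on a GridSpec (GH 10819)
Bug in plot results that may show unnecessary minor tick labels (GH 10657)
Bug in groupby computing incorrect aggregations (e.g. first, last, min) on a DataFrame with NaT. (GH 10590, GH 11010)
Bug when constructing a DataFrame where passing a dictionary with only scalar values and specifying columns did not raise an error (GH 10856)
Bug in .var() causing roundoff errors for highly similar values (GH 10242)
Bug in DataFrame.plot(subplots=True) with duplicated columns producing incorrect results (GH 10962)
Bug in Index arithmetic that may result in an incorrect class (GH 10638)
Bug in date_range returning an empty result if freq is a negative number of years, quarters or months (GH 11018)
Bug in DatetimeIndex not being able to infer a negative freq (GH 11018)
Remove use of some deprecated numpy comparison operations, mainly in tests. (GH 10569)
Bug in Index where the dtype may not be applied properly (GH 11017)
Bug in io.gbq when testing for the minimum google api client version (GH 10652)
Bug in DataFrame construction from a nested dict with timedelta keys (GH 11129)
Bug in .fillna that may raise TypeError when the data contains a datetime dtype (GH 7095, GH 11153)
Bug in .groupby when the number of keys to group by is the same as the length of the index (GH 11185)
Bug in convert_objects where converted values might not be returned if all values are null and coerce is set (GH 9589)
Bug in convert_objects where the copy keyword was not respected (GH 9589)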

Contributors

A total of 112 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Alex Rothberg
Andrea Bedini +
Andrew Rosenfeld
Andy Hayden
Andy Li +
Anthonios Partheniou +
Artemy Kolchinsky
Bernard Willers
Charlie Clark +
Chris +
Chris Whelan
Christoph Gohlke +
Christopher Whelan
Clark Fitzgerald
Clearfield Christopher +
Dan Ringwalt +
Daniel Ni +
Data & Code Expert Experimenting with Code on Data +
David Cottrell
David John Gagne +
David Kelly +
ETF +
Eduardo Schettino +
Egor +
Egor Panfilov +
Evan Wright
Frank Pinter +
Gabriel Araujo +
Garrett-R
Gianluca Rossi +
Guillaume Gay
Guillaume Poulin
Harsh Nisar +
Ian Henriksen +
Ian Hoegen +
Jaidev Deshpande +
Jan Rudolph +
Jan Schulz
Jason Swails +
Jeff Reback
Jonas Buyl +
Joris Van den Bossche
Joris Vankerschaver +
Josh Levy-Kramer +
Julien Danjou
Ka Wo Chen
Karrie Kehoe +
Kelsey Jordahl
Kerby Shedden
Kevin Sheppard
Lars Buitinck
Leif Johnson +
Luis Ortiz +
Mac +
Matt Gambogi +
Matt Savoie +
Matthew Gilbert +
Maximilian Roos +
Michelangelo D’Agostino +
Mortada Mehyar
Nick Eubank
Nipun Batra
Ondřej Čertík
Phillip Cloud
Pratap Vardhan +
Rafal Skolasinski +
Richard Lewis +
Rinoc Johnson +
Rob Levy
Robert Gieseke
Safia Abdalla +
Samuel Denny +
Saumitra Shahapure +
Sebastian Pölsterl +
Sebastian Rubbert +
Sheppard, Kevin +
Sinhrks
Siu Kwan Lam +
Skipper Seabold
Spencer Carrucciu +
Stephan Hoyer
Stephen Hoover +
Stephen Pascoe +
Terry Santegoeds +
Thomas Grainger
Tjerk Santegoeds +
Tom Augspurger
Vincent Davis +
Winterflower +
Yaroslav Halchenko
Yuan Tang (Terry) +
agijsberts
ajcr +
behzad nouri
cel4
chris-b1 +
cyrusmaher +
davidovitch +
ganego +
jreback
juricast +
larvian +
maximilianr +
msund +
rekcahpassyla
robertzk +
scls19fr
seth-p
sinhrks
springcoil +
terrytangyuan +
tzinckgraf +
