Version 0.17.1 (November 21, 2015)

Note

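We are proud to announce that pandas has become a sponsored project of the NumFOCUS organization. This will help ensure the success of pandas' development as a world-class open-source project.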

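This is a minor bug-fix release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.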

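Highlights include:

Support for conditional HTML formatting (see the Conditional HTML formatting section)
Releasing the GIL in the CSV reader and other operations (see Performance improvements)
Fixed a regression in DataFrame.drop_duplicates from 0.16.2 that caused incorrect results on integer values (GH 11376)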

New features

Conditional HTML formatting

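Warning

This is a new feature under active development. We'll be adding features, and possibly making breaking changes, in future releases. Feedback is welcome in GH 11610.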

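We've added *experimental* support for conditional HTML formatting: visual styling of a DataFrame based on its data. Styling is done with HTML and CSS. Access the styler class through the pandas.DataFrame.style attribute, which returns a Styler instance with your data attached.

Here's a quick example: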

In [1]: np.random.seed(123)

In [2]: df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))

In [3]: html = df.style.background_gradient(cmap="viridis", low=0.5)

We can render the HTML to get the following table.

	a	b	c	d	e
0	-1.085631	0.997345	0.282978	-1.506295	-0.5786
1	1.651437	-2.426679	-0.428913	1.265936	-0.86674
2	-0.678886	-0.094709	1.49139	-0.638902	-0.443982
3	-0.434351	2.20593	2.186786	1.004054	0.386186
4	0.737369	1.490732	-0.935834	1.175829	-1.253881
5	-0.637752	0.907105	-1.428681	-0.140069	-0.861755
6	-0.255619	-2.798589	-1.771533	-0.699877	0.927462
7	-0.173636	0.002846	0.688223	-0.879536	0.283627
8	-0.805367	-1.727669	-0.3909	0.573806	0.338589
9	-0.01183	2.392365	0.412912	0.978736	2.238143

Styler interacts nicely with the Jupyter Notebook. See the documentation for more.

Enhancements

DatetimeIndex now supports conversion to strings with astype(str) (GH 10442)
Support for compression (gzip/bz2) in pandas.DataFrame.to_csv() (GH 7615)
pd.read_* functions can now also accept pathlib.Path or py._path.local.LocalPath objects for the filepath_or_buffer argument (GH 11033)
The DataFrame and Series functions .to_csv(), .to_html() and .to_latex() can now handle paths beginning with a tilde (e.g. ~/Documents/) (GH 11438)
DataFrame now uses the fields of a namedtuple as columns if columns are not supplied (GH 11181)
DataFrame.itertuples() now returns namedtuple objects when possible (GH 11269, GH 11625)
Added axvlines_kwds to the parallel coordinates plot (GH 10709)
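The namedtuple, itertuples, compression and pathlib items above can be exercised together in one short round trip. This is a minimal sketch against a current pandas install, not code from the release itself; the `Point` tuple, the `points.csv.gz` file name and the temporary directory are illustrative assumptions.

```python
import gzip
import tempfile
from collections import namedtuple
from pathlib import Path

import pandas as pd

# Hypothetical record type used only for this illustration
Point = namedtuple("Point", ["x", "y"])

# No columns= argument: the namedtuple fields become the column names
df = pd.DataFrame([Point(1, 2), Point(3, 4)])

# itertuples() yields namedtuples; the index is exposed as the "Index" field
rows = list(df.itertuples())

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "points.csv.gz"  # a pathlib.Path, not a str

    # Write a gzip-compressed CSV
    df.to_csv(path, index=False, compression="gzip")

    # Confirm the bytes on disk really are gzip by decompressing them directly
    with gzip.open(path, "rt") as fh:
        header = fh.readline().strip()

    # read_csv accepts the same pathlib.Path object
    roundtrip = pd.read_csv(path, compression="gzip")

print(list(df.columns), rows[0], header)
```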

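Added an option to .info() and .memory_usage() for deep introspection of memory consumption. Note that this can be expensive to compute, which is why it is an optional parameter. (GH 11595)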

In [4]: df = pd.DataFrame({"A": ["foo"] * 1000})

In [5]: df["B"] = df["A"].astype("category")

# shows the '+' as we have object dtypes
In [6]: df.info()
<class 'pandas.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   A       1000 non-null   object  
 1   B       1000 non-null   category
dtypes: category(1), object(1)
memory usage: 8.9+ KB

# we have an accurate memory assessment (but can be expensive to compute this)
In [7]: df.info(memory_usage="deep")
<class 'pandas.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   A       1000 non-null   object  
 1   B       1000 non-null   category
dtypes: category(1), object(1)
memory usage: 59.8 KB
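The gap between the two reports comes from how object columns are measured: the shallow figure counts only the pointers stored in the column, while the deep figure also sizes every Python string they reference. The sketch below shows the same effect per column with `memory_usage`; the explicit `dtype=object` is an assumption added so the example behaves the same on pandas versions that default to a dedicated string dtype.

```python
import pandas as pd

# Object column of 1000 identical strings, plus a categorical copy of it
df = pd.DataFrame({"A": pd.Series(["foo"] * 1000, dtype=object)})
df["B"] = df["A"].astype("category")

shallow = df.memory_usage(index=False)          # pointer sizes only for object data
deep = df.memory_usage(index=False, deep=True)  # also counts the string objects

for col in df.columns:
    print(f"{col}: shallow={shallow[col]} bytes, deep={deep[col]} bytes")
```

The categorical column stays small under both measurements because it stores one integer code per row plus a single copy of each category.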

Index now has a fillna method (GH 10089)

In [8]: pd.Index([1, np.nan, 3]).fillna(2)
Out[8]: Index([1.0, 2.0, 3.0], dtype='float64')

Series of type category now make the .str.<...> and .dt.<...> accessor methods and properties available if the categories are of that type. (GH 10661)

In [9]: s = pd.Series(list("aabb")).astype("category")

In [10]: s
Out[10]: 
0    a
1    a
2    b
3    b
Length: 4, dtype: category
Categories (2, object): ['a', 'b']

In [11]: s.str.contains("a")
Out[11]: 
0     True
1     True
2    False
3    False
Length: 4, dtype: bool

In [12]: date = pd.Series(pd.date_range("1/1/2015", periods=5)).astype("category")

In [13]: date
Out[13]: 
0   2015-01-01
1   2015-01-02
2   2015-01-03
3   2015-01-04
4   2015-01-05
Length: 5, dtype: category
Categories (5, datetime64[ns]): [2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05]

In [14]: date.dt.day
Out[14]: 
0    1
1    2
2    3
3    4
4    5
Length: 5, dtype: int32

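pivot_table now has a margins_name argument, so you can use something other than the default of 'All' (GH 3335)
Implemented export of datetime64[ns, tz] dtypes with a fixed HDF5 store (GH 11411)
Pretty printing of sets (e.g. in DataFrame cells) now uses set literal syntax ({x, y}) instead of the legacy Python syntax (set([x, y])) (GH 11215)
Improved the error message in pandas.io.gbq.to_gbq() when a streaming insert fails (GH 11285) and when the DataFrame does not match the schema of the destination table (GH 11359)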
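A small sketch of `margins_name` in use; the `region`/`product`/`sales` frame is made-up illustration data.

```python
import pandas as pd

# Illustrative sales data
df = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west"],
        "product": ["a", "b", "a", "b"],
        "sales": [10, 20, 30, 40],
    }
)

# margins=True adds a totals row and column; margins_name relabels them
table = pd.pivot_table(
    df,
    values="sales",
    index="region",
    columns="product",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
print(table)
```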

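API changes

Raise NotImplementedError in Index.shift for unsupported index types (GH 8038)
min and max reductions on datetime64 and timedelta64 dtyped Series now result in NaT rather than nan (GH 11245)
Indexing with a null key now raises a TypeError instead of a ValueError (GH 11356)
Series.ptp now ignores missing values by default (GH 11163)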
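Two of these changes are easy to observe directly. The sketch below checks that an all-missing datetime64 Series reduces to `NaT`, and that `Index.shift` on a plain integer `Index` raises `NotImplementedError`; the error message text may differ between pandas versions, so only the exception type is checked. (`Series.ptp` is left out because later pandas versions removed it.)

```python
import pandas as pd

# An all-missing datetime64 Series and a timedelta64 Series with a gap
dates = pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]")
deltas = pd.Series([pd.Timedelta("1D"), pd.NaT])

print(dates.min())   # NaT, not a float nan
print(deltas.max())  # the missing value is skipped

# Index.shift is only implemented for datetime-like indexes
try:
    pd.Index([1, 2, 3]).shift(1)
    shift_error = None
except NotImplementedError as exc:
    shift_error = exc
```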

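Deprecations

The pandas.io.ga module, which implements Google Analytics support, is deprecated and will be removed in a future version (GH 11308)
Deprecated the engine keyword in .to_csv(); it will be removed in a future version (GH 11274)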

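Performance improvements

Check monotonicity before sorting on an index (GH 11080)
Series.dropna performance improvement when its dtype cannot contain NaN (GH 11159)
Release the GIL in most datetime field operations (e.g. DatetimeIndex.year, Series.dt.year), in normalization, and in conversion to and from Period, DatetimeIndex.to_period and PeriodIndex.to_timestamp (GH 11263)
Release the GIL in some rolling algorithms: rolling_median, rolling_mean, rolling_max, rolling_min, rolling_var, rolling_kurt, rolling_skew (GH 11450)
Release the GIL when reading and parsing text files in read_csv and read_table (GH 11272)
Improved performance of rolling_median (GH 11450)
Improved performance of to_excel (GH 11352)
Fixed a performance bug in the repr of Categorical categories, which rendered the strings before truncating them for display (GH 11305)
Performance improvement in Categorical.remove_unused_categories (GH 11643)
Improved performance of the Series constructor with no data and a DatetimeIndex (GH 11433)
Improved performance of shift, cumprod, and cumsum with groupby (GH 4095)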
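Because read_csv releases the GIL while parsing (GH 11272), several independent files can be parsed from a thread pool. The sketch below uses in-memory CSV text and `concurrent.futures`; the payloads are illustrative, and whether this actually beats sequential parsing depends on file size, core count and the parser engine, so treat it as a usage pattern rather than a benchmark.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Illustrative payloads: four small CSV documents held in memory
payloads = [
    "a,b\n" + "".join(f"{i},{i * k}\n" for i in range(1000))
    for k in range(1, 5)
]


def parse(text: str) -> pd.DataFrame:
    # Each call parses its own buffer; the C parser can release the GIL here
    return pd.read_csv(io.StringIO(text))


with ThreadPoolExecutor(max_workers=4) as pool:
    frames = list(pool.map(parse, payloads))  # results keep input order

print([len(f) for f in frames])
```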

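Bug fixes

SparseArray.__iter__() no longer causes a PendingDeprecationWarning in Python 3.5 (GH 11622)
Regression from 0.16.2 in the output formatting of long floats/nan, restored in (GH 11302)
Series.sort_index() now correctly handles the inplace option (GH 11402)
Incorrectly distributed .c file in the PyPI build: reading a CSV of floats and passing na_values=<a scalar> would raise an exception (GH 11374)
Bug in .to_latex() producing broken output when the index has a name (GH 10660)
Bug in HDFStore.append with strings whose encoded length exceeded the maximum unencoded length (GH 11234)
Bug in merging datetime64[ns, tz] dtypes (GH 11405)
Bug in HDFStore.select when comparing with a numpy scalar in a where clause (GH 11283)
Bug in using DataFrame.ix with a MultiIndex indexer (GH 11372)
Bug in date_range with ambiguous endpoints (GH 11626)
Prevent adding new attributes to the .str, .dt and .cat accessors. Retrieving such a value was not possible, so setting one now raises an error. (GH 10673)
Bug in tz-conversions with an ambiguous time and the .dt accessor (GH 11295)
Bug in output formatting when using an index of ambiguous times (GH 11619)
Bug in comparisons of Series with list-likes (GH 11339)
Bug in DataFrame.replace with a datetime64[ns, tz] and a non-compatible to_replace (GH 11326, GH 11153)
Bug in isnull where numpy.datetime64('NaT') in a numpy.array was not determined to be null (GH 11206)
Bug in list-like indexing with a mixed-integer Index (GH 11320)
Bug in pivot_table with margins=True when indexes are of Categorical dtype (GH 10993)
Bug in DataFrame.plot that prevented the use of hex string colors (GH 10299)
Regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH 11376)
Bug in pd.eval where unary operators in a list raised an error (GH 11235)
Bug in squeeze() with zero-length arrays (GH 11230, GH 8999)
Bug in describe() dropping column names for hierarchical indexes (GH 11517)
Bug in DataFrame.pct_change() not propagating the axis keyword to the .fillna method (GH 11150)
Bug in .to_csv() when a mix of integer and string column names is passed as the columns parameter (GH 11637)
Bug in indexing with a range (GH 11652)
Bug in inference of numpy scalars and preservation of dtype when setting columns (GH 11638)
Bug in to_sql with Unicode column names raising a UnicodeEncodeError (GH 11431)
Fixed a regression in setting xticks in plot (GH 11529)
Bug in holiday.dates where observance rules could not be applied to holidays, plus documentation enhancements (GH 11477, GH 11533)
Fixed plotting issues when using plain Axes instances instead of SubplotAxes (GH 11520, GH 11556)
Bug in DataFrame.to_latex() producing an extra rule when header=False (GH 7124)
Bug in df.groupby(...).apply(func) when func returns a Series containing a new datetimelike column (GH 11324)
Bug in pandas.json when the file to load is large (GH 11344)
Bugs in to_excel with duplicate columns (GH 11007, GH 10982, GH 10970)
Fixed a bug that prevented the construction of an empty Series of dtype datetime64[ns, tz] (GH 11245)
Bug in read_excel with a MultiIndex containing integers (GH 11317)
Bug in to_excel with openpyxl 2.2+ and merging (GH 11408)
Bug in DataFrame.to_dict() producing np.datetime64 objects instead of Timestamp when only datetimes are present in the data (GH 11327)
Bug in DataFrame.corr() raising an exception when computing the Kendall correlation for DataFrames with boolean and non-boolean columns (GH 11560)
Bug causing a link-time error from C inline functions on FreeBSD 10+ (with clang) (GH 10510)
Bug in DataFrame.to_csv when passing through arguments for formatting MultiIndexes, including date_format (GH 7791)
Bug in DataFrame.join() with how='right' producing a TypeError (GH 11519)
Bug in Series.quantile where an empty-list result had an Index with object dtype (GH 11588)
Bug in pd.merge producing an empty Int64Index rather than Index(dtype=object) when the merge result is empty (GH 11588)
Bug in Categorical.remove_unused_categories when NaN values are present (GH 11599)
Bug in DataFrame.to_sparse() losing column names for MultiIndexes (GH 11600)
Bug in DataFrame.round() with a non-unique column index producing a fatal Python error (GH 11611)
Bug in DataFrame.round() with decimals given as a non-uniquely indexed Series producing extra columns (GH 11618)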
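Two of the fixed code paths, drop_duplicates on integer data (GH 11376) and DataFrame.round with a Series of per-column decimals (GH 11618), are sketched below with made-up data as a quick way to check the expected behaviour on a current install.

```python
import pandas as pd

# Integer data containing one fully duplicated row (index 1 repeats index 0)
df = pd.DataFrame({"key": [1, 1, 2, 2, 3], "val": [10, 10, 20, 21, 30]})

deduped = df.drop_duplicates()               # compares whole rows
by_key = df.drop_duplicates(subset=["key"])  # keeps the first row per key

# Per-column precision: the Series index names the columns to round
prices = pd.DataFrame({"a": [1.234, 5.678], "b": [9.876, 5.432]})
rounded = prices.round(pd.Series({"a": 1, "b": 2}))

print(deduped, by_key, rounded, sep="\n\n")
```

The decimals Series must have a unique index; each label maps one column to its number of decimal places.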

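Contributors

A total of 63 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.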

Aleksandr Drozd +
Alex Chase +
Anthonios Partheniou
BrenBarn +
Brian J. McGuirk +
Chris
Christian Berendt +
Christian Perez +
Cody Piersall +
Data & Code Expert Experimenting with Code on Data
DrIrv +
Evan Wright
Guillaume Gay
Hamed Saljooghinejad +
Iblis Lin +
Jake VanderPlas
Jan Schulz
Jean-Mathieu Deschenes +
Jeff Reback
Jimmy Callin +
Joris Van den Bossche
K.-Michael Aye
Ka Wo Chen
Loïc Séguin-C +
Luo Yicheng +
Magnus Jöud +
Manuel Leonhardt +
Matthew Gilbert
Maximilian Roos
Michael +
Nicholas Stahl +
Nicolas Bonnotte +
Pastafarianist +
Petra Chong +
Phil Schaf +
Philipp A +
Rob deCarvalho +
Roman Khomenko +
Rémy Léone +
Sebastian Bank +
Sinhrks
Stephan Hoyer
Thierry Moisan
Tom Augspurger
Tux1 +
Varun +
Wieland Hoffmann +
Winterflower
Yoav Ram +
Younggun Kim
Zeke +
ajcr
azuranski +
behzad nouri
cel4
emilydolson +
hironow +
lexual
llllllllll +
rockg
silentquasar +
sinhrks
taeold +