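Version 0.16.0 (March 22, 2015)#

This is a major release from 0.15.2 and includes a small number of API changes, several new features, enhancements, and performance improvements, along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:

DataFrame.assign method (see below)
Series.to_coo/from_coo methods to interact with scipy.sparse (see below)
Backwards incompatible change to Timedelta to conform the .seconds attribute with datetime.timedelta (see below)
Changes to the .loc slicing API to conform with the behavior of .ix (see below)
Changes to the default for ordering in the Categorical constructor (see below)
Enhancement to the .str accessor to make string operations easier (see below)
The pandas.tools.rplot, pandas.sandbox.qtpandas and pandas.rpy modules are deprecated. We refer users to external packages like seaborn, pandas-qt and rpy2 for similar or equivalent functionality (see below)

Check the API changes and deprecations before updating.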

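New features#

DataFrame assign#

Inspired by dplyr's mutate verb, DataFrame has a new assign() method. The function signature for assign is simply **kwargs. The keys are the column names for the new fields, and the values are either a value to be inserted (for example, a Series or NumPy array) or a function of one argument to be called on the DataFrame. The new values are inserted, and the entire DataFrame (with all original and new columns) is returned.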

In [1]: iris = pd.read_csv('data/iris.data')

In [2]: iris.head()
Out[2]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa

[5 rows x 5 columns]

In [3]: iris.assign(sepal_ratio=iris['SepalWidth'] / iris['SepalLength']).head()
Out[3]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa     0.686275
1          4.9         3.0          1.4         0.2  Iris-setosa     0.612245
2          4.7         3.2          1.3         0.2  Iris-setosa     0.680851
3          4.6         3.1          1.5         0.2  Iris-setosa     0.673913
4          5.0         3.6          1.4         0.2  Iris-setosa     0.720000

[5 rows x 6 columns]

The above was an example of inserting a precomputed value. We can also pass in a function to be evaluated.

In [4]: iris.assign(sepal_ratio=lambda x: (x['SepalWidth']
   ...:                                    / x['SepalLength'])).head()
   ...: 
Out[4]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa     0.686275
1          4.9         3.0          1.4         0.2  Iris-setosa     0.612245
2          4.7         3.2          1.3         0.2  Iris-setosa     0.680851
3          4.6         3.1          1.5         0.2  Iris-setosa     0.673913
4          5.0         3.6          1.4         0.2  Iris-setosa     0.720000

[5 rows x 6 columns]

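The power of assign comes when it is used in chains of operations. For example, we can limit the DataFrame to just those rows with a sepal length greater than 5, calculate the ratios, and plot: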

In [5]: iris = pd.read_csv('data/iris.data')

In [6]: (iris.query('SepalLength > 5')
   ...:      .assign(SepalRatio=lambda x: x.SepalWidth / x.SepalLength,
   ...:              PetalRatio=lambda x: x.PetalWidth / x.PetalLength)
   ...:      .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))
   ...: 
Out[6]: <Axes: xlabel='SepalRatio', ylabel='PetalRatio'>

See the documentation for more. (GH 9229)
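The difference between passing a value and passing a callable matters most inside a method chain. The following is a minimal sketch on a small hand-made frame (the three rows are illustrative stand-ins, not the iris data file used above): the callable is evaluated against the frame that assign() is called on, so it sees only the rows that survived the preceding query().

```python
import pandas as pd

# Hypothetical stand-in for the iris data; the values are illustrative only.
df = pd.DataFrame({"SepalLength": [5.1, 4.9, 6.3],
                   "SepalWidth": [3.5, 3.0, 3.3]})

# Precomputed value: the Series is built before assign() is called.
by_value = df.assign(sepal_ratio=df["SepalWidth"] / df["SepalLength"])

# Callable: evaluated on the frame assign() receives, i.e. the filtered one.
chained = (df.query("SepalLength > 5")
             .assign(sepal_ratio=lambda x: x["SepalWidth"] / x["SepalLength"]))

print(by_value.columns.tolist())
print(chained.index.tolist())
print("sepal_ratio" in df.columns)  # assign() returns a new frame
```

Because assign() returns a new object, the original frame is left untouched in both cases.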

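Interaction with scipy.sparse#

Added SparseSeries.to_coo() and SparseSeries.from_coo() methods (GH 8048) for converting to and from scipy.sparse.coo_matrix instances. For example, given a SparseSeries with a MultiIndex, we can convert it to a scipy.sparse.coo_matrix by specifying the row and column labels as index levels: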

s = pd.Series([3.0, np.nan, 1.0, 3.0, np.nan, np.nan])
s.index = pd.MultiIndex.from_tuples([(1, 2, 'a', 0),
                                     (1, 2, 'a', 1),
                                     (1, 1, 'b', 0),
                                     (1, 1, 'b', 1),
                                     (2, 1, 'b', 0),
                                     (2, 1, 'b', 1)],
                                    names=['A', 'B', 'C', 'D'])

s

# SparseSeries
ss = s.to_sparse()
ss

A, rows, columns = ss.to_coo(row_levels=['A', 'B'],
                             column_levels=['C', 'D'],
                             sort_labels=False)

A
A.todense()
rows
columns

The from_coo method is a convenience method for creating a SparseSeries from a scipy.sparse.coo_matrix:

from scipy import sparse
A = sparse.coo_matrix(([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])),
                      shape=(3, 4))
A
A.todense()

ss = pd.SparseSeries.from_coo(A)
ss

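String methods enhancements#

The following new methods are accessible via the .str accessor to apply the function to each value. This is intended to make it more consistent with the standard methods on strings. (GH 9282, GH 9352, GH 9386, GH 9387, GH 9439)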

Methods
isalnum()	isalpha()	isdigit()	isspace()	islower()
isupper()	istitle()	isnumeric()	isdecimal()	find()
rfind()	ljust()	rjust()	zfill()

```
In [7]: s = pd.Series(['abcd', '3456', 'EFGH'])

In [8]: s.str.isalpha()
Out[8]: 
0     True
1    False
2     True
Length: 3, dtype: bool

In [9]: s.str.find('ab')
Out[9]: 
0    0
1   -1
2   -1
Length: 3, dtype: int64
```

Series.str.pad() and Series.str.center() now accept a fillchar option to specify the filling character (GH 9352)

In [10]: s = pd.Series(['12', '300', '25'])

In [11]: s.str.pad(5, fillchar='_')
Out[11]: 
0    ___12
1    __300
2    ___25
Length: 3, dtype: object

Added Series.str.slice_replace(), which previously raised NotImplementedError (GH 8888)

In [12]: s = pd.Series(['ABCD', 'EFGH', 'IJK'])

In [13]: s.str.slice_replace(1, 3, 'X')
Out[13]: 
0    AXD
1    EXH
2     IX
Length: 3, dtype: object

# replaced with empty char
In [14]: s.str.slice_replace(0, 1)
Out[14]: 
0    BCD
1    FGH
2     JK
Length: 3, dtype: object
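As a further illustration of the new accessor methods, here is a short sketch on a made-up Series. Each call applies the corresponding Python string method element-wise.

```python
import pandas as pd

s = pd.Series(["7", "42", "abc"])

is_digit = s.str.isdigit()             # element-wise str.isdigit
padded = s.str.zfill(4)                # left-pad with zeros to width 4
left = s.str.ljust(5, fillchar="_")    # left-justify, padding on the right
pos = s.str.rfind("b")                 # highest index of "b", -1 if absent

print(is_digit.tolist())
print(padded.tolist())
print(left.tolist())
print(pos.tolist())
```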

Other enhancements#

Reindex now supports method='nearest' for frames or series with a monotonic increasing or decreasing index (GH 9258):
```
In [15]: df = pd.DataFrame({'x': range(5)})

In [16]: df.reindex([0.2, 1.8, 3.5], method='nearest')
Out[16]: 
     x
0.2  0
1.8  2
3.5  4

[3 rows x 1 columns]
```
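This method is also exposed by the lower level Index.get_indexer and Index.get_loc methods.
The read_excel() function's sheetname argument now accepts a list and None, to get multiple or all sheets respectively. If more than one sheet is specified, a dictionary is returned. (GH 9450)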
```
# Returns the 1st and 4th sheet, as a dictionary of DataFrames.
pd.read_excel('path_to_file.xls', sheetname=['Sheet1', 3])
```
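Allow Stata files to be read incrementally with an iterator; support for long strings in Stata files. See the documentation (GH 9493).
Paths beginning with ~ will now be expanded to begin with the user's home directory (GH 9066)
Added time interval selection in get_data_yahoo (GH 9071)
Added Timestamp.to_datetime64() to complement Timedelta.to_timedelta64() (GH 9255)
tseries.frequencies.to_offset() now accepts Timedelta as input (GH 9064)
A lag parameter was added to the autocorrelation method of Series; it defaults to lag-1 autocorrelation (GH 9192)
Timedelta will now accept a nanoseconds keyword in the constructor (GH 9273)
SQL code now safely escapes table and column names (GH 8986)
Added auto-complete for Series.str.<tab>, Series.dt.<tab> and Series.cat.<tab> (GH 9322)
Index.get_indexer now supports method='pad' and method='backfill' even for any target array, not just monotonic targets. These methods also work for monotonic decreasing as well as monotonic increasing indexes (GH 9258).
Index.asof now works on all index types (GH 9258).
A verbose argument has been added to io.read_excel(); it defaults to False. Set it to True to print sheet names as they are parsed. (GH 9450)
Added the days_in_month (compatibility alias daysinmonth) property to Timestamp, DatetimeIndex, Period, PeriodIndex, and Series.dt (GH 9572)
Added a decimal option in to_csv to provide formatting for non-'.' decimal separators (GH 781)
Added a normalize option for Timestamp to normalize to midnight (GH 8794)
Added an example of importing a DataFrame into R using an HDF5 file and the rhdf5 library. See the documentation for more (GH 9636).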
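Several of the enhancements listed above can be tried without any external files. The following is a minimal sketch with made-up values; it assumes the normalize option is exposed as the Timestamp.normalize() method, and the target values are chosen to avoid ties between neighbouring labels.

```python
import numpy as np
import pandas as pd

# Lower-level counterpart of reindex(..., method='nearest').
idx = pd.Index([0, 1, 2, 3, 4])
nearest = idx.get_indexer([0.2, 1.8, 3.6], method="nearest")

ts = pd.Timestamp("2015-02-10 10:30")
as_np = ts.to_datetime64()          # NumPy datetime64 scalar
month_len = ts.days_in_month        # number of days in February 2015
midnight = ts.normalize()           # same date, time set to 00:00:00

# Lag-2 autocorrelation of a perfectly linear series.
lag2 = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0]).autocorr(lag=2)

# Nanosecond component passed by keyword.
td = pd.Timedelta(nanoseconds=5)

print(nearest.tolist(), month_len, midnight, lag2, td.nanoseconds)
```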

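Backwards incompatible API changes#

Changes in timedelta#

In v0.15.0 a new scalar type Timedelta was introduced, which is a sub-class of datetime.timedelta. That release included a notice of an API change with respect to the .seconds accessor. The intent was to provide a user-friendly set of accessors that give the 'natural' value for that unit, e.g. if you had a Timedelta('1 day, 10:11:12'), then .seconds would return 12. However, this is at odds with the definition of datetime.timedelta, which defines .seconds as 10 * 3600 + 11 * 60 + 12 == 36672.

So in v0.16.0, we are restoring the API to match that of datetime.timedelta. Further, the component values are still available through the .components accessor. This affects the .seconds and .microseconds accessors, and removes the .hours, .minutes, and .milliseconds accessors. These changes affect TimedeltaIndex and the Series .dt accessor as well. (GH 9185, GH 9139)

Previous behavior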

In [2]: t = pd.Timedelta('1 day, 10:11:12.100123')

In [3]: t.days
Out[3]: 1

In [4]: t.seconds
Out[4]: 12

In [5]: t.microseconds
Out[5]: 123

New behavior

In [17]: t = pd.Timedelta('1 day, 10:11:12.100123')

In [18]: t.days
Out[18]: 1

In [19]: t.seconds
Out[19]: 36672

In [20]: t.microseconds
Out[20]: 100123

Using .components allows full access to the components

In [21]: t.components
Out[21]: Components(days=1, hours=10, minutes=11, seconds=12, milliseconds=100, microseconds=123, nanoseconds=0)

In [22]: t.components.seconds
Out[22]: 12
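To make the restored semantics concrete, the sketch below compares a Timedelta with the equivalent datetime.timedelta. The duration is the one discussed above, without the fractional part.

```python
import datetime

import pandas as pd

td = pd.Timedelta(days=1, hours=10, minutes=11, seconds=12)
py_td = datetime.timedelta(days=1, hours=10, minutes=11, seconds=12)

# .seconds is the sub-day remainder expressed in seconds, as in the stdlib.
total = td.seconds                  # 10 * 3600 + 11 * 60 + 12
# The 'natural' per-unit values live on .components.
natural = td.components.seconds

print(total, py_td.seconds, natural)
```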

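Indexing changes#

The behavior of a small subset of edge cases for using .loc has changed (GH 8613). Furthermore, we have improved the content of the error messages that are raised:

Slicing with .loc where the start and/or stop bound is not found in the index is now allowed; this previously would raise a KeyError. This makes the behavior the same as .ix in this case. This change applies only to slicing, not to indexing with a single label.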

In [23]: df = pd.DataFrame(np.random.randn(5, 4),
   ....:                   columns=list('ABCD'),
   ....:                   index=pd.date_range('20130101', periods=5))
   ....: 

In [24]: df
Out[24]: 
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401

[5 rows x 4 columns]

In [25]: s = pd.Series(range(5), [-2, -1, 1, 2, 3])

In [26]: s
Out[26]: 
-2    0
-1    1
 1    2
 2    3
 3    4
Length: 5, dtype: int64

Previous behavior

In [4]: df.loc['2013-01-02':'2013-01-10']
KeyError: 'stop bound [2013-01-10] is not in the [index]'

In [6]: s.loc[-10:3]
KeyError: 'start bound [-10] is not the [index]'

New behavior

In [27]: df.loc['2013-01-02':'2013-01-10']
Out[27]: 
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401

[4 rows x 4 columns]

In [28]: s.loc[-10:3]
Out[28]: 
-2    0
-1    1
 1    2
 2    3
 3    4
Length: 5, dtype: int64
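The distinction between slicing and single-label lookup can be checked directly. This is a small sketch reusing the integer-indexed Series from above: the out-of-range slice bound is accepted, while a missing single label still raises KeyError.

```python
import pandas as pd

s = pd.Series(range(5), index=[-2, -1, 1, 2, 3])

# Slice bounds need not be present in a monotonic index.
sliced = s.loc[-10:3]
inner = s.loc[0:2]          # 0 is absent; labels 1 and 2 are selected

# A single missing label is still an error.
try:
    s.loc[-10]
    raised = False
except KeyError:
    raised = True

print(sliced.index.tolist(), inner.index.tolist(), raised)
```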

Allow slicing with float-like values on an integer index for .ix. Previously this was only enabled for .loc:

Previous behavior

In [8]: s.ix[-1.0:2]
TypeError: the slice start value [-1.0] is not a proper indexer for this index type (Int64Index)

New behavior

In [2]: s.ix[-1.0:2]
Out[2]:
-1    1
 1    2
 2    3
dtype: int64

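Provide a useful exception for indexing with an invalid type for that index when using .loc. For example, trying to use .loc on an index of type DatetimeIndex, PeriodIndex or TimedeltaIndex with an integer (or a float).

Previous behavior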
```
In [4]: df.loc[2:3]
KeyError: 'start bound [2] is not the [index]'
```
New behavior
```
In [4]: df.loc[2:3]
TypeError: Cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with <type 'int'> keys
```
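Code that may receive the wrong key type can catch the TypeError explicitly. The following sketch uses a small made-up frame; the exact wording of the error message is not relied upon. Positional slicing, if that was the intent, goes through .iloc instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=list("AB"),
                  index=pd.date_range("20130101", periods=5))

# Integer slice bounds are not valid labels for a DatetimeIndex.
try:
    df.loc[2:3]
    error = None
except TypeError as exc:
    error = exc

# Positional selection of the third row.
by_position = df.iloc[2:3]

print(type(error).__name__, by_position.index[0])
```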

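Categorical changes#

In prior versions, Categoricals that had an unspecified ordering (meaning no ordered keyword was passed) were defaulted as ordered Categoricals. Going forward, the ordered keyword in the Categorical constructor will default to False. Ordering must now be explicit.

Furthermore, previously you could change the ordered attribute of a Categorical by just setting the attribute, e.g. cat.ordered=True; this is now deprecated and you should use cat.as_ordered() or cat.as_unordered(). These will by default return a new object and not modify the existing object. (GH 9347, GH 9190)

Previous behavior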

In [3]: s = pd.Series([0, 1, 2], dtype='category')

In [4]: s
Out[4]:
0    0
1    1
2    2
dtype: category
Categories (3, int64): [0 < 1 < 2]

In [5]: s.cat.ordered
Out[5]: True

In [6]: s.cat.ordered = False

In [7]: s
Out[7]:
0    0
1    1
2    2
dtype: category
Categories (3, int64): [0, 1, 2]

New behavior

In [29]: s = pd.Series([0, 1, 2], dtype='category')

In [30]: s
Out[30]: 
0    0
1    1
2    2
Length: 3, dtype: category
Categories (3, int64): [0, 1, 2]

In [31]: s.cat.ordered
Out[31]: False

In [32]: s = s.cat.as_ordered()

In [33]: s
Out[33]: 
0    0
1    1
2    2
Length: 3, dtype: category
Categories (3, int64): [0 < 1 < 2]

In [34]: s.cat.ordered
Out[34]: True

# you can set in the constructor of the Categorical
In [35]: s = pd.Series(pd.Categorical([0, 1, 2], ordered=True))

In [36]: s
Out[36]: 
0    0
1    1
2    2
Length: 3, dtype: category
Categories (3, int64): [0 < 1 < 2]

In [37]: s.cat.ordered
Out[37]: True

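For ease of creation of series of categorical data, we have added the ability to pass keywords when calling .astype(). These are passed directly to the constructor.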

In [54]: s = pd.Series(["a", "b", "c", "a"]).astype('category', ordered=True)

In [55]: s
Out[55]:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a < b < c]

In [56]: s = (pd.Series(["a", "b", "c", "a"])
   ....:        .astype('category', categories=list('abcdef'), ordered=False))

In [57]: s
Out[57]:
0    a
1    b
2    c
3    a
dtype: category
Categories (6, object): [a, b, c, d, e, f]
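The sketch below summarises the explicit-ordering workflow using only the Categorical constructor and the as_ordered() method described above; the data is made up. Note that as_ordered() returns a new object, so the original Series stays unordered.

```python
import pandas as pd

s = pd.Series(pd.Categorical(["a", "b", "c", "a"]))

default_ordered = s.cat.ordered        # ordering is no longer implied
ordered = s.cat.as_ordered()           # new Series with ordered categories
still_unordered = s.cat.ordered        # the original is unchanged

# Ordering can also be requested up front in the constructor.
explicit = pd.Categorical(["a", "b", "c", "a"], ordered=True)

print(default_ordered, ordered.cat.ordered, still_unordered, explicit.ordered)
```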

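Other API changes#

Index.duplicated now returns np.array(dtype=bool) rather than Index(dtype=object) containing bool values. (GH 8875)
DataFrame.to_json now returns accurate type serialisation for each column for frames of mixed dtype (GH 9037)

Previously, data was coerced to a common dtype before serialisation, which for example resulted in integers being serialised to floats: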
```
In [2]: pd.DataFrame({'i': [1,2], 'f': [3.0, 4.2]}).to_json()
Out[2]: '{"f":{"0":3.0,"1":4.2},"i":{"0":1.0,"1":2.0}}'
```
Now each column is serialised using its correct dtype:
```
In [2]:  pd.DataFrame({'i': [1,2], 'f': [3.0, 4.2]}).to_json()
Out[2]: '{"f":{"0":3.0,"1":4.2},"i":{"0":1,"1":2}}'
```
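One way to observe the per-column serialisation is to decode the JSON again and inspect the Python types. This is a minimal sketch using the same two-column frame as above.

```python
import json

import pandas as pd

df = pd.DataFrame({"i": [1, 2], "f": [3.0, 4.2]})

# Round-trip through the JSON text and look at the decoded values.
decoded = json.loads(df.to_json())

print(decoded["i"], decoded["f"])
```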
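DatetimeIndex, PeriodIndex and TimedeltaIndex.summary now output the same format. (GH 9116)
TimedeltaIndex.freqstr now outputs the same string format as DatetimeIndex. (GH 9116)
Bar and horizontal bar plots no longer add a dashed line along the info axis. The prior style can be achieved with matplotlib's axhline or axvline methods (GH 9088).
Series accessors .dt, .cat and .str now raise AttributeError instead of TypeError if the series does not contain the appropriate type of data (GH 9617). This follows Python's built-in exception hierarchy more closely and ensures that tests like hasattr(s, 'cat') are consistent on both Python 2 and 3.

Series now supports bitwise operations for integral types (GH 9016). Previously, even if the input dtypes were integral, the output dtype was coerced to bool.

Previous behavior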

In [2]: pd.Series([0, 1, 2, 3], list('abcd')) | pd.Series([4, 4, 4, 4], list('abcd'))
Out[2]:
a    True
b    True
c    True
d    True
dtype: bool

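New behavior. If the input dtypes are integral, the output dtype is also integral and the output values are the result of the bitwise operation.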

In [2]: pd.Series([0, 1, 2, 3], list('abcd')) | pd.Series([4, 4, 4, 4], list('abcd'))
Out[2]:
a    4
b    5
c    6
d    7
dtype: int64
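The same rule applies to the other bitwise operators, while boolean inputs keep producing boolean results. A short sketch with the two Series from the example above:

```python
import pandas as pd

a = pd.Series([0, 1, 2, 3], index=list("abcd"))
b = pd.Series([4, 4, 4, 4], index=list("abcd"))

either = a | b                  # bitwise OR on integers
both = a & b                    # bitwise AND on integers
flags = (a > 1) | (b > 4)       # boolean inputs stay boolean

print(either.tolist(), both.tolist(), flags.tolist())
```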

During division involving a Series or DataFrame, 0/0 and 0//0 now give np.nan instead of np.inf. (GH 9144, GH 8445)

Previous behavior

In [2]: p = pd.Series([0, 1])

In [3]: p / 0
Out[3]:
0    inf
1    inf
dtype: float64

In [4]: p // 0
Out[4]:
0    inf
1    inf
dtype: float64

New behavior

In [38]: p = pd.Series([0, 1])

In [39]: p / 0
Out[39]: 
0    NaN
1    inf
Length: 2, dtype: float64

In [40]: p // 0
Out[40]: 
0    NaN
1    inf
Length: 2, dtype: float64
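Downstream code can rely on this to tell an undefined 0/0 apart from a genuine division of a non-zero value by zero. A minimal sketch using the same Series as above:

```python
import numpy as np
import pandas as pd

p = pd.Series([0, 1])

true_div = p / 0
floor_div = p // 0

# 0/0 is undefined (NaN); 1/0 remains infinite.
undefined = true_div.isnull()
infinite = np.isinf(true_div)

print(undefined.tolist(), infinite.tolist())
```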

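Series.value_counts and Series.describe for categorical data will now put NaN entries at the end. (GH 9443)
Series.describe for categorical data will now give counts and frequencies of 0, not NaN, for unused categories (GH 9443)
Due to a bug fix, looking up a partial string label with DatetimeIndex.asof now includes values that match the string, even if they are after the start of the partial string label (GH 9258).

Old behavior: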
```
In [4]: pd.to_datetime(['2000-01-31', '2000-02-28']).asof('2000-02')
Out[4]: Timestamp('2000-01-31 00:00:00')
```
Fixed behavior:
```
In [41]: pd.to_datetime(['2000-01-31', '2000-02-28']).asof('2000-02')
Out[41]: Timestamp('2000-02-28 00:00:00')
```
To reproduce the old behavior, simply add more precision to the label (e.g., use 2000-02-01 instead of 2000-02).

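Deprecations#

The rplot trellis plotting interface is deprecated and will be removed in a future version. We refer to external packages like seaborn for similar but more refined functionality (GH 3445). The documentation includes some examples of how to convert existing code from rplot to seaborn.
The pandas.sandbox.qtpandas interface is deprecated and will be removed in a future version. We refer users to the external package pandas-qt. (GH 9615)
The pandas.rpy interface is deprecated and will be removed in a future version. Similar functionality can be accessed through the rpy2 project (GH 9602)
Adding a DatetimeIndex/PeriodIndex to another DatetimeIndex/PeriodIndex is being deprecated as a set operation. This will be changed to a TypeError in a future version. .union() should be used for the union set operation. (GH 9094)
Subtracting a DatetimeIndex/PeriodIndex from another DatetimeIndex/PeriodIndex is being deprecated as a set operation. This will be changed to an actual numeric subtraction yielding a TimedeltaIndex in a future version. .difference() should be used for the differencing set operation. (GH 9094)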
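The recommended replacements for the deprecated + and - set operations look like this. The sketch uses two overlapping, made-up date ranges.

```python
import pandas as pd

left = pd.date_range("2015-01-01", periods=3)    # Jan 1 to Jan 3
right = pd.date_range("2015-01-03", periods=3)   # Jan 3 to Jan 5

combined = left.union(right)          # replaces the deprecated left + right
only_left = left.difference(right)    # replaces the deprecated left - right

print(len(combined), only_left.tolist())
```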

Removal of prior version deprecations/changes#

DataFrame.pivot_table and crosstab's rows and cols keyword arguments were removed in favor of index and columns (GH 6581)
DataFrame.to_excel and DataFrame.to_csv's cols keyword argument was removed in favor of columns (GH 6581)
Removed convert_dummies in favor of get_dummies (GH 6581)
Removed value_range in favor of describe (GH 6581)
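Code that still uses the removed spellings needs to switch to the surviving ones. A minimal sketch of the replacements on a made-up sales frame (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"city": ["A", "A", "B", "B"],
                   "year": [2014, 2015, 2014, 2015],
                   "sales": [1, 2, 3, 4]})

# index/columns replace the removed rows/cols keywords.
table = df.pivot_table(values="sales", index="city", columns="year")
counts = pd.crosstab(index=df["city"], columns=df["year"])

# get_dummies replaces the removed convert_dummies.
dummies = pd.get_dummies(df["city"])

print(table.loc["A", 2015], counts.loc["B", 2014], dummies.columns.tolist())
```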

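Performance improvements#

Fixed a performance regression for .loc indexing with an array or list-like (GH 9126).
Performance of DataFrame.to_json improved 30x for mixed dtype frames. (GH 9037)
Performance improvements in MultiIndex.duplicated by working with labels instead of values (GH 9125)
Improved the speed of nunique by calling unique instead of value_counts (GH 9129, GH 7771)
Performance improvement of up to 10x in DataFrame.count and DataFrame.dropna by taking advantage of homogeneous/heterogeneous dtypes appropriately (GH 9136)
Performance improvement of up to 20x in DataFrame.count when using a MultiIndex and the level keyword argument (GH 9163)
Performance and memory usage improvements in merge when the key space exceeds int64 bounds (GH 9151)
Performance improvements in multi-key groupby (GH 9429)
Performance improvements in MultiIndex.sortlevel (GH 9445)
Performance and memory usage improvements in DataFrame.duplicated (GH 9398)
Cythonized Period (GH 9440)
Decreased memory usage on to_hdf (GH 9648)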

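Bug fixes#

Changed .to_html to remove leading/trailing spaces in the table body (GH 4987)
Fixed an issue using read_csv on s3 with Python 3 (GH 9452)
Fixed a compatibility issue in DatetimeIndex affecting architectures where numpy.int_ defaults to numpy.int32 (GH 8943)
Bug in Panel indexing with an object-like (GH 9140)
Bug where the index of the returned Series.dt.components was reset to the default index (GH 9247)
Bug in Categorical.__getitem__/__setitem__ with list-like input getting incorrect results from indexer coercion (GH 9469)
Bug in partial setting with a DatetimeIndex (GH 9478)
Bug in groupby for integer and datetime64 columns when applying an aggregator that caused the value to be changed when the number was sufficiently large (GH 9311, GH 6620)
Fixed a bug in to_sql when mapping a Timestamp object column (datetime column with timezone info) to the appropriate sqlalchemy type (GH 9085).
Fixed a bug in the to_sql dtype argument not accepting an instantiated SQLAlchemy type (GH 9083).
Bug in .loc partial setting with a np.datetime64 (GH 9516)
Incorrect dtypes inferred on datetime-like looking Series and on .xs slices (GH 9477)
Items in Categorical.unique() (and s.unique() if s is of dtype category) now appear in the order in which they are originally found, not in sorted order (GH 9331). This is now consistent with the behavior for other dtypes in pandas.
Fixed a bug on big endian platforms which produced incorrect results in StataReader (GH 8688).
Bug in MultiIndex.has_duplicates where having many levels caused an indexer overflow (GH 9075, GH 5873)
Bug in pivot and unstack where nan values would break index alignment (GH 4862, GH 7401, GH 7403, GH 7405, GH 7466, GH 9497)
Bug in left join on a MultiIndex with sort=True or null values (GH 9210).
Bug in MultiIndex where inserting new keys would fail (GH 9250).
Bug in groupby when the key space exceeds int64 bounds (GH 9096).
Bug in unstack with TimedeltaIndex or DatetimeIndex and nulls (GH 9491).
Bug in rank where comparing floats with tolerance caused inconsistent behaviour (GH 8365).
Fixed a character encoding bug in read_stata and StataReader when loading data from a URL (GH 9231).
Bug where adding offsets.Nano to other offsets raised a TypeError (GH 9284)
Bug in DatetimeIndex iteration, related to (GH 8890), fixed in (GH 9100)
Bugs in resample around DST transitions. This required fixing the offset classes so they behave correctly on DST transitions. (GH 5172, GH 8744, GH 8653, GH 9173, GH 9468).
Bug in binary operator method (e.g. .mul()) alignment with integer levels (GH 9463).
Bug where boxplot, scatter and hexbin plots could show an unnecessary warning (GH 8877)
Bug where a subplot with the layout keyword could show an unnecessary warning (GH 9464)
Bug in using grouper functions that need passed-through arguments (e.g. axis) when using a wrapped function (e.g. fillna) (GH 9221)
DataFrame now properly supports simultaneous copy and dtype arguments in the constructor (GH 9099)
Bug in read_csv when using skiprows on a file with CR line endings with the c engine. (GH 9079)
isnull now detects NaT in PeriodIndex (GH 9129)
Bug in groupby .nth() with a multiple column groupby (GH 8979)
Bug in DataFrame.where and Series.where incorrectly coercing numerics to strings (GH 9280)
Bug in DataFrame.where and Series.where raising ValueError when a string list-like is passed. (GH 9280)
Accessing Series.str methods on non-string values now raises TypeError instead of producing incorrect results (GH 9184)
Bug in DatetimeIndex.__contains__ when the index has duplicates and is not monotonic increasing (GH 9512)
Fixed a division by zero error in Series.kurt() when all values are equal (GH 9197)
Fixed an issue in the xlsxwriter engine where it added a default 'General' format to cells if no other format was applied. This prevented other row or column formatting from being applied. (GH 9167)
Fixed an issue with index_col=False when usecols is also specified. (GH 9082)
Bug where wide_to_long would modify the input stub names list (GH 9204)
Bug in to_sql not storing float64 values using double precision. (GH 9009)
SparseSeries and SparsePanel now accept zero-argument constructors (same as their non-sparse counterparts) (GH 9272).
Regression in merging Categorical and object dtypes (GH 9426)
Bug in read_csv with buffer overflows on certain malformed input files (GH 9205)
Bug in groupby on a MultiIndex with a missing pair (GH 9049, GH 9344)
Fixed a bug in Series.groupby where grouping on MultiIndex levels would ignore the sort argument (GH 9444)
Fixed a bug in DataFrame.Groupby where sort=False was ignored in the case of Categorical columns. (GH 8868)
Fixed a bug where reading CSV files from Amazon S3 on Python 3 raised a TypeError (GH 9452)
Bug in the Google BigQuery reader where the 'jobComplete' key may be present but False in the query results (GH 8728)
Bug in Series.value_counts when excluding NaN for a categorical type Series with dropna=True (GH 9443)
Fixed the missing numeric_only option for DataFrame.std/var/sem (GH 9201)
Support constructing a Panel or Panel4D with scalar data (GH 8285)
Series text representation disconnected from max_rows/max_columns (GH 7508).

Series number formatting inconsistent when truncated (GH 8532).

Previous behavior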

In [2]: pd.options.display.max_rows = 10
In [3]: s = pd.Series([1,1,1,1,1,1,1,1,1,1,0.9999,1,1]*10)
In [4]: s
Out[4]:
0    1
1    1
2    1
...
127    0.9999
128    1.0000
129    1.0000
Length: 130, dtype: float64

New behavior

0    1.0000
1    1.0000
2    1.0000
3    1.0000
4    1.0000
...
125  1.0000
126  1.0000
127  0.9999
128  1.0000
129  1.0000
dtype: float64

A spurious SettingWithCopy warning was generated when setting a new item in a frame in some cases (GH 8730)

The following would previously report a SettingWithCopy warning.

In [42]: df1 = pd.DataFrame({'x': pd.Series(['a', 'b', 'c']),
   ....:                     'y': pd.Series(['d', 'e', 'f'])})
   ....: 

In [43]: df2 = df1[['x']]

In [44]: df2['y'] = ['g', 'h', 'i']
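
Independently of this fix, code that wants to state unambiguously that the derived frame owns its data can take an explicit copy first. The sketch below shows that pattern; the .copy() call is a choice made for this illustration and is not part of the fix described above.

```python
import pandas as pd

df1 = pd.DataFrame({"x": ["a", "b", "c"], "y": ["d", "e", "f"]})

# Explicit copy: df2 is independent of df1 from here on.
df2 = df1[["x"]].copy()
df2["y"] = ["g", "h", "i"]

print(df1["y"].tolist(), df2["y"].tolist())
```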

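Contributors#

A total of 60 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.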

Aaron Toth +
Alan Du +
Alessandro Amici +
Artemy Kolchinsky
Ashwini Chaudhary +
Ben Schiller
Bill Letson
Brandon Bradley +
Chau Hoang +
Chris Reynolds
Chris Whelan +
Christer van der Meeren +
David Cottrell +
David Stephens
Ehsan Azarnasab +
Garrett-R +
Guillaume Gay
Jake Torcasso +
Jason Sexauer
Jeff Reback
John McNamara
Joris Van den Bossche
Joschka zur Jacobsmühlen +
Juarez Bochi +
Junya Hayashi +
K.-Michael Aye
Kerby Shedden +
Kevin Sheppard
Kieran O’Mahony
Kodi Arfer +
Matti Airas +
Min RK +
Mortada Mehyar
Robert +
Scott E Lasley
Scott Lasley +
Sergio Pascual +
Skipper Seabold
Stephan Hoyer
Thomas Grainger
Tom Augspurger
TomAugspurger
Vladimir Filimonov +
Vyomkesh Tripathi +
Will Holmgren
Yulong Yang +
behzad nouri
bertrandhaut +
bjonen
cel4 +
clham
hsperr +
ischwabacher
jnmclarty
josham +
jreback
omtinez +
roch +
sinhrks
unutbu
