pandas.Categorical

class pandas.Categorical(values, categories=None, ordered=None, dtype=None, copy=True)

Represent a categorical variable in classic R / S-plus fashion.

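Categoricals can only take on a limited, and usually fixed, number of possible values (categories). In contrast to statistical categorical variables, a Categorical might have an order, but numerical operations (addition, division, and so on) are not possible.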

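All values of the Categorical are either in categories or np.nan. Assigning a value outside of categories will raise a ValueError. Order is defined by the order of the categories, not by the lexical order of the values.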
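
The assignment rule above can be sketched as follows. This is a minimal illustration, not part of the original reference; the variable names are hypothetical, and the exception is caught broadly on the assumption that the exact exception type may depend on the pandas version.

```python
import numpy as np
import pandas as pd

c = pd.Categorical(["a", "b", "a"], categories=["a", "b"])

# Assigning an existing category or a missing value is allowed.
c[0] = "b"
c[1] = np.nan

# Assigning a value that is not one of the categories is rejected.
# Assumption: the error is a ValueError or a TypeError, depending on
# the installed pandas version, so both are caught here.
try:
    c[2] = "z"
    rejected = False
except (ValueError, TypeError):
    rejected = True

print(c.codes.tolist())
print(rejected)
```

To store a new value, the category has to be added first, for example with add_categories.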

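Parameters:

values (list-like): The values of the categorical. If categories are given, values not in categories will be replaced with NaN.
categories (Index-like (unique), optional): The unique categories for this categorical. If not given, the categories are assumed to be the unique values of values (sorted, if possible, otherwise in the order in which they appear).
ordered (bool, default False): Whether this categorical is treated as an ordered categorical. If True, the resulting categorical will be ordered. When sorted, an ordered categorical respects the order of its `categories` attribute (which is the `categories` argument, if provided).
dtype (CategoricalDtype): An instance of CategoricalDtype to use for this categorical.
copy (bool, default True): Whether to copy if the codes are unchanged.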
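
A short sketch of how the constructor parameters interact. The data and variable names are illustrative assumptions, not taken from the original page.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# "d" is not among the given categories, so it becomes NaN (code -1).
c = pd.Categorical(["a", "b", "d"], categories=["a", "b", "c"])

# Without `categories`, the sorted unique values are used.
inferred = pd.Categorical(["b", "a", "b"])

# A CategoricalDtype bundles categories and ordered together. It is
# passed instead of, not alongside, `categories` and `ordered`.
size_dtype = CategoricalDtype(categories=["low", "high"], ordered=True)
sizes = pd.Categorical(["high", "low"], dtype=size_dtype)

print(c.codes.tolist())
print(list(inferred.categories))
print(sizes.ordered)
```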

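Attributes

`categories`	The categories of this categorical.
`codes`	The category codes of this categorical index.
`ordered`	Whether the categories have an ordered relationship.
`dtype`	The `CategoricalDtype` for this instance.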
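
As an illustration of the attributes listed above, the following sketch inspects a small Categorical built with made-up values.

```python
import pandas as pd

c = pd.Categorical(["x", "y", "x"], categories=["y", "x"], ordered=True)

print(c.categories)  # an Index holding the categories, in their defined order
print(c.codes)       # integer positions of each value within c.categories
print(c.ordered)     # whether the categories have an ordered relationship
print(c.dtype)       # the CategoricalDtype describing categories and order
```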

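Methods

`from_codes`(codes[, categories, ordered, ...])	Make a Categorical type from codes and categories or dtype.
`as_ordered`()	Set the Categorical to be ordered.
`as_unordered`()	Set the Categorical to be unordered.
`set_categories`(new_categories[, ordered, rename])	Set the categories to the specified new categories.
`rename_categories`(new_categories)	Rename categories.
`reorder_categories`(new_categories[, ordered])	Reorder categories as specified in new_categories.
`add_categories`(new_categories)	Add new categories.
`remove_categories`(removals)	Remove the specified categories.
`remove_unused_categories`()	Remove categories which are not used.
`map`(mapper[, na_action])	Map categories using an input mapping or function.
`__array__`([dtype, copy])	The numpy array interface.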
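
A few of the methods above in use. This is a minimal sketch with illustrative data; it covers only some of the listed methods.

```python
import pandas as pd

# from_codes builds a Categorical directly from integer codes (-1 means missing).
c = pd.Categorical.from_codes([0, 1, -1, 0], categories=["a", "b"])

# Each call below returns a new Categorical; `c` itself is left unchanged.
added = c.add_categories(["c"])
renamed = c.rename_categories({"a": "A", "b": "B"})
pruned = added.remove_unused_categories()
reordered = c.reorder_categories(["b", "a"], ordered=True)

print(list(added.categories))
print(list(renamed.categories))
print(list(pruned.categories))
print(reordered.codes.tolist())
```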

Raises:

ValueError: If the categories do not validate.
TypeError: If an explicit ordered=True is given but no categories and the values are not sortable.
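
One way the ValueError case can arise is with duplicate categories, since categories must be unique. The sketch below demonstrates only that case; the variable name is hypothetical.

```python
import pandas as pd

# Categories must be unique, so a duplicated category fails validation.
try:
    pd.Categorical(["a", "b"], categories=["a", "a"])
    duplicate_error = None
except ValueError as err:
    duplicate_error = err

print(type(duplicate_error).__name__)
```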

See also

CategoricalDtype: Type for categorical data.
CategoricalIndex: An Index with an underlying Categorical.

Notes

See the user guide for more.

Examples

>>> pd.Categorical([1, 2, 3, 1, 2, 3])
[1, 2, 3, 1, 2, 3]
Categories (3, int64): [1, 2, 3]

>>> pd.Categorical(["a", "b", "c", "a", "b", "c"])
['a', 'b', 'c', 'a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']

Missing values are not included as a category.

>>> c = pd.Categorical([1, 2, 3, 1, 2, 3, np.nan])
>>> c
[1, 2, 3, 1, 2, 3, NaN]
Categories (3, int64): [1, 2, 3]

However, their presence is indicated in the codes attribute by code -1.

>>> c.codes
array([ 0,  1,  2,  0,  1,  2, -1], dtype=int8)

Ordered Categoricals can be sorted according to the custom order of the categories and can have a min and max value.

>>> c = pd.Categorical(
...     ["a", "b", "c", "a", "b", "c"], ordered=True, categories=["c", "b", "a"]
... )
>>> c
['a', 'b', 'c', 'a', 'b', 'c']
Categories (3, object): ['c' < 'b' < 'a']
>>> c.min()
'c'
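
Continuing with the same ordered Categorical, the following sketch shows sorting, the maximum, and a comparison against a category. The variable names are illustrative.

```python
import pandas as pd

c = pd.Categorical(
    ["a", "b", "c", "a", "b", "c"], ordered=True, categories=["c", "b", "a"]
)

# Sorting follows the category order c < b < a, not alphabetical order.
sorted_c = c.sort_values()

# The maximum is the last category that occurs in the data.
largest = c.max()

# Ordered Categoricals support comparisons against one of their categories.
below_b = c < "b"

print(list(sorted_c))
print(largest)
print(below_b.tolist())
```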