One-hot encoding

one_hot: One-hot encoding function for class label arrays

A function that performs one-hot encoding for class labels.

> from mlxtend.preprocessing import one_hot

Overview

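Typical supervised machine learning algorithms for classification assume that the class labels are nominal (a special case of categorical data in which no order is implied). A typical example of a nominal feature is "color", since in most applications we cannot say "orange > blue > red".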

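The one_hot function provides a simple interface for converting integer class labels into a so-called one-hot encoded array, in which each unique label is represented as a column of the new array.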

For example, assume we have 5 data points from 3 different classes (0, 1, and 2).

y = [0, # sample 1, class 0
     1, # sample 2, class 1
     0, # sample 3, class 0
     2, # sample 4, class 2
     2] # sample 5, class 2

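After one-hot encoding, we obtain the following array (note that the index position of the "1" in each row denotes the class label of that sample):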

y = [[1,  0,  0], # sample 1, class 0
     [0,  1,  0], # sample 2, class 1
     [1,  0,  0], # sample 3, class 0
     [0,  0,  1], # sample 4, class 2
     [0,  0,  1]  # sample 5, class 2
     ]
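
The transformation above can be sketched in plain NumPy. This is a minimal illustration, not mlxtend's implementation: the helper name `one_hot_sketch` is hypothetical, and it assumes the labels are non-negative integers, so that the largest label plus one gives the number of columns.

```python
import numpy as np


def one_hot_sketch(y, num_labels=None, dtype=float):
    """Hypothetical helper: one-hot encode non-negative integer labels."""
    y = np.asarray(y, dtype=int)
    if num_labels is None:
        # Assumption: labels are 0, 1, ..., k-1, so k = max label + 1.
        num_labels = int(y.max()) + 1
    encoded = np.zeros((y.shape[0], num_labels), dtype=dtype)
    # Set a single 1 per row, in the column given by that sample's label.
    encoded[np.arange(y.shape[0]), y] = 1
    return encoded


y = [0, 1, 0, 2, 2]
print(one_hot_sketch(y, dtype=int))
```

The sketch allocates a zero matrix with one row per sample and one column per label, then uses integer array indexing to place the 1s in a single step instead of looping over the samples.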

Example 1 - Defaults

from mlxtend.preprocessing import one_hot
import numpy as np

y = np.array([0, 1, 2, 1, 2])
one_hot(y)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
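
Because each row holds a single 1 in the column of its class label, the original labels can be recovered with a row-wise argmax. The following is a small NumPy-only sketch using the array printed above; the variable names are illustrative.

```python
import numpy as np

# One-hot array as printed in Example 1.
encoded = np.array([[1., 0., 0.],
                    [0., 1., 0.],
                    [0., 0., 1.],
                    [0., 1., 0.],
                    [0., 0., 1.]])

# Each row sums to 1 because exactly one column is set per sample.
assert np.all(encoded.sum(axis=1) == 1)

# The column index of the 1 in each row is the class label.
labels = encoded.argmax(axis=1)
print(labels.tolist())  # [0, 1, 2, 1, 2]
```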

Example 2 - Python Lists

from mlxtend.preprocessing import one_hot

y = [0, 1, 2, 1, 2]
one_hot(y)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

Example 3 - Integer Arrays

from mlxtend.preprocessing import one_hot

y = [0, 1, 2, 1, 2]
one_hot(y, dtype='int')

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 0, 1]])

Example 4 - Arbitrary Numbers of Class Labels

from mlxtend.preprocessing import one_hot

y = [0, 1, 2, 1, 2]
one_hot(y, num_labels=10)

array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
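
Fixing the number of columns can be useful when a subset of the data does not contain every class. The sketch below uses NumPy only and a hypothetical helper `encode`. It assumes that automatic inference takes the largest label plus one as the column count, which is an assumption made for illustration rather than a statement about mlxtend's internals.

```python
import numpy as np


def encode(y, num_labels=None):
    """Hypothetical helper; infers the width as max label + 1 if not given."""
    y = np.asarray(y, dtype=int)
    width = int(y.max()) + 1 if num_labels is None else num_labels
    # Row i of the identity matrix is the one-hot code for label i.
    return np.eye(width)[y]


batch_a = [0, 1, 1]  # label 2 never appears in this batch
batch_b = [2, 0, 1]

# Inferred widths differ between the batches, so the arrays cannot be stacked.
print(encode(batch_a).shape, encode(batch_b).shape)

# A fixed width keeps the column layout consistent across batches.
stacked = np.vstack([encode(batch_a, num_labels=3),
                     encode(batch_b, num_labels=3)])
print(stacked.shape)
```

Under these assumptions, passing the label count explicitly guarantees that a given column always refers to the same class, regardless of which labels happen to occur in a particular batch.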

API

one_hot(y, num_labels='auto', dtype='float')

One-hot encoding of class labels

Parameters

y : array-like, shape = [n_classlabels]

Python list or numpy array consisting of class labels.

num_labels : int or 'auto'

Number of unique labels in the class label array. Infers the number of unique labels from the input array if set to 'auto'.

dtype : str

NumPy array type (for example float, float32, float64, or int) of the output array.

Returns

ary : numpy.ndarray, shape = [n_classlabels, num_labels]

One-hot encoded array, where each sample is represented as a row vector in the returned array.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/one_hot/