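Perceptron: A Simple Binary Classifier

Implementation of a Perceptron learning algorithm for classification.

> from mlxtend.classifier import Perceptron

Overview

The idea behind this "thresholded" perceptron was to mimic how a single neuron in the brain works: it either "fires" or it does not.
A perceptron receives multiple input signals, and if the sum of the input signals exceeds a certain threshold, it returns a signal; otherwise, it remains "silent." What made this a "machine learning" algorithm was Frank Rosenblatt's idea of the perceptron learning rule: the perceptron algorithm learns the weights for the input signals in order to draw a linear decision boundary that lets us discriminate between two linearly separable classes, +1 and -1.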

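Basic Notation

Before we dive deeper into the algorithm for learning the weights of the perceptron classifier, let us briefly review the basic notation. In the following sections, we label the positive and negative classes in our binary classification setting as "1" and "-1", respectively. Next, we define an activation function $g(z)$ that takes a linear combination of the input values $\mathbf{x}$ and weights $\mathbf{w}$ as input ($z = w_1x_{1} + \dots + w_mx_{m}$). If $z$ is greater than or equal to a defined threshold $\theta$, we predict 1, and -1 otherwise; in this case, the activation function $g$ is a simple "unit step function," sometimes also called the "Heaviside step function."

$$ g(z) =\begin{cases} 1 & \text{if $z \ge \theta$}\\ -1 & \text{otherwise}. \end{cases} $$

where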

$$z = w_1x_{1} + \dots + w_mx_{m} = \sum_{j=1}^{m} x_{j}w_{j} \ = \mathbf{w}^T\mathbf{x}$$

$\mathbf{w}$ is the weight vector, and $\mathbf{x}$ is an $m$-dimensional sample from the training dataset:

$$ \mathbf{w} = \begin{bmatrix} w_{1} \\ \vdots \\ w_{m} \end{bmatrix} \quad \mathbf{x} = \begin{bmatrix} x_{1} \\ \vdots \\ x_{m} \end{bmatrix}$$

To simplify the notation, we bring $\theta$ to the left side of the inequality $z \ge \theta$ and define $w_0 = -\theta$ and $x_0 = 1$.

Thus,

$$ g(z) =\begin{cases} 1 & \text{if $z \ge 0$}\\ -1 & \text{otherwise}. \end{cases} $$

and

$$z = w_0x_{0} + w_1x_{1} + \dots + w_mx_{m} = \sum_{j=0}^{m} x_{j}w_{j} \ = \mathbf{w}^T\mathbf{x}.$$
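The notation above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the mlxtend implementation: the function names `net_input` and `unit_step` and the sample values are assumptions made for the example.

```python
def net_input(w, x):
    """Compute z = w^T x, where w[0] = -theta pairs with the constant x_0 = 1."""
    x_ext = [1.0] + list(x)  # prepend x_0 = 1 so the threshold becomes a weight
    return sum(w_j * x_j for w_j, x_j in zip(w, x_ext))


def unit_step(z):
    """Unit step activation: 1 if z >= 0, otherwise -1."""
    return 1 if z >= 0.0 else -1


theta = 0.5                 # hypothetical threshold
w = [-theta, 1.0, 1.0]      # w_0 = -theta, followed by w_1 and w_2

print(unit_step(net_input(w, [0.2, 0.1])))  # x_1 + x_2 = 0.3 is below theta
print(unit_step(net_input(w, [0.4, 0.3])))  # x_1 + x_2 = 0.7 is above theta
```

Folding the threshold into $w_0$ means the decision reduces to checking the sign of a single dot product.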

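Perceptron Rule

Rosenblatt's initial perceptron rule is fairly simple and can be summarized by the following steps:

1. Initialize the weights to 0 or small random numbers.
2. For each training sample $\mathbf{x}^{(i)}$:
    1. Calculate the output value.
    2. Update the weights.

The output value is the class label predicted by the unit step function that we defined earlier (output $= g(z)$), and the weight update can be written more formally as $w_j := w_j + \Delta w_j$.

The value for updating the weights at each increment is calculated by the learning rule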

$\Delta w_j = \eta \; (\text{target}^{(i)} - \text{output}^{(i)})\;x^{(i)}_{j}$

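where $\eta$ is the learning rate (a constant between 0.0 and 1.0), "target" is the true class label, and "output" is the predicted class label.

It is important to note that all weights in the weight vector are updated simultaneously. Concretely, for a 2-dimensional dataset, we would write the update as:

$\Delta w_0 = \eta(\text{target}^{(i)} - \text{output}^{(i)})$
$\Delta w_1 = \eta(\text{target}^{(i)} - \text{output}^{(i)})\;x^{(i)}_{1}$
$\Delta w_2 = \eta(\text{target}^{(i)} - \text{output}^{(i)})\;x^{(i)}_{2}$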
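A single update step for one training sample might look like the following. This is a minimal sketch under stated assumptions: `perceptron_update` is a hypothetical helper, not part of mlxtend, and the bias weight is stored as `w[0]`. The point it illustrates is that the output is computed once from the old weights and then reused for every $\Delta w_j$.

```python
def unit_step(z):
    """Unit step activation: 1 if z >= 0, otherwise -1."""
    return 1 if z >= 0.0 else -1


def perceptron_update(w, x, target, eta=0.1):
    """Return the weights after one perceptron update for sample x."""
    x_ext = [1.0] + list(x)  # x_0 = 1 pairs with the bias weight w[0]
    # Predict with the *old* weights; every weight uses this same output.
    output = unit_step(sum(w_j * x_j for w_j, x_j in zip(w, x_ext)))
    scale = eta * (target - output)  # 0 when the prediction is correct
    return [w_j + scale * x_j for w_j, x_j in zip(w, x_ext)]


w = [0.0, 0.0, 0.0]
# With all-zero weights z = 0, so the output is 1 and target -1 is a mistake.
w = perceptron_update(w, [2.0, -1.0], target=-1, eta=0.1)
print(w)
```

Returning a new list instead of modifying `w` in place makes the simultaneous nature of the update explicit.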

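Before turning to the full example, a simple thought experiment illustrates how beautifully simple this learning rule really is. In the two scenarios where the perceptron predicts the class label correctly, the weights remain unchanged: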

$\Delta w_j = \eta(-1^{(i)} - -1^{(i)})\;x^{(i)}_{j} = 0$
$\Delta w_j = \eta(1^{(i)} - 1^{(i)})\;x^{(i)}_{j} = 0$

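However, in the case of a wrong prediction, the weights are "pushed" toward the direction of the positive or negative target class, respectively:

$\Delta w_j = \eta(1^{(i)} - -1^{(i)})\;x^{(i)}_{j} = \eta(2)\;x^{(i)}_{j}$
$\Delta w_j = \eta(-1^{(i)} - 1^{(i)})\;x^{(i)}_{j} = \eta(-2)\;x^{(i)}_{j}$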
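As a worked example (the numbers are chosen purely for illustration and do not come from the text above), take $\eta = 0.1$ and a sample $\mathbf{x}^{(i)} = (x_0, x_1, x_2) = (1, 2, -1)$ with target $1$ that is wrongly predicted as $-1$:

$$\Delta \mathbf{w} = 0.1 \cdot (1 - (-1)) \cdot \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.4 \\ -0.2 \end{bmatrix}$$

For this same sample, the net input then changes by

$$\Delta z = \Delta\mathbf{w}^T\mathbf{x}^{(i)} = 0.2 \cdot 1 + 0.4 \cdot 2 + (-0.2) \cdot (-1) = 1.2 > 0,$$

so $z$ moves toward the region $z \ge 0$ where $g$ predicts $1$. In general, $\Delta z = \eta\,(\text{target}^{(i)} - \text{output}^{(i)})\,\lVert\mathbf{x}^{(i)}\rVert^2$, which always has the sign of the target on a misclassified sample.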

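It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable. If the two classes cannot be separated by a linear decision boundary, we can set a maximum number of passes over the training dataset ("epochs") and/or a threshold for the number of tolerated misclassifications.

References

F. Rosenblatt. The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory, 1957.

Example 1 - Classification of Iris Flowers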
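The two stopping criteria can be sketched as a small training loop in plain Python. This is a sketch under stated assumptions, not mlxtend's code: the function `train_perceptron`, its parameters `max_epochs` and `tol_errors`, and the toy AND/XOR datasets are all hypothetical names and data introduced for illustration.

```python
import random


def step(z):
    """Unit step activation: 1 if z >= 0, otherwise -1."""
    return 1 if z >= 0.0 else -1


def net_input(w, x):
    """z = w_0 + w_1 x_1 + ... + w_m x_m (w[0] is the bias weight)."""
    return w[0] + sum(w_j * x_j for w_j, x_j in zip(w[1:], x))


def train_perceptron(X, y, eta=0.1, max_epochs=50, tol_errors=0, seed=0):
    """Train with the perceptron rule; stop early once errors <= tol_errors."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)           # weights initialized to zero
    order = list(range(len(X)))
    errors_per_epoch = []
    for _ in range(max_epochs):           # cap on passes over the data
        rng.shuffle(order)                # visit samples in random order
        errors = 0
        for i in order:
            update = eta * (y[i] - step(net_input(w, X[i])))
            if update != 0.0:             # nonzero only for a misclassification
                errors += 1
                w[0] += update
                for j, x_j in enumerate(X[i], start=1):
                    w[j] += update * x_j
        errors_per_epoch.append(errors)
        if errors <= tol_errors:          # tolerated number of mistakes reached
            break
    return w, errors_per_epoch


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
_, and_errors = train_perceptron(X, [-1, -1, -1, 1])  # AND: linearly separable
_, xor_errors = train_perceptron(X, [-1, 1, 1, -1])   # XOR: not separable
print('AND epochs used:', len(and_errors))
print('XOR epochs used:', len(xor_errors))
```

On the separable AND labels the loop stops as soon as an epoch finishes without mistakes, whereas on XOR every epoch contains at least one mistake, so only the epoch cap ends training.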

from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Perceptron
import matplotlib.pyplot as plt

# Loading Data

X, y = iris_data()
X = X[:, [0, 3]] # sepal length and petal width
X = X[0:100] # class 0 and class 1
y = y[0:100] # class 0 and class 1

# standardize
X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()


# Rosenblatt Perceptron

ppn = Perceptron(epochs=5, 
                 eta=0.05, 
                 random_seed=0,
                 print_progress=3)
ppn.fit(X, y)

plot_decision_regions(X, y, clf=ppn)
plt.title('Perceptron - Rosenblatt Perceptron Rule')
plt.show()

print('Bias & Weights: %s' % ppn.w_)

plt.plot(range(len(ppn.cost_)), ppn.cost_)
plt.xlabel('Iterations')
plt.ylabel('Misclassifications')
plt.show()

Iteration: 5/5 | Elapsed: 00:00:00 | ETA: 00:00:00

Bias & Weights: [[-0.04500809]
 [ 0.11048855]]

API

Perceptron(eta=0.1, epochs=50, random_seed=None, print_progress=0)

Perceptron classifier.

Note that this implementation of the Perceptron expects binary class labels in {0, 1}.
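Because the overview above labels the classes -1 and +1 while this implementation expects labels in {0, 1}, labels stored in the former convention would need to be remapped before fitting. A minimal sketch follows; `to_zero_one` is a hypothetical helper written for illustration and is not part of mlxtend.

```python
def to_zero_one(labels, negative=-1, positive=1):
    """Map class labels from {negative, positive} to {0, 1}."""
    mapping = {negative: 0, positive: 1}
    try:
        return [mapping[label] for label in labels]
    except KeyError as err:
        # Fail loudly on anything that is not one of the two known labels.
        raise ValueError('unexpected class label: %r' % (err.args[0],))


y_pm = [-1, 1, 1, -1]
print(to_zero_one(y_pm))
```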

Parameters

eta : float (default: 0.1)

Learning rate (between 0.0 and 1.0)
epochs : int (default: 50)

Number of passes over the training dataset. Prior to each epoch, the dataset is shuffled to prevent cycles.
random_seed : int

Random state for initializing random weights and shuffling.
print_progress : int (default: 0)

Prints progress in fitting to stderr. 0: No output; 1: Epochs elapsed and cost; 2: 1 plus time elapsed; 3: 2 plus estimated time until completion.

Attributes

w_ : 2d-array, shape={n_features, 1}

Model weights after fitting.
b_ : 1d-array, shape={1,}

Bias unit after fitting.
cost_ : list

Number of misclassifications in every epoch.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/classifier/Perceptron/

Methods

fit(X, y, init_params=True)

Learn model from training data.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples]

Target values.
init_params : bool (default: True)

Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting.

Returns

self : object

predict(X)

Predict targets from X.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

target_values : array-like, shape = [n_samples]

Predicted target values.

score(X, y)

Compute the prediction accuracy.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples]

Target values (true class labels).

Returns

acc : float

The prediction accuracy as a float between 0.0 and 1.0 (perfect score).
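Prediction accuracy in this sense is the fraction of samples whose predicted label equals the true label. The sketch below shows that calculation in plain Python; `accuracy` is a hypothetical helper for illustration, and it is an assumption, not a confirmed detail, that `score` computes its result this way internally.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    if len(y_true) == 0:
        raise ValueError('cannot compute accuracy of an empty sequence')
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 3 of 4 correct
```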
