Multilayer Perceptron: A simple multilayer neural network
An implementation of a multilayer perceptron, a feedforward artificial neural network.
Multilayer Perceptron (MLP) Classifier
This notebook demonstrates how to use the multilayer perceptron (MLP) implementation from the mlxtend library for classification tasks. We will use a simple dataset to walk through training and prediction with an MLP model.
Overview
Although the code is fully functional and can be used for common classification tasks, this implementation was not designed for efficiency but for clarity: the original code was written for demonstration purposes.
Basic Architecture
The neurons $x_0$ and $a_0$ represent the bias units ($x_0=1$, $a_0=1$).
The superscript $(i)$ refers to the $i$th layer, and the subscript $j$ refers to the index of the respective unit. For example, $a_{1}^{(2)}$ refers to the first activation unit after the bias unit in layer 2 (here: the hidden layer), i.e., the 2nd activation unit of that layer.
\begin{align}
\mathbf{a^{(2)}} &= \begin{bmatrix}
a_{0}^{(2)} \\
a_{1}^{(2)} \\
\vdots \\
a_{m}^{(2)}
\end{bmatrix}.
\end{align}
Each layer $(l)$ in a multilayer perceptron is fully connected to the next layer $(l+1)$. We write the weight coefficient that connects the $k$th unit in layer $l$ to the $j$th unit in layer $l+1$ as $w^{(l)}_{j, k}$.
For example, the weight coefficient that connects the units $a_0^{(2)} \rightarrow a_1^{(3)}$ would be written as $w_{1,0}^{(2)}$.
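Written out in this notation, the activation of a unit follows the standard fully connected update; for instance, unit $a_1^{(3)}$ is computed from the layer-2 activations as follows (with $\phi$ denoting the activation function introduced in the next section, and the $k=0$ term contributing the bias):
\begin{align}
z_{1}^{(3)} = \sum_{k=0}^{m} w_{1,k}^{(2)} \, a_{k}^{(2)}, \qquad a_{1}^{(3)} = \phi\big(z_{1}^{(3)}\big)
\end{align}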
Activation
In the current implementation, the activations of the hidden layer are computed via the logistic (sigmoid) function $\phi(z) = \frac{1}{1 + e^{-z}}$.
(For more details on the logistic function, please see classifier.LogisticRegression; a general overview of different activation functions can be found here.)
Furthermore, the MLP uses the softmax function in the output layer. For more details on the softmax function, please see classifier.SoftmaxRegression.
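For reference, here is a minimal NumPy sketch of these two activation functions; it illustrates the formulas above and is not the library's internal code:
import numpy as np

def sigmoid(z):
    # logistic sigmoid: maps net inputs to the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the row-wise maximum for numerical stability,
    # then normalize so each row sums to 1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)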
References
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323:533-536, 1986.
- C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
- T. Hastie, J. Friedman, and R. Tibshirani. The Elements of Statistical Learning, Volume 2. Springer, 2009.
Example 1 - Classifying Iris Flowers
Load two features from the Iris dataset (sepal length and petal width, columns 0 and 3) for visualization:
from mlxtend.data import iris_data
X, y = iris_data()
X = X[:, [0, 3]]
# standardize the training data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
Train the neural network for the 3 output flower classes ('Setosa', 'Versicolor', 'Virginica') with regular gradient descent (`minibatches=1`), 50 hidden units, and no regularization.
Gradient Descent
Setting `minibatches` to `1` will result in gradient descent training; please see Gradient Descent vs. Stochastic Gradient Descent for details.
from mlxtend.classifier import MultiLayerPerceptron as MLP
nn1 = MLP(hidden_layers=[50],
          l2=0.00,
          l1=0.0,
          epochs=150,
          eta=0.05,
          momentum=0.1,
          decrease_const=0.0,
          minibatches=1,
          random_seed=1,
          print_progress=3)
nn1 = nn1.fit(X_std, y)
Iteration: 150/150 | Cost 0.06 | Elapsed: 0:00:00 | ETA: 0:00:00
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt
fig = plot_decision_regions(X=X_std, y=y, clf=nn1, legend=2)
plt.title('Multi-layer Perceptron w. 1 hidden layer (logistic sigmoid)')
plt.show()
import matplotlib.pyplot as plt
plt.plot(range(len(nn1.cost_)), nn1.cost_)
plt.ylabel('Cost')
plt.xlabel('Epochs')
plt.show()
print('Accuracy: %.2f%%' % (100 * nn1.score(X_std, y)))
Accuracy: 96.67%
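Once fitted, the model can also produce class-label predictions for new (standardized) samples via predict; a quick hedged check on the first few training samples:
# predicted class labels for the first five standardized samples
print(nn1.predict(X_std[:5]))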
Stochastic Gradient Descent
Setting `minibatches` to `n_samples` will result in stochastic gradient descent training; please see Gradient Descent vs. Stochastic Gradient Descent for details.
nn2 = MLP(hidden_layers=[50],
          l2=0.00,
          l1=0.0,
          epochs=5,
          eta=0.005,
          momentum=0.1,
          decrease_const=0.0,
          minibatches=len(y),
          random_seed=1,
          print_progress=3)
nn2.fit(X_std, y)
plt.plot(range(len(nn2.cost_)), nn2.cost_)
plt.ylabel('Cost')
plt.xlabel('Epochs')
plt.show()
Iteration: 5/5 | Cost 0.11 | Elapsed: 00:00:00 | ETA: 00:00:00
Continue training for 25 more epochs...
nn2.epochs = 25
nn2 = nn2.fit(X_std, y)
Iteration: 25/25 | Cost 0.07 | Elapsed: 0:00:00 | ETA: 0:00:00
plt.plot(range(len(nn2.cost_)), nn2.cost_)
plt.ylabel('Cost')
plt.xlabel('Epochs')
plt.show()
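As a side note, calling fit again with its default arguments re-initializes the model parameters (see init_params in the API section below). To resume training from the previously learned weights rather than restart, passing init_params=False should work; a hedged sketch:
# resume from the current weights instead of re-initializing them
nn2 = nn2.fit(X_std, y, init_params=False)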
Example 2 - Classifying Handwritten Digits from a 10% MNIST Subset
Load a 5000-sample subset of the MNIST dataset (please see data.loadlocal_mnist if you want to download and read in the complete MNIST dataset).
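For the complete dataset, loadlocal_mnist reads the original MNIST binary files from disk; a hedged usage sketch, where the file paths are placeholders for wherever the downloaded files live:
from mlxtend.data import loadlocal_mnist
# hypothetical paths to the unzipped MNIST training files
X_full, y_full = loadlocal_mnist(images_path='train-images-idx3-ubyte',
                                 labels_path='train-labels-idx1-ubyte')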
from mlxtend.data import mnist_data
from mlxtend.preprocessing import shuffle_arrays_unison
X, y = mnist_data()
X, y = shuffle_arrays_unison((X, y), random_seed=1)
X_train, y_train = X[:500], y[:500]
X_test, y_test = X[500:], y[500:]
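A quick sanity check of the split sizes (500 training samples, 4500 held-out test samples):
# expect (500, 784) and (4500, 784)
print(X_train.shape, X_test.shape)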
Visualize a sample from the MNIST dataset to check whether it was loaded correctly:
import matplotlib.pyplot as plt
def plot_digit(X, y, idx):
    # reshape the flat 784-pixel vector back into a 28x28 image
    img = X[idx].reshape(28, 28)
    plt.imshow(img, cmap='Greys', interpolation='nearest')
    plt.title('true label: %d' % y[idx])
    plt.show()
plot_digit(X, y, 3500)
Standardize the pixel values:
import numpy as np
from mlxtend.preprocessing import standardize
X_train_std, params = standardize(X_train,
                                  columns=range(X_train.shape[1]),
                                  return_params=True)
X_test_std = standardize(X_test,
                         columns=range(X_test.shape[1]),
                         params=params)
Initialize the neural network to recognize the 10 different digits (0-9) using 100 epochs and minibatch learning.
nn1 = MLP(hidden_layers=[150],
          l2=0.00,
          l1=0.0,
          epochs=100,
          eta=0.005,
          momentum=0.0,
          decrease_const=0.0,
          minibatches=100,
          random_seed=1,
          print_progress=3)
Learn the features while printing the progress to get an idea of how long it may take.
import matplotlib.pyplot as plt
nn1.fit(X_train_std, y_train)
plt.plot(range(len(nn1.cost_)), nn1.cost_)
plt.ylabel('Cost')
plt.xlabel('Epochs')
plt.show()
Iteration: 100/100 | Cost 0.01 | Elapsed: 0:00:17 | ETA: 0:00:00
print('Train Accuracy: %.2f%%' % (100 * nn1.score(X_train_std, y_train)))
print('Test Accuracy: %.2f%%' % (100 * nn1.score(X_test_std, y_test)))
Train Accuracy: 100.00%
Test Accuracy: 84.62%
Please note that this neural network was trained on only 10% of the MNIST data for technical demonstration purposes; hence, the predictive performance is relatively poor.
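To inspect the network's confidence on an individual test digit, predict_proba (documented in the API section below) returns the softmax class probabilities; a hedged sketch:
import numpy as np
# class-probability distribution for the first test sample
proba = nn1.predict_proba(X_test_std[:1])
print('predicted label: %d' % np.argmax(proba))
print('probability: %.2f' % proba.max())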
API
MultiLayerPerceptron(eta=0.5, epochs=50, hidden_layers=[50], n_classes=None, momentum=0.0, l1=0.0, l2=0.0, dropout=1.0, decrease_const=0.0, minibatches=1, random_seed=None, print_progress=0)
Multi-layer perceptron classifier with logistic sigmoid activations
Parameters
- `eta` : float (default: 0.5)
  Learning rate (between 0.0 and 1.0)
- `epochs` : int (default: 50)
  Passes over the training dataset. Prior to each epoch, the dataset is shuffled if `minibatches > 1` to prevent cycles in stochastic gradient descent.
- `hidden_layers` : list (default: [50])
  Number of units per hidden layer. By default 50 units in the first hidden layer. At the moment only 1 hidden layer is supported.
- `n_classes` : int (default: None)
  A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None.
- `l1` : float (default: 0.0)
  L1 regularization strength.
- `l2` : float (default: 0.0)
  L2 regularization strength.
- `momentum` : float (default: 0.0)
  Momentum constant. Factor multiplied with the gradient of the previous epoch t-1 to improve learning speed: w(t) := w(t) - (grad(t) + momentum * grad(t-1)).
- `decrease_const` : float (default: 0.0)
  Decrease constant. Shrinks the learning rate after each epoch via eta / (1 + epoch * decrease_const).
- `minibatches` : int (default: 1)
  Divide the training data into k minibatches for accelerated stochastic gradient descent learning: gradient descent learning if `minibatches = 1`; stochastic gradient descent learning if `minibatches = len(y)`; minibatch learning if `minibatches > 1`. (See the sketch after this list.)
- `random_seed` : int (default: None)
  Set random state for shuffling and initializing the weights.
- `print_progress` : int (default: 0)
  Prints progress in fitting to stderr. 0: no output; 1: epochs elapsed and cost; 2: 1 plus time elapsed; 3: 2 plus estimated time until completion.
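The `minibatches` setting thus selects among three training regimes; a hedged illustration, using the MLP alias imported in Example 1 and assuming `y` is the label array from the examples above:
# batch gradient descent: one weight update per epoch
gd = MLP(minibatches=1)
# stochastic gradient descent: one update per training sample
sgd = MLP(minibatches=len(y))
# minibatch learning: 10 updates per epoch
mb = MLP(minibatches=10)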
Attributes
- `w_` : 2d-array, shape=[n_features, n_classes]
  Weights after fitting.
- `b_` : 1D-array, shape=[n_classes]
  Bias units after fitting.
- `cost_` : list
  List of floats; the mean categorical cross entropy cost after each epoch.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/classifier/MultiLayerPerceptron/
Methods
fit(X, y, init_params=True)
Learn model from training data.
Parameters
- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]
  Training vectors, where n_samples is the number of samples and n_features is the number of features.
- `y` : array-like, shape = [n_samples]
  Target values.
- `init_params` : bool (default: True)
  Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting.
Returns
- `self` : object
predict(X)
Predict targets from X.
Parameters
- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]
  Training vectors, where n_samples is the number of samples and n_features is the number of features.
Returns
- `target_values` : array-like, shape = [n_samples]
  Predicted target values.
predict_proba(X)
Predict class probabilities of X from the net input.
Parameters
- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]
  Training vectors, where n_samples is the number of samples and n_features is the number of features.
Returns
- `Class probabilities` : array-like, shape = [n_samples, n_classes]
score(X, y)
Compute the prediction accuracy.
Parameters
- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]
  Training vectors, where n_samples is the number of samples and n_features is the number of features.
- `y` : array-like, shape = [n_samples]
  Target values (true class labels).
Returns
- `acc` : float
  The prediction accuracy as a float between 0.0 and 1.0 (perfect score).
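Putting the methods together, a minimal end-to-end usage sketch (the hyperparameter values here are illustrative assumptions, not tuned settings):
from mlxtend.classifier import MultiLayerPerceptron as MLP
from mlxtend.data import iris_data

X, y = iris_data()
# z-score standardization, as in Example 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

mlp = MLP(hidden_layers=[50], epochs=100, eta=0.05, random_seed=1)
mlp.fit(X_std, y)
print('accuracy: %.2f' % mlp.score(X_std, y))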