Adaline: Adaptive Linear Neuron Classifier
An implementation of the Adaptive Linear Neuron (Adaline) for binary classification tasks.
> from mlxtend.classifier import Adaline
Overview
Schematic of the Adaptive Linear Neuron (Adaline) -- a single-layer artificial linear neuron with a threshold unit:
The Adaline classifier is closely related to the Ordinary Least Squares (OLS) linear regression algorithm; in OLS regression, we find the line (or hyperplane) that minimizes the vertical offsets. In other words, we define the best-fitting line as the line that minimizes the sum of squared errors (SSE) or mean squared error (MSE) between our target variable (y) and our predicted output over all samples $i$ in our dataset of size $n$.
$$ SSE = \sum_i (\text{target}^{(i)} - \text{output}^{(i)})^2$$
$$MSE = \frac{1}{n} \times SSE$$
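As a quick numerical illustration (ours, not part of the library), the SSE and MSE for a handful of made-up targets and outputs can be computed directly with NumPy:

```python
import numpy as np

# made-up targets and model outputs for n = 4 samples
target = np.array([1., -1., 1., 1.])
output = np.array([0.8, -0.6, 1.2, 0.4])

sse = np.sum((target - output) ** 2)  # sum of squared errors
mse = sse / target.shape[0]           # mean squared error
print(sse, mse)                       # 0.6 0.15
```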
`LinearRegression` implements a linear regression model for performing ordinary least squares regression, whereas in Adaline we add a threshold function $g(\cdot)$ to convert the continuous outcome into a categorical class label:
$$y = g(z) = \begin{cases} 1 & \text{if } z \ge 0\\ -1 & \text{otherwise}. \end{cases}$$
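As a rough sketch (the function name `threshold` is ours, not part of the mlxtend API), this quantizer can be written as:

```python
import numpy as np

def threshold(z):
    """Quantize the continuous net input z into the class labels {1, -1}."""
    return np.where(z >= 0.0, 1, -1)

print(threshold(np.array([-0.5, 0.0, 2.3])))  # [-1  1  1]
```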
The Adaline model can be trained by one of the following three approaches:
- Normal Equations
- Gradient Descent
- Stochastic Gradient Descent
Normal Equations (closed-form solution)
The closed-form solution should be preferred for "smaller" datasets, where calculating the ("costly") matrix inverse is not a problem. For very large datasets, or datasets where the inverse of $[X^T X]$ may not exist (the matrix is non-invertible or singular, e.g., in the case of perfect multicollinearity), the gradient descent or stochastic gradient descent approaches are to be preferred.
The linear function (linear regression model) is defined as:
$$z = w_0x_0 + w_1x_1 + ... + w_mx_m = \sum_{j=0}^{m} w_j x_j = \mathbf{w}^T\mathbf{x}$$
where $y$ is the response variable, $\mathbf{x}$ is an $m$-dimensional sample vector, and $\mathbf{w}$ is the weight vector (vector of coefficients). Note that $w_0$ represents the y-axis intercept of the model and therefore $x_0 = 1$.
Using the closed-form solution (normal equations), we compute the weights of the model as follows:
$$ \mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^Ty$$
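The following NumPy sketch (an illustration under the definitions above, not mlxtend's actual internals) solves the normal equations, prepending a column of ones to implement the $x_0 = 1$ intercept convention:

```python
import numpy as np

def normal_equation(X, y):
    """Solve w = (X^T X)^{-1} X^T y with an intercept term x0 = 1."""
    Xb = np.hstack((np.ones((X.shape[0], 1)), X))  # prepend the bias column
    # np.linalg.solve is numerically preferable to forming the inverse explicitly
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# toy data generated from y = 1 + 2x, so we expect w = [1, 2]
X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([1., 3., 5., 7.])
print(normal_equation(X, y))  # [1. 2.]
```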
Gradient Descent (GD) and Stochastic Gradient Descent (SGD)
In the current implementation, the Adaline model is learned via Gradient Descent or Stochastic Gradient Descent.
For details, see "Gradient Descent and Stochastic Gradient Descent" and "Deriving the Gradient Descent Rule for Linear Regression and Adaline".
Random shuffling is implemented as follows (a minimal code sketch is given after the list):
- for one or more epochs
    - randomly shuffle samples in the training set
    - for each training sample i
        - compute the gradient and perform the weight update
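A minimal sketch of one possible SGD loop under this scheme; the helper name `sgd_adaline` and its defaults are ours, and the update rule follows the classic Widrow-Hoff (Adaline) rule, $\mathbf{w} \leftarrow \mathbf{w} + \eta \, (y^{(i)} - z^{(i)}) \, \mathbf{x}^{(i)}$, rather than mlxtend's exact internals:

```python
import numpy as np

def sgd_adaline(X, y, eta=0.01, epochs=15, seed=1):
    rng = np.random.RandomState(seed)
    w = np.zeros(X.shape[1])           # weight vector
    b = 0.0                            # bias unit
    for _ in range(epochs):            # for one or more epochs
        idx = rng.permutation(len(y))  # randomly shuffle the training set
        for i in idx:                  # for each training sample i
            z = X[i] @ w + b           # net input (linear activation)
            error = y[i] - z           # compute the gradient ...
            w += eta * error * X[i]    # ... and update the weights
            b += eta * error           # ... and the bias
    return w, b
```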
References
- B. Widrow, M. E. Hoff, et al. Adaptive switching circuits. 1960.
Example 1 - Closed Form Solution
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data
X, y = iris_data()
X = X[:, [0, 3]]  # sepal length and petal width
X = X[0:100]      # class 0 and class 1
y = y[0:100]      # class 0 and class 1

# standardize
X[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

ada = Adaline(epochs=30,
              eta=0.01,
              minibatches=None,
              random_seed=1)
ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Closed Form')
plt.show()
Example 2 - Gradient Descent
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data
X, y = iris_data()
X = X[:, [0, 3]]  # sepal length and petal width
X = X[0:100]      # class 0 and class 1
y = y[0:100]      # class 0 and class 1

# standardize
X[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

ada = Adaline(epochs=30,
              eta=0.01,
              minibatches=1,  # for Gradient Descent learning
              random_seed=1,
              print_progress=3)
ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Gradient Descent')
plt.show()

plt.plot(range(len(ada.cost_)), ada.cost_)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()
Iteration: 30/30 | Cost 3.79 | Elapsed: 0:00:00 | ETA: 0:00:00
Example 3 - Stochastic Gradient Descent
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data
X, y = iris_data()
X = X[:, [0, 3]]  # sepal length and petal width
X = X[0:100]      # class 0 and class 1
y = y[0:100]      # class 0 and class 1

# standardize
X[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

ada = Adaline(epochs=15,
              eta=0.02,
              minibatches=len(y),  # for SGD learning
              random_seed=1,
              print_progress=3)
ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Stochastic Gradient Descent')
plt.show()

plt.plot(range(len(ada.cost_)), ada.cost_)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()
Iteration: 15/15 | Cost 3.81 | Elapsed: 0:00:00 | ETA: 0:00:00
Example 4 - Stochastic Gradient Descent with Minibatches
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data
X, y = iris_data()
X = X[:, [0, 3]]  # sepal length and petal width
X = X[0:100]      # class 0 and class 1
y = y[0:100]      # class 0 and class 1

# standardize
X[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

ada = Adaline(epochs=15,
              eta=0.02,
              minibatches=5,  # for SGD learning with a minibatch size of 20
              random_seed=1,
              print_progress=3)
ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Stochastic Gradient Descent w. Minibatches')
plt.show()

plt.plot(range(len(ada.cost_)), ada.cost_)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()
Iteration: 15/15 | Cost 3.87 | Elapsed: 0:00:00 | ETA: 0:00:00
API
Adaline(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0)
ADAptive LInear NEuron classifier.
Note that this implementation of Adaline expects binary class labels in {0, 1}.
Parameters
- `eta` : float (default: 0.01)
    Learning rate of the solver (between 0.0 and 1.0).

- `epochs` : int (default: 50)
    Passes over the training dataset. Prior to each epoch, the dataset is shuffled if `minibatches > 1` to prevent cycles in stochastic gradient descent.

- `minibatches` : int (default: None)
    The number of minibatches for gradient-based optimization. If None: Normal Equations (closed-form solution); if 1: Gradient Descent learning; if len(y): Stochastic Gradient Descent (SGD) online learning; if 1 < minibatches < len(y): SGD Minibatch learning.

- `random_seed` : int (default: None)
    Set random state for shuffling and initializing the weights.

- `print_progress` : int (default: 0)
    Prints progress in fitting to stderr if not solver='normal equation'. 0: No output; 1: Epochs elapsed and cost; 2: 1 plus time elapsed; 3: 2 plus estimated time until completion.
Attributes
- `w_` : 2d-array, shape={n_features, 1}
    Model weights after fitting.

- `b_` : 1d-array, shape={1,}
    Bias unit after fitting.

- `cost_` : list
    Sum of squared errors after each epoch.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/classifier/Adaline/
Methods
fit(X, y, init_params=True)
Learn model from training data.
Parameters
- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

- `y` : array-like, shape = [n_samples]
    Target values.

- `init_params` : bool (default: True)
    Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting.

Returns

- `self` : object
get_params(deep=True)
Get parameters for this estimator.
Parameters
- `deep` : boolean, optional
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

- `params` : mapping of string to any
    Parameter names mapped to their values.

adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py
Author: Gael Varoquaux <gael.varoquaux@normalesup.org>
License: BSD 3 clause
predict(X)
Predict targets from X.
Parameters
- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

- `target_values` : array-like, shape = [n_samples]
    Predicted target values.
score(X, y)
Compute the prediction accuracy
Parameters
- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

- `y` : array-like, shape = [n_samples]
    Target values (true class labels).

Returns

- `acc` : float
    The prediction accuracy as a float between 0.0 and 1.0 (perfect score).
set_params(params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form `<component>__<parameter>` so that it's possible to update each component of a nested object.

Returns

- `self`

adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py
Author: Gael Varoquaux <gael.varoquaux@normalesup.org>
License: BSD 3 clause