Adaline: Adaptive Linear Neuron Classifier
An implementation of the Adaptive Linear Neuron (Adaline) for binary classification tasks.
> from mlxtend.classifier import Adaline
Overview
Schematic of the Adaptive Linear Neuron (Adaline) -- a single-layer artificial linear neuron with a threshold unit:
The Adaline classifier is closely related to the Ordinary Least Squares (OLS) linear regression algorithm; in OLS regression, we find the line (or hyperplane) that minimizes the vertical offsets. In other words, we define the best-fitting line as the line that minimizes the sum of squared errors (SSE) or mean squared error (MSE) between our target variable (y) and our predicted output over all samples $i$ in our dataset of size $n$.
$$ SSE = \sum_i (\text{target}^{(i)} - \text{output}^{(i)})^2$$
$$MSE = \frac{1}{n} \times SSE$$
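As a quick numerical illustration (ours, not part of the library), the SSE and MSE for a handful of made-up targets and outputs can be computed directly with NumPy:

```python
import numpy as np

# made-up targets and model outputs for n = 4 samples
target = np.array([1., -1., 1., 1.])
output = np.array([0.8, -0.6, 1.2, 0.4])

sse = np.sum((target - output) ** 2)  # sum of squared errors
mse = sse / target.shape[0]           # mean squared error
print(sse, mse)                       # 0.6 0.15
```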
`LinearRegression` implements a linear regression model for performing ordinary least squares regression, whereas in Adaline we add a threshold function $g(\cdot)$ to convert the continuous outcome into a categorical class label:
$$y = g(z) = \begin{cases} 1 & \text{if } z \ge 0\\ -1 & \text{otherwise}. \end{cases}$$
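As a rough sketch (the function name `threshold` is ours, not part of the mlxtend API), this quantizer can be written as:

```python
import numpy as np

def threshold(z):
    """Quantize the continuous net input z into the class labels {1, -1}."""
    return np.where(z >= 0.0, 1, -1)

print(threshold(np.array([-0.5, 0.0, 2.3])))  # [-1  1  1]
```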
The Adaline model can be trained by one of the following three approaches:
- Normal Equations
- Gradient Descent
- Stochastic Gradient Descent
Normal Equations (closed-form solution)
The closed-form solution should be preferred for "smaller" datasets, where calculating the ("costly") matrix inverse is not a problem. For very large datasets, or datasets where the inverse of $[X^T X]$ may not exist (the matrix is non-invertible or singular, e.g., in the case of perfect multicollinearity), the gradient descent or stochastic gradient descent approaches are to be preferred.
The linear function (linear regression model) is defined as:
$$z = w_0x_0 + w_1x_1 + ... + w_mx_m = \sum_{j=0}^{m} w_j x_j = \mathbf{w}^T\mathbf{x}$$
where $y$ is the response variable, $\mathbf{x}$ is an $m$-dimensional sample vector, and $\mathbf{w}$ is the weight vector (vector of coefficients). Note that $w_0$ represents the y-axis intercept of the model and therefore $x_0 = 1$.
Using the closed-form solution (normal equations), we compute the weights of the model as follows:
$$ \mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^Ty$$
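The following NumPy sketch (an illustration under the definitions above, not mlxtend's actual internals) solves the normal equations, prepending a column of ones to implement the $x_0 = 1$ intercept convention:

```python
import numpy as np

def normal_equation(X, y):
    """Solve w = (X^T X)^{-1} X^T y with an intercept term x0 = 1."""
    Xb = np.hstack((np.ones((X.shape[0], 1)), X))  # prepend the bias column
    # np.linalg.solve is numerically preferable to forming the inverse explicitly
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# toy data generated from y = 1 + 2x, so we expect w = [1, 2]
X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([1., 3., 5., 7.])
print(normal_equation(X, y))  # [1. 2.]
```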
Gradient Descent (GD) and Stochastic Gradient Descent (SGD)
In the current implementation, the Adaline model is learned via Gradient Descent or Stochastic Gradient Descent.
For details, see "Gradient Descent and Stochastic Gradient Descent" and "Deriving the Gradient Descent Rule for Linear Regression and Adaline".
Random shuffling is implemented as follows (a minimal code sketch is given after the list):
- for one or more epochs
    - randomly shuffle samples in the training set
    - for each training sample i
        - compute the gradient and perform the weight update
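A minimal sketch of one possible SGD loop under this scheme; the helper name `sgd_adaline` and its defaults are ours, and the update rule follows the classic Widrow-Hoff (Adaline) rule, $\mathbf{w} \leftarrow \mathbf{w} + \eta \, (y^{(i)} - z^{(i)}) \, \mathbf{x}^{(i)}$, rather than mlxtend's exact internals:

```python
import numpy as np

def sgd_adaline(X, y, eta=0.01, epochs=15, seed=1):
    rng = np.random.RandomState(seed)
    w = np.zeros(X.shape[1])           # weight vector
    b = 0.0                            # bias unit
    for _ in range(epochs):            # for one or more epochs
        idx = rng.permutation(len(y))  # randomly shuffle the training set
        for i in idx:                  # for each training sample i
            z = X[i] @ w + b           # net input (linear activation)
            error = y[i] - z           # compute the gradient ...
            w += eta * error * X[i]    # ... and update the weights
            b += eta * error           # ... and the bias
    return w, b
```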
References
- B. Widrow, M. E. Hoff, et al. Adaptive switching circuits. 1960.
Example 1 - Closed Form Solution
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data
X, y = iris_data()
X = X[:, [0, 3]]  # sepal length and petal width
X = X[0:100]      # class 0 and class 1
y = y[0:100]      # class 0 and class 1

# standardize
X[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

ada = Adaline(epochs=30,
              eta=0.01,
              minibatches=None,
              random_seed=1)
ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Closed Form')
plt.show()
Example 2 - Gradient Descent
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data
X, y = iris_data()
X = X[:, [0, 3]]  # sepal length and petal width
X = X[0:100]      # class 0 and class 1
y = y[0:100]      # class 0 and class 1

# standardize
X[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

ada = Adaline(epochs=30,
              eta=0.01,
              minibatches=1,  # for Gradient Descent learning
              random_seed=1,
              print_progress=3)
ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Gradient Descent')
plt.show()

plt.plot(range(len(ada.cost_)), ada.cost_)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()
Iteration: 30/30 | Cost 3.79 | Elapsed: 0:00:00 | ETA: 0:00:00
Example 3 - Stochastic Gradient Descent
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data
X, y = iris_data()
X = X[:, [0, 3]]  # sepal length and petal width
X = X[0:100]      # class 0 and class 1
y = y[0:100]      # class 0 and class 1

# standardize
X[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

ada = Adaline(epochs=15,
              eta=0.02,
              minibatches=len(y),  # for SGD learning
              random_seed=1,
              print_progress=3)
ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Stochastic Gradient Descent')
plt.show()

plt.plot(range(len(ada.cost_)), ada.cost_)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()
Iteration: 15/15 | Cost 3.81 | Elapsed: 0:00:00 | ETA: 0:00:00
Example 4 - Stochastic Gradient Descent with Minibatches
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data
X, y = iris_data()
X = X[:, [0, 3]]  # sepal length and petal width
X = X[0:100]      # class 0 and class 1
y = y[0:100]      # class 0 and class 1

# standardize
X[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

ada = Adaline(epochs=15,
              eta=0.02,
              minibatches=5,  # for SGD learning with a minibatch size of 20
              random_seed=1,
              print_progress=3)
ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Stochastic Gradient Descent w. Minibatches')
plt.show()

plt.plot(range(len(ada.cost_)), ada.cost_)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()
Iteration: 15/15 | Cost 3.87 | Elapsed: 0:00:00 | ETA: 0:00:00
API
Adaline(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0)
ADAptive LInear NEuron classifier.
Note that this implementation of Adaline expects binary class labels in {0, 1}.
Parameters
- `eta` : float (default: 0.01)
    Learning rate of the solver (between 0.0 and 1.0).

- `epochs` : int (default: 50)
    Passes over the training dataset. Prior to each epoch, the dataset is shuffled if `minibatches > 1` to prevent cycles in stochastic gradient descent.

- `minibatches` : int (default: None)
    The number of minibatches for gradient-based optimization. If None: Normal Equations (closed-form solution); if 1: Gradient Descent learning; if len(y): Stochastic Gradient Descent (SGD) online learning; if 1 < minibatches < len(y): SGD Minibatch learning.

- `random_seed` : int (default: None)
    Set random state for shuffling and initializing the weights.

- `print_progress` : int (default: 0)
    Prints progress in fitting to stderr if not solver='normal equation'. 0: No output; 1: Epochs elapsed and cost; 2: 1 plus time elapsed; 3: 2 plus estimated time until completion.
Attributes
- `w_` : 2d-array, shape={n_features, 1}
    Model weights after fitting.

- `b_` : 1d-array, shape={1,}
    Bias unit after fitting.

- `cost_` : list
    Sum of squared errors after each epoch.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/classifier/Adaline/
Methods
fit(X, y, init_params=True)
Learn model from training data.
Parameters
- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

- `y` : array-like, shape = [n_samples]
    Target values.

- `init_params` : bool (default: True)
    Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting.

Returns

- `self` : object
get_params(deep=True)
Get parameters for this estimator.
Parameters
- `deep` : boolean, optional
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

- `params` : mapping of string to any
    Parameter names mapped to their values.

adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py
Author: Gael Varoquaux <gael.varoquaux@normalesup.org>
License: BSD 3 clause
predict(X)
Predict targets from X.
Parameters
- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

- `target_values` : array-like, shape = [n_samples]
    Predicted target values.
score(X, y)
Compute the prediction accuracy
Parameters
- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

- `y` : array-like, shape = [n_samples]
    Target values (true class labels).

Returns

- `acc` : float
    The prediction accuracy as a float between 0.0 and 1.0 (perfect score).
set_params(params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form `<component>__<parameter>` so that it's possible to update each component of a nested object.

Returns

- `self`

adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py
Author: Gael Varoquaux <gael.varoquaux@normalesup.org>
License: BSD 3 clause