Normal, Ledoit-Wolf and OAS Linear Discriminant Analysis for classification

This example illustrates how the Ledoit-Wolf and Oracle Approximating Shrinkage (OAS) covariance estimators can improve classification.
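As a rough illustration of what a shrinkage estimator does, the sketch below blends an empirical covariance matrix with a scaled identity matrix. The helper `shrink_covariance` and the fixed intensity `alpha=0.5` are assumptions made for illustration only. Ledoit-Wolf and OAS choose the intensity from the data rather than by hand.

```python
import numpy as np


def shrink_covariance(emp_cov, alpha):
    """Blend an empirical covariance with a scaled identity target.

    `alpha` is a hand-picked shrinkage intensity in [0, 1]. This is a
    hypothetical helper, not a scikit-learn function.
    """
    n_features = emp_cov.shape[0]
    mu = np.trace(emp_cov) / n_features  # average variance across features
    return (1.0 - alpha) * emp_cov + alpha * mu * np.eye(n_features)


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 40))  # fewer samples than features

# Maximum likelihood (biased) empirical covariance, shape (40, 40)
emp_cov = np.cov(X, rowvar=False, bias=True)
shrunk_cov = shrink_covariance(emp_cov, alpha=0.5)

# With fewer samples than features the empirical covariance is singular,
# while the shrunk matrix has strictly positive eigenvalues.
print("rank of empirical covariance:", np.linalg.matrix_rank(emp_cov))
print("rank of shrunk covariance:", np.linalg.matrix_rank(shrunk_cov))
```

A singular covariance matrix cannot be inverted reliably, which is one reason plain LDA tends to degrade as the number of features approaches or exceeds the number of training samples.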

Figure: LDA (Linear Discriminant Analysis) vs. LDA with Ledoit Wolf vs. LDA with OAS (1 discriminative feature)
import matplotlib.pyplot as plt
import numpy as np

from sklearn.covariance import OAS
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

n_train = 20  # samples for training
n_test = 200  # samples for testing
n_averages = 50  # how often to repeat classification
n_features_max = 75  # maximum number of features
step = 4  # step size for the calculation


def generate_data(n_samples, n_features):
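    """Generate random blob-ish data with noisy features.

    This returns an array of input data with shape `(n_samples, n_features)`
    and an array of `n_samples` target labels.

    Only one feature contains discriminative information; the other features
    contain only noise.
    """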
    X, y = make_blobs(n_samples=n_samples, n_features=1, centers=[[-2], [2]])

    # add non-discriminative (pure noise) features
    if n_features > 1:
        X = np.hstack([X, np.random.randn(n_samples, n_features - 1)])
    return X, y


acc_clf1, acc_clf2, acc_clf3 = [], [], []
n_features_range = range(1, n_features_max + 1, step)
for n_features in n_features_range:
    score_clf1, score_clf2, score_clf3 = 0, 0, 0
    for _ in range(n_averages):
        X, y = generate_data(n_train, n_features)

        clf1 = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=None).fit(X, y)
        clf2 = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)
        oa = OAS(store_precision=False, assume_centered=False)
        clf3 = LinearDiscriminantAnalysis(solver="lsqr", covariance_estimator=oa).fit(
            X, y
        )

        X, y = generate_data(n_test, n_features)
        score_clf1 += clf1.score(X, y)
        score_clf2 += clf2.score(X, y)
        score_clf3 += clf3.score(X, y)

    acc_clf1.append(score_clf1 / n_averages)
    acc_clf2.append(score_clf2 / n_averages)
    acc_clf3.append(score_clf3 / n_averages)

features_samples_ratio = np.array(n_features_range) / n_train

plt.plot(
    features_samples_ratio,
    acc_clf1,
    linewidth=2,
    label="LDA",
    color="gold",
    linestyle="solid",
)
plt.plot(
    features_samples_ratio,
    acc_clf2,
    linewidth=2,
    label="LDA with Ledoit Wolf",
    color="navy",
    linestyle="dashed",
)
plt.plot(
    features_samples_ratio,
    acc_clf3,
    linewidth=2,
    label="LDA with OAS",
    color="red",
    linestyle="dotted",
)

plt.xlabel("n_features / n_samples")
plt.ylabel("Classification accuracy")

plt.legend(loc="lower left")
plt.ylim((0.65, 1.0))
plt.suptitle(
    "LDA (Linear Discriminant Analysis) vs. "
    + "\n"
    + "LDA with Ledoit Wolf vs. "
    + "\n"
    + "LDA with OAS (1 discriminative feature)"
)
plt.show()
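
To see how much shrinkage each estimator applies, one option is to fit the covariance estimators directly and read their `shrinkage_` attribute. This is a standalone sketch and not part of the script above. The random data and its shape (20 samples, 40 features) are assumptions chosen to resemble the small training sets used in the example.

```python
import numpy as np
from sklearn.covariance import OAS, LedoitWolf

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 40))  # illustrative data: 20 samples, 40 features

# Fit both shrinkage estimators on the same data
lw = LedoitWolf().fit(X)
oas = OAS().fit(X)

# shrinkage_ is the coefficient of the convex combination between the
# empirical covariance and the scaled identity target
print("Ledoit-Wolf shrinkage:", lw.shrinkage_)
print("OAS shrinkage:", oas.shrinkage_)
```

Values close to 1 mean the estimate is pulled strongly toward the scaled identity, which is expected when there are few samples relative to the number of features.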

Total running time of the script: (0 minutes 2.835 seconds)

Related examples

Shrinkage covariance estimation: LedoitWolf vs OAS and max-likelihood

Ledoit-Wolf vs OAS estimation

Linear and Quadratic Discriminant Analysis with covariance ellipsoid

Comparison of LDA and PCA 2D projection of Iris dataset
