
kNN Example

Full example: knn_example.py

  1. Import models

    from pyod.models.knn import KNN   # kNN detector
  2. Generate sample data with pyod.utils.data.generate_data():

    from pyod.utils.data import generate_data

    contamination = 0.1  # proportion of outliers
    n_train = 200  # number of training points
    n_test = 100  # number of testing points
    X_train, X_test, y_train, y_test = generate_data(
        n_train=n_train, n_test=n_test, contamination=contamination)
  3. Initialize a pyod.models.knn.KNN detector, fit the model, and make predictions.

    # train kNN detector
    clf_name = 'KNN'
    clf = KNN()
    clf.fit(X_train)  # fit on the training data before reading any attributes
    # get the prediction labels and outlier scores of the training data
    y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
    y_train_scores = clf.decision_scores_  # raw outlier scores
    # get the prediction on the test data
    y_test_pred = clf.predict(X_test)  # outlier labels (0 or 1)
    y_test_scores = clf.decision_function(X_test)  # outlier scores
    # it is possible to get the prediction confidence as well:
    # outlier labels (0 or 1) and confidence in the range of [0,1]
    y_test_pred, y_test_pred_confidence = clf.predict(
        X_test, return_confidence=True)
  4. Evaluate the predictions by ROC and Precision @ Rank n with pyod.utils.data.evaluate_print().

    from pyod.utils.data import evaluate_print
    # evaluate and print the results
    print("\nOn Training Data:")
    evaluate_print(clf_name, y_train, y_train_scores)
    print("\nOn Test Data:")
    evaluate_print(clf_name, y_test, y_test_scores)
  5. See the sample output on both the training and test data.

    On Training Data:
    KNN ROC:1.0, precision @ rank n:1.0
    On Test Data:
    KNN ROC:0.9989, precision @ rank n:0.9
  6. Generate the visualization with the visualize function included in all examples.

    from pyod.utils.example import visualize
    visualize(clf_name, X_train, y_train, X_test, y_test, y_train_pred,
              y_test_pred, show_figure=True, save_figure=False)
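
To make the raw outlier scores from step 3 more concrete, here is a minimal, dependency-free sketch of the idea behind a kNN detector: each point is scored by the distance to its k-th nearest neighbor, so isolated points receive larger scores. The function name knn_outlier_scores and the toy data are hypothetical, and this brute-force loop is only an illustration, not PyOD's implementation.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # distances from p to every other point, sorted ascending
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])  # distance to the k-th nearest neighbor
    return scores


# four clustered points and one far-away point (made-up data)
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(points, k=2)

# index of the point with the largest score, i.e. the most anomalous one
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(scores)
print(most_anomalous)
```

The isolated point at (10, 10) gets a far larger score than the clustered points, which is the behavior the detector relies on when ranking samples.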
Figure: kNN demo
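
The two metrics reported in steps 4 and 5 can be sketched in plain Python. ROC (area under the curve) is computed here as the fraction of outlier/inlier pairs in which the outlier has the higher score, and Precision @ Rank n as the share of true outliers among the n highest-scoring samples, where n is the number of true outliers. The function names and toy labels are hypothetical, and the library's own routines may handle ties and thresholds differently.

```python
def precision_at_rank_n(y_true, scores):
    """Fraction of true outliers among the n top-scored samples,
    where n is the number of true outliers."""
    n = sum(y_true)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(y_true[i] for i in ranked[:n]) / n


def roc_auc(y_true, scores):
    """Probability that a random outlier outscores a random inlier
    (ties count as half a win)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# made-up ground truth (1 = outlier) and outlier scores
y_true = [0, 0, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.9, 0.8, 0.35, 0.2]

p_at_n = precision_at_rank_n(y_true, scores)
auc = roc_auc(y_true, scores)
print(p_at_n, auc)
```

Both metrics depend only on the ranking of the scores, which is why evaluate_print() takes raw scores rather than binary labels.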


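Because outlier detection is unsupervised, it often suffers from model instability. It is therefore recommended to combine the outputs of several detectors, e.g., by averaging, to improve robustness. Detector combination is a subfield of outlier ensembles; refer to [BKalayciE18] for more information. Four score combination mechanisms are covered here: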


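  1. Average: the average score of all detectors.

  2. Maximization: the maximum score across all detectors.

  3. Average of Maximum (AOM): divide the base detectors into subgroups and take the maximum score of each subgroup. The final score is the average of all subgroup scores.

  4. Maximum of Average (MOA): divide the base detectors into subgroups and take the average score of each subgroup. The final score is the maximum of all subgroup scores.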
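
The four mechanisms above can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not the library's code: the function names are hypothetical, detectors are grouped into consecutive, equally sized subgroups (the number of detectors is assumed to be divisible by the number of subgroups), and each row of the input holds the scores of one sample across all detectors.

```python
from statistics import mean


def split_into_groups(n_detectors, n_groups):
    """Consecutive, equally sized groups of detector indices."""
    size = n_detectors // n_groups
    return [range(g * size, (g + 1) * size) for g in range(n_groups)]


def combine_average(scores):
    return [mean(row) for row in scores]


def combine_maximization(scores):
    return [max(row) for row in scores]


def combine_aom(scores, n_groups):
    """Average of Maximum: max within each group, then mean over groups."""
    groups = split_into_groups(len(scores[0]), n_groups)
    return [mean([max(row[j] for j in g) for g in groups]) for row in scores]


def combine_moa(scores, n_groups):
    """Maximum of Average: mean within each group, then max over groups."""
    groups = split_into_groups(len(scores[0]), n_groups)
    return [max(mean([row[j] for j in g]) for g in groups) for row in scores]


# two samples scored by four detectors (made-up, already on one scale)
scores = [[1.0, 2.0, 3.0, 4.0],
          [0.0, 0.0, 8.0, 0.0]]

by_average = combine_average(scores)
by_maximization = combine_maximization(scores)
by_aom = combine_aom(scores, n_groups=2)
by_moa = combine_moa(scores, n_groups=2)
print(by_average, by_maximization, by_aom, by_moa)
```

AOM and MOA always fall between the plain average and the plain maximum, so they trade off the stability of averaging against the sensitivity of taking the maximum.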

"examples/comb_example.py" illustrates the API for combining the output of multiple base detectors (comb_example.py, Jupyter Notebooks). For the Jupyter Notebooks, navigate to "/notebooks/Model Combination.ipynb".

  1. Import models and generate sample data.

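    import numpy as np
    from sklearn.model_selection import train_test_split
    from pyod.models.knn import KNN  # kNN detector
    from pyod.models.combination import aom, moa, average, maximization
    from pyod.utils.data import generate_data
    from pyod.utils.utility import standardizer

    X, y = generate_data(train_only=True)  # load data
    # hold out part of the data for testing (40% is an illustrative choice)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
    # standardize the features before fitting the detectors
    X_train_norm, X_test_norm = standardizer(X_train, X_test)
  2. Initialize 20 kNN outlier detectors with different k (10 to 200), and get the outlier scores.

    # initialize 20 base detectors for combination
    k_list = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140,
              150, 160, 170, 180, 190, 200]
    n_clf = len(k_list)  # number of detectors being trained
    train_scores = np.zeros([X_train.shape[0], n_clf])
    test_scores = np.zeros([X_test.shape[0], n_clf])
    for i in range(n_clf):
        k = k_list[i]
        clf = KNN(n_neighbors=k, method='largest')
        clf.fit(X_train_norm)  # fit before reading decision_scores_
        train_scores[:, i] = clf.decision_scores_
        test_scores[:, i] = clf.decision_function(X_test_norm)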
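  3. Then standardize the output scores to zero mean and unit standard deviation before combination. This step is crucial for bringing the detector outputs onto the same scale.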

    from pyod.utils.utility import standardizer
    # scores have to be normalized before combination
    train_scores_norm, test_scores_norm = standardizer(train_scores, test_scores)
  4. Apply the four combination algorithms described above:

    comb_by_average = average(test_scores_norm)
    comb_by_maximization = maximization(test_scores_norm)
    comb_by_aom = aom(test_scores_norm, 5) # 5 groups
    comb_by_moa = moa(test_scores_norm, 5) # 5 groups
  5. Finally, evaluate all four combination methods by ROC and Precision @ Rank n:

    Combining 20 kNN detectors
    Combination by Average ROC:0.9194, precision @ rank n:0.4531
    Combination by Maximization ROC:0.9198, precision @ rank n:0.4688
    Combination by AOM ROC:0.9257, precision @ rank n:0.4844
    Combination by MOA ROC:0.9263, precision @ rank n:0.4688
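
The standardization in step 3 can be illustrated without any library. The sketch below assumes, as is usual for z-score scaling, that the mean and standard deviation of each detector's scores are estimated on the training scores only and then reused for the test scores. The function name standardize_columns and the toy numbers are hypothetical.

```python
from statistics import fmean, pstdev


def standardize_columns(train, test):
    """Z-score each column using the training mean and standard deviation."""
    n_cols = len(train[0])
    means = [fmean([row[j] for row in train]) for j in range(n_cols)]
    stds = [pstdev([row[j] for row in train]) for j in range(n_cols)]

    def scale(rows):
        return [[(row[j] - means[j]) / stds[j] for j in range(n_cols)]
                for row in rows]

    return scale(train), scale(test)


# scores from two detectors on very different scales (made-up data)
train_scores = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
test_scores = [[2.0, 400.0]]

train_norm, test_norm = standardize_columns(train_scores, test_scores)
print(train_norm)
print(test_norm)
```

After scaling, both detectors assign comparable values to the same training sample, so neither dominates an average or a maximum merely because of its units.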



  1. Import models

    from pyod.models.knn import KNN   # kNN detector
    from pyod.models.thresholds import FILTER  # Filter thresholder
  2. Generate sample data with pyod.utils.data.generate_data():

    from pyod.utils.data import generate_data

    contamination = 0.1  # proportion of outliers
    n_train = 200  # number of training points
    n_test = 100  # number of testing points
    X_train, X_test, y_train, y_test = generate_data(
        n_train=n_train, n_test=n_test, contamination=contamination)
  3. Initialize a pyod.models.knn.KNN detector, fit the model, and make predictions.

    # train kNN detector and apply FILTER thresholding
    clf_name = 'KNN'
    clf = KNN(contamination=FILTER())
    clf.fit(X_train)  # fit on the training data before reading any attributes
    # get the prediction labels and outlier scores of the training data
    y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
    y_train_scores = clf.decision_scores_  # raw outlier scores
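
For contrast with a data-driven thresholder such as FILTER, the sketch below shows the simpler rule that a fixed contamination rate implies: scores above the (1 - contamination) percentile of the training scores are labeled as outliers. This is a simplified illustration with hypothetical helper names and made-up scores. It does not reproduce how FILTER itself chooses the cut-off.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def labels_from_contamination(scores, contamination=0.1):
    """Label scores above the (1 - contamination) percentile as outliers."""
    threshold = percentile(scores, 100 * (1 - contamination))
    return threshold, [int(s > threshold) for s in scores]


# nine ordinary scores and one extreme score (made-up data)
scores = [0.2, 0.1, 0.3, 0.25, 0.15, 0.22, 0.18, 0.27, 0.12, 5.0]
threshold, labels = labels_from_contamination(scores, contamination=0.1)
print(threshold, labels)
```

A fixed rate always flags roughly the same share of samples, whereas a thresholder derives the cut-off from the score distribution itself, which helps when the true outlier proportion is unknown.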


