mnist_data: A subset of the MNIST dataset for classification

A function that loads a subset of the MNIST dataset into NumPy arrays.

> from mlxtend.data import mnist_data

Overview

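The MNIST dataset was constructed from two datasets of the US National Institute of Standards and Technology (NIST). The training set consists of handwritten digits from 250 different people, 50% of whom were high school students and 50% employees of the Census Bureau. Note that the test set contains handwritten digits from different people, following the same split.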

Features

Each feature vector (a row in the feature matrix) consists of 784 pixels (intensities), unrolled from the original 28x28-pixel images.
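The unrolling is row-major, which is consistent with Example 2 below reshaping a row with `reshape(28, 28)`: pixel (r, c) of the image ends up at column r * 28 + c of the feature vector. The following is a minimal NumPy sketch using a synthetic image rather than real MNIST data; the helpers `unroll` and `pixel_column` are hypothetical illustrations, not part of mlxtend.

```python
import numpy as np

IMG_SIDE = 28  # MNIST images are 28x28 pixels


def unroll(img):
    """Flatten a 28x28 image into a 784-element feature vector (row-major order)."""
    return img.reshape(-1)


def pixel_column(row, col, side=IMG_SIDE):
    """Hypothetical helper: column index of pixel (row, col) in the unrolled vector."""
    return row * side + col


# Synthetic 28x28 "image" with distinct values so positions are easy to track
img = np.arange(IMG_SIDE * IMG_SIDE, dtype=float).reshape(IMG_SIDE, IMG_SIDE)
vec = unroll(img)

print(vec.shape)                             # (784,)
print(vec[pixel_column(3, 5)] == img[3, 5])  # True
# Reshaping the feature vector recovers the original image exactly
print(np.array_equal(vec.reshape(IMG_SIDE, IMG_SIDE), img))  # True
```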

Example 1 - Dataset overview

from mlxtend.data import mnist_data
X, y = mnist_data()

print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('1st row', X[0])

Dimensions: 5000 x 784
1st row [   0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.   51.  159.  253.  159.   50.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   48.  238.
  252.  252.  252.  237.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.   54.  227.  253.  252.  239.  233.  252.   57.    6.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.   10.   60.  224.  252.  253.  252.  202.   84.  252.
  253.  122.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.  163.  252.  252.  252.  253.
  252.  252.   96.  189.  253.  167.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   51.  238.
  253.  253.  190.  114.  253.  228.   47.   79.  255.  168.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.   48.  238.  252.  252.  179.   12.   75.  121.   21.    0.    0.
  253.  243.   50.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.   38.  165.  253.  233.  208.   84.    0.    0.
    0.    0.    0.    0.  253.  252.  165.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    7.  178.  252.  240.   71.
   19.   28.    0.    0.    0.    0.    0.    0.  253.  252.  195.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   57.
  252.  252.   63.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  253.  252.  195.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.  198.  253.  190.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.  255.  253.  196.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.   76.  246.  252.  112.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.  253.  252.  148.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   85.  252.
  230.   25.    0.    0.    0.    0.    0.    0.    0.    0.    7.  135.
  253.  186.   12.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.   85.  252.  223.    0.    0.    0.    0.    0.    0.    0.
    0.    7.  131.  252.  225.   71.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.   85.  252.  145.    0.    0.    0.
    0.    0.    0.    0.   48.  165.  252.  173.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   86.  253.
  225.    0.    0.    0.    0.    0.    0.  114.  238.  253.  162.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.   85.  252.  249.  146.   48.   29.   85.  178.  225.  253.
  223.  167.   56.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.   85.  252.  252.  252.  229.  215.
  252.  252.  252.  196.  130.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   28.  199.
  252.  252.  253.  252.  252.  233.  145.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.   25.  128.  252.  253.  252.  141.   37.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.]
import numpy as np
print('Classes: digits 0-9')
print(np.unique(y))
print('Class distribution: %s' % np.bincount(y))

Classes: digits 0-9
[0 1 2 3 4 5 6 7 8 9]
Class distribution: [500 500 500 500 500 500 500 500 500 500]
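`np.bincount(y)` counts how often each non-negative integer label occurs, with position i holding the count of label i. `np.unique(y, return_counts=True)` yields the same counts paired with the labels that actually occur. A small sketch on a made-up label array (not MNIST data):

```python
import numpy as np

# Toy label vector standing in for y (values are made up for illustration)
y_toy = np.array([0, 2, 1, 2, 0, 2])

# Position i holds the number of samples with label i
counts = np.bincount(y_toy)
print(counts)  # [2 1 3]

# Same counts, returned together with the labels that occur
labels, label_counts = np.unique(y_toy, return_counts=True)
print(labels, label_counts)  # [0 1 2] [2 1 3]

# A perfectly balanced dataset has the same count for every class
print(np.all(counts == counts[0]))  # False for this toy example
```

One difference between the two: `bincount` reports a zero for any label below the maximum that never occurs, while `unique` lists only the labels that are present.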

Example 2 - Visualizing MNIST

%matplotlib inline
import matplotlib.pyplot as plt

def plot_digit(X, y, idx):
    # Reshape the 784-pixel row back into a 28x28 image
    img = X[idx].reshape(28, 28)
    plt.imshow(img, cmap='Greys', interpolation='nearest')
    plt.title('true label: %d' % y[idx])
    plt.show()

plot_digit(X, y, 4)
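Where no plotting backend is available (for example, in a plain terminal), a row can still be inspected by reshaping it to 28x28 and mapping each intensity to a character. This is a minimal sketch; `ascii_digit` and its `threshold` parameter are hypothetical helpers, not part of mlxtend, and the demo uses a synthetic vertical stroke instead of real MNIST data.

```python
import numpy as np


def ascii_digit(row, side=28, threshold=127):
    """Hypothetical helper: render an unrolled image as text,
    '#' where intensity > threshold and '.' elsewhere."""
    img = np.asarray(row).reshape(side, side)
    lines = [''.join('#' if px > threshold else '.' for px in img_row)
             for img_row in img]
    return '\n'.join(lines)


# Synthetic image: a vertical bar in rows 4-23, columns 13-14 (roughly a "1")
img = np.zeros((28, 28))
img[4:24, 13:15] = 255.0
row = img.reshape(-1)  # unroll to a 784-pixel row, like a row of X

text = ascii_digit(row)
print(text)
```

With the data loaded as in Example 1, the same call would be `print(ascii_digit(X[4]))`.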

API

mnist_data()

5000 samples from the MNIST handwritten digits dataset.

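Returns

X, y : [n_samples, n_features], [n_samples]

X is the feature matrix with 5000 image samples as rows; each row holds the 784 pixel intensities of one 28x28 image, unrolled into a vector. y contains the class labels 0-9.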

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/mnist_data/