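iris_data: A 3-class iris dataset for classification
A function that loads the iris dataset into NumPy arrays.
# Iris dataset
This section describes how to use the Iris dataset that ships with the `mlxtend` library.
Overview
The Iris dataset for classification.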
Features
- Sepal length
- Sepal width
- Petal length
- Petal width

- Number of samples: 150
- Target variable (discrete): {50x Setosa, 50x Versicolor, 50x Virginica}
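The column order and the label encoding can be captured in small lookup tables. The sketch below is illustrative only: `FEATURE_NAMES`, `CLASS_NAMES`, and `describe_sample` are hypothetical names and not part of `mlxtend`. The label order (0 = setosa, 1 = versicolor, 2 = virginica) is taken from the API section of this page, and the sample values are the first row printed in Example 1, assumed here to carry label 0.

```python
import numpy as np

# Hypothetical lookup tables; column order follows the feature list above.
FEATURE_NAMES = ['sepal length', 'sepal width', 'petal length', 'petal width']
CLASS_NAMES = ['setosa', 'versicolor', 'virginica']  # index = integer class label


def describe_sample(x, label):
    """Return a readable one-line description of a single sample."""
    parts = ['%s=%.1f cm' % (name, value)
             for name, value in zip(FEATURE_NAMES, x)]
    return '%s: %s' % (CLASS_NAMES[int(label)], ', '.join(parts))


# First row as printed in Example 1; label 0 is an assumption for this row.
first_row = np.array([5.1, 3.5, 1.4, 0.2])
description = describe_sample(first_row, 0)
print(description)
```

Keeping the names in one place avoids repeating the column order in every plot or report.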
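References
- Source: https://archive.ics.uci.edu/ml/datasets/Iris
- Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.
Example 1 - Dataset overview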
from mlxtend.data import iris_data
X, y = iris_data()
print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('\nHeader: %s' % ['sepal length', 'sepal width',
                        'petal length', 'petal width'])
print('1st row', X[0])
Dimensions: 150 x 4
Header: ['sepal length', 'sepal width', 'petal length', 'petal width']
1st row [5.1 3.5 1.4 0.2]
import numpy as np
print('Classes: Setosa, Versicolor, Virginica')
print(np.unique(y))
print('Class distribution: %s' % np.bincount(y))
Classes: Setosa, Versicolor, Virginica
[0 1 2]
Class distribution: [50 50 50]
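Because `y` holds one integer label per row of `X`, a boolean mask such as `y == label` selects all samples of one class. The following is a minimal sketch of per-class feature means built on that idea. `class_means` is a hypothetical helper, not an `mlxtend` function, and the toy arrays only mimic the layout of `(X, y)`; they are not Iris measurements.

```python
import numpy as np


def class_means(X, y):
    """Return an array of shape [n_classes, n_features] with per-class means."""
    labels = np.unique(y)
    # X[y == label] keeps only the rows that belong to the given class.
    return np.vstack([X[y == label].mean(axis=0) for label in labels])


# Toy stand-in with the same layout as (X, y); these are NOT real Iris values.
X_toy = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [10.0, 20.0],
                  [30.0, 40.0]])
y_toy = np.array([0, 0, 1, 1])

means = class_means(X_toy, y_toy)
print(means)  # row 0 holds the means of class 0, row 1 those of class 1
```

With the real dataset, `class_means(*iris_data())` would return a 3 x 4 array, one row per species.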
API
iris_data(version='uci')
Iris flower dataset.
- Source: https://archive.ics.uci.edu/ml/datasets/Iris
- Number of samples: 150
- Class labels: {0, 1, 2}, distribution: [50, 50, 50]. 0 = setosa, 1 = versicolor, 2 = virginica.
Dataset Attributes:
- 1) sepal length [cm]
- 2) sepal width [cm]
- 3) petal length [cm]
- 4) petal width [cm]
Parameters
- version: string, optional (default: 'uci'). Version to use {'uci', 'corrected'}. 'uci' loads the dataset as deposited on the UCI machine learning repository, and 'corrected' provides the version that is consistent with Fisher's original paper. See Note for details.
Returns
- X, y: [n_samples, n_features], [n_class_labels]. X is the feature matrix with 150 flower samples as rows and 4 feature columns: sepal length, sepal width, petal length, and petal width. y is a 1-dimensional array holding the class label {0, 1, 2} of each sample.
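The documented return layout can be verified before the arrays are passed on to a model. The sketch below is one possible check, not part of `mlxtend`: `check_xy` and its parameters are hypothetical, and the zero-filled arrays are placeholders with the right shape rather than real data.

```python
import numpy as np


def check_xy(X, y, n_features=4, allowed_labels=(0, 1, 2)):
    """Raise ValueError if (X, y) does not match the documented layout."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2 or X.shape[1] != n_features:
        raise ValueError('X must be 2-dimensional with %d columns' % n_features)
    if y.ndim != 1 or y.shape[0] != X.shape[0]:
        raise ValueError('y must be 1-dimensional with one label per row of X')
    if not set(np.unique(y).tolist()) <= set(allowed_labels):
        raise ValueError('y contains labels outside %s' % (allowed_labels,))
    return True


# Placeholder arrays with a valid layout; NOT real Iris values.
X_ok = np.zeros((6, 4))
y_ok = np.array([0, 0, 1, 1, 2, 2])
print(check_xy(X_ok, y_ok))

try:
    check_xy(X_ok, y_ok[:5])  # one label is missing
except ValueError as err:
    print('rejected:', err)
```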
Note
The Iris dataset available in UCI's machine learning repository
(originally collected by Edgar Anderson) differs from the Iris
dataset described in the original paper by R. A. Fisher [1].
Specifically, two data points (row numbers 34 and 37) in UCI's
machine learning repository differ from the originally published
Iris dataset. The original version of the Iris dataset, which can be
loaded via version='corrected', is the same as the one in R.
[1] R. A. Fisher (1936). "The use of multiple measurements in taxonomic
problems". Annals of Eugenics. 7 (2): 179–188.
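One way to see which rows differ between two versions of a feature matrix is to compare them element-wise and collect the indices of rows with any mismatch. The sketch below shows the idea on toy arrays. `differing_rows` is a hypothetical helper, not part of `mlxtend`, and the values are not Iris measurements.

```python
import numpy as np


def differing_rows(X_a, X_b):
    """Return the 0-based indices of rows where two equally shaped arrays differ."""
    X_a, X_b = np.asarray(X_a), np.asarray(X_b)
    if X_a.shape != X_b.shape:
        raise ValueError('arrays must have the same shape')
    # A row counts as different if at least one of its entries differs.
    return np.flatnonzero(np.any(X_a != X_b, axis=1))


# Toy stand-ins for the two versions; these are NOT real Iris values.
X_first = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
X_second = X_first.copy()
X_second[1, 0] = 3.5  # change a single entry in row 1

changed = differing_rows(X_first, X_second)
print(changed)
```

Applied to the feature matrices returned by `iris_data(version='uci')` and `iris_data(version='corrected')`, this should report the two rows mentioned in the note, assuming the row numbers given there are 0-based.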
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/iris_data/