autompg_data: The Auto-MPG dataset for regression
A function that loads the autompg dataset into NumPy arrays.
> from mlxtend.data import autompg_data
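Overview
The Auto-MPG dataset for regression analysis. The target (y) is defined as the miles per gallon (mpg) for 392 automobiles (6 rows containing "NaN" values have been removed). The 8 feature columns are: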
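Features
- cylinders: multi-valued discrete
- displacement: continuous
- horsepower: continuous
- weight: continuous
- acceleration: continuous
- model year: multi-valued discrete
- origin: multi-valued discrete
- car name: string (unique for each instance)

- Number of samples: 392
- Target variable (continuous): mpg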
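References
- Source: https://archive.ics.uci.edu/ml/datasets/Auto+MPG
- Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
Example - Dataset overview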
from mlxtend.data import autompg_data
X, y = autompg_data()
print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('\nHeader: %s' % ['cylinders', 'displacement',
'horsepower', 'weight', 'acceleration',
'model year', 'origin', 'car name'])
print('1st row', X[0])
Dimensions: 392 x 8
Header: ['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name']
1st row [ 8.00000000e+00 3.07000000e+02 1.30000000e+02 3.50400000e+03
1.20000000e+01 7.00000000e+01 1.00000000e+00 nan]
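Note that the feature array contains a str column ("car name"), so it is recommended to pick the features you need and convert them into a float array for further analysis. The example below shows how to drop the car name column and cast the NumPy array to a float array.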
X[:, :-1].astype(float)
array([[ 8. , 307. , 130. , ..., 12. , 70. , 1. ],
[ 8. , 350. , 165. , ..., 11.5, 70. , 1. ],
[ 8. , 318. , 150. , ..., 11. , 70. , 1. ],
...,
[ 4. , 135. , 84. , ..., 11.6, 82. , 1. ],
[ 4. , 120. , 79. , ..., 18.6, 82. , 1. ],
[ 4. , 119. , 82. , ..., 19.4, 82. , 1. ]])
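The advice above to pick features as needed before casting to float can also be done by column name rather than by position. The following is a minimal sketch, not part of mlxtend: select_features and FEATURE_NAMES are hypothetical names introduced here, the column order is assumed to match the header printed in the dataset overview example, and X_demo is a made-up stand-in array rather than real Auto-MPG rows.

```python
import numpy as np

# Column order as printed in the header of the dataset overview example.
FEATURE_NAMES = ['cylinders', 'displacement', 'horsepower', 'weight',
                 'acceleration', 'model year', 'origin', 'car name']


def select_features(X, names):
    """Return the named columns of X as a float array (hypothetical helper)."""
    if 'car name' in names:
        # Listed as a string column in the feature list, so it is excluded here.
        raise ValueError("'car name' is a string column; drop it before casting")
    idx = [FEATURE_NAMES.index(name) for name in names]
    return X[:, idx].astype(float)


# Tiny stand-in for the real feature matrix; the values are made up for
# illustration and are NOT rows of the Auto-MPG dataset.
X_demo = np.array([
    [8, 300.0, 120.0, 3500.0, 12.0, 70, 1, 'car a'],
    [4, 100.0, 80.0, 2200.0, 16.5, 82, 3, 'car b'],
], dtype=object)

X_sel = select_features(X_demo, ['weight', 'horsepower'])
print(X_sel.dtype, X_sel.shape)
```

With the real data, the same helper would be called on the X returned by autompg_data(), for example select_features(X, ['weight', 'horsepower']). Selecting by name keeps the code readable if you later change which columns you use.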
API
autompg_data()
Auto MPG dataset.
- Source: https://archive.ics.uci.edu/ml/datasets/Auto+MPG
- Number of samples: 392
- Continuous target variable: mpg

Dataset Attributes:
- 1) cylinders: multi-valued discrete
- 2) displacement: continuous
- 3) horsepower: continuous
- 4) weight: continuous
- 5) acceleration: continuous
- 6) model year: multi-valued discrete
- 7) origin: multi-valued discrete
- 8) car name: string (unique for each instance)
Returns
- X, y : [n_samples, n_features], [n_targets]
  X is the feature matrix with 392 auto samples as rows and 8 feature columns (6 rows with NaNs removed). y is a 1-dimensional array of the target MPG values.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/autompg_data/