autompg_data: The Auto-MPG dataset for regression

A function that loads the autompg dataset into NumPy arrays.

> from mlxtend.data import autompg_data

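Overview

The Auto-MPG dataset for regression analysis. The target (y) is the miles per gallon (mpg) of 392 automobiles (6 rows containing NaN values have been removed). The 8 feature columns are: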

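Features

cylinders: multi-valued discrete
displacement: continuous
horsepower: continuous
weight: continuous
acceleration: continuous
model year: multi-valued discrete
origin: multi-valued discrete
car name: string (unique for each instance)
Number of samples: 392
Target variable (continuous): mpg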
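
Because the features are addressed by position in the returned array, it can help to look columns up by name. The following is a minimal sketch, not part of mlxtend: `COLUMNS` and `select_columns` are hypothetical names, and `X_demo` is a one-row stand-in for the array returned by `autompg_data()`. Its numeric values are taken from the first row printed in the dataset overview example; the car name is a placeholder.

```python
import numpy as np

# Column order as given in the feature list above.
COLUMNS = ['cylinders', 'displacement', 'horsepower', 'weight',
           'acceleration', 'model year', 'origin', 'car name']


def select_columns(X, names):
    """Hypothetical helper: return the named numeric columns of X as floats."""
    idx = [COLUMNS.index(name) for name in names]
    return X[:, idx].astype(float)


# One-row stand-in for the feature array (assumption: object dtype with a
# string in the last column; the car name here is a placeholder).
X_demo = np.array(
    [[8, 307.0, 130.0, 3504.0, 12.0, 70, 1, 'placeholder name']],
    dtype=object,
)

subset = select_columns(X_demo, ['weight', 'horsepower'])
print(subset)
```

With the real dataset, the same helper would be called on the `X` returned by `autompg_data()`. Passing 'car name' raises a `ValueError`, since that column cannot be converted to float.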

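References

Source: https://archive.ics.uci.edu/ml/datasets/Auto+MPG
Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.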

Example - Dataset overview

from mlxtend.data import autompg_data
X, y = autompg_data()

print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('\nHeader: %s' % ['cylinders', 'displacement', 
                        'horsepower', 'weight', 'acceleration',
                        'model year', 'origin', 'car name'])
print('1st row', X[0])

Dimensions: 392 x 8

Header: ['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name']
1st row [  8.00000000e+00   3.07000000e+02   1.30000000e+02   3.50400000e+03
   1.20000000e+01   7.00000000e+01   1.00000000e+00              nan]

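Note that the feature array contains a str column ("car name"), so it is recommended to select the features you need and convert them to a float array before further analysis. The example below shows how to drop the car name column and cast the NumPy array to a float array.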

X[:, :-1].astype(float)

array([[   8. ,  307. ,  130. , ...,   12. ,   70. ,    1. ],
       [   8. ,  350. ,  165. , ...,   11.5,   70. ,    1. ],
       [   8. ,  318. ,  150. , ...,   11. ,   70. ,    1. ],
       ..., 
       [   4. ,  135. ,   84. , ...,   11.6,   82. ,    1. ],
       [   4. ,  120. ,   79. , ...,   18.6,   82. ,    1. ],
       [   4. ,  119. ,   82. , ...,   19.4,   82. ,    1. ]])
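
To illustrate why the last column is dropped before casting, here is a small sketch on a one-row stand-in array rather than the real dataset. It assumes an object-dtype array whose last entry is a string; the numeric values come from the first row shown earlier, and the car name is a placeholder.

```python
import numpy as np

# One-row stand-in for the feature array (assumption: object dtype,
# placeholder string in the "car name" position).
row = np.array(
    [[8, 307.0, 130.0, 3504.0, 12.0, 70, 1, 'placeholder name']],
    dtype=object,
)

# Casting the full array fails because the string cannot be parsed as a number.
try:
    row.astype(float)
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print('Conversion failed:', err)

# Dropping the last column first makes the cast succeed.
X_float = row[:, :-1].astype(float)
print(X_float.dtype, X_float.shape)
```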

API

autompg_data()

Auto MPG dataset.

Source : https://archive.ics.uci.edu/ml/datasets/Auto+MPG
Number of samples : 392
Continuous target variable : mpg

Dataset Attributes:
- 1) cylinders: multi-valued discrete
- 2) displacement: continuous
- 3) horsepower: continuous
- 4) weight: continuous
- 5) acceleration: continuous
- 6) model year: multi-valued discrete
- 7) origin: multi-valued discrete
- 8) car name: string (unique for each instance)

Returns

X, y : [n_samples, n_features], [n_targets]

X is the feature matrix with 392 auto samples as rows and 8 feature columns (6 rows with NaNs removed). y is a 1-dimensional array of the target MPG values.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/autompg_data/