boston_housing_data: The Boston housing dataset for regression analysis
A function that loads the boston_housing_data dataset into NumPy arrays.
from mlxtend.data import boston_housing_data
Overview
The Boston Housing dataset is used for regression analysis.
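Features
- CRIM: per capita crime rate by town
- ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
- INDUS: proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX: nitric oxides concentration (parts per 10 million)
- RM: average number of rooms per dwelling
- AGE: proportion of owner-occupied units built prior to 1940
- DIS: weighted distances to five Boston employment centres
- RAD: index of accessibility to radial highways
- TAX: full-value property-tax rate per $10,000
- PTRATIO: pupil-teacher ratio by town
- B: 1000(Bk - 0.63)^2, where Bk is the prop. of b. by town
- LSTAT: % lower status of the population
- Number of samples: 506
- Target variable (continuous): MEDV, median value of owner-occupied homes in $1000's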
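To make the column layout concrete, the following minimal sketch pairs each value of a feature row with its name. It assumes that the columns of X follow the order of the feature list above, which is consistent with the first row printed in Example 1 but is not stated explicitly. `FEATURE_NAMES` and `describe_row` are hypothetical names introduced for illustration; they are not part of mlxtend.

```python
# Feature names, assumed to be in the same order as the columns of X.
FEATURE_NAMES = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]


def describe_row(row, names=FEATURE_NAMES):
    """Pair each value in one feature row with its column name."""
    if len(row) != len(names):
        raise ValueError("expected %d values, got %d" % (len(names), len(row)))
    return dict(zip(names, row))


# First row of X, copied from the output of Example 1 in this document.
first_row = [6.32e-03, 18.0, 2.31, 0.0, 0.538, 6.575, 65.2,
             4.09, 1.0, 296.0, 15.3, 396.9, 4.98]

labeled = describe_row(first_row)
print(labeled["RM"])    # average number of rooms for the first sample
print(labeled["CHAS"])  # 0.0: this tract does not bound the river
```

With the real data, `describe_row(X[0].tolist())` should give the same mapping for the first sample.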
References
- Source: https://archive.ics.uci.edu/ml/datasets/Housing
- Harrison, D. and Rubinfeld, D.L. "Hedonic prices and the demand for clean air", Journal of Environmental Economics and Management, vol. 5, 81-102, 1978.
Example 1 - Dataset overview
from mlxtend.data import boston_housing_data
X, y = boston_housing_data()
print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('1st row', X[0])
(506, 14)
Dimensions: 506 x 13
1st row [ 6.32000000e-03 1.80000000e+01 2.31000000e+00 0.00000000e+00
5.38000000e-01 6.57500000e+00 6.52000000e+01 4.09000000e+00
1.00000000e+00 2.96000000e+02 1.53000000e+01 3.96900000e+02
4.98000000e+00]
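Since the dataset is intended for regression analysis, here is a minimal sketch of a one-feature ordinary least squares fit in plain Python. The function `fit_simple_linear_regression` is a hypothetical helper, not an mlxtend API, and the numbers used to exercise it are synthetic stand-ins rather than values from the dataset.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one feature: y ~ intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x and the x-y cross term.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic stand-in values (NOT taken from the dataset), chosen so that
# the relationship is exactly linear.
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
medv = [10.0, 15.0, 20.0, 25.0, 30.0]

intercept, slope = fit_simple_linear_regression(rooms, medv)
print("intercept: %.2f, slope: %.2f" % (intercept, slope))
```

To try this on the real data, one could pass `X[:, 5].tolist()` and `y.tolist()` after calling `boston_housing_data()`, assuming that column index 5 holds RM as the feature list suggests.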
API
boston_housing_data()
Boston Housing dataset.
- Source: https://archive.ics.uci.edu/ml/datasets/Housing
- Number of samples: 506
- Continuous target variable: MEDV
MEDV = Median value of owner-occupied homes in $1000's
Dataset Attributes:
- 1) CRIM per capita crime rate by town
- 2) ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- 3) INDUS proportion of non-retail business acres per town
- 4) CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- 5) NOX nitric oxides concentration (parts per 10 million)
- 6) RM average number of rooms per dwelling
- 7) AGE proportion of owner-occupied units built prior to 1940
- 8) DIS weighted distances to five Boston employment centres
- 9) RAD index of accessibility to radial highways
- 10) TAX full-value property-tax rate per $10,000
- 11) PTRATIO pupil-teacher ratio by town
- 12) B 1000(Bk - 0.63)^2 where Bk is the prop. of b. by town
- 13) LSTAT % lower status of the population
Returns
- X, y: [n_samples, n_features], [n_class_labels]
X is the feature matrix with 506 housing samples as rows and 13 feature columns. y is a 1-dimensional array of the continuous target variable MEDV.
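The shapes of the returned values can be illustrated with a short sketch that splits a table of 14 values per row into 13 feature columns and one target value. This is an assumption about how such a split might look, not mlxtend's actual implementation. `split_features_target` is a hypothetical helper, and the target value in the sample row is included only for illustration.

```python
N_FEATURES = 13  # number of feature columns described above


def split_features_target(table, n_features=N_FEATURES):
    """Split rows of (features..., target) into X (2-D) and y (1-D)."""
    for row in table:
        if len(row) != n_features + 1:
            raise ValueError("each row needs %d values" % (n_features + 1))
    X = [list(row[:n_features]) for row in table]
    y = [row[n_features] for row in table]
    return X, y


# One sample row: the 13 feature values printed in Example 1, followed by
# an illustrative MEDV target value in the last position.
table = [
    [6.32e-03, 18.0, 2.31, 0.0, 0.538, 6.575, 65.2,
     4.09, 1.0, 296.0, 15.3, 396.9, 4.98, 24.0],
]

X, y = split_features_target(table)
print("X: %d x %d, y: %d" % (len(X), len(X[0]), len(y)))
```

For the full dataset the same check would report 506 rows and 13 columns for X and 506 entries for y.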
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/boston_housing_data/