RandomHoldoutSplit: split a dataset into a training and a validation subset for validation

Randomly split a dataset into a training subset and a validation subset.

> `from mlxtend.evaluate import RandomHoldoutSplit`

Overview

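The RandomHoldoutSplit class is an alternative to scikit-learn's KFold class: it splits a dataset into a single training subset and a single validation subset, without rotating through folds. A RandomHoldoutSplit object can be passed as the cv argument of scikit-learn's GridSearchCV and similar utilities.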

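The term "random" in RandomHoldoutSplit refers to the fact that the split is determined by random_seed, rather than by manually specifying the training and validation set indices as in mlxtend's PredefinedHoldoutSplit class.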
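The role of the seed can be sketched in plain Python. The helper `seeded_holdout` below is hypothetical and written only for illustration; it is not mlxtend's implementation and will not reproduce the indices that RandomHoldoutSplit returns. It only shows that a fixed seed fully determines which indices land in each subset.

```python
import random


def seeded_holdout(n_samples, valid_size=0.5, random_seed=None):
    """Hypothetical helper: one shuffled train/validation index split."""
    if not 0.0 < valid_size < 1.0:
        raise ValueError("valid_size must be strictly between 0 and 1")
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(random_seed).shuffle(indices)
    n_valid = round(n_samples * valid_size)
    # The first n_valid shuffled indices form the validation subset.
    return indices[n_valid:], indices[:n_valid]


train_a, valid_a = seeded_holdout(150, valid_size=0.3, random_seed=123)
train_b, valid_b = seeded_holdout(150, valid_size=0.3, random_seed=123)

print(len(train_a), len(valid_a))  # 105 45
print(valid_a == valid_b)          # True: same seed, same split
```

With a predefined split, by contrast, the validation indices would be written out by hand instead of being derived from a seed.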

Example 1 -- Iterating Over a RandomHoldoutSplit

from mlxtend.evaluate import RandomHoldoutSplit
from mlxtend.data import iris_data

X, y = iris_data()
h_iter = RandomHoldoutSplit(valid_size=0.3, random_seed=123)

cnt = 0
for train_ind, valid_ind in h_iter.split(X, y):
    cnt += 1
    print(cnt)

1
print(train_ind[:5])
print(valid_ind[:5])

[ 60  16  88 130   6]
[ 72 125  80  86 117]

Example 2 -- RandomHoldoutSplit in GridSearch

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.evaluate import RandomHoldoutSplit
from mlxtend.data import iris_data

X, y = iris_data()

params = {'n_neighbors': [1, 2, 3, 4, 5]}

grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid=params,
                    cv=RandomHoldoutSplit(valid_size=0.3, random_seed=123))

grid.fit(X, y)

GridSearchCV(cv=<mlxtend.evaluate.holdout.RandomHoldoutSplit object at 0x7fae707f6610>,
             estimator=KNeighborsClassifier(),
             param_grid={'n_neighbors': [1, 2, 3, 4, 5]})

API

RandomHoldoutSplit(valid_size=0.5, random_seed=None, stratify=False)

Train/Validation set splitter for sklearn's GridSearchCV etc.

Provides train/validation set indices to split a dataset
into train/validation sets using random indices.

Parameters

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/RandomHoldoutSplit/

Methods


get_n_splits(X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator

Parameters

Returns


split(X, y, groups=None)

Generate indices to split data into training and validation sets.

Parameters

Yields