Note
Go to the end to download the full example code. or to run this example in your browser via Binder
随机搜索与网格搜索在超参数估计中的比较#
比较随机搜索和网格搜索在优化线性SVM超参数时的表现,使用SGD进行训练。 所有影响学习的参数同时进行搜索(除了估计器的数量,因为这涉及时间/质量的权衡)。
随机搜索和网格搜索探索完全相同的参数空间。参数设置的结果非常相似,但随机搜索的运行时间大大缩短。
随机搜索的性能可能略差,这可能是由于噪声效应,并且不会影响到保留的测试集。
请注意,在实际操作中,使用网格搜索时不会同时搜索这么多不同的参数,而是只选择那些被认为最重要的参数。
RandomizedSearchCV took 2.71 seconds for 15 candidates parameter settings.
Model with rank: 1
Mean validation score: 0.983 (std: 0.007)
Parameters: {'alpha': np.float64(0.014515898657996523), 'average': False, 'l1_ratio': np.float64(0.12831708887652515)}
Model with rank: 2
Mean validation score: 0.981 (std: 0.018)
Parameters: {'alpha': np.float64(0.0825944504273073), 'average': False, 'l1_ratio': np.float64(0.2869934312245894)}
Model with rank: 3
Mean validation score: 0.981 (std: 0.015)
Parameters: {'alpha': np.float64(0.10794028792348503), 'average': False, 'l1_ratio': np.float64(0.29170373035507635)}
GridSearchCV took 8.70 seconds for 60 candidate parameter settings.
Model with rank: 1
Mean validation score: 0.996 (std: 0.007)
Parameters: {'alpha': np.float64(0.1), 'average': False, 'l1_ratio': np.float64(0.1111111111111111)}
Model with rank: 2
Mean validation score: 0.993 (std: 0.007)
Parameters: {'alpha': np.float64(0.01), 'average': False, 'l1_ratio': np.float64(0.3333333333333333)}
Model with rank: 3
Mean validation score: 0.993 (std: 0.009)
Parameters: {'alpha': np.float64(0.01), 'average': False, 'l1_ratio': np.float64(0.7777777777777777)}
from time import time
import numpy as np
import scipy.stats as stats
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
# 获取一些数据
#
#
X, y = load_digits(return_X_y=True, n_class=3)
# 构建一个分类器
clf = SGDClassifier(loss="hinge", penalty="elasticnet", fit_intercept=True)
# 用于报告最佳分数的实用函数
def report(results, n_top=3):
for i in range(1, n_top + 1):
candidates = np.flatnonzero(results["rank_test_score"] == i)
for candidate in candidates:
print("Model with rank: {0}".format(i))
print(
"Mean validation score: {0:.3f} (std: {1:.3f})".format(
results["mean_test_score"][candidate],
results["std_test_score"][candidate],
)
)
print("Parameters: {0}".format(results["params"][candidate]))
print("")
# 指定参数和分布以进行采样
param_dist = {
"average": [True, False],
"l1_ratio": stats.uniform(0, 1),
"alpha": stats.loguniform(1e-2, 1e0),
}
# 运行随机搜索
n_iter_search = 15
random_search = RandomizedSearchCV(
clf, param_distributions=param_dist, n_iter=n_iter_search
)
start = time()
random_search.fit(X, y)
print(
"RandomizedSearchCV took %.2f seconds for %d candidates parameter settings."
% ((time() - start), n_iter_search)
)
report(random_search.cv_results_)
# 使用所有参数的完整网格
param_grid = {
"average": [True, False],
"l1_ratio": np.linspace(0, 1, num=10),
"alpha": np.power(10, np.arange(-2, 1, dtype=float)),
}
# 运行网格搜索
grid_search = GridSearchCV(clf, param_grid=param_grid)
start = time()
grid_search.fit(X, y)
print(
"GridSearchCV took %.2f seconds for %d candidate parameter settings."
% (time() - start, len(grid_search.cv_results_["params"]))
)
report(grid_search.cv_results_)
Total running time of the script: (0 minutes 11.416 seconds)
Related examples
网格搜索与交叉验证的自定义重拟合策略
sphx_glr_auto_examples_model_selection_plot_successive_halving_iterations.py
连续减半迭代
网格搜索与逐步减半的比较
高斯混合模型选择