ValidationCurveDisplay

class sklearn.model_selection.ValidationCurveDisplay(*, param_name, param_range, train_scores, test_scores, score_name=None)

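Validation curve visualization.

It is recommended to use from_estimator to create a ValidationCurveDisplay instance. All parameters are stored as attributes.

Read more in the User Guide for general information about the visualization API and for detailed documentation on validation curve visualization.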

Added in version 1.3.

Parameters:

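param_name : str

Name of the parameter that has been varied.

param_range : array-like of shape (n_ticks,)

The values of the parameter that have been evaluated.

train_scores : ndarray of shape (n_ticks, n_cv_folds)

Scores on the training sets.

test_scores : ndarray of shape (n_ticks, n_cv_folds)

Scores on the test sets.

score_name : str, default=None

The name of the score used in validation_curve. It overrides the name inferred from the scoring parameter. If scoring is None, we use "Score" if negate_score is False and "Negative score" otherwise. If scoring is a string or a callable, we infer the name: we replace _ with spaces and capitalize the first letter, and we replace neg_ with "Negative" if negate_score is False or just remove it otherwise.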
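The naming rule for score_name can be paraphrased as a small function. This is a minimal sketch of the rule as worded in this section, not scikit-learn's own implementation. The helper name infer_score_name is hypothetical, and leaving a name without a neg_ prefix unchanged is an assumption, because the description does not cover that case.

```python
def infer_score_name(scoring=None, negate_score=False, score_name=None):
    """Hypothetical helper that paraphrases the documented naming rule."""
    if score_name is not None:
        # An explicit score_name overrides anything inferred from scoring.
        return score_name
    if scoring is None:
        return "Negative score" if negate_score else "Score"
    # Strings are used directly; plain functions contribute their __name__.
    name = scoring.__name__ if callable(scoring) else scoring
    if name.startswith("neg_"):
        stripped = name[len("neg_"):]
        # Negated scores drop the prefix; otherwise it becomes "Negative".
        name = stripped if negate_score else f"negative_{stripped}"
    # Assumption: names without a neg_ prefix are left unchanged.
    return name.replace("_", " ").capitalize()


print(infer_score_name())
print(infer_score_name("neg_mean_squared_error"))
print(infer_score_name("neg_mean_squared_error", negate_score=True))
print(infer_score_name("accuracy"))
```

Passing score_name explicitly bypasses the inference entirely, which is the simplest way to get a fixed axis label.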

Attributes:

ax_ : matplotlib Axes

Axes with the validation curve.

figure_ : matplotlib Figure

Figure containing the validation curve.

errorbar_ : list of matplotlib Artist or None

When std_display_style is "errorbar", this is a list of matplotlib.container.ErrorbarContainer objects. If another style is used, errorbar_ is None.

lines_ : list of matplotlib Artist or None

When std_display_style is "fill_between", this is a list of matplotlib.lines.Line2D objects corresponding to the mean train and test scores. If another style is used, lines_ is None.

fill_between_ : list of matplotlib Artist or None

When std_display_style is "fill_between", this is a list of matplotlib.collections.PolyCollection objects. If another style is used, fill_between_ is None.

See also

sklearn.model_selection.validation_curve: Compute the validation curve.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import ValidationCurveDisplay, validation_curve
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = make_classification(n_samples=1_000, random_state=0)
>>> logistic_regression = LogisticRegression()
>>> param_name, param_range = "C", np.logspace(-8, 3, 10)
>>> train_scores, test_scores = validation_curve(
...     logistic_regression, X, y, param_name=param_name, param_range=param_range
... )
>>> display = ValidationCurveDisplay(
...     param_name=param_name, param_range=param_range,
...     train_scores=train_scores, test_scores=test_scores, score_name="Score"
... )
>>> display.plot()
<...>
>>> plt.show()

../../_images/sklearn-model_selection-ValidationCurveDisplay-1.png
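The score arrays passed to the display have one row per parameter value and one column per cross-validation fold. As a rough sketch of the kind of aggregation a mean curve with a standard deviation band relies on, the per-fold scores can be reduced along the fold axis by hand. The dataset size, the C range, and the variable names test_mean, test_std, and best_C are choices made for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve

X, y = make_classification(n_samples=200, random_state=0)
param_range = np.logspace(-3, 2, 5)

train_scores, test_scores = validation_curve(
    LogisticRegression(), X, y, param_name="C", param_range=param_range
)

# One row per parameter value, one column per cross-validation fold
# (5 folds with the default cv=None).
print(train_scores.shape, test_scores.shape)

# Averaging over the fold axis gives one point per parameter value.
test_mean = test_scores.mean(axis=1)
test_std = test_scores.std(axis=1)

# Parameter value with the highest mean test score.
best_C = param_range[np.argmax(test_mean)]
print(best_C)
```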

classmethod from_estimator(estimator, X, y, *, param_name, param_range, groups=None, cv=None, scoring=None, n_jobs=None, pre_dispatch='all', verbose=0, error_score=nan, fit_params=None, ax=None, negate_score=False, score_name=None, score_type='both', std_display_style='fill_between', line_kw=None, fill_between_kw=None, errorbar_kw=None)

Create a validation curve display from an estimator.

Read more in the User Guide for general information about the visualization API and for detailed documentation on validation curve visualization.

Parameters:

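estimator : object type that implements the "fit" and "predict" methods

An object of that type, which is cloned for each validation.

X : array-like of shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,) or (n_samples, n_outputs) or None

Target relative to X for classification or regression; None for unsupervised learning.

param_name : str

Name of the parameter that will be varied.

param_range : array-like of shape (n_values,)

The values of the parameter that will be evaluated.

groups : array-like of shape (n_samples,), default=None

Group labels for the samples, used while splitting the dataset into train/test sets. Only used in conjunction with a "Group" cv instance (e.g., GroupKFold).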

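cv : int, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

None, to use the default 5-fold cross-validation,
int, to specify the number of folds in a (Stratified)KFold,
a CV splitter,
an iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False, so the splits are the same across calls.

Refer to the User Guide for the various cross-validation strategies that can be used here.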

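scoring : str or callable, default=None

A string (see the documentation of the scoring parameter) or a scorer callable object/function with signature scorer(estimator, X, y) (see Defining your scoring strategy from metric functions).

n_jobs : int, default=None

Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the different training and test sets. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See the Glossary for more details.

pre_dispatch : int or str, default='all'

Number of predispatched jobs for parallel execution (default is all). This option can reduce the allocated memory. The str can be an expression like '2*n_jobs'.

verbose : int, default=0

Controls the verbosity: the higher, the more messages.

error_score : 'raise' or numeric, default=np.nan

Value to assign to the score if an error occurs in estimator fitting. If set to 'raise', the error is raised. If a numeric value is given, FitFailedWarning is raised.

fit_params : dict, default=None

Parameters to pass to the fit method of the estimator.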

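ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes are created.

negate_score : bool, default=False

Whether or not to negate the scores obtained through validation_curve. This is particularly useful when using an error denoted by neg_* in scikit-learn.

score_name : str, default=None

The name of the score used to decorate the y-axis of the plot. It overrides the name inferred from the scoring parameter. If scoring is None, we use "Score" if negate_score is False and "Negative score" otherwise. If scoring is a string or a callable, we infer the name: we replace _ with spaces and capitalize the first letter, and we replace neg_ with "Negative" if negate_score is False or just remove it otherwise.

score_type : {"test", "train", "both"}, default="both"

The type of score to plot. Can be one of "test", "train", or "both".

std_display_style : {"errorbar", "fill_between"} or None, default="fill_between"

The style used to display the standard deviation of the scores around the mean score. If None, no representation of the standard deviation is displayed.

line_kw : dict, default=None

Additional keyword arguments passed to the plt.plot call used to draw the mean score.

fill_between_kw : dict, default=None

Additional keyword arguments passed to the plt.fill_between call used to draw the score standard deviation.

errorbar_kw : dict, default=None

Additional keyword arguments passed to the plt.errorbar call used to draw the mean score and the score standard deviation.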

Returns:

display : ValidationCurveDisplay

Object that stores computed values.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import ValidationCurveDisplay
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = make_classification(n_samples=1_000, random_state=0)
>>> logistic_regression = LogisticRegression()
>>> param_name, param_range = "C", np.logspace(-8, 3, 10)
>>> ValidationCurveDisplay.from_estimator(
...     logistic_regression, X, y, param_name=param_name,
...     param_range=param_range,
... )
<...>
>>> plt.show()

../../_images/sklearn-model_selection-ValidationCurveDisplay-2.png

plot(ax=None, *, negate_score=False, score_name=None, score_type='both', std_display_style='fill_between', line_kw=None, fill_between_kw=None, errorbar_kw=None)

Plot visualization.

Parameters:

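ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes are created.

negate_score : bool, default=False

Whether or not to negate the scores obtained through validation_curve. This is particularly useful when using an error denoted by neg_* in scikit-learn.

score_name : str, default=None

The name of the score used to decorate the y-axis of the plot. It overrides the name inferred from the scoring parameter. If scoring is None, we use "Score" if negate_score is False and "Negative score" otherwise. If scoring is a string or a callable, we infer the name: we replace _ with spaces and capitalize the first letter, and we replace neg_ with "Negative" if negate_score is False or just remove it otherwise.

score_type : {"test", "train", "both"}, default="both"

The type of score to plot. Can be one of "test", "train", or "both".

std_display_style : {"errorbar", "fill_between"} or None, default="fill_between"

The style used to display the standard deviation of the scores around the mean score. If None, no standard deviation is displayed.

line_kw : dict, default=None

Additional keyword arguments passed to the plt.plot call used to draw the mean score.

fill_between_kw : dict, default=None

Additional keyword arguments passed to the plt.fill_between call used to draw the score standard deviation.

errorbar_kw : dict, default=None

Additional keyword arguments passed to the plt.errorbar call used to draw the mean score and the score standard deviation.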

Returns:

display : ValidationCurveDisplay

Object that stores computed values.

Gallery examples

Release Highlights for scikit-learn 1.3

Plotting Validation Curves