.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/release_highlights/plot_release_highlights_1_3_0.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. or to run this example in your browser via Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_3_0.py: ======================================= scikit-learn 1.3 版本发布亮点 ======================================= .. currentmodule:: sklearn 我们很高兴地宣布发布 scikit-learn 1.3!此次更新包含了许多错误修复和改进,以及一些新的关键功能。以下是本次发布的一些主要功能。 **有关所有更改的详尽列表** ,请参阅 :ref:`发布说明 ` 。 要安装最新版本(使用 pip):: pip install --upgrade scikit-learn 或使用 conda:: conda install -c conda-forge scikit-learn .. GENERATED FROM PYTHON SOURCE LINES 22-27 元数据路由 ---------------- 我们正在引入一种新的方式来路由诸如 ``sample_weight`` 之类的元数据,这将影响到诸如:class:`pipeline.Pipeline` 和:class:`model_selection.GridSearchCV` 等元估计器的元数据路由方式。虽然此功能的基础设施已经包含在本次发布中,但工作仍在进行中,并非所有元估计器都支持此新功能。您可以在:ref:`元数据路由用户指南` 中阅读更多关于此功能的信息。请注意,此功能仍在开发中,并未在大多数元估计器中实现。 第三方开发者已经可以开始将其整合到他们的元估计器中。更多详情,请参见::ref:`元数据路由开发者指南` 。 .. GENERATED FROM PYTHON SOURCE LINES 29-34 HDBSCAN:基于密度的层次聚类 ---------------------------------------------- 最初托管在 scikit-learn-contrib 仓库中,:class:`cluster.HDBSCAN` 已被纳入 scikit-learn。它缺少原始实现中的一些功能,这些功能将在未来的版本中添加。 通过同时对多个 epsilon 值执行修改版的 :class:`cluster.DBSCAN` ,:class:`cluster.HDBSCAN` 可以找到不同密度的聚类,使其比 :class:`cluster.DBSCAN` 更加稳健,参数选择更加灵活。 更多详情请参见 :ref:`用户指南 ` 。 .. GENERATED FROM PYTHON SOURCE LINES 34-49 .. code-block:: Python import numpy as np from sklearn.cluster import HDBSCAN from sklearn.datasets import load_digits from sklearn.metrics import v_measure_score X, true_labels = load_digits(return_X_y=True) print(f"number of digits: {len(np.unique(true_labels))}") hdbscan = HDBSCAN(min_cluster_size=15).fit(X) non_noisy_labels = hdbscan.labels_[hdbscan.labels_ != -1] print(f"number of clusters found: {len(np.unique(non_noisy_labels))}") print(v_measure_score(true_labels[hdbscan.labels_ != -1], non_noisy_labels)) .. rst-class:: sphx-glr-script-out .. code-block:: none number of digits: 10 number of clusters found: 10 0.9751818034688537 .. GENERATED FROM PYTHON SOURCE LINES 50-54 目标编码器:一种新的类别编码策略 ----------------------------------------------- 非常适合具有高基数的类别特征,:class:`preprocessing.TargetEncoder` 基于属于该类别的观测值的平均目标值的缩减估计对类别进行编码。 更多详情请参见 :ref:`用户指南 ` 。 .. GENERATED FROM PYTHON SOURCE LINES 54-66 .. code-block:: Python import numpy as np from sklearn.preprocessing import TargetEncoder X = np.array([["cat"] * 30 + ["dog"] * 20 + ["snake"] * 38], dtype=object).T y = [90.3] * 30 + [20.4] * 20 + [21.2] * 38 enc = TargetEncoder(random_state=0) X_trans = enc.fit_transform(X, y) enc.encodings_ .. rst-class:: sphx-glr-script-out .. code-block:: none [array([90.3, 20.4, 21.2])] .. GENERATED FROM PYTHON SOURCE LINES 67-71 决策树中的缺失值支持 ---------------------------------------- 类 :class:`tree.DecisionTreeClassifier` 和 :class:`tree.DecisionTreeRegressor` 现在支持缺失值。对于非缺失数据的每个潜在阈值,分割器将评估将所有缺失值分配到左节点或右节点的分割情况。 请参阅 :ref:`用户指南 ` 了解更多详细信息,或参阅 :ref:`sphx_glr_auto_examples_ensemble_plot_hgbt_regression.py` 了解 :class:`~ensemble.HistGradientBoostingRegressor` 中此功能的用例示例。 .. GENERATED FROM PYTHON SOURCE LINES 71-81 .. code-block:: Python import numpy as np from sklearn.tree import DecisionTreeClassifier X = np.array([0, 1, 6, np.nan]).reshape(-1, 1) y = [0, 0, 1, 1] tree = DecisionTreeClassifier(random_state=0).fit(X, y) tree.predict(X) .. rst-class:: sphx-glr-script-out .. code-block:: none array([0, 0, 1, 1]) .. GENERATED FROM PYTHON SOURCE LINES 82-85 新显示 :class:`~model_selection.ValidationCurveDisplay` ------------------------------------------------------------ :class:`model_selection.ValidationCurveDisplay` 现在可以用于绘制 :func:`model_selection.validation_curve` 的结果。 .. GENERATED FROM PYTHON SOURCE LINES 85-102 .. code-block:: Python from sklearn.datasets import make_classification from sklearn.linear_model import LogisticRegression from sklearn.model_selection import ValidationCurveDisplay X, y = make_classification(1000, 10, random_state=0) _ = ValidationCurveDisplay.from_estimator( LogisticRegression(), X, y, param_name="C", param_range=np.geomspace(1e-5, 1e3, num=9), score_type="both", score_name="Accuracy", ) .. image-sg:: /auto_examples/release_highlights/images/sphx_glr_plot_release_highlights_1_3_0_001.png :alt: plot release highlights 1 3 0 :srcset: /auto_examples/release_highlights/images/sphx_glr_plot_release_highlights_1_3_0_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 103-106 梯度提升的伽马损失 ------------------ 类 :class:`ensemble.HistGradientBoostingRegressor` 支持通过 `loss="gamma"` 使用伽马偏差损失函数。该损失函数对于建模具有右偏分布的严格正目标非常有用。 .. GENERATED FROM PYTHON SOURCE LINES 106-119 .. code-block:: Python import numpy as np from sklearn.model_selection import cross_val_score from sklearn.datasets import make_low_rank_matrix from sklearn.ensemble import HistGradientBoostingRegressor n_samples, n_features = 500, 10 rng = np.random.RandomState(0) X = make_low_rank_matrix(n_samples, n_features, random_state=rng) coef = rng.uniform(low=-10, high=20, size=n_features) y = rng.gamma(shape=2, scale=np.exp(X @ coef) / 2) gbdt = HistGradientBoostingRegressor(loss="gamma") cross_val_score(gbdt, X, y).mean() .. rst-class:: sphx-glr-script-out .. code-block:: none np.float64(0.4685851328722167) .. GENERATED FROM PYTHON SOURCE LINES 120-124 将不常见类别分组到 :class:`~preprocessing.OrdinalEncoder` -------------------------------------------------------------- 与 :class:`preprocessing.OneHotEncoder` 类似,:class:`preprocessing.OrdinalEncoder` 类现在支持将不常见类别聚合到每个特征的单一输出中。启用不常见类别聚合的参数是 `min_frequency` 和 `max_categories` 。 有关更多详细信息,请参阅 :ref:`用户指南 ` 。 .. GENERATED FROM PYTHON SOURCE LINES 124-133 .. code-block:: Python from sklearn.preprocessing import OrdinalEncoder import numpy as np X = np.array( [["dog"] * 5 + ["cat"] * 20 + ["rabbit"] * 10 + ["snake"] * 3], dtype=object ).T enc = OrdinalEncoder(min_frequency=6).fit(X) enc.infrequent_categories_ .. rst-class:: sphx-glr-script-out .. code-block:: none [array(['dog', 'snake'], dtype=object)] .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 2.112 seconds) .. _sphx_glr_download_auto_examples_release_highlights_plot_release_highlights_1_3_0.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/release_highlights/plot_release_highlights_1_3_0.ipynb :alt: Launch binder :width: 150 px .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_release_highlights_1_3_0.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_release_highlights_1_3_0.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_release_highlights_1_3_0.zip ` .. include:: plot_release_highlights_1_3_0.recommendations .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_