.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/multiclass/plot_multiclass_overview.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_multiclass_plot_multiclass_overview.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_multiclass_plot_multiclass_overview.py:


===============================================
Overview of multiclass training meta-estimators
===============================================

In this example, we discuss the problem of classification when the target
variable is composed of more than two classes. This is called multiclass
classification.

In scikit-learn, all estimators support multiclass classification out of the
box: the most sensible strategy was implemented for the end user. The
:mod:`sklearn.multiclass` module implements various strategies that one can use
for experimenting or for developing third-party estimators that only support
binary classification.

:mod:`sklearn.multiclass` includes the one-vs-one (OvO) and one-vs-rest (OvR)
strategies, which train a multiclass classifier by fitting a set of binary
classifiers (the :class:`~sklearn.multiclass.OneVsOneClassifier` and
:class:`~sklearn.multiclass.OneVsRestClassifier` meta-estimators). This example
reviews them.

.. GENERATED FROM PYTHON SOURCE LINES 14-18

The Yeast UCI dataset
---------------------

In this example, we use a UCI dataset [1]_, generally referred to as the Yeast
dataset. We use the :func:`sklearn.datasets.fetch_openml` function to load the
dataset from OpenML.

.. GENERATED FROM PYTHON SOURCE LINES 18-22

.. code-block:: Python

    from sklearn.datasets import fetch_openml

    X, y = fetch_openml(data_id=181, as_frame=True, return_X_y=True)

.. GENERATED FROM PYTHON SOURCE LINES 23-24

To identify the type of data science problem we are dealing with, we can
inspect the target for which we want to build a predictive model.

.. GENERATED FROM PYTHON SOURCE LINES 24-27

.. code-block:: Python

    y.value_counts().sort_index()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    class_protein_localization
    CYT    463
    ERL      5
    EXC     35
    ME1     44
    ME2     51
    ME3    163
    MIT    244
    NUC    429
    POX     20
    VAC     30
    Name: count, dtype: int64
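
Besides reading the class counts, the target type can also be checked
programmatically. The sketch below is not part of the original example: it uses
:func:`sklearn.utils.multiclass.type_of_target` on a small hand-picked list of
Yeast class labels (an illustrative subset rather than the full ``y``, so the
snippet runs offline) to show how scikit-learn distinguishes a binary target
from a multiclass one.

.. code-block:: Python

    from sklearn.utils.multiclass import type_of_target

    # Hand-picked subset of Yeast class labels (illustrative, not the full target).
    labels = ["CYT", "NUC", "MIT", "ME3", "CYT", "NUC"]
    print(type_of_target(labels))  # more than two distinct values -> multiclass

    # Keeping only two distinct labels turns it into a binary target.
    print(type_of_target(["CYT", "NUC", "CYT"]))  # -> binary

The helper only looks at the number and kind of distinct values, which matches
the reasoning applied to the class counts above.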
.. GENERATED FROM PYTHON SOURCE LINES 28-41

We see that the target is discrete and composed of 10 classes. We therefore
deal with a multiclass classification problem.

Strategies comparison
---------------------

In the following experiment, we use a
:class:`~sklearn.tree.DecisionTreeClassifier` and a
:class:`~sklearn.model_selection.RepeatedStratifiedKFold` cross-validation
with 3 splits and 5 repetitions.

We compare the following strategies:

* :class:`~sklearn.tree.DecisionTreeClassifier` can handle multiclass
  classification without needing any special adjustments. It works by
  recursively splitting the training data into smaller subsets and predicting
  the most common class in each final subset (leaf). Because a leaf can hold
  any of the classes, a single tree directly separates the input data into
  multiple classes.
* :class:`~sklearn.multiclass.OneVsOneClassifier` trains a set of binary
  classifiers where each classifier is trained to distinguish between two
  classes.
* :class:`~sklearn.multiclass.OneVsRestClassifier` trains a set of binary
  classifiers where each classifier is trained to distinguish between one
  class and the rest of the classes.
* :class:`~sklearn.multiclass.OutputCodeClassifier` trains a set of binary
  classifiers where each classifier is trained to distinguish a set of classes
  from the rest of the classes. The sets of classes are defined by a codebook,
  which is randomly generated in scikit-learn. This method exposes a parameter
  `code_size` to control the size of the codebook: one binary classifier is
  fitted per codebook column, that is ``int(n_classes * code_size)``
  classifiers. We set it above one since we are not interested in compressing
  the class representation.

.. GENERATED FROM PYTHON SOURCE LINES 41-63

.. code-block:: Python

    import pandas as pd

    from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
    from sklearn.multiclass import (
        OneVsOneClassifier,
        OneVsRestClassifier,
        OutputCodeClassifier,
    )
    from sklearn.tree import DecisionTreeClassifier

    cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=5, random_state=0)

    tree = DecisionTreeClassifier(random_state=0)
    ovo_tree = OneVsOneClassifier(tree)
    ovr_tree = OneVsRestClassifier(tree)
    ecoc = OutputCodeClassifier(tree, code_size=2)

    cv_results_tree = cross_validate(tree, X, y, cv=cv, n_jobs=2)
    cv_results_ovo = cross_validate(ovo_tree, X, y, cv=cv, n_jobs=2)
    cv_results_ovr = cross_validate(ovr_tree, X, y, cv=cv, n_jobs=2)
    cv_results_ecoc = cross_validate(ecoc, X, y, cv=cv, n_jobs=2)
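
The strategies listed above differ mainly in how many binary problems they
build. The following sketch is an addition to the original example: it fits
each meta-estimator on a synthetic 5-class dataset from
:func:`~sklearn.datasets.make_classification` (an assumption chosen so that the
code runs quickly and offline, not the Yeast data) and counts the fitted binary
estimators stored in the ``estimators_`` attribute. For ``K`` classes,
one-vs-one fits ``K * (K - 1) / 2`` classifiers, one-vs-rest fits ``K``, and the
output-code strategy fits ``int(K * code_size)``.

.. code-block:: Python

    from sklearn.datasets import make_classification
    from sklearn.multiclass import (
        OneVsOneClassifier,
        OneVsRestClassifier,
        OutputCodeClassifier,
    )
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic 5-class data stands in for the Yeast dataset (assumption).
    X_toy, y_toy = make_classification(
        n_samples=300, n_features=10, n_informative=5, n_classes=5, random_state=0
    )
    n_classes = len(set(y_toy))  # K = 5

    base_tree = DecisionTreeClassifier(random_state=0)
    ovo = OneVsOneClassifier(base_tree).fit(X_toy, y_toy)
    ovr = OneVsRestClassifier(base_tree).fit(X_toy, y_toy)
    ecoc = OutputCodeClassifier(base_tree, code_size=2, random_state=0).fit(X_toy, y_toy)

    # One binary tree per pair of classes: K * (K - 1) / 2 = 10
    print("one-vs-one:", len(ovo.estimators_))
    # One binary tree per class: K = 5
    print("one-vs-rest:", len(ovr.estimators_))
    # One binary tree per codebook column: int(K * code_size) = 10
    print("output code:", len(ecoc.estimators_))

Applied to the 10 Yeast classes, the same formulas give 45 trees for
one-vs-one, 10 for one-vs-rest and 20 for the output-code strategy with
``code_size=2``.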
.. GENERATED FROM PYTHON SOURCE LINES 64-66

We can now compare the statistical performance of the different strategies.
We plot the distribution of the cross-validated accuracy scores obtained by
each strategy.

.. GENERATED FROM PYTHON SOURCE LINES 66-84

.. code-block:: Python

    from matplotlib import pyplot as plt

    scores = pd.DataFrame(
        {
            "DecisionTreeClassifier": cv_results_tree["test_score"],
            "OneVsOneClassifier": cv_results_ovo["test_score"],
            "OneVsRestClassifier": cv_results_ovr["test_score"],
            "OutputCodeClassifier": cv_results_ecoc["test_score"],
        }
    )
    ax = scores.plot.kde(legend=True)
    ax.set_xlabel("Accuracy score")
    ax.set_xlim([0, 0.7])
    _ = ax.set_title(
        "Density of the accuracy scores for the different multiclass strategies"
    )

.. image-sg:: /auto_examples/multiclass/images/sphx_glr_plot_multiclass_overview_001.png
   :alt: Density of the accuracy scores for the different multiclass strategies
   :srcset: /auto_examples/multiclass/images/sphx_glr_plot_multiclass_overview_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 85-95

At first glance, we can see that the built-in strategy of the decision tree
classifier works quite well. The one-vs-one and error-correcting output code
strategies work even better. However, the one-vs-rest strategy does not work as
well as the other strategies.

Indeed, these results reproduce something reported in the literature, as in
[2]_. However, the story is not as simple as it seems.

The importance of hyperparameters search
----------------------------------------

It was later shown in [3]_ that the multiclass strategies show similar scores
if the hyperparameters of the base classifiers are first optimized.

Here we try to reproduce such a result by at least optimizing the depth of the
base decision tree.

.. GENERATED FROM PYTHON SOURCE LINES 95-125
.. code-block:: Python

    from sklearn.model_selection import GridSearchCV

    param_grid = {"max_depth": [3, 5, 8]}
    tree_optimized = GridSearchCV(tree, param_grid=param_grid, cv=3)
    ovo_tree = OneVsOneClassifier(tree_optimized)
    ovr_tree = OneVsRestClassifier(tree_optimized)
    ecoc = OutputCodeClassifier(tree_optimized, code_size=2)

    cv_results_tree = cross_validate(tree_optimized, X, y, cv=cv, n_jobs=2)
    cv_results_ovo = cross_validate(ovo_tree, X, y, cv=cv, n_jobs=2)
    cv_results_ovr = cross_validate(ovr_tree, X, y, cv=cv, n_jobs=2)
    cv_results_ecoc = cross_validate(ecoc, X, y, cv=cv, n_jobs=2)

    scores = pd.DataFrame(
        {
            "DecisionTreeClassifier": cv_results_tree["test_score"],
            "OneVsOneClassifier": cv_results_ovo["test_score"],
            "OneVsRestClassifier": cv_results_ovr["test_score"],
            "OutputCodeClassifier": cv_results_ecoc["test_score"],
        }
    )
    ax = scores.plot.kde(legend=True)
    ax.set_xlabel("Accuracy score")
    ax.set_xlim([0, 0.7])
    _ = ax.set_title(
        "Density of the accuracy scores for the different multiclass strategies"
    )

    plt.show()

.. image-sg:: /auto_examples/multiclass/images/sphx_glr_plot_multiclass_overview_002.png
   :alt: Density of the accuracy scores for the different multiclass strategies
   :srcset: /auto_examples/multiclass/images/sphx_glr_plot_multiclass_overview_002.png
   :class: sphx-glr-single-img
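
In the code above, the :class:`~sklearn.model_selection.GridSearchCV` object is
the base estimator of each meta-estimator. The depth search therefore runs
separately for every binary sub-problem, and each binary tree may end up with a
different ``max_depth``. The sketch below is an addition to the original example
that makes this visible. It uses a synthetic 5-class dataset (an assumption for a
fast, offline run, not the Yeast data) and reads ``best_params_`` from each
fitted sub-estimator of a :class:`~sklearn.multiclass.OneVsOneClassifier`.

.. code-block:: Python

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.multiclass import OneVsOneClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic 5-class data stands in for the Yeast dataset (assumption).
    X_toy, y_toy = make_classification(
        n_samples=300, n_features=10, n_informative=5, n_classes=5, random_state=0
    )

    # Same kind of search as in the example: tune max_depth with an inner 3-fold CV.
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid={"max_depth": [3, 5, 8]},
        cv=3,
    )

    # OneVsOneClassifier clones and fits the search once per pair of classes.
    ovo = OneVsOneClassifier(search).fit(X_toy, y_toy)

    # Each fitted sub-estimator is a GridSearchCV exposing its own best_params_.
    depths = [est.best_params_["max_depth"] for est in ovo.estimators_]
    print(f"{len(depths)} pairwise searches, selected depths: {depths}")

The plain ``tree_optimized`` model, in contrast, runs a single search on the
full multiclass problem.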
.. GENERATED FROM PYTHON SOURCE LINES 126-150

We can see that once the hyperparameters are optimized, all multiclass
strategies have similar performance, as discussed in [3]_.

Conclusion
----------

We can get some intuition behind those results.

First, when the hyperparameters are not optimized, one-vs-one and
error-correcting output code outperform the tree because they ensemble a larger
number of classifiers. The ensembling improves the generalization performance.
This is similar to why a bagging classifier generally performs better than a
single decision tree if no care is taken to optimize the hyperparameters.

Then, we see the importance of optimizing the hyperparameters. Indeed,
hyperparameter tuning should be routinely explored when developing predictive
models, even if techniques such as ensembling help reduce the impact of poorly
chosen hyperparameters.

Finally, it is important to recall that the estimators in scikit-learn are
developed with a specific strategy to handle multiclass classification out of
the box. For these estimators, there is therefore no need to use a different
strategy. These strategies are mainly useful for third-party estimators
supporting only binary classification. In all cases, we also show that the
hyperparameters should be optimized.

References
----------

.. [1] https://archive.ics.uci.edu/ml/datasets/Yeast

.. [2] `"Reducing multiclass to binary: A unifying approach for margin classifiers."
   Allwein, Erin L., Robert E. Schapire, and Yoram Singer.
   Journal of Machine Learning Research 1 (Dec 2000): 113-141.
   <https://www.jmlr.org/papers/volume1/allwein00a/allwein00a.pdf>`_

.. [3] `"In defense of one-vs-all classification."
   Rifkin, Ryan, and Aldebaro Klautau.
   Journal of Machine Learning Research 5 (Jan 2004): 101-141.
   <https://www.jmlr.org/papers/volume5/rifkin04a/rifkin04a.pdf>`_

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 14.026 seconds)


.. _sphx_glr_download_auto_examples_multiclass_plot_multiclass_overview.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/multiclass/plot_multiclass_overview.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_multiclass_overview.ipynb <plot_multiclass_overview.ipynb>`
    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_multiclass_overview.py <plot_multiclass_overview.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_multiclass_overview.zip <plot_multiclass_overview.zip>`


.. include:: plot_multiclass_overview.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_