12. Choosing the right estimator

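Often the hardest part of solving a machine learning problem is finding the right estimator for the job. Different estimators are better suited to different types of data and different problems.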

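The flowchart below is designed to give users a rough guide on how to approach a problem, in terms of which estimators to try on their data. Click on any estimator in the chart to see its documentation. The 😭 emoji should be read as "if this estimator does not achieve the desired outcome, follow the arrow and try the next one". Use the scroll wheel to zoom in and out, and click and drag to pan around. You can also download the chart: ml_map.svg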
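The "try the next one" reading of the 😭 arrows can be sketched in code. The following is a minimal illustration, not a recommended procedure: it assumes a small, labeled, non-text dataset (the bundled iris data), walks three classifiers in the order LinearSVC, KNeighborsClassifier, SVC, and stops at the first one whose mean cross-validated accuracy reaches a threshold. The threshold `TARGET_ACCURACY`, the helper `first_good_enough`, and the choice to standardize features are assumptions made for this example; they do not come from the chart.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Illustrative threshold only; what counts as a "desired outcome"
# depends entirely on the problem at hand.
TARGET_ACCURACY = 0.97

X, y = load_iris(return_X_y=True)

# Candidates in the order they are tried in this sketch.
candidates = [
    ("LinearSVC", LinearSVC(max_iter=10000)),
    ("KNeighborsClassifier", KNeighborsClassifier()),
    ("SVC", SVC()),
]


def first_good_enough(candidates, X, y, target):
    """Return (name, score) for the first candidate whose mean 5-fold
    accuracy reaches `target`; if none does, return the best one seen."""
    best = None
    for name, estimator in candidates:
        # Standardizing first is an assumption of this sketch.
        model = make_pipeline(StandardScaler(), estimator)
        score = cross_val_score(model, X, y, cv=5).mean()
        if best is None or score > best[1]:
            best = (name, score)
        if score >= target:
            return name, score
    return best


chosen_name, chosen_score = first_good_enough(candidates, X, y, TARGET_ACCURACY)
print(f"{chosen_name}: mean CV accuracy {chosen_score:.3f}")
```

Because the same data is used to compare every candidate, the winning score tends to be optimistic; keeping a separate held-out test set for the final check is the usual safeguard.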

[Figure: scikit-learn algorithm cheat sheet (ml_map.svg). A flowchart that begins at START and asks a series of YES/NO questions (">50 samples", "predicting a category", "do you have labeled data", "predicting a quantity", "just looking", "predicting structure"). It ends at "get more data", at "tough luck", or in one of four estimator groups:
classification ("<100K samples", "text data"): SGD Classifier, Linear SVC, Kernel Approximation, KNeighbors Classifier, SVC, Ensemble Classifiers, Naive Bayes
clustering ("number of categories known", "<10K samples"): MeanShift, VBGMM, MiniBatch KMeans, KMeans, Spectral Clustering, GMM
regression ("<100K samples", "few features should be important"): SGD Regressor, Lasso, ElasticNet, RidgeRegression, SVR(kernel="linear"), SVR(kernel="rbf"), Ensemble Regressors
dimensionality reduction ("<10K samples"): Randomized PCA, Kernel Approximation, IsoMap, Spectral Embedding, LLE]
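The chart's top-level questions can also be read as a small decision function. The sketch below is one reading of the chart, under the assumption that the questions are asked in the order ">50 samples", "predicting a category", "do you have labeled data", "predicting a quantity", "just looking". The function name `pick_family` and its parameters are hypothetical and exist only for this illustration; the returned strings are the group labels used in the chart.

```python
def pick_family(n_samples, predicting_category=False, has_labels=False,
                predicting_quantity=False, just_looking=False):
    """Map answers to the chart's top-level questions onto an estimator group.

    This covers only the first layer of the chart; the choice of a concrete
    estimator inside each group depends on further questions (sample count,
    text data, number of categories known, and so on).
    """
    # ">50 samples": NO leads to "get more data".
    if n_samples <= 50:
        return "get more data"
    # "predicting a category": YES, then split on labeled data.
    if predicting_category:
        return "classification" if has_labels else "clustering"
    # "predicting a quantity": YES leads to regression.
    if predicting_quantity:
        return "regression"
    # "just looking": YES leads to dimensionality reduction.
    if just_looking:
        return "dimensionality reduction"
    # Remaining path in this reading ends at "tough luck".
    return "tough luck"


print(pick_family(1000, predicting_category=True, has_labels=True))
print(pick_family(1000, predicting_category=True, has_labels=False))
print(pick_family(1000, predicting_quantity=True))
```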