`Exact` explainer

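This notebook demonstrates how to use the Exact explainer on some simple datasets. The Exact explainer is model-agnostic, so it can compute Shapley values and Owen values exactly (without approximation) for any model. However, because it fully enumerates the space of masking patterns, its cost for M input features is \(O(2^M)\) for Shapley values and \(O(M^2)\) for Owen values on a balanced clustering tree.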
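
To make the \(O(2^M)\) cost concrete, here is a minimal brute-force sketch of exact Shapley values for a single sample. It is an illustration only, not SHAP's internal implementation: `exact_shapley` and `toy_model` are hypothetical names, and masking is assumed to mean replacing a feature with the value from a single background row.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, background):
    """Brute-force Shapley values for one sample (hypothetical helper).

    A masked feature takes its value from `background`; an unmasked
    feature takes its value from `x`.
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    M = len(x)

    # Evaluate the model once per masking pattern: 2**M calls in total.
    values = {}
    for size in range(M + 1):
        for subset in combinations(range(M), size):
            z = background.copy()
            idx = np.array(subset, dtype=int)
            z[idx] = x[idx]
            values[frozenset(subset)] = f(z)

    # Weighted average of each feature's marginal contributions.
    phi = np.zeros(M)
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (values[s | {i}] - values[s])
    return phi


def toy_model(z):
    # Features 0 and 1 interact; feature 2 acts additively.
    return z[0] * z[1] + 2.0 * z[2]


phi = exact_shapley(toy_model, x=[1.0, 1.0, 1.0], background=[0.0, 0.0, 0.0])
print(phi)  # the interaction credit is split evenly between features 0 and 1
```

The values sum to the difference between the model output on `x` and on the background row, and the dictionary of cached evaluations doubles in size with every added feature.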

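Because the Exact explainer knows that it is fully enumerating the masking space, it can use optimizations that are not possible with methods based on random sampling, such as using a Gray code ordering to minimize the number of inputs that change between successive masking patterns, which can reduce the number of times the model needs to be called.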
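
The Gray code idea can be sketched in a few lines. This is a generic reflected binary Gray code, shown only to illustrate the ordering property; `gray_code_masks` is a hypothetical helper, and the way SHAP orders its masks internally may differ.

```python
def gray_code_masks(M):
    """Yield all 2**M boolean masks so that neighbours differ in one position."""
    for k in range(2 ** M):
        g = k ^ (k >> 1)  # k-th reflected binary Gray code
        yield [bool((g >> j) & 1) for j in range(M)]


masks = list(gray_code_masks(3))

# Number of positions that change between each pair of consecutive masks.
flips = [
    sum(a != b for a, b in zip(m0, m1))
    for m0, m1 in zip(masks, masks[1:])
]
print(flips)
```

Since only one input changes at each step, an implementation can update the previous masked input in place instead of rebuilding it for every pattern.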

[1]:

import xgboost

import shap

# get a dataset on income prediction
X, y = shap.datasets.adult()

# train an XGBoost model (but any other model type would also work)
model = xgboost.XGBClassifier()
model.fit(X, y);

Tabular data with independent (Shapley value) masking

[2]:

# build an Exact explainer and explain the model predictions on the given dataset
explainer = shap.explainers.Exact(model.predict_proba, X)
shap_values = explainer(X[:100])

# get just the explanations for the positive class
shap_values = shap_values[..., 1]

Exact explainer: 101it [00:12,  8.13it/s]

Plot a global summary

[3]:

shap.plots.bar(shap_values)

../../../_images/example_notebooks_api_examples_explainers_Exact_5_0.png

Plot a single instance

[4]:

shap.plots.waterfall(shap_values[0])

../../../_images/example_notebooks_api_examples_explainers_Exact_7_0.png

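Tabular data with partition (Owen value) masking

While Shapley values result from treating each feature independently of the other features, it is often useful to enforce a structure on the model inputs. Enforcing such a structure produces a structured game (i.e. a game with rules about which coalitions of input features are valid), and when that structure is a nested set of feature groupings we get the Owen values as a recursive application of Shapley values to the groups. In SHAP, we take the partitioning to the limit and build a binary hierarchical clustering tree to represent the structure of the data. This structure could be chosen in many ways, but for tabular data it is often helpful to build it from the redundancy of information between the input features about the output label. This is what we do below: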

[5]:

# build a clustering of the features based on shared information about y
clustering = shap.utils.hclust(X, y)
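
To see how grouping changes the attribution, here is a small sketch of Owen values on a toy game. It assumes a single level of grouping rather than the full binary clustering tree used by SHAP, and it relies on the ordering-based definition: average each feature's marginal contribution over all orderings in which the members of every group stay contiguous. `owen_values` and `any_feature_present` are hypothetical names.

```python
from itertools import permutations, product

import numpy as np


def owen_values(value, groups):
    """Owen values for a one-level grouping of features (hypothetical helper).

    `value` maps a frozenset of feature indices to a number, and `groups`
    is a list of lists that partitions the feature indices.
    """
    M = sum(len(g) for g in groups)
    phi = np.zeros(M)
    n_orderings = 0
    for group_order in permutations(groups):
        # Order the members inside each group in every possible way.
        for within in product(*(permutations(g) for g in group_order)):
            ordering = [i for block in within for i in block]
            present = frozenset()
            for i in ordering:
                phi[i] += value(present | {i}) - value(present)
                present = present | {i}
            n_orderings += 1
    return phi / n_orderings


def any_feature_present(subset):
    # Fully redundant features: any single one is enough to get the payoff.
    return 1.0 if subset else 0.0


phi_owen = owen_values(any_feature_present, groups=[[0, 1], [2]])
print(phi_owen)
```

In this game plain Shapley values give each of the three features one third of the payoff, whereas the grouping makes features 0 and 1 share credit as a unit, so feature 2 receives half on its own.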

[6]:

# above we implicitly used shap.maskers.Independent by passing a raw dataframe as the masker
# now we explicitly use a Partition masker that uses the clustering we just computed
masker = shap.maskers.Partition(X, clustering=clustering)

# build an Exact explainer and explain the model predictions on the given dataset
explainer = shap.explainers.Exact(model.predict_proba, masker)
shap_values2 = explainer(X[:100])

# get just the explanations for the positive class
shap_values2 = shap_values2[..., 1]

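Plot a global summary

Note that only the Relationship and Marital Status features share more than 50% of their explanation power (as measured by \(R^2\)) with each other, so all other parts of the clustering tree are removed by the default clustering_cutoff=0.5 setting: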

[7]:

shap.plots.bar(shap_values2)

../../../_images/example_notebooks_api_examples_explainers_Exact_12_0.png

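Plot a single instance

Note the strong similarity between the explanation from the Independent masker above and the one from the Partition masker here. In general, the two methods do not differ much on tabular data, though the Partition masker allows for much faster runtime and potentially more realistic manipulations of the model inputs (since groups of clustered features are masked/unmasked together).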

[8]:

shap.plots.waterfall(shap_values2[0])

../../../_images/example_notebooks_api_examples_explainers_Exact_14_0.png