Benchmarking Tabular Data Explanations: XGBoost Regression

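This notebook demonstrates how to use the benchmark utility to measure the performance of an explainer for tabular data. In this demo, we benchmark the explanations produced by TreeExplainer. The evaluation metrics are "keep positive" and "keep negative". The masker used here is the Independent masker (shap.maskers.Independent), but the approach generalizes to other tabular maskers.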

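The new benchmark utility is built on the new API: it uses MaskedModel as a wrapper around the user-supplied model and evaluates that model on masked versions of the input.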
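
To make the "keep positive" metric concrete, here is a minimal pure-Python sketch of the idea. It is not the shap.benchmark implementation: `toy_model`, `keep_positive_curve`, the weights, and the single background row are all hypothetical stand-ins chosen so the numbers can be checked by hand. Every feature starts out masked (replaced by its background value), features are then unmasked from most to least positive attribution, and the model output is recorded after each step.

```python
# Toy illustration of a "keep positive" curve; NOT the shap.benchmark code.

def toy_model(row, weights):
    """Hypothetical linear model standing in for the trained regressor."""
    return sum(w * v for w, v in zip(weights, row))


def keep_positive_curve(row, background, attributions, weights):
    """Unmask features from most to least positive attribution.

    Masked features take the background value; the model output is
    recorded before any feature is kept and after each unmasking step.
    """
    order = sorted(range(len(row)), key=lambda i: attributions[i], reverse=True)
    current = list(background)       # start with every feature masked
    curve = [toy_model(current, weights)]
    for i in order:
        current[i] = row[i]          # keep (unmask) the next feature
        curve.append(toy_model(current, weights))
    return curve


weights = [2.0, -1.0, 0.5]
background = [1.0, 1.0, 1.0]
row = [3.0, 4.0, 2.0]

# For a linear model with a single background row, the exact attribution
# of feature i is w_i * (x_i - background_i).
attributions = [w * (x - b) for w, x, b in zip(weights, row, background)]

curve = keep_positive_curve(row, background, attributions, weights)
print(curve)  # [1.5, 5.5, 6.0, 3.0]
```

The curve climbs while positively attributed features are kept and drops once the negatively attributed feature is added. Under this metric, an explainer whose ranking produces the steepest early rise scores better.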

[1]:
import xgboost
from sklearn.model_selection import train_test_split

import shap
import shap.benchmark as benchmark

Load Data and Model

[2]:
# create trained model for prediction function
untrained_model = xgboost.XGBRegressor(n_estimators=100, subsample=0.3)
X, y = shap.datasets.california()
X = X.values

test_size = 0.3
random_state = 0
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_size, random_state=random_state
)

model = untrained_model.fit(X_train, y_train)

Define Explainer Masker

[3]:
# use Independent masker as default
masker = shap.maskers.Independent(X)
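
As a rough mental model of what an independent masker does, the sketch below replaces each masked feature with the value from a background row and averages the model output over all background rows. The helper `masked_expectation` and the toy product model are hypothetical and written only for illustration; the real shap.maskers.Independent works on arrays and subsamples the background data rather than looping in Python.

```python
# Conceptual sketch of independent masking; NOT the shap.maskers code.

def masked_expectation(model, row, keep, background_rows):
    """Average model output when masked features come from background rows.

    keep[i] is True when feature i keeps its original value and False
    when it is replaced by the corresponding background value.
    """
    total = 0.0
    for bg in background_rows:
        mixed = [x if k else b for x, k, b in zip(row, keep, bg)]
        total += model(mixed)
    return total / len(background_rows)


def product_model(values):
    """Hypothetical nonlinear model: product of the two features."""
    return values[0] * values[1]


background_rows = [[0.0, 0.0], [2.0, 4.0]]
row = [1.0, 3.0]

# Keep feature 0, mask feature 1: evaluates [1, 0] and [1, 4].
partial = masked_expectation(product_model, row, [True, False], background_rows)
print(partial)  # 2.0
```

Because each masked feature is drawn from the background independently of the kept features, correlations between features are ignored, which is the main assumption behind this masker.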

Create Explainer Object

[4]:
# tree explainer is used
explainer = shap.Explainer(model, masker)

Run SHAP Explanation

[5]:
shap_values = explainer(X)

Define Metrics (Sort Order and Perturbation Method)

[6]:
sort_order = "positive"
perturbation = "keep"
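
The sort order controls which features are perturbed first. The sketch below shows how a ranking could be derived from one row of attributions. The helper `feature_order` is hypothetical, and the "absolute" option is included on the assumption that the benchmark accepts it alongside "positive" and "negative"; check the installed shap version before relying on it.

```python
# Illustrative ranking of features by attribution; NOT the shap.benchmark code.

def feature_order(attributions, sort_order):
    """Return feature indices in the order they would be perturbed."""
    indices = range(len(attributions))
    if sort_order == "positive":
        # most positive attribution first
        return sorted(indices, key=lambda i: -attributions[i])
    if sort_order == "negative":
        # most negative attribution first
        return sorted(indices, key=lambda i: attributions[i])
    if sort_order == "absolute":
        # largest magnitude first, regardless of sign
        return sorted(indices, key=lambda i: -abs(attributions[i]))
    raise ValueError(f"unknown sort_order: {sort_order!r}")


example_attributions = [4.0, -3.0, 0.5]
print(feature_order(example_attributions, "positive"))  # [0, 2, 1]
print(feature_order(example_attributions, "negative"))  # [1, 2, 0]
print(feature_order(example_attributions, "absolute"))  # [0, 1, 2]
```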

Benchmark Explainer

[7]:
sp = benchmark._sequential.SequentialPerturbation(
    explainer.model, explainer.masker, sort_order, perturbation
)
sp_result = sp("SequentialPerturbation", shap_values.values, X)
sp.plot(sp_result.curve_x, sp_result.curve_y, sp_result.value)
[Figure: benchmark curve for the "keep positive" metric]

[8]:
sort_order = "negative"
perturbation = "keep"

[9]:
sp = benchmark._sequential.SequentialPerturbation(
    explainer.model, explainer.masker, sort_order, perturbation
)
sp_result = sp("SequentialPerturbation", shap_values.values, X)
sp.plot(sp_result.curve_x, sp_result.curve_y, sp_result.value)
[Figure: benchmark curve for the "keep negative" metric]
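
Each benchmark run returns a curve (`curve_x`, `curve_y`) together with a single summary `value`. Assuming that value is an area under the curve, which this notebook does not state explicitly, the trapezoidal rule below shows how such a number could be computed from the curve points. The helper `trapezoid_auc` and the sample points are hypothetical.

```python
# Trapezoidal area under a perturbation curve; a sketch, NOT the shap code.

def trapezoid_auc(xs, ys):
    """Area under the piecewise-linear curve through (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching points")
    area = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        area += width * (ys[i] + ys[i - 1]) / 2.0
    return area


# Fraction of features kept (x) against model output (y).
xs = [0.0, 0.5, 1.0]
ys = [1.0, 3.0, 2.0]
print(trapezoid_auc(xs, ys))  # 2.25
```

For "keep positive" a larger area indicates that the top-ranked features raise the output quickly, while for "keep negative" a smaller area indicates that they lower it quickly.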