Explaining a model that uses standardized features

Standardizing features is a common preprocessing step in many machine learning pipelines. When explaining a model that uses standardized features, it is usually preferable to express the explanations in terms of the original input features rather than their standardized versions. This notebook shows how to do that by exploiting the fact that any univariate transformation applied to the model's inputs leaves the model's Shapley values unchanged (note that multivariate transformations, such as a PCA decomposition, do change the Shapley values, so this trick does not apply there).
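The invariance claim can be checked directly for a linear model with independent features, where the SHAP value of feature i on a sample x is the coefficient times the feature's deviation from its mean. The sketch below uses hand-picked coefficients `w_std` (a stand-in for a fitted model, not from this notebook) and shows that the same model re-expressed in the raw feature space yields identical attributions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=5.0, size=(200, 3))
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_std = (X - mu) / sigma  # univariate standardization of each column

# hypothetical coefficients of a linear model trained on X_std
w_std = np.array([1.0, -2.0, 0.5])

# for a linear model with independent features, the SHAP value of
# feature i on sample x is w_i * (x_i - E[x_i])
phi_std = w_std * (X_std - X_std.mean(axis=0))

# the same model written in the original feature space has
# coefficients w_std / sigma; its SHAP values are identical
w_raw = w_std / sigma
phi_raw = w_raw * (X - mu)

print(np.allclose(phi_std, phi_raw))  # True
```

This is why relabeling the explanation data with the raw features later in the notebook is valid: the attribution values themselves never need to be recomputed.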

Build a linear model that uses standardized features

[1]:
import sklearn.linear_model
import sklearn.preprocessing

import shap

# get standardized data
X, y = shap.datasets.california()
scaler = sklearn.preprocessing.StandardScaler()
scaler.fit(X)
X_std = scaler.transform(X)

# train the linear model
model = sklearn.linear_model.LinearRegression().fit(X_std, y)

# explain the model's predictions using SHAP
explainer = shap.explainers.Linear(model, X_std)
shap_values = explainer(X_std)

# visualize the model's dependence on the first feature
shap.plots.scatter(shap_values[:, 0])
[scatter plot of the SHAP values for the first feature, in standardized units]

Convert the explanations back to the original feature space

[2]:
# we add back the feature names stripped by the StandardScaler
for i, c in enumerate(X.columns):
    shap_values.feature_names[i] = c

# we convert back to the original data
# (note we can do this because X_std is a set of univariate transformations of X)
shap_values.data = X.values

# visualize the model's dependence on the first feature again, now in the original feature space
shap.plots.scatter(shap_values[:, 0])
[scatter plot of the SHAP values for the first feature, in original units]
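When the original DataFrame is no longer available, the fitted scaler itself can rebuild the raw values, since `StandardScaler.inverse_transform` is just another univariate transformation. A minimal sketch, assuming scikit-learn and synthetic data standing in for the notebook's dataset:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=20.0, size=(50, 2))

scaler = StandardScaler().fit(X)
X_std = scaler.transform(X)

# inverse_transform undoes the standardization exactly, so it could
# also be used to repopulate an Explanation object's .data attribute
X_back = scaler.inverse_transform(X_std)
print(np.allclose(X_back, X))  # True
```

In that case the assignment above would become `shap_values.data = scaler.inverse_transform(shap_values.data)` instead of using `X.values` directly.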

Have an idea for more helpful examples? Pull requests that add to this documentation notebook are encouraged!