Using custom functions and tokenizers

This notebook demonstrates how to use the Partition explainer for a multiclass text classification scenario where we use a custom Python function as our model.

[1]:
import datasets
import numpy as np
import pandas as pd
import scipy as sp
import torch
import transformers

import shap

# load the emotion dataset
dataset = datasets.load_dataset("emotion", split="train")
data = pd.DataFrame({"text": dataset["text"], "emotion": dataset["label"]})
Using custom data configuration default
Reusing dataset emotion (/home/slundberg/.cache/huggingface/datasets/emotion/default/0.0.0/aa34462255cd487d04be8387a2d572588f6ceee23f784f37365aa714afeb8fe6)

Define our model

While we use the transformers package here, any Python function that takes a list of strings and outputs scores will work.
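As a minimal sketch of that interface (the scoring logic below is a made-up placeholder, not a real classifier), a model function only needs to map a list of strings to an `(n_samples, n_classes)` array of scores:

```python
import numpy as np


def toy_model(texts):
    # placeholder scoring: rate each of 3 fake classes using simple surface
    # statistics of the string (length, word count, vowel count)
    scores = []
    for t in texts:
        scores.append([len(t), len(t.split()), sum(c in "aeiou" for c in t)])
    return np.array(scores, dtype=float)


print(toy_model(["i didnt feel humiliated", "im grabbing a minute"]).shape)  # (2, 3)
```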

[2]:
# load the model and tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "nateraw/bert-base-uncased-emotion", use_fast=True
)
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "nateraw/bert-base-uncased-emotion"
).cuda()
labels = sorted(model.config.label2id, key=model.config.label2id.get)


# this defines an explicit python function that takes a list of strings and outputs scores for each class
def f(x):
    tv = torch.tensor(
        [
            tokenizer.encode(v, padding="max_length", max_length=128, truncation=True)
            for v in x
        ]
    ).cuda()
    attention_mask = (tv != 0).type(torch.int64).cuda()
    outputs = model(tv, attention_mask=attention_mask)[0].detach().cpu().numpy()
    scores = (np.exp(outputs).T / np.exp(outputs).sum(-1)).T
    val = sp.special.logit(scores)
    return val
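The last two lines of `f` turn the model's raw outputs into probabilities (a softmax) and then into log-odds, since SHAP's additive attributions compose most naturally in log-odds space. A standalone sketch of that transformation on made-up outputs:

```python
import numpy as np
from scipy.special import expit, logit

raw = np.array([[2.0, 0.5, -1.0]])  # fake raw model outputs for one sample
probs = (np.exp(raw).T / np.exp(raw).sum(-1)).T  # softmax, as in f above
log_odds = logit(probs)  # log(p / (1 - p))

# expit inverts logit, so the probabilities are recovered exactly
print(np.allclose(expit(log_odds), probs))  # True
```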

Create an explainer

To build an Explainer we need both a model and a masker (the masker specifies how to hide portions of the input). Since we are using a custom function as our model, SHAP cannot automatically infer a masker for us. So we need to provide one, either implicitly by passing a transformers tokenizer, or explicitly by building a shap.maskers.Text object.

[3]:
method = "custom tokenizer"

# build an explainer by passing a transformers tokenizer
if method == "transformers tokenizer":
    explainer = shap.Explainer(f, tokenizer, output_names=labels)

# build an explainer by explicitly creating a masker
elif method == "default masker":
    masker = shap.maskers.Text(r"\W")  # this will create a basic tokenizer that splits on non-word characters
    explainer = shap.Explainer(f, masker, output_names=labels)

# build a fully custom tokenizer
elif method == "custom tokenizer":
    import re

    def custom_tokenizer(s, return_offsets_mapping=True):
        """Custom tokenizers conform to a subset of the transformers API."""
        pos = 0
        offset_ranges = []
        input_ids = []
        for m in re.finditer(r"\W", s):
            start, end = m.span(0)
            offset_ranges.append((pos, start))
            input_ids.append(s[pos:start])
            pos = end
        if pos != len(s):
            offset_ranges.append((pos, len(s)))
            input_ids.append(s[pos:])
        out = {}
        out["input_ids"] = input_ids
        if return_offsets_mapping:
            out["offset_mapping"] = offset_ranges
        return out

    masker = shap.maskers.Text(custom_tokenizer)
    explainer = shap.Explainer(f, masker, output_names=labels)
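To make the tokenizer contract concrete, here is the same `custom_tokenizer` applied on its own (repeated so the snippet is self-contained): it returns the tokens as `input_ids` plus, when `return_offsets_mapping` is true, the character span each token covers in the original string.

```python
import re


def custom_tokenizer(s, return_offsets_mapping=True):
    """Custom tokenizers conform to a subset of the transformers API."""
    pos = 0
    offset_ranges = []
    input_ids = []
    for m in re.finditer(r"\W", s):
        start, end = m.span(0)
        offset_ranges.append((pos, start))
        input_ids.append(s[pos:start])
        pos = end
    if pos != len(s):
        offset_ranges.append((pos, len(s)))
        input_ids.append(s[pos:])
    out = {"input_ids": input_ids}
    if return_offsets_mapping:
        out["offset_mapping"] = offset_ranges
    return out


out = custom_tokenizer("i didnt feel humiliated")
print(out["input_ids"])       # ['i', 'didnt', 'feel', 'humiliated']
print(out["offset_mapping"])  # [(0, 1), (2, 7), (8, 12), (13, 23)]
```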

Compute SHAP values

Explainers have the same method signature as the models they explain, so we just pass a list of strings to explain those classifications.

[4]:
shap_values = explainer(data["text"][:3])

Visualize the impact on all the output classes

In the plots below, when you hover your mouse over an output class you get the explanation for that output class. When you click an output class name, that class remains the focus of the explanation visualization until you click another class.

The base value is what the model outputs when the entire input text is masked, while \(f_{output class}(inputs)\) is the output of the model for the full original input. The SHAP values explain in an additive way how unmasking each word changes the model output between the base value (where the entire input is masked) and the final model output.
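This additivity can be checked numerically: for each output class, the base value plus the sum of the per-token SHAP values equals the model output on the full sentence. A sketch using values read off the first example below (four tokens, sadness class; the rounded numbers are illustrative):

```python
import numpy as np

# per-token SHAP values for the sadness output of "i didnt feel humiliated"
# (rounded values read off the text plot; treat them as illustrative)
phi = np.array([-0.197, -0.092, 1.217, 6.54])
base_value = -1.84234  # model output with the whole input masked
f_output = 5.62544     # model output on the full, unmasked input

# SHAP values are additive: base value + sum of attributions = model output
print(np.isclose(base_value + phi.sum(), f_output, atol=1e-3))  # True
```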

[5]:
shap.plots.text(shap_values)

[Interactive `shap.plots.text` output: one force-plot style visualization per input sentence, with selectable output classes (sadness, joy, love, anger, fear, surprise). For the sadness output (base value -1.84): in "i didnt feel humiliated", "humiliated" (+6.54) and "feel" (+1.22) drive the output up to 5.63; in "i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake", "hopeless" (+8.14) and "feeling" (+1.90) push up while "hopeful" (-1.98) pushes down, for a final output of 5.35; in "im grabbing a minute to post i feel greedy wrong", "greedy" (-3.75) pushes the sadness output down to -6.08.]
Have an idea for more helpful examples? Pull requests that add to this documentation notebook are encouraged!