Positive vs. Negative Sentiment Classification
Here we demonstrate how to explain a sentiment classification model for movie reviews.
[47]:
import datasets
import numpy as np
import transformers
import shap
Load the IMDB movie review dataset
[2]:
dataset = datasets.load_dataset("imdb", split="test")
# shorten the strings to fit into the pipeline model
short_data = [v[:500] for v in dataset["text"][:20]]
Reusing dataset imdb (/home/slundberg/.cache/huggingface/datasets/imdb/plain_text/1.0.0/90099cb476936b753383ba2ae6ab2eae419b2e87f71cd5189cb9c8e5814d12a3)
Load and run a sentiment analysis pipeline
[3]:
classifier = transformers.pipeline("sentiment-analysis", return_all_scores=True)
classifier(short_data[:2])
[3]:
[[{'label': 'NEGATIVE', 'score': 0.0012035118415951729},
{'label': 'POSITIVE', 'score': 0.9987965226173401}],
[{'label': 'NEGATIVE', 'score': 0.002218781039118767},
{'label': 'POSITIVE', 'score': 0.9977812170982361}]]
Explain the sentiment analysis pipeline
[4]:
# define the explainer
explainer = shap.Explainer(classifier)
[5]:
# explain the predictions of the pipeline on the first two samples
shap_values = explainer(short_data[:2])
[6]:
shap.plots.text(shap_values[:, :, "POSITIVE"])
0th instance:
i went and saw this movie last night after being coaxed to by a few friends of mine . i ' ll admit that i was reluctant to see it because from what i knew of ashton kutcher he was only able to do comedy .
i was wrong . kutcher played the character of jake fischer very well , and kevin costner played ben randall with such professionalism .
the sign of a good movie is that it can toy with our emotions . this one did exactly that .
the entire theater ( which was sold out ) was overcome by laughter during the
1st instance:
actor turned director bill paxton follows up his promising debut , the gothic - horror " frailty " , with this family friendly sports drama about the 1913 u . s .
open where a young american caddy rises from his humble background to play against his bristish
idol in what was dubbed as " the greatest game ever played .
" i ' m no fan of golf , and these scrappy underdog sports flicks are a dime a dozen ( most recently done to grand effect with " miracle " and " cinderella man " ) ,
but some how this film was enthralli
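The text plot assigns each token an additive contribution to the model output. As a rough illustration of what such attributions mean, here is a minimal sketch that computes exact Shapley values by brute force for a made-up three-token scoring function. Both `toy_logit` and `shapley_values` are hypothetical names introduced for this example; this is not the algorithm `shap.Explainer` runs on the pipeline, which would be far too slow to enumerate for real reviews.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n):
    """Exact Shapley values for n players by enumerating every coalition."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy "model": a logit-like score over the tokens that are kept.
tokens = ["not", "very", "good"]


def toy_logit(kept):
    score = 0.0
    if 2 in kept:
        score += 2.0  # "good" on its own is positive
    if 1 in kept and 2 in kept:
        score += 1.0  # "very" amplifies "good"
    if 0 in kept and 2 in kept:
        score -= 2.0  # "not" counteracts "good"
    return score


phi = shapley_values(toy_logit, len(tokens))
for tok, val in zip(tokens, phi):
    print(f"{tok}: {val:+.2f}")
```

The attributions sum to the difference between the score with all tokens kept and the score with none kept, which is the additivity property the plots rely on.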
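Wrap the pipeline manually
SHAP requires tensor outputs from the classifier, and explanations work best in an additive space, so we transform the probabilities into logit values (information values instead of probabilities).
Create a TransformersPipeline wrapper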
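As a quick check of what the transformation does, the sketch below applies the standard logit function, log(p / (1 - p)), to the two probabilities the pipeline returned for the first review. The helper name `logit` is our own; we are assuming that the wrapper's logit rescaling corresponds to this formula.

```python
import numpy as np


def logit(p):
    """Map a probability in (0, 1) to log-odds."""
    p = np.asarray(p, dtype=float)
    return np.log(p) - np.log1p(-p)


# NEGATIVE and POSITIVE probabilities for the first review, copied from the
# pipeline output shown earlier.
probs = np.array([0.0012035118415951729, 0.9987965226173401])
logits = logit(probs)
print(logits)
```

The result is roughly -6.72 and +6.72. Probabilities saturate near 0 and 1, whereas log-odds keep growing, so contributions from individual tokens can add up more naturally in this space.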
[7]:
pmodel = shap.models.TransformersPipeline(classifier, rescale_to_logits=False)
[8]:
pmodel(short_data[:2])
[8]:
array([[0.00120351, 0.99879652],
[0.00221878, 0.99778122]])
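The array above holds the same numbers as the pipeline's list-of-dictionaries output, arranged with one row per input and one column per label. A minimal sketch of that rearrangement follows. `scores_to_array` is a hypothetical helper written for illustration and is not the wrapper's actual implementation.

```python
import numpy as np

# Pipeline output copied from the earlier cell (two inputs, two labels each).
pipeline_output = [
    [{"label": "NEGATIVE", "score": 0.0012035118415951729},
     {"label": "POSITIVE", "score": 0.9987965226173401}],
    [{"label": "NEGATIVE", "score": 0.002218781039118767},
     {"label": "POSITIVE", "score": 0.9977812170982361}],
]


def scores_to_array(outputs, label_order):
    """Hypothetical helper: one row per input, one column per label."""
    column = {label: j for j, label in enumerate(label_order)}
    arr = np.zeros((len(outputs), len(label_order)))
    for i, row in enumerate(outputs):
        for item in row:
            arr[i, column[item["label"]]] = item["score"]
    return arr


probs = scores_to_array(pipeline_output, ["NEGATIVE", "POSITIVE"])
print(probs)
```

Fixing the column order by label name, rather than by position in the list, keeps the columns stable even if a pipeline returns its labels in a different order.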
[9]:
pmodel = shap.models.TransformersPipeline(classifier, rescale_to_logits=True)
pmodel(short_data[:2])
[9]:
array([[-6.72130722, 6.72133589],
[-6.10857607, 6.10857523]])
[13]:
explainer2 = shap.Explainer(pmodel)
shap_values2 = explainer2(short_data[:2])
shap.plots.text(shap_values2[:, :, 1])
[Text plot output for the same two reviews as in the first text plot above.]
Pass a tokenizer as the masker object
[15]:
explainer2 = shap.Explainer(pmodel, classifier.tokenizer)
shap_values2 = explainer2(short_data[:2])
shap.plots.text(shap_values2[:, :, 1])
[Text plot output for the same two reviews as in the first text plot above.]
Build a Text masker explicitly
[35]:
masker = shap.maskers.Text(classifier.tokenizer)
explainer2 = shap.Explainer(pmodel, masker)
shap_values2 = explainer2(short_data[:2])
shap.plots.text(shap_values2[:, :, 1])
[Text plot output for the same two reviews as in the first text plot above.]
Explore how the Text masker works
[42]:
masker.shape("I like this movie.")
[42]:
(1, 7)
[48]:
model_args = masker(
    np.array([True, True, True, True, True, True, True]), "I like this movie."
)
model_args
[48]:
(array(['i like this movie .'], dtype='<U19'),)
[49]:
pmodel(*model_args)
[49]:
array([[-8.90780458, 8.90742142]])
[50]:
model_args = masker(
    np.array([True, True, False, False, True, True, True]), "I like this movie."
)
model_args
[50]:
(array(['i [MASK] [MASK] movie .'], dtype='<U23'),)
[51]:
pmodel(*model_args)
[51]:
array([[-3.72092204, 3.72092316]])
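To see why the logit scale is convenient here, the sketch below compares the POSITIVE output for the unmasked and the masked sentence, using the two values printed above. The `sigmoid` helper is our own addition. Note that this single difference is only the effect of masking two tokens together, not a SHAP value.

```python
import math


def sigmoid(x):
    """Inverse of the logit function: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


full_logit = 8.90742142    # POSITIVE logit for "i like this movie ."
masked_logit = 3.72092316  # POSITIVE logit for "i [MASK] [MASK] movie ."

drop = full_logit - masked_logit
p_full = sigmoid(full_logit)
p_masked = sigmoid(masked_logit)

print(f"logit drop: {drop:.2f}")
print(f"probability: {p_full:.4f} -> {p_masked:.4f}")
```

Masking "like this" lowers the logit by about 5.19, yet the probability only moves from about 0.9999 to about 0.976, because the probability scale is compressed near 1.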
[52]:
masker2 = shap.maskers.Text(
    classifier.tokenizer, mask_token="...", collapse_mask_token=True
)
[53]:
model_args2 = masker2(
    np.array([True, True, False, False, True, True, True]), "I like this movie."
)
model_args2
[53]:
(array(['i . . . movie .'], dtype='<U15'),)
[54]:
pmodel(*model_args2)
[54]:
array([[-3.20818664, 3.20818753]])
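The masking behavior explored above can be imitated with a few lines of plain Python. This is a toy sketch, not SHAP's implementation: `apply_mask` is a hypothetical function, and the seven-entry token list assumes that the first and last positions are special tokens that produce no text, which is consistent with the shape `(1, 7)` and the masked strings shown above.

```python
def apply_mask(tokens, keep, mask_token="[MASK]", collapse=False):
    """Toy masker: replace dropped tokens with mask_token and join with spaces."""
    out = []
    for tok, kept in zip(tokens, keep):
        if kept:
            if tok:  # special tokens are modeled as empty strings and skipped
                out.append(tok)
        elif not (collapse and out and out[-1] == mask_token):
            # When collapsing, a run of masked tokens yields a single mask token.
            out.append(mask_token)
    return " ".join(out)


# Assumed tokenization of "I like this movie." with empty special tokens.
tokens = ["", "i", "like", "this", "movie", ".", ""]
keep = [True, True, False, False, True, True, True]

plain = apply_mask(tokens, keep)
collapsed = apply_mask(tokens, keep, mask_token="...", collapse=True)
print(plain)
print(collapsed)
```

The toy version prints `i ... movie .` for the collapsed case, whereas the real masker above returned `i . . . movie .`; the difference presumably comes from the real masker passing its text through the tokenizer, which this sketch does not do.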
Plot summary statistics and bar charts
[55]:
# explain the predictions of the pipeline on the first 20 samples
shap_values = explainer(short_data[:20])
Partition explainer: 21it [00:11, 1.76it/s]
[56]:
shap.plots.bar(shap_values[0, :, "POSITIVE"])
[57]:
shap.plots.bar(shap_values[:, :, "POSITIVE"].mean(0))
[59]:
shap.plots.bar(shap_values[:, :, "POSITIVE"].mean(0), order=shap.Explanation.argsort)
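Each review has a different number of tokens, so averaging over the first axis cannot be a plain array mean. Our understanding, which is an assumption worth checking against the SHAP source, is that `.mean(0)` groups the values by token and averages each token's values across all of its occurrences. The sketch below shows that idea on made-up numbers. `mean_by_token` and the attribution values are hypothetical.

```python
from collections import defaultdict

# Hypothetical token-level attributions for two short reviews (made-up numbers).
explanations = [
    [("great", 2.0), ("movie", 0.5)],
    [("boring", -3.0), ("movie", -0.5), ("great", 1.0)],
]


def mean_by_token(explanations):
    """Average the attribution of each token over every place it occurs."""
    grouped = defaultdict(list)
    for review in explanations:
        for token, value in review:
            grouped[token].append(value)
    return {token: sum(vals) / len(vals) for token, vals in grouped.items()}


means = mean_by_token(explanations)

# Sort by signed value (most negative first) rather than by magnitude.
ranked = sorted(means.items(), key=lambda kv: kv[1])
for token, value in ranked:
    print(f"{token}: {value:+.2f}")
```

Sorting by signed value instead of absolute value separates the tokens that push toward POSITIVE from those that push toward NEGATIVE, which appears to be the purpose of passing `order=shap.Explanation.argsort` in the last cell.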