Positive vs. Negative Sentiment Classification

Here we demonstrate how to explain a sentiment classification model for movie reviews.

[47]:
import datasets
import numpy as np
import transformers

import shap

Load the IMDB movie review dataset

[2]:
dataset = datasets.load_dataset("imdb", split="test")

# shorten the strings to fit into the pipeline model
short_data = [v[:500] for v in dataset["text"][:20]]
Reusing dataset imdb (/home/slundberg/.cache/huggingface/datasets/imdb/plain_text/1.0.0/90099cb476936b753383ba2ae6ab2eae419b2e87f71cd5189cb9c8e5814d12a3)

Load and run a sentiment analysis pipeline

[3]:
classifier = transformers.pipeline("sentiment-analysis", return_all_scores=True)
classifier(short_data[:2])
[3]:
[[{'label': 'NEGATIVE', 'score': 0.0012035118415951729},
  {'label': 'POSITIVE', 'score': 0.9987965226173401}],
 [{'label': 'NEGATIVE', 'score': 0.002218781039118767},
  {'label': 'POSITIVE', 'score': 0.9977812170982361}]]
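
The pipeline returns one list per input review, each holding one dict per label. As a minimal sketch of how that nested structure can be flattened into a numeric array, using the scores printed above (the helper name `scores_to_array` is hypothetical and is not part of transformers or SHAP):

```python
import numpy as np

# Output copied from the cell above: one list per review, one dict per label.
pipeline_output = [
    [
        {"label": "NEGATIVE", "score": 0.0012035118415951729},
        {"label": "POSITIVE", "score": 0.9987965226173401},
    ],
    [
        {"label": "NEGATIVE", "score": 0.002218781039118767},
        {"label": "POSITIVE", "score": 0.9977812170982361},
    ],
]


def scores_to_array(output):
    """Hypothetical helper: stack scores into a (n_samples, n_labels) array.

    Assumes every row lists the labels in the same order as the first row.
    """
    labels = [entry["label"] for entry in output[0]]
    scores = np.array([[entry["score"] for entry in row] for row in output])
    return labels, scores


labels, probs = scores_to_array(pipeline_output)
print(labels)
print(probs.shape)
print(probs.sum(axis=1))  # each row of class probabilities sums to roughly 1
```

An array of this shape, one column per label, is the kind of tensor output an explainer can slice by class.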

Explain the sentiment analysis pipeline

[4]:
# define the explainer
explainer = shap.Explainer(classifier)
[5]:
# explain the predictions of the pipeline on the first two samples
shap_values = explainer(short_data[:2])
[6]:
shap.plots.text(shap_values[:, :, "POSITIVE"])

0th instance: base value = -0.187151, f(x) = 6.721336
-1.939 / 49
i went and saw this movie last night after being coaxed to by a few friends of mine . i ' ll admit that i was reluctant to see it because from what i knew of ashton kutcher he was only able to do comedy .
1.27 / 28
i was wrong . kutcher played the character of jake fischer very well , and kevin costner played ben randall with such professionalism .
4.179 / 21
the sign of a good movie is that it can toy with our emotions . this one did exactly that .
3.398 / 15
the entire theater ( which was sold out ) was overcome by laughter during the

1st instance: base value = -0.204055, f(x) = 6.108575
1.885 / 33
actor turned director bill paxton follows up his promising debut , the gothic - horror " frailty " , with this family friendly sports drama about the 1913 u . s .
1.359 / 19
open where a young american caddy rises from his humble background to play against his bristish
1.612 / 13
idol in what was dubbed as " the greatest game ever played .
-0.275 / 41
" i ' m no fan of golf , and these scrappy underdog sports flicks are a dime a dozen ( most recently done to grand effect with " miracle " and " cinderella man " ) ,
1.733 / 10
but some how this film was enthralli
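
SHAP values are additive: the base value plus the contributions of all text chunks should reproduce the model output f(x). A quick sanity check with the numbers read off the two text plots above (rounded as displayed, so the match is only approximate). The sigmoid step assumes f(x) is in logit units; that assumption fits the data, since it maps the first f(x) back to the POSITIVE score printed by the pipeline.

```python
import math

# Values read off the two text plots above (rounded as displayed).
instances = [
    {"base": -0.187151, "fx": 6.721336,
     "contribs": [-1.939, 1.27, 4.179, 3.398]},
    {"base": -0.204055, "fx": 6.108575,
     "contribs": [1.885, 1.359, 1.612, -0.275, 1.733]},
]


def sigmoid(x):
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


reconstructed = []
for inst in instances:
    total = inst["base"] + sum(inst["contribs"])
    reconstructed.append(total)
    print(
        f"base + sum of contributions = {total:.3f}, "
        f"f(x) = {inst['fx']:.3f}, "
        f"sigmoid(f(x)) = {sigmoid(inst['fx']):.4f}"
    )
```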

Wrap the pipeline manually

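SHAP requires tensor outputs from the classifier, and explanations work best in additive spaces, so we transform the probabilities into logit values (information values instead of probabilities).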
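
For reference, the logit of a probability p is log(p / (1 - p)). A minimal sketch of that transform, applied to the POSITIVE-class probabilities printed by the pipeline earlier (the `logit` helper here is written out by hand for illustration and is not the wrapper's own code):

```python
import numpy as np


def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


# POSITIVE-class probabilities printed by the pipeline earlier.
p_positive = np.array([0.9987965226173401, 0.9977812170982361])
z = logit(p_positive)
print(z)

# The sigmoid inverts the transform and recovers the probabilities.
p_back = 1.0 / (1.0 + np.exp(-z))
print(p_back)
```

Probabilities this close to 1 are squeezed into a narrow range, while their logits remain well separated, which makes additive attributions easier to read.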

Create a TransformersPipeline wrapper

[7]:
pmodel = shap.models.TransformersPipeline(classifier, rescale_to_logits=False)
[8]:
pmodel(short_data[:2])
[8]:
array([[0.00120351, 0.99879652],
       [0.00221878, 0.99778122]])
[9]:
pmodel = shap.models.TransformersPipeline(classifier, rescale_to_logits=True)
pmodel(short_data[:2])
[9]:
array([[-6.72130722,  6.72133589],
       [-6.10857607,  6.10857523]])
[13]:
explainer2 = shap.Explainer(pmodel)
shap_values2 = explainer2(short_data[:2])
shap.plots.text(shap_values2[:, :, 1])

[Text plot output: identical to the first text plot above. 0th instance: base value = -0.187151, f(x) = 6.721336; 1st instance: base value = -0.204055, f(x) = 6.108575.]

Pass a tokenizer as the masker object

[15]:
explainer2 = shap.Explainer(pmodel, classifier.tokenizer)
shap_values2 = explainer2(short_data[:2])
shap.plots.text(shap_values2[:, :, 1])

[Text plot output: identical to the first text plot above. 0th instance: base value = -0.187151, f(x) = 6.721336; 1st instance: base value = -0.204055, f(x) = 6.108575.]

Build a Text masker explicitly

[35]:
masker = shap.maskers.Text(classifier.tokenizer)
explainer2 = shap.Explainer(pmodel, masker)
shap_values2 = explainer2(short_data[:2])
shap.plots.text(shap_values2[:, :, 1])

[Text plot output: identical to the first text plot above. 0th instance: base value = -0.187151, f(x) = 6.721336; 1st instance: base value = -0.204055, f(x) = 6.108575.]

Explore how the Text masker works

[42]:
masker.shape("I like this movie.")
[42]:
(1, 7)
[48]:
model_args = masker(
    np.array([True, True, True, True, True, True, True]), "I like this movie."
)
model_args
[48]:
(array(['i like this movie .'], dtype='<U19'),)
[49]:
pmodel(*model_args)
[49]:
array([[-8.90780458,  8.90742142]])
[50]:
model_args = masker(
    np.array([True, True, False, False, True, True, True]), "I like this movie."
)
model_args
[50]:
(array(['i [MASK] [MASK] movie .'], dtype='<U23'),)
[51]:
pmodel(*model_args)
[51]:
array([[-3.72092204,  3.72092316]])
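
Comparing the two outputs above gives a feel for what masking measures: hiding "like this" lowers the POSITIVE output. A small sketch using the printed values, assuming they are logits (the wrapper was created with `rescale_to_logits=True`) and converting them back to probabilities with a sigmoid; the variable names are illustrative only:

```python
import math

full_logit = 8.90742142    # POSITIVE output for 'i like this movie .'
masked_logit = 3.72092316  # POSITIVE output for 'i [MASK] [MASK] movie .'


def sigmoid(x):
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


drop = full_logit - masked_logit
p_full = sigmoid(full_logit)
p_masked = sigmoid(masked_logit)

print(f"logit drop from masking: {drop:.4f}")
print(f"P(POSITIVE): {p_full:.6f} -> {p_masked:.6f}")
```

The change is modest in probability space but large in logit space, which illustrates why the explanations are computed on logits.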
[52]:
masker2 = shap.maskers.Text(
    classifier.tokenizer, mask_token="...", collapse_mask_token=True
)
[53]:
model_args2 = masker2(
    np.array([True, True, False, False, True, True, True]), "I like this movie."
)
model_args2
[53]:
(array(['i . . . movie .'], dtype='<U15'),)
[54]:
pmodel(*model_args2)
[54]:
array([[-3.20818664,  3.20818753]])
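
The masking behavior explored above can be imitated with a toy function. This is only a sketch and not the implementation of `shap.maskers.Text`: it assumes the seven mask positions correspond to a leading special token, five word pieces, and a trailing special token, which is consistent with the shape `(1, 7)` and the masked strings shown above. The function name `toy_text_mask` and the token list are hypothetical.

```python
def toy_text_mask(tokens, keep, mask_token="[MASK]", collapse=False):
    """Toy stand-in for a text masker (not shap.maskers.Text itself).

    tokens: token strings, with one special token at each end.
    keep:   booleans of the same length; False hides the token.
    collapse: if True, a run of hidden tokens yields a single mask token.
    """
    out = []
    prev_hidden = False
    # Skip the special tokens at both ends; they never appear in the text.
    for token, kept in zip(tokens[1:-1], keep[1:-1]):
        if kept:
            out.append(token)
        elif not (collapse and prev_hidden):
            out.append(mask_token)
        prev_hidden = not kept
    return " ".join(out)


# Assumed tokenization of "I like this movie." (7 positions in total).
tokens = ["[CLS]", "i", "like", "this", "movie", ".", "[SEP]"]
keep = [True, True, False, False, True, True, True]

plain = toy_text_mask(tokens, keep)
collapsed = toy_text_mask(tokens, keep, mask_token="...", collapse=True)
print(plain)
print(collapsed)
```

The real masker printed `'i . . . movie .'` for the collapsed case, with the mask token split into separate dots; the toy version does not reproduce that detail.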

Plot summary statistics and bar charts

[55]:
# explain the predictions of the pipeline on the first 20 samples
shap_values = explainer(short_data[:20])
Partition explainer: 21it [00:11,  1.76it/s]
[56]:
shap.plots.bar(shap_values[0, :, "POSITIVE"])
[Figure: bar plot of the SHAP values for the POSITIVE class on the first review]
[57]:
shap.plots.bar(shap_values[:, :, "POSITIVE"].mean(0))
[Figure: bar plot of the mean SHAP values for the POSITIVE class across the 20 reviews]
[59]:
shap.plots.bar(shap_values[:, :, "POSITIVE"].mean(0), order=shap.Explanation.argsort)
[Figure: bar plot of the mean SHAP values for the POSITIVE class across the 20 reviews, ordered with shap.Explanation.argsort]
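
Because each review has a different number of tokens, averaging over samples cannot be a plain column-wise mean. A rough sketch of what such an aggregation could look like, under the assumption (not confirmed by the text above) that values belonging to the same token string are grouped and averaged, and that the `argsort` ordering sorts by signed value rather than by magnitude. All numbers below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical per-token attributions for two short reviews (made-up values).
explanations = [
    [("great", 2.0), ("movie", 0.5), ("boring", -1.0)],
    [("great", 1.0), ("movie", -0.5)],
]


def mean_by_token(explanations):
    """Average the attribution of each distinct token across all reviews."""
    grouped = defaultdict(list)
    for review in explanations:
        for token, value in review:
            grouped[token].append(value)
    return {token: sum(vals) / len(vals) for token, vals in grouped.items()}


means = mean_by_token(explanations)

# Two possible bar orderings: largest magnitude first, or ascending signed value.
by_magnitude = sorted(means, key=lambda t: abs(means[t]), reverse=True)
by_value = sorted(means, key=lambda t: means[t])

print(means)
print(by_magnitude)
print(by_value)
```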