BLEU evaluation metric

Bleu class

keras_nlp.metrics.Bleu(
    tokenizer=None, max_order=4, smooth=False, dtype="float32", name="bleu", **kwargs
)

BLEU metric.

This class implements the BLEU metric. BLEU is generally used to evaluate machine translation systems. By default, this implementation replicates SacreBLEU, but user-defined tokenizers can be passed to deal with other languages.

To compute a BLEU score, we count the n-grams in the candidate translation that also occur in the reference text. Each n-gram's count is clipped to the maximum number of times it occurs in any single reference, so that a (reference, prediction) pair with redundant, repeated tokens does not receive a high score. Because n-gram precision on its own tends to reward shorter predictions, a brevity penalty is applied to penalise predictions that are shorter than the reference. For more details, see the following article: https://cloud.google.com/translate/automl/docs/evaluate#bleu.
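The two ideas above, clipped n-gram counts and the brevity penalty, can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names `ngrams`, `clipped_matches` and `brevity_penalty` are hypothetical, inputs are assumed to be already tokenized lists, and the penalty formula is the standard BLEU definition.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_matches(candidate, references, n):
    """Return (clipped match count, total candidate n-grams) for order n."""
    cand_counts = ngrams(candidate, n)
    # The cap for each n-gram is its highest count in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        max_ref_counts |= ngrams(ref, n)  # element-wise maximum
    clipped = cand_counts & max_ref_counts  # element-wise minimum
    return sum(clipped.values()), sum(cand_counts.values())


def brevity_penalty(candidate_len, reference_len):
    """Return 1.0 unless the candidate is shorter than the reference."""
    if candidate_len == 0:
        return 0.0
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1 - reference_len / candidate_len)


# A degenerate candidate that repeats one frequent token.
candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split()]
matches, total = clipped_matches(candidate, references, n=1)
print(matches, total)
```

Without clipping, all seven candidate unigrams would count as matches. With clipping, only two do, because "the" occurs twice in the reference.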

Note on input shapes: For unbatched inputs, y_pred should be a tensor of shape (), and y_true should be a tensor of shape (num_references,). For batched inputs, y_pred should be a tensor of shape (batch_size,), and y_true should be a tensor of shape (batch_size, num_references). For batched inputs, y_true can also be a ragged tensor of shape (batch_size, None) if different samples have different numbers of references.
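As a rough illustration of these layouts, the sketch below uses plain Python strings and lists in place of tensors. The helper `as_batch` is hypothetical and not part of keras_nlp. It only shows how the unbatched layout corresponds to a batch of size one, and how a ragged batch can carry a different number of references per sample.

```python
def as_batch(y_true, y_pred):
    """Normalise inputs to the batched layout (hypothetical helper)."""
    if isinstance(y_pred, str):
        # Unbatched: y_pred has shape (), y_true has shape (num_references,).
        return [list(y_true)], [y_pred]
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must share the batch dimension")
    # Batched: y_pred has shape (batch_size,); y_true has one row per sample.
    return [list(refs) for refs in y_true], list(y_pred)


# Unbatched: one prediction with two references.
refs, preds = as_batch(
    ["He eats a sweet apple.", "He is eating a tasty apple."],
    "He eats a sweet apple.",
)

# Batched and ragged: the second sample has only one reference.
ragged_refs, ragged_preds = as_batch(
    [["The cat sat on the mat.", "A cat sat on the mat."], ["Hello there."]],
    ["The cat is on the mat.", "Hello there!"],
)
print([len(r) for r in ragged_refs])
```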

Arguments

  • tokenizer: callable. A function that takes a string tf.RaggedTensor (of any shape), and tokenizes the strings in the tensor. If the tokenizer is not specified, the default tokenizer is used. The default tokenizer replicates the behaviour of SacreBLEU's "tokenizer_13a" tokenizer (https://github.com/mjpost/sacrebleu/blob/v2.1.0/sacrebleu/tokenizers/tokenizer_13a.py).
  • max_order: int. The maximum n-gram order to use. For example, if max_order is set to 3, unigrams, bigrams, and trigrams will be considered. Defaults to 4.
  • smooth: bool. Whether to apply Lin et al. 2004 smoothing to the BLEU score. Adds 1 to the matched n-gram count (i.e., numerator) and 1 to the total n-gram count (i.e., denominator) for every order while calculating precision. Defaults to False.
  • dtype: string or tf.dtypes.DType. Precision of metric computation. If not specified, it defaults to "float32".
  • name: string. Name of the metric instance.
  • **kwargs: Other keyword arguments.
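To show how tokenizer, max_order and smooth interact, here is a simplified sentence-level sketch in pure Python. It is not the keras_nlp implementation. The function name `sketch_bleu` is hypothetical, the tokenizer here takes a plain Python string rather than a tensor, and taking the shortest reference as the reference length is an assumption of this sketch.

```python
import math
from collections import Counter


def sketch_bleu(candidate, references, tokenizer=str.split, max_order=4,
                smooth=False):
    """Sentence-level BLEU sketch; all names here are illustrative."""
    cand = tokenizer(candidate)
    refs = [tokenizer(ref) for ref in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_order + 1):
        cand_counts = Counter(
            tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_caps = Counter()
        for ref in refs:
            ref_caps |= Counter(
                tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        matches = sum((cand_counts & ref_caps).values())  # clipped count
        total = sum(cand_counts.values())
        if smooth:
            # Add 1 to the numerator and the denominator for every order.
            matches, total = matches + 1, total + 1
        if matches == 0 or total == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precisions.append(math.log(matches / total))

    geo_mean = math.exp(sum(log_precisions) / max_order)

    # Assumption: the shortest reference is used as the reference length.
    ref_len = min(len(ref) for ref in refs)
    if len(cand) >= ref_len:
        penalty = 1.0
    else:
        penalty = math.exp(1 - ref_len / len(cand))
    return penalty * geo_mean


# One missing bigram zeroes the unsmoothed score; smoothing keeps it positive.
print(sketch_bleu("the dog", ["the cat"], max_order=2))
print(sketch_bleu("the dog", ["the cat"], max_order=2, smooth=True))

# A character-level tokenizer for text that has no whitespace between words.
print(sketch_bleu("猫坐", ["猫坐"], tokenizer=list, max_order=2))
```

Lowering max_order or enabling smooth is mainly useful for short texts, where a single missing higher-order n-gram would otherwise force the score to zero.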

References