Edit Distance metric


EditDistance class

keras_nlp.metrics.EditDistance(
    normalize=True, dtype="float32", name="edit_distance", **kwargs
)

Edit Distance metric.

This class implements the edit distance metric, sometimes called Levenshtein distance, as a keras.metrics.Metric. Edit distance is the minimum number of operations required to convert one string into another, where each operation is a substitution, a deletion, or an insertion. By default, this metric computes a normalized score: the unnormalized edit distance divided by the number of tokens in the reference text.
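
The definition above is commonly computed with the Wagner-Fischer dynamic program. The following is a minimal plain-Python sketch of that recurrence over token lists, for illustration only; it is not the keras_nlp implementation, and the function names `levenshtein` and `normalized_edit_distance` are hypothetical.

```python
def levenshtein(reference, hypothesis):
    """Minimum number of substitutions, deletions and insertions needed to
    turn `hypothesis` into `reference` (Wagner-Fischer dynamic program)."""
    m, n = len(reference), len(hypothesis)
    # prev[j] = distance between reference[:i - 1] and hypothesis[:j].
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference token left unmatched
                curr[j - 1] + 1,     # insertion: extra hypothesis token
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[n]


def normalized_edit_distance(reference, hypothesis):
    """Edit distance divided by the number of reference tokens.

    Assumes `reference` is non-empty.
    """
    return levenshtein(reference, hypothesis) / len(reference)


reference = ["the", "cat", "sat"]
hypothesis = ["the", "bat", "sat", "down"]
print(levenshtein(reference, hypothesis))               # 2
print(normalized_edit_distance(reference, hypothesis))  # 0.6666666666666666
```

Keeping only two rows of the table gives O(len(hypothesis)) memory instead of storing the full matrix.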

This class can be used to compute character error rate (CER) and word error rate (WER): pass text tokenized into characters for CER, or into words for WER, and keep normalize=True (the default).
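
To illustrate how one edit-distance count yields both rates, here is a plain-Python sketch that does not call keras_nlp. The helpers `word_error_rate` and `character_error_rate` are hypothetical; the sketch assumes words are whitespace-separated and that spaces count as characters for CER. Real evaluation pipelines often normalize text (case, punctuation) differently.

```python
def levenshtein(reference, hypothesis):
    """Token-level edit distance (two-row Wagner-Fischer)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_tok != hyp_tok),   # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference_text, hypothesis_text):
    # Hypothetical helper: words are whitespace-separated tokens.
    ref = reference_text.split()
    return levenshtein(ref, hypothesis_text.split()) / len(ref)


def character_error_rate(reference_text, hypothesis_text):
    # Hypothetical helper: every character, including spaces, is a token.
    ref = list(reference_text)
    return levenshtein(ref, list(hypothesis_text)) / len(ref)


print(word_error_rate("the cat sat", "the bat sat"))       # 1 substitution / 3 words
print(character_error_rate("the cat sat", "the bat sat"))  # 1 substitution / 11 characters
```

The only difference between the two rates is the tokenization applied before counting operations.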

Note on input shapes: y_true and y_pred can be either tensors of rank 1 or ragged tensors of rank 2. These tensors contain tokenized text.
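
A rank-1 input holds one tokenized text, while a ragged rank-2 input holds a batch of tokenized texts whose lengths may differ. The plain-Python sketch below mirrors that distinction with nested lists; the helper `as_batch` is hypothetical, and treating a rank-1 input as a batch of one is an assumption about how such inputs are scored, not a description of keras_nlp internals.

```python
def as_batch(tokens):
    """Return a list of token sequences, accepting either a flat list of
    string tokens (rank 1) or a list of token lists (rank 2)."""
    if all(isinstance(t, str) for t in tokens):
        # Rank 1: a single tokenized text becomes a batch of one (assumption).
        return [list(tokens)]
    if all(
        isinstance(seq, (list, tuple)) and all(isinstance(t, str) for t in seq)
        for seq in tokens
    ):
        # Rank 2: rows may have different lengths, like a ragged tensor.
        return [list(seq) for seq in tokens]
    raise ValueError("Expected a rank-1 or rank-2 nested list of string tokens.")


print(as_batch(["it", "is", "sunny"]))             # [['it', 'is', 'sunny']]
print(as_batch([["it", "is"], ["sunny", "today", "!"]]))
```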

Arguments

  • normalize: bool. If True, the total number of operations (substitutions + deletions + insertions) across all samples is divided by the total number of tokens in all reference texts. If False, the number of operations is computed for each sample and averaged over all samples.
  • dtype: string or tf.dtypes.DType. Precision of the metric computation. Defaults to "float32".
  • name: string. Name of the metric instance.
  • **kwargs: Other keyword arguments.
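
The two aggregation modes described for normalize can be sketched in plain Python as follows. The helper `aggregate_edit_distance` is hypothetical and only mirrors the argument description above; it does not call keras_nlp.

```python
def levenshtein(reference, hypothesis):
    """Token-level edit distance (two-row Wagner-Fischer)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ref_tok != hyp_tok)))
        prev = curr
    return prev[-1]


def aggregate_edit_distance(y_true, y_pred, normalize=True):
    """Hypothetical helper mirroring the `normalize` argument description."""
    distances = [levenshtein(t, p) for t, p in zip(y_true, y_pred)]
    if normalize:
        # Total operations divided by total reference tokens over all samples.
        return sum(distances) / sum(len(t) for t in y_true)
    # Per-sample operation counts, averaged over the number of samples.
    return sum(distances) / len(distances)


y_true = [["a", "b", "c", "d"], ["e", "f"]]
y_pred = [["a", "b", "c"], ["e", "x", "f", "y"]]
print(aggregate_edit_distance(y_true, y_pred, normalize=True))   # 0.5 (3 ops / 6 tokens)
print(aggregate_edit_distance(y_true, y_pred, normalize=False))  # 1.5 (3 ops / 2 samples)
```

With normalize=False the result is an average count of operations rather than a rate, so it can exceed 1 and is not comparable across datasets with different reference lengths.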


Examples

Various Input Types.

Single-level Python list.

>>> import keras_nlp
>>> edit_distance = keras_nlp.metrics.EditDistance()
>>> y_true = "the tiny little cat was found under the big funny bed".split()
>>> y_pred = "the cat was found under the bed".split()
>>> edit_distance(y_true, y_pred)
<tf.Tensor: shape=(), dtype=float32, numpy=0.36363637>
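
The value above can be checked by hand: y_pred is y_true with the four tokens tiny, little, big and funny deleted, and y_true has 11 tokens, so the normalized score is:

```latex
\text{EditDistance}
  = \frac{\#\text{subs} + \#\text{dels} + \#\text{ins}}{\#\text{reference tokens}}
  = \frac{0 + 4 + 0}{11}
  = \frac{4}{11} \approx 0.3636
```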

Nested Python list.

>>> edit_distance = keras_nlp.metrics.EditDistance()
>>> y_true = [
...     "the tiny little cat was found under the big funny bed".split(),
...     "it is sunny today".split(),
... ]
>>> y_pred = [
...     "the cat was found under the bed".split(),
...     "it is sunny but with a hint of cloud cover".split(),
... ]
>>> edit_distance(y_true, y_pred)
<tf.Tensor: shape=(), dtype=float32, numpy=0.73333335>
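
By hand: the first pair needs 4 deletions against 11 reference tokens. The second pair needs 7 operations against 4 reference tokens (today is substituted by but, then with, a, hint, of, cloud and cover are inserted). With normalize=True the operations and reference tokens are summed across samples before dividing, which is not the same as averaging the per-sample ratios:

```latex
\frac{4 + 7}{11 + 4} = \frac{11}{15} \approx 0.7333
\qquad\text{whereas}\qquad
\frac{1}{2}\left(\frac{4}{11} + \frac{7}{4}\right) \approx 1.0568
```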