RandomSwap layer

RandomSwap class

keras_nlp.layers.RandomSwap(
    rate,
    max_swaps=None,
    skip_list=None,
    skip_fn=None,
    skip_py_fn=None,
    seed=None,
    name=None,
    dtype="int32",
    **kwargs
)

Augments input by randomly swapping words.

This layer is useful for generating new data with the swap augmentation described in the paper [EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks](https://arxiv.org/pdf/1901.11196.pdf). The layer expects inputs to be pre-split into tokens. This gives control over the level of augmentation: split by character for character-level swaps, or by word for word-level swaps.
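To make the swap augmentation concrete, here is a minimal pure-Python sketch of the idea on a pre-split token list. It is an illustration only, not the code this layer runs; the helper name `random_swap_tokens` and its `n_swaps` parameter are hypothetical.

```python
import random


def random_swap_tokens(tokens, n_swaps, seed=None):
    """Return a copy of `tokens` with `n_swaps` random pairwise swaps applied.

    Hypothetical helper for illustration; not part of keras_nlp.
    """
    rng = random.Random(seed)
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        # Pick two distinct positions and exchange their tokens.
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


# Word level: split on whitespace, then swap whole words.
words = "Keras and Tensorflow".split()
swapped_words = random_swap_tokens(words, n_swaps=1, seed=0)

# Character level: split into characters, then swap single characters.
chars = list("Speed Up")
swapped_chars = "".join(random_swap_tokens(chars, n_swaps=2, seed=0))
```

The same function handles both cases because the granularity is decided entirely by how the text is split before it is passed in.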

Input data should be passed as tensors, tf.RaggedTensors, or lists. Batched input should be a list of lists or a rank-two tensor. Unbatched input should be a list or a rank-one tensor.

Arguments

  • rate: The probability of a given token being chosen to be swapped with another random token.
  • max_swaps: The maximum number of swaps to be performed.
  • skip_list: A list of token values that should not be considered candidates for swapping.
  • skip_fn: A function that takes as input a scalar tensor token and returns as output a scalar tensor True/False value. A value of True indicates that the token should not be considered a candidate for swapping. This function must be traceable: it should consist of TensorFlow operations.
  • skip_py_fn: A function that takes as input a Python token value and returns as output True or False. A value of True indicates that the token should not be considered a candidate for swapping. Unlike the skip_fn argument, this argument need not be traceable: it can be any Python function.
  • seed: A seed for the random number generator.
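One way to picture how `rate`, `max_swaps`, and the skip arguments could interact is the pure-Python sketch below. It rests on assumptions not stated above: that the number of swaps is drawn per sequence with each eligible token counting as an independent trial at probability `rate`, that `max_swaps` caps that count, and that skipped tokens never move. The function `swap_with_skips` and its `skip` parameter are hypothetical, and the layer's actual sampling may differ.

```python
import random


def swap_with_skips(tokens, rate, max_swaps=None, skip=None, seed=None):
    """Swap eligible tokens in a copy of `tokens`; skipped tokens stay put.

    Hypothetical illustration; not the keras_nlp implementation.
    `skip` is an optional predicate returning True for protected tokens.
    """
    rng = random.Random(seed)
    out = list(tokens)
    # Positions whose tokens may take part in a swap.
    eligible = [i for i, tok in enumerate(out) if not (skip and skip(tok))]
    if len(eligible) < 2:
        return out
    # Assumed sampling: one Bernoulli(rate) trial per eligible token.
    n_swaps = sum(rng.random() < rate for _ in eligible)
    if max_swaps is not None:
        n_swaps = min(n_swaps, max_swaps)
    for _ in range(n_swaps):
        # Exchange the tokens at two distinct eligible positions.
        i, j = rng.sample(eligible, 2)
        out[i], out[j] = out[j], out[i]
    return out


tokens = ["He", "was", "drifting", "along"]
result = swap_with_skips(
    tokens, rate=0.8, max_swaps=2, skip=lambda w: len(w) < 4, seed=15
)
```

With the predicate above, `He` and `was` keep their positions, and only `drifting` and `along` can trade places.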

Examples

Word level usage.

>>> import keras
>>> import keras_nlp
>>> import tensorflow as tf
>>> keras.utils.set_random_seed(1337)
>>> inputs = tf.strings.split(["Hey I like", "Keras and Tensorflow"])
>>> augmenter = keras_nlp.layers.RandomSwap(rate=0.4, seed=42)
>>> augmented = augmenter(inputs)
>>> tf.strings.reduce_join(augmented, separator=" ", axis=-1)
<tf.Tensor: shape=(2,), dtype=string,
numpy=array([b'like I Hey', b'and Keras Tensorflow'], dtype=object)>

Character level usage.

>>> keras.utils.set_random_seed(1337)
>>> inputs = tf.strings.unicode_split(["Hey Dude", "Speed Up"], "UTF-8")
>>> augmenter = keras_nlp.layers.RandomSwap(rate=0.4, seed=42)
>>> augmented = augmenter(inputs)
>>> tf.strings.reduce_join(augmented, axis=-1)
<tf.Tensor: shape=(2,), dtype=string,
numpy=array([b'deD yuHe', b'SUede pp'], dtype=object)>

Usage with skip_list.

>>> keras.utils.set_random_seed(1337)
>>> inputs = tf.strings.split(["Hey I like", "Keras and Tensorflow"])
>>> augmenter = keras_nlp.layers.RandomSwap(
...     rate=0.4, skip_list=["Keras"], seed=42
... )
>>> augmented = augmenter(inputs)
>>> tf.strings.reduce_join(augmented, separator=" ", axis=-1)
<tf.Tensor: shape=(2,), dtype=string,
numpy=array([b'like I Hey', b'Keras and Tensorflow'], dtype=object)>

Usage with skip_fn.

>>> def skip_fn(word):
...     # Skip tokens that start with "I" or "a".
...     return tf.strings.regex_full_match(word, r"[Ia].*")
>>> keras.utils.set_random_seed(1337)
>>> inputs = tf.strings.split(["Hey I like", "Keras and Tensorflow"])
>>> augmenter = keras_nlp.layers.RandomSwap(
...     rate=0.9, max_swaps=3, skip_fn=skip_fn, seed=11
... )
>>> augmented = augmenter(inputs)
>>> tf.strings.reduce_join(augmented, separator=" ", axis=-1)
<tf.Tensor: shape=(2,), dtype=string,
numpy=array([b'like I Hey', b'Keras and Tensorflow'], dtype=object)>

Usage with skip_py_fn.

>>> def skip_py_fn(word):
...     return len(word) < 4
>>> keras.utils.set_random_seed(1337)
>>> inputs = tf.strings.split(["He was drifting along", "With the wind"])
>>> augmenter = keras_nlp.layers.RandomSwap(
...     rate=0.8, max_swaps=2, skip_py_fn=skip_py_fn, seed=15
... )
>>> augmented = augmenter(inputs)
>>> tf.strings.reduce_join(augmented, separator=" ", axis=-1)
<tf.Tensor: shape=(2,), dtype=string, numpy=array([b'He was along drifting',
b'wind the With'], dtype=object)>