RandomDeletion
keras_nlp.layers.RandomDeletion(
    rate,
    max_deletions=None,
    skip_list=None,
    skip_fn=None,
    skip_py_fn=None,
    seed=None,
    name=None,
    dtype="int32",
    **kwargs
)
Augments input by randomly deleting tokens.
This layer comes in handy when you need to generate new data using deletion augmentation as described in the paper [EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks](https://arxiv.org/pdf/1901.11196.pdf). The layer expects the inputs to be pre-split into token level inputs. This allows control over the level of augmentation: you can split by character for character level deletions, or by word for word level deletions.
Input data should be passed as tensors, tf.RaggedTensors, or lists. For batched input, inputs should be a list of lists or a rank two tensor. For unbatched inputs, each element should be a list or a rank one tensor.
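For example, a batched input can be passed directly as a plain Python list of lists of tokens. The minimal sketch below omits the augmented output, since it depends on the random seed:
>>> augmenter = keras_nlp.layers.RandomDeletion(rate=0.4, seed=42)
>>> augmented = augmenter([["Hey", "I", "like"], ["Keras", "and", "Tensorflow"]])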
Arguments

rate: The probability of a token being chosen for deletion.
max_deletions: The maximum number of tokens to delete (a usage sketch appears at the end of the Examples below).
skip_list: A list of token values that should not be considered candidates for deletion.
skip_fn: A function that takes as input a scalar tensor token and returns as output a scalar tensor True/False value. A value of True indicates that the token should not be considered a candidate for deletion. This function must be traceable; it should consist of TensorFlow operations.
skip_py_fn: A function that takes as input a python string token and returns as output True or False. A value of True indicates that the token should not be considered a candidate for deletion. Unlike the skip_fn argument, this argument need not be traceable; it can be any Python function.
seed: A seed for the random number generator.

Examples
Word level usage.
>>> keras.utils.set_random_seed(1337)
>>> inputs = tf.strings.split(["Hey I like", "Keras and Tensorflow"])
>>> augmenter = keras_nlp.layers.RandomDeletion(rate=0.4, seed=42)
>>> augmented = augmenter(inputs)
>>> tf.strings.reduce_join(augmented, separator=" ", axis=-1)
<tf.Tensor: shape=(2,), dtype=string, numpy=array([b'I like', b'and'],
dtype=object)>
Character level usage.
>>> keras.utils.set_random_seed(1337)
>>> inputs = tf.strings.unicode_split(["Hey Dude", "Speed Up"], "UTF-8")
>>> augmenter = keras_nlp.layers.RandomDeletion(rate=0.4, seed=42)
>>> augmented = augmenter(inputs)
>>> tf.strings.reduce_join(augmented, axis=-1)
<tf.Tensor: shape=(2,), dtype=string, numpy=array([b'H Dude', b'pedUp'],
dtype=object)>
Usage with skip_list.
>>> keras.utils.set_random_seed(1337)
>>> inputs = tf.strings.split(["Hey I like", "Keras and Tensorflow"])
>>> augmenter = keras_nlp.layers.RandomDeletion(rate=0.4,
...     skip_list=["Keras", "Tensorflow"], seed=42)
>>> augmented = augmenter(inputs)
>>> tf.strings.reduce_join(augmented, separator=" ", axis=-1)
<tf.Tensor: shape=(2,), dtype=string,
numpy=array([b'I like', b'Keras Tensorflow'], dtype=object)>
Usage with skip_fn.
>>> def skip_fn(word):
...     return tf.strings.regex_full_match(word, r"\pP")
>>> keras.utils.set_random_seed(1337)
>>> inputs = tf.strings.split(["Hey I like", "Keras and Tensorflow"])
>>> augmenter = keras_nlp.layers.RandomDeletion(rate=0.4,
...     skip_fn=skip_fn, seed=42)
>>> augmented = augmenter(inputs)
>>> tf.strings.reduce_join(augmented, separator=" ", axis=-1)
<tf.Tensor: shape=(2,), dtype=string, numpy=array([b'I like', b'and'],
dtype=object)>
Usage with skip_py_fn.
>>> def skip_py_fn(word):
...     return len(word) < 4
>>> keras.utils.set_random_seed(1337)
>>> inputs = tf.strings.split(["Hey I like", "Keras and Tensorflow"])
>>> augmenter = keras_nlp.layers.RandomDeletion(rate=0.4,
...     skip_py_fn=skip_py_fn, seed=42)
>>> augmented = augmenter(inputs)
>>> tf.strings.reduce_join(augmented, separator=" ", axis=-1)
<tf.Tensor: shape=(2,), dtype=string,
numpy=array([b'Hey I', b'and Tensorflow'], dtype=object)>
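Usage with max_deletions. A minimal sketch, reusing the word level setup above and assuming you want to cap augmentation at one deleted token per example; the output is omitted since it depends on the random seed.
>>> keras.utils.set_random_seed(1337)
>>> inputs = tf.strings.split(["Hey I like", "Keras and Tensorflow"])
>>> augmenter = keras_nlp.layers.RandomDeletion(rate=0.4, max_deletions=1, seed=42)
>>> augmented = augmenter(inputs)
>>> joined = tf.strings.reduce_join(augmented, separator=" ", axis=-1)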
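Usage in a tf.data pipeline. A minimal sketch assuming a toy two-sentence dataset and word level splitting; the augmented tokens are re-joined into strings before downstream use.
>>> ds = tf.data.Dataset.from_tensor_slices(["Hey I like", "Keras and Tensorflow"])
>>> augmenter = keras_nlp.layers.RandomDeletion(rate=0.4, seed=42)
>>> def augment(x):
...     tokens = tf.strings.split(x)
...     tokens = augmenter(tokens)
...     return tf.strings.reduce_join(tokens, separator=" ", axis=-1)
>>> ds = ds.batch(2).map(augment)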