Seq2SeqLM class
keras_nlp.models.Seq2SeqLM()
Base class for sequence to sequence language modeling tasks.
Seq2SeqLM tasks wrap a keras_nlp.models.Backbone and a keras_nlp.models.Preprocessor to create a model that can be used for generation and generative fine-tuning, when generation is conditioned on an additional input sequence in a sequence-to-sequence setting.
Seq2SeqLM tasks provide an additional, high-level generate() function which can be used to auto-regressively sample an output sequence token by token. The compile() method of Seq2SeqLM classes contains an additional sampler argument, which can be used to pass a keras_nlp.samplers.Sampler to control how the predicted distribution will be sampled.
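To make "how the predicted distribution will be sampled" concrete, here is a minimal pure-Python sketch of top-k sampling, the strategy named by the default sampler="top_k". The helper top_k_sample and the logits are invented for illustration; this is not the keras_nlp sampler implementation, only the general idea under the assumption that the model emits one score per vocabulary entry.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample one token id from the k highest-scoring entries of `logits`."""
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (shifted by the max for stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    # Draw one of the kept token ids in proportion to its weight.
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 0.5, 1.5, -1.0, 0.1]  # made-up scores for a 5-token vocabulary
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(20)]
```

With k=2 only the two highest-scoring ids (0 and 2 here) can ever be drawn; with k=1 the sketch reduces to greedy selection.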
When calling fit(), each input should contain an input and output sequence. The model will be trained to predict the output sequence token-by-token using a causal mask, similar to a keras_nlp.models.CausalLM task. Unlike the CausalLM task, an input sequence must be passed, and can be attended to in full by all tokens in the output sequence.
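The two attention patterns described above can be sketched with plain nested lists. The helper names causal_mask and full_mask are hypothetical and only illustrate the shapes involved; the library builds its masks internally and may represent them differently.

```python
def causal_mask(length):
    """mask[i][j] is True when output position i may attend to output position j."""
    return [[j <= i for j in range(length)] for i in range(length)]


def full_mask(output_length, input_length):
    """Every output position may attend to every input position."""
    return [[True] * input_length for _ in range(output_length)]


# Output tokens only see earlier output tokens (and themselves).
decoder_self_attention = causal_mask(3)
# Output tokens see the whole input sequence.
cross_attention = full_mask(output_length=3, input_length=4)
```

The causal mask is lower-triangular, so training cannot peek at future output tokens, while the mask over the input sequence has no such restriction.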
All Seq2SeqLM tasks include a from_preset() constructor which can be used to load a pre-trained config and weights.
Example
import keras_nlp

# Load a Bart model with pre-trained weights.
seq_2_seq_lm = keras_nlp.models.Seq2SeqLM.from_preset(
    "bart_base_en",
)
seq_2_seq_lm.compile(sampler="top_k")
# Generate conditioned on `"The quick brown fox."` as an input sequence.
seq_2_seq_lm.generate("The quick brown fox.", max_length=30)
from_preset method
Seq2SeqLM.from_preset(preset, load_weights=True, **kwargs)
Instantiate a keras_nlp.models.Task from a model preset.
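A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:
a built-in preset identifier like 'bert_base_en'
a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
a Hugging Face handle like 'hf://user/bert_base_en'
a path to a local preset directory like './bert_base_en'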
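The four forms can be told apart by their prefix. The sketch below uses a hypothetical helper, preset_kind, that is not part of keras_nlp; the library's own resolution logic may differ (for example, it may check whether a directory actually exists), so treat the path heuristic here as an assumption.

```python
def preset_kind(preset):
    """Label a preset string by its apparent form (illustrative only)."""
    if preset.startswith("kaggle://"):
        return "kaggle"
    if preset.startswith("hf://"):
        return "hugging_face"
    # Assumption: anything that looks like a filesystem path is a local directory.
    if preset.startswith((".", "/", "~")):
        return "local"
    # Otherwise treat it as a built-in preset name.
    return "builtin"


kinds = [
    preset_kind(p)
    for p in (
        "bert_base_en",
        "kaggle://user/bert/keras/bert_base_en",
        "hf://user/bert_base_en",
        "./bert_base_en",
    )
]
```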
For any Task subclass, you can run cls.presets.keys() to list all built-in presets available on the class.
This constructor can be called in one of two ways: either from a task-specific base class like keras_nlp.models.CausalLM.from_preset(), or from a model class like keras_nlp.models.BertClassifier.from_preset(). If calling from a base class, the subclass of the returned object will be inferred from the config in the preset directory.
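Arguments
preset: One of the preset forms listed above (a built-in identifier, a Kaggle or Hugging Face handle, or a local directory path).
load_weights: If True, the weights will be loaded into the model architecture. If False, the weights will be randomly initialized.
Examples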
# Load a Gemma generative task.
causal_lm = keras_nlp.models.CausalLM.from_preset(
    "gemma_2b_en",
)
# Load a Bert classification task.
model = keras_nlp.models.Classifier.from_preset(
    "bert_base_en",
    num_classes=2,
)
compile method
Seq2SeqLM.compile(
    optimizer="auto", loss="auto", weighted_metrics="auto", sampler="top_k", **kwargs
)
Configures the CausalLM task for training and generation.
The CausalLM task extends the default compilation signature of keras.Model.compile with defaults for optimizer, loss, and weighted_metrics. To override these defaults, pass any value to these arguments during compilation.
The CausalLM task adds a new sampler argument to compile, which can be used to control the sampling strategy used with the generate function.
Note that because training inputs include padded tokens which are excluded from the loss, it is almost always a good idea to compile with weighted_metrics and not metrics.
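A small worked example shows why weighted metrics matter when sequences are padded. The accuracy helper, the token ids, and the padding mask below are all made up for illustration; this is plain Python arithmetic, not how Keras computes its metrics internally.

```python
def accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, optionally weighted per token."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


# Made-up token ids; id 0 stands in for padding in this sketch.
y_true = [5, 7, 9, 0, 0, 0]
y_pred = [5, 7, 2, 0, 0, 0]
padding_mask = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

unweighted = accuracy(y_true, y_pred)              # counts padded positions
weighted = accuracy(y_true, y_pred, padding_mask)  # ignores padded positions
```

Here the unweighted score is 5/6 because the trivially "correct" padding positions are counted, while the weighted score is 2/3, which reflects only the real tokens.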
Arguments
optimizer: "auto", an optimizer name, or a keras.Optimizer instance. Defaults to "auto", which uses the default optimizer for the given model and task. See keras.Model.compile and keras.optimizers for more info on possible optimizer values.
loss: "auto", a loss name, or a keras.losses.Loss instance. Defaults to "auto", where a keras.losses.SparseCategoricalCrossentropy loss will be applied for the token classification CausalLM task. See keras.Model.compile and keras.losses for more info on possible loss values.
weighted_metrics: "auto", or a list of metrics to be evaluated by the model during training and testing. Defaults to "auto", where a keras.metrics.SparseCategoricalAccuracy will be applied to track the accuracy of the model at guessing masked token values. See keras.Model.compile and keras.metrics for more info on possible weighted_metrics values.
sampler: A sampler name, or a keras_nlp.samplers.Sampler instance. Configures the sampling method used during generate() calls. See keras_nlp.samplers for a full list of built-in sampling strategies.
**kwargs: See keras.Model.compile for a full list of arguments supported by the compile method.
generate method
Seq2SeqLM.generate(inputs, max_length=None, stop_token_ids="auto")
Generate text given prompt inputs.
This method generates text based on given inputs. The sampling method used for generation can be set via the compile() method.
If inputs are a tf.data.Dataset, outputs will be generated "batch-by-batch" and concatenated. Otherwise, all inputs will be handled as a single batch.
If a preprocessor is attached to the model, inputs will be preprocessed inside the generate() function and should match the structure expected by the preprocessor layer (usually raw strings). If a preprocessor is not attached, inputs should match the structure expected by the backbone. See the example usage above for a demonstration of each.
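The stop_token_ids argument of generate() treats each id as an independent stop token rather than as part of a multi-token sequence. A minimal sketch of that rule, using a hypothetical helper truncate_at_stop_token and made-up token ids (neither comes from keras_nlp, and whether the stop token itself is kept in the real output is not modelled here), might look like:

```python
def truncate_at_stop_token(token_ids, stop_token_ids):
    """Cut `token_ids` just before the first id found in `stop_token_ids`."""
    if stop_token_ids is None:
        # No stop tokens: keep everything that was generated.
        return list(token_ids)
    stops = set(stop_token_ids)
    for index, token_id in enumerate(token_ids):
        if token_id in stops:
            return list(token_ids[:index])
    return list(token_ids)


generated = [11, 42, 7, 2, 99, 3]  # made-up ids
single = truncate_at_stop_token(generated, (2,))
either = truncate_at_stop_token(generated, (3, 7))
no_stop = truncate_at_stop_token(generated, None)
```

Passing (3, 7) stops at whichever of the two ids appears first (7 in this case); it does not wait for 3 followed by 7.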
Arguments
inputs: Python data, tensor data, or a tf.data.Dataset. If a preprocessor is attached to the model, inputs should match the structure expected by the preprocessor layer. If a preprocessor is not attached, inputs should match the structure expected by the backbone model.
max_length: The maximum length of the generated sequence. Defaults to the sequence_length of the preprocessor. If preprocessor is None, inputs should be padded to the desired maximum length and this argument will be ignored.
stop_token_ids: None, "auto", or a tuple of token ids. Defaults to "auto", which uses the preprocessor.tokenizer.end_token_id. Not specifying a preprocessor will produce an error. None stops generation after generating max_length tokens. You may also specify a list of token ids the model should stop on. Note that each token id in the list is interpreted as a separate stop token; multi-token stop sequences are not supported.
save_to_preset method
Seq2SeqLM.save_to_preset(preset_dir)
Save task to a preset directory.
Arguments
preset_dir: The path to the local preset directory to save to.
preprocessor property
keras_nlp.models.Seq2SeqLM.preprocessor
A keras_nlp.models.Preprocessor layer used to preprocess input.
backbone property
keras_nlp.models.Seq2SeqLM.backbone
A keras_nlp.models.Backbone model with the core architecture.