TransformerEncoder class
keras_nlp.layers.TransformerEncoder(
intermediate_dim,
num_heads,
dropout=0,
activation="relu",
layer_norm_epsilon=1e-05,
kernel_initializer="glorot_uniform",
bias_initializer="zeros",
normalize_first=False,
**kwargs
)
Transformer encoder.
This class follows the architecture of the transformer encoder layer in the paper Attention is All You Need. Users can instantiate multiple instances of this class to stack up an encoder.
This layer will correctly compute an attention mask from an implicit Keras padding mask (for example, by passing mask_zero=True to a keras.layers.Embedding layer). See the Masking and Padding guide for more details.
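For instance, a minimal sketch of this implicit masking, assuming keras_nlp is installed; the vocabulary size, sequence length, and layer dimensions below are illustrative placeholders:

import keras
import keras_nlp

# Token id 0 is reserved for padding; mask_zero=True makes the Embedding
# layer emit a Keras padding mask that the encoder picks up automatically.
token_ids = keras.Input(shape=(None,), dtype="int32")
x = keras.layers.Embedding(input_dim=1000, output_dim=64, mask_zero=True)(token_ids)
x = keras_nlp.layers.TransformerEncoder(intermediate_dim=128, num_heads=4)(x)
masked_model = keras.Model(inputs=token_ids, outputs=x)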
Arguments

intermediate_dim: int, the hidden size of the feedforward network.
num_heads: int, the number of heads in the keras.layers.MultiHeadAttention layer.
dropout: float. The dropout value, shared by keras.layers.MultiHeadAttention and the feedforward network. Defaults to 0.
activation: string or keras.activations. The activation function of the feedforward network. Defaults to "relu".
layer_norm_epsilon: float. The epsilon value in layer normalization components. Defaults to 1e-5.
kernel_initializer: string or keras.initializers initializer. The kernel initializer for the dense and multiheaded attention layers. Defaults to "glorot_uniform".
bias_initializer: string or keras.initializers initializer. The bias initializer for the dense and multiheaded attention layers. Defaults to "zeros".
normalize_first: bool. If True, the inputs to the attention layer and the intermediate dense layer are normalized (similar to GPT-2). If set to False, outputs of the attention layer and intermediate dense layer are normalized (similar to BERT). Defaults to False. (See the stacked-encoder sketch after the example below.)
**kwargs: other keyword arguments passed to keras.layers.Layer, including name, trainable, dtype etc.

Example
import keras
import keras_nlp
import numpy as np

# Create a single transformer encoder layer.
encoder = keras_nlp.layers.TransformerEncoder(
intermediate_dim=64, num_heads=8)
# Create a simple model containing the encoder.
input = keras.Input(shape=(10, 64))
output = encoder(input)
model = keras.Model(inputs=input, outputs=output)
# Call encoder on the inputs.
input_data = np.random.uniform(size=(2, 10, 64))
output = model(input_data)
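As an additional sketch (not part of the original example), several encoder layers can be stacked, here with pre-layer normalization enabled via normalize_first; the depth, dimensions, and dropout value are arbitrary choices for illustration:

import keras
import keras_nlp

# Stack three encoder layers with pre-layer normalization (GPT-2 style).
x = keras.Input(shape=(10, 64))
y = x
for _ in range(3):
    y = keras_nlp.layers.TransformerEncoder(
        intermediate_dim=128,
        num_heads=8,
        dropout=0.1,
        normalize_first=True,
    )(y)
stacked_model = keras.Model(inputs=x, outputs=y)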
References

Vaswani et al., 2017 (Attention is All You Need)
call method
TransformerEncoder.call(
inputs, padding_mask=None, attention_mask=None, training=None
)
Forward pass of the TransformerEncoder.
Arguments

inputs: a Tensor. The input data to TransformerEncoder, should be of shape [batch_size, sequence_length, hidden_dim].
padding_mask: a boolean Tensor. It indicates if the token should be masked because the token is introduced due to padding. padding_mask should have shape [batch_size, sequence_length].
attention_mask: a boolean Tensor. Customized mask used to mask certain tokens. attention_mask should have shape [batch_size, sequence_length, sequence_length].
training: a boolean indicating whether the layer should behave in training mode (applying dropout) or in inference mode (no dropout).

Returns
A Tensor of the same shape as the inputs.
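For example, a minimal sketch of calling the layer with an explicit boolean padding mask; the shapes, mask values, and variable names are illustrative:

import numpy as np
import keras_nlp

encoder = keras_nlp.layers.TransformerEncoder(intermediate_dim=64, num_heads=8)
inputs = np.random.uniform(size=(2, 10, 64)).astype("float32")
# Mark the last four positions of every sequence as padding (False = masked).
padding_mask = np.ones((2, 10), dtype=bool)
padding_mask[:, 6:] = False
outputs = encoder(inputs, padding_mask=padding_mask)  # same shape as inputs: (2, 10, 64)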