ViTDetBackbone class

```python
keras_cv.models.ViTDetBackbone(
    include_rescaling,
    input_shape=(1024, 1024, 3),
    input_tensor=None,
    patch_size=16,
    embed_dim=768,
    depth=12,
    mlp_dim=3072,
    num_heads=12,
    out_chans=256,
    use_bias=True,
    use_abs_pos=True,
    use_rel_pos=True,
    window_size=14,
    global_attention_indices=[2, 5, 8, 11],
    layer_norm_epsilon=1e-06,
    **kwargs
)
```
A ViT image encoder that uses a windowed transformer encoder and relative positional encodings.
Arguments

- include_rescaling: bool, whether to rescale the inputs. If set to True, inputs will be passed through a Rescaling(1/255.0) layer. Defaults to False.
- input_shape: tuple, the size of the input image in (H, W, C) format. Defaults to (1024, 1024, 3).
- input_tensor: output of keras.layers.Input() to use as image input for the model. Defaults to None.
- patch_size: int, the patch size used to turn input images into a flattened sequence of patches. Defaults to 16.
- embed_dim: int, the latent dimensionality to be projected into in the output of each stacked windowed transformer encoder. Defaults to 768.
- depth: int, the number of transformer encoder layers to stack in the Vision Transformer. Defaults to 12.
- mlp_dim: int, the dimensionality of the hidden Dense layer in the transformer MLP head. Defaults to 768*4.
- num_heads: int, the number of heads to use in the MultiHeadAttentionWithRelativePE layer of each transformer encoder. Defaults to 12.
- out_chans: int, the number of channels (features) in the output image encodings. Defaults to 256.
- use_bias: bool, whether to use a bias when projecting the keys, queries, and values in the attention layer. Defaults to True.
- use_abs_pos: bool, whether to add absolute positional embeddings to the output patches. Defaults to True.
- use_rel_pos: bool, whether to use relative positional encodings in the attention layer. Defaults to True.
- window_size: int, the size of the window for windowed attention in the transformer encoder blocks. Defaults to 14.
- global_attention_indices: list, the indices of the encoder blocks that use global attention. Defaults to [2, 5, 8, 11].
- layer_norm_epsilon: float, the epsilon to use in the layer normalization blocks of the transformer encoders. Defaults to 1e-6.

References

- [Segment Anything paper](https://arxiv.org/abs/2304.02643)
- [Segment Anything GitHub](https://github.com/facebookresearch/segment-anything)
- [Detectron2](https://github.com/facebookresearch/detectron2)
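For orientation, here is a minimal sketch of building the backbone directly from the constructor rather than from a preset. The argument values simply restate the defaults above; the output shape in the comment is an inference from the signature (1024/16 = 64 patches per side, out_chans=256), not a documented guarantee.

```python
import numpy as np
import keras_cv

# Base configuration, spelled out explicitly (these match the defaults above).
backbone = keras_cv.models.ViTDetBackbone(
    include_rescaling=True,  # raw [0, 255] images are rescaled internally
    input_shape=(1024, 1024, 3),
    patch_size=16,
    embed_dim=768,
    depth=12,
    num_heads=12,
    global_attention_indices=[2, 5, 8, 11],
)

images = np.ones((1, 1024, 1024, 3))
features = backbone(images)
# A 1024x1024 input with patch size 16 yields a 64x64 grid of patch
# embeddings, projected to out_chans channels: (1, 64, 64, 256).
print(features.shape)
```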
from_preset method

```python
ViTDetBackbone.from_preset()
```
Instantiate ViTDetBackbone model from preset config and weights.
Arguments

- preset: string, the name of the preset to load. Must be one of the preset names listed in the table below.
- load_weights: whether to load pre-trained weights into the model. Defaults to None, which follows whether the preset has pretrained weights available.

Examples
```python
# Load architecture and weights from preset
model = keras_cv.models.ViTDetBackbone.from_preset(
    "vitdet_base_sa1b",
)

# Load randomly initialized model from preset architecture, without weights
model = keras_cv.models.ViTDetBackbone.from_preset(
    "vitdet_base_sa1b",
    load_weights=False,
)
```
| Preset name | Parameters | Description |
|---|---|---|
| vitdet_base | 89.67M | Detectron2 ViT backbone with 12 transformer encoders, an embed dim of 768, attention layers with 12 heads, and global attention on encoders 2, 5, 8, and 11. |
| vitdet_large | 308.28M | Detectron2 ViT backbone with 24 transformer encoders, an embed dim of 1024, attention layers with 16 heads, and global attention on encoders 5, 11, 17, and 23. |
| vitdet_huge | 637.03M | Detectron2 ViT backbone with 32 transformer encoders, an embed dim of 1280, attention layers with 16 heads, and global attention on encoders 7, 15, 23, and 31. |
| vitdet_base_sa1b | 89.67M | A base Detectron2 ViT backbone trained on the SA1B dataset. |
| vitdet_large_sa1b | 308.28M | A large Detectron2 ViT backbone trained on the SA1B dataset. |
| vitdet_huge_sa1b | 637.03M | A huge Detectron2 ViT backbone trained on the SA1B dataset. |
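All presets share the same 1024x1024 input resolution, so switching model scale is just a matter of the preset name. A short usage sketch follows; the shape noted in the comment is an assumption derived from the base architecture description above.

```python
import numpy as np
import keras_cv

# Pretrained base backbone; swap in "vitdet_large_sa1b" or
# "vitdet_huge_sa1b" for the larger variants.
backbone = keras_cv.models.ViTDetBackbone.from_preset("vitdet_base_sa1b")

images = np.ones((2, 1024, 1024, 3))
features = backbone(images)
print(features.shape)  # expected: (2, 64, 64, 256) for the base preset
```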
ViTDetBBackbone class

```python
keras_cv.models.ViTDetBBackbone(
    include_rescaling,
    input_shape=(1024, 1024, 3),
    input_tensor=None,
    patch_size=16,
    embed_dim=768,
    depth=12,
    mlp_dim=3072,
    num_heads=12,
    out_chans=256,
    use_bias=True,
    use_abs_pos=True,
    use_rel_pos=True,
    window_size=14,
    global_attention_indices=[2, 5, 8, 11],
    layer_norm_epsilon=1e-06,
    **kwargs
)
```
ViTDetBBackbone model.

Reference

- [Segment Anything paper](https://arxiv.org/abs/2304.02643)
For transfer learning use cases, make sure to read the guide to transfer learning & fine-tuning.
Example

```python
import numpy as np
import keras_cv

input_data = np.ones(shape=(1, 1024, 1024, 3))

# Randomly initialized backbone
model = keras_cv.models.ViTDetBBackbone()
output = model(input_data)
```
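Building on the transfer-learning note above, here is a minimal fine-tuning sketch: it freezes a pretrained backbone and trains a small classification head on top. The pooled-head design and the 10-class output are illustrative assumptions, not part of the KerasCV API.

```python
import keras
import keras_cv

# Pretrained backbone from the SA1B preset, frozen for feature extraction.
backbone = keras_cv.models.ViTDetBackbone.from_preset("vitdet_base_sa1b")
backbone.trainable = False

# Hypothetical head: pool the spatial feature map, then classify into
# an illustrative 10 classes.
model = keras.Sequential(
    [
        keras.Input(shape=(1024, 1024, 3)),
        backbone,
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(10, activation="softmax"),
    ]
)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```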
ViTDetLBackbone class

```python
keras_cv.models.ViTDetLBackbone(
    include_rescaling,
    input_shape=(1024, 1024, 3),
    input_tensor=None,
    patch_size=16,
    embed_dim=768,
    depth=12,
    mlp_dim=3072,
    num_heads=12,
    out_chans=256,
    use_bias=True,
    use_abs_pos=True,
    use_rel_pos=True,
    window_size=14,
    global_attention_indices=[2, 5, 8, 11],
    layer_norm_epsilon=1e-06,
    **kwargs
)
```
ViTDetLBackbone model.

Reference

- [Segment Anything paper](https://arxiv.org/abs/2304.02643)
For transfer learning use cases, make sure to read the guide to transfer learning & fine-tuning.
Example

```python
import numpy as np
import keras_cv

input_data = np.ones(shape=(1, 1024, 1024, 3))

# Randomly initialized backbone
model = keras_cv.models.ViTDetLBackbone()
output = model(input_data)
```
ViTDetHBackbone class

```python
keras_cv.models.ViTDetHBackbone(
    include_rescaling,
    input_shape=(1024, 1024, 3),
    input_tensor=None,
    patch_size=16,
    embed_dim=768,
    depth=12,
    mlp_dim=3072,
    num_heads=12,
    out_chans=256,
    use_bias=True,
    use_abs_pos=True,
    use_rel_pos=True,
    window_size=14,
    global_attention_indices=[2, 5, 8, 11],
    layer_norm_epsilon=1e-06,
    **kwargs
)
```
ViTDetHBackbone model.

Reference

- [Segment Anything paper](https://arxiv.org/abs/2304.02643)
For transfer learning use cases, make sure to read the guide to transfer learning & fine-tuning.
Example

```python
import numpy as np
import keras_cv

input_data = np.ones(shape=(1, 1024, 1024, 3))

# Randomly initialized backbone
model = keras_cv.models.ViTDetHBackbone()
output = model(input_data)
```
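If I read the KerasCV alias convention correctly, these size-specific classes are shorthands for the corresponding architecture presets; the assumed mapping (worth verifying against the source) is sketched below.

```python
import keras_cv

# Assumed mapping from alias classes to architecture presets
# (an assumption; follows the usual KerasCV backbone alias convention).
model_b = keras_cv.models.ViTDetBBackbone()  # ~ ViTDetBackbone.from_preset("vitdet_base")
model_l = keras_cv.models.ViTDetLBackbone()  # ~ ViTDetBackbone.from_preset("vitdet_large")
model_h = keras_cv.models.ViTDetHBackbone()  # ~ ViTDetBackbone.from_preset("vitdet_huge")
```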