ColossalAIOptimWrapper
- class mmengine._strategy.colossalai.ColossalAIOptimWrapper(optimizer, booster=None, accumulative_counts=1)[source]
OptimWrapper for ColossalAI.
- The available optimizers are:
CPUAdam
FusedAdam
FusedLAMB
FusedSGD
HybridAdam
Lamb
Lars
You can find more details in the ColossalAI tutorial; a minimal construction sketch follows the parameter list below.
- Parameters:
optimizer (dict or torch.optim.Optimizer) – The optimizer to be wrapped.
accumulative_counts (int) – The number of iterations to accumulate gradients. The parameters will be updated per accumulative_counts.
booster (Booster, optional) – The ColossalAI Booster instance. Defaults to None.
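The snippet below is a minimal construction sketch, not part of the original docstring. It assumes ColossalAI is installed; the nn.Linear toy model and the lr value are placeholders, HybridAdam is one of the optimizers listed above, and booster is left at its default because ColossalAIStrategy normally supplies the Booster before training starts:

    import torch.nn as nn
    from colossalai.nn.optimizer import HybridAdam

    from mmengine._strategy.colossalai import ColossalAIOptimWrapper

    # Toy model, only used to provide parameters for the optimizer.
    model = nn.Linear(16, 4)

    # HybridAdam is one of the ColossalAI optimizers listed above.
    optimizer = HybridAdam(model.parameters(), lr=1e-3)

    # Accumulate gradients over 4 iterations before each parameter update.
    # booster is left as None here; ColossalAIStrategy normally injects the
    # Booster before training starts.
    optim_wrapper = ColossalAIOptimWrapper(
        optimizer=optimizer, accumulative_counts=4)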
- backward(loss, **kwargs)[source]
Perform gradient back propagation.
Provide a unified backward interface compatible with automatic mixed precision training. Subclasses can override this method to implement the required logic. For example, torch.cuda.amp requires some extra operations on GradScaler during the backward process.
Note: If subclasses inherit from OptimWrapper and override backward, _inner_count += 1 must be implemented.
- Parameters:
loss (torch.Tensor) – The loss of the current iteration.
kwargs – Keyword arguments passed to torch.Tensor.backward().
- Return type:
None
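The sketch below shows how backward typically participates in a training step. It is illustrative only: data_loader and loss_fn are hypothetical placeholders, and update_params (inherited from OptimWrapper) is used because it calls backward internally and only steps the optimizer once the accumulation count is reached:

    # Placeholder training loop; `data_loader` and `loss_fn` are hypothetical,
    # while `optim_wrapper` and `model` come from the construction sketch above.
    for inputs, targets in data_loader:
        loss = loss_fn(model(inputs), targets)
        # update_params() calls backward(loss) and, once the accumulation
        # count is reached, step() followed by zero_grad().
        optim_wrapper.update_params(loss)

Calling optim_wrapper.backward(loss) directly is also possible; given the booster argument, the backward pass is expected to be routed through ColossalAI's Booster so that mixed precision and ZeRO bookkeeping are handled consistently.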
- optim_context(model)[source]
A context manager for gradient accumulation and automatic mixed precision training.
If subclasses need to enable the context for mixed precision training, e.g. AmpOptimWrapper, the corresponding context should be enabled in optim_context. Since OptimWrapper uses default fp32 training, optim_context will only enable the context for blocking unnecessary gradient synchronization during gradient accumulation.
If the model is an instance with a no_sync method (which means blocking the gradient synchronization) and self._accumulative_counts != 1, the model will not automatically synchronize gradients if cur_iter is divisible by self._accumulative_counts. Otherwise, this method will enable an empty context.
- Parameters:
model (nn.Module) – The training model.
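The sketch below shows a typical use of optim_context during gradient accumulation, mirroring the pattern documented for OptimWrapper. The names data_loader and loss_fn are hypothetical placeholders, and the loop assumes accumulative_counts > 1 and a model that exposes a no_sync method (e.g. a DistributedDataParallel wrapper):

    # Placeholder loop; assumes `accumulative_counts > 1` and that `model`
    # exposes a `no_sync` method (e.g. DistributedDataParallel).
    for inputs, targets in data_loader:
        with optim_wrapper.optim_context(model):
            # On accumulation-only iterations, gradient synchronization is
            # blocked here, avoiding redundant all-reduce communication.
            loss = loss_fn(model(inputs), targets)
        optim_wrapper.update_params(loss)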