I'm training a speech recognition model using the NVIDIA NeMo framework. The results with just the small FastConformer model and two dozen iterations are already pretty good; for my data I would say they are quite amazing.
However, I have noticed something strange about the validation loss: it zigzags, which I would consider normal, except that each zig and each zag spans several epochs. It looks like this:
Training is conducted with https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py, and the model config is https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/fastconformer/fast-conformer_ctc_bpe.yaml with the number of gradient accumulation steps increased to 16.
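To be concrete, the launch command looks roughly like this; the manifest and tokenizer paths are placeholders, and I'm assuming the gradient accumulation knob is the Lightning trainer.accumulate_grad_batches field exposed in the config:

```bash
python speech_to_text_ctc_bpe.py \
    --config-path=../conf/fastconformer \
    --config-name=fast-conformer_ctc_bpe \
    model.train_ds.manifest_filepath=/path/to/train_manifest.json \
    model.validation_ds.manifest_filepath=/path/to/val_manifest.json \
    model.tokenizer.dir=/path/to/tokenizer \
    model.tokenizer.type=bpe \
    trainer.accumulate_grad_batches=16
```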
What could be the reason? Is this normal? Can this be avoided somehow?
