In particular, what is the computational complexity of training a bidirectional recurrent neural network, also taking into account the LSTM and GRU variants?
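As a starting point, here is a back-of-the-envelope count of the dominant matrix-vector work. It is a sketch under stated assumptions: a single layer, standard gate equations (1 weight block for a vanilla RNN, 3 for a GRU, 4 for an LSTM), and elementwise operations (sigmoid, tanh, Hadamard products) ignored because they are only O(hidden). The function names are illustrative. Under these assumptions the forward pass costs O(T·g·(h·d + h²)) per direction, where T is the sequence length, d the input size, h the hidden size and g the number of gate blocks. A bidirectional network runs two directions, so it doubles this. Backpropagation through time repeats the same matrix products, so training adds only a constant factor on top.

```python
# Back-of-the-envelope multiply-accumulate (MAC) count for a single-layer
# bidirectional RNN. Assumes standard gate equations and ignores O(hidden)
# elementwise ops; real implementations (cuDNN, fused kernels) differ in constants.

# Number of (W x + U h) weight blocks per cell type.
GATES = {"vanilla": 1, "gru": 3, "lstm": 4}


def macs_per_step(input_size: int, hidden_size: int, gates: int) -> int:
    # Each gate block computes W @ x (hidden x input) and U @ h (hidden x hidden).
    return gates * (hidden_size * input_size + hidden_size * hidden_size)


def birnn_forward_macs(cell: str, seq_len: int, input_size: int, hidden_size: int) -> int:
    # Two independent directions, each scanning all seq_len time steps.
    return 2 * seq_len * macs_per_step(input_size, hidden_size, GATES[cell])


if __name__ == "__main__":
    for cell in ("vanilla", "gru", "lstm"):
        print(cell, birnn_forward_macs(cell, seq_len=100, input_size=128, hidden_size=256))
```

For stacked bidirectional layers, the input size of every layer above the first is 2h, because the two directions' outputs are concatenated.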
I am hoping to get links to research papers that discuss or mention the computational complexity of these methods. I have been searching, but haven't come across anything meaningful so far.