The computational complexity of simple single-layer recurrent networks, whether vanilla RNNs, LSTMs, or GRUs, is linear in the length of the input sequence, both at training time and at inference time, i.e. $O(n)$, where $n$ is the length of the input sequence. This is because, in order to obtain the output at the last time step, you need to compute the outputs at all the previous time steps.

This assumes a single output time step. If there are multiple output time steps, the complexity is linear in the sum of the input and output lengths.
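To make the linear scaling concrete, here is a minimal sketch of a vanilla RNN forward pass (plain NumPy, with made-up dimensions purely for illustration): the single loop over the $n$ time steps is what gives the $O(n)$ cost, since each hidden state depends on the previous one and cannot be computed out of order.

```python
import numpy as np

def rnn_forward(x, W_xh, W_hh, b_h):
    """Vanilla RNN forward pass over a sequence x of shape (n, input_dim).

    Each step needs the previous hidden state, so the n steps cannot be
    skipped or reordered: the cost is linear in the sequence length.
    """
    n, _ = x.shape
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim)
    outputs = []
    for t in range(n):  # one pass per time step -> O(n)
        h = np.tanh(x[t] @ W_xh + h @ W_hh + b_h)
        outputs.append(h)
    return np.stack(outputs)

# Toy usage: doubling the sequence length doubles the number of iterations.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16
W_xh = rng.normal(size=(input_dim, hidden_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
x = rng.normal(size=(100, input_dim))  # n = 100 time steps
print(rnn_forward(x, W_xh, W_hh, b_h).shape)  # (100, 16)
```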

Bear in mind that, inside LSTMs and GRUs, there are internal steps (the gate computations) that add a constant factor to the per-time-step cost; this does not change the $O(n)$ dependence on sequence length.
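For example, in the standard LSTM formulation, each time step computes four gate/candidate pre-activations instead of the single one of a vanilla RNN, which is roughly a constant factor of 4 per step (about 3 for a GRU):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$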

You can complicate the network architecture in many different ways (more layers, skip connections, etc.), and this can affect its computational complexity. Here you can find an in-depth study of the computational complexity of different architectural variations.
