RNNs have recurrent connections and/or layers

You can describe a recurrent neural network (RNN) or a long short-term memory (LSTM), depending on the context, at different levels of abstraction. For example, you could say that an RNN is any neural network that contains one or more recurrent (or cyclic) connections. Or you could say that layer $l$ of neural network $N$ is a recurrent layer if it contains units (or neurons) with recurrent connections, even though $N$ may not contain only recurrent layers (for example, it may also contain feedforward layers, i.e. layers whose units have only feedforward connections).
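To make "recurrent connection" concrete, a simple recurrent unit/layer is commonly written as

$$h_t = \phi(W x_t + U h_{t-1} + b),$$

where $x_t$ is the input at time step $t$, $h_t$ is the unit's output (or hidden state), $\phi$ is a non-linearity (e.g. $\tanh$), and the term $U h_{t-1}$ is the recurrent connection: the unit's previous output is fed back in as an input.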

In any case, a recurrent neural network is almost always described as a neural network (NN) and not as a layer (this should also be obvious from the name).

LSTM can refer to a unit, layer or neural network

On the other hand, depending on the context, the term "LSTM" alone can refer to:

  • an LSTM unit (or neuron),
  • an LSTM layer (many LSTM units), or
  • an LSTM neural network (a neural network with LSTM units or layers).

People may also refer to neural networks with LSTM units as LSTMs (the plural of LSTM).

LSTMs are RNNs

An LSTM unit is a recurrent unit, that is, a unit (or neuron) that contains cyclic connections, so an LSTM neural network is a recurrent neural network (RNN).

LSTM units/neurons

The main difference between an LSTM unit and a standard RNN unit is that the LSTM unit is more sophisticated. More precisely, it contains so-called gates, which are meant to better regulate the flow of information through the unit.

Here's a typical representation (or diagram) of an LSTM (more precisely, an LSTM with a so-called peephole connection).

[diagram: an LSTM unit with a peephole connection, showing the cell and the input, output and forget gates]

This can represent either an LSTM unit (in which case the variables are scalars) or an LSTM layer (in which case the variables are vectors or matrices).

You can see from this diagram that an LSTM unit (or layer) is composed of gates, denoted by

  • $i_t$ (the input gate), which regulates the input into the unit/layer,
  • $o_t$ (the output gate), which regulates the output from the unit,
  • $f_t$ (the forget gate), which regulates what the cell should forget,

and recurrent connections (e.g. the connection from the cell into the forget gate and vice versa).

It's also composed of a cell, which is the only thing that a neuron of a "vanilla" RNN contains. 
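To make the gate equations concrete, here is a minimal NumPy sketch of a single time step, assuming the standard peephole-LSTM formulation; the names (`lstm_step`, the input weights `W_*`, recurrent weights `U_*`, peephole weights `p_*` and biases `b_*`) are illustrative, not from the original paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a peephole LSTM unit/layer.

    p is a dict with input weights W_*, recurrent weights U_*,
    peephole weights p_* and biases b_* for each gate.
    """
    # Forget gate f_t: regulates what the cell should forget.
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["p_f"] * c_prev + p["b_f"])
    # Input gate i_t: regulates how much new information enters the cell.
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["p_i"] * c_prev + p["b_i"])
    # Candidate cell content, then the cell update (this is the "cell").
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])
    c_t = f_t * c_prev + i_t * c_tilde
    # Output gate o_t: regulates the output; its peephole sees the updated cell.
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["p_o"] * c_t + p["b_o"])
    # h_t is the unit's output, fed back recurrently at the next step.
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```

Note how the recurrent connections show up as the `U_* @ h_prev` and peephole (`p_* * c`) terms: outputs from the previous time step are inputs to the current one.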

To understand the details (i.e. the purpose of all these components, such as the gates), you could read the paper that originally proposed the LSTM, "Long Short-Term Memory" (1997) by S. Hochreiter and J. Schmidhuber. There are also more accessible papers, articles and video lessons on the topic, which you can find on the web.

LSTMs also have recurrent connections!

Given the presence of cyclic connections, any recurrent neural network (whether an LSTM or not) may be represented as a graph that contains one or more cyclic connections. For example, the following diagram may represent either a standard/vanilla RNN or an LSTM neural network (or a variant of it, e.g. the GRU).

[diagram: a neural network with a recurrent (cyclic) connection]
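This shared cyclic structure is also visible in code: any recurrent cell (vanilla RNN, LSTM or GRU) exposes the same step interface, and the cycle in the graph becomes a loop over time steps. A minimal sketch, reusing `np` and `lstm_step` from the block above (names and shapes are illustrative):

```python
def vanilla_rnn_step(x_t, h_prev, p):
    # A vanilla RNN unit: a single non-linearity over input and previous state.
    return np.tanh(p["W"] @ x_t + p["U"] @ h_prev + p["b"])

def unroll(step_fn, xs, init_state):
    # The cyclic connection in the graph becomes a loop over the sequence.
    state, outputs = init_state, []
    for x_t in xs:
        state = step_fn(x_t, state)
        outputs.append(state)
    return outputs

# Vanilla RNN: the recurrent state is just h_t.
#   outputs = unroll(lambda x, h: vanilla_rnn_step(x, h, params), xs, h0)
# LSTM: the recurrent state is the pair (h_t, c_t), but the loop is identical.
#   outputs = unroll(lambda x, s: lstm_step(x, s[0], s[1], params), xs, (h0, c0))
```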

When should you use RNNs and LSTMs?

RNNs are particularly suited for tasks that involve sequences (thanks to the recurrent connections). For example, they are often used for machine translation, where the sequences are sentences or words. In practice, an LSTM is often used, as opposed to a vanilla (or standard) RNN, because it tends to work better on long sequences. In fact, the LSTM was introduced to address a problem that standard RNNs suffer from, i.e. the vanishing gradient problem. (Nowadays, transformers are also used for these tasks, but the question was not about them.)
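In practice, you would rarely implement the equations by hand; deep learning libraries ship ready-made LSTM layers. For instance, a minimal PyTorch sketch (assuming PyTorch is installed; all sizes here are illustrative):

```python
import torch
import torch.nn as nn

# An LSTM layer over a batch of embedded sequences (e.g. sentences).
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(8, 20, 32)     # (batch, time steps, embedding size)
outputs, (h_n, c_n) = lstm(x)  # outputs: (8, 20, 64), one hidden state per step
```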
