Questions tagged [seq2seq]
For questions related to sequence-to-sequence (seq2seq) machine learning models/architectures, used e.g. in machine translation.
32 questions
0 votes
0 answers
45 views
Best neural network algorithms/architectures for generating synthetic sequences of tuples of words
I would like to generate sequences of tuples using a neural network algorithm such that the model trains on a dataset of sequences of tuples and generates synthetic sequences of tuples. Each tuple <...
0 votes
1 answer
90 views
Probability interpretation of attention mechanism in Seq2Seq
I have read many explanations of the seq2seq model. In my opinion, however, it is really like a robot that might say something correctly, but doesn't really understand it, just as is true with an LLM ...
1 vote
0 answers
122 views
How to Interpret Cross Attention
I am a bit confused on what cross attention mechanisms are doing. I understand that the currently decoded output is usually the query and the conditioning/input (from an encoder) is the key and value. ...
7 votes
2 answers
3k views
What are the differences between seq2seq and encoder-decoder architectures?
I've read many tutorials online that use both words interchangeably. When I search and find that they are the same, why not just use one word since they have the same definition?
1 vote
0 answers
79 views
Modifying Cross Entropy Loss to work with multiple correct target sequences?
Let's say I'm training a transformer model to perform a seq2seq task, but there are multiple correct answers. For example, the following outputs would all be considered correct: source: A B C -> ...
0 votes
1 answer
114 views
How to train a seq2seq model to rephrase input text following given rules
I want to train (fine-tune) a seq2seq model to perform the task of rephrasing input following these rules: 1- always follow the pattern "Entity Verb Entity" 2- only use simple sentences: ...
0 votes
1 answer
104 views
How do GPT-like decoder-only conversational models distinguish the source of text?
In a conversational setting where two sources of text (the user and the model) follow each other like below User: some text bla bla Model: another text bah bah User: bla bla bla Model: bah bah and so on, ...
0 votes
1 answer
58 views
Seq2Seq model - Confusion about the dimensions of a Seq2Seq model [closed]
I am new to Seq2Seq and hope to find proper guidance and advice. I am doing a project from an online course, so I cannot share the material, but my project notebook is on GitHub. I want to ask ...
1 vote
0 answers
68 views
The model's accuracy suddenly becomes unreasonably good at the beginning of the training process. I need an explanation
I am practicing machine translation using a seq2seq model (more specifically, with GRU/LSTM units). The following is my first model: This model first achieved an accuracy score of about 0.03 and gradually ...
0 votes
1 answer
122 views
Seq2seq with RNNs, how is the training loop performed?
How do we train a seq2seq RNN? We input a sentence that needs to be translated. We encode it sequentially. Then the decoder outputs the first word with probabilities. We do a gradient ...
1 vote
1 answer
96 views
Why is it called a Seq2Seq model if the output is just a number?
Why is it called a Seq2Seq model if the output is just a number? For example, if you are trying to predict a movie's recommendation, and you are inputting a ...
0 votes
1 answer
513 views
Fine Tuning Transformer Model for Machine Translation
I am working on the Transformer example demonstrated on TensorFlow's website: https://www.tensorflow.org/text/tutorials/transformer In this example, a machine translation model is trained to translate ...
1 vote
1 answer
308 views
Is the decoder in a transformer Seq2Seq model non-parallelizable?
From my understanding, seq2seq models work by first computing a representation of the input sequence, and feeding this to the decoder. The decoder then predicts each token in the output sequence in an ...
3 votes
0 answers
2k views
Any models for text to JSON?
There are many sequence-to-sequence (seq2seq) models and end-to-end models, like text-to-SQL. I was wondering, are there any text-to-JSON deep learning models? For example: Text ...
1 vote
2 answers
166 views
How does Seq2Seq with attention actually use the attention (i.e., the context vector)?
For neural machine translation, there's this model "Seq2Seq with attention", also known as the "Bahdanau architecture" (a good image can be found on this page), where instead of ...