5 events
Feb 28, 2023 at 20:09 comment added noe You save the memory of the one-hot vectors, which have dimensions 190000 × seq. length × batch size.
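For a rough sense of scale, here is a minimal sketch of that memory cost; the sequence length and batch size below are illustrative assumptions, not values from the thread:

```python
# Rough memory estimate for a float32 one-hot batch.
vocab_size = 190_000       # from the thread
seq_len = 128              # hypothetical sequence length
batch_size = 32            # hypothetical batch size
bytes_per_float32 = 4

one_hot_bytes = vocab_size * seq_len * batch_size * bytes_per_float32
print(f"{one_hot_bytes / 1e9:.1f} GB")  # ~3.1 GB for a single batch
```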
Feb 28, 2023 at 19:45 comment added acxcv Sorry, I can't quite follow. I don't understand how your suggestion (a one-hot vector multiplied by a matrix) differs from the linear-layer approach in my question. Can you elaborate?
Feb 28, 2023 at 19:33 comment added noe I meant trainable embeddings, not pre-trained ones. Multiplying a one-hot vector by a matrix is equivalent to an embedding layer; the embedding layer just avoids the memory spent on the one-hot vectors.
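A minimal PyTorch sketch of the equivalence noe describes (the vocabulary size comes from the thread; the embedding dimension and token id are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

vocab_size, emb_dim = 190_000, 300      # emb_dim is an illustrative choice
weights = torch.randn(vocab_size, emb_dim)

token_id = torch.tensor([42])           # hypothetical token index

# One-hot route: materialises a vocab_size-wide float vector per token.
one_hot = F.one_hot(token_id, num_classes=vocab_size).float()
via_matmul = one_hot @ weights          # shape: (1, emb_dim)

# Embedding route: a direct, memory-cheap row lookup.
via_lookup = F.embedding(token_id, weights)  # shape: (1, emb_dim)

assert torch.allclose(via_matmul, via_lookup)
```

The assertion holds because multiplying by a one-hot vector simply selects row `token_id` of `weights`, which is exactly what the embedding lookup does without ever allocating the one-hot tensor.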
Feb 28, 2023 at 19:19 comment added acxcv Thanks for your answer. As part of the project, I will be using Word2Vec embeddings, which I expect to produce the best results. However, I would like to investigate the difference in performance compared to a naive encoding like one-hot. The problem is that one-hot encoding with this dimensionality becomes too expensive with my data, even with very small batches.
Feb 28, 2023 at 18:56 history answered noe CC BY-SA 4.0