Word Embeddings #125
Conversation
Created a section for the introduction to word embeddings, and also started writing the tutorial for creating an Embedding layer.
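For anyone following along, here's a minimal sketch of what an Embedding layer lookup does; the sizes are illustrative and not taken from the notebook:

```python
import tensorflow as tf

# Illustrative sizes only: a 1,000-word vocabulary embedded into 5 dimensions.
embedding_layer = tf.keras.layers.Embedding(input_dim=1000, output_dim=5)

# Looking up a batch of integer word indices returns one dense vector per index.
result = embedding_layer(tf.constant([[0, 1, 2], [3, 4, 5]]))
print(result.shape)  # (2, 3, 5): batch size, sequence length, embedding dim
```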
…nd training for the second model; plotted training history for both
Thank you for starting this! Here's a round of edits. I think it's almost ready to go. Could you take a look and see if there's anything we can improve in this version? When you're happy with it, please submit a PR to the TF Docs repo, and we can continue refining there. Before we publish, I'd like to add some graphics of the embedding projector, add more references to educational resources, and improve the intro to embeddings, but we can work on those changes in the docs PR whenever you're ready. Thanks again!
So there's good news and bad news.
👍 The good news is that everyone who needs to sign a CLA (the pull request submitter and all commit authors) has done so. Everything is all good there.
😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are OK with their commits being contributed to this project. Please have them confirm that here in the pull request.
Note to project maintainer: This is a terminal state, meaning the…
Please remove the .ipynb_checkpoints/ directory from the PR.
I'd prefer to avoid the use of Flatten because it ties your model to a specific sequence length. The model has to learn the meaning of the embedding at each location.
I would use GlobalAveragePooling, an RNN, a CNN, or an attention module to squeeze the sequence to a fixed size and enforce the idea that the absolute position of a word in a sequence doesn't matter (much).
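For example, something along these lines (sizes and the binary-classification head are illustrative, not the exact architecture in the notebook):

```python
import tensorflow as tf

vocab_size = 10000   # illustrative
embedding_dim = 16   # illustrative

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    # Average over the time dimension: the output no longer depends on the
    # sequence length, and absolute word positions are ignored.
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
```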
There is a lot of overlap between this and basic_text_classification.ipynb. The new content seems to mainly be the setup to use the TensorBoard embedding projector. Is there a way we can put more emphasis on that and reduce the duplication? Should this maybe be a sub-section of basic_text_classification.ipynb?
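For context, the projector-specific part is roughly this export step. A rough sketch, assuming a trained `model` whose first layer is the Embedding and a hypothetical `vocab` list mapping each index to its word:

```python
import io

# Assumes `model` has the Embedding as its first layer and `vocab` is a
# hypothetical list mapping integer index -> word.
weights = model.layers[0].get_weights()[0]   # shape: (vocab_size, embedding_dim)

with io.open('vectors.tsv', 'w', encoding='utf-8') as out_v, \
     io.open('metadata.tsv', 'w', encoding='utf-8') as out_m:
    for index, word in enumerate(vocab):
        out_m.write(word + '\n')
        out_v.write('\t'.join(str(x) for x in weights[index]) + '\n')

# vectors.tsv / metadata.tsv can then be loaded in TensorBoard's projector
# tab or at https://projector.tensorflow.org.
```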
Could we add an inline plot for people who are only running in Colab?
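Something like this would do, assuming a `history` object returned by `model.fit` with validation data (TF 2.x metric key names assumed):

```python
import matplotlib.pyplot as plt

# Assumes `history` comes from model.fit(..., validation_data=...) and that
# the metric keys use the TF 2.x names ('accuracy' / 'val_accuracy').
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```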
Thanks for the feedback! Good comments. Thinking about this more, how does the following sound?
From there, we can include a few diagrams helping to communicate the basic idea of embeddings. WDYT?
I will review the comments above and implement in a little bit. Thanks, guys.
@random-forests do you need to provide consent for the CLA above?
@random-forests kindly take a look at the recent updates. Thanks.
@MarkDaoust kindly review recent changes, thanks.
Hi Robert, thanks again for working on this! I will circle back in a week or so (been OOO for a bit, please stay tuned).
I'm only a few months late here (wowzers, my bad!). I'm going to close this PR out and work on a new version for a bit. Will definitely credit you when we get it out! Thanks again for your help, and sorry again for the delay.
This is the first version of our Keras embeddings tutorial.
https://colab.sandbox.google.com/github/securetorobert/docs/blob/master/site/en/tutorials/keras/intro_word_embeddings.ipynb