solversa/tensorflow-nlp

  • The code has been run on Google Colab, which provides free GPU memory

Contents


Text Classification

└── finch/tensorflow2/text_classification/imdb
    │
    ├── data
    │   └── glove.840B.300d.txt        # pretrained embedding, download and put here
    │   └── make_data.ipynb            # step 1. make data and vocab: train.txt, test.txt, word.txt
    │   └── train.txt                  # incomplete sample, format <label, text> separated by \t
    │   └── test.txt                   # incomplete sample, format <label, text> separated by \t
    │   └── train_bt_part1.txt         # (back-translated) incomplete sample, format <label, text> separated by \t
    │
    ├── vocab
    │   └── word.txt                   # incomplete sample, list of words in vocabulary
    │
    └── main
        └── attention_linear.ipynb     # step 2. train and evaluate model
        └── attention_conv.ipynb       # step 2. train and evaluate model
        └── fasttext_unigram.ipynb     # step 2. train and evaluate model
        └── fasttext_bigram.ipynb      # step 2. train and evaluate model
        └── sliced_rnn.ipynb           # step 2. train and evaluate model
        └── sliced_rnn_bt.ipynb        # step 2. train and evaluate model
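Step 1 (`make_data.ipynb`) turns the tab-separated `<label, text>` files into a word list for `word.txt`; `train_bt_part1.txt` uses the same layout. A minimal sketch of that step follows. The lowercased whitespace tokenization, the `min_freq` cutoff and the reserved `<pad>` entry at index 0 are assumptions for illustration, not necessarily what the notebook does.

```python
from collections import Counter

def parse_line(line):
    """Split one '<label>\t<text>' line into (label, tokens); label kept as a string."""
    label, text = line.rstrip("\n").split("\t", 1)
    return label, text.lower().split()  # assumed: lowercase + whitespace tokenization

def build_vocab(token_lists, min_freq=2):
    """Words with count >= min_freq, most frequent first, ties broken alphabetically.

    Index 0 is reserved for a hypothetical '<pad>' token.
    """
    counts = Counter(tok for toks in token_lists for tok in toks)
    words = sorted((w for w, c in counts.items() if c >= min_freq),
                   key=lambda w: (-counts[w], w))
    return ["<pad>"] + words

lines = ["1\tA great movie , great cast", "0\tA dull movie"]  # toy samples
parsed = [parse_line(l) for l in lines]
vocab = build_vocab([toks for _, toks in parsed])
word2idx = {w: i for i, w in enumerate(vocab)}
```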

Text Matching

└── finch/tensorflow2/text_matching/snli
    │
    ├── data
    │   └── glove.840B.300d.txt        # pretrained embedding, download and put here
    │   └── download_data.ipynb        # step 1. run this to download snli dataset
    │   └── make_data.ipynb            # step 2. run this to generate train.txt, test.txt, word.txt
    │   └── train.txt                  # incomplete sample, format <label, text1, text2> separated by \t
    │   └── test.txt                   # incomplete sample, format <label, text1, text2> separated by \t
    │
    ├── vocab
    │   └── word.txt                   # incomplete sample, list of words in vocabulary
    │
    └── main
        └── dam.ipynb                  # step 3. train and evaluate model
        └── esim.ipynb                 # step 3. train and evaluate model
        └── ......
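DAM (decomposable attention) and ESIM, the two models named in `main`, both compare `text1` and `text2` by soft-aligning them with dot-product attention: every token of one sentence is replaced by an attention-weighted sum of the other sentence's tokens. A NumPy sketch of that alignment step, assuming both sentences are already encoded as `(length, dim)` matrices. DAM first passes tokens through a feed-forward network and ESIM through a BiLSTM; both encoders are omitted, so this is not the notebooks' code.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_align(a, b):
    """Soft alignment between encoded sentences a: (len_a, d) and b: (len_b, d)."""
    e = a @ b.T                          # (len_a, len_b) pairwise similarities
    a_tilde = softmax(e, axis=1) @ b     # each a_i summarized by attention over b
    b_tilde = softmax(e, axis=0).T @ a   # each b_j summarized by attention over a
    return a_tilde, b_tilde

rng = np.random.default_rng(0)
premise = rng.normal(size=(4, 8))        # 4 tokens, 8-dim encodings (toy values)
hypothesis = rng.normal(size=(6, 8))
p_aligned, h_aligned = soft_align(premise, hypothesis)
print(p_aligned.shape, h_aligned.shape)  # (4, 8) (6, 8)
```

Both models then compare each token with its aligned summary (DAM via a feed-forward "compare" network, ESIM via concatenation, difference and element-wise product) before pooling into a sentence-pair representation.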

└── finch/tensorflow2/text_matching/chinese
    │
    ├── data
    │   └── make_data.ipynb            # step 1. run this to generate char.txt and char.npy
    │   └── train.csv                  # incomplete sample, format <text1, text2, label> separated by commas
    │   └── test.csv                   # incomplete sample, format <text1, text2, label> separated by commas
    │
    ├── vocab
    │   └── cc.zh.300.vec              # pretrained embedding, download and put here
    │   └── char.txt                   # incomplete sample, list of Chinese characters
    │   └── char.npy                   # saved pretrained embedding matrix for this task
    │
    └── main
        └── pyramid.ipynb              # step 2. train and evaluate model
        └── esim.ipynb                 # step 2. train and evaluate model
        └── ......
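`make_data.ipynb` saves `char.npy`, an embedding matrix aligned row-by-row with `char.txt`, built from the fastText file `cc.zh.300.vec`. A sketch of that step, assuming the standard fastText `.vec` text layout (a `<count> <dim>` header line, then one token followed by its values per line). The random uniform initialization for characters missing from the file and the zero padding row at index 0 are assumptions.

```python
import numpy as np

def load_embedding_matrix(vec_lines, vocab, dim=300, seed=0):
    """Build a (len(vocab), dim) float32 matrix whose row i embeds vocab[i]."""
    word2idx = {w: i for i, w in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    # Characters absent from the .vec file keep a small random vector (assumption)
    matrix = rng.uniform(-0.05, 0.05, size=(len(vocab), dim)).astype(np.float32)
    lines = iter(vec_lines)
    next(lines)  # skip the "<count> <dim>" header line
    for line in lines:
        parts = line.rstrip().split(" ")
        token, values = parts[0], parts[1:]
        if token in word2idx and len(values) == dim:
            matrix[word2idx[token]] = np.asarray(values, dtype=np.float32)
    matrix[0] = 0.0  # index 0 assumed to be the padding entry
    return matrix

toy_vec = ["2 3", "你 0.1 0.2 0.3", "好 1 2 3"]  # toy .vec content with dim=3
m = load_embedding_matrix(toy_vec, ["<pad>", "你", "好", "吗"], dim=3)
```

With the real file, an open handle works as `vec_lines`, e.g. `load_embedding_matrix(open("cc.zh.300.vec", encoding="utf-8"), chars)`, and the result can be stored with `np.save("char.npy", matrix)`.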

Topic Modelling


Spoken Language Understanding

└── finch/tensorflow2/spoken_language_understanding/atis
    │
    ├── data
    │   └── glove.840B.300d.txt        # pretrained embedding, download and put here
    │   └── make_data.ipynb            # step 1. run this to generate vocab: word.txt, intent.txt, slot.txt
    │   └── atis.train.w-intent.iob    # incomplete sample, format <text, slot, intent>
    │   └── atis.test.w-intent.iob     # incomplete sample, format <text, slot, intent>
    │
    ├── vocab
    │   └── word.txt                   # list of words in vocabulary
    │   └── intent.txt                 # list of intents in vocabulary
    │   └── slot.txt                   # list of slots in vocabulary
    │
    └── main
        └── bigru.ipynb                # step 2. train and evaluate model
        └── bigru_self_attn.ipynb      # step 2. train and evaluate model
        └── transformer.ipynb          # step 2. train and evaluate model
        └── transformer_elu.ipynb      # step 2. train and evaluate model
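Step 1 reads the `.iob` files and collects words, intents and slots for the three vocabulary files. A parsing sketch, assuming the layout of the commonly distributed ATIS `.iob` files: the text wrapped in `BOS`/`EOS` markers, a tab, then one tag per token where the tag in the `EOS` position is the intent label. If the files in `data` are laid out differently, the split logic has to change; the helper names are hypothetical.

```python
def parse_atis_line(line):
    """Return (words, slot_tags, intent) for one ATIS .iob line (assumed layout)."""
    text, labels = line.rstrip("\n").split("\t")
    words, tags = text.split(), labels.split()
    if len(words) != len(tags):
        raise ValueError("expected exactly one tag per token")
    intent = tags[-1]                        # label sitting in the EOS position
    return words[1:-1], tags[1:-1], intent   # drop BOS/EOS and their tags

def build_label_vocab(labels):
    """Sorted, de-duplicated label list, e.g. for intent.txt or slot.txt."""
    return sorted(set(labels))

line = ("BOS i want to fly from boston to denver EOS\t"
        "O O O O O O B-fromloc.city_name O B-toloc.city_name atis_flight")
words, slots, intent = parse_atis_line(line)
slot_vocab = build_label_vocab(slots)
```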

Generative Dialog

└── finch/tensorflow1/free_chat/chinese_qingyun
    │
    ├── data
    │   └── raw_data.csv               # raw data downloaded from an external source
    │   └── make_data.ipynb            # step 1. run this to generate vocab {char.txt} and data {train.txt & test.txt}
    │   └── train.txt                  # processed text file generated by {make_data.ipynb}
    │
    ├── vocab
    │   └── char.txt                   # list of Chinese characters in the vocabulary
    │   └── cc.zh.300.vec              # fastText pretrained embedding downloaded from an external source
    │   └── char.npy                   # Chinese characters and their embedding values (300 dim)
    │
    └── main
        └── lstm_seq2seq_train.ipynb   # step 2. train and evaluate model
        └── lstm_seq2seq_export.ipynb  # step 3. export model
        └── lstm_seq2seq_infer.ipynb   # step 4. model inference
        └── transformer_train.ipynb    # step 2. train and evaluate model
        └── transformer_export.ipynb   # step 3. export model
        └── transformer_infer.ipynb    # step 4. model inference
└── FreeChatInference
    │
    ├── data
    │   └── transformer_export/
    │   └── char.txt
    │   └── libtensorflow-1.14.0.jar
    │   └── tensorflow_jni.dll
    │
    └── src
        └── ModelInference.java
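The chinese_qingyun seq2seq models work at the character level (`char.txt`). A sketch of turning one question/answer pair into encoder input, decoder input and decoder target for teacher-forced training: the decoder sees the answer shifted right behind a start token and must predict it followed by an end token. The special-token names `<pad>`, `<unk>`, `<start>`, `<end>` and their positions are hypothetical.

```python
SPECIALS = ["<pad>", "<unk>", "<start>", "<end>"]  # hypothetical special tokens

def build_char_vocab(texts):
    """Special tokens first, then every distinct character in sorted order."""
    return SPECIALS + sorted({ch for text in texts for ch in text})

def make_example(question, answer, char2idx):
    """Encoder input, decoder input (shifted right) and decoder target."""
    unk = char2idx["<unk>"]
    enc_in = [char2idx.get(ch, unk) for ch in question]
    ans = [char2idx.get(ch, unk) for ch in answer]
    dec_in = [char2idx["<start>"]] + ans   # fed to the decoder during training
    dec_out = ans + [char2idx["<end>"]]    # what the decoder predicts at each step
    return enc_in, dec_in, dec_out

vocab = build_char_vocab(["你好", "你好呀"])  # toy corpus
char2idx = {ch: i for i, ch in enumerate(vocab)}
enc, dec_in, dec_out = make_example("你好", "你好呀", char2idx)
```

At inference time there is no answer to shift, so decoding starts from `<start>` alone and feeds each predicted character back in until `<end>` appears.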

Semantic Parsing

└── finch/tensorflow2/semantic_parsing/tree_slu
    │
    ├── data
    │   └── glove.840B.300d.txt        # pretrained embedding, download and put here
    │   └── make_data.ipynb            # step 1. run this to generate vocab: source.txt, target.txt
    │   └── train.tsv                  # incomplete sample, format <text, tokenized_text, tree>
    │   └── test.tsv                   # incomplete sample, format <text, tokenized_text, tree>
    │
    ├── vocab
    │   └── source.txt                 # list of words in vocabulary for source (of seq2seq)
    │   └── target.txt                 # list of words in vocabulary for target (of seq2seq)
    │
    └── main
        └── lstm_seq2seq_tf_addons.ipynb   # step 2. train and evaluate model
        └── ......
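The seq2seq model translates `tokenized_text` (source) into `tree` (target). Assuming the tree column uses the bracketed intent/slot notation of Facebook's TOP dataset, the target side can be tokenized on whitespace so that `[IN:...`, `[SL:...` and `]` become vocabulary entries, and a bracket-balance check gives a cheap sanity test for decoded trees. The example tree below is made up, and all helper names are hypothetical.

```python
def parse_tsv_line(line):
    """Split '<text>\t<tokenized_text>\t<tree>' into source and target token lists."""
    _text, tokenized_text, tree = line.rstrip("\n").split("\t")
    # In the bracketed notation, '[IN:...', '[SL:...' and ']' are whitespace-separated
    return tokenized_text.split(), tree.split()

def is_well_formed(target_tokens):
    """True if every opening '[LABEL' token is closed by a matching ']'."""
    depth = 0
    for tok in target_tokens:
        if tok.startswith("["):
            depth += 1
        elif tok == "]":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def tree_labels(target_tokens):
    """Intent and slot labels in order of appearance, e.g. 'IN:GET_DIRECTIONS'."""
    return [tok[1:] for tok in target_tokens if tok.startswith("[")]

line = ("Directions to home\tDirections to home\t"
        "[IN:GET_DIRECTIONS Directions to [SL:DESTINATION home ] ]")
src, tgt = parse_tsv_line(line)
```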

Knowledge Graph Inference

└── finch/tensorflow2/knowledge_graph_completion/wn18
    │
    ├── data
    │   └── download_data.ipynb        # step 1. run this to download wn18 dataset
    │   └── make_data.ipynb            # step 2. run this to generate vocabulary: entity.txt, relation.txt
    │   └── wn18                       # wn18 folder (will be auto created by download_data.ipynb)
    │       └── train.txt              # incomplete sample, format <entity1, relation, entity2> separated by \t
    │       └── valid.txt              # incomplete sample, format <entity1, relation, entity2> separated by \t
    │       └── test.txt               # incomplete sample, format <entity1, relation, entity2> separated by \t
    │
    ├── vocab
    │   └── entity.txt                 # incomplete sample, list of entities in vocabulary
    │   └── relation.txt               # incomplete sample, list of relations in vocabulary
    │
    └── main
        └── distmult_1-N.ipynb         # step 3. train and evaluate model
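The notebook name combines two ideas. DistMult scores a triple (h, r, t) as the trilinear product `score = sum_k e_h[k] * w_r[k] * e_t[k]`, and 1-N scoring evaluates one (head, relation) query against every entity at once, training with binary cross-entropy against a multi-hot vector of known tails. A NumPy sketch of both with toy sizes and random embeddings; it illustrates the technique and is not the notebook's code.

```python
import numpy as np

def distmult_score(e_h, w_r, e_t):
    """DistMult score of individual triples: sum_k e_h[k] * w_r[k] * e_t[k]."""
    return np.sum(e_h * w_r * e_t, axis=-1)

def one_to_n_logits(entity_emb, relation_emb, heads, rels):
    """Score every entity as a tail for each (head, relation) query."""
    query = entity_emb[heads] * relation_emb[rels]  # (batch, dim)
    return query @ entity_emb.T                     # (batch, num_entities)

def one_to_n_bce(logits, targets):
    """Mean binary cross-entropy with logits; targets is a multi-hot matrix.

    Uses the identity -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))] = log(1+e^x) - t*x.
    """
    return np.mean(np.logaddexp(0.0, logits) - targets * logits)

rng = np.random.default_rng(0)
num_entities, num_relations, dim = 5, 2, 4          # toy sizes
E = rng.normal(size=(num_entities, dim))
R = rng.normal(size=(num_relations, dim))
heads, rels = np.array([0, 3]), np.array([1, 0])
logits = one_to_n_logits(E, R, heads, rels)
targets = np.zeros_like(logits)
targets[0, 2] = targets[1, 4] = 1.0                 # known true tails (toy data)
loss = one_to_n_bce(logits, targets)
```

Scoring all tails with one matrix product is what makes 1-N training cheap compared with sampling negatives triple by triple.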

Knowledge Graph Tools


Question Answering

└── finch/tensorflow1/question_answering/babi
    │
    ├── data
    │   └── make_data.ipynb                     # step 1. run this to generate vocabulary: word.txt
    │   └── qa5_three-arg-relations_train.txt   # one complete example of the bAbI dataset
    │   └── qa5_three-arg-relations_test.txt    # one complete example of the bAbI dataset
    │
    ├── vocab
    │   └── word.txt                   # complete list of words in vocabulary
    │
    └── main
        └── dmn_train.ipynb
        └── dmn_serve.ipynb
        └── attn_gru_cell.py
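The bAbI task files are plain text: each line starts with a line ID, IDs restart at 1 when a new story begins, and question lines carry the question, the answer and the supporting-fact IDs separated by tabs. A parser sketch under that layout; the output tuple shape is an assumption, not necessarily what `make_data.ipynb` produces.

```python
def parse_babi(lines):
    """Return (story, question, answer, supporting_ids) tuples from bAbI lines.

    story lists the (line_id, sentence) pairs seen so far in the current story.
    """
    samples, story = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        line_id, text = line.split(" ", 1)
        line_id = int(line_id)
        if line_id == 1:              # IDs restart at 1: a new story begins
            story = []
        if "\t" in text:              # question line: question \t answer \t supporting IDs
            question, answer, support = text.split("\t")
            samples.append((list(story), question.strip(), answer.strip(),
                            [int(s) for s in support.split()]))
        else:
            story.append((line_id, text))
    return samples

demo = ["1 Mary went to the kitchen.",            # toy lines in bAbI layout
        "2 John got the football there.",
        "3 Where is Mary? \tkitchen\t1",
        "1 Sandra moved to the office.",
        "2 Where is Sandra? \toffice\t1"]
samples = parse_babi(demo)
```

Supporting IDs refer to line IDs within the story, so they can be matched against the `line_id` kept with each sentence.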

Text Processing Tools


Recommender System

└── finch/tensorflow1/recommender/movielens
    │
    ├── data
    │   └── make_data.ipynb            # run this to generate vocabulary
    │
    ├── vocab
    │   └── user_job.txt
    │   └── user_id.txt
    │   └── user_gender.txt
    │   └── user_age.txt
    │   └── movie_types.txt
    │   └── movie_title.txt
    │   └── movie_id.txt
    │
    └── main
        └── dnn_softmax.ipynb
        └── ......
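`make_data.ipynb` writes one vocabulary file per feature. A sketch of how such files can turn raw MovieLens fields into model inputs: integer lookups for categorical features such as gender or job, and a multi-hot vector for genres. It assumes `movie_types.txt` holds genres and that genre strings are `|`-separated as in MovieLens 1M; the extra out-of-vocabulary id and the helper names are hypothetical.

```python
import numpy as np

def load_vocab(lines):
    """Map each non-empty line of a vocab file to its position."""
    entries = [line.strip() for line in lines if line.strip()]
    return {value: i for i, value in enumerate(entries)}

def lookup(vocab, value):
    """Integer id for a categorical value; unseen values share one extra OOV id."""
    return vocab.get(value, len(vocab))

def multi_hot(genres, genre_vocab):
    """Multi-hot vector for a '|'-separated genre string such as 'Action|Drama'."""
    vec = np.zeros(len(genre_vocab), dtype=np.float32)
    for genre in genres.split("|"):
        if genre in genre_vocab:
            vec[genre_vocab[genre]] = 1.0
    return vec

genre_vocab = load_vocab(["Action\n", "Comedy\n", "Drama\n"])  # toy movie_types.txt
gender_vocab = load_vocab(["F\n", "M\n"])                      # toy user_gender.txt
```

Because unseen values map to `len(vocab)`, an embedding table for each categorical feature needs `len(vocab) + 1` rows under this convention.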

Multi-turn Dialogue Rewriting

└── finch/tensorflow1/multi_turn_rewrite/chinese/
    │
    ├── data
    │   └── make_data.ipynb            # run this to generate vocab, split train & test data, make pretrained embedding
    │   └── corpus.txt                 # original data downloaded from an external source
    │   └── train_pos.txt              # processed positive training data after {make_data.ipynb}
    │   └── train_neg.txt              # processed negative training data after {make_data.ipynb}
    │   └── test_pos.txt               # processed positive testing data after {make_data.ipynb}
    │   └── test_neg.txt               # processed negative testing data after {make_data.ipynb}
    │
    ├── vocab
    │   └── cc.zh.300.vec              # fastText pretrained embedding downloaded from an external source
    │   └── char.npy                   # Chinese characters and their embedding values (300 dim)
    │   └── char.txt                   # list of Chinese characters used in this project
    │
    └── main
        └── baseline_lstm_train.ipynb
        └── baseline_lstm_export.ipynb
        └── baseline_lstm_predict.ipynb
└── MultiDialogInference
    │
    ├── data
    │   └── baseline_lstm_export/
    │   └── char.txt
    │   └── libtensorflow-1.14.0.jar
    │   └── tensorflow_jni.dll
    │
    └── src
        └── ModelInference.java

Knowledge Base Question Answering

About

Building Blocks for NLP and Text Generation in TensorFlow 2.x / 1.x

Languages

  • Jupyter Notebook 98.2%
  • Python 1.1%
  • Other 0.7%