I have used Keras with pre-trained word embeddings, but I am not quite sure how to do the same with a scikit-learn model.
I need to do this in sklearn as well because I am using vecstack to ensemble a Keras Sequential model and an sklearn model.
This is what I have done for keras model:
This is what I have done for the Keras model:

```python
glove_dir = '/home/Documents/Glove'
embeddings_index = {}
f = open(os.path.join(glove_dir, 'glove.6B.200d.txt'), 'r', encoding='utf-8')
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()

embedding_dim = 200
embedding_matrix = np.zeros((max_words, embedding_dim))
for word, i in word_index.items():
    if i < max_words:
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector

model = Sequential()
model.add(Embedding(max_words, embedding_dim, input_length=maxlen))
...
model.layers[0].set_weights([embedding_matrix])
model.layers[0].trainable = False
model.compile(----)
model.fit(-----)
```

I am very new to scikit-learn; from what I have seen, to make a model in sklearn you do:
```python
lr = LogisticRegression()
lr.fit(X_train, y_train)
lr.predict(X_test)
```

So my question is: how do I use pre-trained GloVe embeddings with this model? Where do I pass the pre-trained GloVe embedding_matrix?
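For context, one thing I have been considering (not sure if it is the right approach) is averaging the GloVe vectors of the words in each document to get a fixed-length feature vector that sklearn can consume. A minimal sketch, using a toy stand-in for the embeddings_index dict loaded above (real GloVe vectors would be 200-dimensional) and made-up example texts:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the GloVe dict loaded above (real vectors are 200-d).
embedding_dim = 4
embeddings_index = {
    'good':  np.array([0.1, 0.2, 0.3, 0.4], dtype='float32'),
    'movie': np.array([0.5, 0.6, 0.7, 0.8], dtype='float32'),
    'bad':   np.array([-0.1, -0.2, -0.3, -0.4], dtype='float32'),
}

def doc_to_vec(text):
    """Average the GloVe vectors of the known words in a document."""
    vectors = [embeddings_index[w] for w in text.lower().split()
               if w in embeddings_index]
    if not vectors:
        # No known words: fall back to the zero vector.
        return np.zeros(embedding_dim, dtype='float32')
    return np.mean(vectors, axis=0)

# Hypothetical training data, just to show the shapes involved.
texts = ['good movie', 'bad movie', 'good', 'bad']
labels = [1, 0, 1, 0]

X = np.vstack([doc_to_vec(t) for t in texts])  # shape (n_docs, embedding_dim)
lr = LogisticRegression()
lr.fit(X, labels)
preds = lr.predict(X)
```

This loses word order (unlike the Keras Embedding layer, which keeps a vector per position), so I am not sure it is equivalent to what the Keras model sees, but it does give sklearn a plain 2-D feature matrix.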
Thank you very much and I really appreciate your help.
An answer focused on sklearn would be best, ideally with a formula and/or a descriptive diagram.