How to use (read) the Google pre-trained word2vec model file?

Question

I am trying to use Python's open() function in Keras to read GoogleNews-vectors-negative300.bin, a pre-trained word2vec file, the same way I read GloVe. After downloading GloVe I got four .txt files. The GoogleNews-vectors-negative300.bin folder instead contains a single binary file named 'data', which is 3.4 GB. I am running the commands on Ubuntu 17.10 with Keras (TensorFlow backend) in Spyder, using Python 3.5. Running the command gave me this error:

    File "/home/mary/anaconda3/envs/virenv/lib/python3.5/codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 19: invalid start byte

The code I ran is simply:

    f = open('data')

The same code worked when I ran f = open('glove.6B.100d.txt').

What is the main problem?
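The likely cause can be reproduced on a small scale. By default, open() uses text mode and decodes the file's bytes with an encoding, which is UTF-8 on the asker's Ubuntu system, as the traceback shows. That works for the plain-text GloVe files. It fails on a word2vec .bin file because the packed float values are not valid UTF-8.

The sketch below writes a tiny temporary file that stands in for the real 3.4 GB file. Its contents and the helper name `try_read` are hypothetical.

```python
import os
import tempfile

# Hypothetical stand-in for the binary word2vec file: a text header
# followed by raw bytes, including 0x94, which is not a valid UTF-8 start byte.
payload = b"3 300\nword \x94\x12\x00\x3f"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

def try_read(path, mode, **kwargs):
    """Return ('ok', data) or ('error', exception name) for one open() mode."""
    try:
        with open(path, mode, **kwargs) as f:
            return "ok", f.read()
    except UnicodeDecodeError as exc:
        return "error", type(exc).__name__

# Text mode decodes the bytes and fails, like open('data') in the question.
text_result = try_read(path, "r", encoding="utf-8")
# Binary mode returns the raw bytes, which a word2vec reader must then parse.
binary_result = try_read(path, "rb")

os.remove(path)
print(text_result)       # ('error', 'UnicodeDecodeError')
print(binary_result[0])  # ok
```

Opening the file with "rb" avoids the decode error, but it only gives you raw bytes. A loader that understands the word2vec binary layout is still needed to turn them into vectors.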

Have you seen the answers here?

– Green Falcon, commented Jun 17, 2018 at 4:35

Joe B · Accepted Answer · 2019-05-02 02:56:17Z

I searched for a solution and fixed the error with these steps. First, download the "GoogleNews-vectors-negative300.bin.gz" file, then extract it with this command in Ubuntu:

    gunzip -k GoogleNews-vectors-negative300.bin.gz

(Extracting it manually is not recommended.) Second, run these commands in Python 3:

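    import gensim
    # Word2Vec.load_word2vec_format is deprecated since gensim 1.0; KeyedVectors provides the loader.
    model = gensim.models.KeyedVectors.load_word2vec_format('./model/GoogleNews-vectors-negative300.bin', binary=True)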
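This sketch illustrates what `binary=True` refers to, assuming the original C word2vec binary layout:

- a text header `"vocab_size dim\n"`;
- then, for each word, its UTF-8 bytes, a space, and `dim` float32 values (little-endian on common x86 machines), often followed by a newline.

The helpers `write_w2v_binary` and `read_w2v_binary` are hypothetical. They only show why the file cannot be read line by line as text; gensim remains the practical way to load the real file.

```python
import io
import struct

def write_w2v_binary(vectors, dim):
    """Serialize {word: [floats]} in the word2vec binary layout (hypothetical helper)."""
    out = io.BytesIO()
    out.write(f"{len(vectors)} {dim}\n".encode("utf-8"))  # text header
    for word, vec in vectors.items():
        out.write(word.encode("utf-8") + b" ")            # word, then a space
        out.write(struct.pack(f"<{dim}f", *vec))           # dim little-endian float32 values
        out.write(b"\n")                                   # trailing newline
    return out.getvalue()

def read_w2v_binary(stream):
    """Parse the binary layout back into {word: tuple of floats} (hypothetical helper)."""
    vocab_size, dim = map(int, stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        word = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise EOFError("file ended in the middle of a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline left after the previous vector
                word.extend(ch)
        raw = stream.read(4 * dim)  # each float32 takes 4 bytes
        vectors[word.decode("utf-8")] = struct.unpack(f"<{dim}f", raw)
    return vectors

data = write_w2v_binary({"king": [0.5, -1.0, 2.0], "queen": [0.25, 1.5, -0.75]}, dim=3)
parsed = read_w2v_binary(io.BytesIO(data))
print(parsed["queen"])  # (0.25, 1.5, -0.75)
```

Only the header and the words are text. The vectors are raw float bytes, which is exactly what the UTF-8 decoder in the question choked on.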

I hope it will be useful.
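If gunzip is not at hand, the extraction step can be sketched with Python's standard gzip and shutil modules. Like `gunzip -k`, this keeps the original .gz file.

The function name `gunzip_keep` is hypothetical. The demo uses a tiny temporary file rather than the real download.

```python
import gzip
import os
import shutil
import tempfile

def gunzip_keep(gz_path):
    """Decompress 'name.bin.gz' to 'name.bin', keeping the .gz (like `gunzip -k`)."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file")
    out_path = gz_path[:-3]
    # Stream in 1 MiB chunks so a multi-GB file never has to fit in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)
    return out_path

# Demo on a small temporary file standing in for GoogleNews-vectors-negative300.bin.gz.
tmp_dir = tempfile.mkdtemp()
gz_file = os.path.join(tmp_dir, "demo-vectors.bin.gz")
with gzip.open(gz_file, "wb") as f:
    f.write(b"1 2\nword \x00\x00\x80\x3f\x00\x00\x00\x40\n")

bin_file = gunzip_keep(gz_file)
print(os.path.basename(bin_file), os.path.exists(gz_file))  # demo-vectors.bin True
```

Streaming with copyfileobj matters here because the decompressed GoogleNews file is several gigabytes.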
