
I am trying to use Python's `open()` function in my Keras code to read GoogleNews-vectors-negative300.bin, a file of pre-trained word2vec embeddings, in the same way I read the pre-trained GloVe embeddings. After downloading, GloVe contains 4 files with a .txt extension, whereas the GoogleNews-vectors-negative300.bin folder contains a single binary file named 'data', which is 3.4 GB. I run the commands on Ubuntu 17.10, using Keras with the TensorFlow backend in Spyder with Python 3.5. Running the command gives this error:

```
File "/home/mary/anaconda3/envs/virenv/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 19: invalid start byte
```

The code I wrote is `f = open('data')`.

The same code ran successfully when I used `f = open('glove.6B.100d.txt')`.
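For comparison, a GloVe file such as glove.6B.100d.txt is plain UTF-8 text: each line holds a word followed by its space-separated vector components, which is why reading it with `open()` in text mode works. A minimal sketch of such a reader, assuming that layout (the function name `load_glove_text` is illustrative, not from any library):

```python
import os
import tempfile


def load_glove_text(path, encoding="utf-8"):
    """Return {word: [float, ...]} parsed from a GloVe-style text file."""
    embeddings = {}
    # Text mode is fine here: every line is UTF-8 text "word v1 v2 ... vN".
    with open(path, encoding=encoding) as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            word, values = parts[0], parts[1:]
            embeddings[word] = [float(v) for v in values]
    return embeddings


if __name__ == "__main__":
    # Tiny made-up file in the same layout, used only to exercise the parser.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write("the 0.1 0.2 0.3\ncat -0.5 0.0 1.5\n")
        sample_path = tmp.name
    vectors = load_glove_text(sample_path)
    os.remove(sample_path)
    print(vectors["cat"])  # [-0.5, 0.0, 1.5]
```

The same approach cannot work for the word2vec .bin file, because its vectors are stored as raw bytes rather than text.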

What is the main problem?
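A minimal illustration of this kind of error, using only the standard library: text-mode `open()` decodes the file's bytes (UTF-8 on a typical Ubuntu setup, set explicitly here), and a byte such as 0x94 can never start a UTF-8 sequence, so decoding a binary file fails. Opening the file with mode `'rb'` returns the raw bytes instead. The sample bytes are made up and only mimic a binary embedding file.

```python
import os
import tempfile

# Made-up bytes standing in for a binary embedding file: an ASCII header,
# then raw data containing 0x94, which is not a valid UTF-8 start byte.
raw = b"3000000 300\nin\x94\x00\x00"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

# Text mode decodes the bytes as UTF-8 and raises UnicodeDecodeError.
error_message = None
try:
    with open(path, encoding="utf-8") as f:
        f.read()
except UnicodeDecodeError as err:
    error_message = str(err)
print(error_message)

# Binary mode performs no decoding, so the raw bytes come back unchanged.
with open(path, "rb") as f:
    data = f.read()
print(data == raw)  # True

os.remove(path)
```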

  • Have you seen the answers here? (commented Jun 17, 2018 at 4:35)

1 Answer


I searched around and fixed the error with these steps. First, download the compressed file "GoogleNews-vectors-negative300.bin.gz" and extract it with this command in Ubuntu: `gunzip -k GoogleNews-vectors-negative300.bin.gz` (the `-k` flag keeps the original .gz file). Extracting it manually is not recommended. Second, load the extracted .bin file in Python 3 with gensim:
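If you prefer to stay inside Python, the `gunzip -k` step can be approximated with the standard library's `gzip` and `shutil` modules. This is a minimal sketch; the function name `gunzip_keep` is hypothetical. It streams the data in chunks, so the multi-GB archive is never held in memory at once.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_keep(gz_path):
    """Decompress FILE.gz to FILE and keep FILE.gz (similar to `gunzip -k`)."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file, got %r" % gz_path)
    out_path = gz_path[:-3]
    # copyfileobj copies in fixed-size chunks rather than reading everything.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


if __name__ == "__main__":
    # Demo on a small made-up archive in a temporary directory.
    # For the real file: gunzip_keep("GoogleNews-vectors-negative300.bin.gz")
    work_dir = tempfile.mkdtemp()
    archive = os.path.join(work_dir, "sample.bin.gz")
    with gzip.open(archive, "wb") as f:
        f.write(b"\x94 raw binary payload")
    extracted = gunzip_keep(archive)
    print(os.path.basename(extracted))  # sample.bin
    print(os.path.exists(archive))      # True: the .gz is kept
```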

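```python
import gensim

# Word2Vec.load_word2vec_format is deprecated in newer gensim releases;
# KeyedVectors.load_word2vec_format reads the same binary file.
# The path assumes the extracted .bin file was placed in ./model/.
model = gensim.models.KeyedVectors.load_word2vec_format(
    './model/GoogleNews-vectors-negative300.bin', binary=True)
```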
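To show why the .bin file cannot be read line by line as text, here is a rough sketch of a reader for the word2vec C binary format written without gensim. It assumes the commonly used layout: an ASCII header with the vocabulary size and vector dimension, then each word followed by a space and `dim` little-endian float32 values, optionally followed by a newline. The helper names are hypothetical, and gensim's loader remains the practical choice for the 3.4 GB GoogleNews file (this sketch reads words one byte at a time and is slow).

```python
import os
import struct
import tempfile


def read_word2vec_binary(path, limit=None):
    """Sketch: return {word: [float, ...]} from a word2vec C-format .bin file.

    Assumed layout: an ASCII header "<vocab_size> <dim>" plus a newline, then
    per entry the word's UTF-8 bytes, one space, and <dim> little-endian
    float32 values, optionally followed by a newline.
    """
    vectors = {}
    with open(path, "rb") as f:  # binary mode: the vectors are raw floats
        vocab_size, dim = (int(x) for x in f.readline().split())
        if limit is not None:
            vocab_size = min(vocab_size, limit)
        for _ in range(vocab_size):
            word = bytearray()
            while True:
                ch = f.read(1)
                if ch == b" ":
                    break  # a space ends the word
                if ch == b"":
                    raise EOFError("file ended in the middle of a word")
                if ch != b"\n":  # skip the newline left after the previous vector
                    word.extend(ch)
            values = struct.unpack("<%df" % dim, f.read(4 * dim))
            # errors="replace" keeps loading if a word's bytes are not valid UTF-8
            vectors[word.decode("utf-8", errors="replace")] = list(values)
    return vectors


def write_word2vec_binary(path, vectors):
    """Hypothetical helper that writes the same layout, for testing only."""
    dim = len(next(iter(vectors.values())))
    with open(path, "wb") as f:
        f.write(("%d %d\n" % (len(vectors), dim)).encode("ascii"))
        for word, vec in vectors.items():
            f.write(word.encode("utf-8") + b" ")
            f.write(struct.pack("<%df" % dim, *vec))
            f.write(b"\n")


if __name__ == "__main__":
    sample = {"king": [0.5, -1.0, 2.25], "queen": [0.0, 1.5, -0.25]}
    path = os.path.join(tempfile.mkdtemp(), "tiny.bin")
    write_word2vec_binary(path, sample)
    print(read_word2vec_binary(path) == sample)  # True
```

The `limit` parameter mirrors the idea of loading only the first N words, which is useful when testing against a file this large.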

I hope it will be useful.
