
I am trying to read a file containing data for different dates using numpy.genfromtxt() in python3. The file basically looks like

    Date,Open,High,Low,Close,Volume
    1-Apr-15,108.33,108.66,108.33,108.66,290

but may contain missing values marked as -.

The following code works fine in python2

    str2date = lambda x: datetime.strptime(x, '%d-%b-%y').strftime('%Y-%m-%d')
    data = np.genfromtxt('test.dat', dtype="S9,f8,f8,f8,f8,f8", delimiter=',',
                         names=True, missing_values='-', converters={0: str2date})

but fails in python3 with

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128) 

locale.getpreferredencoding(False) returns UTF-8 as the default encoding, and the solution suggested for example here, setting the encoding for the input stream, is a bit tricky. I also tried setting the encoding of the terminal, without success. I also have to admit that I do not see how that answer solves my problem, as the file contains no special characters, or at least none that I can see.

How can I solve this issue without stepping back to python2?

  • It seems that genfromtxt falls back to ASCII mode for some unknown reason. Have you tried genfromtxt(open('test.dat', encoding='utf-8'), ...)? Or, more efficiently, pandas.read_csv? Commented Dec 3, 2017 at 15:11
  • genfromtxt(open('test.dat', encoding='utf-8')) complains about bytes provided instead of a string. But pandas works like a charm. Thanks :). If you put that in an answer I'll accept it. Commented Dec 3, 2017 at 16:34
  • genfromtxt opens the file in binary mode, and works with bytestrings (Py3). The converters solution in stackoverflow.com/questions/33001373/… doesn't help? Commented Dec 3, 2017 at 17:17
  • I understood that as a workaround for a problematic file name. Which I do not have. Commented Dec 3, 2017 at 17:49
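
The pandas route mentioned in the comments above could look roughly like this. It is a minimal sketch, assuming the column layout of the sample file; the helper name read_prices is hypothetical, and encoding='utf-8-sig' is an assumption chosen because it reads UTF-8 files with or without a leading byte order mark.

```python
import pandas as pd


def read_prices(path):
    """Read the CSV, treating '-' as missing and parsing the Date column."""
    # na_values='-' turns the missing-value marker into NaN
    df = pd.read_csv(path, na_values="-", encoding="utf-8-sig")
    # Dates look like '1-Apr-15' (day, abbreviated month name, 2-digit year)
    df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")
    return df
```

Calling read_prices('test.dat') would then return a DataFrame with a datetime Date column and float columns containing NaN where the file had -.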

1 Answer


When I try to reproduce your code I get problems with the date conversion:

    Out[405]: b'1-Apr-15'

    In [406]: str2date(_)
    ---------------------------------------------------------------------------
    ...
    ----> 1 str2date = lambda x: datetime.strptime(x, '%d-%b-%y').strftime('%Y-%m-%d')

    TypeError: strptime() argument 1 must be str, not bytes

If I add a decode:

    def foo(x):
        return str2date(x.decode())

the converter handles the byte string that genfromtxt insists on providing.

    In [410]: data = np.genfromtxt('stack47619155.txt', dtype="S9,f8,f8,f8,f8,f8",
         ...:     delimiter=',', names=True, missing_values='-', converters={0: foo})

    In [411]: data
    Out[411]:
    array([(b'2015-04-0', 108.33, 108.66, 108.33, 108.66, 290.),
           (b'2015-04-0', nan, 108.66, nan, 108.66, 290.),
           (b'2015-04-0', 108.33, 108.66, 108.33, 108.66, nan)],
          dtype=[('Date', 'S9'), ('Open', '<f8'), ('High', '<f8'), ('Low', '<f8'), ('Close', '<f8'), ('Volume', '<f8')])

    In [412]: data = np.genfromtxt('stack47619155.txt', dtype="U9,f8,f8,f8,f8,f8",
         ...:     delimiter=',', names=True, missing_values='-', converters={0: foo})

    In [413]: data
    Out[413]:
    array([('2015-04-0', 108.33, 108.66, 108.33, 108.66, 290.),
           ('2015-04-0', nan, 108.66, nan, 108.66, 290.),
           ('2015-04-0', 108.33, 108.66, 108.33, 108.66, nan)],
          dtype=[('Date', '<U9'), ('Open', '<f8'), ('High', '<f8'), ('Low', '<f8'), ('Close', '<f8'), ('Volume', '<f8')])
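
Note that the dates in the output above are cut to '2015-04-0': the converted string '2015-04-01' has 10 characters, but an S9 or U9 field holds only 9. Below is a minimal sketch of a variant with a 10-character field. It uses in-memory sample lines instead of a file (the second data row is made up for illustration), and the converter name str2date_any is hypothetical. The converter accepts either bytes or str, because which of the two genfromtxt passes depends on the NumPy version and the encoding argument.

```python
from datetime import datetime

import numpy as np


def str2date_any(x):
    """Convert '1-Apr-15' (bytes or str) to '2015-04-01'."""
    if isinstance(x, bytes):
        x = x.decode("utf-8")
    return datetime.strptime(x.strip(), "%d-%b-%y").strftime("%Y-%m-%d")


# Sample data; the second row is invented to show the '-' marker
lines = [
    "Date,Open,High,Low,Close,Volume",
    "1-Apr-15,108.33,108.66,108.33,108.66,290",
    "2-Apr-15,-,108.66,-,108.66,290",
]

# U10 leaves room for all 10 characters of 'YYYY-MM-DD'
data = np.genfromtxt(lines, dtype="U10,f8,f8,f8,f8,f8", delimiter=",",
                     names=True, missing_values="-",
                     converters={0: str2date_any})
print(data["Date"])
```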

That is a different error from the one you report, so my test file may not match yours, for instance in the character used as the missing-field marker.
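
One thing worth checking in the original file: the byte 0xef at position 0 in the reported UnicodeDecodeError is the first byte of the UTF-8 byte order mark (EF BB BF), which most editors do not display. That the file starts with a BOM is an assumption, not something confirmed in this thread. A small sketch to test it, with the hypothetical helper name starts_with_utf8_bom:

```python
import codecs


def starts_with_utf8_bom(path):
    """Return True if the file begins with the UTF-8 byte order mark."""
    with open(path, "rb") as fh:
        # codecs.BOM_UTF8 is b'\xef\xbb\xbf'
        return fh.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

If starts_with_utf8_bom('test.dat') returns True, the file has three invisible bytes in front of the Date header, which would explain why an ASCII decode fails on the very first byte.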

You found my post from a couple of years ago with a decode in the converters:

Loading UTF-8 file in Python 3 using numpy.genfromtxt
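
Newer NumPy releases (1.14 and later) also accept an encoding argument in genfromtxt, which makes the manual decode in the converter unnecessary. The following is a sketch under two assumptions: that such a NumPy version is installed, and that the file is UTF-8, possibly with a leading byte order mark. The function name load_prices is hypothetical.

```python
from datetime import datetime

import numpy as np


def str2date(x):
    # With an explicit text encoding, the converter receives str, not bytes
    return datetime.strptime(x.strip(), "%d-%b-%y").strftime("%Y-%m-%d")


def load_prices(path):
    # 'utf-8-sig' decodes UTF-8 and drops a leading byte order mark if present
    return np.genfromtxt(path, dtype="U10,f8,f8,f8,f8,f8", delimiter=",",
                         names=True, missing_values="-",
                         encoding="utf-8-sig", converters={0: str2date})
```

With this, load_prices('test.dat') should give a structured array whose Date field holds the full 'YYYY-MM-DD' strings.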


3 Comments

The decode() is required in python2. In python3 it throws an error (on my system). The conversion itself runs fine there without decode(): print(str2date('1-Apr-15')).
How about print(str2date(b'1-Apr-15'))?
Then I need the decode, true. But genfromtxt still fails with the ascii problem.
