I have a (Windows) text file that Linux reports as:
    ISO-8859 text, with very long lines, with CRLF line terminators
I want to read this into numpy, except for the first line, which contains labels (with special characters, usually only the Greek mu).
Python 2.7.6, Numpy 1.8.0, this works perfectly:
    data = np.loadtxt('input_file.txt', skiprows=1)
Python 3.4.0, Numpy 1.8.0, gives an error:
    >>> np.loadtxt('input_file.txt', skiprows=1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.4/site-packages/numpy/lib/npyio.py", line 796, in loadtxt
        next(fh)
      File "/usr/lib/python3.4/codecs.py", line 313, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 4158: invalid start byte
To me this is "buggy" behaviour for the following reasons:
- I want to skip the first line so it should be ignored, regardless of its encoding
- If I delete the first line from the file, loadtxt works fine in both versions of python
- Shouldn't numpy.loadtxt behave the same in python2 and python3?
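To illustrate the points above, here is a minimal sketch of what I think is going on, using only the standard library. The file contents and the helper `read_skipping_header` are made up for illustration, and I am assuming (from the traceback) that the file is opened as text with UTF-8 decoding, so even a skipped line has to be decoded first:

```python
import os
import tempfile

# Hypothetical sample file: a Latin-1 header containing the micro sign
# (byte 0xb5), followed by plain ASCII numeric rows, with CRLF line endings.
header = "time [\u00b5s]\tvalue [\u00b5V]\r\n"
rows = "1.0\t2.0\r\n3.0\t4.0\r\n"

path = os.path.join(tempfile.mkdtemp(), "input_file.txt")
with open(path, "wb") as f:
    f.write(header.encode("latin-1"))
    f.write(rows.encode("ascii"))


def read_skipping_header(path, encoding):
    """Skip the first line, then parse the remaining rows as floats."""
    with open(path, encoding=encoding) as fh:
        next(fh)  # the header is still decoded before it can be discarded
        return [[float(x) for x in line.split()] for line in fh]


# Decoding as UTF-8 fails on the header, even though it is being skipped.
try:
    read_skipping_header(path, "utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding the same file as Latin-1 succeeds and yields the numeric rows.
data = read_skipping_header(path, "latin-1")
```

So in this sketch the failure comes from decoding the header bytes, not from parsing the numbers, which would also explain why deleting the first line makes the problem disappear.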
Questions:
- How can I get around this problem (using python3 of course)?
- Should I file a bug report or is this expected behaviour?
genfromtxt()?