2

I need help using Python pandas. I have a file A.txt which contains the following data (this is an example):

    0003343 01901310 8193910 91931
    9183131 89102010 7373819 83003
    3692429 92920202

and B.txt

424u20u 092u9j 902u39 9293u2 9u193jj 901u39 jdo910 903u98 ue9un88 

For A.txt I did the following, and it prints the DataFrame with the column numbers on top:

    import pandas as pd

    fileRead = pd.read_csv("A.txt", delim_whitespace=True, header=None, dtype=object)
    print fileRead

The result:

             0         1        2      3
    0  0003343  01901310  8193910  91931
    1  9183131  89102010  7373819  83003
    2  3692429  92920202      NaN    NaN

But if I do the same with B.txt, it gives me an error:

pandas.errors.ParserError: Too many columns specified: expected 4 and found 1 

I don't understand; it should give the same result. What should I do to solve this? Thank you for your help and suggestions.

4
  • Can you post a link to your second txt file? Your code should have worked, so there's another problem here. Commented Jul 19, 2017 at 8:24
  • What are your pandas and Python versions? Commented Jul 19, 2017 at 8:30
  • drive.google.com/file/d/0B5miO1jf9SXecUpDWXVsNkVXczg/… This is the real second text file. My Python version is 2.7 and pandas is 0.20.3. Commented Jul 19, 2017 at 8:36
  • I think this is a bug: if you remove the last line then it works fine, and if you use read_fwf then it skips the last line. Commented Jul 19, 2017 at 8:49

2 Answers

1

I think you need read_fwf:

    df = pd.read_fwf('test3.txt', header=None, dtype=object)
    print (df.head())
             0      1      2      3      4      5      6      7      8
    0  0000000  00915  00517  00916  00517  00916  00517  00915  00517
    1  0000010  00915  00518  00915  00518  00915  00517  00915  00517
    2  0000020  00915  00518  00915  00517  00915  00516  00915  00517
    3  0000030  00915  00517  00915  00517  00915  00517  00915  00517
    4  0000040  00915  00517  00916  00517  00915  00517  00915  00517

    print (df.tail())
                  0      1      2      3      4      5      6      7      8
    262140  03fffc0  00916  00513  00916  00514  00916  00516  00916  00514
    262141  03fffd0  00916  00513  00916  00514  00916  00514  00916  00514
    262142  03fffe0  00916  00514  00916  00514  00917  00514  00916  00514
    262143  03ffff0  00916  00514  00916  00514  00916  00514  00916  00514
    262144  0400000    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN

EDIT:

As EDchum pointed out, it is possible that the last row is dropped. It seems that in Python 3.6 it is silently removed.
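If the short last row is a concern, one possible workaround is to split the lines manually and pad the short rows before building the DataFrame. This is only a sketch, assuming Python 3 and a file whose fields are separated by whitespace; the helper name `read_ragged` is made up for illustration and is not part of pandas.

```python
import pandas as pd


def read_ragged(path):
    """Read a whitespace-delimited file whose rows may differ in length.

    Hypothetical helper, not a pandas function.
    """
    with open(path) as handle:
        # str.split() without arguments splits on any run of whitespace;
        # blank lines are skipped.
        rows = [line.split() for line in handle if line.strip()]

    # The widest row decides the number of columns.
    width = max((len(row) for row in rows), default=0)

    # Pad shorter rows with None so every row has the same length.
    padded = [row + [None] * (width - len(row)) for row in rows]

    # dtype=object keeps the fields as strings, so leading zeros survive.
    return pd.DataFrame(padded, columns=range(width), dtype=object)
```

Comparing `len(read_ragged('test3.txt'))` with the number of non-blank lines in the file is one way to check that no row was lost. For very large files this is likely slower than the built-in pandas readers.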


12 Comments

No, there is another problem which is not reproducible with the posted data; the posted code should have worked.
Can you post code to show this? It works for me, so I'm a little sceptical: both pd.read_clipboard and read_csv work for me here.
One possibility is some duff version of pandas.
I think this is a bug: if you remove the last line then it works fine, and read_fwf silently skips the last line.
Very odd, this difference in behaviour.
0

I find that when I insert new records into my csv file using pandas, I sometimes get extra columns added as well, albeit only for some of the records. So I had to manually edit the csv file to restore the correct number of columns, making sure I did not delete rows further down in the file that still had the correct number of columns. This fixes it, although the real question is why the extra columns are being added. I do not know the answer to that.
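Before editing by hand, it may help to list which lines have an unexpected number of fields. The following is a minimal sketch, assuming Python 3, a comma-delimited file, and that the most common row width is the correct one; the function name `find_irregular_rows` is hypothetical.

```python
import csv
from collections import Counter


def find_irregular_rows(path, delimiter=","):
    """Return (expected_width, [(line_number, width), ...]).

    The expected width is assumed to be the most common row width.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # reader.line_num is the number of source lines read so far,
        # i.e. the line on which the current record ends.
        widths = [(reader.line_num, len(row)) for row in reader if row]

    if not widths:
        return 0, []

    # Take the most frequent width as the "correct" one.
    expected = Counter(width for _, width in widths).most_common(1)[0][0]

    # Keep only the records whose width differs from the expected one.
    return expected, [(num, width) for num, width in widths if width != expected]
```

This only reports the suspicious lines; it does not explain why the extra columns were written in the first place.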
