Eliminating rows with a specific value in a column using Python

Question

How can I delete the rows that have '0' as the value in the 5th column? Or, even better, can we choose a range (i.e. remove the rows whose value in the 5th column is between -50 and 30)?

The data looks like this:

0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 0 0.0003 0.39
0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25
0 8828.62 4543414.50 -3.05 0 0.0021 0.61
0 4028.44 4544434.50 -6.76 -107.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 -11.00 0.0003 0.39
0 7028.56 4523434.50 -4.95 -127.00 0.0005 0.25
0 8828.62 4543414.50 -3.05 0 0.0021 0.61

operator.itemgetter(4)... then compare it.

– JBernardo, Commented Aug 9, 2011 at 1:15
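
The `operator.itemgetter(4)` suggestion above could look roughly like this. It is only a sketch, assuming the rows are whitespace-separated strings as in the question; the names `rows`, `fifth`, `kept` and `outside` are illustrative and do not come from the thread.

```python
from operator import itemgetter

# Sample rows in the question's format (assumed: whitespace-separated text).
rows = [
    "0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12",
    "0 4028.50 3455014.50 -5.86 0 0.0003 0.39",
    "0 4028.50 3455014.50 -5.86 -11.00 0.0003 0.39",
]

fifth = itemgetter(4)  # picks index 4, i.e. the fifth field

# Keep rows whose fifth field is non-zero; comparing as float also catches "0.0".
kept = [row for row in rows if float(fifth(row.split())) != 0]

# Keep rows whose fifth field falls outside the closed range [-50, 30].
outside = [row for row in rows if not -50 <= float(fifth(row.split())) <= 30]
```
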
@Chad: Did you get this working yet?

– johnsyweb, Commented Aug 11, 2011 at 22:41

agf · Accepted Answer · 2011-08-09 15:45:33Z

4

goodrows = [row for row in data if row.split()[4] != '0']

or

goodrows = [row for row in data if not (-50 <= float(row.split()[4]) <= 30)]

Edit:

If your data is actually in a NumPy array, which your comment seems to indicate even if your post didn't:

goodrows = [row for row in data if row[4] != 0]

or

goodrows = [row for row in data if not (-50 <= row[4] <= 30)]

should work. There is definitely a NumPy internal way to do this though.
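
The NumPy-internal way hinted at here is most likely boolean-mask indexing. A minimal sketch, assuming `data` is already a 2-D float array (the sample values are taken from the question; the variable names are illustrative):

```python
import numpy as np

# Assumed: data is already a 2-D float array, e.g. loaded with np.loadtxt.
data = np.array([
    [0, 4028.44, 4544434.50, -6.76, -117.00, 0.0002, 0.12],
    [0, 4028.50, 3455014.50, -5.86, 0, 0.0003, 0.39],
    [0, 4028.50, 3455014.50, -5.86, -11.00, 0.0003, 0.39],
])

col = data[:, 4]  # fifth column (index 4)

# Keep rows where the fifth column is not 0.
nonzero_rows = data[col != 0]

# Keep rows where the fifth column lies outside the closed range [-50, 30].
outside_rows = data[(col < -50) | (col > 30)]
```

The masks are built for the whole column at once, so no Python-level loop over rows is needed.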

edited Aug 9, 2011 at 15:45

answered Aug 9, 2011 at 0:30


4 Comments

johnsyweb Over a year ago

I've just tested this to see if they are identical: they're not. int(row.split()[4]) raises when it encounters -117.00. That may explain the -1...

agf Over a year ago

@Johnsyweb absolutely right, good catch. +1 to your answer. Note: I was not one of the downvoters.

Chad Over a year ago

I get 'AttributeError: 'numpy.ndarray' object has no attribute 'split'' error with this one too.

agf Over a year ago

OK, if it's already in an array, not in a list of strings in a file, just do row[4]. See my edit. Next time, make sure to say in your question if the data is in a NumPy array. We all assumed it was in a file in the format you posted.

HYRY · Accepted Answer · 2011-08-09 01:12:16Z

You can use NumPy to do this quickly:

from io import StringIO

import numpy as np

data = """\
0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 0 0.0003 0.39
0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25
0 8828.62 4543414.50 -3.05 0 0.0021 0.61
0 4028.44 4544434.50 -6.76 -107.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 -11.00 0.0003 0.39
0 7028.56 4523434.50 -4.95 -127.00 0.0005 0.25
0 8828.62 4543414.50 -3.05 0 0.0021 0.61
"""

d = np.loadtxt(StringIO(data))  # load the text into a 2-D NumPy array
print(d[d[:, 4] != 0])  # keep rows where column 5 != 0
print(d[(d[:, 4] < -50) | (d[:, 4] > 30)])  # keep rows where column 5 is outside [-50, 30]

I don't know if NumPy is the right tool, as it's not in the standard library... A list comprehension seems better.

I got this error:
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", line 796, in loadtxt
    items = [conv(val) for (conv, val) in zip(converters, vals)]
ValueError: could not convert string to float: [[

The program above can only parse numbers separated by whitespace. From the error message, it seems that you are trying some other data format.

johnsyweb · Accepted Answer · 2011-08-09 22:16:07Z

Assuming your data is in a plain text file like this:

$ cat data.txt
0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 0 0.0003 0.39
0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25
0 8828.62 4543414.50 -3.05 0 0.0021 0.61
0 4028.44 4544434.50 -6.76 -107.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 -11.00 0.0003 0.39
0 7028.56 4523434.50 -4.95 -127.00 0.0005 0.25
0 8828.62 4543414.50 -3.05 0 0.0021 0.61

And assuming you are not using any external libraries, the following will read the data into a list of strings, omitting the undesirable lines. You can feed these lines into any other function you choose; I call print merely to demonstrate. N.B.: the fifth column has index 4, since list indices are zero-based.

$ cat data.py
#!/usr/bin/env python

print("1. Delete the rows which have '0' as a value on 5th column:")

def zero_in_fifth(row):
    # True when the fifth field is exactly the string '0'
    return row.split()[4] == '0'

required_rows = [row for row in open('./data.txt') if not zero_in_fifth(row)]
print(''.join(required_rows))

print('2. Choose the range (i.e. remove the rows which have values between -50 and 30 on 5th column):')

def should_ignore(row):
    # True when the fifth field lies in the closed range [-50, 30]
    return -50 <= float(row.split()[4]) <= 30

required_rows = [row for row in open('./data.txt') if not should_ignore(row)]
print(''.join(required_rows))

When you run this you will get:

$ python data.py
1. Delete the rows which have '0' as a value on 5th column:
0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12
0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25
0 4028.44 4544434.50 -6.76 -107.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 -11.00 0.0003 0.39
0 7028.56 4523434.50 -4.95 -127.00 0.0005 0.25

2. Choose the range (i.e. remove the rows which have values between -50 and 30 on 5th column):
0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12
0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25
0 4028.44 4544434.50 -6.76 -107.00 0.0002 0.12
0 7028.56 4523434.50 -4.95 -127.00 0.0005 0.25
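
If the filtered rows need to go to another file rather than the screen, the same range test can be applied while streaming through the input. This is only a sketch: the helper `keep_row`, the function `filter_file` and its parameters are hypothetical names, and it assumes that lines with fewer than five fields can simply be dropped.

```python
def keep_row(line, low=-50.0, high=30.0):
    """Return True if the fifth field lies outside the closed range [low, high]."""
    fields = line.split()
    if len(fields) < 5:  # assumption: short or blank lines are discarded
        return False
    return not low <= float(fields[4]) <= high


def filter_file(src_path, dst_path):
    """Copy the lines of src_path that pass keep_row into dst_path; return the count."""
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if keep_row(line):
                dst.write(line)
                kept += 1
    return kept
```

Calling `filter_file('data.txt', 'filtered.txt')` (the output name is made up) reads one line at a time, so the whole file never has to fit in memory, and the `with` block closes both files when it finishes.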

What's the point of naming a lambda function? That's just wrong. Just use the def keyword.
@JBernardo: A named function would probably be better, you're right. I just extracted the lambda from the generator expression to reduce the line-length.
As said above, that's not the place to use lambdas. Wrong on many levels. Try reading that...
@Johnsyweb: I loaded the data from a text file via pylab.loadtxt and tried your code, but I got the same error on the same line. What am I missing here?
