1

How can I delete the rows that have '0' as the value in the 5th column? Or, even better, can we choose a range (i.e. remove the rows whose value in the 5th column is between -50 and 30)?

The data looks like this:

 0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12
 0 4028.50 3455014.50 -5.86 0 0.0003 0.39
 0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25
 0 8828.62 4543414.50 -3.05 0 0.0021 0.61
 0 4028.44 4544434.50 -6.76 -107.00 0.0002 0.12
 0 4028.50 3455014.50 -5.86 -11.00 0.0003 0.39
 0 7028.56 4523434.50 -4.95 -127.00 0.0005 0.25
 0 8828.62 4543414.50 -3.05 0 0.0021 0.61
2
  • operator.itemgetter(4)... then compare it. Commented Aug 9, 2011 at 1:15
  • @Chad: Did you get this working yet? Commented Aug 11, 2011 at 22:41
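
For anyone wondering what the operator.itemgetter(4) suggestion might look like in practice, here is a minimal sketch. It assumes the rows are whitespace-separated strings as in the question; the names lines, rows and fifth are made up for illustration, and the sample is just a few rows copied from the question.

```python
from operator import itemgetter

# A few sample rows in the whitespace-separated layout shown in the question.
lines = [
    "0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12",
    "0 4028.50 3455014.50 -5.86 0 0.0003 0.39",
    "0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25",
    "0 4028.50 3455014.50 -5.86 -11.00 0.0003 0.39",
]

fifth = itemgetter(4)  # picks index 4, i.e. the 5th column

# Parse every line into a list of floats.
rows = [[float(field) for field in line.split()] for line in lines]

# Keep rows whose 5th column is not zero.
nonzero_rows = [row for row in rows if fifth(row) != 0]

# Keep rows whose 5th column lies outside the closed range [-50, 30].
outside_rows = [row for row in rows if not (-50 <= fifth(row) <= 30)]

print(len(nonzero_rows), len(outside_rows))
```

Parsing to floats first means '0' and '0.00' are treated the same, which a plain string comparison would not do.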

3 Answers

4
goodrows = [row for row in data if row.split()[4] != '0'] 

or

goodrows = [row for row in data if not (-50 <= float(row.split()[4]) <= 30)] 

Edit:

If your data is actually in a NumPy array, which your comment seems to indicate even if your post didn't:

goodrows = [row for row in data if row[4] != 0] 

or

goodrows = [row for row in data if not (-50 <= row[4] <= 30)] 

should work. There is almost certainly a native NumPy way to do this, though.
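
One NumPy-native way is boolean-mask indexing. This is only a sketch under the assumption that data is a 2D float array with the 7-column layout from the question; the sample array and the names col, in_range, nonzero and outside are hypothetical.

```python
import numpy as np

# Hypothetical sample array with the same 7-column layout as the question.
data = np.array([
    [0, 4028.44, 4544434.50, -6.76, -117.00, 0.0002, 0.12],
    [0, 4028.50, 3455014.50, -5.86, 0.00, 0.0003, 0.39],
    [0, 7028.56, 4523434.50, -4.95, -137.00, 0.0005, 0.25],
    [0, 4028.50, 3455014.50, -5.86, -11.00, 0.0003, 0.39],
])

col = data[:, 4]  # 5th column (index 4)

# Rows whose 5th column is not zero.
nonzero = data[col != 0]

# Rows whose 5th column is outside [-50, 30]; ~ negates the boolean mask.
in_range = (col >= -50) & (col <= 30)
outside = data[~in_range]

print(nonzero.shape, outside.shape)
```

The parentheses around each comparison matter, because & binds more tightly than >= and <= in Python.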


4 Comments

I've just tested this to see if they are identical: they're not. int(row.split()[4]) raises when it encounters -117.00. That may explain the -1...
@Johnsyweb absolutely right, good catch. +1 to your answer. Note: I was not one of the downvoters.
I get 'AttributeError: 'numpy.ndarray' object has no attribute 'split'' error with this one too.
OK, if it's already in an array rather than a list of strings read from a file, just use row[4]. See my edit. Next time, make sure to say in your question if the data is in a NumPy array. We all assumed it was in a file in the format you posted.
2

You can use NumPy to do this quickly:

from io import StringIO
import numpy as np

data = """
0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 0 0.0003 0.39
0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25
0 8828.62 4543414.50 -3.05 0 0.0021 0.61
0 4028.44 4544434.50 -6.76 -107.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 -11.00 0.0003 0.39
0 7028.56 4523434.50 -4.95 -127.00 0.0005 0.25
0 8828.62 4543414.50 -3.05 0 0.0021 0.61
"""

d = np.loadtxt(StringIO(data))  # load the text into a 2D NumPy array
print(d[d[:, 4] != 0])  # keep rows where column 5 != 0
print(d[(d[:, 4] < -50) | (d[:, 4] > 30)])  # keep rows where column 5 is outside [-50, 30]

3 Comments

I don't know if NumPy is the right tool, as it's not in the standard library... A list comprehension seems better.
I got this error:
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", line 796, in loadtxt
    items = [conv(val) for (conv, val) in zip(converters, vals)]
ValueError: could not convert string to float: [[
The program above can only convert numbers separated by whitespace. From the error message, it seems that you are trying some other data format.
1

Assuming your data is in a plain text file like this:

$ cat data.txt
0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 0 0.0003 0.39
0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25
0 8828.62 4543414.50 -3.05 0 0.0021 0.61
0 4028.44 4544434.50 -6.76 -107.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 -11.00 0.0003 0.39
0 7028.56 4523434.50 -4.95 -127.00 0.0005 0.25
0 8828.62 4543414.50 -3.05 0 0.0021 0.61

This also assumes you are not using any external libraries. The following reads the data into a list of strings, omitting the undesirable lines. You can feed these lines into any other function you choose; I call print merely to demonstrate. N.B.: The fifth column has index 4, since list indices are zero-based.

$ cat data.py
#!/usr/bin/env python

print("1. Delete the rows which have '0' as a value on 5th column:")

def zero_in_fifth(row):
    # True when the 5th field is exactly the string '0'.
    return row.split()[4] == '0'

with open('./data.txt') as f:
    required_rows = [row for row in f if not zero_in_fifth(row)]
print(''.join(required_rows))

print('2. Choose the range (i.e. remove the rows which have values between -50 and 30 on 5th column):')

def should_ignore(row):
    # True when the 5th field lies in the closed range [-50, 30].
    return -50 <= float(row.split()[4]) <= 30

with open('./data.txt') as f:
    required_rows = [row for row in f if not should_ignore(row)]
print(''.join(required_rows))

When you run this you will get:

$ python data.py
1. Delete the rows which have '0' as a value on 5th column:
0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12
0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25
0 4028.44 4544434.50 -6.76 -107.00 0.0002 0.12
0 4028.50 3455014.50 -5.86 -11.00 0.0003 0.39
0 7028.56 4523434.50 -4.95 -127.00 0.0005 0.25

2. Choose the range (i.e. remove the rows which have values between -50 and 30 on 5th column):
0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12
0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25
0 4028.44 4544434.50 -6.76 -107.00 0.0002 0.12
0 7028.56 4523434.50 -4.95 -127.00 0.0005 0.25
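
If you need this for more than one column or range, the two filters could be folded into a single generator. This is only a sketch, not part of the script above: the function name drop_rows_in_range and its parameters are made up for illustration, and it assumes whitespace-separated numeric fields.

```python
def drop_rows_in_range(lines, column, low, high):
    """Yield lines whose value in `column` (zero-based) is outside [low, high]."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if not (low <= float(fields[column]) <= high):
            yield line


# A few sample rows copied from the question.
sample = [
    "0 4028.44 4544434.50 -6.76 -117.00 0.0002 0.12",
    "0 4028.50 3455014.50 -5.86 0 0.0003 0.39",
    "0 7028.56 4523434.50 -4.95 -137.00 0.0005 0.25",
    "0 4028.50 3455014.50 -5.86 -11.00 0.0003 0.39",
]

# Remove rows whose 5th column is between -50 and 30.
kept = list(drop_rows_in_range(sample, 4, -50, 30))

# Remove rows whose 5th column is exactly zero (low == high == 0).
zero_free = list(drop_rows_in_range(sample, 4, 0, 0))

print(len(kept), len(zero_free))
```

Because it accepts any iterable of lines, you could pass an open file object instead of a list. Note that comparing as floats also drops a row written as 0.00, which the string test == '0' would keep.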

12 Comments

Don't you think lambdas are overkill for this?
What's the point of naming a lambda function? That's just wrong. Just use the def keyword.
@JBernardo: A named function would probably be better, you're right. I just extracted the lambda from the generator expression to reduce the line-length.
As said above, that's not the place to use lambdas. Wrong on many levels. Try reading that...
@Johnsyweb: I loaded the data from a text file via pylab.loadtxt and tried your code, but I got the same error on the same line. What am I missing here?