3

I have a .csv file containing 3 columns of data. I need to create a new output file that includes a specific subset of the data from the first and third columns of the original file. The third column contains decimal values, so I believe I have to use Python's float() function. I have tried the following code:

    in_file = open("filename.csv", "r")
    out_file = open("output.csv", "w")
    while True:
        line = in_file.readline()
        if (line == ''):
            break
        line = line.strip()
        items = line.split(',')
        gi_name = items[0]
        if (gi_name.startswith("_")):
            continue
        p_value = float(items[2])
        if (p_value > 0.05):
            continue
        # one output row per kept input row
        out_file.write(','.join([gi_name, str(p_value)]) + '\n')
    in_file.close()
    out_file.close()

When I run the above, I receive the following error:

Error: invalid literal for float(): 0.000001

The value 0.0000001 is the first value in the third column of my data set, and I guess the code cannot read beyond it, but I'm not sure why. I am new to Python and don't really understand why I am getting this error or how to fix it. I have tried other ways of calling float(), but without success. Does anyone know how I might be able to fix this?

  • Have you considered using the csv module? Commented Mar 28, 2012 at 23:52
  • Adding a few lines of your CSV file to the question would be helpful for reproduction. Commented Mar 28, 2012 at 23:59

2 Answers

5

From what you've posted, it's not clear whether there is something subtly wrong with the string you're trying to pass to float() (because it looks perfectly reasonable). Try adding a debug print statement:

    print(repr(items[2]))
    p_value = float(items[2])

Then you can determine exactly what is being passed to float(). The call to repr() will make even normally invisible characters visible. Add the result to your question and we will be able to comment further.
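As a small illustration of why repr() helps here, the sketch below uses a made-up row (the gene names and values are hypothetical, not taken from the asker's file) in which a bare \r has glued two records together. It assumes Python 3.

```python
# Hypothetical data: two records joined by a bare "\r" instead of "\n".
line = "Gene1,0.5,1.10E-06\rGene2,0.7,0.2"

items = line.strip().split(",")

# print() sends the raw "\r" to the terminal, which can hide part of the text.
print(items[2])

# repr() shows the escape sequence, so the stray character becomes visible.
print(repr(items[2]))


def try_float(text):
    """Return float(text), or None if the text is not a valid float literal."""
    try:
        return float(text)
    except ValueError:
        return None


print(try_float(items[2]))    # the glued field cannot be converted
print(try_float("1.10E-06"))  # the clean field can
```

The helper try_float is only there to keep the demonstration from stopping at the first failure; it is not part of the original answer.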


5 Comments

Thank you Greg. When I added print(repr(items[2])), it printed the following: '1.10E-06\rGene2' Traceback (most recent call last): File "s6help.py", line 13, in <module> p_value = float(items[2]). So it seems there is a \rGene2 hidden in items[2]. My code calls .strip(), which I thought would remove the \r and \n. I changed it to .strip(\r), but that still did not remove it. I don't know what else to try; do you have any more ideas?
Well, that's definitely the problem. Note that .strip() only removes whitespace from the ends of the string, while your \r is in the middle of the string. You're now going to have to look at the CSV file format and the code you use to read the file. It's possible that your file might have only \r line endings, which isn't supported by default in Python. Does that seem likely?
Yes, this is possible, and I believe this is the problem. My line endings contain \r, and any attempt to remove or replace them only results in one long line, which is not what I want. Any suggestions on how to remove the \r but still keep separate rows?
Use \n instead of \r. The \r by itself is not a usual line terminator. Python normally handles both \n and \r\n (but \n is preferred).
Thank you so much! I was able to get the code to work simply by using the 'rU' read argument instead of just 'r', which basically removes the \r issue. Thank you so much, I don't know if I ever would have figured that out on my own!
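The 'rU' mode mentioned above belongs to older Python versions and is no longer accepted in recent Python 3 releases. The sketch below assumes Python 3, where opening a file in plain text mode already treats \r, \n and \r\n as line endings. The function name filter_rows, the temporary file names and the sample rows are all made up for the demonstration; the filtering logic follows the question's code.

```python
import os
import tempfile


def filter_rows(in_path, out_path, threshold=0.05):
    """Copy columns 1 and 3 of rows whose third column is <= threshold."""
    kept = []
    # Python 3 text mode translates "\r", "\n" and "\r\n" to "\n" on read.
    with open(in_path, "r") as in_file, open(out_path, "w") as out_file:
        for line in in_file:
            items = line.strip().split(",")
            # Skip short rows and rows whose name starts with "_".
            if len(items) < 3 or items[0].startswith("_"):
                continue
            p_value = float(items[2])
            if p_value > threshold:
                continue
            out_file.write(",".join([items[0], str(p_value)]) + "\n")
            kept.append((items[0], p_value))
    return kept


# Build a hypothetical input file that uses bare "\r" line endings.
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "input.csv")
out_path = os.path.join(tmp_dir, "output.csv")
with open(in_path, "wb") as f:
    f.write(b"Gene1,0.5,1.10E-06\rGene2,0.7,0.2\r_skip,0.1,0.01\r")

rows = filter_rows(in_path, out_path)
print(rows)
```

Iterating over the file object replaces the readline() loop from the question and stops by itself at the end of the file.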
1

Your file most likely has some unprintable character that is read. Try this:

    >>> a = '0.00001\x00'
    >>> a
    '0.00001\x00'
    >>> print(a)
    0.00001
    >>> float(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid literal for float(): 0.00001

You can see that a contains a NUL character, which shows up neither in the output of print nor in the exception message from float().
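One way to locate such characters, sketched here under the assumption of Python 3, is to list every character that str.isprintable() rejects. The helper names hidden_chars and clean_float are invented for this example. Dropping the characters only helps when they are stray bytes such as a trailing NUL; it would not repair a field where two rows have been merged.

```python
def hidden_chars(text):
    """Return (index, repr) pairs for every non-printable character in text."""
    return [(i, repr(ch)) for i, ch in enumerate(text) if not ch.isprintable()]


def clean_float(text):
    """Drop non-printable characters, then convert what is left to float."""
    cleaned = "".join(ch for ch in text if ch.isprintable())
    return float(cleaned.strip())


a = "0.00001\x00"
print(hidden_chars(a))  # position and escaped form of each hidden character
print(clean_float(a))   # conversion succeeds once the NUL is removed
```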
