
So I have a file of amino acids that I am trying to read:

    mdvfmkglskakegvvaaaektkqgvaeaagktkegvlyvgsktkegvvhgvatvaektk
    eqvtnvggavvtgvtavaqktvegagsiaaatgfvkkdqlgkneegapqegiledmpvdp
    dneayempseegyqdyepea

and I have a list of uppercase letters called aminoacids. The problem is that I cannot read the sequence because the letters are lowercase. I have been trying to make it uppercase. There is no trouble reading the file and I thought I had successfully converted its contents into a string (but maybe I haven't?).

    aminoacids = ['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y']
    content1 = fh.readline()  # first line, which is not the sequence
    #print content1
    charline1 = len(content1)-1  # number of characters in the first line
    #print charline1
    contentall = fh.readlines()  # each line is converted into a string and put into a list
    #print contentall
    numlines = len(contentall)  # number of elements in list = number of lines, not the first one
    #print numlines
    contentjoined = ''.join(contentall)  # list elements are combined, but this includes new lines as characters
    contentjoined = contentjoined.translate(None, "\n")
    contentjoined = contentjoined.translate(None,''.join([i for i in contentjoined if i not in aminoacids]))
    contentjoined = contentjoined.upper()
    print contentjoined
    numaa = len(contentjoined)
    print numaa  # this shouldn't be zero but it is

Why does this not work? What can I do to fix it? The code is inside a with block right now; that hasn't been a problem before, but is it now? numaa is 0 when it shouldn't be. I realize that I can just add lowercase letters to my list, but there should be a more "pythonic" way of fixing this.

3 Answers


Is it because you are making your string uppercase after you have already filtered it against aminoacids? Try moving contentjoined = contentjoined.upper() up so that it runs before the translate call that does the filtering.

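When you check against aminoacids, contentjoined is still fully lowercase, so none of its characters are found in the uppercase list. Every character therefore ends up in the set of characters to delete, and str.translate removes them all. It ends up looking like this: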

    >>> c = contentjoined.translate(None,''.join([i for i in contentjoined if i not in aminoacids]))
    >>> c
    ''

If you call upper first, you'll be comparing an uppercase string with a list of uppercase strings, so you'll actually have matches. It'll look like this:

    >>> contentjoined = contentjoined.upper()
    >>> c = contentjoined.translate(None,''.join([i for i in contentjoined if i not in aminoacids]))
    >>> c
    'MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA'

If you want to keep the string as lowercase letters, you can do the comparison on the uppercase form of each character while deleting the original character. That would look like this:

    >>> c = contentjoined.translate(None,''.join([i for i in contentjoined if i.upper() not in aminoacids]))
    >>> c
    'mdvfmkglskakegvvaaaektkqgvaeaagktkegvlyvgsktkegvvhgvatvaektkeqvtnvggavvtgvtavaqktvegagsiaaatgfvkkdqlgkneegapqegiledmpvdpdneayempseegyqdyepea'
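The two-argument str.translate(None, ...) form used above only exists in Python 2. As a rough sketch of the same "uppercase first, then filter" idea that also runs on Python 3, something like the following might work. The names AMINO_ACIDS and clean_sequence are made up for this illustration and are not from the question.

```python
# Hypothetical helper, not from the original post: uppercase first, then filter.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 letters from the question's list


def clean_sequence(raw):
    """Uppercase the raw text and keep only characters found in AMINO_ACIDS."""
    upper = raw.upper()  # normalize case BEFORE any membership test
    return "".join(ch for ch in upper if ch in AMINO_ACIDS)


raw = "mdvfmkglsk\nakegvvaaae\n"  # short made-up sample with embedded newlines
cleaned = clean_sequence(raw)
print(cleaned)       # MDVFMKGLSKAKEGVVAAAE
print(len(cleaned))  # 20
```

Because newlines are not in the set, they are dropped by the same filter, so a separate newline-stripping step is not needed in this version.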

1 Comment

I can't believe I missed that! I think my whole string was being emptied because of that.

The problem is in your translate() calls:

    contentjoined = contentjoined.translate(None, "\n")
    contentjoined = contentjoined.translate(None,''.join([i for i in contentjoined if i not in aminoacids]))

Here None is the translation table, and every character listed in the second argument is deleted from the string (I am not sure what data you have in contentjoined or aminoacids). For example, if you try:

    >>> temp = "this is a test string"
    >>> temp.translate(None, "aeiou")
    'ths s  tst strng'

So I am guessing every character in your string is being deleted, which leaves an empty string. Check out the translate() docs.
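For anyone on Python 3, where str.translate takes only a table, the deletion behaviour can be sketched with str.maketrans, whose third argument lists the characters to delete. This is an illustration only; the variable names are made up. It shows that the result is always a string, possibly an empty one, and never None.

```python
text = "this is a test string"

# The third argument to str.maketrans lists the characters to delete.
delete_vowels = str.maketrans("", "", "aeiou")
result = text.translate(delete_vowels)
print(repr(result))    # 'ths s  tst strng'
print(result is None)  # False

# If every character of the string is in the delete list, what is left
# is an empty string, which is why len() reports 0.
delete_everything = str.maketrans("", "", text)
emptied = text.translate(delete_everything)
print(repr(emptied))   # ''
```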

1 Comment

Yes, you could think of it that way. The whole string is being emptied because none of its characters match the list elements. But the reason they don't match is that they are lowercase, which is why I asked about making it uppercase.

When you pull in the file you could convert all the content to uppercase. Maybe something like this?

    with open('myfile.txt', 'r') as f:
        data = f.read().upper()
    print(repr(data))
    'MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTK\nEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDP\nDNEAYEMPSEEGYQDYEPEA\n'
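The question mentions that the first line of the file is not part of the sequence, and the output above still contains newlines. A possible way to handle both while reading is sketched below. The helper name read_sequence and the sample header line are assumptions made for this demo, not details from the question.

```python
import os
import tempfile


def read_sequence(path):
    """Return the sequence in `path` as one uppercase string.

    Assumes (as the question describes) that the first line is not sequence data.
    """
    with open(path, "r") as fh:
        fh.readline()  # discard the first line
        # strip() removes the trailing newline from each remaining line
        return "".join(line.strip() for line in fh).upper()


# Demo with a temporary file; the header text here is made up.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(">made-up header\nmdvfmkglsk\nakegvvaaae\n")
    demo_path = tmp.name

sequence = read_sequence(demo_path)
os.remove(demo_path)  # clean up the temporary file
print(sequence)       # MDVFMKGLSKAKEGVVAAAE
print(len(sequence))  # 20
```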
