1

I have a pandas DataFrame with the following structure:

mcve_data =

alfa    alfa_id    beta    beta_id
a,c     7          c,de    8
c,d     7          d,f     9
l,mnk   8          c,d     9
j,k     8          d,e     9
tk,l    8          n,k     11

  • I want to run a for loop over each row, reading the values from each key (alfa and beta) and its key index (alfa_id, beta_id).
  • If a key holds more than 3 values, or if any of its values is more than 1 character long, I want both the key value and the key index converted to a period (.).

Final expected output:

alfa    alfa_id    beta    beta_id
a,c     7          .       .
c,d     7          d,f     9
.       .          c,d     9
j,k     8          d,e     9
.       .          n,k     11

I wanted to write a function something like (but it hasn't worked properly):

def check_and_convert(mcve_data):
    labels = (l, l + id) for l in mcve_data.columns.values

    def convert(lines):
        for l,id in labels:
            if len(l) > 3:
                l = '.'
                id = '.'
        return l, id

    write this back to the file.

Any suggestions?

2 Answers

2

You could also skip the inner for loop by using the str accessor to check the length of every value in a column at once:

keys = [k for k in df.columns if not k.endswith('_id')]
for k in keys:
    df.loc[df[k].str.len()>3,[k,k+'_id']] = '.'
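Note that the check above looks at the total string length, which happens to give the expected result for the sample data. If the rule is meant literally (more than 3 comma-separated values, or any single value longer than 1 character), a variant along these lines might work. This is only a sketch: the sample DataFrame is rebuilt by hand, the id columns are assumed to be strings so that '.' can be assigned without a dtype change, and the names parts, too_many and too_long are made up for illustration.

```python
import pandas as pd

# Sample data from the question; the id columns are kept as strings
# (an assumption) so that '.' fits without changing the column dtype.
df = pd.DataFrame({
    "alfa": ["a,c", "c,d", "l,mnk", "j,k", "tk,l"],
    "alfa_id": ["7", "7", "8", "8", "8"],
    "beta": ["c,de", "d,f", "c,d", "d,e", "n,k"],
    "beta_id": ["8", "9", "9", "9", "11"],
})

keys = [k for k in df.columns if not k.endswith("_id")]
for k in keys:
    # Split each cell into its comma-separated values
    parts = df[k].str.split(",")
    # Condition 1: more than 3 values in the cell
    too_many = parts.str.len() > 3
    # Condition 2: any single value longer than 1 character
    too_long = parts.apply(lambda vals: any(len(v) > 1 for v in vals))
    # Blank out both the key and its id column on the matching rows
    df.loc[too_many | too_long, [k, k + "_id"]] = "."

print(df)
```

There is still one Python-level loop, but it runs over the key columns rather than over rows, so it should scale to hundreds of key/key_id pairs.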

1

You could use a for loop with iterrows(); see below.

import pandas as pd
from io import StringIO  # Python 3; the original used the Python 2 StringIO module

s = """alfa alfa_id beta beta_id
a,c 7 c,de 8
c,d 7 d,f 9
l,mnk 8 c,d 9
j,k 8 d,e 9
tk,l 8 n,k 11
"""

# Read every column as a string so the id columns can hold '.'
df = pd.read_csv(StringIO(s), sep=r'\s+', dtype=str)

# Create the lists of keys and key ids based on the '_id' suffix
keys = [i for i in df.columns if not i.endswith('_id')]
key_ids = [i + '_id' for i in keys]

for index, row in df.iterrows():
    for k, kid in zip(keys, key_ids):
        values = row[k].split(',')
        # More than 3 values, or any value longer than 1 character
        if len(values) > 3 or any(len(v) > 1 for v in values):
            df.at[index, kid] = '.'
            df.at[index, k] = '.'

print(df)

results in

  alfa alfa_id beta beta_id
0  a,c       7    .       .
1  c,d       7  d,f       9
2    .       .  c,d       9
3  j,k       8  d,e       9
4    .       .  n,k      11

4 Comments

Thanks very much for the answer. But I want to apply a for loop, the reason being that there are lots (hundreds) of keys and key_ids, not just 2.
I edited the code to add a second for loop over all the keys and key_ids.
Thanks for the update. Why did those int values get converted to strings like '11'? I will try to find a solution for it, but if you can do it without much hassle I would appreciate it.
pandas read_table was reading the data as int, so I changed the text data to strings. It should be fine now.
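
On the int-versus-string point: a column that holds '.' cannot stay an integer column, which is why the ids end up as strings. One possible alternative, sketched here and not part of the answer above, is to keep the ids numeric by marking the rejected cells as missing and only writing '.' when the data is exported. The nullable Int64 dtype, the tab separator, and the names bad and out are assumptions made for illustration.

```python
import pandas as pd
from io import StringIO

s = """alfa alfa_id beta beta_id
a,c 7 c,de 8
c,d 7 d,f 9
l,mnk 8 c,d 9
j,k 8 d,e 9
tk,l 8 n,k 11
"""

# Let pandas infer the dtypes, so the id columns are read as integers
df = pd.read_csv(StringIO(s), sep=r"\s+")

keys = [k for k in df.columns if not k.endswith("_id")]
for k in keys:
    parts = df[k].str.split(",")
    # Rows where the key has more than 3 values or a value longer than 1 character
    bad = (parts.str.len() > 3) | parts.apply(
        lambda vals: any(len(v) > 1 for v in vals)
    )
    # Mark rejected cells as missing instead of writing '.' into them
    df[k] = df[k].mask(bad)
    # Nullable Int64 keeps the remaining ids as integers next to missing values
    df[k + "_id"] = df[k + "_id"].astype("Int64").mask(bad)

# The period only appears in the exported text, via na_rep
out = df.to_csv(sep="\t", index=False, na_rep=".")
print(out)
```

With this approach the ids can still be used in arithmetic or comparisons inside pandas, and the '.' placeholder is purely a presentation detail of the output file.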
