I would like to find a pandas solution for the following problem (the real dataframe is very long, so performance is important):
I have an input dataframe df and need to build a new dataframe dfNew, where I need to derive the output in column 'rs' from the values of the other columns.
The required logic is the following:
- t always increases steadily from 0 to its maximum value, then starts again at 0.
- Whenever we are in the range from t = 0 up to and including the next upcoming pt = 'X', the value of column 'td' should be taken for the result column 'rs'; otherwise the value of column 'md' should be taken for column 'rs'.
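To make the rule concrete, here it is written as a slow row-by-row loop. This is only a reference sketch for checking a faster solution, not the pandas answer being asked for. The function name derive_rs_loop and the short sample lists are made up for illustration, and the handling of rows before the first t = 0 (they take 'md') is an assumption.

```python
def derive_rs_loop(td, md, t, pt):
    """Slow reference version of the rule (hypothetical helper name)."""
    rs = []
    # Assumption: rows before the very first t == 0 take their value from md.
    use_td = False
    for td_val, md_val, t_val, pt_val in zip(td, md, t, pt):
        if t_val == 0:
            use_td = True   # a new cycle starts: switch to td
        rs.append(td_val if use_td else md_val)
        if pt_val == 'X':
            use_td = False  # the 'X' row itself used td; later rows use md
    return rs


# Small made-up example: two cycles, the first one contains an 'X'.
sample = derive_rs_loop(
    td=['a0', 'a1', 'a2', 'a3', 'a4'],
    md=['b0', 'b1', 'b2', 'b3', 'b4'],
    t=[0, 1, 2, 0, 1],
    pt=['n', 'X', 'n', 'n', 'n'],
)
print(sample)  # ['a0', 'a1', 'b2', 'a3', 'a4']
```

A loop like this would be too slow for a very long dataframe, but it can serve as a ground truth when comparing vectorized candidates.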
What would a pandas-based solution to derive rs from the other columns look like?
    import pandas as pd

    td = ['td0', 'td1', 'td2', 'td3', 'td4', 'td5', 'td6', 'td7', 'td8', 'td9', 'td10', 'td11', 'td12']
    md = ['md0', 'md1', 'md2', 'md3', 'md4', 'md5', 'md6', 'md7', 'md8', 'md9', 'md10', 'md11', 'md12']
    t  = [0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 0, 1, 2]
    pt = ['n', 'n', 'X', 'n', 'n', 'n', 'n', 'X', 'n', 'n', 'n', 'X', 'n']

    df = pd.DataFrame({'td': td, 'md': md, 't': t, 'pt': pt}, columns=['td', 'md', 't', 'pt'])

df

          td    md  t  pt
    0    td0   md0  0   n
    1    td1   md1  1   n
    2    td2   md2  2   X
    3    td3   md3  3   n
    4    td4   md4  0   n
    5    td5   md5  1   n
    6    td6   md6  2   n
    7    td7   md7  3   X
    8    td8   md8  4   n
    9    td9   md9  5   n
    10  td10  md10  0   n
    11  td11  md11  1   X
    12  td12  md12  2   n

dfNew

          td    md  t  pt    rs
    0    td0   md0  0   n   td0
    1    td1   md1  1   n   td1
    2    td2   md2  2   X   td2
    3    td3   md3  3   n   md3
    4    td4   md4  0   n   td4
    5    td5   md5  1   n   td5
    6    td6   md6  2   n   td6
    7    td7   md7  3   X   td7
    8    td8   md8  4   n   md8
    9    td9   md9  5   n   md9
    10  td10  md10  0   n  td10
    11  td11  md11  1   X  td11
    12  td12  md12  2   n   md12
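One vectorized approach that seems to reproduce the dfNew shown above is sketched here. It has not been benchmarked on a long dataframe. The helper names cycle, is_x, x_before and use_td are made up for illustration, and sending rows before the very first t = 0 to 'md' is an assumption. The idea is to label each cycle with a running count of t == 0, then count how many 'X' rows came earlier in the same cycle.

```python
import pandas as pd

td = [f'td{i}' for i in range(13)]
md = [f'md{i}' for i in range(13)]
t = [0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 0, 1, 2]
pt = ['n', 'n', 'X', 'n', 'n', 'n', 'n', 'X', 'n', 'n', 'n', 'X', 'n']
df = pd.DataFrame({'td': td, 'md': md, 't': t, 'pt': pt})

# Each t == 0 starts a new cycle; the running count labels cycles 1, 2, 3, ...
cycle = (df['t'] == 0).cumsum()

# Number of 'X' rows seen earlier in the same cycle (current row excluded).
is_x = (df['pt'] == 'X').astype(int)
x_before = is_x.groupby(cycle).cumsum() - is_x

# td applies from t == 0 up to and including the first 'X' of the cycle.
# Assumption: rows before the very first t == 0 (cycle 0) fall back to md.
use_td = (x_before == 0) & (cycle > 0)

# Keep td where use_td is True, otherwise take md.
dfNew = df.assign(rs=df['td'].where(use_td, df['md']))
print(dfNew)
```

Everything here is a column-wise operation (cumsum, a grouped cumsum and Series.where), so there is no Python-level loop over rows. A cycle without any 'X' keeps 'td' for all of its rows, which matches reading the rule as "until the next upcoming 'X'".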