I have this .csv:

    col1,col2,col3,col4,col5
    247,19,1.0,2016-01-01 14:11:21,MP
    247,3,1.0,2016-01-01 14:23:43,MP
    247,12,1.0,2016-01-01 15:32:16,MP
    402,3,1.0,2016-01-01 12:11:15,?
    583,12,1.0,2016-01-01 02:33:57,?
    769,16,1.0,2016-01-01 03:12:24,?
    769,4,1.0,2016-01-01 03:22:29,?
    .....

I need to take the col2 values for each unique col1 element and make a new .csv like this:
Expected output:

    19,3,12
    3
    12
    16,4
    ...

That is, I want to keep writing col2 values on the same line while col1 stays the same; as soon as col1 changes, I start a new line and continue writing numbers.
I read the .csv like this and removed the duplicates from the list:

    import pandas as pd

    colnames = ['col1', 'col2', 'col3', 'col4', 'col5']
    df = pd.read_csv('sorted.csv', names=colnames)
    list1 = df.col1.tolist()
    list2 = list(set(list1))

Now things are getting hard for me. I'm a newbie in Python. My idea was to compare each element in list2 with each row in df and write the matching col2 elements to a new .csv. Could you help me, please?
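For reference, here is a minimal sketch of the grouping I am after, written with the standard library (`csv` and `itertools.groupby`) rather than pandas. It assumes the file has no header row and is already ordered so that equal col1 values sit next to each other. The names `SAMPLE`, `group_col2` and `write_groups` are made up for this illustration, and the sample is only the rows shown above.

```python
import csv
import io
from itertools import groupby

# Sample rows copied from the question (assumption: no header row in the file).
SAMPLE = """\
247,19,1.0,2016-01-01 14:11:21,MP
247,3,1.0,2016-01-01 14:23:43,MP
247,12,1.0,2016-01-01 15:32:16,MP
402,3,1.0,2016-01-01 12:11:15,?
583,12,1.0,2016-01-01 02:33:57,?
769,16,1.0,2016-01-01 03:12:24,?
769,4,1.0,2016-01-01 03:22:29,?
"""


def group_col2(rows):
    """Return one list of col2 values per run of consecutive equal col1 values."""
    # groupby only merges adjacent rows, so the input must already be
    # ordered by col1 (hypothetical helper, not part of any library).
    return [
        [row[1] for row in run]
        for _, run in groupby(rows, key=lambda row: row[0])
    ]


def write_groups(src, dst):
    """Read CSV rows from file object src, write grouped col2 values to dst."""
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerows(group_col2(csv.reader(src)))


out = io.StringIO()
write_groups(io.StringIO(SAMPLE), out)
print(out.getvalue(), end="")
# prints:
# 19,3,12
# 3
# 12
# 16,4
```

With real files, the two `io.StringIO` objects could be replaced by `open('sorted.csv', newline='')` and an output file opened with `'w'` and `newline=''` (the output file name is up to you). Avoiding `set` here is deliberate: a set drops the original order of col1, whereas `groupby` keeps the rows in file order.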