I’m trying to optimize my code to reduce the time it takes to run through the data set.
I’m working with a .csv file that has three columns: time_UTC, vmag2D and vdir. The data set has around 1,420,000 lines (one million, four hundred and twenty thousand).
This for loop took around 15 to 20 minutes to run on my Mac with the M1 processor. I’m sure I overcomplicated something for it to take this long; the processor should be more than capable of running this little piece of code faster.
```python
import pandas as pd

path_data = '" *insert a path here* "'
file = path_data + ' *name of the .csv file* '
data = pd.read_csv(file)

time_UTC = []
vmag2D = []
vdir = []

for i in range(len(data)):
    x = data.iloc[i][0]       # the whole line ends up in a single column
    x1 = x.split(' ')         # split the date from the rest of the line
    x2 = x1[1].split(';')     # split time, vmag2D and vdir
    date = x.split(' ')[0]
    time_UTC.append(x2[0])
    vmag2D.append(x2[1])
    vdir.append(x2[2])
```

The code is parsing each of the lines in the .csv file, and each of them follows the same "template": '1994-01-01 00:05:00;0.52;193'
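The per-row loop with `.iloc` is what makes this slow. Since the real delimiter in the file is the semicolon, one much faster option is to let pandas split the columns while reading, then split date from time in a single vectorized pass. Below is a minimal sketch under those assumptions; it assumes the file has no header row (adjust `header=` if it does), and the column names `datetime`, `date` and `time_UTC` are only illustrative:

```python
import pandas as pd

path_data = '" *insert a path here* "'
file = path_data + ' *name of the .csv file* '

# Let pandas split on ';' at C speed instead of looping row by row.
# Assumes no header row; pass header=0 instead if the file has one.
data = pd.read_csv(file, sep=';', header=None,
                   names=['datetime', 'vmag2D', 'vdir'])

# Split '1994-01-01 00:05:00' into date and time in one vectorized pass.
parts = data['datetime'].str.split(' ', n=1, expand=True)
data['date'] = parts[0]
data['time_UTC'] = parts[1]
```

This replaces roughly 1.4 million Python-level `.iloc` lookups and per-row string splits with pandas' C parser and vectorized string methods, so it should finish in seconds rather than minutes on an M1.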