I have two dataframes of different lengths: dfSamples (63012375 rows) and dfFixations (200000 rows).

    import numpy as np
    import pandas as pd

    dfSamples = pd.DataFrame({'tSample': [4, 6, 8, 10, 12, 14]})
    dfFixations = pd.DataFrame({'tStart': [4, 12], 'tEnd': [8, 14]})

For each value in dfSamples, I would like to check whether it falls within any of the ranges given in dfFixations and then assign a label to that value. I have found this: "Check if value in a dataframe is between two values in another dataframe", but the loop solution is terribly slow and I cannot make any of the other solutions work.

Working (but very slow) example:

    # Start with every sample labelled 'no_fixation'
    labels = np.full(len(dfSamples), 'no_fixation', dtype=object)
    # One full pass over dfSamples per fixation: this is the slow part
    for i, fixation in dfFixations.iterrows():
        log_range = dfSamples['tSample'].between(fixation['tStart'], fixation['tEnd'])
        labels[log_range] = 'fixation'
    dfSamples['labels'] = labels

Following this example: "Performance of Pandas apply vs np.vectorize to create new column from existing columns", I have tried to vectorize this, but with no success.


    def check_range(samples, tstart, tend):
        log_range = (samples > tstart) & (samples < tend)
        return log_range

    fixations = list(map(check_range, dfSamples['tSample'], dfFixations['tStart'], dfFixations['tEnd']))

I would appreciate any help!
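
For reference, here is a minimal sketch of one possible vectorized approach, not necessarily the fastest one. It sorts the fixations by start time and uses `np.searchsorted` to find, for each sample, the last fixation that starts at or before it. A running maximum of the end times then tells whether any of those fixations still covers the sample. The helper name `label_samples` is made up for illustration, and the bounds are treated as inclusive to match `between` in the loop above.

```python
import numpy as np
import pandas as pd

dfSamples = pd.DataFrame({'tSample': [4, 6, 8, 10, 12, 14]})
dfFixations = pd.DataFrame({'tStart': [4, 12], 'tEnd': [8, 14]})

def label_samples(samples, starts, ends):
    """Hypothetical helper: label each sample by membership in any [start, end] range."""
    samples = np.asarray(samples)
    starts = np.asarray(starts)
    ends = np.asarray(ends)

    # Sort the ranges by start time so searchsorted can be used.
    order = np.argsort(starts)
    starts = starts[order]
    # Running maximum of end times: largest end among ranges starting at or before position i.
    # This keeps the check correct even if ranges overlap.
    max_end = np.maximum.accumulate(ends[order])

    # Index of the last range whose start is <= sample (-1 if there is none).
    idx = np.searchsorted(starts, samples, side='right') - 1
    # A sample is covered if such a range exists and some range up to idx ends at or after it.
    inside = (idx >= 0) & (samples <= max_end[np.clip(idx, 0, None)])
    return np.where(inside, 'fixation', 'no_fixation')

dfSamples['labels'] = label_samples(
    dfSamples['tSample'], dfFixations['tStart'], dfFixations['tEnd']
)
print(dfSamples)
```

This replaces the per-fixation pass over dfSamples with one sort of the fixations and one binary search per sample. With tens of millions of rows, storing the result as a boolean or categorical column instead of strings would likely use much less memory.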