I have a dataset that consists of ID (participant), run, indexnumber (that is, an index number of a slalom turn) and performance (that could be velocity or time). In addition, I have information for each id and run where in the slalom turn (that is, the index) they actually start to turn.
My goal is to create a new column in the dataframe that contain 0 if the id has not started to turn and 1 if they have started to turn. This column could be called phase.
For example: For ID1 the point where this skier starts to turn i index 4 for the first run and 9 for the second run. Therefore, I want all rows in the new column to contain 0s until index nr 4 and 1s thereafter (for the first run). For the second run I want all rows to contain 0s until index nr 9 and 1 thereafter.
Is there a simple way to do this with pandas or vanilla python?
example = [[1.0, 1.0, 1.0, 0.6912982024915187], [1.0, 1.0, 2.0, 0.16453900411106737], [1.0, 1.0, 3.0, 0.11362801727310845], [1.0, 1.0, 4.0, 0.587778444335624], [1.0, 1.0, 5.0, 0.8455388913351765], [1.0, 1.0, 6.0, 0.5719366584505648], [1.0, 1.0, 7.0, 0.4665520044952449], [1.0, 1.0, 8.0, 0.9105152709573275], [1.0, 1.0, 9.0, 0.4600099001744885], [1.0, 1.0, 10.0, 0.8577060884077763], [1.0, 2.0, 1.0, 0.11550722410813963], [1.0, 2.0, 2.0, 0.5729090378222077], [1.0, 2.0, 3.0, 0.43990164344919824], [1.0, 2.0, 4.0, 0.595242293948498], [1.0, 2.0, 5.0, 0.443684017624451], [1.0, 2.0, 6.0, 0.3608135854303052], [1.0, 2.0, 7.0, 0.28525404982906766], [1.0, 2.0, 8.0, 0.11561422303194391], [1.0, 2.0, 9.0, 0.8579134051748011], [1.0, 2.0, 10.0, 0.540598113345226], [2.0, 1.0, 1.0, 0.4058570295736075], [2.0, 1.0, 2.0, 0.9422426000325298], [2.0, 1.0, 3.0, 0.7918655742964762], [2.0, 1.0, 4.0, 0.4145753321336241], [2.0, 1.0, 5.0, 0.5256388261997529], [2.0, 1.0, 6.0, 0.8140335187050629], [2.0, 1.0, 7.0, 0.12134416740848841], [2.0, 1.0, 8.0, 0.9016748379372173], [2.0, 1.0, 9.0, 0.462241316800442], [2.0, 1.0, 10.0, 0.7839715857746699], [2.0, 2.0, 1.0, 0.5300527244824904], [2.0, 2.0, 2.0, 0.8784844676567194], [2.0, 2.0, 3.0, 0.14395673182343738], [2.0, 2.0, 4.0, 0.7606405990262495], [2.0, 2.0, 5.0, 0.5123048342846208], [2.0, 2.0, 6.0, 0.25608277502943655], [2.0, 2.0, 7.0, 0.4264542956426933], [2.0, 2.0, 8.0, 0.9144976708651866], [2.0, 2.0, 9.0, 0.875888479621729], [2.0, 2.0, 10.0, 0.3428732760552141]] turnPhaseId1 = [4,9] #the index number when ID1 starts to turn in run 1 and run 2, respectively turnPhaseId2 = [2,5] #the index number when ID2 starts to turn in run 1 and run 2, respectively pd.DataFrame(example, columns=['id', 'run', 'index', 'performance'])