0

I have a dataset that consists of ID (participant), run, indexnumber (that is, an index number of a slalom turn) and performance (that could be velocity or time). In addition, I have information for each id and run where in the slalom turn (that is, the index) they actually start to turn.

My goal is to create a new column in the dataframe that contain 0 if the id has not started to turn and 1 if they have started to turn. This column could be called phase.

For example: For ID1 the point where this skier starts to turn i index 4 for the first run and 9 for the second run. Therefore, I want all rows in the new column to contain 0s until index nr 4 and 1s thereafter (for the first run). For the second run I want all rows to contain 0s until index nr 9 and 1 thereafter.

Is there a simple way to do this with pandas or vanilla python?

example = [[1.0, 1.0, 1.0, 0.6912982024915187], [1.0, 1.0, 2.0, 0.16453900411106737], [1.0, 1.0, 3.0, 0.11362801727310845], [1.0, 1.0, 4.0, 0.587778444335624], [1.0, 1.0, 5.0, 0.8455388913351765], [1.0, 1.0, 6.0, 0.5719366584505648], [1.0, 1.0, 7.0, 0.4665520044952449], [1.0, 1.0, 8.0, 0.9105152709573275], [1.0, 1.0, 9.0, 0.4600099001744885], [1.0, 1.0, 10.0, 0.8577060884077763], [1.0, 2.0, 1.0, 0.11550722410813963], [1.0, 2.0, 2.0, 0.5729090378222077], [1.0, 2.0, 3.0, 0.43990164344919824], [1.0, 2.0, 4.0, 0.595242293948498], [1.0, 2.0, 5.0, 0.443684017624451], [1.0, 2.0, 6.0, 0.3608135854303052], [1.0, 2.0, 7.0, 0.28525404982906766], [1.0, 2.0, 8.0, 0.11561422303194391], [1.0, 2.0, 9.0, 0.8579134051748011], [1.0, 2.0, 10.0, 0.540598113345226], [2.0, 1.0, 1.0, 0.4058570295736075], [2.0, 1.0, 2.0, 0.9422426000325298], [2.0, 1.0, 3.0, 0.7918655742964762], [2.0, 1.0, 4.0, 0.4145753321336241], [2.0, 1.0, 5.0, 0.5256388261997529], [2.0, 1.0, 6.0, 0.8140335187050629], [2.0, 1.0, 7.0, 0.12134416740848841], [2.0, 1.0, 8.0, 0.9016748379372173], [2.0, 1.0, 9.0, 0.462241316800442], [2.0, 1.0, 10.0, 0.7839715857746699], [2.0, 2.0, 1.0, 0.5300527244824904], [2.0, 2.0, 2.0, 0.8784844676567194], [2.0, 2.0, 3.0, 0.14395673182343738], [2.0, 2.0, 4.0, 0.7606405990262495], [2.0, 2.0, 5.0, 0.5123048342846208], [2.0, 2.0, 6.0, 0.25608277502943655], [2.0, 2.0, 7.0, 0.4264542956426933], [2.0, 2.0, 8.0, 0.9144976708651866], [2.0, 2.0, 9.0, 0.875888479621729], [2.0, 2.0, 10.0, 0.3428732760552141]] turnPhaseId1 = [4,9] #the index number when ID1 starts to turn in run 1 and run 2, respectively turnPhaseId2 = [2,5] #the index number when ID2 starts to turn in run 1 and run 2, respectively pd.DataFrame(example, columns=['id', 'run', 'index', 'performance']) 

1 Answer 1

1

I believe it is a better idea to turnPhase into a dictionary, and then use apply:

turn_dict = {1: [4, 9], 2: [2, 5]} 

We also need to change the column types as we need to reach dictionary keys, and list indexes, which are int:

df['id'] = df['id'].astype(int) df['index'] = df['index'].astype(int) 

Finally, apply:

df['new_column'] = df.apply(lambda x: 0 if x['index'] < turn_dict[x['id']][int(x['run'] -1)] else 1 , axis=1) 
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.