
Hello, I'm filtering a DataFrame based on an if condition, but I have to repeat the same code three times, once in each branch, and I don't want to do that. Does anyone have an idea how to remove the duplication? Here is the code example:

if sexe == "male":
    new_df = (
        df.where(F.col("sexe") == 1)
        .where(F.col("column_flag") == False)
        .withColumn("new_column", F.col("column1") / F.col("column3"))
    )
elif sexe == "female":
    new_df = (
        df.where(F.col("sexe") == 2)
        .where(F.col("column_flag") == False)
        .withColumn("new_column", F.col("column1") / F.col("column3"))
    )
else:
    new_df = df.where(F.col("column_flag") == False).withColumn(
        "new_column", F.col("column1") / F.col("column3")
    )
  • Are you asking for a more efficient method, or simply a way to avoid duplicate code? Commented Nov 23, 2022 at 10:36
  • Simply a way to avoid duplicate code, but if you have any other suggestion I'll take it. Thank you. Commented Nov 23, 2022 at 10:38

1 Answer


One way is to build the filtering expression then use it to filter the dataframe:

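from pyspark.sql import functions as F

# Base condition shared by every branch: keep rows where column_flag is False.
filter_expr = ~F.col("column_flag")

# & binds tighter than ==, so each comparison needs its own parentheses.
if sexe == "male":
    filter_expr = filter_expr & (F.col("sexe") == 1)
elif sexe == "female":
    filter_expr = filter_expr & (F.col("sexe") == 2)

new_df = df.filter(filter_expr).withColumn(
    "new_column", F.col("column1") / F.col("column3")
)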
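One thing to watch when combining conditions with &: in Python, & has higher precedence than ==, so a comparison that follows & should be wrapped in its own parentheses. The small sketch below uses only the standard ast module to show how Python groups the two forms. The names flag and col are hypothetical placeholders, not Spark columns, and nothing here touches PySpark itself.

```python
import ast


def top_level_node(expr: str) -> str:
    """Return the name of the outermost AST node of a Python expression."""
    return type(ast.parse(expr, mode="eval").body).__name__


# Without parentheses the comparison is outermost:
# Python reads this as (flag & col) == 1.
unparenthesized = top_level_node("flag & col == 1")

# With parentheses the & is outermost and joins two separate conditions.
parenthesized = top_level_node("flag & (col == 1)")

print(unparenthesized)  # Compare
print(parenthesized)    # BinOp
```

So writing the filter as filter_expr & (F.col("sexe") == 1) keeps the comparison as one operand of &, which is what the filter is meant to express.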