
I am new to PySpark and was wondering if you could guide me on how to convert the following SAS code to PySpark.

SAS Code:

If ColA > 0 Then Do;
    If ColB Not In ('B') and ColC <= 0 Then Do;
        New_Col = Sum(ColA, ColR, ColP);
    End;
    Else Do;
        New_Col = Sum(ColA, ColR);
    End;
End;
Else Do;
    If ColB Not In ('B') and ColC <= 0 Then Do;
        New_Col = Sum(ColR, ColP);
    End;
    Else Do;
        New_Col = ColR;
    End;
End;

Currently, below is the PySpark logic that I am using:

df.withColumn('New_Col',
    when(ColA > 0 & ColB.isin(['B']) == False & ColC <= 0,
         col('ColA') + Col('ColR') + Col('ColP'))
    ... ...

Is this the optimal approach, or is there a better way to code it?

Thank you for your guidance!

1 Answer


Your code is as good as it needs to be; however, each condition should be wrapped in parentheses, because Python's & operator binds more tightly than comparison operators such as > and <=:

from pyspark.sql import functions as F

(df
 .withColumn('New_Col',
             F.when((F.col('ColA') > 0)
                    & (F.col('ColB').isin(['B']) == False)
                    & (F.col('ColC') <= 0),
                    F.col('ColA') + F.col('ColR') + F.col('ColP'))
 )
)
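
For reference, here is a minimal sketch of the complete translation, covering all four SAS branches with a chained when/otherwise. It assumes the first SAS condition is ColA > 0, as in your own attempt, and the name shared_cond is introduced only for illustration:

from pyspark.sql import functions as F

# Shared inner condition used by both outer branches:
# ColB Not In ('B') and ColC <= 0
shared_cond = (F.col('ColB').isin(['B']) == False) & (F.col('ColC') <= 0)

df = df.withColumn(
    'New_Col',
    # ColA > 0 and the inner condition holds
    F.when((F.col('ColA') > 0) & shared_cond,
           F.col('ColA') + F.col('ColR') + F.col('ColP'))
     # ColA > 0, inner condition fails
     .when(F.col('ColA') > 0,
           F.col('ColA') + F.col('ColR'))
     # ColA <= 0 (or null) and the inner condition holds
     .when(shared_cond,
           F.col('ColR') + F.col('ColP'))
     # everything else
     .otherwise(F.col('ColR'))
)

Two things worth noting. First, ~F.col('ColB').isin('B') is the more idiomatic negation than comparing with == False. Second, SAS Sum() ignores missing values, whereas + in Spark returns null if any operand is null; if you need the SAS behaviour, you could wrap each column in F.coalesce(F.col('ColR'), F.lit(0)) before adding.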