I am new to PySpark and was wondering if you can guide me on how I can convert the following SAS code to PySpark.
SAS Code:
If ColA > 0 Then Do;
    If ColB Not In ('B') and ColC <= 0 Then Do;
        New_Col = Sum(ColA, ColR, ColP);
    End;
    Else Do;
        New_Col = Sum(ColA, ColR);
    End;
End;
Else Do;
    If ColB Not In ('B') and ColC <= 0 Then Do;
        New_Col = Sum(ColR, ColP);
    End;
    Else Do;
        New_Col = ColR;
    End;
End;

Currently, below is the PySpark logic that I am using:
df.withColumn('New_Col', when((col('ColA') > 0) & (~col('ColB').isin('B')) & (col('ColC') <= 0), col('ColA') + col('ColR') + col('ColP')) ... ... (Note: the comparisons must be wrapped in parentheses, since `&` binds more tightly than `>` and `<=` in Python, and negation is written with `~` rather than `== False`.) Is this the best approach, or is there a better way to write it?
Thank you for your guidance!