You can add a new column whose value is either a custom constant or computed dynamically from the existing columns.
For example, given this input:

| ColumnA | ColumnB |
|---------|---------|
| 10      | 15      |
| 10      | 20      |
| 10      | 30      |

adding a new ColumnC defined as ColumnA + ColumnB gives:

| ColumnA | ColumnB | ColumnC |
|---------|---------|---------|
| 10      | 15      | 25      |
| 10      | 20      | 30      |
| 10      | 30      | 40      |

This can be done by mapping a function over the DataFrame's RDD:
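
```python
from pyspark.sql import Row

# Function that adds the new column to a single row
def customColumnVal(row):
    rd = row.asDict()                                # Row -> plain dict
    rd["ColumnC"] = row["ColumnA"] + row["ColumnB"]  # compute the new value
    new_row = Row(**rd)                              # dict -> Row
    return new_row

# Convert the DataFrame to an RDD
df_rdd = input_dataframe.rdd

# Apply the function to every row, then convert back to a DataFrame
output_dataframe = df_rdd.map(customColumnVal).toDF()
```

Here `input_dataframe` is the source DataFrame and `customColumnVal` contains the logic that adds the new column. The source DataFrame is not changed in place; the result is a new DataFrame, `output_dataframe`.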