formatting improvement
samkart

You can add a new column whose value is a custom value or is calculated dynamically from the existing columns.

For example, given:

| ColumnA | ColumnB |
|---------|---------|
| 10      | 15      |
| 10      | 20      |
| 10      | 30      |

and a new ColumnC defined as ColumnA + ColumnB:

| ColumnA | ColumnB | ColumnC |
|---------|---------|---------|
| 10      | 15      | 25      |
| 10      | 20      | 30      |
| 10      | 30      | 40      |

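This can be done by mapping a function over the DataFrame's RDD:

    from pyspark.sql import Row

    # Build a new Row that carries the extra column
    def customColumnVal(row):
        rd = row.asDict()
        rd["ColumnC"] = row["ColumnA"] + row["ColumnB"]
        new_row = Row(**rd)
        return new_row

    # Convert the DataFrame to an RDD
    df_rdd = input_dataframe.rdd

    # Apply the function to each row and convert the result back to a DataFrame
    output_dataframe = df_rdd.map(customColumnVal).toDF()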

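`input_dataframe` is the source DataFrame, and `customColumnVal` is the function that adds the new column to each row. DataFrames are immutable, so `input_dataframe` itself is not changed; the result is returned as the new `output_dataframe`.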
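If you want to check the per-row logic without starting a Spark session, here is a rough plain-Python sketch. It assumes ordinary dictionaries as stand-ins for what `Row.asDict()` returns, and the names `custom_column_val`, `input_rows` and `output_rows` are made up for illustration; they are not part of PySpark.

```python
# Plain-Python stand-in for the row-wise mapping; no Spark session required.
# Each dict plays the role of the dictionary returned by Row.asDict().

def custom_column_val(row_dict):
    """Return a copy of the row with ColumnC = ColumnA + ColumnB."""
    rd = dict(row_dict)  # copy so the input row is left untouched
    rd["ColumnC"] = rd["ColumnA"] + rd["ColumnB"]
    return rd

# The three rows from the example table above
input_rows = [
    {"ColumnA": 10, "ColumnB": 15},
    {"ColumnA": 10, "ColumnB": 20},
    {"ColumnA": 10, "ColumnB": 30},
]

# Similar in spirit to df_rdd.map(customColumnVal), but on a local list
output_rows = list(map(custom_column_val, input_rows))

for row in output_rows:
    print(row)
```

The function returns a new dictionary rather than editing its argument, which mirrors how the Spark version produces a new DataFrame instead of altering the input.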

added 5 characters in body
Zsolt Meszaros