PySpark: add a new field to a data frame Row element

Question

I have the following element:

a = Row(ts=1465326926253, myid=u'1234567', mytype=u'good')

The Row is an instance of the Spark DataFrame Row class. I want to append a new field to a, so that a would look like:

a = Row(ts=1465326926253, myid=u'1234567', mytype=u'good', name = u'john')

Rob · Accepted Answer · 2017-08-01 00:40:33Z

Here is an updated answer that works. First convert the Row to a dictionary, then update the dict, and finally build a new PySpark Row from it.

Code is as follows:

from pyspark.sql import Row

# Create the PySpark Row
row = Row(field1=12345, field2=0.0123, field3=u'Last Field')

# Convert it to a Python dict
temp = row.asDict()

# Do whatever you want to the dict, such as adding a new field
temp["field4"] = "it worked!"

# Build a new PySpark Row from the updated dict
output = Row(**temp)

# How it looks
output
# Row(field1=12345, field2=0.0123, field3=u'Last Field', field4='it worked!')

With this solution, the new row is not guaranteed to keep the same column order.

Community · Accepted Answer · 2018-07-29 15:12:55Z

You cannot add a new field to a Row. Row is a subclass of tuple

from pyspark.sql import Row

issubclass(Row, tuple)
## True

isinstance(Row(), tuple)
## True

and Python tuples are immutable. All you can do is create a new one:

row = Row(ts=1465326926253, myid=u'1234567', mytype=u'good')

# In legacy Python: Row(name=u"john", **row.asDict())
Row(**row.asDict(), name=u"john")
## Row(myid='1234567', mytype='good', name='john', ts=1465326926253)

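Please note that Row keeps its fields sorted by name in Spark versions before 3.0; from Spark 3.0 onward, fields passed as keyword arguments keep the order in which they were given.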
