
I have the following element:

a = Row(ts=1465326926253, myid=u'1234567', mytype=u'good') 

Here a is an instance of the Spark DataFrame Row class. I want to append a new field to a, so that it looks like:

a = Row(ts=1465326926253, myid=u'1234567', mytype=u'good', name=u'john')

2 Answers


Here is an updated answer that works. First convert the Row to a dictionary, then update the dictionary, and finally build a new PySpark Row from it.

Code is as follows:

    from pyspark.sql import Row

    # Create the PySpark Row
    row = Row(field1=12345, field2=0.0123, field3=u'Last Field')

    # Convert it to a Python dict
    temp = row.asDict()

    # Do whatever you want to the dict, such as adding a new field
    temp["field4"] = "it worked!"

    # Build a new Row from the updated dict
    output = Row(**temp)

    # How it looks
    output
    # Row(field1=12345, field2=0.0123, field3=u'Last Field', field4='it worked!')

2 Comments

YOU DA REAL MVP
With this solution keeping the same order of columns in the new row is not guaranteed.

You cannot add a new field to a Row. Row is a subclass of tuple:

    from pyspark.sql import Row

    issubclass(Row, tuple)
    ## True

    isinstance(Row(), tuple)
    ## True

and Python tuples are immutable. All you can do is create a new one:

    row = Row(ts=1465326926253, myid=u'1234567', mytype=u'good')

    # In legacy Python: Row(name=u"john", **row.asDict())
    Row(**row.asDict(), name=u"john")
    ## Row(myid='1234567', mytype='good', name='john', ts=1465326926253)

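Please note that Row keeps its fields sorted by name. This is the behavior in Spark versions before 3.0; from Spark 3.0 on Python 3.6+, fields passed as named arguments keep the order in which they were given.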
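
The immutability point can be tried out without a Spark installation. The sketch below is only an analogy: it uses `collections.namedtuple` from the standard library as a stand-in for `Row`, since both are tuple subclasses with named fields. The names `Record` and `with_field` are hypothetical and are not part of PySpark.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (assumption: used only so this runs without Spark).
Record = namedtuple("Record", ["ts", "myid", "mytype"])
a = Record(ts=1465326926253, myid="1234567", mytype="good")


def with_field(record, name, value):
    """Return a new named tuple with one extra field appended.

    The input record is left unchanged; field order is preserved.
    """
    fields = record._fields + (name,)
    Extended = namedtuple(type(record).__name__, fields)
    return Extended(*record, value)


# Trying to attach a field in place fails, because tuples are immutable.
try:
    a.name = "john"
    mutated = True
except AttributeError:
    mutated = False

# Building a new object is the only way to "append" a field.
b = with_field(a, "name", "john")
print(mutated)
print(b)
```

With an actual `Row`, the equivalent step is the `Row(**row.asDict(), name=u"john")` call shown in the answer, which likewise returns a new object rather than modifying the existing one.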
