13

I am trying to add a new row to a DataFrame, but I can't.

my code:

newRow = Row(id='ID123')
newDF = df.insertInto(newRow)
or
newDF = df.union(newRow)

errors:

AttributeError: _jdf
AttributeError: 'DataFrame' object has no attribute 'insertInto'
1
  • This might be something you are looking for. Try from pyspark.sql import Row, create a dictionary and then update the dictionary. stackoverflow.com/questions/39801691/… Commented Nov 29, 2017 at 15:34

3 Answers

20

A simple way to add a row to a DataFrame using PySpark:

newRow = spark.createDataFrame([(15, 'Alk', 'Dhl')])
df = df.union(newRow)
df.show()

-1

Try: (Documentation)

from pyspark.sql import Row
# sc is an existing SparkContext
newDF = sc.parallelize([Row(id='ID123')]).toDF()
newDF.show()

4 Comments

It is creating newDF rather than adding a new row.
DataFrames, like RDDs, are immutable, and hence a new one is always created by any operation.
I'm confused. Where is the original df in this response? Not seeing how this answers the original question.
This is not a helpful answer. There is no indication that a dataFrame is being appended to. Alkesh Mahajan's answer is correct.
-6

An operation like this is completely useless in practice. A Spark DataFrame is a data structure designed for bulk analytical jobs. It is not intended for fine-grained updates.

Although you can create a single-row DataFrame (as shown by i-n-n-m) and union it, this won't scale and won't truly distribute the data: Spark will have to keep a local copy of the data, and the execution plan will grow linearly with the number of inserted objects.

Please consider using a proper database instead.

1 Comment

I needed it just for testing.
