
The following code worked for me before, but not anymore. I got the error:

AttributeError: 'DataFrame' object has no attribute 'toDF'

if __name__ == "__main__":
    sc = SparkContext(appName="test")
    sqlContext = SQLContext(sc)
    df = sqlContext.read.format('com.databricks.spark.csv') \
        .options(header='false', delimiter=',', inferSchema='true') \
        .load('test')
    # rename columns
    df = df.toDF('a', 'b', 'c')
    ...
    sc.stop()
  • What are you trying to accomplish? Commented Jul 26, 2016 at 16:28
  • assign the column names to a data frame Commented Jul 26, 2016 at 16:48
  • Possible duplicate of How to change dataframe column names in pyspark? Commented Jul 26, 2016 at 17:30
  • I am aware of that post. I am just thinking 'toDF' is more convenient and it worked for me before. spark.apache.org/docs/1.6.1/api/python/pyspark.sql.html Commented Jul 26, 2016 at 21:02
  • I figured it out. Looks like it has to do with our spark version. It worked with 1.6. Commented Jul 27, 2016 at 20:32

2 Answers


I figured it out. It looks like it has to do with our Spark version: it worked with 1.6.




If you are working with Spark version 1.6, use this code to convert an RDD into a DataFrame:

from pyspark.sql import SQLContext, Row

sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame(rdd)

If you want to assign names to the columns, use this:

df = rdd.map(lambda p: Row(ip=p[0], time=p[1], zone=p[2]))

Here ip, time, and zone become the column names.

