
The following code worked for me before, but not anymore. I got the error:

AttributeError: 'DataFrame' object has no attribute 'toDF'

if __name__ == "__main__":
    sc = SparkContext(appName="test")
    sqlContext = SQLContext(sc)
    df = sqlContext.read.format('com.databricks.spark.csv') \
        .options(header='false', delimiter=',', inferSchema='true') \
        .load('test')
    # rename columns
    df = df.toDF('a', 'b', 'c')
    ...
    sc.stop()
  • What are you trying to accomplish? Commented Jul 26, 2016 at 16:28
  • assign the column names to a data frame Commented Jul 26, 2016 at 16:48
  • Possible duplicate of How to change dataframe column names in pyspark? Commented Jul 26, 2016 at 17:30
  • I am aware of that post. I am just thinking 'toDF' is more convenient and it worked for me before. spark.apache.org/docs/1.6.1/api/python/pyspark.sql.html Commented Jul 26, 2016 at 21:02
  • I figured it out. Looks like it has to do with our spark version. It worked with 1.6. Commented Jul 27, 2016 at 20:32

2 Answers


I figured it out. It looks like it has to do with our Spark version: it worked with 1.6.




If you are working with Spark version 1.6, use this code to convert an RDD into a DataFrame:

from pyspark.sql import SQLContext, Row

sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame(rdd)

If you want to assign names to the columns, use this:

df = rdd.map(lambda p: Row(ip=p[0], time=p[1], zone=p[2]))

Here ip, time, and zone become the column names.

