
A Spark DataFrame, df, has the following column names:

scala> df.columns
res6: Array[String] = Array(Age, Job, Marital, Education, Default, Balance, Housing, Loan, Contact, Day, Month, Duration, Campaign, pdays, previous, poutcome, Approved)

And an SQL query on df by column name works fine:

scala> spark.sql(""" select Age from df limit 2 """).show()
+---+
|Age|
+---+
| 30|
| 33|
+---+
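(For spark.sql to resolve the table name df, the DataFrame presumably was registered as a temporary view beforehand; a minimal sketch of that assumed setup step, for a Spark 2.x SparkSession:)

// Assumed setup: expose the DataFrame to SQL under the name "df"
df.createOrReplaceTempView("df")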

But when I try to use withColumn on df, I run into problems:

scala> val dfTemp = df.withColumn("temp", df.Age.cast(DoubleType)).drop("Age").withColumnRenamed("temp", "Age")
<console>:38: error: value Age is not a member of org.apache.spark.sql.DataFrame

The above code is taken from here.

Thanks

1 Answer


df.Age is not a valid way of referencing a column of a DataFrame in Scala. The correct way is:

val dfTemp = df.withColumn("temp", df("Age").cast(DoubleType)) 

or you can do

val dfTemp = df.withColumn("temp", df.col("Age").cast(DoubleType)) 

or, importing the col function:

import org.apache.spark.sql.functions.col
val dfTemp = df.withColumn("temp", col("Age").cast(DoubleType))
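All three forms are equivalent here; the practical difference is that df("Age") and df.col("Age") are resolved against that specific DataFrame immediately, while col("Age") stays unresolved until the query plan is analyzed, which matters mainly when the same column name exists on both sides of a join.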

Note: df.withColumn("temp", df.Age.cast(DoubleType())) is valid in PySpark.
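Putting it together, here is a minimal sketch of the original cast-drop-rename pipeline from the question using the col form (the DoubleType import is spelled out here, since the question's session already had it in scope):

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// Cast Age to Double in a temporary column, drop the original, rename back
val dfTemp = df
  .withColumn("temp", col("Age").cast(DoubleType))
  .drop("Age")
  .withColumnRenamed("temp", "Age")

Since withColumn replaces an existing column when given the same name, the drop/rename dance can also be skipped entirely: df.withColumn("Age", col("Age").cast(DoubleType)) achieves the same result in one step.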


