
I read a .csv file into a Spark DataFrame. For a DoubleType column, is there a way to specify at read time that the column should be rounded to 2 decimal places? I'm also supplying a custom schema to the DataFrameReader API call. Here are my schema and API calls:

    val customSchema = StructType(Array(
      StructField("id_1", IntegerType, true),
      StructField("id_2", IntegerType, true),
      StructField("id_3", DoubleType, true)
    ))

    // using Spark's CSV reader with custom schema
    // spark == SparkSession()
    val parsedSchema = spark.read.format("csv")
      .schema(customSchema)
      .option("header", "true")
      .option("nullvalue", "?")
      .load("C:\\Scala\\SparkAnalytics\\block_1.csv")

After the file is read into a DataFrame, I can round the decimals like this:

    parsedSchema.withColumn("cmp_fname_c1", round($"cmp_fname_c1", 3))

But this creates a new DataFrame, so I'd also like to know if it can be done in-place instead of creating a new DataFrame.

Thanks

  • In-place changes are not allowed in Spark Dataframes. They are immutable. Commented May 1, 2018 at 5:43
  • Is there any specific reason why creating a new DataFrame from an existing DataFrame is an issue for you? Commented May 1, 2018 at 5:43
  • Spark dataframes are immutable and any operation which transforms the existing dataframe creates a new dataframe. Commented May 1, 2018 at 5:44
  • Spend some time in understanding spark rather than asking questions. Commented May 1, 2018 at 5:45

1 Answer


You can specify, say, DecimalType(10, 2) for the DoubleType column in your customSchema when loading your CSV file. Let's say you have a CSV file with the following content:

    id_1,id_2,id_3
    1,10,5.555
    2,20,6.0
    3,30,7.444

Sample code below:

    import org.apache.spark.sql.types._

    val customSchema = StructType(Array(
      StructField("id_1", IntegerType, true),
      StructField("id_2", IntegerType, true),
      StructField("id_3", DecimalType(10, 2), true)
    ))

    spark.read.format("csv").schema(customSchema).
      option("header", "true").option("nullvalue", "?").
      load("/path/to/csvfile").
      show
    // +----+----+----+
    // |id_1|id_2|id_3|
    // +----+----+----+
    // |   1|  10|5.56|
    // |   2|  20|6.00|
    // |   3|  30|7.44|
    // +----+----+----+
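
If the column has to stay a DoubleType rather than become a DecimalType, one possible alternative is to chain the rounding onto the read itself. The sketch below is only an illustration: `RoundOnRead`, `readRounded`, and `doubleSchema` are hypothetical names that do not appear in the question, and the column names are taken from the sample CSV above. Because DataFrames are immutable, `withColumn` still returns a new DataFrame, but chaining it onto `load` means only the rounded result is ever bound to a name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, round}
import org.apache.spark.sql.types._

// Hypothetical helper object, not part of the original answer.
object RoundOnRead {

  // Same layout as the question's schema: id_3 stays a DoubleType.
  val doubleSchema: StructType = StructType(Array(
    StructField("id_1", IntegerType, true),
    StructField("id_2", IntegerType, true),
    StructField("id_3", DoubleType, true)
  ))

  // Reads the CSV and rounds id_3 to 2 decimal places in one chained expression.
  // The DataFrame produced by load() is never named; only the rounded one is returned.
  def readRounded(spark: SparkSession, path: String): DataFrame =
    spark.read.format("csv")
      .schema(doubleSchema)
      .option("header", "true")
      .option("nullValue", "?")
      .load(path)
      .withColumn("id_3", round(col("id_3"), 2))
}
```

Usage would look something like `val parsed = RoundOnRead.readRounded(spark, "/path/to/csvfile")`, after which `parsed.printSchema()` should still report `id_3` as a double.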