
I have one DataFrame:

val groupby = df.groupBy($"column1", $"Date")
  .agg(sum("amount").as("amount"))
  .orderBy($"column1", desc("Date"))

When applying the window function to add the new difference column:

val windowspec = Window.partitionBy("column1").orderBy(desc("Date"))
groupby.withColumn("difference", lead($"amount", 1, 0).over(windowspec)).show()

+--------+------------+-----------+--------------------------+
| Column | Date       | Amount    | Difference               |
+--------+------------+-----------+--------------------------+
| A      | 3/31/2017  | 12345.45  | 3456.540000000000000000  |
| A      | 2/28/2017  | 3456.54   | 34289.430000000000000000 |
| A      | 1/31/2017  | 34289.43  | 45673.987000000000000000 |
| A      | 12/31/2016 | 45673.987 | 0.00E+00                 |
+--------+------------+-----------+--------------------------+

I'm getting decimals with trailing zeros. When I run printSchema() on the above DataFrame, the datatype for difference is decimal(38,18). Can someone tell me how to change the datatype to decimal(38,2), or how to remove the trailing zeros?


3 Answers


You can cast the data to a specific decimal precision and scale, like below:

lead($"amount", 1,0).over(windowspec).cast(DataTypes.createDecimalType(32,2)) 



In pure SQL, you can use the well-known technique:

SELECT ceil(100 * column_name_double)/100 AS cost ... 


from pyspark.sql.types import DecimalType

df = df.withColumn(column_name, df[column_name].cast(DecimalType(10, 2)))

