
I have this command to round one column of my dataframe to 2 decimal places:

data = data.withColumn("columnName1", func.round(data["columnName1"], 2)) 

I have no idea how to round the whole DataFrame with one command (rather than each column separately). Could somebody help me, please? I don't want to repeat the same command 50 times with a different column name.

2 Answers


There is no single function or command that applies round to all the columns at once, but you can iterate over them:

+-----+-----+
| col1| col2|
+-----+-----+
|1.111|2.222|
+-----+-----+

df = spark.read.option("header","true").option("inferSchema","true").csv("test.csv")

for c in df.columns:
    df = df.withColumn(c, f.round(c, 2))

df.show()

+----+----+
|col1|col2|
+----+----+
|1.11|2.22|
+----+----+

Updated

from pyspark.sql import functions as f

df.select(*[f.round(c, 2).alias(c) for c in df.columns]) \
  .show()

+----+----+
|col1|col2|
+----+----+
|1.11|2.22|
+----+----+

5 Comments

df = df.withColumn(c, round(df.col(c), 2))
If my df has two columns that can't be rounded because they are dates, how do I exclude those columns from the rounding transformation?
TypeError: type str doesn't define round method
Worked fine for me!
It should be f.round in the first part of the answer instead of the plain round; by default that would pick up Python's built-in round function (see the sketch after these comments).
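A minimal sketch of the distinction that last comment describes, assuming the same df and the functions-as-f import from the answer: Spark's round comes from pyspark.sql.functions and returns a Column expression, while Python's built-in round does not understand Columns and raises the TypeError seen above.

from pyspark.sql import functions as f

# Spark's round: builds a Column expression evaluated per row
df = df.withColumn("col1", f.round(df["col1"], 2))

# Python's built-in round can't handle a Column (or a column-name string):
# df = df.withColumn("col1", round(df["col1"], 2))  # raises TypeError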

To avoid converting non-FP columns:

import pyspark.sql.functions as F

for c_name, c_type in df.dtypes:
    if c_type in ('double', 'float'):
        df = df.withColumn(c_name, F.round(c_name, 2))
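If you prefer a single pass instead of repeated withColumn calls, the same dtype check can be folded into the select approach from the first answer. A minimal sketch, assuming the same df:

import pyspark.sql.functions as F

# Round only float/double columns; pass everything else (dates, strings, ...) through unchanged.
df = df.select(*[
    F.round(c, 2).alias(c) if t in ('double', 'float') else F.col(c)
    for c, t in df.dtypes
])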

