
I have this command to round one column of my dataframe to 2 decimal places:

data = data.withColumn("columnName1", func.round(data["columnName1"], 2)) 

I have no idea how to round the whole DataFrame with one command (rather than each column separately). Could somebody help me, please? I don't want to repeat the same command 50 times with a different column name.

2 Answers


There is no single function or command that applies round to all the columns at once, but you can iterate over them:

+-----+-----+
| col1| col2|
+-----+-----+
|1.111|2.222|
+-----+-----+

df = spark.read.option("header","true").option("inferSchema","true").csv("test.csv")

for c in df.columns:
    df = df.withColumn(c, f.round(c, 2))

df.show()

+----+----+
|col1|col2|
+----+----+
|1.11|2.22|
+----+----+

Updated

from pyspark.sql import functions as f

df.select(*[f.round(c, 2).alias(c) for c in df.columns]) \
  .show()

+----+----+
|col1|col2|
+----+----+
|1.11|2.22|
+----+----+

5 Comments

df = df.withColumn(c, round(df.col(c), 2))
If my df has two columns that can't be rounded because they are dates, how do I exclude those columns from the rounding transformation?
TypeError: type str doesn't define round method
Worked fine for me!
It should be f.round in the first part of the answer instead of the plain round; by default that would pick up Python's built-in round function (see the sketch after these comments).
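A minimal sketch of the distinction that last comment describes, assuming the same df and the functions-as-f import from the answer: Spark's round comes from pyspark.sql.functions and returns a Column expression, while Python's built-in round does not understand Columns and raises the TypeError seen above.

from pyspark.sql import functions as f

# Spark's round: builds a Column expression evaluated per row
df = df.withColumn("col1", f.round(df["col1"], 2))

# Python's built-in round can't handle a Column (or a column-name string):
# df = df.withColumn("col1", round(df["col1"], 2))  # raises TypeError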

To avoid converting non-FP columns:

import pyspark.sql.functions as F

for c_name, c_type in df.dtypes:
    if c_type in ('double', 'float'):
        df = df.withColumn(c_name, F.round(c_name, 2))
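If you prefer a single pass instead of repeated withColumn calls, the same dtype check can be folded into the select approach from the first answer. A minimal sketch, assuming the same df:

import pyspark.sql.functions as F

# Round only float/double columns; pass everything else (dates, strings, ...) through unchanged.
df = df.select(*[
    F.round(c, 2).alias(c) if t in ('double', 'float') else F.col(c)
    for c, t in df.dtypes
])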

