I need to find the percentage of zeros across all columns in a PySpark dataframe. How do I find the count of zeros in each column of the dataframe?

P.S.: I have tried converting the dataframe to a pandas dataframe and using value_counts, but inspecting the result that way is not feasible for a large dataset.

2 Answers

"How to find the count of zero across each columns in the dataframe?"

First:

import pyspark.sql.functions as F

# For each column, count the rows where the value is zero
# (F.when returns null otherwise, and F.count skips nulls)
df_zero = df.select([F.count(F.when(df[c] == 0, c)).alias(c) for c in df.columns])

Second: you can then see the counts (compared to .show(), this gives you a better view, and the speed is not much different):

df_zero.limit(2).toPandas().head() 
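
To get the percentage the question actually asks for, you can divide each count by the total row count. A minimal sketch building on df_zero above (assuming df is the original dataframe):

import pyspark.sql.functions as F

# Total number of rows, used as the denominator for each column
total = df.count()

# One-row dataframe holding each column's zero count as a percentage of all rows
df_pct = df_zero.select([(F.col(c) / total * 100).alias(c) for c in df_zero.columns])
df_pct.toPandas().head()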

Enjoy! :)

Use this code to find the number of zeros in a single column of a table.

Just replace Tablename and "column name" with the appropriate values:

from pyspark.sql.functions import col

Tablename.filter(col("column name") == 0).count()
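
For example, on a toy dataframe (the data and the "amount" column name are made up for illustration), this would look like:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Toy dataframe with a few zeros in the "amount" column
df = spark.createDataFrame([(0,), (5,), (0,), (3,)], ["amount"])

# Count the rows where "amount" equals zero
print(df.filter(col("amount") == 0).count())  # prints 2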
