
I am trying to get some counts on a DataFrame using agg and count.

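    from pyspark.sql import Row, functions as F

    # `sc` is the SparkContext that the pyspark shell provides
    row = Row("Cat", "Date")
    df = (sc.parallelize([
            row('A', '2017-03-03'),
            row('A', None),
            row('B', '2017-03-04'),
            row('B', 'Garbage'),
            row('A', '2016-03-04')
        ]).toDF())
    # values that are not valid dates (None, 'Garbage') become null in Casted;
    # this relies on non-ANSI casting, the default before Spark 4.0
    df = df.withColumn("Casted", df['Date'].cast('date'))
    df.show()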


    (
        df.groupby(df['Cat'])
        .agg(
            # F.count(col('Date').isNull() | col('Date').isNotNull()).alias('Date_Count'),
            F.count('Date').alias('Date_Count'),
            F.count('Casted').alias('Valid_Date_Count')
        )
        .show()
    )


The function F.count() gives me only the non-null count. Is there a way to get the count including nulls other than using an OR condition?

The invalid count doesn't seem to work: the & condition doesn't behave as expected.

    (
        df.groupby(df['Cat'])
        .agg(
            F.count('*').alias('count'),
            F.count('Date').alias('Date_Count'),
            F.count('Casted').alias('Valid_Date_Count'),
            F.count(F.col('Date').isNotNull() & F.col('Casted').isNull()).alias('invalid')
        )
        .show()
    )


1 Answer


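Cast the boolean expression to an int and sum it. F.count() counts every non-null value, and false is not null, so F.count(condition) counts rows where the condition is either true or false. Cast to int, true becomes 1 and false becomes 0, so the sum counts only the rows where the condition holds.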

    (
        df.groupby(df['Cat'])
        .agg(
            F.count('Date').alias('Date_Count'),
            F.count('Casted').alias('Valid_Date_Count'),
            # true/false cast to 1/0, so the sum counts only the true rows
            F.sum((~F.isnull('Date') & F.isnull('Casted')).cast('int')).alias('Invalid_Date_Count')
        )
        .show()
    )

    +---+----------+----------------+------------------+
    |Cat|Date_Count|Valid_Date_Count|Invalid_Date_Count|
    +---+----------+----------------+------------------+
    |  B|         2|               1|                 1|
    |  A|         2|               2|                 0|
    +---+----------+----------------+------------------+

4 Comments

Can you look at the final code block I added? The invalid alias is not giving the expected result.
@Tronald Dump What is your intended output?
The intended output is added at the end of the post.
Here you go: you have to sum the expression.
