I have a DataFrame; here is a snippet:

`[['u1', 1], ['u2', 0]]`

Basically, the first element is a string field named `f`, and the second element, `is_fav`, is either 1 or 0.
What I need to do is group by the first field and count the occurrences of 1s and 0s in each group. I was hoping to do something like this:
```python
from pyspark.sql.functions import col, count

num_fav = count((col("is_fav") == 1)).alias("num_fav")
num_nonfav = count((col("is_fav") == 0)).alias("num_nonfav")
df.groupBy("f").agg(num_fav, num_nonfav)
```

It does not work properly: both columns come back with the same value, which is simply the number of rows in the group, so the condition (whether `is_fav` is 1 or 0) seems to be ignored. Does this depend on how `count` works?