I have a Spark DataFrame of the following form:
    +------+-------+-----+--------+
    | Year | Month | Day | Ticker |
    +------+-------+-----+--------+

I am trying to group all of the values by "Year" and count the number of missing values in each column per year.
I found the following snippet (forgot where from):
    df.select(*(sum(col(c).isNull().cast("int")).alias(c) for c in df.columns)).show()

This works perfectly when calculating the number of missing values per column. However, I'm not sure how I would modify this to calculate the missing values per year.
Any pointers in the right direction would be much appreciated.