Suppose I have a PySpark DataFrame:

```
col1 col2 col3
   1    2   -3
   2 null    5
   4    4    8
   1    0    9
```

I want to add a column called `check` that counts, for each row, how many values are greater than 0.
The final output should be:

```
col1 col2 col3 check
   1    2   -3     2
   2 null    5     2
   4    4    8     3
   1    0    9     2
```

I tried the following, but it errors out as below:
```python
df = df.withColumn("check", sum((df[col] > 0) for col in df.columns))
```

```
Invalid argument, not a string or column: <generator object at 0x7f0a866ae580> of type <class 'generator'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
```