
I want to calculate the frequency distribution (i.e. return the most common element in each column and the number of times it appears) of a DataFrame using Spark and Scala. I've tried using the DataFrameStatFunctions library, but after I filter my DataFrame down to only the numeric columns, I can't apply any functions from that library. Is the best way to do this to create a UDF?

1 Answer


You can use:

    val newDF = df.groupBy("columnName").count()
    newDF.show()

This will show the frequency count for each unique entry in that column.
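Since the question asks for the most common element and its count in every column, one way to build on this is a minimal sketch like the one below: it applies the same groupBy/count idea to each column in turn and keeps only the top row. The sample data, column names, and local SparkSession setup are assumptions for illustration, not the asker's actual data.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, desc}

    object MostCommonPerColumn {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("frequency-distribution")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Hypothetical stand-in for the asker's DataFrame.
        val df = Seq((1, "a"), (2, "a"), (1, "b"), (1, "a")).toDF("num", "str")

        // For each column: group by its values, sort by count descending,
        // and take the single most frequent value with its count.
        df.columns.foreach { c =>
          val top = df.groupBy(col(c)).count().orderBy(desc("count")).head()
          println(s"$c -> most common value = ${top.get(0)}, count = ${top.getLong(1)}")
        }

        spark.stop()
      }
    }

Because only one row per column is pulled back with head(), the per-column aggregation stays on the executors and very little data reaches the driver.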
