Is there a way to calculate the KDE of every column of a DataFrame?

I have a DataFrame where each column holds the values of one feature. The KDE function of Spark MLlib needs an RDD[Double] of the sample values. The problem is that I need a way to do this without collecting the values of each column, because that would slow the program down too much.

Does anyone have an idea how I could solve that? Sadly, all my attempts have failed so far.

1 Answer

You can probably create a new, smaller RDD with the sample function (refer here) and then run your KDE on that sample instead of the full column, which should improve performance.
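As a rough sketch of how that could look (not tested against your data): each column can be turned into an RDD[Double] with `select(...).rdd.map(...)`, which keeps the values distributed and never calls `collect()`, and that RDD can then be sampled and passed to MLlib's `KernelDensity`. The helper name `kdePerColumn`, the bandwidth, the sampling fraction, the seed, and the toy DataFrame are all assumptions for illustration, and the code assumes every column can be cast to double.

```scala
import org.apache.spark.mllib.stat.KernelDensity
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnKde {
  // Hypothetical helper: evaluates a Gaussian KDE of every column at `points`.
  // `bandwidth` and `fraction` are illustrative parameters, not recommended values.
  def kdePerColumn(
      df: DataFrame,
      points: Array[Double],
      bandwidth: Double,
      fraction: Double): Map[String, Array[Double]] =
    df.columns.map { name =>
      // Cast to double, drop nulls, and convert to RDD[Double].
      // The values stay on the executors; nothing is collected to the driver.
      val values: RDD[Double] =
        df.select(col(name).cast("double")).na.drop().rdd.map(_.getDouble(0))

      // Optional down-sampling with RDD.sample, as suggested in the answer.
      val sample =
        if (fraction < 1.0)
          values.sample(withReplacement = false, fraction = fraction, seed = 42L)
        else values

      // Only the densities at `points` are returned to the driver.
      val densities = new KernelDensity()
        .setSample(sample)
        .setBandwidth(bandwidth)
        .estimate(points)

      name -> densities
    }.toMap

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("column-kde")
      .master("local[*]") // assumption: local run for demonstration only
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for the real feature DataFrame.
    val df = Seq((1.0, 10.0), (2.0, 20.0), (3.0, 30.0)).toDF("a", "b")

    kdePerColumn(df, Array(0.0, 2.0, 4.0), bandwidth = 1.0, fraction = 1.0)
      .foreach { case (name, d) => println(s"$name: ${d.mkString(", ")}") }

    spark.stop()
  }
}
```

Note that this runs one Spark job per column, so calling `df.cache()` beforehand may help if the DataFrame is expensive to recompute. Only the density values at the evaluation points come back to the driver, so the result size depends on `points.length`, not on the number of rows.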
