3

I'm using Java Spark and I have 1 Dataframe like this

+---+-----+------+ |id |color|datas | +----------------+ |1 |blue |data1| |1 |red |data2| |1 |orange|data3| |2 |black |data4| |2 | |data5| |2 |yellow| | |3 |white |data7| |3 | |data8| +----------------+ 

I need to modify this dataframe to look like this :

+---+--------------------+---------------------+ |id |color |datas | +----------------------------------------------+ |1 |[blue, red, orange] |[data1, data2, data3]| |2 |[black, yellow] |[data4, data5] | |3 |[white] |[data7, data8] | +----------------------------------------------+ 

I want to merge the data to create an 'array' of the same column but from differents rows based on the 'id' column.

I'm able to do it throught UserDefinedAggregateFunction but I can only do it one column at a time and it takes too much time to process.

Thank you

Edit : I'm using Spark 1.6

2 Answers 2

2

you can group by "id" and then use collect_list function to get the aggregated values.

dataframe.groupBy("id").agg(collect_list(struct("color")).as("color"), collect_list(struct("dates")).as("dates") ) 

Hope this helps

Sign up to request clarification or add additional context in comments.

4 Comments

Thanks for the fast answer but unfortunately, I got a "org.apache.spark.sql.AnalysisException: undefined function collect_list" when I try with collect_list I'm using Spark 1.6
did you import the function?? import static org.apache.spark.sql.functions.*;
yes I did, I found this answer for the undefined function stackoverflow.com/questions/35324049/… I'm trying to solve it
if you import function from spark sql it should work.
0

The actual function that works for me is:

dataframe.groupBy("id").agg(collect_list("color").as("color"), collect_list("date").as("date") ) dataframe.createOrReplaceTempView("dataframe")

Then create a new query where you can use the struct()

dffinal = spark.sql(s"""SELECT struct(a.color) AS colors, struct(a.date) AS dates FROM dataframe a """)

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.