
I'm using PySpark. I have a Spark DataFrame that looks like this:

    a | b | c
    5 | 2 | 1
    5 | 4 | 3
    2 | 4 | 2
    2 | 3 | 7

Desired output:

    a | b_list
    5 | 2,1,4,3
    2 | 4,2,3,7

It's important to keep the sequence as given in the output.
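To pin down the intended result, here is a minimal plain-Python sketch of the transformation (not Spark code). It assumes the rows arrive in the order shown above, and the helper name `group_b_c` is made up for illustration. Each group keeps its first-seen position, and within a group `b` and `c` are appended row by row.

```python
# Plain-Python reference for the expected result (illustrative only, not Spark).
# Rows are (a, b, c) tuples, assumed to be in the order shown in the question.
rows = [(5, 2, 1), (5, 4, 3), (2, 4, 2), (2, 3, 7)]


def group_b_c(rows):
    """Group by a, appending b then c for each row in encounter order."""
    groups = {}  # dicts preserve insertion order, so groups stay in first-seen order
    for a, b, c in rows:
        groups.setdefault(a, []).extend([str(b), str(c)])
    # Join each group's values into a single comma-separated string
    return [(a, ",".join(values)) for a, values in groups.items()]


result = group_b_c(rows)
print(result)  # [(5, '2,1,4,3'), (2, '4,2,3,7')]
```

A distributed engine does not necessarily process rows in this sequential way, so the sketch only describes the target output, not how Spark computes it.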

  • On what is the data frame currently ordered? Commented Apr 28, 2018 at 19:30
  • @ErnestKiwele I didn't understand your question, but I want to group by column a and collect b and c into a list, as shown in the output. In pandas it's a one-line answer, but I can't figure it out in PySpark. Commented Apr 28, 2018 at 19:51

2 Answers


Instead of a udf, we can also join the list with the concat_ws function, as suggested in the comments, like this:

    import pyspark.sql.functions as F

    df = (df
          # Build a "b,c" string for each row
          .withColumn('lst', F.concat(df['b'], F.lit(','), df['c']))
          .groupBy('a')
          # Collect the per-row strings into an array, then join them with commas
          .agg(F.concat_ws(',', F.collect_list('lst')).alias('lst')))

    df.show()

    +---+-------+
    |  a|    lst|
    +---+-------+
    |  5|2,1,4,3|
    |  2|4,2,3,7|
    +---+-------+


The following results in the last 2 columns aggregated into an array column:

    import pyspark.sql.functions as f

    # Build a "b,c" string per row, then collect those strings per group
    df1 = df.withColumn('lst', f.concat(df['b'], f.lit(','), df['c']))\
        .groupBy('a')\
        .agg(f.collect_list('lst').alias('b_list'))

Now join the array elements:

    # Simplistic udf to join the array:
    def join_array(col):
        return ','.join(col)

    join = f.udf(join_array)

    df1.select('a', join(df1['b_list']).alias('b_list'))\
        .show()

Printing:

    +---+-------+
    |  a| b_list|
    +---+-------+
    |  5|2,1,4,3|
    |  2|4,2,3,7|
    +---+-------+

3 Comments

You could use pyspark.sql.functions.concat_ws to do the join which will be faster than using a udf.
@pault thanks. Not sure if I misread, but when I first looked at it, it seemed to want string columns as input, and I had arrays to pass in. Will take another look when I get some time...
You can pass an array (like the output of collect_list) to concat_ws - for example, take a look at this answer.
