
I can use neither PySpark nor Scala; I can only write SQL code. I have a table with two columns, item_id and name.

item_id  name
1        name1
1        name2
1        name3
2        name4
2        name5

I want to generate a result where all the names for an item_id are concatenated:

item_id  names
1        name1-name2-name3
2        name4-name5

How do I create such a table with Spark SQL?


3 Answers


The beauty of Spark SQL is that once you have a solution in any of the supported languages (Scala, Java, Python, R, or SQL), you can fairly easily translate it into the others.

The following SQL statement seems to do what you ask for:

SELECT item_id, array_join(collect_list(name), '-') AS names
FROM tableName
GROUP BY item_id

In spark-shell it gives the following result:

scala> sql("select item_id, array_join(collect_list(name), '-') as names from so group by item_id").show
+-------+-----------------+
|item_id|            names|
+-------+-----------------+
|      1|name1-name2-name3|
|      2|      name4-name5|
+-------+-----------------+
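Since the question asks how to create a table rather than just a query result, here is a minimal pure-SQL sketch, assuming the source table is named so as in the shell session above (item_names is a hypothetical name for the new table):

CREATE TABLE item_names AS  -- item_names is a hypothetical target table name
SELECT item_id, array_join(collect_list(name), '-') AS names
FROM so
GROUP BY item_id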



You can try the PySpark code below:

from pyspark.sql.functions import array_join, collect_list

(
    df.orderBy('name', ascending=False)   # sort rows before grouping
      .groupBy('item_id')
      .agg(
          array_join(
              collect_list('name'),       # gather the names of each group into an array
              delimiter='-',              # join them with a hyphen
          ).alias('names')
      )
)
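Note that collect_list does not guarantee that the sort order survives the shuffle introduced by the grouping. If a deterministic (here descending) order matters and only SQL is available, a sketch using sort_array, again assuming the table is named so:

-- sort_array with ascendingOrder = false sorts each collected array descending
SELECT item_id,
       array_join(sort_array(collect_list(name), false), '-') AS names
FROM so
GROUP BY item_id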

Comment: I can only use SQL.

You can use the Spark DataFrame's groupBy and agg methods together with the concat_ws function:

import org.apache.spark.sql.functions.{collect_list, concat_ws}
import spark.implicits._  // for the $"..." column syntax; pre-imported in spark-shell

df.groupBy($"item_id").agg(concat_ws("-", collect_list($"name")).alias("names")).show()

This groups the rows by item_id and aggregates each group's name values by concatenating them with a hyphen.
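Since the asker can only use SQL, the same concat_ws approach can also be expressed in pure Spark SQL; a sketch, again assuming the table is named so:

-- concat_ws joins the collected names with '-' directly in SQL
SELECT item_id, concat_ws('-', collect_list(name)) AS names
FROM so
GROUP BY item_id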

Comment: I can only use SQL.
