Pivot a column in a DataFrame that has multiple values for the pivoted columns

Question

I have a DataFrame as shown below.

+------+-------------+------+-----+
|NUM_ID|         TIME|SIGNAL|VALUE|
+------+-------------+------+-----+
|XXXX01|1571634079547|  SIG1|78860|
|XXXX01|1571634090000|  SIG1|25.73|
|XXXX01|1571634042000|  SIG1|25.73|
|XXXX01|1571634050000|  SIG1|25.73|
|XXXX01|1571634050000|  SIG2|25.73|
|XXXX01|1571634066000|  SIG2|25.73|
|XXXX01|1571634074000|  SIG2|25.73|
|XXXX01|1571634090000|  SIG3|25.73|
|XXXX02|1571634088000|  SIG1|25.73|
|XXXX02|1571634040000|  SIG1|25.73|
|XXXX02|1571634048000|  SIG1|25.73|
|XXXX02|1571634056000|  SIG1|25.73|
|XXXX02|1571634088000|  SIG2|25.73|
|XXXX02|1571634072000|  SIG2|25.73|
|XXXX02|1571634080000|  SIG2|25.73|
|XXXX02|1571634088000|  SIG3|25.73|
|XXXX02|1571634094000|  SIG3|25.73|
|XXXX02|1571634038000|  SIG3|25.73|
|XXXX03|1571634046000|  SIG1|25.73|
|XXXX03|1571634054000|  SIG1|25.73|
|XXXX03|1571634062000|  SIG1|25.73|
|XXXX03|1571634070000|  SIG1|25.73|
|XXXX03|1571634078000|  SIG2|25.73|
|XXXX03|1571634092000|  SIG2|25.73|
|XXXX03|1571634036000|  SIG2|25.73|
|XXXX03|1571634044000|  SIG3|25.73|
|XXXX03|1571634052000|  SIG3|25.73|
|XXXX03|1571634060000|  SIG3|25.73|
+------+-------------+------+-----+

I want to turn each SIGx value in the existing SIGNAL column into a new column, with the corresponding VALUE as the cell value in that column.

The output should be as shown below.

+------+-------------+-----+-----+-----+
|NUM_ID|         TIME| SIG1| SIG2| SIG3|
+------+-------------+-----+-----+-----+
|XXXX01|1571634079547|78860| null| null|
|XXXX01|1571634090000|25.73| null|25.73|
|XXXX01|1571634042000|25.73| null| null|
|XXXX01|1571634050000|25.73|25.73| null|
|XXXX01|1571634066000| null|25.73| null|
|XXXX01|1571634074000| null|25.73| null|
|XXXX02|1571634088000|25.73|25.73|25.73|
|XXXX02|1571634040000|25.73| null| null|
|XXXX02|1571634048000|25.73| null| null|
|XXXX02|1571634056000|25.73| null| null|
|XXXX02|1571634072000| null|25.73| null|
|XXXX02|1571634080000| null|25.73| null|
|XXXX02|1571634094000| null| null|25.73|
|XXXX02|1571634038000| null| null|25.73|
|      |             |
+------+-------------+-----+-----+-----+

The VALUEs of the different SIGx that share the same TIME should be in the same row.

Is there any way to achieve this? I tried the pivot function, but it does not work as expected when the pivoted columns have multiple values.

Any leads appreciated. Thanks in advance!

koiralo · Accepted Answer · 2019-10-22 11:46:01Z


You can group by "NUM_ID" and "TIME", pivot on "SIGNAL", and take the first "VALUE" in each group, as below.

import org.apache.spark.sql.functions.first

df.groupBy("NUM_ID", "TIME")
  .pivot("SIGNAL")
  .agg(first("VALUE"))

Hope this helps!
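As a minimal sketch of the approach above, the following builds a small DataFrame from a few of the question's rows (all columns typed as strings, matching the schema posted in the comments) and applies the same groupBy/pivot/agg chain. The local SparkSession setup and the names `spark`, `df` and `pivoted` are assumptions for illustration; in spark-shell, `getOrCreate()` simply reuses the existing session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

// Local session for experimentation (assumed setup, not part of the answer).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("pivot-sketch")
  .getOrCreate()
import spark.implicits._

// A few rows from the question; every column, including VALUE, is a string.
val df = Seq(
  ("XXXX01", "1571634079547", "SIG1", "78860"),
  ("XXXX01", "1571634090000", "SIG1", "25.73"),
  ("XXXX01", "1571634090000", "SIG3", "25.73"),
  ("XXXX01", "1571634050000", "SIG1", "25.73"),
  ("XXXX01", "1571634050000", "SIG2", "25.73")
).toDF("NUM_ID", "TIME", "SIGNAL", "VALUE")

// One output row per (NUM_ID, TIME) and one column per distinct SIGNAL.
// first() is a general aggregate function, so it accepts string columns.
val pivoted = df.groupBy("NUM_ID", "TIME")
  .pivot("SIGNAL")
  .agg(first("VALUE"))

pivoted.orderBy("NUM_ID", "TIME").show()
```

Signals that have no row for a given (NUM_ID, TIME) pair come out as null, which matches the expected output in the question.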


5 Comments

Antony Over a year ago

I tried this but got the following error:

org.apache.spark.sql.AnalysisException: "VALUE" is not a numeric column. Aggregation function can only be applied on a numeric column.; at org.apache.spark.sql.RelationalGroupedDataset$$anonfun$3.apply(RelationalGroupedDataset.scala:103

The column VALUE is of string type. It holds both DOUBLE and BIGINT values, so casting it to a single type is not possible either. -@Shankar Koirala
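The "not a numeric column" message is the one Spark raises from the column-name shortcuts on RelationalGroupedDataset (sum, avg/mean, min, max), which only accept numeric columns. It therefore seems likely, though the comment does not say so, that the failing call used one of those shortcuts rather than .agg(first("VALUE")). A rough sketch of the difference, with a hypothetical two-row DataFrame and an assumed local session:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first
import scala.util.Try

// Local session for experimentation (assumed setup).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("pivot-error-sketch")
  .getOrCreate()
import spark.implicits._

// VALUE is a string column holding an integer-like and a decimal-like value.
val df = Seq(
  ("XXXX01", "1571634050000", "SIG1", "78860"),
  ("XXXX01", "1571634050000", "SIG2", "25.73")
).toDF("NUM_ID", "TIME", "SIGNAL", "VALUE")

val grouped = df.groupBy("NUM_ID", "TIME").pivot("SIGNAL")

// The name-based shortcut checks the column type up front and is expected
// to fail with an AnalysisException because VALUE is not numeric.
val viaShortcut = Try(grouped.max("VALUE"))

// agg(first(...)) has no numeric requirement, so no cast is needed.
val viaAgg = Try(grouped.agg(first("VALUE")))

println(s"shortcut failed: ${viaShortcut.isFailure}, agg succeeded: ${viaAgg.isSuccess}")
```

Because first() works on strings, the mixed DOUBLE and BIGINT values can stay as strings through the pivot and be cast per column afterwards if needed.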

koiralo Over a year ago

Can you provide the schema of dataframe?

Antony Over a year ago

scala> DF.printSchema
root
 |-- NUM_ID: string (nullable = true)
 |-- TIME: string (nullable = true)
 |-- SIGNAL: string (nullable = true)
 |-- VALUE: string (nullable = true)

Antony Over a year ago

I tried it without agg, as df.groupBy("NUM_ID", "TIME").pivot("SIGNAL"). But how can we see the data after the pivot function has run? The show function does not work because it is not a member of RelationalGroupedDataset. -@Shankar Koirala

koiralo Over a year ago

A groupBy should always be followed by an aggregation function, such as .agg().
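To illustrate the point: groupBy(...).pivot(...) returns a RelationalGroupedDataset, and only an aggregation turns it back into a DataFrame that supports show. The sketch below also swaps first for collect_list, which is not what the answer uses but is one option when the same (NUM_ID, TIME, SIGNAL) combination occurs more than once and every value should be kept. The duplicate row and the local session are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, RelationalGroupedDataset, SparkSession}
import org.apache.spark.sql.functions.collect_list

// Local session for experimentation (assumed setup).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("pivot-agg-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("XXXX01", "1571634050000", "SIG1", "25.73"),
  ("XXXX01", "1571634050000", "SIG1", "78860"), // hypothetical duplicate reading
  ("XXXX01", "1571634050000", "SIG2", "25.73")
).toDF("NUM_ID", "TIME", "SIGNAL", "VALUE")

// Not yet a DataFrame: this type has no show() method.
val grouped: RelationalGroupedDataset =
  df.groupBy("NUM_ID", "TIME").pivot("SIGNAL")

// Any aggregate passed to agg() yields a DataFrame again.
// collect_list keeps every VALUE of a cell as an array (order not guaranteed).
val allValues: DataFrame = grouped.agg(collect_list("VALUE"))

allValues.show(truncate = false)
```

With first, one of the duplicate values is kept and the rest are dropped; which one is kept is not guaranteed unless the data is ordered beforehand.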
