
My dataset has two columns: key (string) and value (long).

The key column holds values like prefix.20171012.111.2222, and the value column holds values like 9999.

I want to transform the dataset into a new one in which the key column is split into separate columns: "day, rt, item_id, value".

How can I do this? Thanks a lot.

  • Maybe this question can help you: stackoverflow.com/questions/39255973 Commented Oct 12, 2017 at 3:46
  • Hi, Shaido. thanks for your quick reply, I am trying it. Commented Oct 12, 2017 at 4:09

1 Answer

// input ds looks like this
+--------+-----+
|     key|value|
+--------+-----+
|20171011| 9999|
+--------+-----+

// import the functions you need
import org.apache.spark.sql.functions.{to_date, month, year, dayofmonth}

// ds2
val ds2 = ds.withColumn("date", to_date($"key", "yyyyMMdd"))

// ds2.show()
+--------+-----+----------+
|     key|value|      date|
+--------+-----+----------+
|20171011| 9999|2017-10-11|
+--------+-----+----------+

// ds3
val ds3 = ds2.withColumn("Month", month($"date"))
  .withColumn("Year", year($"date"))
  .withColumn("Date", dayofmonth($"date"))

// ds3.show()
+--------+-----+----+-----+----+
|     key|value|Date|Month|Year|
+--------+-----+----+-----+----+
|20171011| 9999|  11|   10|2017|
+--------+-----+----+-----+----+
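The answer above assumes the key is a bare yyyyMMdd string, but the question's keys look like prefix.20171012.111.2222. A minimal sketch of splitting such a dotted key with Spark's split function, assuming the layout is always prefix.<day>.<rt>.<item_id> (that mapping is my guess from the target column names, not stated in the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.split

val spark = SparkSession.builder().appName("split-key").getOrCreate()
import spark.implicits._

// sample data matching the question's description
val ds = Seq(("prefix.20171012.111.2222", 9999L)).toDF("key", "value")

// split on a literal dot (the pattern is a regex, so escape it)
val parts = split($"key", "\\.")

val result = ds
  .withColumn("day", parts.getItem(1))     // "20171012"
  .withColumn("rt", parts.getItem(2))      // "111"
  .withColumn("item_id", parts.getItem(3)) // "2222"
  .drop("key")

result.show()
// +-----+--------+---+-------+
// |value|     day| rt|item_id|
// +-----+--------+---+-------+
// | 9999|20171012|111|   2222|
// +-----+--------+---+-------+
```

The day column stays a string here; you can feed it through to_date($"day", "yyyyMMdd") afterwards if you need the date functions shown in the answer.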

