I have JSON files describing a table structure. I want to read each file from S3 as a single String so that I can then apply the fromJson() method of org.apache.spark.sql.types.DataType:

DataType.fromJson(jsonString).asInstanceOf[StructType] 

So far I have only managed to read the files into a DataFrame:

 val testJsonData = sqlContext.read.option("multiline", "true").json("/s3Bucket/metrics/metric1.json") 

But I don't need the df.schema of that DataFrame; instead I need to parse the file's contents as a JSON string into a StructType.

The contents of a JSON file:

{ "type" : "struct", "fields" : [ { "name" : "metric_name", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "metric_time", "type" : "long", "nullable" : true, "metadata" : { } }, { "name" : "metric_value", "type" : "string", "nullable" : true, "metadata" : { } }] } 

1 Answer


It looks like what you want is sc.wholeTextFiles (where sc is a SparkContext).

This results in an RDD[(String, String)] where ._1 is the file name and ._2 is the entire file content. Maybe you can try:

import org.apache.spark.sql.types.{DataType, StructType}

// 16 is the minimum number of partitions; each element is (fileName, fileContent)
val files = sc.wholeTextFiles("/s3Bucket/metrics/", 16)
val schemas = files.map { case (_, content) =>
  DataType.fromJson(content).asInstanceOf[StructType]
}

Which, in theory, gives you an RDD[StructType]. (Turning it into a Dataset[StructType] with .toDS() would additionally require an explicit encoder, e.g. Encoders.kryo[StructType], since the built-in encoders don't cover StructType.) Unfortunately, I'm not finding a similar function in the pure Spark SQL API, but this may work.
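
That said, if you are on Spark 2.2 or later, the text data source's wholetext option may be the pure-SQL-API route you're after (a sketch, assuming that option is available on your version):

import org.apache.spark.sql.types.{DataType, StructType}

// "wholetext" makes the text source produce one Row per file instead of one Row per line
val schemas: Array[StructType] = spark.read
  .option("wholetext", "true")
  .text("/s3Bucket/metrics/")
  .collect()
  .map(row => DataType.fromJson(row.getString(0)).asInstanceOf[StructType])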
