I'm trying to read a JSON file that looks like this:

[ {"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[
    {"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},
    {"MLrate":"31","Nrout":"0","up":null,"Crate":"2"},
    {"MLrate":"30","Nrout":"5","up":null,"Crate":"2"},
    {"MLrate":"34","Nrout":"0","up":null,"Crate":"4"},
    {"MLrate":"33","Nrout":"0","up":null,"Crate":"2"},
    {"MLrate":"30","Nrout":"8","up":null,"Crate":"2"}
  ]},
  {"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[
    {"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},
    {"MLrate":"30","Nrout":"0","up":null,"Crate":"0"},
    {"MLrate":"35","Nrout":"1","up":null,"Crate":"5"},
    {"MLrate":"30","Nrout":"6","up":null,"Crate":"2"},
    {"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},
    {"MLrate":"38","Nrout":"8","up":null,"Crate":"1"}
  ]},
  ... ]

I've tried the command:

 val df = sqlContext.read.json("namefile")
 df.show()

But this does not work: my columns are not recognized.

  • Is your JSON data spread across multiple lines, or is it all on a single line? Commented Jul 29, 2015 at 13:55
  • A single line, but when I open it with Notepad it is easy to read and spread over several lines. Commented Jul 29, 2015 at 14:01

2 Answers


If you want to use read.json you need a single JSON document per line. If your file contains a valid JSON array of documents, it simply won't work as expected. For example, taking your sample data, the input file should be formatted like this:

{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"31","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"5","up":null,"Crate":"2"},{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"},{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"}]}
{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"},{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"},{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"}]}

If you use read.json on the above structure you'll see it is parsed as expected:

scala> sqlContext.read.json("namefile").printSchema
root
 |-- COL: long (nullable = true)
 |-- DATA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Crate: string (nullable = true)
 |    |    |-- MLrate: string (nullable = true)
 |    |    |-- Nrout: string (nullable = true)
 |    |    |-- up: string (nullable = true)
 |-- IFAM: string (nullable = true)
 |-- KTM: long (nullable = true)
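As a side note: on Spark 2.2 or later you can keep the file as a single JSON array and let Spark parse it directly with the multiLine option, so no reformatting is needed. This is a sketch assuming a modern spark session entry point (the question uses the older sqlContext):

// Spark 2.2+ can read a file containing one large, pretty-printed JSON array
// directly, without converting it to one document per line first.
val df = spark.read
  .option("multiLine", true)
  .json("namefile")

df.printSchema()
df.show()

Note that multiLine forces Spark to read each file as a whole rather than splitting it by line, so it can be slower on very large files.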

1 Comment

@zero323 I want to do something similar, but I just want to load the data from the JSON file into an RDD; I don't want to go through Spark SQL, just a simple transformation. How can I achieve this? Thanks!

If you don't want to reformat your JSON file (line by line), you can instead create a schema using StructType and MapType from the Spark SQL types:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Convenience function for turning JSON strings into DataFrames
def jsonToDataFrame(json: String, schema: StructType = null): DataFrame = {
  val reader = spark.read
  Option(schema).foreach(reader.schema)
  reader.json(sc.parallelize(Array(json)))
}

// Using a struct
val schema = new StructType().add("a", new StructType().add("b", IntegerType))

// Call the function, passing the sample JSON data and the schema as parameters
val json_df = jsonToDataFrame("""
{
  "a": {
    "b": 1
  }
}
""", schema)

// Now you can access your JSON fields
val b_value = json_df.select("a.b")
b_value.show()

See this reference documentation for more examples and details: https://docs.databricks.com/spark/latest/spark-sql/complex-types.html#transform-complex-data-types-scala
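The answer mentions MapType but the example only shows StructType. As a hedged sketch, here is how the same jsonToDataFrame helper could be used with a MapType schema, which fits when the keys inside an object are arbitrary rather than fixed field names (the sample data here is illustrative):

// Treat "a" as a map from arbitrary string keys to integer values,
// instead of a struct with a fixed field "b".
val mapSchema = new StructType()
  .add("a", MapType(StringType, IntegerType))

val map_df = jsonToDataFrame("""{ "a": { "b": 1, "c": 2 } }""", mapSchema)

// Map values are accessed by key with getItem, rather than as struct columns
map_df.select(col("a").getItem("b"), col("a").getItem("c")).show()

The trade-off: a StructType gives you a fixed, typed column per field, while a MapType keeps the whole object in one column and defers key lookup to query time.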
