Simple Hive UDFs for working with JSON: json_split turns a JSON array into a Hive array, and json_map turns a flat JSON object into a Hive map.
Check out the code and run
mvn package to build an uberjar with everything you need.
The split UDF accepts a single JSON string containing only an array. In the Hive CLI:
add jar target/JsonSplit-1.0-SNAPSHOT.jar;
create temporary function json_split as 'com.pythian.hive.udf.JsonSplitUDF';
create table json_example (json string);
load data local inpath 'split_example.json' into table json_example;
SELECT ex.* FROM json_example LATERAL VIEW explode(json_split(json_example.json)) ex;

json_split converts the string to the following array of structs, which are exploded into individual records:
[
  { row_id: 1, json_string: '1' },
  { row_id: 2, json_string: '2' },
  { row_id: 3, json_string: '3' }
]

You can access the JSON string for each element through the json_string attribute. The json_string can be any arbitrary JSON value, including another array or a nested object. row_id is the element's position in the array.
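The transformation json_split performs can be sketched in Python. This is only an illustration of the semantics described above (mirroring the example output, including row_id numbering), not the UDF's actual implementation:

```python
import json

def json_split(json_str):
    """Sketch of the json_split UDF's behavior: turn a JSON array
    string into a list of structs, each carrying the element's
    position (row_id) and its text as a JSON string (json_string)."""
    return [
        {"row_id": i, "json_string": json.dumps(element)}
        for i, element in enumerate(json.loads(json_str), start=1)
    ]

rows = json_split('[1, 2, 3]')
# Nested values stay JSON strings, so they can be split or mapped again.
nested = json_split('[{"a": 1}, [2, 3]]')
```

Because json_string is itself valid JSON, an element that is an array or object can be fed back through json_split or json_map at query time.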
The map UDF accepts a flat JSON object (integer and string values only; no arrays or nested objects) and converts it into a Hive map. The map's keys don't have to be defined until query time and can be accessed with the square-bracket syntax ['key'].
add jar target/JsonSplit-1.0-SNAPSHOT.jar;
create temporary function json_map as 'com.pythian.hive.udf.JsonMapUDF';
create table json_map_example (json string);
load data local inpath 'map_example.json' into table json_map_example;
SELECT json_map(json)['x'] FROM json_map_example;

The above converts each record's JSON string to a map, then pulls out the value stored under the key 'x'.
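In the same spirit, here is a Python sketch of what json_map does. Again, this illustrates the semantics rather than the UDF's implementation; the flat-object restriction is modeled with an explicit check:

```python
import json

def json_map(json_str):
    """Sketch of the json_map UDF's behavior: parse a flat JSON
    object (integer and string values only) into a key/value map."""
    obj = json.loads(json_str)
    for value in obj.values():
        # The UDF handles only flat objects: no arrays or nested objects.
        if isinstance(value, (list, dict)):
            raise ValueError("json_map only handles flat JSON objects")
    return obj

m = json_map('{"x": 1, "y": "two"}')
# m['x'] corresponds to json_map(json)['x'] in the Hive query above.
```

Since the keys come straight from the JSON text, nothing about the map's schema needs to be declared ahead of time; any key present in the data can be looked up with ['key'] at query time.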