3

I'm using jq (the command-line JSON processor) in a shell script to parse JSON.

I have two JSON files and want to merge them into a single file.

Here are the contents of the files:

file1:

{"tag_id" : ["t1"], "inst_id" : "s1"}
{"tag_id" : ["t1"], "inst_id" : "s2"}

file2:

{"tag_id" : ["t2"], "inst_id" : "s1"}
{"tag_id" : ["t2"], "inst_id" : "s2"}
{"tag_id" : ["t2"], "inst_id" : "s3"}

expected result:

{"tag_id" : ["t1","t2"], "inst_id" : "s1"}
{"tag_id" : ["t1","t2"], "inst_id" : "s2"}
{"tag_id" : ["t2"], "inst_id" : "s3"}

3 Answers

1

One way is to use group_by:

jq -n --slurpfile file1 file1.json --slurpfile file2 file2.json -f merge.jq 

where merge.jq contains:

def sigma(f): reduce f as $x (null; . + $x);

$file1 + $file2
| group_by(.inst_id)[]
| {tag_id: sigma(.[].tag_id), inst_id: .[0].inst_id }
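For a quick end-to-end check, the same grouping idea can be sketched as a one-liner. This is a variant, not the program above: it assumes the -s (--slurp) option is used to read both files into a single array instead of --slurpfile, and it uses map(.tag_id) | add in place of sigma. The temporary directory and the merged variable are only there for illustration.

```shell
# Recreate the two sample files from the question (one JSON object per line).
tmpdir=$(mktemp -d)
printf '%s\n' '{"tag_id":["t1"],"inst_id":"s1"}' \
              '{"tag_id":["t1"],"inst_id":"s2"}' > "$tmpdir/file1.json"
printf '%s\n' '{"tag_id":["t2"],"inst_id":"s1"}' \
              '{"tag_id":["t2"],"inst_id":"s2"}' \
              '{"tag_id":["t2"],"inst_id":"s3"}' > "$tmpdir/file2.json"

# -s reads every object from both files into one array; group_by collects the
# objects that share an inst_id, and add concatenates their tag_id arrays.
merged=$(jq -c -s 'group_by(.inst_id)[] | {tag_id: (map(.tag_id) | add), inst_id: .[0].inst_id}' \
  "$tmpdir/file1.json" "$tmpdir/file2.json")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

Because group_by sorts, the groups come out ordered by inst_id rather than in input order.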

0

Here's a join-like approach. It assumes your jq has INDEX/2 and supports the --slurpfile command-line option. If your jq does not have these, now would be a good time to upgrade, though there are easy workarounds.

Invocation

jq -n --slurpfile file1 file1.json -f join.jq file2.json 

join.jq

def join(s2; joinField; field):
  INDEX(.[]; joinField)
  | reduce s2 as $x (.;
      ($x|joinField) as $key
      | if .[$key] then (.[$key]|field) += ($x|field)
        else .[$key] = $x
        end )
  | .[] ;

$file1 | join(inputs; .inst_id; .tag_id)
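To make the first step of join concrete, here is a small sketch of what INDEX produces for the contents of file1. It assumes a jq that ships INDEX/2 (to my knowledge, jq 1.6 or later); the indexed variable name is only for illustration.

```shell
# INDEX(stream; f) builds a lookup object keyed by each item's f value,
# which is what lets the reduce step find a matching inst_id directly.
indexed=$(jq -c -n '
  [ {"tag_id":["t1"],"inst_id":"s1"},
    {"tag_id":["t1"],"inst_id":"s2"} ]
  | INDEX(.[]; .inst_id)
')

printf '%s\n' "$indexed"
```

Each object from file2 is then either merged into the entry with the same key or, if no such entry exists, stored under a new key.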

0

The following approach is very efficient in that:

(a) it takes advantage of the fact that file1.json and file2.json are streams of objects, thus avoiding the memory required to store these objects as arrays;

(b) it avoids sorting (as entailed, for example, by group_by)

The key concept is the keywise-addition of objects. For performing keywise-addition of objects in a stream, we define the following generic function:

# s is assumed to be a stream of mutually
# compatible objects in the sense that, given
# any key of any object, the values at that key
# must be compatible w.r.t. `add`
def keywise_add(s):
  reduce s as $x ({};
    reduce ($x|keys_unsorted)[] as $k (.; .[$k] += $x[$k]));
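As a minimal illustration of keywise addition, the definition can be tried on a two-object stream. The sample objects and the keywise_sum variable are made up for this sketch and are not part of the original task.

```shell
# Values under the same key are combined with +, so arrays are concatenated;
# a key seen for the first time starts from null, and null + x is x.
keywise_sum=$(jq -c -n '
  def keywise_add(s):
    reduce s as $x ({};
      reduce ($x|keys_unsorted)[] as $k (.; .[$k] += $x[$k]));

  keywise_add({"a":[1]}, {"a":[2],"b":[3]})
')

printf '%s\n' "$keywise_sum"
```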

The task can now be accomplished as follows:

keywise_add(inputs | {(.inst_id): .tag_id} )
| keys_unsorted[] as $k
| {tag_id: .[$k], inst_id: $k}

Invocation

With the above program in add.jq, the invocation:

jq -c -n -f add.jq file1.json file2.json 

yields:

{"tag_id":["t1","t2"],"inst_id":"s1"}
{"tag_id":["t1","t2"],"inst_id":"s2"}
{"tag_id":["t2"],"inst_id":"s3"}

Caveat

The above assumes that inst_id is string-valued, since it is used as an object key. If that is not the case, the same approach can still be used by keying on .inst_id|tostring instead, so long as there are no collisions among the resulting strings; that holds, for example, if inst_id is always numeric.
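A sketch of that variation, assuming numeric inst_id values: the sample input and the numeric variable are hypothetical. Note that in this version inst_id comes back as a string; converting it back (for example with tonumber) would be a further assumption about the data.

```shell
# Key the intermediate object on the stringified inst_id, since jq object
# keys must be strings.
numeric=$(printf '%s\n' '{"tag_id":["t1"],"inst_id":1}' \
                        '{"tag_id":["t2"],"inst_id":1}' |
  jq -c -n '
    def keywise_add(s):
      reduce s as $x ({};
        reduce ($x|keys_unsorted)[] as $k (.; .[$k] += $x[$k]));

    keywise_add(inputs | {(.inst_id|tostring): .tag_id})
    | keys_unsorted[] as $k
    | {tag_id: .[$k], inst_id: $k}
  ')

printf '%s\n' "$numeric"
```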
