The following approach is very efficient in that:
(a) it takes advantage of the fact that file1.json and file2.json are streams of objects, thus avoiding the memory required to store these objects as arrays;
(b) it avoids sorting (as entailed, for example, by group_by).
The key concept is the keywise addition of objects. To perform keywise addition over a stream of objects, we define the following generic function:
# s is assumed to be a stream of mutually
# compatible objects in the sense that, given
# any key of any object, the values at that key
# must be compatible w.r.t. `add`
def keywise_add(s):
  reduce s as $x ({};
    reduce ($x|keys_unsorted)[] as $k (.; .[$k] += $x[$k]));
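As a quick illustration of what keywise_add does on its own, here is a minimal sketch that applies it to a small stream. The three objects are hypothetical and chosen only so that the values at each key are compatible under addition (numbers with numbers, arrays with arrays).

```shell
# Hypothetical stream of three objects; the commas inside the call
# form a single stream argument (jq separates arguments with `;`).
result=$(jq -c -n '
  def keywise_add(s):
    reduce s as $x ({};
      reduce ($x|keys_unsorted)[] as $k (.; .[$k] += $x[$k]));

  keywise_add({a: 1}, {a: 2, b: [1]}, {b: [2]})
')
echo "$result"
# prints {"a":3,"b":[1,2]}
```

Numbers at the same key are summed and arrays are concatenated, because a missing key starts out as null and null is the identity for jq's `+`.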
The task can now be accomplished as follows:
keywise_add(inputs | {(.inst_id): .tag_id})
| keys_unsorted[] as $k
| {tag_id: .[$k], inst_id: $k}
Invocation
With the above program in add.jq, the invocation:
jq -c -n -f add.jq file1.json file2.json
yields:
{"tag_id":["t1","t2"],"inst_id":"s1"}
{"tag_id":["t1","t2"],"inst_id":"s2"}
{"tag_id":["t2"],"inst_id":"s3"}
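The input files are not reproduced in this answer, so the following is only a sketch for trying the invocation end to end. The contents of file1.json and file2.json are assumptions: they are one possible pair of object streams consistent with the output shown, with tag_id assumed to be array-valued. The temporary directory is likewise just for illustration.

```shell
# Work in a scratch directory so nothing is overwritten.
dir=$(mktemp -d)

# ASSUMED inputs: each file is a stream of objects, not a JSON array.
cat > "$dir/file1.json" <<'EOF'
{"inst_id":"s1","tag_id":["t1"]}
{"inst_id":"s2","tag_id":["t1"]}
EOF
cat > "$dir/file2.json" <<'EOF'
{"inst_id":"s1","tag_id":["t2"]}
{"inst_id":"s2","tag_id":["t2"]}
{"inst_id":"s3","tag_id":["t2"]}
EOF

# The program from this answer, saved as add.jq.
cat > "$dir/add.jq" <<'EOF'
def keywise_add(s):
  reduce s as $x ({};
    reduce ($x|keys_unsorted)[] as $k (.; .[$k] += $x[$k]));

keywise_add(inputs | {(.inst_id): .tag_id})
| keys_unsorted[] as $k
| {tag_id: .[$k], inst_id: $k}
EOF

# -n stops jq from consuming the first object itself, so that
# `inputs` sees every object from both files.
output=$(jq -c -n -f "$dir/add.jq" "$dir/file1.json" "$dir/file2.json")
echo "$output"
```

Because tag_id is an array in these assumed inputs, keywise addition concatenates the arrays for each inst_id. If tag_id were a plain string instead, `+` would concatenate the strings, so the shape of the input matters.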
Caveat
The above assumes that inst_id is string-valued, since jq object keys must be strings. If that is not the case, the approach can still be used by writing the key as (.inst_id|tostring), so long as distinct inst_id values never collide under tostring. This holds, for example, if inst_id is always numeric.
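A minimal sketch of that variant, assuming inst_id is always numeric. The three input objects are hypothetical, and the final `tonumber` is an optional addition (not part of the program above) that turns the stringified key back into a number. It would fail if any inst_id were not numeric.

```shell
# Hypothetical stream with numeric inst_id values, read from stdin.
result=$(printf '%s\n' \
  '{"inst_id":1,"tag_id":["t1"]}' \
  '{"inst_id":2,"tag_id":["t1"]}' \
  '{"inst_id":1,"tag_id":["t2"]}' |
jq -c -n '
  def keywise_add(s):
    reduce s as $x ({};
      reduce ($x|keys_unsorted)[] as $k (.; .[$k] += $x[$k]));

  # tostring makes the numeric id usable as an object key;
  # tonumber (assumption: all ids are numeric) restores it on output.
  keywise_add(inputs | {(.inst_id|tostring): .tag_id})
  | keys_unsorted[] as $k
  | {tag_id: .[$k], inst_id: ($k|tonumber)}
')
echo "$result"
```

Without the `tonumber` step, inst_id would appear in the output as a string such as "1" rather than as the original number.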