
I would like to merge two files containing JSON. They each contain an array of JSON objects.

registration.json

[ { "name": "User1", "registration": "2009-04-18T21:55:40Z" }, { "name": "User2", "registration": "2010-11-17T15:09:43Z" } ] 

useredits.json

[ { "name": "User1", "editcount": 164 }, { "name": "User2", "editcount": 150 }, { "name": "User3", "editcount": 10 } ] 

In the ideal scenario, I would like to have the following as a result of the merge operation:

[ { "name": "User1", "editcount": 164, "registration": "2009-04-18T21:55:40Z" }, { "name": "User2", "editcount": 150, "registration": "2010-11-17T15:09:43Z" } ] 

I found https://github.com/stedolan/jq/issues/1247#issuecomment-348817802, but when I try it I get:

jq: error: module not found: jq 

3 Answers

20

jq solution:

jq -s '[ .[0] + .[1] | group_by(.name)[] | select(length > 1) | add ]' registration.json useredits.json 

The output:

[ { "name": "User1", "registration": "2009-04-18T21:55:40Z", "editcount": 164 }, { "name": "User2", "registration": "2010-11-17T15:09:43Z", "editcount": 150 } ] 
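For readers who want to check the result outside jq, here is a minimal Python sketch of what the filter above appears to do, step by step: concatenate both arrays, group by name, keep only groups with more than one member, and merge each group. The function name inner_merge and the inlined sample data are illustrative assumptions, not part of the answer.

```python
from itertools import groupby

# Sample data copied from the question.
registration = [
    {"name": "User1", "registration": "2009-04-18T21:55:40Z"},
    {"name": "User2", "registration": "2010-11-17T15:09:43Z"},
]
useredits = [
    {"name": "User1", "editcount": 164},
    {"name": "User2", "editcount": 150},
    {"name": "User3", "editcount": 10},
]


def inner_merge(left, right, key="name"):
    """Merge objects that share `key`; drop keys that occur only once."""
    combined = left + right                        # .[0] + .[1]
    combined.sort(key=lambda obj: obj[key])        # group_by(.name) sorts by the key
    merged = []
    for _, group in groupby(combined, key=lambda obj: obj[key]):
        group = list(group)
        if len(group) > 1:                         # select(length > 1)
            result = {}
            for obj in group:                      # add: later values overwrite earlier ones
                result.update(obj)
            merged.append(result)
    return merged


print(inner_merge(registration, useredits))
```

One thing the sketch makes visible: the length check only counts occurrences, so a name that appears twice in the same file would also pass, even if it is missing from the other file.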

6

Although it does not strictly answer the question (it also keeps users that appear in only one of the files), the command below

jq -s 'flatten | group_by(.name) | map(reduce .[] as $x ({}; . * $x))' registration.json useredits.json 

generates this output:

[ { "name": "User1", "editcount": 164, "registration": "2009-04-18T21:55:40Z" }, { "name": "User2", "editcount": 150, "registration": "2010-11-17T15:09:43Z" }, { "name": "User3", "editcount": 10 } ] 

Source: jq - error when merging two JSON files "cannot be multiplied"
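As a rough cross-check of the command in this answer, the following Python sketch mimics its behaviour: every object is kept, and objects sharing a name are merged recursively, in the way jq's `*` operator merges objects. The helper names deep_merge and outer_merge and the inlined sample data are assumptions made for illustration.

```python
# Sample data copied from the question.
registration = [
    {"name": "User1", "registration": "2009-04-18T21:55:40Z"},
    {"name": "User2", "registration": "2010-11-17T15:09:43Z"},
]
useredits = [
    {"name": "User1", "editcount": 164},
    {"name": "User2", "editcount": 150},
    {"name": "User3", "editcount": 10},
]


def deep_merge(a, b):
    """Return a copy of `a` with `b` merged in; nested dicts are merged recursively."""
    out = dict(a)
    for k, v in b.items():
        if isinstance(out.get(k), dict) and isinstance(v, dict):
            out[k] = deep_merge(out[k], v)
        else:
            out[k] = v                              # non-dict values: the right side wins
    return out


def outer_merge(*tables, key="name"):
    """Merge all rows by `key`, keeping rows that have no partner."""
    merged = {}
    for table in tables:                            # flatten
        for row in table:
            merged[row[key]] = deep_merge(merged.get(row[key], {}), row)
    return [merged[k] for k in sorted(merged)]      # group_by orders groups by key


print(outer_merge(registration, useredits))
```

Because nothing is filtered out, User3 survives with only an editcount, which matches the output shown above.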


0

The following assumes you have jq 1.5 or later, and that:

  • joins.jq as shown below is in the directory ~/.jq/ or the directory ~/.jq/joins/
  • there is no file named joins.jq in the pwd
  • registration.json has been fixed to make it valid JSON (btw, this can be done by jq itself).

The invocation to use would then be:

jq -s 'include "joins"; joins(.name)' registration.json useredits.json 

joins.jq

# joins.jq Version 1 (12-12-2017)

def distinct(s):
  reduce s as $x ({}; .[$x | (type[0:1] + tostring)] = $x)
  |.[];

# Relational Join
# joins/6 provides similar functionality to the SQL INNER JOIN statement:
#   SELECT (Table1|p1), (Table2|p2)
#   FROM Table1
#   INNER JOIN Table2 ON (Table1|filter1) = (Table2|filter2)
# where filter1, filter2, p1 and p2 are filters.

# joins(s1; s2; filter1; filter2; p1; p2)
# s1 and s2 are streams of objects corresponding to rows in Table1 and Table2;
# filter1 and filter2 determine the join criteria;
# p1 and p2 are filters determining the final results.
# Input: ignored
# Output: a stream of distinct pairs [p1, p2]
# Note: items in s1 for which filter1 == null are ignored, otherwise all rows are considered.
#
def joins(s1; s2; filter1; filter2; p1; p2):
  def it: type[0:1] + tostring;
  def ix(s;f):
    reduce s as $x ({}; ($x|f) as $y | if $y == null then . else .[$y|it] += [$x] end);
  # combine two dictionaries using the cartesian product of distinct elements
  def merge:
    .[0] as $d1 | .[1] as $d2
    | ($d1|keys_unsorted[]) as $k
    | if $d2[$k] then distinct($d1[$k][]|p1) as $a | distinct($d2[$k][]|p2) as $b | [$a,$b]
      else empty end;

  [ix(s1; filter1), ix(s2; filter2)] | merge;

def joins(s1; s2; filter1; filter2):
  joins(s1; s2; filter1; filter2; .; .) | add ;

# Input: an array of two arrays of objects
# Output: a stream of the joined objects
def joins(filter1; filter2):
  joins(.[0][]; .[1][]; filter1; filter2);

# Input: an array of arrays of objects.
# Output: a stream of the joined objects where f defines the join criterion.
def joins(f):
  # j/0 is defined so TCO is applicable
  def j:
    if length < 2 then .[][]
    else [[ joins(.[0][]; .[1][]; f; f)]] + .[2:] | j
    end;
  j ;
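The core idea of joins.jq (index each table by the join key, then pair up rows whose keys match) can be sketched in a few lines of Python. This is only an approximation for illustration: the names index_by and join are hypothetical, the sample data is inlined from the question, and the deduplication that distinct performs in the jq version is left out.

```python
from collections import defaultdict
from itertools import product

# Sample data copied from the question.
registration = [
    {"name": "User1", "registration": "2009-04-18T21:55:40Z"},
    {"name": "User2", "registration": "2010-11-17T15:09:43Z"},
]
useredits = [
    {"name": "User1", "editcount": 164},
    {"name": "User2", "editcount": 150},
    {"name": "User3", "editcount": 10},
]


def index_by(rows, key):
    """Build a dictionary mapping each key value to the list of rows that carry it."""
    index = defaultdict(list)
    for row in rows:
        value = row.get(key)
        if value is not None:                  # rows with a null/missing key are skipped
            index[value].append(row)
    return index


def join(left, right, key="name"):
    """Yield one merged object per matching (left, right) pair, like an INNER JOIN."""
    left_index = index_by(left, key)
    right_index = index_by(right, key)
    for value, left_rows in left_index.items():
        # Cartesian product of the rows that share this key value.
        for a, b in product(left_rows, right_index.get(value, [])):
            yield {**a, **b}


print(list(join(registration, useredits)))
```

Unlike the group-and-count approach, this pairs rows strictly across the two tables, so a name duplicated within a single file does not produce a match by itself.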

1 Comment

In order to be more portable and more readable, I've opted to use the answer by RomanPerekhrest. Thanks for answering my question though!
