
I'm configuring CloudWatch agent logs using SaltStack (which is why there is some odd templating syntax). I am trying to merge an arbitrary number of files, each containing an identical schema but different data, into a single file.

File 1

{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_name": "/var/log/suricata/eve-ips.json",
            "log_group_name": "{{grains.environment_full}}SuricataIPS",
            "log_stream_name": "{{grains.id}}",
            "timezone": "UTC",
            "timestamp_format": "%Y-%m-%dT%H:%M:%S.%f+0000"
          }
        ]
      }
    }
  }
}

File 2

{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_name": "/var/log/company/company-json.log",
            "log_group_name": "{{grains.environment_full}}Play",
            "log_stream_name": "{{grains.id}}",
            "timezone": "UTC",
            "timestamp_format": "%Y-%m-%dT%H:%M:%S.%fZ"
          },
          {
            "file_name": "/var/log/company/company-notifications.log",
            "log_group_name": "{{grains.environment_full}}Notifications",
            "log_stream_name": "{{grains.id}}",
            "timezone": "UTC",
            "timestamp_format": "%Y-%m-%dT%H:%M:%S.%fZ"
          }
        ]
      }
    }
  }
}

File 3

{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_name": "/var/ossec/logs/alerts/alerts.json",
            "log_group_name": "{{grains.environment_full}}OSSEC",
            "log_stream_name": "{{grains.id}}",
            "timezone": "UTC",
            "timestamp_format": "%Y-%m-%d %H:%M:%S"
          }
        ]
      }
    }
  }
}

jq query (based on some SO help)

jq -s '.[0].logs.logs_collected.files.collect_list += [.[].logs.logs_collected.files.collect_list | add] | unique| .[0]' web.json suricata.json wazuh-agent.json 

Output

{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_name": "/var/log/company/company-json.log",
            "log_group_name": "{{grains.environment_full}}Play",
            "log_stream_name": "{{grains.id}}",
            "timezone": "UTC",
            "timestamp_format": "%Y-%m-%dT%H:%M:%S.%fZ"
          },
          {
            "file_name": "/var/log/company/company-notifications.log",
            "log_group_name": "{{grains.environment_full}}Notifications",
            "log_stream_name": "{{grains.id}}",
            "timezone": "UTC",
            "timestamp_format": "%Y-%m-%dT%H:%M:%S.%fZ"
          },
          {
            "file_name": "/var/log/company/company-notifications.log",
            "log_group_name": "{{grains.environment_full}}Notifications",
            "log_stream_name": "{{grains.id}}",
            "timezone": "UTC",
            "timestamp_format": "%Y-%m-%dT%H:%M:%S.%fZ"
          },
          {
            "file_name": "/var/log/suricata/eve-ips.json",
            "log_group_name": "{{grains.environment_full}}SuricataIPS",
            "log_stream_name": "{{grains.id}}",
            "timezone": "UTC",
            "timestamp_format": "%Y-%m-%dT%H:%M:%S.%f+0000"
          },
          {
            "file_name": "/var/ossec/logs/alerts/alerts.json",
            "log_group_name": "{{grains['environment_full']}}OSSEC",
            "log_stream_name": "{{grains.id}}",
            "timezone": "UTC",
            "timestamp_format": "%Y-%m-%d %H:%M:%S"
          }
        ]
      }
    }
  }
}

If you've gotten this far, thank you. One additional point to note: if I change the order of the files, the first index of collect_list is always duplicated, and if web.json is last (the only one with a collect_list of length 2) the second log file is not in the group.
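The same duplication shows up with much smaller inputs. A minimal sketch of the query's shape, using hypothetical stand-in files a.json and b.json instead of the real configs:

printf '{"a":[{"f":1}]}' > a.json
printf '{"a":[{"f":2}]}' > b.json

# Same structure as the query above: += appends the combined list
# onto the first object's existing list, so the entry {"f":1}
# from the first file appears twice in the result.
jq -sc '.[0].a += [.[].a | add] | unique | .[0]' a.json b.json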

  • To the person that -1'd me: if I added all of the research effort that went into this it would have been enormous, and full of wayward paths that led nowhere. It was a succinct problem, with examples and a description of the desired output. Commented Jan 23, 2020 at 19:36

2 Answers


There are a couple of incorrect steps in your attempt. Firstly, to convert the array of all collect_list arrays

[.[].logs.logs_collected.files.collect_list] 

into one single array, you need to use add, but the way it was invoked was incorrect. add takes an array as input and produces the elements of that array added together (for sub-arrays, concatenated). So it should have been

[.[].logs.logs_collected.files.collect_list] | add 

followed by the unique function, which again takes an array as input and produces an array of the same elements, in sorted order, with duplicates removed:

[.[].logs.logs_collected.files.collect_list] | add | unique 
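The add-then-unique behaviour can be seen on a toy array of arrays (not the real configs):

# add concatenates the sub-arrays; unique then sorts and de-duplicates.
jq -cn '[[1,2],[2,3]] | add | unique'   # → [1,2,3]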

As for the duplication of the first elements: it happens because you were using the update-assignment operator += instead of plain assignment =. Since the expression on the right-hand side of the operator already gathers the entries from all of the objects, += merely appends that combined result onto the list already present in the first object, duplicating its entries.
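A side-by-side sketch of += versus = on hypothetical two-file input makes the difference concrete:

printf '{"a":[1]}' > a.json
printf '{"a":[2]}' > b.json

# += appends the combined list onto the first object's own list,
# so its entries appear twice:
jq -sc '.[0].a += ([.[].a] | add) | .[0].a' a.json b.json   # → [1,1,2]

# = replaces the list with the combined result:
jq -sc '.[0].a = ([.[].a] | add) | .[0].a' a.json b.json    # → [1,2]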

Also, group the functions together with (..) before selecting .[0] from the result. Putting it all together, this works as expected irrespective of the order of the files:

jq -s '.[0].logs.logs_collected.files.collect_list = ([.[].logs.logs_collected.files.collect_list]|add|unique)|.[0]' 

Another variant uses reduce and does not need slurp mode -s; instead it works on inputs, i.e. the contents of all the files made available to jq:

jq -n 'reduce inputs.logs.logs_collected.files.collect_list as $d (.; .logs.logs_collected.files.collect_list += $d)' 
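With hypothetical minimal inputs, the inputs-based variant behaves like this (note that, unlike the version above, it only concatenates and does not de-duplicate):

# Two JSON documents on stdin stand in for two config files.
printf '{"a":[1]}\n{"a":[2]}\n' |
  jq -nc 'reduce inputs.a as $d (.; .a += $d)'   # → {"a":[1,2]}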



The following outputs the correct information (no duplicates) regardless of file order:

jq -s 'reduce .[] as $dot ({}; .logs.logs_collected.files.collect_list += $dot.logs.logs_collected.files.collect_list)' web.json wazuh-agent.json suricata.json 

If anyone knows why the duplicate occurs in my other command, I would love to know. Though I think reduce is a better solution.
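One caveat worth noting: reduce here only concatenates, so if two files ever listed the same log entry it would appear twice. A minimal sketch with hypothetical inputs:

printf '{"a":[1]}' > a.json
printf '{"a":[1]}' > b.json

# The shared entry survives from both files; pipe the list through
# unique afterwards if de-duplication is needed.
jq -sc 'reduce .[] as $d ({}; .a += $d.a)' a.json b.json   # → {"a":[1,1]}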
