Since you are selecting bits of data from different levels of the input objects, you will need to specify the selection more precisely.
As your input consists of a stream of JSON objects, let's start with a function for reading one of those objects:
```
# Input: a JSON object; output: a JSON object
def get:
  {company_number} as $number
  | .data
  | (.address | {address_line_1,country,locality,postal_code,premises}) as $address
  | {ceased_on,country_of_residence} as $details
  | (.date_of_birth | {month, year}) as $dob
  | $number + $address + $details + $dob + {etag,kind};
```

There are several ways to read JSON streams, but it's quite convenient to use `input` and `inputs` with the `-n` command-line option.
To make things easy to read, let's next define another helper function for producing an array of the relevant values:
```
def getRow: get | [.[]];
```

Putting it all together:
```
(input|get)
| keys_unsorted, [.[]], (inputs | getRow)
| @csv
```

Don't forget the `-r` and `-n` command-line options!
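For reference, here is the whole pipeline assembled into a single invocation. This is just a sketch: `officers.json` is a hypothetical file name, and the input is assumed to be a stream of top-level JSON objects (not wrapped in an array).

```shell
# Hypothetical file name; the input is assumed to be a stream of JSON objects.
jq -rn '
  def get:
    {company_number} as $number
    | .data
    | (.address | {address_line_1,country,locality,postal_code,premises}) as $address
    | {ceased_on,country_of_residence} as $details
    | (.date_of_birth | {month, year}) as $dob
    | $number + $address + $details + $dob + {etag,kind};

  def getRow: get | [.[]];

  # First object: emit the header row plus its own values; remaining objects: values only
  (input|get) | keys_unsorted, [.[]], (inputs | getRow) | @csv
' officers.json
```

Note that `-n` is essential here: without it, `input` would silently skip the first object in the stream.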
Footnote:
In general, using `[.[]]` to "flatten" a JSON object into an array of values is ill-advised. In the present case, however, we have ensured a consistent ordering of keys in `get`, and it is reasonable to assume that none of the values in the selected fields are compound, as suggested by the snippet and by the 500,000 records in one of the snapshot files. A "robustification" would nevertheless be trivial to achieve (e.g. using `tostring`), and might therefore be advisable.
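To illustrate the `tostring` robustification mentioned above (using a made-up object, not the actual data): applying `tostring` to each value before `@csv` guarantees every cell is a scalar string, with any compound value serialized as its JSON text.

```shell
# Made-up example: one value is an object, which plain [.[]] would pass
# through and @csv would then reject; tostring serializes it safely.
echo '{"name":"ACME","officers":{"count":2}}' |
  jq -r '[.[] | tostring] | @csv'
# → "ACME","{""count"":2}"
```

In the answer's pipeline, the equivalent change would be `def getRow: get | [.[] | tostring];`.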
If you were using gojq (the Go implementation of jq), you would have to do things slightly differently as gojq does not respect user-specified ordering of keys. You'd have to generate the header row differently and make minor changes to get.
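One way to make the program order-independent (a sketch, not tested against gojq, though it is also valid jq) is to list the column names explicitly and look each value up by name, so nothing relies on the keys of the constructed object staying in insertion order:

```shell
# Sketch of a key-order-independent variant: the column order comes from an
# explicit array of names rather than from object key order.
# officers.json is a hypothetical file name.
jq -rn '
  def cols:
    ["company_number","address_line_1","country","locality","postal_code",
     "premises","ceased_on","country_of_residence","month","year","etag","kind"];

  def get:
    {company_number} as $number
    | .data
    | (.address | {address_line_1,country,locality,postal_code,premises}) as $address
    | {ceased_on,country_of_residence} as $details
    | (.date_of_birth | {month, year}) as $dob
    | $number + $address + $details + $dob + {etag,kind};

  # Look each column up by name, so key ordering no longer matters
  def getRow: get as $o | cols | map($o[.]);

  cols, (inputs | getRow) | @csv
' officers.json
```

Here `keys_unsorted` is no longer needed at all, since the header row is just `cols`.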