Since you are selecting bits of data from different levels of the input objects, you will need to specify the selection more precisely.
As your input consists of a stream of JSON objects, let's start with a function for reading one of those objects:
```
# Input: a JSON object; output: a JSON object
def get:
  {company_number} as $number
  | .data
  | (.address | {address_line_1,country,locality,postal_code,premises}) as $address
  | {ceased_on,country_of_residence} as $details
  | (.date_of_birth | {month, year}) as $dob
  | $number + $address + $details + $dob + {etag,kind};
```

There are several ways to read JSON streams, but it's quite convenient to use `input` and `inputs` with the `-n` command-line option.
To make things easy to read, let's next define another helper function for producing an array of the relevant values:
```
def getRow: get | [.[]];
```

Putting it all together:
```
(input|get)
| keys_unsorted, [.[]], (inputs | getRow)
| @csv
```

Don't forget the `-r` and `-n` command-line options!
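For reference, here is the whole pipeline assembled into a single invocation. This is just a sketch: `officers.json` is a hypothetical file name, and the input is assumed to be a stream of top-level JSON objects (not wrapped in an array).

```shell
# Hypothetical file name; the input is assumed to be a stream of JSON objects.
jq -rn '
  def get:
    {company_number} as $number
    | .data
    | (.address | {address_line_1,country,locality,postal_code,premises}) as $address
    | {ceased_on,country_of_residence} as $details
    | (.date_of_birth | {month, year}) as $dob
    | $number + $address + $details + $dob + {etag,kind};

  def getRow: get | [.[]];

  # First object: emit the header row plus its own values; remaining objects: values only
  (input|get) | keys_unsorted, [.[]], (inputs | getRow) | @csv
' officers.json
```

Note that `-n` is essential here: without it, `input` would silently skip the first object in the stream.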
Footnote:
In general, using `[.[]]` to "flatten" a JSON object into an array of values is ill-advised. In the present case, however, we have ensured a consistent ordering of keys in `get`, and it is reasonable to assume that none of the values in the selected fields are compound, as suggested by the snippet and by the 500,000 records in one of the snapshot files. A "robustification" would nevertheless be trivial to achieve (e.g. using `tostring`), and might therefore be advisable.
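To illustrate the `tostring` robustification mentioned above (using a made-up object, not the actual data): applying `tostring` to each value before `@csv` guarantees every cell is a scalar string, with any compound value serialized as its JSON text.

```shell
# Made-up example: one value is an object, which plain [.[]] would pass
# through and @csv would then reject; tostring serializes it safely.
echo '{"name":"ACME","officers":{"count":2}}' |
  jq -r '[.[] | tostring] | @csv'
# → "ACME","{""count"":2}"
```

In the answer's pipeline, the equivalent change would be `def getRow: get | [.[] | tostring];`.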
If you were using gojq (the Go implementation of jq), you would have to do things slightly differently as gojq does not respect user-specified ordering of keys. You'd have to generate the header row differently and make minor changes to get.
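One way to make the program order-independent (a sketch, not tested against gojq, though it is also valid jq) is to list the column names explicitly and look each value up by name, so nothing relies on the keys of the constructed object staying in insertion order:

```shell
# Sketch of a key-order-independent variant: the column order comes from an
# explicit array of names rather than from object key order.
# officers.json is a hypothetical file name.
jq -rn '
  def cols:
    ["company_number","address_line_1","country","locality","postal_code",
     "premises","ceased_on","country_of_residence","month","year","etag","kind"];

  def get:
    {company_number} as $number
    | .data
    | (.address | {address_line_1,country,locality,postal_code,premises}) as $address
    | {ceased_on,country_of_residence} as $details
    | (.date_of_birth | {month, year}) as $dob
    | $number + $address + $details + $dob + {etag,kind};

  # Look each column up by name, so key ordering no longer matters
  def getRow: get as $o | cols | map($o[.]);

  cols, (inputs | getRow) | @csv
' officers.json
```

Here `keys_unsorted` is no longer needed at all, since the header row is just `cols`.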