2

I have a 1GB JSON file I like to convert to CSV format. The file contains information about UK company people with significant control (PSC). file source: http://download.companieshouse.gov.uk/en_pscdata.html

here is a data snippet of PSC Data product:

{"company_number":"04502074","data":{"address":{"address_line_1":"Grove Hall","address_line_2":"Ashbourne Green","locality":"Ashbourne","postal_code":"DE6 1JD","region":"Derbyshire"},"country_of_residence":"England","date_of_birth":{"month":11,"year":1964},"etag":"f9a632332f63b61f004569f99d6b15e3e6d28192","kind":"individual-person-with-significant-control","links":{"self":"/company/04502074/persons-with-significant-control/individual/34zTsx2BFGMyn0lJe2REL656U8w"},"name":"Mr Philip Anthony Donlan","name_elements":{"forename":"Philip","middle_name":"Anthony","surname":"Donlan","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-50-to-75-percent"],"notified_on":"2016-04-06"}} {"company_number":"10260075","data":{"address":{"country":"England","locality":"Widnes","postal_code":"WA8 9DH","premises":"1 Stockswell Farm Court"},"country_of_residence":"England","date_of_birth":{"month":12,"year":1978},"etag":"dbf13fc08cb9136450089681b6e9364eb8458129","kind":"individual-person-with-significant-control","links":{"self":"/company/10260075/persons-with-significant-control/individual/Br24rkYIl3ZKam3C9fT4o_9uF7k"},"name":"Mr Daniel Thomas Ross","name_elements":{"forename":"Daniel","middle_name":"Thomas","surname":"Ross","title":"Mr"},"nationality":"English","natures_of_control":["significant-influence-or-control"],"notified_on":"2016-07-01"}} {"company_number":"SC539354","data":{"address":{"address_line_1":"5 West Victoria Dock Road","locality":"Dundee","postal_code":"DD1 3JT","premises":"Begbies Traynor (Central) Llp, River Court"},"country_of_residence":"Scotland","date_of_birth":{"month":4,"year":1980},"etag":"37599a22ede050072457db60af6e75ba8e237246","kind":"individual-person-with-significant-control","links":{"self":"/company/SC539354/persons-with-significant-control/individual/T7LPjXkKRuaunfMRjfFrnWiHEnI"},"name":"Mr Stuart Hemple","name_elements":{"forename":"Stuart","surname":"Hemple","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-75-to-100-percent","voting-rights-75-to-100-percent","right-to-appoint-and-remove-directors","significant-influence-or-control"],"notified_on":"2016-07-01"}} {"company_number":"02722495","data":{"address":{"address_line_1":"Beechdene","address_line_2":"108 Coventry Road","locality":"Warwick","postal_code":"CV34 5HH"},"country_of_residence":"England","date_of_birth":{"month":12,"year":1953},"etag":"610138d3809ab3237b609f3cb93bfe4bf89d7581","kind":"individual-person-with-significant-control","links":{"self":"/company/02722495/persons-with-significant-control/individual/8g4ED3usT4wLPEqra7dE97eqmHE"},"name":"Mr Marshall Fenn Stephenson","name_elements":{"forename":"Marshall","middle_name":"Fenn","surname":"Stephenson","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-25-to-50-percent"],"notified_on":"2016-07-01"}} {"company_number":"05495733","data":{"address":{"address_line_1":"Brompton Road","country":"England","locality":"London","postal_code":"SW3 2AS","premises":"253"},"country_of_residence":"Italy","date_of_birth":{"month":4,"year":1953},"etag":"d45e5d2aa905e769e6fd3aa364e301f73b047985","kind":"individual-person-with-significant-control","links":{"self":"/company/05495733/persons-with-significant-control/individual/Oqp-z-D5JTX0mjXTtmOqmct1vR4"},"name":"Mr Roberto Gavazzi","name_elements":{"forename":"Roberto","surname":"Gavazzi","title":"Mr"},"nationality":"Italian","natures_of_control":["ownership-of-shares-50-to-75-percent","voting-rights-50-to-75-percent","right-to-appoint-and-remove-directors-as-firm","significant-influence-or-control-as-firm"],"notified_on":"2016-06-30"}} {"company_number":"SC539355","data":{"address":{"address_line_1":"6 Dryden Road","country":"Scotland","locality":"Loanhead","postal_code":"EH20 9LZ","premises":"Bilston Glen Business Centre"},"country_of_residence":"Scotland","date_of_birth":{"month":10,"year":1990},"etag":"b03abb8bb1b6f95039dd896210d7c231d8784c31","kind":"individual-person-with-significant-control","links":{"self":"/company/SC539355/persons-with-significant-control/individual/tYyjuJrp6Ifp327VxThVGeRswMM"},"name":"Mr David John Kelly","name_elements":{"forename":"David","middle_name":"John","surname":"Kelly","title":"Mr"},"nationality":"Scottish","natures_of_control":["ownership-of-shares-75-to-100-percent"],"notified_on":"2016-07-01"}} {"company_number":"SC539356","data":{"address":{"address_line_1":"Scholes","country":"England","locality":"Wigan","postal_code":"WN1 1YF","premises":"106 Douglas House"},"country_of_residence":"England","date_of_birth":{"month":3,"year":1961},"etag":"2f15d0fbacc68763b00e203ab0820b0911ac5906","kind":"individual-person-with-significant-control","links":{"self":"/company/SC539356/persons-with-significant-control/individual/0eATs-Ecoj9ie0_pBCq29L6UtlM"},"name":"Mr Mark Edward Sowery","name_elements":{"forename":"Mark","middle_name":"Edward","surname":"Sowery","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-75-to-100-percent"],"notified_on":"2016-07-01"}} {"company_number":"07674942","data":{"address":{"address_line_1":"Old Gloucester Street","country":"England","locality":"London","postal_code":"WC1N 3AX","premises":"27"},"country_of_residence":"Sierra Leone","date_of_birth":{"month":2,"year":1979},"etag":"f2e9cc0033cd5ef24fcda06baeff06bb8ea72654","kind":"individual-person-with-significant-control","links":{"self":"/company/07674942/persons-with-significant-control/individual/D31C5Na0B1I4rqM1RYwpy3J8oKA"},"name":"Mr Muhammad Umar Babar","name_elements":{"forename":"Muhammad","middle_name":"Umar","surname":"Babar","title":"Mr"},"nationality":"Pakistani","natures_of_control":["ownership-of-shares-75-to-100-percent"],"notified_on":"2016-07-01"}} {"company_number":"09639364","data":{"address":{"address_line_1":"Galmington Road","country":"United Kingdom","locality":"Taunton","postal_code":"TA1 5NP","premises":"58b","region":"Somerset"},"country_of_residence":"United Kingdom","date_of_birth":{"month":12,"year":1977},"etag":"25ff7d41f9b8f257f0d41ae82e88202017beff34","kind":"individual-person-with-significant-control","links":{"self":"/company/09639364/persons-with-significant-control/individual/qlPpucOQopiIgq1xzZIb6xjO5JQ"},"name":"Mr Li Ying Cao","name_elements":{"forename":"Li","middle_name":"Ying","surname":"Cao","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-75-to-100-percent"],"notified_on":"2016-07-01"}} {"company_number":"08541397","data":{"address":{"address_line_1":"Hedley Avenue","locality":"Blyth","postal_code":"NE24 3JP","premises":"27","region":"Northumberland"},"country_of_residence":"England","date_of_birth":{"month":11,"year":1949},"etag":"b843664ca67a4274ee6f6cc9816ab35cd8367190","kind":"individual-person-with-significant-control","links":{"self":"/company/08541397/persons-with-significant-control/individual/KuddC6fZH17ifaXSAVWEcC2ba74"},"name":"Mr David Harwood","name_elements":{"forename":"David","surname":"Harwood","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-75-to-100-percent"],"notified_on":"2016-05-23"}} {"company_number":"02832188","data":{"address":{"address_line_1":"Lodge Road","country":"England","locality":"London","postal_code":"NW4 4DD","premises":"1"},"country_of_residence":"England","date_of_birth":{"month":5,"year":1952},"etag":"0c21b2b560ee43ca0c2ffd9f07d5ca564536b6e2","kind":"individual-person-with-significant-control","links":{"self":"/company/02832188/persons-with-significant-control/individual/Rh8pb-L7fEZzkyhttuCwVjjL_eA"},"name":"Mrs Rachel Weissman","name_elements":{"forename":"Rachel","surname":"Weissman","title":"Mrs"},"nationality":"British","natures_of_control":["ownership-of-shares-25-to-50-percent","voting-rights-25-to-50-percent","right-to-appoint-and-remove-directors","significant-influence-or-control"],"notified_on":"2016-07-01"}} 

I have created a input file in.json, contain my json data as providet by companies house: file source: http://download.companieshouse.gov.uk/en_pscdata.html

I have created a output file out.csv

I am trying to run the follwing code:

jq -r 'map({company_number,address_line_1,country,locality,postal_code,premises,ceased_on,country_of_residence,month,year,etag,kind}) | (first | keys_unsorted) as $keys | map([to_entries[] | .value]) as $rows | $keys,$rows[] | @csv' in.json > out.csv 

im getting the following error:

jq: error (at in.json:0): Cannot index string with string "company_number"

please advise on what am I doing wrong and how to get this done.

1 Answer 1

1

Since you are selecting bits of data from different levels of the input objects, you will need to specify the selection more precisely.

As your input consists of a stream of JSON objects, let's start with a function for reading one of those objects:

# Input and output: a JSON object def get: {company_number} as $number | .data | (.address | {address_line_1,country,locality,postal_code,premises}) as $address | {ceased_on,country_of_residence} as $details | (.date_of_birth | {month, year}) as $dob | $number + $address + $details + $dob + {etag,kind} ; 

There are several ways to read JSON streams, but it's quite convenient to use use input and inputs with the -n command-line option.

To make things easy to read, let's next define another helper function for producing an array of the relevant data:

def getRow: get | [.[]]; 

Putting it all together:

(input|get) | keys_unsorted, [.[]], (inputs | getRow) | @csv 

Don't forget the -r and -n command-line options!


Footnote:

In general, using [.[]] to "flatten" a JSON object to a flat array of values is ill-advised, but in the present case, we have ensured a consistent ordering of keys in get, and it is reasonable to assume that none of the values in the selected fields are compound, as suggested by the snippet and the 500,000 records in one of the snapshot files. A "robustification" would, however, be trivial to achieve (e.g. using tostring), and might therefore be advisable.

If you were using gojq (the Go implementation of jq), you would have to do things slightly differently as gojq does not respect user-specified ordering of keys. You'd have to generate the header row differently and make minor changes to get.

Sign up to request clarification or add additional context in comments.

6 Comments

Jq -r 'map {company_number} as $number | .data | (.address | {address_line_1,country,locality,postal_code,premises}) as $address | {ceased_on,country_of_residence} as $details | (.date_of_birth | {month, year}) as $dob | $number + $address + $details + $dob + {etag,kind} | keys_unsorted as $keys | [.[]] as $rows | $keys, $rows | @csv' in.json > out.json jq: error: syntax error, unexpected '{', expecting $end (Unix shell quoting issues?) at <top-level>, line 1: map {company_number} as $number jq: 1 compile error please elaborate
Your modification of the jq program is incorrect. It would probably be a good idea if you crafted a more representative but tiny sample of your data, as per the minimal reproducible example guidelines.
I have edited/reworded my question and edited the data source : download.companieshouse.gov.uk/en_pscdata.html
Aha! It's a stream. A sample of two would probably have sufficed. I tested the revised script against psc-snapshot-2021-12-05_1of20.zip It took about 20 secs.
Jq -r -n input in.json prints a data sample, could you elaborate on how to put the different pieces of your answer/snippets together and actually run the code? i have the data in my input file in.json and my script is called script.jq (or script.sh also seem so be accepted) however, im getting errors. /Users/doss/Desktop/ds/test.jq: line 2: input: command not found /Users/doss/Desktop/ds/test.jq: line 2: get: command not found /Users/doss/Desktop/ds/test.jq: line 3: syntax error near unexpected token |' /Users/doss/Desktop/ds/test.jq: line 3: | keys_unsorted as $keys'
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.