4

I have a file with tab-delimited values in this format:

    your-email         your-order-id  PayPal-transaction-id  your-first-name  your-second-name
    [email protected]  12345          54321                  sooky            spooky
    [email protected]  23456          23456                  kiki             dee
    [email protected]  34567          76543                  cheeky           chappy

and I'd like to use awk to convert this to YAML:

    ---
    your-email: [email protected]
    your-order-id: 12345
    PayPal-transaction-id: 54321
    your-first-name: sooky
    your-second-name: spooky

    your-email: [email protected]
    your-order-id: 23456
    PayPal-transaction-id: 23456
    your-first-name: kiki
    your-second-name: dee

    your-email: [email protected]
    your-order-id: 34567
    PayPal-transaction-id: 76543
    your-first-name: cheeky
    your-second-name: chappy

So far, my awk script looks like this:

    #!/usr/bin/awk
    FS=="\t"
    BEGIN {print "---"}
    NR==1 {for (i=1;i<=NF;i++) print $i ": "}

But I can't figure out how to get each field from line 1 onwards to print after its header and recreate the YAML key values from the first line of the input file. In the real file, there are 38 fields and 34 records (so not huge).

  • Note that the YAML document you depict is probably not what you actually want: you repeatedly overwrite the values of the five keys, so you would get only the info for the last order when you loaded the document. You probably want either a series of subdocuments, in which case you should change the blank lines to ---, or a list of dictionaries (my preferred choice), in which case you should prefix the your-email lines with "- " and indent the other non-blank lines by two spaces. See the YAML reference card. Commented May 28, 2013 at 7:35
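
The list-of-dictionaries layout suggested in this comment could be produced with something like the following. This is only a minimal sketch: the function name tsv_to_yaml_list is hypothetical, the addresses in the demo are made-up stand-ins for the redacted ones, and it assumes the input really is tab-delimited with the header on the first line.

```shell
# Sketch: turn a TSV file with a header row into a YAML list of mappings.
# tsv_to_yaml_list is a hypothetical helper name, not something from the thread.
tsv_to_yaml_list() {
  awk -F '\t' '
    BEGIN { print "---" }
    NR == 1 {
      # First line: remember the header names to use as keys.
      nc = NF
      for (c = 1; c <= nc; c++) key[c] = $c
      next
    }
    {
      # "- " opens a new list item; the remaining keys are indented two spaces.
      for (c = 1; c <= nc; c++)
        printf "%s%s: %s\n", (c == 1 ? "- " : "  "), key[c], $c
    }
  ' "$@"
}

# Demo on two records (made-up example.com addresses, assumed for illustration).
printf 'your-email\tyour-order-id\tyour-first-name\nsooky@example.com\t12345\tsooky\nkiki@example.com\t23456\tkiki\n' |
  tsv_to_yaml_list
```

Values are written unquoted, so a field containing ": " or starting with a character that is special in YAML (such as "[" or "&") would need quoting that this sketch does not attempt.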

4 Answers

3

Here's one way:

    $ cat inf
    your-email         your-order-id  PayPal-transaction-id  your-first-name  your-second-name
    [email protected]  12345          54321                  sooky            spooky
    [email protected]  23456          23456                  kiki             dee
    [email protected]  34567          76543                  cheeky           chappy
    $ cat mkf.sh
    awk '
    BEGIN {
      FS = "\t"        # the input is tab-delimited
      print "---\n"
    }
    NR == 1 {
      # header line: remember the column names
      nc = NF
      for(c = 1; c <= NF; c++) {
        h[c] = $c
      }
    }
    NR > 1 {
      for(c = 1; c <= nc; c++) {
        # fixed format string, so a "%" in the data is printed literally
        printf "%s: %s\n", h[c], $c
      }
      print ""
    }' inf
    $ ./mkf.sh inf
    ---

    your-email: [email protected]
    your-order-id: 12345
    PayPal-transaction-id: 54321
    your-first-name: sooky
    your-second-name: spooky

    your-email: [email protected]
    your-order-id: 23456
    PayPal-transaction-id: 23456
    your-first-name: kiki
    your-second-name: dee

    your-email: [email protected]
    your-order-id: 34567
    PayPal-transaction-id: 76543
    your-first-name: cheeky
    your-second-name: chappy
  • That's great. Thanks. I would upvote but my poor reputation precludes that :-( Commented May 28, 2013 at 8:47
  • @duff No worries, glad it worked as expected for you! Commented May 29, 2013 at 1:23
  • @duff Good! If this solves your issue, please consider accepting the answer. Accepting an answer marks the issue as resolved. Commented Aug 28, 2022 at 6:59
0

Have you tried defining an integer variable, set to zero in BEGIN, and running an if/else statement on it? If iter==0, save the field names as elements of an array and increment the integer; otherwise, do the record print you've written, except printing each field after its stored name using your i loop variable (more information on awk arrays).

I haven't tested this code at all (and I suck something awful at awk in general), but it should serve as a concrete illustration of the general programming/scripting concept:

    #!/usr/bin/awk
    FS=="\t"
    BEGIN {
      print "---"
      iter=0
    }
    NR==1 {
      if (iter == 0)
        for (i=1;i<=NF;i++)
          newArr[i]=$i
        iter++
      else
        for (i=1;i<=NF;i++)
          print newArr[i] ": " $i
    }
  • I get a syntax error when trying to run this: "gawk: ./a.awk:15: else", where the arrow (^) points at the e of else. Could you explain the NR==1? Won't that make your code execute only on the first line? Commented May 27, 2013 at 23:46
  • Yeah, like I said, it was just an illustration, not final code. It looks like @icyrock.com has essentially re-created the same basic script, except using NR instead of an if/else statement. I would try to use their code. Commented May 27, 2013 at 23:50
  • OK, fair enough, I thought it was some dark trickery I was not aware of :). Sorry, next time I'll read the text of your answer as well as the code. Commented May 27, 2013 at 23:51
  • I didn't mean to catch the NR, that's from your code that you provided, I just worked off that as a template. Near as I can tell from the other person's code, it matches line numbers so I'm assuming NR is "number of record" and the statement before the curly brace is a conditional statement. Commented May 27, 2013 at 23:52
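
For reference, the if/else idea from this answer can be made to run roughly as follows. This is a hedged sketch rather than the answerer's final code: the function name tsv_to_yaml_ifelse is hypothetical, and the NR == 1 test is moved inside the block, because a leading NR==1 pattern restricts the whole block to the first line. The reported syntax error goes away once the two-statement if body is wrapped in braces.

```shell
# Sketch of the if/else approach; tsv_to_yaml_ifelse is a hypothetical name.
tsv_to_yaml_ifelse() {
  awk '
    BEGIN { FS = "\t"; print "---" }   # "=" assigns FS; "==" would only compare
    {
      if (NR == 1) {
        # First record: store the field names by column position.
        for (i = 1; i <= NF; i++) newArr[i] = $i
      } else {
        # Later records: print each value after its stored name.
        for (i = 1; i <= NF; i++) print newArr[i] ": " $i
        print ""                        # blank line between records
      }
    }
  ' "$@"
}

# Demo on a tiny two-column input.
printf 'your-order-id\tyour-first-name\n12345\tsooky\n23456\tkiki\n' |
  tsv_to_yaml_ifelse
```

NR already counts records, so the separate iter counter from the answer is not needed here.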
0

I am sure this can be done in awk but if a Perl answer is acceptable, this should do what you need:

    #!/usr/bin/env perl
    print "---\n";
    while (<>) {
        chomp;
        ## This splits the line at one or more tab characters
        ## into the array @fields.
        @fields=split(/\t+/);
        ## Get the column names if this is the 1st line
        if ($.==1){@cols=@fields}
        ## Print the data if it is not the first line
        else {
            print "\n";
            for ($i=0;$i<=$#fields;$i++){
                print "$cols[$i]: $fields[$i]\n";
            }
        }
    }

For example:

    $ ./foo.pl input_text.txt
    ---

    your-email: [email protected]
    your-order-id: 12345
    PayPal-transaction-id: 54321
    your-first-name: sooky
    your-second-name: spooky

    your-email: [email protected]
    your-order-id: 23456
    PayPal-transaction-id: 23456
    your-first-name: kiki
    your-second-name: dee

    your-email: [email protected]
    your-order-id: 34567
    PayPal-transaction-id: 76543
    your-first-name: cheeky
    your-second-name: chappy

This can be condensed into a one-liner using Perl's -a option which splits each line into the array @F:

echo "---";perl -aF"\t" -ne 'if ($.==1){@c=@F; chomp($c[$#c]);}else { print "\n";for ($i=0;$i<=$#F;$i++){print "$c[$i]: $F[$i]\n";}}' input_text.txt 
0
csvjson -t file | yq -y . 

Assuming the fields of the original data are delimited by tabs, this uses csvjson (from the csvkit toolkit) to convert the data to JSON format. The yq parser (from https://kislyuk.github.io/yq/) is then used to transcode the JSON into YAML.

Given the data in the question, the final output will be the YAML document

    - your-email: [email protected]
      your-order-id: 12345
      PayPal-transaction-id: 54321
      your-first-name: sooky
      your-second-name: spooky
    - your-email: [email protected]
      your-order-id: 23456
      PayPal-transaction-id: 23456
      your-first-name: kiki
      your-second-name: dee
    - your-email: [email protected]
      your-order-id: 34567
      PayPal-transaction-id: 76543
      your-first-name: cheeky
      your-second-name: chappy

I'm noting that the expected output in the question makes little sense as it's a single section with multiple duplicated keys (a key's value is overwritten by a later instance of that same key). I've therefore chosen to ignore that in favour of a document without duplicated keys (the above document contains a list of three objects).

In place of csvjson -t file you may instead use

mlr --itsv --ojson --jlistwrap cat file 

... which uses Miller (mlr) from https://miller.readthedocs.io/en/latest/ to convert the tab-delimited input into JSON.

In place of yq -y . you may use

yj -jy 

... which uses yj from https://github.com/sclevine/yj to translate JSON to YAML.

Any combination of the four tools mentioned for TSV-->JSON and JSON-->YAML transcoding will give you the same (or equivalent) result in the end.
