sed/Awk/cut... How to decide which to use to parse Docker output?

Question

My output:

docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
jenkins/jenkins     lts                 806f56c84444        8 days ago          703MB
mongo               latest              0da05d84b1fe        2 weeks ago         394MB

I would like to just cut the image ID alone from the output.

I tried using cut:

docker images | cut -d " " -f1
REPOSITORY
jenkins/jenkins

With -f1 I only get the repository names, and with -f3 the result is empty. Since the delimiter is not a single space, I don't see how to get the desired output.

Can we cut based on field names?

I read the documentation and did not see anything relevant. I also saw that there is a way to achieve this using sed/AWK, which I'm still figuring out.

In the meantime, is there an easier way to achieve this using the cut command?

I'm new to Unix/Linux. How can I determine which of sed/AWK/cut to prefer?

The docker command comes with a --format option to let you extract exactly the fields you need in a format suitable for your purposes. (Admittedly, the use of Go format strings isn't immediately obvious to us non-Go people.)
– tripleee, Commented Feb 23, 2019 at 9:50
And no, cut doesn't support extracting fields by name, though this isn't hard to do in Awk, either. See e.g. stackoverflow.com/questions/45608145/…
– tripleee, Commented Feb 23, 2019 at 12:21
I'm thinking about casting a close vote: "which is the best" questions are typically based on opinion, and are not a good fit for this Q&A.
– glenn jackman, Commented Feb 23, 2019 at 15:55

oguz ismail · Accepted Answer · 2019-02-23 09:10:10Z

Your input seems to have a fixed width of 20 chars for each field, so you can make use of gawk's FIELDWIDTHS feature.

$ awk -v FIELDWIDTHS="20 20 20 20 20" '{ print $3 }' file
IMAGE ID
806f56c84444
0da05d84b1fe

$ awk -v FIELDWIDTHS="20 20 20 20 20" '{ printf "%20s%20s\n", $1, $3 }' file
REPOSITORY          IMAGE ID
jenkins/jenkins     806f56c84444
mongo               0da05d84b1fe

From man gawk:

If the FIELDWIDTHS variable is set to a space-separated list of numbers, each field is expected to have fixed width, and gawk splits up the record using the specified widths. Each field width may optionally be preceded by a colon-separated value specifying the number of characters to skip before the field starts. The value of FS is ignored. Assigning a new value to FS or FPAT overrides the use of FIELDWIDTHS.

TenG · Accepted Answer · 2019-02-23 10:14:26Z

You have to "squeeze" the space padding in the default output to a single space, because cut treats every single space as a delimiter.

1  2 == 1-space-space-2 == Field 1 before the 1st space, Field 2 (empty) between the 1st and 2nd space, Field 3 after the 2nd space.

cut -d' ' -f1 ==> '1'

cut -d' ' -f2 ==> '' empty field between 1st and 2nd delimiter

cut -d' ' -f3 ==> '2'
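The three cases above can be checked directly in a shell. This is only a small sketch; the variable names are made up for illustration.

```shell
# A string with two consecutive spaces between "1" and "2".
line='1  2'

# cut treats every single space as a delimiter, so the two spaces
# enclose an empty second field.
f1=$(printf '%s\n' "$line" | cut -d' ' -f1)
f2=$(printf '%s\n' "$line" | cut -d' ' -f2)
f3=$(printf '%s\n' "$line" | cut -d' ' -f3)

printf 'f1=[%s] f2=[%s] f3=[%s]\n' "$f1" "$f2" "$f3"
# prints: f1=[1] f2=[] f3=[2]
```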

So, in your case use sed to replace each run of consecutive spaces with a single one:

docker images | sed -E 's/ +/ /g' | cut -d " " -f1,3

If the output has fixed column widths, then you can use this variant of cut:

docker images | cut -c1-20,41-60

This keeps characters 1 to 20 (the repository) and 41 to 60, where we find the Image ID.

If ever the output uses TAB for padding, you should use expand -t n to make the output consistently space padded then apply the appropriate cut -cx,y, e.g. (numbers may need adjusting):

docker images | expand -t 4 | cut -c1-20,41-60
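To see why the numbers may need adjusting, here is a small sketch with a made-up tab-padded record (the question's actual output is space padded, so this input is purely an assumption). The character positions passed to cut depend on the tab stop given to expand.

```shell
# Hypothetical tab-padded record: "mongo", a tab, then the ID.
record=$(printf 'mongo\t0da05d84b1fe')

# With tab stops every 8 columns, "mongo" (5 chars) is padded with
# 3 spaces, so the 12-character ID occupies characters 9 to 20.
expanded=$(printf '%s\n' "$record" | expand -t 8)
id=$(printf '%s\n' "$expanded" | cut -c9-20)

printf '%s\n' "$id"
# prints: 0da05d84b1fe
```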

Darby_Crash · Accepted Answer · 2019-02-23 10:45:40Z

Try this:

docker images | tr -s ' ' | cut -f3 -d' '

The command tr -s ' ' squeezes each run of spaces into a single one, after which cut can grab your field. This works fine as long as the values in your fields don't contain spaces.
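The caveat about spaces inside values matters for this very output, since "IMAGE ID" and "2 weeks ago" both contain spaces. A small sketch using rows shaped like the question's output (the sample is rebuilt with printf here, which is an assumption about the exact padding):

```shell
# Header plus one data row, padded to 20-character columns.
sample=$(
    printf '%-20s%-20s%-20s%-20s%s\n' \
        REPOSITORY TAG 'IMAGE ID' CREATED SIZE \
        mongo latest 0da05d84b1fe '2 weeks ago' 394MB
)

# Squeeze runs of spaces, then take the third space-separated field.
third=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d' ' -f3)
printf '%s\n' "$third"
# The data row yields the ID, but the header yields only "IMAGE".

# Fields after a multi-word value shift right: SIZE is the fifth
# column, yet on the data row it ends up as field 7.
size=$(printf '%s\n' "$sample" | tail -n 1 | tr -s ' ' | cut -d' ' -f7)
printf '%s\n' "$size"
```

So field 3 is safe for the ID on data rows, but anything to the right of CREATED needs a different approach.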

edited Feb 23, 2019 at 10:45

answered Feb 23, 2019 at 8:27

2 Comments

anish anil Over a year ago

Nope. Doesn't work.

docker images | cut -f1,2,4 -d$'\t'
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
jenkins/jenkins     lts                 806f56c84444        9 days ago          703MB
mongo               latest              0da05d84b1fe        2 weeks ago         394MB

anish anil Over a year ago

docker images | tr -s ' ' | cut -f3 -d " "

This worked perfectly.

I3ck · Accepted Answer · 2019-02-23 11:46:46Z

With Procedural Text Edit it's:

forEach line {
    if (contains ci "REPOSITORY") { remove }
    keepRange word 2 1
}
removeEmptyLines // <- optional

tripleee · Accepted Answer · 2019-02-23 12:32:22Z

In the general case, avoid parsing output meant for human consumption. Many modern utilities offer an option to produce output in some standard format like JSON or XML, or even CSV (though that is less strictly specified, and exists in multiple "dialects").

docker in particular has a generalized --format option which allows you to specify your own output format:

docker images --format "{{.ID}}"
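Once you control the format, cut becomes a reasonable tool again, because you can pick a single-character delimiter yourself. The sketch below assumes the .Repository and .ID placeholders of docker images; the list_images helper is hypothetical and is not invoked here, since it needs a running Docker daemon. The sample text stands in for its output, using the IDs from the question.

```shell
# Hypothetical helper: "repository id" pairs separated by exactly one
# space, with no header line. Requires a running Docker daemon.
list_images() {
    docker images --format '{{.Repository}} {{.ID}}'
}

# Stand-in for the helper's output, based on the question's images.
sample='jenkins/jenkins 806f56c84444
mongo 0da05d84b1fe'

# With a single-space delimiter and no header, cut is sufficient.
ids=$(printf '%s\n' "$sample" | cut -d' ' -f2)
printf '%s\n' "$ids"
```

In real use you would pipe list_images into cut instead of the sample text.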

If you cannot avoid writing your own parser (are you really sure!? Look again!), cut is suitable for output with a specific single-character delimiter, or otherwise fairly regular output. For everything else, I would go with Awk. Out of the box, it parses columns from sequences of whitespace, so it does precisely what you specifically ask for:

docker images | awk 'NR>1 { print $3 }'

(NR>1 skips the first line, which contains the column headers.)

In the case of fixed-width columns, it allows you to pull out a string by index:

docker images | awk 'NR>1 { print substr($0, 41, 12) }'

... though you could do that with cut, too:

docker images | cut -c41-52

... but notice that Docker might adjust column widths depending on your screen size!
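One way to cope with shifting widths, if you really must parse the table, is to look up where the header text starts instead of hard-coding 41. This is a sketch under the assumption that data columns line up with their headers; the sample is rebuilt with printf to mimic the question's output.

```shell
# Sample table with 20-character columns, as in the question.
sample=$(
    printf '%-20s%-20s%-20s%-20s%s\n' \
        REPOSITORY TAG 'IMAGE ID' CREATED SIZE \
        jenkins/jenkins lts 806f56c84444 '8 days ago' 703MB \
        mongo latest 0da05d84b1fe '2 weeks ago' 394MB
)

ids=$(printf '%s\n' "$sample" | awk '
    # Header line: remember the column where "IMAGE ID" begins.
    NR == 1 { start = index($0, "IMAGE ID"); next }
    {
        id = substr($0, start)   # text from that column onwards
        sub(/ .*/, "", id)       # drop everything from the first space
        print id
    }')
printf '%s\n' "$ids"
```

The same awk program would keep working if the columns were padded to a different width, because the offset comes from the header line.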

Awk lets you write regular expression extractions, too:

awk 'NR>1 { sub(/^([^[:space:]]*[[:space:]]+){2}/, ""); sub(/[[:space:]].*/, ""); print }'

This is where it overlaps with sed:

sed -n '2,$s/^[^ ]\+[ ]\+[^ ]\+[ ]\+\([^ ]\+\)[ ].*/\1/p'

though sed is significantly less human-readable, especially for nontrivial scripts. (This is still pretty trivial.)

If you haven't used regex before, the above will seem cryptic, but it really isn't very hard to pick apart. We are looking for sequences of non-spaces (a field in a column) followed by sequences of spaces (a column separator) - two before the ID field and whatever comes after it, starting from the first space after the ID column.
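To make the pieces easier to see, here is the same idea with each part of the pattern named. It is a sketch rather than the exact command above: it uses sed -E (extended syntax, so + needs no backslash), and the field and gap variable names are made up for illustration.

```shell
# Sample table with 20-character columns, as in the question.
sample=$(
    printf '%-20s%-20s%-20s%-20s%s\n' \
        REPOSITORY TAG 'IMAGE ID' CREATED SIZE \
        jenkins/jenkins lts 806f56c84444 '8 days ago' 703MB \
        mongo latest 0da05d84b1fe '2 weeks ago' 394MB
)

field='[^ ]+'   # a run of non-spaces: one column value
gap=' +'        # a run of spaces: one column separator

# Skip two fields and their separators, capture the third field,
# then match a space and the rest of the line; print only the capture.
ids=$(printf '%s\n' "$sample" |
    sed -nE "2,\$ s/^${field}${gap}${field}${gap}(${field}) .*/\\1/p")
printf '%s\n' "$ids"
```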

If you want to learn shell scripting, you should probably also learn at least the basics of Awk (and a passing familiarity with sed). If you just want to get the job done, and perhaps aren't specifically interested in learning U*x tools (though you probably should be anyway!), perhaps instead learn a modern scripting language like Python or Ruby.

... Here's a Python docker library:

import docker
client = docker.from_env()
for image in client.images.list():
    print(image.id)

But really, just say no to spending your life writing ad-hoc parsers for poorly-specified formats with never-ending corner cases. I am living proof: that way lies madness.

James Brown · Accepted Answer · 2019-02-23 19:02:52Z

Can we cut based on field names? No.

How can I determine which of Sed/AWK/Cut to prefer? YMMV. For this particular input, where fields are separated by two or more spaces, you could set awk's field separator to "  +" (two spaces followed by +, meaning two or more spaces), look for the desired field name (IMAGE ID below), and print only that field:

$ awk -F"  +" '            # set field separator
{
    if(f=="")              # while we have not determined the desired field
        for(i=1;i<=NF;i++) # ... keep looking
            if($i=="IMAGE ID")
                f=i
    if(f!="")              # once found
        print $f           # start printing it
}' file

Output:

IMAGE ID
806f56c84444
0da05d84b1fe

As one-liner:

$ awk -F"  +" '{if(f=="")for(i=1;i<=NF;i++)if($i=="IMAGE ID")f=i;if(f!="")print $f}' file
