awk to print all columns from the nth to the last with spaces

Question

I have the following input file:

a 1 o p
b 2 o p p
c 3 o p p  p

In the last line there is a double space between the last two p's, and the columns have different spacing.

I have used the solution from: Using awk to print all columns from the nth to the last.

awk '{for(i=2;i<=NF;i++){printf "%s ", $i}; printf "\n"}'

It works fine until it reaches the double space in the last column, where it removes one of the two spaces.
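A minimal way to reproduce the problem, using only the last line of the sample input: by default awk splits fields on runs of blanks, so the second space never reaches any $i, and the loop then re-joins the fields with exactly one space each (leaving a trailing space as well).

```shell
# Default field splitting treats "  " as a single separator, so the
# empty gap between the last two p's is lost before the loop runs.
printf '%s\n' 'c 3 o p p  p' |
    awk '{for(i=2;i<=NF;i++){printf "%s ", $i}; printf "\n"}'
# prints "3 o p p p " (one space between the last p's, plus a trailing space)
```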

How can I avoid that while still using awk?

You want to preserve the space? If that's the case, are the fields single characters as you have shown (or at least constant-width)?
– eduffy, Commented Apr 8, 2015 at 12:34
How important is it that you keep using awk? (Does passing awk -F '[ ]' solve the problem?)
– Wintermute, Commented Apr 8, 2015 at 12:40
@EtanReisner can't use cut, columns might have different spacing
– meso_2600, Commented Apr 8, 2015 at 12:40
I don't understand. The only spacing that matters to cut in this case is the spacing after the very first column. Does the number of spaces there vary?
– Etan Reisner, Commented Apr 8, 2015 at 12:42
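A quick sketch of Wintermute's -F '[ ]' idea: with a single-space regex as the field separator, every space is its own separator, so a double space produces an empty field instead of being swallowed. Re-joining the fields with one space then restores the original spacing. The helper name from_second is hypothetical, and like cut this assumes the first column is followed by exactly one space.

```shell
# Print from the second field onward, keeping the original spacing.
# With -F '[ ]', "p  p" splits into "p", "", "p", so joining with " "
# puts both spaces back.
from_second() {
    awk -F '[ ]' '{
        out = $2
        for (i = 3; i <= NF; i++)
            out = out " " $i    # an empty $i re-creates the extra space
        print out
    }'
}

printf '%s\n' 'a 1 o p' 'b 2 o p p' 'c 3 o p p  p' | from_second
```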

Community · Accepted Answer · 2017-05-23 11:47:33Z

Since you want to preserve spaces, let's just use cut:

$ cut -d' ' -f2- file
1 o p
2 o p p
3 o p p  p

Or, for example, to start at column 4:

$ cut -d' ' -f4- file
p
p p
p p  p

This will work as long as the columns you are removing are one-space separated.
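To see the limitation concretely: cut -d' ' treats every single space as a delimiter, so an extra space after the first column shows up as an empty field and shifts the field numbering. A small sketch with a made-up input line:

```shell
# Two spaces after the first column: cut sees fields "a", "", "1", "o", "p".
printf '%s\n' 'a  1 o p' | cut -d' ' -f2-   # prints " 1 o p" (leading space)
printf '%s\n' 'a  1 o p' | cut -d' ' -f3-   # prints "1 o p"
```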

If the columns you are removing also contain different amount of spaces, you can use the beautiful solution by Ed Morton in Print all but the first three columns:

awk '{sub(/[[:space:]]*([^[:space:]]+[[:space:]]+){1}/,"")}1'

where the 1 in {1} is the number of columns to remove.

Test

$ cat a
a 1 o p
b 2 o p p
c 3 o p p  p
$ awk '{sub(/[[:space:]]*([^[:space:]]+[[:space:]]+){2}/,"")}1' a
o p
o p p
o p p  p

@meso_2600 are the columns you want to remove just one-space separated?

score 3 · Accepted Answer · 2015-04-08 13:30:52Z


GNU sed

Remove the first n fields (here n = 2):

sed -r 's/([^ ]+ +){2}//' file
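If the -r flag (a GNU extension) is not available, the same substitution can be written as a POSIX basic regular expression. This is a sketch assuming a sed that supports BRE interval expressions applied to a group, as POSIX specifies; drop_two is a hypothetical helper name.

```shell
# Same edit as sed -r 's/([^ ]+ +){2}//', written as a POSIX BRE:
# "[^ ][^ ]*" is one or more non-spaces, "  *" is one or more spaces,
# and \{2\} repeats that field-plus-separator group twice.
drop_two() {
    sed 's/\([^ ][^ ]*  *\)\{2\}//'
}

printf '%s\n' 'a 1 o p' 'b 2 o p p' 'c 3 o p p  p' | drop_two
```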

GNU awk 4.0+

awk '{sub("([^"FS"]+"FS"+){2}","")}1' file

GNU awk <4.0

awk --re-interval '{sub("([^"FS"]+"FS"+){2}","")}1' file

In case the FS version doesn't work (Ed's suggestion):

awk '{sub(/([^ ]+ +){2}/,"")}1' file

Replace 2 with the number of fields you wish to remove.

EDIT

Another way (doesn't require --re-interval):

awk '{for(i=0;i<2;i++)sub($1"[[:space:]]*","")}1' file

Further edit

As Ed Morton pointed out, it is bad to use fields as the regex in sub(), since they may contain regex metacharacters, so here is an alternative (again!):

awk '{for(i=0;i<2;i++)sub(/[^[:space:]]+[[:space:]]*/,"")}1' file

Output

o p
o p p
o p p  p
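The loop version can also take the number of fields from the command line instead of a hard-coded 2. A sketch, with drop_fields as a hypothetical helper name; it uses [ \t] rather than [[:space:]] on the assumption that some older awks lack POSIX bracket classes, and anchors each sub() at the start of the line so only the current leading field is removed.

```shell
# Drop the first n fields ($1 of the helper) while keeping the spacing
# of everything that follows. Each pass strips one leading field plus
# the blanks after it; the regex never contains field text, so
# metacharacters in the data are harmless.
drop_fields() {
    awk -v n="$1" '{
        for (i = 0; i < n; i++)
            sub(/^[ \t]*[^ \t]+[ \t]*/, "")
        print
    }'
}

printf '%s\n' 'a 1 o p' 'b 2 o p p' 'c 3 o p p  p' | drop_fields 2
```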

edited Apr 8, 2015 at 13:30

answered Apr 8, 2015 at 12:40

user4453924

24 Comments

Ed Morton Over a year ago

then do it in awk. sed has s, awk has sub().

meso_2600 Over a year ago

@EdMorton can't use just sub, columns have different spacing

Ed Morton Over a year ago

You are wrong - sub IS the command to use. I'll post the answer.

Ed Morton Over a year ago

@meso_2600 it's extremely important when posting questions to show an example of your problem that covers the various cases you need to deal with. Edit your question to show a truly representative example of your problem, including the cases that you think might be difficult/unusual to deal with, and provide a better explanation of what might be in your input file. Otherwise we're all just guessing and churning trying to figure out what you want.

Ed Morton Over a year ago

@JID maybe. You could try something like awk -v var='.*' '{gsub(/./,"[&]",var); sub(var,...)}' but that doesn't work for all cases, e.g. if var='.*\\' you'd get a syntax error. gsub(/./,"\\\\&",var) might work better, idk.... I just avoid doing it - if you want to replace an RE use something that operates on REs like sub(), if you want to replace a string then use string functions like index()+substr(). If you find yourself trying to escape/disable all RE metachars then clearly you do NOT want an RE!
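Ed's index()+substr() suggestion, applied to the first field, might look like the sketch below: $1 is located with index(), which compares plain strings, so characters such as ., * or [ in the field have no special meaning. Only the run of blanks after the field goes through sub(). strip_first is a hypothetical helper name, and the sample lines are made up to include metacharacters.

```shell
# Remove the first field as a literal string, then the blanks after it.
# $1 contains no blanks and only blanks can precede it, so the first
# occurrence found by index() is the first field itself.
strip_first() {
    awk '{
        pos  = index($0, $1)                   # literal search, no regex
        rest = substr($0, pos + length($1))    # text after the first field
        sub(/^[ \t]+/, "", rest)               # drop the separator after it
        print rest
    }'
}

printf '%s\n' 'a.* 1 o p' '[x 2 o p p' 'c 3 o p p  p' | strip_first
```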


choroba · Accepted Answer · 2015-04-08 12:38:34Z

In Perl, you can use split with capturing to keep the delimiters:

perl -ne '@f = split /( +)/; print @f[ 1 * 2 .. $#f ]'
# the 1 in "1 * 2" is the column number to start from (starting from 0)

ReluctantBIOSGuy · Accepted Answer · 2015-04-08 13:42:22Z

If you want to preserve all spaces after the start of the second column, this will do the trick:

{
    match($0, ($1 "[ \\t]+"))
    print substr($0, RSTART+RLENGTH)
}

The call to match locates the first 'token' on the line and measures that token plus the whitespace that follows it (stored in RSTART and RLENGTH). Then you just print everything on the line after that.

You could generalize it somewhat to ignore the first N tokens this way:

BEGIN { N = 2 }
{
    r = ""
    for (i=1; i<=N; i++) {
        r = (r $i "[ \\t]+")
    }
    match($0, r)
    print substr($0, RSTART+RLENGTH)
}

Applying the above script to your example input yields:

o p
o p p
o p p  p

lol what is this site. This is just awk '{for(i=0;i<2;i++)sub($1"[[:space:]]*","")}1' with about ten more lines of useless junk
JID: It works much like fedorqui's script, but is somewhat easier for non-regex-ninjas to read. Not sure what part of my solution is useless; I think it is all needed to get the proper answer.
Ed: Never mind. I read further down and now I see the potential problem. If $1 contains regex metacharacters, then match won't perform as expected. I tried to vote down my answer, but I can't. :-(
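A sketch of how the match() approach can avoid the metacharacter problem: instead of building a regex from $1, match() searches for a generic run of non-blanks, the loop records where field k starts, and the line is printed from that position onward, so all original spacing after it survives. from_field is a hypothetical helper name; note that it takes the number of the first field to keep rather than the number of fields to drop.

```shell
# Print each line starting at field k, with the original spacing intact.
# No field text is ever used as a regex, so ".", "*", "[" are harmless.
from_field() {
    awk -v k="$1" '{
        line = $0; off = 0; start = 0
        for (i = 1; i <= k; i++) {
            if (!match(line, /[^ \t]+/)) break
            start = off + RSTART                  # absolute start of field i
            off  += RSTART + RLENGTH - 1          # absolute end of field i
            line  = substr(line, RSTART + RLENGTH)
        }
        out = (i > k) ? substr($0, start) : ""    # empty if fewer than k fields
        print out
    }'
}

printf '%s\n' 'a 1 o p' 'b 2 o p p' 'c 3 o p p  p' | from_field 3
```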
