I have the following input file:

a 1 o p
b 2 o p p
c 3 o p p p

In the last line there is a double space between the last p's, and the columns have different spacing.

I have used the solution from: Using awk to print all columns from the nth to the last.

awk '{for(i=2;i<=NF;i++){printf "%s ", $i}; printf "\n"}' 

and it works fine until it reaches the double space in the last column, where it removes one space.

How can I avoid that while still using awk?
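
A minimal reproduction of the behaviour, assuming a sample line with two spaces before the last column (the exact line and the function name are illustrative): awk splits the record on runs of blanks, so the original spacing is already gone when the loop prints the fields back with single spaces.

```shell
# Hypothetical helper: print fields 2..NF with the loop from the question.
print_from_second() {
    awk '{for(i=2;i<=NF;i++){printf "%s ", $i}; printf "\n"}'
}

# Assumed sample line: two spaces between the last two columns.
# The output has a single space there (plus a trailing space).
printf 'c 3 o p  p\n' | print_from_second
```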

  • You want to preserve the space? If that's the case, are the fields single characters as you have shown (or at least constant-width)? Commented Apr 8, 2015 at 12:34
  • Use cut instead of awk: cut -d ' ' -f 2-. Commented Apr 8, 2015 at 12:38
  • How important is it that you keep using awk? (does passing awk -F '[ ]' solve the problem?) Commented Apr 8, 2015 at 12:40
  • @EtanReisner can't use cut, columns might have different spacing. Commented Apr 8, 2015 at 12:40
  • I don't understand. The only spacing that matters to cut in this case is the very first column spacing. Does that vary in how many spaces are there? Commented Apr 8, 2015 at 12:42

4 Answers

Since you want to preserve spaces, let's just use cut:

$ cut -d' ' -f2- file
1 o p
2 o p p
3 o p p p

Or, for example, to start from column 4:

$ cut -d' ' -f4- file
p
p p
p p p

This will work as long as the columns you are removing are one-space separated.
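
To see why the one-space condition matters, here is a small sketch (the sample lines and the function name are assumptions): cut treats every single space as a delimiter, so a run of two spaces yields an empty field and the field numbers shift.

```shell
# Hypothetical helper wrapping the cut call shown above.
from_second_field() {
    cut -d' ' -f2-
}

# One space after the first column: the column is removed and the
# double space later in the line is preserved.
printf 'c 3 o p  p\n' | from_second_field     # prints "3 o p  p"

# Two spaces after the first column: cut sees an empty second field,
# so the output starts with a leftover space.
printf 'c  3 o p  p\n' | from_second_field    # prints " 3 o p  p"
```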


If the columns you are removing are also separated by varying amounts of space, you can use the beautiful solution by Ed Morton in Print all but the first three columns:

awk '{sub(/[[:space:]]*([^[:space:]]+[[:space:]]+){1}/,"")}1'
                                                   ^ number of cols to remove

Test

$ cat a
a 1 o p
b 2 o p p
c 3 o p p p
$ awk '{sub(/[[:space:]]*([^[:space:]]+[[:space:]]+){2}/,"")}1' a
o p
o p p
o p p p
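
If the number of columns should be a parameter, one possible variation (not part of Ed Morton's answer) is to pass it with -v and strip one column per loop iteration. This sidesteps interval expressions such as {2}, which some older awk implementations do not support. The names drop_columns and n are illustrative, and the sketch assumes columns are separated by spaces or tabs only.

```shell
# drop_columns N: remove the first N columns, keeping later spacing intact.
drop_columns() {
    awk -v n="$1" '{
        for (i = 0; i < n; i++)
            # optional leading blanks, one column, the blanks after it
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")
        print
    }'
}

# Assumed sample input with uneven spacing; prints "o p" and "o p  p".
printf 'a   1 o p\nc 3    o p  p\n' | drop_columns 2
```

Like the original one-liner, this leaves a line untouched once no further column followed by blanks can be found.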

7 Comments

Does cut support multiple field delims?
can't use cut, must use awk
@JID you need to pipe to tr -s ' ' beforehand.
@meso_2600 are the columns you want to remove just one-space separated?
@fedorqui must use awk, columns have different width

GNU sed

Remove the first n fields:

sed -r 's/([^ ]+ +){2}//' file 

GNU awk 4.0+

awk '{sub("([^"FS"]+"FS"+){2}","")}1' file 

GNU awk <4.0

awk --re-interval '{sub("([^"FS"]+"FS"+){2}","")}1' file 

In case the FS version doesn't work (Ed's suggestion):

awk '{sub(/([^ ]+ +){2}/,"")}1' file 

Replace 2 with the number of fields you wish to remove.

EDIT

Another way (doesn't require --re-interval):

awk '{for(i=0;i<2;i++)sub($1"[[:space:]]*","")}1' file 

Further edit

As advised by Ed Morton, it is bad to use fields as the regex in sub(), as they may contain metacharacters, so here is an alternative (again!):

awk '{for(i=0;i<2;i++)sub(/[^[:space:]]+[[:space:]]*/,"")}1' file 

Output

o p
o p p
o p p p
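
To illustrate the metacharacter concern, here is a small sketch with an assumed input line whose first field is x*. Used as a dynamic regex, x* means "zero or more x", so only the x is removed, while the fixed pattern removes the whole column. The function names are illustrative, and blanks are assumed to be spaces or tabs.

```shell
# Uses the first field as a regex (the approach advised against above).
strip_first_by_field() {
    awk '{ sub($1 "[ \t]*", "") } 1'
}

# Uses a fixed pattern that never looks at the field contents.
strip_first_by_pattern() {
    awk '{ sub(/[^ \t]+[ \t]*/, "") } 1'
}

printf 'x* 1 o  p\n' | strip_first_by_field      # prints "* 1 o  p"
printf 'x* 1 o  p\n' | strip_first_by_pattern    # prints "1 o  p"
```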

24 Comments

then do it in awk. sed has s, awk has sub().
@EdMorton can't use just sub, columns have different spacing
You are wrong - sub IS the command to use. I'll post the answer.
@meso_2600 it's extremely important when posting questions to show an example of your problem that covers the various cases you need to deal with. Edit your question to show a truly representative example of your problem, including the cases that you think might be difficult/unusual to deal with, and provide a better explanation of what might be in your input file. Otherwise we're all just guessing and churning trying to figure out what you want.
@JID maybe. You could try something like awk -v var='.*' '{gsub(/./,"[&]",var); sub(var,...)}' but that doesn't work for all cases, e.g. if var='.*\\' you'd get a syntax error. gsub(/./,"\\\\&",var) might work better, idk.... I just avoid doing it - if you want to replace an RE use something that operates on REs like sub(), if you want to replace a string then use string functions like index()+substr(). If you find yourself trying to escape/disable all RE metachars then clearly you do NOT want an RE!

In Perl, you can use split with capturing to keep the delimiters:

perl -ne '@f = split /( +)/; print @f[ 1 * 2 .. $#f ]'
#                                      ^
#                                      |
#                              column number goes
#                              here (starting from 0)

1 Comment

must use awk, as stated in the original post

If you want to preserve all spaces after the start of the second column, this will do the trick:

{
    match($0, ($1 "[ \\t]+"))
    print substr($0, RSTART+RLENGTH)
}

The call to match locates the first 'token' on the line: RSTART is set to where it begins, and RLENGTH to the combined length of the token and the whitespace that follows it. Then you just print everything on the line after that.
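
A tiny sketch of what match() reports, using an assumed sample line and a fixed pattern in place of $1 (the function name show_match is illustrative):

```shell
# Hypothetical helper: report what match() finds, then print the remainder.
show_match() {
    awk '{
        match($0, /[^ \t]+[ \t]+/)          # first token plus the blanks after it
        print RSTART, RLENGTH               # where the match starts, how long it is
        print substr($0, RSTART + RLENGTH)  # everything after the match
    }'
}

# "b" followed by two spaces: prints "1 3", then "2 o p  p".
printf 'b  2 o p  p\n' | show_match
```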

You could generalize it somewhat to ignore the first N tokens this way:

BEGIN { N = 2 }
{
    r = ""
    for (i=1; i<=N; i++) {
        r = (r $i "[ \\t]+")
    }
    match($0, r)
    print substr($0, RSTART+RLENGTH)
}

Applying the above script to your example input yields:

o p
o p p
o p p p
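
Since field values may contain regex metacharacters, a variant that locates each field with the string functions index() and substr(), instead of building a regex from $i, might look like this. It is a sketch rather than part of the original answer: skip_tokens and n are illustrative names, the sample line is an assumption, and blanks are assumed to be spaces or tabs.

```shell
# skip_tokens N: drop the first N tokens without using field text as a regex.
skip_tokens() {
    awk -v n="$1" '{
        rest = $0
        for (i = 1; i <= n && i <= NF; i++) {
            pos  = index(rest, $i)                 # literal search, no regex
            rest = substr(rest, pos + length($i))  # cut through the end of field i
            sub(/^[ \t]+/, "", rest)               # drop the blanks that followed it
        }
        print rest
    }'
}

# Fields containing * and ( are handled literally.
printf 'x* (1 o p  p\n' | skip_tokens 2    # prints "o p  p"
```

The only regex left is a fixed one for the blanks, so nothing from the input is ever interpreted as a pattern.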

8 Comments

lol what is this site. This is just awk '{for(i=0;i<2;i++)sub($1"[[:space:]]*","")}1' with about ten more lines of useless junk
Do NOT do this. It's a disaster waiting to happen.
JID: It works much like fedorqui's script, but is somewhat easier for non-regex-ninjas to read. Not sure what part of my solution is useless; I think it is all needed to get the proper answer.
Ed: What issue do you see with my solution?
Ed: Never mind. I read further down and now I see the potential problem. If $1 contains regex metacharacters, then match won't perform as expected. I tried to vote down my answer, but I can't. :-(