59

Suppose we have this data file.

john 32 maketing executive
jack 41 chief technical officer
jim 27 developer
dela 33 assistant risk management officer

I want to print this using awk:

john maketing executive
jack chief technical officer
jim developer
dela assistant risk management officer

I know it can be done using a for loop:

awk '{printf $1; for(i=3;i<=NF;i++){printf " %s", $i} printf "\n"}' < file 

The problem is that it's long and looks complex.

Is there any other short way to print the rest of the fields?

3 Comments
  • A simple hack is to set $2 to "", then print $0 (all fields) -- though that would give you an extra delimiter for the empty field. Commented Aug 27, 2013 at 5:24
  • 3 years later, you helped me. But you should change "<NF" to "<=NF"; if not, you'll skip the very last field ;) Commented Sep 22, 2017 at 9:19
  • 3 years after that, I edited the question to change <NF to <=NF, to fix the bug @Koreth pointed out. Commented Jun 5, 2020 at 15:55

7 Answers

80

Set the field(s) you want to skip to blank:

awk '{$2 = ""; print $0;}' < file_name 

Source: Using awk to print all columns from the nth to the last
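
For reference, a run on the sample data (my own run-through, not part of the original answer) shows the doubled space left where $2 used to be, which the comments below point out:

$ awk '{$2 = ""; print $0;}' < file
john  maketing executive
jack  chief technical officer
jim  developer
dela  assistant risk management officer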


3 Comments

Does not clean up the extra space, and uses an unneeded print $0 that could be replaced by a simple 1
@Jotne When I use 1 in-place of print $0, I don't get any output from awk. You sure they're equivalent?
@Alex Remove print $0 and put 1 after the closing }.
9

Reliably with GNU awk for gensub() when using the default FS:

$ gawk -v delNr=2 '{$0=gensub("^([[:space:]]*([^[:space:]]+[[:space:]]+){"delNr-1"})[^[:space:]]+[[:space:]]*","\\1","")}1' file
john maketing executive
jack chief technical officer
jim developer
dela assistant risk management officer

With other awks, you need to use match() and substr() instead of gensub(). Note that the variable delNr above tells awk which field you want to delete:

$ gawk -v delNr=3 '{$0=gensub("^([[:space:]]*([^[:space:]]+[[:space:]]+){"delNr-1"})[^[:space:]]+[[:space:]]*","\\1","")}1' file
john 32 executive
jack 41 technical officer
jim 27
dela 33 risk management officer
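
For awks without gensub(), here is a minimal match()/substr() sketch of the same idea (my own illustration, not from the original answer; it assumes your awk supports ERE intervals like {n} in dynamic regexps):

awk -v delNr=2 '{
    # regexp for everything up to and including field delNr-1 plus its trailing spaces
    re = "^[[:space:]]*([^[:space:]]+[[:space:]]+){" delNr-1 "}"
    if (match($0, re)) {
        head = substr($0, 1, RLENGTH)
        tail = substr($0, RLENGTH + 1)
        sub(/^[^[:space:]]+[[:space:]]*/, "", tail)   # drop field delNr and the spaces after it
        $0 = head tail
    }
}1' file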

Do not do this:

awk '{sub($2 OFS, "")}1' 

as the same text that's in $2 might be at the end of $1, and/or $2 might contain RE metacharacters so there's a very good chance that you'll remove the wrong string that way.
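
For example, with an RE metacharacter in $2 the sub() can remove the wrong text entirely (my own illustration):

$ echo 'abc a.c foo' | awk '{sub($2 OFS, "")}1'
a.c foo

Here the dynamic regexp "a.c " matches "abc " first, so $1 is removed instead of $2 (the expected output was "abc foo").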

Do not do this:

awk '{$2=""}1' file 

as it adds an extra OFS where $2 was and will compress all other contiguous white space between fields into a single blank char each.

Do not do this:

awk '{$2="";sub("  "," ")}1' file 

as it has the space-compression issue mentioned above and relies on a hard-coded FS of a single blank (the default, though, so maybe not so bad); but more importantly, if there were spaces before $1, it would remove one of those instead of the space it's adding between $1 and $2.

One last thing worth mentioning is that in recent versions of gawk there is a new function named patsplit() which works like split(), BUT in addition to creating an array of the fields, it also creates an array of the spaces between the fields. That means you can manipulate the fields and the spaces between them within the arrays, so you don't have to worry about awk recompiling the record using OFS when you manipulate a field. Then you just print the fields you want from the arrays. See patsplit() in http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions for more info.
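
A minimal sketch of that patsplit() approach (my own illustration; the field pattern below assumes the default whitespace-separated input):

gawk -v delNr=2 '{
    # flds[i] holds field i, seps[i] the spaces after it, seps[0] any leading spaces
    n = patsplit($0, flds, /[^[:space:]]+/, seps)
    out = seps[0]
    for (i = 1; i <= n; i++)
        if (i != delNr)
            out = out flds[i] seps[i]
    print out
}' file

Dropping seps[delNr] along with the field keeps the original spacing around the remaining fields.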

4 Comments

Looking at these complications, one wonders whether awk is indeed the best tool for this job; e.g. if fields are delimited by a pipe or comma, then the whole awk code needs to be rewritten.
Depends on your input. If you have single chars between fields then cut is better. If you have anything else then gawk+gensub() or sed (very similar syntactically) might be the best options. Both of those can run into problems when trying to describe the negation of multi-char REs so then you need to look at gawk+patsplit() or gawk+FPAT. No silver bullet unfortunately.
Great answer, I wish I could +2 you. One problem is the code is much longer than the for loop solution.
@shiplu.mokadd.im - correct but it preserves the original white space whereas the for loop you posted will not produce the output you specified. By the way, wrt that for loop you posted - never use printf with input data, e.g. printf $1 as that will fail spectacularly if your input data contains printf formatting characters such as %. Always use printf "%s",$1 for printing input data instead. Also to print a newline is just print "", no need for printf "\n".
8

You can use simple awk like this:

awk '{$2=""}1' file 

However, this will leave an extra OFS in your output, which can be avoided with this awk:

awk '{sub($2 OFS, "")}1' file 

Or else by using this tr and cut combo:

On Linux:

tr -s ' ' < file | cut -d ' ' -f1,3- 

On OSX:

tr -s ' ' < file | cut -d ' ' -f1 -f3- 
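
For reference, the Linux variant gives this on the sample data (my own run-through):

$ tr -s ' ' < file | cut -d ' ' -f1,3-
john maketing executive
jack chief technical officer
jim developer
dela assistant risk management officer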

12 Comments

This should be cut -d' ' -f1,3-.
@AdrianFrühwirth: Thanks but cut -f1,3- is not portable and isn't supported on my OSX.
You shouldn't use awk '{sub($2 OFS, "")}1' since the same text that's in $2 might be at the end of $1, and/or $2 might contain RE metacharacters so there's a very good chance that you'll remove the wrong string that way.
@anubhava - no, the only awk function that looks for strings rather than REs in another string is index().
@anubhava - correct there's no simple way but see my answer for a robust way.
4

This removes field #2 and cleans up the extra space.

awk '{$2="";sub("  "," ")}1' file 

3 Comments

what does that extra 1 do here?
@shiplu.mokadd.im The 1 evaluates to true which kicks in the default block ({ print $0 }).
Does not clean anything; instead, like all rewrites of existing fields, it replaces each run of FS (one or more in a row) with a single OFS. E.g. that is one way to implement a 'normalize spaces' filter: awk '{$1=$1}1'
3

Another way is to just use sed to remove the first run of digits and the spaces that follow:

sed 's|[0-9]\+\s\+||' file
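
Note that \+ and \s are GNU sed extensions; a POSIX-compatible spelling of the same idea (my own variant) would be:

sed 's|[0-9]\{1,\}[[:space:]]\{1,\}||' file

Like the original, this relies on the second field being numeric.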


0

An approach using awk that does not require gawk or any state mutation:

awk '{print $1 " " substr($0, index($0, $3));}' datafile 

UPD

A solution that is a bit longer, but will stand up to the case where $1 or $2 contains $3:

awk '{print $1 " " substr($0, length($1 " " $2 " ") + 1);}' data 

Or, even more robust if you have a custom field separator:

awk '{print $1 " " substr($0, length($1 FS $2 FS) + 1);}' data 
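
For reference, the FS-aware variant gives this on the sample data (my own run-through):

$ awk '{print $1 " " substr($0, length($1 FS $2 FS) + 1);}' data
john maketing executive
jack chief technical officer
jim developer
dela assistant risk management officer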


-1

Do not alter $n. If you have multiple spaces in some part you want to keep, they will be reduced to one.
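
A quick illustration of that point (my own example, with extra spaces in the input):

$ echo 'john 32 maketing   executive' | awk '{$2=""}1'
john  maketing executive

Note the doubled space left where $2 was, and that the run of spaces before "executive" is squeezed to a single OFS once the record is rebuilt.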

