How to pass a parameter to a command inside AWK for each line processed

Question

I want to pass parameter $8, which is a file name, to the function "testfunc". The function should grep a key_word in that file and return a year. The problem is that the Linux command "grep" does not see anything in fileN. If I pass $8 directly, it still does not see anything.

awk '
function testfunc(fileN, my_year) {
    "grep 'key_word' fileN" | getline my_year
    return(my_year)
    close("grep 'key_word' fileN")
}
BEGIN {OFS="\t"}
{printf "%s\t%s\t%s\t", $8, testfunc($8), $9}'

This is absolutely the wrong way to do it. You're trying to use awk as a shell. Don't do that: awk is not good at it, even if you can force it to execute and produce the output you want. Surprise, surprise, shell is very good at it. If you tell us what you're really trying to do, we could help.
– Ed Morton, Commented Mar 5, 2013 at 20:43

Ed Morton · Accepted Answer · 2013-03-06 14:31:34Z

This is the syntax you're looking for:

awk '
function testfunc(fileN, my_year, cmd) {
    cmd = "grep \"key_word\" " fileN
    cmd | getline my_year
    close(cmd)
    return(my_year)
}
BEGIN {OFS="\t"}
{printf "%s\t%s\t%s\t", $8, testfunc($8), $9}'

BUT as I mentioned in my comment - don't do this, it's the wrong approach for whatever it is you're trying to do.
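If you do end up using this pattern anyway, here is a minimal sketch of the same function with the return value of getline checked, so that a failed or empty read is handled explicitly. The sample files (a.xml, b.xml, list.txt) and their contents are made up for illustration, and the sketch assumes the file names contain no spaces or shell metacharacters.

```shell
#!/bin/sh
# Hypothetical sample data, created only for this illustration.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '<year>2012</year> key_word\n' > a.xml
printf 'nothing relevant here\n' > b.xml
# Fields 1-7 are fillers; field 8 is a file name, field 9 some other value.
printf 'f1 f2 f3 f4 f5 f6 f7 a.xml first\n'  > list.txt
printf 'f1 f2 f3 f4 f5 f6 f7 b.xml second\n' >> list.txt

result=$(awk '
function testfunc(fileN,    my_year, cmd) {
    # Assumes fileN contains no spaces or shell metacharacters.
    cmd = "grep \"key_word\" " fileN
    # getline returns 1 on success, 0 at end of input, -1 on error.
    if ((cmd | getline my_year) <= 0)
        my_year = ""
    close(cmd)
    return my_year
}
BEGIN { OFS = "\t" }
{ print $8, testfunc($8), $9 }
' list.txt)

printf '%s\n' "$result"
```

For b.xml, which has no matching line, the middle column simply comes out empty.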

Note that you cannot use single quotes within a single-quote-delimited script.
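If a literal single quote is really needed, two common workarounds can be sketched as follows (the input line is made up): the octal escape \047 inside an awk string, and the shell idiom '\'' which closes the quoted script, adds an escaped quote, and reopens it.

```shell
#!/bin/sh
# \047 is the octal escape for a single quote inside an awk string.
out1=$(echo 'key_word' | awk '{ print "\047" $0 "\047" }')

# '\'' ends the shell string, adds a literal quote, then resumes the string.
out2=$(echo 'key_word' | awk '{ print "'\''" $0 "'\''" }')

printf '%s\n%s\n' "$out1" "$out2"
```

Both commands print the input wrapped in single quotes; the \047 form is usually easier to read.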

EDIT: let me try to clarify my point about using a different approach. You seem to have a file, let's call it "file1", that has another file name in its 8th field and some other value you care about in its 9th field. Each of the files named in that 8th field contains a line with the text "key_word", and what you want printed out is the 8th field from file1, then a tab, then the key_word line from the named file, then the 9th field from file1.

That can be written as (just one possible solution):

gawk -v OFS='\t' '
# All arguments before the last one (file1) are the lookup files.
ARGIND < ARGC-1 {
    if (/key_word/) {
        my_year[FILENAME] = $0
        nextfile
    }
    next
}
{ print $8, my_year[$8], $9 }
' $(awk '{print $8}' file1 | sort -u) file1

i.e. call awk once on "file1" to get the list of files that contain the date info you want, then pass that list of files to awk again ahead of "file1" so all of the info you need when finally processing file1 gets stored in an array.

The above uses GNU awk's "nextfile" for efficiency, but that's not required, and GNU awk's ARGIND for clarity, but you can replace the ARGIND test with FILENAME != ARGV[ARGC-1] in a non-gawk solution. (ARGV[0] is the command name, so the last file argument is ARGV[ARGC-1].)
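A small end-to-end sketch of this two-pass idea in portable awk, using FILENAME != ARGV[ARGC-1] to detect the lookup files (ARGV[0] is the command name, so the last argument is ARGV[ARGC-1]). The file names and contents are invented for the example, and only the first matching line of each lookup file is kept.

```shell
#!/bin/sh
# Hypothetical sample data, created only for this illustration.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'header\nkey_word 2011\n' > a.xml
printf 'key_word 2012\nfooter\n' > b.xml
printf 'f1 f2 f3 f4 f5 f6 f7 a.xml first\n'  > file1
printf 'f1 f2 f3 f4 f5 f6 f7 b.xml second\n' >> file1
printf 'f1 f2 f3 f4 f5 f6 f7 a.xml third\n'  >> file1

result=$(awk -v OFS='\t' '
# Every file except the last argument is a lookup file.
FILENAME != ARGV[ARGC-1] {
    if (/key_word/ && !(FILENAME in my_year))
        my_year[FILENAME] = $0
    next
}
# Last argument (file1): join each record with the stored line.
{ print $8, my_year[$8], $9 }
' $(awk '{ print $8 }' file1 | sort -u) file1)

printf '%s\n' "$result"
```

Each lookup file is read once no matter how many lines of file1 refer to it, and no subprocess is spawned per input line.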

There are many alternative solutions, it all depends what you're really trying to do....

Though you can use single quotes if your script is in a file and sourced, e.g. awk -f yourscript.
Thanks a lot. This helped me to understand the problem of having that single quote within a single quote script. Note: I know I can get a similar output using a straight shell script but I would need to write a larger number of code lines with if or for statements. In this case all I need is to change the order of the input columns and add an extra one in between.
I'm definitely not suggesting writing your whole script in shell, I'm just suggesting that there is probably a better way to do whatever you're trying to do because having shell call awk to call shell is almost always the wrong approach and using getline in awk is fraught with dangers. I would bet that whatever you're doing is trivial in awk just by employing the right approach. If you post another question with some sample input and expected output I'm sure we could help you with that.

Zsolt Botykai · Accepted Answer · 2013-03-05 21:22:49Z

Try this:

function testfunc(fileN) {
    cmd="grep 'key_word' " fileN
    cmd | getline my_year
    return(sprintf("%s",my_year))
}

edited Mar 5, 2013 at 21:22

answered Mar 5, 2013 at 20:20

4 Comments

Guasqueño Over a year ago

Your suggestion partially works for me. It does run the grep, but the variable my_year (or the output of the system command) includes a carriage return that I do not want, as it adds an extra CR to the file. I have added $9 so you can see that the return is added after $8, breaking the output record into two lines.

Guasqueño Over a year ago

It also adds a "sh: 0: command not found" message in the first output line.

Guasqueño Over a year ago

Actually, what is happening is that the system command is executed displaying the result before the "getline" gets anything. The "my_year" variable is not getting any value. In other words the output I see is the output of the system command, not what the function is returning.

Guasqueño Over a year ago

I was trying to add " | tr -d '\n' " but I don't know where it should go, since the file name parameter has to fit in somewhere.
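One way to do what this comment asks, as a sketch: append the pipe after the file name when building the command string, so the file name stays an argument of grep. Here tr deletes only carriage returns, since getline already drops the trailing newline. The sample file with DOS line endings is made up for illustration.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
# Hypothetical input file with DOS (CRLF) line endings.
printf 'key_word 2013\r\n' > a.xml

result=$(echo 'a.xml' | awk '
function testfunc(fileN,    my_year, cmd) {
    # The pipe to tr goes after the file name; \\r reaches tr as \r.
    cmd = "grep \"key_word\" " fileN " | tr -d \"\\r\""
    if ((cmd | getline my_year) <= 0)
        my_year = ""
    close(cmd)
    # Alternative without tr: sub(/\r$/, "", my_year)
    return my_year
}
{ printf "%s\t%s\n", $1, testfunc($1) }')

printf '%s\n' "$result"
```

Double quotes are used inside the command string because single quotes cannot appear in a single-quoted awk script.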

Guasqueño · Accepted Answer · 2013-03-06 16:11:33Z

Thanks Ed and Zsolt for your help. In the end I decided to use a shell script instead because, in addition to the grep command, I needed a sed command that gives all sorts of problems because of the special characters required in it. So my final solution is as follows:

fileList=`ls -1 *.xml`
for f in ${fileList} ; do
    # Extract the number between ">" and "<" on the key_word line
    my_year=`grep -e "key_word" ${f} | sed -n '{s/^.*>\([0-9][0-9]*\)<.*$/\1/p}'`
    line=`ls -ltr ${f}`
    line="${line} ${my_year} sthElseHere"
    echo ${line}
done | \
awk '
BEGIN {
    print "File Name \tcol02 \tcol03 "
    print "=================== \t====== \t============"
}
{ printf "%s\t%s\t%s\n", $8, $4, $9 }'
