
I want to pass parameter $8, which is a file name, to the function "testfunc". The function should grep a key_word in that file and return a year. The problem is that the Linux command "grep" does not see anything in fileN. If I pass $8 directly, it still does not see anything.

awk '
function testfunc(fileN, my_year) {
    "grep 'key_word' fileN" | getline my_year
    return(my_year)
    close("grep 'key_word' fileN")
}
BEGIN {OFS="\t"}
{printf "%s\t%s\t%s\t", $8, testfunc($8), $9}'
    This is absolutely the wrong way to do it. You're trying to use awk as a shell - don't do that, awk is not good at it even if you can force it to execute and produce the output you want. Surprise, surprise, shell is very good at it. If you tell us what you're really trying to do we could help. Commented Mar 5, 2013 at 20:43

3 Answers


This is the syntax you're looking for:

awk '
function testfunc(fileN, my_year, cmd) {
    cmd = "grep \"key_word\" " fileN
    cmd | getline my_year
    close(cmd)
    return(my_year)
}
BEGIN {OFS="\t"}
{printf "%s\t%s\t%s\t", $8, testfunc($8), $9}'

BUT as I mentioned in my comment - don't do this, it's the wrong approach for whatever it is you're trying to do.

Note that you cannot use single quotes within a single-quote-delimited script.
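To make the two points above concrete, here is a minimal sketch. The file name demo.txt and its contents are made up for illustration. Inside the string "grep ... fileN" the word fileN is literal text, so the awk variable only takes effect when it is concatenated outside the quotes. If a single quote really is needed in the command, the octal escape \047 produces one without ending the single-quoted script.

```shell
# Hypothetical sample data: a scratch file whose path is fed to awk as field 1.
tmpdir=$(mktemp -d)
printf 'header\nkey_word 2013\n' > "$tmpdir/demo.txt"

# fileN is concatenated outside the string, so grep receives the real file name.
# \047 is the octal escape for a single quote, safe inside a '...' awk script.
result=$(echo "$tmpdir/demo.txt" | awk '
function testfunc(fileN,    my_year, cmd) {
    cmd = "grep \047key_word\047 " fileN
    cmd | getline my_year
    close(cmd)
    return my_year
}
{ print testfunc($1) }')

echo "$result"
rm -rf "$tmpdir"
```

This assumes the file name contains no spaces or shell metacharacters, since it is pasted into the command unquoted.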

EDIT: let me try to clarify my point about using a different approach. You seem to have a file, let's call it "file1", that has another file name in its 8th field and some other value you care about in its 9th field. Each of the files named in that 8th field contains a line with the text "key_word". What you want printed out is the 8th field from file1, then a tab, then the key_word line from the named file, then a tab, then the 9th field from file1.

That can be written as (just one possible solution):

gawk -v OFS='\t' '
ARGIND < ARGC-1 { if (/key_word/) { my_year[FILENAME] = $0; nextfile } next }
{ print $8, my_year[$8], $9 }
' $(awk '{print $8}' file1 | sort -u) file1

i.e. call awk once on "file1" to get the list of files that contain the date info you want, then pass that list of files to awk again ahead of "file1" so that all of the info you need is already stored in an array when file1 is finally processed.

The above uses GNU awk's "nextfile" for efficiency, but that's not required, and GNU awk's ARGIND for clarity. In a non-gawk solution you can replace ARGIND < ARGC-1 with FILENAME != ARGV[ARGC-1], since ARGV[ARGC-1] is the last argument, i.e. file1.
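As a rough illustration of the non-gawk variant, the sketch below runs the same two-pass idea with plain awk. The sample files a.xml, b.xml and the contents of file1 are invented for the demo, and the test FILENAME != ARGV[ARGC-1] relies on file1 being the last argument.

```shell
# Hypothetical sample data, created in a scratch directory.
tmpdir=$(mktemp -d)
printf 'intro\nkey_word 2011\n' > "$tmpdir/a.xml"
printf 'key_word 2012\nmore\n'  > "$tmpdir/b.xml"
printf 'f1 f2 f3 f4 f5 f6 f7 a.xml x9\nf1 f2 f3 f4 f5 f6 f7 b.xml y9\n' > "$tmpdir/file1"

# Every argument before the last one is a lookup file; the last one is file1.
out=$(cd "$tmpdir" && awk -v OFS='\t' '
FILENAME != ARGV[ARGC-1] {
    if (/key_word/) my_year[FILENAME] = $0   # remember the matching line
    next                                     # never print lookup-file lines
}
{ print $8, my_year[$8], $9 }
' $(awk '{print $8}' file1 | sort -u) file1)

printf '%s\n' "$out"
rm -rf "$tmpdir"
```

Without nextfile each lookup file is read to the end, which costs a little time on large files but keeps the script portable.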

There are many alternative solutions, it all depends what you're really trying to do....


3 Comments

Though you can use single quotes if your script is in a file and sourced, e.g. awk -f yourscript.
Thanks a lot. This helped me to understand the problem of having that single quote within a single quote script. Note: I know I can get a similar output using a straight shell script but I would need to write a larger number of code lines with if or for statements. In this case all I need is to change the order of the input columns and add an extra one in between.
I'm definitely not suggesting writing your whole script in shell, I'm just suggesting that there is probably a better way to do whatever you're trying to do because having shell call awk to call shell is almost always the wrong approach and using getline in awk is fraught with dangers. I would bet that whatever you're doing is trivial in awk just by employing the right approach. If you post another question with some sample input and expected output I'm sure we could help you with that.

Try this:

function testfunc(fileN) {
    cmd="grep 'key_word' " fileN
    cmd | getline my_year
    return(sprintf("%s",my_year))
}

4 Comments

Your suggestion partially works for me. It does do the grep, but the variable my_year (or the execution of the system command) includes a carriage return that I do not want, as it adds an extra CR to the file. I have added $9 so you can see that the return is added after $8, breaking the output record into two lines.
It also adds a "sh: 0: command not found" in the first output line.
Actually, what is happening is that the system command is executed and displays its result before "getline" gets anything. The "my_year" variable is not getting any value. In other words, the output I see is the output of the system command, not what the function is returning.
I was trying to add " | tr -d '\n' " but I don't know how, as the parameter should be somewhere.
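One possible way to handle the stray carriage return without piping through tr is to strip it inside awk. The sketch below assumes the extra CR comes from DOS-style line endings in the grepped file, which the thread does not confirm, and uses a made-up file dos.txt. getline already removes the trailing newline, so only the \r needs to go.

```shell
# Hypothetical file with CRLF line endings.
tmpdir=$(mktemp -d)
printf 'key_word 2013\r\n' > "$tmpdir/dos.txt"

clean=$(echo "$tmpdir/dos.txt" | awk '
function testfunc(fileN,    my_year, cmd) {
    cmd = "grep \"key_word\" " fileN
    if ((cmd | getline my_year) <= 0) my_year = ""   # no match or read error
    close(cmd)
    sub(/\r$/, "", my_year)   # drop a trailing carriage return, if any
    return my_year
}
{ printf "[%s]\n", testfunc($1) }')

echo "$clean"
rm -rf "$tmpdir"
```

The brackets in the printf are only there to make leftover whitespace visible in the output.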

Thanks Ed and Zsolt for your help. In the end I decided to use a shell script instead, because in addition to the grep command I needed a sed command that gave all sorts of problems because of the special characters required in it. So my final solution is as follows:

fileList=`ls -1 *.xml`
for f in ${fileList} ; do
    my_year=`grep -e "key_word" ${f} | sed -n '{s/^.*>\([0-9][0-9]*\)<.*$/\1/p}'`
    line=`ls -ltr ${f}`
    line="${line} ${my_year} sthElseHere"
    echo ${line}
done | \
awk ' BEGIN {print "File Name \tcol02 \tcol03 "
             print "=================== \t====== \t============"}
      {printf "%s\t%s\t%s\n", $8, $4, $9 }'
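To show what the grep and sed pair in the loop extracts, here is a small sketch on invented input. The file sample.xml and its markup are assumptions, since the real XML is not shown. The sed expression keeps only the run of digits that sits between a ">" and a "<" on the matching line.

```shell
# Hypothetical XML file; the real input format is not shown in the thread.
tmpdir=$(mktemp -d)
printf '<doc>\n<year id="key_word">2013</year>\n</doc>\n' > "$tmpdir/sample.xml"

# Loop over the files with a glob and print "name<TAB>year" for each one.
row=$(for f in "$tmpdir"/*.xml; do
    my_year=$(grep -e "key_word" "$f" | sed -n 's/^.*>\([0-9][0-9]*\)<.*$/\1/p')
    printf '%s\t%s\n' "$(basename "$f")" "$my_year"
done)

echo "$row"
rm -rf "$tmpdir"
```

Using the glob directly in the for loop, rather than the output of ls, keeps file names with spaces intact.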
