
I am trying to understand how busybox's awk works, so I'm reading the standard, and I hit something odd that I don't fully understand why it is legal. The standard ( https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html , in the section User-Defined Functions ) states that

When invoking a function, no white space can be placed between the function name and the opening parenthesis.

The grammar shown later on is prefixed with:

This formal syntax shall take precedence over the preceding text syntax description.

    non_unary_expr : '(' expr ')'
                   | '!' expr
                   ...
                   | FUNC_NAME '(' expr_list_opt ')'
                        /* no white space allowed before '(' */
                   | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
                   | BUILTIN_FUNC_NAME

The grammar rules for BUILTIN_FUNC_NAME and FUNC_NAME are the same. Yet despite that, awk behaves differently for user-defined and built-in functions:

    +$echo | awk -P '{ print length() 1 }'
    01
    +$echo | awk -P '{ print length () 1 }'
    01
    +$echo | awk -P 'function foo() { return 0 } ; { print foo() 1 }'
    01
    +$echo | awk -P 'function foo() { return 0 } ; { print foo () 1 }'
    awk: cmd. line:1: error: function `foo' called with space between name and `(', or used as a variable or an array
    awk: cmd. line:1: function foo() { return 0 } ; { print foo () 1 }
    awk: cmd. line:1: ^ syntax error
    awk: cmd. line:1: function foo() { return 0 } ; { print foo () 1 }
    awk: cmd. line:1: ^ syntax error

Which part of the grammar does specify this behaviour?


1 Answer


Check the definition of the FUNC_NAME in the same spec you're quoting from:

12. The token NAME shall consist of a word that is not a keyword or a name of a built-in function and is not followed immediately (without any delimiters) by the ( character.

13. The token FUNC_NAME shall consist of a word that is not a keyword or a name of a built-in function, followed immediately (without any delimiters) by the ( character. The ( character shall not be included as part of the token.

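So the distinction is already made in the lexer: a word like foo becomes a NAME token, not a FUNC_NAME token, when it is not immediately followed by a (. Built-in names, on the other hand, are excluded from both NAME and FUNC_NAME, so length is lexed as BUILTIN_FUNC_NAME whether or not the ( follows immediately, and length () still matches BUILTIN_FUNC_NAME '(' expr_list_opt ')'.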
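To make the lexer-level distinction concrete, here is a minimal Python sketch of how a tokenizer could classify words according to rules 12 and 13. It is not busybox's or gawk's code: the keyword and built-in lists are assumptions transcribed by hand from the POSIX page and may be incomplete, the function name classify_words is hypothetical, and strings, regexes and comments are not handled.

```python
import re

# Simplified lexer sketch for POSIX awk lexical rules 12 and 13.
# Assumption: these word lists were transcribed by hand from the POSIX
# awk page and may be incomplete; they are not busybox's or gawk's tables.
KEYWORDS = {
    "BEGIN", "END", "break", "continue", "delete", "do", "else", "exit",
    "for", "function", "getline", "if", "in", "next", "print", "printf",
    "return", "while",
}
BUILTINS = {
    "atan2", "close", "cos", "exp", "gsub", "index", "int", "length", "log",
    "match", "rand", "sin", "split", "sprintf", "sqrt", "srand", "sub",
    "substr", "system", "tolower", "toupper",
}

# Group 1 captures a word; the second alternative consumes numbers
# (e.g. "1", "1e5") so their letters are never mistaken for words.
# Strings, regexes and comments are NOT handled in this sketch.
TOKEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|[0-9.]+(?:[eE][+-]?[0-9]+)?")


def classify_words(src: str) -> list[tuple[str, str]]:
    """Return (token_type, word) for each word in src, in order."""
    result = []
    for m in TOKEN.finditer(src):
        word = m.group(1)
        if word is None:
            continue  # matched a number, not a word
        if word in KEYWORDS:
            kind = "KEYWORD"
        elif word in BUILTINS:
            # Built-in names are excluded from NAME and FUNC_NAME, so they
            # get the same token whether or not '(' follows immediately.
            kind = "BUILTIN_FUNC_NAME"
        elif src[m.end():m.end() + 1] == "(":
            # Rule 13: '(' directly after the word. The '(' itself is not
            # part of this token; it is left for the next token.
            kind = "FUNC_NAME"
        else:
            # Rule 12: no '(' immediately after, e.g. "foo (" with a space.
            kind = "NAME"
        result.append((kind, word))
    return result


if __name__ == "__main__":
    for prog in ("print length() 1", "print length () 1",
                 "print foo() 1", "print foo () 1"):
        print(f"{prog!r:22} -> {classify_words(prog)}")
```

Run on the four programs from the question, foo switches from FUNC_NAME to NAME once a space is inserted, while length stays BUILTIN_FUNC_NAME in both forms. The built-in check deliberately comes before the ( lookahead, mirroring how rules 12 and 13 exclude built-in names up front.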

  • The '(' character shall not be included as part of the token. - This is the part I was missing, thank you. Commented Jan 17, 2020 at 23:14
