1

I'm concerned about the record length that awk reports. I'm checking some files for a specific record length. awk shows the result I expected, but the file size indicates that each record in the file is actually 1 byte longer than what awk reports.

$ ls -l some_file.txt
-rw-r--r-- 1 foo bar 250614 Oct 20 08:49 some_file.txt
$ awk '{ print length }' some_file.txt | sort -u
458
$ echo "(250614%458)" | bc
88
$ echo "(250614%459)" | bc
0

Notice that the file size is not a multiple of 458 (bc gives a remainder of 88), but it is an exact multiple of 459. Also, awk + sort shows that all records have a length of 458. My educated guess is that awk is not counting the end-of-line character, which makes the real record length 459. What do you think?

ps: awk on AIX 5.3

6
  • In awk, is the output record separator ORS not set to the newline character, and therefore classed as a separator instead of a character? Commented Oct 20, 2014 at 14:29
  • How can I check what ORS is currently set to in my awk? Commented Oct 20, 2014 at 14:55
  • Sorry, I meant the default record separator (RS), NOT ORS. Commented Oct 20, 2014 at 15:03
  • You can actually print RS from within awk; it will print a newline (echo | awk '{print RS}'). Commented Oct 20, 2014 at 15:13
  • Yep, a way to test this would be to set RS to something that does not exist in the file, so that awk no longer treats the newline as a separator and counts all the characters, e.g. awk 'BEGIN {RS=":"} {print length}' some_file.txt Commented Oct 20, 2014 at 15:20

2 Answers

3

What you're seeing is perfectly normal. By default, awk does not include the newline character in a record.

From the POSIX standard for awk:

Input shall be interpreted as a sequence of records. By default, a record is a line, less its terminating <newline>
...
String Functions
   length[([s])] - Return the length, in characters, of its argument taken as a string, or of the whole record, $0, if there is no argument.
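
The numbers in the question fit this: 250614 / 459 = 546, so the file would hold 546 records of 458 characters plus one newline each. As a rough sketch of the same check on a small sample file (assuming single-byte characters, a newline after every record, and that `mktemp` is available; the sample data is made up for illustration):

```shell
# Build a sample file with three 4-character records, each newline-terminated.
sample=$(mktemp)
printf 'aaaa\nbbbb\ncccc\n' > "$sample"

# Sum each record's length plus 1 for the newline that awk strips.
awk_bytes=$(awk '{ total += length($0) + 1 } END { print total }' "$sample")

# Actual size in bytes, for comparison (tr removes padding some wc versions add).
file_bytes=$(wc -c < "$sample" | tr -d ' ')

echo "awk: $awk_bytes, wc: $file_bytes"
rm -f "$sample"
```

Both values should be 15 here (3 records of 4 characters + 1 newline). Note that length counts characters, so in a multibyte locale the total may differ from the byte count.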

2

This is because the default record separator, RS, is a newline.

awk therefore treats the newline as a separator and does not count it as a character in the record length.

To check what RS is set to:

$ echo | awk '{print "\""RS"\""}'
"
"

The quotes are separated by a newline, which is the RS value.

To confirm that the RS character is not included in the length output:

$ echo test > some_file.txt
$ ls -l
-rw-r--r--. 1 user user 5 Oct 20 16:33 some_file.txt

Show the length with RS set to newline:

$ awk '{print length}' some_file.txt
4

Set RS to be a character that does not exist in the file and count again:

$ awk 'BEGIN {RS=":"} {print length}' some_file.txt
5

The newline is now counted as part of the record, which accounts for the additional character.
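
One caveat with this trick: on a file with more than one line, an RS that never occurs turns the whole file into a single record, so the result is the total character count rather than a per-record length. A small sketch of the difference, using made-up two-line input and assuming ":" does not occur in the data; adding 1 to length is an alternative way to get the per-record size including the newline:

```shell
# Two newline-terminated records of 2 characters each: 6 characters in total.
# With an RS that never occurs, the whole input is read as one record.
whole=$(printf 'ab\ncd\n' | awk 'BEGIN { RS = ":" } { print length($0) }')

# With the default RS, add 1 per record to account for the stripped newline.
per_record=$(printf 'ab\ncd\n' | awk '{ print length($0) + 1 }' | sort -u)

echo "whole input as one record: $whole"
echo "per-record size including newline: $per_record"
```

This should report 6 for the whole input and 3 for each record.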
