Revisions to How to print the number of occurrences of consonants for each file separately with awk?

added 75 characters in body

edited Mar 27, 2021 at 15:34

With GNU awk for ENDFILE and IGNORECASE:

    $ awk -v IGNORECASE=1 '
        { cnt += ( gsub(/[[:alpha:]]/,"&") - gsub(/[aeiou]/,"&") ) }
        ENDFILE { print FILENAME, cnt+0; cnt=0 }
    ' file1 file2
    file1 12
    file2 7

or with any POSIX awk:

    $ awk '
        {
            lc = tolower($0)
            cnt[FILENAME] += ( gsub(/[[:alpha:]]/,"&",lc) - gsub(/[aeiou]/,"&",lc) )
        }
        END {
            for (i=1; i<ARGC; i++)
                print ARGV[i], cnt[ARGV[i]]+0
        }
    ' file1 file2
    file1 12
    file2 7
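Both scripts count characters the same way: gsub() returns the number of substitutions it made, and since "&" in the replacement stands for the matched text, gsub(re,"&",s) leaves s unchanged and simply returns how many times re matches. A minimal sketch on one made-up input line (letter_stats is an illustrative name, not part of the answer):

```shell
# letter_stats: for each input line, print "letters vowels consonants".
letter_stats() {
    awk '{
        s = tolower($0)
        letters = gsub(/[[:alpha:]]/, "&", s)  # "&" re-inserts each match, so s is unchanged
        vowels  = gsub(/[aeiou]/, "&", s)      # count only the vowels
        print letters, vowels, letters - vowels
    }'
}

echo 'Hello, World' | letter_stats    # prints: 10 3 7
```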

If you only want to count the specific characters b, c, d, etc. instead of all alphabetic characters that aren't aeiou, then just change ( gsub(/[[:alpha:]]/,"&") - gsub(/[aeiou]/,"&") ) above to gsub(/[bcdfghjklmnpqrstvwxyz]/,"&"). In the POSIX script, keep lc as the third argument: gsub(/[bcdfghjklmnpqrstvwxyz]/,"&",lc).
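A sketch of that change applied to the POSIX script, assuming plain ASCII consonants are what you want to count; with an explicit list, letters that [[:alpha:]] may also match in some locales (accented ones, for example) are no longer counted as consonants. The function name and sample file are illustrative only:

```shell
# count_listed_consonants: POSIX awk script using an explicit consonant list
# instead of "all letters minus vowels".
count_listed_consonants() {
    awk '
        { lc = tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrstvwxyz]/, "&", lc) }
        END { for (i = 1; i < ARGC; i++) print ARGV[i], cnt[ARGV[i]] + 0 }
    ' "$@"
}

# Made-up sample file in a temporary directory.
cd "$(mktemp -d)"
printf 'Hello World\n' > a.txt
count_listed_consonants a.txt    # a.txt 7
```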

Note that, unlike any approach that prints results in an FNR==1 clause (which never runs for an empty file, since such a file has no line 1), both of the above scripts handle empty files correctly by printing the file name and 0 as the count.
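To see the difference, here is a sketch that runs the POSIX script alongside a hypothetical FNR==1-style script (fnr_version, written only for this comparison) on made-up sample files, one of which is empty:

```shell
# Made-up sample files: one non-empty, one empty, one non-empty.
cd "$(mktemp -d)"
printf 'Hello World\n' > a.txt
: > empty.txt
printf 'abc\n' > b.txt

# per_file_counts: the POSIX awk script from the answer, wrapped in a function.
per_file_counts() {
    awk '
        { lc = tolower($0)
          cnt[FILENAME] += gsub(/[[:alpha:]]/, "&", lc) - gsub(/[aeiou]/, "&", lc) }
        END { for (i = 1; i < ARGC; i++) print ARGV[i], cnt[ARGV[i]] + 0 }
    ' "$@"
}

# fnr_version: prints the previous file's total when the next file's first
# line (FNR==1) is read, and the last file's total in END.
fnr_version() {
    awk '
        FNR == 1 && NR > 1 { print prev, cnt + 0; cnt = 0 }
        { prev = FILENAME; lc = tolower($0)
          cnt += gsub(/[[:alpha:]]/, "&", lc) - gsub(/[aeiou]/, "&", lc) }
        END { if (NR) print prev, cnt + 0 }
    ' "$@"
}

per_file_counts a.txt empty.txt b.txt   # a.txt 7, empty.txt 0, b.txt 2
fnr_version     a.txt empty.txt b.txt   # a.txt 7, b.txt 2 (empty.txt is never reported)
```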

Also note the cnt+0 in the first script - the +0 ensures that the value printed will be a numeric 0 rather than a null string if the first file is empty.
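A quick way to see why the +0 matters: awk prints a variable that was never assigned as an empty string, while adding 0 forces a numeric 0. That is the state cnt is in when ENDFILE runs for an empty first file. Minimal sketch; the function name is illustrative:

```shell
# unset_vs_plus_zero: print an unassigned awk variable with and without +0.
unset_vs_plus_zero() {
    awk 'BEGIN { printf "[%s] [%s]\n", cnt, cnt + 0 }'
}

unset_vs_plus_zero    # [] [0]
```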

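If the same file name can appear multiple times in the input, the POSIX script will add together the counts from every read of that file. To avoid that, add FNR==1{cnt[FILENAME]=0} to the start of the script; the file is then output once per occurrence, and if you only want it output once, also add if (!seen[ARGV[i]]++) { ... } around the print in the END section. (The gawk script already resets cnt in ENDFILE, so it prints one line per occurrence.)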
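A sketch of the POSIX script with both changes applied, assuming you want each repeated file name reported once with the count from a single read; per_file_counts_once is an illustrative name and the sample files are made up:

```shell
# per_file_counts_once: POSIX awk script that resets the count on each read of
# a file and prints each file name only once.
per_file_counts_once() {
    awk '
        FNR == 1 { cnt[FILENAME] = 0 }    # restart the count each time a file is read
        { lc = tolower($0)
          cnt[FILENAME] += gsub(/[[:alpha:]]/, "&", lc) - gsub(/[aeiou]/, "&", lc) }
        END {
            for (i = 1; i < ARGC; i++)
                if (!seen[ARGV[i]]++)     # first occurrence of each name only
                    print ARGV[i], cnt[ARGV[i]] + 0
        }
    ' "$@"
}

cd "$(mktemp -d)"
printf 'Hello World\n' > a.txt
printf 'abc\n' > b.txt
per_file_counts_once a.txt b.txt a.txt    # a.txt 7, b.txt 2
```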

See https://unix.stackexchange.com/a/642372/133219 for an answer to the follow-up question of also counting vowels.

edited body

edited Mar 27, 2021 at 15:19

Ed Morton

added 9 characters in body

edited Mar 27, 2021 at 14:48

Ed Morton
