0

I am trying to count the occurrences of consonants and vowels in multiple files on Linux, but I want the number of occurrences to be separately calculated for each file. I use

awk -v FS="" '{for (i=1; i<=NF; i++) {if ($i ~ /[bcdfghjklmnpqrtsvwxyzBCDEFGHJKLMNPQRTSVWXYZ]/) count_c++; else if ($i ~ /[aeiouAEIOU]/) count_v++}} END {print FILENAME, count_v, count_c}' file1 file2

file1 looks like this:

bac Dfeg k87 eH tRe rt up 

file2 looks like this:

hi rt2w PrOt 

but it prints the occurrences of both files:

file2 7 19 

How could I change this so the output would be like:

file1 5 12
file2 2 7
  • If the input contains a U+FB03 character (ﬃ), should it be counted as 2 consonants and 1 vowel? Commented Mar 27, 2021 at 14:16
  • No, only the characters bcdfghjklmnpqrtsvwxyzBCDEFGHJKLMNPQRTSVWXYZ should be counted as consonants and aeiouAEIOU as vowels; anything else should be ignored. Commented Mar 27, 2021 at 14:20
  • Again, please don't add requirements as comments. Make sure all of your requirements are clearly stated in your question and included in your sample input/output. Commented Mar 27, 2021 at 14:29
  • 1
    Crossposting? How to print with awk the number of consonants and vowels from files? Commented Mar 27, 2021 at 15:16
  • If any of the answers solves your problem then see unix.stackexchange.com/help/someone-answers for what to do next, otherwise provide feedback and/or questions about any issues. Commented Mar 28, 2021 at 16:38

2 Answers

4

To answer this followup question, here is my followup answer with GNU awk (and modified now to only count b, c, d, etc. as non-vowels instead of every char that's not aeiou, e.g. À and é as mentioned by @StéphaneChazelas in a comment):

$ awk -v IGNORECASE=1 '
    {
        v_cnt += gsub(/[aeiou]/,"")
        c_cnt += gsub(/[bcdfghjklmnpqrtsvwxyz]/,"")
    }
    ENDFILE {
        print FILENAME, v_cnt+0, c_cnt+0
        v_cnt = c_cnt = 0
    }
' file1 file2
file1 5 12
file2 2 7

I'll leave it as a simple exercise for how to modify the POSIX awk equivalent from my previous answer.
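One possible way to do that exercise is sketched below. It is not the POSIX version from the previous answer (which is not reproduced here), only a minimal sketch that assumes every input file ends with a newline and that empty files may be skipped. Without ENDFILE, the end of a file is detected when FNR resets to 1 on the next file; without IGNORECASE, both cases are listed in the bracket expressions. The names count_letters, report and prev are hypothetical.

```shell
# Hypothetical helper; uses only POSIX awk features (no ENDFILE, no IGNORECASE).
count_letters() {
    awk '
        # Print the totals for one file, then reset the counters.
        function report(name) {
            print name, v_cnt+0, c_cnt+0
            v_cnt = c_cnt = 0
        }
        # First line of any file after the first: flush the previous file.
        FNR == 1 && NR > 1 { report(prev) }
        {
            prev = FILENAME
            v_cnt += gsub(/[aeiouAEIOU]/, "")
            c_cnt += gsub(/[bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]/, "")
        }
        # Flush the last file, provided at least one line was read.
        END { if (NR > 0) report(prev) }
    ' "$@"
}

# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf 'bac Dfeg k87 eH tRe rt up\n' > "$tmp/file1"
printf 'hi rt2w PrOt\n' > "$tmp/file2"
(cd "$tmp" && count_letters file1 file2)
```

With the sample files this prints file1 5 12 and file2 2 7 on separate lines. A file with no lines at all produces no output line in this sketch, which differs from the ENDFILE version above.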

If you also wanted some kind of indication if alphabetic characters not listed in either of the bracket expressions above are present then it's just a tweak to:

awk -v IGNORECASE=1 '
    {
        v_cnt += gsub(/[aeiou]/,"")
        c_cnt += gsub(/[bcdfghjklmnpqrtsvwxyz]/,"")
    }
    /[[:alpha:]]/ {
        gsub(/[^[:alpha:]]+/,"")
        printf "Warning %s[%d]: Unexpected chars found: %s\n", FILENAME, FNR, $0 > "/dev/stderr"
    }
    ENDFILE {
        print FILENAME, v_cnt+0, c_cnt+0
        v_cnt = c_cnt = 0
    }
' file1 file2

How that's handled can of course be treated in various different ways and with various amounts+details of output.

  • Note that [[:alpha:]] would match on À, é (if expressed in their precomposed form which is the most common), α, etc that haven't been removed by the previous gsub(). Commented Mar 27, 2021 at 14:06
  • 1
    That's true. If such characters can exist in the input then the OP should add them to the sample input/output in the question so we can see how they should be handled. Commented Mar 27, 2021 at 14:10
  • If I write [bcdfghjklmnpqrtsvwxyzBCDEFGHJKLMNPQRTSVWXYZ] instead of [[:alpha:]], then that would solve this problem, right? Commented Mar 27, 2021 at 14:20
  • 1
    Just [bcdfghjklmnpqrtsvwxyz] - you don't need to list the upper case letters since we're doing a case-insensitive comparison. But then how SHOULD letters like À and é be counted if they can exist in your input - as vowels or as consonants or as punctuation (i.e. not alphabetic characters) or something else? If they can exist in your real data then please edit your question to include them in the sample input and the counts shown in the expected output. Commented Mar 27, 2021 at 14:25
  • Anything other than aeiou for vowels and bcdfghjklmnpqrtsvwxyz for consonants should be ignored. Could you also tell me what v_cnt+0 is needed for? I don't understand adding 0. Commented Mar 27, 2021 at 14:28
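On the v_cnt+0 question in the last comment, a small sketch may help. An awk variable that has never been assigned is both the empty string and the number 0, and print shows it as the empty string. In the answer's script the counters are only assigned once a line has been read, so an empty first file would reach ENDFILE with them still unset. Adding 0 forces the numeric form. The names show_unset and n are hypothetical.

```shell
# Hypothetical demo: n is never assigned, so it prints as "" unless forced numeric.
show_unset() {
    awk 'BEGIN {
        print "without +0: [" n "]"
        print "with +0: [" (n+0) "]"
    }'
}

show_unset
```

This prints an empty pair of brackets on the first line and [0] on the second.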
2

One way using Perl is as follows:

perl -lne '
    $,=" ";
    $A[0] += +lc =~ tr/aeiou//;
    $A[1] += s/(?![aeiou])[[:alpha:]]//gi;
    print $ARGV, splice @A if eof;
' file1 file2

Output:

file1 5 12
file2 2 7

Comments:

  • The first element of the array @A accumulates the running total of vowels.
  • The second element accumulates the running total of consonants, defined here as the alphabetic characters minus the vowels.
  • At the end of the current file, the totals are printed. Note that splice, as a side effect, empties the array, which resets the counts for the next file.
