
Given a file with data like this (i.e. a stores.dat file):

sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200

What is the command that would return the number of occurrences of the 't' character per line?

e.g. it would return:

count lineNum
4 1
3 2
6 3

Also, to count occurrences per field, what is the command to return the following results?

e.g. for input of column 2 and character 't':

count lineNum
1 1
0 2
1 3

e.g. for input of column 3 and character 't':

count lineNum
2 1
1 2
4 3

10 Answers


To count occurrences of a character per line you can do:

awk -F'|' 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"") "\t" NR}' file
count lineNum
4 1
3 2
6 3

To count occurrences of a character per field/column you can do:

column 2:

awk -F'|' -v fld=2 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
1 1
0 2
1 3

column 3:

awk -F'|' -v fld=3 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
2 1
1 2
4 3
  • gsub()'s return value is the number of substitutions made, so we print that as the count.
  • NR holds the current line number, so we use it to print the line number.
  • To count occurrences in a particular field, we pass the field number in the variable fld so that gsub() operates on $fld only.
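To try this end to end, one can recreate the question's sample file and run the per-line command (a sketch; the file contents are assumed from the question):

```shell
# Recreate the sample stores.dat from the question (assumed contents).
printf '%s\n' \
  'sid|storeNo|latitude|longitude' \
  '2tt|1|-28.0372000t0|153.42921670' \
  '9|2t|-33tt.85t09t0000|15t1.03274200' > stores.dat

# gsub(/t/,"") returns the number of substitutions, i.e. the count of 't's.
awk -F'|' 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"") "\t" NR}' stores.dat
```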

3 Comments

It prints "0" (occurrences) as well, which might not be desired in the output
@TarunSapra It's actually shown as expected result in the question.
Note that gsub() will change the record's content. If you need the original value you should call gsub() after other action blocks (or save the original content in a variable).
grep -n -o "t" stores.dat | sort -n | uniq -c | cut -d : -f 1 

gives almost exactly the output you want:

 4 1
 3 2
 6 3

Thanks to @raghav-bhushan for the grep -o hint, what a useful flag. The -n flag includes the line number as well.
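Each stage of the pipeline can be annotated (a sketch, assuming the sample stores.dat from the question):

```shell
# Sample data assumed from the question.
printf '%s\n' \
  'sid|storeNo|latitude|longitude' \
  '2tt|1|-28.0372000t0|153.42921670' \
  '9|2t|-33tt.85t09t0000|15t1.03274200' > stores.dat

grep -n -o "t" stores.dat |  # one "lineNum:t" line per match, e.g. "1:t"
    sort -n |                # keep matches from the same line adjacent
    uniq -c |                # prefix each distinct "lineNum:t" with its count
    cut -d : -f 1            # drop the ":t" suffix, leaving "count lineNum"
```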

6 Comments

This is a much more elegant and general solution.
I think the sort -n could be dispensed with -- isn't the output in line number order anyway?
That's weird, that exact command returns "10 t", "1 1", "1 2", "1 3" on my Mac.
@Gabrial Burt, Can you explain each step of this in your answer .. what are the commands you're piping to and how do their modifiers affect what's happening?
If your mac grep is weird consider brew install to get and use pcregrep instead.

To count occurrences of a character per line:

$ awk -F 't' '{print NF-1, NR}' input.txt
4 1
3 2
6 3

This sets the field separator to the character to be counted, then uses the fact that the number of fields is one greater than the number of separators.

To count occurrences in a particular column, cut out that column first:

$ cut -d '|' -f 2 input.txt | awk -F 't' '{print NF-1, NR}'
1 1
0 2
1 3
$ cut -d '|' -f 3 input.txt | awk -F 't' '{print NF-1, NR}'
2 1
1 2
4 3
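The two steps can be wrapped in a small function (a sketch; countchar is a made-up name, and the '|' field delimiter is assumed):

```shell
# countchar COLUMN CHAR FILE: count CHAR per line in the given '|'-delimited column.
countchar() {
    cut -d '|' -f "$1" "$3" | awk -F "$2" '{print NF-1, NR}'
}

# Sample data assumed from the question.
printf '%s\n' \
  'sid|storeNo|latitude|longitude' \
  '2tt|1|-28.0372000t0|153.42921670' \
  '9|2t|-33tt.85t09t0000|15t1.03274200' > stores.dat

countchar 2 t stores.dat
```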

Comments


One possible solution using perl:

Content of script.pl:

use warnings;
use strict;

## Check arguments:
## 1.- Input file
## 2.- Char to search.
## 3.- (Optional) field to search. If blank, zero or bigger than number
## of columns, default to search char in all the line.
(@ARGV == 2 || @ARGV == 3) or die qq(Usage: perl $0 input-file char [column]\n);

my ($char, $column);

## Get values or arguments.
if ( @ARGV == 3 ) {
    ($char, $column) = splice @ARGV, -2;
} else {
    $char = pop @ARGV;
    $column = 0;
}

## Check that $char must be a non-white space character and $column
## only accept numbers.
die qq[Bad input\n] if $char !~ m/^\S$/ or $column !~ m/^\d+$/;

print qq[count\tlineNum\n];

while ( <> ) {
    ## Remove last '\n'
    chomp;

    ## Get fields.
    my @f = split /\|/;

    ## If column is a valid one, select it to the search.
    if ( $column > 0 and $column <= scalar @f ) {
        $_ = $f[ $column - 1 ];
    }

    ## Count.
    my $count = eval qq[tr/$char/$char/];

    ## Print result.
    printf qq[%d\t%d\n], $count, $.;
}

The script accepts three parameters:

  1. Input file
  2. Char to search
  3. Column to search: if the column number is invalid (zero or out of range), it searches the whole line.

Running the script without arguments:

perl script.pl
Usage: perl script.pl input-file char [column]

With arguments and its output:

Here 0 is an invalid column, so it searches the whole line.

perl script.pl stores.dat 't' 0
count lineNum
4 1
3 2
6 3

Here it searches in column 1.

perl script.pl stores.dat 't' 1
count lineNum
0 1
2 2
0 3

Here it searches in column 3.

perl script.pl stores.dat 't' 3
count lineNum
2 1
1 2
4 3

'th' is not a single char, so it is rejected.

perl script.pl stores.dat 'th' 3
Bad input

1 Comment

Like this a lot, but accepting the other answer for easier integration with bash

No need for awk or perl, only with bash and standard Unix utilities:

cat file | tr -c -d "t\n" | cat -n | {
    echo "count lineNum"
    while read num data; do
        test ${#data} -gt 0 && printf "%4d %5d\n" ${#data} $num
    done
}

And for a particular column:

cut -d "|" -f 2 file | tr -c -d "t\n" | cat -n | {
    echo -e "count lineNum"
    while read num data; do
        test ${#data} -gt 0 && printf "%4d %5d\n" ${#data} $num
    done
}

And we can even avoid tr and the cats:

echo "count lineNum"
num=1
while read data; do
    new_data=${data//t/}
    count=$((${#data}-${#new_data}))
    test $count -gt 0 && printf "%4d %5d\n" $count $num
    num=$(($num+1))
done < file

and even the cut:

echo "count lineNum"
num=1
OLD_IFS=$IFS
IFS="|"
while read -a array_data; do
    data=${array_data[1]}
    new_data=${data//t/}
    count=$((${#data}-${#new_data}))
    test $count -gt 0 && printf "%4d %5d\n" $count $num
    num=$(($num+1))
done < file
IFS=$OLD_IFS

Comments


You could also split the line (or a field) on "t" and use the length of the resulting array minus 1. Set the col variable to 0 for the whole line, or 1 through 3 for individual columns:

awk -F'|' -v col=0 -v OFS=$'\t' '
    BEGIN { print "count", "lineNum" }
    { split($col, a, "t"); print length(a) - 1, NR }
' stores.dat
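For example, applied to column 2 of the question's sample data (a sketch using split()'s return value, which equals the resulting array length, for portability across awk implementations):

```shell
# Sample data assumed from the question.
printf '%s\n' \
  'sid|storeNo|latitude|longitude' \
  '2tt|1|-28.0372000t0|153.42921670' \
  '9|2t|-33tt.85t09t0000|15t1.03274200' > stores.dat

# n = split(...) is the number of pieces, so n - 1 is the number of separators.
awk -F'|' -v col=2 '
    BEGIN { print "count", "lineNum" }
    { n = split($col, a, "t"); print n - 1, NR }
' stores.dat
```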

Comments

awk '{gsub("[^t]",""); print length($0),NR;}' stores.dat 

The call to gsub() deletes everything in the line that is not a t, then we print the length of what remains along with the current line number.

Want to do it just for column 2?

awk 'BEGIN{FS="|"} {gsub("[^t]","",$2); print NR,length($2);}' stores.dat 
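Note this variant prints the line number before the count. On the question's sample data it would look like this (a sketch):

```shell
# Sample data assumed from the question.
printf '%s\n' \
  'sid|storeNo|latitude|longitude' \
  '2tt|1|-28.0372000t0|153.42921670' \
  '9|2t|-33tt.85t09t0000|15t1.03274200' > stores.dat

# Delete every non-'t' character in field 2; the remaining length is the count.
awk 'BEGIN{FS="|"} {gsub("[^t]","",$2); print NR,length($2);}' stores.dat
```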

Comments

$ cat -n test.txt
     1  test 1
     2  you want
     3  void
     4  you don't want
     5  ttttttttttt
     6  t t t t t t
$ awk '{n=split($0,c,"t")-1; if (n!=0) print n,NR}' test.txt
2 1
1 2
2 4
11 5
6 6

Comments

cat stores.dat | awk 'BEGIN {FS = "|"}; {print $1}' | awk 'BEGIN {FS = "t"}; {print NF-1}'

Where $1 selects the column you want to count in; the second awk splits on "t", so the field count minus one is the number of occurrences.

Comments

perl -e 'while(<>) { $count = tr/t//; print "$count ".++$x."\n"; }' stores.dat 

Another perl answer yay! The tr/t// function returns the count of the number of times the translation occurred on that line, in other words the number of times tr found the character 't'. ++$x maintains the line number count.

Comments
