Revisions to The wc -w command outputs incorrect answer [duplicate]

clarify

edited Feb 7, 2016 at 19:04


The wc command is counting the words in the output from grep, which includes "for":

> grep shell test.txt
for shell_A shell_B shell_C

So there really are 4 words.

If you only want to count the number of lines that contain a particular word in a file, you can use the -c option of grep, e.g.,

grep -c shell test.txt
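
As a rough illustration of the difference, here is a minimal sketch. The sample file and its contents are assumptions made up for the demonstration (the real test.txt from the question is not reproduced here), and mktemp is used only for convenience.

```shell
# Hypothetical sample file standing in for test.txt.
sample=$(mktemp)
printf '%s\n' 'for shell_A' 'shell_B' 'shell_C' 'no match here' > "$sample"

# wc -w counts every whitespace-separated word on the matching lines,
# so "for" is counted too. tr strips the padding some wc versions add.
words=$(grep shell "$sample" | wc -w | tr -d ' ')

# grep -c counts the matching lines, not words and not occurrences.
lines=$(grep -c shell "$sample")

echo "words=$words lines=$lines"
rm -f "$sample"
```

With this sample, three lines match but they hold four words, which is the same kind of mismatch described above.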

Neither of those actually counts words; both could match other things which include that string. Most implementations of grep (GNU grep, modern BSDs as well as AIX, HPUX, Solaris) provide a -w option for words, however that is not in POSIX. They also recognize a regular expression, e.g.,

grep -e '\<shell\>' test.txt

which corresponds to the -w option. Again, that is not in POSIX. Solaris does document this, while AIX and HPUX describe -w without mentioning the regular expression. These all appear to be consistent, treating a "word" as a sequence of alphanumerics plus underscore.
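
A small sketch of how the two forms behave, assuming a grep that supports both -w and the \< \> anchors (neither is in POSIX, as noted). The sample file is hypothetical and mktemp is used only for convenience.

```shell
# Hypothetical sample file; contents chosen to show word boundaries.
sample=$(mktemp)
printf '%s\n' 'shell' 'shell_A' 'a shell script' 'seashells' > "$sample"

# Plain pattern: matches any line containing the string.
plain=$(grep -c shell "$sample")

# -w: the match must be delimited by non-word characters.
# The underscore counts as a word character, so shell_A does not match.
word_opt=$(grep -cw shell "$sample")

# The same restriction written as a regular expression.
word_re=$(grep -c '\<shell\>' "$sample")

echo "plain=$plain word_opt=$word_opt word_re=$word_re"
rm -f "$sample"
```

On an implementation that follows the alphanumerics-plus-underscore convention, only the lines "shell" and "a shell script" should survive the word-restricted forms.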

You could use a POSIX regular expression with grep to match words (separated by blanks, etc), but your example has none which are just "shell": they all have some other character touching the matches. Alternatively, if you care only about alphanumerics (and no underscore) and do not mind matching substrings, you could do

tr -c '[:alnum:]' '[\n*]' < test.txt | grep -c shell
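
To see what that kind of pipeline counts, here is a sketch on a hypothetical sample file (the contents are an assumption for illustration, and mktemp is used only for convenience). tr reads only standard input, so the file is redirected into it.

```shell
# Hypothetical sample file: two matches share the second line.
sample=$(mktemp)
printf '%s\n' 'for shell_A' 'shell_B shell_C' 'seashells' > "$sample"

# Turn every non-alphanumeric character (underscore included) into a
# newline, giving one alphanumeric token per line, then count the
# tokens that contain the string.
tokens=$(tr -c '[:alnum:]' '[\n*]' < "$sample" | grep -c shell)

# For comparison: grep -c on the file itself counts lines.
by_line=$(grep -c shell "$sample")

echo "tokens=$tokens by_line=$by_line"
rm -f "$sample"
```

The token count is higher than the line count because shell_B and shell_C sit on one line, and "seashells" is still counted because this approach matches substrings.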

The -o option suggested is non-POSIX, and since OP did not limit the question to Linux or BSDs, is not what I would recommend. In either case, it does not match words, but strings (which was OP's expectation).

For reference:

grep
wc


correct the description of what grep -c does

edit approved Feb 6, 2016 at 19:53

Miles


The wc command is counting the words in the output from grep, which includes "for":

> grep shell test.txt
for shell_A shell_B shell_C

So there really are 4 words.

If you only want to count the number of lines that contain a particular word in a file, you can use the -c option of grep, e.g.,

grep -c shell test.txt

Neither of those actually count words, but could match other things which include that string. Some implementations of grep provide a -w option for words, however that is not in POSIX. You could use a regular expression with grep to match words (separated by blanks, etc), but your example has none which are just "shell": they all have some other character touching the matches.

For reference:

grep
wc


clarify

edited Feb 6, 2016 at 16:37

Thomas Dickey


The wc command is counting the words in the output from grep, which includes "for":

> grep shell test.txt
for shell_A shell_B shell_C

So there really are 4 words.

If you only want to count the occurrences of a particular word in a file, you can use the -c option of grep, e.g.,

grep -c shell test.txt

Neither of those actually count words, but could match other things which include that string. Some implementations of grep provide a -w option for words, however that is not in POSIX. You could use a regular expression with grep to match words (separated by blanks, etc), but your example has none which are just "shell": they all have some other character touching the matches.

For reference:

grep
wc
