Revisions to How do I count the number of occurrences of a word in a text file with the command line?

correct spelling

edited May 12, 2023 at 5:47

sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl

Substitute every non-alphabetic character with a space.
Convert all line breaks to spaces as well.
Squeeze runs of spaces into a single space.
Convert every space to a line break, so each word is on its own line.
Translate all words to lower case, so that 'Hello' and 'hello' count as the same word.
Sort the words.
Count identical lines and collapse each group into one line.
Sort in reverse numeric order, so the most frequent words come first.
Add a line number to each word, showing its rank in the whole list.
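
The steps above can be tried on a small inline sample. This is only a sketch, assuming a POSIX shell with the usual `sed`, `tr`, `sort`, `uniq` and `nl` utilities; the function name `word_freq` is made up for illustration and is not part of the original answer.

```shell
#!/bin/sh
# word_freq: hypothetical helper wrapping the pipeline described above.
# Reads text on stdin and prints "rank count word", most frequent first.
word_freq() {
    sed -e 's/[^[:alpha:]]/ /g' |   # non-letters become spaces
        tr '\n' ' ' |               # line breaks become spaces
        tr -s ' ' |                 # squeeze runs of spaces
        tr ' ' '\n' |               # one word per line
        tr 'A-Z' 'a-z' |            # lower-case everything
        sort |                      # put identical words next to each other
        uniq -c |                   # collapse duplicates, prefix a count
        sort -nr |                  # highest count first
        nl                          # number the lines (the rank)
}

# Inline sample text instead of a file.
printf 'Hello, hello world!\nHello again, world.\n' | word_freq
```

With this sample, "hello" is ranked first with a count of 3, followed by "world" with 2 and "again" with 1. The exact column padding depends on the `uniq` and `nl` implementation.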

For example, if I want to analyze the first Linus Torvalds message:

sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | head -n 20

sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\sword_to_search_for$"
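
Note that `\s` is a GNU grep extension. On systems where it is not available, the same lookup could be written with the POSIX class `[[:space:]]`. A minimal sketch, assuming the ranked list arrives on stdin; the function name `find_word` is hypothetical:

```shell
#!/bin/sh
# find_word: hypothetical helper; filters a ranked "rank count word" list
# (as produced by the pipeline above) down to the line for one exact word.
find_word() {
    # A whitespace character before the word and $ after it anchor a
    # whole-word match, so searching for "he" does not also report "the".
    grep "[[:space:]]$1\$"
}

# Inline sample standing in for the pipeline output.
printf '1 3 the\n2 2 he\n3 1 she\n' | find_word he   # prints: 2 2 he
```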

#!/bin/bash
sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\s$1$"
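
The script above has the file name hardcoded. One possible variation, which is my own sketch and not the author's version, takes the file as a second argument and compares the word as a plain string with `awk`. The argument order (word first, file second) is an assumption.

```shell
#!/bin/sh
# Sketch of a search_freq variant (assumption: word is $1, file is $2).
search_freq() {
    word=$1
    file=$2
    sed -e 's/[^[:alpha:]]/ /g' "$file" |
        tr '\n' ' ' | tr -s ' ' | tr ' ' '\n' | tr 'A-Z' 'a-z' |
        sort | uniq -c | sort -nr | nl |
        # Field 3 is the word; comparing it as a string avoids treating
        # regex metacharacters in the search term specially.
        awk -v w="$word" '$3 == w'
}

# Usage with a throwaway sample file.
tmp=$(mktemp)
printf 'The cat and the hat.\nThe end.\n' > "$tmp"
search_freq the "$tmp"
rm -f "$tmp"
```

In the sample, "the" occurs three times and every other word once, so it is reported with rank 1 and count 3.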

edited body

edited Dec 31, 2016 at 16:30

Roger Borrell

sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\sword_to_search_for$"

#!/bin/bash
sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\s$1$"
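
Double quotes around the grep pattern matter inside the script, because the shell does not expand `$1` within single quotes. A small demonstration, with `hello` standing in for the script argument (an arbitrary example value):

```shell
#!/bin/sh
# Simulate calling a script with one argument.
set -- hello

single='\s$1$'   # single quotes: $1 stays literal
double="\s$1$"   # double quotes: $1 is replaced by the argument

printf '%s\n' "$single"   # prints \s$1$
printf '%s\n' "$double"   # prints \shello$
```

With the single-quoted form, grep receives the literal text `$1` in its pattern rather than the word being searched for.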

added 3 characters in body

edited Dec 27, 2016 at 11:49

Roger Borrell

sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep '\sword_to_search_for$'

In a script called search_freq:

#!/bin/bash
sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep '\s$1$'

The script must be called:

 search_freq word_to_search_for
