sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl

- Substitutes every non-alphabetic character with a blank space.
- Converts all line breaks to spaces as well.
- Squeezes every run of blank spaces into a single space.
- Converts every space to a line break, leaving one word per line.
- Translates all words to lower case so that 'Hello' and 'hello' are not counted as different words.
- Sorts the words.
- Collapses identical lines and prefixes each one with its count.
- Sorts numerically in reverse order so that the most frequent words come first.
- Adds a line number to each word to show its position in the whole list.
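
To see the steps at work on a tiny input, here is a minimal sketch. It assumes bash and the standard sed/tr/sort/uniq/nl tools; the function name word_freq is hypothetical (it is not part of the original command) and it reads from standard input instead of text_to_analyze.txt.

```shell
#!/bin/bash
# word_freq: hypothetical wrapper around the pipeline above.
# It reads text on standard input instead of a named file.
word_freq() {
  sed -e 's/[^[:alpha:]]/ /g' |
    tr '\n' ' ' |
    tr -s ' ' |
    tr ' ' '\n' |
    tr 'A-Z' 'a-z' |
    sort | uniq -c | sort -nr | nl
}

# Three occurrences of "hello" (in mixed case) and one each of "world" and "again".
printf 'Hello world, hello again.\nHello!\n' | word_freq
```

The first output line should list "hello" with a count of 3 at position 1. The exact column widths depend on the uniq and nl implementations.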
For example, if I want to analyze the first Linus Torvalds message:
sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | head -n 20

sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\sword_to_search_for$"

#!/bin/bash
sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\s$1$"
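
A possible variation on the script, sketched under a few assumptions: the function name word_rank and its two arguments (file, then word) are hypothetical, and the final grep is swapped for an exact awk comparison on the third field, because \s is a GNU grep extension and an unescaped search term would be read as a regular expression.

```shell
#!/bin/bash
# word_rank FILE WORD: hypothetical helper, not part of the original script.
# Prints the rank, count and word for WORD in FILE, or nothing if it is absent.
word_rank() {
  local file=$1 word
  # Lowercase the search term so it matches the lowercased word list.
  word=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
  sed -e 's/[^[:alpha:]]/ /g' "$file" |
    tr '\n' ' ' | tr -s ' ' | tr ' ' '\n' |
    tr 'A-Z' 'a-z' |
    sort | uniq -c | sort -nr | nl |
    awk -v w="$word" '$3 == w'   # fields: rank, count, word
}
```

With the function loaded, it could be called as: word_rank text_to_analyze.txt linux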