Skip to main content
correct spelling
Source Link
jubilatious1
  • 3.9k
  • 10
  • 21
sed -e 's/[^[:alpha:]]/ /g' text_to_analizetext_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl 
  1. Substitute all non alphanumeric characters with a blank space.
  2. All line breaks are converted to spaces also.
  3. Reduces all multiple blank spaces to one blank space
  4. All spaces are now converted to line breaks. Each word in a line.
  5. Translates all words to lower case to avoid 'Hello' and 'hello' to be different words
  6. Sorts dethe text
  7. Counts and remove the equal lines
  8. Sorts reverse in order to count the most frequent words
  9. Add a line number to each word in order to know the word posotionposition in the whole

For example if I want to analizeanalyze the first Linus Torvald message:

sed -e 's/[^[:alpha:]]/ /g' text_to_analizetext_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | head -n 20 
sed -e 's/[^[:alpha:]]/ /g' text_to_analizetext_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\sword_to_search_for$" 
#!/bin/bash sed -e 's/[^[:alpha:]]/ /g' text_to_analizetext_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\s$1$" 
sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl 
  1. Substitute all non alphanumeric characters with a blank space.
  2. All line breaks are converted to spaces also.
  3. Reduces all multiple blank spaces to one blank space
  4. All spaces are now converted to line breaks. Each word in a line.
  5. Translates all words to lower case to avoid 'Hello' and 'hello' to be different words
  6. Sorts de text
  7. Counts and remove the equal lines
  8. Sorts reverse in order to count the most frequent words
  9. Add a line number to each word in order to know the word posotion in the whole

For example if I want to analize the first Linus Torvald message:

sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | head -n 20 
sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\sword_to_search_for$" 
#!/bin/bash sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\s$1$" 
sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl 
  1. Substitute all non alphanumeric characters with a blank space.
  2. All line breaks are converted to spaces also.
  3. Reduces all multiple blank spaces to one blank space
  4. All spaces are now converted to line breaks. Each word in a line.
  5. Translates all words to lower case to avoid 'Hello' and 'hello' to be different words
  6. Sorts the text
  7. Counts and remove the equal lines
  8. Sorts reverse in order to count the most frequent words
  9. Add a line number to each word in order to know the word position in the whole

For example if I want to analyze the first Linus Torvald message:

sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | head -n 20 
sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\sword_to_search_for$" 
#!/bin/bash sed -e 's/[^[:alpha:]]/ /g' text_to_analyze.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\s$1$" 
edited body
Source Link
sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep '\sword_to_search_for$'"\sword_to_search_for$" 
#!/bin/bash sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep '\s$1$'"\s$1$" 
sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep '\sword_to_search_for$' 
#!/bin/bash sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep '\s$1$' 
sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\sword_to_search_for$" 
#!/bin/bash sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\s$1$" 
added 3 characters in body
Source Link
sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep 'word_to_search_for''\sword_to_search_for$' 

In a script called search_freq:

#!/bin/bash sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep '\s$1$' 

The script must be called:

 search_freq word_to_search_for 
sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep 'word_to_search_for' 
sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep '\sword_to_search_for$' 

In a script called search_freq:

#!/bin/bash sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep '\s$1$' 

The script must be called:

 search_freq word_to_search_for 
added 228 characters in body
Source Link
Loading
added 28 characters in body
Source Link
Loading
added 1 character in body
Source Link
Loading
added 1 character in body
Source Link
Loading
Source Link
Loading