jubilatious1

Using Raku (formerly known as Perl_6)

~$ curl https://www.gutenberg.org/cache/epub/5/pg5.txt > US_Constitution.txt 

THEN:

Below, grep followed by elems gives the count per "examined unit" of text. For slurp the unit is the entire file, for lines it is each line, and for words it is each whitespace-separated word:

~$ raku -e 'slurp.grep(/ :i the /).elems.put;' US_Constitution.txt
1
~$ raku -e 'lines.grep(/ :i the /).elems.put;' US_Constitution.txt
443
~$ raku -e 'words.grep(/ :i the /).elems.put;' US_Constitution.txt
681
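The sketch below shows the difference without downloading anything. It runs the same three grep calls on a short, made-up sample string, which stands in for US_Constitution.txt. The counts in the comments apply to that sample only, not to the Constitution.

```raku
# Hypothetical sample text standing in for US_Constitution.txt.
my $text = "The people of the States\nbathe in these rights\nno match here\n";

# grep keeps each unit that contains "the" at least once;
# .elems then counts the kept units, not the occurrences.
my $per-file = $text.grep(/ :i the /).elems;        # whole string = one unit
my $per-line = $text.lines.grep(/ :i the /).elems;  # one unit per line
my $per-word = $text.words.grep(/ :i the /).elems;  # one unit per word

say "$per-file $per-line $per-word";   # 1 2 4
```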

Below, match (with the :global adverb) followed by elems gives the count of individual matches. Here the "examined unit" doesn't matter, so slurp, lines, and words all return the same count:

~$ raku -e 'slurp.match(:global, / :i the /).elems.put;' US_Constitution.txt
681
~$ raku -e 'lines.match(:global, / :i the /).elems.put;' US_Constitution.txt
681
~$ raku -e 'words.match(:global, / :i the /).elems.put;' US_Constitution.txt
681
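Here is a sketch of the match-based count, again on a hypothetical sample string rather than the real file. The first line of the sample contains "the" twice. A per-line grep would count that line once, but here it contributes two matches.

```raku
# Hypothetical sample text standing in for US_Constitution.txt.
my $text = "The people of the States\nbathe in these rights\nno match here\n";

# match(:global, ...) returns one Match object per occurrence, so .elems
# counts occurrences. Calling .match on the lines/words sequences first
# stringifies them (joined with spaces), so the total does not change.
my $from-file  = $text.match(:global, / :i the /).elems;
my $from-lines = $text.lines.match(:global, / :i the /).elems;
my $from-words = $text.words.match(:global, / :i the /).elems;

say "$from-file $from-lines $from-words";   # 4 4 4
```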

The regex can be improved to match only the free-standing word "the", as opposed to that three-character sequence being found within other words, such as "these" and "bathe". General word boundaries are denoted with either <|w> or <?wb>. Alternatively, you can be even more specific and denote a left word boundary with << and/or a right word boundary with >>:

~$ raku -e 'slurp.match(:global, / :i <|w> the <|w> /).elems.put;' US_Constitution.txt
519
~$ raku -e 'slurp.match(:global, / :i <?wb> the <?wb> /).elems.put;' US_Constitution.txt
519
~$ raku -e 'slurp.match(:global, / :i << the >> /).elems.put;' US_Constitution.txt
519
#below, remove `:i` (:ignorecase flag, i.e. adverb):
~$ raku -e 'slurp.match(:global, / << the >> /).elems.put;' US_Constitution.txt
458
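Here is the same hypothetical sample string with boundary-anchored patterns. All three boundary spellings drop "bathe" and "these" and agree with each other. Removing :i also drops the capitalised "The". The counts are for the sample only.

```raku
# Hypothetical sample text standing in for US_Constitution.txt.
my $text = "The people of the States\nbathe in these rights\nno match here\n";

# Word-boundary anchors reject "bathe" and "these": only free-standing
# "The"/"the" remain.
my $bound-w  = $text.match(:global, / :i <|w> the <|w> /).elems;
my $bound-wb = $text.match(:global, / :i <?wb> the <?wb> /).elems;
my $bound-lr = $text.match(:global, / :i << the >> /).elems;

# Dropping :i (:ignorecase) also rejects the capitalised "The".
my $case-sensitive = $text.match(:global, / << the >> /).elems;

say "$bound-w $bound-wb $bound-lr $case-sensitive";   # 2 2 2 1
```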

Edit: the foregoing is just a general overview of word-counting with Raku. If you need to analyze JSON files specifically, you can use the JSON::Tiny or JSON::Fast modules for Raku.

https://docs.raku.org/routine/grep
https://docs.raku.org/type/Str#method_match
https://raku.org
