I want to find where a word appears in a text file — as in the number of words into the text that a word occurs — for all instances of that word, but I'm not sure even where to start. I imagine I'll need a loop, and some combination of grep and wc.
As an example, here is an article about the iPhone 11:
On Tuesday, in a sign that Apple is paying attention to consumers who aren’t racing to buy more expensive phones, the company said the iPhone 11, its entry-level phone, would start at $700, compared with $750 for the comparable model last year.
Apple kept the starting prices of its more advanced models, the iPhone 11 Pro and iPhone 11 Pro Max, at $1,000 and $1,100. The company unveiled the new phones at a 90-minute press event at its Silicon Valley campus.
There are 81 words in the text.
jaireaux@macbook:~$ wc -w temp.txt
81 temp.txt
The word 'iPhone' appears three times.
jaireaux@macbook:~$ grep -o -i iphone temp.txt | wc -w
3
The output I want would be like this:
jaireaux@macbook:~$ whereword iPhone temp.txt
24 54 57
What would I do to get that output?
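One possible starting point is sketched below as a shell function. The name whereword is the hypothetical command from the question. The sketch assumes that a word is a whitespace-separated token (which should generally agree with wc -w) and that a match must equal the whole token, ignoring case. Neither assumption is stated in the question, so adjust as needed.

```shell
# whereword WORD FILE
# Print the 1-based positions of WORD among the whitespace-separated
# tokens of FILE. Assumption: a match is case-insensitive and must equal
# the whole token, so "iPhone." or "iPhone's" would NOT match "iPhone".
whereword() {
    awk -v word="$1" '
        {
            for (i = 1; i <= NF; i++) {
                n++                                 # running word count across lines
                if (tolower($i) == tolower(word))   # case-insensitive exact match
                    printf "%s%d", (found++ ? " " : ""), n
            }
        }
        END { if (found) print "" }                 # finish the line only if something matched
    ' "$2"
}
```

It would be called as whereword iPhone temp.txt. Counting the sample text by hand with this definition of a word puts iPhone at 25 54 58, slightly different from the 24 54 57 shown in the question, which may have been an estimate.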
Is aren't a word? Is $1,000 a word? If so then they don't fit the usual criteria that a word is a series of word-constituent characters and the word-constituent characters are letters, digits, and underscore (e.g. see -w in the GNU grep man page, linuxcommand.org/lc3_man_pages/grep1.html, and the meaning of \w in regexps for tools that accept such). If aren't isn't a word then does that mean aren and t are both words? Is ' part of a word or not? If you wanted to search for aren't then you'd want it to be part of a word but if you also wanted to find iPhone when "my iPhone's broken" appears in your text then you wouldn't want it to be part of a word. Lots of different conflicting possibilities to consider when trying to parse natural language!
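To make that trade-off concrete, here is a minimal sketch of the other choice: a word is a maximal run of letters, digits, and underscores, as with \w. The function name whereword_w is made up for illustration. The sketch assumes mostly ASCII text, since some tr implementations work byte by byte and would split accented letters.

```shell
# whereword_w WORD FILE
# Print the 1-based positions of WORD, where a word is a maximal run of
# letters, digits, and underscores (an assumption; compare \w in regexps).
whereword_w() {
    # Replace every run of non-word characters with a single newline,
    # leaving one token per line.
    tr -cs '[:alnum:]_' '\n' < "$2" |
        # Drop the empty first line produced when the file starts with punctuation.
        grep -v '^$' |
        # The line number of each case-insensitive exact match is its word position.
        awk -v word="$1" '
            tolower($0) == tolower(word) { printf "%s%d", (found++ ? " " : ""), NR }
            END { if (found) print "" }
        '
}
```

With this definition, iPhone is found in "my iPhone's broken", but aren't becomes the two words aren and t, and $1,000 becomes 1 and 000, so the positions no longer line up with the count from wc -w.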