-1

I want to find all the files that contain the words “Who”, “What”, “Why”, “How” and “When”: all of the words, in any order, matched case-insensitively.

I tried:

grep -rl --include='*.adoc' "Who" . | xargs grep -l "What" | xargs grep -l "Why" | xargs grep -l "How" | xargs grep -l "When" 

It gives an error like:

grep: Walkthrough/datatable/extras/Scroller/media/data/2500.adoc: No such file or directory 
3
  • 1
    Related: How to run grep with multiple AND patterns? Commented Apr 30, 2024 at 0:46
  • @steeldriver:  I’m considering VTCing that question for ambiguity — but, based on reading it a couple of times, I believe that it is looking for lines that contain all the words/patterns.  And I’m fairly sure that this question is asking for files that contain all the words, not necessarily on the same line  (e.g., find a file that contains “brillig”, “mimsy” and “vorpal”, rather than a line that contains “curious”, “forgotten” and “quaint”). Commented Apr 30, 2024 at 1:10
  • @G-ManSays'ReinstateMonica' I don't disagree with your analysis - I linked it because IMHO some of the methods are still relevant (for example, awk or perl solutions that could be applied with a suitable change to the record separator) Commented Apr 30, 2024 at 1:57

2 Answers

5

The problem you are having is that some of your filenames contain spaces. xargs splits each such name on the whitespace and passes the pieces on as separate "filenames".

Add the -0 option to each xargs to make it split its input on NULs instead of whitespace, and the -Z (--null) option to each grep command line to make it terminate filenames with NULs instead of newlines (but omit -Z on the final grep if you want to read the output). So, putting it all together:

grep -r -iwlZ --include='*.adoc' 'who' . | xargs -r0 grep -iwlZ 'what' | xargs -r0 grep -iwlZ 'why' | xargs -r0 grep -iwlZ 'how' | xargs -r0 grep -iwl 'when' 

Alternatively, eliminate the whitespace and other shell special characters from your filenames.

Otherwise, your solution is basically correct, though the answer by @James is right that you need the -i option for case-insensitive matching.
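
The splitting described above can be reproduced on a throwaway directory. This is only a minimal sketch, assuming GNU grep and an xargs that supports -r and -0; the directory, the file name `my notes.adoc` and the variable names are made up for illustration and are not from the question.

```shell
# Scratch directory with one file whose name contains a space (hypothetical name).
demo_dir=$(mktemp -d)
printf 'Who and what\n' > "$demo_dir/my notes.adoc"

# Newline-delimited list + plain xargs: the name is split at the space,
# so the second grep is handed "./my" and "notes.adoc", neither of which exists.
broken=$(cd "$demo_dir" &&
  grep -rli --include='*.adoc' 'who' . | xargs grep -li 'what' 2>/dev/null) || true

# NUL-delimited list (-Z on the producing grep) + xargs -0: the name stays intact.
fixed=$(cd "$demo_dir" &&
  grep -rliZ --include='*.adoc' 'who' . | xargs -r0 grep -li 'what')

printf 'without -0: [%s]\n' "$broken"
printf 'with -0:    [%s]\n' "$fixed"

rm -rf "$demo_dir"
```

The first pipeline finds nothing and (with stderr visible) prints the same kind of "No such file or directory" message as in the question; the second reports the file under its full name.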

4
  • 1
    ... and perhaps the -w option to match whole words not compounds like whatever and Show Commented Apr 30, 2024 at 1:53
  • So what would the final command look like? In my case the whole output is on one line, and I cannot separate one file name from the next. Commented Apr 30, 2024 at 10:43
  • I am now using grep -rl --null -w --include='*.adoc' "Who" . | xargs -0 grep -w --null -l "What" | xargs -0 grep -w --null -l "Why" | xargs -0 grep -w --null -l "How" | xargs -0 grep -w --null -l "When" | tr '\0' '\n'. Is this correct? Commented Apr 30, 2024 at 10:52
  • 1
    @AhmadIsmail if you omit the --null on the final grep, then you won't need the tr. Based on your original post, all the greps would need -i as well to make the matches case insensitive. Commented Apr 30, 2024 at 11:13
2

Using find and GNU awk in slurp mode to process whole files as single records and word boundaries \<, \> equivalent to grep's -w / --word-regexp option:

find . -name '*.adoc' -exec gawk -v RS='^$' -v IGNORECASE=1 ' /\<who\>/ && /\<what\>/ && /\<why\>/ && /\<how\>/ && /\<when\>/ {print FILENAME} ' {} + 

although this appears to be an order of magnitude slower than piping through multiple greps - I guess caching means there's very little overhead in grepping the same file multiple times.
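
A variant that is not part of either answer, for cases where gawk is unavailable: let find hand the file names directly to a small sh loop that chains `grep -q` calls, so no file name ever passes through a pipe. This is a sketch only, assuming a grep that supports -w; the sample files and the `matches` variable are hypothetical, and its speed relative to the two approaches above has not been measured here.

```shell
# Scratch directory with two hypothetical sample files.
demo_dir=$(mktemp -d)
printf 'Who\nWhat\nWhy\nHow\nWhen\n'     > "$demo_dir/all five.adoc"
printf 'Who\nWhatever\nWhy\nHow\nWhen\n' > "$demo_dir/partial.adoc"

# For each file, print its name only if every word is found somewhere in it.
# -q stops at the first hit, -i ignores case, -w matches whole words only.
matches=$(cd "$demo_dir" && find . -name '*.adoc' -exec sh -c '
  for f do
    if grep -qiw who "$f" && grep -qiw what "$f" && grep -qiw why "$f" &&
       grep -qiw how "$f" && grep -qiw when "$f"
    then
      printf "%s\n" "$f"
    fi
  done
' sh {} +)

printf '%s\n' "$matches"

rm -rf "$demo_dir"
```

Only `all five.adoc` is listed, because `Whatever` in the second file does not count as the whole word "what". File names that themselves contain newlines would still be ambiguous in the printed list.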

4
  • 3
    (1) Yes, obviously, -i is needed to do case-insensitive search.  But this command will find any file that contains any of the words, not all of them (as the question asks).   (2) For directory-based searches (e.g., recursive ones), it’s better to specify an argument of . rather than *. Commented Apr 30, 2024 at 0:44
  • Yes, he wants all of the words in ONE file. I just realized this. Thanks. Commented Apr 30, 2024 at 1:46
  • @StéphaneChazelas Thanks. I wrote this on 4/30, but steeldriver modified it soon afterwards. Maybe some old versions of gawk don't work this way? I am not sure about this. Commented May 6, 2024 at 9:56
  • not AFAIK, we'd have to ask @steeldriver why they moved the IGNORECASE definition from -v to a BEGIN statement (without also moving the RS definition there). Commented May 6, 2024 at 10:10
