Bash script with grep -w

Question

I am using the following command: x.txt | grep -w 'in' and I am getting answers like: in into ... etc.

I only want the answer: in

How should I modify the command?

You should rephrase your original question to specify that the problem has to do with UTF-8 specific characters. Also, look at "EDIT4" in my answer below where I use sed to work around the problem.
– Daniel Andersson, Commented Apr 8, 2012 at 14:11

Community · Accepted Answer · 2017-05-23 12:41:50Z

First, the command should be

grep -w in x.txt

Your current pipe doesn't work, and it is unnecessary to cat the file just to pipe it. grep can read files directly.
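As a minimal sketch of the difference (the temporary file and its contents are made up for illustration, not taken from your x.txt), both forms below give the same result, but the first one avoids the extra process:

```shell
# Build a small sample file (hypothetical contents, for illustration only).
tmp=$(mktemp)
printf '%s\n' 'word in word' 'word within word' > "$tmp"

# grep opens the file itself when the name is passed as an argument.
direct=$(grep -w in "$tmp")

# The pipe form also works, but cat contributes nothing here.
piped=$(cat "$tmp" | grep -w in)

printf 'direct: %s\n' "$direct"
printf 'piped:  %s\n' "$piped"

rm -f "$tmp"
```

Both variables should end up holding only the line "word in word".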

Second, the -w does exactly what you want. From the man page:

-w, --word-regexp

Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
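To make the man page wording concrete, here is a small sketch with ASCII-only input (the sample words are my own, chosen to exercise the rule). Digits and the underscore count as word constituents, so "in2" and "in_place" are not matches, while punctuation such as "-", "." or parentheses ends a word:

```shell
# Hypothetical sample input, one candidate per line.
sample=$(mktemp)
printf '%s\n' 'in' 'into' 'within' 'log in.' 'in-place' 'in_place' 'in2' '(in)' > "$sample"

# -w keeps only lines where "in" is delimited by non-word characters
# (or by the start/end of the line) on both sides.
matches=$(grep -w in "$sample")

printf '%s\n' "$matches"

rm -f "$sample"
```

With this input I would expect four lines to survive: "in", "log in.", "in-place" and "(in)".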

Note that grep returns the complete lines in which the word occurs; that is simply how grep works. I mention it so that the output does not confuse you.

If you just want to return the word, as you say, you can do

grep -ow in x.txt

since -o returns only the matching part, but that seems quite unfruitful. What are you really trying to do?
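A short sketch of how the two invocations differ, again on made-up sample lines rather than your real data:

```shell
# Hypothetical sample input.
demo=$(mktemp)
printf '%s\n' 'word in word' 'within' 'in and in again' > "$demo"

# Without -o: every line that contains "in" as a whole word.
lines=$(grep -w in "$demo")

# With -o: only the matched text, one match per output line.
words=$(grep -ow in "$demo")

printf 'whole lines:\n%s\n' "$lines"
printf 'matches only:\n%s\n' "$words"

rm -f "$demo"
```

The second form prints nothing but repetitions of "in", which is why I doubt it is what you are after.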

EDIT: An explicit example:

$ cat test
word in word
within word word word
$ grep -w in test
word in word

"within" is not matched.

EDIT2: Another example:

$ grep '\<in\>' test
word in word

EDIT3: It turned out that the problem was with Swedish characters. I can reproduce this, even with the environment variable LANG set to sv_SE.UTF-8. https://stackoverflow.com/questions/9260293/egrep-accented-characters-not-recognised-as-part-of-a-word suggests using Perl for UTF-8 specific tasks as the easiest solution.

EDIT4: It seems I can use sed to get this working with Swedish characters:

$ cat test
word den word
avträden word word word
$ sed -n '/\bden\b/p' test
word den word
$ sed -n '/\<den\>/p' test
word den word

It is a pragmatic solution, but hopefully it works for this task.
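Another possible workaround, sketched under an assumption: the sample output quoted in the comments ("den;it", "avträden;privies") suggests that each line of the file is a semicolon-separated pair with the word first. If that holds for the whole file, you can anchor on the field delimiter instead of relying on word boundaries, which sidesteps the question of which characters count as letters. The file name and contents below are stand-ins, not your real sv_enb.txt:

```shell
# Hypothetical two-line dictionary in the assumed "word;translation" layout.
dict=$(mktemp)
printf '%s\n' 'den;it' 'avträden;privies' > "$dict"

# ^ pins the match to the start of the line and ; to the end of the first
# field, so no word-boundary detection is involved.
hit=$(grep '^den;' "$dict")

printf '%s\n' "$hit"

rm -f "$dict"
```

This only helps when the word you search for is always the first field; for free text the sed approach above is still the one to try.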

What I meant was that I only want the line where the complete word is present, i.e. a line where "in" is present but not a line where "within" is present.
– NewBo, Commented Apr 8, 2012 at 12:41
Yes, but that is exactly what -w does. Does it not work? Give an explicit example where it does not work. I did a test case just now, and it works just as you want, from all I can tell.
– Daniel Andersson, Commented Apr 8, 2012 at 12:44
grep -w "den" ./sv_enb.txt gives the result den;it avträden;privies (I am using OSX)
– NewBo, Commented Apr 8, 2012 at 13:30
OSX could very well be relevant. Have you looked at the manual for grep on your system? Is -w described? Otherwise, you can replace -w in with '\<in\>', as the now-deleted answer tried to describe. It could also be some strange Unicode error on OSX, but try the above first.
– Daniel Andersson, Commented Apr 8, 2012 at 13:34
grep "\<den\>" ./sv_enb.txt gives den;it avträden;privies. When I read the man pages they are the same as other man pages for grep, no special information about the command in OSX (or OpenBSD).
– NewBo, Commented Apr 8, 2012 at 13:36
