I am using the following command: x.txt | grep -w 'in' and I am getting answers like: in into ... etc.

I only want the answer: in

How should I modify the command?

  • You should rephrase your original question to specify that the problem has to do with UTF-8 specific characters. Also, look at "EDIT4" in my answer below where I use sed to work around the problem. Commented Apr 8, 2012 at 14:11

1 Answer

First, the command should be

grep -w in x.txt 

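Your current pipe doesn't work: written as x.txt | grep ..., the shell tries to run x.txt as a command instead of reading it. Even with cat in front, it is unnecessary to cat a file just to pipe it; grep can read files directly.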

Second, the -w does exactly what you want. From the man page:

-w, --word-regexp

Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.

Note that grep still returns the complete lines in which the word occurs; that is what grep does. I mention it so that the output doesn't confuse you.
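As a small illustration of the rule quoted from the man page, the sketch below runs grep -w over a few made-up lines (the sample data is an assumption, not taken from the question). With GNU grep it should show that a letter or an underscore next to "in" blocks the match, that a hyphen or a space does not, and that whole lines are printed.

```shell
# Made-up sample lines; only the characters next to "in" matter here.
sample=$(printf 'in_2\nin-2\nlogin\nsit in\n')

# -w: "in" must not touch a letter, digit or underscore on either side.
whole_word=$(printf '%s\n' "$sample" | grep -w in)

# Prints the complete matching lines: "in-2" and "sit in".
printf '%s\n' "$whole_word"
```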

If you just want to return the word, as you say, you can do

grep -ow in x.txt 

since -o prints only the matching part of each line, but that seems rather pointless on its own. What are you really trying to do?
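One case where combining -o with -w can be useful is counting occurrences rather than lines. The following is only a sketch on made-up input, assuming GNU grep: -c counts matching lines, while -o piped through wc -l counts every whole-word match.

```shell
# Made-up sample: the last line contains the word twice.
sample=$(printf 'word in word\nwithin\nin in\n')

# -c counts the lines that contain at least one whole-word match.
line_count=$(printf '%s\n' "$sample" | grep -cw in)

# -o prints each match on its own line, so wc -l counts occurrences.
match_count=$(printf '%s\n' "$sample" | grep -ow in | wc -l | tr -d ' ')

printf 'lines: %s, occurrences: %s\n' "$line_count" "$match_count"
```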


EDIT: An explicit example:

$ cat test
word in word
within
word word word
$ grep -w in test
word in word

"within" is not matched.

EDIT2: Another example:

$ grep '\<in\>' test
word in word

EDIT3: The asker reports that the problem occurs with Swedish characters. I can reproduce this, even with the environment variable LANG set to sv_SE.UTF-8. https://stackoverflow.com/questions/9260293/egrep-accented-characters-not-recognised-as-part-of-a-word suggests that using Perl is the easiest solution for UTF-8 specific tasks.


EDIT4: It seems I can use sed to get this working with Swedish characters:

$ cat test
word den word
avträden
word word word
$ sed -n '/\bden\b/p' test
word den word
$ sed -n '/\<den\>/p' test
word den word

It is a pragmatic solution, but hopefully it works for this task.
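If the file really is a semicolon-separated word list, as the output quoted in the comments (den;it and avträden;privies) suggests, another pragmatic option is to compare a whole field instead of relying on word boundaries. This is only a sketch under that assumption; the two sample lines are copied from the comment output and the field layout is a guess.

```shell
# Two lines in the word;translation layout seen in the comments.
dict=$(printf 'den;it\navträden;privies\n')

# Split on ";" and keep lines whose first field is exactly "den".
# A plain string comparison does not depend on how the locale
# classifies non-ASCII letters such as "ä".
awk_hits=$(printf '%s\n' "$dict" | awk -F';' '$1 == "den"')

printf '%s\n' "$awk_hits"
```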

  • What I meant was that I only want the line where the complete word is present, i.e. a line where "in" is present but not a line where "within" is present. Commented Apr 8, 2012 at 12:41
  • Yes, but that is exactly what -w does. Does it not work? Give an explicit example where it does not work. I did a test case just now, and it works just as you want, from all I can tell. Commented Apr 8, 2012 at 12:44
  • grep -w "den" ./sv_enb.txt gives the result den;it avträden;privies (I am using OSX) Commented Apr 8, 2012 at 13:30
  • OSX could very well be relevant. Have you looked at the manual for grep on your system? Is -w described? Otherwise, you can exchange -w in for '\<in\>', as was trying to be described in the now deleted answer. It could also be some strange unicode error on OSX, but try the above first. Commented Apr 8, 2012 at 13:34
  • grep "\<den\>" ./sv_enb.txt gives den;it avträden;privies. When I read the man pages they are the same as other man pages for grep, no special information about the command in OSX (or OpenBSD). Commented Apr 8, 2012 at 13:36
