I am trying to find statements that start with create and end with ;. A match may span multiple lines. I wanted to use grep for this, and after searching the internet I found a way to do it.

The following command does it:

grep -zioE 'create (\w|\W|\n)*?;' Day1.sql | less
# Output
create schema sigmoid_db;
create table instructor( ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8,2));

What I want to ask is: why doesn't the same command work without \n? I would expect the following command to produce the same output:

grep -zioE 'create (\w|\W)*?;' Day1.sql | less
# Output
create schema sigmoid_db;

My reasoning is that \w|\W should match any character. But the second command doesn't print the matches that span multiple lines.

Can anyone explain why?

  • In any case, *? for non-greedy * is a Perl operator. Very few grep implementations support it with -E. Commented Feb 10, 2023 at 11:02
  • Try pcregrep -Mio '(?s)^\h*create .*?;'. Commented Feb 10, 2023 at 11:04
  • @StéphaneChazelas I am not asking for a command. I want to understand the behaviour of these commands. Thanks for your suggestion though. Commented Feb 10, 2023 at 11:10
  • I can't reproduce with GNU grep 3.7. Commented Feb 10, 2023 at 11:13
  • This is my grep version: grep (BSD grep, GNU compatible) 2.6.0-FreeBSD. Commented Feb 10, 2023 at 11:14
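
A side note on the non-greedy *? mentioned in the comments: when the goal is "everything up to the next ;", a negated bracket expression gives the same result using only POSIX extended syntax. The following is a minimal sketch; the sample input is made up for illustration and deliberately stays on one line, because whether [^;] crosses a newline under -z is exactly what seems to differ between grep implementations in this thread.

```shell
# Hypothetical one-line input holding two statements.
input='create a; create b;'

# [^;]* cannot run past a semicolon, so each match stops at the first ';'
# without needing the Perl-style non-greedy quantifier.
matches=$(printf '%s\n' "$input" | grep -oE 'create [^;]*;')

printf '%s\n' "$matches"
```

With -o, each match is printed on its own line, so the two statements come out separately.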

1 Answer

The \n symbol is a line feed (newline), not a carriage return (which is \r). It is a special character that separates one line from the next.

Any text file is actually one long string, like:

first\nsecond\nthird\n 

which is printed on the screen as

first
second
third

grep splits the input file into lines and processes them one by one. If you want a multi-line pattern to be found, you must use \n in the appropriate place in the regular expression.

That is why the pattern create (\w|\W)*?; found only the single-line match.

And no, control characters (and \n is one of them) are not considered members of the groups "letter" (\w) or "non-letter" (\W). They form a group of their own and have to be matched explicitly.
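
The line-by-line behaviour described above can be checked with a small sketch. It assumes grep is run without -z, so records end at each newline; the sample text is borrowed from the comments under this answer.

```shell
# Two lines of input: "a<TAB>b" and "c<TAB>d".
input=$(printf 'a\tb\nc\td')

# A pattern that fits inside the second line is found.
same_line=$(printf '%s\n' "$input" | grep -o 'c.*d')

# A pattern that would have to cross the newline matches no line, so the
# count is 0. grep exits with status 1 when nothing matches, hence "|| true".
cross_line=$(printf '%s\n' "$input" | grep -c 'a.*d' || true)

printf 'same line: %s\ncross-line count: %s\n' "$same_line" "$cross_line"
```

This only shows the default mode; it says nothing about what happens once -z changes the record separator.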

  • I tried my expression in an online regex tester and it works fine. \W means any character other than [a-zA-Z0-9_], which includes \n as well. Commented Feb 10, 2023 at 15:41
  • @Dhruv Yes, that is possible. There are several regexp variations. With grep you can choose one of four major matching modes (options -E, -F, -G, -P). And even then, different versions of grep can differ in how they process regexps. Your grep in "extended" mode (you have the -E option in the command) does not include \n in \W. The online regexp checker most likely uses the JavaScript or Perl flavor, which do include it. Run your command with -P instead of -E and see the difference. Commented Feb 11, 2023 at 2:39
  • I think it's a bug related to \n, as pointed out by @Stéphane Chazelas, because I tried a similar thing with \t and it worked. E.g. echo $'a\tb\nc\td' | grep -zEo 'c[^;]*d' outputs c[8 spaces]d, but echo $'a\tb\nc\td' | grep -zEo 'a[^;]*d' outputs nothing. Commented Feb 11, 2023 at 12:06
  • @Dhruv No bugs. Your sample string is treated as two separate lines by grep: "a\tb" and "c\td". So it can find a pattern that includes "c" and "d", but the letters "a" and "d" are on different lines, so nothing is found. Commented Feb 11, 2023 at 13:02
  • No, I used the -z flag, so it doesn't treat the input as two separate lines. E.g. for echo $'a\nb' | grep -z '\n' | less the output is a b ^@ (each on a new line). ^@ represents the null character. The input is treated as a null-terminated string, so \n matches part of the line and grep prints the matching line (the whole line itself). Commented Feb 11, 2023 at 13:13
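
The effect of -z on record splitting, as described in the last comment, can be sketched by counting records with an empty pattern, which matches every record. This assumes a grep that supports -z (--null-data), such as GNU grep or the BSD grep mentioned in the question; it is not a POSIX option.

```shell
# Without -z, records end at each newline: "a" and "b" are two records.
line_records=$(printf 'a\nb\n' | grep -c '')

# With -z, records end at NUL bytes. The input has none, so the whole
# input, newlines included, is a single record.
nul_records=$(printf 'a\nb\n' | grep -zc '')

printf 'newline-separated: %s\nNUL-separated: %s\n' "$line_records" "$nul_records"
```

Whether \W or [^;] then matches the newline inside that single record is a separate matter and appears to depend on the grep implementation.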
