
I am trying to use pcregrep to print the first line after a blank line. For example, given a file with these contents:

    first line

    second line

I need "second line" to be printed. Here are a few tests, all using the same regular expression.

With Python 2.7

    python -c "import re; print re.search(r'(?<=\n\n).*?$',\
        open('file').read(), re.MULTILINE).group()"
    second line

With GNU grep 2.16

    grep -oPz '(?<=\n\n).*?$' file
    second line

With pcregrep version 8.12

    pcregrep -Mo '(?<=\n\n).*?$' file
    (no output)

Based on a few tests, pcregrep supports lookbehind assertions in general but does not seem to be able to deal with \n within lookbehind assertions in particular. \n within lookahead assertions presents no problem.
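For reference, here is a minimal Python 3 sketch of the behaviour the tests above expect, run against an in-memory string instead of a file. The SAMPLE constant and both function names are made up for illustration, and the lookahead pattern is only one possible example of a lookahead containing \n. It shows what Python's re module does with these patterns; it says nothing about pcregrep's internals.

```python
import re

# Hypothetical sample mirroring the file in the question:
# "first line", a blank line, then "second line".
SAMPLE = "first line\n\nsecond line\n"


def first_line_after_blank(text):
    """Return the first line that follows a blank line, or None."""
    # Lookbehind containing \n: assert two newlines sit just before the match.
    match = re.search(r"(?<=\n\n).*?$", text, re.MULTILINE)
    return match.group() if match else None


def last_line_before_blank(text):
    """Return the line that precedes a blank line, or None."""
    # Lookahead containing \n: assert two newlines follow the match.
    match = re.search(r"^.*(?=\n\n)", text, re.MULTILINE)
    return match.group() if match else None


if __name__ == "__main__":
    print(first_line_after_blank(SAMPLE))
    print(last_line_before_blank(SAMPLE))
```

In both cases the newlines are only asserted, never consumed, so they do not appear in the returned text.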

Tested on RHEL as well as Ubuntu. Any ideas?

  • Fedora 19's version (pcregrep version 8.32 2012-11-30) does the same thing. Commented May 2, 2014 at 0:54

1 Answer


Apparently you can tell pcregrep which type of newline to look for. The -N switch does this when using PCRE mode. From the man page:

-N newline-type, --newline=newline-type

The PCRE library supports five different conventions for indicating the ends of lines. They are the single-character sequences CR (carriage return) and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention, which recognizes any of the preceding three types, and an "any" convention, in which any Unicode line ending sequence is assumed to end a line. The Unicode sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029).

When the PCRE library is built, a default line-ending sequence is specified. This is normally the standard sequence for the operating system. Unless otherwise specified by this option, pcregrep uses the library's default. The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This makes it possible to use pcregrep to scan files that have come from other environments without having to modify their line endings. If the data that is being scanned does not agree with the convention set by this option, pcregrep may behave in strange ways. Note that this option does not apply to files specified by the -f, --exclude-from, or --include-from options, which are expected to use the operating system's standard newline sequence.
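Since the man page warns about data that does not agree with the chosen convention, it can help to check which line endings a file actually uses before picking a value for -N. The following is a rough Python 3 sketch under that assumption; the function names and the two byte-string samples are hypothetical. It also shows that a literal LF LF lookbehind cannot match CRLF data, at least in Python's re module.

```python
import re


def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF, and lone CR sequences in raw bytes."""
    crlf = len(re.findall(rb"\r\n", data))
    lf = len(re.findall(rb"(?<!\r)\n", data))   # LF not preceded by CR
    cr = len(re.findall(rb"\r(?!\n)", data))    # CR not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def blank_line_lookbehind_matches(data: bytes) -> bool:
    """Check whether the LF LF lookbehind from the question can match."""
    return re.search(rb"(?<=\n\n).*?$", data, re.MULTILINE) is not None


# Hypothetical samples: the same three lines with Unix and DOS endings.
unix_sample = b"first line\n\nsecond line\n"
dos_sample = b"first line\r\n\r\nsecond line\r\n"

if __name__ == "__main__":
    for name, sample in (("unix", unix_sample), ("dos", dos_sample)):
        print(name, count_line_endings(sample),
              blank_line_lookbehind_matches(sample))
```

To inspect a real file, read it in binary mode (for example open('sample.txt', 'rb').read()) so that Python does not translate the line endings first.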

Example

    $ pcregrep -Mo -N CRLF '(?<=\n\n).*?$' sample.txt
    second line
    $

Other odd behavior

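Interestingly, changing the lookbehind (?<=\n\n) to (?>\n\n) also yields results. Note that (?>...) is an atomic group rather than a lookahead, so the two newlines are consumed as part of the match instead of merely being asserted: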

    $ pcregrep -Mo '(?>\n\n).*?$' sample.txt
    second line
    $
  • slm, great observation, this seems to work. But funnily enough, pcregrep can handle \n in look-ahead assertions without the need for -N CRLF! Additionally, the newlines in my file are LF not CRLF which makes the apparent success of this technique all the more puzzling! Commented May 2, 2014 at 1:21
  • @slm also doesn't appear to explain why pcregrep -Mo '\n\n\K.*$' file does appear to work (at least on my Ubuntu 12.04 box - pcregrep version 8.12) Commented May 2, 2014 at 1:28
  • @steeldriver - that works for me as well on 8.32. Commented May 2, 2014 at 1:35
  • @1_CR - it's interesting that it doesn't appear to work with ANYCRLF or ANY either. Commented May 2, 2014 at 1:39
  • Source for pcregrep.c is here: ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/… Commented May 2, 2014 at 1:42
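
The \n\n\K.*$ variant mentioned in the comments consumes the two newlines and then discards them from the reported match. Python's standard re module has no \K, so the closest analogue is a capture group. A minimal sketch, with a hypothetical function name:

```python
import re


def after_blank_line(text):
    """Consume the two newlines, then capture the rest of the line."""
    # The newlines are matched outside the group, so group(1) excludes
    # them, much as \K drops everything matched before it in PCRE.
    match = re.search(r"\n\n(.*)$", text, re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(after_blank_line("first line\n\nsecond line\n"))
```

Unlike the lookbehind version, this approach matches the newlines directly and does not depend on lookbehind handling at all.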
