Where is less search pattern reference?

Question

Where can I find a reference for less regex search patterns?

I want to search a file with less using \d to find digits, but it does not seem to understand this wildcard. I tried to find a reference for less regex patterns but could not find anything, either in the man pages or on the Internet.

ilkkachu · Accepted Answer · 2018-07-03 14:31:35Z

40

less's man page says:

 /pattern Search forward in the file for the N-th line containing the pattern. N defaults to 1. The pattern is a regular expression, as recognized by the regular expression library supplied by your system.

so the accepted syntax may depend on your system. Offhand, it seems to accept extended regular expressions on my Debian system; see regex(7) and "Why does my regular expression work in X but not in Y?"

\d is from Perl, and isn't supported by all regex engines. Use [0-9] or [[:digit:]] to match digits. (Their exact behaviour may depend on the locale.)
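A quick way to try these two forms outside less is grep -E, which also takes extended regular expressions. This is only a sketch: the sample text is made up, the -o option is a common extension rather than a POSIX requirement, and your grep is not guaranteed to use the same regex library as your less.

```shell
# Made-up sample text for illustration.
sample='order 66 shipped on day 7'

# Bracket range form; prints 66 and then 7, one match per line.
printf '%s\n' "$sample" | LC_ALL=C grep -Eo '[0-9]+'

# POSIX character class form; same output in the C locale.
printf '%s\n' "$sample" | LC_ALL=C grep -Eo '[[:digit:]]+'
```

Inside less, the same patterns are typed after /, for example /[[:digit:]]+.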


> as recognized by the regular expression library supplied by your system. < so… any way to direct less to libpcre?
– JamesTheAwesomeDude, Commented Jan 28, 2021 at 20:54
I have Debian 10 according to lsb_release. If I run less on a file that contains a's and e's, and use the command /a|e/, less only highlights the a's. If I enter the command /a\|e/ I get pattern not found. This tends to support that less is accepting the basic syntax, according to man re_format. If I made a mistake in trying to invoke extended syntax, please let me know.
– cardiff space man, Commented Jun 9, 2022 at 22:51
@cardiffspaceman, did you put the trailing / there? I don't think less expects that, so it'll look for the string e/ as the other alternative. It has to be extended regexes, since BRE doesn't have the alternation, and in BRE, a|e would look for that literal string.
– ilkkachu, Commented Jun 9, 2022 at 23:02
@ilkkachu leaving off the trailing / does make the command work as expected, extended RE.
– cardiff space man, Commented Jun 15, 2022 at 1:09


Kusalananda · Accepted Answer · 2025-03-03 07:07:14Z

The expressions supported by less are documented in the re_format(7) manual (man 7 re_format). That manual describes both the extended regular expressions and the basic regular expressions available on your system. The less utility understands extended regular expressions.

To match a digit, you would use [0-9] or [[:digit:]] (there's a slight difference as the former depends on the current locale). The \d pattern is a Perl-like regular expression (PCRE), not supported by less.

@DeeNewcum, with that edit, that answer becomes true in even narrower cases (less built with --with-regex=posix on OpenBSD, with the caveat that the [[:<:]], [[:>:]] won't work properly). See my answer for details. — Stéphane Chazelas
– Stéphane Chazelas, Commented Mar 3 at 7:06
@StéphaneChazelas I rolled it back, thanks. It passed beneath my radar. — Kusalananda
– Kusalananda ♦, Commented Mar 3 at 7:07

Stéphane Chazelas · Accepted Answer · 2025-03-04 06:27:36Z

If you run ./configure --help from within the source distribution of current versions of less, you'll see:

    [...]
    Optional Packages:
    [...]
      --with-regex=LIB  select regular expression library (LIB is one of
                        auto,none,gnu,pcre,pcre2,posix,regcmp,re_comp,
                        regcomp,regcomp-local) [auto]
    [...]

The default is auto, meaning it will use the first one available on the system, in this order:

1. posix, meaning the standard regcomp()/regexec() API, called with REG_EXTENDED where available.
2. pcre2, the replacement for the now deprecated pcre.
3. pcre (perl compatible regular expressions).
4. gnu, using the re_compile_pattern()/re_search() API (with re_set_syntax(RE_SYNTAX_POSIX_EXTENDED)).
5. regcmp, an old deprecated API from PWB Unix but still found on some SysV-based commercial or OpenSource Unices.
6. regcomp, not the POSIX regcomp but an older API from research Unix V8 from the early 80s.
7. regcomp-local, same as above, but as supplied in the regexp.c shipped with less.
8. re_comp, the re_comp() and re_exec() functions from 4.0BSD still found on modern BSDs.

What regexp flavour comes with that will depend on the API selected above and on the system and implementation of the API.

To find out what API less uses, less --version will tell you.

Or, when less is dynamically linked against the library that supplies the regex functions (that is, when the code is loaded from shared object files at run time rather than being embedded in the less executable), you can use nm -D /path/to/less to check which external Dynamic symbols less needs and look for the regexp functions among them. For instance:

    $ nm -D /bin/less | grep -E '\<re[g_]|pcre'
                     U re_compile_pattern@GLIBC_2.2.5
                     U regfree@GLIBC_2.2.5
                     U re_search@GLIBC_2.2.5
                     U re_set_syntax@GLIBC_2.2.5

That would indicate that this less executable was built with --with-regex=gnu (and indeed Debian builds less with --with-regex=gnu, see why).

That's confirmed by:

    $ less --version
    less 643 (GNU regular expressions)
    Copyright (C) 1984-2023 Mark Nudelman
    less comes with NO WARRANTY, to the extent permitted by law.
    For information about the terms of redistribution,
    see the file named README in the less distribution.
    Home page: https://greenwoodsoftware.com/less

Compare with a less built with the default options:

    $ nm -D less | grep -E '\<re[g_]|pcre'
                     U regcomp@GLIBC_2.2.5
                     U regexec@GLIBC_2.3.4
                     U regfree@GLIBC_2.2.5
    $ ./less --version
    less 643 (POSIX regular expressions)
    Copyright (C) 1984-2023 Mark Nudelman
    less comes with NO WARRANTY, to the extent permitted by law.
    For information about the terms of redistribution,
    see the file named README in the less distribution.
    Home page: https://greenwoodsoftware.com/less

Or with --with-regex=pcre2:

    $ nm -D less | grep -E '\<re[g_]|pcre'
                     U pcre2_code_free_8
                     U pcre2_compile_8
                     U pcre2_get_error_message_8
                     U pcre2_get_ovector_pointer_8
                     U pcre2_match_8
                     U pcre2_match_data_create_8
                     U pcre2_match_data_free_8
    $ ./less --version
    less 643 (PCRE2 regular expressions)
    Copyright (C) 1984-2023 Mark Nudelman
    less comes with NO WARRANTY, to the extent permitted by law.
    For information about the terms of redistribution,
    see the file named README in the less distribution.
    Home page: https://greenwoodsoftware.com/less
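The banner check can also be scripted. The sketch below assumes the first line of less --version always has the form seen in the transcripts above, "less N (NAME regular expressions)", which may not hold for every version or build. The function name less_regex_lib is made up for illustration.

```shell
# Hypothetical helper: read `less --version` output on standard input and
# print the regex library name found in the parentheses on the first line.
# Prints nothing if the first line does not have the assumed form.
less_regex_lib() {
  sed -n '1s/^.*(\(.*\) regular expressions).*$/\1/p'
}

# Demonstration on a banner line copied from the transcripts; prints GNU.
printf '%s\n' 'less 643 (GNU regular expressions)' | less_regex_lib

# Real use, assuming less is on PATH:
#   less --version | less_regex_lib
```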

On most systems, unless specified otherwise with that --with-regex at build time, the API will most likely be the POSIX one with its regcomp()/regexec() functions as that's the first on that list above, and that should be available on all modern systems.

regcomp() is called with the REG_EXTENDED flag if supported by the regexp library, which would mean you get POSIX Extended regular expressions as implemented on the system.

That means you should have at least the extended regexp operators specified by some version or other of the POSIX standard (at least ^$.\()|{}*+? and bracket expressions, which have been there since the first POSIX.2 edition in 1992), and they should work as specified. Note, though, that some of the advanced locale-dependent things in bracket expressions, such as [[:class:]], [[=character-equivalence=]] and [[.collating-element.]], are not always implemented, especially on embedded systems.
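Some of these operators can be tried from the shell with grep -E, which POSIX also specifies as taking extended regular expressions. This is only a sketch: the sample lines are made up, and grep on a given system is not guaranteed to use the same regexp implementation as less.

```shell
# Made-up sample input, one candidate per line.
sample=$(printf '%s\n' 'a' 'e' 'x' '2024' '20245')

# Alternation: counts the lines containing "a" or "e"; prints 2.
printf '%s\n' "$sample" | grep -Ec 'a|e'

# Anchors, a bracket expression and an interval: counts the lines made of
# exactly four ASCII digits; prints 1.
printf '%s\n' "$sample" | grep -Ec '^[0123456789]{4}$'
```

In less itself, type the pattern after / without a closing slash. As noted in the comments on the first answer, /a|e/ would make the second alternative the string e/.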

Some implementations support extensions over what POSIX specifies, including some borrowed from perl such as \s, \w, *?, or from ex such as \<, \>. The documentation of your system's regcomp() function should give you pointers as to where to find the documentation for the system's extended regexp syntax.

Beware that some, like \<, \> (or their BSD equivalent [[:<:]]/[[:>:]], or the perl equivalent \b for word boundary), don't work well with the POSIX API. For instance, echo ababcab | less +/'\<ab' on systems where \< is supported will highlight the first two abs¹, even though only the first one starts a word.

Depending on the regexp API, the input may or may not be decoded as text as per the locale's charmap, or the API may only support single-byte charsets, or only UTF-8 as a multi-byte charmap. So a ., for instance, may not match a multibyte character but each byte of its encoding.

less's / and ? search operators can also handle the first character(s) you type afterward specially, as indicated in its man page, including:

^R Don't interpret regular expression metacharacters; that is, do a simple textual comparison.

The same applies to printable characters such as @ or !, which you'd need to escape with a \ for them to be passed literally to the regexp engine.

To match an ASCII decimal digit

[0123456789] should work with all APIs and regexp flavours that less may use,
\d will work with pcre/pcre2, including inside bracket expressions ((*UTF)(*UCP)\d or (*UTF)\p{Nd}² would also match all UTF-8 encoded characters classified as decimal digits by Unicode). It might also work in a few others, such as ast-open's as used by ksh93, which does provide a POSIX API (though generally not within bracket expressions). The POSIX extended regexp specification leaves \d unspecified as long as it's outside bracket expressions, allowing implementations to give it whatever special meaning they like.
[[:digit:]] would work in a few, but depending on the locale and/or system it may match some other decimal digits besides the English ASCII ones.
[0-9] should match all of 0123456789 but may also match other characters that happen to sort between 0 and 9 in the locale, including some not normally classified as decimal digits such as 🆛.
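As a rough sketch of the choices above, a wrapper script could pick the pattern from the library name reported in the first line of less --version. The function name digit_pattern is made up for illustration, and the "PCRE" label for builds against the older pcre library is an assumption: only the GNU, POSIX and PCRE2 banners are shown in this answer.

```shell
# Hypothetical helper: print a pattern matching one ASCII decimal digit,
# given the regex library name taken from the `less --version` banner.
digit_pattern() {
  case $1 in
    # \d is understood by PCRE2; "PCRE" is an assumed banner label for
    # builds against the older pcre library.
    PCRE2 | PCRE) printf '%s\n' '\d' ;;
    # The explicit list should work with every API less may use.
    *) printf '%s\n' '[0123456789]' ;;
  esac
}

digit_pattern PCRE2   # prints \d
digit_pattern POSIX   # prints [0123456789]
```

The result can then be passed to less on the command line, as in less +/"$(digit_pattern POSIX)" file.txt, where file.txt stands for whatever file you want to search.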

¹ Using the GNU or PCRE(2) APIs would normally allow this kind of problem to be avoided. Unfortunately, the way less invokes them exhibits the problem there as well: when repeating a search, it passes them only the text after the last match, as it does (and has to do) with the POSIX API, instead of passing the same input and telling them where to start looking. That problem also affects PCRE look-behind operators (edit: now fixed).

² That (*UTF), which forces the input to be interpreted as UTF-8 encoded, is only needed for PCRE2 in UTF-8 locales, not for PCRE. There was explicit code to cover that for PCRE; its omission for PCRE2 looks like a bug to me (edit: now fixed).

Is there a way of figuring out what regular expression library a given less executable was compiled with? — Kusalananda
– Kusalananda ♦, Commented Mar 3 at 7:17
@Kusalananda, nm will help. I'll add a paragraph about that. — Stéphane Chazelas
– Stéphane Chazelas, Commented Mar 3 at 7:38
