search for a line that contains unmatched delimiters

Question

When pdflatex crashes, my .aux file contains lines like

\@writefile{toc}{\contentsline {section}{\numberline {B

The only way I can think of to identify such lines is to check whether the number of {'s exceeds the number of }'s in any line. I want to check the .aux file generated by pdflatex and determine whether it contains such lines. Is there a way to do this using grep, awk, or some other utility? Of course, if there's an alternative, more efficient way of identifying lines like these, I'd be delighted.

Thanks for any advice.

What sort of output would you like? You could print filename, line number, line (like grep -Hn does), possibly stopping at the first match. Or do you want an exit status indicating success/failure for a single input file? — rowboat
– rowboat, Commented Aug 1, 2021 at 6:49

rowboat · Accepted Answer · 2021-08-01 07:53:43Z

Here's another brief one:

awk '{while(gsub(/{[^{}]*}/, "")){ }} /[{}]/ {exit 1}'

or maybe

awk '{x=$0;while(gsub(/{[^{}]*}/, "")){ }} /[{}]/ {print FILENAME,FNR,x;nextfile}'

This removes all balanced {...}, and takes some action if there is still a { or } character.
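To see the removal loop at work, here is a minimal sketch that prints the line after each pass. `trace_strip` is a hypothetical helper name, not part of the answer. The braces are written as `[{]` and `[}]` on the assumption that some awk implementations may read a bare `{` as the start of an interval expression.

```shell
#!/bin/sh
# Hypothetical helper: show every pass of the balanced-pair removal for one line.
trace_strip() {
  printf '%s\n' "$1" | awk '{
    print "start: " $0
    # Each gsub call removes the innermost {...} groups; repeat until none are left.
    while (gsub(/[{][^{}]*[}]/, "")) print "pass:  " $0
    # Any brace that survives has no partner.
    print (($0 ~ /[{}]/) ? "unbalanced" : "balanced")
  }'
}

trace_strip 'a{b{c}d}e'
trace_strip '{toc}{\contentsline {B'
```

The first call shrinks the line to `a{bd}e`, then to `ae`, and reports it as balanced. The second call is left with two unmatched `{` characters and reports it as unbalanced.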

This was just what I needed. Just for completeness: I've put the line cat myAux.aux | awk '{while(gsub(/{[^{}]*}/, "")){ }} /[{}]/ {exit 1}' in my bash script, and can now branch on whether $? is 0 or 1. — Leo Simon
– Leo Simon, Commented Aug 3, 2021 at 2:24
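The exit-status check described in the comment can also be wrapped in a function and tested directly with `if`, which avoids the extra `cat` and the explicit `$?`. This is only a sketch: `aux_is_balanced` is a hypothetical name, and the demo file comes from `mktemp`, not from the original post.

```shell
#!/bin/sh
# Hypothetical wrapper: status 0 only if every line of the file has balanced braces.
aux_is_balanced() {
  awk '{ while (gsub(/[{][^{}]*[}]/, "")) { } } /[{}]/ { exit 1 }' "$1"
}

# Demo on a throwaway file holding the truncated line from the question.
tmp=$(mktemp)
printf '%s\n' '\@writefile{toc}{\contentsline {section}{\numberline {B' > "$tmp"
if aux_is_balanced "$tmp"; then
  echo "all lines balanced"
else
  echo "unbalanced braces found"
fi
rm -f "$tmp"
```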

score 4 · Accepted Answer · 2021-08-22 18:30:18Z

Yes, it is possible (and very precise) in grep (with PCRE), but not simple to understand.

grep -Px '((?>[^{}]+|\{(?1)\})*)'

Or, defining the input ($str) and an appropriate regex ($re) we can do:

$ printf '%s\n' "$str" | grep -vP "${re//[ $'\n']/}"

How does that work?

Modern regex engines can match balanced constructs; most older engines cannot.

In PCRE, recursion is the key to doing that.

To match a balanced set this structure is needed:

b(m|(?R))*e

Where b is the beginning pattern ({ in your case),
e is the end pattern (} in your case),
and m is the middle pattern (something like [^{}]+ in your case).

{([^{}]*+|(?R))*}
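As a small illustration of the b(m|(?R))*e structure, the sketch below uses `grep -o` to print only the balanced groups found in a line. It assumes GNU grep built with PCRE support (`-P`). `balanced_groups` is a hypothetical name, and the middle pattern is written as `[^{}]++` (one or more non-brace characters, possessive) rather than the `*+` form above.

```shell
#!/bin/sh
# Hypothetical helper: print each balanced {...} group found on stdin, one per line.
# b = \{   m = [^{}]++   e = \}   and (?R) recurses into the whole pattern.
balanced_groups() {
  grep -Po '\{(?:[^{}]++|(?R))*\}'
}

printf '%s\n' 'x {a{b}c} y {d' | balanced_groups
```

This prints `{a{b}c}`. The trailing `{d` is never closed, so it is not part of any match.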

As can be seen in action here.

But that is an unanchored match that recurses into the whole regex via (?R).

An anchored version (to match the whole line) can be obtained by using the -x option of grep.

A complete solution that allows other text outside the braces becomes a bit complex, so it helps to write it with the Perl regex option that ignores whitespace. The regex structure can also be changed to the (somewhat slower) alternative:

((m+|b(?1)e)*)

The original structure, b(m|(?R))*e:

(?(DEFINE)(?'nonbrace' [^{}\n] ))                 # Define a non-brace
(?(DEFINE)(?'begin' { ))                          # Define the start text
(?(DEFINE)(?'end' } ))                            # define the end text
(?(DEFINE)(?'middle' (?&nonbrace) ))              # define the allowed text
                                                  # inside the braces
(?(DEFINE)(?'nested'                              # define a nested
  ((?&begin)((?&middle)|(?&nested))*(?&end))      # pattern
))                                                # here
^((?&nonbrace)*+(?&nested))*+(?&nonbrace)*$       # finally, use this regex.

As tested here.

Or the alternative structure, ((m+|b(?1)e)*):

(?(DEFINE)(?'nonbrace' [^{}\n] ))                 # Define a non-brace
(?(DEFINE)(?'begin' \{ ))                         # Define the start text
(?(DEFINE)(?'end' \} ))                           # define the end text
(?(DEFINE)(?'middle' (?&nonbrace) ))              # define the allowed text
                                                  # inside the braces
(?(DEFINE)(?'nested'                              # define a nested
  ( (?&middle)++ | (?&begin)(?&nested)(?&end) )*
))
^(?&nested)$                                      # finally, use this regex.

as tested here

Note that once the long regex with its many DEFINE groups has been compiled by the regex engine, it runs at the same speed as a shorter one.

The benefit is that the description is clearer for humans (or, at least, I hope so), although it relies on fairly advanced PCRE features.

Script

To use all those ideas with grep (GNU and PCRE), use this shell (bash) example:

#!/bin/bash

str=$'
a
abc
{}
{a}
{{aa}}
{a{b}}
{a{bb}a}
{a{b{c}b}a}
n{a{}}nn{b{bb}}
\@writefile{toc}}}}{\\contentsline {section}{\\numberline {B
\@writefile{toc}{\\contentsline {section}{\\numberline {B
Previous lines contain mismatched braces. This and the next line don\'t.
\@writefile{toc}{\\contentsline {section}{\\numberline {B}}}
'

re=$'
(?(DEFINE)(?\'nonbrace\' [^{}\\n] ))
(?(DEFINE)(?\'begin\' { ))
(?(DEFINE)(?\'end\' } ))
(?(DEFINE)(?\'middle\' (?&nonbrace) ))
(?(DEFINE)(?\'nested\'
((?&begin)((?&middle)|(?&nested))*(?&end))
))
^((?&nonbrace)*(?&nested))*(?&nonbrace)*$
'

printf '%s\n' "$str" | grep -P "${re//[ $'\n']/}"

Output (the lines with balanced braces):

a
abc
{}
{a}
{{aa}}
{a{b}}
{a{bb}a}
{a{b{c}b}a}
n{a{}}nn{b{bb}}
Previous lines contain mismatched braces. This and the next line don't.
\@writefile{toc}{\contentsline {section}{\numberline {B}}}

Test results

And finally, to get all non-matching lines, invert the match with -v (source the script above if you need to execute what follows inside a running shell):

$ printf '%s\n' "$str" | grep -vP "${re//[ $'\n']/}"
\@writefile{toc}}}}{\contentsline {section}{\numberline {B
\@writefile{toc}{\contentsline {section}{\numberline {B

Thanks @StéphaneChazelas Alternative nested structure explained. — user232326
– user232326, Commented Aug 22, 2021 at 18:25

Stéphane Chazelas · Accepted Answer · 2021-08-05 06:07:35Z

A sed translation of @rowboat's awk approach:

sed 'h; s/[^{}]//g; :1
  s/{}//g; t1
  /./!d; g'

That is:

sed '
  h; # save a copy of the line on the hold space
  s/[^{}]//g; # remove all characters but { and }
  :1
    s/{}//g; # remove the {}s (so starting with inner ones)
             # and loop until there is no more {} to remove
  t1
  /./!d; # if the pattern space does not contain any single
         # character, that means all {} were matched. Delete
  g; # otherwise retrieve the saved copy which will be printed
     # at the end of the cycle'

That's POSIX but, like the awk one, it would be much slower than solutions using perl-like recursive regexps such as:

grep -Pvx '((?:[^{}]++|\{(?1)\})*+)'
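A quick way to check that the two approaches agree is to run both on the same sample and compare what they print. The sketch below assumes GNU grep with PCRE support. The function names and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrappers around the two commands shown above.
unbalanced_sed() {
  sed 'h; s/[^{}]//g; :1
    s/{}//g; t1
    /./!d; g'
}
unbalanced_grep() {
  grep -Pvx '((?:[^{}]++|\{(?1)\})*+)'
}

sample=$(printf '%s\n' 'ok {a{b}}' 'bad {a' 'bad }b{' 'plain text')

with_sed=$(printf '%s\n' "$sample" | unbalanced_sed)
with_grep=$(printf '%s\n' "$sample" | unbalanced_grep)

printf '%s\n' "$with_sed"
if [ "$with_sed" = "$with_grep" ]; then
  echo "sed and grep agree"
fi
```

Both commands should print the two lines starting with `bad`. The line `bad }b{` has equal numbers of each brace but is still reported, which a plain count would miss.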

guest_7 · Accepted Answer · 2021-08-05 04:56:51Z

Using awk:

For every record, initialize the sum to zero.
Inspect the line character by character.
Increment the sum on an opening brace and decrement it on a closing brace.
As soon as the sum dips below zero, stop.
At the end of the for loop, whether it ended early because of a negative sum or ran to completion, exit with a nonzero status if the sum is nonzero.
NOTE: this approach is not the same as counting the braces, because processing stops as soon as the sum goes negative.

awk 'BEGIN { a["{"]=1;a["}"]=-1 } { for (s=i=0; i++<length();) if (0>(s += a[substr($0,i,1)])) break } s {exit 1}' file
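The one-liner above can be unpacked into a more readable form. The sketch below follows the same steps but also prints the offending line number, which is an addition of mine. `first_unbalanced` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: report the first line whose braces do not balance, then exit 1.
first_unbalanced() {
  awk '{
    depth = 0
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "{") depth++
      else if (c == "}") depth--
      if (depth < 0) break      # a closing brace with no opener: stop early
    }
    # depth is negative after an early stop and positive if an opener was left unclosed
    if (depth != 0) { print FNR ": " $0; exit 1 }
  }' "$@"
}

printf '%s\n' '{a}{b}' '}{' '{c' | first_unbalanced || echo "unbalanced line found"
```

The second sample line, `}{`, is reported even though it holds one brace of each kind, because the running sum goes negative on its first character.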

The same thing in perl

perl -lne '
  local(%h,$^R) = qw/{ 1 } -1/;
  /(?:(?:([{}])(?{$^R+=$h{$1}})|[^{}]+)(?(?{$^R<0})(?!)))+/g;
  exit 1 if $^R;
' file

Perl has powerful regex features; its regex syntax is almost a mini programming language of its own. Inside the regex, we do the looping, update the sum, and watch for the sum dipping below zero.
