How to find unmatched brackets in a text file?

Question

Today I learned that I can use perl -c filename to find unmatched curly brackets {} in arbitrary files, not necessarily Perl scripts. The problem is that it doesn't work with other types of brackets, such as () [] and maybe <>. I also experimented with several Vim plugins that claim to help find unmatched brackets, but so far without much success.

I have a text file with quite a few brackets and one of them is missing! Is there any program / script / vim plugin / whatever that can help me identify the unmatched bracket?

Michael Mrozek · Accepted Answer · 2011-03-30 04:44:18Z

27

In Vim you can use [ and ] to quickly travel to the nearest unmatched bracket of the type entered in the next keystroke.

So [{ will take you back up to the nearest unmatched "{"; ]) would take you ahead to the nearest unmatched ")", and so on.

edited Mar 30, 2011 at 4:44

Michael Mrozek


answered Mar 29, 2011 at 10:59

Shadur-don't-feed-the-AI


8

I will also add that in vim you can use % (Shift 5, in the USA) to immediately find the matching bracket for the one you're on.

– atroon, Commented Mar 29, 2011 at 14:31
5

Unfortunately, this does not work for square brackets. [[ and ]] actually jump to the previous and next "{" in the first column, respectively.

– madmax1, Commented Aug 29, 2019 at 12:09


Peter.O · Accepted Answer · 2011-03-30 12:59:25Z

Update 2:
The following script now prints out the line number and column of a mismatched bracket. It processes one bracket type per scan (i.e. '[]' '<>' '{}' '()' ...).
The script identifies the first unmatched right bracket, or the first un-paired left bracket. On detecting an error, it exits with the line and column numbers.

Here is some sample output...

File = /tmp/fred/test/test.in
Pair = ()

*INFO: Group 1 contains 1 matching pairs

ERROR: *END-OF-FILE* encountered after Bracket 7.
 A Left "(" is un-paired in Group 2.
 Group 2 has 1 un-paired Left "(".
 Group 2 begins at Bracket 3.
 see: Line, Column (8, 10)
 ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
000008 ( ) ( ( ( ) )

Here is the script...

#!/bin/bash

# Identify the script
bname="$(basename "$0")"
# Make a work dir
wdir="/tmp/$USER/$bname"
[[ ! -d "$wdir" ]] && mkdir -p "$wdir"

# Arg1: The bracket pair 'string'
pair="$1"
# pair='[]' # test
# pair='<>' # test
# pair='{}' # test
# pair='()' # test

# Arg2: The input file to test
ifile="$2"
  # Build a test source file
  # NOTE: this block overrides Arg2; remove it to check your own file
  ifile="$wdir/$bname.in"
  cp /dev/null "$ifile"
  while IFS= read -r line ;do
    echo "$line" >> "$ifile"
  done <<EOF
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[ ] [ [ [
< > < < > < > > >
----+----1----+----2----+----3----+----4----+----5----+----6
{ } { } } } }
( ) ( ( ( ) )
ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
EOF

echo "File = $ifile"
# Count how many: Left, Right, and Both
left=${pair:0:1}
rght=${pair:1:1}
echo "Pair = $left$rght"
# Make a stripped-down 'skeleton' of the source file - brackets only
skel="/tmp/$USER/$bname.skel"
cp /dev/null "$skel"
# Make a String Of Brackets file ... (It is tricky manipulating bash strings with []..
sed 's/[^'${rght}${left}']//g' "$ifile" > "$skel"
< "$skel" tr -d '\n' > "$skel.str"
Left=($(<"$skel.str" tr -d "$left" |wc -m -l)); LeftCt=$((${Left[1]}-${Left[0]}))
Rght=($(<"$skel.str" tr -d "$rght" |wc -m -l)); RghtCt=$((${Rght[1]}-${Rght[0]}))
yBkts=($(sed -e "s/\(.\)/ \1 /g" "$skel.str"))
BothCt=$((LeftCt+RghtCt))
eleCtB=${#yBkts[@]}
echo

if (( eleCtB != BothCt )) ; then
  echo "ERROR: array Item Count ($eleCtB)"
  echo " should equal BothCt ($BothCt)"
  exit 1
else
  grpIx=0 # Keep track of Groups of nested pairs
  eleIxFir[$grpIx]=0 # Ix of First Bracket in a specific Group
  eleCtL=0 # Count of Left brackets in current Group
  eleCtR=0 # Count of Right brackets in current Group
  errIx=-1 # Ix of an element in error.
  for (( eleIx=0; eleIx < eleCtB; eleIx++ )) ; do
    if [[ "${yBkts[eleIx]}" == "$left" ]] ; then
      # Left brackets are 'okay' until proven otherwise
      ((eleCtL++)) # increment Left bracket count
    else
      ((eleCtR++)) # increment Right bracket count
      # Right brackets are 'okay' until their count exceeds that of Left brackets
      if (( eleCtR > eleCtL )) ; then
        echo
        echo "ERROR: MIS-matching Right \"$rght\" in Group $((grpIx+1)) (at Bracket $((eleIx+1)) overall)"
        errType=$rght
        errIx=$eleIx
        break
      elif (( eleCtL == eleCtR )) ; then
        echo "*INFO: Group $((grpIx+1)) contains $eleCtL matching pairs"
        # Reset the element counts, and note the first element Ix for the next group
        eleCtL=0
        eleCtR=0
        ((grpIx++))
        eleIxFir[$grpIx]=$((eleIx+1))
      fi
    fi
  done
  #
  if (( eleCtL > eleCtR )) ; then
    # Left brackets are always potentially valid (until EOF)...
    # so, this 'error' is the last element in array
    echo
    echo "ERROR: *END-OF-FILE* encountered after Bracket $eleCtB."
    echo " A Left \"$left\" is un-paired in Group $((grpIx+1))."
    errType=$left
    unpairedCt=$((eleCtL-eleCtR))
    errIx=$((${eleIxFir[grpIx]}+unpairedCt-1))
    echo " Group $((grpIx+1)) has $unpairedCt un-paired Left \"$left\"."
    echo " Group $((grpIx+1)) begins at Bracket $((eleIxFir[grpIx]+1))."
  fi

  # On error, get Line and Column numbers
  if (( errIx >= 0 )) ; then
    errLNum=0 # Source Line number (current).
    eleCtSoFar=0 # Count of bracket-elements in lines processed so far.
    errItemNum=$((errIx+1)) # error Ix + 1 (ie. "1 based")
    # Read the skeleton file to find the error line-number
    while IFS= read -r skline ; do
      ((errLNum++))
      brackets="${skline//[^"${rght}${left}"]/}" # remove whitespace
      ((eleCtSoFar+=${#brackets}))
      if (( eleCtSoFar >= errItemNum )) ; then
        # We now have the error line-number
        # ..now get the relevant Source Line
        excerpt=$(< "$ifile" tail -n +$errLNum |head -n 1)
        # Homogenize the brackets (to be all "Left"), for easy counting
        mogX="${excerpt//$rght/$left}"; mogXCt=${#mogX} # How many 'Both' brackets on the error line?
        if [[ "$errType" == "$left" ]] ; then
          # R-Trunc from the error element [inclusive]
          ((eleTruncCt=eleCtSoFar-errItemNum+1))
          for (( ele=0; ele<eleTruncCt; ele++ )) ; do
            mogX="${mogX%"$left"*}" # R-Trunc (Lazy)
          done
          errCNum=$((${#mogX}+1))
        else # errType=$rght
          mogX="${mogX%"$left"*}" # R-Trunc (Lazy)
          errCNum=$((${#mogX}+1))
        fi
        echo " see: Line, Column ($errLNum, $errCNum)"
        echo " ----+----1----+----2----+----3----+----4----+----5----+----6----+----7"
        printf "%06d $excerpt\n\n" $errLNum
        break
      fi
    done < "$skel"
  else
    echo "*INFO: OK. All brackets are paired."
  fi
fi
exit
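For comparison, the same one-pair-per-scan idea can be sketched in Python with a stack instead of the group counting used by the script above. This is only a minimal sketch, not a port of the script: the function name first_unmatched and its return format are made up for illustration. It reports the first right bracket that has no opener or, failing that, the outermost left bracket still open at end of input.

```python
def first_unmatched(text, pair="()"):
    """Return (kind, line, column) for the first problem found, or None.

    kind is "right" for a closing bracket with no opener, or "left" for an
    opener that is still unclosed at end of input. Line and column are 1-based.
    The name and return format are hypothetical, chosen for this sketch.
    """
    left, right = pair[0], pair[1]
    open_positions = []  # stack of (line, column) for unclosed left brackets
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch == left:
                open_positions.append((line_no, col_no))
            elif ch == right:
                if not open_positions:
                    # A right bracket with nothing left to close
                    return ("right", line_no, col_no)
                open_positions.pop()
    if open_positions:
        # Bottom of the stack is the outermost left bracket never closed
        line_no, col_no = open_positions[0]
        return ("left", line_no, col_no)
    return None


if __name__ == "__main__":
    print(first_unmatched("( ) ( ( ( ) )", "()"))
```

Like the script, it has to be run once per bracket type, for example first_unmatched(text, "[]") and then first_unmatched(text, "{}").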

This is awesome, but it seems to always print Line, Column (8, 10) no matter which file I try it on. Also mogXCt=${#mogX} is set but not used anywhere.
– Clayton Dukes, Commented Nov 8, 2017 at 3:39

Community · Accepted Answer · 2017-05-23 11:33:33Z

The best option is vim/gvim as identified by Shadur, but if you want a script, you can check my answer to a similar question on Stack Overflow. I repeat my whole answer here:

If what you are trying to do applies to a general purpose language, then this is a non-trivial problem.

To start with, you will have to worry about comments and strings. If you want to check this on a programming language that uses regular expressions, that makes your quest harder still.

So before I can come in and give you any advice on your question I need to know the limits of your problem area. If you can guarantee that there are no strings, no comments and no regular expressions to worry about - or more generically nowhere in the code that brackets can possibly be used other than for the uses for which you are checking that they are balanced - this will make life a lot simpler.

Knowing the language that you want to check would be helpful.

If I take the hypothesis that there is no noise, i.e. that all brackets are useful brackets, my strategy would be iterative:

I would simply look for and remove all inner bracket pairs: those that contain no brackets inside. This is best done by collapsing all lines to a single long line (and finding a mechanism to add line references, should you need to get that information out). In this case the search and replace is pretty simple:

It requires an array:

B["("]=")"; B["["]="]"; B["{"]="}"

And a loop through those elements:

for (b in B) {gsub("[" b "][^][(){}]*[" B[b] "]", "", $0)}

My test file is as follows:

#!/bin/awk

($1 == "PID") {
  fo (i=1; i<NF; i++) {
    F[$i] = i
  }
}

($1 + 0) > 0 {
  count("VIRT")
  count("RES")
  count("SHR")
  count("%MEM")
}

END {
  pintf "VIRT=\t%12d\nRES=\t%12d\nSHR=\t%12d\n%%MEM=\t%5.1f%%\n", C["VIRT"], C["RES"], C["SHR"], C["%MEM"]
}

function count(c[) {
  f=F[c];

  if ($f ~ /m$/) {
    $f = ($f+0) * 1024
  }

  C[c]+=($f+0)
}

My full script (without line referencing) is as follows:

cat test-file-for-brackets.txt | \
  tr -d '\r\n' | \
  awk \
  '
    BEGIN {
      B["("]=")"; B["["]="]"; B["{"]="}"
    }
    {
      m=1;
      while(m>0) {
        m=0;
        for (b in B) {
          m+=gsub("[" b "][^][(){}]*[" B[b] "]", "", $0)
        }
      };
      print
    }
  '

The output of that script stops at the innermost illegal uses of brackets. But beware: 1/ this script will not work with brackets in comments, regular expressions or strings, 2/ it does not report where in the original file the problem is located, 3/ although it removes all balanced pairs, it stops at the innermost error conditions and keeps all the enclosing brackets.
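The same iterative removal can be sketched in Python, which may be easier to experiment with. This is a rough equivalent of the awk loop under the same no-noise hypothesis, not a drop-in replacement; the names INNER_PAIR and strip_balanced are invented for this sketch.

```python
import re

# One alternative per bracket type; each matches a pair with no brackets inside.
INNER_PAIR = re.compile(
    r"\([^\]\[(){}]*\)"    # ( ... )
    r"|\[[^\]\[(){}]*\]"   # [ ... ]
    r"|\{[^\]\[(){}]*\}"   # { ... }
)


def strip_balanced(text):
    """Delete innermost pairs (and their contents) until none are left."""
    # Collapse to one long line, as the tr -d '\r\n' step does
    text = text.replace("\r", "").replace("\n", "")
    while True:
        text, removed = INNER_PAIR.subn("", text)
        if removed == 0:
            return text


if __name__ == "__main__":
    print(strip_balanced("f(a[1]) { g(x } )"))
```

For the input in the last line, the balanced (a[1]) disappears and the residue is f { g(x } ), which shows the crossed { ( } ) that no further pass can remove.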

Point 3/ is probably an exploitable result, though I'm not sure of the reporting mechanism you had in mind.

Point 2/ is relatively easy to implement but takes more than a few minutes work to produce, so I'll leave it up to you to figure out.
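One possible way to get line references, sketched here in Python as an assumption rather than as the intended solution: instead of deleting each balanced pair, overwrite it with spaces and keep the newlines. Offsets then stay valid, and whatever brackets survive can be reported with their line and column. The names blank and leftover_brackets are hypothetical.

```python
import re

# A pair of any one type with no brackets inside (newlines are allowed inside).
INNER_PAIR = re.compile(
    r"\([^\]\[(){}]*\)|\[[^\]\[(){}]*\]|\{[^\]\[(){}]*\}"
)


def blank(match):
    # Overwrite the matched pair and its contents with spaces, keeping newlines
    return re.sub(r"[^\n]", " ", match.group())


def leftover_brackets(text):
    """Return [(line, column, bracket)] for brackets that survive pair removal."""
    while True:
        text, changed = INNER_PAIR.subn(blank, text)
        if changed == 0:
            break
    found = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in "()[]{}":
                found.append((line_no, col_no, ch))
    return found


if __name__ == "__main__":
    for line_no, col_no, ch in leftover_brackets("a = (1 +\n  [2, 3)\n"):
        print(f"line {line_no}, column {col_no}: {ch}")
```

The loop terminates because every substitution blanks out two brackets, and blanked text can never match again.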

Point 1/ is the tricky one because you enter a whole new realm of competing, sometimes nested, beginnings and endings, or special quoting rules for special characters...
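To give a feel for the simplest case only, here is a Python sketch that blanks out noise before any bracket check runs. It assumes a made-up syntax with just two kinds of noise, double-quoted strings with backslash escapes and # comments running to end of line. Regular expression literals, multi-line strings, and any other quoting rule are deliberately not handled. The names NOISE and mask_noise are hypothetical.

```python
import re

# Assumed syntax (not a real language definition):
#   "double-quoted strings" with backslash escapes, confined to one line
#   # comments that run to the end of the line
# Scanning left to right means a # inside a string is consumed by the string,
# and a quote inside a comment is consumed by the comment.
NOISE = re.compile(r'"(?:\\.|[^"\\\n])*"|#[^\n]*')


def mask_noise(text):
    """Replace strings and comments with spaces so offsets stay valid."""
    return NOISE.sub(lambda m: " " * len(m.group()), text)


if __name__ == "__main__":
    line = 'x = "(" # )'
    print(repr(mask_noise(line)))
```

Because the replacement keeps the text the same length, line and column numbers computed on the masked text still point at the right place in the original file.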
