I think I found a solution to your problem. The issue is that there are many different flavours, or dialects, of regex: Basic (BRE), Extended (ERE), and Simple (SRE). grep also understands PCRE (Perl-Compatible Regular Expressions) through its -P option. Regex is a rabbit hole.
Solution
My solution, tested on Ubuntu 24.04 LTS (Noble Numbat), uses precisely the Perl flavour. Because of the collation issues aptly pointed out in the comment by @dave_thompson_085, any solution that loosely relies on character ranges such as [a-z] will not be portable.
Here's the solution:
$ echo eá | grep -v -P "[\x80-\xff]"
$ echo á | grep -v -P "[\x80-\xff]"
$ echo e | grep -v -P "[\x80-\xff]"
e
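As a reusable sketch, the command can be wrapped in a small shell function. Forcing the C locale is my own addition, not part of the runs above: in a UTF-8 locale, grep -P reads \xHH as a Unicode code point, so [\x80-\xff] only covers U+0080 to U+00FF (á, é, ç and friends) and would let through characters such as œ (U+0153). Under LC_ALL=C the class is matched byte by byte, and every byte of a multibyte UTF-8 character lies in 0x80-0xFF. The function name ascii_lines is hypothetical.

```shell
# ascii_lines (hypothetical name): print only the lines of stdin that
# contain no byte in the range 0x80-0xFF, i.e. pure-ASCII lines.
# Assumption: LC_ALL=C makes grep -P compare raw bytes, so every byte of a
# multibyte UTF-8 character (á, ç, œ, ...) falls inside [\x80-\xff].
ascii_lines() {
  LC_ALL=C grep -v -P '[\x80-\xff]'
}

# Sample input written with octal UTF-8 bytes so it does not depend on the
# editor's encoding: e, á, eá, œuf, abc
printf 'e\n\303\241\ne\303\241\n\305\223uf\nabc\n' | ascii_lines
# prints:
# e
# abc
```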
How it works
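-P: use Perl-compatible regular expressions
"[\x80-\xff]": match any character with a code above 127, i.e. outside ASCII
-v: invert the match, printing only the lines that do not match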
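With these options, grep rejects every line that contains at least one character above 127, so only pure-ASCII lines get through. Note that this operates line by line: a single accented character discards the whole line. To work at the character level instead (removing or replacing individual characters), a sed command could be devised.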
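For character granularity, the text suggests sed; the sketch below swaps in tr instead (my substitution), because POSIX tr accepts octal byte ranges directly. Under LC_ALL=C (again an assumption, so that tr sees single bytes) it deletes every byte from octal 200 to 377, i.e. 0x80 to 0xFF, which drops non-ASCII characters while keeping the rest of each line. The function name strip_non_ascii is hypothetical.

```shell
# strip_non_ascii (hypothetical name): delete every non-ASCII byte from
# stdin, keeping the ASCII characters of each line (character granularity).
# Octal \200-\377 is 0x80-0xFF; LC_ALL=C makes tr treat input as bytes.
strip_non_ascii() {
  LC_ALL=C tr -d '\200-\377'
}

# "eá" and "ça" (written as octal UTF-8 bytes) become "e" and "a"
printf 'e\303\241\n\303\247a\n' | strip_non_ascii
```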
Other uses
Your use case was filtering echo output. Here's how the same idea works on files. I'm using https://raw.githubusercontent.com/stopwords-iso/stopwords-fr/master/stopwords-fr.txt, a file with 691 lines, one word per line. Since it is French, you'll find plenty of accents and cedillas. Please add a newline at the end of this file so that the following counts add up:
$ cat -n stopwords-fr.txt | wc -l
691
$ cat -n stopwords-fr.txt | grep -P "^[\x00-\x7e]*$" | wc -l
591
$ cat -n stopwords-fr.txt | grep -v -P "^[\x00-\x7e]*$" | wc -l
100
$ cat stopwords-fr.txt | grep -v -P "[\x7f-\xff]" | wc -l
591
Notice in these runs that commands #2 and #4 yield the same result, since they are two sides of the same operation. With #2, you're asking for all lines composed entirely ("^...$") of characters from \x00 to \x7e (a positive match), whereas with #4 you're asking for all lines that contain no character at or above \x7f (a negative match). Likewise, #2 and #3 are complements of each other, which is why 591 + 100 = 691.
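This second form requires that you leave out both anchors (^ and $) and the repetition (*): "every character is in the set" is the same as "no character is outside the set", so the negated pattern only needs to find a single offending character anywhere in the line. Please refer to De Morgan's laws for why, for a review, or just for fun.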
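A small check of both points above, on a made-up sample file rather than the stopwords list (the name sample.txt and its four lines are assumptions): the positive count equals the negated count, and the positive count plus its complement gives the total. LC_ALL=C is again my assumption, so that the classes are matched byte by byte.

```shell
# Hypothetical sample: e, á, ça, abc (non-ASCII written as octal UTF-8 bytes)
printf 'e\n\303\241\n\303\247a\nabc\n' > sample.txt

# Positive match (like #2): lines made entirely of \x00-\x7e, anchors and *
pos=$(LC_ALL=C grep -c -P '^[\x00-\x7e]*$' sample.txt)
# Complement of the positive match (like #3)
rest=$(LC_ALL=C grep -c -v -P '^[\x00-\x7e]*$' sample.txt)
# Negative match (like #4): no character in \x7f-\xff, no anchors, no *
neg=$(LC_ALL=C grep -c -v -P '[\x7f-\xff]' sample.txt)
# $(( )) strips the padding some wc implementations add
total=$(( $(wc -l < sample.txt) ))

echo "total=$total pos=$pos rest=$rest neg=$neg"   # total=4 pos=2 rest=2 neg=2
rm -f sample.txt
```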
Final notes
Control characters
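You won't find many characters below 32 (\x20, the SPACE) in ordinary text: they are control characters. With this in mind, for the positive matches, [\x00-\x7e] and [\x20-\x7e] are largely interchangeable. The notable exception is the tab (\x09): cat -n, used above, separates each line number from the line with a tab, so with [\x20-\x7e] command #2 would match no lines at all.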
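To see where the two classes differ, here is a sketch on a line containing a tab (\x09), the control character most likely to appear in practice, plus a cat -n line, which carries a tab after its number. The sample strings are made up; `|| true` only keeps a zero count (grep exits with status 1 in that case) from aborting a script run under set -e.

```shell
# A line with a tab between two letters
line=$(printf 'a\tb')

# [\x00-\x7e] accepts the tab, [\x20-\x7e] does not
with_ctrl=$(printf '%s\n' "$line" | grep -c -P '^[\x00-\x7e]*$' || true)
printable=$(printf '%s\n' "$line" | grep -c -P '^[\x20-\x7e]*$' || true)

# cat -n output ("     1<TAB>abc") is rejected by [\x20-\x7e] as well
numbered=$(printf 'abc\n' | cat -n | grep -c -P '^[\x20-\x7e]*$' || true)

echo "with_ctrl=$with_ctrl printable=$printable numbered=$numbered"
# with_ctrl=1 printable=0 numbered=0
```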
Really only letters
If you're really interested in letters only, you could further constrain the regular expression by replacing [\x20-\x7e] with [\x41-\x5a\x61-\x7a]. This means:
[\x41-\x5a]: UPPERCASE (A to Z)
[\x61-\x7a]: lowercase (a to z)
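Be careful if you want to separate the two segments with a | to make things more legible: inside square brackets, | is not a logical OR but a literal pipe character, so [\x41-\x5a|\x61-\x7a] would also accept |. An explicit OR has to sit outside the brackets, as in ^(?:[\x41-\x5a]|[\x61-\x7a])*$. The runs below simply put both ranges in one bracket expression.
Here are some runs:
$ echo ASD | grep -P "^[\x41-\x5a\x61-\x7a]*$"
ASD
$ echo fgh | grep -P "^[\x41-\x5a\x61-\x7a]*$"
fgh
$ echo 123 | grep -P "^[\x41-\x5a\x61-\x7a]*$"
$ echo ASDfgh | grep -P "^[\x41-\x5a\x61-\x7a]*$"
ASDfgh
$ echo "ASDfgh " | grep -P "^[\x41-\x5a\x61-\x7a]*$"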
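The difference between a | inside the brackets and real alternation shows up on a line that contains a pipe. The sample string is made up; the second pattern uses a PCRE non-capturing group (?:...) to write the OR explicitly, and `|| true` only keeps a zero count from aborting a script run under set -e.

```shell
# Inside [...], | is a literal character, so this class also accepts "|"
in_class=$(echo 'AB|cd' | grep -c -P '^[\x41-\x5a|\x61-\x7a]*$' || true)

# Alternation outside the brackets really means "uppercase OR lowercase"
alternation=$(echo 'AB|cd' | grep -c -P '^(?:[\x41-\x5a]|[\x61-\x7a])*$' || true)

echo "in_class=$in_class alternation=$alternation"   # in_class=1 alternation=0
```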
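Even a single SPACE will kill the line. If you want all letters plus SPACE, the space has to go inside the square brackets; placed before them, as in the first command below, it only requires the line to start with a space:
$ echo "ASDfgh " | grep -P "^ [\x41-\x5a\x61-\x7a]*$"
$ echo "ASDfgh " | grep -P "^[ \x41-\x5a\x61-\x7a]*$"
ASDfgh
In the second command, the space added right after the opening bracket becomes part of the allowed set.
Notice that the two ranges sit side by side here, with no | between them.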
Happy grepping!
grep uses the current locale. If your locale's collation order puts accented characters with the plain letters, then they are included. C is a standard locale that does NOT collate accented letters together, e.g. assuming your data is in UTF-8 (likely nowadays, especially if it contains non-ASCII):
(echo e; echo é) | LANG=C.UTF-8 grep '[a-z]'