For example:
sed 's/\u0091//g' file1
Right now, I have to run hexdump to get the hex bytes and put them into sed as follows:
$ echo -ne '\u9991' | hexdump -C
00000000  e9 a6 91                                          |...|
00000003
And then:
$ sed 's/\xe9\xa6\x91//g' file1

Just use that syntax:
sed 's/馑//g' file1
Or in the escaped form:
sed "s/$(echo -ne '\u9991')//g" file1
(Note that older versions of Bash and some shells do not understand echo -e '\u9991', so check first.)
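If your shell turns out not to understand \u9991, one fallback is to spell out the UTF-8 bytes yourself with printf octal escapes, which POSIX printf supports. This is only a minimal sketch: the variable name ch is made up for the example, and the sample string is borrowed from another answer in this thread.

```shell
# U+9991 encodes to the UTF-8 bytes e9 a6 91, which are 351 246 221 in octal.
# "ch" is a hypothetical variable name used only for this sketch.
ch=$(printf '\351\246\221')

# Delete every occurrence of the character from a sample string.
printf '饥馑荐臻\n' | sed "s/$ch//g"
```

The same "s/$ch//g" expression can be pointed at file1 instead of the sample string.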
Does echo 馑 | sed s/...// print anything?
sed has the g modifier, so it replaces all occurrences, even when they follow each other. Also, sed should count it as one character: echo -ne "馑" | wc -m gives 1. If you count the bytes (wc -c), it returns 3. Did I understand your question correctly?
Does . mean "one character" or "one byte"?
echo 馑 | sed s/...// gives me 馑 (nothing is replaced) under en_US.UTF-8, but doesn't under C.

Perl can do that:
echo 汉典“馑”字的基本解释 | perl -CS -pe 's/\N{U+9991}/Jin/g'
-CS turns on UTF-8 for standard input, output and error.
echo 汉典“馑”字的基本解释 | raku -pe 's:g/\x9991/Jin/'
#OUTPUT 汉典“Jin”字的基本解释.

A number of versions of sed support Unicode:
I couldn't find information on BSD sed, which I thought was strange, but I think the odds are good that it supports Unicode too. Unfortunately, there is no standard way to tell sed which encoding to use, so each one does this in its own way.
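With GNU sed at least, the encoding is in practice picked up from the locale environment (LC_ALL, LC_CTYPE, LANG), which is why the s/...// test from the comments behaves differently under C and under a UTF-8 locale. A rough sketch of that comparison follows; sed_in_locale is a hypothetical helper, and en_US.UTF-8 is an assumption about which UTF-8 locale is installed (check locale -a).

```shell
# Hypothetical helper: run a sed expression on a string under a given locale.
sed_in_locale() {
  # $1 = locale name, $2 = sed expression, $3 = input text
  printf '%s\n' "$3" | LC_ALL="$1" sed "$2"
}

jin=$(printf '\351\246\221')   # U+9991 as raw UTF-8 bytes

# In the C locale "." matches one byte, so three dots consume the whole character.
sed_in_locale C 's/...//' "$jin"

# In a UTF-8 locale "." matches one character, so three dots should not match
# and the character should come back unchanged. en_US.UTF-8 is an assumption;
# substitute any UTF-8 locale that is installed on your system.
sed_in_locale en_US.UTF-8 's/...//' "$jin"
```

If the named UTF-8 locale is not installed, the second call typically falls back to byte-wise behaviour, so it is worth checking the locale list before relying on it.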
With recent versions of BASH, just omit the quotes around the sed expression and you can use BASH's escaped strings. Spaces within the sed expression or parts of the sed expression that might be interpreted by BASH as wildcards can be individually quoted.
$ echo "饥馑荐臻" | sed s/$'\u9991'//g
饥荐臻
The $'...' type of quotes comes from ksh93 in 1993, while the \uxxxx within them comes from zsh in 2003 (inspired by GNU printf). It was added to bash in 4.2 in 2010. So unless you're on macOS, which still comes with 3.2, that answer would also have been valid in 2015 when that question was asked.

This works for me:
$ vim -nEs +'%s/\%u9991//g' +wq file1
It’s a drop more verbose than I’d like; here’s a full explanation:
-n  disable vim swap file
-E  Ex improved mode
-s  silent mode
+'%s/\%u9991//g'  execute the substitution command
+wq  save and exit
file1 in-place, is that correct?

Works for me with GNU sed (version 4.2.1):
$ echo -ne $'\u9991' | sed 's/\xe9\xa6\x91//g' | hexdump -C
$ echo -ne $'\u9991' | hexdump -C
00000000  e9 a6 91
(As another replacement for sed you could also use GNU awk, but it doesn't seem necessary.)
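The manual hexdump step can probably be scripted. The sketch below derives the \xNN escapes from the character itself; utf8_escapes is a hypothetical helper name, and it assumes GNU sed, since \xNN inside a sed regex is a GNU extension.

```shell
# Hypothetical helper: print the bytes of its argument as \xNN escapes.
utf8_escapes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n' | sed 's/../\\x&/g'
}

jin=$(printf '\351\246\221')      # U+9991 as raw UTF-8 bytes
pattern=$(utf8_escapes "$jin")    # \xe9\xa6\x91

# \xNN in a sed regex is a GNU extension; LC_ALL=C makes sed work on bytes.
printf '饥%s荐臻\n' "$jin" | LC_ALL=C sed "s/$pattern//g"
```

Running sed under LC_ALL=C here is a deliberate choice: the pattern is a byte sequence, so matching byte by byte avoids depending on which UTF-8 locale happens to be installed.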
Using Raku (formerly known as Perl_6)
~$ echo 汉典“馑”字的基本解释 | raku -pe 's:g/\x9991/Jin/;'
汉典“Jin”字的基本解释
~$ echo "饥馑荐臻" | raku -pe s:g/'\x9991'//;
饥荐臻
~$ raku -e 'print "e", "e\x301", "\x000e9";'
eéé
~$ raku -e 'say "e\x301" eq "\x000e9";'
True
~$ echo "Stephane" | raku -pe 's/e/e\x301/;'
Stéphane
~$ echo "Stephane" | raku -pe 's/e/\x000e9/;'
Stéphane
[Rakudo 2020.10; code tested on GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)]