I want to remove UTF-8 BOM from a file using this command:
sed '1 s/\xEF\xBB\xBF//' old.java > tmp.java

But it did not work. I am running with ksh on AIX 7.1.
In POSIX, the behaviour is unspecified for \x in a basic regexp. Some implementations use it to introduce hex byte representations, some (like yours) treat it like x.
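To find out which of these two behaviours a given sed has, a small probe can help. The following is a minimal sketch; the function name sed_handles_hex is made up for illustration, and the probe only distinguishes "treats \x41 as the letter A" from everything else.

```shell
# sed_handles_hex: succeed if this sed treats \x41 in a regexp as the byte 0x41 ("A").
# The function name is hypothetical, not a standard utility.
sed_handles_hex() {
    # If \x41 means "A", the substitution fires and the output is "hex".
    # If \x is treated like a plain x, the pattern "x41" does not match "A".
    [ "$(printf 'A\n' | sed 's/\x41/hex/')" = "hex" ]
}

if sed_handles_hex; then
    printf '%s\n' 'this sed understands \xHH escapes'
else
    printf '%s\n' 'this sed does not understand \xHH escapes; use raw bytes instead'
fi
```

If the probe fails, the command in the question cannot match the BOM bytes, which is consistent with the result reported on AIX.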
POSIXly, you could do:
(export LC_ALL=C; sed "s/$(printf '\357\273\277')//") < file.in > file.out

Here, you may also have some luck with:

< file.in iconv -t UTF-16LE | iconv -f UTF-16 > file.out

I can't say whether that would work on AIX, but with GNU iconv, UTF-16 means UTF-16 with a BOM, while UTF-16LE means little-endian UTF-16, so the second iconv would strip the UTF-16LE BOM produced by the first (this would also work with UTF-16BE).
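Note that the sed command above has no line address, so it removes the first occurrence of the byte sequence on every line. A minimal sketch that restricts the substitution to the start of line 1 could look like this, assuming the same printf and LC_ALL=C trick works with your sed; the function name strip_bom_sed is hypothetical.

```shell
# strip_bom_sed: filter stdin to stdout, dropping a UTF-8 BOM at the start of line 1.
# The function name is hypothetical.
strip_bom_sed() (
    # The body runs in a subshell, so the locale change does not leak to the caller
    LC_ALL=C
    export LC_ALL
    bom=$(printf '\357\273\277')   # the raw bytes EF BB BF, written as octal escapes
    sed "1s/^$bom//"
)

# Demo: pipe a line with a leading BOM through the filter and dump the result
printf '\357\273\277package demo;\n' | strip_bom_sed | od -c
```

Usage would then be strip_bom_sed < file.in > file.out.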
AIX sed does not understand such escape sequences; according to the AIX sed documentation, it only knows ASCII characters. So you should use another tool.
tail

tail -c +4 old.java > tmp.java

awk

awk 'NR==1{sub(/^\xef\xbb\xbf/,"")}1' old.java > tmp.java

Same issue here. I needed to remove a BOM (UTF-16LE) from a file and ended up using "tr" with octal codes for the bytes 0xFF 0xFE:
$ cat old.csv
ÿþ"SET01"|"0000001"|"2016-11-15"|""|"0"|""|""|"Data01"
$ echo "ibase=16\nobase=8\nFF" | bc
377
$ echo "ibase=16\nobase=8\nFE" | bc
376
$ cat old.csv | tr -d "\377\376"
"SET01"|"0000001"|"2016-11-15"|""|"0"|""|""|"Data01"

tr -d '\377\376' would remove every occurrence of the \377 and \376 bytes, which is fine only as long as the file otherwise contains only ASCII characters (you'll probably want to remove the NUL bytes as well if that is indeed UTF-16). Here, if that's UTF-16, I would use iconv -f UTF-16 instead.

I still cannot comment, so here it is: if you would like to try GNU iconv as a more robust solution, without breaking the programs that depend on AIX iconv, I may have a package that will work for you.
michael@x071:[/home/michael]ar -X64 tv /usr/lib/libiconv.a
rwxr-xr-x 0/0 1032868 Aug 21 16:19 2016 libiconv.so.2
r--r--r-- 0/0 159410 Aug 21 20:09 2016 shr4_64.o
michael@x071:[/home/michael]ar -X32 tv /usr/lib/libiconv.a
rwxr-xr-x 0/0 1010856 Aug 21 16:21 2016 libiconv.so.2
r--r--r-- 0/0 117276 Aug 21 20:09 2016 shr4.o
r--r--r-- 0/0 117526 Aug 21 20:09 2016 shr.o

64-bit programs that depend on AIX iconv look for /usr/lib/libiconv.a(shr4_64.o) (32-bit ones look for /usr/lib/libiconv.a(shr4.o)), while programs such as GNU iconv look for /usr/lib/libiconv.a(libiconv.so.2).
michael@x071:[/home/michael]ldd /usr/bin/iconv
/usr/bin/iconv needs:
/usr/lib/libc.a(shr.o)
/usr/lib/libiconv.a(shr4.o)
/unix
/usr/lib/libcrypt.a(shr.o)
michael@x071:[/home/michael]ldd /opt/bin/iconv
/opt/bin/iconv needs:
/usr/lib/libc.a(shr_64.o)
/usr/lib/libiconv.a(libiconv.so.2)
/unix
/usr/lib/libcrypt.a(shr_64.o)

You can get GNU iconv for AIX via http://www.aixtools.net/index.php/libiconv, and it can be installed side by side with AIX iconv.
Many sed implementations don't interpret \x sequences, so your command is probably trying to replace a literal "backslash x E F ..." sequence. You'll have to include literal binary characters in your sed command, I guess.
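If embedding raw bytes in a sed script feels fragile, od and tail can be combined so that no regexp ever sees the bytes. The following is only a sketch: the function name strip_utf8_bom and the demo file names are made up, and it assumes od supports -An -tx1 -N3 and tail supports -c +4 on your system.

```shell
# strip_utf8_bom IN OUT: copy IN to OUT, skipping a leading EF BB BF if present.
# Function and file names here are illustrative only.
strip_utf8_bom() {
    # Dump the first three bytes as hex and squeeze out all whitespace
    first=$(od -An -tx1 -N3 "$1" | tr -d '[:space:]')
    if [ "$first" = "efbbbf" ]; then
        tail -c +4 "$1" > "$2"    # start output at byte 4, i.e. right after the BOM
    else
        cat "$1" > "$2"           # no BOM: copy the file unchanged
    fi
}

# Demo with a throwaway file
printf '\357\273\277class A {}\n' > demo_in.tmp
strip_utf8_bom demo_in.tmp demo_out.tmp
od -c demo_out.tmp
rm -f demo_in.tmp demo_out.tmp
```

Unlike a bare tail -c +4, this leaves files without a BOM untouched, so it can be run over a mixed set of sources.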