
Most probably, that file is encoded in UTF-16, that is, with 2 or 4 bytes per character, likely with a Byte-Order-Mark (BOM) at the beginning.

The characters shown in your sample (all ASCII characters) are typically encoded on 2 bytes each: one of them is 0 and the other is the ASCII/Unicode code, the 0 coming first or second depending on whether the encoding is big-endian or little-endian UTF-16. The 0 byte is typically invisible on a terminal, so the text looks OK when dumped there (the rest being just ASCII), but in effect it contains:

<[NUL]S[NUL]t[NUL]a[NUL]r[NUL]t[NUL]W[NUL]h[NUL]e[NUL]n[NUL]...
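To see this for yourself, here is a minimal sketch that writes a small UTF-16LE sample (no BOM) and shows that a plain ASCII search for true finds nothing in it. It assumes iconv, od and grep are available; the file name sample.xml is hypothetical.

```shell
# Hypothetical sample file: one tag, encoded as UTF-16LE without a BOM.
printf '<StartWhenAvailable>true</StartWhenAvailable>\n' |
  iconv -f UTF-8 -t UTF-16LE > sample.xml

# Show the first 16 bytes: each ASCII character is followed by a NUL byte.
od -An -c sample.xml | head -n 1

# A byte-wise search for "true" fails, since it is stored as t\0r\0u\0e\0.
if LC_ALL=C grep -q 'true' sample.xml; then
  echo "matched"
else
  echo "no match"
fi
```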

You'd need to convert that text to your locale's charset for sed to be able to deal with it. Note that UTF-16 cannot be used as the character encoding of a locale on Unix: you won't find a locale that uses it.

iconv -f utf-16 < Task_01.xml | sed 's~<StartWhenAvailable>true</StartWhenAvailable>~<StartWhenAvailable>false</StartWhenAvailable>~g' | iconv -t utf-16 > Task_01.xml.out

That assumes the input has a BOM. If not, you need to determine whether it's big-endian or little-endian (probably little-endian) and change that utf-16 to utf-16le or utf-16be.
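One way to make that choice is to look at the first bytes of the file. The following is a minimal sketch assuming a POSIX od; the function name detect_utf16 is hypothetical, and the no-BOM branch is only a heuristic that assumes the file starts with an ASCII character such as <.

```shell
# Hypothetical helper: print the iconv encoding name to use for a UTF-16 file.
detect_utf16() {
  # First two bytes as lowercase hex, e.g. "fffe".
  bom=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
  case $bom in
    fffe|feff)
      # A BOM is present: plain utf-16 lets iconv read the byte order from it.
      echo utf-16 ;;
    *)
      # No BOM: for an ASCII first character, a leading NUL means big-endian.
      if [ "$(od -An -tx1 -N1 "$1" | tr -d ' \n')" = 00 ]; then
        echo utf-16be
      else
        echo utf-16le
      fi
      ;;
  esac
}
```

For example, enc=$(detect_utf16 Task_01.xml) and then iconv -f "$enc" < Task_01.xml at the start of the pipeline above.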

If the locale's charset is UTF-8, there shouldn't be anything lost in translation even if the text contains non-ASCII characters.
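A quick way to check that claim on a given system is a round trip through UTF-8. This is a minimal sketch assuming iconv supports the UTF-8 and UTF-16 names; the file name nonascii.xml and the <Name> tag are hypothetical. It names UTF-8 explicitly instead of relying on the locale's charset.

```shell
# Hypothetical sample with a non-ASCII character (é, given as its UTF-8 bytes).
printf '<Name>Caf\303\251</Name>\n' | iconv -f UTF-8 -t UTF-16 > nonascii.xml

# Decode to UTF-8, re-encode to UTF-16, and compare with the original bytes.
if iconv -f UTF-16 -t UTF-8 < nonascii.xml |
   iconv -f UTF-8 -t UTF-16 | cmp -s - nonascii.xml; then
  echo "round trip is lossless"
else
  echo "round trip changed the file"
fi
```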

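Since Cygwin's sed is typically GNU sed, it can also deal with that kind of binary input (binary because it contains NUL bytes) by itself, so you can also do something like the following. Note that, unlike the command above, it replaces every occurrence of true in the file, not only the one inside <StartWhenAvailable>: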

LC_ALL=C sed -i 's/t\x00r\x00u\x00e/f\x00a\x00l\x00s\x00e/g' Task_01.xml
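Before running that on the real file, it may be worth trying it on a throwaway copy. This is a minimal sketch assuming GNU sed (for -i and the \xHH escapes) and iconv; demo.xml is a hypothetical scratch file.

```shell
# Hypothetical scratch file in UTF-16LE.
printf '<StartWhenAvailable>true</StartWhenAvailable>\n' |
  iconv -f UTF-8 -t UTF-16LE > demo.xml

# Replace the byte sequence for "true" with the one for "false", in place.
LC_ALL=C sed -i 's/t\x00r\x00u\x00e/f\x00a\x00l\x00s\x00e/g' demo.xml

# Decode back to UTF-8 to inspect the result.
iconv -f UTF-16LE -t UTF-8 < demo.xml
```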

The file command should be able to tell you if the input is indeed UTF-16. You can use sed -n l or od -tc to see those hidden NUL characters. Example of little-endian UTF-16 text with BOM:

$ echo true | iconv -t utf-16 | od -tc
0000000 377 376   t  \0   r  \0   u  \0   e  \0  \n  \0
0000014
$ echo true | iconv -t utf-16 | sed -n l
\377\376t\000r\000u\000e\000$
\000$
$ echo true | iconv -t utf-16 | file -
/dev/stdin: Little-endian UTF-16 Unicode text, with no line terminators

To process several files with zsh/bash/ksh93:

set -o pipefail
for file in ./*.xml; do
  cp -ai "$file" "$file.bak" &&
    iconv -f utf-16 < "$file.bak" |
    sed 's~<StartWhenAvailable>true</StartWhenAvailable>~<StartWhenAvailable>false</StartWhenAvailable>~g' |
    iconv -t utf-16 > "$file" &&
    rm -f "$file.bak"
done
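Before running that loop, you may want to list which files it would actually change. This is a minimal sketch under the same assumption that the files have a BOM (so that -f utf-16 decodes them); the function name list_matching is hypothetical.

```shell
# Hypothetical helper: print the .xml files in the current directory whose
# UTF-16 content contains the tag set to "true".
list_matching() {
  for file in ./*.xml; do
    [ -f "$file" ] || continue   # skip the literal pattern if nothing matches
    # grep -c reads all of its input, so iconv is not killed by SIGPIPE
    # (which could otherwise matter while pipefail is set).
    count=$(iconv -f utf-16 < "$file" |
      grep -cF '<StartWhenAvailable>true</StartWhenAvailable>')
    if [ "${count:-0}" -gt 0 ]; then
      printf '%s\n' "$file"
    fi
  done
}
```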

Stéphane Chazelas