9

(question re-written to be more useful)

I have a batch script that interacts with command-line programs, captures their output, and makes decisions based on that output.

One of the programs I need to interact with is fairly old, so I am stuck with its quirks. When I redirect its output to a text file, that text file is in the UTF-16 LE encoding.

Here's how I do that:

program -parameter > resultat.txt 

Under Windows 7, this encoding seems to be troublesome for cmd/batch work, because you cannot read the contents of such a text file into a variable.

Here is an example (it reads only the first line of the text file):

set /p Var=<resultat.txt
echo %Var%
cmd /k

It echoes nothing useful and just prints "ECHO is on", which means the variable is empty.

Also, if you use "type" to print the contents of the text file, there is weird spacing, suggesting it's not properly being processed.
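The "weird spacing" is consistent with how UTF-16 LE stores ASCII characters: each one takes two bytes, and the second is a NUL. Here is a minimal sketch in Python 3 (an assumption, since Python is not part of the original setup) using a made-up sample line rather than the real program's output:

```python
# Hypothetical sample line; the real program's output will differ.
line = "OK\r\n"

# The "utf-16-le" codec writes no BOM, like the redirected file described above.
raw = line.encode("utf-16-le")
print(raw)

# A tool that assumes one byte per character sees a NUL after every letter.
as_single_bytes = raw.decode("latin-1")
print(repr(as_single_bytes))
```

A console that renders those NUL bytes as blanks would show the text with a gap after every character.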

Attempted solution [1] - Powershell

After research, I found that powershell can convert txt encodings, using the following method:

Get-Content -Path "path\file.txt" | Out-File -FilePath "path\new_file.txt" -Encoding <encoding> 

Using Notepad++, I worked out which encoding I need to end up with.

UTF-8 (no BOM) is the encoding I need. For plain ASCII text it is byte-identical to what Notepad calls "ANSI". With this encoding, both loading text files into variables and the "type" command work flawlessly. How do I know? If I open the redirected text file in Notepad and resave it with "ANSI" encoding, everything works flawlessly.

-Encoding ascii 

...This is the option that should have worked, since for plain ASCII text its output is identical to UTF-8 (no BOM). However, it seems unable to handle the UTF-16 LE source format and does not produce usable output. When I opened the resulting file in Notepad++, it was identified as UTF-16 LE "Unix", which was odd.

Funnily enough, if I resave the redirected text file as "Unicode" in Notepad, this produces a UTF-16 LE BOM file, which works with the above conversion parameter to produce a perfect UTF-8 file. At this point I extended my research to the question "How can I add a BOM to a UTF-16 LE file?", since I could combine that knowledge with the PowerShell approach. However, spoiler alert: I was unsuccessful in finding a decent answer.

-Encoding utf8 

...This is another similar option, but it produces a UTF-8 BOM file (the equivalent of saving as "UTF-8" in Notepad), and the result is corrupted when read in cmd.
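For ASCII-only text, the target encodings discussed here differ only in whether a BOM is prepended. The sketch below uses Python 3 codec names as stand-ins for the PowerShell -Encoding values; that mapping is an assumption for illustration, and the sample text is hypothetical:

```python
text = "abc"  # hypothetical ASCII-only sample

ascii_bytes = text.encode("ascii")         # plain ASCII
utf8_bytes = text.encode("utf-8")          # UTF-8 without BOM
utf8_bom_bytes = text.encode("utf-8-sig")  # UTF-8 with BOM (EF BB BF) prepended

print(ascii_bytes, utf8_bytes, utf8_bom_bytes)
```

The three extra bytes at the start of the BOM variant are what a byte-oriented reader such as cmd would treat as part of the first line.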

So to sum up:

I am looking for a command line tool/method (open or proprietary, 1st or 3rd party) that can perform the following conversions:

  1. UTF-16 LE - Windows(CR LF) straight to UTF-8 - Windows(CR LF)

  2. UTF-16 LE - Windows(CR LF) to UTF-16 LE BOM - Windows(CR LF)
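Conversion 1 can be sketched in a few lines of Python 3, assuming Python is available on the machine (it is not part of the original setup). The function name and the output file name are hypothetical. The file is handled as raw bytes, so CR LF line endings pass through unchanged:

```python
from pathlib import Path


def utf16le_to_utf8(src: str, dst: str) -> None:
    """Convert a UTF-16 LE file (with or without BOM) to UTF-8 without BOM."""
    raw = Path(src).read_bytes()
    # The source encoding is stated explicitly because a BOM-less file
    # gives a reader nothing to detect it from.
    text = raw.decode("utf-16-le")
    # Drop a leading BOM if one happens to be present.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Bytes in, bytes out: no newline translation takes place.
    Path(dst).write_bytes(text.encode("utf-8"))
```

A call such as utf16le_to_utf8("resultat.txt", "resultat_utf8.txt") would then produce a file that byte-oriented tools can read.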

13
  • Does Converting text file to UTF-8 on Windows command prompt - Super User answer your question? Commented May 29, 2023 at 17:49
  • Are you using the chcp command? Try chcp 437 (United States) to see if with it the program generates an ANSI file. Commented May 29, 2023 at 17:49
  • @DavidPostill that result produces a UTF-8 BOM result which is not displayed properly and gives garbled cmd result. But thanks for the reply. "Set-Content" certainly looked different to "Out-File" which I demonstrated here, but it seems it does the same thing Commented May 29, 2023 at 18:01
  • @harrymc I did briefly come across some solutions which used "chcp" however I didn't have luck using them, but from the sounds of it, maybe it deserves revisiting. Could you potentially provide a working example, or link one? I will do some research later today when I have time. if I run "chcp" it tells me I am using code page 866. Commented May 29, 2023 at 18:06
  • Code page 866 is "DOS Cyrillic Russian", so there is no reason that it will generate UTF16. However, try putting the line chcp 437 before the command. Commented May 29, 2023 at 18:09

4 Answers

4

Path of least resistance: use libiconv for Windows

After about a day of searching (back when the question was asked), I noticed that Stack Overflow had a tag called [utf16-le], and I decided it would be worth my time to go through all of the threads using this tag.

I found a solution which shows off a program called "iconv", and even the full command needed to carry out the conversion. Unlike the powershell method, you need to accurately specify input encoding as well as the output encoding, but also unlike the powershell method, it produces a good result.

Here is the helpful thread:

https://stackoverflow.com/questions/17287713/using-iconv-to-convert-from-utf-16le-to-utf-8

iconv is not a Windows utility, but it was apparently ported to Windows, and whilst the question linked above was asked with the [Linux] tag, one of the answers contained an example which is somehow entirely compatible with Windows:

iconv -f UTF-16LE -t UTF-8 infile > outfile 

I downloaded the files from here:

https://sourceforge.net/projects/gnuwin32/files/libiconv/1.9.2-1/

I only needed the "bin" (binary) and "dep" (dependencies) archives. Extract the contents of both into the same folder, and you are good to go.

3
find /v "" sourcefile > destinationFile 

This reads the contents of sourcefile and prints every line that does NOT match "" (nothing), thereby printing the contents of the entire file.

The find command seems to parse UTF-16 fine for me and also happens to output plain ASCII, so your destination file will contain the same text as the source, but in ASCII.

Edit: addressing @ellen22's comment about getting rid of the undesirable output of the find command: just run it from a for loop and skip those lines (delims= keeps each line whole instead of only its first space-separated word). For example:

 for /f "skip=2 usebackq delims=" %%A in (`find /v "" sourcefile`) do @(echo %%A >> destinationFile) 

Caveat: batch will now open the file, write to it, and close it for each line. To speed this up, put the whole loop in its own block and redirect the block:

( for /f "skip=2 usebackq delims=" %%A in (`find /v "" sourcefile`) do @(echo %%A) ) > destinationFile

Now the destination file is opened once for the whole for loop instead of once per line, which is faster.
3
  • This is awesome thank you!! Unfortunately it's starting the file with a blank line, file name and a line of dashes - but it's much simpler than any other solution. Commented Nov 18, 2024 at 16:26
  • Yeah, I usually use a for block to skip the undesirable output of the "find" command at the top that you've mentioned. For brevity I left this out of my answer, but when I use this trick I typically put it in a for loop to skip those lines, e.g. for /f "skip=2 usebackq" %%A in (`find /v "" sourcefile`) do @(echo %%A >> destinationFile) Commented Nov 20, 2024 at 4:30
  • in addition: put that in its own block to cause batch to "unfold" the for loop all at once and prevent the file from opening/closing for each line, and instead write all at once. modified my answer to reflect these additions Commented Nov 20, 2024 at 4:32
2

The type command will work if the UTF16 file does not contain a BOM:

type utf16.txt >ascii.txt 

But as in your case the generated file does have a BOM, a sure-fire method for converting the file uses PowerShell:

powershell "Get-Content 'utf16.txt' | Out-File 'ascii.txt' -Encoding ascii" 

Notice the use of two types of quotes to avoid the need to escape the inner quotes.

8
  • Hello! Do you happen to know if this solution differs from the PowerShell method I mentioned in my original post? On the surface it looks very similar. Just to be clear (I tried to make sure my post was detailed, but I gave less thought to making that information easy to understand, so I apologise): the file created by redirecting recycle.exe output to a txt file is UTF-16 LE, without BOM. That is the format I need to convert from. When I ran PowerShell's Get-Content and Out-File, it was in a separate PS1 file. Is your suggestion to do it within batch without a "middle-file"? Commented May 30, 2023 at 9:21
  • I'm only making suggestions, the decision is yours what to do. Commented May 30, 2023 at 9:29
  • It's fair enough, no problem, I am just looking at this and it looks like something I've tried already, that's all. I am not at home currently so will only be able to try this later today. Just to clarify, you suggest running "powershell -command "command in quotes"" from cmd/batch - right? On an unrelated note - do you think it's worth me tidying up the question, and maybe opening it in stackoverflow? Is this "superuser" material or more advanced? Commented May 30, 2023 at 9:40
  • Hey again - I found a solution: someone on stackoverflow had the exact same query many years ago, and they were recommended to use gnuwin32 ("iconv") program. I was able to successfully use this program via CMD to get the correct conversion. Would you be offended if I were to delete my thread, as I see it as absolutely redundant now... Commented Jun 2, 2023 at 12:18
  • I won't be offended, but a thread on stackoverflow doesn't mean that your post is redundant. You should rather post here your own answer and mark it as accepted. Commented Jun 2, 2023 at 12:29
0

For the "add missing BOM" option: I don't have 7, but in 8.1 (or 10):

  • open Notepad, don't enter anything, and save as Unicode (UTF-16 LE in 10); this creates a file containing only the little-endian BOM

  • copy /b bomfile+bomless_utf16le newfile (/b forces a binary copy, so no end-of-file character is appended)

The result works for me with type and powershell get-content.
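What the copy step achieves at the byte level is prepending the two-byte little-endian BOM (FF FE) to the existing content. Here is a minimal sketch of the same idea in Python 3 (an assumption; Python is not part of this answer), with a hypothetical function name:

```python
import codecs
from pathlib import Path


def add_utf16le_bom(src: str, dst: str) -> None:
    """Copy src to dst, prepending the UTF-16 LE BOM if it is missing."""
    raw = Path(src).read_bytes()
    # codecs.BOM_UTF16_LE is the byte pair FF FE.
    if not raw.startswith(codecs.BOM_UTF16_LE):
        raw = codecs.BOM_UTF16_LE + raw
    # The rest of the file, including CR LF line endings, is left untouched.
    Path(dst).write_bytes(raw)
```

Because the check runs first, calling it on a file that already has a BOM leaves the content unchanged.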

But it's not as devious as Charles' find /v ""!
