I have a simple text file processing tool written in C#. The skeleton looks like this:

    using (StreamReader reader = new StreamReader(absFileName, true)) // auto detect encoding
    using (StreamWriter writer = new StreamWriter(tmpFileName, false, reader.CurrentEncoding)) // open writer with the same encoding as reader
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // do something with line
            writer.WriteLine(line);
        }
    }

Most of the files it operates on are ASCII files, with the occasional UTF-16 file here and there. I want to preserve the file encoding: the newly created file should have the same encoding as the file being read. That's why I open the StreamWriter with the CurrentEncoding of the reader.

My problem is that some of the UTF-16 files lack a preamble, and right after the StreamReader is opened its CurrentEncoding is set to UTF-8, which causes the writer to be opened in UTF-8 mode. When debugging I can see the reader change its CurrentEncoding property to UTF-16 after the first call to ReadLine, but by that time the writer is already open.

I can think of a few workarounds (opening the writer later or going over the source file twice - the first one just to detect encoding), but thought I'd ask experts for opinion first. Note that I'm not concerned with code pages of the ASCII files, I'm only concerned with ASCII/UTF-8/UTF-16 encodings.

1 Answer

I'd try doing a reader.Peek() before opening the writer - that ought to be sufficient in your case, I think.
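
A minimal sketch of that idea, assuming the detection is driven by a byte order mark: the reader only inspects the stream when it first fills its buffer, so calling Peek() should make CurrentEncoding reflect the detected encoding before the writer is constructed. The class and method names here (EncodingPreservingCopy, CopyPreservingEncoding) are made up for illustration; absFileName and tmpFileName are the names from the question.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original tool.
static class EncodingPreservingCopy
{
    // Copies absFileName to tmpFileName line by line and returns the
    // encoding the reader reported after the initial Peek().
    public static Encoding CopyPreservingEncoding(string absFileName, string tmpFileName)
    {
        using (StreamReader reader = new StreamReader(absFileName, true)) // auto detect encoding
        {
            // Peek() fills the reader's buffer without consuming a character.
            // Encoding detection runs on that first buffer fill, so
            // CurrentEncoding is only meaningful after this call.
            reader.Peek();
            Encoding detected = reader.CurrentEncoding;

            // Open the writer only now, with the detected encoding.
            using (StreamWriter writer = new StreamWriter(tmpFileName, false, detected))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // do something with line
                    writer.WriteLine(line);
                }
            }

            return detected;
        }
    }
}
```

Nesting the second using inside the first, instead of stacking the two statements, is what gives room for the Peek() call. If a file really has no byte order mark at all, I would expect the reader to stay on its UTF-8 default, so this only helps where there is something for the detection to find.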
