
I have a file that contains non-English characters and was saved in ANSI encoding using a non-English code page. How can I read this file in C# and display its content correctly?

Not working:

    StreamReader sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.ASCII);
    var ags = sr.ReadToEnd();
    sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.UTF8);
    ags = sr.ReadToEnd();
    sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.Unicode);
    ags = sr.ReadToEnd();

Working, but I need to know the code page in advance, which is not possible:

    sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.GetEncoding(1252));
    ags = sr.ReadToEnd();

6 Answers

76
 var text = File.ReadAllText(file, Encoding.GetEncoding(codePage)); 

List of code pages: https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers?redirectedfrom=MSDN


Comments

I will need to know the code page. I don't know it in advance.
I noticed that the old MS Notepad handles this file without any problems, so I think I'm missing something.
Remember joelonsoftware.com/articles/Unicode.html - The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
Please note that .NET Core only supports ASCII, ISO-8859-1 and Unicode encodings. So you will get an error when trying to use encoding 1252 (ANSI Latin 1; Western European Windows). What works for me is encoding 65000 (utf-7 Unicode).
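For the case raised in these comments, where the code page is not known in advance: no API can recover the code page from the bytes with certainty, but a common heuristic (roughly the kind of guess text editors make, not Notepad's actual algorithm) is to honor a byte order mark if present, otherwise try strict UTF-8, and only then fall back to an ANSI code page. This is a sketch assuming C# 9 top-level statements; `DecodeWithFallback` is a hypothetical helper, and the 1252 fallback is an assumption you would replace with the code page your files most likely use (for example 1251 for Cyrillic).

```csharp
using System;
using System.IO;
using System.Text;

// Code pages such as 1252 are not built into .NET Core / .NET 5+;
// registering this provider makes Encoding.GetEncoding(1252) work there.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper: guesses the encoding of raw bytes.
// 1. Honor a byte order mark (BOM) if present.
// 2. Otherwise try strict UTF-8, which throws on invalid byte sequences.
// 3. Otherwise fall back to an assumed ANSI code page (1252 is an assumption).
static string DecodeWithFallback(byte[] bytes, int fallbackCodePage = 1252)
{
    // BOM checks: UTF-8, UTF-16 little endian, UTF-16 big endian.
    if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return Encoding.UTF8.GetString(bytes, 3, bytes.Length - 3);
    if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return Encoding.Unicode.GetString(bytes, 2, bytes.Length - 2);
    if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return Encoding.BigEndianUnicode.GetString(bytes, 2, bytes.Length - 2);

    // Strict UTF-8: invalid sequences raise DecoderFallbackException
    // instead of being silently replaced with U+FFFD.
    var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
    try
    {
        return strictUtf8.GetString(bytes);
    }
    catch (DecoderFallbackException)
    {
        // Not valid UTF-8, so assume a legacy single-byte ANSI code page.
        return Encoding.GetEncoding(fallbackCodePage).GetString(bytes);
    }
}

// Usage with the file from the question:
// string text = DecodeWithFallback(File.ReadAllBytes(@"C:\APPLICATIONS.xml"));

byte[] ansiBytes = { 0x63, 0x61, 0x66, 0xE9 };     // "café" saved in code page 1252
byte[] utf8Bytes = Encoding.UTF8.GetBytes("café");  // "café" saved as UTF-8 (no BOM)

string fromAnsi = DecodeWithFallback(ansiBytes);    // decoded via the 1252 fallback
string fromUtf8 = DecodeWithFallback(utf8Bytes);    // decoded via strict UTF-8
Console.WriteLine(fromAnsi == fromUtf8);            // True
```

Because a single-byte code page accepts any byte, the fallback never fails, which also means it cannot tell 1251 from 1252; that choice still has to come from knowing where the files were created. On .NET Framework, `Encoding.Default` returns the machine's ANSI code page (what Windows tools label "ANSI"); on .NET Core and .NET 5+ it is always UTF-8.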
14

You get the question-mark-diamond characters when your text file uses high-ANSI encoding, meaning it contains bytes in the range 128 to 255. Those bytes have the eighth (i.e. the most significant) bit set. When ASP.NET reads the text file, it assumes UTF-8 encoding, and in UTF-8 a byte with the most significant bit set has a special meaning: it is part of a multi-byte sequence. Bytes that do not form a valid UTF-8 sequence are replaced with U+FFFD, the replacement character that is displayed as a question mark in a diamond.

You must force ASP.NET to interpret the text file as high-ANSI by telling it that the code page is 1252:

    String textFilePhysicalPath = System.Web.HttpContext.Current.Server.MapPath("~/textfiles/MyInputFile.txt");
    String contents = File.ReadAllText(textFilePhysicalPath, System.Text.Encoding.GetEncoding(1252));
    lblContents.Text = contents.Replace("\n", "<br />"); // change line breaks to HTML
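To see the effect described in this answer, the sketch below decodes the same bytes twice, once as UTF-8 and once as code page 1252. It is a self-contained console program using C# 9 top-level statements; the sample word and its byte values are illustrative. On .NET Core and .NET 5+, code page 1252 is only available after registering `CodePagesEncodingProvider`; on .NET Framework it is built in.

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ for code page 1252 (built into .NET Framework).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// "Müller" saved in code page 1252: 'ü' is the single byte 0xFC (high bit set).
byte[] ansiBytes = { 0x4D, 0xFC, 0x6C, 0x6C, 0x65, 0x72 };

// Decoding as UTF-8: 0xFC is not a valid UTF-8 byte, so the decoder
// substitutes U+FFFD, the "question-mark-diamond" replacement character.
string asUtf8 = Encoding.UTF8.GetString(ansiBytes);

// Decoding with the code page the file was actually saved in restores 'ü'.
string as1252 = Encoding.GetEncoding(1252).GetString(ansiBytes);

Console.WriteLine(asUtf8.IndexOf('\uFFFD') >= 0); // True
Console.WriteLine(as1252 == "Müller");           // True
```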

Comments

Should be the accepted answer, IMHO. Furthermore, with .NET Core 2.x or .NET Standard you will run into a new problem: the code pages need to be registered first <sigh>. See stackoverflow.com/questions/37870084/…
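A minimal sketch of the registration step mentioned in the comment about .NET Core, assuming .NET 5 or later, where `CodePagesEncodingProvider` is available out of the box (on earlier .NET Core versions you may need the `System.Text.Encoding.CodePages` NuGet package). Without the call, `Encoding.GetEncoding(1252)` throws `NotSupportedException` on .NET Core; registering once at startup is enough, after which numeric and name lookups both resolve. The byte value used is only an illustration.

```csharp
using System;
using System.Text;

// Register once, e.g. at application startup.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding windows1252 = Encoding.GetEncoding(1252);
// Lookup by name works too once the provider is registered.
Encoding byName = Encoding.GetEncoding("windows-1252");

// 0x80 is the euro sign in code page 1252
// (in ISO-8859-1 the same byte is a C1 control character).
string euro = windows1252.GetString(new byte[] { 0x80 });

Console.WriteLine(euro == "€");                              // True
Console.WriteLine(windows1252.CodePage == byName.CodePage);  // True
```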
3

XmlDocument.Load(string) assumes UTF-8 unless the file starts with a byte order mark or declares its encoding in the XML declaration, so an ANSI file without a declaration is decoded incorrectly. You would have to create a StreamReader with the correct encoding and pass it as the parameter:

    xmlDoc.Load(new StreamReader(File.Open("file.xml", FileMode.Open), Encoding.GetEncoding("iso-8859-15")));
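A self-contained sketch of the StreamReader approach, with two assumptions: the "file" is simulated with a `MemoryStream`, and it uses ISO-8859-1 rather than ISO-8859-15, because ISO-8859-1 is built into .NET Core while ISO-8859-15 there needs `CodePagesEncodingProvider` registered first. The `using` block also disposes the reader, which the one-liner does not.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// ISO-8859-1 (Latin-1) is available on .NET Core without registering a provider.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// Simulated file content: XML with no encoding declaration, saved as Latin-1.
// 'é' becomes the single byte 0xE9, which is not valid UTF-8 on its own.
byte[] fileBytes = latin1.GetBytes("<name>café</name>");

var xmlDoc = new XmlDocument();
using (var reader = new StreamReader(new MemoryStream(fileBytes), latin1))
{
    // Load(TextReader): the reader has already turned bytes into characters,
    // so the XML parser does not have to guess the encoding.
    xmlDoc.Load(reader);
}

Console.WriteLine(xmlDoc.DocumentElement.InnerText == "café"); // True
```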

I just stumbled across KB308061 from Microsoft. There's an interesting passage: Specify the encoding declaration in the XML declaration section of the XML document. For example, the following declaration indicates that the document is in UTF-16 Unicode encoding format:

<?xml version="1.0" encoding="UTF-16"?> 

Note that this declaration only specifies the encoding format of an XML document and does not modify or control the actual encoding format of the data.
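In other words, the declaration is a statement about the bytes, not a conversion. When it matches the bytes, `XmlDocument.Load(Stream)` can decode the document without any explicit `Encoding`, as in this sketch (C# 9 top-level statements; the sample content is illustrative and the file is again simulated with a `MemoryStream`).

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// The prolog declares the encoding the bytes are actually written in.
byte[] fileBytes = latin1.GetBytes(
    "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>café</name>");

var xmlDoc = new XmlDocument();
using (var stream = new MemoryStream(fileBytes))
{
    // Load(Stream): the parser reads the declaration and decodes the
    // rest of the document as ISO-8859-1.
    xmlDoc.Load(stream);
}

Console.WriteLine(xmlDoc.DocumentElement.InnerText == "café"); // True
```

If the declaration named UTF-8 while the bytes stayed Latin-1, the same call would not produce "café", which is the situation the last comment on this answer describes.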

Source: XmlDocument.Load() method fails to decode € (euro)

Comments

@MichaelT, can you post a screenshot of your result?
@MichaelT: try my new answer.
If the <?xml?> prolog in your XML file says UTF-8, and it's not a proper UTF-8 stream, then what you have got is not well-formed and thereby not XML. Really you need to fix whatever is producing the bogus XML files.
0

In my case, using C++/CLI (WinForms), this approach worked:

    String^ str2 = File::ReadAllText("MyText_cyrillic.txt", System::Text::Encoding::GetEncoding(1251));
    textBox1->Text = str2;


0
    // JsonSerializer here is Newtonsoft.Json (Json.NET); "Type" stands for your own model class
    using (StreamReader file = new StreamReader(filePath, Encoding.GetEncoding("ISO-8859-1")))
    {
        JsonSerializer serializer = new JsonSerializer();
        IList<Type> result = (IList<Type>)serializer.Deserialize(file, typeof(IList<Type>));
    }

ANSI code page used: ISO-8859-1 (Latin-1)


-1
    using (StreamWriter writer = new StreamWriter(File.Open(@"E:\Sample.txt", FileMode.Append), Encoding.GetEncoding(1250))) // or File.Create(path)
    {
        writer.Write("Sample Text");
    }
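What this snippet does: it opens the file in append mode (creating it if it does not exist) and writes text encoded with code page 1250 (Central European). The sketch below round-trips the same idea in memory, assuming .NET 5+ where `CodePagesEncodingProvider` must be registered first; the `MemoryStream` stands in for the file and the sample text is illustrative. The point is that the bytes only decode correctly if the reader uses the same code page the writer used.

```csharp
using System;
using System.IO;
using System.Text;

// Code page 1250 needs the provider on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding windows1250 = Encoding.GetEncoding(1250);

// A MemoryStream stands in for the file so the sketch has no side effects.
var buffer = new MemoryStream();
using (var writer = new StreamWriter(buffer, windows1250, 1024, leaveOpen: true))
{
    writer.Write("Łódź"); // non-ASCII characters that exist in code page 1250
}

// Code page 1250 is single-byte and has no BOM: one byte per character.
byte[] bytes = buffer.ToArray();

// Read the bytes back with the same code page.
string text;
buffer.Position = 0;
using (var reader = new StreamReader(buffer, windows1250))
{
    text = reader.ReadToEnd();
}

Console.WriteLine(bytes.Length);   // 4
Console.WriteLine(text == "Łódź"); // True
```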

2 Comments

A little explanation along with the code would help. Please explain what this code does.
I have to second what @OlcayErtaş said, especially given that there are several other high-quality answers to this.
