How to read text files with ANSI encoding and non-English letters?

Question

I have a file that contains non-English chars and was saved in ANSI encoding using a non-English codepage. How can I read this file in C# and see the file content correctly?

Not working

StreamReader sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.ASCII);
var ags = sr.ReadToEnd();
sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.UTF8);
ags = sr.ReadToEnd();
sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.Unicode);
ags = sr.ReadToEnd();

This works, but I need to know the code page in advance, which is not possible.

sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.GetEncoding(1252));
ags = sr.ReadToEnd();

spottedmahn · Accepted Answer · 2021-11-08 01:37:25Z

76

 var text = File.ReadAllText(file, Encoding.GetEncoding(codePage));

List of codepages : https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers?redirectedfrom=MSDN

edited Nov 8, 2021 at 1:37

spottedmahn

answered Aug 26, 2012 at 13:03

L.B

7 Comments

MichaelT Over a year ago

I will need to know the code page. I don't know it in advance.

MichaelT Over a year ago

I saw that the old MS Notepad handles this file with no problems, so I think I am missing something.

L.B Over a year ago

@MichaelT How can I detect the encoding/codepage of a text file

gimel Over a year ago

Remember joelonsoftware.com/articles/Unicode.html - The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

Martijn Over a year ago

Please note that, out of the box, .NET Core only supports ASCII, ISO-8859-1 and the Unicode encodings. So you will get an error when trying to use encoding 1252 (ANSI Latin 1; Western European Windows) unless the code pages provider has been registered. What works for me is encoding 65000 (utf-7 Unicode).
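For the "I don't know the code page in advance" problem raised in the comments above, one common heuristic is to try a strict UTF-8 decode first and only fall back to a legacy encoding when the bytes are not valid UTF-8. The sketch below is an illustration, not something from the thread: `TextFileReader`, `ReadWithFallback` and `DecodeWithFallback` are hypothetical names, and the fallback encoding is still a guess supplied by the caller. The heuristic cannot tell one single-byte code page (for example 1250, 1251 or 1252) from another.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextFileReader
{
    // Hypothetical helper: read a file, preferring UTF-8 when the bytes are valid UTF-8.
    public static string ReadWithFallback(string path, Encoding fallback)
    {
        return DecodeWithFallback(File.ReadAllBytes(path), fallback);
    }

    public static string DecodeWithFallback(byte[] bytes, Encoding fallback)
    {
        // throwOnInvalidBytes: true makes malformed UTF-8 raise an exception
        // instead of silently producing U+FFFD replacement characters.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);

        // Skip a UTF-8 byte order mark (EF BB BF) if one is present.
        int offset = (bytes.Length >= 3 &&
                      bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF) ? 3 : 0;

        try
        {
            return strictUtf8.GetString(bytes, offset, bytes.Length - offset);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume the caller's legacy encoding (an assumption, not a detection).
            return fallback.GetString(bytes);
        }
    }
}
```

A call might look like `TextFileReader.ReadWithFallback(@"C:\APPLICATIONS.xml", Encoding.GetEncoding("iso-8859-1"))`. Pure ASCII files decode identically either way, so the fallback only matters for files that actually contain bytes above 0x7F.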

Snizzle · Accepted Answer · 2016-05-10 17:17:44Z

You get the question-mark-diamond characters when your text file uses high-ANSI encoding, meaning it uses characters between 128 and 255. Those characters have the eighth (i.e. the most significant) bit set. When ASP.NET reads the text file it assumes UTF-8 encoding, and in UTF-8 that most significant bit has a special meaning.

You must force ASP.NET to interpret the text file as high-ANSI encoding by telling it the code page, here 1252:

String textFilePhysicalPath = System.Web.HttpContext.Current.Server.MapPath("~/textfiles/MyInputFile.txt");
String contents = File.ReadAllText(textFilePhysicalPath, System.Text.Encoding.GetEncoding(1252));
lblContents.Text = contents.Replace("\n", "<br />"); // change linebreaks to HTML

This should be the accepted answer, IMHO. Furthermore, with .NET Core 2.x or .NET Standard you will get a new problem: the code page needs to be registered first <sigh>. See stackoverflow.com/questions/37870084/…
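The registration step mentioned in the comment above can be sketched roughly as follows. This assumes .NET Core or .NET 5+, where the legacy Windows code pages come from the System.Text.Encoding.CodePages library (a separate NuGet package on older targets). `CodePageDemo` and `GetWindows1252` are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    // Hypothetical helper: make code page 1252 available and return it.
    public static Encoding GetWindows1252()
    {
        // Registering the provider once is enough; repeated calls are harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Without the registration above, this call fails on .NET Core
        // because 1252 is not among the built-in encodings.
        return Encoding.GetEncoding(1252);
    }
}
```

With that in place, the call from the answer becomes `File.ReadAllText(path, CodePageDemo.GetWindows1252())`. On .NET Framework the registration is not needed, since the code pages are provided by the operating system.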

Community · Accepted Answer · 2017-05-23 10:29:58Z

If I remember correctly the XmlDocument.Load(string) method always assumes UTF-8, regardless of the XML encoding. You would have to create a StreamReader with the correct encoding and use that as the parameter.

xmlDoc.Load(new StreamReader(File.OpenRead("file.xml"), Encoding.GetEncoding("iso-8859-15")));

I just stumbled across KB308061 from Microsoft. There's an interesting passage:

Specify the encoding declaration in the XML declaration section of the XML document. For example, the following declaration indicates that the document is in UTF-16 Unicode encoding format:

<?xml version="1.0" encoding="UTF-16"?>

Note that this declaration only specifies the encoding format of an XML document and does not modify or control the actual encoding format of the data.

Link Source:

XmlDocument.Load() method fails to decode € (euro)

If the <?xml?> prolog in your XML file says UTF-8, and it's not a proper UTF-8 stream, then what you have got is not well-formed and thereby not XML. Really you need to fix whatever is producing the bogus XML files.
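To see how the declaration and the actual bytes interact, here is a small check, offered as a sketch rather than a definitive statement about the parser. It writes a file whose bytes really are ISO-8859-1 and whose prolog says so, then loads it by path without passing any encoding. `XmlEncodingDemo` and `RoundTrip` are hypothetical names; ISO-8859-1 is used because it is one of the encodings available on .NET Core without registration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlEncodingDemo
{
    // Hypothetical helper: write a Latin-1 XML file and read it back by path.
    public static string RoundTrip(string path)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><name>Jos\u00E9</name>";

        // The bytes on disk match the encoding named in the declaration.
        File.WriteAllBytes(path, latin1.GetBytes(xml));

        var doc = new XmlDocument();
        doc.Load(path); // no encoding passed; the parser has only the file to go on

        XmlElement root = doc.DocumentElement;
        return root == null ? string.Empty : root.InnerText;
    }
}
```

If the returned text matches what was written, the declaration was enough for the parser to pick the decoder. If the prolog names an encoding the bytes do not actually use, the file needs fixing at the source, as the comment above says, or an explicit StreamReader as in the answer.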

Олександр Добржанський · Accepted Answer · 2021-04-28 18:59:03Z

In my case, with C++/CLI (WinForms), this approach worked:

String^ str2 = File::ReadAllText("MyText_cyrillic.txt", System::Text::Encoding::GetEncoding(1251));
textBox1->Text = str2;

James Skemp · Accepted Answer · 2025-01-10 15:23:33Z

using (StreamReader file = new StreamReader(filePath, Encoding.GetEncoding("ISO-8859-1")))
{
    JsonSerializer serializer = new JsonSerializer();
    IList<Type> result = (IList<Type>)serializer.Deserialize(file, typeof(IList<Type>));
}

ANSI Code : ISO-8859-1

sebastin jiffin a j · Accepted Answer · 2017-03-16 10:32:43Z

-1

// Append text using code page 1250; File.Create(path) could be used instead of File.Open to start a new file.
using (StreamWriter writer = new StreamWriter(File.Open(@"E:\Sample.txt", FileMode.Append), Encoding.GetEncoding(1250)))
{
    writer.Write("Sample Text");
}

answered Mar 16, 2017 at 10:32

sebastin jiffin a j


2 Comments

Olcay Ertaş Over a year ago

A little explanation with the code helps more. Please explain what this code does.

EJoshuaS - Stand with Ukraine Over a year ago

I have to second what @OlcayErtaş said, especially given that there are several other high-quality answers to this.
