My C# code doesn't read special characters from file

Question

As the title suggests, my C# code does not read files correctly: when I try to read characters such as č, ć, š, đ, ž, etc. from a file, I get �. I need my program to be able to read all characters, including those from other languages. I also tried passing the Encoding parameter with UTF-8 and Default, but that didn't work either. Below is an example of the code.

string[] lines = File.ReadAllLines(filePath, Encoding.UTF8);

@RaymondChen I don't really know. When I open the file in Windows Notepad, it says ANSI at the bottom, so I guess that? I am reading the lines from an .srt subtitle file.
– AGlasencnik, Commented Aug 6, 2022 at 18:35
Your file is probably not UTF-8 then; it could be in one of the regional code pages. Instead of Encoding.UTF8, try Encoding.GetEncoding with one of the supported code pages (I personally bet on 852 or 1250).
– Wiktor Zychla, Commented Aug 6, 2022 at 18:38
This is not a copy-and-paste answer. Your text file uses a non-UTF-8 encoding, and we cannot tell you which one because we don't have the file. The answer is: you have to specify the encoding used in your file when reading it. There are online tools that can help, or you can simply try a few. If you get "NotSupported", you can list the supported encodings like so: learn.microsoft.com/en-us/dotnet/api/…
– Christoph Lütjen, Commented Aug 6, 2022 at 18:54
@AGlasencnik Please read the docs before just asking for an encoding (hint: call CodePagesEncodingProvider.Instance.GetEncoding(1250))
– Wiktor Zychla, Commented Aug 6, 2022 at 19:06

Wiktor Zychla · Accepted Answer · 2022-08-06 19:20:35Z

The č, ć, š, đ, ž suggests that this could be one of the ANSI code pages of Eastern Europe. A recommendation is then to try

CodePagesEncodingProvider.Instance.GetEncoding(1250)

as the encoding.
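A minimal sketch of how that call could be used with File.ReadAllLines, assuming the file really is Windows-1250 (which is only a guess here) and a runtime where CodePagesEncodingProvider is available (.NET Core / .NET 5+, or the System.Text.Encoding.CodePages package). The temporary file name is hypothetical and exists only to make the example self-contained:

```csharp
using System;
using System.IO;
using System.Text;

// Assumption: the subtitle file is Windows-1250 (Central European "ANSI").
// GetEncoding returns null if the provider does not know the code page.
Encoding cp1250 = CodePagesEncodingProvider.Instance.GetEncoding(1250)
    ?? throw new NotSupportedException("Code page 1250 is not available.");

// Hypothetical sample file, written in the same code page we expect to read.
string path = Path.Combine(Path.GetTempPath(), "sample-cp1250.srt");
File.WriteAllText(path, "č ć š đ ž", cp1250);

// Pass the matching encoding instead of Encoding.UTF8.
string[] lines = File.ReadAllLines(path, cp1250);
Console.WriteLine(lines[0]);
```

If the characters still come out wrong, the file is likely in a different code page (852 was the other suggestion in the comments), and the same pattern applies with a different number.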

Sadly, there's no easy way to guess the code page of an 8-bit file. UTF-8 (and the other Unicode encodings) were designed to overcome such issues. Thus, if you have control over how the source files are created, I strongly recommend using UTF-8 files (or another Unicode encoding, though there's no need).
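When the input files cannot be controlled, one common heuristic (not a reliable detector) is to try strict UTF-8 first and fall back to an assumed code page only if the bytes are not valid UTF-8. The sketch below illustrates that idea; ReadTextWithFallback is a hypothetical helper name, and the choice of 1250 as the fallback is an assumption, not something that can be derived from the file:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: decode as strict UTF-8, else use the given fallback encoding.
static string ReadTextWithFallback(string path, Encoding fallback)
{
    byte[] bytes = File.ReadAllBytes(path);
    // throwOnInvalidBytes: true raises an exception instead of silently producing U+FFFD.
    var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
    try
    {
        // Drop a leading byte order mark if the file has one.
        return strictUtf8.GetString(bytes).TrimStart('\uFEFF');
    }
    catch (DecoderFallbackException)
    {
        // Not valid UTF-8: assume the legacy code page supplied by the caller.
        return fallback.GetString(bytes);
    }
}

// Assumption: Windows-1250 is the right fallback for these files.
Encoding cp1250 = CodePagesEncodingProvider.Instance.GetEncoding(1250)
    ?? throw new NotSupportedException("Code page 1250 is not available.");

// Hypothetical sample file containing code page 1250 bytes (invalid as UTF-8).
string path = Path.Combine(Path.GetTempPath(), "fallback-demo.srt");
File.WriteAllBytes(path, cp1250.GetBytes("čćšđž"));

string text = ReadTextWithFallback(path, cp1250);
Console.WriteLine(text);
```

This only distinguishes "valid UTF-8" from "something else". It cannot tell one 8-bit code page from another, and a short legacy-encoded file can occasionally happen to be valid UTF-8.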

Laxmikant · Answer · 2022-08-06 18:56:46Z

0

Try this:

// Read the file as ISO-8859-1, then write the text back out as UTF-8.
string text;
using (var reader = new System.IO.StreamReader(fileName, Encoding.GetEncoding("iso-8859-1")))
{
    text = reader.ReadToEnd();
}
using (var writer = new System.IO.StreamWriter(outFileName, false, Encoding.UTF8))
{
    writer.Write(text);
}


3 Comments

AGlasencnik Over a year ago

Thank you for the response, but unfortunately this didn't work.

Laxmikant Over a year ago

What was the error?

AGlasencnik Over a year ago

There was no error; it just did the same thing that I think Encoding.Default did. Some characters now show up as a special character like "è" instead of "?".
