As the title suggests, my C# code is not reading files correctly: when I try to read characters such as č, ć, š, đ, ž, etc. from a file, I get � instead. I need my program to be able to read all characters, including those from other languages. I also tried passing the Encoding parameter as UTF-8 and as Default, but neither worked. Below is an example of my code.

string[] lines = File.ReadAllLines(filePath, Encoding.UTF8); 
  • How is your file encoded? Commented Aug 6, 2022 at 18:32
  • @RaymondChen I don't really know. When I open the file in Windows Notepad, it says ANSI at the bottom, so I guess it's that? I am reading the lines from an .srt subtitle file. Commented Aug 6, 2022 at 18:35
  • Then your file is probably not UTF-8; it could use one of the specific regional code pages. Instead of Encoding.UTF8, try Encoding.GetEncoding with one of the supported code pages (I personally bet on 852 or 1250). Commented Aug 6, 2022 at 18:38
  • This is not a copy-and-paste answer. You have a text file that uses a non-UTF-8 encoding, and we cannot tell you the exact encoding because we don't have the file. The answer is: you have to specify the encoding your file uses when you read it. There are online tools that can help you identify it, or you can simply try a few. If you get "NotSupported", you can list the supported encodings like so: learn.microsoft.com/en-us/dotnet/api/… Commented Aug 6, 2022 at 18:54
  • @AGlasencnik Please read the docs before just asking for an encoding (hint: call CodePagesEncodingProvider.Instance.GetEncoding(1250)) Commented Aug 6, 2022 at 19:06

2 Answers

The characters č, ć, š, đ, ž suggest that this could be one of the ANSI code pages used in Eastern Europe. I recommend trying

CodePagesEncodingProvider.Instance.GetEncoding(1250)

as the encoding.
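A minimal sketch of that suggestion, assuming the .srt file really is Windows-1250 (Notepad's "ANSI" label only means "the system's legacy code page", so 1250 is a guess, not a certainty). The class and method names `SubtitleReader` and `ReadCp1250Lines` are hypothetical. Depending on the target framework, `CodePagesEncodingProvider` may require the `System.Text.Encoding.CodePages` package.

```csharp
using System;
using System.IO;
using System.Text;

public static class SubtitleReader
{
    // Hypothetical helper: reads every line of a file that is ASSUMED to be
    // encoded in Windows-1250 (Central/Eastern European "ANSI").
    public static string[] ReadCp1250Lines(string filePath)
    {
        // Ask the code-pages provider for Windows-1250; it returns null if
        // that code page is not available, so fail loudly instead.
        Encoding cp1250 = CodePagesEncodingProvider.Instance.GetEncoding(1250)
            ?? throw new NotSupportedException("Code page 1250 is not available.");

        // Decode the file's bytes with code page 1250 instead of UTF-8.
        return File.ReadAllLines(filePath, cp1250);
    }
}
```

Usage would be `string[] lines = SubtitleReader.ReadCp1250Lines(filePath);`. An alternative is to call `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` once at startup, after which `Encoding.GetEncoding(1250)` (as suggested in the comments) also resolves the code page.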

Unfortunately, there is no reliable way to detect the code page of an 8-bit file. UTF-8 (and the other Unicode encodings) were designed to solve exactly this problem. So if you have any control over how the source files are created, strongly insist on UTF-8 files (UTF-16 would also work, but there is no need for it).
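If the files can be changed, one way to follow this advice is to re-encode each legacy file once, so that the original `File.ReadAllLines(filePath, Encoding.UTF8)` call works afterwards. This is a sketch, again assuming Windows-1250 input; `SubtitleConverter` and `ConvertCp1250ToUtf8` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class SubtitleConverter
{
    // Hypothetical helper: re-encodes a file ASSUMED to be Windows-1250 as UTF-8.
    public static void ConvertCp1250ToUtf8(string sourcePath, string targetPath)
    {
        Encoding cp1250 = CodePagesEncodingProvider.Instance.GetEncoding(1250)
            ?? throw new NotSupportedException("Code page 1250 is not available.");

        // Decode the legacy bytes into a .NET string...
        string text = File.ReadAllText(sourcePath, cp1250);

        // ...and write it back out as UTF-8, with a byte order mark so that
        // tools can recognize the file as UTF-8.
        File.WriteAllText(targetPath, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
    }
}
```

The byte order mark is optional; pass `false` instead if your subtitle tools prefer UTF-8 without one.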

Try this:

using System.IO;
using System.Text;

// Read the source file as ISO-8859-1 (Latin-1)...
var sb = new StringBuilder();
using (var reader = new StreamReader(fileName, Encoding.GetEncoding("iso-8859-1")))
// ...and write the decoded text back out as UTF-8 (append: false overwrites outFileName).
using (var writer = new StreamWriter(outFileName, false, Encoding.UTF8))
{
    sb.AppendLine(reader.ReadToEnd());
    writer.Write(sb.ToString());
}

3 Comments

Thank you for the response, but unfortunately this didn't work.
What was the error?
There was no error; I think it just did the same thing as Encoding.Default. Some characters now show up as a special character such as "è" instead of "?".
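The "è" in that last comment is a useful clue. Decoding with a single-byte code page never raises an error; each byte simply maps to whatever character that code page assigns to it. Byte 0xE8 is "è" in ISO-8859-1 but "č" in Windows-1250, which is consistent with the 1250 guess in the first answer (assuming the file is 1250). A small sketch, using a hypothetical `CodePageDemo` class, that decodes the same byte both ways:

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    // Decodes a single byte under two single-byte code pages to show that the
    // wrong code page silently substitutes a different letter.
    public static (string Latin1, string Cp1250) DecodeBoth(byte b)
    {
        byte[] data = { b };

        // ISO-8859-1 is built into .NET; no provider is needed.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        Encoding cp1250 = CodePagesEncodingProvider.Instance.GetEncoding(1250)
            ?? throw new NotSupportedException("Code page 1250 is not available.");

        return (latin1.GetString(data), cp1250.GetString(data));
    }
}
```

Calling `CodePageDemo.DecodeBoth(0xE8)` returns `("è", "č")`.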
