0

I have a simple .txt file and I'd like to know whether there is a way in Java to do what, for example, Notepad++ does with file encoding. It can detect the encoding of a file (UTF-8, ASCII, UTF-16, ...) and, if we want, convert it to another encoding without turning special characters like 'ç' or '€' into strange characters.

Thanks.

2
  • You need to check for the Byte Order Mark (BOM): msdn.microsoft.com/en-us/library/windows/desktop/… Commented Oct 30, 2015 at 9:34
  • Thank you for your comment. Yes, encodings that carry a BOM are easily detected. But, for example, there is UTF-8 with a BOM and UTF-8 without one, and if the file doesn't have a BOM, the problem remains the same. Commented Oct 30, 2015 at 9:39
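
To make the two comments above concrete, here is a minimal sketch using only the JDK: it checks for a BOM first and, when there is none, tests whether the bytes decode as strict UTF-8. The class and method names (`EncodingGuess`, `fromBom`, `isValidUtf8`, `guess`) and the caller-supplied fallback charset are illustrative assumptions, not part of any library, and this is not how Notepad++ or Tika do it. A failed UTF-8 check only rules UTF-8 out; it cannot tell which legacy single-byte encoding was used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {

    /** Returns the charset implied by a leading BOM, or null if no known BOM is present. */
    static Charset fromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        // Limitation of this sketch: a UTF-32LE BOM (FF FE 00 00) also matches here.
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    /** True if the bytes decode as UTF-8 with no malformed sequence (pure ASCII also passes). */
    static boolean isValidUtf8(byte[] b) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** BOM first, then strict UTF-8; otherwise the caller's fallback (an assumption, not a detection). */
    static Charset guess(byte[] b, Charset fallback) {
        Charset bom = fromBom(b);
        if (bom != null) {
            return bom;
        }
        return isValidUtf8(b) ? StandardCharsets.UTF_8 : fallback;
    }
}
```

Text with accented characters saved in a single-byte encoding rarely happens to form valid UTF-8 sequences, which is why the strict decode is a useful test for BOM-less files. Distinguishing between legacy encodings still needs statistical heuristics.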

2 Answers 2

1

Apache Tika has an EncodingDetector with implementations for different contexts. Typically these implementations use heuristics to determine the charset with some probability. If you are interested in the details you can dive into the source.


3 Comments

Thank you for your answer! I already wrote some simple code using Apache Tika, but the probability of it guessing right is around 60%. I really wanted to know how Notepad++ does it :\
@JCaspar but this is not your original question. It is a little bit disappointing to make the effort of an answer just to learn that you meant something different. In the end you will need to dive into the sources of Notepad++ if you want to know how it is implemented.
Of course I don't mean the exact code of Notepad++; I will probably never get to it. It was just a vent about the success rate of my class, and that's why I mentioned Notepad++ in the original question: it has a high success rate at guessing. The question is the one I posted in the first place, and thank you for your attention.
0

You can do that in Java. There is already a discussion about this topic in another thread: Best way to convert text files between character sets?
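
As a rough illustration of the conversion step (not taken from the linked thread), the sketch below decodes a file with a known source charset and re-encodes it with the target charset, using only the JDK. The class name `Transcode` and the method `convert` are hypothetical. It assumes the source charset is already known and that the file fits in memory.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class Transcode {

    /** Reads the source file as charset `from` and writes the same text to target as charset `to`. */
    static void convert(Path source, Charset from, Path target, Charset to) throws IOException {
        // Decode bytes into characters; malformed input is silently replaced with U+FFFD.
        String text = new String(Files.readAllBytes(source), from);
        // Encode the characters again; characters the target cannot represent become '?'.
        Files.write(target, text.getBytes(to));
    }
}
```

Characters such as 'ç' or '€' survive only if the source charset is named correctly and the target charset can represent them. For example, '€' does not exist in ISO-8859-1, so converting to that charset would turn it into '?'.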

1 Comment
