1

I'm working on a big Java web application in Eclipse whose files have different encodings: some are in UTF-8, others in Cp1252, and others still in ISO-8859-1 (with no pattern distinguishing JSPs, Java source files, or CSS). I do know the encoding of each file, though.

I'm converting the project to Maven, and this is a good opportunity to convert all of them to UTF-8.
Of course I don't want to lose a single character (so a blind, fully automated conversion is not an option here).

How should I go about it? Is there a tool that can help me ensure I don't lose any special character?
The webapp is in Italian, so, especially in JSPs, there could be lots of accented letters (HTML entities have probably not been used everywhere).

The project is in Eclipse, but I can use an external editor if that could make the conversion easier.

2
  • Do you know for a fact that some of the files contain non-ASCII characters (i.e. outside 0x20-0x7F)? Commented Sep 11, 2014 at 21:15
  • @JimGarrison Absolutely! Accented letters are undoubtedly present in many files (and other characters could be as well). That's why I need something that will warn me if any character could be converted into a different one. Commented Sep 11, 2014 at 21:20

2 Answers

1

It's very easy to write code to convert encodings - although I'd expect there are tools to do it anyway. Simply:

  • Create one FileInputStream to the existing file, and wrap it in an InputStreamReader with the file's current encoding
  • Create one FileOutputStream to the new file, and wrap it in an OutputStreamWriter with the target encoding (UTF-8 in your case)
  • Loop over the reader, reading characters into a buffer and writing out the contents of that buffer (just as many characters as you read) until you've read the whole file
  • Close all resources (automatic with a try-with-resources block)

The first two steps are simpler with Files.newBufferedReader and Files.newBufferedWriter, too.
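The steps above could be sketched roughly as follows. This is a minimal illustration rather than a finished tool: the class name `EncodingConverter` and the method `convertToUtf8` are made up for this example, and it assumes you pass in the correct source charset for each file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, named for illustration only.
final class EncodingConverter {

    // Copies 'source' (decoded with 'sourceCharset') to 'target' as UTF-8.
    // 'target' must be a different path from 'source', because the writer
    // truncates the target file when it is opened.
    static void convertToUtf8(Path source, Charset sourceCharset, Path target)
            throws IOException {
        // Files.newBufferedReader throws an IOException on malformed or
        // unmappable bytes instead of silently substituting characters.
        try (BufferedReader reader = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int read;
            // Write exactly as many characters as were read on each pass.
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }
}
```

A call might look like `EncodingConverter.convertToUtf8(Paths.get("in.jsp"), Charset.forName("windows-1252"), Paths.get("out.jsp"))`, where the file names are placeholders. Note that ISO-8859-1 assigns a character to every byte value, so decoding with it never fails; an error-free run therefore doesn't prove the source charset was the right one.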


3 Comments

Wouldn't this be the same as opening a file in a good editor (e.g. Notepad++) with a specific encoding and then saving it in another?
@watery: Save in another editor? You shouldn't need to do that - you should be able to save within the same editor. And yes, you can do that - but given that you're asking on Stack Overflow rather than Superuser, I assumed you were asking about doing it programmatically. For example, if you have 100 files you don't want to do them all manually...
No, I meant saving in another encoding (assuming an editor that lets you choose the encoding). Oh, about being on SO rather than SU, well, it's just that I'm used to SO. Feel free to vote for a move if you think another community would be better :-)
0

Converting a single file can be done with the iconv tool (I used LibIconv for Windows).

It lets you specify the source and destination encodings, and warns when characters can't be converted.

I tried it with a couple of source files and all the accented letters were correctly converted to UTF-8 from Cp1252.
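To check a converted file programmatically rather than by eye, one option is to decode the original with its known encoding, decode the result as UTF-8, and compare the two texts. The sketch below assumes the converted file is plain UTF-8 without a byte order mark; the class `ConversionCheck` and its methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, named for illustration only.
final class ConversionCheck {

    // Decodes the whole file strictly: a decoder from newDecoder() reports
    // malformed or unmappable input, so decode() throws a
    // CharacterCodingException (an IOException) instead of substituting.
    static String decodeStrict(Path file, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return charset.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
    }

    // True when both files contain the same characters once each is
    // decoded with its own encoding.
    static boolean sameText(Path original, Charset originalCharset, Path converted)
            throws IOException {
        String before = decodeStrict(original, originalCharset);
        String after = decodeStrict(converted, StandardCharsets.UTF_8);
        return before.equals(after);
    }
}
```

This reads each file fully into memory, which should be fine for source files and JSPs but is worth keeping in mind for very large files.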
