We've got our servers running on CentOS, and our Java backend sometimes has to process a file that was originally generated on a Windows machine (by one of our clients) using CP-1252. However, in 95%+ of cases we are processing UTF-8 files.

My question: if we know that certain files will always be UTF-8, and other files will always be CP-1252, is it possible to specify in Java the character set to use for reading in each file? If so:

  • Do we need to do anything at the systems-level for adding CP-1252 to CentOS? If so, what does this involve?
  • What Java objects would we use to apply the correct encoding on a per file basis?

Thanks in advance!

2 Answers

All you need to do is specify which charset/encoding the original file was written in when using the XXXReader(InputStream in, Charset cs) constructor. For example, look at InputStreamReader.
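A minimal sketch of that constructor in action (the class name and byte values are just for illustration): the two-argument InputStreamReader takes any InputStream plus the Charset to decode it with.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class InputStreamReaderDemo {
    // Decode a byte stream using an explicitly named charset.
    static String decode(byte[] bytes, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), Charset.forName(charsetName))) {
            int ch;
            while ((ch = r.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 0xE9 is 'é' in CP-1252; decoding with the matching charset
        // recovers the intended character.
        byte[] cp1252Bytes = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(decode(cp1252Bytes, "windows-1252")); // café
    }
}
```

Note that "windows-1252" is the canonical name the JDK uses for CP-1252; "Cp1252" is accepted as an alias.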


My question: if we know that certain files will always be UTF-8, and other files will always be CP-1252, is it possible to specify in Java the character set to use for reading in each file?

Assuming you're in charge of the code reading the file, it should be fine. Create a FileInputStream, then wrap it in an InputStreamReader specifying the relevant character encoding.
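A sketch of that per-file approach, assuming you already know out of band which encoding each file uses (the class and method names here are placeholders):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PerFileEncoding {
    // Open a reader for the given file with the encoding appropriate to it.
    // How you decide isUtf8 is up to you -- e.g. a naming convention or
    // per-client configuration; that decision is assumed to happen elsewhere.
    static Reader openReader(String path, boolean isUtf8) throws IOException {
        Charset cs = isUtf8 ? StandardCharsets.UTF_8
                            : Charset.forName("windows-1252");
        return new InputStreamReader(new FileInputStream(path), cs);
    }
}
```

The key point is that the encoding is chosen at the moment each stream is wrapped, so UTF-8 and CP-1252 files can be read side by side in the same JVM with no global setting involved.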

Do we need to do anything at the systems-level for adding CP-1252 to CentOS? If so, what does this involve?

That depends on what the JRE supports. I've never used CentOS, so I don't know whether it's likely to come with the relevant encoding as part of the JRE. You can use Charset.isSupported to check though, and Charset.availableCharsets to list what's available.
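A quick way to run that check, for instance at application startup (the class name is illustrative):

```java
import java.nio.charset.Charset;

public class CharsetCheck {
    public static void main(String[] args) {
        // Verify the encodings this JRE can decode before processing files.
        System.out.println(Charset.isSupported("windows-1252"));
        System.out.println(Charset.isSupported("UTF-8"));

        // List every charset the running JRE knows about, by canonical name.
        Charset.availableCharsets().keySet().forEach(System.out::println);
    }
}
```

In practice both UTF-8 and windows-1252 ship with standard OpenJDK/Oracle JREs regardless of the host OS, so CentOS itself should not need anything extra, but the check is cheap insurance.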
