Our servers run CentOS, and our Java backend sometimes has to process a file that one of our clients generated on a Windows machine using CP-1252. In 95%+ of cases, however, we are processing UTF-8 files.
My question: if we know that certain files will always be UTF-8 and other files will always be CP-1252, is it possible to specify in Java which character set to use when reading each file? If so:
- Do we need to do anything at the system level to add CP-1252 support to CentOS? If so, what does this involve?
- What Java objects would we use to apply the correct encoding on a per-file basis?
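
To make the question concrete, here is a rough sketch of the kind of per-file handling we have in mind. It is only an illustration under a few assumptions: the class name `PerFileCharsetDemo`, the helper `readAll`, and the temp files are made up for this example, and it assumes the JVM on our servers recognises the charset name `windows-1252` (Java's name for CP-1252) without any OS-level setup, which is part of what we are asking.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of our real backend.
public class PerFileCharsetDemo {

    // Assumption: this JVM supports "windows-1252". If it does not,
    // Charset.forName throws UnsupportedCharsetException when the class loads.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Reads the whole file, decoding its bytes with the charset chosen by the caller.
    static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for a normal UTF-8 file and a client-supplied CP-1252 file.
        Path utf8File = Files.createTempFile("utf8-", ".txt");
        Path cp1252File = Files.createTempFile("cp1252-", ".txt");

        Files.write(utf8File, "caf\u00e9".getBytes(StandardCharsets.UTF_8));
        // In CP-1252 the single byte 0x80 is the euro sign.
        Files.write(cp1252File, new byte[] {(byte) 0x80});

        // Each file is read with the charset we already know it uses.
        String fromUtf8 = readAll(utf8File, StandardCharsets.UTF_8);
        String fromCp1252 = readAll(cp1252File, CP1252);

        System.out.println(fromUtf8.length());   // number of decoded characters
        System.out.println(fromCp1252.length());
    }
}
```

The idea is that the charset is passed explicitly for every read, so nothing depends on the platform default encoding of the CentOS box. Whether this is the right approach is exactly what we would like confirmed.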
Thanks in advance!