
I need to programmatically change the encoding of a set of *nix scripts to UTF-8 from Java. I won't write anything else to them, so I'm trying to find the easiest/fastest way to do this. There aren't many files and they aren't very big. I could:

  • "Write" an empty string using an OutputStream with UTF-8 set as the encoding
  • Since I'm already using FileUtils (from Apache Commons IO), read and write the contents of these files, passing UTF-8 as the encoding

Not a big deal, but has anyone run into this case before? Are there any cons to either approach?
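A quick way to see why the first option cannot work: opening a `FileOutputStream` on an existing file (without append mode) truncates it to zero length, and writing an empty string through a UTF-8 `OutputStreamWriter` then writes zero bytes, so the script ends up empty rather than re-encoded. A minimal demonstration on a temporary file, assuming Java 7+; the class name `EmptyWriteDemo` and method `writeEmptyUtf8` are hypothetical:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class EmptyWriteDemo {
    // Hypothetical helper: mimics option 1 by "writing" an empty string as UTF-8.
    static void writeEmptyUtf8(File file) throws IOException {
        // new FileOutputStream(file) truncates an existing file to zero length
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            w.write(""); // writes no bytes at all
        }
    }

    public static void main(String[] args) throws IOException {
        File script = File.createTempFile("demo", ".sh");
        script.deleteOnExit();
        // A tiny ISO-8859-1 "script" containing one non-ASCII character
        Files.write(script.toPath(), "echo caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println("before: " + script.length() + " bytes"); // before: 10 bytes
        writeEmptyUtf8(script);
        System.out.println("after:  " + script.length() + " bytes"); // after:  0 bytes
    }
}
```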

  • The entire file must be read and rewritten, except in the case of normal 7-bit clean ASCII files (and similar) that do not require an initial BOM. A BOM will shift the stream, as will any encoding change. Commented Apr 10, 2012 at 17:16
  • But the default encoding on Unix is UTF-8, I believe. What is the encoding of your scripts? Commented Apr 10, 2012 at 17:18
  • @user384706 Perhaps it is more appropriate to say that non-BOM streams are taken as UTF-8 by many "text" applications... a "default encoding" is more appropriate to talk about in relationship to a particular language/library/API. Commented Apr 10, 2012 at 17:28
  • The scripts are in ISO-8859-1. @pst thanks for clarifying that option 1 is not an option :) Commented Apr 10, 2012 at 17:37
  • @pst stick an answer in so we can get this off the unanswered list Commented Apr 10, 2012 at 18:28
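The point about 7-bit ASCII can be checked at the byte level. ASCII characters have the same single-byte representation in ISO-8859-1 and UTF-8, but a character such as `é` is the single byte `0xE9` in ISO-8859-1 and the two bytes `0xC3 0xA9` in UTF-8, so any file containing non-ASCII text changes length and must be rewritten in full. A minimal sketch, assuming ISO-8859-1 input as stated in the comments; the class `EncodingBytes` and method `latin1ToUtf8` are hypothetical. Java's built-in UTF-8 encoder does not emit a BOM.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingBytes {
    // Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8 (decode to text, then encode).
    static byte[] latin1ToUtf8(byte[] latin1) {
        // Every byte value is valid ISO-8859-1, so this decode never fails
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ascii = "echo hello".getBytes(StandardCharsets.ISO_8859_1);
        byte[] accented = "echo caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        // Pure 7-bit ASCII: identical bytes, so nothing actually changes
        System.out.println(Arrays.equals(ascii, latin1ToUtf8(ascii))); // true
        // 'é' grows from one byte to two, shifting everything after it
        System.out.println(accented.length + " -> " + latin1ToUtf8(accented).length); // 9 -> 10
    }
}
```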

1 Answer


As requested, and since you're using Commons IO, here is example code (error checking thrown to the wind):

    import java.io.File;
    import java.io.IOException;

    import org.apache.commons.io.FileUtils;

    public class Main {
        public static void main(String[] args) throws IOException {
            String filename = args[0];
            File file = new File(filename);
            // Decode the script using its original encoding (ISO-8859-1)...
            String content = FileUtils.readFileToString(file, "ISO-8859-1");
            // ...then write the same text back, encoded as UTF-8
            FileUtils.write(file, content, "UTF-8");
        }
    }
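If Commons IO were not available, the same read-then-rewrite approach could be sketched with the JDK alone (Java 7+ `java.nio.file`), looping over every path given on the command line. The class `ConvertToUtf8` and method `convert` are hypothetical, and ISO-8859-1 is assumed as the source encoding per the comments above:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConvertToUtf8 {
    // Hypothetical helper: decodes the file as ISO-8859-1 and rewrites it as UTF-8.
    static void convert(Path path) throws IOException {
        // Read the whole file into memory before opening it for writing
        byte[] original = Files.readAllBytes(path);
        String content = new String(original, StandardCharsets.ISO_8859_1);
        // Files.write truncates and overwrites the existing file in place,
        // so its permissions (e.g. the executable bit) are kept
        Files.write(path, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            convert(Paths.get(name));
        }
    }
}
```

Usage would be something like `java ConvertToUtf8 *.sh`. Note that this conversion is not idempotent: running it twice on the same file would misread the UTF-8 output as ISO-8859-1 and double-encode every non-ASCII character, so each script should be converted exactly once.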

3 Comments

Is UTF-8 necessary? I think Java's default encoding is UTF-8 anyway.
There are a couple of things to say here. First, the default is unlikely to be UTF-8; second, because this code is all about encodings, it is best to be explicit. stackoverflow.com/questions/1006276/…
WARNING: For some reason this cuts files longer than several KB, essentially deleting the file's contents beyond a certain point
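To illustrate why being explicit matters: the platform default charset depends on the OS, locale, and JDK version (since JDK 18, JEP 400 makes UTF-8 the default), whereas a charset passed explicitly decodes the same way everywhere. A minimal sketch; the class `DefaultCharsetDemo` and method `decodeLatin1` are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Hypothetical helper: decodes bytes with an explicit charset, ignoring the platform default.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Varies by platform, locale and JDK version
        System.out.println("default charset: " + Charset.defaultCharset());
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};
        // Explicit ISO-8859-1: always decodes to "café"
        System.out.println(decodeLatin1(latin1));
        // Misread as UTF-8: the lone 0xE9 byte is malformed and is replaced by U+FFFD
        System.out.println(new String(latin1, StandardCharsets.UTF_8));
    }
}
```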
