I created a text file on a Windows system, where I believe the default encoding is ANSI. The contents of the file look like this:

    This is\u2019 a sample text file \u2014and it can ....

I saved the file using the default Windows encoding, although other encodings such as UTF-8 and UTF-16 were also available.
Now I want to write a simple Java function that takes an input string and replaces each of these Unicode characters with a corresponding ASCII character.
For example, \u2019 should be replaced with "'", \u2014 should be replaced with "-", and so on.
Observation: when I create a string literal like this

    String s = "This is\u2019 a sample text file \u2014and it can ....";

my code works fine, but when I read the same text from the file it does not work. I am aware that a Java String uses UTF-16 encoding.
Below is the code that I am using to read the input file:

    FileReader fileReader = new FileReader(new File("C:\\input.txt"));
    BufferedReader bufferedReader = new BufferedReader(fileReader);
    String record = bufferedReader.readLine();

I also tried using an InputStream and setting the Charset to UTF-8, but got the same result.
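For reference, here is a minimal sketch of reading a line with an explicitly named charset instead of relying on the platform default that `FileReader(File)` uses. It assumes the "ANSI" file was saved as windows-1252, which is common on Western Windows installations but is only an assumption here. The class name `ReadWithCharset` and the helper `readFirstLine` are hypothetical, and the demo writes its own temporary file rather than touching `C:\input.txt`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWithCharset {

    // Hypothetical helper: reads the first line, decoding bytes with the given charset.
    public static String readFirstLine(Path path, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // In windows-1252 the byte 0x92 is the right single quotation mark (U+2019).
        Path tmp = Files.createTempFile("input", ".txt");
        Files.write(tmp, new byte[] {'i', 's', (byte) 0x92});

        // Assumption: the file is windows-1252; swap in the real encoding if it differs.
        String record = readFirstLine(tmp, Charset.forName("windows-1252"));
        System.out.println((int) record.charAt(2) == 0x2019); // prints "true"

        Files.delete(tmp);
    }
}
```

If the file really holds a curly quote as a single byte, decoding with the matching charset should yield U+2019 in the String. If it instead holds the six plain ASCII characters of the escape sequence, the charset choice would not change what is read.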
Replacement code:

    public static String removeUTFCharacters(String data) {
        for (Entry<String, String> entry : utfChars.entrySet()) {
            data = data.replaceAll(entry.getKey(), entry.getValue());
        }
        return data;
    }

Map:

    utfChars.put("\u2019", "'");
    utfChars.put("\u2018", "'");
    utfChars.put("\u201c", "\"");
    utfChars.put("\u201d", "\"");
    utfChars.put("\u2013", "-");
    utfChars.put("\u2014", "-");
    utfChars.put("\u2212", "-");
    utfChars.put("\u2022", "*");

Can anybody help me understand the concept and the solution to this problem?
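To make the behaviour easy to reproduce, here is a self-contained sketch of the replacement step using the same mappings. The class name `UnicodeToAscii` is hypothetical, and the sketch uses `String.replace` rather than `replaceAll` so that the map keys are treated as literal text and not as regular expressions. It also shows one possible explanation for the difference described above: in Java source the compiler turns the escape in a literal into a single character, whereas text that contains a backslash followed by `u2019` as six separate characters does not match any key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UnicodeToAscii {

    // Same mappings as in the question; each key is a single real Unicode character.
    private static final Map<String, String> UTF_CHARS = new LinkedHashMap<>();
    static {
        UTF_CHARS.put("\u2019", "'");
        UTF_CHARS.put("\u2018", "'");
        UTF_CHARS.put("\u201c", "\"");
        UTF_CHARS.put("\u201d", "\"");
        UTF_CHARS.put("\u2013", "-");
        UTF_CHARS.put("\u2014", "-");
        UTF_CHARS.put("\u2212", "-");
        UTF_CHARS.put("\u2022", "*");
    }

    public static String removeUTFCharacters(String data) {
        for (Map.Entry<String, String> entry : UTF_CHARS.entrySet()) {
            // replace() matches literally; replaceAll() would parse the key as a regex.
            data = data.replace(entry.getKey(), entry.getValue());
        }
        return data;
    }

    public static void main(String[] args) {
        // The compiler converts each escape into one character, so the keys match.
        String literal = "This is\u2019 a sample text file \u2014and it can ....";
        System.out.println(removeUTFCharacters(literal));
        // prints: This is' a sample text file -and it can ....

        // Six separate characters (backslash, u, 2, 0, 1, 9): no key matches.
        String escaped = "This is\\u2019 a sample";
        System.out.println(removeUTFCharacters(escaped).equals(escaped)); // prints "true"
    }
}
```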
Does the file literally contain the six characters '\','u','2','0','1','9'? Can you dump the String after you read it? Something like

    for (int i = 0; i < record.length(); i++) System.out.printf("%04X ", (int) record.charAt(i));
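A runnable sketch of that kind of dump, assuming a hypothetical class `DumpChars` with a helper `toHex`, could look like the following. It prints each UTF-16 code unit of a String as four hex digits, which makes it visible whether the String holds one curly-quote character or the six-character escape sequence.

```java
public class DumpChars {

    // Hypothetical helper: formats every char of s as a four-digit hex value.
    public static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("%04X ", (int) s.charAt(i)));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // One real character after the letter s.
        System.out.println(toHex("s\u2019"));  // prints: 0073 2019

        // Backslash, u, 2, 0, 1, 9 as six separate characters after the letter s.
        System.out.println(toHex("s\\u2019")); // prints: 0073 005C 0075 0032 0030 0031 0039
    }
}
```

Calling `toHex(record)` on the line read from the file should show which of the two cases applies.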