7

I have a file with strings hand-typed as \u00C3. I want to create, in Java, the Unicode character that each escape represents. I tried but couldn't figure out how. Help.

Edit: When I read the text file, the String contains "\u00C3" not as a Unicode character but as the ASCII chars '\' 'u' '0' '0' 'C' '3'. I would like to build the Unicode character from that ASCII string.
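
To make the distinction concrete, here is a small illustrative sketch (the class `EscapeDemo` is a hypothetical name, not part of the original question). A Unicode escape written in Java source is decoded by the compiler into one char, whereas the text read from the file stays six separate ASCII chars:

```java
import java.util.Arrays;

public class EscapeDemo {
    // The compiler decodes a real Unicode escape in source code into one char.
    static final String REAL = "\u00C3";
    // readLine() on the hand-typed file yields six separate ASCII chars; the
    // doubled backslash stops the compiler from decoding this one.
    static final String TYPED = "\\u00C3";

    public static void main(String[] args) {
        System.out.println(REAL.length());  // 1
        System.out.println(TYPED.length()); // 6
        System.out.println(Arrays.toString(TYPED.toCharArray())); // [\, u, 0, 0, C, 3]
    }
}
```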

  • How is the file formatted? Are those strings one to a line, or what? Commented Feb 14, 2011 at 21:15
  • Yes, each one is on its own line (sorry, I can't reproduce line breaks here): \u0103 \u0104 \u0105 \u01CD Commented Feb 14, 2011 at 21:16

5 Answers

6

I picked this up somewhere on the web:

```java
String unescape(String s) {
    int i = 0, len = s.length();
    char c;
    StringBuilder sb = new StringBuilder(len);
    while (i < len) {
        c = s.charAt(i++);
        if (c == '\\') {
            if (i < len) {
                c = s.charAt(i++);
                if (c == 'u') {
                    // check that 4 more chars exist and are all hex digits
                    // (parseInt alone would also accept a leading sign)
                    if (i + 4 > len || !s.substring(i, i + 4).matches("[0-9a-fA-F]{4}")) {
                        throw new IllegalArgumentException("Bad escape at index " + (i - 2));
                    }
                    c = (char) Integer.parseInt(s.substring(i, i + 4), 16);
                    i += 4;
                }
                // add other cases here as desired...
            }
        }
        // fall through: \ escapes itself, quotes any character but u
        sb.append(c);
    }
    return sb.toString();
}
```
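
One consequence of decoding each escape to a single `char`: characters outside the Basic Multilingual Plane, which are written as two escapes forming a UTF-16 surrogate pair (U+1F600 is `\uD83D\uDE00`), still come out right, because the halves are appended in order. A minimal sketch of just that appending step, using a hypothetical helper `fromUnits` rather than the full function above:

```java
public class SurrogateDemo {
    // Appends each 16-bit value as a char, the same way the unescape loop
    // appends every escape it decodes.
    static String fromUnits(int... units) {
        StringBuilder sb = new StringBuilder();
        for (int u : units) {
            sb.append((char) u);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F600 in escaped form is two escapes: D83D followed by DE00.
        String s = fromUnits(0xD83D, 0xDE00);
        System.out.println(s.length());                            // 2 (chars)
        System.out.println(s.codePointCount(0, s.length()));       // 1 (character)
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600
    }
}
```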

2 Comments

Worked like a charm, thank you! I had been struggling for a good 4 hours. If I may ask, what did you search for on Google to find the solution? :)
As I recall, it was something like java unescape string
2

Dang, I was a bit slow. Here's my solution:

```java
package ravi;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Pattern;

public class Ravi {
    // Matches a whole line made of a backslash, 'u' and four hex digits
    private static final Pattern UCODE_PATTERN = Pattern.compile("\\\\u[0-9a-fA-F]{4}");

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader("ravi.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!UCODE_PATTERN.matcher(line).matches()) {
                    System.err.println("Bad input: " + line);
                } else {
                    // Skip the backslash and 'u', then parse the four hex digits
                    String hex = line.substring(2, 6);
                    int number = Integer.parseInt(hex, 16);
                    System.out.println(hex + " -> " + ((char) number));
                }
            }
        }
    }
}
```
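
The pattern above only accepts lines consisting of exactly one escape. If escapes can also appear in the middle of other text, a variant (a sketch, not part of the original answer; `InlineUnescape` and `unescapeAll` are hypothetical names) can use `Matcher.find` with `appendReplacement`/`appendTail` to decode every occurrence and leave the rest of the line unchanged:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineUnescape {
    // A backslash, a 'u', then exactly four hex digits (captured as group 1).
    private static final Pattern UCODE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every escape found anywhere in the line; other text is kept as-is.
    static String unescapeAll(String line) {
        Matcher m = UCODE.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops a decoded '$' or backslash from being
            // read as replacement syntax.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeAll("A\\u0103B\\u0104C"));
    }
}
```

Note that this sketch does not treat a doubled backslash before the `u` specially; the escape after it is still decoded.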

2

StringEscapeUtils.unescapeJava works fine :)

see: https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html#unescapeJava(java.lang.String)

1

If you want to unescape only Unicode escapes and nothing else, you can do it programmatically with a function like this:

```java
import org.apache.commons.text.translate.UnicodeUnescaper;

private String unicodeUnescape(String string) {
    return new UnicodeUnescaper().translate(string);
}
```

This uses org.apache.commons.text.translate.UnicodeUnescaper.

0

Probably something along these lines:

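```java
import java.io.File;
import java.util.Scanner;

// new Scanner(File) throws FileNotFoundException, so the enclosing method must handle or declare it
try (Scanner s = new Scanner(new File("myNumbers"))) {
    while (s.hasNextLine()) {
        // chars 2..5 of each line are the four hex digits after the backslash and 'u'
        System.out.println((char) Integer.parseInt(s.nextLine().substring(2, 6), 16));
    }
}
```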