Edited by Edwin Dalorzo: formatting, wording, grammar

Java UTF-8 encoding

I have a scenario in which some special characters are stored in a Sybase database in the system's default encoding, and I have to fetch this data and send it to a third party in UTF-8 encoding using a Java program.

There is a precondition that the data sent to the third party must not exceed a defined maximum size. Upon conversion to UTF-8, a single character may be encoded as 2 or 3 bytes (up to 4 for some characters). So my logic is that after getting the data from the database I must encode it into UTF-8 and then split the string. The following are my observations:

When a special character such as a Chinese or Greek character (or any character outside the ASCII range, i.e. above code point 127) is encountered and I convert it to UTF-8, that single character may be represented by more than one byte.
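To make the byte counts concrete, here is a small standalone check. The sample characters are arbitrary choices, not data from the question. It uses Unicode escapes so the result does not depend on the encoding the source file is saved in. It also shows that even a character below 256, such as `é`, already takes two bytes in UTF-8, because only code points up to 127 are single-byte.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteCounts {
    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source-file encoding.
        String[] samples = {
            "A",            // U+0041 LATIN CAPITAL LETTER A (ASCII, 1 byte)
            "\u00E9",       // U+00E9 e with acute accent (below 256, but not ASCII)
            "\u03B1",       // U+03B1 GREEK SMALL LETTER ALPHA
            "\u4E2D",       // U+4E2D a CJK ideograph
            "\uD83D\uDE00"  // U+1F600 an emoji, stored as a surrogate pair in Java
        };
        for (String s : samples) {
            int utf8Bytes = s.getBytes(StandardCharsets.UTF_8).length;
            // s.length() counts UTF-16 chars, which is not the same as UTF-8 bytes
            System.out.printf("U+%04X: %d char(s), %d UTF-8 byte(s)%n",
                    s.codePointAt(0), s.length(), utf8Bytes);
        }
    }
}
```

This should report 1, 2, 2, 3 and 4 UTF-8 bytes respectively. The last sample shows that a single character can also occupy two Java `char`s.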

So how can I be sure that the conversion is correct? For the conversion I am using the following code:

```java
// storing the data from the database in a String
// (getDataFromDatabase() is a placeholder for the actual database fetch)
String s = getDataFromDatabase();
// converting all the data to a byte array in UTF-8 encoding
byte[] b = s.getBytes("UTF-8");
// creating a new String, as my split logic is based on the String form
String newString = new String(b, "UTF-8");
```

But when I output this newString to the console, I get question marks (?) in place of the special characters.
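The following is a minimal sketch for checking the round trip without trusting the console. It assumes the text has already been read correctly from the database into a Java `String`; the sample text below is hypothetical. A Java `String` is UTF-16 internally, so for well-formed text, encoding it to UTF-8 and decoding the same bytes with UTF-8 should give back an equal `String`. If `equals` returns true, the `?` marks most likely come from the console rather than the conversion. That is an assumption, but `System.out` encodes output with a platform-dependent charset that may not cover these characters.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class RoundTripCheck {

    /** Formats every code point of s as U+XXXX, independent of the console encoding. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Hypothetical stand-in for the value fetched from the database.
        String s = "Caf\u00E9 \u03B1\u03B2\u03B3 \u4E2D\u6587";

        byte[] b = s.getBytes("UTF-8");
        String newString = new String(b, "UTF-8");

        // 1. The round trip is lossless if both strings are equal.
        System.out.println("round trip equal: " + s.equals(newString));

        // 2. Inspect the actual characters without relying on the console.
        System.out.println(codePoints(newString));

        // 3. Print through a stream that explicitly encodes as UTF-8; this only
        //    displays correctly if the terminal itself is set to UTF-8.
        PrintStream utf8Out =
                new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8");
        utf8Out.println(newString);
    }
}
```

If step 2 lists the expected code points but the console still shows `?`, the data is intact and only the display is at fault.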

So I have some questions:

  • If my conversion logic is wrong, how can I correct it?
  • After converting to UTF-8, how can I verify that the conversion is correct, i.e. that the result is exactly the message that must be sent to the third party? I assume that if the message is not human-readable after conversion, something is wrong with the conversion.
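For the second question, one way to verify the bytes actually being sent is a strict decoder that rejects malformed UTF-8, followed by a comparison with the original text. This checks the byte array itself rather than a re-decoded `String`. It is a sketch: `isFaithfulUtf8` is a hypothetical helper name. The check can only confirm the conversion from the Java `String`, not whether the data was read correctly from Sybase in the first place.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Verifier {

    /** Returns true if payload is well-formed UTF-8 and decodes back to expected. */
    static boolean isFaithfulUtf8(byte[] payload, String expected) {
        // REPORT makes the decoder throw instead of silently inserting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            String decoded = decoder.decode(ByteBuffer.wrap(payload)).toString();
            return decoded.equals(expected);
        } catch (CharacterCodingException e) {
            return false; // not valid UTF-8
        }
    }

    public static void main(String[] args) {
        String original = "\u03B1\u4E2D";
        byte[] good = original.getBytes(StandardCharsets.UTF_8);
        // Dropping the last byte cuts the 3-byte CJK character in half.
        byte[] truncated = Arrays.copyOf(good, good.length - 1);

        System.out.println(isFaithfulUtf8(good, original));      // true
        System.out.println(isFaithfulUtf8(truncated, original)); // false
    }
}
```

Human readability on a console is not a reliable test. A strict decode plus an equality check tests the payload directly.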

I would appreciate the views of the experts here.

Please let me know if you need any further information from my side.
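This sketch addresses the size limit described at the start. Splitting the re-decoded `String` by character count does not bound its UTF-8 size, and cutting the byte array at an arbitrary offset can break a multi-byte character in two. It assumes the limit applies to the UTF-8 byte length of each chunk and that the text contains no unpaired surrogates; `Utf8Splitter` and `splitByUtf8Length` are hypothetical names. The approach walks the string code point by code point, adds up each one's UTF-8 length, and starts a new chunk before the limit would be exceeded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8Splitter {

    /** Splits text into chunks of at most maxBytes UTF-8 bytes, never cutting a code point. */
    public static List<String> splitByUtf8Length(String text, int maxBytes) {
        if (maxBytes < 4) {
            // A single code point can need up to 4 bytes in UTF-8.
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> chunks = new ArrayList<>();
        int chunkStart = 0; // char index where the current chunk begins
        int chunkBytes = 0; // UTF-8 bytes accumulated in the current chunk
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int cpBytes = utf8Length(cp);
            if (chunkBytes + cpBytes > maxBytes) {
                // Close the current chunk before this code point would overflow it.
                chunks.add(text.substring(chunkStart, i));
                chunkStart = i;
                chunkBytes = 0;
            }
            chunkBytes += cpBytes;
            i += Character.charCount(cp); // 2 chars for a surrogate pair
        }
        if (chunkStart < text.length()) {
            chunks.add(text.substring(chunkStart));
        }
        return chunks;
    }

    /** Number of bytes UTF-8 uses for a single (non-surrogate) code point. */
    static int utf8Length(int cp) {
        if (cp < 0x80) return 1;
        if (cp < 0x800) return 2;
        if (cp < 0x10000) return 3;
        return 4;
    }

    public static void main(String[] args) {
        // "A" (1 byte), Greek alpha (2), a CJK ideograph (3), "B" (1)
        String text = "A\u03B1\u4E2DB";
        for (String chunk : splitByUtf8Length(text, 4)) {
            System.out.println(chunk.getBytes(StandardCharsets.UTF_8).length);
        }
    }
}
```

With `maxBytes = 4`, the sample is split into `Aα` (3 bytes) and `中B` (4 bytes), so the program prints 3 and then 4. Each chunk can then be encoded with `getBytes(StandardCharsets.UTF_8)` and sent separately.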

Tags edited by skaffman
