
Let's say that on the C++ side my function takes a variable of type jstring named myString. I can convert it to an ANSI string as follows:

const char* ansiString = env->GetStringUTFChars(myString, 0); 

Is there a way of getting

const wchar_t* unicodeString = ...


If this helps someone... I've used this function for an Android project:

std::wstring Java_To_WStr(JNIEnv *env, jstring string)
{
    std::wstring value;

    const jchar *raw = env->GetStringChars(string, 0);
    jsize len = env->GetStringLength(string);
    const jchar *temp = raw;

    while (len > 0)
    {
        value += *(temp++);
        len--;
    }

    env->ReleaseStringChars(string, raw);
    return value;
}

An improved solution could be (Thanks for the feedback):

std::wstring Java_To_WStr(JNIEnv *env, jstring string)
{
    std::wstring value;

    const jchar *raw = env->GetStringChars(string, 0);
    jsize len = env->GetStringLength(string);

    value.assign(raw, raw + len);

    env->ReleaseStringChars(string, raw);
    return value;
}

2 Comments

Neat, though I suspect loading the wstring with a buffer in one go would be more efficient than one character at a time.
Does the C++ compiler notice that you are returning an automatic, and allocate it on the heap and not the stack?

JNI has a GetStringChars() function as well. The return type is const jchar*, jchar is 16-bit on win32 so in a way that would be compatible with wchar_t. Not sure if it's real UTF-16 or something else...
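Whether that reinterpretation is layout-safe can be checked mechanically; a minimal sketch (the helper name is made up, and `jchar_like` stands in for jni.h's typedef so the snippet compiles without a JDK):

```cpp
#include <cstddef>

// Hypothetical helper (not from the answer): jni.h typedefs jchar to a 16-bit
// unsigned type, so a direct jchar* -> wchar_t* reinterpretation is only
// layout-safe where wchar_t is also 16 bits (win32); on most Unix systems
// wchar_t is 32 bits and a real conversion is required.
typedef unsigned short jchar_like;   // stand-in for jchar, to avoid jni.h

inline bool jchar_cast_is_safe() {
    return sizeof(wchar_t) == sizeof(jchar_like);
}
```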

6 Comments

Do you happen to know if jchar's byte ordering is compatible with the Win32 wchar_t one? It should be, but probably good to be sure. :-)
jchar is typedef'ed to unsigned short. I haven't tried it myself but my guess would be "yes".
char == jchar == unsigned 16 bits
char == unsigned 16 bits on which platform?
To keep this a little less confusing, in this discussion "char" is the built-in data type in the ISO C and C++ languages, not in Java.

And who frees wsz? I would recommend STL!

std::wstring JavaToWSZ(JNIEnv* env, jstring string)
{
    std::wstring value;
    if (string == NULL)
        return value; // empty string

    const jchar* raw = env->GetStringChars(string, NULL);
    if (raw != NULL)
    {
        jsize len = env->GetStringLength(string);
        value.assign(raw, raw + len);
        env->ReleaseStringChars(string, raw);
    }
    return value;
}

4 Comments

Not a great solution unless using C++11 since the wstring will be returned by value. (Obviously post C++11 it'll be move constructed which would be efficient)
value.assign(raw, len); is not valid. I think it should be value.assign(raw, raw + len); but I haven't tested yet.
Great - worked for me perfectly in a C# -> C++/CLI -> JNI -> Java application!
Don't you have to call ReleaseStringChars regardless of success of GetStringChars otherwise the jstring may be pinned and 'leak'

A portable and robust solution is to use iconv, with the understanding that you have to know what encoding your system wchar_t uses (UTF-16 on Windows, UTF-32 on many Unix systems, for example).

If you want to minimise your dependency on third-party code, you can also hand-roll your own UTF-8 converter. This is easy if converting to UTF-32, somewhat harder with UTF-16 because you have to handle surrogate pairs too. :-P Also, you must be careful to reject non-shortest forms, or it can open up security bugs in some cases.
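To make the surrogate-pair and non-shortest-form points concrete, here is a minimal hand-rolled UTF-8 to UTF-32 decoder (a sketch only, not production code; it assumes standard UTF-8 input rather than JNI's modified UTF-8, and rejects overlong forms as advised above):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hand-rolled UTF-8 -> UTF-32 decoder (sketch). Rejects non-shortest
// ("overlong") forms by checking the decoded value against the minimum
// code point representable at each sequence length.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        size_t n;
        char32_t c, min;
        if (b < 0x80)                { n = 1; c = b;        min = 0x00;    }
        else if ((b & 0xE0) == 0xC0) { n = 2; c = b & 0x1F; min = 0x80;    }
        else if ((b & 0xF0) == 0xE0) { n = 3; c = b & 0x0F; min = 0x800;   }
        else if ((b & 0xF8) == 0xF0) { n = 4; c = b & 0x07; min = 0x10000; }
        else throw std::runtime_error("invalid lead byte");
        if (i + n > in.size())
            throw std::runtime_error("truncated sequence");
        for (size_t k = 1; k < n; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::runtime_error("bad continuation byte");
            c = (c << 6) | (cont & 0x3F);
        }
        if (c < min)
            throw std::runtime_error("non-shortest form rejected");
        out += c;
        i += n;
    }
    return out;
}
```

Rejecting overlong forms matters because an attacker could otherwise smuggle, say, a `/` past a filter as the two-byte sequence `0xC0 0xAF`.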

4 Comments

You're suggesting converting the jstring to UTF-8 then back to UTF-16? Is that really necessary?
@Rup jstrings already are UTF-8: "The JNI uses modified UTF-8 strings to represent various string types. Modified UTF-8 strings are the same as those used by the Java VM. Modified UTF-8 strings are encoded so that character sequences that contain only non-null ASCII characters can be represented using only one byte per character, but all Unicode characters can be represented.....The Java VM does not recognize the four-byte format of standard UTF-8; it uses its own two-times-three-byte format instead."
@b1naryatr0phy Really? jni.h on my system (both 1.6 and 1.7) has typedef unsigned short jchar; which looks more like UTF-16 to me.
I must be misunderstanding something then, that quote was pulled directly from Oracle's documentation: http://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html Feel free to explain if you can, I'm still trying to wrap my head around this.
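The confusion in the thread above can be made concrete. In memory (GetStringChars), a jstring is UTF-16 code units; only GetStringUTFChars yields modified UTF-8 bytes, which differ from standard UTF-8 in exactly two ways per the JNI spec: U+0000 becomes the overlong pair 0xC0 0x80, and supplementary characters become two 3-byte-encoded surrogates (six bytes) instead of one four-byte sequence. A sketch of such an encoder (the function name is hypothetical, not from any answer):

```cpp
#include <cstdint>
#include <string>

// Encodes one code point as JNI "modified UTF-8" (facts from the JNI spec):
// identical to standard UTF-8 except that U+0000 -> 0xC0 0x80, and
// supplementary characters -> two 3-byte surrogates, never a 4-byte form.
std::string modified_utf8_encode(char32_t c) {
    std::string out;
    auto put3 = [&out](char32_t v) {          // 3-byte encoding of a BMP value
        out += static_cast<char>(0xE0 | (v >> 12));
        out += static_cast<char>(0x80 | ((v >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (v & 0x3F));
    };
    if (c == 0) {                             // NUL: overlong pair, so the
        out += '\xC0';                        // byte stream never contains 0x00
        out += '\x80';
    } else if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        put3(c);
    } else {                                  // supplementary: surrogate pair
        c -= 0x10000;
        put3(0xD800 + (c >> 10));
        put3(0xDC00 + (c & 0x3FF));
    }
    return out;
}
```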

I know this was asked a year ago, but I don't like the other answers so I'm going to answer anyway. Here's how we do it in our source:

wchar_t * JavaToWSZ(JNIEnv* env, jstring string)
{
    if (string == NULL)
        return NULL;

    int len = env->GetStringLength(string);
    const jchar* raw = env->GetStringChars(string, NULL);
    if (raw == NULL)
        return NULL;

    wchar_t* wsz = new wchar_t[len + 1];
    memcpy(wsz, raw, len * 2);   // assumes 2-byte wchar_t
    wsz[len] = 0;

    env->ReleaseStringChars(string, raw);
    return wsz;
}

EDIT: This solution works well on platforms where wchar_t is 2 bytes, some platforms have a 4 byte wchar_t in which case this solution will not work.
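A sketch of a variant that handles both cases (a hypothetical helper, not from the original answer; it assumes wchar_t holds either UTF-16 or UTF-32, and takes the buffer as `uint16_t*` so the sketch compiles without jni.h):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Portable replacement for the raw memcpy above: copies directly when
// wchar_t is 16-bit, and decodes UTF-16 surrogate pairs into single
// code points when wchar_t is 32-bit.
std::wstring jchars_to_wstring(const uint16_t* raw, size_t len) {
    if (sizeof(wchar_t) == sizeof(uint16_t))
        return std::wstring(raw, raw + len);   // wchar_t is UTF-16: copy as-is

    std::wstring out;                          // wchar_t is UTF-32: decode pairs
    for (size_t i = 0; i < len; ++i) {
        uint32_t c = raw[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len &&
            raw[i + 1] >= 0xDC00 && raw[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            c = 0x10000 + ((c - 0xD800) << 10) + (raw[i + 1] - 0xDC00);
            ++i;
        }
        out += static_cast<wchar_t>(c);
    }
    return out;
}
```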

6 Comments

This solution is wrong. I lost 12 hours because of it. wchar_t and jchar are not necessarily the same. The proof is the output of my test program: 01-26 20:28:43.675: E/[LMI-NATIVE](9280): len: 7, jchar: 2, wchar: 4
@Kobor42 - What does your test program do? Are you saying that you found an instance where wchar_t was 4 bytes? I didn't actually realise it but this function was designed to run (primarily) on Windows where wchar_t is always 2. I now realise wchar_t is compiler specific and may be different on your platform.
Exactly. On Android prior to 2.1, wchar_t is 1 byte; on 2.1 and later it is 4 bytes.
You're mixing potentially incompatible types. A Java jchar is always UTF-16. But wchar_t is not always UTF-16, sometimes it is UTF-32. In such cases you need to convert UTF-16 to UTF-32 (it's NOT just a matter of padding jchar to 4 bytes, see en.wikipedia.org/wiki/UTF-16 for details).
I'm not mixing it. The NDK is mixing it. I would like to convert Java strings to C strings without information loss.

If we are not interested in cross-platform compatibility, on Windows you can use the MultiByteToWideChar function, or the helpful macro A2W (ref. example).


Just use env->GetStringChars(myString, 0); Java passes strings as Unicode by nature.


Rather simple. But do not forget to free the memory with ReleaseStringChars:

JNIEXPORT jboolean JNICALL Java_TestClass_test(JNIEnv * env, jobject, jstring string)
{
    const wchar_t * utf16 = (const wchar_t *) env->GetStringChars(string, NULL);
    ...
    env->ReleaseStringChars(string, (const jchar *) utf16);
}


I convert jstring -> char* -> wchar_t*:

char* js2c(JNIEnv* env, jstring jstr)
{
    char* rtn = NULL;
    jclass clsstring = env->FindClass("java/lang/String");
    jstring strencode = env->NewStringUTF("utf-8");
    jmethodID mid = env->GetMethodID(clsstring, "getBytes", "(Ljava/lang/String;)[B");
    jbyteArray barr = (jbyteArray) env->CallObjectMethod(jstr, mid, strencode);
    jsize alen = env->GetArrayLength(barr);
    jbyte* ba = env->GetByteArrayElements(barr, NULL);
    if (alen > 0)
    {
        rtn = (char*) malloc(alen + 1);
        memcpy(rtn, ba, alen);
        rtn[alen] = 0;
    }
    env->ReleaseByteArrayElements(barr, ba, 0);
    return rtn;
}

jstring c2js(JNIEnv* env, const char* str)
{
    jstring rtn = 0;
    int slen = strlen(str);
    unsigned short* buffer = 0;
    if (slen == 0)
        rtn = env->NewStringUTF(str);
    else
    {
        int length = MultiByteToWideChar(CP_ACP, 0, (LPCSTR) str, slen, NULL, 0);
        buffer = (unsigned short*) malloc(length * 2 + 1);
        if (MultiByteToWideChar(CP_ACP, 0, (LPCSTR) str, slen, (LPWSTR) buffer, length) > 0)
            rtn = env->NewString((jchar*) buffer, length);
        free(buffer);
    }
    return rtn;
}

jstring w2js(JNIEnv* env, wchar_t* src)
{
    size_t len = wcslen(src) + 1;
    size_t converted = 0;
    char* dest = (char*) malloc(len * sizeof(char));
    wcstombs_s(&converted, dest, len, src, _TRUNCATE);
    jstring dst = c2js(env, dest);
    free(dest);
    return dst;
}

wchar_t* js2w(JNIEnv* env, jstring src)
{
    char* dest = js2c(env, src);
    size_t len = strlen(dest) + 1;
    size_t converted = 0;
    wchar_t* dst = (wchar_t*) malloc(len * sizeof(wchar_t));
    mbstowcs_s(&converted, dst, len, dest, _TRUNCATE);
    free(dest);
    return dst;
}


Here is how I converted jstring to LPWSTR.

const char* nativeString = env->GetStringUTFChars(javaString, 0);
size_t size = strlen(nativeString) + 1;

LPWSTR lpwstr = new wchar_t[size];
size_t outSize;
mbstowcs_s(&outSize, lpwstr, size, nativeString, size - 1);

env->ReleaseStringUTFChars(javaString, nativeString);
