I've been searching for hours today and just can't find anything that works out for me. The one I've just had a look at, with no luck, is "How to convert UTF-8 encoded std::string to UTF-16 std::string".

My question is, with a brief explanation:

I want to make a valid NTLM hash in std C++, and I'm using OpenSSL's library to create the hash using its MD4 routines. I know how to do that, so does anyone know how to convert the std::string into a UTF-16 LE encoded string which I can pass to the MD4 functions to get a correct digest?

So, can I have a std::string which holds the char type, and convert it to a UTF16-LE encoded variable length std::string_type? Whether that be std::u16string, or std::wstring?

And would I use s.c_str() or s.data() and would the length() function report correctly in both cases?

  • Your title question is clear, your question body is not. Are you aware that UTF-16 is still variable-length? That you would hold a UTF-16 string in a std::u16string, not a std::string? -- Could you please focus down the question? It's a bit all over the place right now. Commented Oct 8, 2018 at 13:44
  • Thank you DevSolar. You are right. It's late at night and I'm a bit frustrated, so that came out a bit of a mess. I am aware that UTF16 is variable length, so I'm looking for std::string to std::u16string or std::wstring (if that works). I think the better question is perhaps: can I have a std::string which holds the char type, and convert it to a UTF16-LE encoded variable length std::string_type? Whether that be std::u16string, or std::wstring. Commented Oct 8, 2018 at 13:50
  • About the last question, length() will always correctly return the number of char-type elements in the string object -- char for std::string, char16_t for std::u16string, wchar_t for std::wstring. That is the number of code units, which does not (necessarily) equal the number of code points, of course. ;-) Commented Oct 8, 2018 at 14:51
  • The conversion has to pass through these steps: UTF-8 -> Unicode code point -> UTF-16. There is no way to go from 8 to 16 without knowing the code point. Commented Nov 18, 2019 at 11:28
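
The two-step route described in the last comment (UTF-8 -> code point -> UTF-16) can be sketched without any library, roughly as below. This is only an illustration: decode_utf8, put_le16 and utf8_to_utf16le_bytes are hypothetical names, not part of any standard API, and the decoder is deliberately minimal (it rejects truncated input and bad continuation bytes, but does not check for overlong forms or encoded surrogates). The result is returned as raw bytes in a std::string, so data() and size() can be handed straight to a hash function.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode one UTF-8 sequence starting at s[i] and
// advance i past it. Minimal validation only (see the note above).
inline char32_t decode_utf8(const std::string& s, std::size_t& i) {
    const unsigned char b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80) return b0;                        // 1-byte sequence (ASCII)
    int extra = 0;
    char32_t cp = 0;
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else throw std::runtime_error("invalid UTF-8 lead byte");
    while (extra-- > 0) {
        if (i >= s.size()) throw std::runtime_error("truncated UTF-8");
        const unsigned char b = static_cast<unsigned char>(s[i++]);
        if ((b & 0xC0) != 0x80)
            throw std::runtime_error("invalid UTF-8 continuation byte");
        cp = (cp << 6) | (b & 0x3F);
    }
    return cp;
}

// Hypothetical helper: append one 16-bit code unit, low byte first.
inline void put_le16(std::string& out, std::uint16_t u) {
    out.push_back(static_cast<char>(u & 0xFF));
    out.push_back(static_cast<char>(u >> 8));
}

// UTF-8 text in, UTF-16LE bytes out (independent of host endianness).
inline std::string utf8_to_utf16le_bytes(const std::string& utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp = decode_utf8(utf8, i);
        if (cp > 0x10FFFF) throw std::runtime_error("code point out of range");
        if (cp < 0x10000) {
            put_le16(out, static_cast<std::uint16_t>(cp));
        } else {
            // Outside the BMP: split into a surrogate pair.
            cp -= 0x10000;
            put_le16(out, static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
            put_le16(out, static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this shape, the byte count to hash is simply the size() of the returned string, which sidesteps the length() question for std::u16string and std::wstring.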

2 Answers

I think something like this should do the trick:

    #include <codecvt>
    #include <locale>
    #include <stdexcept>
    #include <string>

    // UTF-16 code units -> UTF-8 bytes
    std::string utf16_to_utf8(std::u16string const& s)
    {
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t, 0x10ffff,
            std::codecvt_mode::little_endian>, char16_t> cnv;
        std::string utf8 = cnv.to_bytes(s);
        if(cnv.converted() < s.size())
            throw std::runtime_error("incomplete conversion");
        return utf8;
    }

    // UTF-8 bytes -> UTF-16 code units
    std::u16string utf8_to_utf16(std::string const& utf8)
    {
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t, 0x10ffff,
            std::codecvt_mode::little_endian>, char16_t> cnv;
        std::u16string s = cnv.from_bytes(utf8);
        if(cnv.converted() < utf8.size())
            throw std::runtime_error("incomplete conversion");
        return s;
    }

Note that std::wstring_convert is deprecated as of C++17, but I still favor using it over a non-standard library, given that it is portable, has no dependencies, and will no doubt remain until replaced.

And, if all else fails, you can reimplement these same functions with alternative code without changing any other part of the application.
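
One thing that may matter for hashing: a std::u16string stores char16_t code units in the host's native byte order, so the bytes behind data() are only UTF-16LE on a little-endian machine. A minimal sketch of writing the bytes out explicitly is shown below; utf16_to_le_bytes is a hypothetical helper name, and the idea is that its output (data() plus size(), which is a byte count) is what would be passed to the MD4 routines.

```cpp
#include <string>

// Hypothetical helper: serialize UTF-16 code units as little-endian bytes,
// regardless of the byte order of the machine running the code.
inline std::string utf16_to_le_bytes(const std::u16string& s) {
    std::string bytes;
    bytes.reserve(s.size() * 2);           // two bytes per code unit
    for (char16_t u : s) {
        bytes.push_back(static_cast<char>(u & 0xFF));         // low byte first
        bytes.push_back(static_cast<char>((u >> 8) & 0xFF));  // then high byte
    }
    return bytes;
}
```

On a little-endian host this produces the same bytes as reading s.data() directly with a length of s.size() * sizeof(char16_t); the explicit loop just removes the dependence on the CPU.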

3 Comments

Hi Galik, thanks very much for taking the time to type this out. I tried it for hours, I googled, I went nuts... it didn't work, even though everything was telling me this looked ideal and we were on the right track here. To be honest though, I don't completely understand the C++ documentation for codecvt or any of the conversions. I'm more of a C programmer who likes to use C++ features whenever possible. I agree about favouring it over a non-standard library. It should be possible.
@JYG On my system this produces UTF-16LE encoding from UTF-8. I am running on an x86 CPU, which is little-endian. Are you running on a big-endian system?
@JYG I changed the code to explicitly specify UTF-16LE, does that fix the issue?

Apologies in advance... this will be an ugly reply with some long code. I ended up using the following function, while effectively compiling iconv into my Windows application file by file :)

Hope this helps.

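    #include <cerrno>
    #include <clocale>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <string>
    #include <iconv.h>

    // Converts `in` from the current locale's encoding to UTF-16LE.
    // Returns a calloc'ed buffer (caller must free) or 0 on failure;
    // *used_len receives the number of bytes written.
    char* conver(const char* in, size_t in_len, size_t* used_len)
    {
        const int CC_MUL = 2; // 16 bit
        setlocale(LC_ALL, "");
        char* t1 = setlocale(LC_CTYPE, "");
        char* locn = (char*)calloc(strlen(t1) + 1, sizeof(char));
        if(locn == NULL)
        {
            return 0;
        }
        strcpy(locn, t1);
        // The encoding is the part of the locale name after the '.'
        const char* dot = strchr(locn, '.');
        if(dot == NULL)
        {
            free(locn);
            return 0;
        }
        const char* enc = dot + 1;
    #if _WINDOWS
        std::string win = "WINDOWS-";
        win += enc;
        enc = win.c_str();
    #endif
        iconv_t foo = iconv_open("UTF-16LE", enc);
        if(foo == (iconv_t)-1)
        {
            if (errno == EINVAL)
            {
                fprintf(stderr, "Conversion from %s is not supported\n", enc);
            }
            else
            {
                fprintf(stderr, "Initialization failure:\n");
            }
            free(locn);
            return 0;
        }
        size_t out_len = CC_MUL * in_len;
        size_t saved_in_len = in_len;
        iconv(foo, NULL, NULL, NULL, NULL); // reset conversion state
        char* converted = (char*)calloc(out_len, sizeof(char));
        if(converted == NULL)
        {
            iconv_close(foo);
            free(locn);
            return 0;
        }
        char* converted_start = converted;
        char* t = const_cast<char*>(in);
        size_t ret = iconv(foo, &t, &in_len, &converted, &out_len);
        int err = errno; // save before iconv_close can overwrite it
        iconv_close(foo);
        free(locn);
        *used_len = CC_MUL * saved_in_len - out_len;
        if(ret == (size_t)-1)
        {
            switch(err)
            {
            case EILSEQ:
                fprintf(stderr, "EILSEQ\n");
                break;
            case EINVAL:
                fprintf(stderr, "EINVAL\n");
                break;
            }
            errno = err;
            perror("iconv");
            free(converted_start);
            return 0;
        }
        return converted_start;
    }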

6 Comments

Link to iconv plus the necessary includes would also improve this answer.
Thanks fritzone! I'd been banging my head for hours trying to get iconv() working until I gave up and came back to have another look :) Thanks very much, now the ntlm hashes are correct every time after the proper conversion. Who cares if it's not "great" code, it works!
@DevSolar this is just a function which I have implemented in one of my really old more experimental projects... which was not very well commented unfortunately since it was in the class of the home grown pet projects.... so I sort of forgot what and why, I just know that it well... works.
Hi DevSolar, I just copied and pasted this in above main() and added inline to the function signature. To use this, #include <iconv.h> and call it like this: char pass[64]; strcpy(pass, "p4ssw0rd"); size_t used_bytes = 64*3; char* unicode_password = conver(pass, strlen(pass), &used_bytes); /* Now make an NTLM hash */ MD4_CTX ctx; MD4_Init(&ctx); MD4_Update(&ctx, unicode_password, used_bytes); MD4_Final(message_digest_somewhere, &ctx); Install libiconv and compile with g++ -o program program.cpp -lcrypto -liconv (I've added the lib for the OpenSSL functions there too). Also free(unicode_password) afterwards.