
In Windows C++ multibyte/Unicode considerations, I notice that _tcslen() and lstrlen() both return a string's length correctly regardless of whether you're compiling for multibyte or Unicode.

_tcslen() is defined in TCHAR.H based on whether _UNICODE is defined, and lstrlen() is defined in WINBASE.H based on whether UNICODE is defined.

Did someone just re-invent the wheel at some point or is there a reason for this apparent duplication?

  • codeproject.com/Messages/2654087/… Commented Nov 17, 2017 at 9:52
  • No, those are macros that map to either ANSI (char) or wide (wchar_t) functions and types. There's no duplication, and as for re-inventing: those macros have been there since the early 90s to allow people to compile the same code for ANSI or Unicode. Remember, OSs didn't use Unicode before Windows NT. C++ only added Unicode in 2011 with the char16_t, char32_t, std::u16string and std::u32string types. Commented Nov 17, 2017 at 9:56
  • 1
    In any case, you shouldn't mix up C functions and types like lstrlen or` _tcslen` in C++. You should use the language's types and algorithms. Use std::string.length() or u16string.length(). You should use auto for type inference. Use and pass references and smart pointers with unique_ptr instead of raw pointers that can easily leak. Iterators instead of pointer arithmetic Commented Nov 17, 2017 at 9:59
  • One set of functions (macros) is part of the language runtime, the other set is part of the operating system. Neither is useful if you know beforehand what kind of character set your program is using. Commented Nov 17, 2017 at 11:26

1 Answer


lstrlen and the other lstrxxx functions are Windows APIs. They have ANSI and Unicode versions, lstrlenA and lstrlenW. They had some advantages over strlen back in the days of Windows 3.1; they don't have any advantages now. If you write your code with these functions it will not be standard C, and it will only compile on Windows.

_tcslen is a macro for either the ANSI strlen or the Unicode wcslen, both of which are standard C. This was useful in the 1990s and early 2000s because you could write one set of code that compiled as ANSI for Windows 95 and as Unicode for Windows NT. It's also useful because Microsoft can write one set of documentation covering both the ANSI and Unicode APIs.

Otherwise these _tcsxxx string macros, TCHAR, etc. are no longer useful. There is no point in going through all this pain to write code that stays compatible with Windows 95. You can just use the "wide C string" wcsxxx functions, which are standard C.

But then, the *nix operating systems use UTF-8 and the strxxx functions, whereas Windows uses UTF-16 and wcsxxx. I suppose you could use the _tcsxxx macros to write code that is Unicode compatible on both *nix and Windows. Other programmers will be confused by your _tcsxxx macros, but at least the code has a better chance of compiling!


1 Comment

The reason I'm using these dual-use macros is that I'm adding a new DLL to a Windows C++ project containing ~1000 DLLs, all built as multibyte. My DLL may be reused in the future in a Unicode project.
