Why is the term "string" so often abbreviated as "sz"?

Question

A pattern I have noticed in many big C and C++ programs - including Microsoft Windows (REG_SZ type in Registry) and Valve's Source SDK (names of practically every string variable) - is that "sz" is used as an abbreviation for "string". Why is it "sz"? What does the "z" stand for? Where did this naming convention originate?

Note that some of the code you are looking at in the Win32 API is 25-30 years old, and things have changed considerably in the meantime. The C++ Core Guidelines explicitly discourage using this notation, describing it as "generally unnecessary and actively harmful in a strongly statically-typed language like C++" (Source: isocpp.github.io/CppCoreGuidelines/… )
– Ben Cottrell, Commented Dec 22, 2023 at 8:56

GrandmasterB · Accepted Answer · 2023-12-22 05:55:46Z

62

It's a variable naming convention called Hungarian Notation. It was common in the 1990s, and notably used a lot by Microsoft in their Windows API docs. The idea is to prefix variable names with hints about their type. The exact prefixes varied, but a variable holding a count might be called nCount if it was an integer, or fCount if it was a floating-point value.

The sz stands for 'zero (null) terminated string': basically a character array that ends in a null character. You'll also see 'lpsz' used, for 'long pointer to a zero-terminated string'.
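As a minimal sketch of what the prefix describes (the names szGreeting and CountChars are made up for illustration and come from no particular API), here is a zero-terminated string in C and a loop that finds its length by scanning for the terminator. Note that the parameter type, const char *, says nothing about termination; only the sz in the name does.

```c
#include <stddef.h>

/* "sz": a char array whose contents end with a '\0' byte.
   The literal "hello" occupies 5 characters plus 1 terminator = 6 bytes. */
static const char szGreeting[] = "hello";

/* Count the characters before the terminator, the same job the standard
   strlen() does. CountChars is a hypothetical name used for illustration. */
size_t CountChars(const char *szText)
{
    size_t cch = 0;                 /* "cch": count of characters */
    while (szText[cch] != '\0')
        ++cch;
    return cch;
}
```

Calling CountChars(szGreeting) walks the array until it reaches the zero byte. Passing a buffer that lacks a terminator would read past its end, which is the mistake the sz prefix is meant to help a reader spot.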

If your first reaction is to ask why this was needed when you can just hover over a variable in your IDE and see its type, bear in mind that such IDEs did not always exist.

A lot of people still use the general idea of Hungarian Notation these days, but rarely as rigidly as it was used in the past.

edited Dec 22, 2023 at 5:55

answered Dec 22, 2023 at 5:47


33

Regarding the IDE recognition, it's worth pointing out that in C, there is no data type for "zero-terminated string"; the actual data type is generally a char pointer. Such a pointer could also be used for a non-terminated string, or as an actual pointer to a single character while iterating. So "sz" is actually giving additional information not present in the type system, which was the original intention of Hungarian Notation.

– IMSoP, Commented Dec 22, 2023 at 10:53
38

"The idea is to prefix variables with hints about their type." – More precisely, the idea is to prefix variables with semantic information which isn't captured by the type.

– Jörg W Mittag, Commented Dec 22, 2023 at 11:49
9

@JörgWMittag That was the original intention, but it was misunderstood by many almost from the beginning… (Exemplified by the difference between Apps Hungarian and Systems Hungarian, as explained on [Wikipedia](DW\ 22-12\ Revelation\ Of\ The\ Daleks\ part\ 1.mp4).)

– gidds, Commented Dec 22, 2023 at 21:47
9

@gidds While I'm sure the Daleks had something to say about notation, I suppose you meant to link to this page.

– Corrodias, Commented Dec 23, 2023 at 11:22
18

I can’t be the only one who read, “Its a variable naming convention called Hungarian Notation”, and immediately thought, “Oh, so it stands for sztring!”.

– Janus Bahs Jacquet, Commented Dec 23, 2023 at 15:49


Giacomo1968 · Accepted Answer · 2023-12-23 16:05:24Z

As @GrandmasterB says in their answer, this is "Hungarian Notation".

The term "Hungarian" originates from Charles Simonyi, a computer programmer of Hungarian background, who worked at Xerox and later Microsoft. He conceived and popularised this particular system of name prefixing.

The sz prefix means a zero-terminated string.

In both of the two main character encoding schemes, ASCII (for PC) and EBCDIC (for mainframes), the zero value means the Null character, hence the alternative (and probably more common) term, null-terminated string.

Null-terminated strings and character arrays are common in the Windows API and in the C programming language more generally, so there is a third common term for them: "C-style strings".

The main alternative to null-termination is called length-prefixing, also called "Pascal-style strings", because the Pascal language used this approach for its native concept of strings.

Length-prefixing is where a value is stored in memory before the string, indicating the length of the string. My understanding is that the Hungarian prefix for this type of string is st, although I've never myself encountered this prefix.
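To make the contrast concrete, here is a rough sketch of a Pascal-style string in C. The type name PascalString and the functions StFromSz and StLength are hypothetical, and the single length byte is an assumption borrowed from classic Pascal implementations; other layouts use wider prefixes.

```c
#include <stddef.h>
#include <string.h>

/* Length-prefixed layout: the length comes first, then the characters.
   No terminator is stored. One length byte caps the string at 255 characters. */
typedef struct {
    unsigned char cch;      /* length prefix: number of characters in use */
    char rgch[255];         /* character data, not null-terminated */
} PascalString;             /* hypothetical type name */

/* Build a length-prefixed string from a zero-terminated one.
   Returns 0 on success, or -1 if the source does not fit. */
int StFromSz(PascalString *pst, const char *sz)
{
    size_t cch = strlen(sz);
    if (cch > 255)
        return -1;
    pst->cch = (unsigned char)cch;
    memcpy(pst->rgch, sz, cch);     /* copy the characters, not the '\0' */
    return 0;
}

/* The length is read from the prefix rather than found by scanning. */
size_t StLength(const PascalString *pst)
{
    return pst->cch;
}
```

The trade-off shows up in StLength: the length is available in constant time and the data may contain zero bytes, at the cost of a fixed upper bound set by the width of the prefix.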

There are different pros and cons to each approach, but the tendency of modern languages is to employ length-prefixing of strings.

Then, in Microsoft's COM technology, you have the BSTR, which is a string that is both length-prefixed and null-terminated! The common Hungarian-style prefix for this was bstr.
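The combined layout can be sketched in portable C as follows. This is only an illustration of the idea, not the Windows implementation: real code would use the COM functions such as SysAllocString and SysFreeString, and the names WideChar, BstrSketchAlloc, BstrSketchByteLen and BstrSketchFree are invented here. The sketch assumes a 4-byte prefix holding the byte count and 16-bit character units, and it does not guard against the length overflowing.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t WideChar;   /* stand-in for a 16-bit character unit */

/* Allocate [4-byte byte count][characters][16-bit zero terminator] and return
   a pointer to the first character, so the result can also be read as a
   zero-terminated wide string. Returns NULL if allocation fails. */
WideChar *BstrSketchAlloc(const WideChar *rgwch, uint32_t cch)
{
    uint32_t cb = cch * (uint32_t)sizeof(WideChar);
    unsigned char *pb = malloc(sizeof cb + cb + sizeof(WideChar));
    if (pb == NULL)
        return NULL;
    memcpy(pb, &cb, sizeof cb);                        /* length prefix */
    if (cb > 0)
        memcpy(pb + sizeof cb, rgwch, cb);             /* character data */
    memset(pb + sizeof cb + cb, 0, sizeof(WideChar));  /* terminator */
    return (WideChar *)(pb + sizeof cb);
}

/* Read the byte count stored just before the first character. */
uint32_t BstrSketchByteLen(const WideChar *bstr)
{
    uint32_t cb;
    memcpy(&cb, (const unsigned char *)bstr - sizeof cb, sizeof cb);
    return cb;
}

/* Free the whole allocation, which starts at the prefix, not at the text. */
void BstrSketchFree(WideChar *bstr)
{
    if (bstr != NULL)
        free((unsigned char *)bstr - sizeof(uint32_t));
}
```

Because the returned pointer addresses the characters rather than the prefix, code that expects a zero-terminated wide string can read it directly, while code that knows about the prefix can get the length without scanning.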

The bottom line is that there are a lot of different approaches to encoding strings, depending on language and context, without a specific standard.

In terms of facilities to cope with such variety, C is now a very old language, and it was not designed with a very sophisticated type system. The type system is so basic that any kind of string is considered a "complex" data type cobbled together from its elements, so although C is capable of handling all kinds of string encoding, there aren't built-in types for any of them.

C was also designed in an era before modern IDEs, when source code was often printed on listing paper, so there were few ways of coping other than to encode such information in the text of the source code itself.

Under those circumstances, a system of naming prefixes is absolutely necessary to stay on top of which data types you are dealing with, if you're in an environment where one particular approach to strings cannot simply be assumed as universal.

The Windows API itself is now of a similar vintage as the C language, and Windows was originally written predominantly in x86 assembly language and in C.

And Simonyi worked at Microsoft, and therefore was in a position to influence how they coped with the programming challenges of the era - including the fact that programming at Microsoft consisted of more than just Windows and its conventions.

That is basically why the use of null-terminated strings and the use of Hungarian Notation are both widespread there.

It has nothing to do with C or IDEs. The notation was intended for BCPL (the predecessor to C), which lacks C's data types and does not distinguish between integers, pointers, etc. Variables in BCPL are defined in terms of machine words, so the notation is intended to encode semantics which don't exist in the language. Prefixes for primitive C or C++ data types have never been necessary because compilers do the job. It's more an accident of history that this was lost in translation at Microsoft, where the notation was adopted and popularised for Win32.
– Ben Cottrell, Commented Dec 22, 2023 at 14:26
@BenCottrell, the OP specifically asked about C and about Windows! I'm obviously not saying C was the first language to lack adequate types, or that Simonyi was the first to use a system of prefixes. A string is not a "primitive" in C. I'm also surprised that you say it's nothing to do with IDEs - clearly, modern language IDEs are capable of providing various information that once had to be encoded explicitly in text for the ergonomics of the programmer.
– Steve, Commented Dec 22, 2023 at 14:42
@BenCottrell You might be interested in apps Hungarian, which is about making it clear what the value is, not what type it has. For instance, if you end up adding xwFoo + cbBar, you may start to suspect that you've done something wrong because it doesn't usually make much sense to add a buffer size to a horizontal screen coordinate. Labeling them both as i for integer, of course, tells you nothing the compiler can't.
– OpenAI was the last straw, Commented Dec 22, 2023 at 16:25
@BenCottrell, in C, the question of whether the string is length-prefixed or sentinel-terminated is not known to the compiler, and anyway the more important question is not whether the compiler knows the data type but whether the programmer can easily read and distinguish the data type in context. I can easily imagine Microsoft had programmers working in circumstances where it was necessary to constantly rehearse as part of the variable names the subtle differences in data types, because occasionally I encounter such situations myself, where there's no hope of keeping track otherwise.
– Steve, Commented Dec 22, 2023 at 23:03
@BenCottrell "humans have no need to check the types as the compiler will do that job." You've either got an amazing memory, or never accidentally added two shorts into a short and watched it overflow.
– RonJohn, Commented Dec 23, 2023 at 5:39
