25
$\begingroup$

While scrolling through a Rust syntax cheat sheet today, I noticed an odd item:

struct S; // Define zero sized unit struct. Occupies no space, optimized away. 

This seems like a strange thing to allow; Rust already has a zero-size unit type, () (which does have uses), and I can't think of many situations where a zero-size struct would provide an advantage over that. Perhaps in a language with a type system that had non-tagged unions, you could do something like let dog_breed: Schnauser | Corgie = Schnauser {}; to implement a sort of enum pattern. This would still seem pretty useless though, especially in a language which already has enums/tagged unions.

What justification do existing languages with this feature provide, and/or what purposes could it serve?

$\endgroup$
8
  • 1
    $\begingroup$ Follow up question: How would a zero-size struct be different from void in practice? $\endgroup$ Commented Jul 1, 2023 at 6:04
  • 5
    $\begingroup$ @user16217248-OnStrike void in C is not inhabited in the same sense: an expression or function has type void but you cannot construct and pass around a value of type void. In Rust, a ZST mostly behaves like other types: you can have a value of type () or any other ZST. A !Copy ZST has the same ownership semantics of other types. $\endgroup$ Commented Jul 2, 2023 at 0:25
  • 1
    $\begingroup$ Are you asking about why it is allowed for a struct to have no members (like in C++), or are you asking about why it has zero size? $\endgroup$ Commented Jul 2, 2023 at 14:19
  • $\begingroup$ @user3840170 For this question, both for the most part. I'm used to making the assumption zero members == zero size, so I kind of worded the question around that, but answers talking about non-zero-size ones would still probably be useful tho. $\endgroup$ Commented Jul 2, 2023 at 16:01
  • 14
    $\begingroup$ Why would a language not allow zero-size structs? $\endgroup$ Commented Jul 2, 2023 at 22:47

14 Answers

39
$\begingroup$

Allowing zero-sized structs makes things easier for automatically-generated code, because the code generator doesn't need to treat this as a special case that it must avoid. Similar considerations apply to zero-length arrays, functions with no variables, etc.
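As a rough illustration of the code-generation point, here is a minimal Rust sketch. The macro name `make_record` and the struct names `Point` and `Nothing` are hypothetical, invented for this example. The same macro arm handles both a struct with fields and one with none, with no special case for the empty one.

```rust
use std::mem::size_of;

// Hypothetical generator: emits a struct with whatever fields it is given,
// including none at all.
macro_rules! make_record {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[derive(Debug, Default, PartialEq)]
        pub struct $name {
            $(pub $field: $ty,)*
        }
    };
}

make_record!(Point { x: i32, y: i32 });
make_record!(Nothing {}); // degenerate case, handled by the same arm

// Sizes in bytes of the two generated structs.
pub fn record_sizes() -> (usize, usize) {
    (size_of::<Point>(), size_of::<Nothing>())
}
```

If the language rejected empty structs, the macro would need a second arm (or a dummy field) just for the empty input.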

Where many people notice this kind of problem is when they generate SQL, and they're generating WHERE column IN (<list of values>) dynamically. SQL doesn't allow the degenerate case of an empty list; if their source data has an empty list, they have to take special measures to address this (one approach I've used is to add an "impossible" value to the source list before filling it into the template, another is to conditionalize adding this condition to the WHERE clause).

$\endgroup$
6
  • 1
    $\begingroup$ On the contrary, zero-sized data adds a ton of special-cases to your average compiler. The Rustonomicon has an entire page dedicated to the special-casing that Vec has to have in order to work correctly with ZSTs. $\endgroup$ Commented Jul 1, 2023 at 4:30
  • 23
    $\begingroup$ I'm not talking about compilers, I'm talking about programs that generate structure definitions automatically. $\endgroup$ Commented Jul 1, 2023 at 4:31
  • 2
    $\begingroup$ Ah, that's a fair point. The C/C++ approach of "allow empty types, but make them one byte" is a compromise position, but your argument is still completely valid. $\endgroup$ Commented Jul 1, 2023 at 4:41
  • 7
    $\begingroup$ This is particularly important for Rust since it has strong support for macros which can generate code, including struct definitions. $\endgroup$ Commented Jul 1, 2023 at 16:16
  • 2
    $\begingroup$ I've seen and written code that has conditions of the form 'where 1=1' or 'where 1=0' just to have a generic way to build the structure. It is definitely useful if the language allows these use cases without such work arounds. $\endgroup$ Commented Jul 2, 2023 at 8:08
23
$\begingroup$

For use in generics

Consider code like this:

    trait Message {
        fn print(&self) -> String;
    }

    fn print_message<T: Message>(message: T) {
        // Call the method on the value, not on the type parameter.
        print!("{}", message.print());
    }

    struct EmptyMessage;

    impl Message for EmptyMessage {
        fn print(&self) -> String {
            "".to_string()
        }
    }

In this case, EmptyMessage carries no data, so it has no size. It's just used to select a certain behavior.

At runtime the struct is optimized away entirely, yet it's useful: we don't need to define separate functions depending on whether the type actually needs any data to do its work.
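A small sketch of this, assuming a second, data-carrying type for contrast. The names `Greeting`, `text`, `render` and `empty_message_size` are made up for illustration; only `EmptyMessage` and the `Message` trait mirror the snippet above.

```rust
use std::mem::size_of;

trait Message {
    fn text(&self) -> String;
}

// Carries no data: the type alone selects the behaviour.
struct EmptyMessage;

// Carries data, but goes through the same generic function.
struct Greeting {
    name: String,
}

impl Message for EmptyMessage {
    fn text(&self) -> String {
        String::new()
    }
}

impl Message for Greeting {
    fn text(&self) -> String {
        format!("Hello, {}!", self.name)
    }
}

// One generic function serves both types; the call is resolved at compile time.
fn render<T: Message>(message: &T) -> String {
    message.text()
}

// Size in bytes of the unit struct.
fn empty_message_size() -> usize {
    size_of::<EmptyMessage>()
}
```

Calling `render(&EmptyMessage)` and `render(&Greeting { name: "Ferris".to_string() })` uses the same function, while `empty_message_size()` reports that the unit struct occupies zero bytes.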

For a look at more advanced cases check out the Bevy game engine, which extensively uses unit structs as parameters to generics, even if the struct doesn't even implement any custom methods.

$\endgroup$
18
$\begingroup$

If it has methods or computed properties

Consider this most basic of SwiftUI views:

    struct ExampleView: View {
        var body: some View {
            Text("Hello, World!")
        }
    }

It has "zero" size, but its vtable is important.
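The same effect can be sketched in Rust rather than Swift. This is an analogue, not SwiftUI: the `View` trait and the helper functions here are hypothetical. The value itself has zero size, and the vtable pointer only appears once the value is viewed through a trait object.

```rust
use std::mem::size_of;

trait View {
    fn body(&self) -> String;
}

struct ExampleView; // no fields

impl View for ExampleView {
    fn body(&self) -> String {
        "Hello, World!".to_string()
    }
}

// Static dispatch: the concrete type is known, so no vtable is consulted.
fn render_static(view: &ExampleView) -> String {
    view.body()
}

// Dynamic dispatch: the reference carries a vtable pointer next to the data pointer.
fn render_dynamic(view: &dyn View) -> String {
    view.body()
}

// (size of the value, size of a plain reference, size of a trait-object reference)
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<ExampleView>(),
        size_of::<&ExampleView>(),
        size_of::<&dyn View>(),
    )
}
```

Here the cost of the vtable lives in the trait-object reference, which is twice the width of a plain reference, not in the value.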

$\endgroup$
5
  • 3
    $\begingroup$ Then the struct can't really have zero size, since it needs to store a pointer to the vtable or some runtime type information. $\endgroup$ Commented Jul 1, 2023 at 22:55
  • 2
    $\begingroup$ @dan04 There’s a reason I put “zero” in quotes. MemoryLayout<ExampleView>.size is zero but the second you put it in an any View it takes 40 bytes. $\endgroup$ Commented Jul 1, 2023 at 22:59
  • 5
    $\begingroup$ This applies to true zero-size types too, in languages that allow the right sorts of polymorphism. In C++, a captureless lambda would be a zero-sized type, if the language didn't have a 1 byte minimum. The type is useful since it has an operator() method attached. Lambdas are used with templates, so the method doesn't need to be virtual and no vtable is needed. In Haskell, you can make a data type with no fields an instance of a class. The class methods are attached to the type, but not the values, since Haskell visibly separates "vtables" from their data. $\endgroup$ Commented Jul 2, 2023 at 16:50
  • 5
    $\begingroup$ @dan04 A pointer to type information is only necessary if it's used in a place that handles it dynamically/polymorphically. When used in a context that has the exact concrete type (e.g. if you have a function that returns ExampleView and not any View), then the compiler knows where to lookup methods without needing it stored in the instances. $\endgroup$ Commented Jul 2, 2023 at 22:27
  • 1
    $\begingroup$ @HTNW: Note that C++ kinda support zero-sized types from C++20 on with [[no_unique_address]], comparators, hashers, allocators, etc... are now often zero-sized data-members, even if measured separately their size is reported as 1. $\endgroup$ Commented Jul 3, 2023 at 7:26
9
$\begingroup$

Regularity

A number of specific arguments have been provided, but I believe that a more generic argument applies here, and in many other situations: regularity is good.

For example, if we take a (relatively) immature language such as Rust, you'll notice that part of the difficulty for newcomers is that some things are available in some contexts, but not others.

Commas are easy:

    // A trailing comma is allowed after the last argument of a function.
    fn foo(a: A, b: B,);

    // A trailing comma is allowed after the last argument of a call.
    foo(a, b,);

    // A trailing comma is allowed after the last field of a struct.
    struct Foo {
        a: A,
        b: B,
    }

Generics are getting better:

    // A type alias can be generic.
    type InlineVec<T, const N: usize> = Vec<T, InlineSingleStore<T, N>>;

    trait X {
        // An associated type can only be generic from 1.65 forward (yeah!).
        type Inline/*<T, const N: usize>*/;
    }

Compile-time callable (const) are still iffy:

    // A free-function can be marked `const`, and evaluated at compile-time.
    const fn foo() -> i32;

    impl S {
        // An associated function of a struct can be marked `const`.
        const fn bar(&self) -> i32;
    }

    trait X {
        // An associated function of a trait cannot be marked `const`.
        /*const*/ fn baz(&self) -> i32;
    }

    // Instead, a trait _implementation_ can be marked `const`,
    // but only if the trait declaration was tagged `#[const_trait]`.
    impl const X for S {
        fn baz(&self) -> i32 { 4 }
    }

There are good reasons for this -- design and implementation complexity, notably -- however, from a user's point of view it's jarring. It's a myriad of rules to remember.

Regularity, or Orthogonality, as in the ability to apply a concept everywhere it is sensible to, makes for a simpler language to use.

Thus, the question should be: Why would a language not allow X?

(i.e., what advantage is there in NOT providing X?)

$\endgroup$
2
  • $\begingroup$ Welcome! I'd suggest avoiding ~~ and other meta-syntax that isn't part of the language you're demonstrating in code blocks, since this is likely to confuse people who aren't familiar with the language. $\endgroup$ Commented Jul 3, 2023 at 13:12
  • $\begingroup$ @kaya3-supportthestrike: Replaced with inline comment syntax. Gets better highlighting too so it's a win :) $\endgroup$ Commented Jul 3, 2023 at 13:29
8
$\begingroup$

For symmetry, if the language has templates that accept void as a type argument. In that case, you'd end up with a means to create zero-sized structs anyway. Also, you can use them as markers for variable-length content provided by subtypes, if the language has such types.
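A hedged Rust sketch of the template argument, using the unit type `()` where the answer says void. The `Event` struct and `event_sizes` function are hypothetical names. Once a generic struct accepts any type parameter, instantiating it with a zero-sized argument yields a zero-sized struct without any dedicated language feature.

```rust
use std::mem::size_of;

// Hypothetical generic wrapper: `Extra` is whatever payload the user attaches.
pub struct Event<Extra> {
    pub extra: Extra,
}

// Sizes in bytes of two instantiations of the same generic struct.
pub fn event_sizes() -> (usize, usize) {
    (size_of::<Event<u64>>(), size_of::<Event<()>>())
}
```

Forbidding zero-sized structs would mean rejecting `Event<()>` while accepting `Event<u64>`, which is the extra rule the answer argues against.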

$\endgroup$
1
  • 6
    $\begingroup$ Very much this. Don't discount the value of "we would have to add a rule to explicitly disallow it, and that would require justification". $\endgroup$ Commented Jul 3, 2023 at 2:39
7
$\begingroup$

C99 allows for a Flexible array member, which lets a variable-length array be allocated as the last member of a structure. This can simplify dynamic allocation of a structure that ends with a variable-length array, since the compiler will add padding to ensure that the flexible array member has the correct starting alignment.

GCC allowed Arrays of Length Zero as an extension prior to the C99 Flexible array member.

$\endgroup$
0
7
$\begingroup$

Here's one that doesn't actually apply to Rust but which nearly applies to Rust.

As a compile-time token, to help the programmer manage lifetimes or satisfy other ordering constraints. Here's some pseudocode:

    type Token = private {}

    fun obtainToken() : Token =
        // some side-effectful stuff here, then…
        new Token {}

    fun doGuardedThing(_token: Token) =
        // some stuff that is guaranteed only to happen
        // if you've called `obtainToken` already

Sure, it's not exactly common to have literally no data to pass around in the token, but I've certainly wanted to do this before. If you've only got one zero-sized type (the unit type), then you can't guarantee that someone obtained your token through your desired API: any caller is free to magic up a value of type () anyway.

You can instead solve this problem with a single-case enum, but that takes space at runtime unless you're holding a zero-sized type anyway, which just punts the problem down the line: enum cases have the same visibility as their containing type, in Rust.

(The reason this doesn't apply to Rust in real life is that anyone who can see the type can construct a Foo {} if it is declared as pub struct Foo {}. The workaround, I guess, is pub struct Foo { garbage: () }, or a PhantomData. I'm really considering a hypothetical language which had something more like F#'s visibility modifiers.)
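A minimal Rust sketch of that workaround, translating the pseudocode above. The module name `guard`, the function names and the returned string are illustrative only. The private unit field keeps the token zero-sized while preventing construction outside the module.

```rust
mod guard {
    // The private field means code outside this module cannot write
    // `Token { _private: () }`, so `obtain_token` is the only source.
    pub struct Token {
        _private: (),
    }

    pub fn obtain_token() -> Token {
        // Side-effectful setup would go here.
        Token { _private: () }
    }

    // Can only be called by code that already holds a `Token`.
    pub fn do_guarded_thing(_token: &Token) -> &'static str {
        "guarded thing done"
    }
}

fn run() -> &'static str {
    let token = guard::obtain_token();
    guard::do_guarded_thing(&token)
}
```

Outside `guard`, an attempt such as `guard::Token { _private: () }` is rejected by the compiler because the field is private.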

$\endgroup$
1
  • $\begingroup$ The workaround of pub struct Foo { garbage: () } does work in Rust - and is a zero-sized token that thus only exists at compile time and not at runtime. $\endgroup$ Commented Aug 15, 2024 at 16:39
7
$\begingroup$

Having named zero-sized types is useful to create handles to resources that are fixed at compile-time. This pattern comes up a lot in embedded systems / bare-metal programming, for example. The system has a fixed amount of resources and peripherals, often at fixed addresses or specific CPU instructions; a library that provides bindings to the peripheral can create a handle to represent that resource.

For example:

    pub struct Rng {
        _private: (),
    }

The _private field makes it impossible for library users to construct it, so they must obtain the handle through a constructor function in the peripheral library. The peripheral library can use this to track how many handles exist. For example, it's often used to guarantee unique "ownership" of the resource, by only providing one handle to the library user:

    impl Rng {
        pub fn take() -> Option<Self> {
            // This doesn't quite work (accessing a `static mut` requires
            // `unsafe`); simplified for demonstration purposes
            static mut TAKEN: bool = false;
            if TAKEN {
                None
            } else {
                TAKEN = true;
                Some(Self { _private: () })
            }
        }
    }
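One way to make the simplified `take()` compile on a hosted target is sketched below, assuming the platform supports an atomic swap. This is not how any particular peripheral crate does it; real crates for cores without such atomics typically guard the flag with a critical section instead.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

pub struct Rng {
    _private: (),
}

// Records whether the single handle has already been handed out.
static TAKEN: AtomicBool = AtomicBool::new(false);

impl Rng {
    pub fn take() -> Option<Self> {
        // `swap` returns the previous value, so only the first caller sees `false`.
        if TAKEN.swap(true, Ordering::AcqRel) {
            None
        } else {
            Some(Self { _private: () })
        }
    }
}
```

The first call to `Rng::take()` yields `Some(handle)` and every later call yields `None`; the handle itself still occupies no memory.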

The handle to the peripheral doesn't need to store any runtime references, since the peripheral address is known at compile-time. Most of the time, it also doesn't need to store any state - it just provides functions to conveniently, directly access the peripheral's I/O:

    const RNG_ADDR: *const u32 = 0x48021800 as *const u32;

    impl Rng {
        pub fn read(&mut self) -> u32 {
            unsafe { RNG_ADDR.read_volatile() }
        }
    }

Therefore, it only makes sense that it takes up no memory. It's just a marker used at compile-time to check who has control over that resource at that point in the program.

For a practical example, see the stm32f0 crate. Most peripheral-access crates (PACs) look like this. The entrypoint is usually Peripherals::take(), which provides the Peripherals collection exactly once to the first caller.

$\endgroup$
6
$\begingroup$

There are many good examples in the other responses about why this might be a reasonable thing to do in some cases, so I won't repeat those.

But I'd like to add that I think the premise of the question has it backwards. You ask:

This would still seem pretty useless though ... what justification do existing languages with this feature provide?

This reminds me of an old joke: "The difference between Country A and Country B is that in A, everything that is not explicitly permitted is forbidden, whereas in B, everything that is not explicitly forbidden is permitted."

Your question presumes that the right default for language designers is that supporting an interaction of features has to be justified, rather than requiring justification to prohibit an interaction of features. There are several reasons this premise is backwards.

  • Lack of imagination. One can find many, many examples of prohibitions in complex systems that are based on naive notions of "no one would ever need that" or "that's always a mistake."

  • Complexity. Languages that have lots of rules that constrain interactions between features tend to have more accidental complexity, which increases cognitive load on developers.

As an example, imagine you have a language with properties, and a defined concept of "getter" and "setter". You will surely find someone who thinks the combination of "setter but no getter" (or "public setter, private getter") is silly enough to be outlawed, but that guy is almost always wrong. You do your users no favors by baking "style guide" opinions into the language.

So, it is great to ask the question "are there good uses for zero-sized structs?" (And as the other answers show, there are.) But we should be on the lookout for coupling that with "and, if not, shouldn't we restrict them?"

Language designers should give users tools that work together in predictable and useful ways. Not all combinations are as useful as others; that's fine.

$\endgroup$
2
  • $\begingroup$ There are, of course, different philosophies of language design. Some think "If we give people enough rope to hang themselves, that's going to happen, and that's bad." The security implications of buffer overflows are a classic case in point. $\endgroup$ Commented Apr 18 at 20:09
  • $\begingroup$ @Barmar True, but not really relevant to this discussion. There is a universe of difference between "we do not support this feature because it undermines safety" and enforcement of stylistic concerns ("I think this combination of features is silly, so I will outlaw it.") Most "ordinary programmers" who try their hand at language design tend to initially fall into the trap of using the language specification as a stylistic enforcement mechanism. $\endgroup$ Commented Apr 21 at 17:33
4
$\begingroup$

Zero-sized members could be used to force changes to the alignment of the other members of a containing struct. This is more likely to matter when the zero-sized member has a primitive element type rather than a struct type.

    struct {
        type_a a[2];
        // Potential padding here
        type_b b[0];
        // Potential padding here
        type_c c[10];
    } d;
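The effect can be observed in Rust, which does allow zero-length arrays. The sketch below assumes `#[repr(C)]` layout and a target where `u32` has 4-byte alignment; the struct and function names are hypothetical.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
struct WithMarker {
    a: [u8; 2],
    b: [u32; 0], // zero-sized, but still imposes the alignment of u32
    c: [u8; 10],
}

#[repr(C)]
struct WithoutMarker {
    a: [u8; 2],
    c: [u8; 10],
}

// Byte offset of field `c` from the start of each struct.
fn offsets_of_c() -> (usize, usize) {
    let with = WithMarker { a: [0; 2], b: [], c: [0; 10] };
    let without = WithoutMarker { a: [0; 2], c: [0; 10] };
    let with_base = &with as *const WithMarker as usize;
    let without_base = &without as *const WithoutMarker as usize;
    (
        with.c.as_ptr() as usize - with_base,
        without.c.as_ptr() as usize - without_base,
    )
}

// (size, alignment) of each struct.
fn layouts() -> [(usize, usize); 2] {
    [
        (size_of::<WithMarker>(), align_of::<WithMarker>()),
        (size_of::<WithoutMarker>(), align_of::<WithoutMarker>()),
    ]
}
```

Under those assumptions, the zero-length member pushes `c` from offset 2 to offset 4 and raises the alignment of the whole struct, so padding is also added at the end.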
$\endgroup$
5
  • $\begingroup$ In the example, I don't see where a zero-sized struct might come in. $\endgroup$ Commented Jul 2, 2023 at 13:05
  • $\begingroup$ @PaŭloEbermann This is more of a zero sized array than a zero sized type. The alignment requirements of type_b still affect d, even though b[] has zero size. $\endgroup$ Commented Jul 2, 2023 at 15:18
  • $\begingroup$ @chux-ReinstateMonica But in what ways does it do that? I'm not familiar enough with C's struct layout rules to see how this would be any different without b $\endgroup$ Commented Jul 2, 2023 at 16:03
  • $\begingroup$ @RydwolfPrograms C does not allow 0 sized arrays. This example surmises what would happen if it did. Suppose alignment requirements were 1, 4, TBD for types: type_a, type_b, type_c and size of .a is 1*2. With type_b b[0]; the member .c cannot start on a +2 address as .b needs a multiple of 4, even though it is size 0 forcing 2 padding bytes between .a and .b. Now the padding requirement for .c to .b is determined. If .c alignment is 1, then with no .b, there would be no padding at all from .a to .c. $\endgroup$ Commented Jul 2, 2023 at 18:51
  • $\begingroup$ @RydwolfPrograms: For example, take int32_t a[1]; int64_t b[0]; int16_t c[2];. Without b, we could store c immediately after a, because 32-bit alignment is adequate for 16-bit alignment. But with b, we must leave 32 bits of padding to reach a 64-bit boundary first, so that b is aligned; and we must also pad after c so that the end of the structure is 64-bit–aligned. $\endgroup$ Commented Jul 2, 2023 at 19:47
4
$\begingroup$

For when you need a value, but don't care about the actual value

Go allows empty structs (struct{}). Its main use is with the built-in map type for imitating sets, where the value of the map isn't needed:

    eles := []string{ ... }
    set := make(map[string]struct{})
    for _, e := range eles {
        if _, ok := set[e]; !ok {
            set[e] = struct{}{} // add the element to the set
        }
    }
    for k := range set {
        // do stuff with k, which is a unique element of eles
    }
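For comparison, the same set idiom can be sketched in Rust with the unit type as the map value. The function name `unique` is made up for this example. Rust's standard `HashSet` is documented as being implemented as a `HashMap` whose value is `()`, which is the same trick.

```rust
use std::collections::HashMap;

// Collects the unique elements of `items`, using a map whose values are the
// zero-sized `()`. The result is sorted so that the order is deterministic.
fn unique(items: &[&str]) -> Vec<String> {
    let mut set: HashMap<&str, ()> = HashMap::new();
    for &item in items {
        set.insert(item, ()); // the value carries no information
    }
    let mut keys: Vec<String> = set.keys().map(|k| k.to_string()).collect();
    keys.sort();
    keys
}
```

Because the value type is zero-sized, the map stores only the keys, so nothing is wasted on placeholder values.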

and with channels, as a signal with a small size:

    import (
        "fmt"
        "time"
    )

    signal := make(chan struct{}) // read-write channel
    go func() {
        time.Sleep(2 * time.Second)
        signal <- struct{}{} // tell the outside world we're done
    }()

    loop:
    for {
        select {
        case <-signal:
            fmt.Println("Waited 2 seconds")
            break loop // exit the loop, not just the select
        default:
            fmt.Println("Waiting...")
            time.Sleep(time.Second / 2)
        }
    }
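The signalling pattern has a direct counterpart in Rust: a channel whose message type is `()`. This is a simplified sketch that blocks on the channel instead of polling, and `wait_for_signal` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Spawns a worker that signals completion with a zero-sized message and
// returns true once that signal has been received.
fn wait_for_signal() -> bool {
    let (sender, receiver) = mpsc::channel::<()>();
    let worker = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        // The message has no content; its arrival is the information.
        sender.send(()).expect("receiver is still alive");
    });
    let got = receiver.recv().is_ok();
    worker.join().expect("worker thread panicked");
    got
}
```

The receiver learns only that something was sent, which is all a completion signal needs.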
$\endgroup$
3
  • 2
    $\begingroup$ The idiom for that, in languages like Rust, is to use an empty tuple rather than an empty struct. $\endgroup$ Commented Jul 1, 2023 at 4:35
  • 1
    $\begingroup$ @Bbrk24 sure, but Go doesn't have a tuple type, and in any case tuples are just a degenerate case of structs where the elements can't have meaningful names. If there are no elements, that distinction vanishes, so an empty tuple and an empty struct are really the same thing :) $\endgroup$ Commented Jul 2, 2023 at 18:45
  • $\begingroup$ In Swift, tuples and structs are actually quite different (and actually, I do mention empty types in that post). $\endgroup$ Commented Jul 3, 2023 at 1:17
4
$\begingroup$

For trait implementations

They can be useful for getting a concrete implementation of a trait. For example, see this example in the Rust design patterns book:

    // The data we will visit
    mod ast {
        pub enum Stmt {
            Expr(Expr),
            Let(Name, Expr),
        }

        pub struct Name {
            value: String,
        }

        pub enum Expr {
            IntLit(i64),
            Add(Box<Expr>, Box<Expr>),
            Sub(Box<Expr>, Box<Expr>),
        }
    }

    // The abstract visitor
    mod visit {
        use ast::*;

        pub trait Visitor<T> {
            fn visit_name(&mut self, n: &Name) -> T;
            fn visit_stmt(&mut self, s: &Stmt) -> T;
            fn visit_expr(&mut self, e: &Expr) -> T;
        }
    }

    use visit::*;
    use ast::*;

    // An example concrete implementation - walks the AST interpreting it as code.
    struct Interpreter;

    impl Visitor<i64> for Interpreter {
        fn visit_name(&mut self, n: &Name) -> i64 { panic!() }

        fn visit_stmt(&mut self, s: &Stmt) -> i64 {
            match *s {
                Stmt::Expr(ref e) => self.visit_expr(e),
                Stmt::Let(..) => unimplemented!(),
            }
        }

        fn visit_expr(&mut self, e: &Expr) -> i64 {
            match *e {
                Expr::IntLit(n) => n,
                Expr::Add(ref lhs, ref rhs) => self.visit_expr(lhs) + self.visit_expr(rhs),
                Expr::Sub(ref lhs, ref rhs) => self.visit_expr(lhs) - self.visit_expr(rhs),
            }
        }
    }

Note the zero-sized Interpreter struct in the code. It is solely used for the implementation and instantiation of the Visitor trait.

$\endgroup$
2
$\begingroup$

Consider Option types.

Some is a struct with a backing field. None is a zero-size struct.

The usefulness of None is that it can participate in type casts.

A language can recognize that zero-size structs can be singletons, so you could distinguish Some from None via a pointer check.

Some runtimes instead store out-of-band type information when a struct is up-cast to an abstract type. Go for example does this so that it can check that down-casts from interface types are safe.
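Rust's `Option` is a tagged enum, not a pair of structs, but it shows the pointer-check idea in a related form: for `Option<Box<T>>` the standard library guarantees that `None` is represented without extra space, using the null pointer. The helper names below are hypothetical.

```rust
use std::mem::size_of;

// (size of a plain owning pointer, size of an optional owning pointer)
fn box_sizes() -> (usize, usize) {
    (size_of::<Box<i32>>(), size_of::<Option<Box<i32>>>())
}

// Telling the two cases apart needs no separately stored tag for this type.
fn describe(value: &Option<Box<i32>>) -> &'static str {
    match value {
        None => "none",
        Some(_) => "some",
    }
}
```

Since both sizes are equal, the payload-free case costs nothing beyond the pointer itself.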

$\endgroup$
4
  • $\begingroup$ This is only sort of an answer, since this same functionality can be achieved with a zero-size tuple/unit type, and I'm more asking about why a language would specifically allow something along the lines of struct X; $\endgroup$ Commented Jul 5, 2023 at 14:42
  • $\begingroup$ @RydwolfPrograms Sorry, I misunderstood. I interpreted "struct type" as product type. But unless your type system allows great flexibility with union types, you still need some kind of common super-type for Some and None? $\endgroup$ Commented Jul 5, 2023 at 15:12
  • $\begingroup$ Isn't that normally done with a tagged union? Like None would be 00 with some padding, and Some would be 01 followed by the data? (With things like null pointer optimization if the data is a pointer) $\endgroup$ Commented Jul 5, 2023 at 15:14
  • $\begingroup$ It often is, and often tagged unions are syntactic sugar for a group of struct type definitions that have some tag bits at a well known offset. Doing it that way means that a tagged union like type ThreeValueLogic = Unknown | Known(Bool) can benefit from struct layout optimizations to pack the single tag bit with the single bit for the bool into one byte so the sizeof the union as a whole is 1 instead of treating tag bits as some special out-of-band thing. (Though in this case you can also enumerate the possible values) $\endgroup$ Commented Jul 5, 2023 at 15:32
1
$\begingroup$

Meta Typing

Another reason I didn't find in the other answers is zero-cost meta typing. I will use pseudocode to show it, because even though the question mentions Rust, it is a general one.

Let's take an ML-like (Meta Language) language.

    type 'T Stream = {
        ty: 'T
        stream: IOhandle // pseudonym for any input point from the kernel.
    }

    // assumed zero types
    type Readable = {}
    type Writable = {}

    let Read (src: Readable Stream) = src.stream.Read()
    let Write (dst: Writable Stream) item = dst.stream.Write item

In this example, we use zero-size types to encode intended behavior in the type checker. Although Readable and Writable are both zero-size, the compiler treats them as distinct types, so the type system ensures that we cannot mistakenly pass a readable stream to the Write function. And because Readable and Writable are zero-size, the compiler drops them during optimization: the information they hold is purely about types and expected behavior.

This is very useful when writing programs that need to be secure or that have a very low tolerance for errors in the end product. It eliminates many common errors that are not normally addressed by the type system or checked by the compiler. It also reduces the need for testing: because the behavior is encoded in the types, a successful compile is evidence that the code respects those constraints.
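The Readable/Writable idea can be sketched in Rust with unit structs and `PhantomData`. The in-memory `buffer` is an assumption standing in for a real I/O handle, and all names here are hypothetical.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Zero-sized marker types; they exist only for the type checker.
struct Readable;
struct Writable;

// Hypothetical stream: `buffer` stands in for a real I/O handle.
struct Stream<Mode> {
    buffer: Vec<u8>,
    _mode: PhantomData<Mode>,
}

impl<Mode> Stream<Mode> {
    fn new(buffer: Vec<u8>) -> Self {
        Stream { buffer, _mode: PhantomData }
    }
}

// Accepts only streams tagged as readable.
fn read(src: &mut Stream<Readable>) -> Option<u8> {
    src.buffer.pop()
}

// Accepts only streams tagged as writable.
fn write(dst: &mut Stream<Writable>, item: u8) {
    dst.buffer.push(item);
}

// In practice the tagged stream is no larger than its untagged contents.
fn stream_sizes() -> (usize, usize) {
    (size_of::<Stream<Readable>>(), size_of::<Vec<u8>>())
}
```

Passing a `Stream<Readable>` to `write` is a type error at compile time, and the marker adds no data to the stream at runtime.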

$\endgroup$
