$\begingroup$

In C++, templated code is compiled multiple times, once for each set of template arguments. But I just realized how restrictive C# generics are:

  • Any interface you use on a variable of a generic parameter type must be declared as a constraint on that parameter.
  • There is no feature like explicit template specialization, which would let different input types map to unrelated types.
  • Implementable interface members are technically not called virtual, but they behave just like virtual members. They need lookups anyway, so there is nothing to optimize.
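To make the first point concrete, here is a minimal sketch (the `ConstraintDemo` and `Max` names are made up for illustration). The call to `CompareTo` only compiles because the constraint names the interface; a C++ template would instead check the call at each instantiation.

```csharp
using System;

static class ConstraintDemo
{
    // Hypothetical helper: without "where T : IComparable<T>",
    // the call to a.CompareTo(b) would be a compile-time error.
    public static T Max<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b) >= 0 ? a : b;
    }
}
```

Calling `ConstraintDemo.Max(3, 7)` and `ConstraintDemo.Max(2.5, 1.5)` both type-check against the same constraint.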

One case does seem to require compiling multiple times: defining variables of different sizes. But most types in C# are references of fixed size. For the purpose of this question, let's consider a hypothetical language in which variables are always references of fixed size, and ignore the other cases.

In this case, are there benefits to compiling this kind of generics into multiple instances of target code? If there are, where and why? If there are not, is the list above the exhaustive set of reasons why it doesn't need to be compiled multiple times like C++? It would also help to know how C# is actually implemented.

(The two compilation strategies are called monomorphization and type erasure. My question is whether, in this specific case, they actually differ enough for the choice between them to be meaningful.)

$\endgroup$
  • $\begingroup$ I think C# generics are code-generated once for reference types, and once for every value type (or perhaps once for every value-type size, as you say). Hope someone knowledgeable (hint hint) will confirm/refute/explain. :-) $\endgroup$ Commented Aug 2, 2024 at 12:07
  • $\begingroup$ The term "recompilation" has a particular meaning: source code needs to be compiled again, for example because something significant (like a signature) has changed in another source file it depends on. I don't think that's the meaning you're going for, however. $\endgroup$ Commented Aug 3, 2024 at 23:03
  • $\begingroup$ The .NET JIT will make multiple copies of generated code for generics for the various type parameters when they are scalar/built-in types, because that improves code performance quite a bit over boxing. Otherwise all reference types share the same generated code, as far as I know from a few years ago. $\endgroup$ Commented Aug 3, 2024 at 23:07

2 Answers 2

$\begingroup$

I'm not sure what the official .NET implementation does these days, but from what I remember from working on an ahead-of-time compiler for C# a while back, the main constraints that limit sharing code between generic instantiations are:

(all examples assume that T is a generic parameter)

Size

Two instantiations in which T has different sizes will require different code for loading, storing, and copying variables of type T. Similarly, if T is used as the type of a field, other fields of the class may end up at different offsets. So instantiations for int and long will probably not share the same code. You could still share code for any method that doesn't actually need to know the size of T. For example, List<int>.get_Count() and List<long>.get_Count() might well be able to share code, even though List<int>.Add() and List<long>.Add() probably wouldn't.

GC Layout

If T is a struct, then, in addition to knowing the size of T, the code generator may need to know which offsets within an instance of T hold GC references, so that it can report all GC references to the garbage collector. So, the instantiation where T is a struct containing two long fields will probably not share code with the instantiation where T is a struct containing one long field and one string field, even though both structs are 16 bytes in size.
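As a rough illustration of the size and GC-layout points, the sketch below asks the runtime for both properties of T. The struct and class names are hypothetical; `Unsafe.SizeOf<T>()` and `RuntimeHelpers.IsReferenceOrContainsReferences<T>()` are existing helpers in modern .NET, used here on the assumption that the code targets .NET Core 2.0 or later.

```csharp
using System;
using System.Runtime.CompilerServices;

// Two hypothetical structs that are typically the same size on a 64-bit
// runtime but differ in where (and whether) they hold GC references.
struct TwoLongs { public long A; public long B; }
struct LongAndString { public long A; public string B; }

static class LayoutDemo
{
    // Size of T in bytes, as seen by generated code that copies a T.
    public static int SizeOf<T>() => Unsafe.SizeOf<T>();

    // True if T is a reference type or a struct containing references,
    // i.e. the garbage collector has to look inside values of type T.
    public static bool HoldsReferences<T>() =>
        RuntimeHelpers.IsReferenceOrContainsReferences<T>();
}
```

`LayoutDemo.SizeOf<int>()` and `LayoutDemo.SizeOf<long>()` differ (4 versus 8), while `TwoLongs` and `LongAndString` differ only in `HoldsReferences`, which is the distinction described above.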

Uses of T without an object

There are several situations in which the type of T must be known at runtime. The most common are casts to type T (including is and as), creating an instance of T (where T has the new() constraint), and typeof(T).

Unlike the "size" and "GC layout" cases above, there's a reasonably efficient way to share the code between instantiations even if the type of T is required at runtime. If T is a generic parameter on a generic class C<T>, the code generator can store the runtime type of T in the vtable of each instantiation of C<T>. This gives the code for all instance methods of C<T> access to the runtime type without bloating the size of every instance of C<T>. For static methods, and for the case where T is a generic parameter on a generic method, the code generator can add a hidden extra parameter to the method, in which the caller is expected to pass a pointer to the runtime type information for that instantiation's generic parameters.
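A small sketch of the three operations that need the runtime type (the `RuntimeTypeDemo` and `Describe` names are invented for this example). Both instantiations used below have a reference type for T, so a code generator could share one method body between them, yet each call must still see its own T; that is what the vtable slot or hidden parameter would supply.

```csharp
using System;
using System.Text;

static class RuntimeTypeDemo
{
    // Every line in the body needs the identity of T, not just its size.
    public static string Describe<T>(object candidate) where T : new()
    {
        bool isT = candidate is T;   // type test against T
        T fresh = new T();           // construction via the new() constraint
        Type type = typeof(T);       // the runtime type object for T
        return $"{type.Name} {isT} {fresh.GetType().Name}";
    }
}
```

For example, `RuntimeTypeDemo.Describe<StringBuilder>("text")` and `RuntimeTypeDemo.Describe<object>("text")` give different answers for the same argument, because the type test and the construction depend on T.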

$\endgroup$
$\begingroup$

When T is a value type (i.e., a primitive or a struct), the generic will be monomorphized and there will be one implementation for that specific type (even if two types behave identically in that method).

When T is not a value type (i.e., it is an interface or a class), there is one shared method that uses the object's vtable to dispatch the appropriate calls for the type.

There is a common pattern in high-performance C# of constraining T to both struct and an interface, so that the call target is known statically and monomorphization is forced:

    interface IFoo
    {
        int Transform(int a);
    }

    struct Bar : IFoo
    {
        public int Transform(int a) => a * 2;
    }

    // ...

    // Constraining T to a struct forces this code to be monomorphized, so
    // `IFoo.Transform` can be inlined; with a plain interface-typed value
    // it would be a slower interface-dispatched (virtual) call.
    void MapFoo<T>(int[] arr) where T : struct, IFoo
    {
        for (var i = 0; i < arr.Length; i++)
        {
            arr[i] = default(T).Transform(arr[i]);
        }
    }
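For comparison, here is a self-contained sketch that puts the struct-constrained version next to an ordinary interface-typed version. The names `ITransform`, `Doubler`, and `MapDemo` are made up for this example. Both produce the same results; the difference is only in how the call is likely to be compiled, and a JIT may devirtualize the interface call in some cases.

```csharp
using System;

interface ITransform
{
    int Transform(int a);
}

struct Doubler : ITransform
{
    public int Transform(int a) => a * 2;
}

static class MapDemo
{
    // Struct-constrained: each value type T gets its own instantiation,
    // so the target of Transform is known statically for that copy.
    public static void MapStruct<T>(int[] arr) where T : struct, ITransform
    {
        for (var i = 0; i < arr.Length; i++)
        {
            arr[i] = default(T).Transform(arr[i]);
        }
    }

    // Interface-typed: the call goes through interface dispatch, and
    // passing a struct here boxes it first.
    public static void MapInterface(int[] arr, ITransform f)
    {
        for (var i = 0; i < arr.Length; i++)
        {
            arr[i] = f.Transform(arr[i]);
        }
    }
}
```

Usage would look like `MapDemo.MapStruct<Doubler>(values)` versus `MapDemo.MapInterface(values, new Doubler())`; the first never materializes an instance on the heap.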
$\endgroup$
