
Does the C# compiler or .NET CLR do any clever memory optimisation of string literals/constants? I could swear I'd heard of the concept of "string internalisation" so that in any two bits of code in a program, the literal "this is a string" would actually refer to the same object (presumably safe, what with strings being immutable?). I can't find any useful reference to it on Google though...

Have I heard this wrong? Don't worry - I'm not doing anything horrible in my code with this information, just want to better my understanding of how it works under the covers.

3 Answers


EDIT: While I strongly suspect the statement below is true for all C# compiler implementations, I'm not sure it's actually guaranteed in the spec. Section 2.4.4.5 of the spec talks about literals referring to the same string instance, but it doesn't mention other constant string expressions. I suspect this is an oversight in the spec - I'll email Mads and Eric about it.


It's not just string literals. It's any string constant. So for example, consider:

    public const string X = "X";
    public const string Y = "Y";
    public const string XY = "XY";

    void Foo()
    {
        string z = X + Y;
    }

The compiler realises that the concatenation here (for z) is between two constant strings, and so the result is also a constant string. Therefore the initial value of z will be the same reference as the value of XY, because they're compile-time constants with the same value.
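To check this for yourself, here is a minimal sketch, assuming the Microsoft C# compiler and a standard .NET runtime. The class name `ConstantFoldingDemo` and the variables `x` and `runtime` are made up for illustration. It contrasts the folded constant expression with a concatenation that has to happen at run time:

```csharp
using System;

public static class ConstantFoldingDemo
{
    public const string X = "X";
    public const string Y = "Y";
    public const string XY = "XY";

    public static void Main()
    {
        // X + Y is a constant expression, so the compiler can fold it
        // to the constant "XY" at compile time.
        string z = X + Y;

        // x is built at run time, so x + Y is not a constant expression
        // and the concatenation allocates a fresh string.
        string x = new string('X', 1);
        string runtime = x + Y;

        // Expected True with the Microsoft compiler (same constant, same instance).
        Console.WriteLine(object.ReferenceEquals(z, XY));

        // False: equal contents, but a different object.
        Console.WriteLine(object.ReferenceEquals(runtime, XY));

        // True: == on strings compares contents, not references.
        Console.WriteLine(runtime == XY);
    }
}
```

As the EDIT notes say, the first result is what the Microsoft compiler does in practice; other implementations may differ.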

EDIT: The reply from Mads and Eric suggested that in the Microsoft C# compiler string constants and string literals are usually treated the same way - but that other implementations may differ.


4 Comments

Do two identical string constants in different assemblies point to the same object too? / Does the jitter intern string literals?
@CodeInChaos: I believe that depends on the CompilationRelaxationsAttribute(CompilationRelaxations.NoStringInterning) attribute. I wouldn't like to say for sure though.
Hi @JonSkeet, please advise: do interned strings with the same content always have the same reference? Does that mean that comparing references of such strings will return true?
@Johnny_D: Yes and yes - guaranteed within the same assembly, at least. Between assemblies it gets trickier, IIRC.

This article explains string interning pretty well. Quote:

.NET has the concept of an "intern pool". It's basically just a set of strings, but it makes sure that every time you reference the same string literal, you get a reference to the same string. This is probably language-dependent, but it's certainly true in C# and VB.NET, and I'd be very surprised to see a language it didn't hold for, as IL makes it very easy to do (probably easier than failing to intern literals). As well as literals being automatically interned, you can intern strings manually with the Intern method, and check whether or not there is already an interned string with the same character sequence in the pool using the IsInterned method. This somewhat unintuitively returns a string rather than a boolean - if an equal string is in the pool, a reference to that string is returned. Otherwise, null is returned. Likewise, the Intern method returns a reference to an interned string - either the string you passed in if it was already in the pool, or a newly created interned string, or an equal string which was already in the pool.
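A small sketch of the two methods the quote mentions, `string.Intern` and `string.IsInterned`, assuming default compilation settings on a standard .NET runtime. The class name `InternPoolDemo` and the variable names are invented for illustration, and the GUID is used only as a string that is very unlikely to be in the pool already:

```csharp
using System;

public static class InternPoolDemo
{
    public static void Main()
    {
        // A literal, which is interned automatically.
        string literal = "this is a string";

        // The same characters, but built at run time, so a separate object.
        string built = new string(literal.ToCharArray());
        Console.WriteLine(object.ReferenceEquals(literal, built));    // False

        // Intern returns the pooled instance with the same contents,
        // which here is the one the literal refers to.
        string interned = string.Intern(built);
        Console.WriteLine(object.ReferenceEquals(literal, interned)); // True

        // IsInterned returns the pooled string, or null if there is none.
        // A freshly generated GUID should not be in the pool.
        string unseen = Guid.NewGuid().ToString();
        Console.WriteLine(string.IsInterned(unseen) == null);         // expected True
        Console.WriteLine(string.IsInterned(built) != null);          // True
    }
}
```

Note that `IsInterned(built)` hands back the pooled instance (the literal's), not `built` itself.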

1 Comment

Sidenote: Since interned strings aren't freed during the lifetime of the AppDomain, improper use of interning can cause a memory leak.

Yes, it does optimize string literals. Here is a simple example where you can see that:

    string s1 = "A";
    string s2 = "A";
    object.ReferenceEquals(s1, s2); // true
