yield return - memory optimization

Question

And yet another question about yield return

I need to execute different SQL scripts remotely. The scripts are in TFS, so I fetch them from TFS automatically; the process then iterates through all the files, reads each file's content into memory, and sends it to the remote SQL servers.

So far the process works flawlessly. But now some of the scripts will contain bulk inserts, increasing the size of the script to 500,000 MB or more.

I wrote the code "thinking" that only one file's content would be in memory at a time, but now I'm having second thoughts.

This is what I have (oversimplified):

    public IEnumerable<SqlScriptSummary> Find(string scriptsPath)
    {
        if (!Directory.Exists(scriptsPath))
        {
            throw new DirectoryNotFoundException(scriptsPath);
        }

        var path = new DirectoryInfo(scriptsPath);

        return path.EnumerateFiles("*.sql", SearchOption.TopDirectoryOnly)
            .Select(x =>
            {
                var script = new SqlScriptSummary
                {
                    Name = x.Name,
                    FullName = x.FullName,
                    Content = File.ReadAllText(x.FullName, Encoding.Default)
                };

                return script;
            });
    }

    ....

    public void ExecuteScripts(string scriptsPath)
    {
        foreach (var script in Find(scriptsPath))
        {
            _scriptRunner.Run(script.Content);
        }
    }

My understanding is that EnumerateFiles yields one file at a time, which is what made me "think" that I was loading one file at a time into memory.

But...

Once I'm iterating over them in the ExecuteScripts method, what happens to the script variable used in the foreach loop after it goes out of scope? Is it disposed, or does it remain in memory?

If it remains in memory, that means that even though I'm using iterators (and yield return internally), once I've iterated through all of them they are all still in memory, right? So in the end it would be like using ToList, just with lazy execution. Is that right?
If the script variable is disposed when it goes out of scope, then I think I would be fine.

How could I redesign the code to optimize memory consumption, for example by forcing it to load the content of only one script into memory at a time?

Additional questions:

How can I test (unit/integration test) that I'm loading just one script at a time in memory?
How can I test (unit/integration test) that each script is released/not released from memory?

Jon Skeet · Accepted Answer · 2015-03-20 18:10:21Z

Once I'm iterating over them, what happens to the script variable used in the foreach loop after it goes out of scope? Is it disposed, or does it remain in memory?

If you mean in the ExecuteScripts method - there's nothing to dispose, unless SqlScriptSummary implements IDisposable, which seems unlikely. However, there are two different things here:

The script variable goes out of scope after the foreach loop, and can't act as a GC root.
Each object that the script variable has referred to will be eligible for garbage collection when nothing else refers to it... including script on the next iteration.
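The second point can be observed with a weak reference. The following is only a minimal sketch: the Script class is a hypothetical stand-in for SqlScriptSummary, the loop mimics the shape of ExecuteScripts, and the result depends on the runtime's GC, so it illustrates the idea rather than guaranteeing it.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class GcEligibilityDemo
{
    // Hypothetical stand-in for SqlScriptSummary; only Content matters here.
    public sealed class Script
    {
        public string Content { get; set; }
    }

    // Runs a loop shaped like ExecuteScripts and returns a weak reference to
    // the first object the loop variable referred to. NoInlining keeps the
    // loop's locals in this method's own frame, so they are gone on return.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference TrackFirstScript()
    {
        WeakReference first = null;
        for (int i = 0; i < 3; i++)
        {
            var script = new Script { Content = new string('x', 1000000) };
            if (first == null)
            {
                // A weak reference does not keep its target alive.
                first = new WeakReference(script);
            }
            // On the next iteration "script" refers to a new object, so
            // nothing strongly references the previous one any more.
        }
        return first;
    }

    public static bool IsFirstScriptStillAlive()
    {
        WeakReference first = TrackFirstScript();
        // Forced collection is for demonstration only; production code
        // should normally leave the timing to the GC.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return first.IsAlive;
    }

    public static void Main()
    {
        Console.WriteLine("First script still alive: " + IsFirstScriptStillAlive());
    }
}
```

On a typical .NET runtime this should report that the first object is no longer alive, but that is an observation about GC behaviour rather than a guarantee about timing.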

So yes, basically that should be absolutely fine. You'll be loading one file at a time, and I can't see any reason why there'd be more than one file's content in memory at a time, in terms of objects that the GC can't collect. (The GC itself is lazy, so it's unlikely that there'd be exactly one script in memory at a time, but you don't need to worry about that side of things, as your code makes sure that it doesn't keep live references to more than one script at a time.)
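To see the "one at a time" loading, and how it differs from the ToList case raised in the question, here is a small sketch. It assumes nothing beyond LINQ's deferred execution; the names LazyLoadDemo and LoadedCountsDuringIteration are made up for illustration, and the lambda only stands in for the place where File.ReadAllText would run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyLoadDemo
{
    // Returns, for each processed item, how many items had been loaded
    // at the moment that item was handed to the foreach body.
    public static List<int> LoadedCountsDuringIteration(
        IEnumerable<string> names, bool materialize)
    {
        int loaded = 0;

        // Stand-in for the Select in Find: the lambda is where the file
        // content would be read.
        IEnumerable<string> scripts = names.Select(name =>
        {
            loaded++;
            return "-- content of " + name;
        });

        if (materialize)
        {
            // ToList runs the lambda for every item up front and keeps
            // all the results referenced by the list.
            scripts = scripts.ToList();
        }

        var counts = new List<int>();
        foreach (string script in scripts)
        {
            counts.Add(loaded);
        }
        return counts;
    }

    public static void Main()
    {
        var names = new[] { "a.sql", "b.sql", "c.sql" };

        // Lazy: each item is loaded only when the loop asks for it.
        Console.WriteLine(string.Join(", ", LoadedCountsDuringIteration(names, false))); // prints "1, 2, 3"

        // Materialized: everything is loaded before the first iteration.
        Console.WriteLine(string.Join(", ", LoadedCountsDuringIteration(names, true))); // prints "3, 3, 3"
    }
}
```

In the lazy case nothing holds on to earlier results, which is why the earlier contents become collectable; with ToList the list itself keeps every result alive.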

The way you can test that you're only loading a single script at a time is to try it with a large directory of large scripts (that don't actually do anything). If you can process more scripts than you have memory, you're fine :)
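A scaled-down version of that experiment could look like the sketch below. It is an assumption-laden illustration, not a definitive test: the MemoryBoundCheck class, the temporary directory layout, and the file sizes are all invented here, and GC.GetTotalMemory(true) only approximates the live managed heap.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MemoryBoundCheck
{
    // Lazily reads every *.sql file, mirroring the shape of Find in the question.
    public static IEnumerable<string> ReadScripts(string dir)
    {
        return new DirectoryInfo(dir)
            .EnumerateFiles("*.sql", SearchOption.TopDirectoryOnly)
            .Select(f => File.ReadAllText(f.FullName));
    }

    // Writes fileCount dummy scripts of charsPerFile characters each, iterates
    // over them, and returns the largest managed-heap size seen after a forced
    // collection inside the loop.
    public static long PeakManagedBytes(int fileCount, int charsPerFile)
    {
        string dir = Path.Combine(
            Path.GetTempPath(), "scripts-" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        try
        {
            for (int i = 0; i < fileCount; i++)
            {
                File.WriteAllText(
                    Path.Combine(dir, i + ".sql"), new string('-', charsPerFile));
            }

            long peak = 0;
            foreach (string content in ReadScripts(dir))
            {
                // Forced collection is for measurement only; "content" is
                // still referenced here, so the current script is counted.
                long live = GC.GetTotalMemory(true);
                if (live > peak) peak = live;
                GC.KeepAlive(content);
            }
            return peak;
        }
        finally
        {
            Directory.Delete(dir, true);
        }
    }

    public static void Main()
    {
        const int files = 20, chars = 2000000; // about 4 MB per string in memory (UTF-16)
        long total = (long)files * chars * sizeof(char);
        long peak = PeakManagedBytes(files, chars);
        Console.WriteLine("Total content: " + total + " bytes, peak live: " + peak + " bytes");
    }
}
```

If the peak stays close to the size of a single script rather than growing towards the total, only one script's content is being kept alive at a time. The real check described above, with more data than available memory, remains the more convincing one.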

So @JonSkeet, if I understood correctly, the script variable will be collected by the GC, but not exactly after each iteration of the foreach loop, right? So that means I could sometimes have one or two files loaded in memory at a time, but the GC will make sure to collect them later. Is that right?
@Jupaol: Variables aren't collected. Objects are. You need to distinguish between the two. But yes, apart from that, it sounds about right.
@Jupaol Variables aren't collected by the GC, objects are. Variables simply point to objects that exist in memory somewhere (for reference types; value types are a bit different). If the GC determines that no variables point to an object, that object becomes eligible for collection, which means that at some future time, its memory will be reclaimed. EDIT: Beaten by Jon Skeet with the exact wording @_@.
@Kyle Jon Skeet cannot be beaten. Thank you for your explanation, I need to read more about GC
