2

I am using Linq to Sql. Here is the code:

Dictionary<string, int> allResults; using (var dc= new MyDataContext()) { dc.CommandTimeout = 0; allResults = dc.MyTable.ToDictionary(x => x.Text, x => x.Id); } 

it is ran on a 64 bit machine and the compilation is AnyCPU. It throws a System.OutOfMemoryException.

This accesses an SQL Server database. The Id field maps to an SQL Server int field, And the Text field maps to Text(nvarchar(max)) field. Running select COUNT(*) from TableName results in 1,173,623 records and running select sum(len(Text)) from TableName results in 48,915,031. Since int is a 32 bit integer, the ids should take only 4.69MB of space and the strings less than 1GB. So we are not even bumping against the 2GB/object limit.

I then change the code in this way:

Dictionary<string, int> allResults; using (var dc = new MyDataContext()) { Dictionary<string, int> threeHundred; dc.CommandTimeout = 0; var tot = dc.MyTable.Count(); allResults = new Dictionary<string, int>(tot); int skip = 0; int takeThis = 300000; while (skip < tot) { threeHundred = dc.MyTable.Skip(skip).Take(takeThis).ToDictionary(x => x.Text, x => x.Id); skip = skip + takeThis; allResults = allResults.Concat(threeHundred).ToDictionary(x => x.Key, x => x.Value); threeHundred = null; GC.Collect(); } } 

I learn that garbage collaction here does not help and that the out of memory exception is thrown on the first line in the while loop once skip = 900,000.

What is wrong and how do I fix this?

22
  • 1
    Do you actually have enough memory to hold all of those at once? Commented Apr 28, 2014 at 18:29
  • 2
    Side note: Make sure your process runs as x64 - by default AnyCPU runs x86 ("prefer 32-bit"). Commented Apr 28, 2014 at 18:40
  • 1
    @Joe, See my note to sous2817 above. However you two might be on to something. I think I need to change my config file: By default, the .NET Framework does not support arrays that are greater than 2 GB, but in the .NET Framework 4.5, you can use the <gcAllowVeryLargeObjects> element in your application configuration file to enable arrays that are greater than this size. Source: blogs.msdn.com/b/dotnet/archive/2012/07/20/… Commented Apr 28, 2014 at 23:07
  • 1
    @Magnus, access to in memory collections is always faster than access to any external data source no matter how well indexed it is. My interest is in figuring out what goes wrong here and how to fix it more than re-engineering the product to avoid this error. Commented Apr 30, 2014 at 17:19
  • 1
    The 2GB/object does not apply here anyway, the Dictionary is a big cluster of objects, each well below that limit. A memory profiler might help. Commented Apr 30, 2014 at 17:42

1 Answer 1

1

Without getting into your calculations of how much it should take in memory (as there could be issues of encoding that could easily double the size of the data), I'll try to give a few pointers.

Starting with the cause of the issue - my guess is that the threeHundred dictionary is causing a lot of allocations. When you add items to a dictionary like above, the dictionary won't be able to know how many items it should pre-allocated. Which will cause a massive re-allocation and coping of all data to newly created dictionaries. Please set a size (using the ctor) to the threeHundred dictionary before adding any items to it.

Please read this article I've published which goes in-depth into Dictionary internals - I'm sure it will shed some light on those symptoms. http://www.codeproject.com/Articles/500644/Understanding-Generic-Dictionary-in-depth

In addition, when trying to populate this large amount of data, I suggest to fully control the process. My suggestion:

  • Pre-allocate slots in the Dictionary (using a Count query directly on the DB, and passing it to the Dictionary ctor)
  • Work with DataReader for populating those items without loading all of the query result into memory. If you know for a fact (which is VERY important to know this in advance) - think of using string.Intern - only if there are many duplicated items! - you should test to see how it is working
  • Memory-profile the code - you should only see ONE allocation for the Dictionary, and strings as the amount of the items from the query (int is value type - therefor it is not allocated on the heap as an object, but instead it sits inside the Dictionary.

Either way, you should check if you are running on 32 bit or 64 bit. .Net 4.5 prefers 32 bit. (check it on Task Manager or the project properties)

Hope this helps, Ofir.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.