
I'm trying to get a list of files in a specific directory that contains over 20 million files ranging from 2 to 20 KB each.
The problem is that my program throws an OutOfMemoryException every time, while tools like robocopy copy the folder to another directory with no problem at all. Here's the code I'm using to enumerate files:

 List<string> files = new List<string>(Directory.EnumerateFiles(searchDir)); 

What should I do to solve this problem? Any help would be appreciated.

  • Don't create a list of the files. Just iterate over the result of EnumerateFiles and do whatever it is you want to do. Commented Sep 28, 2016 at 16:51
  • Are you trying to hold that much data in memory? One thing you can do is create subdirectories and break the files into groups. Commented Sep 28, 2016 at 16:52
  • @Rohit Yes. I was trying to create a list, then iterate over them and do some processing. Commented Sep 28, 2016 at 16:53
  • @JeremyMc Would need to see more code to determine if there are any other potential memory issues. Commented Sep 28, 2016 at 16:54
  • @rory.ap That would be even worse, as it would return an array of the files and then create a list from that array, thus doubling the amount of memory used. Commented Sep 28, 2016 at 16:55

2 Answers


You are creating a list of 20 million objects in memory. I don't think you would ever use it that way, even if it were possible.

Instead, use Directory.EnumerateFiles(searchDir) and iterate over the items one by one.

like:

foreach (var file in Directory.EnumerateFiles(searchDir))
{
    // Copy to another location, or do other processing
}

With your current code, your program first loads all 20 million objects into memory, and only then can you iterate over them or perform operations on them.

See: Directory.EnumerateFiles Method (String)

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
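To make the difference concrete, here is a minimal, self-contained sketch. It builds a throwaway temp directory with 5 files (standing in for the 20 million) and processes them one at a time; only the current path is held in memory, never the whole collection:

```csharp
using System;
using System.IO;

class StreamingDemo
{
    static void Main()
    {
        // Create a small temp directory with a few files for the demo.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        for (int i = 0; i < 5; i++)
            File.WriteAllText(Path.Combine(dir, $"file{i}.txt"), "data");

        // Stream file paths lazily; one string path in memory at a time.
        long count = 0;
        foreach (string file in Directory.EnumerateFiles(dir))
        {
            // ...process the file here (copy, parse, etc.)...
            count++;
        }
        Console.WriteLine(count); // prints 5

        Directory.Delete(dir, recursive: true);
    }
}
```

The same loop works unchanged for 20 million files, since memory use stays constant regardless of the directory size.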


3 Comments

Isn't that going to run into the same problem?
@rory.ap, no, it will not. This will not load all 20 million file paths into memory; instead, only one object (a string path) is in memory at a time.
@GillBates, no. Enumeration doesn't mean returning the whole collection; it is lazily evaluated, just like File.ReadLines vs File.ReadAllLines.

The answer above covers one directory level. To be able to enumerate through multiple levels of directories, each having a large number of directories with a large number of files, one can do the following:

public IEnumerable<string> EnumerateFiles(string startingDirectoryPath)
{
    var directoryEnumerables = new Queue<IEnumerable<string>>();
    directoryEnumerables.Enqueue(new string[] { startingDirectoryPath });
    while (directoryEnumerables.Any())
    {
        var currentDirectoryEnumerable = directoryEnumerables.Dequeue();
        foreach (var directory in currentDirectoryEnumerable)
        {
            // Yield the files of this directory lazily, one at a time.
            foreach (var filePath in Directory.EnumerateFiles(directory))
            {
                yield return filePath;
            }
            // Queue the subdirectories to be visited later (breadth-first).
            directoryEnumerables.Enqueue(Directory.EnumerateDirectories(directory));
        }
    }
}

The function will traverse a collection of directories through enumerators, so it will load the directory contents one by one. The only thing left to solve is the depth of the hierarchy...
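For comparison, the framework also ships a recursive overload, Directory.EnumerateFiles(path, searchPattern, SearchOption.AllDirectories), which walks the whole tree lazily as well. A minimal sketch against a small two-level temp hierarchy:

```csharp
using System;
using System.IO;
using System.Linq;

class RecursiveDemo
{
    static void Main()
    {
        // Build a two-level temp hierarchy: root with 2 files, subdir with 3.
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        string sub = Path.Combine(root, "sub");
        Directory.CreateDirectory(sub);
        for (int i = 0; i < 2; i++)
            File.WriteAllText(Path.Combine(root, $"a{i}.txt"), "x");
        for (int i = 0; i < 3; i++)
            File.WriteAllText(Path.Combine(sub, $"b{i}.txt"), "x");

        // AllDirectories enumerates the entire tree lazily.
        int total = Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories).Count();
        Console.WriteLine(total); // prints 5

        Directory.Delete(root, recursive: true);
    }
}
```

One reason to prefer a hand-rolled walker like the answer above: the built-in recursive enumeration aborts with an UnauthorizedAccessException as soon as it hits a directory it cannot read, whereas your own queue-based loop can catch that per directory and keep going.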

1 Comment

I guess fileEnumeratorFunc does not exist?
