0

I'm processing a large number of files, so I don't want to wait until the whole search has finished before the array is returned. That rules out Directory.GetFiles().

According to this answer, I need to use EnumerateFiles() to get results while the search is still running. However, I'm using .NET 2.0, and this function seems to have been introduced in .NET 4.0.

What is the equivalent of EnumerateFiles() in .NET 2.0?

Any hints would be highly appreciated.

6
  • There is no equivalent. But if the processing is complex and expensive, Directory.GetFiles is the wrong place to optimize. You could optimize the processing method, or you could load all paths with GetFiles and then process them one part after the other, for example every 10th file. Commented May 12, 2014 at 8:57
  • Do you think this answer is what I need stackoverflow.com/a/929418/2340370 ? Commented May 12, 2014 at 9:00
  • 2
    It still uses Directory.GetFiles(path) first even if the iterator yields one after the other. So no, that just fakes deferred execution for a single directory with a large number of files. Commented May 12, 2014 at 9:04
  • Are there that many files in one directory to make GetFiles expensive? Or is it only expensive because you search subdirectories? In the latter case you can simply write the recursive descent yourself and you only incur the cost of a shallow search before starting the search. One implementation is at Enumerating Files Throwing Exception Commented May 12, 2014 at 9:40
  • 2
    If those files are in the same directory, then you may have to look in the WinAPI direction: FindFirstFile, FindNextFile, etc. Commented May 12, 2014 at 9:43
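
The recursive descent suggested in the comments above can be sketched in plain .NET 2.0 managed code. This is only a minimal sketch under stated assumptions: `LazyFileWalker` and `EnumerateFilesLazily` are hypothetical names, not framework APIs, and each directory is still listed in full with Directory.GetFiles, so it helps when the cost comes from many subdirectories rather than from one huge directory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LazyFileWalker
{
    // Hypothetical helper: yields matching file paths one directory at a time,
    // so only the listing of the directory currently being visited is held in memory.
    public static IEnumerable<string> EnumerateFilesLazily(string root, string pattern)
    {
        Stack<string> pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files;
            string[] subDirs;
            try
            {
                // Shallow (non-recursive) listing of a single directory.
                files = Directory.GetFiles(dir, pattern);
                subDirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException)
            {
                continue; // skip directories we are not allowed to read
            }
            catch (DirectoryNotFoundException)
            {
                continue; // directory disappeared while we were walking
            }

            foreach (string file in files)
                yield return file;

            // Queue subdirectories for later visits.
            foreach (string subDir in subDirs)
                pending.Push(subDir);
        }
    }
}
```

A caller can start working on the first results while deeper directories are still unvisited, for example `foreach (string f in LazyFileWalker.EnumerateFilesLazily(@"D:\Music", "*.mp3")) { ... }`.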

3 Answers

2

What you need are the WinAPI calls FindFirstFile and FindNextFile. Here's some code that uses the wrapped API calls.

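IEnumerable<string> EnumerateFiles(string path)
{
    APIWrapper.FindData findData = new APIWrapper.FindData();
    // "using" closes the find handle even if the caller stops enumerating early.
    using (APIWrapper.SafeFindHandle handle = APIWrapper.SafeNativeMethods.FindFirstFile(System.IO.Path.Combine(path, "*"), findData))
    {
        if (handle.IsInvalid)
            yield break; // nothing found, or the path is not accessible

        do
        {
            // Skip directories, including the "." and ".." entries.
            if ((findData.fileAttributes & (int)System.IO.FileAttributes.Directory) == 0)
                yield return findData.fileName;
        }
        while (APIWrapper.SafeNativeMethods.FindNextFile(handle, findData)); // true while entries remain
    }
}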

I just hand-typed EnumerateFiles, so treat it as pseudocode, but the class it relies on is production ready. Here it is:

using System;
using System.Runtime.ConstrainedExecution;
using System.Runtime.InteropServices;
using System.Security.Permissions;

internal class APIWrapper
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    internal sealed class FILETIME
    {
        public int Low;
        public int High;

        public Int64 ToInt64()
        {
            // Treat Low as unsigned so a set top bit is not sign-extended into the high half.
            return ((Int64)High << 32) | (uint)Low;
        }
    }

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    internal sealed class FindData
    {
        public int fileAttributes;
        public FILETIME CreationTime;
        public FILETIME LastAccessTime;
        public FILETIME LastWriteTime;
        public int FileSizeHigh;
        public int FileSizeLow;
        public int dwReserved0;
        public int dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public String fileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public String alternateFileName;
    }

    internal sealed class SafeFindHandle : Microsoft.Win32.SafeHandles.SafeHandleMinusOneIsInvalid
    {
        /// <summary>
        /// Constructor
        /// </summary>
        public SafeFindHandle() : base(true) { }

        /// <summary>
        /// Release the find handle
        /// </summary>
        /// <returns>true if the handle was released</returns>
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
        protected override bool ReleaseHandle()
        {
            return SafeNativeMethods.FindClose(handle);
        }
    }

    internal enum SearchOptions
    {
        NameMatch,
        LimitToDirectories,
        LimitToDevices
    }

    [SecurityPermissionAttribute(SecurityAction.Assert, UnmanagedCode = true)]
    internal static class SafeNativeMethods
    {
        [DllImport("Kernel32.dll", CharSet = CharSet.Auto)]
        public static extern SafeFindHandle FindFirstFile(String fileName, [In, Out] FindData findFileData);

        [DllImport("Kernel32.dll", CharSet = CharSet.Auto)]
        public static extern SafeFindHandle FindFirstFileEx(
            String fileName,                      //__in        LPCTSTR lpFileName,
            [In] int infoLevel,                   //__in        FINDEX_INFO_LEVELS fInfoLevelId,
            [In, Out] FindData findFileData,      //__out       LPVOID lpFindFileData,
            [In, Out] SearchOptions SerchOps,     //__in        FINDEX_SEARCH_OPS fSearchOp,
            [In] int SearchFilter,                //__reserved  LPVOID lpSearchFilter,
            [In] int AdditionalFlags);            //__in        DWORD dwAdditionalFlags

        [DllImport("kernel32", CharSet = CharSet.Auto)]
        [return: MarshalAs(UnmanagedType.Bool)]
        public static extern bool FindNextFile(SafeFindHandle hFindFile, [In, Out] FindData lpFindFileData);

        [DllImport("kernel32", CharSet = CharSet.Auto)]
        [return: MarshalAs(UnmanagedType.Bool)]
        public static extern bool FindClose(IntPtr hFindFile);
    }
}
4 Comments

Doesn't that break for directories which contain no files? You return a filename without checking whether FindFirstFile succeeded.
Just spotted that and fixed it. Like I said, treat EnumerateFiles as pseudocode. Thanks for the feedback.
A downvote with no explanation, how can this be? Folks, if there's something wrong with my answer, can you tell me what it is? After all, we're all here to learn :)
Tried to read system32 with it; the only thing returned is "."
0

Added specifically as a new answer.

IEnumerable and the yield keyword have been available since .NET 2.0, and together they give you lazy, deferred execution. With these, you can get what you want.

public IEnumerable<string> GetFiles(string rootPath, string[] fileNameStartChars, string[] extensionsFilter)
{
    DirectoryInfo root = new DirectoryInfo(rootPath);
    FileSystemInfo[] fsi = null;
    for (int i = 0; i < fileNameStartChars.Length; i++)
    {
        for (int k = 0; k < extensionsFilter.Length; k++)
        {
            // Pattern such as "A*.mp3": the "*" is needed so that names starting
            // with the character match, not just a file literally named "A.mp3".
            fsi = root.GetFileSystemInfos(fileNameStartChars[i] + "*" + extensionsFilter[k]);
            for (int j = 0; j < fsi.Length; j++)
            {
                // .Name returns the file name with its extension; strip the extension here if you don't need it.
                yield return fsi[j].Name;
            }
        }
    }
}

Usage:

Table of the characters a file name may start with:

// Windows matches search patterns case-insensitively, so lowercase letters are left out to avoid duplicate results.
// "*", "<", ">" and "|" are left out as well: they are wildcards or not valid in Windows file names.
public string[] table = new string[]
{
    "A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
    "1","2","3","4","5","6","7","8","9","0",
    "#","_","-",".","@","+",",","%","&","(",")","[","]","{","}","^"," ",";","`"
};

And the extensions:

string[] Exts = new string[] { ".mp3", ".midi", ".wav"}; 

With this method, the data is fetched in small parts, one starting character and extension at a time, so memory use is bounded by the largest single part rather than by the total number of files. This is the tricky part of trying to imitate .NET 4's EnumerateFiles method with 100% managed .NET 2.0 code.

IEnumerable<string> strNumerable = GetFiles(@"D:\Music", table, Exts);

// Execution is deferred: the method has not allocated any memory for your data yet.
// Allocation starts inside this foreach.
foreach (string s in strNumerable)
{
    // do your work
}

-2

IEnumerable and the yield keyword have been available since .NET 2.0, and together they give you lazy evaluation. With these, you can get what you want.

In pseudocode:

public IEnumerable<string> GetFiles(string Path, string FileExtension)
{
    // Create a new IEnumerable instance
    // Get the file count with DirectoryInfo or something similar
    // Implement a for loop over the file count
    //     If DirectoryFiles[indexOfForLoop].Extension == FileExtension
    //         yield return DirectoryFiles[indexOfForLoop]
}

In this pseudocode, the filtering decides what gets yielded: whenever the filter condition is true, yield return immediately hands that result back to the caller enumerating the IEnumerable.

And IEnumerable takes care of the lazy loading.

Depending on your needs, you can also use yield break in the loop. Note that yield break ends the whole enumeration; to leave out a single result, use continue instead.
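
To make the difference between skipping a result and ending the enumeration concrete, here is a small self-contained sketch that compiles as C# 2.0. `YieldDemo` and `FilterByExtension` are hypothetical names chosen for illustration, and the rule "stop at the first null or empty name" is an assumption made only to give yield break something to do.

```csharp
using System;
using System.Collections.Generic;

public static class YieldDemo
{
    // Skips names without the wanted extension (continue) and stops the
    // whole enumeration at the first null or empty entry (yield break).
    public static IEnumerable<string> FilterByExtension(IEnumerable<string> names, string extension)
    {
        foreach (string name in names)
        {
            if (string.IsNullOrEmpty(name))
                yield break; // ends the sequence; later names are never examined

            if (!name.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
                continue;    // leaves out this one name and moves on

            yield return name; // handed to the caller before the loop advances
        }
    }
}
```

Because the method is an iterator, none of its body runs until the caller starts a foreach over the result.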

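And with a simple call:

// ToList() is a LINQ extension (.NET 3.5), so on .NET 2.0 copy the sequence into a List<string> instead.
// The argument is compared with FileInfo.Extension, which has the form ".txt".
List<string> FilesInDirectory = new List<string>(GetFiles(path, ".txt"));

Hope this helps.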

7 Comments

...and where do you get the actual file names from?
@BinaryWorrier If the OP wants, a FileInfo can be used inside the for loop. Or the OP can collect the results into a list outside the method and get the file names in another loop or thread. As we know, yield return hands back each result immediately, so there is no need to wait for the whole array or list to be built ;)
No, sorry, I can't follow that. Can you add some actual code that shows where you would get the file names?
I don't need to follow it? Surely anyone who reads the answer should be able to follow it. If you can replace the hand-wavy comments with some method that actually gets file names, I'll apologise profusely and upvote your answer. Thanks :)
Dude, I am sincerely sorry that someone so close to you is ill. You seem to have read my comments asking for clarification as hostility and/or arrogance; nothing could be further from the truth. Please don't take these things personally. We're all here to learn and hopefully make the internet a better place by leaving good questions and answers. I want every answer on SO to be the best answer it can be. Again, this isn't a personal attack; I don't know you or anything about you. All I know is the content of your answer.