0

I have a huge directory that I need to retrieve files from, including its subdirectories.

The folders contain various files, but I am only interested in specific proprietary files whose extension is exactly 7 digits long.

For example, I have a folder that contains the following files:

    abc.txt
    def.txt
    GIWFJ1XA.0201000
    GIWFJ1UC.0501000
    NOOBO0XA.0100100
    summary.pdf
    someinfo.zip
    T7F4JUXA.0300600
    vxy98796.csv
    YJHLPLBO.0302300
    YJHLPLUC.0302800

I have tried the following:

    var fileList = Directory.GetFiles(someDir, "*.???????", SearchOption.AllDirectories);

and also

    string searchSting = string.Empty;
    for (int j = 0; j < 9999999; j++)
    {
        searchSting += string.Format(", *.{0} ", j.ToString("0000000"));
    }
    var fileList2 = Directory.GetFiles(someDir, searchSting, SearchOption.AllDirectories);

which obviously fails because the search string is too long.

I want to return only the files whose extension has the specified length (7 digits in this case), so that I don't have to loop over the thousands of other files I would otherwise have to process.

I have considered creating a variable string for the search criteria that would contain all 99,999,999 possible digits but d

How can I accomplish this?

3
  • @XiangWeiHuang I vote to undelete your answer; it provides a Regex alternative and OP already knows how to make GetFiles search all directories with the third argument Commented Apr 22, 2022 at 5:53
  • The main point of Xiang's answer was effectively to do Regex r = new Regex(@"\.\d{7}$"); and GetFiles(...).Where(r.IsMatch) - I'd use EnumerateFiles but this Regex is "ends with a dot and 7 digits" which is a compact alternative too Commented Apr 22, 2022 at 5:59
  • Oof, I didn't know GetFiles could search subfolders. But I think @DiplomacyNotWar's answer is ideal enough because it doesn't use regex. Still, I revived mine for future reference. Thanks for the suggestion. Commented Apr 22, 2022 at 6:08

3 Answers

2

I don't believe there's a way to do this without looping through the files in the directory and its subfolders. The search pattern for GetFiles doesn't support regular expressions, so we can't use something like [\d]{7} as a filter. I would suggest using Directory.EnumerateFiles and then returning only the files that match your criteria.

You can use this to enumerate the files:

    private static IEnumerable<string> GetProprietaryFiles(string topDirectory)
    {
        Func<string, bool> filter = f =>
        {
            string extension = Path.GetExtension(f);
            // is 8 characters long including the .
            // all remaining characters are digits
            return extension.Length == 8 && extension.Skip(1).All(char.IsDigit);
        };

        // EnumerateFiles allows us to step through the files without
        // loading all of the filenames into memory at once.
        IEnumerable<string> matchingFiles =
            Directory.EnumerateFiles(topDirectory, "*", SearchOption.AllDirectories)
                .Where(filter);

        // Return each file as the enumerable is iterated
        foreach (var file in matchingFiles)
        {
            yield return file;
        }
    }

Path.GetExtension includes the . so we check that the number of characters including the . is 8, and that all remaining characters are digits.

Usage:

List<string> fileList = GetProprietaryFiles(someDir).ToList(); 
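
To try the filter end to end without touching real data, here is a minimal sketch that builds a throwaway directory tree and applies the same rule. The class name ProprietaryFileDemo, the helper names, and the sample file layout are assumptions made up for illustration; only the filter rule comes from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ProprietaryFileDemo
{
    // Same rule as the answer: a dot followed by exactly seven digits.
    public static bool HasSevenDigitExtension(string path)
    {
        string extension = Path.GetExtension(path);
        return extension.Length == 8 && extension.Skip(1).All(char.IsDigit);
    }

    // Walks the tree lazily and returns the matching file names, sorted.
    public static List<string> FindIn(string topDirectory)
    {
        return Directory
            .EnumerateFiles(topDirectory, "*", SearchOption.AllDirectories)
            .Where(path => HasSevenDigitExtension(path))
            .Select(path => Path.GetFileName(path))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample layout in a temporary folder.
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        string sub = Path.Combine(root, "nested");
        Directory.CreateDirectory(sub); // also creates root

        try
        {
            File.WriteAllText(Path.Combine(root, "abc.txt"), "");
            File.WriteAllText(Path.Combine(root, "GIWFJ1XA.0201000"), "");
            File.WriteAllText(Path.Combine(sub, "summary.pdf"), "");
            File.WriteAllText(Path.Combine(sub, "T7F4JUXA.0300600"), "");

            foreach (string name in FindIn(root))
            {
                Console.WriteLine(name);
            }
        }
        finally
        {
            // Clean up the temporary tree.
            Directory.Delete(root, recursive: true);
        }
    }
}
```

Every file is still visited once; the saving is that only the matching paths end up in the list you loop over afterwards.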

3 Comments

Note for OP, you could also just take the middle statement and put it wherever you wanted to use, eg foreach(var file in Directory.EnumerateFiles(...).Select(Path.GetExtension).Where(e => e.Length == 8 && e.Skip(1).All(Char.IsDigit))). Also, side note, these extensions could parse as doubles for an alternative to skip/all
or maybe because you originally tried a filter of .??????? you don't even have to test them for being digits
I know the extensions are all digits and 7 in length (at least for the time being). They will always be longer than the "standard" 3 or 4. That is why I tried the obvious approach with the question marks. I am just trying to reduce the size of the array/list I am looping over.
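
One detail on the inline form suggested in the first comment: projecting with Select(Path.GetExtension) before filtering yields the extensions rather than the file paths. A small sketch that keeps the path and also tries the "parse as a number" idea follows; the class and method names are hypothetical, and int.TryParse with NumberStyles.None is one possible substitute for the Skip/All check, not something taken from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

public static class InlineFilterSketch
{
    // True when the extension is a dot plus seven characters that parse
    // as a plain integer (NumberStyles.None permits digits only).
    public static bool IsSevenDigitExtension(string path)
    {
        string ext = Path.GetExtension(path);
        return ext.Length == 8
            && int.TryParse(ext.Substring(1), NumberStyles.None,
                            CultureInfo.InvariantCulture, out _);
    }

    // Filters the paths themselves, so the caller still gets full paths back.
    public static IEnumerable<string> Filter(IEnumerable<string> paths)
    {
        return paths.Where(p => IsSevenDigitExtension(p));
    }

    public static void Main()
    {
        var sample = new[] { "GIWFJ1XA.0201000", "summary.pdf", "notes.abcdefg" };
        foreach (string path in Filter(sample))
        {
            Console.WriteLine(path); // prints GIWFJ1XA.0201000
        }
    }
}
```

In real use, Filter would be fed Directory.EnumerateFiles(someDir, "*", SearchOption.AllDirectories) instead of the sample array.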
0

I would just grab the list of files in the directory and then check whether the substring after the '.' has a length of 7 (as long as you know that no other files have an extension of that length).

EDITED to use Path instead:

Directory.GetFiles(@"C:\temp").Where( fileName => Path.GetExtension(fileName).Length == 8 ).ToList(); 

OLD:

Directory.GetFiles(someDir).Where( fileName => fileName.Substring(fileName.LastIndexOf('.') + 1).Length == 7 ).ToList(); 

4 Comments

This would return files like "ABCDEFG" (without extension) as false positives.
Yea true, I guess it really depends on their use case for the directory whether or not they need additional cases covered
Don't use string manipulations to work with paths. Use Path class instead
Good point, definitely safer
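
The difference the comments point at can be seen with a few bare file names. This is a small sketch, with hypothetical names and sample inputs, comparing the OLD substring check with the Path.GetExtension check:

```csharp
using System;
using System.IO;

public static class ExtensionEdgeCases
{
    // The OLD approach: everything after the last '.' must be 7 characters.
    // With no '.', LastIndexOf returns -1, so the whole name is measured.
    public static bool SubstringCheck(string name)
    {
        return name.Substring(name.LastIndexOf('.') + 1).Length == 7;
    }

    // The EDITED approach: Path.GetExtension returns "" when there is no
    // extension, and includes the leading '.' when there is one.
    public static bool LengthOnly(string name)
    {
        return Path.GetExtension(name).Length == 8;
    }

    public static void Main()
    {
        foreach (string name in new[] { "ABCDEFG", "GIWFJ1XA.0201000", "notes.abcdefg" })
        {
            Console.WriteLine(
                $"{name}: substring={SubstringCheck(name)}, path={LengthOnly(name)}");
        }
    }
}
```

"ABCDEFG" passes the substring check but not the Path-based one. "notes.abcdefg" passes both, because neither version verifies that the characters are digits.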
0

Treat the files list below as a stand-in for the result of Directory.GetFiles().

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.IO;
    using System.Text.RegularExpressions;

    public class Program
    {
        public static void Main()
        {
            List<string> files = new List<string>()
            {
                "abc.txt", "def.txt", "GIWFJ1XA.0201000", "GIWFJ1UC.0501000",
                "NOOBO0XA.0100100", "summary.pdf", "someinfo.zip", "T7F4JUXA.0300600",
                "vxy98796.csv", "YJHLPLBO.0302300", "YJHLPLUC.0302800"
            };

            Regex r = new Regex("^\\.\\d{7}$");

            foreach (string file in files.Where(o => r.IsMatch(Path.GetExtension(o))))
            {
                Console.WriteLine(file);
            }
        }
    }

Output:

    GIWFJ1XA.0201000
    GIWFJ1UC.0501000
    NOOBO0XA.0100100
    T7F4JUXA.0300600
    YJHLPLBO.0302300
    YJHLPLUC.0302800

Edit: I tried passing (r.IsMatch) directly instead of using the lambda with o, but the dotnetfiddle compiler gives me this error:

Compilation error (line 14, col 27): The call is ambiguous between the following methods or properties: 'System.Linq.Enumerable.Where<string>(System.Collections.Generic.IEnumerable<string>, System.Func<string,bool>)' and 'System.Linq.Enumerable.Where<string>(System.Collections.Generic.IEnumerable<string>, System.Func<string,int,bool>)' 

I can't debug it right now since I am busy; I'd be happy if anyone passing by could suggest a fix. The current code above works, though.
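
A likely cause of that error: Regex has both IsMatch(string) and IsMatch(string, int) instance overloads, so the bare method group r.IsMatch can convert to either delegate type that Where accepts. A minimal sketch of two ways around it follows. The class and method names are hypothetical, and the pattern is the "ends with a dot and 7 digits" one from the question comments, because here the regex sees the whole file name rather than just the extension.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MethodGroupSketch
{
    // Matches names that end with a dot followed by seven digits.
    private static readonly Regex SevenDigitSuffix = new Regex(@"\.\d{7}$");

    // Option 1: a lambda names the overload explicitly.
    public static List<string> WithLambda(IEnumerable<string> files)
    {
        return files.Where(f => SevenDigitSuffix.IsMatch(f)).ToList();
    }

    // Option 2: assigning to Func<string, bool> first fixes the delegate
    // type, so only IsMatch(string) is a candidate.
    public static List<string> WithExplicitDelegate(IEnumerable<string> files)
    {
        Func<string, bool> matches = SevenDigitSuffix.IsMatch;
        return files.Where(matches).ToList();
    }

    public static void Main()
    {
        var files = new[] { "abc.txt", "GIWFJ1XA.0201000", "vxy98796.csv" };
        foreach (string file in WithExplicitDelegate(files))
        {
            Console.WriteLine(file); // prints GIWFJ1XA.0201000
        }
    }
}
```

Both options return the same list; which one reads better is a matter of taste.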
