How do I check for illegal characters in a path?

Question

Is there a way to check if a String meant for a path has invalid characters, in .NET? I know I could iterate over each character in Path.InvalidPathChars to see if my String contained one, but I'd prefer a simple, perhaps more formal, solution.

Is there one?

I've found I still get an exception if I only check against Get

Update:

I've found GetInvalidPathChars does not cover every invalid path character. GetInvalidFileNameChars has 5 more, including '?', which I've come across. I'm going to switch to that, and I'll report back if it, too, proves to be inadequate.

Update 2:

GetInvalidFileNameChars is definitely not what I want. It contains ':', which any absolute path is going to contain ("C:\whatever"). I think I'm just going to have to use GetInvalidPathChars after all, and add in '?' and any other characters that cause me problems as they come up. Better solutions welcome.

Isn't this a duplicate of stackoverflow.com/questions/146134/…?
– René, Commented Nov 16, 2011 at 13:32
FYI: in .NET 4.0 on Windows, Path.GetInvalidPathChars() is a subset of Path.GetInvalidFileNameChars(). To be precise, Path.GetInvalidFileNameChars() == Path.GetInvalidPathChars().Concat(new[] { ':', '*', '?', '\\', '/' })
– Ian Kemp - SO dead by AI greed, Commented Dec 17, 2013 at 18:39

Jeremy Bell · Accepted Answer · 2010-04-02 13:35:04Z

InvalidPathChars is deprecated. Use GetInvalidPathChars() instead:

    public static bool FilePathHasInvalidChars(string path)
    {
        return (!string.IsNullOrEmpty(path) && path.IndexOfAny(System.IO.Path.GetInvalidPathChars()) >= 0);
    }

Edit: Slightly longer, but handles path vs file invalid chars in one function:

    // WARNING: Not tested
    public static bool FilePathHasInvalidChars(string path)
    {
        bool ret = false;
        if (!string.IsNullOrEmpty(path))
        {
            try
            {
                // Careful!
                // Path.GetDirectoryName("C:\Directory\SubDirectory")
                // returns "C:\Directory", which may not be what you want in
                // this case. You may need to explicitly add a trailing \
                // if path is a directory and not a file path. As written,
                // this function just assumes path is a file path.
                string fileName = System.IO.Path.GetFileName(path);
                string fileDirectory = System.IO.Path.GetDirectoryName(path);

                // we don't need to do anything else,
                // if we got here without throwing an
                // exception, then the path does not
                // contain invalid characters
            }
            catch (ArgumentException)
            {
                // Path functions will throw this
                // if path contains invalid chars
                ret = true;
            }
        }
        return ret;
    }

I'm tired now (3AM) but methinks that IndexOfAny returns -1 if no invalid char is found, thus the result is true if NO such char is found in either filename or fileDirectory, exactly the opposite of what is wanted. But, more importantly, how does this solve "c:\first\second:third\test.txt"? Would it catch the second, illegal ':'?
See edits to original post. As to your other question, "C:\first\second:third\test.txt" does not contain any invalid characters for a path, since ":" is a valid path character. True, the path is an invalid path, but the purpose of the function was not to validate proper paths. For that, the best bet would be to test the path string against a regular expression. You could also do: foreach(String s in path.Split('\\')) {// test s for invalid file characters} but that implementation is a little brittle since you have to make an exception for the "C:"
The second function does not seem to catch ? or * characters.
Might be good to cache Path.GetInvalidPathChars() since it will be cloned with every call to GetInvalidPathChars.
I've noticed Path.GetDirectoryName can be quite slow when you give it an invalid path.
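The caching suggestion from the comments above could look roughly like this. It is only a sketch: the class name `PathCharCheck` and the field name `CachedInvalidPathChars` are made up for illustration, and it assumes the set of invalid characters does not change while the process is running.

```csharp
using System;
using System.IO;

public static class PathCharCheck
{
    // Hypothetical cache: Path.GetInvalidPathChars() hands back a new copy of
    // the array on each call, so we fetch it once and reuse it.
    private static readonly char[] CachedInvalidPathChars = Path.GetInvalidPathChars();

    // Returns true if the path contains at least one character from the cached set.
    // Null and empty strings are reported as having no invalid characters.
    public static bool HasInvalidPathChars(string path)
    {
        return !string.IsNullOrEmpty(path)
            && path.IndexOfAny(CachedInvalidPathChars) >= 0;
    }
}
```

The check itself is the same as in the first snippet of the answer; only the array allocation moves out of the hot path.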

Glenn Slayden · Accepted Answer · 2020-04-27 20:28:28Z

As of .NET 4.7.2, Path.GetInvalidFileNameChars() reports the following 41 'bad' characters.

    0x0000   0  '\0'     | 0x000d  13  '\r'     | 0x001b   27  '\u001b'
    0x0001   1  '\u0001' | 0x000e  14  '\u000e' | 0x001c   28  '\u001c'
    0x0002   2  '\u0002' | 0x000f  15  '\u000f' | 0x001d   29  '\u001d'
    0x0003   3  '\u0003' | 0x0010  16  '\u0010' | 0x001e   30  '\u001e'
    0x0004   4  '\u0004' | 0x0011  17  '\u0011' | 0x001f   31  '\u001f'
    0x0005   5  '\u0005' | 0x0012  18  '\u0012' | 0x0022   34  '"'
    0x0006   6  '\u0006' | 0x0013  19  '\u0013' | 0x002a   42  '*'
    0x0007   7  '\a'     | 0x0014  20  '\u0014' | 0x002f   47  '/'
    0x0008   8  '\b'     | 0x0015  21  '\u0015' | 0x003a   58  ':'
    0x0009   9  '\t'     | 0x0016  22  '\u0016' | 0x003c   60  '<'
    0x000a  10  '\n'     | 0x0017  23  '\u0017' | 0x003e   62  '>'
    0x000b  11  '\v'     | 0x0018  24  '\u0018' | 0x003f   63  '?'
    0x000c  12  '\f'     | 0x0019  25  '\u0019' | 0x005c   92  '\\'
                         | 0x001a  26  '\u001a' | 0x007c  124  '|'

As noted by another poster, this is a proper superset of the set of characters returned by Path.GetInvalidPathChars().

The following function detects the exact set of 41 characters shown above:

    public static bool IsInvalidFileNameChar(Char c) =>
        c < 64U
            ? (1UL << c & 0xD4008404FFFFFFFFUL) != 0
            : c == '\\' || c == '|';

This is still not enough: characters that are each perfectly valid in a path name may still combine into an invalid path or file name, such as AUX, COM1, or LPT1.
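To illustrate the point about names like AUX, COM1 and LPT1, here is a rough sketch of a check for Windows reserved device names. The class and method names are hypothetical, and the list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) is the commonly documented one, so treat it as an assumption rather than a complete or version-proof set. It also assumes that a reserved name followed by an extension (for example NUL.txt) should be flagged, which is what the Windows naming guidance advises avoiding.

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNameCheck
{
    // Assumed list of reserved device names; compared case-insensitively.
    private static readonly HashSet<string> ReservedNames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        };

    // Takes a single file or directory name (not a full path) and reports
    // whether the part before the first dot is a reserved device name.
    public static bool IsReservedDeviceName(string fileName)
    {
        if (string.IsNullOrEmpty(fileName))
            return false;

        int dot = fileName.IndexOf('.');
        string stem = dot >= 0 ? fileName.Substring(0, dot) : fileName;

        // Trailing spaces are trimmed before the lookup.
        return ReservedNames.Contains(stem.TrimEnd(' '));
    }
}
```

For a full path, one would call this on each segment after splitting on the directory separators; the character-based checks in this thread are still needed alongside it.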

René · Accepted Answer · 2011-11-16 13:34:24Z

Be careful when relying on Path.GetInvalidFileNameChars, which may not be as reliable as you'd think. Notice the following remark in the MSDN documentation on Path.GetInvalidFileNameChars:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).

It's not any better with the Path.GetInvalidPathChars method. Its documentation contains the exact same remark.

The GetInvalid*NameChars methods are neither useful nor reliable. Path validity/invalidity is implicitly tied to the filesystem on which the code is executing, and since System.IO.* doesn't do filesystem sniffing - just returns a hard-coded array - what is invalid on filesystem A may be completely valid on filesystem B. tl;dr: don't rely on these methods, roll your own.

Randy Burden · Accepted Answer · 2015-12-08 05:55:40Z

I ended up borrowing and combining a few internal .NET implementations to come up with a performant method:

    /// <summary>Determines if the path contains invalid characters.</summary>
    /// <remarks>This method is intended to prevent ArgumentException's from being thrown when creating a new FileInfo on a file path with invalid characters.</remarks>
    /// <param name="filePath">File path.</param>
    /// <returns>True if file path contains invalid characters.</returns>
    private static bool ContainsInvalidPathCharacters(string filePath)
    {
        for (var i = 0; i < filePath.Length; i++)
        {
            int c = filePath[i];

            if (c == '\"' || c == '<' || c == '>' || c == '|' || c == '*' || c == '?' || c < 32)
                return true;
        }

        return false;
    }

I then used it like so but also wrapped it up in a try/catch block for safety:

    if ( !string.IsNullOrWhiteSpace(path) && !ContainsInvalidPathCharacters(path))
    {
        FileInfo fileInfo = null;

        try
        {
            fileInfo = new FileInfo(path);
        }
        catch (ArgumentException)
        {
        }

        ...
    }

Community · Accepted Answer · 2017-05-23 12:32:05Z

It's probably too late for you, but may help somebody else. I faced the same issue and needed to find a reliable way to sanitize a path.

Here is what I ended up using, in 3 steps:

Step 1: Custom cleaning.

    public static string RemoveSpecialCharactersUsingCustomMethod(this string expression, bool removeSpecialLettersHavingASign = true)
    {
        var newCharacterWithSpace = " ";
        var newCharacter = "";

        // Return carriage handling
        // ASCII LINE-FEED character (LF),
        expression = expression.Replace("\n", newCharacterWithSpace);
        // ASCII CARRIAGE-RETURN character (CR)
        expression = expression.Replace("\r", newCharacterWithSpace);

        // less than : used to redirect input, allowed in Unix filenames, see Note 1
        expression = expression.Replace(@"<", newCharacter);
        // greater than : used to redirect output, allowed in Unix filenames, see Note 1
        expression = expression.Replace(@">", newCharacter);
        // colon: used to determine the mount point / drive on Windows;
        // used to determine the virtual device or physical device such as a drive on AmigaOS, RT-11 and VMS;
        // used as a pathname separator in classic Mac OS. Doubled after a name on VMS,
        // indicates the DECnet nodename (equivalent to a NetBIOS (Windows networking) hostname preceded by "\\".).
        // Colon is also used in Windows to separate an alternative data stream from the main file.
        expression = expression.Replace(@":", newCharacter);
        // quote : used to mark beginning and end of filenames containing spaces in Windows, see Note 1
        expression = expression.Replace(@"""", newCharacter);
        // slash : used as a path name component separator in Unix-like, Windows, and Amiga systems.
        // (The MS-DOS command.com shell would consume it as a switch character, but Windows itself always accepts it as a separator.)
        expression = expression.Replace(@"/", newCharacter);
        // backslash : Also used as a path name component separator in MS-DOS, OS/2 and Windows (where there are few differences between slash and backslash); allowed in Unix filenames, see Note 1
        expression = expression.Replace(@"\", newCharacter);
        // vertical bar or pipe : designates software pipelining in Unix and Windows; allowed in Unix filenames, see Note 1
        expression = expression.Replace(@"|", newCharacter);
        // question mark : used as a wildcard in Unix, Windows and AmigaOS; marks a single character. Allowed in Unix filenames, see Note 1
        expression = expression.Replace(@"?", newCharacter);
        expression = expression.Replace(@"!", newCharacter);
        // asterisk or star : used as a wildcard in Unix, MS-DOS, RT-11, VMS and Windows. Marks any sequence of characters
        // (Unix, Windows, later versions of MS-DOS) or any sequence of characters in either the basename or extension
        // (thus "*.*" in early versions of MS-DOS means "all files". Allowed in Unix filenames, see note 1
        expression = expression.Replace(@"*", newCharacter);
        // percent : used as a wildcard in RT-11; marks a single character.
        expression = expression.Replace(@"%", newCharacter);
        // period or dot : allowed but the last occurrence will be interpreted to be the extension separator in VMS, MS-DOS and Windows.
        // In other OSes, usually considered as part of the filename, and more than one period (full stop) may be allowed.
        // In Unix, a leading period means the file or folder is normally hidden.
        expression = expression.Replace(@".", newCharacter);
        // space : allowed (apart MS-DOS) but the space is also used as a parameter separator in command line applications.
        // This can be solved by quoting, but typing quotes around the name every time is inconvenient.
        //expression = expression.Replace(@"%", " ");
        expression = expression.Replace(@" ", newCharacter);

        if (removeSpecialLettersHavingASign)
        {
            // Because then issues to zip
            // More at : http://www.thesauruslex.com/typo/eng/enghtml.htm
            expression = expression.Replace(@"ê", "e");
            expression = expression.Replace(@"ë", "e");
            expression = expression.Replace(@"ï", "i");
            expression = expression.Replace(@"œ", "oe");
        }

        return expression;
    }

Step 2: Check any invalid characters not yet removed.

As an extra verification step, I use the Path.GetInvalidPathChars() method posted above to detect any potential invalid characters not yet removed.

    public static bool ContainsAnyInvalidCharacters(this string path)
    {
        return (!string.IsNullOrEmpty(path) && path.IndexOfAny(Path.GetInvalidPathChars()) >= 0);
    }

Step 3: Clean any special characters detected in Step 2.

And finally, I use this method as final step to clean anything left. (from How to remove illegal characters from path and filenames?):

    public static string RemoveSpecialCharactersUsingFrameworkMethod(this string path)
    {
        return Path.GetInvalidFileNameChars().Aggregate(path, (current, c) => current.Replace(c.ToString(), string.Empty));
    }

I log any invalid character not cleaned in the first step. I chose to go that way so that I can improve my custom method as soon as a 'leak' is detected. I can't rely on Path.GetInvalidFileNameChars() because of the following statement, as reported above (from MSDN):

"The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. "

It may not be the ideal solution, but given the context of my application and the level of reliability required, this is the best solution I found.

In the part that replaces double spaces with a single space, shouldn't we loop, replacing every double space with a single space until no double space is left? Otherwise a run of three spaces becomes two spaces, which should ideally become one.
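The loop described in the comment above could be sketched like this. The class and method names are hypothetical and this is not part of the answer's code; it only shows why a single Replace pass is not enough for runs of three or more spaces.

```csharp
public static class SpaceCollapser
{
    // Repeatedly replaces two consecutive spaces with one until no run of
    // two or more spaces remains. Null and empty input is returned unchanged.
    public static string CollapseSpaces(string expression)
    {
        if (string.IsNullOrEmpty(expression))
            return expression;

        // Built explicitly so the two-space literal cannot be lost in copy/paste.
        string twoSpaces = new string(' ', 2);

        while (expression.Contains(twoSpaces))
        {
            expression = expression.Replace(twoSpaces, " ");
        }

        return expression;
    }
}
```

A single call to Replace scans left to right without revisiting its own output, so three spaces would come out as two; the loop keeps going until the string stops changing. A regular expression that matches runs of spaces would be another way to get the same result in one pass.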

ProgrammingLlama · Accepted Answer · 2019-11-26 04:34:58Z

I recommend using a HashSet for this to increase efficiency:

private static HashSet<char> _invalidCharacters = new HashSet<char>(Path.GetInvalidPathChars());

Then you can simply check that the string isn't null/empty and that there aren't any invalid characters:

    public static bool IsPathValid(string filePath)
    {
        return !string.IsNullOrEmpty(filePath) && !filePath.Any(pc => _invalidCharacters.Contains(pc));
    }


Since the number of invalid characters is typically very finite (~40), iterating over it will probably not significantly impact efficiency, especially compared to the I/O operations that are presumably involved when dealing with file names.

Gogu CelMare · Accepted Answer · 2020-07-08 00:35:41Z

Simple and as correct as it can be considering MS documentation:

    bool IsPathValid(String path)
    {
        for (int i = 0; i < path.Length; ++i)
            if (Path.GetInvalidFileNameChars().Contains(path[i]))
                return false;
        return true;
    }

Rick Strahl · Accepted Answer · 2020-11-04 22:23:18Z

Just for reference the framework has internal methods that do this - but unfortunately they are marked internal.

For reference here are the relevant bits, which are similar to the accepted answer here.

    internal static bool HasIllegalCharacters(string path, bool checkAdditional = false) =>
        (AppContextSwitches.UseLegacyPathHandling || !PathInternal.IsDevice(path)) &&
        PathInternal.AnyPathHasIllegalCharacters(path, checkAdditional);

    internal static bool AnyPathHasIllegalCharacters(string path, bool checkAdditional = false)
    {
        if (path.IndexOfAny(PathInternal.InvalidPathChars) >= 0)
            return true;
        return checkAdditional && PathInternal.AnyPathHasWildCardCharacters(path);
    }

    internal static bool HasWildCardCharacters(string path)
    {
        int startIndex = AppContextSwitches.UseLegacyPathHandling ? 0 : (PathInternal.IsDevice(path) ? "\\\\?\\".Length : 0);
        return PathInternal.AnyPathHasWildCardCharacters(path, startIndex);
    }

    internal static bool AnyPathHasWildCardCharacters(string path, int startIndex = 0)
    {
        for (int index = startIndex; index < path.Length; ++index)
        {
            switch (path[index])
            {
                case '*':
                case '?':
                    return true;
                default:
                    continue;
            }
        }
        return false;
    }

rattler · Accepted Answer · 2018-07-04 15:01:11Z

I'm also too late. But if the task is to validate whether the user entered something valid as a path, there is a combined solution for paths.

Path.GetInvalidFileNameChars() returns the list of characters that are illegal in a file name, and directory names follow the same rules as file names, except for the separators (which we can get from the system) and the root specifier (C:, which we can simply exclude from the search). Yes, Path.GetInvalidFileNameChars() does not return the complete set, but it is better than trying to find all of them manually.

So:

    private static bool CheckInvalidPath(string targetDir)
    {
        string root;
        try
        {
            root = Path.GetPathRoot(targetDir);
        }
        catch
        {
            // the path is definitely invalid if it has crashed
            return false;
        }

        // of course it is better to cache it as it creates
        // new array on each call
        char[] chars = Path.GetInvalidFileNameChars();

        // ignore root
        for (int i = root.Length; i < targetDir.Length; i++)
        {
            char c = targetDir[i];

            // separators are allowed
            if (c == Path.DirectorySeparatorChar || c == Path.AltDirectorySeparatorChar)
                continue;

            // check for illegal chars
            for (int j = 0; j < chars.Length; j++)
                if (c == chars[j])
                    return false;
        }

        return true;
    }

I've found that methods like Path.GetFileName will not throw for paths like C:\* (which is completely invalid), so even an exception-based check is not enough. The only thing that makes Path.GetPathRoot throw is an invalid root (like CC:\someDir). So everything else has to be checked manually.
