Because your 'broken' filename doesn't actually contain \ characters, you cannot replace those characters either. You have an ASCII 9 TAB character, not the two separate characters \ and t:
    >>> len('\t')
    1
    >>> '\\' in '\t'
    False
You'd have to try and 'repair' the broken string; this is not going to be foolproof, but you can create a replacement table to handle the common escape sequences. For filenames, which generally don't handle carriage returns, tabs or formfeed characters anyway, that's perfectly feasible.
Python string literals only support a limited number of one-letter \ escape sequences; see the Python string literal documentation:
    \a    ASCII Bell (BEL)
    \b    ASCII Backspace (BS)
    \f    ASCII Formfeed (FF)
    \n    ASCII Linefeed (LF)
    \r    ASCII Carriage Return (CR)
    \t    ASCII Horizontal Tab (TAB)
    \v    ASCII Vertical Tab (VT)
I've omitted the multi-character sequences as these tend to error out when defining the literal. Simply replace these characters with the escaped sequence:
    mapping = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
               '\r': r'\r', '\t': r'\t', '\v': r'\v'}
    for char, escaped in mapping.items():
        filename = filename.replace(char, escaped)
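The same replacement table can also be applied in a single pass with `str.translate`. This is only a sketch, assuming Python 3 (where `str.maketrans` accepts a dict with string values); the names `ESCAPES` and `escape_controls` are made up for illustration:

```python
# Hypothetical names; assumes Python 3, where str.maketrans accepts a dict.
ESCAPES = str.maketrans({
    '\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
    '\r': r'\r', '\t': r'\t', '\v': r'\v',
})


def escape_controls(filename):
    """Replace each control character with its two-character escape."""
    return filename.translate(ESCAPES)


# The literal below contains a real TAB and a real vertical tab.
print(escape_controls('D:\tdx\vipdoc'))  # D:\tdx\vipdoc
```

A translation table visits each character once, so an inserted backslash is never re-examined by a later replacement.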
Alternatively, we could map these characters with the 'string_escape' codec (available in Python 2 only):
    >>> '\t'.encode('string_escape')
    '\\t'
You cannot apply this to the whole string, however, as that would double up any properly escaped backslashes. Moreover, for many of the escape codes above, it'll use the \xhh escape sequence instead:
    >>> '\a'.encode('string_escape')
    '\\x07'
so this method isn't that suitable for your needs.
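If you happen to be on Python 3, the 'string_escape' codec no longer exists. A rough equivalent for single characters, offered as a sketch and not a drop-in replacement, is the 'unicode_escape' codec; it returns bytes, so the result has to be decoded again. The helper name `escape_char` is hypothetical:

```python
def escape_char(char):
    """Return the backslash-escaped text for a single character."""
    # 'unicode_escape' yields bytes in Python 3, so decode back to str.
    return char.encode('unicode_escape').decode('ascii')


print(escape_char('\t'))  # \t
print(escape_char('\a'))  # \x07
```

It shows the same limitation: only a handful of characters get a one-letter escape, and the rest come back as \xhh.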
Characters that were written as \xhh are much harder to repair. Windows filesystems support Unicode codepoints just fine, for example, so such a character may be legitimate. If you assume that only ASCII codepoints are used, it becomes easier. You could use a regular expression to replace these with their 'escaped' version:
    import re

    filename = re.sub(r'[\x80-\xff]',
                      lambda m: m.group().encode('string_escape'),
                      filename)
This changes any byte outside of the ASCII range into an escape sequence:
    >>> import re
    >>> re.sub(r'[\x80-\xff]', lambda m: m.group().encode('string_escape'), '\xc0')
    '\\xc0'
With a carefully chosen range of characters, the above could also be applied to all non-printable ASCII characters, and repair most such broken filenames with one expression, provided we first apply the above mapping to replace codes that are not handled correctly by 'string_escape':
    import re

    def repair_filename(filename):
        # Escapes that 'string_escape' would otherwise render as \xhh.
        mapping = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\v': r'\v'}
        for char, escaped in mapping.items():
            filename = filename.replace(char, escaped)
        # Everything else that is non-printable or non-ASCII.
        filename = re.sub(r'[\x00-\x1f\x7f-\xff]',
                          lambda m: m.group().encode('string_escape'),
                          filename)
        return filename
Demo on your sample input:
    >>> def repair_filename(filename):
    ...     mapping = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\v': r'\v'}
    ...     for char, escaped in mapping.items():
    ...         filename = filename.replace(char, escaped)
    ...     filename = re.sub(r'[\x00-\x1f\x7f-\xff]',
    ...                       lambda m: m.group().encode('string_escape'),
    ...                       filename)
    ...     return filename
    ...
    >>> filename = 'D:\tdx\vipdoc\szf10\300383.Txt'
    >>> repair_filename(filename)
    'D:\\tdx\\vipdoc\\szf10\\xc0383.Txt'
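For readers on Python 3, here is a minimal sketch of the same repair, assuming 'unicode_escape' as a stand-in for the removed 'string_escape' codec. The name `repair_filename_py3` is hypothetical, and the sample literal doubles the backslash before `szf10` because Python 3 warns about the invalid `\s` escape:

```python
import re

# Escapes that the codec would otherwise render as \xhh.
MAPPING = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\v': r'\v'}


def repair_filename_py3(filename):
    """Python 3 sketch of the repair, using 'unicode_escape'."""
    for char, escaped in MAPPING.items():
        filename = filename.replace(char, escaped)
    # Escape the remaining control and non-ASCII (Latin-1 range) characters.
    return re.sub(
        r'[\x00-\x1f\x7f-\xff]',
        lambda m: m.group().encode('unicode_escape').decode('ascii'),
        filename,
    )


# Same broken characters as the sample input: TAB, VT and \300 (0xC0).
broken = 'D:\tdx\vipdoc\\szf10\300383.Txt'
print(repair_filename_py3(broken))  # D:\tdx\vipdoc\szf10\xc0383.Txt
```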
This should fix most such broken filenames for you. It won't repair \x09, for example, because that's replaced by \\t as well.
It also cannot detect octal escape codes, and so cannot repair them as such: note that \300 was repaired as \xc0. Telling the two apart would require trial-and-error runs over all possible combinations, or assumptions about the input. You could assume that \xhh never occurs but \ooo does, for example.
In that case the expression becomes:
    filename = re.sub(r'[\x00-\x1f\x7f-\xff]',
                      lambda m: '\\{:o}'.format(ord(m.group())),
                      filename)
Demo:
    >>> def repair_filename(filename):
    ...     # \t, \n and \r are listed here too: without 'string_escape'
    ...     # they would otherwise come out as octal codes such as \11.
    ...     mapping = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\v': r'\v',
    ...                '\t': r'\t', '\n': r'\n', '\r': r'\r'}
    ...     for char, escaped in mapping.items():
    ...         filename = filename.replace(char, escaped)
    ...     filename = re.sub(r'[\x00-\x1f\x7f-\xff]',
    ...                       lambda m: '\\{:o}'.format(ord(m.group())),
    ...                       filename)
    ...     return filename
    ...
    >>> repair_filename(filename)
    'D:\\tdx\\vipdoc\\szf10\\300383.Txt'
What works and doesn't depends a great deal on what kind of filenames you were expecting. More can be done if you know that the final part of the filename always ends with a 6 digit number, for example.
Best, however, would be to avoid corrupting the filenames altogether, of course.
Write backslashes as \\ in string literals (or use raw string literals), and use os.path.join wherever necessary.
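To make the prevention side concrete, here is a small sketch of three ways to spell the sample path so that no escape sequence is interpreted. It uses `ntpath`, which is what `os.path` resolves to on Windows, so that the example behaves the same on any platform; the variable names are illustrative only:

```python
import ntpath  # os.path is ntpath on Windows

# Raw string literal: backslashes are kept as-is.
raw = r'D:\tdx\vipdoc\szf10\300383.Txt'

# Doubled backslashes in a regular string literal.
doubled = 'D:\\tdx\\vipdoc\\szf10\\300383.Txt'

# Building the path from components avoids most separators in literals.
joined = ntpath.join('D:\\', 'tdx', 'vipdoc', 'szf10', '300383.Txt')

print(raw == doubled == joined)  # True
```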