How to escape the whole string in Python?

Question

    file = r'D:\tdx\vipdoc\szf10\300383.Txt'
    text = open(file, "r").read()

The file can be read, but at first I wrote file as:

file='D:\tdx\vipdoc\szf10\300383.Txt'

With that definition, text=open(file,"r").read() fails:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 22] Invalid argument: 'D:\tdx\x0bipdoc\\szf10\xc0383.Txt'

What can I do in the case of not using file=r'D:\tdx\vipdoc\szf10\300383.Txt'?

Maybe I have to escape the whole string of file with some method?

The problem is that file was defined earlier and is now just a variable holding a string. I can only use it from within the program, so how can I repair it at runtime?

method 1: file=r'D:\tdx\vipdoc\szf10\300383.Txt' can not be used.
method 2: file='D:\\tdx\\vipdoc\\szf10\\300383.Txt' can not be used either.

When the program is already running, given file is a string variable, how can I repair it now?

Say file was not a string literal but was passed to my code from another part of the code, which I can not fix to use the correct format but still want to be able to use the filename.

Why can't I simply replace 'D:\tdx\vipdoc\szf10\300383.Txt' with 'D:\\tdx\\vipdoc\\szf10\\300383.Txt' using file.replace("\\", "\\\\")?

    >>> file="D:\tdx\vipdoc\shf10\300383.Txt"
    >>> file.replace("\x5c","\x5c\x5c") #can't work
    'D:\tdx\x0bipdoc\\\\shf10\xc0383.Txt'

I also tried to split it into two parts, but that failed too.

    >>> filename = 'D:\tdx\vipdoc\szf10\300383.Txt'
    >>> re.search('(.*?)(\d+\.Txt)',filename).group(1)
    'D:\tdx\x0bipdoc\\szf10\xc0'
    >>> re.search('(.*?)(\d+\.Txt)',filename).group(2)
    '383.Txt'

With the help of Martijn Pieters, I solved it by adding '\300': r'\300' to the mapping.

    mapping = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
               '\r': r'\r', '\t': r'\t', '\v': r'\v', '\300': r'\300'}
    filename = 'D:\tdx\vipdoc\szf10\300383.Txt'
    for char, escaped in mapping.items():
        filename = filename.replace(char, escaped)

For sanity, just use forward slashes in every path name and let Python figure out what your platform uses for the path separator.
– roippi, Commented May 23, 2014 at 2:52

Martijn Pieters · Accepted Answer · 2014-05-25 16:08:43Z

Because your 'broken' filename doesn't actually contain \ characters, you cannot replace those characters either. You have an ASCII 9 TAB character, not the two separate characters \ and t:

    >>> len('\t')
    1
    >>> '\\' in '\t'
    False

You'd have to try and 'repair' the broken string; this is not going to be foolproof, but you can create a replacement table to handle the common escape sequences. For filenames, which generally don't handle carriage returns, tabs or formfeed characters anyway, that's perfectly feasible.

Python string literals only support a limited number of one-letter \ escape sequences; see the Python string literal documentation:

    \a  ASCII Bell (BEL)
    \b  ASCII Backspace (BS)
    \f  ASCII Formfeed (FF)
    \n  ASCII Linefeed (LF)
    \r  ASCII Carriage Return (CR)
    \t  ASCII Horizontal Tab (TAB)
    \v  ASCII Vertical Tab (VT)

I've omitted the multi-character sequences as these tend to error out when defining the literal. Simply replace these characters with the escaped sequence:

    mapping = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
               '\r': r'\r', '\t': r'\t', '\v': r'\v'}
    for char, escaped in mapping.items():
        filename = filename.replace(char, escaped)
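As a self-contained illustration of the replacement table above, here is a minimal sketch. The names `ESCAPES` and `unescape_named` are hypothetical, chosen only for this example; the table itself is the one from the answer.

```python
# Hypothetical names; the mapping is the seven one-letter escapes listed above.
ESCAPES = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
           '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def unescape_named(path):
    """Turn interpreted one-letter escapes back into backslash + letter."""
    for char, escaped in ESCAPES.items():
        path = path.replace(char, escaped)
    return path

broken = 'D:\tdx\vipdoc'        # \t and \v were interpreted by the literal
print(unescape_named(broken))   # D:\tdx\vipdoc
```

A path that was written correctly in the first place contains no control characters, so it passes through unchanged.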

Alternatively, we could map these characters with the 'string_escape' codec:

    >>> '\t'.encode('string_escape')
    '\\t'

You cannot apply this to the whole string as that would double up any properly escaped backslash however. Moreover, for many of the escape codes above, it'll use the \xhh escape sequence instead:

    >>> '\a'.encode('string_escape')
    '\\x07'

so this method isn't that suitable for your needs.
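Note that the 'string_escape' codec exists only in Python 2. On Python 3 the closest counterpart appears to be 'unicode_escape', which returns bytes and so needs a decode step. The sketch below (with a hypothetical helper name, `escape_char`) shows the same two behaviours described above: the \xhh form for most control characters, and doubling of backslashes that are already correct.

```python
def escape_char(text):
    # 'unicode_escape' yields bytes on Python 3; decode back to str.
    return text.encode('unicode_escape').decode('ascii')

print(escape_char('\t'))   # \t
print(escape_char('\a'))   # \x07
print(escape_char('\\'))   # \\  (an already-correct backslash gets doubled)
```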

Characters that were written as \xhh are much harder to repair. Windows filesystems support Unicode codepoints just fine, for example. If you make the assumption that only ASCII codepoints are used, then it becomes easier. You could use a regular expression to replace these with their 'escaped' version:

    import re
    filename = re.sub(r'[\x80-\xff]', lambda m: m.group().encode('string_escape'), filename)

This changes any byte outside of the ASCII range into an escape sequence:

    >>> import re
    >>> re.sub(r'[\x80-\xff]', lambda m: m.group().encode('string_escape'), '\xc0')
    '\\xc0'

With a carefully chosen range of characters, the above could also be applied to all non-printable ASCII characters, and repair most such broken filenames with one expression, provided we first apply the above mapping to replace codes that are not handled correctly by 'string_escape':

    def repair_filename(filename):
        mapping = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\v': r'\v'}
        for char, escaped in mapping.items():
            filename = filename.replace(char, escaped)
        filename = re.sub(r'[\x00-\x1f\x7f-\xff]',
                          lambda m: m.group().encode('string_escape'),
                          filename)
        return filename

Demo on your sample input:

    >>> def repair_filename(filename):
    ...     mapping = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\v': r'\v'}
    ...     for char, escaped in mapping.items():
    ...         filename = filename.replace(char, escaped)
    ...     filename = re.sub(r'[\x00-\x1f\x7f-\xff]',
    ...                       lambda m: m.group().encode('string_escape'),
    ...                       filename)
    ...     return filename
    ...
    >>> filename = 'D:\tdx\vipdoc\szf10\300383.Txt'
    >>> repair_filename(filename)
    'D:\\tdx\\vipdoc\\szf10\\xc0383.Txt'

This should fix most such broken filenames for you. It won't repair \x09, for example, because that's replaced by \\t as well.
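The function above relies on 'string_escape', which is Python 2 only. A possible Python 3 port is sketched below, assuming 'unicode_escape' is an acceptable stand-in for the characters in this range; that substitution is mine, not part of the original answer.

```python
import re

def repair_filename(filename):
    # One-letter escapes that the codec would otherwise render as \xhh.
    mapping = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\v': r'\v'}
    for char, escaped in mapping.items():
        filename = filename.replace(char, escaped)
    # Remaining control characters plus the \x7f-\xff range.
    return re.sub(
        r'[\x00-\x1f\x7f-\xff]',
        lambda m: m.group().encode('unicode_escape').decode('ascii'),
        filename,
    )

# Same value as the question's literal. '\\s' is spelled out because '\s'
# is not a recognised escape and newer Python versions warn about it.
broken = 'D:\tdx\vipdoc\\szf10\300383.Txt'
print(repair_filename(broken))   # D:\tdx\vipdoc\szf10\xc0383.Txt
```

As in the Python 2 demo, the octal escape \300 comes back in its \xc0 spelling.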

It also cannot detect octal escape codes, nor repair them. Note that \300 was repaired as \xc0. This would require trial and error runs, trying out all possible combinations, or making assumptions about the input. You could assume \xhh never occurs but \ooo does, for example.

In that case the expression becomes:

filename = re.sub(r'[\x00-\x1f\x7f-\xff]', lambda m: '\\{:o}'.format(ord(m.group())), filename)

Demo:

    >>> def repair_filename(filename):
    ...     mapping = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\v': r'\v'}
    ...     for char, escaped in mapping.items():
    ...         filename = filename.replace(char, escaped)
    ...     filename = re.sub(r'[\x00-\x1f\x7f-\xff]',
    ...                       lambda m: '\\{:o}'.format(ord(m.group())),
    ...                       filename)
    ...     return filename
    ...
    >>> repair_filename(filename)
    'D:\\11dx\\vipdoc\\szf10\\300383.Txt'
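In this demo the tab comes back as \11, because \t is not in the reduced mapping and the octal formatter handles it instead. One possible variant, sketched here under the same assumption (octal escapes occur, \xhh never does), keeps all seven named escapes in the table so that only the leftover characters are formatted as octal. The name `repair_filename_octal` is hypothetical.

```python
import re

NAMED = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
         '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def repair_filename_octal(filename):
    # First restore the one-letter escapes by name.
    for char, escaped in NAMED.items():
        filename = filename.replace(char, escaped)
    # Then assume anything else non-printable came from an octal escape.
    return re.sub(
        r'[\x00-\x1f\x7f-\xff]',
        lambda m: '\\{:o}'.format(ord(m.group())),
        filename,
    )

# '\\s' is written out explicitly; it has the same value as '\s'.
broken = 'D:\tdx\vipdoc\\szf10\300383.Txt'
print(repair_filename_octal(broken))   # D:\tdx\vipdoc\szf10\300383.Txt
```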

What works and doesn't depends a great deal on what kind of filenames you were expecting. More can be done if you know that the final part of the filename always ends with a 6 digit number, for example.
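The trial-and-error idea mentioned earlier (try every plausible spelling and see which file exists) could look roughly like the sketch below. All names here (`spellings`, `candidates`, `find_existing`) are hypothetical, and the sketch assumes only characters up to \xff need repair. The number of candidates grows multiplicatively with each damaged character, so this is only practical for short paths.

```python
import itertools
import os

NAMED = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
         '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def spellings(ch):
    """Return the escape spellings that could have produced ch."""
    code = ord(ch)
    if 0x20 <= code < 0x7f or code > 0xff:
        return [ch]                      # printable or out of scope: keep
    options = []
    if ch in NAMED:
        options.append(NAMED[ch])        # e.g. \t
    options.append('\\{:o}'.format(code))      # octal, e.g. \300
    options.append('\\x{:02x}'.format(code))   # hex, e.g. \xc0
    return options

def candidates(broken):
    """Yield every combination of spellings for the damaged characters."""
    per_char = [spellings(ch) for ch in broken]
    for combo in itertools.product(*per_char):
        yield ''.join(combo)

def find_existing(broken, exists=os.path.exists):
    """Return the first candidate for which exists() is true, else None."""
    for path in candidates(broken):
        if exists(path):
            return path
    return None

# Usage with a stand-in for the filesystem check:
broken = 'D:\tdx\vipdoc\\szf10\300383.Txt'
target = r'D:\tdx\vipdoc\szf10\300383.Txt'
print(find_existing(broken, exists=lambda p: p == target) == target)   # True
```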

Best, however, would be to avoid corrupting the filenames altogether, of course.

D:\tdx\vipdoc\szf10\300383.Txt was changed into D:\\tdx\\vipdoc\\szf10\xc0383.Txt. How can I make it D:\\tdx\\vipdoc\\szf10\\300383.Txt?
@it_is_a_literature: the octal escape code is not easy to recover; that would require trying out all possible combinations and seeing if the file exists.

    >>> filename = 'D:\tdx\vipdoc\szf10\300383.Txt'
    >>> re.search('(.*?)(\d+\.Txt)',filename).group(1)
    'D:\tdx\x0bipdoc\\szf10\xc0'
    >>> re.search('(.*?)(\d+\.Txt)',filename).group(2)
    '383.Txt'

Amber · Accepted Answer · 2014-05-23 02:43:15Z

If you use '' instead of r'' you need to manually escape each backslash in your string literal:

filename = 'D:\\tdx\\vipdoc\\szf10\\300383.Txt'

Using r'' is simpler because it disables interpreting \ as an escape character, and thus \ itself doesn't have to be escaped when you just want it there as a literal backslash.
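A quick check, as a minimal sketch, that the two spellings produce the same string, and that an unescaped \t does not:

```python
escaped = 'D:\\tdx\\vipdoc\\szf10\\300383.Txt'
raw = r'D:\tdx\vipdoc\szf10\300383.Txt'

print(escaped == raw)            # True
print(raw.count('\\'))           # 4
print(len('\\t'), len('\t'))     # 2 1
```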

Please also add that on Windows since forever you may also use forward slashes.

metatoaster · Accepted Answer · 2014-05-23 02:46:25Z

You generally can't: in 'D:\tdx', for instance, the \t has already been interpreted as a tab character. You can, however, try to convert the interpreted characters back into something that resembles the original string, but that's far more work than writing the file name properly in the first place.

AMADANON Inc. · Accepted Answer · 2014-05-23 02:51:02Z

I think, if you originally write it using the unescaped version, you will have some special characters in the filename. It will also be in the directory that the script originally ran in.

\t will come out as a tab character, \v as a vertical tab, \s is left as it is (it is not a recognised escape), and \300 as a high ASCII character.

I suggest you run the following command in python:

    import shutil
    shutil.move('D:\tdx\vipdoc\szf10\300383.Txt', r'D:\tdx\vipdoc\szf10\300383.Txt')

Make sure you run it in the same directory that the script originally ran in. This should place the file where you expect, with the file name you expect.

From then on, you can use the correct version.

No, you can't do it:

    filename='D:\tdx\vipdoc\szf10\300383.Txt'
    import shutil
    shutil.move(filename,r'filename')

If you run the above commands exactly, that will rename the file to a valid filename; after that you can use filename=r'D:\tdx\vipdoc\szf10\300383.Txt'

whereswalden · Accepted Answer · 2014-05-29 21:08:10Z

You don't need to do anything complicated; Python has built-in tools for dealing with this sort of problem, in particular os.path.normpath. See this article for the implementation details.

showkey Over a year ago

    >>> import os
    >>> file="D:\tdx\vipdoc\shf10\300383.Txt"
    >>> os.path.normpath(file)
    'D:\tdx\x0bipdoc\\shf10\xc0383.Txt'

You can't get the original path back this way.

whereswalden Over a year ago

@it_is_a_literature: Did you read the article? You have to specify the filename with forward slashes: e.g. file = os.path.normpath("D:/tdx/vipdoc/shf10/300383.Txt")
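To illustrate the forward-slash suggestion on any platform, the sketch below uses `ntpath` (the module that `os.path` resolves to on Windows) and `pathlib.PureWindowsPath` instead of `os.path` directly; that substitution is an assumption made here so the example behaves the same on non-Windows machines.

```python
import ntpath
from pathlib import PureWindowsPath

# Forward slashes contain no escape sequences, so nothing gets mangled.
forward = 'D:/tdx/vipdoc/shf10/300383.Txt'

print(ntpath.normpath(forward))        # D:\tdx\vipdoc\shf10\300383.Txt
print(str(PureWindowsPath(forward)))   # D:\tdx\vipdoc\shf10\300383.Txt
```

This only helps when the path is written with forward slashes from the start; it does not recover a string whose escapes have already been interpreted.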
