13

According to the RFC, the filename parameter of the Content-Disposition header in multipart/form-data takes an HTTP quoted-string: a string between double quotes in which the character '\' can escape any other ASCII character.

The problem is that web browsers don't escape the value this way.

IE6 sends:

Content-Disposition: form-data; name="file"; filename="z:\tmp\test.txt" 

instead of the expected

Content-Disposition: form-data; name="file"; filename="z:\\tmp\\test.txt" 

Parsed strictly by the rules, what IE6 actually sends decodes to z:tmptest.txt instead of z:\tmp\test.txt.
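
For illustration, here is a minimal Java sketch of the strict rule, assuming the surrounding double quotes have already been removed. The names QuotedString and unquoteStrict are made up for this example and do not come from any library.

```java
// Hypothetical helper (not a library API): strict quoted-string unescaping.
final class QuotedString {

    /**
     * Unescapes the text found between the quotes.
     * Every backslash is dropped and the character after it is kept as-is.
     */
    static String unquoteStrict(String inner) {
        StringBuilder out = new StringBuilder(inner.length());
        for (int i = 0; i < inner.length(); i++) {
            char c = inner.charAt(i);
            if (c == '\\' && i + 1 < inner.length()) {
                // quoted-pair: skip the backslash, keep the escaped character
                out.append(inner.charAt(++i));
            } else {
                // ordinary character (or a lone trailing backslash)
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Given the value IE6 sends, QuotedString.unquoteStrict("z:\\tmp\\test.txt") (the Java literal for z:\tmp\test.txt) returns z:tmptest.txt, which is the mangled name described above.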

Firefox, Konqueror and Chrome don't escape " characters. For example, they send:

Content-Disposition: form-data; name="file"; filename=""test".txt" 

instead of the expected

Content-Disposition: form-data; name="file"; filename="\"test\".txt" 

So... how would you suggest dealing with this issue?

Does anybody have an idea?

2 Answers

5

Though this is an old thread, I'm adding the Java solution below for whoever might be interested.

// import com.sun.xml.internal.messaging.saaj.packaging.mime.internet.*;
try {
    ContentDisposition contentDisposition = new ContentDisposition(
            "attachment; filename=\"myfile.log\"; filename*=UTF-8''myfile.log");
    System.out.println(contentDisposition.getParameter("filename"));
} catch (ParseException e) {
    e.printStackTrace();
}

3 Comments

Since the question is not particular to Java, an explanation of how this solves the problem would be useful.
Agreed. While looking into the same problem, I even found a thread discussing the regex pattern (stackoverflow.com/a/27226712/3940047). I added this solution as it might help someone in the same context. People just google with appropriate keywords and can land here, and if they happen to be Java guys, they might find it useful.
@PavanKumar totally agree, this should be a language-agnostic solution considering the question didn't mention Java. But as I always say, if you have the option, always use a well-defined library for parsing.
2

Is there a reason that you need to parse this filename at all?

At least the one thing that's consistent is that the filename portion of the header ends with a double quote, so you just need to read in everything between filename=" and the final ".

Then you can probably treat any backslash other than \\ or \" as a literal backslash, unless you think it's particularly likely that users will be uploading filenames with tabs in them. :)
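
A rough Java sketch of that idea, under the assumptions made in this answer: the filename parameter is the last one in the header, its name is written in lowercase, and only \\ and \" count as escapes. LenientFilename and its methods are hypothetical names, not an existing API.

```java
// Hypothetical helper (not a library API): best-effort filename extraction.
final class LenientFilename {

    /** Returns the text between filename=" and the final ", or null if absent. */
    static String extract(String headerValue) {
        String marker = "filename=\"";
        int start = headerValue.indexOf(marker);
        if (start < 0) {
            return null; // no quoted filename parameter
        }
        start += marker.length();
        int end = headerValue.lastIndexOf('"');
        if (end < start) {
            return null; // opening quote without a closing one
        }
        return unescapeLenient(headerValue.substring(start, end));
    }

    /** Collapses \\ and \" only; every other backslash is kept literally. */
    static String unescapeLenient(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(i + 1);
                if (next == '\\' || next == '"') {
                    out.append(next); // recognised escape: keep the escaped char
                    i++;
                    continue;
                }
            }
            out.append(c); // literal character, including stray backslashes
        }
        return out.toString();
    }
}
```

With this rule both the IE6 form and the properly escaped form of z:\tmp\test.txt decode to the same path. One concession to be aware of: an unescaped UNC path such as \\server\share would lose one of its leading backslashes.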

2 Comments

"Is there a reason that you need to parse this filename at all?" -- yes, I want to know the file name ;). "At least the one thing that's consistent is that the filename portion of the header ends with a double quote," The filename and name fields are not guaranteed to come in this specific order, so it is a bad idea to assume that the file name ends with the last quotation mark.
Want != need. ;) Ok, so you're at least guaranteed that it'll end with " or with "; -- with this lack of consistency you have to make some concessions, like relying on the fact that users won't put "; in the middle of their file names :) Alternatively, are you using a web framework that supports a best-effort parsing of this attribute for you?
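
The "ends with " or with ";" concession could look roughly like this in Java. It is only a sketch: FilenameSpan and rawFilename are hypothetical names, the parameter name is assumed to be lowercase, and a file name that itself contains "; will be cut short, as noted above. The result is still raw, so any backslash handling has to be applied afterwards.

```java
// Hypothetical helper (not a library API): locate the raw filename value
// without assuming that filename is the last parameter in the header.
final class FilenameSpan {

    /** Returns the raw text of the filename parameter, or null if not found. */
    static String rawFilename(String headerValue) {
        String marker = "filename=\"";
        int start = headerValue.indexOf(marker);
        if (start < 0) {
            return null;
        }
        start += marker.length();

        // Ignore trailing whitespace when looking for the end of the header.
        int limit = headerValue.length();
        while (limit > start && Character.isWhitespace(headerValue.charAt(limit - 1))) {
            limit--;
        }

        for (int i = start; i < limit; i++) {
            if (headerValue.charAt(i) != '"') {
                continue;
            }
            boolean atEnd = (i == limit - 1);
            boolean beforeSemicolon = (i + 1 < limit && headerValue.charAt(i + 1) == ';');
            if (atEnd || beforeSemicolon) {
                return headerValue.substring(start, i); // first plausible closing quote
            }
        }
        return null; // no closing quote found
    }
}
```

Unescaped quotes inside the name, as sent by Firefox, Konqueror and Chrome, survive this scan because they are followed by neither ; nor the end of the header.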
