python httplib/urllib get filename

Question

Is there a way to get the filename,

e.g. xyz.com/blafoo/showall.html

when working with urllib or httplib,

so that I can save the file under the same filename it has on the server?

If you go to sites like

xyz.com/blafoo/

you can't see the filename.

Thank you

possible duplicate of urllib2 file name

– KevinDTimm, Commented Aug 2, 2012 at 18:11

jfs · Accepted Answer · 2012-08-02 18:29:05Z

31

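To get the filename from the HTTP response headers:

    import cgi
    import urllib2

    response = urllib2.urlopen(URL)
    # parse_header returns the main value and a dict of its parameters
    _, params = cgi.parse_header(response.headers.get('Content-Disposition', ''))
    # raises KeyError if the server sent no filename parameter
    filename = params['filename']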
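On Python 3, urllib2 became urllib.request, and the cgi module was deprecated in 3.11 and removed in 3.13, so the snippet above needs a different parser there. A minimal sketch, assuming the header value is already available as a string: the standard email.message.Message class can parse it. The helper name filename_from_header is made up for this illustration.

```python
from email.message import Message


def filename_from_header(content_disposition):
    """Return the filename parameter of a Content-Disposition value, or None.

    Hypothetical helper for illustration; not part of the original answer.
    """
    if not content_disposition:
        return None
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    # get_filename() reads the header's "filename" parameter
    return msg.get_filename()


print(filename_from_header('attachment; filename="foo.bar"'))
print(filename_from_header(""))
```

With a live response, the argument would typically be response.headers.get('Content-Disposition').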

To get the filename from the URL:

    import posixpath
    import urlparse

    path = urlparse.urlsplit(URL).path
    filename = posixpath.basename(path)
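The same idea on Python 3, where urlparse lives in urllib.parse. This sketch also covers the directory-style URL from the question (xyz.com/blafoo/), whose path has no last segment, so basename returns an empty string. The helper name filename_from_url and the fallback "index.html" are assumptions for the example, not something the server reports.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url, default="index.html"):
    """Return the last path segment of a URL, or `default` if it is empty.

    Hypothetical helper; `default` is an arbitrary placeholder name.
    """
    # urlsplit separates the path from the query string and fragment
    path = urlsplit(url).path
    name = posixpath.basename(path)
    return name or default


print(filename_from_url("http://xyz.com/blafoo/showall.html"))
print(filename_from_url("http://xyz.com/blafoo/"))
```

The last segment may still be percent-encoded; urllib.parse.unquote can decode it if needed.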

edited Aug 2, 2012 at 18:29

answered Aug 2, 2012 at 18:09


4 Comments

Jorge Vargas Over a year ago

Great answer, one tiny fix. Using os.path.basename(path) is a cross platform way of doing this.

jfs Over a year ago

@JorgeVargas: no. posixpath is the correct module here. Moreover it would be a mistake to use os.path here. If you can't figure out "why", ask, I'll elaborate.

Karl M. Davis Over a year ago

I'll ask: why should one use posixpath?

jfs Over a year ago

@KarlM.Davis: URLs use '/' as the separator in their path component. os.path on Windows also treats '\\' as a pathname separator, which is not appropriate for URLs. posixpath uses only '/'.
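A small illustration of the difference, using ntpath (the module behind os.path on Windows) so that it behaves the same on any platform. The sample path is invented; a literal backslash is unusual in a URL path, but it shows where the two modules disagree.

```python
import ntpath
import posixpath

# Invented URL path whose last segment contains a backslash
url_path = "/blafoo/show\\all.html"

# posixpath splits on '/' only, so the whole last segment survives
posix_name = posixpath.basename(url_path)

# ntpath also splits on '\', so the segment is cut in two
windows_name = ntpath.basename(url_path)

print(posix_name)    # show\all.html
print(windows_name)  # all.html
```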

the21st · Accepted Answer · 2019-04-18 11:27:55Z

Use urllib.request.Request:

    import urllib.request

    # a HEAD request fetches only the headers, not the body
    req = urllib.request.Request(url, method='HEAD')
    r = urllib.request.urlopen(req)
    print(r.info().get_filename())

Example:

    In [1]: urllib.request.urlopen(urllib.request.Request('https://httpbin.org/response-headers?content-disposition=%20attachment%3Bfilename%3D%22example.csv%22', method='HEAD')).info().get_filename()
    Out[1]: 'example.csv'

user2665694 · Accepted Answer · 2012-08-02 18:09:25Z

What you are asking does not make much sense. The only thing you have is the URL. Either extract the last part of the URL, or check the HTTP response for a header like

content-disposition: attachment;filename="foo.bar"

This header can be set by the server to indicate that the filename is foo.bar. This is usually used for file downloads or something similar.
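The two options can be combined: prefer the header when the server sends one, otherwise fall back to the URL. A rough sketch for Python 3, assuming the header value has already been read from the response; the function name choose_filename and the fallback name "download" are hypothetical. Because the name is meant for saving a file locally, the sketch keeps only the last path segment of whatever the server supplied. It is not a complete sanitizer.

```python
import posixpath
from email.message import Message
from urllib.parse import urlsplit


def choose_filename(url, content_disposition=None, default="download"):
    """Pick a local filename from the header if present, else from the URL.

    Hypothetical helper; `default` is an arbitrary placeholder name.
    """
    name = None
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
    if not name:
        # No usable header: fall back to the URL path
        name = urlsplit(url).path
    # Drop any directory part, whichever separator was used
    name = posixpath.basename(name.replace("\\", "/"))
    return name or default


print(choose_filename("http://xyz.com/blafoo/", 'attachment;filename="foo.bar"'))
print(choose_filename("http://xyz.com/blafoo/showall.html"))
print(choose_filename("http://xyz.com/blafoo/"))
```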

Community · Accepted Answer · 2017-05-23 12:17:02Z

I searched for your question on Google and saw that it has, I believe, been answered on Stack Overflow before.

Try looking at this post:

Using urllib2 in Python. How do I get the name of the file I am downloading?

The filename is usually included by the server through the content-disposition header:

content-disposition: attachment; filename=foo.pdf

You have access to the headers through

    result = urllib2.urlopen(...)
    result.info()  <- contains the headers

    >>> import urllib2
    >>> result = urllib2.urlopen('http://zopyx.com')
    >>> print result
    <addinfourl at 4302289808 whose fp = <socket._fileobject object at 0x1006dd5d0>>
    >>> result.info()
    <httplib.HTTPMessage instance at 0x1006fbab8>
    >>> result.info().headers
    ['Date: Mon, 04 Apr 2011 02:08:28 GMT\r\n', 'Server: Zope/(unreleased version, python 2.4.6, linux2) ZServer/1.1 Plone/3.3.4\r\n', 'Content-Length: 15321\r\n', 'Content-Type: text/html; charset=utf-8\r\n', 'Via: 1.1 www.zopyx.com\r\n', 'Cache-Control: max-age=3600\r\n', 'Expires: Mon, 04 Apr 2011 03:08:28 GMT\r\n', 'Connection: close\r\n']

See

http://docs.python.org/library/urllib2.html
