81

Here is my code:

import csv
import requests

with requests.Session() as s:
    s.post(url, data=payload)
    download = s.get('url that directly download a csv report')

This gives me access to the CSV file. I tried different methods to deal with the download:

This gives me the CSV file in one string:

print download.content 

This prints the first row and then raises an error: _csv.Error: new-line character seen in unquoted field

cr = csv.reader(download, dialect=csv.excel_tab)
for row in cr:
    print row

This prints one letter per row and won't print the whole thing:

cr = csv.reader(download.content, dialect=csv.excel_tab)
for row in cr:
    print row

My question is: what's the most efficient way to read a CSV file in this situation, and how do I download it?

thanks

14 Answers

127

This should help:

import csv
import requests

CSV_URL = 'http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'

with requests.Session() as s:
    download = s.get(CSV_URL)

    decoded_content = download.content.decode('utf-8')

    cr = csv.reader(decoded_content.splitlines(), delimiter=',')
    my_list = list(cr)
    for row in my_list:
        print(row)

Output sample:

['street', 'city', 'zip', 'state', 'beds', 'baths', 'sq__ft', 'type', 'sale_date', 'price', 'latitude', 'longitude']
['3526 HIGH ST', 'SACRAMENTO', '95838', 'CA', '2', '1', '836', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '59222', '38.631913', '-121.434879']
['51 OMAHA CT', 'SACRAMENTO', '95823', 'CA', '3', '1', '1167', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '68212', '38.478902', '-121.431028']
['2796 BRANCH ST', 'SACRAMENTO', '95815', 'CA', '2', '1', '796', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '68880', '38.618305', '-121.443839']
['2805 JANETTE WAY', 'SACRAMENTO', '95815', 'CA', '2', '1', '852', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '69307', '38.616835', '-121.439146']
[...]

Related question with answer: https://stackoverflow.com/a/33079644/295246


Edit: Other answers are useful if you need to download large files (i.e. stream=True).


3 Comments

Is it necessary to read the entire thing into memory? This seems non-scaleable.
I notice this doesn't always maintain the structure of the CSV. I'm downloading a CSV with a Notes column, and the notes have newlines in them. Using this solution, all those newlines are ignored and it's not ideal (one workaround is sketched just after these comments).
splitlines() was what I needed. Thank you.
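Regarding the embedded-newline comment above: splitlines() splits on every newline, including ones inside quoted fields. One possible workaround, sketched here with the answer's URL, is to feed the decoded content to csv.reader via io.StringIO, so the reader sees a file-like object and can honor quoting:

import csv
import io
import requests

CSV_URL = 'http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'

download = requests.get(CSV_URL)
decoded_content = download.content.decode('utf-8')

# io.StringIO gives csv.reader a file-like object, so quoted fields that
# contain newlines stay in a single record (splitlines() would break them up).
cr = csv.reader(io.StringIO(decoded_content, newline=''), delimiter=',')
for row in cr:
    print(row)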
40

To simplify these answers and increase performance when downloading a large file, the code below may work a bit more efficiently.

import csv
from codecs import iterdecode
from contextlib import closing

import requests

url = "http://download-and-process-csv-efficiently/python.csv"

with closing(requests.get(url, stream=True)) as r:
    # iterdecode turns the byte lines from iter_lines() into str for csv.reader
    reader = csv.reader(iterdecode(r.iter_lines(), 'utf-8'),
                        delimiter=',', quotechar='"')
    for row in reader:
        print(row)

By setting stream=True in the GET request and passing r.iter_lines() to csv.reader(), we hand csv.reader() a generator, which lets it lazily iterate over each line in the response with for row in reader.

This avoids loading the entire file into memory before we start processing it, drastically reducing memory overhead for large files.

3 Comments

I had to also import codecs and wrap the r.iter_lines() within codecs.iterdecode() like so: codecs.iterdecode(r.iterlines(), 'utf-8') ... in order to solve byte vs str issues, unicode decoding problems and universal new line problems.
Thanks @IrvinH. , I ran into the same problem. btw it should be r.iter_lines() you missed the underscore.
On Python 3.7 this results in: error: iterator should return strings, not bytes (did you open the file in text mode?)
16

I like the answers from The Aelfinn and aheld. I can improve them only by shortening a bit more, removing superfluous pieces, using a real data source, making it 2.x & 3.x-compatible, and maintaining the high level of memory efficiency seen elsewhere:

import csv
import requests

CSV_URL = 'http://web.cs.wpi.edu/~cs1004/a16/Resources/SacramentoRealEstateTransactions.csv'

with requests.get(CSV_URL, stream=True) as r:
    lines = (line.decode('utf-8') for line in r.iter_lines())
    for row in csv.reader(lines):
        print(row)

Too bad 3.x is less flexible CSV-wise, because the iterator must emit Unicode strings while requests yields bytes; the 2.x-only version, for row in csv.reader(r.iter_lines()):, is more Pythonic (shorter and easier to read). Either way, note that the 2.x/3.x solution above won't handle the situation described by the OP, where a NEWLINE appears unquoted in the data being read.

For the part of the OP's question regarding downloading (vs. processing) the actual CSV file, here's another script that does that, 2.x & 3.x-compatible, minimal, readable, and memory-efficient:

import os
import requests

CSV_URL = 'http://web.cs.wpi.edu/~cs1004/a16/Resources/SacramentoRealEstateTransactions.csv'

with open(os.path.split(CSV_URL)[1], 'wb') as f, \
        requests.get(CSV_URL, stream=True) as r:
    for line in r.iter_lines():
        f.write(line + '\n'.encode())

4 Comments

Best answer! Works great with latest version of Python.
To support the widest audience, it should work with all currently-deployed versions of Python, not just the latest... thx though! :-) (min version is 2.6)
For downloading, I think most users will want f.write(line + '\n'.encode()) - currently your example writes one enormous line which won't be easily loaded back into memory by a CSV reader
Thx for spotting that. I normally try to take advantage of the "free" \n that comes w/text files but neglected to recall that respectable libraries drop those for the purpose of data-scrubbing, requiring us to add them back when creating our own files w/such data.
10

You can also use the DictReader to iterate dictionaries of {'columnname': 'value', ...}

import csv
import requests

response = requests.get('http://example.test/foo.csv')
reader = csv.DictReader(response.iter_lines())
for record in reader:
    print(record)

2 Comments

_csv.Error: iterator should return strings, not bytes (the file should be opened in text mode)
This should be a comment, not a separate answer.
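For the "iterator should return strings, not bytes" error mentioned in the first comment, one fix is to decode each line before handing it to DictReader. A rough sketch, assuming a UTF-8 response and reusing the answer's placeholder URL:

import csv
import requests

response = requests.get('http://example.test/foo.csv')
# Decode each line so DictReader receives str, not bytes (assumes UTF-8).
lines = (line.decode('utf-8') for line in response.iter_lines())
reader = csv.DictReader(lines)
for record in reader:
    print(record)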
6

I use this code (I use Python 3):

import csv
import io
import requests

url = "http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv"
r = requests.get(url)
r.encoding = 'utf-8'  # useful if encoding is not sent (or not sent properly) by the server
csvio = io.StringIO(r.text, newline="")
data = []
for row in csv.DictReader(csvio):
    data.append(row)

1 Comment

Thanks. I just had to change the encoding to "utf-8-sig" and it worked.
6

To convert to a Pandas DataFrame:

import pandas as pd
from io import StringIO

text = StringIO(download.content.decode('utf-8'))
df = pd.read_csv(text)


4

From a little searching, I understand that the file should be opened in universal newline mode, which you cannot do directly with the response content (I guess).

To finish the task, you can either save the downloaded content to a temporary file, or process it in memory.

Save as file:

import requests
import csv
import os

temp_file_name = 'temp_csv.csv'
url = 'http://url.to/file.csv'
download = requests.get(url)

with open(temp_file_name, 'w') as temp_file:
    temp_file.writelines(download.content)

with open(temp_file_name, 'rU') as temp_file:
    csv_reader = csv.reader(temp_file, dialect=csv.excel_tab)
    for line in csv_reader:
        print line

# delete the temp file after processing
os.remove(temp_file_name)

In memory:

(To be updated)
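A rough illustration of what the in-memory version could look like, pieced together from the other answers in this thread rather than from this author (decode the body and hand its lines to csv.reader; the excel_tab dialect is carried over from the file-based snippet above):

import csv
import requests

url = 'http://url.to/file.csv'
download = requests.get(url)

# Parse the response body directly, without writing a temp file.
decoded = download.content.decode('utf-8')
csv_reader = csv.reader(decoded.splitlines(), dialect=csv.excel_tab)
for line in csv_reader:
    print(line)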


3

You can update the accepted answer to use the iter_lines method of requests if the file is very large:

import csv
import requests

CSV_URL = 'http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'

with requests.Session() as s:
    # stream=True keeps the whole response from being downloaded up front
    download = s.get(CSV_URL, stream=True)

    # iter_lines() yields bytes, so decode each line before parsing
    line_iterator = (x.decode('utf-8') for x in download.iter_lines())

    cr = csv.reader(line_iterator, delimiter=',')
    my_list = list(cr)
    for row in my_list:
        print(row)


3

I used the solution below (unfortunately, the others did not work for me):

import pandas as pd

df = pd.read_csv('http://.../file.csv')

1 Comment

While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.
1

The following approach worked well for me. I also did not need to use the csv.reader() or csv.writer() functions, which I feel makes the code cleaner. The code is compatible with Python 2 and Python 3.

from six.moves import urllib

DOWNLOAD_URL = "https://raw.githubusercontent.com/gjreda/gregreda.com/master/content/notebooks/data/city-of-chicago-salaries.csv"
DOWNLOAD_PATH = "datasets/city-of-chicago-salaries.csv"

urllib.request.urlretrieve(DOWNLOAD_URL, DOWNLOAD_PATH)

Note - six is a package that helps in writing code that is compatible with both Python 2 and Python 3. For additional details regarding six see - What does from six.moves import urllib do in Python?
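If Python 2 support is not needed, the six shim can be dropped and the standard library used directly. A minimal Python 3 sketch with the same URL and path:

import urllib.request

DOWNLOAD_URL = "https://raw.githubusercontent.com/gjreda/gregreda.com/master/content/notebooks/data/city-of-chicago-salaries.csv"
DOWNLOAD_PATH = "datasets/city-of-chicago-salaries.csv"

# urlretrieve saves the remote CSV straight to disk
# (the datasets/ directory must already exist).
urllib.request.urlretrieve(DOWNLOAD_URL, DOWNLOAD_PATH)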


1

Python 3 supported code:

import csv
import codecs
from contextlib import closing

import requests

# PHISHTANK_URL should be set to the CSV endpoint you want to download.
with closing(requests.get(PHISHTANK_URL, stream=True)) as r:
    reader = csv.reader(codecs.iterdecode(r.iter_lines(), 'utf-8'),
                        delimiter=',', quotechar='"')
    for record in reader:
        print(record)


1

Python 3.x:

For anyone trying the solutions above, if you are getting a \ufeff at every new line, just change the encoding from utf-8 to utf-8-sig.

If you only want the CSV in string format, simply access the .text property of the request, like this:

import requests

req = requests.get("https: ...")
req.encoding = 'utf-8-sig'
csv_data = req.text


0

This worked nicely for me:

import requests
from csv import DictReader

f = requests.get('https://somedomain.com/file').content.decode('utf-8')
reader = DictReader(f.split('\n'))
csv_dict_list = list(reader)

1 Comment

This loads everything in memory at once, though.
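As the comment notes, this reads the whole response into memory. A streaming variant is possible by combining stream=True with a decoding generator, roughly as follows (the URL is the same placeholder as in the answer):

import requests
from csv import DictReader

with requests.get('https://somedomain.com/file', stream=True) as r:
    # Decode lazily so DictReader pulls one line at a time from the response.
    lines = (line.decode('utf-8') for line in r.iter_lines())
    for record in DictReader(lines):
        print(record)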
0

There are two ways to handle the CSV data:

To download csv data into a DataFrame

import pandas as pd

url = 'https://cdn.wsform.com/wp-content/uploads/2020/06/color_srgb.csv'

df = pd.read_csv(url)
print(df.head())

You should be able to see the output:

     Name      HEX               RGB
0   White  #FFFFFF  rgb(100,100,100)
1  Silver  #C0C0C0     rgb(75,75,75)
2    Gray  #808080     rgb(50,50,50)
3   Black  #000000        rgb(0,0,0)
4     Red  #FF0000      rgb(100,0,0)

To download csv data into a csv file

import requests
import os

resp = requests.get(url)

with open('temp.csv', 'w') as f:
    f.write(resp.content.decode('utf-8'))

print('File saved in:', os.getcwd())

You should be able to see the saved file in your current working directory (printed by os.getcwd()).
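One caveat about the text-mode write above: opening with 'w' lets the platform translate newlines (on Windows, a file with \r\n line endings can end up with \r\r\n), and the decode/re-encode round trip is unnecessary work. A small sketch of the binary-mode alternative, reusing the same URL:

import requests

url = 'https://cdn.wsform.com/wp-content/uploads/2020/06/color_srgb.csv'

resp = requests.get(url)
# Write the raw response bytes; no decoding and no newline translation.
with open('temp.csv', 'wb') as f:
    f.write(resp.content)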
