
Since Python 3.5 introduced `async with`, the syntax recommended in the aiohttp docs has changed. To fetch a single url they now suggest:

    import aiohttp
    import asyncio

    async def fetch(session, url):
        with aiohttp.Timeout(10):
            async with session.get(url) as response:
                return await response.text()

    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        with aiohttp.ClientSession(loop=loop) as session:
            html = loop.run_until_complete(
                fetch(session, 'http://python.org'))
            print(html)

How can I modify this to fetch a collection of urls instead of just one url?

In the old asyncio examples you would set up a list of tasks such as

    tasks = [
        fetch(session, 'http://cnn.com'),
        fetch(session, 'http://google.com'),
        fetch(session, 'http://twitter.com')
    ]

I tried to combine a list like this with the approach above but failed.

2 Comments
  • Could you explain how it failed? Commented Mar 9, 2016 at 17:31
  • @AndrewSvetlov Wonderful to hear from you. What I mean is I could not understand how to do it. When I define a list of tasks then use results = loop.run_until_complete(tasks) I get a runtime error. async with is such a new feature with so little literature that it would be super convenient for people learning to use it if the aiohttp doc showed an example of grabbing more than one url. The library looks terrific, just needing a bit of hand-holding to get started. Thank you! Commented Mar 9, 2016 at 18:22

1 Answer


For parallel execution you need an `asyncio.Task` per request.

I've converted your example to concurrent data fetching from several sources:

    import aiohttp
    import asyncio

    async def fetch(session, url):
        async with session.get(url) as response:
            if response.status != 200:
                response.raise_for_status()
            return await response.text()

    async def fetch_all(session, urls):
        tasks = []
        for url in urls:
            task = asyncio.create_task(fetch(session, url))
            tasks.append(task)
        results = await asyncio.gather(*tasks)
        return results

    async def main():
        urls = ['http://cnn.com', 'http://google.com', 'http://twitter.com']
        async with aiohttp.ClientSession() as session:
            htmls = await fetch_all(session, urls)
            print(htmls)

    if __name__ == '__main__':
        asyncio.run(main())
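If any of the urls can fail, `gather` propagates the first exception and cancels the remaining tasks. Passing `return_exceptions=True` instead collects the exception objects in the result list so one bad host does not sink the batch. A sketch using a stand-in `fetch` (no real aiohttp or network access assumed):

```python
import asyncio

async def fetch(url):
    # Stand-in for the aiohttp fetch: raises for one url to simulate a
    # request error, returns a fake body otherwise
    await asyncio.sleep(0)
    if url == 'http://bad.invalid':
        raise RuntimeError('request failed')
    return f'<html from {url}>'

async def fetch_all(urls):
    # return_exceptions=True places each exception object in the result
    # list instead of propagating it and cancelling the other tasks
    return await asyncio.gather(*(fetch(u) for u in urls),
                                return_exceptions=True)

results = asyncio.run(fetch_all(
    ['http://cnn.com', 'http://bad.invalid', 'http://google.com']))
for r in results:
    print(type(r).__name__)
```

Successful entries are the usual return values; failed ones are exception instances you can check with `isinstance`.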

10 Comments

Thanks a million! Accepting your answer, but... 1. there's still a typo with the placement of the parenthesis. I'll edit it if you don't mind. 2. It looks to me like to print the actual result the line print(html) is deceiving and you actually need something like print('\n'.join(list((str(some_task._result) for some_tuple in html for some_task in some_tuple)))), maybe that could be added to the answer? 3. This seems really useful, I'd recommend adding something like this to readthedocs. Thanks again! :)
Andrew, where can I put a test like if response.status == 200? If one url does not exist, the script breaks, and I don't understand where to check the response within async with session.get(url) as response: return await response.text()
Thank you to the other person who left a comment before. I have started a new question to clarify this.
Updated to aiohttp 3.x and python 3.7 usage
the asyncio.create_task part is not necessary. Task is created from the coroutine in 3.7.
