
Since Python 3.5 introduced `async with`, the syntax recommended in the aiohttp docs has changed. To fetch a single url they now suggest:

    import aiohttp
    import asyncio

    async def fetch(session, url):
        with aiohttp.Timeout(10):
            async with session.get(url) as response:
                return await response.text()

    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        with aiohttp.ClientSession(loop=loop) as session:
            html = loop.run_until_complete(
                fetch(session, 'http://python.org'))
            print(html)

How can I modify this to fetch a collection of urls instead of just one url?

In the old asyncio examples you would set up a list of tasks such as

    tasks = [
        fetch(session, 'http://cnn.com'),
        fetch(session, 'http://google.com'),
        fetch(session, 'http://twitter.com')
    ]

I tried to combine a list like this with the approach above but failed.

2 Comments
  • Could you explain how it failed? Commented Mar 9, 2016 at 17:31
  • @AndrewSvetlov Wonderful to hear from you. What I mean is I could not understand how to do it. When I define a list of tasks then use results = loop.run_until_complete(tasks) I get a runtime error. async with is such a new feature with so little literature that it would be super convenient for people learning to use it if the aiohttp doc showed an example of grabbing more than one url. The library looks terrific, just needing a bit of hand-holding to get started. Thank you! Commented Mar 9, 2016 at 18:22

1 Answer


For parallel execution you need an `asyncio.Task` per request.

I've converted your example to concurrent data fetching from several sources:

    import aiohttp
    import asyncio

    async def fetch(session, url):
        async with session.get(url) as response:
            if response.status != 200:
                response.raise_for_status()
            return await response.text()

    async def fetch_all(session, urls):
        tasks = []
        for url in urls:
            task = asyncio.create_task(fetch(session, url))
            tasks.append(task)
        results = await asyncio.gather(*tasks)
        return results

    async def main():
        urls = ['http://cnn.com', 'http://google.com', 'http://twitter.com']
        async with aiohttp.ClientSession() as session:
            htmls = await fetch_all(session, urls)
            print(htmls)

    if __name__ == '__main__':
        asyncio.run(main())
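If any of the urls can fail, `gather` propagates the first exception and cancels the remaining tasks. Passing `return_exceptions=True` instead collects the exception objects in the result list so one bad host does not sink the batch. A sketch using a stand-in `fetch` (no real aiohttp or network access assumed):

```python
import asyncio

async def fetch(url):
    # Stand-in for the aiohttp fetch: raises for one url to simulate a
    # request error, returns a fake body otherwise
    await asyncio.sleep(0)
    if url == 'http://bad.invalid':
        raise RuntimeError('request failed')
    return f'<html from {url}>'

async def fetch_all(urls):
    # return_exceptions=True places each exception object in the result
    # list instead of propagating it and cancelling the other tasks
    return await asyncio.gather(*(fetch(u) for u in urls),
                                return_exceptions=True)

results = asyncio.run(fetch_all(
    ['http://cnn.com', 'http://bad.invalid', 'http://google.com']))
for r in results:
    print(type(r).__name__)
```

Successful entries are the usual return values; failed ones are exception instances you can check with `isinstance`.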

10 Comments

Thanks a million! Accepting your answer, but... 1. there's still a typo with the placement of the parenthesis. I'll edit it if you don't mind. 2. It looks to me like to print the actual result the line print(html) is deceiving and you actually need something like print('\n'.join(list((str(some_task._result) for some_tuple in html for some_task in some_tuple)))), maybe that could be added to the answer? 3. This seems really useful, I'd recommend adding something like this to readthedocs. Thanks again! :)
Andrew, where can I put a test like if response.status == 200? If one url does not exist, the script breaks, and I don't understand where to check the response within async with session.get(url) as response: return await response.text()
Thank you to the other person who left a comment before. I have started a new question to clarify this.
Updated to aiohttp 3.x and python 3.7 usage
the asyncio.create_task part is not necessary. Task is created from the coroutine in 3.7.
