
My code works this way, but it is very slow because of the for loops. Can you help me make it work with aiohttp and asyncio?

    import requests
    from bs4 import BeautifulSoup

    def field_info(field_link):
        response = requests.get(field_link)
        soup = BeautifulSoup(response.text, 'html.parser')
        races = soup.findAll('header', {'class': 'dc-field-header'})
        tables = soup.findAll('table', {'class': 'dc-field-comp'})
        results = []  # one dict per race found on the page
        for i in range(len(races)):
            race_name = races[i].find('h3').text
            race_time = races[i].find('time').text
            names = tables[i].findAll('span', {'class': 'title'})
            trainers = tables[i].findAll('span', {'class': 'trainer'})
            table = []
            for j in range(len(names)):
                table.append({
                    'Name': names[j].text,
                    'Trainer': trainers[j].text,
                })
            results.append({
                'RaceName': race_name,
                'RaceTime': race_time,
                'Table': table
            })
        return results

    links = [link1, link2, link3]
    scraped_info = []
    for link in links:
        scraped_info += field_info(link)
  • Why? Neither asyncio nor aiohttp will give your code magic parallelism, nor will they speed up CPU-bound tasks. They're meant for asynchronous programming. Commented Jun 10, 2019 at 15:14
  • This is unrelated to your question, but instead of using range(len(names)), you can use for name, trainer in zip(names, trainers) and avoid the index lookups inside the loop. Commented Jun 10, 2019 at 15:57
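The zip() suggestion in the comment above can be sketched as follows. The sample strings and the helper name `build_table` are hypothetical stand-ins for the `.text` values of the scraped `span` elements; they are not part of the original code.

```python
def build_table(names, trainers):
    """Pair each name with its trainer without indexing by position."""
    return [
        {"Name": name, "Trainer": trainer}
        for name, trainer in zip(names, trainers)
    ]


# Hypothetical sample data standing in for names[j].text and trainers[j].text.
names = ["Horse A", "Horse B"]
trainers = ["Trainer X", "Trainer Y"]

table = build_table(names, trainers)
print(table)
```

One difference worth knowing: zip() stops at the shorter input, whereas indexing with range(len(names)) raises an IndexError if trainers has fewer items than names.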

1 Answer


1) Create a coroutine that makes the request asynchronously:

    import asyncio
    import aiohttp

    async def get_text(url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                return await resp.text()

2) Replace every synchronous request with an await of this coroutine, and make the enclosing functions coroutines as well:

    async def field_info(field_link):  # async makes the outer function a coroutine
        text = await get_text(field_link)  # await gets the result from the async function
        soup = BeautifulSoup(text, 'html.parser')
        # ... the rest of the parsing stays the same as before

3) Have the outer code run the jobs concurrently using asyncio.gather():

    async def main():
        links = [link1, link2, link3]
        # gather() must be awaited; it runs the field_info coroutines
        # concurrently and returns their results in the order of links
        scraped_info = await asyncio.gather(*[
            field_info(link)
            for link in links
        ])

4) Pass the top-level coroutine to asyncio.run():

    asyncio.run(main())
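To see the effect of the four steps without touching the network, here is a minimal sketch that uses only the standard library. `fake_get_text` is a hypothetical stand-in for `get_text()`: it waits with asyncio.sleep() instead of making an HTTP request, and the link strings and the 0.2 second delay are made up for illustration. The parsing step is replaced by a trivial length calculation.

```python
import asyncio
import time


async def fake_get_text(url, delay=0.2):
    """Hypothetical stand-in for get_text(): sleeps instead of doing HTTP."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def field_info(field_link):
    text = await fake_get_text(field_link)
    # Real parsing (BeautifulSoup) would go here; this sketch only measures length.
    return {"Link": field_link, "Length": len(text)}


async def main(links):
    # gather() schedules all coroutines at once and returns their results
    # in the same order as the inputs.
    return await asyncio.gather(*(field_info(link) for link in links))


def run_demo():
    links = ["link1", "link2", "link3"]
    start = time.perf_counter()
    scraped_info = asyncio.run(main(links))
    elapsed = time.perf_counter() - start
    return scraped_info, elapsed


if __name__ == "__main__":
    info, elapsed = run_demo()
    print(info)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the three waits overlap, the total time should be close to one delay rather than three. The same holds for real requests only to the extent that the time is spent waiting on the network; the parsing itself still runs one page at a time.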

2 Comments

Thank you for your step-by-step answer, it really helped me understand how this stuff works.
@paskh you're welcome! You may also be interested in reading this answer - it's about how asyncio works and how to use it in general.
