
I'm working with the GitHub API right now, and here's a function that fetches all pull requests for each repo in a list:

    async def get_all_pulls(repos, api):
        pulls = []
        for repo in repos:
            try:
                async for pull in api.getiter(f'/repos/{org}/{repo}/pulls?state=all'):
                    pull['repo'] = repo
                    if pull not in pulls:
                        pulls.append(pull)
            except Exception:
                print(f"Bad repo/no access=> [{repo}]")
                continue
        return pulls

Everything works fine except for one little problem: it takes a lot of time because of that sequential iteration over repos (let's say there are 30 of them).

I was trying to make it concurrent like this (and of course I got rid of the for loop in the function declaration when using this):

    # gather all prs for all repos
    tasks = [asyncio.ensure_future(get_all_pulls_for_repo(api, repo)) for repo in repos]
    results = await asyncio.gather(*tasks)
    # unwrap list of lists
    for res in results:
        all_pull_requests += res
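One thing worth noting about the snippet above: by default, asyncio.gather raises as soon as any one task raises, cancelling the rest, so a single bad repo can take down the whole batch. A minimal sketch of the return_exceptions=True pattern, using a hypothetical fetch_pulls stand-in (no real GitHub calls):

```python
import asyncio

# Hypothetical stand-in for get_all_pulls_for_repo: the repo named "bad"
# simulates the "Bad repo/no access" failure from the original function.
async def fetch_pulls(repo):
    if repo == "bad":
        raise RuntimeError(f"no access to {repo}")
    return [{"repo": repo, "number": n} for n in (1, 2)]

async def main():
    repos = ["gidgethub", "bad", "caniusepython3"]
    tasks = [fetch_pulls(repo) for repo in repos]
    # return_exceptions=True keeps one failing repo from cancelling the rest:
    # failures come back as exception objects in the results list instead.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    all_pulls = []
    for repo, res in zip(repos, results):
        if isinstance(res, Exception):
            print(f"Bad repo/no access=> [{repo}]")
            continue
        all_pulls.extend(res)
    return all_pulls

pulls = asyncio.run(main())  # asyncio.run() needs Python 3.7+
print(len(pulls))  # 4
```

This mirrors the per-repo try/except in the original sequential function, just moved outside the tasks.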

But I get crashes and messages saying the repos are bad, etc. I think I'm missing something important here but can't figure out what.

Why does it crash with the async for loop? And how can I make it work?

UPDATE1: Traceback at get_all_reviews:

    Traceback (most recent call last):
      File "/home/metal/Documents/projects/-git/async_git_tool.py", line 193, in <module>
        loop.run_until_complete(main())
      File "/home/metal/.pyenv/versions/3.6.0/lib/python3.6/asyncio/base_events.py", line 466, in run_until_complete
        return future.result()
      File "/home/metal/Documents/projects/-git/async_git_tool.py", line 113, in main
        reviewed = await get_all_reviews(created, api, ss_programmers)
      File "/home/metal/Documents/projects/-git/async_git_tool.py", line 181, in get_all_reviews
        async for review in api.getiter(f'/repos/{org}/{pr_repo}/pulls/{pr_number}/reviews'):
      File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/abc.py", line 85, in getiter
        data, more = await self._make_request("GET", url, url_vars, b"", accept)
      File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/abc.py", line 66, in _make_request
        data, self.rate_limit, more = sansio.decipher_response(*response)
      File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/sansio.py", line 284, in decipher_response
        rate_limit = RateLimit.from_http(headers)
      File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/sansio.py", line 226, in from_http
        limit = int(headers["x-ratelimit-limit"])
      File "multidict/_multidict.pyx", line 140, in multidict._multidict._Base.__getitem__
      File "multidict/_multidict.pyx", line 135, in multidict._multidict._Base._getone
    KeyError: "Key not found: 'x-ratelimit-limit'"

Here's the function itself:

    async def get_all_reviews(pulls, api, programmers):
        reviewed_pulls = []
        for pull in pulls:
            pr_repo = pull['repo']
            pr_number = str(pull['number'])
            async for review in api.getiter(f'/repos/{org}/{pr_repo}/pulls/{pr_number}/reviews'):
                if review['user']['login'] not in programmers \
                        and pull not in reviewed_pulls:
                    reviewed_pulls.append(pull)
        return reviewed_pulls

and I'm calling it like that:

reviewed = await get_all_reviews(softserve_created, api, ss_programmers) 
  • What concrete module do you use to work with the GitHub API? Commented Nov 21, 2017 at 18:13
  • @MikhailGerasimov I am using _gidgethub_ (gidgethub.readthedocs.io/en/stable) Commented Nov 21, 2017 at 18:31

1 Answer

The idea you described worked fine for me:

    import asyncio

    import aiohttp
    import gidgethub
    from gidgethub.aiohttp import GitHubAPI

    # TODO
    # paste your token to have rate limits
    # https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/
    TOKEN = '...'


    async def get_all_pulls_for_repo(gh, org, repo):
        pulls = []
        async for pull in gh.getiter(f'/repos/{org}/{repo}/pulls?state=all'):
            pulls.append(pull)
            await gh.sleep(0.1)  # avoid RateLimitExceeded, you should count it somehow
        return pulls


    async def main():
        org = 'brettcannon'
        repos = ['gidgethub', 'caniusepython3', 'importlib_resources']

        async with aiohttp.ClientSession() as session:
            gh = GitHubAPI(session, 'requester', oauth_token=TOKEN)
            tasks = [
                asyncio.ensure_future(get_all_pulls_for_repo(gh, org, repo))
                for repo in repos
            ]
            results = await asyncio.gather(*tasks)
            for res in results:
                for pull in res:
                    print(pull['url'])


    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()

Create a token for the requests, paste it in, and you'll see a list of PR URLs.
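As an alternative to the fixed gh.sleep(0.1), an asyncio.Semaphore puts a hard cap on how many requests are in flight at once, which throttles the whole batch regardless of how many repos there are. A minimal sketch with a hypothetical fetch_pulls coroutine standing in for the real gh.getiter() paging (no real GitHub calls):

```python
import asyncio

async def fetch_pulls(sem, repo):
    # stand-in for one real gh.getiter() paging loop
    async with sem:               # at most N coroutines pass this point at once
        await asyncio.sleep(0.01)  # simulate one HTTP round trip
        return [{"repo": repo, "number": 1}]

async def main():
    # assumption: 5 concurrent requests is a safe ceiling; tune for your limits
    sem = asyncio.Semaphore(5)
    repos = [f"repo{i}" for i in range(30)]
    results = await asyncio.gather(*(fetch_pulls(sem, r) for r in repos))
    # unwrap list of lists
    return [pull for res in results for pull in res]

pulls = asyncio.run(main())
print(len(pulls))  # 30
```

The semaphore is created inside the running coroutine and passed in, so it is bound to the right event loop on older Python versions.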


3 Comments

sleep() worked well for eliminating the issue at get_all_pulls, but I still get a traceback at get_all_reviews; please check the updated question at the top, thanks.
Plus I've just noticed that if I use a non-async for repo in repos I get a total of 9983 pull requests, and if async, it's only 7k.
@Roman, the error you get is not related to your code; it's a bug in gidgethub: github.com/brettcannon/gidgethub/issues/25 I think the number of PRs you retrieve is also related to how gidgethub works.
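Until that gidgethub bug is fixed, one stopgap (a sketch, not part of gidgethub's API) is to wrap the paging loop and retry when the missing-header KeyError from the traceback surfaces. A fake client is included below purely to demonstrate the retry; with the real library you would pass your GitHubAPI instance instead:

```python
import asyncio

async def getiter_with_retry(gh, url, retries=3, delay=0.1):
    """Collect all items from gh.getiter(url), retrying when gidgethub
    raises KeyError because GitHub omitted the x-ratelimit-* headers."""
    for attempt in range(retries):
        items = []  # discard partial results so a retry can't duplicate them
        try:
            async for item in gh.getiter(url):
                items.append(item)
            return items
        except KeyError:  # "Key not found: 'x-ratelimit-limit'"
            await asyncio.sleep(delay * (attempt + 1))  # simple backoff
    raise RuntimeError(f"giving up on {url} after {retries} attempts")

# Fake client for demonstration only: fails once with the KeyError seen in
# the traceback, then yields reviews normally.
class FakeGitHubAPI:
    def __init__(self):
        self.calls = 0
    async def getiter(self, url):
        self.calls += 1
        if self.calls == 1:
            raise KeyError("Key not found: 'x-ratelimit-limit'")
        for n in (1, 2):
            yield {"user": {"login": f"reviewer{n}"}}

gh = FakeGitHubAPI()
reviews = asyncio.run(getiter_with_retry(gh, "/repos/org/repo/pulls/1/reviews"))
print(len(reviews))  # 2
```

Retrying refetches all pages for that URL from the start, which is wasteful but keeps the result list free of duplicates.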
