
I'm working with the GitHub API right now, and here's a function that fetches all pull requests for each repo in a list:

    async def get_all_pulls(repos, api):
        pulls = []
        for repo in repos:
            try:
                async for pull in api.getiter(f'/repos/{org}/{repo}/pulls?state=all'):
                    pull['repo'] = repo
                    if pull not in pulls:
                        pulls.append(pull)
            except Exception:
                print(f"Bad repo/no access=> [{repo}]")
                continue
        return pulls

Everything works fine except for one little problem: it takes a lot of time because of that sequential iteration over repos (let's say there are 30 of them).

I was trying to make it concurrent like this (and of course I got rid of the for loop in the function declaration when using this):

    # gather all prs for all repos
    tasks = [asyncio.ensure_future(get_all_pulls_for_repo(api, repo)) for repo in repos]
    results = await asyncio.gather(*tasks)
    # unwrap list of lists
    for res in results:
        all_pull_requests += res
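One thing worth noting about the snippet above: by default, asyncio.gather raises as soon as any one task raises, cancelling the rest, so a single bad repo can take down the whole batch. A minimal sketch of the return_exceptions=True pattern, using a hypothetical fetch_pulls stand-in (no real GitHub calls):

```python
import asyncio

# Hypothetical stand-in for get_all_pulls_for_repo: the repo named "bad"
# simulates the "Bad repo/no access" failure from the original function.
async def fetch_pulls(repo):
    if repo == "bad":
        raise RuntimeError(f"no access to {repo}")
    return [{"repo": repo, "number": n} for n in (1, 2)]

async def main():
    repos = ["gidgethub", "bad", "caniusepython3"]
    tasks = [fetch_pulls(repo) for repo in repos]
    # return_exceptions=True keeps one failing repo from cancelling the rest:
    # failures come back as exception objects in the results list instead.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    all_pulls = []
    for repo, res in zip(repos, results):
        if isinstance(res, Exception):
            print(f"Bad repo/no access=> [{repo}]")
            continue
        all_pulls.extend(res)
    return all_pulls

pulls = asyncio.run(main())  # asyncio.run() needs Python 3.7+
print(len(pulls))  # 4
```

This mirrors the per-repo try/except in the original sequential function, just moved outside the tasks.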

But I get crashes and messages saying the repos are bad, etc. I think I'm missing something important here but can't figure out what.

Why does it crash with the async for loop? And how can I make it work?

UPDATE1: Traceback at get_all_reviews:

    Traceback (most recent call last):
      File "/home/metal/Documents/projects/-git/async_git_tool.py", line 193, in <module>
        loop.run_until_complete(main())
      File "/home/metal/.pyenv/versions/3.6.0/lib/python3.6/asyncio/base_events.py", line 466, in run_until_complete
        return future.result()
      File "/home/metal/Documents/projects/-git/async_git_tool.py", line 113, in main
        reviewed = await get_all_reviews(created, api, ss_programmers)
      File "/home/metal/Documents/projects/-git/async_git_tool.py", line 181, in get_all_reviews
        async for review in api.getiter(f'/repos/{org}/{pr_repo}/pulls/{pr_number}/reviews'):
      File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/abc.py", line 85, in getiter
        data, more = await self._make_request("GET", url, url_vars, b"", accept)
      File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/abc.py", line 66, in _make_request
        data, self.rate_limit, more = sansio.decipher_response(*response)
      File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/sansio.py", line 284, in decipher_response
        rate_limit = RateLimit.from_http(headers)
      File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/sansio.py", line 226, in from_http
        limit = int(headers["x-ratelimit-limit"])
      File "multidict/_multidict.pyx", line 140, in multidict._multidict._Base.__getitem__
      File "multidict/_multidict.pyx", line 135, in multidict._multidict._Base._getone
    KeyError: "Key not found: 'x-ratelimit-limit'"

Here's the function itself:

    async def get_all_reviews(pulls, api, programmers):
        reviewed_pulls = []
        for pull in pulls:
            pr_repo = pull['repo']
            pr_number = str(pull['number'])
            async for review in api.getiter(f'/repos/{org}/{pr_repo}/pulls/{pr_number}/reviews'):
                if review['user']['login'] not in programmers \
                        and pull not in reviewed_pulls:
                    reviewed_pulls.append(pull)
        return reviewed_pulls

and I'm calling it like that:

reviewed = await get_all_reviews(softserve_created, api, ss_programmers) 
  • What concrete module do you use to work with the GitHub API? Commented Nov 21, 2017 at 18:13
  • @MikhailGerasimov I am using _gidgethub_ (gidgethub.readthedocs.io/en/stable) Commented Nov 21, 2017 at 18:31

1 Answer

The idea you described worked fine for me:

    import asyncio

    import aiohttp
    import gidgethub
    from gidgethub.aiohttp import GitHubAPI

    # TODO
    # paste your token to have rate limits
    # https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/
    TOKEN = '...'


    async def get_all_pulls_for_repo(gh, org, repo):
        pulls = []
        async for pull in gh.getiter(f'/repos/{org}/{repo}/pulls?state=all'):
            pulls.append(pull)
            await gh.sleep(0.1)  # avoid RateLimitExceeded, you should count it somehow
        return pulls


    async def main():
        org = 'brettcannon'
        repos = ['gidgethub', 'caniusepython3', 'importlib_resources']

        async with aiohttp.ClientSession() as session:
            gh = GitHubAPI(session, 'requester', oauth_token=TOKEN)
            tasks = [
                asyncio.ensure_future(get_all_pulls_for_repo(gh, org, repo))
                for repo in repos
            ]
            results = await asyncio.gather(*tasks)
            for res in results:
                for pull in res:
                    print(pull['url'])


    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()

Create a token for the requests, paste it in, and you'll see a list of PR URLs.
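As an alternative to the fixed gh.sleep(0.1), an asyncio.Semaphore puts a hard cap on how many requests are in flight at once, which throttles the whole batch regardless of how many repos there are. A minimal sketch with a hypothetical fetch_pulls coroutine standing in for the real gh.getiter() paging (no real GitHub calls):

```python
import asyncio

async def fetch_pulls(sem, repo):
    # stand-in for one real gh.getiter() paging loop
    async with sem:               # at most N coroutines pass this point at once
        await asyncio.sleep(0.01)  # simulate one HTTP round trip
        return [{"repo": repo, "number": 1}]

async def main():
    # assumption: 5 concurrent requests is a safe ceiling; tune for your limits
    sem = asyncio.Semaphore(5)
    repos = [f"repo{i}" for i in range(30)]
    results = await asyncio.gather(*(fetch_pulls(sem, r) for r in repos))
    # unwrap list of lists
    return [pull for res in results for pull in res]

pulls = asyncio.run(main())
print(len(pulls))  # 30
```

The semaphore is created inside the running coroutine and passed in, so it is bound to the right event loop on older Python versions.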


3 Comments

sleep() worked well for eliminating the issue at get_all_pulls, but I still get a traceback at get_all_reviews; please check the updated question at the top, thanks.
Plus I've just noticed that if I use a non-async for repo in repos I get a total of 9983 pull requests, and if async, it's only 7k.
@Roman, the error you get is not related to your code; it's a bug in gidgethub: github.com/brettcannon/gidgethub/issues/25 I think the number of PRs you retrieve is also related to how gidgethub works.
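Until that gidgethub bug is fixed, one stopgap (a sketch, not part of gidgethub's API) is to wrap the paging loop and retry when the missing-header KeyError from the traceback surfaces. A fake client is included below purely to demonstrate the retry; with the real library you would pass your GitHubAPI instance instead:

```python
import asyncio

async def getiter_with_retry(gh, url, retries=3, delay=0.1):
    """Collect all items from gh.getiter(url), retrying when gidgethub
    raises KeyError because GitHub omitted the x-ratelimit-* headers."""
    for attempt in range(retries):
        items = []  # discard partial results so a retry can't duplicate them
        try:
            async for item in gh.getiter(url):
                items.append(item)
            return items
        except KeyError:  # "Key not found: 'x-ratelimit-limit'"
            await asyncio.sleep(delay * (attempt + 1))  # simple backoff
    raise RuntimeError(f"giving up on {url} after {retries} attempts")

# Fake client for demonstration only: fails once with the KeyError seen in
# the traceback, then yields reviews normally.
class FakeGitHubAPI:
    def __init__(self):
        self.calls = 0
    async def getiter(self, url):
        self.calls += 1
        if self.calls == 1:
            raise KeyError("Key not found: 'x-ratelimit-limit'")
        for n in (1, 2):
            yield {"user": {"login": f"reviewer{n}"}}

gh = FakeGitHubAPI()
reviews = asyncio.run(getiter_with_retry(gh, "/repos/org/repo/pulls/1/reviews"))
print(len(reviews))  # 2
```

Retrying refetches all pages for that URL from the start, which is wasteful but keeps the result list free of duplicates.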
