I want to download many files from Dukascopy. A typical URL looks like this:
    url = 'http://datafeed.dukascopy.com/datafeed/AUDUSD/2014/01/02/00h_ticks.bi5'

I tried the answer here, but most of the files are of size 0.
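(This is how I noticed the empty files; a quick check assuming the downloaded .bi5 files sit in the current directory.)

    from pathlib import Path

    # Quick check: count downloaded .bi5 files that came back empty (0 bytes)
    files = list(Path('.').glob('*.bi5'))
    empty = [p for p in files if p.stat().st_size == 0]
    print(f'{len(empty)} of {len(files)} files are empty')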
But when I simply looped using wget (see below), I got complete files.
    import wget
    from urllib.error import HTTPError

    pair = 'AUDUSD'

    for year in range(2014, 2015):
        for month in range(1, 13):
            for day in range(1, 32):
                for hour in range(24):
                    try:
                        url = ('http://datafeed.dukascopy.com/datafeed/' + pair + '/'
                               + str(year) + '/' + str(month - 1).zfill(2) + '/'
                               + str(day).zfill(2) + '/' + str(hour).zfill(2) + 'h_ticks.bi5')
                        filename = (pair + '-' + str(year) + '-' + str(month - 1).zfill(2) + '-'
                                    + str(day).zfill(2) + '-' + str(hour).zfill(2) + 'h_ticks.bi5')
                        x = wget.download(url, filename)
                        # print(url)
                    except HTTPError as err:
                        if err.code == 404:
                            print((year, month, day, hour))
                        else:
                            raise

I have used the following code earlier for scraping websites, but not for downloading files.
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    from aiohttp import ClientSession, client_exceptions
    from asyncio import Semaphore, ensure_future, gather, run
    from json import dumps, loads

    limit = 10
    http_ok = [200]


    async def scrape(url_list):
        tasks = list()
        sem = Semaphore(limit)

        async with ClientSession() as session:
            for url in url_list:
                task = ensure_future(scrape_bounded(url, sem, session))
                tasks.append(task)

            result = await gather(*tasks)

        return result


    async def scrape_bounded(url, sem, session):
        async with sem:
            return await scrape_one(url, session)


    async def scrape_one(url, session):
        try:
            async with session.get(url) as response:
                content = await response.read()
        except client_exceptions.ClientConnectorError:
            print('Scraping %s failed due to the connection problem', url)
            return False

        if response.status not in http_ok:
            print('Scraping %s failed due to the return code %s', url, response.status)
            return False

        content = loads(content.decode('UTF-8'))

        return content


    if __name__ == '__main__':
        urls = ['http://demin.co/echo1/', 'http://demin.co/echo2/']
        res = run(scrape(urls))

        print(dumps(res, indent=4))

There is an answer here for downloading multiple files using multiprocessing, but I think asyncio could be faster.
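For reference, the multiprocessing version I have in mind would look roughly like this (a sketch following that pattern, not the exact code from the linked answer; download_one is my own helper name):

    import multiprocessing

    import wget
    from urllib.error import HTTPError


    def download_one(job):
        # Download a single URL to the given filename, skipping missing hours (404)
        url, filename = job
        try:
            wget.download(url, filename)
        except HTTPError as err:
            if err.code == 404:
                return None
            raise
        return filename


    if __name__ == '__main__':
        pair = 'AUDUSD'
        jobs = []
        for day in range(1, 32):
            for hour in range(24):
                url = ('http://datafeed.dukascopy.com/datafeed/' + pair + '/2014/00/'
                       + str(day).zfill(2) + '/' + str(hour).zfill(2) + 'h_ticks.bi5')
                filename = (pair + '-2014-00-' + str(day).zfill(2) + '-'
                            + str(hour).zfill(2) + 'h_ticks.bi5')
                jobs.append((url, filename))

        with multiprocessing.Pool(processes=8) as pool:
            results = pool.map(download_one, jobs)

        print(sum(r is not None for r in results), 'files downloaded')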
When files of size 0 are returned, it could be the server limiting the number of requests, but I would still like to explore whether it is possible to download multiple files using wget and asyncio.
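Something like this is what I am aiming for; a sketch adapting the scraping code above to write the raw response body to disk instead of decoding JSON (aiohttp stands in for wget here, and download_many / download_bounded / download_one are names I made up):

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    from aiohttp import ClientSession, client_exceptions
    from asyncio import Semaphore, ensure_future, gather, run

    limit = 10
    http_ok = [200]


    async def download_many(jobs):
        # jobs is a list of (url, filename) tuples
        tasks = list()
        sem = Semaphore(limit)

        async with ClientSession() as session:
            for url, filename in jobs:
                tasks.append(ensure_future(download_bounded(url, filename, sem, session)))

            result = await gather(*tasks)

        return result


    async def download_bounded(url, filename, sem, session):
        # Limit the number of concurrent requests with the semaphore
        async with sem:
            return await download_one(url, filename, session)


    async def download_one(url, filename, session):
        try:
            async with session.get(url) as response:
                content = await response.read()
        except client_exceptions.ClientConnectorError:
            print('Downloading %s failed due to a connection problem' % url)
            return False

        if response.status not in http_ok:
            print('Downloading %s failed with return code %s' % (url, response.status))
            return False

        # Write the raw .bi5 payload to disk instead of decoding it as JSON
        with open(filename, 'wb') as f:
            f.write(content)

        return filename


    if __name__ == '__main__':
        pair = 'AUDUSD'
        jobs = []
        for hour in range(24):
            url = ('http://datafeed.dukascopy.com/datafeed/' + pair + '/2014/00/02/'
                   + str(hour).zfill(2) + 'h_ticks.bi5')
            filename = pair + '-2014-00-02-' + str(hour).zfill(2) + 'h_ticks.bi5'
            jobs.append((url, filename))

        res = run(download_many(jobs))
        print(res)

The file write is synchronous for simplicity; the files are small enough that this probably does not matter, but that is an assumption. I am also not sure whether this would run into the same 0-size responses as the linked answer, which is why I am asking.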