I have several jobs that need to run at least once a day. What I do is start the scripts for these jobs every hour (or more often), and the scripts themselves check whether they have already run by looking at a status file on disk.
If the status file exists and is up to date, the script exits.
If that file is too old (i.e. last written the day before) or doesn't exist, the script does run and, on successful termination, writes the status file.
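A minimal sketch of that check, assuming the job marks success by touching a status file and using its modification time as the marker (the file name `getlogs.status` and the helper names are hypothetical):

```python
import os
import time

STATUS = "getlogs.status"  # hypothetical status-file name

def already_ran_today(path):
    """Return True if the status file exists and was last written today (local time)."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False  # no status file yet: the job has never run
    then, now = time.localtime(mtime), time.localtime()
    return (then.tm_year, then.tm_yday) == (now.tm_year, now.tm_yday)

def run_job():
    # ... do the actual daily work here ...
    # only mark success afterwards, so a failure causes a retry on the next hourly start
    with open(STATUS, "w") as fp:
        fp.write(time.strftime("%Y-%m-%dT%H:%M:%S"))

if not already_ran_today(STATUS):
    run_job()
```

Because the status file is only written after the work succeeds, a failed run is retried automatically the next time cron starts the script.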
If you cannot build this functionality into an existing program, it is simple to write a wrapper script that checks whether the program must run, calls it if necessary, and on success (exit value, parsed output) writes the status file.
/usr/local/bin/catchup.simple:
#! /usr/bin/env python
"""
The first parameter is a path to a file /..../daily/some_name.
That is a status/script file and the /daily/ indicates it needs to run
at least once a day (after reboot, after midnight). The rest of the
parameters is the command executed and its parameters.

If there are no more parameters beyond the first, the actual status file
is /..../daily/some_name.status and is expected to be updated by calling
the /....daily/some_name script (which has to be executable). That script
doesn't need to know about the frequency and gets called with the status
file as first (and only) argument.

Valid directory names and their functioning:
/daily/   run once a day (UTC)
/hourly/  run once an hour

The actual scheduling, and the frequency with which to check whether
running is necessary, is done using a crontab entry:

CU=/usr/local/bin/catchup.simple
CUD=/root/catchup
# minute, hour, day_of_month, month, day_of_week, command
*/5 * * * *  $CU $CUD/daily/getlogs curl ....

If multiple days (or hours) have gone by, no runs are made for skipped
days. If subprocess.check_output() fails, the status file is not updated.
"""
import sys
import datetime
import subprocess

verbose = False   # set to True to debug


def main():
    if len(sys.argv) < 2:
        print 'not enough parameters for', sys.argv[0]
        return
    if len(sys.argv) == 2:
        status_file_name = sys.argv[1] + '.status'
        cmd = [sys.argv[1]]
    else:
        status_file_name = sys.argv[1]
        cmd = sys.argv[2:]
    freq = sys.argv[1].rsplit('/', 2)[-2]
    if verbose:
        print 'cmd', cmd
        print 'status', status_file_name
        print 'frequency', freq
    try:
        last_status = datetime.datetime.strptime(
            open(status_file_name).read().split('.')[0],
            "%Y-%m-%dT%H:%M:%S",
        )
    except (IOError, ValueError):
        last_status = datetime.datetime(2000, 1, 1)
    now = datetime.datetime.utcnow().replace(microsecond=0)
    if verbose:
        print last_status
        print 'now', now.isoformat()
    if freq == 'daily':
        if last_status.date() < now.date():
            subprocess.check_output(cmd)
        elif verbose:
            print 'already done today'
    elif freq == 'hourly':
        if last_status.date() < now.date() or \
           last_status.date() == now.date() and \
           last_status.hour < now.hour:
            subprocess.check_output(cmd)
        elif verbose:
            print 'already done this hour'
    with open(status_file_name, 'w') as fp:
        fp.write(now.isoformat())

if __name__ == "__main__":
    main()