
Here's my problem:

In my database I have a requests table, which has a status field. A cronjob runs every minute, looking for every request with a "pending" status. When it's done processing a request, it sets the status to "finished".

Sometimes this cronjob takes over a minute to run, so two instances of the same cronjob end up running at once. I need to make sure they don't process the same request.

Right now I'm doing this:

    $requests = db_all($this->db->query('SELECT * FROM x__mx_request WHERE status="pending"'));
    // few lines of code later
    $this->db->query('UPDATE x__mx_request SET status="processing" WHERE id IN('.implode(',', $ids).')');

I set the status to "processing" after I select the requests, which stops other scripts from processing them too. But isn't it possible that one script selects all the pending requests, and then a 2nd script selects the same requests before the 1st script sets them to "processing"?

I'd like a more secure way of doing this!

EDIT: Thanks for the great answers, I'll have to actually try them out though before I can mark one as correct.

  • Why not one polling service that feeds the two cron jobs with what they need, or one cron job that can fork and run multiple requests? Commented May 7, 2012 at 18:22
  • They are one cronjob. After 1 minute, though, the same cronjob runs again, and if the first one didn't finish all the requests before the 2nd one starts, that's where the problem happens. Commented May 7, 2012 at 18:25
  • The cron job could easily enough create a file called last_processed. First thing you do is read the file and ensure your current job id doesn't match the previously processed/processing job id. If it does, you can select a new job. Or you can read that last_processed and use it as an exclusion criterion for your job-selecting logic. Like "Select * from jobs where id <> last_processed" kind of deal. Commented May 7, 2012 at 18:26

3 Answers

2

You can try using a lock file so that each cronjob can check whether another instance is already running. The process PID could also be stored in it, to identify the state of the process in case something failed, is not responding, etc.
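A minimal sketch of the lock-file idea, assuming PHP's built-in flock() and a hypothetical lock path under the system temp directory (the path and the function name acquire_cron_lock are made up for illustration). The lock is taken non-blocking, so a second run that starts while the first is still working simply exits:

```php
<?php
// Hypothetical lock path; any location writable by the cron user works.
$lockPath = sys_get_temp_dir() . '/mx_request_cron.lock';

/**
 * Try to take an exclusive, non-blocking lock on $path.
 * Returns the open file handle on success (keep it open for the whole run),
 * or null if another process already holds the lock.
 */
function acquire_cron_lock(string $path)
{
    $handle = fopen($path, 'c+');   // create if missing, never truncate on open
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);            // another run holds the lock
        return null;
    }
    // Record our PID so a stuck job can be identified from outside.
    ftruncate($handle, 0);
    fwrite($handle, (string) getmypid());
    fflush($handle);
    return $handle;
}

$lock = acquire_cron_lock($lockPath);
if ($lock === null) {
    exit(0);   // previous run still active; skip this minute
}

// ... select and process the pending requests here ...

flock($lock, LOCK_UN);
fclose($lock);
```

Because the operating system drops an flock() lock when the process ends, a crashed run should not leave a stale lock behind; the PID written to the file is only there for diagnostics.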


2

Set the status to a unique value, and then process everything with that unique value. Have a timeout or some sort of fallback if your processing fails half-way through (but you need that with your current situation anyway).

Something like (and I'm just making up PHP, since I don't particularly know the language):

    $guid = new_guid();
    $this->db->query(
        'UPDATE x__mx_request SET status = ? WHERE status = "pending";',
        $guid
    );
    $requests = db_all(
        $this->db->query('SELECT * FROM x__mx_request WHERE status = ?;', $guid)
    );
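For reference, here is a runnable sketch of the same claim-by-unique-value idea using PDO. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the real MySQL one so the snippet runs on its own, the table is reduced to id and status, and claim_pending_requests is a hypothetical helper name. The point it shows is that the single UPDATE does the claiming, so a second run finds nothing left to claim:

```php
<?php
// Demo setup (assumption): in-memory SQLite stands in for the real database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE x__mx_request (id INTEGER PRIMARY KEY, status TEXT NOT NULL)");
$pdo->exec("INSERT INTO x__mx_request (status) VALUES ('pending'), ('pending'), ('finished')");

/**
 * Mark every pending row with a token unique to this run,
 * then return exactly the rows carrying that token.
 */
function claim_pending_requests(PDO $pdo): array
{
    $token = 'run-' . bin2hex(random_bytes(16));   // unique per run

    // One UPDATE statement claims the rows; no SELECT happens before it.
    $claim = $pdo->prepare("UPDATE x__mx_request SET status = ? WHERE status = 'pending'");
    $claim->execute([$token]);

    $select = $pdo->prepare("SELECT * FROM x__mx_request WHERE status = ?");
    $select->execute([$token]);
    return $select->fetchAll(PDO::FETCH_ASSOC);
}

$first  = claim_pending_requests($pdo);   // gets both pending rows
$second = claim_pending_requests($pdo);   // nothing is pending any more

foreach ($first as $request) {
    // ... process the request, then mark it finished ...
    $done = $pdo->prepare("UPDATE x__mx_request SET status = 'finished' WHERE id = ?");
    $done->execute([$request['id']]);
}
```

Rows whose run dies half-way keep the token as their status, which is where the timeout or fallback mentioned above would come in.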

Another option is transactions - but I think you'd need SERIALIZABLE, which means you're basically stuck with only 1 job processing anyway. If you want to do that, a lock file for your cron job makes that easy to do without changing code.


0

Transactions are what you want. You have a single transaction that reads the data, changes the pending status to some intermediate value such as inprocess, and returns the data.

This prevents two queries from returning the same data.

Roughly, you want something like this pseudo-SQL:

    begin transaction
    select into tmp table * from blah where status = "pending"
    update status = "being processed" where id in tmp
    end transaction
    return select * from tmp table
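A rough PHP sketch of that transaction, with several assumptions spelled out: $pdo is a PDO connection to MySQL in exception error mode, the table uses InnoDB, and the SELECT uses FOR UPDATE (as raised in the comments) so that a second job running the same query waits until the first commits. The function name claim_pending_in_transaction is made up for illustration, and this has not been run against the asker's schema:

```php
<?php
/**
 * Claim pending rows inside one short transaction and return them.
 * Assumes MySQL/InnoDB and PDO::ATTR_ERRMODE set to PDO::ERRMODE_EXCEPTION.
 */
function claim_pending_in_transaction(PDO $pdo): array
{
    $pdo->beginTransaction();
    try {
        // FOR UPDATE locks the matching rows until commit or rollback.
        $rows = $pdo->query(
            "SELECT * FROM x__mx_request WHERE status = 'pending' FOR UPDATE"
        )->fetchAll(PDO::FETCH_ASSOC);

        if ($rows) {
            $ids = array_map('intval', array_column($rows, 'id'));
            $placeholders = implode(',', array_fill(0, count($ids), '?'));
            $update = $pdo->prepare(
                "UPDATE x__mx_request SET status = 'processing' WHERE id IN ($placeholders)"
            );
            $update->execute($ids);
        }

        $pdo->commit();   // locks are released here
        return $rows;     // process these after the transaction has ended
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
}
```

The transaction only covers the select and the status change. The actual processing happens after commit, which keeps the locks short-lived.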

4 Comments

I tried transactions before, but my database kept getting locked up (probably from errors in my code). Either way, I thought I'd try to avoid them, since I don't fully understand them and I think they reduce performance and prevent multi-threading.
They don't prevent multi-threading at all. In fact, in a single-thread, single-user model, you don't need transactions. As for performance, the cost is likely negligible. That the database kept getting locked suggests that either you have way too many cron jobs (I doubt it) or the transaction is wrongly scoped. See the updated answer in a moment for how to do it correctly.
You'd need a FOR UPDATE/UPDLOCK or a SERIALIZABLE transaction to prevent a 2nd job from running the same "SELECT INTO" statement. You basically do need to prevent multithreading between the SELECT and the UPDATE - otherwise, you risk multiple threads (cron jobs) picking up the same IDs. MSSQL has an OUTPUT clause that would help - but I don't think MySQL has an equivalent.
Ah, MySQL has somewhat odd transaction semantics. And no, preventing multithreading IS NOT the same thing as using a transaction. Transactions can be viewed very roughly as optimistic locking: you have multiple threads executing, and if something conflicts, then you get into a lock/abort.
