
I have the following code in jQuery. It reads the "publish" address of an Nginx subscribe/publish pair, set up using Nginx's long-polling module.

    function requestNextBroadcast() {
        // never stops - every reply triggers the next,
        // and silent errors restart via the long timeout.
        getxhr = $.ajax({
            url: "/activity",
            // dataType: 'json',
            data: "id=" + channel,
            timeout: 46000, // must be longer than max heartbeat, to only trigger after a silent error
            error: function(jqXHR, textStatus, errorThrown) {
                alert("Background failed " + textStatus); // should never happen
                getxhr.abort();
                requestNextBroadcast(); // try again
            },
            success: function(reply, textStatus, jqXHR) {
                handleRequest(reply); // this is the normal result
                requestNextBroadcast();
            }
        });
    }

The code is part of a chat room. Every message sent receives a null (200/OK) reply, but the data is published. This is the code that reads the subscribe address as the data comes back.

Using a timeout, every person in the chat room sends a simple message every 30 to 40 seconds, even if they don't type anything, so there is plenty of data for this code to read: at least two, and possibly more, messages per 40 seconds.
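The idle heartbeat described above can be expressed as a jittered timer. This is a minimal sketch, not the question's actual code; `sendHeartbeat` is a hypothetical placeholder for the real message send:

```javascript
// Jittered heartbeat: fire somewhere between 30 and 40 seconds from now,
// so idle clients still produce traffic for the long poll to deliver.
function nextHeartbeatDelay() {
    return 30000 + 10000 * Math.random(); // 30000..40000 ms
}

// sendHeartbeat is a hypothetical placeholder for the real message send.
function scheduleHeartbeat(sendHeartbeat) {
    return setTimeout(sendHeartbeat, nextHeartbeatDelay());
}
```

The random jitter also keeps a room full of clients from heartbeating in lockstep.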

The code is 100% rock solid in IE and Firefox. But about one read in 5 fails in Chrome.

When Chrome fails, it is with the 46-second timeout.

The log shows one /activity network request outstanding at any one time.

I've been crawling over this code for 3 days now, trying various ideas. Every time, IE and Firefox work fine and Chrome fails.

One suggestion I have seen is to make the call synchronous, but that is clearly impossible because it would lock up the user interface for too long.

Edit: I have a partial solution. The code is now this:

    function requestNextBroadcast() {
        // never stops - every reply triggers the next,
        // and silent errors restart via the long timeout.
        getxhr = jQuery.ajax({
            url: "/activity",
            // dataType: 'json',
            data: "id=" + channel,
            timeout: <?php echo $delay; ?>,
            error: function(jqXHR, textStatus, errorThrown) {
                window.status = "GET error " + textStatus;
                setTimeout(requestNextBroadcast, 20); // try again
            },
            success: function(reply, textStatus, jqXHR) {
                handleRequest(reply); // this is the normal result
                setTimeout(requestNextBroadcast, 20);
            }
        });
    }

The result is that sometimes the reply is delayed until the $delay (15000 ms) expires; then the queued messages arrive too quickly to follow. With this new arrangement I have been unable to make it drop messages (only tested with network optimisation off).

I very much doubt that the delays are due to networking problems: all the machines are VMs within my one real machine, and there are no other users of my local LAN.

Edit 2 (Friday 2:30 BST): Changed the code to use promises, and the POST of actions started to show the same symptoms, but the receive side started to work fine! (????!!!???). This is the POST routine; it handles a sequence of requests, to ensure only one is outstanding at a time.

    function issuePostNow() {
        // reset heartbeat to drop out and send setTyping(false) in 30 to 40 seconds
        clearTimeout(dropoutat);
        dropoutat = setTimeout(function() { sendTyping(false); },
                               30000 + 10000 * Math.random());
        // and do the send
        var url = "handlechat.php?";
        if (postQueue.length > 0) {
            postData = postQueue[0];
            var postxhr = jQuery.ajax({
                type: 'POST',
                url: url,
                data: postData,
                timeout: 5000
            });
            postxhr.done(function(txt) {
                postQueue.shift(); // remove this task
                if ((txt != null) && (txt.length > 0)) {
                    alert("Error: unexpected post reply of: " + txt);
                }
                issuePostNow();
            });
            postxhr.fail(function() {
                alert(window.status = "POST error " + postxhr.statusText);
                issuePostNow();
            });
        }
    }

About one action in 8, the call to handlechat.php will time out and the alert appears. Once the alert has been OKed, all the queued-up messages arrive.
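The one-at-a-time behaviour that issuePostNow implements can be sketched without jQuery, using plain promises. This is an illustrative sketch, not the question's code; `sendToServer` is a hypothetical stand-in for the POST to handlechat.php:

```javascript
// Minimal sketch of a serialised request queue: each task starts only after
// the previous one settles, so at most one request is ever outstanding.
const postQueue = [];
let draining = false;

// sendToServer(data) is a hypothetical function returning a promise,
// standing in for the POST to handlechat.php.
function enqueuePost(postData, sendToServer) {
    return new Promise(function (resolve, reject) {
        postQueue.push({ postData: postData, resolve: resolve, reject: reject });
        drainQueue(sendToServer);
    });
}

function drainQueue(sendToServer) {
    if (draining || postQueue.length === 0) return;
    draining = true;
    const task = postQueue.shift();
    sendToServer(task.postData)
        .then(task.resolve, task.reject)
        .then(function () {
            draining = false;
            drainQueue(sendToServer); // start the next queued request, if any
        });
}
```

The `draining` flag plays the same role as the recursive issuePostNow call: whether a request succeeds or fails, the next one is only issued once the current one has settled.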

I also noticed that the handlechat call stalled before it wrote the message that others would see. I wonder whether it could be some strange handling of session data by PHP. I know PHP carefully serialises calls so that session data is not corrupted, so I have been careful to use different browsers or different machines. There are only 2 PHP worker threads; however, PHP is NOT used in the handling of /activity or in serving static content.

I have also thought it might be a shortage of Nginx workers or PHP processors, so I have raised both. It is now more difficult to get things to fail, but still possible. My guess is that the /activity call now fails one time in 30, and does not drop messages at all.

And thanks guys for your input.


Summary of findings.

1) It is a bug in Chrome that has been there for a while.
2) With luck the bug can be made to appear as a POST that is not sent; when it times out, it leaves Chrome in such a state that a repeat POST will succeed.
3) The variable used to store the return from $.ajax() can be local or global. Both the new (promises) format and the old format calls trigger the bug.
4) I have not found a work-around or a way to avoid the bug.

Ian

  • If you suspect it's a Chrome issue, it might help others to know what version you're using. jQuery version too. Commented Jul 2, 2012 at 15:03
  • Chrome 20.0.1132.47 m, jQuery 1.7.2, 64-bit Windows on the client, and Ubuntu 11.04 on the server. Commented Jul 2, 2012 at 15:08
  • I've been testing this in Safari (5.1.7) and found it works. So it is not a WebKit problem. I also tried Version 20.0.1132.47 of Chrome for Linux, and that has the problem. Commented Jul 2, 2012 at 15:38
  • I have altered the timeout to 10 seconds and removed the alert, so that if a read fails to find any data it times out and is re-issued. Now the logs show lots of timed-out reads, but all the data gets through! It appears that the clean-up in Chrome after a timeout/abort sets things right, while reusing the connection after success can sometimes fail. It's not a pretty solution, and not performant, but perhaps necessary. Commented Jul 3, 2012 at 8:53
  • Spoke too soon. It can STILL lose messages. This is noticeable when a send is NOT followed shortly by a completion of the read. I tried to detect the missing reply and abort/reissue the read. What a complete mess that caused! Multiple reads outstanding, and sends failing repeatedly; then suddenly it would all clear in a slew of background reads, leaving two outstanding. It appears .abort() doesn't always abort the request. :( Commented Jul 4, 2012 at 10:35

5 Answers


I had a very similar issue with Chrome. I am making an Ajax call to get the time from a server every second. Obviously the Ajax call must be asynchronous, because it would freeze up the interface on a timeout if it weren't. But once one of the Ajax calls fails, each subsequent one fails as well. I first tried setting the timeout to 100 ms, and that worked well in IE and FF, but not in Chrome. My best solution was setting the type to POST, and that solved the bug with Chrome for me:

    setInterval(function() {
        $.ajax({
            url: 'getTime.php',
            type: 'POST',
            async: true,
            timeout: 100,
            success: function() { console.log("success"); },
            error: function() { console.log("error"); }
        });
    }, 1000);

Update: I believe the actual underlying problem here is Chrome's caching. It seems that when one request fails, the failure is cached, and subsequent requests are never made because Chrome serves the cached failure instead of initiating them. You can see this in Chrome's developer tools: open the Network tab and examine each request being made. Before a failure, Ajax requests to getTime.php are made every second; after one failure, subsequent requests are never initiated. Therefore, the following solution worked for me:

    setInterval(function() {
        $.ajax({
            url: 'getTime.php',
            cache: false,
            async: true,
            timeout: 100,
            success: function() { console.log("success"); },
            error: function() { console.log("error"); }
        });
    }, 1000);

The change here is that I am disabling caching for this Ajax query. To do so, the type option must be either GET or HEAD, which is why I removed type: 'POST' (GET is the default).
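Mechanically, jQuery's cache: false works by appending a throw-away `_=<timestamp>` parameter to the URL, so each request looks unique to the cache. A minimal sketch of the same idea (the parameter name `_` matches jQuery's convention; the helper itself is illustrative, not jQuery's code):

```javascript
// Append a cache-busting parameter so the browser cannot serve the
// request from cache. jQuery's cache:false does essentially this
// for GET/HEAD requests.
function bustCache(url) {
    var sep = url.indexOf("?") === -1 ? "?" : "&";
    return url + sep + "_=" + Date.now();
}
```

For example, `bustCache('getTime.php')` yields something like `getTime.php?_=1341324183000`, which the cache treats as a brand-new resource.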




Try moving your polling function into a web worker to prevent freezing up in Chrome. Otherwise you could try using the Ajax .done() of the jQuery object; that one always works for me in Chrome.

2 Comments

Thanks Michael. A web worker is impractical (I must support IE) and unnecessary, because nothing I do takes long enough to notice. The problem is not that success() is not being called; it's that the read does not finish EVEN WHEN THERE IS KNOWN DATA TO BE READ. In short, success might be called as much as 8 seconds after the data became available. I'll check out done() though, and let you know.
What I was thinking is that you do a browser check; if it's Chrome, activate a web worker which does a synchronous call. That way you might have a fix for Chrome, since only Chrome gives issues. Gotta love browser differences. I hope the .done solution works for you; that's the easiest to implement.

I feel like getxhr should be prefixed with "var". Don't you want a completely separate & new request each time rather than overwriting the old one in the middle of success/failure handling? Could explain why the behavior "improves" when you add the setTimeout. I could also be missing something ;)

2 Comments

getxhr is global, and I don't over-write it until I'm done with it, so while I could make it local, I don't see what difference it would make. I did wonder whether Chrome was queueing up some clean-up operation that wasn't getting done before the re-send.
So you say, but your code says you're over-writing it every time requestNextBroadcast is called. That might be OK; after all, I rarely even use the return value from $.ajax.

Comments won't format code, so reposting as a 2nd answer:

I think Michael Dibbets is on to something with $.ajax.done: the Deferred pattern pushes processing to the next turn of the event loop, which I think is the behaviour that's needed here. See: http://www.bitstorm.org/weblog/2012-1/Deferred_and_promise_in_jQuery.html or http://joseoncode.com/2011/09/26/a-walkthrough-jquery-deferred-and-promise/

I'd try something like:

    function requestNextBroadcast() {
        // never stops - every reply triggers the next,
        // and silent errors restart via the long timeout.
        getxhr = jQuery.ajax({
            url: "/activity",
            // dataType: 'json',
            data: "id=" + channel,
            timeout: <?php echo $delay; ?>
        });
        getxhr.done(function(reply) {
            handleRequest(reply);
        });
        getxhr.fail(function(e) {
            window.status = "GET error " + e;
        });
        getxhr.always(function() {
            requestNextBroadcast();
        });
    }

Note: I'm having a hard time finding documentation on the callback arguments for Promise.done and Promise.fail :(
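For readers more familiar with standard promises: the done/fail/always trio maps roughly onto then/catch/finally. A hedged sketch of the same polling shape, where `fetchActivity`, `handleRequest`, and `reschedule` are hypothetical stand-ins for the jQuery call, the reply handler, and the re-issue step:

```javascript
// done/fail/always in jQuery Deferred terms correspond roughly to
// then/catch/finally on standard promises. All three parameters are
// hypothetical stand-ins, not names from the question's code.
function poll(fetchActivity, handleRequest, reschedule) {
    return fetchActivity()
        .then(function (reply) { handleRequest(reply); })    // like .done()
        .catch(function (e) { /* like .fail(): note and carry on */ })
        .finally(function () { reschedule(); });             // like .always()
}
```

The important property is the same as in the answer's code: reschedule runs exactly once per attempt, whether the request succeeded or failed.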

4 Comments

Why don't you daisy-chain your events? That way all the callback events are injected without delay. As it stands, you run the risk that the Ajax fetch completes before a completion handler is assigned, and the processes mismatch.
I know... but after having seen the internet grow since IE4 and Netscape, nothing surprises me any more... I've seen the most impossible things work (for example IE and the zoom:1 thing). If it doesn't work one way, another way it might, and sometimes you have to do the impossible thing to achieve something... Browser differences, who doesn't love them.
Plus, it might be "unlikely", but remember that JavaScript is a single-threaded process without web workers, and all other execution gets delayed when something else jumps up front (an image slider, a loading bar appearing, etc.). That way your getxhr.xxxx functions might get delayed because triggers fire and other events get more priority... With a daisy-chain you make sure everything in the chain is handled and set before the document thinks it can execute the next item on the stack.
See edit. The POSTs are daisy-chained so they don't overlap. The reads can't overlap, and messages will wait for up to 60 seconds while we handle the front of the queue, so daisy-chaining is not necessary.

Perhaps it can be worked around by changing the push module settings (there are a few). Could you please post them?

From the top of my head:

  • setting it to interval polling would solve it, though inelegantly
  • the concurrency settings might have some effect
  • message storage might be used to avoid missing data

I would also use something like Charles to see what exactly does happen on the network/application layers
