I'm stumped!

I have a rake task which is cron'd to run every minute.

It logs in and finds the JSON that I'm interested in, but it can take up to 30 runs of the task before any change in the JSON is noticed by the rake task. In the meantime I've missed several changes to certain JSON objects.

It seems like there's some caching going on. I've tried to turn off Mechanize caching as shown, but I'm not sure what else I can try now.

Any pointers?

Thanks in advance.

    agent = Mechanize.new # {|a| a.log = Logger.new(STDERR) }
    agent.history.clear
    agent.max_history = 0
    agent.user_agent_alias = 'Mac Safari'
    page = agent.get 'http://website.com'
    form = page.forms.first
    form.email = '[email protected]'
    form.password = 'mypassword'
    page = agent.submit form
    page = agent.get 'http://website.com/password_protected_page'
    jsonDirty = page.search '//script[@type="application/json"]'

Response from server:

{"server"=>"nginx", "date"=>"Thu, 13 Sep 2012 14:16:43 GMT", "content-type"=>"text/html; charset=utf-8", "connection"=>"close", "vary"=>"Cookie", "content-language"=>"plfplen", "set-cookie"=>"csrftoken=pVDg2SJ4KHqONz2OiEkNK7IbKlnJSQQf; expires=Thu, 12-Sep-2013 14:16:43 GMT; Max-Age=31449600; Path=/, affiliate=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/, one-click-join=; expires=Thu,01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/", "expires"=>"Thu, 01 Jan 1970 00:00:01 GMT", "cache-control"=>"no-cache", "content-encoding"=>"gzip", "transfer-encoding"=>"chunked"} 
  • Do you have access to the server logs? If not, would you be able to output the server response headers to the logs? Commented Sep 11, 2012 at 19:56
  • Thanks Bryce, I've added the response from the server. Commented Sep 13, 2012 at 14:29
  • Have you verified that the result really changes more frequently? You could try submitting the form using curl and comparing the JavaScript. Commented Sep 15, 2012 at 15:12
  • I'll knock up a program to do exactly that, and will run it on my laptop rather than the server to be sure. Commented Sep 16, 2012 at 9:02
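
One way to carry out the comparison suggested in the comments is to fingerprint the response bodies returned by each client and see whether they differ. The following is only a minimal sketch using Ruby's standard library; `body_fingerprint` and `same_payload?` are hypothetical helper names, and how the two bodies are obtained (curl, Mechanize, or anything else) is left as an assumption.

```ruby
require 'digest'

# Hypothetical helper: returns a short, stable fingerprint of a response body.
def body_fingerprint(body)
  Digest::SHA256.hexdigest(body)[0, 12]
end

# Hypothetical helper: true when two bodies (for example, one saved from curl
# and one taken from Mechanize's page.body) are byte-for-byte identical.
def same_payload?(body_a, body_b)
  body_fingerprint(body_a) == body_fingerprint(body_b)
end

# Example with stand-in strings; in practice these would be the fetched bodies.
from_curl      = '{"price": 10}'
from_mechanize = '{"price": 12}'
puts "#{Time.now} curl=#{body_fingerprint(from_curl)} " \
     "mechanize=#{body_fingerprint(from_mechanize)} " \
     "same=#{same_payload?(from_curl, from_mechanize)}"
```

Logging one such line per cron run would show whether the two clients diverge, or whether both see the same stale content.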

1 Answer

You could try appending a random query parameter to the URL. Such as:

page = agent.get "http://website.com/password_protected_page?random=#{Time.now.to_i}" 
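
If the real URL already carries a query string, plain string interpolation would produce a malformed URL. A small sketch of the same idea with the standard `uri` library is shown here; `cache_busted_url` and `NO_CACHE_HEADERS` are hypothetical names introduced for illustration, and whether any of this helps depends on where the staleness actually comes from.

```ruby
require 'uri'

# Hypothetical helper: appends a "random" parameter (the current Unix time)
# while preserving any query parameters already present in the URL.
def cache_busted_url(url, now: Time.now)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query || '')
  params << ['random', now.to_i.to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Request headers asking intermediaries not to serve a cached copy.
# Assumption: these would be passed to the HTTP client in use, for example
# via Mechanize's request_headers accessor.
NO_CACHE_HEADERS = {
  'Cache-Control' => 'no-cache',
  'Pragma'        => 'no-cache'
}.freeze

puts cache_busted_url('http://website.com/password_protected_page')
```

With Mechanize this might be used as `agent.request_headers = NO_CACHE_HEADERS` followed by `page = agent.get cache_busted_url('http://website.com/password_protected_page')`.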

2 Comments

Thanks Christian, I'll give that a go and let you know.
Unfortunately, it didn't make a difference; there's still the same long delay.