
Recently, whenever I try to archive a question using the Wayback Machine, Cloudflare's been blocking the attempt:

[Screenshot: a failed Wayback Machine archive page for a Space SE question, showing a "Website is not accessible via this address" error and a "Performance & security by Cloudflare" notice.]

Given that any negative impact on the Wayback Machine's ability to archive SE pages is officially considered a bug, I assume this blocking is accidental.

Not a dupe of this question, as the blocking behavior is completely different.

1 Answer


Thanks for the bug report; this should be fixed now. The archive requests triggered one of our anti-scraping bot rules, and I've adjusted that rule to exclude the Internet Archive.
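A minimal sketch of what "an anti-scraping rule that excludes the Internet Archive" could look like, written in Python for illustration. The actual rule is not shown in this thread, so everything here is hypothetical: the `Request` fields, the burst threshold, the `archive.org_bot` User-Agent marker, and AS7941 as the Internet Archive's network are assumptions. The exclusion requires both the User-Agent and the network to match, since a User-Agent string alone is easy to spoof.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; verify before relying on them.
ARCHIVE_UA_MARKER = "archive.org_bot"  # assumed substring of the crawler's User-Agent
ARCHIVE_ASNS = frozenset({7941})  # assumed to be the Internet Archive's network (ASN)
BURST_THRESHOLD = 60  # hypothetical: page fetches per minute that look like scraping


@dataclass(frozen=True)
class Request:
    """Hypothetical per-request facts a rule engine might see."""
    user_agent: str
    asn: int
    requests_last_minute: int


def looks_like_scraper(req: Request) -> bool:
    # Hypothetical heuristic: an unusually fast burst of page fetches.
    return req.requests_last_minute > BURST_THRESHOLD


def is_archive_crawler(req: Request) -> bool:
    # Require both signals so a spoofed User-Agent from another network is not exempt.
    return ARCHIVE_UA_MARKER in req.user_agent and req.asn in ARCHIVE_ASNS


def anti_scraping_rule_blocks(req: Request) -> bool:
    """Block suspected scrapers, except the Internet Archive's crawler."""
    return looks_like_scraper(req) and not is_archive_crawler(req)


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; archive.org_bot)"
    print(anti_scraping_rule_blocks(Request(ua, 7941, 120)))   # False: excluded from the rule
    # 64500 is an ASN reserved for documentation, standing in for "some other network".
    print(anti_scraping_rule_blocks(Request(ua, 64500, 120)))  # True: spoofed UA is still blocked
```

Note that this only exempts the crawler from this one rule; any other limits in front of the site still apply to it.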

  • Can confirm that Wayback Machine archiving is working now. Much appreciated! Commented May 8 at 23:11
  • I'm still getting a lot of "Error! Job failed." responses when the Wayback Machine attempts to archive outlinks that are questions, tags, or users on SO. Any idea why? And on SE in general I get the error "We’re currently facing some limitations when it comes to archiving this site. We apologize for any inconvenience this might cause and appreciate your understanding. Please email us at "[email protected]" if you would like to discuss this more." Commented May 13 at 23:57
  • @Starship I'd need more details like a RayID in order to troubleshoot the errors you're getting. Commented May 14 at 20:34
  • @JoshZhang How does one find RayID? Commented May 14 at 20:34
  • @Starship sorry, I always call it RayID, but it's actually CF-Ray; see meta.stackexchange.com/a/403462/784098. In the screenshot above, you can see the RayID at the bottom. Commented May 14 at 21:17
  • @JoshZhang I don't see the RayID. Here's an example of what happens when the Wayback Machine can complete the archive but archives the CAPTCHA page instead, and I don't see a RayID there either. And I have no idea how I'd look at the bottom of a page that the Wayback Machine was never able to visit (and hence never recorded for me to see). Commented May 14 at 22:33
  • @Starship it's hard to say for sure, but it's possible their bot hit the main site rate limiter. Without a RayID, I can't confirm it. Commented May 15 at 0:04
  • @JoshZhang I thought the Wayback Machine was exempted from the site rate-limiter? Commented May 20 at 23:24
  • @Vikki no traffic is exempt from ALL rate limiters; otherwise people could abuse the archive function en masse to bring the site down. The general site rate limiter applies to all traffic regardless of source. Commented May 21 at 13:11
  • @JoshZhang: That wouldn't be possible; the Wayback Machine's archiver won't hit the same page again until an hour has passed since its last visit. Commented May 23 at 1:56
  • @Vikki I just audited a week's worth of traffic from Archive.org: no legitimate request was blocked or rate limited, which means it never hit the general site rate limiter or any rule. As a rule, we don't poke holes in the base rate limiter that keeps the app from crashing. Commented May 23 at 18:19
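The distinction drawn in this thread, between excluding the archiver from a specific anti-scraping rule and keeping it subject to the base rate limiter, can be sketched as a per-client token bucket with no exemption list. This is a hypothetical Python illustration, not Stack Exchange's or Cloudflare's implementation; `BaseRateLimiter`, the client keys, and the capacity and refill values are all assumptions.

```python
import time
from typing import Dict, Optional, Tuple


class BaseRateLimiter:
    """Hypothetical per-client token bucket that every request passes through.

    There is deliberately no exemption list: the archiver is rate limited like
    any other client. capacity and refill_per_sec are illustrative values.
    """

    def __init__(self, capacity: float = 10.0, refill_per_sec: float = 1.0) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        # client key -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_key: str, now: Optional[float] = None) -> bool:
        """Spend one token for client_key if available; return whether the request passes."""
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(client_key, (self.capacity, now))
        # Refill in proportion to the time elapsed, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1.0:
            self._buckets[client_key] = (tokens - 1.0, now)
            return True
        self._buckets[client_key] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = BaseRateLimiter(capacity=3, refill_per_sec=1.0)
    # Five requests at the same instant from one client (e.g. a crawler):
    # the first three pass and the rest are limited, whatever the source.
    print([limiter.allow("archive-crawler", now=0.0) for _ in range(5)])
```

Because this bucket is keyed by client rather than by URL, a crawler that visits each page at most once an hour could still exhaust its budget by fetching many different pages in a short window.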
