Cloudflare blocking Wayback Machine from archiving Stack Exchange questions

Question

Recently, whenever I try to archive a question using the Wayback Machine, Cloudflare's been blocking the attempt:

Given that any negative impact on the Wayback Machine's ability to archive SE pages is officially considered a bug, I assume this blocking is accidental.

Not a dupe of this question, as the blocking behavior is completely different.

Josh Zhang · Accepted Answer · 2025-05-08 13:13:01Z

21

Thanks for the bug report; this should be fixed now. The archiving requests triggered one of the anti-scraping bot rules, and I've adjusted that rule to exclude the Internet Archive.

Can confirm that Wayback Machine archiving is working now. Much appreciated!

– Vikki, 2025-05-08 23:11:18 +00:00

I'm still getting a lot of "Error! Job failed." messages when the Wayback Machine attempts to archive outlinks that are questions, tags, or users on SO. Any idea why? And on SE in general I get the error "We’re currently facing some limitations when it comes to archiving this site. We apologize for any inconvenience this might cause and appreciate your understanding. Please email us at "[email protected]" if you would like to discuss this more."

– Starship, 2025-05-13 23:57:38 +00:00

@Starship I'd need more details, like a RayID, in order to troubleshoot the errors you're getting.

– Josh Zhang (Staff, Mod), 2025-05-14 20:34:00 +00:00

@JoshZhang How does one find the RayID?

– Starship, 2025-05-14 20:34:26 +00:00

@Starship Sorry, I always call it RayID, but it's actually CF-Ray; see meta.stackexchange.com/a/403462/784098. In the screenshot above, you can see the RayID at the bottom.

– Josh Zhang (Staff, Mod), 2025-05-14 21:17:07 +00:00

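For readers who also want to know where the identifier lives: besides appearing at the bottom of the block page, Cloudflare sends it as the `CF-Ray` response header. The following is a minimal sketch of reading that header in Python, not an official tool. The function names and the `User-Agent` string are made up for illustration, and the value identifies the request made by whoever runs the code, not a request made by the Wayback Machine's crawler.

```python
import urllib.error
import urllib.request
from typing import Mapping, Optional, Tuple


def parse_cf_ray(headers: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (ray_id, data_center_code) from a CF-Ray header, or None if absent."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "cf-ray":
            ray_id, _, data_center = value.strip().partition("-")
            return ray_id, data_center
    return None


def fetch_cf_ray(url: str) -> Optional[Tuple[str, str]]:
    """Request `url` and read CF-Ray from the response (needs network access)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "cf-ray-check/0.1"}  # hypothetical UA string
    )
    try:
        with urllib.request.urlopen(request) as response:
            headers = response.headers
    except urllib.error.HTTPError as err:
        # Error responses such as a 403 block page still carry response headers.
        headers = err.headers
    return parse_cf_ray(dict(headers.items()))
```

Calling `fetch_cf_ray` on a page URL returns the ID and data-center suffix if the site is served through Cloudflare, or `None` otherwise.
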
@JoshZhang I don't see the RayID. Here's an example of what happens when the Wayback Machine completes the archive but captures the CAPTCHA page instead; I don't see a RayID there. And I have no idea how I'd look at the bottom of a page that the Wayback Machine was never able to visit (and hence never recorded for me to see).

– Starship, 2025-05-14 22:33:20 +00:00

@Starship It's hard to say for sure, but it's possible their bot hit the main site rate limiter; without a RayID I can't tell.

– Josh Zhang (Staff, Mod), 2025-05-15 00:04:13 +00:00

@JoshZhang I thought the Wayback Machine was exempted from the site rate limiter?

– Vikki, 2025-05-20 23:24:53 +00:00

@Vikki No traffic is exempt from ALL rate limiters; otherwise people could abuse the archive function en masse to bring the site down. The general site rate limiter applies to all traffic regardless of source.

– Josh Zhang (Staff, Mod), 2025-05-21 13:11:23 +00:00

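The distinction drawn here, an allowlist that skips a bot rule but never the base rate limiter, can be sketched as follows. This is only an illustration under assumed names and thresholds (`Gatekeeper`, `bot_rule_allowlist`, the sliding-window limit); it is not Stack Exchange's or Cloudflare's actual configuration.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set


class Gatekeeper:
    """Toy request filter: the allowlist skips the bot rule, never the rate limiter."""

    def __init__(self, limit: int, window_seconds: float,
                 bot_rule_allowlist: Set[str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit                  # max requests per client per window
        self.window = window_seconds
        self.allowlist = bot_rule_allowlist
        self.clock = clock                  # injectable so the limiter can be tested
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def _rate_limited(self, client: str) -> bool:
        now = self.clock()
        recent = self.hits[client]
        # Forget requests that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return True
        recent.append(now)
        return False

    def check(self, client: str, looks_like_scraper: bool) -> str:
        # The base rate limiter runs first and applies to every client.
        if self._rate_limited(client):
            return "rate_limited"
        # Only the bot rule consults the allowlist.
        if looks_like_scraper and client not in self.allowlist:
            return "blocked_by_bot_rule"
        return "allowed"
```

In this sketch an allowlisted client that matches the scraper heuristic is let through, but it still gets `"rate_limited"` once it exceeds `limit` requests within the window.
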
@JoshZhang: That wouldn't be possible; the Wayback Machine's archiver won't hit the same page again until an hour has passed since its last visit.

– Vikki, 2025-05-23 01:56:01 +00:00

@Vikki I just audited a week's worth of traffic from Archive.org. No legitimate request was blocked or rate limited, which means that traffic never hit the general site rate limiter or any other rule. As a rule, we don't poke holes in the base rate limiter that keeps the app from crashing.

– Josh Zhang (Staff, Mod), 2025-05-23 18:19:30 +00:00
