This repository was archived by the owner on Sep 28, 2022. It is now read-only.

Fixed unfiltered duplicates bug, removed dont_filter #16

Open
flagist0 wants to merge 1 commit into brandicted:master from flagist0:master

@flagist0

The middleware was emitting requests with dont_filter=True, so they bypassed the dupefilter and many duplicates went uncaught.

dont_filter is not needed in itself, but it was protecting the request queue from exhaustion: the middleware emits one request at a time, so there is only ever one request in the Scrapy queue. If that request is a duplicate and the dupefilter drops it, the Scrapy request queue becomes empty and the spider is closed, even if many requests remain in the middleware's queue.

The solution is to catch the spider_idle signal and supply the next request from the middleware's queue.
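
The failure mode and the fix can be illustrated with a small toy model. This is only a sketch in plain Python, not Scrapy code and not the code in this PR: `simulate`, `emit_next` and `handle_idle` are hypothetical names, the "engine" is a plain deque, and the dupefilter is a set of URLs.

```python
from collections import deque


def simulate(urls, handle_idle):
    """Toy model of a middleware that feeds an engine one request at a time.

    All names here are hypothetical; this is not Scrapy's API.
    """
    pending = deque(urls)   # the middleware's own queue
    scheduled = deque()     # the engine's request queue
    seen = set()            # stand-in for the dupefilter
    processed = []

    def emit_next():
        """Move one request from the middleware queue to the engine.

        Returns False only when the middleware queue is empty.
        """
        if not pending:
            return False
        url = pending.popleft()
        if url not in seen:  # duplicates are dropped silently
            seen.add(url)
            scheduled.append(url)
        return True

    emit_next()
    while True:
        if scheduled:
            processed.append(scheduled.popleft())
            emit_next()  # the next request is emitted once this one is done
        elif handle_idle and emit_next():
            continue     # the idle handler supplied the next request
        else:
            break        # engine is idle and nothing was supplied: spider closes
    return processed


urls = ["a", "b", "a", "c", "d"]
print(simulate(urls, handle_idle=False))  # ['a', 'b']
print(simulate(urls, handle_idle=True))   # ['a', 'b', 'c', 'd']
```

Without the idle handler, the dropped duplicate "a" leaves the engine queue empty and the run ends with "c" and "d" still pending. In Scrapy itself, the equivalent hook would presumably be a handler connected to `scrapy.signals.spider_idle` that schedules the next request and raises `scrapy.exceptions.DontCloseSpider` to keep the spider open.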
