
I'm browsing an infinite-scroll page with Puppeteer, but the page is extremely long. The problem is that Puppeteer's memory usage keeps growing until it eventually crashes. I was wondering if there is a nice way to free up memory during the scroll.

For example, would it be possible to pause every minute, remove the HTML that has been loaded so far, and write it to disk? That way, after I'm done scrolling, I'd have all the HTML in a file and could easily work with it. Is it possible to do that? If yes, how? If not, what would be a viable solution?

1 Answer


I would wager that the approach you outline would work. The trick is to remove nodes only from the list that is being added to. The implementation might look something like this:

    await page.addScriptTag({ url: "https://code.jquery.com/jquery-3.2.1.min.js" });

    const scrapedData = [];
    while (true) {
      const newData = await page.evaluate(async () => {
        const listElm = $(".some-list");

        // Extract the data from the items currently in the list.
        const tempData = listElm.toArray().map(elm => {
          // Get data...
        });

        // Detach the already-scraped items from the DOM to free memory.
        listElm
          .children()
          .slice(20)
          .remove();

        // TODO: Scroll and wait for new content...

        return tempData;
      });

      scrapedData.push(...newData);

      if (someCondition) {
        break;
      }
    }

