
I'm browsing an infinite-scroll page with Puppeteer, but the page is extremely long. The problem is that Puppeteer's memory usage keeps growing until it eventually crashes. I was wondering if there is a nice way to free up memory during the scroll.

For example, would it be possible to pause every minute, remove the HTML that has been loaded so far, and write it to disk? That way, after I'm done scrolling, I'd have all the HTML in a file and could easily work with it. Is it possible to do that? If yes, how? If not, what would be a viable solution?

1 Answer


I would wager that the approach you outline would work. The trick is to remove nodes only from the list that is being added to. The implementation might look something like this:

    await page.addScriptTag({ url: "https://code.jquery.com/jquery-3.2.1.min.js" });

    const scrapedData = [];
    while (true) {
      const newData = await page.evaluate(async () => {
        const listElm = $(".some-list");

        // Extract the data from the items currently in the list.
        const tempData = listElm.toArray().map(elm => {
          // Get data...
        });

        // Detach the already-scraped items from the DOM to free memory.
        listElm
          .children()
          .slice(20)
          .remove();

        // TODO: Scroll and wait for new content...

        return tempData;
      });

      scrapedData.push(...newData);

      if (someCondition) {
        break;
      }
    }

