I'm trying to scrape a page with business listings and get the title, location and website. The issue is that some of these businesses don't have a website. I'm currently using an array of arrays to store the data:

[ [websites], [titles], [locations] ] 

When exporting the output to Excel, I want the websites column to contain the URL for businesses that have a website and a blank value for those that don't. In other words, I want to have something like this:

Websites | Titles | Locations
Website A | Title A | Location A
(blank because it doesn't have a website) | Title B | Location B
Website C | Title C | Location C
... | ... | ...

The code I've written so far is the following:

const puppeteer = require("puppeteer");
const xlsx = require("xlsx");

async function main() {
  try {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    // url is assumed to be defined elsewhere
    await page.goto(url, { waitUntil: "networkidle0" });

    const businessesPosts = await page.$$eval("[class^='AdvItemBox']", allPosts =>
      allPosts.map(post => [
        // throws error "Cannot read property 'href' of null"
        post.querySelector(".siteLink.urlClickLoggingClass").href != null
          ? post.querySelector(".siteLink.urlClickLoggingClass").href
          : " ",
        post.querySelector("[class^='CompanyName']").innerText, // get the title
        post.querySelector("[class^='AdvAddress']").innerText, // get the location
      ])
    );

    // write the rows to an Excel workbook
    const wb = xlsx.utils.book_new();
    const ws = xlsx.utils.aoa_to_sheet(businessesPosts);
    xlsx.utils.book_append_sheet(wb, ws);
    xlsx.writeFile(wb, "posts.xlsx");

    await browser.close();
  } catch (e) {
    console.log("error", e);
  }
}

main();

Here's the HTML of the website link element:

<a class="siteLink urlClickLoggingClass" target="_blank" product="AdvListing" productid="2419662++1926511++1" href="http://www.test.com"> 

Apparently something goes wrong when I put the condition inside the array.

Any help would be much appreciated!

1 Answer

Instead of:

 post.querySelector(".siteLink.urlClickLoggingClass").href != null ? post.querySelector(".siteLink.urlClickLoggingClass").href : " ", 

try:

 post.querySelector(".siteLink.urlClickLoggingClass")?.href ?? " ", 

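See: optional chaining (?.) and the nullish coalescing operator (??). When a listing has no website link, querySelector returns null, so reading .href on it throws before the != null comparison is ever evaluated. With ?.href the expression yields undefined instead of throwing, and ?? " " then substitutes the blank value.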
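To see the behaviour in isolation, here is a minimal sketch that runs in plain Node without a browser. The makePost helper and the websiteOf function are hypothetical stand-ins invented for this illustration: makePost mimics an element whose querySelector returns null when the listing has no website link.

```javascript
// Hypothetical stand-in for a listing element: querySelector returns
// null when the listing has no website link, like the real DOM method.
function makePost(href) {
  return {
    querySelector: (selector) => (href === null ? null : { href }),
  };
}

// Hypothetical helper wrapping the expression from the answer.
function websiteOf(post) {
  // ?. short-circuits to undefined when querySelector returns null;
  // ?? then replaces that undefined with the blank placeholder.
  return post.querySelector(".siteLink.urlClickLoggingClass")?.href ?? " ";
}

const rows = [makePost("http://www.test.com"), makePost(null)].map(websiteOf);
console.log(rows);
```

Note that the callback passed to page.$$eval runs inside the page, so the browser driven by Puppeteer has to support both operators; this sketch assumes a reasonably recent Chromium and Node 14 or later.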

1 Comment

Many thanks!! Looks like it solved the issue :)
