I'm trying to scrape a page with business listings and get the title, location and website of each one. The issue is that some of these businesses don't have a website. I'm currently using an array of arrays to store the data:

    [ [websites], [titles], [locations] ]

When exporting the output to Excel, I want the websites array to hold a blank value when no website is listed, and the URL when there is one. In other words, I want to have something like this:

| Websites | Titles | Locations |
|---|---|---|
| Website A | Title A | Location A |
| (blank because it doesn't have a website) | Title B | Location B |
| Website C | Title C | Location C |
| ... | ... | ... |
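
To make the target concrete, here is a minimal sketch of the row-per-business shape I'd expect to hand to `xlsx.utils.aoa_to_sheet`. The `listings` data and the `toRows` helper are made up for illustration, and the values are placeholders rather than real scraped results:

```javascript
// Hypothetical scraped records; placeholder values, not real listings.
const listings = [
  { website: "http://www.test.com", title: "Title A", location: "Location A" },
  { website: null, title: "Title B", location: "Location B" }, // no website listed
  { website: "http://www.example.com", title: "Title C", location: "Location C" },
];

// Build a header row plus one row per business,
// substituting an empty string when the website is missing.
function toRows(records) {
  const header = ["Websites", "Titles", "Locations"];
  const body = records.map(r => [
    r.website != null ? r.website : "",
    r.title,
    r.location,
  ]);
  return [header, ...body];
}

const rows = toRows(listings);
console.log(rows);
```

Each inner array is one spreadsheet row, so the blank website still occupies the first column and the title and location stay aligned.
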
The code I've written so far is the following:

    async function main(){
        try{
            const browser = await puppeteer.launch({"headless":false});
            const page = await browser.newPage();
            await page.goto(url), { waitUntil: 'networkidle0' };

            const businessesPosts = await page.$$eval("[class^='AdvItemBox']", allPosts => allPosts.map(post => [
                post.querySelector(".siteLink.urlClickLoggingClass").href != null ? post.querySelector(".siteLink.urlClickLoggingClass").href : " ", //throws error "Cannot read property 'href' of null"
                post.querySelector("[class^='CompanyName']").innerText, // get the title
                post.querySelector("[class^='AdvAddress']").innerText] // get the location
            ));

            const wb = xlsx.utils.book_new();
            const ws = xlsx.utils.aoa_to_sheet(businessesPosts);
            xlsx.utils.book_append_sheet(wb,ws);
            xlsx.writeFile(wb, "posts.xlsx");

            await browser.close()
        } catch(e){
            console.log('error',e);
        }
    };

    main();

Here's the HTML code of the website's class:

    <a class="siteLink urlClickLoggingClass" target="_blank" product="AdvListing" productid="2419662++1926511++1" href="http://www.test.com">

Apparently there's something wrong when trying to insert a condition inside the array.
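
The error seems reproducible without Puppeteer. Below is a minimal sketch, assuming a made-up stub (`fakePost`) whose `querySelector` returns `null`, which is presumably what happens on the real page for a listing with no website link. It suggests that `.href` is read before the `!= null` comparison ever runs:

```javascript
// Stand-in for a listing element with no website link.
// `fakePost` is a hypothetical stub, not the real DOM element from the page.
const fakePost = { querySelector: () => null };

let errorMessage = null;
try {
  // Same shape as the line in my code: .href is read on the result
  // of querySelector before the null comparison is evaluated.
  const href = fakePost.querySelector(".siteLink.urlClickLoggingClass").href != null
    ? "has link"
    : " ";
  console.log(href);
} catch (e) {
  errorMessage = e.message;
}
console.log("caught:", errorMessage);

// Checking the element itself, rather than its href, does not throw on the stub.
const link = fakePost.querySelector(".siteLink.urlClickLoggingClass");
const website = link !== null ? link.href : " ";
console.log(JSON.stringify(website));
```

On the stub, the first form throws a `TypeError` while the second falls through to the blank value, though I haven't confirmed the real page behaves the same way.
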
Any help would be much appreciated!