
I'd like to scrape the product links (675 products) from a website. The first page shows only 24 products, followed by a "Show Next 24" button. I tried two methods to load more products so that I can collect their links.

    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException, NoSuchElementException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)

    driver = webdriver.Chrome(options=options)
    wait = WebDriverWait(driver, 10)
    driver.get('https://www.3m.com.au/3M/en_AU/p/c/medical')

    # Keep clicking the button until it no longer appears
    while True:
        try:
            more_button = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'MMM-- btn MMM--btn_tertiary MMM--btn_noAnimation js-pageLoader wt-link wtLoaded mix- MMM--btn_allCaps'))).click()
        except TimeoutException:
            break

I also tried an XPath locator:

    more_button = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="pageContent"]/div[3]/div/div/div[3]/div[5]/div[2]/div[3]/div/div[2]/div[2]/a'))).click()

Neither method managed to click the "SHOW NEXT 24" button. I suspect a 403 Forbidden error is what prevents more products from loading.

Here is a screenshot of the tag: [screenshot]

Any tip or solution would be much appreciated. Thanks in advance.

  • See my answer. Please let me know if it worked. Commented Jul 18, 2021 at 7:52

1 Answer

    import requests
    import pandas as pd

    # Query parameters sent to the AjaxServlet endpoint behind the product listing
    params = {
        'ort': 'cp',
        'rt': 'cart',
        'cartridgeId': 'root/content/contents[0]/Results[0]'
    }
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0'
    }


    def main(url):
        with requests.Session() as req:
            req.headers.update(headers)
            allin = []
            # 'No' is the record offset: 675 products, 24 per request
            for num in range(0, 675, 24):
                params['No'] = num
                r = req.get(url, params=params)
                for item in r.json()['Results'][0]['records']:
                    allin.append([item.get('name', 'N/A'), item['detailsUrl']])
            df = pd.DataFrame(allin, columns=["Title", "Url"])
            print(df)


    main(
        'https://www.3m.com.au/wps/PA_Snaps286/AjaxServlet/portlet286/prod/en_AU/https/www.3m.com.au/3M/en_AU/p/c/medical/')

Output:

                                                     Title                                               Url
    0             3M™ Littmann® Cardiology IV™ Stethoscope     https://www.3m.com.au/3M/en_AU/p/d/b00037563/
    1              3M™ Littmann® Classic III™ Stethoscope     https://www.3m.com.au/3M/en_AU/p/d/b00037556/
    2    3M™ Coban™ Self-Adherent Wrap 1581, Tan, 25mm ...    https://www.3m.com.au/3M/en_AU/p/d/v000106081/
    3    3M™ Coban™ Self-Adherent Wrap 1581B, Blue, 25m...    https://www.3m.com.au/3M/en_AU/p/d/v000106085/
    4    3M™ Coban™ Self-Adherent Wrap 1582, Tan, 50mm ...    https://www.3m.com.au/3M/en_AU/p/d/v000077505/
    ..                                                 ...                                               ...
    670  3M™ Littmann® Master Classic II Veterinary Ste...  https://www.3m.com.au/3M/en_AU/p/d/v101112000/1/
    671          3M™ Synthetic Cast Stockinet MS02, 1RL/BX  https://www.3m.com.au/3M/en_AU/p/d/v000199505/1/
    672  3M™ Red Dot™ Repositionable Monitoring Electro...  https://www.3m.com.au/3M/en_AU/p/d/v000154357/1/
    673  3M™ Bair Hugger™ Warming Blanket, 55501, Paedi...  https://www.3m.com.au/3M/en_AU/p/d/v000253003/1/
    674  3M™ Red Dot™ Repositionable Monitoring Electro...    https://www.3m.com.au/3M/en_AU/p/d/v000154308/

    [675 rows x 2 columns]
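
The loop in the answer hardcodes `range(0, 675, 24)`, so it would need editing whenever the product count changes. A minimal sketch of one alternative is to keep requesting offsets until a page comes back empty or short. The names `iter_records`, `fetch_page` and `fake_fetch` are hypothetical, and the stopping rule assumes the endpoint returns an empty or shorter `records` list past the last product, which has not been verified against the real site.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 24  # page size taken from the question's "Show Next 24" button


def iter_records(fetch_page: Callable[[int], List[dict]],
                 page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield records page by page until the source runs dry.

    fetch_page(offset) is a caller-supplied function returning the list
    of records starting at that offset (assumption: empty list past the end).
    """
    offset = 0
    while True:
        records = fetch_page(offset)
        if not records:
            break  # nothing at this offset: we are past the last record
        yield from records
        if len(records) < page_size:
            break  # short page: this was the last one
        offset += page_size


# Stand-in for the real HTTP call so the sketch runs offline (made-up data).
def fake_fetch(offset: int) -> List[dict]:
    data = [{"name": f"item {i}", "detailsUrl": f"/p/d/{i}/"} for i in range(50)]
    return data[offset:offset + PAGE_SIZE]


print(len(list(iter_records(fake_fetch))))  # 50
```

With the answer's session, `fetch_page` would presumably wrap the existing call, i.e. return `req.get(url, params={**params, 'No': offset}).json()['Results'][0]['records']`.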
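
As for the Selenium attempt in the question: `By.CLASS_NAME` expects a single class name, so passing the whole space-separated `class` attribute is not expected to match anything. If you still want to click the button, a CSS selector that chains the classes is the usual workaround. The helper below is a hypothetical sketch that builds such a selector; the `a` tag is assumed from the XPath in the question, and the two classes are picked from the question's class list on the guess that they are stable. It does not address the 403 error, if that is the real cause.

```python
def class_attr_to_css(tag: str, class_attr: str) -> str:
    """Turn a space-separated class attribute into a chained CSS selector.

    Example: ("a", "foo bar") -> "a.foo.bar"
    """
    classes = class_attr.split()  # split() also drops extra whitespace
    return tag + "".join("." + name for name in classes)


selector = class_attr_to_css("a", "MMM--btn_tertiary js-pageLoader")
print(selector)  # a.MMM--btn_tertiary.js-pageLoader

# Intended use with the question's wait object (not executed here):
#   wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, selector))).click()
```

Using fewer classes is deliberate: a class such as `wtLoaded` looks like it may be added by a script after page load, so requiring it could make the locator miss the element.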

7 Comments

@Prophet Why would I need to downvote your answer?
@Prophet Then you have to leave a comment under your answer asking downvoters to explain. They may or may not leave an explanation.
@Prophet I noticed that you usually delete your comments, which leaves the reader confused.
@Prophet That's not how the community works. Anyway...
This is what I saw other users with many more points and much more experience than me doing.