
I am looking at this website

https://shopee.sg/search?keyword=cosmetics

and when I search for xpath:

//div[@class="PFM7lj"] 

It initially finds only 15 elements. After I scroll down through each item to the end of the page and search again, it finds 60 elements.

What do I need to do here?

Additionally, when I call an item through BeautifulSoup, I get a very different output, like this: [screenshot: 15th Item]

but when I call the 16th item (results[15]), it shows: [screenshot: 15th element]

My code so far looks like this:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chromedriver = "path to chromedriver"
options = Options()
options.headless = True
driver = webdriver.Chrome(chromedriver, options=options)

url = "https://shopee.sg/search?keyword=cosmetics"
driver.get(url)

soup = BeautifulSoup(driver.page_source, 'html.parser')
results = soup.find_all('div', {'data-sqe': 'item'})
print(results[14])
print(results[15])
  • Modern pages use JavaScript to add elements (they add them when you scroll the page; this so-called "lazy loading" makes the page display faster), but BeautifulSoup can't run JavaScript, so you may need Selenium to control a real web browser, which can run JavaScript. Commented Jul 15, 2021 at 7:18
  • @furas But shouldn't it still work if I used a headless Chrome driver? It's still giving me the same result. Commented Jul 15, 2021 at 7:30
  • Do you use a headless Chrome driver? I don't see it in the code. Better to show minimal working code which we could copy and run. Did you scroll the page using the headless Chrome driver? Some servers may also detect the driver and block it. Commented Jul 15, 2021 at 7:34
  • @furas I mean there's literally nothing much to it, but I did include it just in case you wanted to run it yourself :) Commented Jul 15, 2021 at 7:43
  • You have to scroll the page before you get driver.page_source - Selenium has methods to move to some element at the end of the page, or you may need to use JavaScript to scroll it. You should find a few questions on Stack Overflow which show it. Commented Jul 15, 2021 at 7:48
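As the last comment suggests, you need to scroll before reading driver.page_source. A minimal sketch of a scroll helper, assuming a standard Selenium webdriver object; the pause and round limits are arbitrary safety values, not numbers from this thread:

```python
import time

def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Keep scrolling until the page height stops growing, i.e. until
    lazy loading has rendered all items."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch and render new items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded; we reached the real bottom
        last_height = new_height
```

Called as scroll_to_bottom(driver) between driver.get(url) and BeautifulSoup(driver.page_source, ...), this should let the remaining items render before the HTML is parsed.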

1 Answer

import requests

def main(url):
    params = {
        "by": "relevancy",
        "keyword": "cosmetics",
        "limit": "100",  # maximum is 100 per page
        "newest": "0",   # you can loop from here: 0, 100, 200 and so on
        "order": "desc",
        "page_type": "search",
        "scenario": "PAGE_GLOBAL_SEARCH",
        "version": "2"
    }
    r = requests.get(url, params=params)
    # print(f"Total Count: {r.json()['total_count']}")  # can be used as a logic for loop
    for i in r.json()['items']:
        print(i['item_basic']['name'])

main('https://shopee.sg/api/v4/search/search_items')

8 Comments

Care to do some explanation...?
@DHK Check my previous answer, which will teach you how to track an XHR request.
Mind if I ask how I would loop this so I can get the data for the next page?
Thanks, I think I got most of what I need from your answer and the previous answer on another post, but I can't seem to find the URL/link for each item. I can see the names, rating, price, discount rate, etc.
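For the two follow-up questions in the comments, a sketch of how the answer's params could be paged and how a product link might be assembled. The URL pattern and the shopid/itemid field names are assumptions for illustration, not something confirmed in this thread:

```python
def search_params(keyword, page):
    """Query params for one page of results; `newest` is the item offset,
    so page 0 -> 0, page 1 -> 100, page 2 -> 200, and so on."""
    return {
        "by": "relevancy",
        "keyword": keyword,
        "limit": "100",
        "newest": str(page * 100),
        "order": "desc",
        "page_type": "search",
        "scenario": "PAGE_GLOBAL_SEARCH",
        "version": "2",
    }

def item_url(name, shopid, itemid):
    """Assumed product-page pattern:
    https://shopee.sg/<name-with-dashes>-i.<shopid>.<itemid>
    where shopid and itemid would come from each entry's item_basic dict."""
    slug = name.replace(" ", "-")
    return f"https://shopee.sg/{slug}-i.{shopid}.{itemid}"
```

With these, the answer's loop would call requests.get(url, params=search_params("cosmetics", page)) for page = 0, 1, 2, ... and stop once fewer than 100 items come back.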