
I am looking at this website

https://shopee.sg/search?keyword=cosmetics

and when I search for xpath:

//div[@class="PFM7lj"] 

It initially finds only 15 elements. After I scroll down through each item to the end of the page and search again, it finds 60 elements.

What do I need to do here?

Additionally, when I call an item through BeautifulSoup, I get a very different output, like this: [screenshot: 15th Item]

but when I call the 16th item (results[15]), it shows: [screenshot: 15th element]

My code so far looks like this:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chromedriver = "path to chromedriver"
options = Options()
options.headless = True
driver = webdriver.Chrome(chromedriver, options=options)

url = "https://shopee.sg/search?keyword=cosmetics"
driver.get(url)

soup = BeautifulSoup(driver.page_source, 'html.parser')
results = soup.find_all('div', {'data-sqe': 'item'})
print(results[14])
print(results[15])
  • Modern pages use JavaScript to add elements (they add them when you scroll the page; this so-called "lazy loading" makes the page display faster), but BeautifulSoup can't run JavaScript, so you may need Selenium to control a real web browser, which can run JavaScript. Commented Jul 15, 2021 at 7:18
  • @furas But shouldn't it still work if I used a headless Chrome driver? It's still giving me the same result. Commented Jul 15, 2021 at 7:30
  • Do you use a headless Chrome driver? I don't see it in the code. Better to show minimal working code which we could copy and run. Did you scroll the page using the headless Chrome driver? Some servers may also detect the driver and block it. Commented Jul 15, 2021 at 7:34
  • @furas I mean there's literally nothing much to it, but I did include it just in case you wanted to run it yourself :) Commented Jul 15, 2021 at 7:43
  • You have to scroll the page before you get driver.page_source - Selenium has methods to move to some element at the end of the page, or you may need to use JavaScript to scroll it. You should find a few questions on Stack Overflow which show it. Commented Jul 15, 2021 at 7:48
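As the last comment suggests, you need to scroll before reading driver.page_source. A minimal sketch of a scroll helper, assuming a standard Selenium webdriver object; the pause and round limits are arbitrary safety values, not numbers from this thread:

```python
import time

def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Keep scrolling until the page height stops growing, i.e. until
    lazy loading has rendered all items."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch and render new items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded; we reached the real bottom
        last_height = new_height
```

Called as scroll_to_bottom(driver) between driver.get(url) and BeautifulSoup(driver.page_source, ...), this should let the remaining items render before the HTML is parsed.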

1 Answer

import requests

def main(url):
    params = {
        "by": "relevancy",
        "keyword": "cosmetics",
        "limit": "100",  # maximum is 100 per page
        "newest": "0",   # you can loop from here: 0, 100, 200 and so on
        "order": "desc",
        "page_type": "search",
        "scenario": "PAGE_GLOBAL_SEARCH",
        "version": "2"
    }
    r = requests.get(url, params=params)
    # print(f"Total Count: {r.json()['total_count']}")  # can be used as a logic for loop
    for i in r.json()['items']:
        print(i['item_basic']['name'])

main('https://shopee.sg/api/v4/search/search_items')

8 Comments

Care to do some explanation...?
@DHK Check my previous answer, which will teach you how to track an XHR request.
Mind if I ask how I would loop this so I can get the data for the next page?
Thanks, I think I got most of what I need from your answer and the previous answer on another post, but I can't seem to find the URL/link for each item. I can see the names, rating, price, discount rate, etc.
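For the two follow-up questions in the comments, a sketch of how the answer's params could be paged and how a product link might be assembled. The URL pattern and the shopid/itemid field names are assumptions for illustration, not something confirmed in this thread:

```python
def search_params(keyword, page):
    """Query params for one page of results; `newest` is the item offset,
    so page 0 -> 0, page 1 -> 100, page 2 -> 200, and so on."""
    return {
        "by": "relevancy",
        "keyword": keyword,
        "limit": "100",
        "newest": str(page * 100),
        "order": "desc",
        "page_type": "search",
        "scenario": "PAGE_GLOBAL_SEARCH",
        "version": "2",
    }

def item_url(name, shopid, itemid):
    """Assumed product-page pattern:
    https://shopee.sg/<name-with-dashes>-i.<shopid>.<itemid>
    where shopid and itemid would come from each entry's item_basic dict."""
    slug = name.replace(" ", "-")
    return f"https://shopee.sg/{slug}-i.{shopid}.{itemid}"
```

With these, the answer's loop would call requests.get(url, params=search_params("cosmetics", page)) for page = 0, 1, 2, ... and stop once fewer than 100 items come back.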