
I am trying to scrape a web page that contains multiple tabs. When I click on the desired tab and its contents appear, I run into two problems. 1- The web page address does not change; it is the same for all tabs. 2- When I view the page source with the browser's "View Page Source" (Firefox and Chrome), the source also looks the same for all tabs, whereas when I use "Inspect Element" on one of the tabs, I can see my target content in the HTML shown there.

The problem is that I could not access the desired tab's contents with the typical Python web-scraping code available all over the web. That code is normally based on bs4.

Does anyone have any idea or sample code that shows how to handle my problem? The page I am looking at is at the following address: http://tsetmc.com/Loader.aspx?ParTree=151311&i=63917421733088077#

The page content is probably rendered with JavaScript. BeautifulSoup only processes the initial response from the server and cannot execute JavaScript. "View Source" shows you the response that BeautifulSoup will get, while "Inspect Element" shows you how the page is currently rendered. If you want to extract data from a dynamically loaded web page, you can either try to find the source of the calls and hit the API directly, or use something like Selenium that can render the JavaScript for you. Commented Feb 6, 2020 at 14:48
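To illustrate the point above, here is a minimal sketch of why bs4 alone cannot see the tab's data. It uses a hypothetical inline HTML string (not the real page) standing in for the server's initial response: the `<script>` never runs, so the placeholder stays empty, exactly like "View Source".

```python
from bs4 import BeautifulSoup

# Hypothetical example of what the server initially returns: an empty
# placeholder that JavaScript would later fill with the tab's table.
initial_html = """
<html><body>
  <div id="tab-content"></div>
  <script>/* JS that fetches and inserts the tab's rows at runtime */</script>
</body></html>
"""

soup = BeautifulSoup(initial_html, "html.parser")
placeholder = soup.find("div", id="tab-content")

# BeautifulSoup only parses the markup; the script is never executed,
# so the placeholder is still empty.
print(repr(placeholder.get_text(strip=True)))  # prints ''
```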

1 Answer


Web scraping with BeautifulSoup cannot be done correctly if a page builds its DOM elements with JavaScript. The page you are trying to scrape has JavaScript elements and shows its data through them. The difference between View Source and Inspect Element is due to the browser: Inspect Element shows the DOM as the browser has rendered it. To sum up, you have to simulate the browser to get the data you are looking for. This can be done with Selenium; you can search for tutorials on using Selenium and Python for web scraping.

Here is a simple example of using Selenium and Python for web scraping:

import time

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = 'http://tsetmc.com/Loader.aspx?ParTree=151311&i=63917421733088077#'

# firefox driver for selenium from: https://github.com/mozilla/geckodriver/releases
driver = webdriver.Firefox(executable_path=r'your-path\geckodriver.exe')
driver.get(url)
wait = WebDriverWait(driver, 10)
try:
    # wait for the target table to load completely
    element = wait.until(EC.visibility_of_all_elements_located(
        (By.XPATH, "/html/body/div[4]/form/div[3]/div[2]/div[1]/div[2]/div[1]/table/tbody")))
    time.sleep(1)
finally:
    driver.quit()

This code will open Firefox; you have to put your own directory in place of 'your-path\geckodriver.exe'. Pay attention to the comment about geckodriver: you need it to run Selenium with Firefox.
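Once the wait above succeeds, `driver.page_source` contains the fully rendered HTML, and you can hand it to BeautifulSoup as usual. A minimal sketch of that last step, using a stand-in string for `driver.page_source` (the real value would hold the rendered tab's table), since running the browser itself is not reproducible here:

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source after the wait above; the real page
# would contain the rendered tab's table instead of this sample.
rendered_html = """
<table><tbody>
  <tr><td>Row 1</td><td>100</td></tr>
  <tr><td>Row 2</td><td>200</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(rendered_html, "html.parser")
# Collect each row's cell texts into a list of lists.
rows = [[td.get_text() for td in tr.find_all("td")]
        for tr in soup.select("tbody tr")]
print(rows)  # [['Row 1', '100'], ['Row 2', '200']]
```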

You can search the Selenium documentation for more information.
