
I am trying to scrape the data from this link: page.

If you click the up arrow you will notice the highlighted days in the month sections. Clicking on a highlighted day brings up a table of initiated tenders for that day. All I need is the data from each table for every highlighted day in the calendar. There can be one or more tenders (up to a maximum of 7) per day.

Table appears on click

I have done some web scraping with bs4, but I think this is a job for Selenium (please correct me if I am wrong), with which I am not very familiar.

So far, I have managed to find the arrow element by XPath to navigate around the calendar and show more months. After that, I try clicking on a random day (in the code below I clicked on 30.03.2020), upon which an HTML element called "tenders-table cloned" appears in the HTML on inspect. The element name stays the same no matter which day you click on.

I am pretty stuck now. I have tried to select, iterate over, and/or print what is inside that table element, but it either says the object is not iterable or it is None.

from selenium import webdriver

chrome_path = r"C:\Users\<name>\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("http://www.ibex.bg/bg/данни-за-пазара/централизиран-пазар-за-двустранни-договори/търговски-календар/")
driver.find_element_by_xpath("""//*[@id="content"]/div[3]/div/div[1]/div/i""").click()
driver.find_element_by_xpath("""//*[@id="content"]/div[3]/div/div[2]/div[1]/div[3]/table/tbody/tr[6]/td[1]""").click()

Please advise how I can proceed to extract the data from the table pop-up.

2 Answers


Please try the solution below:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.maximize_window()
wait = WebDriverWait(driver, 20)
elemnt = wait.until(EC.presence_of_element_located((By.XPATH, "//body/div[@id='wrapper']/div[@id='content']/div[@class='tenders']/div[@class='form-group']/div[1]/div[1]//i")))
elemnt.click()
elemnt1 = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='form-group']//div[1]//div[3]//table[1]//tbody[1]//tr[6]//td[1]")))
elemnt1.click()
lists = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table[@class='tenders-table cloned']")))
for element in lists:
    print(element.text)
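To go from the table's raw text or markup to structured rows, the page source can also be fed to a small HTML parser. A minimal sketch using the stdlib html.parser, where the sample table markup is invented for illustration (the real "tenders-table cloned" structure may differ):

```python
from html.parser import HTMLParser

# Hypothetical snippet mimicking the "tenders-table cloned" markup;
# the real column layout on the site may differ.
SAMPLE = """
<table class="tenders-table cloned">
  <tr><td>Tender 1</td><td>30.03.2020</td></tr>
  <tr><td>Tender 2</td><td>30.03.2020</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collects the text of every <td> into rows, one list per <tr>."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and self._row is not None:
            self._row.append(data.strip())

parser = TableParser()
parser.feed(SAMPLE)
print(parser.rows)  # [['Tender 1', '30.03.2020'], ['Tender 2', '30.03.2020']]
```

With Selenium, the same idea applies by feeding driver.page_source (or the table element's outerHTML attribute) to the parser instead of the sample string.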

3 Comments

Hey, Dipak. Thanks for the solution. Please note that my code above also finds the elements and clicks on them so that the table comes up. My problem is how to gather the data in each table that appears upon clicking the highlighted days. I have also tried your solution and it returns a blank page.
Thanks! That works fine and extracts a list for a single highlighted day in the calendar.
Hi Dipak, please note that αԋɱҽԃ αмєяιcαη's answer is much better and straight to the point. My need was to identify and extract the data for each of the highlighted days, not just one, and his solution is much faster. Thank you again for your solution. I am afraid I cannot upvote your answers since my account is still new. Will try later though.

Well, I see no reason to use selenium for such a case, as it will slow down your task.

The website is loaded via a JavaScript event which renders its data dynamically once the page loads.

The requests library will not be able to render JavaScript on the fly, so you could use selenium or requests_html; indeed, there are a lot of modules that can do that.

Now, we have another option on the table: tracking where the data is rendered from. I was able to locate the XHR request which is used to retrieve the data from the back-end API and render it on the user's side.

You can find the XHR request by opening Developer Tools, checking the Network tab, and filtering the XHR/JS requests made, depending on the type of call (such as fetch).

import requests
import json

data = {
    'from': '2020-1-01',
    'to': '2020-3-01'
}

def main(url):
    r = requests.post(url, data=data).json()
    print(json.dumps(r, indent=4))  # to see it in a nice format.
    print(r.keys())

main("http://www.ibex.bg/ajax/tenders_ajax.php")

Because I am just a lazy coder, I will do it this way:

import requests
import re
import pandas as pd
import ast
from datetime import datetime

data = {
    'from': '2020-1-01',
    'to': '2020-3-01'
}

def main(url):
    r = requests.post(url, data=data).json()
    matches = set(re.findall(r"tender_date': '([^']*)'", str(r)))
    sort = sorted(matches, key=lambda k: datetime.strptime(k, '%d.%m.%Y'))
    print(f"Available Dates: {sort}")
    opa = re.findall(r"({\'id.*?})", str(r))
    convert = [ast.literal_eval(x) for x in opa]
    df = pd.DataFrame(convert)
    print(df)
    df.to_csv("data.csv", index=False)

main("http://www.ibex.bg/ajax/tenders_ajax.php")
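The regex-plus-ast.literal_eval step can be avoided by walking the decoded JSON directly. A sketch over an invented payload shape (the real response's keys and field names may differ, so treat "tenders" and "tender_date" below as assumptions):

```python
from datetime import datetime

# Invented sample mimicking the API response; real field names may differ.
response = {
    "tenders": [
        {"id": 2, "tender_date": "30.03.2020", "quantity": "10 MW"},
        {"id": 1, "tender_date": "02.01.2020", "quantity": "5 MW"},
        {"id": 3, "tender_date": "15.02.2020", "quantity": "7 MW"},
    ]
}

def sorted_tenders(payload):
    """Return tender records sorted chronologically by their date field."""
    return sorted(
        payload["tenders"],
        key=lambda t: datetime.strptime(t["tender_date"], "%d.%m.%Y"),
    )

rows = sorted_tenders(response)
print([t["id"] for t in rows])  # [1, 3, 2]
```

Working on the parsed structure directly is also more robust than regexing over str(r), which breaks if any field value contains a quote.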

Output: view-online


5 Comments

Thanks mate! That is perfectly straight to the point solution! I was also looking into the XHR in the dev tools, however I found myself a little confused. Now it is much more clear. Thanks!
@sc-coder you're welcome, you know how to handle the rest well :)?
Yep, I think I will handle it. Again, thanks a lot!
@sc-coder check updated answer :P just for a record
nice one! Pretty elegant! Laziness all the way! :D

