0

I am attempting to extract Titles, Authors, and PDF Links from the oral presentations tab, spanning pages 1 to 6, on the website https://openreview.net/group?id=ICML.cc/2024/Conference#tab-accept-oral.

While I've successfully managed to extract information from the first page, I'm encountering issues automating the process up to page 6. I understand that the page contents are dynamically displayed through JavaScript, but the code provided by GPT consistently results in StaleElementReferenceException errors.

Here is my code (which was wrong)

from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.chrome.options import Options from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC options = Options() options.add_argument('--headless') options.add_argument('--no-sandbox') options.add_argument('--disable-dev-shm-usage') driver = webdriver.Chrome(options=options) driver.get("https://openreview.net/group?id=ICML.cc/2024/Conference#tab-accept-oral") try: for i in range(1, 7): # repeat from 1 page to 6 page WebDriverWait(driver, 10).until( EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div#accept-oral div.note")) ) notes = driver.find_elements(By.CSS_SELECTOR, "div#accept-oral div.note") for note in notes: title = note.find_element(By.CSS_SELECTOR, "h4 a").text authors = note.find_element(By.CSS_SELECTOR, "div.note-authors").text pdf_link = note.find_element(By.CSS_SELECTOR, "a.pdf-link").get_attribute('href') print("Title:", title) print("Authors:", authors) print("PDF Link:", pdf_link) # click next page if i < 6: next_page_button = WebDriverWait(driver, 10).until( EC.element_to_be_clickable((By.CSS_SELECTOR, f"ul.pagination li:nth-child({i+2}) a")) ) driver.execute_script("arguments[0].click();", next_page_button) finally: driver.quit() 
3
  • USE WATIR, a wrapper for Ruby’s Selenium binding. The advantage of using WATIR is that it automatically retrieves the element from the DOM if it becomes stale, prior to performing any action. On the other hand, if you decide to stick with direct Selenium binding in Python or Java or Ruby, it’s certainly possible, but you’ll need to manually code the retrieval of the element. Commented Jul 1, 2024 at 13:38
  • There's an official api Commented Jul 1, 2024 at 14:56
  • "presence_of_all_elements_located" returns when at least one element is present. Which means you can have an array of webelement references returned while the list is still populating. Easy fix is just put in a sleep before you use that method. Otherwise you'd catch stale element exceptions (which are thrown when you go to use a method on a stale reference) and find the elements again. (functionize and re-call or use a time-based method) Commented Jul 2, 2024 at 16:41

0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.