0

Currently I am practicing web scraping with Selenium and I'm encountering a StaleElementReferenceException. I tried to scrape the phone information from a retail website across 3 pages. I used a for loop and it worked fine for the 1st page, then I encountered the error. I have tried WebDriverWait, time.sleep, etc. but it didn't work. Please help me with this. Below is my code:

driver = webdriver.Chrome()
driver.get('https://tiki.vn/')
category = driver.find_element(By.XPATH, "//a[@title='Điện Thoại - Máy Tính Bảng']").click()

phone_information = []
for page in range(1, 4):
    next_page = driver.find_element(By.XPATH, '//a[@data-view-label="{}"]'.format(page)).get_attribute('href')
    driver.get(next_page)
    element = (By.XPATH, '//div[@class="inner"]')
    WebDriverWait(driver, 30).until(EC.visibility_of_element_located(element))
    phone_names = driver.find_elements(By.XPATH, '//div[@class="info"]')
    for phone in phone_names:
        print(phone.text)
    WebDriverWait(driver, 60)

driver.quit()

This is the output:

StaleElementReferenceException          Traceback (most recent call last)
Cell In[7], line 16
     14 time.sleep(10)
     15 for phone in phone_names:
---> 16     print(phone.text)
     17 time.sleep(20)
     19 WebDriverWait(driver, 60)
4
  • The code in the error does not match the posted code. Please post your real code. Commented Oct 1, 2023 at 18:50
  • Hi @JohnGordon, that's my actual code. I used jupyter to work with in this practice. First cell was the packages that are needed to import. It would be: import selenium from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC import time The second cell was the posted code Commented Oct 1, 2023 at 19:14
  • @JohnGordon, I have tried WebDriverWait, time.sleep, etc. but it didn't work, so I dropped them out of the code. Commented Oct 1, 2023 at 19:17
  • so I dropped them out of the code Okay, so please run the actual code and post the actual error message. Commented Oct 1, 2023 at 19:23

1 Answer 1

0

TL;DR

Just wrap it in a try/except block and re-find the element. Quick and dirty solution, and still error-prone.

from selenium.common.exceptions import StaleElementReferenceException  # import the exception type

# ...
for phone in phone_names:
    try:
        print(phone.text)
    except StaleElementReferenceException:
        element = (By.XPATH, '//div[@class="inner"]')
        WebDriverWait(driver, 30).until(EC.visibility_of_element_located(element))
        phone_names = driver.find_elements(By.XPATH, '//div[@class="info"]')
        continue
# ...

Further intuition into why you get that error

Hi. The problem is that the reference you created when calling find_element() points to an element that is no longer present in the DOM. Note that the CSS selector or XPath you pass to describe the structure of an element is distinct from the element itself. As stated in Selenium's docs for the WebDriver API, the error you're getting:

exception
selenium.common.exceptions.StaleElementReferenceException(msg:
Optional[str] = None, screen: Optional[str] = None, stacktrace:
Optional[Sequence[str]] = None)

Bases: selenium.common.exceptions.WebDriverException

Thrown when a reference to an element is now “stale”.

Stale means the element no longer appears on the DOM of the page.

Possible causes of StaleElementReferenceException include, but are not limited to:

  • You are no longer on the same page, or the page may have refreshed since the element was located.
  • The element may have been removed and re-added to the screen, since it was located. Such as an element being relocated.
  • Element may have been inside an iframe or another context which was refreshed.
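All of the causes above boil down to the same failure mode: the Python-side reference outlives the DOM node it points to, so the usual defensive pattern is "re-find and retry". Below is a minimal, browser-free sketch of that pattern. The StaleElementReferenceException class here is a stand-in so the snippet runs without Selenium installed; in real code you would import the genuine one from selenium.common.exceptions, and the fake_find/fake_use helpers are hypothetical stand-ins for your locator and scraping logic:

```python
class StaleElementReferenceException(Exception):
    """Stand-in for selenium.common.exceptions.StaleElementReferenceException."""


def with_stale_retry(find, use, retries=3):
    """Retry `use`, re-running `find` for a fresh reference if it goes stale.

    `find` is a zero-argument callable that locates the element(s);
    `use` is a callable that operates on whatever `find` returned.
    """
    last_error = None
    for _ in range(retries):
        element = find()          # fresh lookup on every attempt
        try:
            return use(element)
        except StaleElementReferenceException as exc:
            last_error = exc      # reference went stale; loop and re-find
    raise last_error


# Toy demonstration: the first use() call raises, the retried call succeeds.
attempts = {"n": 0}

def fake_find():
    return "element"              # stands in for driver.find_elements(...)

def fake_use(el):
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise StaleElementReferenceException("stale!")
    return "read text from " + el

print(with_stale_retry(fake_find, fake_use))  # → read text from element
```

The key point is that the lookup happens inside the retry loop: catching the exception without re-running find() would just retry the same dead reference.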

Approach to solution

Although a more robust solution could be achieved with some refactoring, knowing that the problem is in the reference gives you the insight that you only need to look up the element again and it should work fine. You've stated that the problem occurs when refreshing, and as you haven't provided the relevant HTML I'll just assume that the XPath for the new element stays the same.

To make this easier to work with, we should divide the task into its constituent parts: finding the elements, then operating on them. We know the exception is raised in the finding part. Furthermore, as long as the element you are finding is the phone_names collection and not each phone individually, you don't need to check every phone, just the collection.

def print_phones(phone_names):
    for phone in phone_names:
        print(phone.text)

driver = webdriver.Chrome()
driver.get('https://tiki.vn/')
driver.find_element(By.XPATH, "//a[@title='Điện Thoại - Máy Tính Bảng']").click()

phone_information = []
for page in range(1, 4):
    next_page = driver.find_element(By.XPATH, '//a[@data-view-label="{}"]'.format(page)).get_attribute('href')
    driver.get(next_page)
    element = WebDriverWait(driver, 30).until(
        EC.visibility_of_element_located((By.XPATH, '//div[@class="inner"]'))
    )
    phone_names = driver.find_elements(By.XPATH, '//div[@class="info"]')
    try:
        print_phones(phone_names)
    except StaleElementReferenceException:
        continue

driver.quit()

In the above code snippet I've extracted the print_phones functionality, inlined the wait (as .until() returns the element), and enclosed the call in a try/except block to avoid stale references. Note the bare WebDriverWait(driver, 60) at the end of your original script does nothing on its own (it only waits when combined with .until()), so it is dropped here.

Notes:

  • you could replace your sleep calls and explicit timed waits with an implicit wait configured on the driver
  • it would prove useful to replace some of the magic numbers and strings with a more declarative and programmatic approach, so you can easily identify bugs and further develop your script.
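As an illustration of the second note, the script's magic values could be hoisted into named constants. The XPath strings and page range below are taken from the posted code; the constant names themselves are just one possible choice:

```python
# Named constants instead of magic numbers/strings scattered through the loop.
BASE_URL = 'https://tiki.vn/'
CATEGORY_TITLE = 'Điện Thoại - Máy Tính Bảng'
CATEGORY_XPATH = "//a[@title='{}']".format(CATEGORY_TITLE)
PAGE_LINK_XPATH = '//a[@data-view-label="{}"]'  # filled in with the page number
GRID_XPATH = '//div[@class="inner"]'
INFO_XPATH = '//div[@class="info"]'
PAGES = range(1, 4)
WAIT_SECONDS = 30

# The scraping loop then reads declaratively, e.g. (sketch, not run here):
#   driver.implicitly_wait(WAIT_SECONDS)   # first note: driver-level implicit wait
#   for page in PAGES:
#       link = driver.find_element(By.XPATH, PAGE_LINK_XPATH.format(page))
#       ...

print(PAGE_LINK_XPATH.format(2))  # → //a[@data-view-label="2"]
```

If the site's markup changes, there is now exactly one line to update per locator, instead of hunting through the loop body.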

Hope it helps.


10 Comments

pointing to an element that not longer present in the DOM Why would that be, when the element was the result of a find_elements() call immediately before the for loop? I don't see why the DOM would have changed.
It seems weird to me too @JohnGordon. But given the exception that's being raised, it's clear that the WebDriver is getting a stale reference. As OP stated that the code, as is, runs and prints before a page reload, my best guess is that the site he's trying to scrape has a pretty intensive bunch of client-side JS, probably made with an obese framework, that's messing with the state of the page, the DOM, refreshing elements and what not. A lot could have changed from one statement to another. A poorly written event emitter, or any piece of generated JS, could be the culprit.
Honestly, as the task is to extract text from a webpage using Python, I'd probably just GET the relevant HTML from the pages and parse them. Maybe use BS4 if the DOM tree is particularly obscure. Emulating a human interacting with a browser is a shared-state nightmare prone to this kind of weird-ass errors.
Hi @HernanATN, thank you for your support. It worked now for both the codes.
Great to hear. Good Luck