I am writing a Python script to monitor a website for changes. The aim is to receive a notification once an element on the page is updated (e.g. a button that did not exist before appears). I don't need to log in to an account on the website. Because I don't know much about web development, I found some code and modified it to meet my needs. Basically it looks like this:

    import time
    import datetime
    import random

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    from fake_useragent import UserAgent

    screen_dims = [(375, 667), (411, 731), (360, 640), (414, 736),
                   (375, 812), (768, 1024), (1024, 1366), (540, 720)]


    def main():
        while True:
            # Start a fresh browser with a random user agent on every iteration.
            ua = UserAgent()
            user_agent = ua.random
            options = webdriver.ChromeOptions()
            options.add_experimental_option("excludeSwitches", ["enable-automation"])
            options.add_experimental_option('useAutomationExtension', False)
            options.add_argument('disable-infobars')
            options.add_argument(f'user-agent={user_agent}')
            driver = webdriver.Chrome(options=options)
            set_viewport_size(driver)
            driver.get(a_url_to_the_page_of_interest)
            available = check_availability(driver)
            if available:
                print("Found")
                break
            else:
                driver.quit()
                time.sleep(10)
                continue


    def set_viewport_size(driver):
        # Pick a random viewport and add the browser chrome (borders, toolbars) to it.
        width, height = random.choice(screen_dims)
        window_size = driver.execute_script(
            """
            return [window.outerWidth - window.innerWidth + arguments[0],
                    window.outerHeight - window.innerHeight + arguments[1]];
            """, width, height)
        driver.set_window_size(*window_size)


    def check_availability(driver):
        # Dismiss the privacy banner if it is present.
        try:
            driver.find_element(By.ID, "privacy-button-id").click()
        except Exception:
            pass
        # find_element raises if the element is missing, so reaching the return means it exists.
        try:
            driver.find_element(By.ID, "some-other-button")
            return True
        except Exception:
            return False

The problem is that after the 3rd or 4th iteration of the main() loop, the website I monitor redirects me to a Captcha page (due to frequent refreshing, I guess).

I tried several methods I could find, such as a fake user agent, different viewport sizes, and a lower refresh frequency (waiting 10 s between refreshes), but none of them work.

Some Stack Overflow posts I read and tried: this, this, and this.

I don't want to interact with the Captcha directly; I just want to avoid it. What I can think of is to send each request from a different IP address. However: 1. I don't know whether this would help. 2. If it would, how can I implement it?
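To make the second point concrete, this is roughly what I imagine rotating IPs would look like. It is only a sketch under assumptions: the proxy addresses are placeholders, the helper names are my own, and I am assuming Chrome's `--proxy-server` command-line switch is the right way to route the browser through a proxy. I have not confirmed that this avoids the Captcha.

```python
import itertools

# Hypothetical proxy addresses (documentation-only IP ranges); replace with real ones.
PROXIES = [
    "203.0.113.10:8080",
    "203.0.113.11:8080",
    "198.51.100.7:3128",
]


def proxy_arguments(proxies):
    """Yield a Chrome --proxy-server argument for each proxy, cycling forever."""
    for proxy in itertools.cycle(proxies):
        yield f"--proxy-server=http://{proxy}"


# Shared generator so each call below advances the rotation by one proxy.
proxy_args = proxy_arguments(PROXIES)


def next_proxy_argument():
    """Return the argument for the next proxy in the rotation."""
    return next(proxy_args)


# Inside the main() loop this would be called before creating the driver, e.g.:
#     options.add_argument(next_proxy_argument())
```

Each iteration of main() would then start Chrome through the next proxy in the list, wrapping around to the first one after the last.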

Are there any other choices?

Thank you for your help!
