1

I recently started testing Selenium for some personal projects, and one problem I ran into was being banned from some websites because of reCAPTCHA v3 tests. After some more research I found the reCAPTCHA v3 demo, did some testing, and eventually wrote this:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36")
driver = webdriver.Chrome(options=options, executable_path=ChromeDriverManager().install())
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        })
    """
})
driver.get("https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php")
WebDriverWait(driver, 10).until(EC.title_contains("Index"))

I have looked at various Stack Overflow questions, including the following:

Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

Can a website detect when you are using selenium with chromedriver?

How does recaptcha 3 know I'm using selenium/chromedriver?

and more

While the added arguments do help to improve the reCAPTCHA v3 score, it is still extremely inconsistent. About half the time I receive a passing score of .7, and the other half I receive a failing score of .1.

Please help me improve my reCAPTCHA scores so that I pass consistently.

EDIT 1: Signing into a Google account in the Chrome instance often changes the results of the demo, but it still does not entirely prevent failing scores.

6
  • The whole point of Recaptcha is to prevent automation. Perhaps the inconsistent Recaptcha score means Recaptcha is actually working as intended. Commented Jan 20, 2020 at 21:42
  • @Christine I understand this, but the whole point of this project is to find a way around reCAPTCHA so I can continue to scrape and navigate the reCAPTCHA-protected pages. Commented Jan 20, 2020 at 21:45
  • Please be a good internet citizen... if the site doesn't want you scraping, do not scrape it. It's likely the collection of data there is the site owner's protected intellectual property and you could be breaking the law by attempting to create a whole copy of it. Commented Jan 20, 2020 at 21:58
  • @pcalkins I have no harmful intentions, nor am I copying anything; this entire project was for educational purposes. However, with the introduction of reCAPTCHA I have become increasingly curious about how it works and how to bypass it. Commented Jan 20, 2020 at 22:35
  • Some of the new captchas capture behavior data from different parts of the site to build a sort of profile of the user. So it's not just a score resulting from a single page or hit to a site, but from a pattern of behavior... Some sites will just detect or prevent webdriver straight away by checking for script injection. (I think they store a sort of "clean state" hash and check that.) Commented Jan 20, 2020 at 22:51

3 Answers

2

To raise your score from .7 to a higher level, e.g. .9 or so, you can rotate the User-Agent header through execute_cdp_cmd() as follows:

driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientA"}}) 

If necessary, you can set several different values in turn as follows:

driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientA"}})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientB"}})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientC"}})
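The rotation described above can be sketched as a small helper that moves to the next User-Agent on every call. This is only an illustration: `make_user_agent_rotator` and `USER_AGENTS` are hypothetical names, the strings are the placeholders from this answer rather than real browser values, and the helper only assumes that `driver` exposes `execute_cdp_cmd()` as a Selenium Chrome driver does.

```python
from itertools import cycle

# Placeholder values taken from the answer; substitute real User-Agent strings.
USER_AGENTS = ["browserClientA", "browserClientB", "browserClientC"]


def make_user_agent_rotator(driver, user_agents):
    """Return a function that applies the next User-Agent each time it is called."""
    pool = cycle(user_agents)
    # Enable the Network domain once, as the full solution in this answer does.
    driver.execute_cdp_cmd("Network.enable", {})

    def rotate():
        user_agent = next(pool)
        driver.execute_cdp_cmd(
            "Network.setExtraHTTPHeaders",
            {"headers": {"User-Agent": user_agent}},
        )
        return user_agent

    return rotate
```

A possible usage would be `rotate = make_user_agent_rotator(driver, USER_AGENTS)` followed by `rotate()` before each `driver.get(...)`, so that back-to-back requests do not reuse the same value.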

Solution

So effectively your working solution would be:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source": """
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            })
        """
    })
    driver.execute_cdp_cmd("Network.enable", {})
    driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browser1"}})
    driver.get("https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "li.step3 pre.response"))).get_attribute("innerHTML"))
  • Console Output:

    DevTools listening on ws://127.0.0.1:53748/devtools/browser/eac086e8-f1c0-42d3-8ef8-d132f4b4c82b
    {
      "success": true,
      "hostname": "recaptcha-demo.appspot.com",
      "challenge_ts": "2020-01-20T22:31:32Z",
      "apk_package_name": null,
      "score": 0.9,
      "action": "examples/v3scores",
      "error-codes": []
    }
  • Console Snapshot: recaptcha3_score (image)
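To compare runs, the JSON printed above can be parsed instead of read by eye. The following is a minimal sketch, not part of the original solution: `extract_score` and `passes` are hypothetical helper names, the 0.7 threshold is taken from the passing score mentioned in the question, and it assumes the response text contains plain JSON as in the console output shown.

```python
import json


def extract_score(response_text):
    """Return the "score" field from the demo's JSON response, or None if unavailable."""
    start = response_text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode tolerates trailing text after the JSON object.
        payload, _ = json.JSONDecoder().raw_decode(response_text[start:])
    except json.JSONDecodeError:
        return None
    if not payload.get("success"):
        return None
    score = payload.get("score")
    return float(score) if isinstance(score, (int, float)) else None


def passes(response_text, threshold=0.7):
    """True when a score was found and it meets the threshold."""
    score = extract_score(response_text)
    return score is not None and score >= threshold
```

Feeding it the string returned by `get_attribute("innerHTML")` on each refresh would make it possible to count how often the score falls below the threshold.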


8 Comments

This helps. Whenever I first run the demo I always pass; however, when refreshing the demo page multiple times in the same Chrome instance, the score is still inconsistent, fluctuating between .1 and .7. I am not so much looking to raise the score to .9; rather, I need help producing a .7 consistently every time I refresh the demo page. Thank you for the help, though.
@ChrisYun As I mentioned, you shouldn't keep the same UA for back-to-back executions. You have to change it (maybe iterate through a list) to keep your score high.
Upon second inspection I found that even rotating user agents still does not prevent failing scores. Is there any known method of completely avoiding failing scores?
Additionally, even the first demo is starting to fail. I added an additional section that iterates through random user agents after each demo; however, I still fail sometimes.
The downvote placed on your answer is not mine; I simply removed the checkmark. Also, I thought that the first demo would always pass; however, with more testing the results seemed to fluctuate. While the user agent rotation does drastically reduce the chance of failure, it does not completely eliminate it, which is what I am trying to do.
0

Nobody except Google really knows how these are scored. But I think we can imagine some obvious factors:

  • residential / business ip vs datacenter

  • google / oauth cookies

  • obvious things like user-agent and browser fingerprinting.

HTH.

2 Comments

I appreciate the insight but do you have any ideas on a potential solution?
Hey Chris, did you ever find a solution? Please tag me if you do, ty.
0

If you can scrape the pages without JavaScript, then disabling JavaScript while you scrape might do the trick for you.
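One way this might be set up with Selenium and Chrome is through a content-settings preference. The sketch below is an assumption rather than something this answer specifies: the preference key and the value 2 (block) are a commonly used Chrome setting, and `disable_javascript` is a hypothetical helper that only requires `options` to expose `add_experimental_option()` as `webdriver.ChromeOptions` does.

```python
# Assumed Chrome preference for blocking JavaScript (2 = block); not from the answer.
JS_CONTENT_SETTING = "profile.managed_default_content_settings.javascript"


def disable_javascript(options):
    """Ask Chrome to block JavaScript via the "prefs" experimental option."""
    # Note: this replaces any "prefs" dictionary already set on the options object.
    options.add_experimental_option("prefs", {JS_CONTENT_SETTING: 2})
    return options
```

It could be applied as `options = disable_javascript(webdriver.ChromeOptions())` before creating the driver; it is only useful when the target pages serve the needed content without JavaScript.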

1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
