47

I'm curious how reCAPTCHA v3 works. Specifically the browser fingerprinting.

When I launch an instance of Chrome through Selenium/chromedriver and test against the reCAPTCHA v3 demo (https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php), I always get a score of 0.1.

When using incognito with a normal instance, I get 0.3.

I've beaten other detection systems by injecting JavaScript to modify the webdriver object, and by recompiling webdriver from source with modified $cdc_ variables.

I can see what looks like some obfuscated POST back to the server, so I'm going to start digging there.

What might it be looking for to determine if I'm running Selenium/chromedriver?

3
  • this question is often asked on the Internet... Commented Apr 3, 2019 at 18:14
  • 2
    While this may be an interesting question, it's not a programming question and doesn't fit on SO. Commented Apr 3, 2019 at 20:44
  • 2
    For me reCaptcha v3 does not detect Selenium (Firefox IDE) as a bot and returns a score of 0.9. Commented Jan 5, 2022 at 20:00

3 Answers

55

reCaptcha

Websites can easily inspect network traffic and identify your program as a bot. Google has released five reCAPTCHA types to choose from when creating a new site. Four of them are active, and reCAPTCHA v1 has been shut down.


reCAPTCHA versions and types

  • reCAPTCHA v3 (verify requests with a score): reCAPTCHA v3 allows you to verify if an interaction is legitimate without any user interaction. It is a pure JavaScript API returning a score, giving you the ability to take action in the context of your site: for instance requiring additional factors of authentication, sending a post to moderation, or throttling bots that may be scraping content.
  • reCAPTCHA v2 - "I'm not a robot" Checkbox: The "I'm not a robot" Checkbox requires the user to click a checkbox indicating the user is not a robot. This will either pass the user immediately (with No CAPTCHA) or challenge them to validate whether or not they are human. This is the simplest option to integrate with and only requires two lines of HTML to render the checkbox.

  • reCAPTCHA v2 - Invisible reCAPTCHA badge: The invisible reCAPTCHA badge does not require the user to click on a checkbox, instead it is invoked directly when the user clicks on an existing button on your site or can be invoked via a JavaScript API call. The integration requires a JavaScript callback when reCAPTCHA verification is complete. By default only the most suspicious traffic will be prompted to solve a captcha. To alter this behavior edit your site security preference under advanced settings.

  • reCAPTCHA v2 - Android: The reCAPTCHA Android library is part of the Google Play services SafetyNet APIs. This library provides native Android APIs that you can integrate directly into an app. You should set up Google Play services in your app and connect to the GoogleApiClient before invoking the reCAPTCHA API. This will either pass the user through immediately (without a CAPTCHA prompt) or challenge them to validate whether they are human.
  • reCAPTCHA v1: reCAPTCHA v1 has been shut down since March 2018.
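To make the v3 bullet above concrete, here is a minimal sketch of how a site might turn a score into an action. It assumes `verification` is the parsed JSON that the server received from reCAPTCHA's verification endpoint (with `success`, `score`, and `action` fields). The function name `decideAction` and the thresholds are hypothetical and are not values recommended by Google.

```javascript
// Hypothetical thresholds; a real site would tune these per action.
const THRESHOLDS = { allow: 0.7, challenge: 0.3 };

// `verification` is assumed to be the parsed verification response,
// e.g. { success: true, score: 0.9, action: "login" }.
function decideAction(verification, expectedAction) {
  // Reject failed verifications and tokens issued for a different action.
  if (!verification || verification.success !== true) return "block";
  if (verification.action !== expectedAction) return "block";

  const score = Number(verification.score);
  if (Number.isNaN(score)) return "block";

  if (score >= THRESHOLDS.allow) return "allow";
  // Middle band: ask for an extra factor or send the post to moderation.
  if (score >= THRESHOLDS.challenge) return "challenge";
  return "block";
}
```

With these illustrative thresholds, the 0.1 reported for Selenium in the question would be blocked, while the 0.3 from an incognito window would trigger an additional check.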

Solution

However there are some generic approaches to avoid getting detected while web-scraping:


Outro

Some food for thought:

3 Comments

Changing the user agent leaves hCaptcha unresolved; they return 403. Any new solution?
1st solution - vary window size?
Does this answer answer the question "I'm curious how reCAPTCHA v3 works"?
10

Selenium and Puppeteer have some browser configurations that differ from those of a non-automated browser. Also, since some JavaScript functions are injected into the browser to manipulate elements, you need to create some overrides to avoid detection.
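As a rough illustration of what such an override looks like, and of the trace it can leave behind, the sketch below uses mock objects so that it runs in Node. It assumes that, as in current Chrome, `webdriver` is an accessor defined on the navigator's prototype rather than on the instance. `fakeNavigator` and `hasShadowedWebdriver` are hypothetical names.

```javascript
// Mock of the browser's Navigator.prototype, where `webdriver` is assumed
// to live as an accessor. An automated browser reports true here.
const fakeNavigatorProto = {
  get webdriver() {
    return true;
  },
};
const fakeNavigator = Object.create(fakeNavigatorProto);

// A typical override: shadow the prototype getter with an own property.
Object.defineProperty(fakeNavigator, "webdriver", { get: () => undefined });

// Side effect a detector can look for: an untouched navigator has no own
// `webdriver` property, because the accessor sits on the prototype.
function hasShadowedWebdriver(nav) {
  return Object.getOwnPropertyDescriptor(nav, "webdriver") !== undefined;
}
```

The override hides the flag's value but introduces a new difference of its own, which is why this tends to become the back-and-forth described in the articles below.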

There are some good articles explaining some points about Selenium and Puppeteer detection while it runs on a site with detection mechanisms:

Detecting Chrome headless, new techniques - You can use it to write defensive code for your bot.

It is not possible to detect and block Google Chrome headless - it explains in a clear and sound way the differences that JavaScript code can detect between a browser launched by automated software and a real one, and also how to fake it.

GitHub - headless-cat-n-mouse - Example using Puppeteer + Python to avoid detection

1

I'm one of the authors mentioned in the answers of this thread. As discussed, reCAPTCHA v3, and anti-bot detection in general, tend to rely on browser fingerprinting challenges to detect side effects linked to headless browsers and automated browsers.

There exist different tests to detect Selenium. Since they tend to evolve frequently, I wrote an updated article to explain how (headless) Chrome instrumented with Selenium, even when modified, can be detected as of June 2024.

Testing for the presence of the HeadlessChrome substring in the user agent and verifying the value of navigator.webdriver is still helpful against bots that don't modify their fingerprint too much.
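These two basic checks can be sketched as follows. The function names are hypothetical, and `nav` stands for any navigator-like object (in a browser you would pass `window.navigator`), which keeps the sketch runnable outside a browser.

```javascript
// Collect the two basic signals from a navigator-like object.
function basicBotSignals(nav) {
  const userAgent = String((nav && nav.userAgent) || "");
  return {
    // Headless Chrome announces itself in the default user agent string.
    headlessUserAgent: userAgent.includes("HeadlessChrome"),
    // navigator.webdriver is true when the browser is under automation.
    webdriverFlag: Boolean(nav && nav.webdriver === true),
  };
}

// Either signal alone is enough to flag an unmodified automated browser.
function looksAutomated(nav) {
  const signals = basicBotSignals(nav);
  return signals.headlessUserAgent || signals.webdriverFlag;
}
```

Both values are easy for a bot to spoof, so a check like this only catches automation that leaves its defaults in place.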

Otherwise, there is a new detection technique that aims to detect the CDP (Chrome DevTools Protocol) automation used by instrumentation frameworks like Selenium.

The new test looks as follows:

    var cdpDetected = false;
    var e = new Error();
    Object.defineProperty(e, 'stack', {
      get() {
        cdpDetected = true;
      }
    });
    // This is part of the detection, the console.log shouldn't be removed!
    console.log(e);
    if (cdpDetected) {
      isBot = true;
    }
