
There is a chat website where you click "connect as guest", enter a username, complete a ReCaptcha v2, click "CONNECT", and you're in.

If I do this in my regular browser, it works normally. If I do it in the chromedriver browser controlled by Selenium, I get an error. (The error is "please enter the captcha", but the message itself is irrelevant. The server is clearly detecting me, because a special response from the server triggers this error.)

Important: in both cases (my browser and the chromedriver browser), I'm doing everything 100% manually! I only use Selenium to launch the browser and then proceed by hand from there. I even tried adding an option so that the chromedriver browser uses my actual browser profile. It loads with my history, cookies, everything. But when I try to enter the chat room, I get the error.

I also looked online and found people claiming that websites can detect Selenium by checking for a specific JavaScript variable containing "cdc_". I edited the chromedriver binary in a hex editor, renamed the variable as instructed online, and tried again: same result. I spent hours trying to figure this out...
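To check whether a hex edit like this actually took effect, one option is to scan the binary for the marker. This is a minimal sketch, not a detection bypass: the function names `find_marker_offsets` and `scan_file` are hypothetical, and the marker `cdc_` is taken only from the claims mentioned above.

```python
from pathlib import Path
from typing import List


def find_marker_offsets(data: bytes, marker: bytes = b"cdc_") -> List[int]:
    """Return every byte offset at which `marker` occurs in `data`."""
    offsets = []
    start = 0
    while True:
        idx = data.find(marker, start)
        if idx == -1:
            return offsets
        offsets.append(idx)
        start = idx + 1  # keep searching after this hit


def scan_file(path: str, marker: bytes = b"cdc_") -> List[int]:
    """Read a binary (for example a chromedriver executable) and scan it."""
    return find_marker_offsets(Path(path).read_bytes(), marker)
```

Calling `scan_file` on the patched chromedriver and getting an empty list would suggest the rename was applied everywhere; it says nothing about other detection methods.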

One interesting thing might help find the problem: if my regular browser is already open and I run the Python script with chromedriver using my profile, the chromedriver browser starts and the Python code raises an error saying the profile is already in use (but the browser stays open). If I then try to access the chat room with this chromedriver browser, it works.

EDIT: I compared the requests in Fiddler for both cases and the headers are 100% identical! And I mean 100%! Even the sessionid, PHPSESSID and cfuid are the same, since both cases use the same profile.

The only thing that changes is the POST request data, specifically the captcha response (because it's a different one) and another variable, s. This variable s is somehow calculated by a strange JavaScript file called challenge. I'm not sure what it does or how it works.
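A comparison like this can be automated by diffing the two captured request bodies field by field. The sketch below assumes the bodies are URL-encoded form data; the helper name `diff_form_bodies`, the sample values, and the field name `g-recaptcha-response` (the usual reCAPTCHA v2 form field) are assumptions, and only the field `s` comes from the captures described above.

```python
from urllib.parse import parse_qs


def diff_form_bodies(body_a: str, body_b: str) -> dict:
    """Return {field: (values_in_a, values_in_b)} for fields that differ."""
    a = parse_qs(body_a, keep_blank_values=True)
    b = parse_qs(body_b, keep_blank_values=True)
    diff = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            diff[key] = (a.get(key), b.get(key))
    return diff


# Made-up bodies standing in for the two Fiddler captures.
browser_body = "username=guest&g-recaptcha-response=AAA&s=111"
driver_body = "username=guest&g-recaptcha-response=BBB&s=222"

for field, (left, right) in diff_form_bodies(browser_body, driver_body).items():
    print(field, left, right)
```

A field missing from one capture shows up with `None` on that side, which makes it easy to spot a value the page never computed.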

EDIT: SOLVED. I fixed this by adding an option:

options.add_argument("--disable-blink-features=AutomationControlled")
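For context, here is a rough sketch of where that option fits in a full launch script. The helper names `build_chrome_arguments` and `launch_chrome` are hypothetical, the `--user-data-dir` handling is an assumption about how the profile was loaded, and the Selenium import is kept inside the function so the argument list can be inspected without Selenium installed.

```python
def build_chrome_arguments(user_data_dir=None):
    """Collect the Chrome command-line switches to pass to ChromeOptions."""
    # The switch from the fix above.
    args = ["--disable-blink-features=AutomationControlled"]
    if user_data_dir:
        # Optional: reuse an existing Chrome profile (assumed setup).
        args.append(f"--user-data-dir={user_data_dir}")
    return args


def launch_chrome(user_data_dir=None):
    """Start Chrome through Selenium with the switches above applied."""
    from selenium import webdriver  # imported lazily; requires selenium

    options = webdriver.ChromeOptions()
    for arg in build_chrome_arguments(user_data_dir):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

After `driver = launch_chrome()`, evaluating `driver.execute_script("return navigator.webdriver")` is one way to see whether the switch changed what the page can observe; the exact value may depend on the Chrome version.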

  • 1
    This is the topic of an ongoing arms race. Even if we knew which tactic was used in your specific instance, if it were publicly disclosed the folks behind recaptcha would find another one and repeat -- so any knowledge base entry on the topic wouldn't stay useful for long. Commented Sep 13, 2020 at 18:48
  • 1
    Ways to try to detect headless browsers include enumerating fonts, inspecting screen resolution and DPI, looking at available extensions and their behavior (including things like WebGL), etc. Commented Sep 13, 2020 at 18:50
  • 1
    The point stands: ongoing arms race. We prefer questions whose answers are likely to remain accurate over time. Commented Sep 13, 2020 at 18:51
  • 2
    There are lots of companies offering services to protect websites from bots (Selenium, puppeteer, requests, ...) - they won't tell you how to bypass their services or what is going on behind the scenes. Commented Sep 13, 2020 at 18:55
  • 4
    Whether you're using a headless browser is not particularly pertinent. The point, repeating myself once more, is that you're asking us to get involved in an arms race. As Maurice says, there are folks actively researching detection methods, and any detection mechanism for which an evasion method becomes public is likely to change to defeat that evasion. Commented Sep 13, 2020 at 18:58
