Hoping an expert can help me with a Selenium/Cloudflare mystery. I can get a website to load in normal (non-headless) Selenium, but no matter what I try, I can't get it to load in headless.
I have followed the suggestions from StackOverflow posts such as "Is there a version of Selenium WebDriver that is not detectable?". I've also compared all the properties of the window and window.navigator objects and fixed every difference between headless and non-headless, but somehow headless is still being detected. At this point I am extremely curious how Cloudflare could possibly tell the difference. Thank you for your time!
List of the things I have tried:
- User-agent
- Replace cdc_ with another string in chromedriver
- options.add_experimental_option("excludeSwitches", ["enable-automation"])
- options.add_experimental_option('useAutomationExtension', False)
- options.add_argument('--disable-blink-features=AutomationControlled') (this was necessary to get the website to load in non-headless)
- Set navigator.webdriver = undefined
- Set navigator.plugins, navigator.languages, and navigator.mimeTypes
- Set window.screenY, window.screenTop, window.outerWidth, window.outerHeight to be nonzero
- Set window.chrome and window.navigator.chrome
- Set width and height of images to be nonzero
- Set WebGL parameters
- Fix Modernizr
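For reference, the headless vs. non-headless comparison mentioned above can be sketched roughly as follows. This is only an illustration, not the exact procedure used: `SNAPSHOT_JS`, `diff_snapshots`, and the sample values are hypothetical names and made-up data, and the snapshot script only captures primitive-valued `navigator` properties.

```python
# JavaScript meant to be passed to browser.execute_script() in each mode.
# It copies the primitive-valued properties of navigator into a plain object.
SNAPSHOT_JS = """
const out = {};
for (const key in navigator) {
    const value = navigator[key];
    if (['string', 'number', 'boolean'].includes(typeof value)) {
        out[key] = value;
    }
}
return out;
"""

MISSING = "<missing>"  # placeholder for a key present in only one snapshot


def diff_snapshots(normal, headless):
    """Return {key: (normal_value, headless_value)} for every mismatch."""
    diffs = {}
    for key in sorted(set(normal) | set(headless)):
        normal_value = normal.get(key, MISSING)
        headless_value = headless.get(key, MISSING)
        if normal_value != headless_value:
            diffs[key] = (normal_value, headless_value)
    return diffs


# Made-up snapshots, for illustration only (not real measurements).
example_normal = {"webdriver": False, "platform": "MacIntel"}
example_headless = {"webdriver": True, "platform": "MacIntel"}
for key, (a, b) in diff_snapshots(example_normal, example_headless).items():
    print(f"{key}: normal={a!r} headless={b!r}")
```

In a real run, each snapshot would come from `browser.execute_script(SNAPSHOT_JS)` in a non-headless and a headless session respectively.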
Replicating the experiment
To get the website to load in normal (non-headless) Selenium, you have to follow a _blank link from another page (so that the target website opens in a new tab). To replicate the experiment, first create an HTML file with the content <a href="https://poocoin.app" target="_blank">link</a>, and then paste the path to this HTML file into the following code.
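If it is more convenient, the launcher page can also be generated from Python. This is a small sketch under assumptions: the helper name `write_launcher_page` and the temporary directory are hypothetical choices; only the link markup and the `url_page.html` file name come from this post.

```python
from pathlib import Path
import tempfile

TARGET_URL = "https://poocoin.app"


def write_launcher_page(directory, target_url=TARGET_URL):
    """Write the one-link HTML page and return its file:// URL."""
    page = Path(directory) / "url_page.html"
    page.write_text(
        f'<a href="{target_url}" target="_blank">link</a>',
        encoding="utf-8",
    )
    # as_uri() needs an absolute path, hence resolve()
    return page.resolve().as_uri()


# Write the page into a fresh temporary directory and show its URL.
launcher_url = write_launcher_page(tempfile.mkdtemp())
print(launcher_url)
```

The printed URL can then be used as the value of FULL_PATH_TO_HTML_FILE.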
The version below (non-headless) runs fine and loads the website, but if you set options.headless = True, it will get stuck on Cloudflare.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import time

    # Replace this with the path to your html file
    FULL_PATH_TO_HTML_FILE = 'file:///Users/simplepineapple/html/url_page.html'


    def visit_website(browser):
        browser.get(FULL_PATH_TO_HTML_FILE)
        time.sleep(3)

        # Click the target="_blank" link so the site opens in a new tab
        links = browser.find_elements(By.XPATH, "//a[@href]")
        links[0].click()
        time.sleep(10)

        # Switch webdriver focus to new tab so that we can extract html
        tab_names = browser.window_handles
        if len(tab_names) > 1:
            browser.switch_to.window(tab_names[1])

        time.sleep(1)
        html = browser.page_source
        print(html)
        print()
        print()

        # 'Charts' only appears once the real page (not the Cloudflare page) has loaded
        if 'Charts' in html:
            print('Success')
        else:
            print('Fail')

        time.sleep(10)


    options = webdriver.ChromeOptions()
    # If options.headless = True, the website will not load
    options.headless = False
    options.add_argument("--window-size=1920,1080")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36')

    browser = webdriver.Chrome(options = options)

    # Inject the patches below into every new document before its own scripts run
    browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
        "source": '''
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
            Object.defineProperty(navigator, 'plugins', {
                get: function() { return {"0":{"0":{}},"1":{"0":{}},"2":{"0":{},"1":{}}}; }
            });
            Object.defineProperty(navigator, 'languages', {
                get: () => ["en-US", "en"]
            });
            Object.defineProperty(navigator, 'mimeTypes', {
                get: function() { return {"0":{},"1":{},"2":{},"3":{}}; }
            });

            window.screenY=23;
            window.screenTop=23;
            window.outerWidth=1337;
            window.outerHeight=825;

            window.chrome = {
                app: {
                    isInstalled: false,
                },
                webstore: {
                    onInstallStageChanged: {},
                    onDownloadProgress: {},
                },
                runtime: {
                    PlatformOs: {
                        MAC: 'mac',
                        WIN: 'win',
                        ANDROID: 'android',
                        CROS: 'cros',
                        LINUX: 'linux',
                        OPENBSD: 'openbsd',
                    },
                    PlatformArch: {
                        ARM: 'arm',
                        X86_32: 'x86-32',
                        X86_64: 'x86-64',
                    },
                    PlatformNaclArch: {
                        ARM: 'arm',
                        X86_32: 'x86-32',
                        X86_64: 'x86-64',
                    },
                    RequestUpdateCheckStatus: {
                        THROTTLED: 'throttled',
                        NO_UPDATE: 'no_update',
                        UPDATE_AVAILABLE: 'update_available',
                    },
                    OnInstalledReason: {
                        INSTALL: 'install',
                        UPDATE: 'update',
                        CHROME_UPDATE: 'chrome_update',
                        SHARED_MODULE_UPDATE: 'shared_module_update',
                    },
                    OnRestartRequiredReason: {
                        APP_UPDATE: 'app_update',
                        OS_UPDATE: 'os_update',
                        PERIODIC: 'periodic',
                    },
                },
            };

            window.navigator.chrome = {
                app: {
                    isInstalled: false,
                },
                webstore: {
                    onInstallStageChanged: {},
                    onDownloadProgress: {},
                },
                runtime: {
                    PlatformOs: {
                        MAC: 'mac',
                        WIN: 'win',
                        ANDROID: 'android',
                        CROS: 'cros',
                        LINUX: 'linux',
                        OPENBSD: 'openbsd',
                    },
                    PlatformArch: {
                        ARM: 'arm',
                        X86_32: 'x86-32',
                        X86_64: 'x86-64',
                    },
                    PlatformNaclArch: {
                        ARM: 'arm',
                        X86_32: 'x86-32',
                        X86_64: 'x86-64',
                    },
                    RequestUpdateCheckStatus: {
                        THROTTLED: 'throttled',
                        NO_UPDATE: 'no_update',
                        UPDATE_AVAILABLE: 'update_available',
                    },
                    OnInstalledReason: {
                        INSTALL: 'install',
                        UPDATE: 'update',
                        CHROME_UPDATE: 'chrome_update',
                        SHARED_MODULE_UPDATE: 'shared_module_update',
                    },
                    OnRestartRequiredReason: {
                        APP_UPDATE: 'app_update',
                        OS_UPDATE: 'os_update',
                        PERIODIC: 'periodic',
                    },
                },
            };

            ['height', 'width'].forEach(property => {
                const imageDescriptor = Object.getOwnPropertyDescriptor(HTMLImageElement.prototype, property);

                // redefine the property with a patched descriptor
                Object.defineProperty(HTMLImageElement.prototype, property, {
                    ...imageDescriptor,
                    get: function() {
                        // return an arbitrary non-zero dimension if the image failed to load
                        if (this.complete && this.naturalHeight == 0) {
                            return 20;
                        }
                        return imageDescriptor.get.apply(this);
                    },
                });
            });

            // Keep a reference to the original method (it lives on the prototype)
            const getParameter = WebGLRenderingContext.prototype.getParameter;
            WebGLRenderingContext.prototype.getParameter = function(parameter) {
                // 37445 is UNMASKED_VENDOR_WEBGL
                if (parameter === 37445) {
                    return 'Intel Open Source Technology Center';
                }
                // 37446 is UNMASKED_RENDERER_WEBGL
                if (parameter === 37446) {
                    return 'Mesa DRI Intel(R) Ivybridge Mobile ';
                }

                // Call the original with the correct receiver
                return getParameter.call(this, parameter);
            };

            const elementDescriptor = Object.getOwnPropertyDescriptor(HTMLElement.prototype, 'offsetHeight');
            Object.defineProperty(HTMLDivElement.prototype, 'offsetHeight', {
                ...elementDescriptor,
                get: function() {
                    if (this.id === 'modernizr') {
                        return 1;
                    }
                    return elementDescriptor.get.apply(this);
                },
            });
        '''
    })

    visit_website(browser)

    browser.quit()
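
The fixed time.sleep() calls in the script make each run slow and can also cut off a slow Cloudflare check. A polling helper is one possible alternative. This is a minimal sketch, not part of the original script: `wait_until` and its `timeout`/`interval` parameters are hypothetical names.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

Inside visit_website, this could replace the sleeps with calls such as `wait_until(lambda: len(browser.window_handles) > 1)` before switching tabs, and `wait_until(lambda: 'Charts' in browser.page_source, timeout=20)` before printing the result. Selenium's own WebDriverWait with expected_conditions.number_of_windows_to_be(2) should cover the first case as well.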