Using HTTPKerberosAuth with a JavaScript-enabled web scraper

I'm working on integration tests for a web application running in a Docker container in our GitLab CI/CD pipeline. The application is a frontend that requires Kerberos/SPNEGO authentication, followed by BAM (a token-based authentication step), before it can be accessed.

I've configured Kerberos authentication in my GitLab CI pipeline (krb5.conf, keytab files, etc.), and I can confirm that it works: when I use the requests library with requests-kerberos, I get successful responses with all the connection details and authentication tokens returned properly.

However, when I use Selenium WebDriver to access the same application with the same authentication setup, I consistently get a 401 Unauthorized error and an empty HTML document (bare tags, no content). This is frustrating because I need Selenium to handle the JavaScript-rendered content: the requests library alone won't work, since the data I need to scrape is loaded dynamically by JavaScript after the initial page load.

What's Working

When I use the requests library with Kerberos authentication, everything works. I make a request to the application URL and receive an HTTP 200 response. The HTML includes a BAM authentication form with hidden fields such as bamToken, challenge, and redirectURL. The form is designed to auto-submit and redirect the user to the actual application, and the authentication flow completes successfully.
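To see exactly what the auto-submit would send, the form's action and hidden fields can be pulled out of the requests response. Below is a minimal sketch using only the standard library's html.parser. `parse_hidden_form` is a hypothetical helper. It assumes the BAM fields are plain `<input type="hidden">` elements inside the first `<form>` on the page, since the real markup isn't shown here.

```python
from html.parser import HTMLParser


class HiddenFormParser(HTMLParser):
    """Collect the action and hidden <input> fields of the first <form> in a page.

    Hypothetical helper: assumes the BAM fields are hidden inputs in the first form.
    """

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._seen_form:
            # Only the first form on the page is considered.
            self._in_form = True
            self._seen_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form:
            input_type = (attrs.get("type") or "").lower()
            name = attrs.get("name")
            if input_type == "hidden" and name:
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def parse_hidden_form(html):
    """Return (action, {name: value}) for the first form in `html`."""
    parser = HiddenFormParser()
    parser.feed(html)
    parser.close()
    return parser.action, parser.fields
```

For example, `action, fields = parse_hidden_form(resp.text)` on the response from the Kerberos-authenticated requests call should list bamToken, challenge and redirectURL if the form is structured as described.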

I can see all the authentication headers in the response, the cookies are set correctly, and the server is clearly accepting my Kerberos credentials. The connection details confirm that the Kerberos configuration in my GitLab pipeline is set up properly and functioning as expected.

The problem occurs when I try to use Selenium WebDriver to access the same application. I've configured Chrome with all the necessary SPNEGO authentication arguments (auth-server-allowlist, auth-negotiate-delegate-allowlist, auth-schemes set to basic,digest,ntlm,negotiate), but when Selenium tries to navigate to the URL, it receives a 401 Unauthorized response and the page source is just empty HTML tags with no content whatsoever.
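When these switches are added through Selenium's `ChromeOptions.add_argument`, no shell is involved. Any quotes copied from a shell example (e.g. `--auth-server-allowlist="*.example.com"`) are therefore likely to become part of the value Chrome sees, which stops the pattern from matching. Below is a small sketch that builds the switches without quotes. `spnego_chrome_args` is a hypothetical helper, and `*.example.com` is a placeholder for the application's host.

```python
def spnego_chrome_args(host_patterns, delegate=True,
                       schemes=("basic", "digest", "ntlm", "negotiate")):
    """Build Chrome command-line switches that enable Negotiate/SPNEGO auth.

    Hypothetical helper. host_patterns, e.g. ["*.example.com"] (placeholder),
    are joined with commas and left unquoted, because add_argument() does not
    go through a shell.
    """
    allowlist = ",".join(host_patterns)
    args = [
        f"--auth-server-allowlist={allowlist}",
        "--auth-schemes=" + ",".join(schemes),
    ]
    if delegate:
        # Hosts to which Chrome may delegate the user's Kerberos credentials.
        args.append(f"--auth-negotiate-delegate-allowlist={allowlist}")
    return args
```

It would be used as `for arg in spnego_chrome_args(["*.example.com"]): options.add_argument(arg)` on a `webdriver.ChromeOptions()` instance.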

I've tried multiple approaches to solve this. First, I authenticated with requests and then transferred the cookies from that session to Selenium before navigating to the page. Even after all the cookies were successfully added to the Selenium driver, I still get the 401 error.
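For reference, a cookie transfer might look like the sketch below. requests' `RequestsCookieJar` subclasses `http.cookiejar.CookieJar`, so iterating it yields `Cookie` objects that can be mapped to the dicts Selenium's `driver.add_cookie()` expects. Keep two caveats in mind. First, Selenium only accepts cookies for the domain of the page currently loaded, so the driver normally has to visit a URL on that host first. Second, if the server issues a Negotiate challenge on every request instead of relying on a session cookie, transferring cookies alone may not avoid the 401. `jar_to_selenium_cookies` is a hypothetical helper.

```python
from http.cookiejar import CookieJar


def jar_to_selenium_cookies(jar: CookieJar) -> list:
    """Convert cookies from a CookieJar (e.g. requests.Session().cookies)
    into dicts in the shape Selenium's driver.add_cookie() accepts.

    Hypothetical helper, not part of requests or Selenium.
    """
    cookies = []
    for c in jar:
        cookie = {
            "name": c.name,
            "value": c.value if c.value is not None else "",
            "path": c.path or "/",
            "secure": bool(c.secure),
        }
        if c.domain:
            cookie["domain"] = c.domain
        if c.expires is not None:
            # Selenium expects the expiry as an integer Unix timestamp.
            cookie["expiry"] = int(c.expires)
        # HttpOnly is stored as a non-standard attribute; its key case can vary.
        if c.has_nonstandard_attr("HttpOnly") or c.has_nonstandard_attr("httponly"):
            cookie["httpOnly"] = True
        cookies.append(cookie)
    return cookies
```

Typical use: `driver.get(app_url)`, then `driver.add_cookie(c)` for each dict, then `driver.get(app_url)` again.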

I've also tried extracting the BAM form data from the successful requests response, submitting it programmatically, and then transferring the resulting post-authentication cookies to Selenium. Still no luck: the same 401 error and empty HTML.
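Replaying the auto-submit outside the browser means doing what the browser does: resolve the (possibly relative) form action against the page URL and form-encode the hidden fields. Below is a stdlib sketch. `build_form_submission` is a hypothetical helper, and the URLs and values in the usage are made up. Compare the hostname it returns with the host Selenium navigates to, because cookies issued by the BAM endpoint are only usable by Selenium on a matching domain.

```python
from urllib.parse import urlencode, urljoin, urlsplit


def build_form_submission(page_url, action, fields):
    """Mimic a browser's POST of an HTML form (hypothetical helper).

    Returns (absolute target URL, urlencoded body, target hostname).
    A missing or empty action posts back to the page URL itself.
    """
    target = urljoin(page_url, action) if action else page_url
    body = urlencode(fields)  # application/x-www-form-urlencoded, spaces as '+'
    return target, body, urlsplit(target).hostname
```

With requests, the same POST could then be sent as `session.post(target, data=fields)` on the Kerberos-authenticated session.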

I've experimented with various Chrome arguments and options, including disabling automation detection features, setting custom user agents to match what requests uses, and even trying different authentication schemes. I've also attempted to set Chrome enterprise policies for automatic authentication. None of these approaches have resolved the issue.
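For the enterprise-policy route, Chrome on Linux reads JSON files from a managed-policy directory. That is `/etc/opt/chrome/policies/managed` for Google Chrome and `/etc/chromium/policies/managed` for Chromium; which one applies depends on the browser build in the container. Below is a minimal sketch that writes the Kerberos-related policies (`AuthServerAllowlist`, `AuthNegotiateDelegateAllowlist`, `AuthSchemes`). The helpers, host pattern and file name are hypothetical placeholders, and writing under `/etc` usually needs root in the image.

```python
import json
from pathlib import Path

# Managed-policy directories read by Chrome / Chromium on Linux.
CHROME_POLICY_DIR = Path("/etc/opt/chrome/policies/managed")
CHROMIUM_POLICY_DIR = Path("/etc/chromium/policies/managed")


def kerberos_policy(host_patterns, delegate=True):
    """Return a policy dict enabling Negotiate auth for the given host patterns."""
    allowlist = ",".join(host_patterns)
    policy = {
        "AuthServerAllowlist": allowlist,
        "AuthSchemes": "basic,digest,ntlm,negotiate",
    }
    if delegate:
        policy["AuthNegotiateDelegateAllowlist"] = allowlist
    return policy


def write_policy(policy, directory=CHROME_POLICY_DIR, filename="kerberos.json"):
    """Write `policy` as JSON into `directory` (hypothetical file name) and return the path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / filename
    path.write_text(json.dumps(policy, indent=2))
    return path
```

Opening `chrome://policy` in the same browser (e.g. via `driver.get`) is a reasonable way to check whether the policies were actually picked up.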

It seems that Selenium's Chrome isn't presenting the Kerberos credentials to the server, even though those credentials are available in the environment and work for requests. I need Selenium because the page content is loaded dynamically via JavaScript, so I can't just parse the HTML from requests. Is there any way to bridge this gap, either by getting Selenium to authenticate properly or by somehow rendering the JavaScript from a successfully authenticated requests response?

asked Oct 30 at 17:47

ben green

You might try using the CDP method for this: selenium.dev/documentation/webdriver/bidi/cdp/network

browsermator

2025-11-07 17:28:37 +00:00
Commented Nov 7 at 17:28
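The CDP suggestion above could, in principle, be used to attach an `Authorization: Negotiate` header to every request the page makes, via `Network.setExtraHTTPHeaders`. This is an unverified workaround. It assumes you can obtain a SPNEGO token for the application's HTTP service principal by some other means (e.g. a GSSAPI library, not shown). Kerberos servers may also reject a token that is replayed across requests. The sketch below only builds the CDP parameters; `negotiate_header_params` is a hypothetical helper.

```python
import base64


def negotiate_header_params(spnego_token: bytes) -> dict:
    """Build params for the CDP command Network.setExtraHTTPHeaders.

    Hypothetical helper. spnego_token holds raw SPNEGO/GSSAPI token bytes,
    assumed to be obtained elsewhere.
    """
    value = "Negotiate " + base64.b64encode(spnego_token).decode("ascii")
    return {"headers": {"Authorization": value}}
```

With a Chromium-based driver, this would be applied as `driver.execute_cdp_cmd("Network.enable", {})` followed by `driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", negotiate_header_params(token))` before calling `driver.get(...)`.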

1 Answer

I haven't tested this, but give this a try using Chrome:

Use kinit to get a ticket:

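    # Set these first: with `VAR=x kinit "${VAR}"` on one line, "${VAR}" expands before VAR is assigned
    KEYTAB_FILE_PATH="/path/to/my.keytab"
    KERBEROS_PRINCIPAL_NAME="HTTP/host.example.com@EXAMPLE.COM"  # placeholder principal
    kinit -kt "${KEYTAB_FILE_PATH}" "${KERBEROS_PRINCIPAL_NAME}"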

Then set up a driver so that the chromedriver process is launched with the current environment variables (and can therefore find the cached ticket):

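    import os
    from selenium import webdriver

    # Launch chromedriver (and thus Chrome) with the current environment,
    # e.g. KRB5CCNAME / KRB5_CONFIG pointing at the ticket cache and config
    service = webdriver.ChromeService(env=os.environ)
    driver = webdriver.Chrome(service=service)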
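Here is a variation on this answer, sketched under stated assumptions. Run an MIT-style `kinit` with `-c` so the ticket goes to an explicit cache, and hand the same `KRB5CCNAME` (and optionally `KRB5_CONFIG`) to chromedriver, so both processes use the same cache. `kinit_command` and `kerberos_env` are hypothetical helpers, and the paths and principal are placeholders.

```python
import os


def kinit_command(keytab, principal, ccache):
    """Build a kinit argv that writes the ticket to an explicit cache (hypothetical helper)."""
    return ["kinit", "-kt", keytab, "-c", ccache, principal]


def kerberos_env(ccache, krb5_conf=None, base=None):
    """Copy the environment and point Kerberos libraries at the given cache/config.

    ccache: e.g. "FILE:/tmp/krb5cc_selenium" (placeholder path).
    """
    env = dict(os.environ if base is None else base)
    env["KRB5CCNAME"] = ccache
    if krb5_conf is not None:
        env["KRB5_CONFIG"] = krb5_conf
    return env
```

For example: `env = kerberos_env("FILE:/tmp/krb5cc_selenium")`, then `subprocess.run(kinit_command(keytab, principal, "FILE:/tmp/krb5cc_selenium"), env=env, check=True)`, and finally `webdriver.Chrome(service=webdriver.ChromeService(env=env))`. Running `klist` with the same `env` is a quick way to confirm the ticket landed in that cache.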

answered Oct 31 at 14:30

Corey Goldberg
