6

I'm using Apache HttpClient in a web crawler that is only for crawling public data.

I'd like it to be able to crawl sites with invalid certificates, no matter how invalid.

My crawler won't be passing in any usernames, passwords, etc and no sensitive data is being sent or received.

For this use case, I'd crawl the http version of a site if it exists, but sometimes it doesn't of course.

How can this be done with Apache's HttpClient?

I tried a few suggestions like this one, but they still fail for some invalid certs, for example:

    failed for url:https://dh480.badssl.com/, reason:java.lang.RuntimeException: Could not generate DH keypair
    failed for url:https://null.badssl.com/, reason:Received fatal alert: handshake_failure
    failed for url:https://rc4-md5.badssl.com/, reason:Received fatal alert: handshake_failure
    failed for url:https://rc4.badssl.com/, reason:Received fatal alert: handshake_failure
    failed for url:https://superfish.badssl.com/, reason:Connection reset

Note that I've tried this with my $JAVA_HOME/jre/lib/security/java.security file's jdk.tls.disabledAlgorithms set to nothing, to ensure this wasn't an issue, and I still get failures like the above.
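To confirm what the running JVM actually sees, rather than relying on the file edit alone, the property can be read back at runtime. This is a minimal sketch using only `java.security.Security.getProperty`; the class name `DisabledAlgorithmsCheck` and the `parse` helper are made up for illustration.

```java
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints the effective jdk.tls.disabledAlgorithms entries.
final class DisabledAlgorithmsCheck {

    /** Splits a comma-separated security property value into trimmed, non-empty entries. */
    static List<String> parse(String value) {
        List<String> entries = new ArrayList<>();
        if (value == null) {
            return entries;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Reads the value in effect for this JVM, which may differ from the file
        // that was edited if another JRE or an override file is in use.
        List<String> disabled = parse(Security.getProperty("jdk.tls.disabledAlgorithms"));
        System.out.println("jdk.tls.disabledAlgorithms = " + disabled);
    }
}
```

If the printed list is not empty, the crawler is probably running on a different JRE than the one whose `java.security` file was changed.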

2
  • 1
    It's impossible to communicate with servers that fail to do the DH key exchange or reset your connection when you connect. You can't change that client side. Commented Nov 15, 2018 at 14:19
  • Have you tried this example for Apache's HttpClient? stackoverflow.com/a/50274496/3523579 Commented Nov 23, 2018 at 17:24

5 Answers 5

6
+200

The short answer to your question, which is to specifically trust all certs, would be to use the TrustAllStrategy and do something like this:

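    import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
    import org.apache.http.conn.ssl.TrustAllStrategy;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.ssl.SSLContextBuilder;

    // Trust every certificate chain, whatever is wrong with it
    SSLContextBuilder sslContextBuilder = new SSLContextBuilder();
    sslContextBuilder.loadTrustMaterial(null, new TrustAllStrategy());
    SSLConnectionSocketFactory socketFactory =
            new SSLConnectionSocketFactory(sslContextBuilder.build());
    CloseableHttpClient httpclient =
            HttpClients.custom().setSSLSocketFactory(socketFactory).build();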

However, an invalid cert may not be your main issue. A handshake_failure can occur for a number of reasons, but in my experience it's usually due to an SSL/TLS version mismatch or a cipher suite negotiation failure. This doesn't mean the SSL cert is "bad"; it's just a mismatch between what the server and the client support. You can see exactly where the handshake is failing using a tool like Wireshark.

While Wireshark can be great to see where it's failing, it won't help you come up with a solution. Whenever I've gone about debugging handshake_failures in the past I've found this tool particularly helpful: https://testssl.sh/

You can point that script at any of your failing websites to learn more about what protocols are available on that target and what your client needs to support in order to establish a successful handshake. It will also print information about the certificate.

For example (showing only a few sections of the output of testssl.sh):

    ./testssl.sh www.google.com
    ....
     Testing protocols (via sockets except TLS 1.2, SPDY+HTTP2)

     SSLv2      not offered (OK)
     SSLv3      not offered (OK)
     TLS 1      offered
     TLS 1.1    offered
     TLS 1.2    offered (OK)
    ....
     Server Certificate #1
       Signature Algorithm          SHA256 with RSA
       Server key size              RSA 2048 bits
       Common Name (CN)             "www.google.com"
       subjectAltName (SAN)         "www.google.com"
       Issuer                       "Google Internet Authority G3" ("Google Trust Services" from "US")
       Trust (hostname)             Ok via SAN and CN (works w/o SNI)
       Chain of trust               "/etc/*.pem" cannot be found / not readable
       Certificate Expiration       expires < 60 days (58) (2018-10-30 06:14 --> 2019-01-22 06:14 -0700)
    ....
     Testing all 102 locally available ciphers against the server, ordered by encryption strength
     (Your /usr/bin/openssl cannot show DH/ECDH bits)

    Hexcode  Cipher Suite Name (OpenSSL)       KeyExch.   Encryption  Bits
    ------------------------------------------------------------------------
    xc030    ECDHE-RSA-AES256-GCM-SHA384       ECDH       AESGCM      256
    xc02c    ECDHE-ECDSA-AES256-GCM-SHA384     ECDH       AESGCM      256
    xc014    ECDHE-RSA-AES256-SHA              ECDH       AES         256
    xc00a    ECDHE-ECDSA-AES256-SHA            ECDH       AES         256
    x9d      AES256-GCM-SHA384                 RSA        AESGCM      256
    x35      AES256-SHA                        RSA        AES         256
    xc02f    ECDHE-RSA-AES128-GCM-SHA256       ECDH       AESGCM      128
    xc02b    ECDHE-ECDSA-AES128-GCM-SHA256     ECDH       AESGCM      128
    xc013    ECDHE-RSA-AES128-SHA              ECDH       AES         128
    xc009    ECDHE-ECDSA-AES128-SHA            ECDH       AES         128
    x9c      AES128-GCM-SHA256                 RSA        AESGCM      128
    x2f      AES128-SHA                        RSA        AES         128
    x0a      DES-CBC3-SHA                      RSA        3DES        168

Using this output, we can see that if your client only supported SSLv3, the handshake would fail because the server doesn't offer that protocol. The protocol offering is unlikely to be the problem, but you can double-check what your Java client supports by getting the list of enabled protocols. You can subclass the SSLConnectionSocketFactory from the code snippet above and print the enabled and supported protocols and cipher suites of each SSLSocket as follows:

    class MySSLConnectionSocketFactory extends SSLConnectionSocketFactory {

        // The superclass has no no-arg constructor, so pass the SSLContext through
        MySSLConnectionSocketFactory(SSLContext sslContext) {
            super(sslContext);
        }

        @Override
        protected void prepareSocket(SSLSocket socket) throws IOException {
            System.out.println("Supported Ciphers: " + Arrays.toString(socket.getSupportedCipherSuites()));
            System.out.println("Supported Protocols: " + Arrays.toString(socket.getSupportedProtocols()));
            System.out.println("Enabled Ciphers: " + Arrays.toString(socket.getEnabledCipherSuites()));
            System.out.println("Enabled Protocols: " + Arrays.toString(socket.getEnabledProtocols()));
        }
    }

I often encounter handshake_failure when there is a cipher suite negotiation failure. To avoid this error, your client's list of supported cipher suites must contain at least one match to a cipher suite from the server's list of supported cipher suites.
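The matching requirement can be checked offline with a rough sketch like the one below, which intersects the JVM's default cipher suites with a server list. The class name `CipherSuiteOverlap` and the server list are hypothetical. Note that testssl.sh prints OpenSSL names, while JSSE uses the IANA names (for example, `ECDHE-RSA-AES256-GCM-SHA384` corresponds to `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384`), so the server list has to be translated first.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.net.ssl.SSLContext;

// Hypothetical helper: finds cipher suites that both sides could negotiate.
final class CipherSuiteOverlap {

    /** Returns the suites present in both lists, in the client's order. */
    static List<String> common(String[] clientSuites, List<String> serverSuites) {
        Set<String> server = new HashSet<>(serverSuites);
        List<String> shared = new ArrayList<>();
        for (String suite : clientSuites) {
            if (server.contains(suite)) {
                shared.add(suite);
            }
        }
        return shared;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Cipher suites this JVM enables by default for TLS clients
        String[] client = SSLContext.getDefault().getDefaultSSLParameters().getCipherSuites();
        // Assumed server list, already translated to JSSE (IANA) names
        List<String> server = Arrays.asList(
                "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
                "TLS_RSA_WITH_AES_128_CBC_SHA");
        System.out.println("Shared suites: " + common(client, server));
    }
}
```

An empty result would suggest that the handshake cannot succeed with the client's current settings, whatever trust strategy is used.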

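If the server requires AES256-based cipher suites, you probably need the Java Cryptography Extension (JCE) unlimited strength policy files on older JREs. These are subject to export restrictions, so they may not be available to someone outside the US. Newer releases (Java 8u161 and later) enable unlimited-strength cryptography by default.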

More on cryptography restrictions, if you're interested: https://crypto.stackexchange.com/questions/20524/why-there-are-limitations-on-using-encryption-with-keys-beyond-certain-length


2 Comments

Even though I picked this as the answer, and I understand that ultimately a server is in control of what it sends to clients, it seems crazy to me that there's no 100% way to say "ignore all security concerns server, please send me this response over our broken https contract, which is no better than http, but, whatever!". I guess it's not the norm to want this (accept https as if it was http), but I'm sure there are others out there that'd like to do the same for similar reasons!
The TrustAllStrategy does tell the client to ignore all authenticity concerns, but the connection still requires a successful handshake because that's part of the SSL protocol. It's no longer HTTPS without the handshake. Essentially, what you're looking to do is downgrade the connection to use http instead of https, and yes, it's up to the server whether that is supported on a separate port (i.e. 80 instead of 443). If a server offers a webpage over http instead of https, the website is no longer considered secure, so many don't even support this anymore.
0

I think that the post you are referring to is very close to what needs to be done. Have you tried something like:

    HttpClientBuilder clientBuilder = HttpClientBuilder.create();
    SSLContextBuilder sslContextBuilder = SSLContextBuilder.create();
    sslContextBuilder.setSecureRandom(new java.security.SecureRandom());
    try {
        sslContextBuilder.loadTrustMaterial(new TrustStrategy() {
            @Override
            public boolean isTrusted(X509Certificate[] arg0, String arg1) throws CertificateException {
                return true;
            }
        });
        clientBuilder.setSSLContext(sslContextBuilder.build());
    } catch (Throwable t) {
        Logger.getLogger(getClass().getName()).log(Level.SEVERE, "Can't set ssl context", t);
    }
    CloseableHttpClient apacheHttpClient = clientBuilder.build();

I have not tried this code but hopefully it could work.

Cheers

Comments

0

If you are fine with using other open source libraries like netty, then it is worth trying the following:

    SslProvider provider = SslProvider.JDK; // If you are not concerned about http2 / http1.1 then JDK provider will be enough
    // Note: netty's builder returns io.netty.handler.ssl.SslContext, not javax.net.ssl.SSLContext
    SslContext sslCtx = SslContextBuilder.forClient()
            .sslProvider(provider)
            .trustManager(InsecureTrustManagerFactory.INSTANCE) // This will trust all certs
            ... // Any other required parameters used for ssl context, e.g. protocols, ciphers etc.
            .build();

I used the following version of netty for trusting any certificate with the above code:

    <dependency>
        <groupId>io.netty</groupId>
        <artifactId>netty-all</artifactId>
        <version>4.1.29.Final</version>
    </dependency>

1 Comment

No need for any additional library as org.apache.http.conn.ssl.TrustAllStrategy can be used and is part of httpclient.
0

I think @nmorenor's answer is pretty close to the mark. What I would do in addition is explicitly enable SSLv3 (HttpClient disables it by default due to security concerns) and disable host name verification.

    SSLContext sslContext = SSLContexts.custom()
            .loadTrustMaterial((chain, authType) -> true)
            .build();
    CloseableHttpClient client = HttpClients.custom()
            .setSSLSocketFactory(new SSLConnectionSocketFactory(
                    sslContext,
                    new String[]{"SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"},
                    null,
                    NoopHostnameVerifier.INSTANCE))
            .build();

2 Comments

As @Friwi already pointed out in the comments, you won't be able to handle issues in the key exchange phase of TLS with this. To be able to handle this you will need a custom (SSL)Socket implementation.
@dpr If one wants control beyond APIs provided by JSSE, yes, one might need a custom JSSE provider.
0

You can do it with the core JDK too, but IIRC HttpClient also allows you to set the SSL socket factory.

The factory defines and uses an SSL context that you construct with a trust manager. That manager would simply not verify the cert chain, as shown in the post above.

You also need a HostnameVerifier instance that ignores any mismatch between the cert's hostname and the URL's host (or IP). Otherwise, the connection would still fail even if the cert signer is blindly trusted.
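With the core JDK alone, the two pieces described above could look roughly like this. It is a sketch, not a definitive implementation: the class name `InsecureTls` and its members are invented for illustration, and it disables certificate and hostname checks entirely, which only makes sense for a use case like crawling public data.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical holder for a trust-all SSLContext and an accept-all HostnameVerifier.
final class InsecureTls {

    /** Trust manager that accepts every certificate chain without checking it. */
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    /** Hostname verifier that accepts any host, even if it does not match the cert. */
    static final HostnameVerifier ANY_HOST = (hostname, session) -> true;

    /** Builds an SSLContext whose only trust manager is TRUST_ALL. */
    static SSLContext trustAllContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] { TRUST_ALL }, new SecureRandom());
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create trust-all SSLContext", e);
        }
    }
}
```

On an `HttpsURLConnection`, these would be applied with `setSSLSocketFactory(InsecureTls.trustAllContext().getSocketFactory())` and `setHostnameVerifier(InsecureTls.ANY_HOST)`. As noted elsewhere in this thread, this does not help when the handshake itself fails on protocol, cipher, or key exchange grounds.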

I have converted many client stacks to 'accept self-signed', and it's quite easy in most of them. The worst case is when the third-party lib doesn't allow choosing an SSL socket factory instance but only its class name. In that case, I use a ThreadLocalSSLSocketFactory, which doesn't own any actual factory but simply looks up a thread-local to find one that the upper stack frames (which you control) have prepared. This only works if the third-party lib does not do the work on a different thread, of course. I know HttpClient can be told to use a specific SSL socket factory, so this case is easy.
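The answer does not show its ThreadLocalSSLSocketFactory, so the following is only a guess at the shape of such a class, using JDK types alone. The `install` and `clear` method names are assumptions, and the static `getDefault()` is there because some libraries that accept only a class name obtain the instance through that method.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import javax.net.SocketFactory;
import javax.net.ssl.SSLSocketFactory;

/** Delegates every call to whichever factory the current thread has installed. */
class ThreadLocalSSLSocketFactory extends SSLSocketFactory {
    private static final ThreadLocal<SSLSocketFactory> CURRENT = new ThreadLocal<>();

    /** Called higher up the stack, before invoking the third-party library. */
    static void install(SSLSocketFactory factory) { CURRENT.set(factory); }

    /** Removes the per-thread factory, typically in a finally block. */
    static void clear() { CURRENT.remove(); }

    /** Entry point for libraries that look the factory up by class name. */
    public static SocketFactory getDefault() { return new ThreadLocalSSLSocketFactory(); }

    private static SSLSocketFactory delegate() {
        SSLSocketFactory factory = CURRENT.get();
        if (factory == null) {
            throw new IllegalStateException("No SSLSocketFactory installed on this thread");
        }
        return factory;
    }

    @Override public String[] getDefaultCipherSuites() { return delegate().getDefaultCipherSuites(); }
    @Override public String[] getSupportedCipherSuites() { return delegate().getSupportedCipherSuites(); }

    @Override public Socket createSocket() throws IOException {
        return delegate().createSocket();
    }
    @Override public Socket createSocket(Socket s, String host, int port, boolean autoClose) throws IOException {
        return delegate().createSocket(s, host, port, autoClose);
    }
    @Override public Socket createSocket(String host, int port) throws IOException {
        return delegate().createSocket(host, port);
    }
    @Override public Socket createSocket(String host, int port, InetAddress localHost, int localPort) throws IOException {
        return delegate().createSocket(host, port, localHost, localPort);
    }
    @Override public Socket createSocket(InetAddress host, int port) throws IOException {
        return delegate().createSocket(host, port);
    }
    @Override public Socket createSocket(InetAddress address, int port, InetAddress localAddress, int localPort) throws IOException {
        return delegate().createSocket(address, port, localAddress, localPort);
    }
}
```

The calling code would wrap the library call in `install(...)` and a `finally { clear(); }` so the factory does not leak to other work that later runs on the same pooled thread.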

Also take the time to read the JSSE doc, it is totally worth the time it takes to read.

Comments
