I'm working on a web crawler and I'm trying to understand how the IP substitution works.
From what I have read, the hostname should be resolved via DNS to one of its IP addresses, and that IP should be used instead of the hostname in requests. Supposedly this improves performance, because the crawler can resolve names up front and cache the results, so the user agent no longer needs to do DNS resolution itself.
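To make the idea concrete, here is a minimal sketch of the resolve-and-cache part. The `makeCachedResolve` helper and its TTL handling are my own illustration (not from any library); the resolver is injectable so the caching logic stands on its own:

```typescript
// Sketch of resolve-and-cache: a pluggable resolver function, so the cache
// logic is independent of the actual DNS call.
type Resolver = (hostname: string) => Promise<string[]>;

const makeCachedResolve = (resolve: Resolver, ttlMs = 60_000) => {
  const cache = new Map<string, { ips: string[]; expires: number }>();
  return async (hostname: string): Promise<string[]> => {
    const hit = cache.get(hostname);
    if (hit && hit.expires > Date.now()) return hit.ips; // cached, no lookup
    const ips = await resolve(hostname); // fresh DNS query
    cache.set(hostname, { ips, expires: Date.now() + ttlMs });
    return ips;
  };
};
```

In a real crawler you would pass `resolve4` from `dns/promises` as the resolver.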
It doesn't seem to work with HTTPS. I tried the following approaches:
With Node.js and Playwright:

```typescript
import { chromium } from "playwright";
import { resolve4 } from "dns/promises";

export const crawlPage = async (pageUrl: string) => {
  const url = new URL(pageUrl);
  const dns = await resolve4(url.hostname);
  console.log(dns);
  const ip = dns[0]!;

  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();

  await page.goto(pageUrl);
  console.log(`Page title: ${await page.title()}`);

  url.hostname = ip;
  await page.goto(url.toString());
  console.log(`Page title: ${await page.title()}`);

  await browser.close();
};
```

And invoked like this:

```typescript
await crawlPage("https://example.com");
```
The output looks like this:
```
[ '23.192.228.80', '23.192.228.84', ... ]
Page title: Example Domain
node:internal/process/promises:391
          triggerUncaughtException(err, true /* fromPromise */);
          ^

page.goto: net::ERR_CERT_COMMON_NAME_INVALID at https://23.192.228.80/
Call log:
  - navigating to "https://23.192.228.80/", waiting until "load"
    ... internal call stack

Node.js v20.18.0
```

With curl it looks similar:
```
$ curl -H "Host: example.com" https://23.192.228.80
curl: (60) schannel: SNI or certificate check failed: SEC_E_WRONG_PRINCIPAL (0x80090322) - The target principal name is incorrect.
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the webpage mentioned above.
```

What should this look like for it to work?
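One direction I'm considering: in plain Node, `https.request`/`https.get` accept a socket-level `lookup` option, so the connection can be pinned to a pre-resolved IP while the URL keeps the real hostname, which means SNI and the certificate check should still see `example.com`. A sketch, untested against a real site; `pinnedLookup` and `fetchPinned` are my own helper names:

```typescript
// Pin the TCP connection to a pre-resolved IP, but keep the real hostname in
// the URL so TLS SNI and certificate verification still match the cert.
import https from "node:https";
import type { LookupFunction } from "node:net";

// A lookup function that skips DNS and always yields the given IPv4 address.
const pinnedLookup = (ip: string): LookupFunction =>
  (_hostname, _options, callback) => callback(null, ip, 4);

const fetchPinned = (pageUrl: string, ip: string) =>
  new Promise<number>((resolve, reject) => {
    https
      .get(pageUrl, { lookup: pinnedLookup(ip) }, (res) => {
        res.resume(); // drain the response body
        resolve(res.statusCode ?? 0);
      })
      .on("error", reject);
  });
```

curl has the same idea built in: `curl --resolve example.com:443:23.192.228.80 https://example.com` forces the connection to that IP without changing the URL. For Playwright, Chromium's `--host-resolver-rules=MAP example.com 23.192.228.80` launch argument (passed via `args` to `chromium.launch`) is reported to achieve the same, though I haven't verified it myself.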
P.S. Am I overthinking this? Should I just drop the idea and use the hostname?