urlparse does not correctly handle signs, underscores, and whitespace in port numbers #96035

@kenballus

Background

RFC 3986 (spec for URIs) defines a valid port string with the following grammar rule:

  • port = *DIGIT

Here's the WHATWG URL spec definition:
"""
A URL-port string must be one of the following:

  • the empty string
  • one or more ASCII digits representing a decimal number no greater than $2^{16} − 1$.

""" [1]

The bug

This is the port string parsing code from Lib/urllib/parse.py:166-176:

def port(self):
    port = self._hostinfo[1]
    if port is not None:
        try:
            port = int(port, 10)
        except ValueError:
            message = f'Port could not be cast to integer value as {port!r}'
            raise ValueError(message) from None
        if not ( 0 <= port <= 65535):
            raise ValueError("Port out of range 0-65535")
    return port

This code erroneously accepts the strings "-0" and f"+{x}" for any value of x in the valid range, because int() permits a leading sign. Given that + and - are not digits, this behavior violates both specifications.
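To illustrate why int() is too lenient for this purpose, here is a small sketch that passes a few non-conforming port strings directly to int(). The sample strings and the names lenient_inputs and parsed are chosen for illustration only; they are not taken from Lib/urllib/parse.py.

```python
# Every string below contains a character other than an ASCII digit,
# yet int(text, 10) converts each one without raising ValueError.
lenient_inputs = ["+80", "-0", "8_0", " 80 ", "80\n"]

parsed = {text: int(text, 10) for text in lenient_inputs}

for text, value in parsed.items():
    print(f"{text!r} -> {value}")
```

Signs, single underscores between digits, and surrounding whitespace are all accepted by int(), which is why relying on it alone cannot enforce a digits-only grammar.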

This bug is easily reproducible with the following snippet:

from urllib.parse import urlparse

url1 = urlparse("http://python.org:-0")
url2 = urlparse("http://python.org:+80")

print(url1.port) # prints 0, but error is expected
print(url2.port) # prints 80, but error is expected
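One possible direction for a fix is to check the port string against the digits-only grammar before calling int(). The following is a minimal sketch under that assumption, not the actual patch: parse_port_strict and _PORT_RE are hypothetical names, and treating an empty string as "no port" is an assumption made for this example.

```python
import re

# RFC 3986: port = *DIGIT, where DIGIT means ASCII 0-9 only.
_PORT_RE = re.compile(r"[0-9]+")


def parse_port_strict(port_text):
    """Hypothetical helper: return the port as an int, or None if absent."""
    # Assumption: a missing or empty port means "no port given".
    if port_text is None or port_text == "":
        return None
    # fullmatch rejects signs, underscores, whitespace, and non-ASCII digits.
    if _PORT_RE.fullmatch(port_text) is None:
        raise ValueError(
            f"Port could not be cast to integer value as {port_text!r}"
        )
    port = int(port_text, 10)
    if not (0 <= port <= 65535):
        raise ValueError("Port out of range 0-65535")
    return port
```

With this check in place, "80" still yields 80, while "+80", "-0", "8_0", and " 80" all raise ValueError before int() is ever reached.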

Happy to submit a PR, but don't want to step on any toes over at #25774.

My environment

  • CPython version tested on:
    • 3.10.6
  • Operating system and architecture:
    • Arch Linux x86_64

Footnotes

  1. Given that this is urlparse and not uriparse, it seems appropriate that we do not accept port numbers outside range(2**16), even though such numbers are allowed by RFC 3986.

Metadata

Labels

  • 3.10: only security fixes
  • 3.11: only security fixes
  • 3.12: only security fixes
  • stdlib: Standard Library Python modules in the Lib/ directory
  • triaged: The issue has been accepted as valid by a triager.
  • type-bug: An unexpected behavior, bug, or error
