url: forbid certain confusable changes from being introduced by toASCII

The legacy url.parse() function attempts to convert Unicode domains (IDNs) into their ASCII/Punycode form through the use of the toASCII function. However, toASCII can introduce or remove various characters that at best invalidate the parsed URL, and at worst cause hostname spoofing:

url.parse('http://bad.c℀.good.com/').href === 'http://bad.ca/c.good.com/'
// (from https://hackerone.com/reports/678487)

url.parse('http://\u00AD/bad.com').href === 'http:///bad.com/'

While changes to the legacy URL parser are discouraged in general, the security implications here outweigh the desire for strict compatibility. The compatibility risk is also small, because this commit only changes behavior when non-ASCII characters appear in the hostname, an unusual situation for most use cases. Additionally, despite the availability of the WHATWG URL API, url.parse remains widely deployed in the Node.js ecosystem, as exemplified by the recent un-deprecation of the legacy API.

This change is similar in spirit to CPython 3.8's change (https://github.com/python/cpython/commit/16e6f7dee7f02bb81aa6b385b982dcdda5b99286) fixing bpo-36216, also known as CVE-2019-9636, which was likewise made despite potential compatibility concerns.

See also: #23694, #31279 (closed), https://hackerone.com/reports/678487, and https://hackerone.com/reports/738333

cc @nodejs/url
