url: forbid certain confusable changes from being introduced by toASCII (!38631)

Rodrigo Muino Tomonari requested to merge github/fork/TimothyGu/url-punycode into master May 11, 2021

The legacy url.parse() function attempts to convert Unicode domains (IDNs) into their ASCII/Punycode form through the use of the toASCII function. However, toASCII can introduce or remove various characters that at best invalidate the parsed URL, and at worst cause hostname spoofing:

url.parse('http://bad.c℀.good.com/').href === 'http://bad.ca/c.good.com/'
// (from https://hackerone.com/reports/678487)

url.parse('http://\u00AD/bad.com').href === 'http:///bad.com/'
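One way to guard against both cases is to re-check the hostname after conversion. The following is a minimal sketch of that idea, not the code in this merge request: `standInToASCII`, `forbiddenHostChars`, and `checkConvertedHostname` are hypothetical names, the stand-in conversion only approximates what a real toASCII does, and the forbidden set is illustrative. It assumes the caller has already split the hostname off using the usual delimiters, so any delimiter found afterwards must have been introduced by the conversion.

```javascript
// Hypothetical stand-in for the parser's internal toASCII, for illustration
// only: NFKC-normalize, then drop U+00AD (soft hyphen). Real IDNA processing
// does considerably more (case mapping, Punycode encoding, and so on).
function standInToASCII(hostname) {
  return hostname.normalize('NFKC').replace(/\u00AD/g, '');
}

// Characters that delimit URL components and so should not appear in a
// hostname. Illustrative set; not necessarily the one Node.js uses.
const forbiddenHostChars = /[\0\t\n\r #%\/:<>?@[\\\]^|]/;

// Convert the hostname, then reject the result if the conversion emptied it
// or introduced a delimiter, instead of trying to repair the URL.
function checkConvertedHostname(hostname, toASCII) {
  const converted = toASCII(hostname);
  if (converted === '' || forbiddenHostChars.test(converted)) {
    throw new TypeError(
      `Invalid hostname after toASCII: ${JSON.stringify(hostname)}`);
  }
  return converted;
}
```

Under these assumptions, `checkConvertedHostname('example.com', standInToASCII)` returns the hostname unchanged, while `'bad.c\u2100.good.com'` (where the conversion yields a `/`) and `'\u00AD'` (where it yields an empty string) both throw.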

While changes to the legacy URL parser are discouraged in general, the security implications here outweigh the desire for strict compatibility. This is because this commit only changes behavior when non-ASCII characters appear in the hostname, an unusual situation for most use cases. Additionally, despite the availability of the WHATWG URL API, url.parse remains widely deployed in the Node.js ecosystem, as exemplified by the recent un-deprecation of the legacy API.

This change is similar in spirit to CPython 3.8's change (https://github.com/python/cpython/commit/16e6f7dee7f02bb81aa6b385b982dcdda5b99286) fixing bpo-36216 aka CVE-2019-9636, which also occurred despite potential compatibility concerns.

cc @nodejs/url
