esm: implement the getFileSystem hook (!41076)

Rodrigo Muino Tomonari requested to merge github/fork/arcanis/mael/get-file-system-hook into main Dec 03, 2021

This is the implementation of https://github.com/nodejs/loaders/pull/44, as discussed at the last Loaders meeting (cc @nodejs/loaders). I'm leaving it as a draft because I expect feedback that could change the design, but it's already in a working state and I'd appreciate reviews. This PR includes:

A new getFileSystem hook lets users interact with the Node resolution by overriding the core I/O dependencies of the resolution algorithms. Four functions have been ported, in both async and sync versions (see below for a more detailed sync/async discussion).
- Unlike my initial plan for a readFile hook, which would eventually have required creating other hooks for statFile etc., this implementation takes a different approach by defining a single hook which must return an interface of utilities. This approach works better in my opinion:
  - It's more scalable; more filesystem utilities can be added without having to make them full-blown hooks, which would imo needlessly complicate the design (if only in terms of documentation; with getFileSystem, the filesystem utilities can be documented in a different place than the hooks proper).
  - Filesystem utilities are closely connected to each other. As a result, it stands to reason that some hook authors would want to return one set of utilities or another based on various conditions (for example, we could imagine a loader that allows disabling its features by returning the default filesystem utilities if a certain environment variable is set). If each utility were its own hook, this would be convoluted to achieve.
  - It also addresses the concerns @GeoffreyBooth had, because it doesn't entangle the hook calls: the getFileSystem hook is only called when Node boots, not from within other hooks (the filesystem utilities themselves will be called during the hooks' execution, of course, but it seems to me this shouldn't be a problem, as those functions should be assumed to have no visible side effects).
- The hook already implements the four utilities currently required for the Node resolution. I remember we discussed doing only readFile as a demo before adding the other ones, but I realized when implementing it that some things would appear over-engineered if I didn't show an example that used multiple utilities (for example, having getFileSystem return an object would make no sense if we only needed readFile - but we need more than that, which makes a design like getFileSystem more valuable).
- I made the utilities have the exact same return values as the Node core functions they replaced (in particular for internalModuleStat and internalModuleReadJSON, which have unusual return types). I'm open to ideas as to what the utility signatures should be in the final version.
- I did, however, normalise all the utilities' inputs so that they all take a URL instance as parameter. In many cases this just meant reusing existing URL instances; I only needed to instantiate new ones in a few places.
The demo/ folder is of course temporary and just meant to show what the hook looks like in practice. For this example, I implemented a loader that, given a file foo.mjs (or any other name), simulates the existence of a foo-sha512.mjs file that exports the hash of the original file. I picked this example because it's easy to conceptualise and cannot be achieved with the other hooks.
- I considered writing a "load modules from http" loader, but it's a little more difficult due to the Node algorithms currently relying a lot on synchronous operations, even in the esm pipeline. It would still have been possible with workers and atomics, but I felt like this would be better avoided for a simple demo use case.
- I didn't write tests or documentation yet; those will be written once I un-draft the PR. For now, the demo/ folder can be seen as the reference on how to use and experiment with the PR.
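To make the shape of the hook more concrete, here is a minimal sketch of the demo idea, not the code from the demo/ folder. Only the hook name getFileSystem comes from this PR; the hook signature, the `defaultFs` stand-in, the utility name `readFileSync`, and the `DISABLE_SHA512_LOADER` variable are all assumptions made for illustration. It is written with `require` so it can be run as a plain script; a real loader would be an ES module exporting the hook.

```javascript
// Hypothetical sketch: signature and utility names are assumptions,
// not the API implemented by this PR.
const { createHash } = require('node:crypto');
const fs = require('node:fs');
const { fileURLToPath } = require('node:url');

// Matches foo-sha512.mjs and captures the extension.
const SUFFIX = /-sha512(\.mjs)$/;

// Stand-in for Node's default utilities; every utility takes a URL instance.
const defaultFs = {
  readFileSync: (url) => fs.readFileSync(fileURLToPath(url), 'utf8'),
};

function getFileSystem(defaultUtilities) {
  // Opt-out switch (hypothetical variable name): hand back the defaults untouched.
  if (process.env.DISABLE_SHA512_LOADER) return defaultUtilities;

  return {
    ...defaultUtilities,
    readFileSync(url) {
      if (!SUFFIX.test(url.pathname)) return defaultUtilities.readFileSync(url);

      // Map foo-sha512.mjs back to foo.mjs and export the hash of its content.
      const original = new URL(url);
      original.pathname = url.pathname.replace(SUFFIX, '$1');
      const digest = createHash('sha512')
        .update(defaultUtilities.readFileSync(original))
        .digest('hex');
      return `export default ${JSON.stringify(digest)};\n`;
    },
  };
}
```

A complete loader would presumably also need to override the stat-like utility so that the resolution sees the virtual file as existing; that part is left out here because its return type is not described in this text.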

Async vs sync

I initially implemented all the filesystem utilities as async functions, but quickly noticed that the Node resolution (even in the esm pipeline) was relying on them being sync. After some thought, I decided to implement both versions, even if some of them aren't in use right now. There are a few reasons for that:

- I didn't want to rework the entirety of the Node resolution pipeline. Changing sync functions to become async was a non-goal for me, especially considering the other points that follow.
- The cost for loader authors of writing two different versions of the same utility can be offset by Node automatically reusing the sync version of a utility for its async counterpart if the same loader declares it (check my demo loader for an example).
- There's a decent case that the cjs resolution pipeline could benefit from the getFileSystem hook as well, which would allow us to close https://github.com/nodejs/node/issues/33423. This is something I'd like to tackle once the current task is done, so I don't think it'd hurt to document both sync and async utilities and suggest that implementers provide both.
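
The sync-to-async reuse mentioned above could look something like the following sketch. It is one possible way to do it, not the code from this PR, and it assumes a hypothetical naming convention where `fooSync` is the sync utility and `foo` its async counterpart; `withAsyncFallbacks` is an illustrative helper name.

```javascript
// Hypothetical sketch: fill in missing async utilities from their sync versions.
// Assumes `fooSync` / `foo` naming, which is not confirmed by the PR text.
function withAsyncFallbacks(utilities) {
  const completed = { ...utilities };

  for (const [name, fn] of Object.entries(utilities)) {
    if (!name.endsWith('Sync') || typeof fn !== 'function') continue;

    const asyncName = name.slice(0, -'Sync'.length);
    if (typeof completed[asyncName] !== 'function') {
      // Reuse the sync implementation; the async wrapper turns its return
      // value (or thrown error) into a resolved (or rejected) promise.
      completed[asyncName] = async (...args) => fn(...args);
    }
  }

  return completed;
}
```

With this approach a loader author only writes the sync utility, and an explicitly declared async version still takes precedence when both are present.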
