util: add util.parseArgs() (!35015)

Rodrigo Muino Tomonari requested to merge github/fork/boneskull/parseargs into master Sep 01, 2020

Add a function, util.parseArgs(), which accepts an array of arguments and returns a parsed object representation thereof.

Ref: https://github.com/nodejs/tooling/issues/19

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

Motivation

While this has been discussed at length in the Node.js Tooling Group and its associated issue, let me provide a summary:

process.argv.slice(2) is rather awkward boilerplate, especially for those new to Node.js
Parsing options without the use of a userland module reliably requires, well, about as much code as this PR adds; the complexity quickly ramps up going from one argument to two
It's often useful to add a few command-line flags in an ad-hoc manner to, say, app.js
It's often useful to parse command-line flags in example code, which would otherwise demand installation of a userland module (thus complicating things)

process.parseArgs() makes handling command-line arguments "natively" much easier. Given that so many tools are being written in Node.js, it makes sense to center this experience.

Design Considerations

First, let me acknowledge that there are many ways to parse command-line arguments. There is no standard, cross-platform, agreed-upon convention. Command-line arguments look different in Windows vs. POSIX vs. GNU, and there's much variation across programs. And these are still conventions, not hard & fast requirements. We can easily be paralyzed attempting to choose what "styles" to support or not. It is certain that there will be someone who agrees that Node.js should have this feature, but should not do it in this way.

But to implement the feature, we have to do it in some way. This is why the way is the way it is:

I have researched the various features and behavior of many popular userland command-line parsing libraries, and have distilled it down to the most commonly supported features, while striving further to trim any features which are not strictly necessary to get the bulk of the work done. While these do not align to, say, POSIX conventions, they do align with end-user expectations of how a Node.js CLI should work. What follows is consideration of a few specific features.

The Requirement of `=` for Options Expecting a Value

For example, one may argue that --foo=bar should be the only way to use the value bar for the option foo; but users of CLI apps built on Node.js expect --foo bar to work just as well. There was not a single popular argument-parsing library that did not support this behavior. Thus, process.parseArgs() supports this behavior (it cannot be automatic without introducing ambiguity, but I will discuss that later).
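To make the two spellings concrete, here is a minimal sketch of reading one option's value in either form. The helper readOptionValue is hypothetical and is not the code in this PR; it assumes the caller already knows that the option expects a value.

```javascript
// Hypothetical helper for illustration only; not the PR's implementation.
// Returns the value given for a long option that is known to expect one.
function readOptionValue(argv, name) {
  const flag = `--${name}`;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    // Form 1: --foo=bar (everything after the first "=" is the value).
    if (arg.startsWith(`${flag}=`)) {
      return arg.slice(flag.length + 1);
    }
    // Form 2: --foo bar (the next argument is the value, if there is one).
    if (arg === flag && i + 1 < argv.length) {
      return argv[i + 1];
    }
  }
  return undefined;
}

console.log(readOptionValue(['--foo=bar'], 'foo'));    // prints: bar
console.log(readOptionValue(['--foo', 'bar'], 'foo')); // prints: bar
```

Both calls yield the same string, which is the behavior users of existing Node.js CLI apps tend to expect.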

Combination of Single-Character Flags

Another one is combining (or concatenating?) "short flags" (those using a single hyphen, like -v), where -vD would be equivalent to -v -D. While this is a POSIX convention, it is not universally supported by the popular command-line parsers. Since it is inherently sugar (and makes the implementation more complicated), we chose not to implement it.

Data Types

Like HTML attribute values (<tag attr="1">), command-line arguments are provided to programs as strings, regardless of the data type they imply. While most of the userland arg parsers support some notion of a "data type" (i.e., this argument value is a number, string, or boolean), it is not strictly necessary. It is up to the user to handle the coercion of these values.
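As a rough sketch of what "up to the user" could look like in practice: the parsed object and the toInteger helper below are made up for illustration, and the property names port and retries are assumptions, not part of this PR.

```javascript
// Hypothetical parsed result: every value arrives as a string.
const parsed = { port: '8080', retries: '3' };

// Coercion and validation are done by the caller, not by the parser.
function toInteger(value, name) {
  const n = Number(value);
  if (!Number.isInteger(n)) {
    throw new TypeError(`--${name} expects an integer, got "${value}"`);
  }
  return n;
}

const port = toInteger(parsed.port, 'port');
console.log(typeof port, port); // prints: number 8080
```

Keeping coercion in user code means the parser never has to guess whether '8080' was meant as a number or a string.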

Default Behavior: Boolean Flags

The default behavior is to treat anything that looks like an argument (that's mainly "arguments beginning with one or more dashes") as a boolean flag. The presence of one of these arguments implies true. From investigation of popular CLI apps, we found that most arguments are treated as boolean flags, so it makes sense for this to be the default behavior. This means that a developer who just wants to know whether something is "on" or "off" will not need to provide any options to process.parseArgs().
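A minimal sketch of this default, under the assumption that the property name is simply the argument with its leading dashes removed. The function parseFlags is hypothetical and covers only the boolean case (no values, no counting, no positionals).

```javascript
// Hypothetical sketch of the default behavior only.
function parseFlags(argv) {
  const result = {};
  for (const arg of argv) {
    if (arg.startsWith('-')) {
      // Strip the leading dashes to get the property name.
      const name = arg.replace(/^-+/, '');
      // A bare "-" or "--" has no name and is skipped in this sketch.
      if (name !== '') result[name] = true;
    }
  }
  return result;
}

const flags = parseFlags(['--verbose', '-f']);
console.log(flags); // prints: { verbose: true, f: true }
```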

Handling Values

Some arguments do need values (e.g., --require my-script.js), and in order to eliminate ambiguity, the API consumer must define which arguments expect a value. This is done via the expectsValue option, which is the only option process.parseArgs() accepts.
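The ambiguity can be illustrated with a small sketch. The function sketchParse is hypothetical and much simpler than the PR's code; it assumes expectsValue is an array of option names and that non-dash arguments otherwise land in the _ array.

```javascript
// Hypothetical sketch; the name sketchParse and the shape of expectsValue
// (an array of option names) are assumptions for illustration.
function sketchParse(argv, { expectsValue = [] } = {}) {
  const result = { _: [] };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (!arg.startsWith('-')) {
      result._.push(arg);
      continue;
    }
    const name = arg.replace(/^-+/, '');
    if (expectsValue.includes(name) && i + 1 < argv.length) {
      // Declared as expecting a value: consume the next argument.
      result[name] = argv[++i];
    } else {
      // Not declared: treat as a boolean flag.
      result[name] = true;
    }
  }
  return result;
}

const argv = ['--require', 'my-script.js'];
console.log(sketchParse(argv));
// prints: { _: [ 'my-script.js' ], require: true }
console.log(sketchParse(argv, { expectsValue: ['require'] }));
// prints: { _: [], require: 'my-script.js' }
```

Without the declaration, the parser has no way to tell whether my-script.js belongs to --require or stands on its own.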

Possible alternatives:

Rename expectsValue to something else

Repeated Arguments

It's common to need to support multiple values for a single argument, e.g., --require a.js --require b.js. In this example, require needs to be listed in the expectsValue option. The result is an object containing a require property whose value is an array of strings: ['a.js', 'b.js']. In the example of --require c.js, the value of the require property is a string, 'c.js'.

When working with boolean flags (those not declared in expectsValue), it was trivial to support the case in which repeated arguments result in a count. One -v will result in an object where {v: true}, but -v -v will result in {v: 2}. Either way, the value will be truthy.
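The two accumulation rules can be sketched as a pair of helpers. Both addValue and addFlag are hypothetical names used only for illustration; they are not taken from the PR's code.

```javascript
// Hypothetical helpers sketching the two accumulation rules.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Options that expect a value: the first occurrence is stored as a string,
// later occurrences turn the property into an array of strings.
function addValue(result, name, value) {
  if (!hasOwn(result, name)) {
    result[name] = value;
  } else if (Array.isArray(result[name])) {
    result[name].push(value);
  } else {
    result[name] = [result[name], value];
  }
}

// Boolean flags: the first occurrence is true, repeats become a count.
function addFlag(result, name) {
  if (!hasOwn(result, name)) {
    result[name] = true;
  } else {
    result[name] = result[name] === true ? 2 : result[name] + 1;
  }
}

const result = {};
addValue(result, 'require', 'a.js');
addValue(result, 'require', 'b.js');
addFlag(result, 'v');
addFlag(result, 'v');
console.log(result); // prints: { require: [ 'a.js', 'b.js' ], v: 2 }
```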

Possible alternatives:

Every argument expecting a value (as declared in expectsValue) will parse to an Array of strings, even if there is only one string in the Array (e.g., --require c.js becomes {require: ['c.js']}). That makes the API more consistent at the expense of making the common case (no repetition) slightly more awkward.
Remove the "count" behavior. While this is widely supported by modules, I don't often see it used in the wild in Node.js CLI apps.

Positional Arguments

Arguments after -- or without a dash prefix are considered "positional". These are placed into the Array property _ of the returned object. This is a convention used by many other userland argparsers in Node.js. It is always present, even if empty. This also means that _ is reserved as a flag/option name (e.g., --_ will be ignored).
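A minimal sketch of the positional rules, treating every dash-prefixed argument as a boolean flag for brevity. The function splitPositionals is hypothetical, and how a bare "-" is handled here is an assumption.

```javascript
// Hypothetical sketch of the positional rules only.
function splitPositionals(argv) {
  const result = { _: [] }; // _ is always present, even if empty
  let onlyPositionals = false;
  for (const arg of argv) {
    if (onlyPositionals) {
      // Everything after "--" is positional, dashes or not.
      result._.push(arg);
    } else if (arg === '--') {
      onlyPositionals = true;
    } else if (arg.startsWith('-')) {
      const name = arg.replace(/^-+/, '');
      // "_" is reserved, so --_ is ignored; a bare "-" is skipped too
      // (that last choice is an assumption of this sketch).
      if (name !== '_' && name !== '') result[name] = true;
    } else {
      result._.push(arg);
    }
  }
  return result;
}

console.log(splitPositionals(['--verbose', 'input.txt', '--', '--not-a-flag']));
// prints: { _: [ 'input.txt', '--not-a-flag' ], verbose: true }
```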

Possible alternatives:

Throw if _ is provided in expectsValue

Intended Audience

It is already possible to build great arg parsing modules on top of what Node.js provides; the prickly API is abstracted away by these modules. Thus, process.parseArgs() is not necessarily intended for library authors; it is intended for developers of simple CLI tools, ad-hoc scripts, deployed Node.js applications, and learning materials.

It is exceedingly difficult to provide an API which would both be friendly to these Node.js users while being extensible enough for libraries to build upon. We chose to prioritize these use cases because these are currently not well-served by Node.js' API.

Questions

In particular, I'm not 100% confident in the terminology I chose for the documentation ("Flags", "Options", "Positionals"). While this does align with other documentation I've read on the subject of CLI arguments, I am unsure if introducing this terminology to our documentation is a Good Idea. Perhaps it can be expressed without new terminology.
I sorted some files around my modification in node.gyp, which looked like it wanted to be in order, but was not. It did not seem to affect the build, but I can revert these changes if need be.
Should it be process.parseArgv()? While it does parse process.argv by default, it does not necessarily need to be used with process.argv.
Do I need to do more input validation, throw more exceptions, or take other defensive measures?

Credits

While this is my implementation, the design is a product of work by myself, @bcoe, @ruyadorno, and @nodejs/tooling.
