arrg

Flexible argument parsing for R scripts

https://github.com/jonclayden/arrg

Keywords

command-line command-line-parser r

Repository

Basic Info
  • Host: GitHub
  • Owner: jonclayden
  • Language: R
  • Default Branch: main
  • Size: 78.1 KB
Statistics
  • Stars: 7
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog

README.Rmd

```{r, echo=FALSE}
knitr::opts_chunk$set(collapse=TRUE)
options(width=70)
```

[![CRAN version](https://www.r-pkg.org/badges/version/arrg)](https://cran.r-project.org/package=arrg) [![CI](https://github.com/jonclayden/arrg/actions/workflows/ci.yaml/badge.svg)](https://github.com/jonclayden/arrg/actions/workflows/ci.yaml) [![codecov](https://codecov.io/gh/jonclayden/arrg/graph/badge.svg?token=CZCLmaK8Ty)](https://codecov.io/gh/jonclayden/arrg) [![Dependencies](https://tinyverse.netlify.app/badge/arrg)](https://cran.r-project.org/package=arrg)

# arrg: Flexible argument parsing for R scripts

[R](https://www.r-project.org) is a scripting language. While often used interactively in exploratory data science, or run batch-style to replicate a previous analysis, the language can also be used for shell-like scripting at a command line. This usage is supported by the `Rscript` binary provided with R, and by [the `littler` project](https://github.com/eddelbuettel/littler), but parsing script arguments requires additional effort.

There are several R packages available to provide option and argument parsing, and choosing between them is largely a matter of taste. [`docopt`](http://docopt.org) is cute, and [available for R](https://cran.r-project.org/package=docopt) thanks to Edwin de Jonge, but I find [some of its heuristics strange](https://mastodon.online/@jonclayden/112213538389204350) and that puts me off. Also available on CRAN are [`argparse`](https://CRAN.R-project.org/package=argparse), [`argparser`](https://CRAN.R-project.org/package=argparser), [`batch`](https://CRAN.R-project.org/package=batch), [`defineOptions`](https://cran.r-project.org/package=defineOptions), [`getopt`](https://cran.r-project.org/package=getopt), [`GetoptLong`](https://cran.r-project.org/package=GetoptLong), [`optigrab`](https://cran.r-project.org/package=optigrab), [`optparse`](https://cran.r-project.org/package=optparse) and [`scribe`](https://cran.r-project.org/package=scribe), so there's no shortage of options.

But none of these suited me, [because I'm picky](#why-arrg), so I wrote my own.

## Installation and status

The latest release version of the `arrg` package is available [on CRAN](https://cran.r-project.org/package=arrg). The development version can be installed using the `remotes` package if required:

```{r, eval=FALSE}
# install.packages("remotes")
remotes::install_github("jonclayden/arrg")
```

**The package is still in an experimental phase**, so the interface and syntax are likely to change as it is developed. Among features planned but not currently implemented are default patterns, subcommands and automatic help options.

## Creating a parser

The package's key function is also called `arrg`. It takes structured information about how your command is to be used, and creates a parser that extracts the necessary information from user arguments and checks them for validity.

Here's an example.

```{r}
#!/usr/bin/env -S Rscript --vanilla

library(arrg)

parser <- arrg("test",
    opt("h,help", "Display this usage information and exit"),
    opt("n,times", "Run the test the specified number of times", arg="count", default=1L),
    opt("t,time", "Print the overall run-time once the test is completed"),
    opt("install", "Install the code before testing it"),
    patterns=list(
        pat(options="h!"),
        pat("command", "arg...?", options="nt"),
        pat("path?", options="n,t,install")),
    header="Test your code",
    footer="Run the test on the code at the specified path (default \".\"), or run a specific command.")
```

There's quite a lot going on here, so let's take it a bit at a time.

The ["shebang"](https://en.wikipedia.org/wiki/Shebang_(Unix)) line starting `#!` allows the script, once the file has been made executable, to be run directly from the command line without explicitly passing it through `Rscript` or `r`. It's not required, but it is a helpful convenience on Unix-like systems.

The first argument to `arrg()`, in this case `"test"`, defines the name of the command. That should generally match the name of the file, but this is an abstract example without a file so it could be anything suitable.
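When the script does live in a file, the name can be derived rather than typed out. The sketch below is a base-R illustration, not something `arrg` provides: it assumes that `Rscript` passes the script path to R as a `--file=` argument, which then appears in the full `commandArgs()` vector, and `script_name()` is a hypothetical helper.

```{r}
# Hypothetical helper, not part of arrg: derive a command name from the
# "--file=" entry that Rscript adds to the full commandArgs() vector
script_name <- function(args = commandArgs(trailingOnly = FALSE)) {
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) == 0L)
    return(NA_character_)   # e.g. in an interactive session
  # Drop the prefix, the directory and a trailing ".R" or ".r" extension
  sub("\\.[Rr]$", "", basename(sub("^--file=", "", file_arg[1])))
}

# An example of a full argument vector for `Rscript /home/user/bin/test.R -h`
script_name(c("/usr/lib/R/bin/exec/R", "--no-echo", "--no-restore",
              "--file=/home/user/bin/test.R", "--args", "-h"))   # "test"
```

The result could then be passed as the first argument to `arrg()`, so that the name shown in the usage information follows the file name if the script is renamed.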

The various calls to `opt()` specify the options that the command accepts. When calling the command, the user must prefix short options with one hyphen and long options with two, but these hyphens are optional in the specification. At least one of a short (single-character) or long (word-length) name must be given; to give both, separate them with a comma, as in `"h,help"`. The second argument is a description of the option, which is displayed in the usage output to help the user.
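To make the specification format concrete, here is a small base-R sketch of how a string like `"h,help"` might be split into its short and long names. The helper `split_opt_spec()` is hypothetical and only illustrates the rules just described; it is not how `arrg` is implemented.

```{r}
# Hypothetical helper, not arrg's own code: split an option specification
# such as "h,help" or "-n,--times" into its short and long names
split_opt_spec <- function(spec) {
  parts <- trimws(strsplit(spec, ",", fixed = TRUE)[[1]])
  parts <- sub("^-{1,2}", "", parts)        # leading hyphens are optional
  list(short = parts[nchar(parts) == 1L],   # single-character names
       long  = parts[nchar(parts) > 1L])    # word-length names
}

split_opt_spec("h,help")       # short "h", long "help"
split_opt_spec("--install")    # no short name, long "install"
```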

Some options take an argument, like the `-n` or `--times` option above. In that case, we specify the name of the argument (`arg="count"`) and, optionally, a default value (`default=1L`). Since the default is of integer mode, any specified value of this argument will be coerced to integer.
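The coercion rule can be illustrated with a hypothetical helper, `coerce_like_default()`, which converts the string given on the command line to the storage type of the default. This is a sketch of the idea under that assumption, not `arrg`'s code. Note that `typeof()` is needed rather than `mode()`, since `mode(1L)` is `"numeric"`.

```{r}
# Hypothetical helper illustrating the idea; arrg's own rules may differ
coerce_like_default <- function(value, default) {
  if (is.null(default))
    return(value)                 # no default: keep the raw string
  # typeof() distinguishes "integer" from "double", which mode() does not
  as.vector(value, mode = typeof(default))
}

coerce_like_default("3", 1L)      # 3L, an integer
coerce_like_default("2.5", 1)     # 2.5, a double
```

A value that cannot be converted, such as `"abc"` for an integer option, becomes `NA` with a warning, so a real parser would also need to check for that.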

Patterns give mutually exclusive ways in which the command may be used, and are displayed separately in the usage information. Here, the first pattern requires the `-h` (or `--help`) option, as the exclamation mark indicates. The second accepts a command and possibly some arguments to it, along with the `-n` and `-t` options; in `"arg...?"`, the ellipsis (`...`) indicates one or more values and the question mark (`?`) marks the argument as optional. The third takes an optional path and accepts the `-n`, `-t` and `--install` options.
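The suffix markers in positional pattern elements can be decoded with a couple of regular expressions. The following sketch assumes only the `?` and `...` suffixes described above, in the order used in `"arg...?"`; `parse_pattern_token()` is a hypothetical helper, not part of `arrg`.

```{r}
# Hypothetical helper, not arrg's parser: decode the suffix markers used in
# positional pattern elements such as "command", "arg...?" and "path?"
parse_pattern_token <- function(token) {
  optional <- endsWith(token, "?")        # trailing "?" marks it optional
  token <- sub("\\?$", "", token)
  multiple <- endsWith(token, "...")      # "..." means one or more values
  name <- sub("\\.\\.\\.$", "", token)
  list(name = name, multiple = multiple, optional = optional)
}

str(parse_pattern_token("arg...?"))   # name "arg", multiple and optional
str(parse_pattern_token("command"))   # a single, required value
```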

This is all made explicit to the user in the usage information, which can be shown using the `show()` method. This is also where the `header` and `footer` arguments above come in, as they're shown before and after the usage and option summary:

```{r}
parser$show()
```

Note that the text blocks are wrapped neatly, each pattern is shown separately, optional arguments are surrounded by square brackets, argument names are used in appropriate places, and so on.
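As a rough illustration of that layout, the hypothetical `usage_line()` below renders positional elements in usage style, wrapping optional ones in square brackets, and base R's `strwrap()` performs the kind of paragraph wrapping applied to the header and footer. The exact formatting `arrg` uses may well differ, and the option strings passed in here are made up for the example.

```{r}
# Hypothetical sketch of usage-line layout; arrg's actual formatting may differ
usage_line <- function(command, tokens, options = character(0)) {
  # A trailing "?" marks an optional element, shown in square brackets
  shown <- ifelse(endsWith(tokens, "?"),
                  paste0("[", sub("\\?$", "", tokens), "]"),
                  tokens)
  paste(c(command, options, shown), collapse = " ")
}

usage_line("test", c("command", "arg...?"), options = c("[-n <count>]", "[-t]"))
# "test [-n <count>] [-t] command [arg...]"

# Wrap a paragraph of text to a narrow width
footer <- "Run the test on the code at the specified path (default \".\"), or run a specific command."
wrapped <- strwrap(footer, width = 40)
writeLines(wrapped)
```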

## Parsing arguments

The `parse()` method then does the actual parsing. By default it takes script arguments from the result of `commandArgs(trailingOnly=TRUE)`, but arguments can also be passed explicitly. (Under `littler`, we would pass its `argv` vector instead.)
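A script intended to run under both `Rscript` and `littler` could choose its argument source explicitly. This is a minimal sketch, assuming that `littler` makes the arguments available as a character vector called `argv` in the global environment; `script_args()` is a hypothetical helper.

```{r}
# Hypothetical helper: under littler, use the global `argv` vector (assumed
# to hold the script arguments); otherwise fall back to commandArgs()
script_args <- function() {
  if (exists("argv", envir = globalenv(), inherits = FALSE))
    get("argv", envir = globalenv())
  else
    commandArgs(trailingOnly = TRUE)
}

script_argv <- script_args()
```

The result can be handed straight to the parser, as in `parser$parse(script_args())`.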

```{r}
parser$parse("-h")
```

The result is a list containing the valid arguments, keyed by their argument names, long option labels or short option labels, in that order of preference. Note that `-h` is a flag, so its value is logical (`TRUE` or `FALSE`). In practice this result would prompt us to `show()` the usage information above. A more interesting example is:

```{r}
parser$parse(c("-tn3", "--install", "."))
```

Here, we're using the third pattern, which `arrg` distinguishes from the second by the presence of the `--install` option, which the second pattern doesn't accept. Note that `-tn3` combines the `-t` option with the `-n` option and its argument, which is a common space-saving syntax that `arrg` supports; and that the argument to the `-n` (times) option gets converted to integer mode automatically to match the default value.
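Short-option clusters can be expanded into separate tokens before matching, provided the parser knows which options take an argument, because everything after such an option in the cluster is its value. The sketch below shows one way of doing this in base R; `expand_short_cluster()` is hypothetical and not taken from `arrg`.

```{r}
# Hypothetical helper, not arrg's implementation: expand a clustered short
# option token such as "-tn3" into separate tokens. `takes_arg` names the
# short options that take an argument.
expand_short_cluster <- function(token, takes_arg = character(0)) {
  # Long options, a lone "-" and positional values are left untouched
  if (!grepl("^-[^-]", token))
    return(token)
  chars <- strsplit(substring(token, 2), "")[[1]]
  out <- character(0)
  for (i in seq_along(chars)) {
    out <- c(out, paste0("-", chars[i]))
    if (chars[i] %in% takes_arg) {
      # Any remaining characters form this option's argument
      if (i < length(chars))
        out <- c(out, paste(chars[(i + 1):length(chars)], collapse = ""))
      break
    }
  }
  out
}

unlist(lapply(c("-tn3", "--install", "."), expand_short_cluster, takes_arg = "n"))
# "-t" "-n" "3" "--install" "."
```

A complete parser would also need a policy for tokens such as `-5`, which this sketch treats as a short option called `5`, and for an argument-taking option at the end of a cluster, whose value is then the next token.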

The script can now use the parsed argument list to implement its core functionality.
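For example, a script body might look something like the following. It is a sketch rather than a recommended structure: the list passed to `run_test()` stands in for the result of `parser$parse()`, and the element names `help`, `time` and `path` are assumptions based on the keying rule described above.

```{r}
# Hypothetical script body: the list passed in stands in for the result of
# parser$parse(), and the element names used here are assumptions
`%||%` <- function(x, y) if (is.null(x)) y else x   # also in base R >= 4.4

run_test <- function(args, show_usage = function() message("Usage: test ...")) {
  # [[ ]] matches names exactly, whereas $ accepts unique partial matches
  if (isTRUE(args[["help"]])) {
    show_usage()                          # e.g. parser$show() in a real script
    return(invisible(NULL))
  }
  path <- args[["path"]] %||% "."         # the default path given in the footer
  start <- Sys.time()
  message("Testing code at ", path)       # the actual test run would go here
  if (isTRUE(args[["time"]]))             # flags are logical, so isTRUE() is safe
    message("Run time: ", format(Sys.time() - start))
  invisible(path)
}

run_test(list(time = TRUE, install = TRUE, path = "."))
```

Using `[[` rather than `$` matters with option names like `time` and `times`, which differ only by a suffix.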

## Why `arrg`?

OK, so why might you want to use `arrg` in preference to one of the many alternatives for R? As I stated at the outset, this is largely a question of taste, but here are my reasons.

For a fairly basic piece of functionality like this, portability is important to me, and that means that I want dependencies to be minimal, especially outside the R ecosystem. That excludes the `argparse` package, which requires Python, and `GetoptLong`, which requires Perl.

Package `batch` doesn't use Unix-style options, but rather an argument list of alternating variable names and values which are interpreted as R expressions. It's a neat and simple solution, but in my opinion a little verbose to use, especially for simple on/off flags. `getopt` is very bare-bones, and is essentially subsumed by `optparse`. I couldn't get `defineOptions` to work as expected using the documentation.
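For comparison, the name/value style can be sketched in a few lines of base R. This illustrates the general approach only, not the `batch` package's implementation or interface; `parse_name_value()` is hypothetical.

```{r}
# Hypothetical sketch of the name/value style, not the batch package's own
# code: arguments alternate between a name and an R expression
parse_name_value <- function(args) {
  stopifnot(length(args) %% 2L == 0L)
  name_idx <- seq_len(length(args) %/% 2L) * 2L - 1L
  # Each value is parsed and evaluated as R code (so never use untrusted input)
  values <- lapply(args[name_idx + 1L],
                   function(x) eval(parse(text = x), envir = baseenv()))
  names(values) <- args[name_idx]
  values
}

parse_name_value(c("times", "3L", "time", "TRUE", "install", "TRUE"))
```

Even on/off flags need an explicit `TRUE` in this style, and because each value is evaluated as R code, it should not be fed untrusted input.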

That still leaves various alternatives. `docopt` is a solid option if its heuristics work for you. Indeed, it parses `arrg`'s help string correctly for the test case above, and is the only other package that seems to handle multiple usage patterns and short-option clusters like `-tn3`.

```{r}
# Write the usage help output to a character vector called `help`
con <- textConnection("help", "w", local=TRUE)
parser$show(con)
close(con)

docopt::docopt(paste(help, collapse="\n"), c("-tn3", "--install", "."))
```

`optigrab` doesn't use a parser object or up-front interface specification, but just searches for each option on demand. As a result, the requested options must already be in scope when its functions are called. It generates basic usage for options it has seen when `opt_help()` is called.

```{r}
library(optigrab)

opts <- c("-t", "-n", "3", "--install", ".")

list(help=opt_get(c("h","help"), FALSE, description="Display this usage information and exit", opts=opts),
    times=opt_get(c("n","times"), 1L, description="Run the test the specified number of times", opts=opts),
    time=opt_get(c("t","time"), FALSE, description="Print the overall run-time once the test is completed", opts=opts),
    install=opt_get("install", FALSE, description="Install the code before testing it", opts=opts),
    opt_get_verb(opts=opts))

# This will quit a non-interactive R session
# opt_help(opts="--help")
```
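The on-demand approach can be sketched in base R as a function that scans the raw arguments for one option at a time, with no parser object. This is a hypothetical illustration, not `optigrab`'s code. It expects clusters like `-tn3` to be written out separately, as in the example above, and it cannot reject unknown options, since it never sees the full specification.

```{r}
# Hypothetical sketch of on-demand lookup, not optigrab's implementation:
# each call scans the raw arguments for one option, with no parser object
lookup_option <- function(names, default, args = commandArgs(trailingOnly = TRUE)) {
  # One-character names are short options (-n), others long (--times)
  flags <- ifelse(nchar(names) == 1L, paste0("-", names), paste0("--", names))
  pos <- which(args %in% flags)
  if (length(pos) == 0L)
    return(default)
  if (is.logical(default))
    return(TRUE)                         # a flag takes no argument
  # Otherwise the next token is the value, converted to the default's type
  # (NA if the option is the last token)
  as.vector(args[pos[1] + 1L], mode = typeof(default))
}

raw_args <- c("-t", "-n", "3", "--install", ".")
lookup_option(c("n", "times"), 1L, raw_args)    # 3L
lookup_option(c("h", "help"), FALSE, raw_args)  # FALSE
```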

The `optparse` package works similarly to `arrg`, although it has quite basic support for positional arguments and creates only a simple usage block by default.

```{r}
library(optparse)

op.parser <- OptionParser(prog="test",
    add_help_option=FALSE,
    description="Test your code",
    epilogue="Run the test on the code at the specified path (default \".\"), or run a specific command.") |>
    add_option(c("-h","--help"), "store_true", help="Display this usage information and exit") |>
    add_option(c("-n","--times"), "store", default=1L, help="Run the test the specified number of times") |>
    add_option(c("-t","--time"), "store_true", help="Print the overall run-time once the test is completed") |>
    add_option("--install", "store_true", help="Install the code before testing it")

print_help(op.parser)
parse_args(op.parser, "-h", print_help_and_exit=FALSE, positional_arguments=TRUE)
parse_args(op.parser, c("-t", "-n", "3", "--install", "."), print_help_and_exit=FALSE, positional_arguments=TRUE)
```

And finally, `scribe` generates nice help output, but I couldn't figure out how to fully suppress its default `--help` and `--version` options:

```{r}
library(scribe)

s.parser <- command_args(c("-t", "-n", "3", "--install", "."), include=NA_character_)
s.parser$add_description("Test your code")

s.parser$add_argument("-h", "--help", action="flag", options=list(no=FALSE), info="Display this usage information and exit")
s.parser$add_argument("-n", "--times", n=1L, default=1L, info="Run the test the specified number of times")
s.parser$add_argument("-t", "--time", action="flag", options=list(no=FALSE), info="Print the overall run-time once the test is completed")
s.parser$add_argument("--install", action="flag", options=list(no=FALSE), info="Install the code before testing it")
s.parser$add_argument("command", n=1L, info="Command to run or path to code")

s.parser$help()
s.parser$parse()
```

Owner

  • Name: Jon Clayden
  • Login: jonclayden
  • Kind: user
  • Location: United Kingdom
  • Company: University College London

Image processing, scientific computing and data science, mostly with R and C++

Packages

  • Total packages: 1
  • Total downloads:
    • cran 208 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
cran.r-project.org: arrg

Flexible Argument Parsing for R Scripts

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 208 Last month