luct

https://github.com/nerc-ceh/luct

Science Score: 72.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
✓ Institutional organization owner: Organization nerc-ceh has institutional domain (www.ceh.ac.uk)
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.7%) to scientific vocabulary

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: NERC-CEH
License: other
Language: R
Default Branch: master
Size: 91.7 MB

Statistics

Stars: 2
Watchers: 4
Forks: 1
Open Issues: 0
Releases: 0

Created over 4 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

Tracking Land-Use Change

This is an open, reproducible, computational research project on land-use change in the UK.

The code is developed by Peter Levy, based on a combination of the targets and workflowr packages in R.

The background to the method for the estimation of land-use change is described in this paper.

Project dependencies

This is an open, shareable, reproducible, computational research project.

All the computational work and document preparation is done with the R statistical computing environment.
The research project is contained in a single directory, with the exception that some data sets are too large to store on GitHub.
We use the renv package to manage the R package versions used by the project.
We are using the targets package to structure the project so that the work is computationally reproducible.
The project code and documents are shared publicly on GitHub at https://github.com/NERC-CEH/luct
The main report is produced using bookdown and shared publicly on GitHub at https://nerc-ceh.github.io/luct/
We are exploring the workflowr package to structure the project so that all the materials and outputs are available via an openly accessible, automatically generated website. However, GitHub cannot currently show both the bookdown and workflowr website documents simultaneously, so this is still under investigation.

Workflow management

The project uses the R targets package to structure and manage the workflow and to make it reproducible. Central to this is the idea of the workflow as a "pipeline" - a defined list of functions which transform data. Here, the core pipeline contains the computational steps that read, reformat and process the input data (time series and maps of land use and land-use change data), and run the data assimilation steps that estimate the matrices of land-use change, and produce the maps of past land use. Potentially there can be multiple pipelines, which produce other analyses, reports, or publications, in addition to the core process. These are used to generate documentation in the form of web pages with the workflowr package, but are not discussed further here.

The core pipeline is defined in the file _targets.R as a list of "targets".
The targets represent the steps in the series of computations which make up the pipeline. A target is defined with the syntax tar_target(target_name, function_name(inputs)). The target is thus a named R data object which is the outcome of a named function with specified inputs. The one exception to this is that the target may simply be a file for input or output. In the current project, the core pipeline is a list of 87 targets which specify the input files, the reformatting and transformation of these data, and subsequent calculations which make up the data assimilation algorithm.
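As a minimal sketch of what such a definition could look like, the following _targets.R fragment declares three targets. The file path and the functions read_land_use() and total_area_by_class() are hypothetical and are not taken from the project's own pipeline; only tar_target() and its format = "file" option come from the targets package.

```r
library(targets)

# Hypothetical helper functions; in a real project these would live in the R directory.
read_land_use <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

total_area_by_class <- function(df) {
  aggregate(area ~ land_use, data = df, FUN = sum)
}

# The pipeline is a list of targets; each names an object and the command that builds it.
pipeline <- list(
  # A file target: the file content is tracked rather than an R object.
  # The path is an illustrative assumption.
  tar_target(land_use_file, "data-raw/land_use_example.csv", format = "file"),
  # Downstream targets refer to upstream targets by name.
  tar_target(land_use, read_land_use(land_use_file)),
  tar_target(area_summary, total_area_by_class(land_use))
)

# A _targets.R file ends by returning the list of targets.
pipeline
```

With a file like this in the project root, targets::tar_make() would build any outdated targets, and targets::tar_read(area_summary) would return the cached result.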

The pipeline is managed using a "Make"-like procedure, which analyses the dependencies between the different steps in the pipeline. If there have been no changes to the code in the target functions or to the input data since the last run, it identifies that everything is up to date, and no further computation is needed. If the source code of any target function or the content of any data file has changed, it identifies which parts of the pipeline are affected, and all the targets that depend on them are recomputed. This has several advantages: forcing the workflow to be declared at a higher level of abstraction; running only the necessary computation, so saving run-time for tasks that are already up to date; and, most importantly, providing tangible evidence that the results match the underlying code and data, confirming that the computation is reproducible. To identify changes, each target is represented by its hash value, stored in the _targets directory.
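The idea of detecting changes by comparing hashes can be illustrated in base R. This is only a sketch of the principle: it uses tools::md5sum() as a stand-in hash, the helper is_outdated() is hypothetical, and the targets package uses its own hashing and metadata store rather than this code.

```r
# Hypothetical helper: TRUE when the file's current hash differs from the stored one.
is_outdated <- function(path, stored_hash) {
  current_hash <- unname(tools::md5sum(path))
  !identical(current_hash, stored_hash)
}

# Create a small input file and record its hash, as if after a first run.
input <- tempfile(fileext = ".csv")
writeLines(c("year,area", "2000,10"), input)
stored <- unname(tools::md5sum(input))

# Nothing has changed, so the step that reads this file could be skipped.
unchanged <- is_outdated(input, stored)

# Change the file content; the hash no longer matches, so the step must rerun.
writeLines(c("year,area", "2000,12"), input)
changed <- is_outdated(input, stored)
```

Here unchanged is FALSE and changed is TRUE. A pipeline tool applies the same comparison to function code as well as data, and propagates the result to every downstream target.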

Project directory structure

`_targets` directory

This directory is managed by the targets package. It contains the metadata describing the status of the computational pipelines and the cached results of those computations.

`analysis` directory

workflowr creates a set of standard directories. See the package documentation for details on how these directories are used. The analysis directory contains R Markdown notebooks which document the workflow. These are still in development.

`R` directory

This contains the bulk of the R source code for the functions used in the project.

`data-raw` directory

This contains the raw data files for the project, in their original form as far as possible. To avoid duplication, this is a symbolic link to an earlier iteration. However, many of these are too large to share via GitHub, and would need to be shared by another mechanism (e.g. as binary assets).

`data` directory

This contains the processed data files resulting from transformations of the raw data. This typically involves reprojection, reclassification, filtering and unit conversions. Again, many of these are too large to share via GitHub.
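As an illustration of the kind of transformation involved, the sketch below reclassifies, converts units and filters a small table in base R. The class codes, class names and areas are invented for the example and do not come from the project's data; reprojection is omitted because it would require a spatial package.

```r
# Hypothetical raw table: numeric class codes and areas in hectares.
raw <- data.frame(
  class_code = c(1, 2, 3, 4),
  area_ha    = c(250, 1200, 40, 0)
)

# Hypothetical lookup from class code to land-use name (reclassification).
lookup <- c("1" = "woodland", "2" = "cropland", "3" = "grassland", "4" = "urban")

processed <- raw
processed$land_use <- unname(lookup[as.character(processed$class_code)])

# Unit conversion: 1 km2 = 100 ha.
processed$area_km2 <- processed$area_ha / 100

# Filtering: drop classes with zero area and keep only the processed columns.
processed <- processed[processed$area_km2 > 0, c("land_use", "area_km2")]
```

Keeping each transformation in its own function, rather than inline as here, would let each step become a separate target in the pipeline.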

`docs` directory

This contains the HTML web pages generated from the R Markdown files with workflowr or bookdown.

`output` directory

This contains output files from the project: the results of the data assimilation.

`slurm` directory

This contains files for the steps which require high-performance computing, run via slurm, the widely used job scheduling system on HPC systems. These are generic enough to run on any HPC machine with slurm, and have been run on both JASMIN and POLAR, although the queue names, number of processors and memory limits will be system-specific.

`manuscripts` directory

The report is prepared and formatted using bookdown in a subdirectory of manuscripts that contains all the necessary infrastructure files (templates, bibliographies, etc.).

`renv` directory

The renv package keeps track of the R packages (and their versions) used by the project. It allows anyone to reinstate the same packages and versions in their local copy of the project.

The renv directory contains the information needed by renv to reinstate the local package environment.

`.gitignore`

.gitignore in the R project root directory is used for all manual entries so that all the manual rules are in one place. Packages, such as renv, may create their own .gitignore files in subdirectories that they manage.

Installation

Assuming you already have a current version of R installed, clone the project repository https://github.com/NERC-CEH/luct from GitHub.

When you open the project, you may get warning messages about packages not being installed. This is because you need to use the renv package to reinstate the packages that are used by the project.

Install renv in that project if it is not already installed.
Use renv::restore() to install all the needed packages in the project-specific library:
```
renv::restore()
```
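The two steps above could be combined in a short script. This is a minimal sketch: the helper names needs_install() and setup_project() are hypothetical, and it assumes the function is called from the project root so that renv can find the project's lockfile.

```r
# Hypothetical helper: TRUE when a package cannot be loaded from the current library paths.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Hypothetical wrapper: install renv if it is missing, then restore the project library.
setup_project <- function() {
  if (needs_install("renv")) {
    install.packages("renv")
  }
  # prompt = FALSE restores without asking for interactive confirmation.
  renv::restore(prompt = FALSE)
}
```

Calling setup_project() once after cloning should leave the project-specific library matching the recorded package versions, provided the packages are still available from their sources.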

Get data

Any files in data, output and _targets that are more than trivially small are not shared via Git and GitHub. They will be shared via a separate, yet to be determined, mechanism (e.g. Zenodo).

`renv` collaboration

The renv package is used to keep track of the installed packages and their versions. See the renv collaboration guide for the workflow for synchronising package environments between collaborators.

Still to do

More detailed setup instructions and notes should go in this project-level README.md file.
The README.md files in the subdirectories are currently generic, but should describe the purpose of each subdirectory and the files in that directory.

Acknowledgements

The website is based on a template by Ross W. Gayler.

Owner

Name: UK Centre for Ecology & Hydrology
Login: NERC-CEH
Kind: organization
Location: UK

Website: http://www.ceh.ac.uk/
Repositories: 155
Profile: https://github.com/NERC-CEH

Citation (CITATION)

The project is currently work-in-progress
so please contact us first if you wish to cite it.
As work is finalised we will add preferred citations.

GitHub Events

Total

Watch event: 1
Create event: 1

Last Year

Watch event: 1
Create event: 1

Committers

Last synced: 9 months ago

All Time

Total Commits: 68
Total Committers: 2
Avg Commits per committer: 34.0
Development Distribution Score (DDS): 0.221

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
ADCEH\plevy	p**y@c**k	53
plevy	p**y@l**l	15

Committer Domains (Top 20 + Academic)

ceh.ac.uk: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
