https://github.com/bcdev/nc2zarr

A Python tool that converts NetCDF files to Zarr format

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary

Keywords from Contributors

cci climate conda esa data-processing earth-observation eo raster-data
Last synced: 10 months ago

Repository

A Python tool that converts NetCDF files to Zarr format

Basic Info
  • Host: GitHub
  • Owner: bcdev
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 1.88 MB
Statistics
  • Stars: 11
  • Watchers: 4
  • Forks: 4
  • Open Issues: 12
  • Releases: 9
Created over 5 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License

README.md

nc2zarr

A Python tool that converts multiple NetCDF files to single Zarr datasets.

Create Python environment

$ conda install -n base -c conda-forge mamba
$ cd nc2zarr
$ mamba env create

Install nc2zarr from Sources

$ cd nc2zarr
$ conda activate nc2zarr
$ python setup.py develop

Testing and Test Coverage

$ pytest --cov nc2zarr --cov-report=html tests   

Usage

```
$ nc2zarr --help
Usage: nc2zarr [OPTIONS] [INPUT_FILE ...]

  Reads one or more input datasets and writes or appends them to a single
  Zarr output dataset.

  INPUT_FILE may refer to a NetCDF file, or Zarr dataset, or a glob that
  identifies multiple paths, e.g. "L3_SST/**/*.nc".

  OUTPUT_PATH must be directory which will contain the output Zarr dataset,
  e.g. "L3_SST.zarr".

  CONFIG_FILE must be in YAML format. It comprises the optional objects
  "input", "process", and "output". See nc2zarr/res/config-template.yml for
  a template file that describes the format. Multiple --config options may
  be passed as a chain to allow for reuse of credentials and other common
  parameters. Contained configuration objects are recursively merged, lists
  are appended, and other values overwrite each other from left to right.
  For example:

  nc2zarr -c s3.yml -c common.yml -c inputs-01.yml -o out-01.zarr
  nc2zarr -c s3.yml -c common.yml -c inputs-02.yml -o out-02.zarr
  nc2zarr out-01.zarr out-02.zarr -o final.zarr

  Command line arguments and options have precedence over other
  configurations and thus override settings in any CONFIG_FILE:

  [--finalize-only] overrides /finalize_only
  [--dry-run] overrides /dry_run
  [--verbose] overrides /verbosity

  [INPUT_FILE ...] overrides /input/paths in CONFIG_FILE
  [--multi-file] overrides /input/multi_file
  [--concat-dim] overrides /input/concat_dim
  [--decode-cf] overrides /input/decode_cf
  [--sort-by] overrides /input/sort_by

  [--output OUTPUT_FILE] overrides /output/path
  [--overwrite] overrides /output/overwrite
  [--append] overrides /output/append
  [--adjust-metadata] overrides /output/adjust_metadata

Options:
  -c, --config CONFIG_FILE    Configuration file (YAML). Multiple may be
                              given.
  -o, --output OUTPUT_PATH    Output name. Defaults to "out.zarr".
  -d, --concat-dim DIM_NAME   Dimension for concatenation. Defaults to
                              "time".
  -m, --multi-file            Open multiple input files as one block. Works
                              for NetCDF files only. Use --concat-dim to
                              specify the dimension for concatenation.
  -w, --overwrite             Overwrite existing OUTPUT_PATH. If OUTPUT_PATH
                              does not exist, the option has no effect.
                              Cannot be used with --append.
  -a, --append                Append inputs to existing OUTPUT_PATH. If
                              OUTPUT_PATH does not exist, the option has no
                              effect. Cannot be used with --overwrite.
  --decode-cf                 Decode variables according to CF conventions.
                              Caution: array data may be converted to
                              floating point type if a "_FillValue"
                              attribute is present.
  -s, --sort-by [path|name]   Sort input files by specified property.
  --adjust-metadata           Adjust metadata attributes after the last
                              write/append step.
  --finalize-only             Whether to just run "finalize" tasks on an
                              existing output dataset. Currently, this
                              updates the metadata only, given that
                              configuration output/adjust_metadata is set
                              or output/metadata is not empty. See also
                              option --adjust-metadata.
  -d, --dry-run               Open and process inputs only, omit data
                              writing.
  -v, --verbose               Print more output. Use twice for even more
                              output.
  --version                   Show version number and exit.
  --help                      Show this message and exit.
```
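The merge rule for chained `--config` files (objects merged recursively, lists appended, other values overwritten from left to right) can be illustrated with a short Python sketch. This is not nc2zarr's own implementation; the helper names `merge_configs` and `merge_chain` and the sample dictionaries, which stand in for already-parsed YAML files, are assumptions made for illustration only.

```python
from copy import deepcopy


def merge_configs(left, right):
    """Merge two configuration dicts; values from `right` take precedence."""
    merged = deepcopy(left)
    for key, value in right.items():
        current = merged.get(key)
        if isinstance(value, dict) and isinstance(current, dict):
            # Nested configuration objects are merged recursively.
            merged[key] = merge_configs(current, value)
        elif isinstance(value, list) and isinstance(current, list):
            # Lists are appended rather than replaced.
            merged[key] = current + deepcopy(value)
        else:
            # Any other value overwrites the one to its left.
            merged[key] = deepcopy(value)
    return merged


def merge_chain(configs):
    """Merge a chain of configurations from left to right."""
    result = {}
    for config in configs:
        result = merge_configs(result, config)
    return result


# Hypothetical parsed contents of two chained configuration files.
common = {
    "verbosity": 1,
    "input": {"paths": ["inputs/2019/*.nc"]},
    "output": {"overwrite": False},
}
inputs_01 = {
    "input": {"paths": ["inputs/2020/*.nc"]},
    "output": {"path": "out-01.zarr", "overwrite": True},
}

merged = merge_chain([common, inputs_01])
print(merged["input"]["paths"])  # both path lists, in chain order
print(merged["output"])          # "overwrite" taken from the right-most file
```

Under this rule a shared file such as `common.yml` can carry defaults while each later file contributes additional input paths and its own output settings.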

Configuration file format

The format of the configuration files passed via the --config option is described in the configuration template, nc2zarr/res/config-template.yml.

Examples

Convert multiple NetCDFs to single Zarr:

```bash
$ nc2zarr -o outputs/SST.zarr inputs/**/SST-*.nc
```

Append single NetCDF to an existing Zarr:

```bash
$ nc2zarr -a -o outputs/SST.zarr inputs/2020/SST-20200610.nc
```

Concatenate multiple Zarrs to a new Zarr:

```bash
$ nc2zarr -o outputs/SST.zarr outputs/SST-part1.zarr outputs/SST-part2.zarr
```

Append one Zarr to existing Zarr:

```bash
$ nc2zarr -a -o outputs/SST.zarr outputs/SST-part3.zarr
```

Custom processors

nc2zarr's built-in processors can be expanded with custom processors, Python functions which modify the dataset at particular points in the conversion pipeline. A processor function takes an xarray.Dataset as an argument and returns an xarray.Dataset as its result. A processor is specified in the configuration file as <MODULE_NAME>:<FUNCTION_NAME>, so for example the processor specification mymodule:myfunction could refer to a function defined in a file mymodule.py with the following contents:

```python
def myfunction(dataset):
    dataset.attrs["greeting"] = "Hello world!"
    return dataset
```

This processor function adds a predefined attribute to the dataset (modifying it in-place), then returns the modified dataset.

There are three points at which processors may be run:

| Section | Parameter name | When is the processor run? |
| -- | -- | -- |
| input | custom_preprocessor | After variable selection |
| process | custom_processor | After variable renaming, before rechunking |
| output | custom_postprocessor | Before writing data |

See the template configuration file for more details of syntax. The module is searched for on Python's current search path, so it will usually be necessary to ensure that the parent directories of all processor modules are listed in the PYTHONPATH environment variable, e.g. by executing

```shell
export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}/path/to/module/directory/"
```

before running nc2zarr. See the Python documentation for more details on PYTHONPATH.
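To show why the module has to be importable, here is a minimal sketch of how a `<MODULE_NAME>:<FUNCTION_NAME>` specification could be resolved using Python's standard import machinery. The helper name `resolve_processor` is hypothetical, and nc2zarr's actual lookup code may differ; the standard-library function `string.capwords` merely stands in for a processor so that the example runs without extra files.

```python
import importlib


def resolve_processor(spec):
    """Resolve a "<MODULE_NAME>:<FUNCTION_NAME>" string to a callable."""
    module_name, separator, function_name = spec.partition(":")
    if not separator or not module_name or not function_name:
        raise ValueError(
            f"expected <MODULE_NAME>:<FUNCTION_NAME>, got {spec!r}"
        )
    # import_module searches sys.path, which includes PYTHONPATH entries;
    # it raises ModuleNotFoundError if the module cannot be found there.
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    if not callable(function):
        raise TypeError(f"{spec!r} does not refer to a callable")
    return function


# A standard-library function stands in for a custom processor here.
processor = resolve_processor("string:capwords")
print(processor("hello world"))  # Hello World
```

If the directory containing `mymodule.py` were missing from the search path, the `import_module` call in this sketch would fail with `ModuleNotFoundError`, which is the situation the `PYTHONPATH` setting above is meant to prevent.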

Owner

  • Name: Brockmann Consult Development
  • Login: bcdev
  • Kind: user
  • Location: Germany
  • Company: Brockmann Consult GmbH

GitHub Events

Total
  • Create event: 2
  • Issues event: 2
  • Release event: 1
  • Watch event: 3
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 5
  • Pull request review event: 1
  • Pull request event: 2
  • Fork event: 1
Last Year
  • Create event: 2
  • Issues event: 2
  • Release event: 1
  • Watch event: 3
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 5
  • Pull request review event: 1
  • Pull request event: 2
  • Fork event: 1

Committers

Last synced: about 3 years ago

All Time
  • Total Commits: 282
  • Total Committers: 7
  • Avg Commits per committer: 40.286
  • Development Distribution Score (DDS): 0.574
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Norman Fomferra n****a@g****m 120
Pontus Lurcock p****k@b****e 97
Norman Fomferra n****a@b****e 56
Tonio Fincke t****e@b****e 6
SabineEmbacher s****r@b****e 1
gunbra32 r****2 1
Brockmann Consult Development 2****v@u****m 1

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
conda-forge.org: nc2zarr

nc2zarr reads one or more NetCDF datasets and writes or appends them to a single Zarr output dataset.

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Forks count: 53.7%
Average: 53.9%
Stargazers count: 54.1%
Last synced: 11 months ago

Dependencies

setup.py pypi
environment.yml pypi