https://github.com/bcdev/nc2zarr
A Python tool that converts NetCDF files to Zarr format
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low (13.1%)
Repository
Basic Info
- Host: GitHub
- Owner: bcdev
- License: mit
- Language: Python
- Default Branch: main
- Size: 1.88 MB
Statistics
- Stars: 11
- Watchers: 4
- Forks: 4
- Open Issues: 12
- Releases: 9
Metadata Files
README.md
nc2zarr
A Python tool that converts multiple NetCDF files to a single Zarr dataset.
Create Python environment
```bash
$ conda install -n base -c conda-forge mamba
$ cd nc2zarr
$ mamba env create
```
Install nc2zarr from Sources
```bash
$ cd nc2zarr
$ conda activate nc2zarr
$ python setup.py develop
```
Testing and Test Coverage
```bash
$ pytest --cov nc2zarr --cov-report=html tests
```
Usage
```
$ nc2zarr --help
Usage: nc2zarr [OPTIONS] [INPUT_FILE ...]

  Reads one or more input datasets and writes or appends them to a single
  Zarr output dataset.

  INPUT_FILE may refer to a NetCDF file, or Zarr dataset, or a glob that
  identifies multiple paths, e.g. "L3SST/**/*.nc".

  OUTPUT_PATH must be directory which will contain the output Zarr dataset,
  e.g. "L3SST.zarr".

  CONFIG_FILE must be in YAML format. It comprises the optional objects
  "input", "process", and "output". See nc2zarr/res/config-template.yml for
  a template file that describes the format. Multiple --config options may
  be passed as a chain to allow for reuse of credentials and other common
  parameters. Contained configuration objects are recursively merged, lists
  are appended, and other values overwrite each other from left to right.
  For example:

  nc2zarr -c s3.yml -c common.yml -c inputs-01.yml -o out-01.zarr
  nc2zarr -c s3.yml -c common.yml -c inputs-02.yml -o out-02.zarr
  nc2zarr out-01.zarr out-02.zarr -o final.zarr

  Command line arguments and options have precedence over other
  configurations and thus override settings in any CONFIG_FILE:

  [--finalize-only] overrides /finalize_only
  [--dry-run] overrides /dry_run
  [--verbose] overrides /verbosity

  [INPUT_FILE ...] overrides /input/paths in CONFIG_FILE
  [--multi-file] overrides /input/multi_file
  [--concat-dim] overrides /input/concat_dim
  [--decode-cf] overrides /input/decode_cf
  [--sort-by] overrides /input/sort_by

  [--output OUTPUT_FILE] overrides /output/path
  [--overwrite] overrides /output/overwrite
  [--append] overrides /output/append
  [--adjust-metadata] overrides /output/adjust_metadata

Options:
  -c, --config CONFIG_FILE    Configuration file (YAML). Multiple may be given.
  -o, --output OUTPUT_PATH    Output name. Defaults to "out.zarr".
  -d, --concat-dim DIM_NAME   Dimension for concatenation. Defaults to "time".
  -m, --multi-file            Open multiple input files as one block. Works for
                              NetCDF files only. Use --concat-dim to specify
                              the dimension for concatenation.
  -w, --overwrite             Overwrite existing OUTPUT_PATH. If OUTPUT_PATH
                              does not exist, the option has no effect. Cannot
                              be used with --append.
  -a, --append                Append inputs to existing OUTPUT_PATH. If
                              OUTPUT_PATH does not exist, the option has no
                              effect. Cannot be used with --overwrite.
  --decode-cf                 Decode variables according to CF conventions.
                              Caution: array data may be converted to floating
                              point type if a "_FillValue" attribute is
                              present.
  -s, --sort-by [path|name]   Sort input files by specified property.
  --adjust-metadata           Adjust metadata attributes after the last
                              write/append step.
  --finalize-only             Whether to just run "finalize" tasks on an
                              existing output dataset. Currently, this updates
                              the metadata only, given that configuration
                              output/adjust_metadata is set or output/metadata
                              is not empty. See also option --adjust-metadata.
  -d, --dry-run               Open and process inputs only, omit data writing.
  -v, --verbose               Print more output. Use twice for even more
                              output.
  --version                   Show version number and exit.
  --help                      Show this message and exit.
```
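The help text states the rules for chaining several `--config` files: objects are merged recursively, lists are appended, and any other value from a later file overwrites the earlier one. The following is a minimal sketch of those rules on plain dicts (as a YAML loader would produce them); `merge_configs` is a hypothetical helper written for illustration, not nc2zarr's own implementation.

```python
from copy import deepcopy
from typing import Any


def merge_configs(*configs: dict[str, Any]) -> dict[str, Any]:
    """Merge config dicts from left to right without mutating the inputs."""
    result: dict[str, Any] = {}
    for config in configs:
        result = _merge_two(result, config)
    return result


def _merge_two(left: dict[str, Any], right: dict[str, Any]) -> dict[str, Any]:
    merged = deepcopy(left)
    for key, value in right.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Nested objects (e.g. "input", "output") are merged recursively.
            merged[key] = _merge_two(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Lists (e.g. input paths) are appended.
            merged[key] = current + deepcopy(value)
        else:
            # Any other value: the right-hand (later) config wins.
            merged[key] = deepcopy(value)
    return merged


common = {"input": {"concat_dim": "time", "paths": ["a.nc"]}, "verbosity": 1}
inputs = {"input": {"paths": ["b.nc"]}, "verbosity": 2}
print(merge_configs(common, inputs))
# {'input': {'concat_dim': 'time', 'paths': ['a.nc', 'b.nc']}, 'verbosity': 2}
```

Copying at every level keeps the caller's dicts untouched, so a shared file such as `common.yml` can be reused across several merges.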
Configuration file format
The format of the configuration files passed via the --config option is described
in the configuration template nc2zarr/res/config-template.yml.
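The help text also lists which configuration path each command-line option overrides. Below is a minimal sketch of applying those overrides to an already merged configuration. The dictionary keys for parsed CLI values (`input_paths`, `output`, and so on) are hypothetical names, and treating `None` as "option not given" is an assumption of this sketch.

```python
from copy import deepcopy
from typing import Any, Optional

# Option-to-config-path table transcribed from the "overrides" lines of
# `nc2zarr --help`. The keys are hypothetical names for parsed CLI values.
CLI_TO_CONFIG_PATH = {
    "finalize_only": "/finalize_only",
    "dry_run": "/dry_run",
    "verbose": "/verbosity",
    "input_paths": "/input/paths",
    "multi_file": "/input/multi_file",
    "concat_dim": "/input/concat_dim",
    "decode_cf": "/input/decode_cf",
    "sort_by": "/input/sort_by",
    "output": "/output/path",
    "overwrite": "/output/overwrite",
    "append": "/output/append",
    "adjust_metadata": "/output/adjust_metadata",
}


def apply_cli_overrides(
    config: dict[str, Any], cli_values: dict[str, Optional[Any]]
) -> dict[str, Any]:
    """Return a copy of config with every given CLI value written into it."""
    result = deepcopy(config)
    for option, value in cli_values.items():
        if value is None:
            # Assumption: None means "not given", so the config value stays.
            continue
        keys = CLI_TO_CONFIG_PATH[option].strip("/").split("/")
        target = result
        for key in keys[:-1]:
            # Create intermediate objects such as "input" if missing.
            target = target.setdefault(key, {})
        # Overrides replace the value outright; lists are not appended here.
        target[keys[-1]] = value
    return result
```

Unlike the merge between config files, an override replaces the stored value entirely, which matches "precedence" in the help text.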
Examples
Convert multiple NetCDF files to a single Zarr:
```bash
$ nc2zarr -o outputs/SST.zarr inputs/**/SST-*.nc
```
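When a glob such as `inputs/**/SST-*.nc` is left unquoted, the shell expands it before nc2zarr runs (in bash without `globstar`, `**` behaves like `*`); quoted, it reaches nc2zarr as an `INPUT_FILE` glob. A minimal sketch of expanding such patterns and applying `--sort-by path|name` with the standard library follows. `expand_inputs` is a hypothetical helper, and reading "name" as the file's base name is an assumption.

```python
import glob
import os
from typing import Optional


def expand_inputs(patterns: list[str], sort_by: Optional[str] = None) -> list[str]:
    """Expand glob patterns into paths, optionally sorted by path or name."""
    paths: list[str] = []
    for pattern in patterns:
        # recursive=True makes "**" match zero or more directory levels.
        paths.extend(glob.glob(pattern, recursive=True))
    if sort_by == "path":
        # Lexicographic order of the full path.
        paths.sort()
    elif sort_by == "name":
        # Assumption: "name" is the file name without its directories.
        paths.sort(key=os.path.basename)
    elif sort_by is not None:
        raise ValueError(f"sort_by must be 'path' or 'name', got {sort_by!r}")
    return paths
```

Sorting matters for time series: with date-stamped file names, `name` ordering follows the dates even when files sit in differently named directories.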
Append a single NetCDF file to an existing Zarr:
```bash
$ nc2zarr -a -o outputs/SST.zarr inputs/2020/SST-20200610.nc
```
Concatenate multiple Zarrs to a new Zarr:
```bash
$ nc2zarr -o outputs/SST.zarr outputs/SST-part1.zarr outputs/SST-part2.zarr
```
Append one Zarr to an existing Zarr:
```bash
$ nc2zarr -a -o outputs/SST.zarr outputs/SST-part3.zarr
```
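The append examples rely on the rules stated for `-w/--overwrite` and `-a/--append`: the two are mutually exclusive, and either has no effect when `OUTPUT_PATH` does not exist yet. A minimal sketch of that decision logic follows; `choose_write_mode` is hypothetical, and what happens when the output exists and neither flag is given is not documented, so raising an error there is an assumption.

```python
import os


def choose_write_mode(output_path: str, overwrite: bool = False, append: bool = False) -> str:
    """Return "write", "overwrite" or "append" for the given flags."""
    if overwrite and append:
        raise ValueError("--overwrite cannot be used with --append")
    if not os.path.exists(output_path):
        # Either flag has no effect if OUTPUT_PATH does not exist yet.
        return "write"
    if overwrite:
        return "overwrite"
    if append:
        return "append"
    # Assumption: undocumented case; this sketch refuses to touch the output.
    raise FileExistsError(f"{output_path} exists; use --overwrite or --append")
```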
Custom processors
nc2zarr's built-in processors can be expanded with custom processors, Python
functions which modify the dataset at particular points in the conversion
pipeline. A processor function takes an xarray.Dataset as an argument and
returns an xarray.Dataset as its result. A processor is specified in the
configuration file as <MODULE_NAME>:<FUNCTION_NAME>, so for example the
processor specification mymodule:myfunction could refer to a function
defined in a file mymodule.py with the following contents:
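```python
def myfunction(dataset):
    # Add a global attribute; this modifies the dataset in place.
    dataset.attrs["greeting"] = "Hello world!"
    # Return the dataset so it is passed on to the next pipeline step.
    return dataset
```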
This processor function adds a predefined attribute to the dataset (modifying it in-place), then returns the modified dataset.
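A `<MODULE_NAME>:<FUNCTION_NAME>` specification can be resolved with the standard library's `importlib`. The following is a minimal sketch of such a loader, assuming only the syntax described above; `load_processor` is a hypothetical helper and nc2zarr's own loader may differ in details such as error messages.

```python
import importlib
from typing import Any, Callable


def load_processor(spec: str) -> Callable[[Any], Any]:
    """Resolve a "<MODULE_NAME>:<FUNCTION_NAME>" spec to a function."""
    module_name, sep, function_name = spec.partition(":")
    if not (sep and module_name and function_name):
        raise ValueError(f"expected <MODULE_NAME>:<FUNCTION_NAME>, got {spec!r}")
    # import_module searches sys.path, which includes PYTHONPATH entries.
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    if not callable(function):
        raise TypeError(f"{spec!r} does not name a callable")
    return function
```

With `mymodule.py` on the search path, `load_processor("mymodule:myfunction")` would return `myfunction`, which can then be called with an `xarray.Dataset`.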
There are three points at which processors may be run:
| Section | Parameter name | When is the processor run? |
| -- | -- | -- |
| input | custom_preprocessor | After variable selection |
| process | custom_processor | After variable renaming, before rechunking |
| output | custom_postprocessor | Before writing data |
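The table fixes where each hook runs relative to the built-in steps. A minimal sketch of that ordering follows; the stage callables `select_variables`, `rename_variables`, `rechunk` and `write` are hypothetical placeholders rather than nc2zarr functions, and any object can stand in for the `xarray.Dataset`.

```python
from typing import Any, Callable, Optional

Processor = Callable[[Any], Any]


def run_pipeline(
    dataset: Any,
    select_variables: Processor,
    rename_variables: Processor,
    rechunk: Processor,
    write: Callable[[Any], None],
    custom_preprocessor: Optional[Processor] = None,
    custom_processor: Optional[Processor] = None,
    custom_postprocessor: Optional[Processor] = None,
) -> None:
    # "input" section: the preprocessor runs after variable selection.
    dataset = select_variables(dataset)
    if custom_preprocessor is not None:
        dataset = custom_preprocessor(dataset)
    # "process" section: after variable renaming, before rechunking.
    dataset = rename_variables(dataset)
    if custom_processor is not None:
        dataset = custom_processor(dataset)
    dataset = rechunk(dataset)
    # "output" section: the postprocessor runs just before writing.
    if custom_postprocessor is not None:
        dataset = custom_postprocessor(dataset)
    write(dataset)
```

Each hook receives the output of the preceding step and must return a dataset, which is why the processor contract above is "dataset in, dataset out".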
See the template configuration file (nc2zarr/res/config-template.yml) for more
details on the syntax. The module is searched for on Python's current search
path, so it will usually be necessary to ensure that the parent directories of
all processor modules are listed in the PYTHONPATH environment variable, e.g. by executing
```shell
export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}/path/to/module/directory/"
```
before running nc2zarr. See the Python documentation for more details on PYTHONPATH.
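The `${PYTHONPATH:+${PYTHONPATH}:}` idiom adds a separator only when PYTHONPATH is already set and non-empty. When nc2zarr is launched from a Python script instead of an interactive shell, the same environment can be built explicitly. A minimal sketch follows; `with_module_dir` is a hypothetical helper, and it uses `os.pathsep` (":" on POSIX, ";" on Windows) instead of a hard-coded colon.

```python
import os
from typing import Mapping


def with_module_dir(env: Mapping[str, str], directory: str) -> dict[str, str]:
    """Return a copy of env with directory appended to PYTHONPATH."""
    new_env = dict(env)
    existing = new_env.get("PYTHONPATH", "")
    # Like ${PYTHONPATH:+${PYTHONPATH}:}, only add a separator if needed.
    new_env["PYTHONPATH"] = f"{existing}{os.pathsep}{directory}" if existing else directory
    return new_env
```

For example, `subprocess.run(["nc2zarr", "-c", "config.yml"], env=with_module_dir(os.environ, "/path/to/module/directory/"), check=True)` would run nc2zarr with the processor modules importable; `config.yml` is a placeholder name.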
Owner
- Name: Brockmann Consult Development
- Login: bcdev
- Kind: user
- Location: Germany
- Company: Brockmann Consult GmbH
- Website: http://www.brockmann-consult.de/
- Repositories: 101
- Profile: https://github.com/bcdev
GitHub Events
Total
- Create event: 2
- Issues event: 2
- Release event: 1
- Watch event: 3
- Delete event: 1
- Issue comment event: 1
- Push event: 5
- Pull request review event: 1
- Pull request event: 2
- Fork event: 1
Last Year
- Create event: 2
- Issues event: 2
- Release event: 1
- Watch event: 3
- Delete event: 1
- Issue comment event: 1
- Push event: 5
- Pull request review event: 1
- Pull request event: 2
- Fork event: 1
Committers
Last synced: about 3 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Norman Fomferra | n****a@g****m | 120 |
| Pontus Lurcock | p****k@b****e | 97 |
| Norman Fomferra | n****a@b****e | 56 |
| Tonio Fincke | t****e@b****e | 6 |
| SabineEmbacher | s****r@b****e | 1 |
| gunbra32 | r****2 | 1 |
| Brockmann Consult Development | 2****v@u****m | 1 |
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
conda-forge.org: nc2zarr
nc2zarr reads one or more NetCDF datasets and writes or appends them to a single Zarr output dataset.
- Homepage: https://github.com/bcdev/nc2zarr
- License: MIT
- Latest release: 1.2.2 (published about 5 years ago)