NCDatasets.jl
NCDatasets.jl: a Julia package for manipulating netCDF data sets - Published in JOSS (2024)
Science Score: 77.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to joss.theoj.org
- ✓ Committers with academic emails: 3 of 24 committers (12.5%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.6%) to scientific vocabulary
Repository
Load and create NetCDF files in Julia
Basic Info
- Host: GitHub
- Owner: JuliaGeo
- License: mit
- Language: Julia
- Default Branch: master
- Homepage: https://juliageo.org/NCDatasets.jl/
- Size: 4.36 MB
Statistics
- Stars: 167
- Watchers: 9
- Forks: 32
- Open Issues: 28
- Releases: 71
Metadata Files
README.md
NCDatasets
NCDatasets allows one to read and create netCDF files.
NetCDF data sets and attribute lists behave like Julia dictionaries, and variables behave like Julia arrays. This package implements the CommonDataModel.jl interface, which means that the datasets can be accessed in the same way as GRIB files (GRIBDatasets.jl) and Zarr files (ZarrDatasets.jl).
The module NCDatasets provides support for the following netCDF CF conventions:
* _FillValue will be returned as missing
* scale_factor and add_offset are applied if present
* time variables (recognized by the units attribute) are returned as DateTime objects.
* support of the CF calendars (standard, gregorian, proleptic gregorian, julian, all leap, no leap, 360 day) using CFTime
* the raw data can also be accessed (without the transformations above).
* contiguous ragged array representation
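The first two conventions in the list above can be illustrated with a few lines of plain Julia. This is only a sketch of the CF unpacking rule (physical value = raw value × scale_factor + add_offset, with fill values mapped to `missing`); it is not the NCDatasets.jl implementation, and the helper name `unpack` and the sample values are hypothetical.

```julia
# Illustrative sketch of the CF unpacking rule; NOT the NCDatasets.jl internals.
# `unpack` and all sample values below are hypothetical.

# Convert raw stored values to physical values:
#   - entries equal to the fill value become `missing`
#   - all other entries become raw * scale_factor + add_offset
function unpack(raw::AbstractArray, fillvalue, scale_factor, add_offset)
    return [x == fillvalue ? missing : x * scale_factor + add_offset for x in raw]
end

raw = Int16[10, 20, -9999, 40]                 # packed values as stored on disk
data = unpack(raw, Int16(-9999), 0.5, 100.0)   # third entry is the fill value
```

In NCDatasets.jl this transformation is applied automatically when indexing a variable, so a helper like this is normally not needed.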
Other features include:
* Support for NetCDF 4 compression and variable-length arrays (i.e. arrays of vectors where each vector can potentially have a different length)
* The module also includes a utility function ncgen which generates the Julia code that would produce a netCDF file with the same metadata as a template netCDF file.
Installation
Inside the Julia shell, you can download and install the package by issuing:
```julia
using Pkg
Pkg.add("NCDatasets")
```
Manual
This manual is a quick introduction to using NCDatasets.jl. For more details you can read the stable or dev documentation.
- Create a netCDF file
- Explore the content of a netCDF file
- Load a netCDF file
- Edit an existing netCDF file
Create a netCDF file
The following gives an example of how to create a netCDF file by defining dimensions, variables and attributes.
```julia
using NCDatasets
using DataStructures: OrderedDict

# This creates a new NetCDF file called file.nc.
# The mode "c" stands for creating a new file (clobber)
ds = NCDataset("file.nc","c")

# Define the dimensions "lon" and "lat" with the sizes 100 and 110 resp.
defDim(ds,"lon",100)
defDim(ds,"lat",110)

# Define a global attribute
ds.attrib["title"] = "this is a test file"

# Define the variable temperature with the attribute units
v = defVar(ds,"temperature",Float32,("lon","lat"), attrib = OrderedDict(
    "units" => "degree Celsius",
    "scale_factor" => 10,
))

# add additional attributes
v.attrib["comments"] = "this is a string attribute with Unicode Ω ∈ ∑ ∫ f(x) dx"

# Generate some example data
data = [Float32(i+j) for i = 1:100, j = 1:110];

# write a single column
v[:,1] = data[:,1];

# write the complete data set
v[:,:] = data;

close(ds)
```
It is also possible to create the dimensions, define the variable, and set its value with a single call to defVar:
```julia
using NCDatasets
ds = NCDataset("/tmp/test2.nc","c")
data = [Float32(i+j) for i = 1:100, j = 1:110]
v = defVar(ds,"temperature",data,("lon","lat"))
close(ds)
```
Explore the content of a netCDF file
Before reading the data from a netCDF file, it is often useful to explore the list of variables and attributes defined in it.
For interactive use, the following commands (without ending semicolon) display the content of the file similarly to ncdump -h file.nc:
```julia
using NCDatasets
ds = NCDataset("file.nc")
```
This creates the central structure of NCDatasets.jl, NCDataset, which represents the contents of the netCDF file (without immediately loading everything into memory). NCDataset is an alias for Dataset.
The following displays the information just for the variable varname:
```julia
ds["varname"]
```
while to get the global attributes you can do:
```julia
ds.attrib
```
NCDataset("file.nc") produces a listing like:
```
Dataset: file.nc
Group: /

Dimensions
   lon = 100
   lat = 110

Variables
  temperature   (100 × 110)
    Datatype:    Float32 (Float32)
    Dimensions:  lon × lat
    Attributes:
     units                = degree Celsius
     scale_factor         = 10
     comments             = this is a string attribute with Unicode Ω ∈ ∑ ∫ f(x) dx

Global attributes
  title                = this is a test file
```
Load a netCDF file
Loading a variable with known structure can be achieved by accessing the variables and attributes directly by their name.
```julia
# The mode "r" stands for read-only. The mode "r" is the default mode and the parameter can be omitted.
ds = NCDataset("file.nc","r")
v = ds["temperature"]

# load a subset
subdata = v[10:30,30:5:end]

# load all data
data = v[:,:]

# load all data ignoring attributes like scale_factor, add_offset, _FillValue and time units
data2 = v.var[:,:];

# load an attribute
unit = v.attrib["units"]
close(ds)
```
In the example above, the subset can also be loaded with:
```julia
subdata = NCDataset("file.nc")["temperature"][10:30,30:5:end]
```
This might be useful in an interactive session. However, the file file.nc is not closed immediately (closing the file will be triggered by Julia's garbage collector), which can be a problem if you open many files. On Linux the number of open files is often limited to 1024 (soft limit). If you write to a file, you should also always close the file to make sure that the data is properly written to the disk.
An alternative way to ensure the file has been closed is to use a do block: the file will be closed automatically when leaving the block.
```julia
data = NCDataset(filename,"r") do ds
    ds["temperature"][:,:]
end # ds is closed
```
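Since fill values are returned as `missing`, loaded arrays may need some care before computing statistics. The following is a minimal sketch using only Base Julia and the Statistics standard library; the array `data` is a hypothetical stand-in for values loaded from a file, not output from NCDatasets.jl itself.

```julia
using Statistics: mean

# Hypothetical stand-in for an array loaded from a variable that contains fill values
data = Union{Missing, Float32}[1.0f0, missing, 3.0f0, 5.0f0]

# Mean over the valid entries only (missing entries are skipped)
mean_valid = mean(skipmissing(data))

# Replace missing entries by NaN, e.g. for code that expects a plain Float32 array
filled = coalesce.(data, NaN32)

# Number of valid (non-missing) entries
nvalid = count(!ismissing, data)
```

Whether to skip missing values or replace them by NaN depends on what the downstream code expects; both approaches leave the file itself untouched.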
Edit an existing netCDF file
When you need to modify variables or attributes in a netCDF file, you have
to open it with the "a" option. Here, for example, we add a global attribute creator to the
file created in the previous step.
```julia
ds = NCDataset("file.nc","a")
ds.attrib["creator"] = "your name"
close(ds);
```
Benchmark
The benchmark loads a variable of the size 1000x500x100 in slices of 1000x500 (applying the scaling of the CF conventions) and computes the maximum of each slice and the average of each maximum over all slices. This operation is repeated 100 times. The code is available at https://github.com/Alexander-Barth/NCDatasets.jl/tree/master/test/perf .
| Module           | median | minimum |  mean | std. dev. |
|:---------------- | ------:| -------:| -----:| ---------:|
| R-ncdf4          |  0.407 |   0.384 | 0.407 |     0.010 |
| python-netCDF4   |  0.475 |   0.463 | 0.476 |     0.010 |
| julia-NCDatasets |  0.265 |   0.249 | 0.267 |     0.011 |
All runtimes are in seconds. We use Julia 1.10.0 (with NCDatasets 0.14.0), R 4.1.2 (with ncdf4 1.22) and Python 3.10.12 (with netCDF4 1.6.5) on an i5-1135G7 CPU and an NVMe SSD (WDC WDS100T2B0C).
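The computation described above (maximum of each slice, then the average of those maxima) can be sketched as follows. This is not the actual benchmark code from the repository: the function name `mean_of_slice_maxima` is hypothetical, a small in-memory array stands in for the 1000x500x100 variable, and the sketch assumes the data contain no missing values.

```julia
using Statistics: mean

# Hypothetical helper sketching the benchmark computation (not the repository code).
# `v` can be any 3-dimensional array-like object that supports v[:, :, n].
function mean_of_slice_maxima(v)
    nslices = size(v, 3)
    maxima = zeros(nslices)
    for n in 1:nslices
        slice = v[:, :, n]           # load one 2D slice
        maxima[n] = maximum(slice)   # maximum of this slice
    end
    return mean(maxima)              # average of the maxima over all slices
end

# Small in-memory stand-in for the benchmark variable
v = [Float64(i + j + k) for i in 1:4, j in 1:3, k in 1:5]
result = mean_of_slice_maxima(v)
```

Loading one slice at a time keeps the memory footprint at the size of a single slice rather than the full variable.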
Filing an issue
When you file an issue, please include sufficient information that would allow somebody else to reproduce the issue, in particular:
1. Provide the code that generates the issue.
2. If necessary to run your code, provide the used netCDF file(s).
3. Make your code and netCDF file(s) as simple as possible (while still showing the error and being runnable). A big thank you for the 5-star-premium-gold users who do not forget this point! 👍🏅🏆
4. The full error message that you are seeing (in particular file names and line numbers of the stack-trace).
5. Which version of Julia and NCDatasets are you using? Please include the output of:
   ```julia
   versioninfo()
   using Pkg
   Pkg.status("NCDatasets")  # Pkg.installed() is deprecated in current Julia versions
   ```
6. Does NCDatasets pass its test suite? Please include the output of:
   ```julia
   using Pkg
   Pkg.test("NCDatasets")
   ```
Alternative
The package NetCDF.jl from Fabian Gans and contributors is an alternative to this package which supports a more Matlab/Octave-like interface for reading and writing NetCDF files.
Credits
netcdf_c.jl and the error handling code of the NetCDF C API are from NetCDF.jl by Fabian Gans (Max-Planck-Institut für Biogeochemie, Jena, Germany) released under the MIT license.
Owner
- Name: JuliaGeo
- Login: JuliaGeo
- Kind: organization
- Website: https://juliageo.org/
- Repositories: 32
- Profile: https://github.com/JuliaGeo
Geospatial packages for Julia
Citation (CITATION.cff)
cff-version: "1.2.0"
authors:
- family-names: Barth
  given-names: Alexander
  orcid: "https://orcid.org/0000-0003-2952-5997"
doi: 10.5281/zenodo.11067062
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Barth
    given-names: Alexander
    orcid: "https://orcid.org/0000-0003-2952-5997"
  date-published: 2024-05-29
  doi: 10.21105/joss.06504
  issn: 2475-9066
  issue: 97
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6504
  title: "NCDatasets.jl: a Julia package for manipulating netCDF data
    sets"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06504"
  volume: 9
title: "NCDatasets.jl: a Julia package for manipulating netCDF data
  sets"
GitHub Events
Total
- Create event: 5
- Commit comment event: 6
- Release event: 1
- Issues event: 6
- Watch event: 12
- Delete event: 1
- Issue comment event: 43
- Push event: 17
- Pull request review event: 1
- Pull request review comment event: 1
- Pull request event: 7
- Fork event: 2
Last Year
- Create event: 5
- Commit comment event: 6
- Release event: 1
- Issues event: 6
- Watch event: 12
- Delete event: 1
- Issue comment event: 43
- Push event: 17
- Pull request review event: 1
- Pull request review comment event: 1
- Pull request event: 7
- Fork event: 2
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Alexander Barth | b****r@g****m | 971 |
| Datseris | d****e@g****m | 44 |
| Tristan Carion | t****n@g****m | 32 |
| Martijn Visser | m****r@g****m | 22 |
| ctroupin | c****n@g****m | 11 |
| Rafael Schouten | r****n@g****m | 6 |
| adigitoleo | a****o@d****m | 6 |
| Argel Ramírez Reyes | a****z@g****m | 5 |
| github-actions[bot] | 4****] | 5 |
| Gael Forget | g****t@m****u | 5 |
| Philippe Roy | b****r@y****a | 4 |
| Kene Uba | j****a@p****h | 4 |
| Gregory L. Wagner | w****g@g****m | 4 |
| Alexander Barth | e****l@e****m | 3 |
| Fabian Gans | f****s@b****e | 3 |
| Peter Shin | p****h@g****m | 3 |
| Gabriele Bozzola | s****r@g****m | 2 |
| tcarion | t****n@g****m | 1 |
| Benoit Pasquier | 4****c | 1 |
| Charles Kawczynski | k****s@g****m | 1 |
| Julia TagBot | 5****t | 1 |
| Navid C. Constantinou | n****y | 1 |
| Steven G. Johnson | s****j@m****u | 1 |
| Kosuke Sando | k****o@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 149
- Total pull requests: 52
- Average time to close issues: 3 months
- Average time to close pull requests: 21 days
- Total issue authors: 59
- Total pull request authors: 22
- Average comments per issue: 5.26
- Average comments per pull request: 4.98
- Merged pull requests: 36
- Bot issues: 0
- Bot pull requests: 7
Past Year
- Issues: 9
- Pull requests: 8
- Average time to close issues: 8 days
- Average time to close pull requests: about 24 hours
- Issue authors: 7
- Pull request authors: 5
- Average comments per issue: 2.67
- Average comments per pull request: 0.38
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- Datseris (18)
- Balinus (9)
- rafaqz (9)
- haakon-e (8)
- ryofurue (8)
- natgeo-wong (7)
- Alexander-Barth (6)
- sjdaines (4)
- ctroupin (4)
- brynpickering (3)
- Yixiao-Zhang (3)
- tiemvanderdeure (3)
- kongdd (3)
- gvali (3)
- aramirezreyes (3)
Pull Request Authors
- Datseris (9)
- github-actions[bot] (7)
- visr (6)
- Balinus (3)
- gaelforget (3)
- Alexander-Barth (3)
- Sbozzolo (2)
- keduba (2)
- rafaqz (2)
- lupemba (2)
- glwagner (2)
- JuliaTagBot (1)
- charleskawczynski (1)
- briochemc (1)
- ctroupin (1)
Packages
- Total packages: 1
- Total downloads: 874 (julia)
- Total dependent packages: 69
- Total dependent repositories: 9
- Total versions: 60
juliahub.com: NCDatasets
Load and create NetCDF files in Julia
- Homepage: https://juliageo.org/NCDatasets.jl/
- Documentation: https://docs.juliahub.com/General/NCDatasets/stable/
- License: MIT
- Latest release: 0.14.8 (published 10 months ago)
Dependencies
- julia-actions/setup-julia latest composite
- JuliaRegistries/TagBot v1 composite
- actions/cache v1 composite
- actions/checkout v2 composite
- julia-actions/julia-buildpkg latest composite
- julia-actions/setup-julia v1 composite
- actions/cache v1 composite
- actions/checkout v2 composite
- codecov/codecov-action v1 composite
- julia-actions/julia-buildpkg latest composite
- julia-actions/julia-processcoverage v1 composite
- julia-actions/julia-runtest latest composite
- julia-actions/setup-julia v1 composite