gval
A Python framework to evaluate geospatial datasets by comparing candidate and benchmark maps to compute agreement maps and statistics.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file found
- ✓ codemeta.json file found
- ✓ .zenodo.json file found
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.0%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 8
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
gVal: Geospatial Evaluation Framework
NOTE: Development of this package has migrated to noaa-owp/gval.
gVal (pronounced "g-val") is a high-level Python framework for evaluating the geospatial skill of candidate maps against benchmark maps, producing agreement maps and metrics.
Architecture
- Input maps
  - Candidates and Benchmarks
    - Including metadata
      - Variable name (e.g., inundation, land cover, land use, backscatter)
      - Statistical Data Type
        - Categorical (two- and multi-class)
          - Encodings for positive and negative condition values
          - Raster attribute table: associates names with data values
        - Continuous
      - Data format
        - GDAL-compatible vector
        - GDAL-compatible raster
      - Cataloging standards with metadata (modeling parameters, time)
        - GeoNetwork
        - STAC
    - Decide on storage types and in-memory data structures
    - Deserialization methods, especially for metadata (STAC, GeoParquet, GeoJSON, etc.); see the loading sketch after this list
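As a rough sketch of the deserialization step above, candidate and benchmark rasters could be opened lazily with rioxarray (already in the dependency list); the file paths are placeholders, not a committed API.

```python
# Lazy-loading sketch; paths are placeholders. chunks=True enables
# Dask-backed reads so whole rasters never sit in memory at once.
import rioxarray

candidate = rioxarray.open_rasterio("candidate.tif", chunks=True)
benchmark = rioxarray.open_rasterio("benchmark.tif", chunks=True)

# CRS, transform, and resolution metadata ride along on the .rio accessor.
print(candidate.rio.crs, candidate.rio.resolution())
```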
- Comparison Prep
  - The following prep operations should be done during the comparison to avoid excessive I/O operations.
  - Check for alignment between candidate and benchmark
    - Spatial
      - CRS
      - Extents (reject if no alignment is found)
      - Resolution
    - Temporal
    - Metadata
  - Data Format Check
    - Check for vector and raster data formats
    - This should be done after loading datasets
  - Homogenize (see the sketch after this list)
    - Spatial
      - Reproject
      - Match extents
      - Resample resolutions
    - Temporal
      - Select temporal misalignment criteria (done before loading)
    - Metadata
      - Select rules for disagreement (done before loading)
  - Statistical Data Type Conversions
    - Pass operator functions, both registered and user-defined
    - Conversion types
      - Categorical to binary
      - Continuous to categorical
      - Continuous to binary
  - Data Format Conversion
    - Convert to one consistent data format for comparison
    - Use [color tables](https://rasterio.readthedocs.io/en/latest/topics/color.html)?
    - Include [tags](https://rasterio.readthedocs.io/en/latest/topics/tags.html)?
  - Metadata prep
    - Homogenize
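A minimal sketch of the spatial homogenization step, assuming rioxarray's `reproject_match`, which reprojects, matches extents, and resamples in a single call; the file paths are placeholders.

```python
# Spatial homogenization sketch: align the candidate grid to the
# benchmark grid (CRS, transform, and shape) in one call.
from rasterio.enums import Resampling
import rioxarray

candidate = rioxarray.open_rasterio("candidate.tif", chunks=True)
benchmark = rioxarray.open_rasterio("benchmark.tif", chunks=True)

candidate_aligned = candidate.rio.reproject_match(
    benchmark, resampling=Resampling.nearest  # nearest preserves class values
)
assert candidate_aligned.shape == benchmark.shape
```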
- Comparison
  - Comparisons should avoid opening entire files in order to limit memory use.
  - Comparisons should minimize I/O operations.
  - Comparison type (see the two-class sketch after this list)
    - Binary
    - Categorical
      - One vs. one
      - One vs. all
    - Continuous
  - Metrics to use
    - Registered list per comparison type
      - Handle multiple names for the same metric
    - User provided
    - User ignored
  - Data format of comparison
    - Vector or raster
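For the two-class categorical case, the comparison can reduce to a cell-wise agreement map plus contingency counts. A self-contained sketch with toy NumPy arrays standing in for aligned rasters, assuming the encoding 1 = positive and 0 = negative:

```python
# Two-class agreement sketch; toy arrays stand in for aligned rasters.
import numpy as np

candidate = np.array([[1, 1, 0], [0, 1, 0]])
benchmark = np.array([[1, 0, 0], [0, 1, 1]])

# Encode agreement as 2*candidate + benchmark:
# 0 = true negative, 1 = false negative, 2 = false positive, 3 = true positive
agreement = 2 * candidate + benchmark

labels = ["true_negative", "false_negative", "false_positive", "true_positive"]
counts = {name: int((agreement == code).sum()) for code, name in enumerate(labels)}

# Critical success index, a common two-class skill metric for inundation maps.
csi = counts["true_positive"] / (
    counts["true_positive"] + counts["false_positive"] + counts["false_negative"]
)
print(counts, f"CSI = {csi:.2f}")
```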
- Outputs
  - Decide on storage types and methods to serialize (see the sketch after this list)
  - Agreement maps
    - Raster, vector, or both
  - Metric values
    - Contingency tables
    - Dataframes
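One plausible serialization path, not a committed design, writes the agreement map as a GeoTIFF through rioxarray and the metric values as CSV through pandas; file names and values are placeholders.

```python
# Output serialization sketch; file names and values are placeholders.
import numpy as np
import pandas as pd
import rioxarray  # noqa: F401  (registers the .rio accessor on xarray)
import xarray as xr

agreement = xr.DataArray(
    np.array([[3, 2, 0], [0, 3, 1]], dtype="uint8"), dims=("y", "x")
)
agreement = agreement.rio.write_crs("EPSG:4326")
agreement.rio.to_raster("agreement.tif")  # GeoTIFF agreement map

pd.DataFrame([{"metric": "csi", "value": 0.5}]).to_csv("metrics.csv", index=False)
```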
Technology Stacks
- Python
  - Serialization, Numerical Computation, and Scheduling
    - PyData Stack: NumPy, Pandas, xarray, Dask, xarray-spatial
  - Geospatial Components
    - Vector
      - OGR, fiona, shapely, geopandas
      - zarr for collections of vector files
    - Raster
      - GDAL, rasterio, xarray, rioxarray
      - STAC for collections of raster files
Road Map
Checkpoint 1: Minimum Viable Product
- [ ] Easy-to-use, well-documented, component-level functionality for use as an API
- [ ] Accepts GDAL compatible single-band raster formats.
- [ ] Accepts two-class (binary) categorical variables with set encodings.
- [ ] Handles a registry of encodings based on keyword descriptors (e.g., inundation map, land cover data); see the registry sketch after this list.
- [ ] Accepts local files and remote files on S3.
- [ ] Supports a wide array of two-class categorical metrics.
- [ ] Consider parallel (dask, xarray) and non-parallel (numpy, pandas) engines
- [ ] Reads files in chunks to lower memory requirements.
- [ ] Conducts operations in parallel.
- [ ] Uses a consistent set of vocabulary in variables, comments, and documentation
- [ ] metrics, agreement, difference, evaluation, benchmark map, candidate map? Need better names?
- [ ] Clear, concise, and foundational Object Oriented Architecture
- [ ] Organize functions, classes, modules, parameters, etc in a logical directory structure
- [ ] Use consistent styling
- [ ] Use consistent Docstring styling
- [ ] Comply with PEP8 standards
- [ ] Use linter and style checkers
- [ ] Make a documentation website
- [ ] installation, objects, usage examples
- [ ] Include test functionality
- [ ] unit tests
- [ ] integration tests
- [ ] benchmarking capabilities (pytest-benchmark?)
- [ ] include simple test datasets
- [ ] include results for tests on readme/website
- [ ] tox for test automation
- [ ] Dependency management & packaging
- [ ] Environment packaging
- [ ] docker
- [ ] pypi / pip
- [ ] scientific environments such as conda
- [ ] standalone commandline tool with pipx
- [ ] Use a logger
- [ ] Have a clear user interface
- [ ] public functions as API
- [ ] command line tools
- [ ] Have a clear code versioning and tagging system.
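The encoding registry called for above might look like the following; this is a hypothetical sketch, and names such as `TwoClassEncoding`, `ENCODING_REGISTRY`, and `get_encoding` are illustrative rather than part of any released gVal API.

```python
# Hypothetical keyword-based encoding registry; all names are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoClassEncoding:
    """Maps the positive/negative condition to raster cell values."""
    positive: int  # cell value for the positive condition (e.g., wet)
    negative: int  # cell value for the negative condition (e.g., dry)
    nodata: int    # cell value ignored during comparison

# Registered encodings keyed by keyword descriptor.
ENCODING_REGISTRY: dict[str, TwoClassEncoding] = {
    "inundation map": TwoClassEncoding(positive=1, negative=0, nodata=255),
    "landcover data": TwoClassEncoding(positive=1, negative=2, nodata=0),
}

def get_encoding(keyword: str) -> TwoClassEncoding:
    """Look up an encoding by keyword descriptor with a clear error."""
    try:
        return ENCODING_REGISTRY[keyword.lower()]
    except KeyError as err:
        raise KeyError(f"No registered encoding for {keyword!r}") from err

print(get_encoding("inundation map"))
```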
Checkpoint 2: Extending Functionality
- Extend to include continuous data inputs and metrics.
- Support discretization of continuous maps for conversion to categorical (see the sketch after this list).
- Create a survey of metrics.
- Organize in hierarchy.
- Include in tables with descriptions, math formulas, and references.
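One way the discretization item above could work is binning a continuous map into classes; a sketch using NumPy's `digitize` through xarray, with made-up depth thresholds:

```python
# Discretization sketch: bin continuous flood depths into classes.
import numpy as np
import xarray as xr

depth = xr.DataArray(
    np.array([[0.0, 0.2, 1.5], [3.0, 0.0, 0.7]]),
    dims=("y", "x"),
    name="depth_m",
)

# Made-up class edges: dry (< 0.1 m), shallow (0.1-1 m), deep (>= 1 m).
bins = [0.1, 1.0]
classes = xr.apply_ufunc(np.digitize, depth, kwargs={"bins": bins})
print(classes)  # 0 = dry, 1 = shallow, 2 = deep
```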
Checkpoint 3: Scaling to Catalogs of Maps
- Evaluations should be scaled to accept a series of candidates and benchmarks.
- These maps should be accepted as lists of objects, file paths, or catalogs.
- Catalogs should be a data structure designed for this purpose that includes experiment-relevant parameters associated with each map.
- GeoNetwork
- STAC
- Candidate and benchmark maps need to be cataloged with associated metadata values
- space, time, parameters, etc
- Agreement maps and metrics should be able to inherit these metadata
- Consider the metadata problem: STAC, raster tags, database, table?
- When comparing catalogs, the alignment problem needs to be addressed
  - Have functions to test for candidate and benchmark map alignment across the following dimensions (see the sketch after this list):
    - Space (extents and resolutions)
    - Time (extents and resolutions)
    - Modeling parameters (e.g., flow rates)
    - Target variable (e.g., extents, depths, speeds, LULC)
- Computing statistical significance, confidence intervals, etc. of a sampling of metrics.
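A hypothetical sketch of catalog pairing plus a temporal alignment test with pandas; the catalog columns (`map_id`, `role`, `site`, `valid_time`) and the one-day tolerance are invented for illustration.

```python
# Catalog alignment sketch; column names and tolerance are invented.
import pandas as pd

catalog = pd.DataFrame(
    {
        "map_id": ["c1", "c2", "b1", "b2"],
        "role": ["candidate", "candidate", "benchmark", "benchmark"],
        "site": ["reach_a", "reach_b", "reach_a", "reach_b"],
        "valid_time": pd.to_datetime(
            ["2022-09-01", "2022-09-02", "2022-09-01", "2022-09-05"]
        ),
    }
)

# Pair candidates with benchmarks at the same site.
pairs = catalog[catalog.role == "candidate"].merge(
    catalog[catalog.role == "benchmark"], on="site", suffixes=("_cand", "_bench")
)

# Flag pairs whose valid times differ by more than one day.
tolerance = pd.Timedelta(days=1)
pairs["aligned"] = (pairs.valid_time_cand - pairs.valid_time_bench).abs() <= tolerance
print(pairs[["site", "map_id_cand", "map_id_bench", "aligned"]])
```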
Checkpoint 4: Extending Functionality
- Accepts vector files (points, lines, and polygons) for candidate or benchmark maps.
- Handle raster/raster, vector/raster, raster/vector, or vector/vector comparisons?
- Allows metrics to be sorted by geometries with associated parameter combinations for analysis purposes.
- Multi-band raster support?
- Multi-class categorical extension
- Analyze contingency tables with statistics (see the sketch below).
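The multi-class contingency analysis could build on `pandas.crosstab`; a sketch with toy class arrays and overall accuracy as one example statistic:

```python
# Multi-class contingency table sketch; class arrays are toy data.
import numpy as np
import pandas as pd

candidate = np.array([0, 1, 2, 2, 1, 0, 2])
benchmark = np.array([0, 1, 1, 2, 1, 0, 0])

table = pd.crosstab(
    pd.Series(candidate, name="candidate"),
    pd.Series(benchmark, name="benchmark"),
)
print(table)

# Overall accuracy: the share of cells on the diagonal (exact agreement).
accuracy = np.trace(table.to_numpy()) / table.to_numpy().sum()
print(f"accuracy = {accuracy:.2f}")
```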
Contributing
Please see the Contributing file for instructions on how to contribute to this work.
References
Please see the References file for citations to all the references used in this work.
Owner
- Name: Fernando Aristizabal
- Login: fernando-aristizabal
- Kind: user
- Location: Florida, USA
- Company: ERT
- Website: www.linkedin.com/in/fernando-aristizabal
- Repositories: 29
- Profile: https://github.com/fernando-aristizabal
Scientist experimenting with remote sensing, machine learning, partial differential equations, flood inundation mapping, and geospatial sciences.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Aristizabal"
    given-names: "Fernando"
    orcid: "https://orcid.org/0000-0000-0000-0000"
title: "gVal: Geospatial Evaluation Engine"
version: 0.0.0.1
doi:
date-released:
url: "https://github.com/fernando-aristizabal/gVal"
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| fernando-aristizabal | f****a@d****g | 94 |
| Fernando Aristizabal | 1****l | 1 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- python 3.11-rc-bullseye build
- PyYAML ==6.0
- affine ==2.3.1
- attrs ==22.1.0
- certifi ==2022.9.24
- click ==8.1.3
- click-plugins ==1.1.1
- cligj ==0.7.2
- cloudpickle ==2.2.0
- dask ==2022.9.1
- fsspec ==2022.8.2
- iniconfig ==1.1.1
- locket ==1.0.0
- numpy ==1.23.3
- packaging ==21.3
- pandas ==1.5.0
- partd ==1.3.0
- pluggy ==1.0.0
- py ==1.11.0
- py-cpuinfo ==8.0.0
- pyparsing ==3.0.9
- pyproj ==3.4.0
- pytest ==7.1.3
- pytest-benchmark ==3.4.1
- python-dateutil ==2.8.2
- pytz ==2022.2.1
- rasterio ==1.3.2
- rioxarray ==0.12.2
- six ==1.16.0
- snuggs ==1.4.7
- tomli ==2.0.1
- toolz ==0.12.0
- xarray ==2022.6.0
- allure-pytest * test
- pytest * test
- pytest-benchmark * test
- pytest-flakes * test
- pytest-pep8 * test