environment-geodatabase
Repository to automate the building and sharing of GIS data using Docker, OGC standards and PostGIS tools.
Repository
Basic Info
- Host: GitHub
- Owner: GMU-GeoSciences
- License: gpl-3.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 7.66 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Environmental Geodatabase
Repository to automate the building and sharing of GIS data using Docker, OGC standards and PostGIS tools.
This git repository is aimed at deploying a spatial database and OGC API capable of storing GPS tracking data and making it available to users as:
- Raw and aggregated GPS position data
- Publicly available raster and vector data describing the environment
- Publicly available human population data
- Aggregations of the environment data

This repository creates a Postgres+PostGIS+TimescaleDB database using Docker and Docker Compose. This database is configured using files stored in ./config, scripts stored in ./build/dbinitscripts, and initial data stored in ./build/dbinitdata.
Quick Start
To run this project you will need Docker installed and configured on your machine.
```bash
# Clone this repository to your machine and move into it
git clone https://github.com/GMU-GeoSciences/environment-geodatabase
cd environment-geodatabase

# Copy the sample.env file to the root directory and rename it to .env
cp ./config/sample.env .env

# Edit the file using your text editor of choice
nano .env

# Build the database container using Docker. Repeat this whenever you edit the files in ./build
docker compose build

# Run the container in the background
docker compose up -d

# Check the log files to see what's going on
docker compose logs -f
```
Once the database has been created and all the scripts have completed, the GPS data needs to be inserted. This is done either with live data (to-do) or with historical data. Once the GPS data is inserted, the materialized views in the DB must be refreshed.
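The refresh step can be sketched in Python as below. This is a minimal illustration, not the repository's own tooling: the view names are hypothetical placeholders, and the code only builds the SQL strings. Running them would need a database driver and connection details that are not shown here.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def refresh_statements(views: list[str], concurrently: bool = False) -> list[str]:
    """Build one REFRESH MATERIALIZED VIEW statement per (schema-qualified) view."""
    # CONCURRENTLY avoids blocking readers but requires a unique index on the view.
    keyword = "CONCURRENTLY " if concurrently else ""
    statements = []
    for view in views:
        schema, _, name = view.partition(".")
        if not name:
            # No schema given: assume the default "public" schema.
            schema, name = "public", schema
        statements.append(
            f"REFRESH MATERIALIZED VIEW {keyword}{quote_ident(schema)}.{quote_ident(name)};"
        )
    return statements


# Hypothetical view names; list base views before the views that depend on them.
VIEWS = ["public.deer_speed", "postgisftw.hex_summary"]

for sql in refresh_statements(VIEWS):
    print(sql)
```

Views that read from other materialized views should be refreshed after the views they depend on, which is why the list order matters.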
Methodology
This section covers what this data is being used for, how it is being aggregated and the end goal:
- Data is downloaded from the internet, or gathered from files and inserted into the Postgres database.
- Various views are created that describe deer behaviour
- Various views are created that describe the local environment
- These views are aggregated over a hexagonal grid and used as labels and input features for the training and testing of a machine learning algorithm
- This algorithm is used to predict the likelihood of deer presence in areas not covered by deer with GPS transmitters.
Input Data
White-tailed deer GPS tracking data is being collected and saved as a CSV file. This file is processed and inserted into the database with the "Insert into DB" notebook.
NLCD Raster environmental data is also collected and inserted into the database:
- 2019 NLCD Canopy Cover: NLCD tree canopy cover geospatial datasets with spatial resolutions of 30 m are produced by the USDA Forest Service. Tree canopy cover is derived from multi-spectral satellite imagery and other available ground and ancillary information.
- 2019 NLCD Impervious Surfaces: NLCD imperviousness products represent urban impervious surfaces as a percentage of developed surface over every 30-meter pixel in the United States.
- 2019 NLCD LandCover: The National Land Cover Database (NLCD) provides nationwide data on land cover and land cover change at a 30m resolution with a 16-class legend based on a modified Anderson Level II classification system.

OpenStreetMap vector data for Maryland is downloaded and inserted into the DB:
- Downloaded from https://download.geofabrik.de/north-america/us/maryland.html
- Inserted using https://osm2pgsql.org/

Howard County Census Block data is downloaded and inserted:
- Downloaded from https://www2.census.gov/geo/tiger/GENZ2023/shp/
- Inserted using shp2pgsql

High Resolution Human Population data is downloaded, processed and inserted into DB:
- Downloaded from https://data.humdata.org/dataset/united-states-high-resolution-population-density-maps-demographic-estimates
- Processed and inserted using "PopDensity" notebook.
- Processing removes all points not within Howard County and joins all the different population types into a single geometry.
- Data is inserted into DB using same notebook
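The population processing described above (drop points outside the county, then join the population types onto one geometry) can be sketched roughly as follows. This is not the "PopDensity" notebook's code: the row layout, the population type labels and the bounding box are assumptions for illustration, and a bounding box is a simplification of a real county boundary test.

```python
from collections import defaultdict


def merge_population(rows, bbox):
    """Keep points inside bbox and merge all population types per point.

    rows: iterable of (lon, lat, pop_type, count) tuples (hypothetical layout).
    bbox: (min_lon, min_lat, max_lon, max_lat).
    Returns {(lon, lat): {pop_type: count, ...}}.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    merged = defaultdict(dict)
    for lon, lat, pop_type, count in rows:
        # Simplification: a bounding-box test stands in for a point-in-polygon test.
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            merged[(lon, lat)][pop_type] = count
    return dict(merged)


# Illustrative values only: a rough box and made-up counts.
ROUGH_BBOX = (-77.2, 39.1, -76.7, 39.4)
SAMPLE_ROWS = [
    (-76.9, 39.2, "men", 3.0),
    (-76.9, 39.2, "women", 4.0),
    (-75.0, 38.0, "men", 9.0),  # outside the box, dropped
]

print(merge_population(SAMPLE_ROWS, ROUGH_BBOX))
```

In the database the same idea would more likely be expressed as a spatial join against the county polygon.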
Generated Data
Data is spatially aggregated over a hexagon grid that covers the area of interest. Features are created for the grid and allow the creation of a model that uses the environmental data to predict the likelihood of deer being present within the hexagon. These features are generated from materialized views created in the 550rasterhexsummary.sql and 600api_functions.sql (needs a rename) files.
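To illustrate what aggregating over a hexagon grid means, the sketch below assigns points to pointy-top hexagons using axial coordinates and counts the points per cell. It is a stand-alone illustration and not the SQL in the files named above. It assumes the coordinates are already in a projected, metre-based system, and the function names are hypothetical.

```python
import math
from collections import Counter


def cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the component with the largest rounding error so q + r + s == 0.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def point_to_hex(x, y, size):
    """Axial (q, r) of the pointy-top hexagon with circumradius `size` containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return cube_round(q, r)


def count_per_hex(points, size):
    """Count points per hexagon; returns a Counter keyed by (q, r)."""
    return Counter(point_to_hex(x, y, size) for x, y in points)


points = [(0.1, 0.1), (-0.1, 0.2), (math.sqrt(3), 0.0)]
print(count_per_hex(points, size=1.0))
```

The same keying by cell works for other aggregates (mean, max, sum) by grouping values on the `(q, r)` key instead of counting.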
Published Data
Data is made available as views within the PostGISFTW (for the web) schema. Views here are automatically published as WFS layers using pg_featureserv.
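A client could request features from a published view with a URL like the one built below. This is a hedged sketch: pg_featureserv follows the OGC API Features pattern of `/collections/{name}/items` with `limit` and `bbox` query parameters, but the base URL, port and collection name used here are assumptions and should be replaced with the values from your deployment.

```python
from urllib.parse import quote, urlencode


def items_url(base_url, collection, limit=100, bbox=None):
    """Build an OGC API Features 'items' URL for one collection.

    bbox, if given, is (min_lon, min_lat, max_lon, max_lat).
    """
    params = {"limit": limit}
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    # safe="," keeps the bbox commas readable instead of percent-encoding them.
    query = urlencode(params, safe=",")
    return f"{base_url.rstrip('/')}/collections/{quote(collection)}/items?{query}"


# Hypothetical host, port and view name.
url = items_url(
    "http://localhost:9000/",
    "postgisftw.hex_summary",
    limit=10,
    bbox=(-77.2, 39.1, -76.7, 39.4),
)
print(url)
```

The response is GeoJSON, so it can be loaded directly into most GIS clients or fetched with any HTTP library.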


Environmental Features
These are still under development and need to be formalised and added to the startup scripts.

Initial testing has been done using the following variables:
- Canopy/Impervious Surface Cluster Size: Clusters are created from hexagons that have neighbouring cells with similar feature values. The cells contain the number of cells within the same cluster.
- Population Total: Night Time Population count within grid cell
- Canopy Mean/Max/Min/Stddev: Statistics derived from NLCD raster clipped to grid cell
- Impervious Surfaces Mean/Max/Min/Stddev: Statistics derived from NLCD raster clipped to grid cell
- Built Area: Portion of hex cell covered by OSM defined buildings
- Asphalt/Highway Length: Length of OSM defined highways or asphalt roads within grid cell
More work needs to be done to determine features that are related to neighbouring cells, for example something like the FRAGSTATS connectivity metrics or spatial autocorrelation.
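The cluster size variable in the list above can be sketched as a connected-components search over the hexagon grid. The README does not define "similar", so this example assumes two neighbouring cells are similar when their values differ by at most a tolerance. The function name, the axial cell keys and the sample values are all hypothetical.

```python
from collections import deque

# Offsets to the six neighbours of a hexagon in axial (q, r) coordinates.
AXIAL_NEIGHBOURS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def cluster_sizes(values, tolerance):
    """Map each cell to the size of its cluster.

    values: {(q, r): feature_value}.
    Neighbouring cells join the same cluster when their values differ by
    at most `tolerance` (an assumed definition of "similar").
    """
    sizes = {}
    for start in values:
        if start in sizes:
            continue  # already assigned to a cluster
        cluster, seen, queue = [start], {start}, deque([start])
        while queue:
            q, r = queue.popleft()
            for dq, dr in AXIAL_NEIGHBOURS:
                nb = (q + dq, r + dr)
                if (
                    nb in values
                    and nb not in seen
                    and abs(values[nb] - values[(q, r)]) <= tolerance
                ):
                    seen.add(nb)
                    cluster.append(nb)
                    queue.append(nb)
        for cell in cluster:
            sizes[cell] = len(cluster)
    return sizes


# Made-up canopy percentages per cell.
canopy = {(0, 0): 90, (1, 0): 85, (0, 1): 88, (2, 0): 20, (5, 5): 10}
print(cluster_sizes(canopy, tolerance=10))
```

Because similarity is checked between adjacent cells only, a long chain of gradually changing values ends up in one cluster; comparing against the cluster mean instead would be a stricter alternative.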

Deer Probability Classification or Regression
A window function is used over the GPS data to determine the time and distance deltas between consecutive messages from the same device ID. This allows us to determine the speed between GPS messages. When this is plotted on a map it becomes clear that deer have locations where they prefer to rest, and that they travel between these locations.
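The delta calculation above corresponds to a SQL `LAG()` over rows partitioned by device and ordered by time. The Python sketch below mirrors that logic outside the database. The tuple layout and function names are assumptions, and distance is computed with the haversine formula on a spherical Earth rather than with PostGIS functions.

```python
import math
from datetime import datetime
from itertools import groupby
from operator import itemgetter

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds(fixes):
    """Yield (device_id, timestamp, seconds, metres, metres_per_second).

    fixes: iterable of (device_id, timestamp, lat, lon) tuples (hypothetical layout).
    Each output row compares a fix with the previous fix from the same device.
    """
    ordered = sorted(fixes, key=itemgetter(0, 1))  # PARTITION BY device, ORDER BY time
    for device_id, group in groupby(ordered, key=itemgetter(0)):
        prev = None
        for _, ts, lat, lon in group:
            if prev is not None:
                dt = (ts - prev[0]).total_seconds()
                dist = haversine_m(prev[1], prev[2], lat, lon)
                if dt > 0:  # skip duplicate timestamps to avoid division by zero
                    yield device_id, ts, dt, dist, dist / dt
            prev = (ts, lat, lon)


# Made-up fixes for two hypothetical devices.
fixes = [
    ("deer-1", datetime(2024, 1, 1, 0, 0), 39.20, -76.90),
    ("deer-1", datetime(2024, 1, 1, 1, 0), 39.20, -76.90),
    ("deer-2", datetime(2024, 1, 1, 0, 0), 39.25, -76.85),
]
for row in speeds(fixes):
    print(row)
```

Low speeds sustained over several consecutive fixes are one simple way to flag candidate resting locations, while high-speed segments indicate travel between them.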
Sub-Directory READMEs
Owner
- Name: GMU Geography and Geosciences
- Login: GMU-GeoSciences
- Kind: organization
- Repositories: 1
- Profile: https://github.com/GMU-GeoSciences
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - given-names: Rory
    family-names: Meyer
    affiliation: GMU
    orcid: "https://orcid.org/0000-0002-0440-215X"
  - given-names: Taylor M.
    family-names: Anderson
    affiliation: GMU
    orcid: "https://orcid.org/0000-0003-1145-0608"
title: "Automated Environmental GeoDatabase"
abstract: "Repository to automate the building and sharing of GIS data using Docker, OGC standards and PostGIS tools."
doi: 10.5281/zenodo.13936374
url: "https://github.com/GMU-GeoSciences/environment-geodatabase"
version: v1.0.1
date-released: 2024-10-15
GitHub Events
Total
- Release event: 2
- Watch event: 1
- Push event: 6
- Create event: 2
Last Year
- Release event: 2
- Watch event: 1
- Push event: 6
- Create event: 2