environment-geodatabase

Repository to automate the building and sharing of GIS data using Docker, OGC standards and PostGIS tools.

https://github.com/gmu-geosciences/environment-geodatabase

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: springer.com, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: GMU-GeoSciences
  • License: gpl-3.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 7.66 MB
Statistics
  • Stars: 1
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Environmental Geodatabase

This git repository deploys a spatial database and an OGC API for storing GPS tracking data and making the following available to users:

  • Raw and aggregated GPS position data
  • Publicly available raster and vector data describing the environment
  • Publicly available human population data
  • Aggregations of the environment data

This repository creates a Postgres+PostGIS+TimescaleDB database using Docker and Docker Compose. The database is configured using files stored in ./config, scripts stored in ./build/dbinitscripts, and initial data stored in ./build/dbinitdata.

Quick Start

To run this project you will need Docker and Docker Compose installed and configured on your machine.

```bash
# Clone this repository to your machine and enter it
git clone https://github.com/GMU-GeoSciences/environment-geodatabase
cd environment-geodatabase

# Copy the sample.env file to the root directory and rename it to .env
cp ./config/sample.env .env

# Edit the file using your text editor of choice
nano .env

# Build the database container using docker. Repeat this whenever you edit the files in ./build
docker compose build

# Run the container in the background
docker compose up -d

# Check the log files to see what's going on
docker compose logs -f
```

Once the database has been created and all the scripts have completed, the GPS data needs to be inserted. This is done either with live data (to-do) or with historical data. Once the GPS data is inserted, the materialized views in the DB must be refreshed.
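The refresh step could be scripted along the following lines. This is a minimal sketch, not the repository's own tooling: the view names are hypothetical placeholders (the README does not list them), and the function only builds the SQL text. Running the statements against the database with a PostgreSQL client is left to the reader.

```python
import re

# Hypothetical view names, for illustration only; substitute the materialized
# views that actually exist in your database.
EXAMPLE_VIEWS = ["public.example_deer_view", "public.example_hex_view"]

# A plain identifier, optionally qualified with a schema name.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def refresh_statements(view_names):
    """Return one REFRESH MATERIALIZED VIEW statement per view name."""
    statements = []
    for name in view_names:
        # Reject anything that is not a simple identifier, because the name
        # is interpolated directly into SQL text.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe view name: {name!r}")
        statements.append(f"REFRESH MATERIALIZED VIEW {name};")
    return statements
```

Each returned string can then be passed to whichever client is in use (for example `psql -c` or a Python database driver) after the GPS insert has finished.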

Methodology

This section covers what this data is being used for, how it is being aggregated and the end goal:

  • Data is downloaded from the internet, or gathered from files and inserted into the Postgres database.
  • Various views are created that describe deer behaviour
  • Various views are created that describe the local environment
  • These views are aggregated over a hexagonal grid and used as labels and input features for the training and testing of a machine learning algorithm
  • This algorithm is used to predict deer likelihood for areas not covered by deer with GPS transmitters.

Input Data

White-tailed deer GPS tracking data is collected and saved as a CSV file. This file is processed and inserted into the database with the "Insert into DB" notebook.
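As an illustration of the parsing that has to happen before insertion, the sketch below reads GPS fixes from CSV text and drops rows with impossible coordinates. The column names (`device_id`, `timestamp`, `latitude`, `longitude`) and the ISO 8601 timestamp format are assumptions made for this example; the actual CSV layout and the notebook's logic are not described in this README.

```python
import csv
import io
from datetime import datetime


def parse_gps_csv(text):
    """Parse CSV text into a list of (device_id, timestamp, lat, lon) tuples."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        lat = float(record["latitude"])
        lon = float(record["longitude"])
        # Skip fixes whose coordinates fall outside the valid lat/lon range.
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue
        timestamp = datetime.fromisoformat(record["timestamp"])
        rows.append((record["device_id"], timestamp, lat, lon))
    return rows
```

The resulting tuples are in a shape that most database drivers accept for a batched insert.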

NLCD raster environmental data is also collected and inserted into the database:

  • 2019 NLCD Canopy Cover: NLCD tree canopy cover geospatial datasets with spatial resolutions of 30 m are produced by the USDA Forest Service. Tree canopy cover is derived from multi-spectral satellite imagery and other available ground and ancillary information.
  • 2019 NLCD Impervious Surfaces: NLCD imperviousness products represent urban impervious surfaces as a percentage of developed surface over every 30-meter pixel in the United States.
  • 2019 NLCD LandCover: The National Land Cover Database (NLCD) provides nationwide data on land cover and land cover change at a 30 m resolution with a 16-class legend based on a modified Anderson Level II classification system.

OpenStreetMap vector data for Maryland is downloaded and inserted into the DB:

  • Downloaded from https://download.geofabrik.de/north-america/us/maryland.html
  • Inserted using https://osm2pgsql.org/

Howard County Census Block data is downloaded and inserted:

  • Downloaded from https://www2.census.gov/geo/tiger/GENZ2023/shp/
  • Inserted using shp2pgsql

High Resolution Human Population data is downloaded, processed and inserted into the DB:

  • Downloaded from https://data.humdata.org/dataset/united-states-high-resolution-population-density-maps-demographic-estimates
  • Processed and inserted using the "PopDensity" notebook.
  • Processing removes all points not within Howard County and joins all the different population types into a single geometry.
  • Data is inserted into the DB using the same notebook.
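The clipping step described above (dropping population points outside Howard County) amounts to a point-in-polygon filter. The sketch below shows one standard way to do that, a planar ray-casting test, under the assumption that the county boundary is available as a single ring of (x, y) vertices. It is not the notebook's actual code, and a real pipeline would more likely rely on a spatial library or on PostGIS itself.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon ring.

    polygon is a list of (x, y) vertices; the ring is closed implicitly.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be
        # crossed by a ray cast from the point towards +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_points(points, polygon):
    """Keep only the (x, y, ...) records whose location is inside polygon."""
    return [p for p in points if point_in_polygon(p[0], p[1], polygon)]
```

Points that fall exactly on the boundary may be classified either way by this test, which is usually acceptable for dense population grids.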

Generated Data

Data is spatially aggregated over a hexagon grid that covers the area of interest. Features are computed for each grid cell so that a model can use the environmental data to predict the likelihood of deer being present within that hexagon. These features are generated from materialized views created in the 550rasterhexsummary.sql and 600api_functions.sql (needs a rename) files.
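To make the idea of aggregating over a hexagon grid concrete, the sketch below assigns planar points to pointy-top hexagon cells using axial coordinates and averages a value per cell. It is an illustration only: the repository does this work in SQL materialized views, and the cell size, coordinate system and function names here are assumptions.

```python
import math
from collections import defaultdict


def _round_axial(q, r):
    """Round fractional axial coordinates to the nearest hexagon cell."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the component with the largest rounding error so that the
    # cube-coordinate constraint q + r + s == 0 still holds.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def hex_cell(x, y, size):
    """Return the axial (q, r) cell containing the planar point (x, y).

    size is the distance from a hexagon's centre to any of its corners.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _round_axial(q, r)


def mean_by_cell(samples, size):
    """Average the value of (x, y, value) samples within each hexagon cell."""
    groups = defaultdict(list)
    for x, y, value in samples:
        groups[hex_cell(x, y, size)].append(value)
    return {cell: sum(values) / len(values) for cell, values in groups.items()}
```

The same grouping pattern extends to the other summary statistics (min, max, standard deviation) by changing the reduction in the final line.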

Published Data

Data is made available as views within the PostGISFTW (for the web) schema. Views here are automatically published as WFS layers using pg_featureserv.

Environmental Features

These are still under development and need to be formalised and added to the start up scripts.

Initial testing has been done using the following variables:

  • Canopy/Impervious Surface Cluster Size: Clusters are created from hexagons that have neighbouring cells with similar feature values. The cells contain the number of cells within the same cluster.
  • Population Total: Night-time population count within grid cell
  • Canopy Mean/Max/Min/Stddev: Statistics derived from NLCD raster clipped to grid cell
  • Impervious Surfaces Mean/Max/Min/Stddev: Statistics derived from NLCD raster clipped to grid cell
  • Built Area: Portion of hex cell covered by OSM defined buildings
  • Asphalt/Highway Length: Length of OSM defined highways or asphalt roads within grid cell
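The cluster-size variable can be illustrated with a connected-components pass over the hexagon grid. In the sketch below, "similar" is taken to mean that two neighbouring cells differ by no more than a tolerance. That definition, the axial (q, r) cell keys and the function name are assumptions made for the example, not the repository's definition.

```python
from collections import deque

# Offsets to the six neighbours of a hexagon in axial coordinates.
NEIGHBOURS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def cluster_sizes(values, tolerance):
    """Map each cell to the number of cells in its cluster.

    values maps axial (q, r) cells to a feature value. Two adjacent cells
    are linked when their values differ by at most tolerance, and a cluster
    is a connected set of linked cells.
    """
    sizes = {}
    for start in values:
        if start in sizes:
            continue
        members = [start]
        seen = {start}
        queue = deque([start])
        # Breadth-first search over linked neighbours.
        while queue:
            q, r = queue.popleft()
            for dq, dr in NEIGHBOURS:
                nxt = (q + dq, r + dr)
                if nxt not in values or nxt in seen:
                    continue
                if abs(values[nxt] - values[(q, r)]) <= tolerance:
                    seen.add(nxt)
                    members.append(nxt)
                    queue.append(nxt)
        for cell in members:
            sizes[cell] = len(members)
    return sizes
```

Because links are formed pairwise, a gradual gradient can chain cells with quite different values into one cluster. Comparing each cell against a fixed class threshold instead is an alternative design.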

More work is needed to determine features that relate to neighbouring cells, such as the FRAGSTATS connectivity metrics or spatial autocorrelation.

Forest Clusters

Deer Probability Classification or Regression

A window function is used over the GPS data to determine the time and distance deltas between consecutive messages from the same device ID. This allows us to determine the speed between GPS messages. When this is plotted onto a map, it becomes clear that deer have locations where they prefer to rest and that they travel between these locations.
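The delta calculation can be sketched outside the database as follows. This Python version mirrors what a SQL window function partitioned by device and ordered by time would produce, but it is an illustration rather than the repository's query. The tuple layout and the use of the haversine great-circle distance are assumptions.

```python
import math
from itertools import groupby

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def deltas(fixes):
    """Compute per-device deltas between consecutive GPS fixes.

    fixes is an iterable of (device_id, timestamp, lat, lon) tuples with
    datetime timestamps. Returns a list of
    (device_id, timestamp, seconds, metres, metres_per_second) tuples, one
    for every fix that has a predecessor on the same device.
    """
    ordered = sorted(fixes, key=lambda f: (f[0], f[1]))
    out = []
    for device_id, group in groupby(ordered, key=lambda f: f[0]):
        previous = None
        for fix in group:
            if previous is not None:
                seconds = (fix[1] - previous[1]).total_seconds()
                metres = haversine_m(previous[2], previous[3], fix[2], fix[3])
                # Duplicate timestamps would divide by zero, so report None.
                speed = metres / seconds if seconds > 0 else None
                out.append((device_id, fix[1], seconds, metres, speed))
            previous = fix
    return out
```

Fixes with a low speed over a sustained period would correspond to the resting locations mentioned above, while high-speed fixes mark travel between them.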

Owner

  • Name: GMU Geography and Geosciences
  • Login: GMU-GeoSciences
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - given-names: Rory
    family-names: Meyer
    affiliation: GMU
    orcid: "https://orcid.org/0000-0002-0440-215X"
  - given-names: Taylor M.
    family-names: Anderson
    affiliation: GMU
    orcid: "https://orcid.org/0000-0003-1145-0608"
title: "Automated Environmental GeoDatabase"
abstract: "Repository to automate the building and sharing of GIS data using Docker, OGC standards and PostGIS tools."
doi: 10.5281/zenodo.13936374
url: "https://github.com/GMU-GeoSciences/environment-geodatabase"
version: v1.0.1
date-released: 2024-10-15

GitHub Events

Total
  • Release event: 2
  • Watch event: 1
  • Push event: 6
  • Create event: 2
Last Year
  • Release event: 2
  • Watch event: 1
  • Push event: 6
  • Create event: 2