srai

Spatial Representations for Artificial Intelligence - a Python library toolkit for geospatial machine learning focused on creating embeddings for downstream tasks

https://github.com/kraina-ai/srai

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 11 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, acm.org
  • Committers with academic emails
    3 of 14 committers (21.4%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.4%) to scientific vocabulary

Keywords

artificial-intelligence data-science geo geospatial machine-learning python spatial spatial-analysis srai

Keywords from Contributors

energy-system-model
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 309
  • Watchers: 12
  • Forks: 27
  • Open Issues: 103
  • Releases: 37
Created over 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Spatial Representations for Artificial Intelligence

⚠️🚧 This library is under HEAVY development. Expect breaking changes between minor versions 🚧⚠️

💬 Feel free to open an issue if you find anything confusing or not working 💬

The Spatial Representations for Artificial Intelligence (srai) project aims to provide simple and efficient solutions to geospatial problems that are accessible to everybody and reusable in various contexts where geospatial data can be used. It is a Python module integrating many geo-related algorithms in a single package with a unified API. Please see getting started for installation and quick start instructions.

Use cases

In the current state, srai provides the following functionalities:

  • OSM data download - downloading OpenStreetMap data for a given area using different sources
  • OSM data processing - processing OSM data to extract useful information (e.g. road network, buildings, POIs, etc.)
  • GTFS processing - extracting features from GTFS data
  • Regionalization - splitting a given area into smaller regions using different algorithms (e.g. Uber's H3, Voronoi, etc.)
  • Embedding - embedding regions into a vector space based on different spatial features, and using different algorithms (e.g. hex2vec[1], etc.)
  • Utilities for spatial data visualization and processing

For future releases, we plan to add more functionalities, such as:

  • Pre-computed embeddings - pre-computed embeddings for different regions and different embedding algorithms
  • Full pipelines - full pipelines for different embedding approaches, pre-configured from srai components
  • Image data download and processing - downloading and processing image data (e.g. OSM tiles, etc.)

End-to-end examples

Right now, srai provides a toolset for data download and processing sufficient to solve downstream tasks. Please see this project by @RaczeQ, which predicts Bike Sharing System (BSS) stations' locations for a wide range of cities worldwide.

For srai integration into a full kedro pipeline, see this project by @Calychas.

Installation

To install srai simply run:

```bash
pip install srai
```

This will install the srai package and dependencies required by most of the use cases. There are several optional dependencies that can be installed to enable additional functionality. These are listed in the optional dependencies section.

Optional dependencies

The following optional dependencies can be installed to enable additional functionality:

  • srai[all] - all optional dependencies
  • srai[osm] - dependencies required to download OpenStreetMap data
  • srai[voronoi] - dependencies to use Voronoi-based regionalization method
  • srai[gtfs] - dependencies to process GTFS data
  • srai[plotting] - dependencies to plot graphs and maps
  • srai[torch] - dependencies to use torch-based embedders
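
To see which extras are usable in the current environment, one option is to probe for the modules they bring in. The sketch below is illustrative only: the mapping from extras to module names (`osmnx`, `folium`, `torch`) is an assumption about typical dependencies, not something read from srai's packaging metadata, and `is_installed` and `available_extras` are hypothetical helpers.

```python
from importlib.util import find_spec

# Assumed mapping from an srai extra to one top-level module it is expected to install.
ASSUMED_EXTRA_MODULES = {
    "osm": "osmnx",
    "plotting": "folium",
    "torch": "torch",
}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return find_spec(module_name) is not None


def available_extras(extra_modules: dict[str, str]) -> dict[str, bool]:
    """Report, per extra, whether its probe module is importable."""
    return {extra: is_installed(module) for extra, module in extra_modules.items()}


if __name__ == "__main__":
    for extra, ok in available_extras(ASSUMED_EXTRA_MODULES).items():
        print(f"srai[{extra}]: {'available' if ok else 'missing'}")
```

A missing entry suggests installing the matching extra, for example `srai[osm]`.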

Tutorial

For a full tutorial on srai and geospatial data in general, visit the srai-tutorial repository. It contains easy-to-follow Jupyter notebooks covering every part of the library. Additionally, a recording of this material from the EuroSciPy 2023 conference is available.

Usage

If you prefer an interactive notebook, examples of srai usage are available in this Colab Notebook.

Downloading OSM data

To download OSM data for a given area using a set of tags, use one of the OSMLoader classes:

  • OSMOnlineLoader - downloads data from the OpenStreetMap API using osmnx - this is faster for smaller areas or tag counts
  • OSMPbfLoader - loads data from an automatically downloaded PBF file from protomaps - this is faster for larger areas or tag counts

Example with OSMOnlineLoader:

```python
from srai.loaders import OSMOnlineLoader
from srai.plotting import plot_regions
from srai.regionalizers import geocode_to_region_gdf

query = {"leisure": "park"}
area = geocode_to_region_gdf("Wrocław, Poland")
loader = OSMOnlineLoader()

parks_gdf = loader.load(area, query)
folium_map = plot_regions(area, colormap=["rgba(0,0,0,0)"], tiles_style="CartoDB positron")
parks_gdf.explore(m=folium_map, color="forestgreen")
```
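
The query in the example is a tag filter: a dictionary mapping OSM keys to the values of interest. As a rough illustration of how such a filter selects features, the sketch below matches plain tag dictionaries against a filter. It assumes the osmnx-style convention in which a filter value may be a single string, a list of strings, or `True` for "any value". `matches_filter` is a hypothetical helper, not part of srai, and the sample features are made up.

```python
from typing import Union

TagValue = Union[str, list[str], bool]


def matches_filter(tags: dict[str, str], tags_filter: dict[str, TagValue]) -> bool:
    """Return True if at least one key in the filter matches the feature's tags."""
    for key, wanted in tags_filter.items():
        if key not in tags:
            continue
        if wanted is True:  # any value of this key is accepted
            return True
        if isinstance(wanted, str) and tags[key] == wanted:
            return True
        if isinstance(wanted, list) and tags[key] in wanted:
            return True
    return False


# Made-up sample features, each reduced to its OSM tags.
sample_features = [
    {"leisure": "park", "name": "Example Park"},
    {"leisure": "playground"},
    {"amenity": "cafe"},
]
parks = [tags for tags in sample_features if matches_filter(tags, {"leisure": "park"})]
```

Here `parks` keeps only the first sample feature, since it is the only one tagged `leisure=park`.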

Downloading road network

Road network downloading is a special case of OSM data downloading. To download the road network for a given area, use the OSMWayLoader class:

```python
from srai.loaders import OSMNetworkType, OSMWayLoader
from srai.plotting import plot_regions
from srai.regionalizers import geocode_to_region_gdf

area = geocode_to_region_gdf("Utrecht, Netherlands")
loader = OSMWayLoader(OSMNetworkType.BIKE)

nodes, edges = loader.load(area)

folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
edges[["geometry"]].explore(m=folium_map, color="seagreen")
```

Downloading GTFS data

To extract features from a GTFS feed, use GTFSLoader. It extracts the trip count and available directions for each stop in 1-hour time windows.

```python
from pathlib import Path

from srai.loaders import GTFSLoader, download_file
from srai.plotting import plot_regions
from srai.regionalizers import geocode_to_region_gdf

area = geocode_to_region_gdf("Vienna, Austria")
gtfs_file = Path("vienna_gtfs.zip")
download_file("https://transitfeeds.com/p/stadt-wien/888/latest/download", gtfs_file.as_posix())
loader = GTFSLoader()

features = loader.load(gtfs_file)

folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
features[["trips_at_8", "geometry"]].explore("trips_at_8", m=folium_map)
```
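
To make the idea of a trip count per stop in 1-hour windows concrete, here is a minimal plain-Python sketch of that aggregation. It is not GTFSLoader's implementation. The input rows mimic the `stop_id` and `departure_time` fields of a GTFS `stop_times.txt` file, the `trips_at_<hour>` naming follows the `trips_at_8` column used in the example, and wrapping hours past 23 back into 0-23 is an assumption of this sketch. The sample rows are made up.

```python
from collections import Counter, defaultdict


def hourly_trip_counts(stop_times: list[dict[str, str]]) -> dict[str, dict[str, int]]:
    """Count departures per stop, bucketed by the hour of departure."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for row in stop_times:
        # GTFS times are "HH:MM:SS"; hours may exceed 23 for trips that run past midnight.
        hour = int(row["departure_time"].split(":")[0]) % 24
        counts[row["stop_id"]][f"trips_at_{hour}"] += 1
    return {stop: dict(per_hour) for stop, per_hour in counts.items()}


# Made-up stop_times rows.
sample_stop_times = [
    {"stop_id": "A", "departure_time": "08:05:00"},
    {"stop_id": "A", "departure_time": "08:45:00"},
    {"stop_id": "A", "departure_time": "09:10:00"},
    {"stop_id": "B", "departure_time": "25:30:00"},
]
trip_features = hourly_trip_counts(sample_stop_times)
```

Stop `A` ends up with two departures in the 8 o'clock window and one in the 9 o'clock window, while the `25:30:00` departure at stop `B` is counted in the 1 o'clock window.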

Regionalization

Regionalization is a process of dividing a given area into smaller regions. This can be done in a variety of ways:

  • H3Regionalizer - regionalization using Uber's H3 library
  • S2Regionalizer - regionalization using Google's S2 library
  • VoronoiRegionalizer - regionalization using Voronoi diagram
  • AdministrativeBoundaryRegionalizer - regionalization using administrative boundaries

Example:

```python
from srai.plotting import plot_regions
from srai.regionalizers import H3Regionalizer, geocode_to_region_gdf

area = geocode_to_region_gdf("Berlin, Germany")
regionalizer = H3Regionalizer(resolution=7)

regions = regionalizer.transform(area)

folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
plot_regions(regions_gdf=regions, map=folium_map)
```

Embedding

Embedding is a process of mapping regions into a vector space. This can be done in a variety of ways:

  • Hex2VecEmbedder - embedding using hex2vec[1] algorithm
  • GTFS2VecEmbedder - embedding using GTFS2Vec[2] algorithm
  • CountEmbedder - embedding based on feature counts
  • ContextualCountEmbedder - embedding based on feature counts with neighbourhood context (proposed in [3])
  • Highway2VecEmbedder - embedding using Highway2Vec[4] algorithm

All of those methods share the same API. All of them require results from Loader (load features), Regionalizer (split area into regions) and Joiner (join features to regions) to work. An example using CountEmbedder:

```python
from srai.embedders import CountEmbedder
from srai.joiners import IntersectionJoiner
from srai.loaders import OSMOnlineLoader
from srai.plotting import plot_regions, plot_numeric_data
from srai.regionalizers import H3Regionalizer, geocode_to_region_gdf

loader = OSMOnlineLoader()
regionalizer = H3Regionalizer(resolution=9)
joiner = IntersectionJoiner()

query = {"amenity": "bicycle_parking"}
area = geocode_to_region_gdf("Malmö, Sweden")
features = loader.load(area, query)
regions = regionalizer.transform(area)
joint = joiner.transform(regions, features)

embedder = CountEmbedder()
embeddings = embedder.transform(regions, features, joint)

folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
plot_numeric_data(regions, "amenity_bicycle_parking", embeddings, map=folium_map)
```
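
Conceptually, a count embedding is a per-region tally of feature tags. The sketch below reproduces that idea without srai, assuming the joint table is a list of `(region_id, feature_id)` pairs and each feature carries a single `key_value` label such as `amenity_bicycle_parking`. The function name, identifiers and data are all illustrative; CountEmbedder itself works on GeoDataFrames.

```python
from collections import Counter


def count_embeddings(
    region_ids: list[str],
    feature_labels: dict[str, str],
    joint: list[tuple[str, str]],
) -> dict[str, dict[str, int]]:
    """Build one count vector per region, with one column per feature label."""
    columns = sorted(set(feature_labels.values()))
    tallies = {region: Counter() for region in region_ids}
    for region_id, feature_id in joint:
        tallies[region_id][feature_labels[feature_id]] += 1
    # Regions without any joined feature still get a row of zeros.
    return {
        region: {column: tallies[region][column] for column in columns}
        for region in region_ids
    }


# Made-up regions, features and region-feature pairs.
sample_regions = ["r1", "r2", "r3"]
sample_labels = {
    "f1": "amenity_bicycle_parking",
    "f2": "amenity_bicycle_parking",
    "f3": "leisure_park",
}
sample_joint = [("r1", "f1"), ("r1", "f2"), ("r2", "f3")]
sample_embeddings = count_embeddings(sample_regions, sample_labels, sample_joint)
```

Every region gets the same columns, so the result can be read as a fixed-width vector per region. Region `r3`, which has no features, is all zeros.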

CountEmbedder is a simple method that does not require fitting. Other methods, such as Hex2VecEmbedder or GTFS2VecEmbedder, require fitting and can be used in a similar way to scikit-learn estimators:

```python
from srai.embedders import Hex2VecEmbedder
from srai.joiners import IntersectionJoiner
from srai.loaders import OSMPbfLoader
from srai.loaders.osm_loaders.filters import HEX2VEC_FILTER
from srai.neighbourhoods.h3_neighbourhood import H3Neighbourhood
from srai.plotting import plot_regions, plot_numeric_data
from srai.regionalizers import H3Regionalizer, geocode_to_region_gdf

loader = OSMPbfLoader()
regionalizer = H3Regionalizer(resolution=11)
joiner = IntersectionJoiner()

area = geocode_to_region_gdf("City of London")
features = loader.load(area, HEX2VEC_FILTER)
regions = regionalizer.transform(area)
joint = joiner.transform(regions, features)

neighbourhood = H3Neighbourhood(regions_gdf=regions)
embedder = Hex2VecEmbedder([15, 10, 3])

# Option 1: fit and transform
# embedder.fit(regions, features, joint, neighbourhood, batch_size=128)
# embeddings = embedder.transform(regions, features, joint)

# Option 2: fit_transform
embeddings = embedder.fit_transform(regions, features, joint, neighbourhood, batch_size=128)

folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
plot_numeric_data(regions, 0, embeddings, map=folium_map)
```
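
The fit/transform split follows the scikit-learn estimator convention: `fit` learns parameters from data, `transform` applies them, and `fit_transform` does both in one call. As a toy illustration of that contract only (it has nothing to do with how Hex2Vec learns), the hypothetical `MeanCenterer` below learns column means and subtracts them.

```python
from typing import Optional


class MeanCenterer:
    """Toy estimator: learns per-column means in fit, subtracts them in transform."""

    def __init__(self) -> None:
        self.means: Optional[list[float]] = None

    def fit(self, rows: list[list[float]]) -> "MeanCenterer":
        """Learn the mean of every column and return self to allow chaining."""
        n_rows = len(rows)
        self.means = [sum(column) / n_rows for column in zip(*rows)]
        return self

    def transform(self, rows: list[list[float]]) -> list[list[float]]:
        """Subtract the learned means; fails if fit has not been called."""
        if self.means is None:
            raise RuntimeError("Call fit before transform.")
        return [[value - mean for value, mean in zip(row, self.means)] for row in rows]

    def fit_transform(self, rows: list[list[float]]) -> list[list[float]]:
        """Convenience wrapper: fit on the rows, then transform the same rows."""
        return self.fit(rows).transform(rows)


data = [[1.0, 10.0], [3.0, 30.0]]
centered = MeanCenterer().fit_transform(data)
```

Keeping `fit` and `transform` separate is what lets a fitted object be applied to data it was not trained on.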

Pre-trained models usage

We provide pre-trained models for some of the embedding methods. To use them, download them from here and load them using the load method:

```python
from srai.embedders import Hex2VecEmbedder

model_path = "path/to/model"
embedder = Hex2VecEmbedder.load(model_path)
```

Plotting, utilities and more

We also provide utilities for different spatial operations and plotting functions adapted to the data formats used in srai. For a full list of available methods, please refer to the documentation.

Contributing

If you are willing to contribute to srai, feel free to do so! Visit our contributing guide for more details.

Publications

Some of the methods implemented in srai have been published in scientific journals and conferences.

  1. Szymon Woźniak and Piotr Szymański. 2021. Hex2vec: Context-Aware Embedding H3 Hexagons with OpenStreetMap Tags. In Proceedings of the 4th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery (GEOAI '21). Association for Computing Machinery, New York, NY, USA, 61–71. paper, arXiv
  2. Piotr Gramacki, Szymon Woźniak, and Piotr Szymański. 2021. Gtfs2vec: Learning GTFS Embeddings for comparing Public Transport Offer in Microregions. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Searching and Mining Large Collections of Geospatial Data (GeoSearch'21). Association for Computing Machinery, New York, NY, USA, 5–12. paper, arXiv
  3. Kamil Raczycki and Piotr Szymański. 2021. Transfer learning approach to bicycle-sharing systems' station location planning using OpenStreetMap data. In Proceedings of the 4th ACM SIGSPATIAL International Workshop on Advances in Resilient and Intelligent Cities (ARIC '21). Association for Computing Machinery, New York, NY, USA, 1–12. paper, arXiv
  4. Kacper Leśniara and Piotr Szymański. 2022. Highway2vec: representing OpenStreetMap microregions with respect to their road network characteristics. In Proceedings of the 5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery (GeoAI '22). Association for Computing Machinery, New York, NY, USA, 18–29. paper, arXiv
  5. Daniele Donghi and Anne Morvan. 2023. GeoVeX: Geospatial Vectors with Hexagonal Convolutional Autoencoders. In Proceedings of the 6th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery (GeoAI '23). Association for Computing Machinery, New York, NY, USA, 3–13. paper

Acknowledgements

We would like to thank Piotr Szymański PhD (@niedakh) for his invaluable guidance and support in the development of this library. His expertise and mentorship have been instrumental in shaping the library's design and functionality, and we are very grateful for his input.

Citation

If you wish to cite the SRAI library, please use our paper:

```bibtex
@inproceedings{Gramacki_SRAI_Towards_Standardization_2023,
    author = {Gramacki, Piotr and Leśniara, Kacper and Raczycki, Kamil and Woźniak, Szymon and Przymus, Marcin and Szymański, Piotr},
    booktitle = {Proceedings of the 6th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery},
    month = nov,
    publisher = {Association for Computing Machinery},
    title = {{SRAI: Towards Standardization of Geospatial AI}},
    url = {https://dl.acm.org/doi/10.1145/3615886.3627740},
    year = {2023}
}
```

License

This library is licensed under the Apache License 2.0.

The free OpenStreetMap data, which is used for the development of SRAI, is licensed under the Open Data Commons Open Database License (ODbL) by the OpenStreetMap Foundation (OSMF).

Owner

  • Name: kraina-ai
  • Login: kraina-ai
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'SRAI: Spatial Representations for Artificial Intelligence'
message: 'If you use this software, please cite it as below.'
type: software
authors:
  - family-names: Gramacki
    given-names: Piotr
    orcid: 'https://orcid.org/0000-0002-4587-5586'
  - family-names: Leśniara
    given-names: Kacper
    orcid: 'https://orcid.org/0000-0003-0875-4301'
  - family-names: Raczycki
    given-names: Kamil
    orcid: 'https://orcid.org/0000-0002-3715-4869'
  - family-names: Woźniak
    given-names: Szymon
    orcid: 'https://orcid.org/0000-0002-2047-1649'
identifiers:
  - type: doi
    value: 10.1145/3615886.3627740
repository-code: 'https://github.com/kraina-ai/srai/'
url: 'https://kraina-ai.github.io/srai/'
keywords:
  - python
  - data-science
  - machine-learning
  - artificial-intelligence
  - geo
  - spatial
  - geospatial
  - spatial-analysis
license: Apache-2.0
version: 0.9.7
date-released: '2022-11-23'
preferred-citation:
  type: conference-paper
  title: 'SRAI: Towards Standardization of Geospatial AI'
  authors:
    - family-names: Gramacki
      given-names: Piotr
      orcid: 'https://orcid.org/0000-0002-4587-5586'
    - family-names: Leśniara
      given-names: Kacper
      orcid: 'https://orcid.org/0000-0003-0875-4301'
    - family-names: Raczycki
      given-names: Kamil
      orcid: 'https://orcid.org/0000-0002-3715-4869'
    - family-names: Woźniak
      given-names: Szymon
      orcid: 'https://orcid.org/0000-0002-2047-1649'
    - family-names: Przymus
      given-names: Marcin
      orcid: 'https://orcid.org/0009-0004-7741-8541'
    - family-names: Szymański
      given-names: Piotr
      orcid: 'https://orcid.org/0000-0002-7733-3239'
  collection-title: 'Proceedings of the 6th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery'
  collection-type: proceedings
  month: 11
  year: 2023
  publisher:
    name: 'Association for Computing Machinery'
  url: 'https://dl.acm.org/doi/10.1145/3615886.3627740'
  identifiers:
    - type: other
      value: 'arXiv:2310.13098'
      description: 'The arXiv preprint of the paper'

GitHub Events

Total
  • Create event: 53
  • Release event: 13
  • Issues event: 43
  • Watch event: 78
  • Delete event: 40
  • Issue comment event: 74
  • Push event: 228
  • Pull request review event: 13
  • Pull request review comment event: 9
  • Pull request event: 89
  • Fork event: 10
Last Year
  • Create event: 53
  • Release event: 13
  • Issues event: 43
  • Watch event: 78
  • Delete event: 40
  • Issue comment event: 74
  • Push event: 228
  • Pull request review event: 13
  • Pull request review comment event: 9
  • Pull request event: 89
  • Fork event: 10

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 311
  • Total Committers: 14
  • Avg Commits per committer: 22.214
  • Development Distribution Score (DDS): 0.379
Past Year
  • Commits: 74
  • Committers: 6
  • Avg Commits per committer: 12.333
  • Development Distribution Score (DDS): 0.27
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Kamil Raczycki | r****l@g****m | 193 |
| Kacper Leśniara | k****a@g****m | 35 |
| Piotr Gramacki | 3****i | 29 |
| Kraina CI/CD | 1****d | 22 |
| Szymon Woźniak | 3****r | 17 |
| pre-commit-ci[bot] | 6****] | 6 |
| Kacper Leśniara | k****a@g****m | 2 |
| Enzo Bonnal | b****v@g****m | 1 |
| Filip Czaplicki | g****b@s****m | 1 |
| Max Schrader | m****r@c****u | 1 |
| Mohamed Amine Bouzaghrane | a****e@b****u | 1 |
| Shoaib Burq | s****q@g****m | 1 |
| Zack Aemmer | 5****r | 1 |
| mprzymus | 2****9@s****l | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 115
  • Total pull requests: 167
  • Average time to close issues: 4 months
  • Average time to close pull requests: 12 days
  • Total issue authors: 14
  • Total pull request authors: 15
  • Average comments per issue: 0.52
  • Average comments per pull request: 1.05
  • Merged pull requests: 128
  • Bot issues: 0
  • Bot pull requests: 5
Past Year
  • Issues: 24
  • Pull requests: 85
  • Average time to close issues: 2 days
  • Average time to close pull requests: 4 days
  • Issue authors: 5
  • Pull request authors: 9
  • Average comments per issue: 0.42
  • Average comments per pull request: 0.96
  • Merged pull requests: 57
  • Bot issues: 0
  • Bot pull requests: 2
Top Authors
Issue Authors
  • RaczeQ (60)
  • Calychas (36)
  • piotrgramacki (3)
  • simonusher (3)
  • shichao-ma (1)
  • wxl112 (1)
  • reza-soltani (1)
  • ca9071jp2 (1)
  • vhvictorhugo (1)
  • adesso-dominik-chodounsky (1)
  • pmallas (1)
  • raphael10-collab (1)
  • jsfinesse (1)
  • mschrader15 (1)
Pull Request Authors
  • RaczeQ (95)
  • kraina-cicd (51)
  • Calychas (8)
  • mskaa3 (8)
  • simonusher (8)
  • pre-commit-ci[bot] (7)
  • Repcak2000 (4)
  • piotrgramacki (3)
  • mschrader15 (2)
  • bouzaghrane (2)
  • ebonnal (2)
  • Oceankok (2)
  • zackAemmer (2)
  • hubkrieb (2)
  • sabman (1)
Top Labels
Issue Labels
enhancement (26) bug (11) documentation (7) missing-tests (6) question (2) invalid (1)
Pull Request Labels
release (51) Skip-Changelog (19) enhancement (6) hacktoberfest-accepted (6) bug (3)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 360 last-month
  • Total docker downloads: 10
  • Total dependent packages: 0
  • Total dependent repositories: 2
  • Total versions: 39
  • Total maintainers: 2
pypi.org: srai

A set of python modules for geospatial machine learning and data mining

  • Versions: 39
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 360 Last month
  • Docker Downloads: 10
Rankings
Docker downloads count: 4.3%
Stargazers count: 8.1%
Dependent packages count: 10.1%
Average: 10.4%
Downloads: 11.5%
Dependent repos count: 11.5%
Forks count: 16.9%
Maintainers (2)
Last synced: 6 months ago

Dependencies

.github/workflows/_tests.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
  • pdm-project/setup-pdm v3 composite
.github/workflows/ci-dev.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • jannekem/run-python-script-action v1 composite
  • pdm-project/setup-pdm v3 composite
.github/workflows/ci-prod.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • jannekem/run-python-script-action v1 composite
  • pdm-project/setup-pdm v3 composite
.github/workflows/run-manual-pre-commit.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • pre-commit/action v3.0.0 composite
pyproject.toml pypi
  • geopandas >=0.11.1
  • geoparquet >=0.0.3
  • h3 >=4.0.0b1
  • numpy >=1.23.4
  • pandas >=1.3
  • pyarrow >=10.0.0
  • pyfunctional >=1.4.3
  • rtree >=1.0.1
  • s2 >=0.1.9
  • scipy >=1.9.3
  • shapely >=1.8.5.post1
  • topojson >=1.5
  • tqdm >=4.64.1
.github/workflows/run-changelog-enforcer.yml actions
  • actions/checkout v3 composite
  • dangoslen/changelog-enforcer v3 composite
.github/workflows/run-tests.yml actions
.github/workflows/bump-and-pr.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • pdm-project/setup-pdm v3 composite
  • peter-evans/create-pull-request v5 composite
  • release-flow/keep-a-changelog-action v2 composite
.github/workflows/gh-release.yml actions
  • actions/checkout v4 composite
  • ffurrer2/extract-release-notes v1 composite
  • softprops/action-gh-release v1 composite
  • winterjung/split v2 composite
.github/workflows/test-dev.yml actions
.github/workflows/generate-dev-docs.yml actions
  • actions/cache v3 composite
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • jannekem/run-python-script-action v1 composite
.github/workflows/manual_tests.yml actions
  • actions/cache v3 composite
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
.github/workflows/redeploy_docs.yml actions
  • actions/checkout v4 composite
  • actions/deploy-pages v4 composite
  • actions/upload-pages-artifact v3 composite