geospatial-benchmark

Benchmark NoSQL(-like) database with geospatial.

https://github.com/guiyunbao/geospatial-benchmark

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.8%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: guiyunbao
License: cc-by-sa-4.0
Language: TypeScript
Default Branch: main
Size: 18.9 MB

Statistics

Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 1

Created almost 3 years ago · Last pushed almost 3 years ago

Metadata Files

Readme License Citation

README.md

Benchmark geospatial query performance on NoSQL(-like) databases

The full paper is available in Releases. Its source is report.typ, which you can compile with Typst.

Test Environment

Databases
- MongoDB Enterprise Server@latest
- Redis (Stack)@latest
Programming Language and lib
Host
- GitHub Codespaces
- GitHub Actions

Run Benchmarks

```log
npm run start [-- [options]]

  -t, --type    type of dataset "inat2017", "random", "grid", "cluster"
  -c, --count   number of data points (default: "100000")
  -r, --repeat  number of repeat (default: "1")
```

Test dataset

By default, this benchmark uses iNaturalist 2017's Fine Grained Geolocation Datasets (visipedia/fg_geo), which contain 654,818 geolocation point records.

The file is located at datasets/inat2017/inat2017_file_name_to_geo.csv and includes a header row.

Format:

```csv
filename,latitude,longitude
```
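As an illustration of the format above, here is a minimal sketch of a loader for this CSV layout. The `GeoRecord` type and `parseGeoCsv` function are hypothetical names, not taken from the repository, and the sketch assumes fields contain no quotes or embedded commas.

```typescript
// One dataset record: file name plus coordinates in decimal degrees.
// (Hypothetical type, not taken from the repository.)
interface GeoRecord {
  filename: string;
  latitude: number;
  longitude: number;
}

// Parse CSV text that starts with a `filename,latitude,longitude` header row.
// Assumption: no quoted fields and no commas inside values.
function parseGeoCsv(text: string): GeoRecord[] {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const records: GeoRecord[] = [];
  // Index 0 is the header row, so start from the second line.
  for (const line of lines.slice(1)) {
    const [filename = "", lat = "", lon = ""] = line.split(",");
    // Skip rows with missing coordinates (Number("") would silently give 0).
    if (lat.trim() === "" || lon.trim() === "") continue;
    const latitude = Number(lat);
    const longitude = Number(lon);
    // Skip rows whose coordinates are not numeric.
    if (!Number.isFinite(latitude) || !Number.isFinite(longitude)) continue;
    records.push({ filename, latitude, longitude });
  }
  return records;
}
```

Rows with missing or non-numeric coordinates are dropped rather than reported, which keeps the sketch short; a real loader might prefer to log them.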

We also provide three other datasets generated at runtime:

Random
- Points are placed entirely at random by an RNG.
Grid
- Points are spaced evenly around the Earth.
- Generated with the Fibonacci sphere algorithm.
Cluster
- Every 50 points are placed together as a cluster, each with a small offset, and the clusters are distributed randomly.
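The three generators can be sketched roughly as follows. This is an illustrative sketch, not the repository's code: the function names, the choice to sample random points uniformly over the sphere surface, and the `maxOffsetDeg` default of 0.01 degrees are all assumptions.

```typescript
interface Point {
  latitude: number; // degrees, in [-90, 90]
  longitude: number; // degrees, in [-180, 180)
}

type Rng = () => number; // returns a value in [0, 1), like Math.random

const toDegrees = (rad: number): number => (rad * 180) / Math.PI;

// Wrap any longitude in degrees into [-180, 180).
function wrapLongitude(deg: number): number {
  return ((((deg + 180) % 360) + 360) % 360) - 180;
}

// Random: one point sampled uniformly over the sphere surface (an assumption;
// the source only says points are placed by an RNG).
function randomPoint(rng: Rng): Point {
  return {
    latitude: toDegrees(Math.asin(2 * rng() - 1)),
    longitude: rng() * 360 - 180,
  };
}

function randomPoints(count: number, rng: Rng = Math.random): Point[] {
  return Array.from({ length: count }, () => randomPoint(rng));
}

// Grid: Fibonacci sphere. Heights z are evenly spaced in (-1, 1) and each
// successive point is rotated by the golden angle around the axis.
function fibonacciSpherePoints(count: number): Point[] {
  const goldenAngleDeg = 180 * (3 - Math.sqrt(5)); // about 137.5 degrees
  const points: Point[] = [];
  for (let i = 0; i < count; i++) {
    const z = 1 - (2 * i + 1) / count;
    points.push({
      latitude: toDegrees(Math.asin(z)),
      longitude: wrapLongitude(i * goldenAngleDeg),
    });
  }
  return points;
}

// Cluster: every `clusterSize` points share a random centre and get a small
// random offset. `maxOffsetDeg` is a hypothetical parameter.
function clusterPoints(
  count: number,
  clusterSize = 50,
  maxOffsetDeg = 0.01,
  rng: Rng = Math.random,
): Point[] {
  const points: Point[] = [];
  let center = randomPoint(rng);
  for (let i = 0; i < count; i++) {
    // Start a new cluster every `clusterSize` points.
    if (i > 0 && i % clusterSize === 0) center = randomPoint(rng);
    const lat = center.latitude + (rng() * 2 - 1) * maxOffsetDeg;
    const lon = center.longitude + (rng() * 2 - 1) * maxOffsetDeg;
    points.push({
      latitude: Math.max(-90, Math.min(90, lat)), // clamp near the poles
      longitude: wrapLongitude(lon),
    });
  }
  return points;
}
```

Passing the RNG as a parameter makes the random and cluster datasets reproducible when a seeded generator is supplied.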

Test queries

Data should be loaded into the database before running the queries.
- Warming up the index is allowed.
- Warm-up queries are not allowed.
Storage cost will be calculated.
- For in-memory databases, both memory and persistent storage costs will be calculated.
In theory, all databases should return the same result.

The basic test requires all queries to run one by one in a single process/thread.
The advanced test allows queries to run in parallel and may be optimized for the test host.
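The basic test's one-by-one rule can be sketched as a small timing harness. The names `runSequentially`, `Query`, and `TimingResult` are hypothetical, and the injectable `now` clock is a design choice of this sketch rather than something the benchmark is known to do.

```typescript
// A query is any async operation against the database under test.
type Query<T> = () => Promise<T>;

interface TimingResult {
  durationsMs: number[]; // one entry per executed query, in execution order
  totalMs: number;
}

// Run every query one at a time, `repeat` times over, timing each call.
// `now` defaults to Date.now(); a higher-resolution clock such as
// performance.now() can be passed in where it is available.
async function runSequentially<T>(
  queries: Query<T>[],
  repeat = 1,
  now: () => number = () => Date.now(),
): Promise<TimingResult> {
  const durationsMs: number[] = [];
  for (let round = 0; round < repeat; round++) {
    for (const query of queries) {
      const start = now();
      await query(); // wait for completion before starting the next query
      durationsMs.push(now() - start);
    }
  }
  const totalMs = durationsMs.reduce((sum, ms) => sum + ms, 0);
  return { durationsMs, totalMs };
}
```

Because no warm-up query is allowed, the first recorded duration includes any cold-start cost; an advanced-test variant could instead start the queries together and await them with `Promise.all`.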

Query A: Find nearest location

Pick a random location and find the closest location in the dataset.
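Since all databases should in theory return the same result, a brute-force reference can be useful for checking answers. The sketch below is not part of the benchmark: it assumes great-circle (haversine) distance on a sphere with the mean Earth radius, and the names `haversineMeters` and `findNearest` are hypothetical.

```typescript
interface Point {
  latitude: number; // degrees
  longitude: number; // degrees
}

// Mean Earth radius in metres (an assumed constant for this sketch).
const EARTH_RADIUS_M = 6371008.8;

// Great-circle distance between two points using the haversine formula.
function haversineMeters(a: Point, b: Point): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const sinLat = Math.sin(toRad(b.latitude - a.latitude) / 2);
  const sinLon = Math.sin(toRad(b.longitude - a.longitude) / 2);
  const h =
    sinLat * sinLat +
    Math.cos(toRad(a.latitude)) * Math.cos(toRad(b.latitude)) * sinLon * sinLon;
  // Clamp to 1 to guard against rounding errors for antipodal points.
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Linear scan over the dataset; returns undefined for an empty dataset.
function findNearest(target: Point, dataset: Point[]): Point | undefined {
  let best: Point | undefined;
  let bestDistance = Infinity;
  for (const candidate of dataset) {
    const distance = haversineMeters(target, candidate);
    if (distance < bestDistance) {
      bestDistance = distance;
      best = candidate;
    }
  }
  return best;
}
```

The scan is O(n) per query, so it is only suitable as a correctness check; databases may also use a different Earth model, which can change the answer when two candidates are nearly equidistant.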

Query B: Find locations within a radius

Pick a random location from the dataset and find all locations within a certain distance.
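For MongoDB, one way to express an unordered radius search is a `$geoWithin` filter with `$centerSphere`, whose radius is given in radians. The sketch below only builds the filter document; the field name passed in and the `withinRadiusFilter` helper are hypothetical, and whether the benchmark issues this exact query is an assumption.

```typescript
// Equatorial Earth radius in metres, the figure MongoDB's documentation
// suggests for converting a distance to radians.
const EARTH_RADIUS_M = 6378100;

type CenterSphere = [[number, number], number]; // [[lon, lat], radiusRadians]

interface WithinRadiusFilter {
  [field: string]: { $geoWithin: { $centerSphere: CenterSphere } };
}

// Build a filter matching documents whose `field` lies within
// `radiusMeters` of the given point. Results come back unordered.
function withinRadiusFilter(
  field: string,
  longitude: number,
  latitude: number,
  radiusMeters: number,
): WithinRadiusFilter {
  return {
    [field]: {
      $geoWithin: {
        // MongoDB expects longitude first, then latitude.
        $centerSphere: [[longitude, latitude], radiusMeters / EARTH_RADIUS_M],
      },
    },
  };
}
```

The resulting object could be passed to a collection's `find` call, for example `collection.find(withinRadiusFilter("location", 116.4, 39.9, 1000))`, where `location` is an assumed field name.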

Query C: Find locations within a radius and order them.

Pick a random location from the dataset, find all locations within a certain distance, and order them by distance.
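Both databases offer a way to get radius results already sorted by distance: MongoDB's `$nearSphere` with `$maxDistance` (which requires a geospatial index) and Redis's `GEOSEARCH ... ASC`. The sketch below only builds the query document and command arguments; the helper names, field name, and key name are hypothetical, and the benchmark may well use different commands (for example a Redis Stack search index).

```typescript
interface NearSphereFilter {
  [field: string]: {
    $nearSphere: {
      $geometry: { type: "Point"; coordinates: [number, number] };
      $maxDistance: number; // metres
    };
  };
}

// MongoDB: $nearSphere returns matches sorted from nearest to farthest.
function nearSphereFilter(
  field: string,
  longitude: number,
  latitude: number,
  maxDistanceMeters: number,
): NearSphereFilter {
  return {
    [field]: {
      $nearSphere: {
        // GeoJSON order is longitude first, then latitude.
        $geometry: { type: "Point", coordinates: [longitude, latitude] },
        $maxDistance: maxDistanceMeters,
      },
    },
  };
}

// Redis: arguments for GEOSEARCH, sorted ascending by distance.
// WITHDIST asks Redis to return each member's distance as well.
function geoSearchArgs(
  key: string,
  longitude: number,
  latitude: number,
  radiusMeters: number,
): string[] {
  return [
    "GEOSEARCH", key,
    "FROMLONLAT", String(longitude), String(latitude),
    "BYRADIUS", String(radiusMeters), "m",
    "ASC",
    "WITHDIST",
  ];
}
```

Dropping `ASC` from the Redis arguments gives the unordered form of the same radius search.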

License and Citation

This project is licensed under the terms of the CC BY-SA 4.0 license.

To cite this report, check CITATION.cff.

Owner

Name: 桂运宝
Login: guiyunbao
Kind: organization
Location: China

Website: https://guiyunbao.cn/
Repositories: 1
Profile: https://github.com/guiyunbao

Guangxi GuiYunBao Tech Inc. (广西桂运宝科技有限公司)

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this result, please cite it as below."
authors:
- family-names: "Shen"
  given-names: "LiangXiang"
  url: "https://github.com/kj415j45"
  email: "kj415j45@gmail.com"
  affiliation: "Guangxi GuiYunBao Tech Inc."
- family-names: "Zhou"
  given-names: "Feng"
  url: "https://github.com/guimasharing"
  affiliation: "Guangxi GuiYunBao Tech Inc."
title: "Benchmark geospatial query performance on NoSQL(-like) databases"
version: 1.0.0
url: "https://github.com/guiyunbao/geospatial-benchmark"
