https://github.com/ai-team-uoa/geotriples
Publishing Big Geospatial data as Linked Open Geospatial Data
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: zenodo.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: AI-team-UoA
- License: apache-2.0
- Language: Java
- Default Branch: master
- Homepage: https://geotriples.di.uoa.gr
- Size: 276 MB
Statistics
- Stars: 39
- Watchers: 13
- Forks: 14
- Open Issues: 19
- Releases: 0
Metadata Files
README.md
GeoTriples 
Publishing geospatial data as Linked Geospatial Data. GeoTriples generates and processes extended R2RML and RML mappings that transform geospatial data from many input formats into RDF. GeoTriples allows the transformation of geospatial data stored in raw files (shapefiles, CSV, KML, XML, GML and GeoJSON) and spatially-enabled RDBMS (PostGIS and MonetDB) into RDF graphs using well-known vocabularies like GeoSPARQL and stSPARQL, but without being tightly coupled to a specific vocabulary.
Quickstart
Use GeoTriples binaries (Unix)
Assuming Java 8 is installed:
* Download the GeoTriples binaries from here
* Unzip the downloaded file geotriples-<version>-bin.zip
* Change directory to geotriples-<version>-bin
* The bin directory contains the starter script for GeoTriples
Generate Mapping files:
bin/geotriples-all generate_mapping -o <output_file (.ttl)> -b <URI base> <input file>
Transform file into RDF:
bin/geotriples-all dump_rdf -o <output_file> -b http://example.com (-sh <shp file>) <path_to_the_mapping_file (.ttl)>
See more at Wiki pages
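The two quickstart steps can be chained in a small wrapper. The following is a minimal sketch, not part of the GeoTriples distribution: the `run` helper, the `DRY_RUN` switch and the default file names (`input.geojson`, `mapping.ttl`, `output_file`) are placeholders chosen for illustration. It assumes it is started from the geotriples-<version>-bin directory and, by default, only prints the commands.

```shell
#!/bin/sh
# Sketch: generate a mapping, then dump RDF with it (binary distribution).
# All names below are placeholders; override them from the environment.
INPUT=${INPUT:-input.geojson}          # hypothetical input file
BASE_URI=${BASE_URI:-http://example.com}
MAPPING=${MAPPING:-mapping.ttl}        # mapping file written by step 1
OUTPUT=${OUTPUT:-output_file}          # RDF file written by step 2

# Print a command; execute it unless DRY_RUN is 1 (the default).
run() {
  echo "+ $*"
  [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# Step 2 runs only if step 1 succeeded.
run bin/geotriples-all generate_mapping -o "$MAPPING" -b "$BASE_URI" "$INPUT" &&
  run bin/geotriples-all dump_rdf -o "$OUTPUT" -b "$BASE_URI" "$MAPPING"
```

Set `DRY_RUN=0` to execute the commands instead of printing them. For a shapefile input, the dump_rdf step additionally needs `-sh <shp file>`.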
Execution by source
Clone this repository and build it from source with
mvn package
Generate Mapping files:
java -cp <geotriples-core/ dependencies jar> eu.linkedeodata.geotriples.GeoTriplesCMD generate_mapping -o <output file(.ttl)> -b <URI base> <input file>
- -o output_file the name of the produced mapping file (RML/R2RML)
- -b URI_base the base URI that will describe the entities
- use the option -rml to force the generation of an RML file
Transform file into RDF:
java -cp <geotriples-core/ dependencies jar> eu.linkedeodata.geotriples.GeoTriplesCMD dump_rdf -o <output file> -b <URI base> (-sh <shp file>) (-rml) <(produced) mapping file (.ttl)>
- -o output_file the path of the produced file
- -b URI_base the base URI that will describe the entities
- -sh shp_file if the input is a shapefile specify the .shp path using this flag
- use the -rml option if the input mapping file is an RML file; this is required for CSV files
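The optional dump_rdf flags depend on the input format. As a sketch of that choice, the hypothetical helper below picks `-sh` or `-rml` from the input file's extension and prints the resulting command in the flag order shown above. `dump_rdf_cmd` and the `GEOTRIPLES_JAR` variable are illustrative names, and `geotriples.jar` is a placeholder for the jar built by `mvn package`. Paths containing spaces are not handled.

```shell
# Placeholder: point this at the jar produced by `mvn package`.
GEOTRIPLES_JAR=${GEOTRIPLES_JAR:-geotriples.jar}

# Print the dump_rdf command for one input file.
# Usage: dump_rdf_cmd <input file> <mapping file> <output file> <URI base>
dump_rdf_cmd() {
  input=$1 mapping=$2 output=$3 base_uri=$4
  flags=""
  case "$input" in
    *.shp) flags="-sh $input" ;;  # shapefile: pass the .shp path with -sh
    *.csv) flags="-rml" ;;        # CSV: the mapping has to be an RML file
  esac
  # ${flags:+$flags } expands to the flags plus a space, or to nothing.
  echo "java -cp $GEOTRIPLES_JAR eu.linkedeodata.geotriples.GeoTriplesCMD" \
    "dump_rdf -o $output -b $base_uri ${flags:+$flags }$mapping"
}

dump_rdf_cmd points.csv mapping.ttl output_file http://example.com
dump_rdf_cmd roads.shp mapping.ttl output_file http://example.com
```

The function only prints the command, so the result can be reviewed before it is run.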
GeoTriples-Spark
GeoTriples-Spark is an extended version of GeoTriples capable of transforming big geospatial data into RDF graphs. To enable this, we extended GeoTriples to run on top of Apache Spark and Hadoop or Hops (a new distribution of Apache Hadoop developed by KTH, RISE SICS, and Logical Clocks AB). GeoTriples-Spark can run on a standalone machine or in a Hadoop-based cluster, but it is more efficient when it runs on Hops, as it is a write-intensive application. GeoTriples-Spark supports the transformation of CSV, GeoJSON and Shapefiles. You can examine the performance of GeoTriples-Spark in ISWC-experiments.
Requirements
- Java 8
- Maven 3
- Apache Spark 2.4.0 or greater
- Apache Hadoop 2.7.0 or Hops
Build
mvn package
Execute
spark-submit --class eu.linkedeodata.geotriples.GeoTriplesCMD <geotriples-core/ dependencies jar> spark -i <in_file> -o <out_folder> <rml>
-i input_file: path to the input dataset. You can enter multiple files, separated by ","
-o out_folder: path to the folder where the results will be stored. If the folder already exists, a new folder is created inside it.
rml: path to the RML mapping file produced by the generate_mapping procedure of GeoTriples.
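Since -i takes several files as one comma-separated value, a small helper can assemble it. This is a sketch under assumptions: `spark_cmd` and `GEOTRIPLES_JAR` are hypothetical names, `geotriples.jar` stands in for the built jar, and the function only prints the spark-submit command rather than running it.

```shell
# Placeholder: point this at the jar produced by `mvn package`.
GEOTRIPLES_JAR=${GEOTRIPLES_JAR:-geotriples.jar}

# Print the spark-submit command for one or more input files.
# Usage: spark_cmd <out_folder> <rml mapping> <input file>...
spark_cmd() {
  out_folder=$1 rml=$2
  shift 2
  # "$*" joins the remaining arguments with the first character of IFS.
  inputs=$(IFS=,; echo "$*")
  echo "spark-submit --class eu.linkedeodata.geotriples.GeoTriplesCMD" \
    "$GEOTRIPLES_JAR spark -i $inputs -o $out_folder $rml"
}

spark_cmd results mapping.ttl part1.csv part2.csv
```

With the two example inputs, the printed command contains `-i part1.csv,part2.csv`.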
Additional flags
-m mode: set the transformation mode. It can be either partition or row (default mode). In the partition mode, the RDF triples are written to the target file after the whole partition has been transformed. In the row mode, each record is transformed into RDF triples that are written directly to the target files. For small datasets the partition mode is faster, but we advise using the row mode as it is more memory friendly.
-r partitions: re-partition the input dataset. WARNING: re-partitioning triggers data shuffling and can therefore degrade performance.
-sh folder_path: load multiple ESRI shapefiles that exist in folder_path (each one must be stored in a separate folder). For example, the structure of the folder must look like:
folder_path/shapefile1/shapefile1.(shp, dbf, shx, etc)
folder_path/shapefile2/shapefile2.(shp, dbf, shx, etc)
...
For each shapefile, a different RDF dataset will be created. Furthermore, the RML mapping file must support all the input datasets.
-times n: load the input dataset "n" times.
help: print instructions.
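The folder layout expected by -sh can be checked before a job is submitted. The function below is a sketch, not part of GeoTriples: `check_shapefile_layout` is a hypothetical name, and it assumes that each shapefile is named after its folder, as in the example layout, and that the three mandatory shapefile parts (.shp, .dbf, .shx) must be present.

```shell
# Check the folder layout described for -sh: every sub-folder of folder_path
# holds one shapefile. Assumption: the files are named after their folder,
# as in the example layout, and .shp, .dbf and .shx must all be present.
check_shapefile_layout() {
  folder_path=$1
  found=0
  status=0
  for dir in "$folder_path"/*/; do
    [ -d "$dir" ] || continue        # the glob matched nothing
    found=1
    name=$(basename "$dir")
    for ext in shp dbf shx; do
      if [ ! -f "$dir$name.$ext" ]; then
        echo "missing: $dir$name.$ext" >&2
        status=1
      fi
    done
  done
  [ "$found" = 1 ] || { echo "no shapefile folders in $folder_path" >&2; status=1; }
  return "$status"
}
```

Call it as `check_shapefile_layout <folder_path>`; it returns a non-zero status and lists the missing files on stderr when the layout does not match.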
Owner
- Name: AI Team - University of Athens
- Login: AI-team-UoA
- Kind: organization
- Email: ai.team@di.uoa.gr
- Location: Greece
- Website: https://ai.di.uoa.gr
- Twitter: AITeamUoA
- Repositories: 16
- Profile: https://github.com/AI-team-UoA
We work on various topics of AI. The team has published numerous influential papers and contributed with key technologies in the field.
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: almost 2 years ago
All Time
- Total issues: 14
- Total pull requests: 21
- Average time to close issues: 11 months
- Average time to close pull requests: 12 months
- Total issue authors: 13
- Total pull request authors: 4
- Average comments per issue: 0.64
- Average comments per pull request: 0.24
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 18
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- p1d1d1 (2)
- davidshumway (1)
- ktk (1)
- guojing5 (1)
- tioannid (1)
- rapw3k (1)
- xeon88 (1)
- AlejoSalvo95 (1)
- dimitrianos (1)
- mathib (1)
- LukeKaim (1)
- Montanaz0r (1)
- selimsagir (1)
Pull Request Authors
- dependabot[bot] (18)
- retog (1)
- jglouis (1)
- lewismc (1)
Dependencies
- eu.linkedeodata.geotriples:geotriples-spark 1.1.6-SNAPSHOT compile
- be.ugent.mmlab.rml:geotriples-rml 1.1.6-SNAPSHOT
- eu.linkedeodata.geotriples:geotriples-evaluation 1.1.6-SNAPSHOT
- org.codehaus.mojo.appassembler:appassembler-booter 1.10
- org.d2rq:geotriples-r2rml 1.1.6-SNAPSHOT
- eu.linkedeodata.geotriples:geotriples-core 1.1.6-SNAPSHOT
- org.apache.pivot:pivot-core 2.0.5
- org.apache.pivot:pivot-web 2.0.5
- org.apache.pivot:pivot-wtk 2.0.5
- org.apache.pivot:pivot-wtk-terra 2.0.5
- org.codehaus.mojo.appassembler:appassembler-booter 2.0.0
- be.ugent.mmlab.rml:geotriples-rml 1.1.6-SNAPSHOT
- org.codehaus.mojo.appassembler:appassembler-booter 1.10
- org.d2rq:geotriples-r2rml 1.1.6-SNAPSHOT
- org.eclipse.rdf4j:rdf4j-runtime 2.0.2
- org.openrdf.sesame:sesame-queryparser-sparql 2.8.4
- javax.activation:activation 1.1.1 compile
- org.geotools.xsd:gt-xsd-gml2 20.0 compile
- eu.linkedeodata.geotriples:geotriples-evaluation 1.1.6-SNAPSHOT
- jgraph:jgraph 5.13.0.0
- jgrapht:jgrapht 0.7.3
- net.antidot:db2triples 1.0.2
- org.apache.xmlbeans:xmlbeans 2.6.0
- org.datasyslab:geospark 1.2.0 compile
- com.jayway.jsonpath:json-path 0.8.1
- commons-io:commons-io 2.0.1
- eu.linkedeodata.geotriples:geotriples-evaluation 1.1.6-SNAPSHOT
- in.jlibs:jlibs-xmldog 2.1
- jgraph:jgraph 5.13.0.0
- jgrapht:jgrapht 0.7.3
- net.antidot:db2triples 1.0.2
- net.sf.saxon:Saxon-HE 9.5.1-4
- net.sourceforge.javacsv:javacsv 2.0
- org.apache.hadoop:hadoop-common 2.7.3
- org.apache.spark:spark-core_2.11 2.4.0
- org.apache.spark:spark-sql_2.11 2.4.0
- org.apache.xmlbeans:xmlbeans 2.6.0
- org.openrdf.sesame:sesame-model 2.6.10
- xom:xom 1.2.5
- com.jayway.jsonpath:json-path-assert 0.9.1 test
- commons-codec:commons-codec 1.5
- commons-collections:commons-collections 3.2.1
- commons-lang:commons-lang 2.4
- joseki:joseki 3.3.4
- junit:junit 4.11
- monetdb:monetdb-jdbc-2.11 11.20.0-geo-LEO
- mysql:mysql-connector-java 5.1.22
- org.apache.httpcomponents:httpclient 4.1.2
- org.apache.httpcomponents:httpcore 4.1.3
- org.apache.jena:jena-arq 2.9.4
- org.apache.jena:jena-core 2.7.4
- org.apache.jena:jena-iri 0.9.4
- org.apache.velocity:velocity 1.7
- org.eclipse.jetty:jetty-webapp 8.1.8.v20121106
- org.gdal:gdal 1.11.2
- org.geotools.xsd:gt-xsd-kml 20.0
- org.geotools:gt-epsg-hsql 20.0
- org.geotools:gt-geojson 20.0
- org.geotools:gt-opengis 20.0
- org.geotools:gt-process 20.0
- org.geotools:gt-referencing 20.0
- org.geotools:gt-shapefile 20.0
- org.geotools:gt-xml 20.0
- org.hsqldb:hsqldb 2.2.9
- org.locationtech.jts.io:jts-io-common 1.16.0
- org.locationtech.jts:jts-core 1.16.0
- org.postgresql:postgresql 9.2-1002-jdbc4
- xerces:xercesImpl 2.11.0
- xml-apis:xml-apis 1.4.01
- be.ugent.mmlab.rml:geotriples-rml 1.1.6-SNAPSHOT compile
- org.apache.jena:jena-arq 3.10.0 compile
- io.hops:hadoop-client 2.8.2.10-RC0 provided
- com.databricks:spark-xml_2.11 0.5.0
- org.apache.jena:jena-arq 2.9.4
- org.apache.spark:spark-core_2.11 2.4.0
- org.apache.spark:spark-sql_2.11 2.4.0
- org.datasyslab:geospark 1.2.0
- org.datasyslab:geospark-sql_2.3 1.2.0