https://github.com/aphp/spark-etl

Better bridge Apache Spark and PostgreSQL

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (1.4%) to scientific vocabulary

Keywords

etl postgresql spark
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: aphp
  • License: apache-2.0
  • Language: Scala
  • Default Branch: master
  • Size: 6.85 MB
Statistics
  • Stars: 23
  • Watchers: 4
  • Forks: 8
  • Open Issues: 10
  • Releases: 0
Created about 7 years ago · Last pushed over 2 years ago
Metadata Files
Readme License

README.md

SPARK-ETL

This repository contains several modules for ETL processes, with a focus on scalability and quality. It builds on several technologies, including:

  • Apache Spark
  • Apache Hive
  • Apache Solr
  • PostgreSQL

Owner

  • Name: Greater Paris University Hospitals (AP-HP)
  • Login: aphp
  • Kind: organization
  • Location: Paris

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 558
  • Total Committers: 11
  • Avg Commits per committer: 50.727
  • Development Distribution Score (DDS): 0.52
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name                 Email           Commits
Rasmey SARETH        r****h@g****m   268
parisni              n****s@r****t   162
Jean-François YUEN   j****n@d****m   72
Joseph Allemandou    j****u@g****m   19
tlama                t****a@g****m   9
Adrien Lavoillotte   a****e@d****m   8
LAMA                 t****a@c****m   7
Nicolas Paris        n****s@a****r   6
Saad ELBASSITI       s****t@a****r   3
Adrien Lavoillotte   s****c@f****r   3
saad elba            e****d@g****m   1
Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 15
  • Total pull requests: 20
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 months
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 0.27
  • Average comments per pull request: 0.85
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 19
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • parisni (13)
  • selmi2 (1)
  • sgorantla (1)
Pull Request Authors
  • dependabot[bot] (19)
  • ivan-veselovsky (1)
Top Labels
Issue Labels
  • enhancement (8)
Pull Request Labels
  • dependencies (19)
  • java (6)
  • javascript (1)

Dependencies

pom.xml maven
  • com.opentable.components:otj-pg-embedded 0.13.3 provided
  • com.sksamuel.scapegoat:scalac-scapegoat-plugin_2.11 provided
  • org.apache.hadoop:hadoop-client 2.6.5 provided
  • org.apache.hadoop:hadoop-common 2.6.5 provided
  • org.apache.spark:spark-core_2.11 2.4.3 provided
  • org.apache.spark:spark-sql_2.11 2.4.3 provided
  • com.amazon.deequ:deequ 1.0.2
  • com.esotericsoftware:kryo-shaded 4.0.2
  • com.fasterxml.jackson.core:jackson-core 2.6.7
  • com.fasterxml.jackson.core:jackson-databind 2.6.7
  • com.fasterxml.jackson.module:jackson-module-scala_2.11 2.6.7
  • com.github.tomakehurst:wiremock 1.56
  • com.sksamuel.scapegoat:scalac-scapegoat-plugin_2.11 1.3.10
  • com.typesafe.scala-logging:scala-logging_2.11 3.9.2
  • de.bytefish:pgbulkinsert 4.1
  • io.delta:delta-core_2.11 0.6.1
  • io.frama.parisni:spark-csv 1.0.13-SNAPSHOT
  • io.frama.parisni:spark-dataframe 1.0.13-SNAPSHOT
  • io.frama.parisni:spark-hive 1.0.13-SNAPSHOT
  • io.frama.parisni:spark-meta 1.0.13-SNAPSHOT
  • io.frama.parisni:spark-postgres 1.0.13-SNAPSHOT
  • io.frama.parisni:spark-quality 1.0.13-SNAPSHOT
  • io.frama.parisni:spark-sync 1.0.13-SNAPSHOT
  • net.jcazevedo:moultingyaml_2.11 0.4.1
  • org.apache.solr:solr-core 8.5.1
  • org.apache.solr:solr-solrj 8.5.1
  • org.apache.solr:solr-test-framework 8.5.1
  • org.joda:joda-convert 1.2
  • org.postgresql:postgresql 42.2.5
  • org.scala-lang:scala-library 2.11.11
  • com.lucidworks.spark:spark-solr 3.7.6 test
  • com.opentable.components:otj-pg-embedded 0.13.3 test
  • junit:junit test
  • junit:junit 4.13 test
  • org.apache.spark:spark-catalyst_2.11 2.4.3 test
  • org.apache.spark:spark-core_2.11 2.4.3 test
  • org.apache.spark:spark-sql_2.11 2.4.3 test
  • org.junit.jupiter:junit-jupiter-engine 5.1.0 test
  • org.scalatest:scalatest_2.11 test
  • org.scalatest:scalatest_2.11 3.0.8 test
spark-csv/pom.xml maven
  • io.frama.parisni:spark-dataframe
spark-dataframe/pom.xml maven
  • io.delta:delta-core_${scala.tools.version}
  • io.frama.parisni:spark-quality
spark-hive/pom.xml maven
  • io.delta:delta-core_${scala.tools.version}
  • io.frama.parisni:spark-dataframe
  • io.frama.parisni:spark-postgres
  • net.jcazevedo:moultingyaml_${scala.tools.version}
spark-meta/pom.xml maven
  • com.opentable.components:otj-pg-embedded
  • io.frama.parisni:spark-csv
  • io.frama.parisni:spark-dataframe
  • io.frama.parisni:spark-postgres
  • net.jcazevedo:moultingyaml_${scala.tools.version}
spark-postgres/pom.xml maven
  • com.opentable.components:otj-pg-embedded provided
  • org.apache.spark:spark-core_${scala.tools.version} provided
  • org.apache.spark:spark-sql_${scala.tools.version} provided
  • de.bytefish:pgbulkinsert
  • io.frama.parisni:spark-dataframe
  • org.postgresql:postgresql
  • com.opentable.components:otj-pg-embedded test
  • org.apache.spark:spark-sql_${scala.tools.version} test
  • org.scalatest:scalatest_${scala.tools.version} test
spark-quality/pom.xml maven
  • com.amazon.deequ:deequ
  • net.jcazevedo:moultingyaml_${scala.tools.version}
spark-sync/pom.xml maven
  • org.apache.solr:solr-core compile
  • com.esotericsoftware:kryo-shaded
  • com.fasterxml.jackson.core:jackson-core
  • com.fasterxml.jackson.core:jackson-databind
  • com.fasterxml.jackson.module:jackson-module-scala_${scala.tools.version}
  • io.delta:delta-core_${scala.tools.version}
  • io.frama.parisni:spark-dataframe
  • io.frama.parisni:spark-postgres
  • net.jcazevedo:moultingyaml_${scala.tools.version}
  • org.apache.solr:solr-solrj
  • org.postgresql:postgresql
  • com.github.tomakehurst:wiremock test
  • com.lucidworks.spark:spark-solr test
  • com.opentable.components:otj-pg-embedded test
  • org.apache.solr:solr-test-framework test
  • org.apache.spark:spark-sql_${scala.tools.version} test
spark-meta-frontend/package-lock.json npm
  • 1518 dependencies
spark-meta-frontend/package.json npm
  • @emotion/core ^10.0.28
  • @emotion/styled ^10.0.27
  • @material-ui/core ^4.9.7
  • @material-ui/icons ^4.9.1
  • @material-ui/lab ^4.0.0-alpha.48
  • @projectstorm/react-diagrams ^6.0.2
  • @projectstorm/react-diagrams-core ^6.0.2
  • @testing-library/jest-dom ^4.2.4
  • @testing-library/react ^9.5.0
  • @testing-library/user-event ^7.2.1
  • closest 0.0.1
  • dagre ^0.8.5
  • express ^4.17.1
  • mathjs ^6.6.1
  • pathfinding ^0.4.18
  • paths-js ^0.4.10
  • pg ^8.0.3
  • react ^16.13.1
  • react-dom ^16.13.1
  • react-router-dom ^5.1.2
  • react-scripts 3.4.1
  • resize-observer-polyfill ^1.5.1
  • tmp ^0.2.1
  • typescript ^3.8.3
spark-meta-frontend/Dockerfile docker
  • debian buster-slim build
spark-query/pom.xml maven