dataasee

DatAasee - A Metadata-Lake for Libraries

https://github.com/ulbmuenster/dataasee

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
    Organization ulbmuenster has institutional domain (www.ulb.uni-muenster.de)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary

Keywords

data-catalog data-engineering data-lake data-lakehouse datacite library-catalogue marc21 metadata metadata-catalog metadata-lake metadata-management metadata-mapping metalake oai-pmh
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 14
  • Watchers: 3
  • Forks: 2
  • Open Issues: 0
  • Releases: 1
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Changelog License Citation

README.md

DatAasee (0.3)

DatAasee schematic

Repository: github.com/ulbmuenster/dataasee (nb sources backup)

Maintainer: Christian Himpe (at University and State Library of Münster)

Licenses: MIT (additionally CC-BY for openapi.yaml)

Function: Metadata-Lake, Metadata Catalog, Metadata Aggregator, Union Catalog

Audience: University Libraries, Research Libraries, Academic Libraries, Scientific Libraries

Tech Stack Canvas

  • Setting: Many distributed data and metadata sources
  • Goals:
    • Centralize metadata
    • Interlinked metadata catalog
    • Super-index for bibliographic and research data
  • Features:
    • Interact through HTTP-API (JSON)
    • Search by filter or full-text
    • Custom query via: SQL, Gremlin, Cypher, MQL, GraphQL
  • Frontend: Lowdefy
  • Backend: Connect (Benthos)
  • Data Storage: ArcadeDB
  • Infrastructure: Compose (via Docker or Podman)
  • Deployment: via Harbor (at Uni Münster)
  • Monitoring: Prometheus
  • Integrations:
    • Protocols: OAI-PMH (HTTP), S3 (HTTP), GET (HTTP), DatAasee (HTTP)
    • Encodings: XML (Plain-Text)
    • Formats: DataCite (XML), DC (XML), LIDO (XML), MARC (XML), MODS (XML)
  • Security: Privileged endpoints (CQRS)
  • Testing: check-jsonschema
  • Development: GitHub

Documentation

Getting Started (Deployment)

  • Requires docker-compose (compatible with both Docker and Podman).
  • Deployment does not require cloning the repository; the compose.yaml file alone is sufficient.
  • See the Deploy Documentation for details.

Quick Start:

    $ wget https://raw.githubusercontent.com/ulbmuenster/dataasee/0.3/compose.yaml
    $ mkdir -p backup
    $ DB_PASS=password1 DL_PASS=password2 docker compose up -d

Default Ports

  • 8343 DatAasee API
  • 2480 Database API (Development Only)
  • 9999 Database JMX (Development Only)
  • 8000 Web Frontend (Development Only)
  • 80 Web Frontend (Deployment Only)
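
After starting the services, it can be useful to confirm which of the ports listed above accept connections. The following is a minimal sketch, not part of DatAasee itself: it only tests TCP reachability and makes no assumption about API endpoints. The host `localhost`, the helper names `port_is_open` and `report`, and the one-second timeout are illustrative assumptions.

```python
import socket

# Ports as listed in the README; the labels are copied from that list.
DEFAULT_PORTS = {
    8343: "DatAasee API",
    2480: "Database API (development only)",
    9999: "Database JMX (development only)",
    8000: "Web Frontend (development only)",
    80: "Web Frontend (deployment only)",
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


def report(host: str = "localhost") -> dict:
    """Map each default port to its reachability on the given host (assumed local)."""
    return {port: port_is_open(host, port) for port in DEFAULT_PORTS}


if __name__ == "__main__":
    for port, is_open in report().items():
        state = "open" if is_open else "closed"
        print(f"{port:>5}  {state:<6}  {DEFAULT_PORTS[port]}")
```

An open port only shows that something is listening there. It does not show that the service behind it is healthy; the Deploy Documentation is the place to look for proper readiness checks.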

Repository Contents

  • api/ - API definition and message schemas
  • assets/ - Logos and style definition
  • backend/ - Processor pipeline and component definitions
  • container/ - Dockerfiles
  • database/ - Database initialization, schemas and enumerated data
  • docs/ - Documentation of software, data and architecture
  • frontend/ - Prototype frontend definition
  • tests/ - Test definitions and data

Getting Started (Development)

  • Available make targets:
    • make setup Build server images
    • make start Start servers
    • make stop Stop servers
    • make reset Stop and start servers
    • make empty Delete database backups (requires privileges)
    • make logs Show logs (requires grep)
    • make peak Report peak database memory usage (requires grep)
    • make test Run tests (requires check-jsonschema, busybox, wget)
    • make tidy List violations of StrictYAML (requires yamllint)
    • make todo List inline TODOs in repo (requires grep)
  • Custom make variable: COMPOSE

Contributors

Owner

  • Name: Universitäts- und Landesbibliothek Münster
  • Login: ulbmuenster
  • Kind: organization
  • Location: Muenster

Citation (CITATION.cff)

cff-version: 1.2.0
title: DatAasee
message: In Development
type: software
authors:
  - given-names: Christian
    family-names: Himpe
    orcid: 'https://orcid.org/0000-0003-2194-6754'
    affiliation: University of Münster
  - given-names: Philipp
    family-names: Kuschat
    affiliation: University of Münster
  - given-names: Holger
    family-names: Przibytzin
    affiliation: University of Münster
  - given-names: Marc
    family-names: Schutzeichel
    affiliation: University of Münster
  - given-names: Jan-Erik
    family-names: Stange
    affiliation: University of Münster
abstract: A metadata-lake for libraries
keywords:
  - Metadata Lake
  - Metadata Catalog
  - Metadata Management
  - Data Catalog
  - Data Engineering
license: MIT
version: '0.3'

GitHub Events

Total
  • Release event: 1
  • Watch event: 9
  • Push event: 1
  • Create event: 2
Last Year
  • Release event: 1
  • Watch event: 9
  • Push event: 1
  • Create event: 2