https://github.com/acdh-oeaw/arche-metadata-crawler

Utility tool to create/curate ARCHE-Metadata

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

arche
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: acdh-oeaw
  • License: mit
  • Language: PHP
  • Default Branch: master
  • Homepage:
  • Size: 3.37 MB
Statistics
  • Stars: 1
  • Watchers: 5
  • Forks: 0
  • Open Issues: 2
  • Releases: 41
Topics
arche
Created over 2 years ago · Last pushed 7 months ago
Metadata Files
Readme License

README.md

Metadata Crawler

Functionality

A set of scripts used for metadata curation during ARCHE ingestions:

  • Merging metadata of a collection from inputs in various formats
  • Validating the merged metadata
  • Generating XLSX metadata templates based on the current ontology (see the horizontal metadata files in metadata formats description)

Installation

Locally

  • Install PHP and composer
  • Run:
        composer require acdh-oeaw/arche-metadata-crawler
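
Before running the install command above, it can help to confirm that both prerequisites are on the PATH. The following is a minimal sketch; `require_cmd` and `install_crawler` are hypothetical helper names used for illustration and are not part of the package.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is available on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Hypothetical helper: install the crawler only when PHP and composer exist.
install_crawler() {
    require_cmd php && require_cmd composer \
        && composer require acdh-oeaw/arche-metadata-crawler
}
```

Calling `install_crawler` in the project directory stops with a `missing: ...` message instead of a less obvious shell error when one of the tools is absent.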

As a docker image

  • Install docker.
  • Run the acdhch/arche-ingest image mounting your data directory into it:
        docker run --rm -ti --entrypoint bash -u `id -u`:`id -g` \
            -v pathToYourDataDir:/data \
            acdhch/arche-ingest
  • Run the scripts, e.g.
        arche-create-metadata-template /data all
    and
        arche-crawl-meta \
            /data/metadata \
            /data/merged.ttl \
            /ARCHE/staging/GlaserDiaries_16674/data \
            https://id.acdh.oeaw.ac.at/glaserdiaries
    • If you need the file-checker, you can run it with arche-filechecker.

On ACDH Cluster

Nothing to be done. It is installed there already.

Usage

(For a full walk-through using arche-ingestion@acdh-cluster and the Wollmilchsau test collection please look here)

On ACDH Cluster

First, get the arche-ingestion workload console as described here

Then:

  • Generate and validate the metadata:

    • Run the arche-crawl-meta script:
          /ARCHE/vendor/bin/arche-crawl-meta \
              <pathToMetadataDirectory> \
              --filecheckerReportDir <pathToTheFileCheckerReportDirectory> \
              <outputTtlPath> \
              <basePathOfTheCollection> \
              <idPrefix> \
              2>&1 | tee <pathToLogFile>
      e.g.
          /ARCHE/vendor/bin/arche-crawl-meta \
              /ARCHE/staging/GustavMahlerArchiv_22334/metadata \
              --filecheckerReportDir /ARCHE/staging/GustavMahlerArchiv_22334/checkReports/2024_04_08_09_19_24 \
              /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.ttl \
              /ARCHE/staging/GustavMahlerArchiv_22334/data \
              https://id.acdh.oeaw.ac.at/GustavMahlerArchiv \
              2>&1 | tee /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.log
    • If you want to skip the checks (which speeds up the process significantly), add the --noCheck parameter, e.g.
      ```bash
      /ARCHE/vendor/bin/arche-crawl-meta \
          /ARCHE/staging/GustavMahlerArchiv_22334/metadata \
          --filecheckerReportDir /ARCHE/staging/GustavMahlerArchiv_22334/checkReports/2024_04_08_09_19_24 \
          /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.ttl \
          /ARCHE/staging/GustavMahlerArchiv_22334/data \
          https://id.acdh.oeaw.ac.at/GustavMahlerArchiv \
          --noCheck \
          2>&1 | tee /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.log
      ```

  • Create metadata templates:
        /ARCHE/vendor/bin/arche-create-metadata-template \
            <pathToDirectoryWhereTemplateShouldBeCreated> \
            all
    e.g. to create templates in the current directory:
        /ARCHE/vendor/bin/arche-create-metadata-template . all
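
The cluster example above repeats the same staging directory in every argument, so the call can be wrapped in a small function. This is only a sketch: `run_crawl_meta` and the `DRY_RUN` variable are hypothetical, and the directory layout (metadata/, checkReports/, scriptFiles/ and data/ directly under the staging directory) is an assumption taken from the GustavMahlerArchiv_22334 example rather than a documented convention.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the arche-crawl-meta call shown above.
# Assumption: metadata/, checkReports/, scriptFiles/ and data/ sit directly
# under the staging directory, as in the GustavMahlerArchiv_22334 example.
run_crawl_meta() {
    local staging="$1" report="$2" id_prefix="$3" extra="${4:-}"

    # Rebuild the positional parameters as the command line to run.
    set -- /ARCHE/vendor/bin/arche-crawl-meta \
        "$staging/metadata" \
        --filecheckerReportDir "$staging/checkReports/$report" \
        "$staging/scriptFiles/metadata.ttl" \
        "$staging/data" \
        "$id_prefix"

    # Append an optional flag such as --noCheck.
    if [ -n "$extra" ]; then
        set -- "$@" "$extra"
    fi

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command (paths with spaces are not re-quoted here).
        printf '%s\n' "$*"
    else
        "$@" 2>&1 | tee "$staging/scriptFiles/metadata.log"
    fi
}
```

For example, `DRY_RUN=1 run_crawl_meta /ARCHE/staging/GustavMahlerArchiv_22334 2024_04_08_09_19_24 https://id.acdh.oeaw.ac.at/GustavMahlerArchiv --noCheck` prints the assembled command without running it, which makes it easy to check the paths first.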

Locally

  • Generating and validating the metadata:
        vendor/bin/arche-crawl-meta \
            --filecheckerReportDir pathToDirectoryWithFilecheckerOutput \
            pathToInputMetadataDir \
            mergedMetadataFilePath \
            pathToCollectionData \
            pathToTargetMetadataFile
    e.g.
        vendor/bin/arche-crawl-meta \
            --filecheckerReportDir reports/2024_03_01_12_45_23 \
            metaDir \
            metadata.ttl \
            `pwd`/data \
            https://id.acdh.oeaw.ac.at/myCollection
  • Creating metadata templates:
        vendor/bin/arche-create-metadata-template \
            <pathToDirectoryWhereTemplateShouldBeCreated> \
            all
    e.g. to create templates in the current directory:
        bin/arche-create-metadata-template . all

Remarks:

  • To get a list of all available parameters run:
        vendor/bin/arche-crawl-meta --help
        vendor/bin/arche-create-metadata-template --help
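
The `--filecheckerReportDir` examples above point at directories named like `2024_03_01_12_45_23`. Assuming every report lands in a sub-directory that follows this timestamp pattern (an inference from the examples, not something the README states), the newest one can be picked by name order. `latest_report_dir` is a hypothetical helper, shown only as a sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest timestamp-named sub-directory.
# Assumption: names follow YYYY_MM_DD_HH_MM_SS, so name order equals time order.
latest_report_dir() {
    local reports="$1" newest="" dir

    # Globs expand in sorted order, so the last match is the newest report.
    for dir in "$reports"/*/; do
        [ -d "$dir" ] || continue
        newest="${dir%/}"
    done

    # Fail when the directory holds no sub-directories at all.
    if [ -z "$newest" ]; then
        return 1
    fi
    printf '%s\n' "$newest"
}
```

It could then be combined with the local example, e.g. `vendor/bin/arche-crawl-meta --filecheckerReportDir "$(latest_report_dir reports)" metaDir metadata.ttl "$(pwd)/data" https://id.acdh.oeaw.ac.at/myCollection`.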

As a docker container

  • Generating and validating the metadata:
    Run a container, mounting the directory structure inside the container and overriding the command to be run with arche-crawl-meta:
        docker run \
            --rm -u `id -u`:`id -g` \
            -v pathInHost:/mnt \
            --entrypoint arche-crawl-meta \
            acdhch/arche-ingest \
            --filecheckerReportDir pathToDirectoryWithFilecheckerOutput \
            pathToInputMetadataDir \
            mergedMetadataFilePath \
            pathToCollectionData \
            pathToTargetMetadataFile
    e.g. to use paths relative to the current working directory:
        docker run \
            --rm -u `id -u`:`id -g` \
            -v `pwd`:/mnt \
            --entrypoint arche-crawl-meta \
            acdhch/arche-ingest \
            --filecheckerReportDir /mnt/reports/2024_03_01_12_45_23 \
            /mnt/metaDir \
            /mnt/metadata.ttl \
            /mnt/data \
            https://id.acdh.oeaw.ac.at/myCollection
  • Creating metadata templates:
    Run a container, mounting the directory where the templates should be created under /mnt inside the container and overriding the command to be run with arche-create-metadata-template:
        docker run \
            --rm -u `id -u`:`id -g` \
            -v pathToDirectoryWhereTemplateShouldBeCreated:/mnt \
            --entrypoint arche-create-metadata-template \
            acdhch/arche-ingest \
            /mnt all
    e.g. to create the templates in the current directory:
        docker run \
            --rm -u `id -u`:`id -g` \
            -v `pwd`:/mnt \
            --entrypoint arche-create-metadata-template \
            acdhch/arche-ingest \
            /mnt all
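
Both docker invocations above differ only in the entrypoint and the trailing arguments, so they can share one shell function. The sketch below assumes the current working directory is the one to mount under /mnt; `arche_docker` and the `DRY_RUN` variable are hypothetical names, not something shipped with the image.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run one of the scripts from the acdhch/arche-ingest
# image with the current directory mounted under /mnt.
arche_docker() {
    local script="$1"
    shift

    # Rebuild the positional parameters as the full docker command line.
    set -- docker run --rm \
        -u "$(id -u):$(id -g)" \
        -v "$PWD:/mnt" \
        --entrypoint "$script" \
        acdhch/arche-ingest \
        "$@"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command instead of starting a container.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

With this in place, `arche_docker arche-create-metadata-template /mnt all` corresponds to the template example above, and prefixing the call with `DRY_RUN=1` shows the docker command without starting a container.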

Owner

  • Name: Austrian Centre for Digital Humanities & Cultural Heritage
  • Login: acdh-oeaw
  • Kind: organization
  • Email: acdh@oeaw.ac.at
  • Location: Vienna, Austria

GitHub Events

Total
  • Create event: 9
  • Release event: 7
  • Issues event: 11
  • Issue comment event: 5
  • Push event: 12
Last Year
  • Create event: 9
  • Release event: 7
  • Issues event: 11
  • Issue comment event: 5
  • Push event: 12

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 100
  • Total Committers: 2
  • Avg Commits per committer: 50.0
  • Development Distribution Score (DDS): 0.01
Past Year
  • Commits: 23
  • Committers: 1
  • Avg Commits per committer: 23.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Mateusz Żółtak z****k@z****g 99
Stuhec S****c@o****t 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 15
  • Total pull requests: 0
  • Average time to close issues: 16 days
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 0.33
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 0
  • Average time to close issues: about 3 hours
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.25
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • zozlak (14)
  • linxOD (1)
Pull Request Authors
Top Labels
Issue Labels
enhancement (5) bug (5)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • packagist 201 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 42
  • Total maintainers: 1
packagist.org: acdh-oeaw/arche-metadata-crawler

Script and library for checking and generating ARCHE metadata in ACDH schema

  • Versions: 42
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 201 Total
Rankings
Dependent packages count: 19.1%
Forks count: 27.1%
Dependent repos count: 33.9%
Stargazers count: 37.1%
Average: 41.7%
Downloads: 91.0%
Maintainers (1)
Funding
Last synced: 6 months ago

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/test.yml actions
  • actions/checkout v3 composite
Dockerfile docker
  • php 8.1-cli build
composer.json packagist
  • phpstan/phpstan * development
  • acdh-oeaw/arche-assets ^3.9.4
  • acdh-oeaw/arche-lib dev-rdfInterface
  • acdh-oeaw/arche-lib-ingest dev-rdfInterface
  • acdh-oeaw/arche-lib-schema dev-master
  • acdh-oeaw/uri-normalizer dev-master
  • php >=8.1 <8.2
  • phpoffice/phpspreadsheet ^1.29
  • sweetrdf/quick-rdf dev-master
  • sweetrdf/quick-rdf-io dev-master
  • sweetrdf/rdf-interface ^2.0.0-RC6
  • zozlak/argparse ^1.0
  • zozlak/logging ^1.0