https://github.com/acdh-oeaw/arche-metadata-crawler
Utility tool to create/curate ARCHE-Metadata
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.1%) to scientific vocabulary
Repository
Statistics
- Stars: 1
- Watchers: 5
- Forks: 0
- Open Issues: 2
- Releases: 41
Metadata Files
README.md
Metadata Crawler
Functionality
A set of scripts used for metadata curation during ARCHE ingestions:
- Merging metadata of a collection from inputs in various formats
- Validating the merged metadata
- Generating XLSX metadata templates based on the current ontology (see the horizontal metadata files in metadata formats description)
Installation
Locally
- Install PHP and composer
- Run:
```bash
composer require acdh-oeaw/arche-metadata-crawler
```
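A small preflight check can catch a missing or unsupported PHP before `composer require` fails. The following is a minimal sketch, not part of the package: the function names `missing_tools`, `php_version_supported` and `preflight` are hypothetical, and the version rule assumes the `php >=8.1 <8.2` constraint listed under Dependencies still applies.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with arche-metadata-crawler.

# Print every command from the argument list that is not available on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Return 0 when a PHP version string such as "8.1.27" is an 8.1 release
# (assumption: the package constraint is php >=8.1 <8.2).
php_version_supported() {
  case "$1" in
    8.1|8.1.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check both prerequisites and report problems on stderr.
preflight() {
  local missing
  missing="$(missing_tools php composer)"
  if [ -n "$missing" ]; then
    echo "Missing tools: $missing" >&2
    return 1
  fi
  if ! php_version_supported "$(php -r 'echo PHP_VERSION;')"; then
    echo "Unsupported PHP version (expected 8.1.x)" >&2
    return 1
  fi
}
```

Calling `preflight && composer require acdh-oeaw/arche-metadata-crawler` would then run the installation only when both checks pass.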
As a docker image
- Install docker.
- Run the `acdhch/arche-ingest` image mounting your data directory into it:
  ```bash
  docker run --rm -ti --entrypoint bash -u `id -u`:`id -g` \
    -v pathToYourDataDir:/data \
    acdhch/arche-ingest
  ```
- Run the scripts, e.g.
  ```bash
  arche-create-metadata-template /data all
  ```
  and
  ```bash
  arche-crawl-meta \
    /data/metadata \
    /data/merged.ttl \
    /ARCHE/staging/GlaserDiaries_16674/data \
    https://id.acdh.oeaw.ac.at/glaserdiaries
  ```
  - if you need the file-checker, you can just run it with `arche-filechecker`
On ACDH Cluster
Nothing to be done. It is installed there already.
Usage
(For a full walk-through using arche-ingestion@acdh-cluster and the Wollmilchsau test collection, please look here)
On ACDH Cluster
First, get the arche-ingestion workload console as described here
Then:
Generate and validate the metadata:
- Run the `arche-crawl-meta` script:
  ```bash
  /ARCHE/vendor/bin/arche-crawl-meta \
    <pathToMetadataDirectory> \
    --filecheckerReportDir <pathToTheFileCheckerReportDirectory> \
    <outputTtlPath> \
    <basePathOfTheCollection> \
    <idPrefix> \
    2>&1 | tee <pathToLogFile>
  ```
  e.g.
  ```bash
  /ARCHE/vendor/bin/arche-crawl-meta \
    /ARCHE/staging/GustavMahlerArchiv_22334/metadata \
    --filecheckerReportDir /ARCHE/staging/GustavMahlerArchiv_22334/checkReports/2024_04_08_09_19_24 \
    /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.ttl \
    /ARCHE/staging/GustavMahlerArchiv_22334/data \
    https://id.acdh.oeaw.ac.at/GustavMahlerArchiv \
    2>&1 | tee /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.log
  ```
- If you want to skip the checks (which speeds up the process significantly), add the `--noCheck` parameter, e.g.
  ```bash
  /ARCHE/vendor/bin/arche-crawl-meta \
    /ARCHE/staging/GustavMahlerArchiv_22334/metadata \
    --filecheckerReportDir /ARCHE/staging/GustavMahlerArchiv_22334/checkReports/2024_04_08_09_19_24 \
    /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.ttl \
    /ARCHE/staging/GustavMahlerArchiv_22334/data \
    https://id.acdh.oeaw.ac.at/GustavMahlerArchiv \
    --noCheck \
    2>&1 | tee /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.log
  ```
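The example calls above all follow one layout per collection (`metadata`, `checkReports/<timestamp>`, `scriptFiles`, `data`), so the long argument list can be derived from the staging directory. The sketch below only prints the resulting command instead of running it. It assumes that layout and assumes report directories are named `YYYY_MM_DD_HH_MM_SS` as in the examples; `latest_report_dir` and `crawl_command` are hypothetical helper names, not part of the tool.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with arche-metadata-crawler.

# Print the most recent filechecker report directory below the given directory.
# Assumes names like 2024_04_08_09_19_24, so that the sorted glob order
# equals chronological order and the last match is the newest report.
latest_report_dir() {
  local base="${1%/}" dir latest=""
  for dir in "$base"/[0-9][0-9][0-9][0-9]_*/; do
    [ -d "$dir" ] || continue
    latest="${dir%/}"
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Print (do not execute) the arche-crawl-meta call for one staging directory.
# Arguments: <stagingDir> <idPrefix>
crawl_command() {
  local staging="${1%/}" id_prefix="$2" report
  report="$(latest_report_dir "$staging/checkReports")" || return 1
  printf '%s\n' "/ARCHE/vendor/bin/arche-crawl-meta $staging/metadata --filecheckerReportDir $report $staging/scriptFiles/metadata.ttl $staging/data $id_prefix"
}
```

For instance, `crawl_command /ARCHE/staging/GustavMahlerArchiv_22334 https://id.acdh.oeaw.ac.at/GustavMahlerArchiv` prints a command line that can be reviewed before it is pasted into the console. Printing rather than executing keeps the sketch safe to try outside the cluster.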
Create metadata templates:
```bash
/ARCHE/vendor/bin/arche-create-metadata-template \
  <pathToDirectoryWhereTemplateShouldBeCreated> \
  all
```
e.g. to create templates in the current directory
```bash
/ARCHE/vendor/bin/arche-create-metadata-template . all
```
Locally
- Generating and validating the metadata:
  ```bash
  vendor/bin/arche-crawl-meta \
    --filecheckerReportDir pathToDirectoryWithFilecheckerOutput \
    pathToInputMetadataDir \
    mergedMetadataFilePath \
    pathToCollectionData \
    pathToTargetMetadataFile
  ```
  e.g.
  ```bash
  vendor/bin/arche-crawl-meta \
    --filecheckerReportDir reports/2024_03_01_12_45_23 \
    metaDir \
    metadata.ttl \
    `pwd`/data \
    https://id.acdh.oeaw.ac.at/myCollection
  ```
- Creating metadata templates:
  ```bash
  vendor/bin/arche-create-metadata-template \
    <pathToDirectoryWhereTemplateShouldBeCreated> \
    all
  ```
  e.g. to create templates in the current directory
  ```bash
  bin/arche-create-metadata-template . all
  ```
Remarks:
- To get a list of all available parameters run:
```bash
vendor/bin/arche-crawl-meta --help
vendor/bin/arche-create-metadata-template --help
```
As a docker container
- Generating and validating the metadata:
  Run a container mounting the directory structure inside the container and overriding the command to be run with `arche-crawl-meta`:
  ```bash
  docker run \
    --rm -u `id -u`:`id -g` \
    -v pathInHost:/mnt \
    --entrypoint arche-crawl-meta \
    acdhch/arche-ingest \
    --filecheckerReportDir pathToDirectoryWithFilecheckerOutput \
    pathToInputMetadataDir \
    mergedMetadataFilePath \
    pathToCollectionData \
    pathToTargetMetadataFile
  ```
  e.g. to use paths relative to the current working directory
  ```bash
  docker run \
    --rm -u `id -u`:`id -g` \
    -v `pwd`:/mnt \
    --entrypoint arche-crawl-meta \
    acdhch/arche-ingest \
    --filecheckerReportDir /mnt/reports/2024_03_01_12_45_23 \
    /mnt/metaDir \
    /mnt/metadata.ttl \
    /mnt/data \
    https://id.acdh.oeaw.ac.at/myCollection
  ```
- Creating metadata templates:
  Run a container mounting the directory where templates should be created under `/mnt` inside the container and overriding the command to be run with `arche-create-metadata-template`:
  ```bash
  docker run \
    --rm -u `id -u`:`id -g` \
    -v pathToDirectoryWhereTemplateShouldBeCreated:/mnt \
    --entrypoint arche-create-metadata-template acdhch/arche-ingest \
    /mnt all
  ```
  e.g. to create the templates in the current directory
  ```bash
  docker run \
    --rm -u `id -u`:`id -g` \
    -v `pwd`:/mnt \
    --entrypoint arche-create-metadata-template \
    acdhch/arche-ingest \
    /mnt all
  ```
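In the docker variant every path argument has to be given as seen from inside the container, i.e. below the mount point. The following sketch shows one way to translate host paths. The helper name `to_container_path` is hypothetical, and the default mount point `/mnt` is simply taken from the examples above.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with arche-metadata-crawler.

# Translate a host path below the mounted directory into the path
# the container sees.
# Arguments: <mountedHostDir> <hostPath> [mountPoint, default /mnt]
to_container_path() {
  local host_root="${1%/}" path="$2" mount_point="${3:-/mnt}"
  case "$path" in
    "$host_root")
      printf '%s\n' "$mount_point" ;;
    "$host_root"/*)
      # Strip the host prefix and re-root the remainder at the mount point.
      printf '%s\n' "$mount_point/${path#"$host_root"/}" ;;
    *)
      echo "not below $host_root: $path" >&2
      return 1 ;;
  esac
}
```

With the current directory mounted, `to_container_path "$PWD" "$PWD/metaDir"` yields `/mnt/metaDir`, which is the form used in the example calls. Paths outside the mounted directory are rejected because the container cannot see them.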
Owner
- Name: Austrian Centre for Digital Humanities & Cultural Heritage
- Login: acdh-oeaw
- Kind: organization
- Email: acdh@oeaw.ac.at
- Location: Vienna, Austria
- Website: https://www.oeaw.ac.at/acdh
- Repositories: 476
- Profile: https://github.com/acdh-oeaw
GitHub Events
Total
- Create event: 9
- Release event: 7
- Issues event: 11
- Issue comment event: 5
- Push event: 12
Last Year
- Create event: 9
- Release event: 7
- Issues event: 11
- Issue comment event: 5
- Push event: 12
Committers
Last synced: 6 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Mateusz Żółtak | z****k@z****g | 99 |
| Stuhec | S****c@o****t | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 15
- Total pull requests: 0
- Average time to close issues: 16 days
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 0
- Average comments per issue: 0.33
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 8
- Pull requests: 0
- Average time to close issues: about 3 hours
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.25
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- zozlak (14)
- linxOD (1)
Packages
- Total packages: 1
- Total downloads:
  - packagist: 201 total
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 42
- Total maintainers: 1
packagist.org: acdh-oeaw/arche-metadata-crawler
Script and library for checking and generating ARCHE metadata in ACDH schema
- Homepage: https://github.com/acdh-oeaw/arche-metadata-crawler
- License: MIT
- Latest release: 0.14.7 (published 7 months ago)
Dependencies
- actions/checkout v3 composite
- docker/login-action v2 composite
- actions/checkout v3 composite
- php 8.1-cli build
- phpstan/phpstan * development
- acdh-oeaw/arche-assets ^3.9.4
- acdh-oeaw/arche-lib dev-rdfInterface
- acdh-oeaw/arche-lib-ingest dev-rdfInterface
- acdh-oeaw/arche-lib-schema dev-master
- acdh-oeaw/uri-normalizer dev-master
- php >=8.1 <8.2
- phpoffice/phpspreadsheet ^1.29
- sweetrdf/quick-rdf dev-master
- sweetrdf/quick-rdf-io dev-master
- sweetrdf/rdf-interface ^2.0.0-RC6
- zozlak/argparse ^1.0
- zozlak/logging ^1.0