irdm_harvester
Automatically harvest publications for an InvenioRDM repository
Science Score: 65.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: Organization caltechlibrary has institutional domain (www.library.caltech.edu)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.2%) to scientific vocabulary
Repository
Statistics
- Stars: 3
- Watchers: 5
- Forks: 0
- Open Issues: 2
- Releases: 2
Metadata Files
README.md
InvenioRDM Harvester
This is a harvester that automatically collects works and submits them to an InvenioRDM repository. It currently works with the CaltechAUTHORS repository and harvests from CrossRef and ORCID.
Table of contents
- Introduction
- Installation
- Usage
- Known issues and limitations
- Getting help
- Contributing
- License
- Authors and history
- Acknowledgments
Introduction
Currently harvesting:
- CrossRef by ROR
- ORCID
- CrossRef DOIs
Usage
The harvests are typically run through GitHub Actions, but they can also be run on the command line.
You need a CaltechAUTHORS token available in the environment variable RDMTOK. For a CrossRef ROR harvest, type:
```bash
python harvest.py crossref
```
You can harvest a specific DOI with:
```bash
python harvest.py -doi 10.7717/peerj-cs.1023
```
For an ORCID harvest, type:
```bash
python harvest.py orcid -orcid 0000-0001-9266-5146
```
All harvests accept an -actor flag; its value is included in the message recorded when the record is added to the queue.
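For scripted runs, the invocations above could be wrapped in a small Python helper. The following is a minimal sketch, not part of the repository: the function names `build_harvest_command` and `run_harvest` and the `"doi"` mode label are hypothetical, and it assumes that `-actor` takes a value and that `harvest.py` is in the current working directory. It only uses the subcommands and flags shown in this section.

```python
import os
import subprocess
import sys


def build_harvest_command(mode, identifier=None, actor=None):
    """Return the argument list for one harvest.py invocation.

    mode is "crossref", "doi", or "orcid" (labels chosen for this sketch).
    identifier is the DOI or ORCID iD for the "doi" and "orcid" modes.
    """
    cmd = [sys.executable, "harvest.py"]
    if mode == "crossref":
        cmd.append("crossref")
    elif mode == "doi":
        if not identifier:
            raise ValueError("a DOI is required for a DOI harvest")
        cmd += ["-doi", identifier]
    elif mode == "orcid":
        if not identifier:
            raise ValueError("an ORCID iD is required for an ORCID harvest")
        cmd += ["orcid", "-orcid", identifier]
    else:
        raise ValueError(f"unknown harvest mode: {mode!r}")
    if actor:
        # Assumption: -actor is followed by a value naming who ran the harvest.
        cmd += ["-actor", actor]
    return cmd


def run_harvest(mode, identifier=None, actor=None):
    """Run harvest.py after checking that the RDMTOK token is set."""
    if not os.environ.get("RDMTOK"):
        raise RuntimeError("RDMTOK is not set; a CaltechAUTHORS token is required")
    return subprocess.run(build_harvest_command(mode, identifier, actor), check=True)
```

Calling `run_harvest("orcid", "0000-0001-9266-5146", actor="tmorrell")` would then run the ORCID harvest shown above with an actor attached. Keeping command construction separate from execution makes the argument list easy to inspect without touching the repository.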
Installation
For command-line use, you need the latest version of irdmtools installed:
```bash
curl https://caltechlibrary.github.io/irdmtools/installer.sh | sh
```
Then install the Python requirements with:
```bash
pip install -r requirements.txt
```
Known issues and limitations
While this approach should work for any InvenioRDM repository, it has only been tested on CaltechAUTHORS. If you're interested in using it with a different repository, reach out; we would be happy to make it more flexible.
Publishers use a wide variety of URLs for licenses. We are currently adding variants to the license.csv file, a custom file that maps license URLs to InvenioRDM license names. It is almost certainly incomplete.
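To illustrate the kind of lookup such a mapping file supports, here is a minimal sketch. It is not the harvester's own code: the column names `url` and `license`, the sample row, and the normalization rules are all assumptions made for this example, and the real license.csv may be laid out differently. The sketch collapses common URL variants (scheme, `www.` prefix, trailing slash, letter case) onto one key before looking them up.

```python
import csv
import io
from urllib.parse import urlsplit


def normalize_license_url(url):
    """Reduce a license URL to a lowercase key without scheme, 'www.', or trailing slash."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def load_license_map(csv_file):
    """Read rows with assumed 'url' and 'license' columns into a lookup dict."""
    reader = csv.DictReader(csv_file)
    return {normalize_license_url(row["url"]): row["license"] for row in reader}


def lookup_license(url, license_map):
    """Return the mapped license name, or None when the URL is not yet listed."""
    return license_map.get(normalize_license_url(url))


if __name__ == "__main__":
    # Hypothetical sample data standing in for license.csv.
    sample = io.StringIO(
        "url,license\n"
        "https://creativecommons.org/licenses/by/4.0/,cc-by-4.0\n"
    )
    license_map = load_license_map(sample)
    print(lookup_license("http://creativecommons.org/licenses/by/4.0", license_map))
```

Returning `None` for unknown URLs makes missing variants easy to spot, which fits a mapping that is expected to grow over time.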
Getting help
Open an issue in the issue tab.
Contributing
Pull requests are appreciated.
License
Software produced by the Caltech Library is Copyright © 2022 California Institute of Technology. This software is freely distributed under a BSD-style license. Please see the LICENSE file for more information.
Authors and history
GitHub action created by Tom Morrell. Robert Doiel and Tom Morrell wrote the source irdmtools package.
Acknowledgments
This work was funded by the California Institute of Technology Library.
Owner
- Name: Caltech Library
- Login: caltechlibrary
- Kind: organization
- Email: helpdesk@library.caltech.edu
- Location: Pasadena, CA 91125
- Website: https://www.library.caltech.edu/
- Repositories: 84
- Profile: https://github.com/caltechlibrary
We manage the physical and digital holdings of the California Institute of Technology, provide services and training, and develop open-source software.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: irdm_harvester
authors:
  - family-names: Morrell
    given-names: Thomas E
    orcid: https://orcid.org/0000-0001-9266-5146
abstract: Automatic harvester for adding content to an InvenioRDM repository.
repository-code: "https://github.com/caltechlibrary/irdm_harvester"
type: software
version: 0.2.0
license-url: "https://data.caltech.edu/license"
keywords:
  - metadata
  - CrossRef
  - InvenioRDM
date-released: 2024-02-29
GitHub Events
Total
- Issues event: 7
- Watch event: 2
- Issue comment event: 1
- Push event: 561
Last Year
- Issues event: 7
- Watch event: 2
- Issue comment event: 1
- Push event: 561
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 24
- Total pull requests: 7
- Average time to close issues: 5 months
- Average time to close pull requests: 1 minute
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 0.96
- Average comments per pull request: 0.0
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 10
- Pull requests: 2
- Average time to close issues: 2 months
- Average time to close pull requests: 1 minute
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tmorrell (17)
Pull Request Authors
- tmorrell (5)
- t4k (1)
Dependencies
- EndBug/add-and-commit v7 composite
- actions/checkout v2 composite
- caltechlibrary/codemeta2cff main composite
- EndBug/add-and-commit v9 composite
- actions/checkout v3 composite
- EndBug/add-and-commit v9 composite
- actions/checkout v3 composite
- EndBug/add-and-commit v9 composite
- actions/checkout v3 composite
- caltechdata_api >=1.4.1