irdm_harvester

Automatically harvest publications for an InvenioRDM repository

https://github.com/caltechlibrary/irdm_harvester

Science Score: 65.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization caltechlibrary has institutional domain (www.library.caltech.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: caltechlibrary
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.84 MB
Statistics
  • Stars: 3
  • Watchers: 5
  • Forks: 0
  • Open Issues: 2
  • Releases: 2
Created about 3 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Support Codemeta

README.md

InvenioRDM Harvester

This is a harvester that can automatically collect and submit works to an InvenioRDM repository. It currently works with the CaltechAUTHORS repository and looks at CrossRef and ORCID.

Introduction

Currently harvesting:

- CrossRef by ROR
- ORCID
- CrossRef DOIs

Usage

The harvests are typically run through GitHub Actions but can also be run from the command line.

You need a CaltechAUTHORS token available in the environment variable RDMTOK. For a CrossRef ROR harvest, type:

python harvest.py crossref

You can harvest a specific DOI with:

python harvest.py -doi 10.7717/peerj-cs.1023

For an ORCID harvest, type:

python harvest.py orcid -orcid 0000-0001-9266-5146

All harvests accept an -actor flag. Its value is included in the message when the record is added to the queue.
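The invocations above can be wrapped in a small script, for example to fail early when RDMTOK is missing. This is only a sketch: the helper names `build_command` and `run_harvest` are hypothetical and not part of this repository, and it assumes that -actor takes a single value and may follow the other arguments.

```python
import os
import subprocess


def build_command(kind, value=None, actor=None):
    """Build the argument list for one harvest.py run.

    kind is "crossref", "doi", or "orcid", mirroring the three
    invocations shown in the Usage section. This helper is hypothetical.
    """
    cmd = ["python", "harvest.py"]
    if kind == "crossref":
        cmd.append("crossref")
    elif kind == "doi":
        if not value:
            raise ValueError("a DOI is required for a doi harvest")
        cmd += ["-doi", value]
    elif kind == "orcid":
        if not value:
            raise ValueError("an ORCID iD is required for an orcid harvest")
        cmd += ["orcid", "-orcid", value]
    else:
        raise ValueError(f"unknown harvest type: {kind}")
    if actor:
        # Assumption: -actor takes one value and can come last.
        cmd += ["-actor", actor]
    return cmd


def run_harvest(kind, value=None, actor=None):
    """Run a harvest, refusing to start if the token is not set."""
    if not os.environ.get("RDMTOK"):
        raise RuntimeError("RDMTOK is not set; a CaltechAUTHORS token is required")
    # check=True raises CalledProcessError if harvest.py exits non-zero.
    return subprocess.run(build_command(kind, value, actor), check=True)
```

Calling `run_harvest("orcid", "0000-0001-9266-5146", actor="cron")` from the repository root would then run the ORCID harvest shown above with an actor label attached.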

Installation

For command-line use, you need the latest version of irdmtools installed:

curl https://caltechlibrary.github.io/irdmtools/installer.sh | sh

Then install the python requirements with

pip install -r requirements.txt

Known issues and limitations

While this approach should work for any InvenioRDM repository, it has only been tested on CaltechAUTHORS. If you're interested in using this with a different repository, reach out; we would be happy to make it more flexible.

Publishers use a wide variety of URLs for licenses. We are currently adding variants to the license.csv file, a custom file that maps license URLs to InvenioRDM license names. It is almost certainly incomplete.
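To illustrate the kind of mapping license.csv provides, here is a minimal sketch of a URL-to-license lookup. The actual layout of license.csv is not described here, so the two-column format without a header row, the sample rows, the normalization rules, and the function names are all assumptions made for illustration.

```python
import csv
import io


def normalize_url(url):
    """Reduce a license URL to a comparison key.

    Assumed rules: lowercase, drop the scheme and a leading "www.",
    and strip trailing slashes.
    """
    key = url.strip().lower()
    for prefix in ("https://", "http://"):
        if key.startswith(prefix):
            key = key[len(prefix):]
            break
    if key.startswith("www."):
        key = key[len("www."):]
    return key.rstrip("/")


def load_license_map(handle):
    """Read rows of (url, license name) into a dict keyed by normalized URL.

    Assumption: two columns, no header row. Shorter rows are skipped.
    """
    reader = csv.reader(handle)
    return {normalize_url(row[0]): row[1].strip() for row in reader if len(row) >= 2}


def lookup_license(url, license_map):
    """Return the mapped license name, or None if the URL is unknown."""
    return license_map.get(normalize_url(url))


# Hypothetical sample rows standing in for license.csv.
SAMPLE = io.StringIO(
    "https://creativecommons.org/licenses/by/4.0/,cc-by-4.0\n"
    "http://creativecommons.org/licenses/by-nc/4.0,cc-by-nc-4.0\n"
)
license_map = load_license_map(SAMPLE)
print(lookup_license("http://www.creativecommons.org/licenses/by/4.0", license_map))
```

Normalizing before lookup lets one row cover the http/https and trailing-slash variants of the same URL. Variants that differ in other ways would still need their own rows, and an unknown URL returns None so it can be flagged for addition.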

Getting help

Open an issue in the repository's Issues tab.

Contributing

Pull requests are appreciated.

License

Software produced by the Caltech Library is Copyright © 2022 California Institute of Technology. This software is freely distributed under a BSD-style license. Please see the LICENSE file for more information.

Authors and history

GitHub action created by Tom Morrell. Robert Doiel and Tom Morrell wrote the source irdmtools package.

Acknowledgments

This work was funded by the California Institute of Technology Library.


Owner

  • Name: Caltech Library
  • Login: caltechlibrary
  • Kind: organization
  • Email: helpdesk@library.caltech.edu
  • Location: Pasadena, CA 91125

We manage the physical and digital holdings of the California Institute of Technology, provide services and training, and develop open-source software.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: irdm_harvester
authors:
  - family-names: Morrell
    given-names: Thomas E
    orcid: https://orcid.org/0000-0001-9266-5146
abstract: Automatic harvester for adding content to an InvenioRDM repository.
repository-code: "https://github.com/caltechlibrary/irdm_harvester"
type: software
version: 0.2.0
license-url: "https://data.caltech.edu/license"
keywords:
  - metadata
  - CrossRef
  - InvenioRDM
date-released: 2024-02-29

GitHub Events

Total
  • Issues event: 7
  • Watch event: 2
  • Issue comment event: 1
  • Push event: 561
Last Year
  • Issues event: 7
  • Watch event: 2
  • Issue comment event: 1
  • Push event: 561

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 24
  • Total pull requests: 7
  • Average time to close issues: 5 months
  • Average time to close pull requests: 1 minute
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.96
  • Average comments per pull request: 0.0
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 10
  • Pull requests: 2
  • Average time to close issues: 2 months
  • Average time to close pull requests: 1 minute
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tmorrell (17)
Pull Request Authors
  • tmorrell (5)
  • t4k (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/codemeta2cff.yml actions
  • EndBug/add-and-commit v7 composite
  • actions/checkout v2 composite
  • caltechlibrary/codemeta2cff main composite
.github/workflows/crossref_ror.yaml actions
  • EndBug/add-and-commit v9 composite
  • actions/checkout v3 composite
.github/workflows/doi.yaml actions
  • EndBug/add-and-commit v9 composite
  • actions/checkout v3 composite
.github/workflows/orcid.yaml actions
  • EndBug/add-and-commit v9 composite
  • actions/checkout v3 composite
requirements.txt pypi
  • caltechdata_api >=1.4.1