inveniordm-migrate

Scripts to migrate content into Invenio RDM

https://github.com/caltechlibrary/inveniordm-migrate

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: caltechlibrary
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 912 KB
Statistics
  • Stars: 2
  • Watchers: 6
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 6 years ago · Last pushed about 3 years ago
Metadata Files
Readme Changelog Contributing License Code of conduct Support Codemeta

README.md

Assorted scripts to migrate content to InvenioRDM and S3 data sources

This repository holds scripts used to migrate content into InvenioRDM. They were generally written for one-time migrations but may be useful again in the future.



Usage

CaltechDATA

migrate_caltechdata.py was used to move records from the TIND-managed Invenio instance to InvenioRDM.
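The script's actual field mapping is not reproduced here. As an illustration only, a crosswalk from a simplified legacy record to a minimal InvenioRDM draft body might look like the sketch below. The input fields (title, publication_date, creators as "Family, Given" strings, files) are hypothetical, and every record is assumed to be a dataset; the output keys follow InvenioRDM's record JSON (access, files, metadata), but check them against your instance's schema.

```python
import json


def to_rdm_draft(old):
    """Map a simplified legacy record onto a minimal InvenioRDM draft body.

    The input field names (title, publication_date, creators, files) are
    hypothetical; creators are assumed to be "Family, Given" strings.
    """
    creators = []
    for name in old.get("creators", []):
        family, _, given = name.partition(",")
        person = {"type": "personal", "family_name": family.strip()}
        if given.strip():
            person["given_name"] = given.strip()
        creators.append({"person_or_org": person})
    return {
        "access": {"record": "public", "files": "public"},
        # Enable files only when the legacy record actually had some.
        "files": {"enabled": bool(old.get("files"))},
        "metadata": {
            # Assumption: every migrated record is a dataset.
            "resource_type": {"id": "dataset"},
            "title": old["title"],
            "publication_date": old["publication_date"],
            "creators": creators,
        },
    }


# Hypothetical legacy record used only to show the shape of the output.
legacy = {
    "title": "Example dataset",
    "publication_date": "2019-05-01",
    "creators": ["Doe, Jane"],
    "files": ["data.csv"],
}
print(json.dumps(to_rdm_draft(legacy), indent=2))
```

Keeping the crosswalk as a pure function makes it easy to test each mapping rule before any records are sent to the server.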

CaltechTHESIS

migrate_caltechthesis.py was used to create some minimal test records in InvenioRDM. It is not complete.
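For reference, creating a minimal test record through InvenioRDM's REST API is a two-step process: create a draft, then publish it. The sketch below only builds the requests (it does not send them), so it can be checked offline. The base URL, token, placeholder creator and date, and the "publication-thesis" resource type (assumed to exist in the default vocabulary) are all assumptions, not taken from migrate_caltechthesis.py.

```python
import json
import urllib.request


def create_draft_request(base_url, token, title, resource_type="publication-thesis"):
    """Build (but do not send) a POST that creates a minimal draft record.

    The resource type id assumes InvenioRDM's default vocabulary; adjust it
    to match your instance.
    """
    body = {
        "access": {"record": "public", "files": "public"},
        "files": {"enabled": False},  # metadata-only test record
        "metadata": {
            "resource_type": {"id": resource_type},
            "title": title,
            "publication_date": "2020-01-01",  # placeholder date
            "creators": [{
                "person_or_org": {
                    "type": "personal",
                    "family_name": "Tester",  # placeholder creator
                    "given_name": "Test",
                }
            }],
        },
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/records",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


def publish_request(base_url, token, record_id):
    """Build a POST that publishes an existing draft."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/records/{record_id}/draft/actions/publish",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```

To actually create a record, pass the first request to urllib.request.urlopen, read the new record's "id" from the JSON response, and then send publish_request with that id.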

OSN Migration

For large collections of data we sometimes need to move the data first, and then create InvenioRDM records. An S3 object store like the Open Storage Network is a great option. You can move data in bulk efficiently with s5cmd and the management scripts.

Run python make_command.py to generate a list of files to sync. You'll need to set the following environment variables:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • S3_ENDPOINT_URL: https://renc.osn.xsede.org
  • AWS_REGION: us-east-1
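make_command.py itself is not reproduced here. As a rough sketch of the same idea, assuming the data sits under a local directory and goes to an S3 prefix (both hypothetical), the code below checks the environment variables listed above and writes one s5cmd cp command per file, which is the one-command-per-line format that s5cmd run reads. It also assumes s5cmd splits each run-file line with shell-style quoting, so paths are passed through shlex.quote.

```python
import os
import shlex
from pathlib import Path

# Variables the README says must be set before running s5cmd.
REQUIRED_ENV = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
                "S3_ENDPOINT_URL", "AWS_REGION")


def missing_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


def build_commands(files, source_dir, dest_prefix):
    """Turn local file paths into s5cmd 'cp' lines, one per file.

    Object keys mirror each file's path relative to source_dir.
    shlex.quote adds quotes only when a path contains spaces or other
    shell-special characters.
    """
    source = Path(source_dir)
    prefix = dest_prefix.rstrip("/")
    lines = []
    for path in sorted(Path(p) for p in files):
        key = path.relative_to(source).as_posix()
        lines.append(f"cp {shlex.quote(str(path))} {shlex.quote(prefix + '/' + key)}")
    return lines


def write_commands(source_dir, dest_prefix, out_path="commands.txt"):
    """Walk source_dir, write an s5cmd run file, and return the command count."""
    files = [p for p in Path(source_dir).rglob("*") if p.is_file()]
    lines = build_commands(files, source_dir, dest_prefix)
    Path(out_path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

A typical run would first confirm that missing_env() returns an empty list and then call something like write_commands("/data/2017", "s3://example-bucket/2017"), where both paths are placeholders for your own source directory and bucket.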

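Then run the commands in the background with:

    nohup ./s5cmd -numworkers 100 run commands.txt >>& log2017.txt ; echo Done >>& log2017.txt &

The >>& redirection (append both standard output and standard error) is csh/tcsh syntax; in bash, use >> log2017.txt 2>&1 instead. You may need to adjust the -numworkers value depending on the operating system.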
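Because the transfer runs under nohup in the background, the log file is the main record of what happened. A small helper like the following can report whether the trailing echo Done has run and which operations failed. It is a sketch that assumes s5cmd starts failure lines with "ERROR"; confirm that against your own log before relying on it.

```python
from pathlib import Path


def summarize_log_text(text):
    """Summarize s5cmd output appended to a log file.

    Assumes failed operations are logged on lines starting with "ERROR"
    and that the README's trailing `echo Done` writes a line "Done".
    """
    lines = text.splitlines()
    errors = [line for line in lines if line.startswith("ERROR")]
    finished = any(line.strip() == "Done" for line in lines)
    return {"lines": len(lines), "errors": errors, "finished": finished}


def summarize_log(path="log2017.txt"):
    """Read the log, replacing undecodable bytes, and summarize it."""
    return summarize_log_text(Path(path).read_text(errors="replace"))
```

Checking "finished" before "errors" avoids treating a run that is still in progress as a clean one.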

Getting help

Raise an issue on the issue tracker.

License

Software produced by the Caltech Library is Copyright (C) 2023, Caltech. This software is freely distributed under a BSD/MIT type license. Please see the LICENSE file for more information.

Authors and history

These scripts were written by Tom Morrell.

Acknowledgments

This work was funded by the California Institute of Technology Library.


Owner

  • Name: Caltech Library
  • Login: caltechlibrary
  • Kind: organization
  • Email: helpdesk@library.caltech.edu
  • Location: Pasadena, CA 91125

We manage the physical and digital holdings of the California Institute of Technology, provide services and training, and develop open-source software.


Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 9
  • Total pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tmorrell (9)
Pull Request Authors
Top Labels
Issue Labels
CaltechDATA (8) CaltechTHESIS (1)
Pull Request Labels

Dependencies

setup.py pypi