inveniordm-migrate
Scripts to migrate content into Invenio RDM
Statistics
- Stars: 2
- Watchers: 6
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Assorted scripts to migrate content to InvenioRDM and S3 data sources
This repo holds scripts used to migrate content into InvenioRDM. These have generally been used for one-time migration activities, but may be useful in the future.
Usage
CaltechDATA
migrate_caltechdata.py was used to move records from the TIND-managed
Invenio instance to InvenioRDM.
CaltechTHESIS
migrate_caltechthesis.py was used to create some minimal test records in
InvenioRDM. It is not complete.
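As an illustration of what a minimal test record might look like, the sketch below builds a small metadata payload and a request that would create a draft through the InvenioRDM REST API (POST to /api/records with a bearer token). This is not taken from migrate_caltechthesis.py: the function names, the example values, and the "publication-thesis" resource type id are assumptions, and the fields your instance requires may differ. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_minimal_record(title, family_name, given_name, publication_date):
    """Return a small InvenioRDM-style record (hypothetical helper)."""
    return {
        "access": {"record": "public", "files": "public"},
        # Metadata-only record: no files are attached in this sketch.
        "files": {"enabled": False},
        "metadata": {
            "title": title,
            "publication_date": publication_date,
            # Assumed vocabulary id; check the resource types on your instance.
            "resource_type": {"id": "publication-thesis"},
            "creators": [
                {
                    "person_or_org": {
                        "type": "personal",
                        "family_name": family_name,
                        "given_name": given_name,
                    }
                }
            ],
        },
    }


def build_draft_request(base_url, token, record):
    """Build (but do not send) the request that would create a draft."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/records",
        data=json.dumps(record).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    record = build_minimal_record("Test thesis", "Doe", "Jane", "2023-01-01")
    # Placeholder host and token, for illustration only.
    request = build_draft_request("https://rdm.example.org", "TOKEN", record)
    print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would submit it; publishing the resulting draft is a separate API call.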
OSN Migration
For large collections of data we sometimes need to move the data first, and then create InvenioRDM records. An S3 object store like the Open Storage Network is a great option. You can bulk move records efficiently with s5cmd and the management scripts.
Run python make_command.py to generate a list of files to sync. You'll need
to set these environment variables:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- S3_ENDPOINT_URL (https://renc.osn.xsede.org)
- AWS_REGION (us-east-1)
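Before starting a long transfer it can help to confirm that these variables are actually set. The following is a minimal sketch of such a check; the helper name missing_variables is hypothetical and is not part of the scripts in this repo.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_ENDPOINT_URL",
    "AWS_REGION",
)


def missing_variables(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```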
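Then run the generated commands with
nohup ./s5cmd -numworkers 100 run commands.txt >>& log2017.txt ; echo Done >>& log2017.txt &
The >>& redirection is csh/tcsh syntax that appends both standard output and standard error to the log file. You may be able to adjust the numworkers value depending on the OS.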
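The file passed to s5cmd run contains one s5cmd command per line, such as a cp from a local path to an s3:// destination. The sketch below shows one way such a file could be generated. It is not the actual make_command.py: the function name, directory, bucket, and prefix are assumptions for illustration, and paths are assumed to contain no spaces.

```python
from pathlib import PurePosixPath


def make_commands(relative_paths, source_dir, bucket, prefix=""):
    """Return one s5cmd 'cp' command line per file (hypothetical helper)."""
    commands = []
    for rel in relative_paths:
        source = PurePosixPath(source_dir) / rel
        # The object key mirrors the relative path under an optional prefix.
        key = PurePosixPath(prefix) / rel
        commands.append(f"cp {source} s3://{bucket}/{key}")
    return commands


if __name__ == "__main__":
    # Placeholder file list, bucket, and prefix, for illustration only.
    lines = make_commands(
        ["2017/a.txt", "2017/b.txt"], "/data", "example-bucket", "archive"
    )
    with open("commands.txt", "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

Writing all commands to a single file lets s5cmd spread them across its workers instead of paying process start-up costs for every file.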
Getting help
Raise an issue on the issue tracker.
License
Software produced by the Caltech Library is Copyright (C) 2023, Caltech. This software is freely distributed under a BSD/MIT type license. Please see the LICENSE file for more information.
Authors and history
These scripts were written by Tom Morrell.
Acknowledgments
This work was funded by the California Institute of Technology Library.
Owner
- Name: Caltech Library
- Login: caltechlibrary
- Kind: organization
- Email: helpdesk@library.caltech.edu
- Location: Pasadena, CA 91125
- Website: https://www.library.caltech.edu/
- Repositories: 84
- Profile: https://github.com/caltechlibrary
We manage the physical and digital holdings of the California Institute of Technology, provide services and training, and develop open-source software.
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 9
- Total pull requests: 0
- Average time to close issues: about 1 month
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tmorrell (9)