epxml_to_datacite
Transform Eprints XML to DataCite XML and mint DOIs in Eprints repositories
Science Score: 52.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: Organization caltechlibrary has institutional domain (www.library.caltech.edu)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.1%) to scientific vocabulary
Repository
Statistics
- Stars: 5
- Watchers: 5
- Forks: 1
- Open Issues: 0
- Releases: 24
Metadata Files
README.md
epxml_to_datacite
Convert Eprints XML to DataCite XML and mint DOIs. Only tested on Caltech repositories.
Contents
- caltech_thesis - Generate DataCite metadata and DOIs from CaltechTHESIS
- caltech_authors_tech_report - Generate DataCite metadata and DOIs from CaltechAUTHORS tech reports
- caltech_authors_to_data - Make DataCite metadata for data files in CaltechAUTHORS
Setup
Prerequisites
You need Python 3.7 or later on your machine
(Miniconda is a great
installation option). Test whether you have Python installed by opening a terminal or
Anaconda Prompt window and typing python -V, which should print version 3.7
or greater. It's best to download this software using git. To install git, type
conda install git in your terminal or Anaconda Prompt window.
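If you'd rather check the version from inside Python, here is a minimal sketch. The helper name python_is_recent_enough is made up for this example and is not part of epxml_to_datacite.

```python
import sys

MIN_VERSION = (3, 7)  # minimum version named in the prerequisites


def python_is_recent_enough(version_info=None, minimum=MIN_VERSION):
    """Return True when the (major, minor) version is at least `minimum`.

    Hypothetical helper for illustration; not part of epxml_to_datacite.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


# Report on the interpreter that is running this snippet.
status = "OK" if python_is_recent_enough() else "is too old"
print("Python", sys.version.split()[0], status)
```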
Clone epxml_to_datacite
Find where you want the epxml_to_datacite folder to live on your computer in File Explorer or Finder
(this could be the Desktop or Documents folder, for example). Type cd followed by a space
in Anaconda Prompt or terminal and drag the location from the file browser into
the terminal window. The path to the location
will show up, so your terminal will show a command like
cd /Users/tmorrell/Desktop. Press Enter. Then type
git clone https://github.com/caltechlibrary/epxml_to_datacite.git. Once you
press Enter you'll see an epxml_to_datacite folder. Type cd epxml_to_datacite
Install
Now that you're in the epxml_to_datacite folder, type python setup.py install
to install dependencies.
If you're on a Mac, you'll need to authorize the underlying eputil application.
Open the epxml_to_datacite directory in Finder, open the epxml_support
directory, then right-click eputil and select 'Open'. Confirm that you
authorize the executable. This is a one-time installation step.
If you will be minting DOIs, you need to create a file called pw using a text
editor that contains your DataCite password. The username is hardcoded in the
script, since non-Caltech users will have to modify the script to work with
their Eprints installation. If you don't have a text editor on your machine, type
conda install -c swc nano
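If you'd rather create the pw file from Python than from a text editor, the sketch below is one way to do it. It rests on assumptions: the helper name write_password_file is made up, the file is assumed to live in the folder you run the scripts from, and it writes the password with no trailing newline because this README doesn't say how the scripts read the file. The owner-only permissions apply on macOS and Linux; Windows mostly ignores them.

```python
import os
from pathlib import Path


def write_password_file(password, folder="."):
    """Write `password` to a file named pw inside `folder`.

    Hypothetical helper for illustration; not part of epxml_to_datacite.
    """
    path = Path(folder) / "pw"
    # Create (or overwrite) the file with owner-only read/write permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf8") as handle:
        handle.write(password)
    return path
```

For example, write_password_file("your-datacite-password") creates pw in the current folder. Avoid leaving the real password in a saved script or in your shell history.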
Updating
When there is a new version of the software, go to the epxml_to_datacite
folder in anaconda prompt or terminal and type git pull. You shouldn't need to re-do
the installation steps unless there are major updates.
Options
There are three different scripts:
- caltech_thesis.py
- caltech_authors_to_data.py (Prepares metadata from CaltechAUTHORS for submission to CaltechDATA)
- caltech_authors_tech_report.py (Prepares metadata from CaltechAUTHORS tech reports with monograph item type (Report or Paper))
In this documentation we use caltech_thesis.py as the example script, but in most cases you can substitute one of the other sources.
Basic operation
If you have Eprints XML files (from thesis.library.caltech.edu/rest/eprint/1234.xml, for example), put them in the epxml_to_datacite folder. Type
python caltech_thesis.py
You'll get a '_datacite.xml' file for each XML file in the folder.
Downloading Eprints XML
You can use Eprints ids (e.g. 9690) to download Eprints xml files by adding a
-ids option to any command.
python caltech_thesis.py -ids 9690
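To see which address an id corresponds to, here is a small sketch that builds the REST URL in the pattern shown under Basic operation. The function name eprint_xml_url and the https scheme are assumptions for illustration. The scripts may build their URLs differently, and the REST endpoint may require a repository login.

```python
def eprint_xml_url(eprint_id, host="thesis.library.caltech.edu"):
    """Build an Eprints REST URL for one record.

    Hypothetical helper; the https scheme is an assumption.
    """
    return f"https://{host}/rest/eprint/{eprint_id}.xml"


print(eprint_xml_url(9690))
```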
Alternatively, you can provide a tsv file, where the first column is the Eprints
id, using the -id_file option
python caltech_thesis.py -id_file ids.tsv
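To preview which ids an id file would supply, a minimal sketch like the following reads the first column of tab-separated text. The function name read_eprint_ids is made up, and the sketch assumes the file has no header row. The scripts' own parsing may differ.

```python
import csv
import io


def read_eprint_ids(tsv_text):
    """Return the first-column values from tab-separated text.

    Hypothetical helper for illustration; assumes no header row.
    """
    ids = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip blank lines and rows with an empty first column.
        if row and row[0].strip():
            ids.append(row[0].strip())
    return ids


sample = "9690\tFirst thesis\n9691\tSecond thesis\n"
print(read_eprint_ids(sample))
```

To check a real file, pass in its contents, for example read_eprint_ids(open("ids.tsv", encoding="utf8").read()).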
Mint DOIs
You can also have the script submit the metadata to DataCite and add the DOI to the source repository. Add the -mint
option, and if you want to make test DOIs, also add the -test option to the command line.
python caltech_thesis.py -mint -ids 9690
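If you need to mint DOIs for several records from another Python script, one option is to assemble the command lines first and review them before running anything. This is a sketch under stated assumptions: build_commands is a made-up helper, and it emits one command per id because this README only shows -ids with a single value.

```python
def build_commands(ids, script="caltech_thesis.py", mint=False, test=False):
    """Return one argument list per Eprints id, using the documented options.

    Hypothetical helper for illustration; not part of epxml_to_datacite.
    """
    commands = []
    for eprint_id in ids:
        command = ["python", script]
        if mint:
            command.append("-mint")
        if test:
            command.append("-test")
        command += ["-ids", str(eprint_id)]
        commands.append(command)
    return commands


# Print the commands for review; nothing is executed here.
for command in build_commands([9690], mint=True, test=True):
    print(" ".join(command))
```

Each list can then be passed to subprocess.run(command, check=True) from inside the epxml_to_datacite folder. Trying it with test=True first avoids minting real DOIs by accident.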
Custom Prefixes
caltech_authors_tech_report.py has support for alternative DOI prefixes. By
adding the -prefix option you can mint a DOI for any of the DataCite prefixes
controlled by the library.
python caltech_authors_tech_report.py -prefix 10.26206 -ids 99015
Custom prefixes can also trigger metadata changes. For example, the publisher for prefix 10.26206 is the Keck Institute for Space Studies.
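The idea of a prefix triggering a metadata change can be pictured as a simple lookup table. The sketch below is illustrative only: the names PUBLISHER_BY_PREFIX and publisher_for are made up, and the script's actual implementation may look quite different. Only the 10.26206 entry comes from this README.

```python
# Illustrative mapping; only this one entry is documented in the README.
PUBLISHER_BY_PREFIX = {
    "10.26206": "Keck Institute for Space Studies",
}


def publisher_for(doi_prefix, default=None):
    """Return the publisher override for a DOI prefix, or `default` if none."""
    return PUBLISHER_BY_PREFIX.get(doi_prefix, default)


print(publisher_for("10.26206"))
```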
Advanced
You can also import the metadata transformation function into another python
script by including from caltech_thesis import epxml_to_datacite at the top of your new script.
Then you will be able to call epxml_to_datacite(eprint), where eprint is an
xml file parsed by something like:
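import xmltodict
from caltech_thesis import epxml_to_datacite

# Parse the Eprints XML and pull out the single eprint record
with open('10271.xml', encoding="utf8") as infile:
    eprint = xmltodict.parse(infile.read())['eprints']['eprint']
datacite = epxml_to_datacite(eprint)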
Owner
- Name: Caltech Library
- Login: caltechlibrary
- Kind: organization
- Email: helpdesk@library.caltech.edu
- Location: Pasadena, CA 91125
- Website: https://www.library.caltech.edu/
- Repositories: 84
- Profile: https://github.com/caltechlibrary
We manage the physical and digital holdings of the California Institute of Technology, provide services and training, and develop open-source software.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: epxml_to_datacite
authors:
  - family-names: Morrell
    given-names: Thomas E
    orcid: https://orcid.org/0000-0001-9266-5146
abstract: Transform Eprints XML to DataCite XML.
repository-code: "https://github.com/caltechlibrary/epxml_to_datacite"
type: software
version: 1.2.0
license-url: "https://data.caltech.edu/license"
keywords:
- GitHub
- Eprints
- metadata
- DataCite
- DOI
- XML
- software
date-released: 2022-08-29
CodeMeta (codemeta.json)
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"@type": "SoftwareSourceCode",
"description": "Transform Eprints XML to DataCite XML.",
"name": "epxml_to_datacite",
"codeRepository": "https://github.com/caltechlibrary/epxml_to_datacite",
"issueTracker": "https://github.com/caltechlibrary/epxml_to_datacite/issues",
"license": "https://data.caltech.edu/license",
"version": "1.2.0",
"author": [
{
"@type": "Person",
"givenName": "Thomas E",
"familyName": "Morrell",
"affiliation": {
"@type": "Organization",
"name": "Caltech Library"
},
"email": "tmorrell@caltech.edu",
"@id": "https://orcid.org/0000-0001-9266-5146"
}
],
"developmentStatus": "active",
"downloadUrl": "https://github.com/caltechlibrary/epxml_to_datacite/releases",
"keywords": [
"GitHub",
"Eprints",
"metadata",
"DataCite",
"DOI",
"XML",
"software"
],
"maintainer": "https://orcid.org/0000-0001-9266-5146",
"programmingLanguage": "Python"
}
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0