epxml_to_datacite

Transform Eprints XML to DataCite XML and mint DOIs in Eprints repositories

https://github.com/caltechlibrary/epxml_to_datacite

Science Score: 52.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization caltechlibrary has institutional domain (www.library.caltech.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary

Keywords

datacite datacite-xml eprints

Repository


Basic Info
  • Host: GitHub
  • Owner: caltechlibrary
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 26.6 MB
Statistics
  • Stars: 5
  • Watchers: 5
  • Forks: 1
  • Open Issues: 0
  • Releases: 24
Topics
datacite datacite-xml eprints
Created almost 8 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Code of conduct Citation Codemeta

README.md

epxml_to_datacite


Convert Eprints XML to DataCite XML and mint DOIs. Only tested on Caltech repositories.

Contents

  • caltech_thesis - Generate DataCite metadata and DOIs from CaltechTHESIS
  • caltech_authors_tech_report - Generate DataCite metadata and DOIs from CaltechAUTHORS tech reports
  • caltech_authors_to_data - Make DataCite metadata for data files in CaltechAUTHORS

Setup

Prerequisites

You need Python 3.7 or greater on your machine (Miniconda is a great installation option). Test whether you have Python installed by opening a terminal or Anaconda Prompt window and typing python -V, which should print version 3.7 or greater. It's best to download this software using git. To install git, type conda install git in your terminal or Anaconda Prompt window.

Clone epxml_to_datacite

Find where you want the epxml_to_datacite folder to live on your computer in File Explorer or Finder (this could be the Desktop or Documents folder, for example). Type cd followed by a space in Anaconda Prompt or terminal and drag the location from the file browser into the terminal window. The path to the location will show up, so your terminal will show a command like cd /Users/tmorrell/Desktop. Hit enter. Then type git clone https://github.com/caltechlibrary/epxml_to_datacite.git. Once you hit enter, you'll see an epxml_to_datacite folder. Type cd epxml_to_datacite and hit enter.

Install

Now that you're in the epxml_to_datacite folder, type python setup.py install to install dependencies.

If you're on a Mac, you'll need to authorize the underlying eputil application. Open the epxml_to_datacite directory in Finder, open the epxml_support directory, right-click on eputil, and select 'Open'. Agree that you authorize the executable. This is a one-time installation step.

If you will be minting DOIs, use a text editor to create a file called pw that contains your DataCite password. The username is hardcoded in the script, since non-Caltech users will have to modify the script to work with their Eprints installation. If you don't have a text editor on your machine, type conda install -c swc nano to install one.
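
As an illustration of the password file step, the following sketch reads a password from a file named pw and strips the trailing newline that most text editors add. The function name read_password is hypothetical and not part of this repository; the scripts themselves may read the file differently.

```python
from pathlib import Path


def read_password(path="pw"):
    """Return the contents of the password file without surrounding whitespace."""
    # Hypothetical helper for illustration only, not part of epxml_to_datacite.
    return Path(path).read_text(encoding="utf8").strip()
```

Stripping whitespace means a stray newline at the end of the file does not end up in the password.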

Updating

When there is a new version of the software, go to the epxml_to_datacite folder in Anaconda Prompt or terminal and type git pull. You shouldn't need to redo the installation steps unless there are major updates.

Options

There are three different scripts:

  • caltech_thesis.py
  • caltech_authors_to_data.py (Prepares metadata from CaltechAUTHORS for submission to CaltechDATA)
  • caltech_authors_tech_report.py (Prepares metadata from CaltechAUTHORS tech reports with monograph item type (Report or Paper))

In this documentation we use caltech_thesis.py as the example script, but in most cases you can substitute one of the other scripts.

Basic operation

If you have Eprints XML files (from thesis.library.caltech.edu/rest/eprint/1234.xml, for example), put them in the epxml_to_datacite folder. Type

python caltech_thesis.py

and you'll get a '_datacite.xml' file for each XML file in the folder.
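
To preview which files a run would pick up, a small sketch like the one below lists the XML files in a folder next to a likely output name. The naming scheme (1234.xml becoming 1234_datacite.xml) is an assumption based on the '_datacite.xml' suffix mentioned above, and planned_outputs is a hypothetical helper rather than part of the scripts.

```python
from pathlib import Path


def planned_outputs(folder="."):
    """Map each Eprints XML file in `folder` to an assumed output file name."""
    outputs = {}
    for source in sorted(Path(folder).glob("*.xml")):
        # Skip files that already look like generated DataCite output.
        if source.stem.endswith("_datacite"):
            continue
        # Assumed naming scheme: 1234.xml -> 1234_datacite.xml
        outputs[source.name] = f"{source.stem}_datacite.xml"
    return outputs
```

Calling planned_outputs() with no argument inspects the current folder.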

Downloading Eprints XML

You can use Eprints ids (e.g. 9690) to download Eprints XML files by adding the -ids option to any command.

python caltech_thesis.py -ids 9690
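
For readers who want to fetch a record by hand, here is a minimal sketch of what a download by id could look like. It builds on the REST address pattern shown under Basic operation. The https scheme, the function names, and the absence of authentication are all assumptions; the script's own download logic may differ, and some repositories require a login for REST access.

```python
from urllib.request import urlopen


def eprint_url(eprint_id, host="thesis.library.caltech.edu"):
    """Build the REST URL for one Eprints record."""
    # https is an assumption; the README gives the address without a scheme.
    return f"https://{host}/rest/eprint/{eprint_id}.xml"


def download_eprint(eprint_id, host="thesis.library.caltech.edu"):
    """Save the record as <id>.xml in the current folder and return the file name."""
    filename = f"{eprint_id}.xml"
    with urlopen(eprint_url(eprint_id, host)) as response:
        data = response.read()
    with open(filename, "wb") as outfile:
        outfile.write(data)
    return filename
```

For example, eprint_url(9690) gives the address for record 9690 on the thesis repository.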

Alternatively, you can use the -id_file option to provide a TSV file whose first column contains the Eprints ids.

python caltech_thesis.py -id_file ids.tsv
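
To check what an id file contains before running the script, a sketch along these lines reads the first column of a tab-separated file. The function name read_ids and the rule of skipping non-numeric first cells (such as a header row) are assumptions for illustration; the script may treat headers differently.

```python
import csv


def read_ids(tsv_path):
    """Return the first-column values of a tab-separated file as strings."""
    ids = []
    with open(tsv_path, newline="", encoding="utf8") as infile:
        for row in csv.reader(infile, delimiter="\t"):
            # Ignore blank lines and rows whose first cell is not numeric
            # (for example a header row); this filtering is an assumption.
            if row and row[0].strip().isdigit():
                ids.append(row[0].strip())
    return ids
```

Any additional columns in the file are ignored, so a spreadsheet export with titles or notes beside the ids still works with this sketch.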

Mint DOIs

You can also have the script submit the metadata to DataCite and add the DOI to the source repository. Add the -mint option to the command line; if you want to make test DOIs, also add the -test option.

python caltech_thesis.py -mint -ids 9690

Custom Prefixes

caltech_authors_tech_report.py has support for alternative DOI prefixes. By adding the -prefix option you can mint a DOI for any of the DataCite prefixes controlled by the library.

python caltech_authors_tech_report.py -prefix 10.26206 -ids 99015

Custom prefixes can also trigger metadata changes. For example, the publisher for prefix 10.26206 is the Keck Institute for Space Studies.
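
One way to picture a prefix-driven metadata change is a lookup table from DOI prefix to publisher. The sketch below is illustrative only: the names PUBLISHER_BY_PREFIX and publisher_for are hypothetical, only the 10.26206 entry comes from this README, and the script may implement the override differently.

```python
# Only the 10.26206 entry comes from the README; other prefixes would be added here.
PUBLISHER_BY_PREFIX = {
    "10.26206": "Keck Institute for Space Studies",
}


def publisher_for(prefix, default_publisher):
    """Return the publisher override for a DOI prefix, or the default."""
    return PUBLISHER_BY_PREFIX.get(prefix, default_publisher)
```

A prefix without an entry falls back to whatever default publisher the caller supplies.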

Advanced

You can also import the metadata transformation function into another Python script by including from caltech_thesis import epxml_to_datacite at the top of your new script. Then you will be able to call epxml_to_datacite(eprint), where eprint is an XML file parsed by something like:

import xmltodict
from caltech_thesis import epxml_to_datacite

# Parse the Eprints XML file and select the single eprint record
with open('10271.xml', encoding="utf8") as infile:
    eprint = xmltodict.parse(infile.read())['eprints']['eprint']
datacite = epxml_to_datacite(eprint)

Owner

  • Name: Caltech Library
  • Login: caltechlibrary
  • Kind: organization
  • Email: helpdesk@library.caltech.edu
  • Location: Pasadena, CA 91125

We manage the physical and digital holdings of the California Institute of Technology, provide services and training, and develop open-source software.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: epxml_to_datacite
authors:
  - family-names: Morrell
    given-names: Thomas E
    orcid: https://orcid.org/0000-0001-9266-5146
abstract: Transform Eprints XML to DataCite XML.
repository-code: "https://github.com/caltechlibrary/epxml_to_datacite"
type: software
version: 1.2.0
license-url: "https://data.caltech.edu/license"
keywords:
  - GitHub
  - Eprints
  - metadata
  - DataCite
  - DOI
  - XML
  - software
date-released: 2022-08-29

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "description": "Transform Eprints XML to DataCite XML.",
  "name": "epxml_to_datacite",
  "codeRepository": "https://github.com/caltechlibrary/epxml_to_datacite",
  "issueTracker": "https://github.com/caltechlibrary/epxml_to_datacite/issues",
  "license": "https://data.caltech.edu/license",
  "version": "1.2.0",
  "author": [
    {
      "@type": "Person",
      "givenName": "Thomas E",
      "familyName": "Morrell",
      "affiliation": {
        "@type": "Organization",
        "name": "Caltech Library"
      },
      "email": "tmorrell@caltech.edu",
      "@id": "https://orcid.org/0000-0001-9266-5146"
    }
  ],
  "developmentStatus": "active",
  "downloadUrl": "https://github.com/caltechlibrary/epxml_to_datacite/releases",
  "keywords": [
    "GitHub",
    "Eprints",
    "metadata",
    "DataCite",
    "DOI",
    "XML",
    "software"
  ],
  "maintainer": "https://orcid.org/0000-0001-9266-5146",
  "programmingLanguage": "Python"
}

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0