ames

Automated Metadata Service

https://github.com/caltechlibrary/ames

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
    Organization caltechlibrary has institutional domain (www.library.caltech.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.3%) to scientific vocabulary

Keywords from Contributors

inveniordm archives
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: caltechlibrary
  • License: other
  • Language: Python
  • Default Branch: main
  • Size: 66.3 MB
Statistics
  • Stars: 5
  • Watchers: 7
  • Forks: 4
  • Open Issues: 9
  • Releases: 35
Created over 8 years ago · Last pushed 7 months ago
Metadata Files
  • Readme
  • License
  • Code of conduct
  • Citation
  • Codemeta

README.md

ames

Automated Metadata Service

Manage metadata from different sources. The examples in the package are specific to Caltech repositories, but could be generalized. This package is currently in development and will have additional sources and matchers added over time.

Install

You need to have Python 3.7 or later on your machine.

If you only need the Python functions to write your own code (such as codemeta_to_datacite), open a terminal and type pip install ames.

Full Install

The full install will include all the example scripts. You need to have Python 3.7 or later on your machine and git.

Clone ames

A full install starts by downloading this software with git. In File Explorer or Finder, find where you want the ames folder to live on your computer (the Desktop or Documents folder, for example). Type cd followed by a space in Anaconda Prompt or a terminal, then drag the location from the file browser into the terminal window. The path to the location will appear, so your terminal will show a command like cd /Users/tmorrell/Desktop. Press Enter. Then type git clone https://github.com/caltechlibrary/ames.git and press Enter; an ames folder will appear. Finally, type cd ames.

Install

Now that you're in the ames folder, type python setup.py install. You can now run all the different operations described below.

Updating

When there is a new version of the software, go to the ames folder in anaconda prompt or terminal and type git pull. You shouldn't need to re-do the installation steps unless there are major updates.

Organization

Harvesters

  • crossref_refs - Harvest references in datacite metadata from crossref event data
  • caltechdata - Harvest metadata from CaltechDATA
  • cd_github - Harvest GitHub repos and codemeta files from CaltechDATA
  • matomo - Harvest web statistics from matomo
  • caltechfeeds - Harvest Caltech Library metadata from feeds.library.caltech.edu

Matchers

  • caltechdata - Match content in CaltechDATA
  • update_datacite - Match content in DataCite

Example Operations

The run scripts show examples of using ames to perform a specific update operation.

CodeMeta management

In the test directory there is an example of using the codemeta_to_datacite function to convert a codemeta file to DataCite standard metadata.
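As a rough illustration of what such a conversion involves, the sketch below maps a handful of CodeMeta fields onto DataCite-style fields. It is a hand-rolled example, not the package's own converter: the function name `sketch_codemeta_to_datacite` is hypothetical, and the choice of which fields to map is an assumption made for brevity.

```python
def sketch_codemeta_to_datacite(codemeta):
    """Map a few CodeMeta fields to DataCite-style fields (illustrative only)."""
    creators = []
    for person in codemeta.get("author", []):
        given = person.get("givenName", "")
        family = person.get("familyName", "")
        creator = {
            "name": f"{family}, {given}",
            "givenName": given,
            "familyName": family,
        }
        # Assumption: an ORCID, when present, is stored as a URL in "@id".
        identifier = person.get("@id", "")
        if "orcid.org/" in identifier:
            creator["nameIdentifiers"] = [
                {
                    "nameIdentifier": identifier.split("orcid.org/")[-1],
                    "nameIdentifierScheme": "ORCID",
                }
            ]
        creators.append(creator)

    return {
        "titles": [{"title": codemeta.get("name", "")}],
        "creators": creators,
        "descriptions": [
            {
                "description": codemeta.get("description", ""),
                "descriptionType": "Abstract",
            }
        ],
        "subjects": [{"subject": word} for word in codemeta.get("keywords", [])],
        "version": codemeta.get("version", ""),
    }


# A trimmed-down codemeta record, using values from this project's codemeta.json.
example = {
    "name": "ames",
    "version": "1.2.2",
    "description": "Automated Metadata Service: Manage metadata from different sources.",
    "keywords": ["GitHub", "metadata", "software"],
    "author": [
        {
            "givenName": "Thomas E",
            "familyName": "Morrell",
            "@id": "https://orcid.org/0000-0001-9266-5146",
        }
    ],
}
record = sketch_codemeta_to_datacite(example)
```

A real converter would also need to handle licenses, funding, related identifiers, and authors given as organizations, which this sketch leaves out.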

CodeMeta Updating

Collect GitHub records in CaltechDATA, search for a codemeta.json file, and update CaltechDATA with new metadata.

CodeMeta Setup

Set an environment variable with your CaltechDATA access token: export TINDTOK=
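If you write your own scripts around these tokens, it can help to fail early with a clear message when a variable is missing. The helper below is a minimal sketch of that idea; `require_token` is a hypothetical name and is not part of ames.

```python
import os


def require_token(name, environ=os.environ):
    """Return the value of an environment variable, or raise with a hint."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; run `export {name}=<your token>` first"
        )
    return value
```

For example, calling `require_token("TINDTOK")` at the top of a script stops it before any network request is made if the token was never exported. The same pattern applies to the other variables named in this README (MAILTOK, DATACITE, MATTOK).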

CodeMeta Usage

Type python run_codemeta.py.

CaltechDATA Citation Alerts

Harvest citation data from the Crossref Event Data API, harvest records from CaltechDATA, match the two, update metadata in CaltechDATA, and send an email to the user.

Citation Alerts Setup

Set environment variables with your CaltechDATA token (export TINDTOK=) and your Mailgun token (export MAILTOK=).

Citation Alerts Usage

Type python run_event_data.py. You'll be prompted for confirmation if any new citations are found.

Media Updates

Update media records in DataCite that indicate the files associated with a DOI.

Media Setup

Set an environment variable with the password for your DataCite account: export DATACITE=

Media Usage

Type python run_media_update.py.

CaltechDATA metadata checks

This will run checks on the quality of metadata in CaltechDATA. Currently this verifies whether redundant links are present in the related identifier section. It can also update metadata with DataCite.
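To make the redundant-link check concrete, here is a small sketch that flags repeated entries in a DataCite-style `relatedIdentifiers` list. It assumes that "redundant" means the same identifier appearing more than once with the same relation type; the actual check in ames may use different criteria, and `find_redundant_links` is a hypothetical name.

```python
def find_redundant_links(metadata):
    """Return related identifiers that repeat an earlier (identifier, relation) pair."""
    seen = set()
    redundant = []
    for link in metadata.get("relatedIdentifiers", []):
        # Compare case-insensitively, since DOIs are not case sensitive.
        key = (
            link.get("relatedIdentifier", "").strip().lower(),
            link.get("relationType"),
        )
        if key in seen:
            redundant.append(link)
        else:
            seen.add(key)
    return redundant


# Hypothetical record: the second entry repeats the first with different casing.
sample_metadata = {
    "relatedIdentifiers": [
        {"relatedIdentifier": "10.1000/example", "relationType": "IsCitedBy"},
        {"relatedIdentifier": "10.1000/EXAMPLE", "relationType": "IsCitedBy"},
        {"relatedIdentifier": "10.1000/example", "relationType": "IsSupplementTo"},
    ]
}
duplicates = find_redundant_links(sample_metadata)
```

Here only the second entry is reported, because the third uses a different relation type.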

Metadata Checks Setup

Set an environment variable with your CaltechDATA access token: export TINDTOK=

Metadata Checks Usage

Type python run_caltechdata_checks.py.

CaltechDATA Metadata Updates

This will improve the quality of metadata in CaltechDATA. The updates are split into those that should run frequently (currently every 10 minutes) and those that run daily. Frequent updates include adding a recommended citation to the descriptions; daily updates include adding CaltechTHESIS DOIs to CaltechDATA.

Metadata Updates Setup

Set an environment variable with your CaltechDATA access token: export TINDTOK=

Metadata Updates Usage

Type python run_caltechdata_updates.py or python run_caltechdata_daily.py.

CaltechDATA COUNTER Usage Reports

This will harvest download and view information from matomo and format it into a COUNTER report. This feature is still being tested.

Usage Report Setup

Set an environment variable with your Matomo access token: export MATTOK=

Usage Report Usage

Type python run_usage.py.

Archives Reports

Runs reports on ArchivesSpace. Current reports:

  • accession_report: Returns accession records that match a certain subject
  • format_report: Returns large report on accessions with certain media formats

Example usage:

python run_archives_report.py accession_report accession.csv -subject "Manuscript Collection"

Update Eprints

Perform update operations using the Eprints API. Supports updating resolver field URLs to https, fixing special characters, and adjusting the item modified date (which also regenerates the public view of the page).

Example usage:

python run_eprints_updates.py update_date authors -recid 83420 -user tmorrell -password
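The https update for resolver URLs amounts to a small string transformation before the value is written back. The sketch below shows one way to do it with the standard library; `to_https` is a hypothetical helper, the record identifier in the example is made up, and the step of sending the change through the Eprints API is deliberately omitted.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url):
    """Return the URL with an http scheme upgraded to https; others unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# Hypothetical resolver URL used only for illustration.
updated = to_https("http://resolver.caltech.edu/CaltechAUTHORS:20180101-000000000")
```

URLs that already use https, or that use another scheme entirely, pass through untouched, so the function is safe to run repeatedly over the same records.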

CODA Reports

Runs reports on Caltech Library repositories. Current reports:

  • doi_report: Records (optionally filtered by year) and their DOIs.

  • thesis_report: Matches Eprints tsv export for CaltechTHESIS

  • thesis_metadata: Matches Eprints metadata tsv export for CaltechTHESIS

  • creator_report: Finds records where an Eprints Creator ID has an ORCID but it is not included on all records. Also lists cases where an author has two ORCIDs.

  • creator_search: Export a Google Sheet with the author lists of all publications associated with an author ID. Requires the -creator argument.

  • people_search: Search across the CaltechPEOPLE collection by division

  • file_report: Records that have potential problems with the attached files

  • status_report: Reports on any records with an incorrect status in feeds

  • record_number_report: Reports on records where the record number and resolver URL don't match

  • alt_url_report: Reports on records with a discontinued alt_url field

  • license_report: Report out the license types in CaltechDATA
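To illustrate the kind of logic behind creator_report, the sketch below groups creators by ID and reports two cases: IDs that have an ORCID on some records but not on others, and IDs associated with more than one ORCID. The record layout (`record_id`, `creators`, `id`, `orcid`) and the ORCID values are assumptions for the example, not the structure ames actually reads.

```python
from collections import defaultdict


def creator_orcid_report(records):
    """Return (incomplete, conflicting) findings keyed by creator ID."""
    orcids = defaultdict(set)    # creator ID -> ORCIDs seen on any record
    missing = defaultdict(list)  # creator ID -> record IDs lacking an ORCID
    for record in records:
        for creator in record["creators"]:
            creator_id = creator["id"]
            if creator.get("orcid"):
                orcids[creator_id].add(creator["orcid"])
            else:
                missing[creator_id].append(record["record_id"])

    # An ORCID is known for the creator, but some records do not carry it.
    incomplete = {cid: ids for cid, ids in missing.items() if orcids.get(cid)}
    # The same creator ID is linked to more than one ORCID.
    conflicting = {
        cid: sorted(found) for cid, found in orcids.items() if len(found) > 1
    }
    return incomplete, conflicting


# Hypothetical records with placeholder ORCIDs.
toy_records = [
    {"record_id": 1, "creators": [{"id": "Doe-J", "orcid": "0000-0000-0000-0001"}]},
    {"record_id": 2, "creators": [{"id": "Doe-J"}]},
    {"record_id": 3, "creators": [{"id": "Roe-R", "orcid": "0000-0000-0000-0002"}]},
    {"record_id": 4, "creators": [{"id": "Roe-R", "orcid": "0000-0000-0000-0003"}]},
]
incomplete, conflicting = creator_orcid_report(toy_records)
```

In this toy data, Doe-J is reported as incomplete (record 2 lacks the ORCID) and Roe-R as conflicting (two different ORCIDs).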

Report Usage

Type something like python run_coda_report.py doi_report thesis report.tsv -year 1977-1978

  • The first option is the report type
  • Next is the repository (thesis or authors)
  • Next is the output file name (include .csv or .tsv extension, will show up in current directory)
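The argument order described above can be modeled with `argparse`. This is a sketch of the interface as documented in this README, not the parser that run_coda_report.py actually uses; the destination names and help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented report command line."""
    parser = argparse.ArgumentParser(prog="run_coda_report.py")
    # Positional arguments, in the documented order.
    parser.add_argument("report_name", help="report type, e.g. doi_report")
    parser.add_argument("repository", help="repository, e.g. thesis or authors")
    parser.add_argument("output", help="output file name ending in .csv or .tsv")
    # Optional filters described under Report Options.
    parser.add_argument("-year", help="single year (1977) or range (1977-1978)")
    parser.add_argument("-group", help="group name; quote long names")
    parser.add_argument("-item", nargs="+", help="one or more item types")
    return parser


args = build_parser().parse_args(
    ["doi_report", "thesis", "report.tsv", "-year", "1977-1978"]
)
```

Using `nargs="+"` for `-item` lets several item types follow a single flag, which matches the combined example later in this section.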

Report Options

  • Some reports include a -year option to return just the records from a specific year (1977) or a range (1977-1978)

  • Some reports include a -group option to return just the records with a specific group name. Surround long names with quotes (e.g. "Keck Institute for Space Studies")

  • Some reports include a -item option to return just records with a specific item type. Supported types include:

    • CaltechDATA item types (Dataset, Software, ...)
    • CaltechAUTHORS item types (article, monograph, ...)
    • CaltechAUTHORS monograph sub-types
      • discussion_paper
      • documentation
      • manual
      • other
      • project_report
      • report
      • technical_report
      • white_paper
      • working_paper
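The -year option accepts either a single year or a range. A minimal sketch of how such a value could be interpreted is shown below; `parse_year_option` and `in_year_range` are hypothetical helpers, and treating the range as inclusive at both ends is an assumption.

```python
def parse_year_option(value):
    """Turn "1977" or "1977-1978" into an inclusive (first, last) pair."""
    start, separator, end = value.partition("-")
    first = int(start)
    last = int(end) if separator else first
    if last < first:
        raise ValueError(f"year range is reversed: {value}")
    return first, last


def in_year_range(year, value):
    """Check whether a record's year falls inside the requested range."""
    first, last = parse_year_option(value)
    return first <= int(year) <= last
```

With this reading, `-year 1977` keeps only records from 1977, while `-year 1977-1978` keeps records from both years.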

There are some additional technical arguments if you want to change the default behavior.

  • Adding -source eprints will pull report data from Eprints instead of feeds. This is very slow. You may need to add -username and -password to provide login credentials

  • Adding -sample XXX allows you to select a number of randomly selected records. This makes it more reasonable to pull data directly from Eprints.
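The idea behind -sample can be sketched with the standard library: pick a fixed number of records at random so that only those need to be fetched from the slower source. `sample_records` is a hypothetical helper and the optional `seed` parameter is an addition for reproducibility, not something the README describes.

```python
import random


def sample_records(records, count, seed=None):
    """Return up to `count` randomly chosen records without repeats."""
    rng = random.Random(seed)
    # Never ask for more items than exist, which random.sample would reject.
    count = min(count, len(records))
    return rng.sample(records, count)


# Hypothetical list of record IDs.
record_ids = list(range(1000, 1100))
picked = sample_records(record_ids, 5, seed=42)
```

Sampling IDs first and fetching full records afterwards keeps the number of requests proportional to the sample size rather than to the size of the repository.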

You can combine multiple options to build more complex queries, such as this request for reports from a group:

python run_coda_report.py doi_report authors keck_tech_reports.csv -group "Keck Institute for Space Studies" -item technical_report project_report discussion_paper

python run_coda_report.py people_search people chem.csv -search "Chemistry and Chemical Engineering Division"

Owner

  • Name: Caltech Library
  • Login: caltechlibrary
  • Kind: organization
  • Email: helpdesk@library.caltech.edu
  • Location: Pasadena, CA 91125

We manage the physical and digital holdings of the California Institute of Technology, provide services and training, and develop open-source software.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: ames
authors:
  - family-names: Morrell
    given-names: Thomas E
    orcid: https://orcid.org/0000-0001-9266-5146
  - family-names: Doiel
    given-names: Robert
    orcid: https://orcid.org/0000-0003-0900-6903
  - family-names: Bhattarai
    given-names: Rohan
    orcid: https://orcid.org/0009-0007-0323-4733
  - family-names: Won
    given-names: Elizabeth
    orcid: https://orcid.org/0009-0002-2450-6471
  - family-names: Abakah
    given-names: Alexander
    orcid: https://orcid.org/0009-0003-5640-6691
abstract: "Automated Metadata Service: Manage metadata from different sources."
repository-code: "https://github.com/caltechlibrary/ames"
type: software
doi: 10.22002/4d93n-c8q12
version: 1.2.2
license-url: "https://data.caltech.edu/license"
keywords:
  - GitHub
  - metadata
  - software
date-released: 2025-07-30

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "description": "Automated Metadata Service: Manage metadata from different sources.",
  "name": "ames",
  "codeRepository": "https://github.com/caltechlibrary/ames",
  "issueTracker": "https://github.com/caltechlibrary/ames/issues",
  "license": "https://data.caltech.edu/license",
  "version": "1.2.2",
  "author": [
    {
      "@type": "Person",
      "givenName": "Thomas E",
      "familyName": "Morrell",
      "affiliation": {
        "@type": "Organization",
        "name": "Caltech Library"
      },
      "email": "tmorrell@caltech.edu",
      "@id": "https://orcid.org/0000-0001-9266-5146"
    },
    {
      "@type": "Person",
      "givenName": "Robert",
      "familyName": "Doiel",
      "affiliation": {
        "@type": "Organization",
        "name": "Caltech Library"
      },
      "email": "rsdoiel@caltech.edu",
      "@id": "https://orcid.org/0000-0003-0900-6903"
    },
    {
      "@type": "Person",
      "givenName": "Rohan",
      "familyName": "Bhattarai",
      "affiliation": {
        "@type": "Organization",
        "name": "Caltech"
      },
      "email": "rbhattar@caltech.edu",
      "@id": "https://orcid.org/0009-0007-0323-4733"
    },
    {
      "@type": "Person",
      "givenName": "Elizabeth",
      "familyName": "Won",
      "affiliation": {
        "@type": "Organization",
        "name": "Caltech"
      },
      "@id": "https://orcid.org/0009-0002-2450-6471"
    },
    {
      "@type": "Person",
      "givenName": "Alexander",
      "familyName": "Abakah",
      "affiliation": {
        "@type": "Organization",
        "name": "Caltech"
      },
      "@id": "https://orcid.org/0009-0003-5640-6691"
    }
  ],
  "developmentStatus": "active",
  "downloadUrl": "https://github.com/caltechlibrary/ames/archive/main.zip",
  "keywords": [
    "GitHub",
    "metadata",
    "software"
  ],
  "programmingLanguage": "Python",
  "maintainer": [
    {
      "@id": "https://orcid.org/0000-0001-9266-5146",
      "@type": "Person",
      "affiliation": {
        "@type": "Organization",
        "name": "Caltech Library"
      },
      "familyName": "Morrell",
      "givenName": "Thomas E."
    }
  ],
  "identifier": "10.22002/4d93n-c8q12",
  "funding": {
    "@type": "Grant",
    "identifier": "2322420",
    "name": "CC* Data Storage: Closing Caltech's data storage gap: from ad-hoc to well-managed stewardship of large-scale datasets",
    "funder": {
      "@id": "https://doi.org/10.13039/100000001",
      "@type": "Organization",
      "name": "National Science Foundation"
    }
  }
}

GitHub Events

Total
  • Create event: 2
  • Release event: 2
  • Issues event: 5
  • Watch event: 1
  • Issue comment event: 4
  • Push event: 32
  • Pull request review comment event: 10
  • Pull request review event: 11
  • Pull request event: 6
  • Fork event: 2
Last Year
  • Create event: 2
  • Release event: 2
  • Issues event: 5
  • Watch event: 1
  • Issue comment event: 4
  • Push event: 32
  • Pull request review comment event: 10
  • Pull request review event: 11
  • Pull request event: 6
  • Fork event: 2

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 346
  • Total Committers: 4
  • Avg Commits per committer: 86.5
  • Development Distribution Score (DDS): 0.145
Top Committers
Name | Email | Commits
Tom Morrell | t****l@c****u | 296
Thomas Morrell | t****l@u****m | 36
R. S. Doiel | r****l@g****m | 12
Katrin Leinweber | 9****r@u****m | 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 18
  • Total pull requests: 12
  • Average time to close issues: 3 months
  • Average time to close pull requests: 10 days
  • Total issue authors: 3
  • Total pull request authors: 6
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.58
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 7
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 13 days
  • Issue authors: 1
  • Pull request authors: 3
  • Average comments per issue: 0.25
  • Average comments per pull request: 0.43
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tmorrell (16)
  • rsdoiel (1)
  • WardLT (1)
Pull Request Authors
  • RohanBhattaraiNP (3)
  • rsdoiel (3)
  • elizabethjhwon (2)
  • AbakahAlexander (2)
  • katrinleinweber (1)
  • tmorrell (1)
Top Labels
Issue Labels
bug (1)
Pull Request Labels
bug (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 667 last-month
  • Total docker downloads: 81
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 25
  • Total maintainers: 1
pypi.org: ames

Automated Metadata Service: Manage metadata from different sources.

  • Versions: 25
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 667 Last month
  • Docker Downloads: 81
Rankings
Docker downloads count: 2.6%
Dependent packages count: 7.5%
Average: 17.8%
Forks count: 19.4%
Stargazers count: 21.8%
Dependent repos count: 22.5%
Downloads: 33.3%
Maintainers (1)
Last synced: 7 months ago