codemetapy

A Python package for generating and working with codemeta

https://github.com/proycon/codemetapy

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (19.5%) to scientific vocabulary

Keywords

codemeta linked-data metadata metadata-extractor schema-org scientific
Last synced: 4 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: proycon
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage: https://codemeta.github.io/
  • Size: 991 KB
Statistics
  • Stars: 28
  • Watchers: 3
  • Forks: 6
  • Open Issues: 9
  • Releases: 33
Topics
codemeta linked-data metadata metadata-extractor schema-org scientific
Created over 7 years ago · Last pushed 8 months ago
Metadata Files
Readme License Codemeta

README.md

Project Status: Active -- The project has reached a stable, usable state and is being actively developed.

Codemetapy

Codemetapy is a command-line tool to work with the codemeta software metadata standard. Codemeta builds upon schema.org and defines a vocabulary for describing software source code. It maps various existing metadata standards to a unified vocabulary.

For more general information about the CodeMeta Project for defining software metadata, see https://codemeta.github.io. In particular, new users might want to start with the User Guide, while those looking to learn more about JSON-LD and consuming existing codemeta files should see the Developer Guide.

Using codemetapy you can generate a codemeta.json file, which is serialised as JSON-LD, for your software. At the moment it supports conversions from the following existing metadata specifications:

  • Python distutils/pip packages (setup.py/pyproject.toml)
  • Java/Maven packages (pom.xml)
  • NodeJS packages (package.json)
  • Debian packages (apt show output)
  • GitHub API (when passed a GitHub URL)
  • GitLab API (when passed a GitLab URL)
  • Web sites/services (see the section on Software Types and services below):
    • Simple metadata from HTML <meta> elements.
    • Script blocks of type application/ld+json

It can also read and manipulate existing codemeta.json files as well as parse simple AUTHORS/CONTRIBUTORS files. One of the most notable features of codemetapy is that it allows chaining to successively update a metadata description based on multiple sources. Codemetapy is used in that way by the codemeta-harvester.

Note: If you are looking for an all-in-one solution to automatically generate a codemeta.json for your project, then codemeta-harvester is the best place to start. It is a higher-level tool that automatically invokes codemetapy on the various sources it can detect, and combines those into a single codemeta representation.

Installation

pip install codemetapy

Usage

Query and convert any installed python package:

$ codemetapy somepackage

Output goes to standard output by default. To write it to a file instead, either redirect it:

$ codemetapy somepackage > codemeta.json

or use the -O parameter:

$ codemetapy -O codemeta.json somepackage

If your current working directory is the root of a Python project that has a setup.py or pyproject.toml, you can simply call codemetapy without arguments to output codemeta for the project. Codemetapy will automatically run python setup.py egg_info if needed and parse its output to facilitate this:

$ codemetapy

The tool also supports adding properties through parameters:

$ codemetapy --developmentStatus active somepackage > codemeta.json

To read an existing codemeta.json and extend it:

$ codemetapy -O codemeta.json codemeta.json somepackage

or even:

$ codemetapy -O codemeta.json codemeta.json codemeta2.json codemeta3.json

This makes use of an important characteristic of codemetapy: composition. When you specify multiple input sources, they are interpreted as referring to the same resource. Properties (on schema:SoftwareSourceCode) in later sources overwrite the same properties from earlier sources. So if codemeta3.json specifies authors, all authors that were specified in codemeta2.json are lost rather than merged, and the end result will have the authors from codemeta3.json. However, if codemeta2.json has a property that is not in codemeta3.json, say developmentStatus, then that property makes it to the end result. In other words, the latest source always takes precedence, and any non-overlapping properties are merged. This functionality is heavily relied on by the higher-level tool codemeta-harvester.
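The precedence rule can be illustrated with a minimal sketch on plain Python dictionaries. This is only a simplified model of the behaviour described above, not codemetapy's own implementation; the function name compose and the sample data are hypothetical.

```python
from typing import Any


def compose(*sources: dict[str, Any]) -> dict[str, Any]:
    """Merge top-level properties; later sources overwrite earlier ones."""
    result: dict[str, Any] = {}
    for source in sources:
        # A property present in a later source replaces the earlier value
        # wholesale; values are not merged item by item.
        result.update(source)
    return result


# Hypothetical stand-ins for the contents of codemeta2.json and codemeta3.json.
codemeta2 = {"author": ["Alice", "Bob"], "developmentStatus": "active"}
codemeta3 = {"author": ["Carol"]}

merged = compose(codemeta2, codemeta3)
print(merged)
```

In the merged result the authors come only from the last source, while developmentStatus survives because the last source does not define it.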

If you want to start from scratch and build using command line parameters, use /dev/null as input, and make sure to pass some identifier and code repository:

$ codemetapy --identifier some-id --codeRepository https://github.com/my/code /dev/null > codemeta.json

This tool can also handle Debian packages, albeit in a limited way, by parsing the output of apt show:

$ apt show somepackage | codemetapy -i debian -

Here - represents standard input, which enables you to use pipes in a Unix shell. The -i parameter denotes the input types; you can chain as many as you want. The number of input types specified must correspond exactly to the number of input sources (the positional arguments).

Some notes on Vocabulary

For codemeta:developmentStatus, codemetapy attempts to assign full repostatus URIs whenever possible. For schema:license, full SPDX URIs are used where possible.
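As an illustration of what such an expansion looks like, here is a minimal sketch. The helper names repostatus_uri and spdx_uri are hypothetical and this is not codemetapy's own mapping code; it only shows the shape of the resulting URIs, which match the values in this project's codemeta.json.

```python
# The eight project states defined by repostatus.org.
REPOSTATUS_STATES = {
    "concept", "wip", "suspended", "abandoned",
    "active", "inactive", "unsupported", "moved",
}


def repostatus_uri(status: str) -> str:
    """Return the full repostatus URI for a known state, else the input."""
    key = status.strip().lower()
    if key in REPOSTATUS_STATES:
        return f"https://www.repostatus.org/#{key}"
    return status  # unknown value: leave it untouched


def spdx_uri(license_id: str) -> str:
    """Return an SPDX URI for a bare licence identifier."""
    if license_id.startswith(("http://", "https://")):
        return license_id  # already a URI
    return f"https://spdx.org/licenses/{license_id}"


print(repostatus_uri("active"))
print(spdx_uri("GPL-3.0-only"))
```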

Identifiers

We distinguish two types of identifiers. First, there is the URI or IRI that identifies RDF resources. It is a globally unique identifier and often looks like a URL.

Codemetapy will assign new URIs for resources if and only if you pass a base URI using --baseuri. Moreover, if you set this, codemetapy will forcibly set URIs over any existing ones, effectively assigning new identifiers. The previous identifier is then preserved via the owl:sameAs property instead. This allows you to take ownership of all URIs. Internally, codemetapy creates URIs for everything (even for blank nodes) even if you don't specify a base URI, but these URIs are stripped again upon serialisation to JSON-LD.

The second identifier is the schema:identifier, of which there may even be multiple. Codemetapy typically expects such an identifier to be a simple unspaced string holding a name for the software. For example, a Python package name would make a good identifier. If this property is present, codemetapy will use it when generating URIs. The schema:identifier property can be contrasted with schema:name, which is the human-readable form of the name and may be more elaborate. The schema:identifier property is typically also used for other identifiers (such as DOIs, ISBNs, etc.), which should come in the following form:

```json
"identifier": {
    "@type": "PropertyValue",
    "propertyID": "doi",
    "value": "10.5281/zenodo.6882966"
}
```

But short-hand forms such as doi:10.5281/zenodo.6882966 or as a URL like https://doi.org/10.5281/zenodo.6882966 are also recognised by this library.
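To make the relation between the short-hand forms and the full form concrete, here is a minimal sketch that expands a DOI short-hand into a PropertyValue. The function name doi_property_value is hypothetical and this is not the library's own parsing code.

```python
from typing import Optional

# Prefixes treated as DOI short-hands in this sketch.
DOI_PREFIXES = ("doi:", "https://doi.org/", "http://doi.org/")


def doi_property_value(identifier: str) -> Optional[dict]:
    """Expand a DOI short-hand into a PropertyValue, or return None."""
    for prefix in DOI_PREFIXES:
        if identifier.lower().startswith(prefix):
            return {
                "@type": "PropertyValue",
                "propertyID": "doi",
                "value": identifier[len(prefix):],
            }
    return None  # not a recognised DOI short-hand


print(doi_property_value("doi:10.5281/zenodo.6882966"))
print(doi_property_value("https://doi.org/10.5281/zenodo.6882966"))
```

Both calls yield the same PropertyValue; a plain name such as a package name is not a DOI and returns None.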

Software Types and services

Codemetapy (since 2.0) implements an extension to codemeta that allows linking the software source code to the actual instantiation of the software, with explicit regard for the interface type. This is done via the schema:targetProduct property, which takes as range a schema:SoftwareApplication, schema:WebAPI, schema:WebSite or any of the extra types defined in https://github.com/SoftwareUnderstanding/software_types/. This was proposed in this issue.

This extension is enabled by default and can be disabled by setting the --strict flag.

When you pass codemetapy a URL, it will assume this is where the software is run as a service, and attempt to extract metadata from the site and encode it via targetProduct. For example, here we read an existing codemeta.json and extend it with a place where the software is instantiated as a service:

$ codemetapy codemeta.json https://example.org/

If served HTML, codemetapy will use your <script> block of type application/ld+json if it provides a valid software type (as mentioned above). For other HTML, codemetapy will simply extract some metadata from HTML <meta> elements. Content negotiation is used, and JSON-LD, JSON and even YAML and XML are favoured over HTML.
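The two HTML sources mentioned above can be sketched with the standard library alone. This is an illustrative sketch, not codemetapy's own extraction code; the class name MetadataExtractor and the sample page are hypothetical, and the script type it looks for is the registered JSON-LD media type application/ld+json.

```python
import json
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collect JSON-LD script blocks and <meta name=... content=...> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.jsonld: list = []
        self.meta: dict = {}
        self._in_jsonld = False
        self._buffer: list = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"]] = attrs["content"]

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.jsonld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed blocks


# Hypothetical page served by a web service.
html = """<html><head>
<meta name="description" content="Example service">
<script type="application/ld+json">{"@type": "WebSite", "name": "Example"}</script>
</head></html>"""

extractor = MetadataExtractor()
extractor.feed(html)
print(extractor.jsonld)
print(extractor.meta)
```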

(Note: the older Entrypoint Extension from before codemetapy 2.0 is now deprecated.)

Graph

You can use codemetapy to generate one big knowledge graph expressing multiple codemeta resources using the --graph parameter:

$ codemetapy --graph resource1.json resource2.json

This will produce JSON-LD output with multiple resources in the graph.
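As a rough picture of what a document with multiple resources in one graph looks like, the sketch below wraps several resources in a single JSON-LD document using the standard @graph keyword. The exact layout of codemetapy's output is an assumption here, the function name build_graph is hypothetical, and the sketch assumes all resources share the same context.

```python
import json


def build_graph(resources: list, context) -> dict:
    """Wrap several resources in one JSON-LD document with a @graph array."""
    nodes = []
    for resource in resources:
        # Drop each resource's own @context; a shared one is set at the top.
        nodes.append({k: v for k, v in resource.items() if k != "@context"})
    return {"@context": context, "@graph": nodes}


# Hypothetical stand-ins for the contents of resource1.json and resource2.json.
resource1 = {"@context": "https://w3id.org/codemeta/3.0",
             "@id": "https://example.org/a", "@type": "SoftwareSourceCode"}
resource2 = {"@context": "https://w3id.org/codemeta/3.0",
             "@id": "https://example.org/b", "@type": "SoftwareSourceCode"}

graph = build_graph([resource1, resource2], "https://w3id.org/codemeta/3.0")
print(json.dumps(graph, indent=2))
```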

GitHub API

Codemetapy can make use of the GitHub API to query metadata from GitHub, but this allows only a limited number of anonymous requests before you hit the rate limit. To allow more requests, please set the environment variable $GITHUB_TOKEN to a personal access token.
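For readers unfamiliar with token-based access, the sketch below shows how a client might turn that environment variable into GitHub API request headers. It is an assumption about the general pattern, not a description of codemetapy's internals; the function name github_headers is hypothetical.

```python
import os


def github_headers(environ=os.environ) -> dict:
    """Build request headers, adding a token when GITHUB_TOKEN is set."""
    headers = {"Accept": "application/vnd.github+json"}
    token = environ.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests get a much higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Without a token the request stays anonymous.
print(github_headers({}))
```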

GitLab API

Codemetapy can make use of the GitLab API to query metadata from GitLab, but this allows only a limited number of anonymous requests before you hit the rate limit. To allow more requests, please set the environment variable $GITLAB_TOKEN to a personal access token.

Integration in setup.py

You can integrate codemeta.json generation in your project's setup.py. This adds an extra python setup.py codemeta command that generates a new metadata file or updates an already existing one. Note that this must be run after python setup.py install (or python setup.py develop).

To integrate this, add the following to your project's setup.py:

```python
try:
    from codemeta.codemeta import CodeMetaCommand
    cmdclass={
        'codemeta': CodeMetaCommand,
    }
except ImportError:
    cmdclass={}
```

And in your setup() call add the parameter:

```python
cmdclass=cmdclass
```

This will ensure your setup.py works in all cases, even if codemetapy is not installed, and that the command will be available if codemetapy is available.

If you want to ship your package with the generated codemeta.json, then simply add a line saying codemeta.json to the file MANIFEST.in in the root of your project.

Acknowledgements

This work is conducted at the KNAW Humanities Cluster's Digital Infrastructure department in the scope of the CLARIAH project (CLARIAH-PLUS, NWO grant 184.034.023) as part of the FAIR Tool Discovery track of the Shared Development Roadmap.

Owner

  • Name: Maarten van Gompel
  • Login: proycon
  • Kind: user
  • Location: Eindhoven, the Netherlands
  • Company: KNAW Humanities Cluster & CLST, Radboud University

Research software engineer - NLP - AI - 🐧 Linux & open-source enthusiast - 🐍 Python/ 🌊C/C++ / 🦀 Rust / 🐚 Shell - 🔐 InfoSec - https://git.sr.ht/~proycon

CodeMeta (codemeta.json)

{
  "@context": [
    "https://w3id.org/codemeta/3.0",
    "http://schema.org",
    "https://w3id.org/software-types",
    "https://w3id.org/software-iodata"
  ],
  "@id": "https://github.com/proycon/codemetapy.git",
  "@type": "SoftwareSourceCode",
  "applicationCategory": [
    "Software Development",
    "https://w3id.org/nwo-research-fields#ComputerScience",
    "https://vocabs.dariah.eu/tadirah/converting"
  ],
  "audience": {
    "@id": "/audience/developers",
    "@type": "Audience",
    "audienceType": "Developers"
  },
  "author": {
    "@id": "https://orcid.org/0000-0002-1046-0006",
    "@type": "Person",
    "affiliation": {
      "@id": "https://huc.knaw.nl"
    },
    "email": "proycon@anaproy.nl",
    "familyName": "van Gompel",
    "givenName": "Maarten",
    "url": "https://proycon.anaproy.nl"
  },
  "codeRepository": "https://github.com/proycon/codemetapy.git",
  "contIntegration": "https://github.com/proycon/codemetapy/actions/",
  "contributor": {
    "@id": "https://orcid.org/0000-0002-1046-0006",
    "@type": "Person",
    "affiliation": {
      "@id": "https://huc.knaw.nl"
    },
    "email": "proycon@anaproy.nl",
    "familyName": "van Gompel",
    "givenName": "Maarten",
    "url": "https://proycon.anaproy.nl"
  },
  "dateCreated": "2018-04-16T10:54:22Z+0200",
  "dateModified": "2025-04-24T13:37:47Z+0200",
  "description": "Codemetapy is a command-line tool and python library to work with the codemeta software metadata standard. Codemeta builds upon schema.org and defines a vocabulary for describing software source code. It maps various existing metadata standards to a unified vocabulary. Codemetapy allows you to generate codemeta from various sources.",
  "developmentStatus": [
    "https://www.repostatus.org/#active",
    "https://w3id.org/research-technology-readiness-levels#Level9Proven"
  ],
  "funding": [
    {
      "@type": "Grant",
      "name": "CLARIAH-PLUS (NWO grant 184.034.023)",
      "funder": {
        "@type": "Organization",
        "name": "NWO",
        "url": "https://www.nwo.nl"
      }
    }
  ],
  "identifier": "codemetapy",
  "issueTracker": "https://github.com/proycon/codemetapy/issues",
  "keywords": [
    "metadata",
    "scientific",
    "metadata-extractor",
    "codemeta",
    "rdf",
    "software metadata",
    "schema.org",
    "linked data"
  ],
  "license": "https://spdx.org/licenses/GPL-3.0-only",
  "maintainer": {
    "@id": "https://orcid.org/0000-0002-1046-0006",
    "@type": "Person",
    "affiliation": {
      "@id": "https://huc.knaw.nl"
    },
    "email": "proycon@anaproy.nl",
    "familyName": "van Gompel",
    "givenName": "Maarten",
    "url": "https://proycon.anaproy.nl"
  },
  "name": "CodeMetaPy",
  "operatingSystem": [
    "Linux",
    "BSD",
    "macOS"
  ],
  "producer": {
    "@id": "https://huc.knaw.nl",
    "@type": "Organization",
    "name": "KNAW Humanities Cluster",
    "url": "https://huc.knaw.nl",
    "parentOrganization": {
      "@id": "https://knaw.nl",
      "@type": "Organization",
      "name": "KNAW",
      "url": "https://knaw.nl",
      "location": {
        "@type": "Place",
        "name": "Amsterdam"
      }
    }
  },
  "readme": "https://github.com/proycon/codemetapy/blob/README.rst",
  "runtimePlatform": [
    "Python 3.6",
    "Python 3.8",
    "Python 3.9",
    "Python 3",
    "Python 3.7",
    "Python 3.10",
    "Python 3.11",
    "Python 3.12"
  ],
  "softwareHelp": [
    {
      "@type": "WebSite",
      "name": "README: Installation and usage instructions",
      "url": "https://github.com/proycon/codemetapy/blob/master/README.md",
      "description": "Installation and usage instructions"
    },
    {
      "@type": "WebSite",
      "name": "The CodeMeta Project",
      "url": "https://codemeta.github.io/",
      "description": "Describes the underlying software metadata model (not specific to codemetapy)"
    }
  ],
  "softwareRequirements": [
    {
      "@id": "/dependency/nameparser",
      "@type": "SoftwareApplication",
      "identifier": "nameparser",
      "name": "nameparser",
      "runtimePlatform": "Python 3"
    },
    {
      "@id": "/dependency/pyyaml",
      "@type": "SoftwareApplication",
      "identifier": "pyyaml",
      "name": "pyyaml",
      "runtimePlatform": "Python 3"
    },
    {
      "@id": "/dependency/beautifulsoup4",
      "@type": "SoftwareApplication",
      "identifier": "BeautifulSoup4",
      "name": "BeautifulSoup4",
      "runtimePlatform": "Python 3"
    },
    {
      "@id": "/dependency/importlib-metadata",
      "@type": "SoftwareApplication",
      "identifier": "importlib-metadata",
      "name": "importlib-metadata",
      "runtimePlatform": "Python 3"
    },
    {
      "@id": "/dependency/jinja2",
      "@type": "SoftwareApplication",
      "identifier": "Jinja2",
      "name": "Jinja2",
      "runtimePlatform": "Python 3"
    },
    {
      "@id": "/dependency/requests",
      "@type": "SoftwareApplication",
      "identifier": "requests",
      "name": "requests",
      "runtimePlatform": "Python 3"
    },
    {
      "@id": "/dependency/lxml",
      "@type": "SoftwareApplication",
      "identifier": "lxml",
      "name": "lxml",
      "runtimePlatform": "Python 3"
    },
    {
      "@id": "/dependency/rdflib-ge-6.1.1",
      "@type": "SoftwareApplication",
      "identifier": "rdflib",
      "name": "rdflib",
      "runtimePlatform": "Python 3",
      "version": ">= 6.1.1"
    }
  ],
  "isSourceCodeOf": [
    {
      "@id": "/commandlineapplication/codemetapy",
      "@type": "CommandLineApplication",
      "executableName": "codemetapy",
      "name": "codemetapy",
      "runtimePlatform": "Python 3"
    },
    {
      "@id": "/softwarelibrary/codemetapy",
      "@type": "SoftwareLibrary",
      "name": "codemeta",
      "runtimePlatform": "Python 3"
    }
  ],
  "url": "https://github.com/proycon/codemetapy.git",
  "version": "3.0.3"
}

GitHub Events

Total
  • Create event: 1
  • Release event: 2
  • Issues event: 6
  • Watch event: 3
  • Issue comment event: 11
  • Push event: 14
  • Pull request event: 4
  • Fork event: 1
Last Year
  • Create event: 1
  • Release event: 2
  • Issues event: 6
  • Watch event: 3
  • Issue comment event: 11
  • Push event: 14
  • Pull request event: 4
  • Fork event: 1

Committers

Last synced: almost 2 years ago

All Time
  • Total Commits: 633
  • Total Committers: 2
  • Avg Commits per committer: 316.5
  • Development Distribution Score (DDS): 0.047
Past Year
  • Commits: 77
  • Committers: 1
  • Avg Commits per committer: 77.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Maarten van Gompel p****n@a****l 603
mlongobardo-gituname 30
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 53
  • Total pull requests: 5
  • Average time to close issues: 3 months
  • Average time to close pull requests: 7 days
  • Total issue authors: 13
  • Total pull request authors: 2
  • Average comments per issue: 1.81
  • Average comments per pull request: 2.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 4
  • Average time to close issues: 14 days
  • Average time to close pull requests: about 6 hours
  • Issue authors: 3
  • Pull request authors: 1
  • Average comments per issue: 1.75
  • Average comments per pull request: 0.5
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • proycon (21)
  • apirogov (10)
  • broeder-j (7)
  • matthewfeickert (5)
  • willynilly (2)
  • rettinghaus (1)
  • tkphd (1)
  • jrvosse (1)
  • nealmcb (1)
  • mahoromax (1)
  • danielskatz (1)
  • rlzijdeman (1)
  • mustafasoylu (1)
Pull Request Authors
  • willynilly (4)
  • xmichele (1)
Top Labels
Issue Labels
enhancement (22) ready (20) bug (14) wontfix (1) low priority (1) invalid (1) question (1)
Pull Request Labels
enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 240 last-month
  • Total dependent packages: 5
  • Total dependent repositories: 1
  • Total versions: 35
  • Total maintainers: 1
pypi.org: codemetapy

Generate and manage CodeMeta software metadata

  • Versions: 35
  • Dependent Packages: 5
  • Dependent Repositories: 1
  • Downloads: 240 Last month
Rankings
Dependent packages count: 7.3%
Downloads: 9.4%
Stargazers count: 12.9%
Average: 13.7%
Forks count: 16.9%
Dependent repos count: 22.1%
Maintainers (1)
Last synced: 4 months ago

Dependencies

setup.py pypi
  • BeautifulSoup4 *
  • Jinja2 *
  • importlib_metadata *
  • lxml *
  • nameparser *
  • pep517 *
  • pyyaml *
  • rdflib *
  • requests *
tests/fusus/setup.py pypi
  • PyMuPDF *
  • ipython *
  • kraken ==3.0.6
  • numpy *
  • opencv-contrib-python *
  • pdoc3 *
  • pillow *
  • python-Levenshtein *
  • pyyaml >=5.3
  • text-fabric >=8.4.7
.github/workflows/codemetapy.yml actions
  • Gottox/irc-message-action v1 composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • mad9000/actions-find-and-replace-string 2 composite