phi4pipeline

Pipeline for creating release formats of PHI-base 4 datasets

https://github.com/phi-base/phi4pipeline

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: PHI-base
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 120 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 2 years ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

PHI-base 4 pipeline

Python package and command-line application for releasing the PHI-base 4 dataset in Excel and CSV formats, and preparing metadata and files for the Zenodo repository.

Installation

Install the latest release from GitHub:

python -m pip install 'phi4pipeline@git+https://github.com/PHI-base/phi4pipeline.git@1.0.0'

Or install the latest commit on the main branch:

python -m pip install 'phi4pipeline@git+https://github.com/PHI-base/phi4pipeline.git@main'
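To check which version was installed (for example, to confirm that the pinned 1.0.0 tag was used rather than main), you can query the standard library's importlib.metadata. This is a minimal sketch. It assumes the distribution is registered under the name phi4pipeline, which matches the name used in the install commands above. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "phi4pipeline") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    # Print the installed version, or a hint if the package is missing.
    print(f"phi4pipeline {found}" if found else "phi4pipeline is not installed")
```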

Usage

Excel release format

To generate a cleaned and validated version of the spreadsheet that contains the PHI-base 4 dataset, use the following command:

python -m phi4pipeline excel -o FILE SPREADSHEET

Explanation of arguments:

  • -o, --output: the output path for the processed spreadsheet file.

  • SPREADSHEET: the path to the spreadsheet containing the PHI-base 4 dataset.
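If the Excel step runs as part of a larger Python workflow, you can launch the same command through subprocess. The sketch below assembles only the documented arguments (the excel subcommand, -o and the positional spreadsheet path) and runs them with the current interpreter. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def excel_command(spreadsheet: Path, output: Path) -> list[str]:
    """Build the argument list for: python -m phi4pipeline excel -o FILE SPREADSHEET"""
    return [
        sys.executable, "-m", "phi4pipeline", "excel",
        "-o", str(output),
        str(spreadsheet),
    ]


def run_excel_release(spreadsheet: Path, output: Path) -> None:
    """Run the Excel release step; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(excel_command(spreadsheet, output), check=True)
```

For example, run_excel_release(Path("phi-base_v4.xlsx"), Path("release/phi-base_v4_clean.xlsx")). Both paths here are hypothetical placeholders.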

Zenodo release format

To generate release files to be uploaded to Zenodo, use the following command:

python -m phi4pipeline zenodo --contributors PATH --doi DOI --fasta PATH -o DIR --year YEAR SPREADSHEET

Explanation of arguments:

  • --contributors: the path to the CSV file that contains information about the authors and contributors of the dataset. See the 'Contributors file' section below for more information.

  • --doi: the DOI name for the dataset in prefix/suffix form (for example: 10.5281/zenodo.5356870). The DOI name must not be prefixed with 'doi:' or 'https://doi.org/'. The DOI is usually generated when preparing a release on Zenodo.

  • --fasta: the path to the FASTA file that accompanies the PHI-base dataset.

  • -o, --out_dir: the output directory for the release files that will be uploaded to Zenodo.

  • --year: the year the dataset was first published anywhere online (for example, the year it was published on the PHI-base website). This is not necessarily the year it is published on Zenodo.

  • SPREADSHEET: the path to the spreadsheet containing the PHI-base 4 dataset.
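The sketch below shows one way to build the zenodo command from Python while enforcing the --doi rule described above. If a 'doi:' or 'https://doi.org/' prefix was pasted in, it strips that prefix. It then rejects anything that does not look like a bare 10.<registrant>/<suffix> DOI name. The helper names and the loose regular expression are assumptions for illustration. phi4pipeline itself may validate --doi differently, or not at all.

```python
import re
import sys
from pathlib import Path

# Loose check for a bare DOI name in prefix/suffix form, e.g. 10.5281/zenodo.5356870.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that the --doi value must not include.
FORBIDDEN_PREFIXES = ("doi:", "https://doi.org/")


def normalize_doi(doi: str) -> str:
    """Strip a 'doi:' or 'https://doi.org/' prefix, then check the remaining DOI name."""
    doi = doi.strip()
    for prefix in FORBIDDEN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI name in prefix/suffix form: {doi!r}")
    return doi


def zenodo_command(
    spreadsheet: Path,
    contributors: Path,
    fasta: Path,
    out_dir: Path,
    doi: str,
    year: int,
) -> list[str]:
    """Build the argument list for the documented zenodo subcommand."""
    return [
        sys.executable, "-m", "phi4pipeline", "zenodo",
        "--contributors", str(contributors),
        "--doi", normalize_doi(doi),
        "--fasta", str(fasta),
        "-o", str(out_dir),
        "--year", str(year),
        str(spreadsheet),
    ]
```

Normalizing the DOI before calling the pipeline means that a URL copied directly from the Zenodo page is still passed in the required prefix/suffix form.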

Contributors file

The Contributors file is a CSV file that contains information about the people (authors and contributors) related to the dataset. It has the following columns, in this order:

  • name: the full name of the person.

  • orcid: the Open Researcher and Contributor ID (ORCID) for the person, without any URL prefix (for example, 0000-0002-1825-0097).

  • email: the email address for the person.

  • role_readme: the role of the person, as shown in the README file included with the dataset.

  • role_frictionless: the role of the author or contributor, as shown in the Frictionless Data Package metadata file (datapackage.json) included with the dataset. The recommended role values can be found in the Data Package specification.

  • affiliation: the organization that the author or contributor is affiliated with.

  • is_author: TRUE if the person is an author; FALSE if the person is a contributor. This value controls which table of the README file the person's details appear in.

  • is_private: TRUE if the person's personal details should be hidden from the README and datapackage.json files; otherwise FALSE. Defaults to FALSE. This is included to comply with data protection requirements.
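Because the Contributors file has a fixed column order and a few constrained values, it can be worth checking before you run the zenodo command. The sketch below uses only the standard library. It checks the header order and rejects ORCIDs that carry a URL prefix or a wrong check character (ORCID iDs use the ISO 7064 MOD 11-2 checksum). It also converts is_author and is_private to booleans, with an empty is_private cell falling back to FALSE. The function names are hypothetical. Treating an empty orcid cell as allowed is an assumption, and phi4pipeline's own parsing may differ.

```python
import csv
import io
import re
from typing import Optional

# Columns, in order, as listed in the Contributors file description.
EXPECTED_COLUMNS = [
    "name", "orcid", "email", "role_readme",
    "role_frictionless", "affiliation", "is_author", "is_private",
]

# Bare ORCID iD (no URL prefix): four groups of four characters; the last may be X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the final ORCID check character using ISO 7064 MOD 11-2."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return digits[-1] == ("X" if result == 10 else str(result))


def parse_bool(value: str, default: Optional[bool] = None) -> bool:
    """Read a TRUE/FALSE cell case-insensitively; an empty cell uses `default` if given."""
    value = value.strip().upper()
    if value == "" and default is not None:
        return default
    if value not in ("TRUE", "FALSE"):
        raise ValueError(f"expected TRUE or FALSE, got {value!r}")
    return value == "TRUE"


def check_contributors(text: str) -> list[dict]:
    """Validate Contributors CSV text and return its rows as dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns or column order: {reader.fieldnames}")
    rows = []
    for row in reader:
        line = reader.line_num  # physical line number of the row just read
        orcid = row["orcid"].strip()
        # Assumption: an empty ORCID cell is allowed; anything else must be a bare, valid iD.
        if orcid and not (ORCID_PATTERN.match(orcid) and orcid_checksum_ok(orcid)):
            raise ValueError(f"line {line}: invalid ORCID {orcid!r}")
        row["is_author"] = parse_bool(row["is_author"])
        row["is_private"] = parse_bool(row["is_private"], default=False)
        rows.append(row)
    return rows
```

For a file on disk, you could call check_contributors(Path("contributors.csv").read_text(encoding="utf-8")), where contributors.csv is a hypothetical path.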

License

phi4pipeline is distributed under the terms of the MIT license.

Citation

Please use the following citation for this software:

Seager, J. (2024). PHI-base 4 pipeline (1.0.0). Zenodo. https://doi.org/10.5281/zenodo.13773740

See the CITATION.cff file in this repository for more information.

Owner

  • Name: PHI-base
  • Login: PHI-base
  • Kind: organization
  • Email: contact@phi-base.org
  • Location: Rothamsted Research

Pathogen-Host Interaction Database

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: PHI-base 4 pipeline
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: James
    family-names: Seager
    email: james.seager@rothamsted.ac.uk
    affiliation: Rothamsted Research
    orcid: 'https://orcid.org/0000-0001-7487-610X'
identifiers:
  - type: doi
    value: 10.5281/zenodo.13773740
repository-code: 'https://github.com/PHI-base/phi4pipeline'
abstract: >-
  Python package and command-line application for releasing
  the PHI-base 4 dataset in Excel and CSV formats, and
  preparing metadata and files for the Zenodo repository.
keywords:
  - database
  - host-pathogen interactions
license: MIT
commit: 152293611cd0269f4cb71caba47e7f6ecd71e5f3
version: 1.0.0
date-released: '2024-09-17'

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Dependencies

pyproject.toml pypi
  • markdown ==3.7
  • pandas ==2.2.2
  • tabulate *