substrateminer: A Python package to investigate protein substrate repertoires

substrateminer: A Python package to investigate protein substrate repertoires - Published in JOSS (2025)

https://github.com/dpp4researchgroup/substrateminer

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

bioinformatics proteomics substrate
Last synced: 5 months ago

Repository

A Python package to discover enzyme substrates based on sequence consensus

Basic Info
Statistics
  • Stars: 1
  • Watchers: 0
  • Forks: 2
  • Open Issues: 16
  • Releases: 0
Topics
bioinformatics proteomics substrate
Created over 1 year ago · Last pushed 11 months ago
Metadata Files
Readme License

Readme.md

substrateminer


Overview


substrateminer is a Python package that offers a suite of discovery tools to investigate enzyme substrate repertoires based on sequence cleavage consensus.

CI/CD Status

UnitTest Status

| Branch | main | develop | features |
|:-------|:-----|:--------|:---------|
| Linux | substrateminer-main | substrateminer-dev | substrateminer-features |
| macOS | substrateminer-main | substrateminer-dev | substrateminer-features |

Documentation Status

| Page | Status |
|:---------------|:-----------------------|
| substrateminer | pages-build-deployment |

TL;DR

Installation Guide

Because substrateminer has complex dependency requirements, installing it in a conda environment is recommended. Please ensure that conda is installed on your system; if it is not, refer to the Miniconda installation guide.

First, download a copy of the latest release of substrateminer from the GitHub Releases page to a local path of your choice, then set up the required conda environment as instructed below. substrateminer supports both Linux and macOS; choose the environment file that matches your platform:

```
# For Linux users
$ cd substrateminer
$ conda env create -f environment-Linux.yml # Linux support
$ conda activate substrateminer
```

OR

```
# For macOS users
$ cd substrateminer
$ conda env create -f environment-macOS.yml # macOS support
$ conda activate substrateminer
```

Then install the substrateminer package:

$ pip install . # utility mode

$ pip install -e . # debug mode

Test the installation with the help command:

$ substrateminer --help

Requirements

substrateminer requires the following dependencies:

- Python >= 3.10.11
- BioPython >= 1.84
- Numpy
- SciPy
- Pandas
- matplotlib <= 3.6.0
- weblogo
- requests
- click
- PyYAML
- pillow

Optional binary dependencies for multiple sequence alignment:

- Clustal Omega >= 1.2.4
- MUSCLE >= 5.1
- MAFFT >= 7.475
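A quick way to confirm that the dependencies above are present is to probe for them from Python. The sketch below is not part of substrateminer; the executable names `clustalo`, `muscle`, and `mafft` are assumed to be how the optional aligners appear on `PATH`, and the import names are the usual ones for each package.

```python
import importlib.util
import shutil

# Import names for the Python dependencies listed in the requirements.
# The import name can differ from the package name
# (BioPython -> Bio, PyYAML -> yaml, pillow -> PIL).
PYTHON_MODULES = [
    "Bio", "numpy", "scipy", "pandas", "matplotlib",
    "weblogo", "requests", "click", "yaml", "PIL",
]

# Assumed executable names for the optional alignment tools.
ALIGNER_BINARIES = {
    "Clustal Omega": "clustalo",
    "MUSCLE": "muscle",
    "MAFFT": "mafft",
}


def check_environment():
    """Return (missing_modules, missing_aligners) without importing any package."""
    missing_modules = [
        name for name in PYTHON_MODULES
        if importlib.util.find_spec(name) is None
    ]
    missing_aligners = [
        label for label, exe in ALIGNER_BINARIES.items()
        if shutil.which(exe) is None
    ]
    return missing_modules, missing_aligners
```

Calling `check_environment()` inside the activated conda environment should return two empty lists when everything is installed. It does not check version constraints.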

Quick Start Guide

substrateminer provides three main categories of functionality: motif, miner, and pathfinder. It also integrates multiple sequence alignment tools to facilitate the analysis.

```
Usage: substrateminer [OPTIONS] COMMAND [ARGS]...

  A suite to tools to discover enzyme substrates based on sequence consensus.

  Main entry point for substrateminer CLI.

Options:
  --help  Show this message and exit.

Commands:
  miner       Filter amino acid sequences from a reference file.
  motif       Interface for consensus sequence determination.
  msa         Interface for multi-sequence alignments
  pathfinder  Find the pathological/molecular path for a substrate.
```

Usage Examples

Motif

Consensus can be derived from a collection of sequences using the consensus subcommand.

$ substrateminer motif consensus -i unittests/data/msa_align.fas -O .
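To make the idea of a consensus concrete, here is a minimal sketch of a majority-vote consensus over an already aligned set of sequences. It is an illustration only: the function name `majority_consensus` is hypothetical, and the source does not state that substrateminer computes its consensus this way.

```python
from collections import Counter


def majority_consensus(aligned):
    """Return the most frequent character in each column of an alignment."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        # most_common(1) yields [(character, count)] for the top character.
        character, _count = Counter(column).most_common(1)[0]
        consensus.append(character)
    return "".join(consensus)
```

Gap characters are counted like any other character here, so a column dominated by gaps yields a gap in the consensus.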

The conservation of the consensus can be visualised using the weblogo subcommand.

$ substrateminer motif weblogo -i unittests/data/weblogo_align.fas -o weblogo_output.png
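The height of each stack in a sequence logo reflects how conserved that position is, measured in bits. The sketch below computes an uncorrected per-column information content for a protein alignment as log2(20) minus the Shannon entropy. It is a simplification written for illustration: the function name is hypothetical, and WebLogo itself additionally applies a small-sample correction that is omitted here.

```python
import math
from collections import Counter


def column_information(aligned, alphabet_size=20, gap="-"):
    """Information content in bits for each alignment column (gaps ignored)."""
    max_bits = math.log2(alphabet_size)
    bits = []
    for column in zip(*aligned):
        residues = [c for c in column if c != gap]
        if not residues:
            # A column made only of gaps carries no information.
            bits.append(0.0)
            continue
        total = len(residues)
        entropy = -sum(
            (n / total) * math.log2(n / total)
            for n in Counter(residues).values()
        )
        bits.append(max_bits - entropy)
    return bits
```

A fully conserved column reaches the maximum of log2(20) bits, while a column split evenly between two residues loses exactly one bit.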

Miner

To identify potential substrates (the degradome) from a collection of sequences (commonly the proteome of a species), use the miner subcommand.

$ substrateminer miner --referencefile unittests/data/test-uniprot.txt --config unittests/test-config.yml --filtermode size --outmode inline
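The command above filters in `size` mode. As a rough illustration of what a size filter does, the sketch below keeps sequences whose length falls inside a range. The function name, the dictionary input, and the `min_length`/`max_length` parameters are assumptions made for this example; the actual keys expected in the configuration file are not described in this README.

```python
def filter_by_size(records, min_length=0, max_length=None):
    """Keep identifier -> sequence pairs whose length is within the bounds.

    Both bounds are inclusive; max_length=None means no upper bound.
    """
    kept = {}
    for identifier, sequence in records.items():
        length = len(sequence)
        if length < min_length:
            continue
        if max_length is not None and length > max_length:
            continue
        kept[identifier] = sequence
    return kept
```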

Pathfinder

To identify the molecular path for a substrate, the pathfinder subcommand can be used.

$ substrateminer pathfinder -i unittests/data/uniprot_id_short.txt -o path.txt -a
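The `-a` flag asks pathfinder to use the KEGG API. The sketch below shows one way such lookups can be expressed against the public KEGG REST service, using its `conv` and `link` operations. Whether substrateminer calls these particular endpoints is an assumption; the helper names are hypothetical, and `fetch` performs a network request and is therefore defined but not called here.

```python
import urllib.request

KEGG_REST = "https://rest.kegg.jp"


def conv_url(uniprot_id):
    """URL that converts a UniProt accession to KEGG gene identifiers."""
    return f"{KEGG_REST}/conv/genes/uniprot:{uniprot_id}"


def link_url(kegg_gene, target="pathway"):
    """URL listing entries of `target` (e.g. pathway, disease) linked to a gene."""
    return f"{KEGG_REST}/link/{target}/{kegg_gene}"


def parse_pairs(text):
    """Parse a tab-separated two-column KEGG response into (source, target) tuples."""
    pairs = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            pairs.append((parts[0], parts[1]))
    return pairs


def fetch(url, timeout=30):
    """Download a KEGG REST response as text (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A lookup would chain these: fetch `conv_url(...)`, take the KEGG gene identifier from the parsed pairs, then fetch `link_url(...)` for pathways or diseases.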

Construct a customised workflow

substrateminer is designed to provide a suite of methods to investigate enzyme substrate repertoires based on sequence cleavage consensus. The package is modular and extensible, and can be used to design custom workflows. The following demonstrates a typical workflow:

Design Workflow
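As a rough sketch of how the subcommands could be chained from a script, the example below builds the command lines for an alignment, consensus, logo, and mining run and executes them in order. The ordering, the intermediate file names, and the assumption that `substrateminer msa` accepts the `-i/-o/-m` flags listed for `msa.py` are all illustrative; the flags for the other steps are taken from the usage examples and help text in this README.

```python
import subprocess


def build_workflow(sequences, reference, config, workdir="."):
    """Return CLI invocations for one possible msa -> motif -> miner run."""
    aligned = f"{workdir}/aligned.fas"  # hypothetical intermediate file name
    logo = f"{workdir}/logo.png"        # hypothetical output file name
    return [
        # Assumes the msa subcommand takes the flags documented for msa.py.
        ["substrateminer", "msa", "-i", sequences, "-o", aligned, "-m", "mafft"],
        # -c 1 selects the gap-frequency method so the step does not prompt.
        ["substrateminer", "motif", "consensus", "-i", aligned, "-O", workdir, "-c", "1"],
        ["substrateminer", "motif", "weblogo", "-i", aligned, "-o", logo],
        ["substrateminer", "miner", "--referencefile", reference,
         "--config", config, "--filtermode", "motif", "--outmode", "inline"],
    ]


def run_workflow(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it easy to print or review the planned steps before running them.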

Methods and Functions Overview

Multiple Sequence Alignment (MSA)

```
usage: msa.py [-h] -i INPUT -o OUTPUT -m METHOD

Perform multiple sequence alignment

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input file path
  -o OUTPUT, --output OUTPUT
                        Output file path
  -m METHOD, --method METHOD
                        Alignment method (clustalomega, mafft, muscle)
```
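The `-m` option selects one of three external aligners. The sketch below shows how such a dispatch might map a method name to a command line, using each tool's standard invocation (`clustalo -i/-o`, MUSCLE 5's `-align/-output`, and MAFFT writing to standard output). The function names are hypothetical and the exact arguments substrateminer passes to each tool may differ.

```python
import subprocess


def build_aligner_command(method, input_path, output_path):
    """Map a method name to the external aligner command line."""
    if method == "clustalomega":
        return ["clustalo", "-i", input_path, "-o", output_path, "--force"]
    if method == "muscle":
        # MUSCLE 5 syntax; earlier major versions use different flags.
        return ["muscle", "-align", input_path, "-output", output_path]
    if method == "mafft":
        # MAFFT writes the alignment to standard output.
        return ["mafft", "--auto", input_path]
    raise ValueError(f"unknown alignment method: {method}")


def run_alignment(method, input_path, output_path):
    """Run the chosen aligner and leave the result in output_path."""
    command = build_aligner_command(method, input_path, output_path)
    if method == "mafft":
        with open(output_path, "w") as handle:
            subprocess.run(command, stdout=handle, check=True)
    else:
        subprocess.run(command, check=True)
```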

Motif

```
usage: consensus.py [-h] {consensus,weblogo} ...

Determine consensus sequence from a multiple sequence alignment (MSA) and draw
summative plots.

positional arguments:
  {consensus,weblogo}
    consensus          Determine consensus sequence from a multiple sequence
                       alignment (MSA) and draw sequence entropy and gap
                       frequency plots.
    weblogo            Generate a weblogo image from an input file.

options:
  -h, --help           show this help message and exit
```

Consensus

```
usage: consensus.py consensus [-h] -i Input alignment file in FASTA format.
                              [-o Output gap stripped FASTA file name]
                              [-O Output directory]
                              [-c Method for removing insertions]
                              [-t Gap frequency threshold] [-f]

options:
  -h, --help            show this help message and exit
  -i Input alignment file in FASTA format.
                        Filename for FASTA alignment
  -o Output gap stripped FASTA file name
                        Output FASTA filename. If not given will use name of
                        input FASTA file as template to name output files.
  -O Output directory   Output directory for all output files. If not given
                        will use directory of input FASTA file.
  -c Method for removing insertions
                        Desired method for removing insertions. 1 = Positions
                        with gap frequencies < threshold (0.5 default, change
                        with -t flag). 2 = Positions with residue as most
                        frequent character. 3 = Positions with residues in a
                        specific sequence. If not given will ask for user
                        input upon running script. See README for further
                        explantion of methods.
  -t Gap frequency threshold
                        Gap frequecy threshold to define a consensus
                        positions. Only valid for Option 1 for removing
                        insertions. Must be a value between 0 and 1 (default:
                        0.5)
  -f                    Include flag to prevent saving images of MSA data
                        analysis.
```
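Method 1 for removing insertions keeps only the positions whose gap frequency is below the threshold. The sketch below illustrates that rule on a list of aligned strings; the function names are hypothetical and this is not the package's own implementation.

```python
def gap_frequencies(aligned, gap="-"):
    """Fraction of sequences that have a gap at each alignment column."""
    n_sequences = len(aligned)
    return [column.count(gap) / n_sequences for column in zip(*aligned)]


def strip_gappy_columns(aligned, threshold=0.5, gap="-"):
    """Keep only columns whose gap frequency is strictly below the threshold."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    frequencies = gap_frequencies(aligned, gap)
    keep = [i for i, freq in enumerate(frequencies) if freq < threshold]
    return ["".join(seq[i] for i in keep) for seq in aligned]
```

With the default threshold of 0.5, a column that is a gap in two of three sequences is dropped, while a column with a single gap in three sequences is kept.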

Weblogo

```
usage: consensus.py weblogo [-h] -i INPUTFILE -o FILENAME [-s RESOLUTION]
                            [-F FILETYPE]

options:
  -h, --help     show this help message and exit
  -i INPUTFILE   Input alignment file/self-aligned file in FASTA/text format.
  -o FILENAME    Output filename for weblogo image.
  -s RESOLUTION  Resolution of the weblogo image.
  -F FILETYPE    File type of the output image.
```

Miner

```
Usage: substrateminer miner [OPTIONS]

  Filter amino acid sequences from a reference file.

Options:
  --referencefile TEXT            The reference file containing sequences.
                                  [required]
  --referencetype [swiss|genbank|embl]
                                  The type of reference file. Default is
                                  swiss.
  --filtermode [size|motif|loc]   The mode of filtering. Default is size.
                                  [required]
  --config TEXT                   The path to the configuration file.
  --stats                         Generate statistics for the filtered
                                  sequences.
  --outmode [all|file|inline]     The output mode for the filtered sequences.
  --outputfilename TEXT           The output file name for the filtered
                                  sequences.
  --outputfiletype [fasta|text|txt|genbank|swiss]
                                  The output file type for the filtered
                                  sequences.
  --help                          Show this message and exit.
```
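Besides `size`, the filter mode can be `motif`. As an illustration of motif-based filtering, the sketch below keeps sequences that match a regular expression and records where the first match starts. The function name and the use of a regular expression are assumptions for this example; how the configuration file actually encodes a motif is not described here.

```python
import re


def filter_by_motif(records, pattern):
    """Return identifier -> start index of the first motif match.

    Sequences without a match are left out of the result.
    """
    regex = re.compile(pattern)
    hits = {}
    for identifier, sequence in records.items():
        match = regex.search(sequence)
        if match is not None:
            hits[identifier] = match.start()
    return hits
```

An anchored pattern such as `^.[PA]` restricts matches to the N-terminus, while an unanchored pattern finds the motif anywhere in the sequence.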

Pathfinder

```
Usage: substrateminer pathfinder [OPTIONS]

  Find the pathological/molecular path for a substrate.

Options:
  -i, --input PATH     Input file path
  -o, --output TEXT    Output file path
  -a, --api            Use KEGG API to retrieve pathways and diseases
  -u, --uniprots TEXT  UniProt ID for a protein, comma-separated for multiple
                       IDs (e.g., P12345,Q67890) or space-separated for
                       multiple IDs (e.g., "P12345 Q67890")
  -g, --orgs TEXT      Organism code for the KEGG API (default: hsa)
  --help               Show this message and exit.
```
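The `-u` option accepts either comma-separated or space-separated identifiers. A minimal sketch of parsing such a value is shown below; the function name is hypothetical, and dropping duplicates is a choice made for this example rather than documented behaviour.

```python
import re


def parse_uniprot_ids(raw):
    """Split a string of IDs on commas and/or whitespace, keeping first-seen order."""
    ids = []
    for token in re.split(r"[,\s]+", raw.strip()):
        # Skip empty tokens and identifiers that were already collected.
        if token and token not in ids:
            ids.append(token)
    return ids
```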

GitHub Actions CI manual

UnitTests Sequence

CI/CD is carried out with a GitHub Actions workflow that consists of the following steps:

  • Checks out the repository.
  • Sets up Python.
  • Caches the conda environment.
  • Installs Miniconda and creates the conda environment.
  • Runs CLI tests with pip.
  • Runs unit tests with unittest.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Issue/Bug Reporting

If you encounter any issues with substrateminer, please report them by opening a bug issue. Providing as much detail as possible, including examples, error messages, and your environment setup, will be highly appreciated.

Contributing

We welcome contributions to substrateminer. To contribute, please follow the steps below:

1. Fork the repository to your designated location.
2. Create a new branch with a descriptive name for your proposed feature and/or bugfix.
3. Make your changes and commit them with clear and concise commit messages.
4. Push your changes to your forked repository.
5. Submit a pull request.

*Major changes:* please open an issue first to discuss what you would like to change.

Owner

  • Name: DPP4ResearchGroup
  • Login: DPP4ResearchGroup
  • Kind: organization
  • Location: Adelaide, Australia

DPP4 Research Group @ Flinders University

JOSS Publication

substrateminer: A Python package to investigate protein substrate repertoires
Published
September 12, 2025
Volume 10, Issue 113, Page 8266
Authors
Robert Qiao ORCID
School of Biological Sciences, Flinders University, Bedford Park, SA 5042, Australia; Digital Research Services, Flinders University, Bedford Park, SA 5042, Australia
Editor
Charlotte Soneson ORCID
Tags
visualisation enzyme proteolysis substrates bioinformatics

GitHub Events

Total
  • Pull request event: 1
  • Pull request review comment event: 2
  • Pull request review event: 2
  • Fork event: 2
  • Create event: 1
Last Year
  • Pull request event: 1
  • Pull request review comment event: 2
  • Pull request review event: 2
  • Fork event: 2
  • Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • csoneson (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/draft-pdf.yml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v4 composite
  • openjournals/openjournals-draft-action master composite
.github/workflows/jekyll-ci.yml actions
  • actions/checkout v4 composite
  • actions/configure-pages v5 composite
  • actions/deploy-pages v4 composite
  • actions/jekyll-build-pages v1 composite
  • actions/upload-pages-artifact v3 composite
.github/workflows/python-ci.yml actions
  • actions/cache v3 composite
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • conda-incubator/setup-miniconda v3 composite
.github/workflows/substrateminer-mac.yml actions
  • actions/cache v3 composite
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • conda-incubator/setup-miniconda v3 composite
pyproject.toml pypi
  • PyYAML *
  • biopython >=1.84
  • certifi *
  • click *
  • fonttools *
  • matplotlib *
  • numpy *
  • pandas *
  • pillow *
  • requests *
  • scipy *
  • weblogo *
requirements.txt pypi
  • PyYAML ==6.0.2
  • biopython >=1.84
  • certifi ==2024.8.30
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • contourpy ==1.3.0
  • cycler ==0.12.1
  • fonttools ==4.54.1
  • idna ==3.10
  • kiwisolver ==1.4.7
  • matplotlib <3.8.0,>=3.4.3
  • mkl-service ==2.4.0
  • modules ==1.0.0
  • numpy ==1.26.4
  • packaging ==24.1
  • pillow ==10.4.0
  • pyparsing ==3.1.4
  • python-dateutil ==2.9.0
  • requests ==2.32.3
  • six ==1.16.0
  • urllib3 ==2.2.3
setup.py pypi