substrateminer: A Python package to investigate protein substrate repertoires - Published in JOSS (2025)

https://github.com/dpp4researchgroup/substrateminer

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in JOSS metadata
✓ Academic publication links: Links to joss.theoj.org
○ Academic email domains
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Keywords

bioinformatics proteomics substrate

Last synced: 5 months ago

Repository

A python package to discover enzyme substrates based on sequence consensus

Basic Info

Host: GitHub
Owner: DPP4ResearchGroup
License: mit
Language: Python
Default Branch: develop
Homepage: https://bit.ly/substrateminer-manual
Size: 5.15 MB

Statistics

Stars: 1
Watchers: 0
Forks: 2
Open Issues: 16
Releases: 0

Topics

bioinformatics proteomics substrate

Created over 1 year ago · Last pushed 11 months ago

Metadata Files

Readme License

substrateminer

Overview

substrateminer is a Python package that offers a suite of discovery tools for investigating enzyme substrate repertoires based on sequence cleavage consensus.

CI/CD Status

UnitTest Status

Unit test status badges: main, develop, and features branches, each on Linux and macOS.

Documentation Status

Documentation status badge: substrateminer.

TL;DR

Installation Guide

Because substrateminer has complex dependency requirements, installing it with conda is recommended. Please ensure that conda is installed on your system. If it is not, refer to the Miniconda installation guide.

First, download the latest release of substrateminer from the GitHub Releases page to a local path of your choice, then set up the required conda environment as instructed below. substrateminer supports both Linux and macOS; choose the environment file that matches your platform:

For Linux users

```
$ cd substrateminer
$ conda env create -f environment-Linux.yml # Linux support
$ conda activate substrateminer
```

For macOS users

```
$ cd substrateminer
$ conda env create -f environment-macOS.yml # macOS support
$ conda activate substrateminer
```

Then install the substrateminer package as follows:

$ pip install . # utility mode
$ pip install -e . # debug mode

Test the installation with the help command:

$ substrateminer --help

Requirements

substrateminer requires the following dependencies:

- Python >= 3.10.11
- BioPython >= 1.84
- Numpy
- SciPy
- Pandas
- matplotlib <= 3.6.0
- weblogo
- requests
- click
- PyYAML
- pillow

Optional binary dependencies for multiple sequence alignment:

- Clustal Omega >= 1.2.4
- MUSCLE >= 5.1
- MAFFT >= 7.475
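The minimum versions listed for Python and BioPython can be checked from within the activated environment. The following is a minimal sketch, not part of substrateminer: the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, and the version parsing is deliberately simple (numeric components only).

```python
import sys
from importlib import metadata

# Minimum versions copied from the requirements list in this README.
MIN_PYTHON = (3, 10, 11)
MIN_PACKAGES = {"biopython": "1.84"}


def version_tuple(text):
    """Convert a dotted version such as '1.84' into a tuple of ints, (1, 84).

    Only the leading digits of each component are used, so '0rc1' becomes 0.
    """
    numbers = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        problems.append("Python is older than 3.10.11")
    for name, minimum in MIN_PACKAGES.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems
```

Calling `check_environment()` returns the list of problems, which can be printed before running the command line tools.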

Quick Start Guide

substrateminer provides three main categories of functionality: motif, miner, and pathfinder. It also integrates multiple sequence alignment (MSA) tools to facilitate the analysis.

```
Usage: substrateminer [OPTIONS] COMMAND [ARGS]...

  A suite to tools to discover enzyme substrates based on sequence consensus.

  Main entry point for substrateminer CLI.

Options:
  --help  Show this message and exit.

Commands:
  miner       Filter amino acid sequences from a reference file.
  motif       Interface for consensus sequence determination.
  msa         Interface for multi-sequence alignments
  pathfinder  Find the pathological/molecular path for a substrate.
```

Usage Examples

Motif

Consensus can be derived from a collection of sequences using the consensus subcommand.

$ substrateminer motif consensus -i unittests/data/msa_align.fas -O .

The conservation of the consensus can be visualised using the weblogo subcommand.

$ substrateminer motif weblogo -i unittests/data/weblogo_align.fas -o weblogo_output.png

Miner

To identify potential substrates (the degradome) from a collection of sequences (commonly the proteome of a species), use the miner subcommand.

$ substrateminer miner --referencefile unittests/data/test-uniprot.txt --config unittests/test-config.yml --filtermode size --outmode inline
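To illustrate the idea behind the size and motif filter modes, here is a minimal sketch in plain Python. It is not substrateminer's implementation: the function names, the toy sequences, and the motif pattern are all hypothetical, and the format of the YAML configuration file is not described here, so the filter parameters are passed directly as arguments.

```python
import re


def filter_by_size(records, min_len, max_len):
    """Keep sequences whose length lies within [min_len, max_len]."""
    return {
        name: seq
        for name, seq in records.items()
        if min_len <= len(seq) <= max_len
    }


def filter_by_motif(records, pattern):
    """Keep sequences that contain a match for the regular expression."""
    regex = re.compile(pattern)
    return {name: seq for name, seq in records.items() if regex.search(seq)}


# Toy records for illustration; real input would be parsed from a
# Swiss-Prot, GenBank, or EMBL reference file.
records = {
    "seq1": "MKTAYIAKQR",
    "seq2": "MAPLE",
    "seq3": "MKPVTLYDVAEYAGVSYQTVSRVVNQ",
}

# Size filter: keep sequences of 5 to 10 residues.
small = filter_by_size(records, 5, 10)

# Motif filter: an arbitrary example pattern requiring P or A as the
# second residue of the sequence.
with_motif = filter_by_motif(records, r"^.[PA]")

print(sorted(small))
print(sorted(with_motif))
```

Because each filter takes and returns the same dictionary shape, filters can be chained, for example applying the motif filter to the result of the size filter.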

Pathfinder

To identify the molecular path for a substrate, the pathfinder subcommand can be used.

$ substrateminer pathfinder -i unittests/data/uniprot_id_short.txt -o path.txt -a

Construct a customised workflow

substrateminer is designed to provide a suite of methods for investigating enzyme substrate repertoires based on sequence cleavage consensus. The package is modular and extensible, and can be used to design custom workflows. The following demonstrates a typical workflow:

[Figure: Design Workflow]

Methods and Functions Overview

Multiple Sequence Alignment (MSA)

```
usage: msa.py [-h] -i INPUT -o OUTPUT -m METHOD

Perform multiple sequence alignment

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input file path
  -o OUTPUT, --output OUTPUT
                        Output file path
  -m METHOD, --method METHOD
                        Alignment method (clustalomega, mafft, muscle)
```
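The three alignment methods correspond to external binaries that must be available on the PATH. As a rough sketch of how such a wrapper might dispatch to them (this is an assumption about the general approach, not the code in msa.py; the function names are hypothetical), the command lines could be built like this:

```python
import subprocess


def build_alignment_command(method, input_path, output_path):
    """Return (argv, writes_stdout) for the chosen aligner.

    writes_stdout is True when the tool prints the alignment to standard
    output instead of writing the output file itself.
    """
    if method == "clustalomega":
        # --force allows overwriting an existing output file.
        return ["clustalo", "-i", input_path, "-o", output_path, "--force"], False
    if method == "muscle":
        # MUSCLE 5 command line syntax.
        return ["muscle", "-align", input_path, "-output", output_path], False
    if method == "mafft":
        # MAFFT writes the alignment to standard output.
        return ["mafft", "--auto", input_path], True
    raise ValueError(f"Unknown alignment method: {method}")


def run_alignment(method, input_path, output_path):
    """Run the selected aligner and make sure the result lands in output_path."""
    argv, writes_stdout = build_alignment_command(method, input_path, output_path)
    if writes_stdout:
        with open(output_path, "w") as handle:
            subprocess.run(argv, stdout=handle, check=True)
    else:
        subprocess.run(argv, check=True)
```

Separating command construction from execution keeps the dispatch logic testable without the aligners being installed.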

Motif

```
usage: consensus.py [-h] {consensus,weblogo} ...

Determine consensus sequence from a multiple sequence alignment (MSA) and draw summative plots.

positional arguments:
  {consensus,weblogo}
    consensus          Determine consensus sequence from a multiple sequence alignment (MSA) and draw sequence entropy and gap frequency plots.
    weblogo            Generate a weblogo image from an input file.

options:
  -h, --help           show this help message and exit
```
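The sequence entropy and gap frequency mentioned in the help text are per-column statistics of the alignment. The sketch below shows one common way to compute them, assuming `-` as the gap character and Shannon entropy in bits over the non-gap residues; substrateminer's own definitions may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def columns(alignment):
    """Transpose a list of equal-length aligned sequences into columns."""
    return ["".join(chars) for chars in zip(*alignment)]


def gap_frequency(column):
    """Fraction of sequences that have a gap at this position."""
    return column.count("-") / len(column)


def shannon_entropy(column):
    """Shannon entropy (bits) of the non-gap characters in a column."""
    residues = [ch for ch in column if ch != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Toy alignment for illustration only.
alignment = ["MK-A", "MR-A", "MKTA", "MRTG"]

for position, column in enumerate(columns(alignment), start=1):
    print(position, column, gap_frequency(column), round(shannon_entropy(column), 3))
```

A fully conserved column has an entropy of zero, and the entropy grows as the residues at a position become more evenly mixed.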

Consensus

```
usage: consensus.py consensus [-h] -i Input alignment file in FASTA format.
                              [-o Output gap stripped FASTA file name]
                              [-O Output directory]
                              [-c Method for removing insertions]
                              [-t Gap frequency threshold] [-f]

options:
  -h, --help            show this help message and exit
  -i Input alignment file in FASTA format.
                        Filename for FASTA alignment
  -o Output gap stripped FASTA file name
                        Output FASTA filename. If not given will use name of input FASTA file as template to name output files.
  -O Output directory   Output directory for all output files. If not given will use directory of input FASTA file.
  -c Method for removing insertions
                        Desired method for removing insertions.
                        1 = Positions with gap frequencies < threshold (0.5 default, change with -t flag).
                        2 = Positions with residue as most frequent character.
                        3 = Positions with residues in a specific sequence.
                        If not given will ask for user input upon running script. See README for further explantion of methods.
  -t Gap frequency threshold
                        Gap frequecy threshold to define a consensus positions. Only valid for Option 1 for removing insertions. Must be a value between 0 and 1 (default: 0.5)
  -f                    Include flag to prevent saving images of MSA data analysis.
```

Weblogo

```
usage: consensus.py weblogo [-h] -i INPUTFILE -o FILENAME [-s RESOLUTION] [-F FILETYPE]

options:
  -h, --help     show this help message and exit
  -i INPUTFILE   Input alignment file/self-aligned file in FASTA/text format.
  -o FILENAME    Output filename for weblogo image.
  -s RESOLUTION  Resolution of the weblogo image.
  -F FILETYPE    File type of the output image.
```

Miner

```
Usage: substrateminer miner [OPTIONS]

  Filter amino acid sequences from a reference file.

Options:
  --referencefile TEXT            The reference file containing sequences. [required]
  --referencetype [swiss|genbank|embl]
                                  The type of reference file. Default is swiss.
  --filtermode [size|motif|loc]   The mode of filtering. Default is size. [required]
  --config TEXT                   The path to the configuration file.
  --stats                         Generate statistics for the filtered sequences.
  --outmode [all|file|inline]     The output mode for the filtered sequences.
  --outputfilename TEXT           The output file name for the filtered sequences.
  --outputfiletype [fasta|text|txt|genbank|swiss]
                                  The output file type for the filtered sequences.
  --help                          Show this message and exit.
```

Pathfinder

```
Usage: substrateminer pathfinder [OPTIONS]

  Find the pathological/molecular path for a substrate.

Options:
  -i, --input PATH     Input file path
  -o, --output TEXT    Output file path
  -a, --api            Use KEGG API to retrieve pathways and diseases
  -u, --uniprots TEXT  UniProt ID for a protein, comma-separated for multiple IDs (e.g., P12345,Q67890) or space-separated for multiple IDs (e.g., "P12345 Q67890")
  -g, --orgs TEXT      Organism code for the KEGG API (default: hsa)
  --help               Show this message and exit.
```

GitHub Actions CI manual

UnitTests Sequence

CI/CD is carried out with a GitHub Actions workflow, which consists of the following steps:

Checks out the repository.
Sets up Python.
Caches the conda environment.
Installs Miniconda and creates the conda environment.
Runs CLI tests with pip.
Runs unit tests with unittest.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Issue/Bug Reporting

If you encounter any issues with substrateminer, please report them by opening a bug issue. Provide as much detail as possible; examples, error messages, and a description of your environment setup are highly appreciated.

Contributing

We welcome contributions to substrateminer. To contribute, please follow the steps below:

1. Fork the repository to your designated location.
2. Create a new branch with a descriptive name for your proposed feature and/or bugfix.
3. Make your changes and commit them with clear and concise commit messages.
4. Push your changes to your forked repository.
5. Submit a pull request.

*Major changes:* please open an issue first to discuss what you would like to change.

Owner

Name: DPP4ResearchGroup
Login: DPP4ResearchGroup
Kind: organization
Location: Adelaide, Australia

Website: https://dpp4research.page.link/home
Repositories: 2
Profile: https://github.com/DPP4ResearchGroup

DPP4 Research Group @ Flinders University

JOSS Publication

substrateminer: A Python package to investigate protein substrate repertoires

Published

September 12, 2025

DOI

10.21105/joss.08266

Volume 10, Issue 113, Page 8266

Authors

Robert Qiao

School of Biological Sciences, Flinders University, Bedford Park, SA 5042, Australia, Digital Research Services, Flinders University, Bedford Park, SA 5042, Australia

Editor

Charlotte Soneson

GitHub Events

Total

Pull request event: 1
Pull request review comment event: 2
Pull request review event: 2
Fork event: 2
Create event: 1

Last Year

Pull request event: 1
Pull request review comment event: 2
Pull request review event: 2
Fork event: 2
Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

Pull Request Authors

csoneson (1)

Dependencies

.github/workflows/draft-pdf.yml actions

actions/checkout v4 composite
actions/upload-artifact v4 composite
openjournals/openjournals-draft-action master composite

.github/workflows/jekyll-ci.yml actions

actions/checkout v4 composite
actions/configure-pages v5 composite
actions/deploy-pages v4 composite
actions/jekyll-build-pages v1 composite
actions/upload-pages-artifact v3 composite

.github/workflows/python-ci.yml actions

actions/cache v3 composite
actions/checkout v4 composite
actions/setup-python v5 composite
conda-incubator/setup-miniconda v3 composite

.github/workflows/substrateminer-mac.yml actions

actions/cache v3 composite
actions/checkout v4 composite
actions/setup-python v5 composite
conda-incubator/setup-miniconda v3 composite

pyproject.toml pypi

PyYAML *
biopython >=1.84
certifi *
click *
fonttools *
matplotlib *
numpy *
pandas *
pillow *
requests *
scipy *
weblogo *

requirements.txt pypi

PyYAML ==6.0.2
biopython >=1.84
certifi ==2024.8.30
charset-normalizer ==3.3.2
click ==8.1.7
contourpy ==1.3.0
cycler ==0.12.1
fonttools ==4.54.1
idna ==3.10
kiwisolver ==1.4.7
matplotlib <3.8.0,>=3.4.3
mkl-service ==2.4.0
modules ==1.0.0
numpy ==1.26.4
packaging ==24.1
pillow ==10.4.0
pyparsing ==3.1.4
python-dateutil ==2.9.0
requests ==2.32.3
six ==1.16.0
urllib3 ==2.2.3

setup.py pypi
