substrateminer: A Python package to investigate protein substrate repertoires - Published in JOSS (2025)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org
- ○ Academic email domains
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
A Python package to discover enzyme substrates based on sequence consensus
Basic Info
- Host: GitHub
- Owner: DPP4ResearchGroup
- License: mit
- Language: Python
- Default Branch: develop
- Homepage: https://bit.ly/substrateminer-manual
- Size: 5.15 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 2
- Open Issues: 16
- Releases: 0
Metadata Files
Readme.md
substrateminer

Overview
substrateminer is a Python package that offers a suite of discovery tools for investigating enzyme substrate repertoires based on sequence cleavage consensus.
TL;DR
Installation Guide
Because substrateminer has complex dependency requirements, we recommend installing it with conda. Please ensure that conda is installed on your system; if it is not, refer to the Miniconda installation guide.
First, download the latest release of substrateminer from the GitHub Releases page to a local path of your choice, then set up the required conda environment as described below. substrateminer supports both Linux and macOS; choose the environment file that matches your platform:
```
# For Linux users
$ cd substrateminer
$ conda env create -f environment-Linux.yml # Linux support
$ conda activate substrateminer
```
OR
```
# For macOS users
$ cd substrateminer
$ conda env create -f environment-macOS.yml # macOS support
$ conda activate substrateminer
```
Then install the substrateminer package with one of the following commands:
$ pip install . # utility mode
$ pip install -e . # debug mode
Test the installation with the help command:
$ substrateminer --help
Requirements
substrateminer requires the following dependencies:
- Python >= 3.10.11
- BioPython >= 1.84
- NumPy
- SciPy
- pandas
- matplotlib <= 3.6.0
- weblogo
- requests
- click
- PyYAML
- pillow
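To see which of the Python dependencies listed above are present in the active environment, a short check along the following lines can help. This is a minimal sketch using only the standard library; the distribution names are assumed to match the PyPI names of the packages listed above, and the script only reports versions without enforcing the stated minimums.

```python
from importlib import metadata

# Distribution names assumed to match the PyPI names of the listed dependencies.
REQUIRED = ["biopython", "numpy", "scipy", "pandas", "matplotlib",
            "weblogo", "requests", "click", "PyYAML", "pillow"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Run it inside the activated conda environment; any line reporting `NOT INSTALLED` points to a dependency that the environment file did not provide.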
Optional binary dependencies for multiple sequence alignment:
- Clustal Omega >= 1.2.4
- MUSCLE >= 5.1
- MAFFT >= 7.475
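Since the alignment binaries are optional, it can be useful to check which ones are reachable before requesting an alignment method. The sketch below is illustrative only: the executable names (`clustalo`, `mafft`, `muscle`) are assumptions based on how these tools are usually installed, and the function is not part of substrateminer.

```python
import shutil

# Alignment method name -> executable name. The executable names are
# assumptions based on each tool's usual install name.
ALIGNER_BINARIES = {
    "clustalomega": "clustalo",
    "mafft": "mafft",
    "muscle": "muscle",
}


def available_aligners(binaries=ALIGNER_BINARIES):
    """Return {method: path} for every aligner executable found on PATH."""
    found = {}
    for method, exe in binaries.items():
        path = shutil.which(exe)
        if path is not None:
            found[method] = path
    return found


if __name__ == "__main__":
    aligners = available_aligners()
    if aligners:
        for method, path in aligners.items():
            print(f"{method}: {path}")
    else:
        print("No alignment binaries found on PATH.")
```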
Quick Start Guide
substrateminer provides three main categories of functionality: motif, miner, and pathfinder. It also integrates multiple sequence alignment tools to facilitate the analysis.
```
Usage: substrateminer [OPTIONS] COMMAND [ARGS]...

  A suite to tools to discover enzyme substrates based on sequence consensus.

  Main entry point for substrateminer CLI.

Options:
  --help  Show this message and exit.

Commands:
  miner       Filter amino acid sequences from a reference file.
  motif       Interface for consensus sequence determination.
  msa         Interface for multi-sequence alignments
  pathfinder  Find the pathological/molecular path for a substrate.
```
Usage Examples
Consensus can be derived from a collection of sequences using the consensus subcommand.
$ substrateminer motif consensus -i unittests/data/msa_align.fas -O .
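To make the idea of a consensus concrete, the following sketch takes the most frequent non-gap residue in each alignment column. It is a simplified illustration under that majority-vote assumption, not the algorithm implemented by the `consensus` subcommand, and the toy alignment is invented for the example.

```python
from collections import Counter


def column_consensus(aligned, gap="-"):
    """Return the most common non-gap residue of each alignment column."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        residues = [c for c in column if c != gap]
        if not residues:
            # A column made only of gaps stays a gap in the consensus.
            consensus.append(gap)
        else:
            consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)


# Toy alignment, invented for illustration.
alignment = ["MKT-A", "MKTPA", "MRT-A"]
print(column_consensus(alignment))
```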
The conservation of the consensus can be visualised using the weblogo subcommand.
$ substrateminer motif weblogo -i unittests/data/weblogo_align.fas -o weblogo_output.png
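The letter heights in a sequence logo reflect how conserved each position is. As a rough illustration of that quantity, the sketch below computes the per-column information content in bits as log2(alphabet size) minus the Shannon entropy of the column. It assumes a 20-letter protein alphabet, ignores gaps, and omits any small-sample correction that logo tools may apply, so its values are not guaranteed to match the image produced by the subcommand.

```python
import math
from collections import Counter


def information_content(aligned, alphabet_size=20, gap="-"):
    """Per-column information content in bits (no small-sample correction)."""
    bits = []
    for column in zip(*aligned):
        residues = [c for c in column if c != gap]
        if not residues:
            bits.append(0.0)
            continue
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(residues).values())
        bits.append(math.log2(alphabet_size) - entropy)
    return bits


# First column is fully conserved; second column is split evenly between P and G.
for position, value in enumerate(information_content(["AP", "AG"]), start=1):
    print(f"position {position}: {value:.3f} bits")
```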
To identify potential substrates (the degradome) from a collection of sequences, commonly the proteome of a species, use the miner subcommand.
$ substrateminer miner --referencefile unittests/data/test-uniprot.txt --config unittests/test-config.yml --filtermode size --outmode inline
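The `size` and `motif` filter modes can be pictured as simple predicates over a set of sequences. The sketch below is a stand-alone illustration, not substrateminer's implementation: the function names, the length bounds, and the regular-expression motif are all hypothetical, and the layout of the YAML configuration file used by the real command is not assumed here.

```python
import re


def filter_by_size(records, min_len, max_len):
    """Keep (id, sequence) pairs whose length lies within [min_len, max_len]."""
    return [(rid, seq) for rid, seq in records if min_len <= len(seq) <= max_len]


def filter_by_motif(records, pattern):
    """Keep records whose sequence contains the regular-expression motif."""
    motif = re.compile(pattern)
    return [(rid, seq) for rid, seq in records if motif.search(seq)]


# Toy records, invented for illustration.
records = [("seq1", "MPAKLT"), ("seq2", "MKT"), ("seq3", "GRPRRWLLV")]

sized = filter_by_size(records, min_len=5, max_len=10)
# Hypothetical motif: proline or alanine at the second position.
hits = filter_by_motif(sized, r"^.[PA]")

print([rid for rid, _ in sized])
print([rid for rid, _ in hits])
```

Chaining the two filters, as done here, narrows the candidate set step by step; the real command selects one mode per run through `--filtermode`.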
To identify the molecular path for a substrate, the pathfinder subcommand can be used.
$ substrateminer pathfinder -i unittests/data/uniprot_id_short.txt -o path.txt -a
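The `-a` flag tells pathfinder to query the KEGG API. To show the kind of data involved, the sketch below builds KEGG REST URLs and parses the two-column, tab-separated text that the `conv` and `link` operations return. The helper names are hypothetical, the sample response is made up, and whether substrateminer uses exactly these endpoints is an assumption; no network request is made.

```python
def kegg_url(operation, target, query, base="https://rest.kegg.jp"):
    """Build a KEGG REST URL of the form <base>/<operation>/<target>/<query>."""
    return f"{base}/{operation}/{target}/{query}"


def parse_kegg_pairs(text):
    """Parse a two-column, tab-separated KEGG response into (source, target) pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pairs.append((fields[0], fields[1]))
    return pairs


# URL that would map a UniProt accession to KEGG gene identifiers.
print(kegg_url("conv", "genes", "uniprot:P12345"))

# Made-up response text in the two-column format (not real KEGG data).
sample = "up:P12345\thsa:0000\n"
print(parse_kegg_pairs(sample))
```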
Construct a customised workflow
substrateminer is designed to provide a suite of methods for investigating enzyme substrate repertoires based on sequence cleavage consensus. The package is modular and extensible, and can be used to design custom workflows. The following demonstrates a typical workflow:
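As an illustration only (this is not the authors' workflow diagram), the subcommands can also be chained from a Python script. The sketch below assembles the command lines for an align, consensus, and weblogo sequence. It assumes that `substrateminer msa` accepts the `-i/-o/-m` flags documented for `msa.py`, passes `-c 1` so that the consensus step does not prompt for input, and uses hypothetical file names.

```python
import shlex
import subprocess


def build_workflow(raw_fasta, aligned_fasta, outdir, method="mafft"):
    """Return the CLI invocations for align -> consensus -> weblogo, in order."""
    return [
        # Assumption: `substrateminer msa` takes the -i/-o/-m flags shown for msa.py.
        ["substrateminer", "msa", "-i", raw_fasta, "-o", aligned_fasta, "-m", method],
        # -c 1 selects the gap-frequency method up front, avoiding the interactive prompt.
        ["substrateminer", "motif", "consensus", "-i", aligned_fasta, "-O", outdir, "-c", "1"],
        ["substrateminer", "motif", "weblogo", "-i", aligned_fasta,
         "-o", f"{outdir}/weblogo_output.png"],
    ]


def run_workflow(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical file names; print the commands instead of running them.
commands = build_workflow("sequences.fas", "aligned.fas", "results")
for cmd in commands:
    print(shlex.join(cmd))
```

Calling `run_workflow(commands)` would execute the steps once substrateminer is installed and the input file exists; `check=True` makes a failing step raise instead of silently continuing.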

Methods and Functions Overview
Multiple Sequence Alignment (MSA)
```
usage: msa.py [-h] -i INPUT -o OUTPUT -m METHOD

Perform multiple sequence alignment

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input file path
  -o OUTPUT, --output OUTPUT
                        Output file path
  -m METHOD, --method METHOD
                        Alignment method (clustalomega, mafft, muscle)
```
Motif
```
usage: consensus.py [-h] {consensus,weblogo} ...

Determine consensus sequence from a multiple sequence alignment (MSA) and draw
summative plots.

positional arguments:
  {consensus,weblogo}
    consensus          Determine consensus sequence from a multiple sequence
                       alignment (MSA) and draw sequence entropy and gap
                       frequency plots.
    weblogo            Generate a weblogo image from an input file.

options:
  -h, --help           show this help message and exit
```
Consensus
```
usage: consensus.py consensus [-h] -i Input alignment file in FASTA format.
                              [-o Output gap stripped FASTA file name]
                              [-O Output directory]
                              [-c Method for removing insertions]
                              [-t Gap frequency threshold] [-f]

options:
  -h, --help            show this help message and exit
  -i Input alignment file in FASTA format.
                        Filename for FASTA alignment
  -o Output gap stripped FASTA file name
                        Output FASTA filename. If not given will use name of
                        input FASTA file as template to name output files.
  -O Output directory   Output directory for all output files. If not given
                        will use directory of input FASTA file.
  -c Method for removing insertions
                        Desired method for removing insertions.
                        1 = Positions with gap frequencies < threshold
                            (0.5 default, change with -t flag).
                        2 = Positions with residue as most frequent character.
                        3 = Positions with residues in a specific sequence.
                        If not given will ask for user input upon running
                        script. See README for further explantion of methods.
  -t Gap frequency threshold
                        Gap frequecy threshold to define a consensus
                        positions. Only valid for Option 1 for removing
                        insertions. Must be a value between 0 and 1
                        (default: 0.5)
  -f                    Include flag to prevent saving images of MSA data
                        analysis.
```
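Method 1 above keeps the alignment positions whose gap frequency is below the threshold. A minimal sketch of that rule follows; it is an illustration written from the help text, not the package's own code, and the function name and toy alignment are hypothetical.

```python
def strip_gappy_columns(aligned, threshold=0.5, gap="-"):
    """Keep only the columns whose gap frequency is below the threshold."""
    n = len(aligned)
    keep = [i for i, column in enumerate(zip(*aligned))
            if column.count(gap) / n < threshold]
    return ["".join(seq[i] for i in keep) for seq in aligned]


# Toy alignment, invented for illustration: the third column is 2/3 gaps.
alignment = ["MK-TA", "MKPTA", "M--TA"]
print(strip_gappy_columns(alignment, threshold=0.5))
```

Lowering the threshold removes more columns; with the default of 0.5, a column survives only when fewer than half of the sequences have a gap there.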
Weblogo
```
usage: consensus.py weblogo [-h] -i INPUTFILE -o FILENAME [-s RESOLUTION]
                            [-F FILETYPE]

options:
  -h, --help     show this help message and exit
  -i INPUTFILE   Input alignment file/self-aligned file in FASTA/text format.
  -o FILENAME    Output filename for weblogo image.
  -s RESOLUTION  Resolution of the weblogo image.
  -F FILETYPE    File type of the output image.
```
Miner
```
Usage: substrateminer miner [OPTIONS]

  Filter amino acid sequences from a reference file.

Options:
  --referencefile TEXT            The reference file containing sequences.
                                  [required]
  --referencetype [swiss|genbank|embl]
                                  The type of reference file. Default is
                                  swiss.
  --filtermode [size|motif|loc]   The mode of filtering. Default is size.
                                  [required]
  --config TEXT                   The path to the configuration file.
  --stats                         Generate statistics for the filtered
                                  sequences.
  --outmode [all|file|inline]     The output mode for the filtered sequences.
  --outputfilename TEXT           The output file name for the filtered
                                  sequences.
  --outputfiletype [fasta|text|txt|genbank|swiss]
                                  The output file type for the filtered
                                  sequences.
  --help                          Show this message and exit.
```
Pathfinder
```
Usage: substrateminer pathfinder [OPTIONS]

  Find the pathological/molecular path for a substrate.

Options:
  -i, --input PATH     Input file path
  -o, --output TEXT    Output file path
  -a, --api            Use KEGG API to retrieve pathways and diseases
  -u, --uniprots TEXT  UniProt ID for a protein, comma-separated for multiple
                       IDs (e.g., P12345,Q67890) or space-separated for
                       multiple IDs (e.g., "P12345 Q67890")
  -g, --orgs TEXT      Organism code for the KEGG API (default: hsa)
  --help               Show this message and exit.
```
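The `-u` option accepts UniProt IDs separated by commas or by spaces. One way to normalise both forms into a list is sketched below; the function name is hypothetical and this is not necessarily how substrateminer parses the value.

```python
import re


def parse_uniprot_ids(value):
    """Split a comma- and/or space-separated string into a list of IDs."""
    return [token for token in re.split(r"[,\s]+", value.strip()) if token]


print(parse_uniprot_ids("P12345,Q67890"))
print(parse_uniprot_ids("P12345 Q67890"))
```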
GitHub Actions CI manual
UnitTests Sequence
CI/CD is carried out with a GitHub Actions workflow that consists of the following steps:
- Checks out the repository.
- Sets up Python.
- Caches the conda environment.
- Installs Miniconda and creates the conda environment.
- Runs CLI tests with pip.
- Runs unit tests with unittest.
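A CLI smoke test in the spirit of the last two steps could look like the sketch below. It is an assumed example, not a test taken from the repository: it checks only that `substrateminer --help` exits cleanly and prints a usage line, and it is skipped when the command is not installed.

```python
import shutil
import subprocess
import unittest

CLI = "substrateminer"


@unittest.skipUnless(shutil.which(CLI), f"{CLI} is not installed")
class TestCliHelp(unittest.TestCase):
    def test_help_exits_cleanly(self):
        # The help command should succeed and print the usage banner.
        result = subprocess.run([CLI, "--help"], capture_output=True, text=True)
        self.assertEqual(result.returncode, 0)
        self.assertIn("Usage:", result.stdout)


def run_suite():
    """Load and run the smoke test, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCliHelp)
    return unittest.TextTestRunner(verbosity=2).run(suite)
```

Calling `run_suite()` runs the test directly; saved in a test module, the same class is also picked up by `python -m unittest`.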
License
This project is licensed under the MIT License. See the LICENSE file for details.
Issue/Bug Reporting
If you encounter any issues with substrateminer, please report them by opening a bug issue. Providing as much detail as possible, including examples, error messages, and your environment setup, is highly appreciated.
Contributing
We welcome contributions to substrateminer. To contribute, please follow the steps below:
1. Fork the repository to your designated location.
2. Create a new branch with a descriptive name for your proposed feature and/or bugfix.
3. Make your changes and commit them with clear and concise commit messages.
4. Push your changes to your forked repository.
5. Submit a pull request.
*Major changes:* please open an issue first to discuss what you would like to change.
Owner
- Name: DPP4ResearchGroup
- Login: DPP4ResearchGroup
- Kind: organization
- Location: Adelaide, Australia
- Website: https://dpp4research.page.link/home
- Repositories: 2
- Profile: https://github.com/DPP4ResearchGroup
DPP4 Research Group @ Flinders University
JOSS Publication
substrateminer: A Python package to investigate protein substrate repertoires
Authors
Tags
visualisation, enzyme, proteolysis, substrates, bioinformatics
GitHub Events
Total
- Pull request event: 1
- Pull request review comment event: 2
- Pull request review event: 2
- Fork event: 2
- Create event: 1
Last Year
- Pull request event: 1
- Pull request review comment event: 2
- Pull request review event: 2
- Fork event: 2
- Create event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- csoneson (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v4 composite
- actions/upload-artifact v4 composite
- openjournals/openjournals-draft-action master composite
- actions/checkout v4 composite
- actions/configure-pages v5 composite
- actions/deploy-pages v4 composite
- actions/jekyll-build-pages v1 composite
- actions/upload-pages-artifact v3 composite
- actions/cache v3 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- conda-incubator/setup-miniconda v3 composite
- actions/cache v3 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- conda-incubator/setup-miniconda v3 composite
- PyYAML *
- biopython >=1.84
- certifi *
- click *
- fonttools *
- matplotlib *
- numpy *
- pandas *
- pillow *
- requests *
- scipy *
- weblogo *
- PyYAML ==6.0.2
- biopython >=1.84
- certifi ==2024.8.30
- charset-normalizer ==3.3.2
- click ==8.1.7
- contourpy ==1.3.0
- cycler ==0.12.1
- fonttools ==4.54.1
- idna ==3.10
- kiwisolver ==1.4.7
- matplotlib <3.8.0,>=3.4.3
- mkl-service ==2.4.0
- modules ==1.0.0
- numpy ==1.26.4
- packaging ==24.1
- pillow ==10.4.0
- pyparsing ==3.1.4
- python-dateutil ==2.9.0
- requests ==2.32.3
- six ==1.16.0
- urllib3 ==2.2.3
