p2smi: A toolkit enabling SMILES generation and property analysis for noncanonical and cyclized peptides

p2smi: A toolkit enabling SMILES generation and property analysis for noncanonical and cyclized peptides - Published in JOSS (2025)

https://github.com/aaronfeller/p2smi

Last synced: 3 months ago

Repository

Basic Info

Host: GitHub
Owner: AaronFeller
License: mit-0
Language: Python
Default Branch: master
Size: 11.4 MB

Statistics

Stars: 12
Watchers: 1
Forks: 2
Open Issues: 2
Releases: 2

Created almost 2 years ago · Last pushed 5 months ago

Metadata Files

Readme License

p2smi: Generation and Analysis of Drug-like Peptide SMILES Strings

p2smi is a Python toolkit for peptide design and analysis.

It enables generation of peptide sequences, conversion to SMILES representations—including support for cyclic and noncanonical amino acids—and evaluation of molecular properties. The package also provides utilities for structural modification (e.g., N-methylation, PEGylation), synthesis feasibility assessment, and output in a dedicated .p2smi format that links peptide sequences to their corresponding SMILES.

Developed in support of PeptideCLM, a SMILES-based language model for modified peptides, p2smi provides an extensible foundation for computational peptide chemistry and machine-learning-driven molecular design.

Features

Generate random peptide sequences (with NCAAs, D-stereochemistry, and cyclization)
Convert peptide FASTA files into valid SMILES strings
Support five cyclization types: disulfide, head-to-tail, sidechain-to-sidechain, sidechain-to-N-term, sidechain-to-C-term
Modify SMILES with user-defined N-methylation and PEGylation rates
Evaluate synthetic feasibility based on common failure motifs
Compute molecular properties (MW, logP, TPSA, Lipinski, etc.)

Updates

Version 1.1.1 - Added support for user-defined constraints on cyclizing residues
Version 1.1.0 - Updated codebase and documentation, fixed bugs (for JOSS review)
Version 1.0.0 - First release for JOSS submission

Citation

If you use this tool, please cite:

p2smi: A Python Toolkit for Peptide FASTA-to-SMILES Conversion and Molecular Property Analysis.
Feller, A. L. and Wilke, C. O. (2025).
arXiv

The JOSS paper for this package has since been published (DOI: 10.21105/joss.08319).

Manuscript

PDF | Markdown

Installation

Install from PyPI:

```bash
pip install p2smi
```

For local development:

```bash
git clone https://github.com/AaronFeller/p2smi.git
cd p2smi
pip install -e ".[dev]"
```

Command-Line Tools

| Command | Summary | Input | Output |
|---------|---------|-------|--------|
| generate-peptides | Generates random peptide sequences with user-defined constraints: number of sequences, length range, NCAA percentage, D-stereochemistry rate, and cyclization types. Supports over 100 noncanonical amino acids (SwissSidechain). | CLI arguments for generation settings and output filename | FASTA file with single-letter codes, including noncanonical residues |
| fasta2smi | Converts peptide sequences from FASTA format into SMILES, parsing cyclization tags from the FASTA header. Supports five cyclization types: disulfide (SS), head-to-tail (HT), sidechain-to-sidechain (SCSC), sidechain-to-N-terminus (SCNT), and sidechain-to-C-terminus (SCCT). To define specific cyclizations, add the header notation described in the next section to the FASTA file. | Peptide FASTA file, optional cyclization tags | .p2smi file containing amino acid sequence, cyclization type, and SMILES string |
| modify-smiles | Applies random N-methylation and PEGylation to SMILES strings. Modifications are probabilistic and are tracked when the input is in .p2smi format. | Plaintext SMILES file or .p2smi file | Modified SMILES in the same format as the input, with changes recorded |
| smiles-props | Computes a wide range of molecular properties from SMILES, including MW, TPSA, logP, H-bond donors/acceptors, rotatable bonds, ring count, fraction Csp3, heavy atoms, formal charge, molecular formula, and Lipinski rule evaluation. | SMILES text file or .p2smi file | JSON-formatted text file with calculated properties for each SMILES |
| synthesis-check | Evaluates peptide sequences for synthetic feasibility using hard-coded filters (e.g., N/Q at the N-terminus, Gly/Pro motifs, Cys count, hydrophobicity, charge distribution). Currently supports natural amino acids only. | FASTA file | FASTA file with headers annotated as PASS/FAIL |
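To make the "Lipinski rule evaluation" reported by smiles-props concrete, here is a minimal sketch of Lipinski's Rule of Five applied to already-computed properties (MW ≤ 500 Da, logP ≤ 5, at most 5 H-bond donors, at most 10 H-bond acceptors). The Props field names are hypothetical and are not the keys of the smiles-props JSON output; whether p2smi tolerates one violation, as assumed by the default max_violations=1 below, is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Props:
    mw: float   # molecular weight in Da (hypothetical field name)
    logp: float # octanol-water partition coefficient
    hbd: int    # hydrogen-bond donors
    hba: int    # hydrogen-bond acceptors


def lipinski_violations(p: Props) -> list[str]:
    """Return the Rule-of-Five criteria the molecule violates, in a fixed order."""
    checks = {
        "MW > 500": p.mw > 500,
        "logP > 5": p.logp > 5,
        "HBD > 5": p.hbd > 5,
        "HBA > 10": p.hba > 10,
    }
    return [name for name, failed in checks.items() if failed]


def passes_lipinski(p: Props, max_violations: int = 1) -> bool:
    # Assumption: a common convention tolerates at most one violation.
    return len(lipinski_violations(p)) <= max_violations


# Usage: a large cyclic peptide typically breaks several rules at once.
big = Props(mw=1200.0, logp=1.0, hbd=12, hba=15)
print(lipinski_violations(big))  # ['MW > 500', 'HBD > 5', 'HBA > 10']
```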

Use --help on any command to list its options:

```bash
fasta2smi --help
```

Manually encoding cyclizations

Cyclizations can be specified directly in the FASTA header to control how fasta2smi interprets bond formation between residues.

Disulfide and sidechain cyclization tags begin with a two-letter code identifying the bond type (SS or SC), followed by a constraint mask with the same length as the peptide sequence. Head-to-tail cyclization (HT) takes no mask. In the mask:

X marks unconstrained positions
C marks the cysteine residues participating in a disulfide bond
N marks a residue whose side chain cyclizes to the N-terminus
Z marks a residue whose side chain cyclizes to the C-terminus
If both N and Z are present, the two marked side chains are linked to each other (sidechain-to-sidechain cyclization)

Supported Formats:

| Tag | Type | Description | Example header |
|------|------|-------------|----------------|
| SS | Disulfide | Connects two cysteine residues | >peptide\|SSXXXCXXXCX |
| HT | Head-to-tail | Amide bond between N- and C-termini | >peptide\|HT |
| SCSC | Sidechain–Sidechain | Covalent link between two sidechains (e.g., Lys–Asp lactam) | >peptide\|SCXXNXXXXXZ |
| SCNT | Sidechain–N-Terminus | Link between N-terminus and a sidechain residue | >peptide\|SCXXNXXXXXX |
| SCCT | Sidechain–C-Terminus | Link between a sidechain residue and C-terminus | >peptide\|SCXXXXXZXXX |
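Hand-written masks are easy to get subtly wrong (wrong length, stray letters, an SS mask on non-cysteine positions), so a small validator can help before running fasta2smi. The sketch below is hypothetical and independent of p2smi's own parser. It assumes that the tag is the last |-separated field of the header, that an untagged header means a linear peptide, and that an SS mask marks exactly two cysteines.

```python
# Allowed mask letters per two-letter code (X = unconstrained position).
MASK_LETTERS = {"SS": set("XC"), "SC": set("XNZ")}


def parse_cyclization(header: str, sequence: str) -> tuple[str, dict[str, list[int]]]:
    """Return (cyclization type, {mask letter: 0-based positions}) for one record."""
    # Assumption: the tag is the last '|'-separated field of the header.
    tag = header.rsplit("|", 1)[1] if "|" in header else ""
    if tag == "":
        return "linear", {}  # hypothetical label for untagged headers
    if tag == "HT":
        return "HT", {}  # head-to-tail needs no mask
    code, mask = tag[:2], tag[2:]
    if code not in MASK_LETTERS:
        raise ValueError(f"unknown cyclization code {code!r}")
    if len(mask) != len(sequence):
        raise ValueError("mask length must equal sequence length")
    if set(mask) - MASK_LETTERS[code]:
        raise ValueError(f"invalid mask letters for {code}: {mask}")
    # Collect positions for every constrained letter, in a deterministic order.
    sites = {ch: [i for i, m in enumerate(mask) if m == ch]
             for ch in sorted(set(mask) - {"X"})}
    if code == "SS":
        cys = sites.get("C", [])
        if len(cys) != 2 or any(sequence[i] != "C" for i in cys):
            raise ValueError("SS mask must mark exactly two cysteines")
        return "SS", sites
    n_sites, z_sites = sites.get("N", []), sites.get("Z", [])
    if len(n_sites) > 1 or len(z_sites) > 1 or not (n_sites or z_sites):
        raise ValueError("SC mask needs one N, one Z, or one of each")
    if n_sites and z_sites:
        return "SCSC", sites
    return ("SCNT" if n_sites else "SCCT"), sites


print(parse_cyclization(">peptide|SSXXXCXXXCX", "AGKCAGKCA"))  # ('SS', {'C': [3, 7]})
```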

Example Usage

Generate random peptides with constraints:

```bash
generate-peptides \
  --num 10 \
  --min_length 10 \
  --max_length 20 \
  --noncanonical 0.1 \
  --dextro 0.1 \
  --cyclization_constraints all \
  --outfile peptides.fasta
```
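The constraints in this command (length range, NCAA fraction, D-stereochemistry rate) can be pictured as independent per-position random draws. The sketch below is a hypothetical illustration of that idea, not p2smi's generator: its parameters only loosely mirror --min_length, --max_length, --noncanonical and --dextro, the NCAA codes "Nle" and "Orn" are placeholders rather than p2smi's SwissSidechain codes, and cyclization is not modeled.

```python
import random
from dataclasses import dataclass

# The 20 canonical residues as one-letter codes.
CANONICAL = list("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class Residue:
    code: str     # one-letter code or hypothetical NCAA code
    d_form: bool  # True if the residue is placed in D-stereochemistry


def random_peptide(rng: random.Random, min_len: int, max_len: int,
                   ncaa_codes: list[str], ncaa_rate: float,
                   d_rate: float) -> list[Residue]:
    """Sample one peptide; each position independently draws an NCAA and a D flag."""
    length = rng.randint(min_len, max_len)  # both bounds inclusive
    residues = []
    for _ in range(length):
        use_ncaa = bool(ncaa_codes) and rng.random() < ncaa_rate
        code = rng.choice(ncaa_codes if use_ncaa else CANONICAL)
        # Glycine has no stereocenter, so it is never flagged as D.
        d_form = code != "G" and rng.random() < d_rate
        residues.append(Residue(code, d_form))
    return residues


# Usage: ten peptides of length 10-20 with ~10% NCAAs and ~10% D-residues.
rng = random.Random(42)
batch = [random_peptide(rng, 10, 20, ["Nle", "Orn"], 0.1, 0.1) for _ in range(10)]
```

Seeding a dedicated random.Random instance keeps a generated batch reproducible without touching the global random state.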

Convert FASTA to SMILES:

```bash
fasta2smi -i peptides.fasta -o peptides.p2smi
```
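The core idea behind converting a linear peptide to SMILES can be sketched by chaining per-residue fragments. Each fragment is written N-to-C and ends at the backbone carbonyl, so adjacent fragments join through amide bonds, and a final O closes the C-terminal acid. The four-residue fragment table is a hypothetical subset written for L-stereochemistry; it is not p2smi's residue library, and cyclization (which would need ring-closure digits) is not handled.

```python
# Hypothetical minimal fragment table (L-residues, written N-to-C,
# each ending at the backbone carbonyl so fragments chain into amides).
FRAGMENTS = {
    "G": "NCC(=O)",
    "A": "N[C@@H](C)C(=O)",
    "S": "N[C@@H](CO)C(=O)",
    "C": "N[C@@H](CS)C(=O)",
}


def linear_peptide_smiles(sequence: str) -> str:
    """Chain residue fragments N-to-C and close the C-terminus as a free acid."""
    try:
        body = "".join(FRAGMENTS[aa] for aa in sequence)
    except KeyError as err:
        raise ValueError(f"no fragment for residue {err.args[0]!r}") from None
    return body + "O"


print(linear_peptide_smiles("AG"))  # N[C@@H](C)C(=O)NCC(=O)O
```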

Modify SMILES strings:

```bash
modify-smiles -i peptides.p2smi -o modified.p2smi --peg_rate 0.2 --nmeth_rate 0.2 --nmeth_residues 0.2
```
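Because modify-smiles applies modifications probabilistically, it may help to picture how rates turn into recorded modification sites. The sketch below is a hypothetical two-level sampler: it assumes, without confirmation from p2smi, that --nmeth_rate and --peg_rate are per-peptide probabilities and --nmeth_residues is a per-residue probability. It only picks sites and does not edit SMILES; real chemistry constraints (for example, proline has no backbone N-H to methylate) are ignored.

```python
import random


def pick_sites(n_residues: int, rate: float, rng: random.Random) -> list[int]:
    """Choose each residue position independently with probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    return [i for i in range(n_residues) if rng.random() < rate]


def sample_modifications(n_residues: int, nmeth_rate: float, nmeth_residues: float,
                         peg_rate: float, rng: random.Random) -> dict:
    """Hypothetical record of which modifications one peptide receives."""
    record = {"n_methylated": [], "pegylated": False}
    # Assumed semantics: first decide per peptide, then per residue.
    if rng.random() < nmeth_rate:
        record["n_methylated"] = pick_sites(n_residues, nmeth_residues, rng)
    record["pegylated"] = rng.random() < peg_rate
    return record


print(sample_modifications(6, 1.0, 1.0, 1.0, random.Random(0)))
# {'n_methylated': [0, 1, 2, 3, 4, 5], 'pegylated': True}
```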

Compute molecular properties:

```bash
smiles-props -i modified.p2smi
```

Check synthesis feasibility (natural AAs only):

```bash
generate-peptides -o nat_peptides.fasta
synthesis-check -i nat_peptides.fasta
```
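The synthesis-check filter categories (N/Q at the N-terminus, Gly/Pro motifs, Cys count, hydrophobicity, charge) can be illustrated with a rule-based checker. Everything specific below is hypothetical: the motif patterns, the hydrophobic residue set, the thresholds, and the " PASS"/" FAIL" header suffix are placeholders chosen for the sketch, not p2smi's hard-coded values or output format.

```python
import re

# Hypothetical settings for illustration only.
HYDROPHOBIC = set("AFILMVWY")
RISKY_MOTIFS = [r"G{4,}", r"P{4,}"]  # placeholder Gly/Pro run patterns


def synthesis_flags(seq: str, max_cys: int = 2, max_hydrophobic: float = 0.5,
                    max_net_charge: int = 3) -> list[str]:
    """Return the names of the (hypothetical) filters a sequence fails."""
    flags = []
    if seq[:1] in ("N", "Q"):
        flags.append("N-terminal N/Q")
    for motif in RISKY_MOTIFS:
        if re.search(motif, seq):
            flags.append(f"motif {motif}")
    if seq.count("C") > max_cys:
        flags.append("too many Cys")
    if seq and sum(aa in HYDROPHOBIC for aa in seq) / len(seq) > max_hydrophobic:
        flags.append("too hydrophobic")
    # Crude net charge: Lys/Arg count +1, Asp/Glu count -1.
    net = sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")
    if abs(net) > max_net_charge:
        flags.append("net charge")
    return flags


def annotate(header: str, seq: str) -> str:
    """Append a hypothetical PASS/FAIL label to a FASTA header."""
    return f"{header} {'FAIL' if synthesis_flags(seq) else 'PASS'}"


print(annotate(">p1", "QAKDE"))  # >p1 FAIL
```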

Future Work

Extend synthesis rules to NCAAs and modified peptides
Support alternative encodings (HELM, SELFIES)
Batch processing and multiprocessing support
Integration with predictive models
Post-translational modification import pipelines

For Contributors

You’re welcome to contribute! Suggestions, bugs, and pull requests are appreciated.

📂 Open an Issue
🛠 Submit a pull request
📝 Improve the docs

License

MIT License

Owner

Name: Aaron
Login: AaronFeller
Kind: user

Repositories: 2
Profile: https://github.com/AaronFeller

JOSS Publication

p2smi: A toolkit enabling SMILES generation and property analysis for noncanonical and cyclized peptides

Published

December 09, 2025

DOI

10.21105/joss.08319

Volume 10, Issue 116, Page 8319

Authors

Aaron L. Feller

Department of Interdisciplinary Life Sciences, The University of Texas at Austin, Austin, TX, United States

Claus O. Wilke

Department of Interdisciplinary Life Sciences, The University of Texas at Austin, Austin, TX, United States, Department of Integrative Biology, The University of Texas at Austin, Austin, TX, United States

Editor

Lucy Whalley

GitHub Events

Total

Create event: 3
Release event: 2
Issues event: 4
Watch event: 12
Delete event: 2
Issue comment event: 4
Push event: 35
Pull request event: 2
Fork event: 1

Last Year

Create event: 3
Release event: 2
Issues event: 4
Watch event: 12
Delete event: 2
Issue comment event: 4
Push event: 35
Pull request event: 2
Fork event: 1

Committers

Last synced: 5 months ago

All Time

Total Commits: 71
Total Committers: 2
Avg Commits per committer: 35.5
Development Distribution Score (DDS): 0.197

Past Year

Commits: 48
Committers: 1
Avg Commits per committer: 48.0
Development Distribution Score (DDS): 0.0
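The committer statistics above are internally consistent: the all-time average is 71 / 2 = 35.5 commits, and both DDS values match the definition "one minus the top committer's share of all commits" (1 - 57/71 ≈ 0.197 all time; 1 - 48/48 = 0.0 for the single committer in the past year). A minimal sketch, assuming that definition:

```python
def development_distribution_score(commits: list[int]) -> float:
    """One minus the top committer's share of all commits (0.0 = single author)."""
    total = sum(commits)
    if total == 0:
        return 0.0
    return 1 - max(commits) / total


print(round(development_distribution_score([57, 14]), 3))  # 0.197
print(development_distribution_score([48]))  # 0.0
```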

Top Committers

Name	Email	Commits
AaronFeller	a**r@g**m	57
Fergal Duffy	f**d@g**m	14

Issues and Pull Requests

Last synced: 5 months ago

All Time

Total issues: 4
Total pull requests: 3
Average time to close issues: N/A
Average time to close pull requests: 19 minutes
Total issue authors: 3
Total pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 4
Pull requests: 3
Average time to close issues: N/A
Average time to close pull requests: 19 minutes
Issue authors: 3
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

andregiuseppecavalli (2)
dr-aspirinas (1)
jfFEM (1)

Pull Request Authors

AaronFeller (3)


Packages

Total packages: 1
Total downloads: 289 last month (PyPI)

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 4
Total maintainers: 1

pypi.org: p2smi

A toolkit enabling SMILES generation and property analysis for noncanonical and cyclized peptides.

Homepage: https://github.com/AaronFeller/p2smi
Documentation: https://p2smi.readthedocs.io/
License: MIT
Latest release: 1.1.1
published 4 months ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 289 Last month

Rankings

Dependent packages count: 9.5%

Average: 31.3%

Dependent repos count: 53.2%

Maintainers (1)

AaronFeller

Last synced: 4 months ago

Science Score: 87.0%