https://github.com/althonos/proteinogenic

Chemical structure generation for protein sequences as SMILES string.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.5%) to scientific vocabulary

Keywords

bioinformatics chemical-structure protein-sequences rust-library smiles
Last synced: 6 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: althonos
  • License: mit
  • Language: Rust
  • Default Branch: main
  • Homepage:
  • Size: 418 KB
Statistics
  • Stars: 4
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed almost 4 years ago
Metadata Files
Readme Changelog License

README.md

proteinogenic

Chemical structure generation for protein sequences as SMILES string.

🔌 Usage

This crate builds on top of purr, a crate providing primitives for reading and writing SMILES.

Use the AminoAcid enum to encode the sequence residues, and build a SMILES string with proteinogenic::smiles. For example with divergicin 750:

```rust
extern crate proteinogenic;

let residues = "KGILGKLGVVQAGVDFVSGVWAGIKQSAKDHPNA"
    .chars()
    .map(proteinogenic::AminoAcid::from_char)
    .map(Result::unwrap);
let s = proteinogenic::smiles(residues)
    .expect("failed to generate SMILES string");
```

Additional modifications can be carried out by using a Protein struct to configure the rendering of the peptide. So far, disulfide bonds, lanthionine bridges, and head-to-tail cyclization are supported. For instance, we can generate the SMILES string of a cyclotide such as kalata B1:

```rust
extern crate proteinogenic;

let residues = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN"
    .chars()
    .map(proteinogenic::AminoAcid::from_char)
    .map(Result::unwrap);

// Configure the peptide: close the backbone, then add the three disulfide bonds.
let mut p = proteinogenic::Protein::new(residues);
p.cyclization(proteinogenic::Cyclization::HeadToTail);
p.cross_link(proteinogenic::CrossLink::Cystine(5, 19)).unwrap();
p.cross_link(proteinogenic::CrossLink::Cystine(9, 21)).unwrap();
p.cross_link(proteinogenic::CrossLink::Cystine(14, 26)).unwrap();

let s = p.smiles()
    .expect("failed to generate SMILES string");
```
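Before declaring disulfide bonds, it can help to confirm that each pair of positions really falls on cysteine residues. The sketch below uses only the standard library and is not part of proteinogenic; the helper names `cysteine_positions` and `is_valid_cystine` are hypothetical. It assumes positions are 1-based, which is inferred from the kalata B1 example above, where the cysteines sit at residues 5, 9, 14, 19, 21 and 26.

```rust
/// Returns the 1-based positions of every cysteine ('C') in a one-letter sequence.
/// Hypothetical helper, not part of the proteinogenic API.
fn cysteine_positions(sequence: &str) -> Vec<usize> {
    sequence
        .chars()
        .enumerate()
        .filter(|(_, c)| *c == 'C')
        .map(|(i, _)| i + 1) // convert the 0-based index to a 1-based position
        .collect()
}

/// Checks that both ends of a proposed cystine bridge are distinct cysteines.
/// Assumes 1-based positions, as the kalata B1 example suggests.
fn is_valid_cystine(sequence: &str, a: usize, b: usize) -> bool {
    let positions = cysteine_positions(sequence);
    a != b && positions.contains(&a) && positions.contains(&b)
}

fn main() {
    let kalata_b1 = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN";
    println!("cysteines at {:?}", cysteine_positions(kalata_b1));

    // The three bridges declared in the example above.
    for (a, b) in [(5, 19), (9, 21), (14, 26)] {
        println!("({}, {}) valid: {}", a, b, is_valid_cystine(kalata_b1, a, b));
    }
}
```

Running such a check first gives a readable message for a mistyped position, instead of relying on the `unwrap()` calls to panic.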

This SMILES string can be used in conjunction with other cheminformatics toolkits, for instance OpenBabel which can generate a PNG figure:

Skeletal formula of divergicin 750
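As a rough illustration of the OpenBabel step, the sketch below builds an `obabel` command line that reads a SMILES string given inline with `-:` and writes the file named after `-O`. The helper `obabel_png_command` and the file name are hypothetical, and the README does not show the author's actual invocation. Running the command assumes the `obabel` binary is on the `PATH` and was built with PNG support.

```rust
use std::process::Command;

/// Builds (but does not run) an `obabel` invocation that renders a SMILES
/// string to an image file. Hypothetical helper, not part of proteinogenic.
fn obabel_png_command(smiles: &str, output: &str) -> Command {
    let mut cmd = Command::new("obabel");
    // `-:<SMILES>` passes the molecule inline; `-O <file>` names the output file.
    cmd.arg(format!("-:{}", smiles)).arg("-O").arg(output);
    cmd
}

fn main() {
    // Glycine is used here as a short stand-in for a generated peptide SMILES.
    let cmd = obabel_png_command("NCC(=O)O", "glycine.png");
    println!("{:?}", cmd);
    // Call `.status()` on the command to actually run it once Open Babel is installed.
}
```

Passing the SMILES string as a separate argument through `Command` avoids shell quoting problems with characters such as parentheses and brackets.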

Note that proteinogenic is not limited to building a SMILES string; it can actually use any purr::walk::Follower implementor to generate an in-memory representation of a protein formula. If your code is already compatible with purr, then you'll be able to use protein sequences quite easily.

```rust
extern crate proteinogenic;
extern crate purr;

let sequence = "KGILGKLGVVQAGVDFVSGVWAGIKQSAKDHPNA";
let residues = sequence.chars()
    .map(proteinogenic::AminoAcid::from_char)
    .map(Result::unwrap);

let mut builder = purr::graph::Builder::new();
proteinogenic::visit(residues, &mut builder);

builder.build()
    .expect("failed to create a graph representation");
```

The API is not yet stable, and may change to follow changes introduced by purr or to improve the interface ergonomics.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

🔍 See Also

If you're a bioinformatician and a Rustacean, you may be interested in these other libraries:

  • uniprot.rs: Rust data structures for the UniProtKB databases.
  • obofoundry.rs: Rust data structures for the OBO Foundry.
  • fastobo: Rust parser and abstract syntax tree for Open Biomedical Ontologies.
  • pubchem.rs: Rust data structures and API client for the PubChem API.

📜 License

This library is provided under the open-source MIT license.

This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

Owner

  • Name: Martin Larralde
  • Login: althonos
  • Kind: user
  • Location: Heidelberg, Germany
  • Company: EMBL / LUMC, @zellerlab

PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 35
  • Total Committers: 2
  • Avg Commits per committer: 17.5
  • Development Distribution Score (DDS): 0.029
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Martin Larralde m****e@e****e 34
imgbot[bot] 3****] 1
Committer Domains (Top 20 + Academic)
embl.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 22 minutes
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • imgbot[bot] (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cargo 2,590 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
crates.io: proteinogenic

Chemical structure generation for protein sequences as SMILES string

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 2,590 Total
Rankings
Forks count: 29.1%
Dependent repos count: 29.3%
Stargazers count: 33.8%
Dependent packages count: 33.8%
Average: 39.7%
Downloads: 72.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

Cargo.toml cargo
  • lazy_static 1.4.0 development
  • pubchem 0.1.1 development
  • purr 0.9.0