yowl

A SMILES parser based on Purr

https://github.com/chem-william/yowl

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.4%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: chem-william
  • License: mit
  • Language: Rust
  • Default Branch: main
  • Size: 524 KB
Statistics
  • Stars: 6
  • Watchers: 2
  • Forks: 0
  • Open Issues: 2
  • Releases: 4
Created about 1 year ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

Yowl

Primitives for reading and writing SMILES strings in Rust.

This project is a hard fork of Purr and extends its functionality to support additional SMILES inputs accepted by RDKit and beyond.

About

Yowl provides a safe, ergonomic API to parse and serialize molecular structures in the OpenSMILES format. SMILES (Simplified Molecular Input Line Entry System) is a widely adopted notation for representing molecular graphs as text strings.

Usage

Add yowl to your Cargo.toml:

```bash
cargo add yowl
```

Examples

Parse acetamide into an adjacency representation:

```rust
use yowl::graph::{Builder, Atom, Bond};
use yowl::feature::{AtomKind, BondKind, Symbol};
use yowl::read::{read, ReadError};
use yowl::Element;

fn main() -> Result<(), ReadError> {
    let mut builder = Builder::default();

    read("CC(=O)N", &mut builder, None)?;

    assert_eq!(builder.build(), Ok(vec![
        // atom 0: methyl carbon
        Atom {
            kind: AtomKind::Symbol(Symbol::Aliphatic(Element::C)),
            bonds: vec![
                Bond::new(BondKind::Elided, 1)
            ]
        },
        // atom 1: carbonyl carbon
        Atom {
            kind: AtomKind::Symbol(Symbol::Aliphatic(Element::C)),
            bonds: vec![
                Bond::new(BondKind::Elided, 0),
                Bond::new(BondKind::Double, 2),
                Bond::new(BondKind::Elided, 3)
            ]
        },
        // atom 2: oxygen
        Atom {
            kind: AtomKind::Symbol(Symbol::Aliphatic(Element::O)),
            bonds: vec![
                Bond::new(BondKind::Double, 1)
            ]
        },
        // atom 3: nitrogen
        Atom {
            kind: AtomKind::Symbol(Symbol::Aliphatic(Element::N)),
            bonds: vec![
                Bond::new(BondKind::Elided, 1)
            ]
        }
    ]));

    Ok(())
}
```

The order of atoms and their substituents reflects their implied order within the corresponding SMILES string. This is important when atomic configuration (e.g., @, @@) is present at an atom.

An optional Trace type maps adjacency features to a cursor position in the original string. This is useful for conveying semantic errors such as hypervalence.

```rust
use yowl::graph::Builder;
use yowl::read::{read, ReadError, Trace};

fn main() -> Result<(), ReadError> {
    let mut builder = Builder::default();
    let mut trace = Trace::default();

    //    012345678901234
    read("C(C)C(C)(C)(C)C", &mut builder, Some(&mut trace))?;

    // Texas carbon @ atom(2) with cursor range 4..5
    assert_eq!(trace.atom(2), Some(4..5));

    Ok(())
}
```

Syntax errors are mapped to the cursor at which they occur.

```rust
use yowl::graph::Builder;
use yowl::read::{read, ReadError};

fn main() {
    let mut builder = Builder::default();

    assert_eq!(read("OCCXC", &mut builder, None), Err(ReadError::Character(3)));
}
```
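Both the Trace ranges and the syntax-error cursors shown above are offsets into the original string, so they can be turned into a caret-style diagnostic. The helper below is a minimal sketch that does not depend on yowl. The name underline is hypothetical, and it assumes the offsets are byte positions in an ASCII SMILES string.

```rust
use std::ops::Range;

/// Returns `smiles` followed by a second line that marks `span` with carets.
/// Assumption: `span` holds byte offsets into an ASCII string.
fn underline(smiles: &str, span: Range<usize>) -> String {
    // Clamp the span so out-of-range offsets cannot cause a panic.
    let end = span.end.min(smiles.len());
    let start = span.start.min(end);
    // Always draw at least one caret, even for an empty span.
    let width = (end - start).max(1);
    format!("{}\n{}{}", smiles, " ".repeat(start), "^".repeat(width))
}
```

For the syntax error above, underline("OCCXC", 3..4) places a single caret under the X. A range obtained from a Trace can be passed in the same way.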

An adjacency can be written back to a SMILES string by walking it with walk and collecting the output in a Writer.

```rust
use yowl::graph::Builder;
use yowl::read::{read, ReadError};
use yowl::walk::walk;
use yowl::write::Writer;

fn main() -> Result<(), ReadError> {
    let mut builder = Builder::default();

    read("c1c([37Cl])cccc1", &mut builder, None)?;

    let atoms = builder.build().expect("atoms");
    let mut writer = Writer::default();

    walk(atoms, &mut writer).expect("walk");

    assert_eq!(writer.write(), "c(ccccc1[37Cl])1");

    Ok(())
}
```

The output string doesn't match the input string, although both represent the same molecule (Cl-37 chlorobenzene). write traces atoms in depth-first order, but the adjacency representation (atoms) lacks information about how the original SMILES tree was cut.
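To illustrate why the output differs, the sketch below runs a plain depth-first walk over a hand-written adjacency list for the chlorobenzene example. It is not yowl's implementation: depth_first, visit, and chlorobenzene are hypothetical names, atoms are reduced to indices, and each atom's neighbor order is assumed to follow the order in which its bonds appear in the input string (for atom 0, the ring-closure bond comes first).

```rust
/// Neighbor lists for c1c([37Cl])cccc1, indexed by atom position in the string.
/// Assumption: atom 0 lists its ring-closure partner (atom 6) before atom 1.
fn chlorobenzene() -> Vec<Vec<usize>> {
    vec![
        vec![6, 1],    // atom 0: ring closure first, then next ring atom
        vec![0, 2, 3], // atom 1: previous atom, chlorine branch, next ring atom
        vec![1],       // atom 2: [37Cl]
        vec![1, 4],
        vec![3, 5],
        vec![4, 6],
        vec![5, 0],    // atom 6: closes the ring back to atom 0
    ]
}

/// Depth-first walk from `root`. Returns the visiting order and the bonds
/// that lead back to an already visited atom (ring closures).
fn depth_first(adjacency: &[Vec<usize>], root: usize) -> (Vec<usize>, Vec<(usize, usize)>) {
    let mut visited = vec![false; adjacency.len()];
    let mut order = Vec::new();
    let mut closures = Vec::new();
    visit(adjacency, root, None, &mut visited, &mut order, &mut closures);
    (order, closures)
}

fn visit(
    adjacency: &[Vec<usize>],
    atom: usize,
    parent: Option<usize>,
    visited: &mut Vec<bool>,
    order: &mut Vec<usize>,
    closures: &mut Vec<(usize, usize)>,
) {
    visited[atom] = true;
    order.push(atom);
    for &next in &adjacency[atom] {
        if Some(next) == parent {
            continue; // the bond we arrived through
        }
        if visited[next] {
            // Record each ring closure once, from whichever end sees it first.
            if !closures.contains(&(atom, next)) && !closures.contains(&(next, atom)) {
                closures.push((atom, next));
            }
        } else {
            visit(adjacency, next, Some(atom), visited, order, closures);
        }
    }
}
```

Starting from atom 0, this walk visits the atoms in the order 0, 6, 5, 4, 3, 1, 2 and reports one ring closure between atoms 1 and 0. That is consistent with the shape of the written string c(ccccc1[37Cl])1: the walk enters the ring through the closure bond, so the ring is cut at a different place than in the input.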

Notes

- Reading a SMILES string and writing it back with Writer is not guaranteed to reproduce the same string. The output will always correspond to the same molecule (if it does not, please open a bug report!).
- The temporary IUPAC names for the synthetic elements (such as Uun and Uuu) are supported for reading, but not for writing. A SMILES string containing "[Uun]" is therefore written as "[Ds]".
- Single quotation marks (') are ignored everywhere in SMILES input. For example, ['Lv'] and [Lv] are equivalent. Error reporting always points to the correct position in the original string, even if it contains quotes. SMILES strings are written without single quotes, irrespective of whether the original string had them.
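One way to ignore quotes while still reporting positions in the original string is to keep an offset table alongside the stripped text. The following is a sketch of that idea only, not a description of how yowl handles quotes internally; strip_quotes is a hypothetical helper.

```rust
/// Removes single quotes from `input` and records, for every kept character,
/// its byte offset in the original string.
fn strip_quotes(input: &str) -> (String, Vec<usize>) {
    let mut stripped = String::with_capacity(input.len());
    let mut offsets = Vec::with_capacity(input.len());
    for (offset, ch) in input.char_indices() {
        if ch != '\'' {
            stripped.push(ch);
            offsets.push(offset);
        }
    }
    (stripped, offsets)
}
```

For ['Lv'], this yields the text [Lv] and the offsets [0, 2, 3, 5], so an error at position 1 of the stripped text maps back to position 2 of the original.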

Why a hard fork

The original author of Purr has seemingly passed away (he chronicled some of his time with cancer on his personal blog), and the library needed extensions to accept a broader set of SMILES inputs (e.g., RDKit-compatible strings). Yowl continues maintenance and adds new features.

Contributing

Contributions are welcome! Please open an issue or pull request. Ensure you add tests for new functionality and follow Rust formatting conventions (cargo fmt).

License

Yowl is distributed under the terms of the MIT License. See LICENSE-MIT and COPYRIGHT for details.

Owner

  • Name: William Bro-Jørgensen
  • Login: chem-william
  • Kind: user
  • Company: University of Copenhagen

Citation (CITATION)

@software{richard_apodaca_2025_15360154,
  author       = {Richard Apodaca and
                  William},
  title        = {chem-william/yowl: v0.1.0},
  month        = may,
  year         = 2025,
  publisher    = {Zenodo},
  version      = {v0.1.0},
  doi          = {10.5281/zenodo.15360154},
  url          = {https://doi.org/10.5281/zenodo.15360154},
  swhid        = {swh:1:dir:12ec0125b4447d6ff0dfa8303e43f476c14a0064;origin=https://doi.org/10.5281/zenodo.15360153;visit=swh:1:snp:e938d394dd6d3f715e79f5e9af0fae6707acb52f;anchor=swh:1:rel:9d5681136fbbef03a6c106db4ab7e405bce4ae6c;path=chem-william-yowl-d4ebf46},
}

GitHub Events

Total
  • Create event: 18
  • Issues event: 8
  • Release event: 4
  • Watch event: 4
  • Delete event: 14
  • Issue comment event: 12
  • Push event: 56
  • Public event: 1
  • Pull request event: 33
Last Year
  • Create event: 18
  • Issues event: 8
  • Release event: 4
  • Watch event: 4
  • Delete event: 14
  • Issue comment event: 12
  • Push event: 56
  • Public event: 1
  • Pull request event: 33

Packages

  • Total packages: 1
  • Total downloads:
    • cargo 1,945 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
  • Total maintainers: 1
crates.io: yowl

Primitives for reading and writing the SMILES language

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 1,945 Total
Rankings
Dependent repos count: 21.8%
Dependent packages count: 28.8%
Average: 48.5%
Downloads: 94.7%
Maintainers (1)
Last synced: 11 months ago