https://github.com/battmodels/smirk

An Atomically Complete Tokenizer for Molecular Foundation Models

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.1%) to scientific vocabulary
Last synced: 9 months ago

Repository

Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created almost 2 years ago · Last pushed 10 months ago
Metadata Files
  • Readme
  • Changelog
  • License

README.md

Smirk: A Tokenizer for OpenSMILES

![GitHub License](https://img.shields.io/github/license/BattModels/smirk) ![arXiv:2409.15370](https://img.shields.io/badge/cs.LG-2409.15370-b31b1b?style=flat&logo=arxiv&logoColor=red)

Smirk is a chemistry-specific tokenizer that provides complete coverage of the OpenSMILES specification. It is built with Rust 🦀 and HuggingFace's tokenizers 🤗. Installation is easy, and Smirk works out of the box with the HuggingFace ecosystem.

Check our documentation to see Smirk in action, or read the paper to learn about tokenization for molecular foundation models.
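To give a rough feel for what tokenizing a SMILES string at the level of individual symbols can look like, here is a minimal pure-Python sketch. It is not Smirk's implementation or API: the function name `toy_glyph_tokenize` and the three regular expressions are assumptions made up for this illustration, and they cover only a small part of OpenSMILES. The idea shown is simply that bracketed atoms such as `[C@@H]` are broken into their parts instead of being treated as one opaque token.

```python
import re

# Hypothetical patterns for illustration only; they do not cover all of OpenSMILES.
# Outside brackets: organic-subset atoms, bonds, branches, ring closures.
_OUTER = re.compile(r"Cl|Br|%\d{2}|[A-Za-z]|\d|[()=#$:/\\.\-*]")
# Inside brackets: one element symbol plus chirality, digits, charge, and class marks.
_INNER = re.compile(r"[A-Z][a-z]?|se|as|[a-z]|@@|@|\d|[+\-:*]")
# Splits a SMILES string into bracket atoms and the text between them.
_BRACKET = re.compile(r"(\[[^\]]*\])")


def toy_glyph_tokenize(smiles):
    """Split a SMILES string into small symbol-level tokens (toy sketch).

    Characters that match neither pattern are silently skipped.
    """
    tokens = []
    for chunk in _BRACKET.split(smiles):
        if chunk.startswith("["):
            # Bracket atom: emit the brackets and each inner part separately.
            tokens += ["[", *_INNER.findall(chunk[1:-1]), "]"]
        else:
            tokens += _OUTER.findall(chunk)
    return tokens


print(toy_glyph_tokenize("OC[C@@H](N)C(=O)O"))
# ['O', 'C', '[', 'C', '@@', 'H', ']', '(', 'N', ')', 'C', '(', '=', 'O', ')', 'O']
```

Handling bracket atoms in a separate pass keeps two-letter symbols such as `Na` intact inside brackets while still splitting `Cn` outside brackets into an aliphatic carbon and an aromatic nitrogen. For real use, install Smirk and follow its documentation.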

Installation

pip install smirk

Owner

  • Name: BatteryModels
  • Login: BattModels
  • Kind: organization
  • Email: venkvis@cmu.edu

This organization hosts first-principles, multi-physics battery and electric mobility models developed in the group of V. Viswanathan at Carnegie Mellon.

GitHub Events

Total
  • Watch event: 1
  • Delete event: 2
  • Push event: 6
  • Pull request event: 3
  • Create event: 3
Last Year
  • Watch event: 1
  • Delete event: 2
  • Push event: 6
  • Pull request event: 3
  • Create event: 3

Dependencies

.github/workflows/CI.yaml actions
  • Swatinem/rust-cache v2 composite
  • actions/checkout v4 composite
  • astral-sh/setup-uv v5 composite
  • dtolnay/rust-toolchain stable composite
.github/workflows/docs.yml actions
  • actions/checkout v4 composite
  • actions/deploy-pages v4 composite
  • actions/upload-pages-artifact v4 composite
  • astral-sh/setup-uv v5 composite
.github/workflows/pre-commit.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • astral-sh/setup-uv v5 composite
Cargo.toml cargo
  • tempfile 3.10.1 development
  • clap 4.5.1
  • const_format 0.2.32
  • derive_builder 0.20.0
  • dict_derive 0.6.0
  • either 1.13.0
  • macro_rules_attribute 0.2.0
  • once_cell 1.19.0
  • paste 1.0.14
  • pyo3 ^0.23
  • regex 1.10.3
  • serde 1.0.197
  • serde_json 1.0.114
  • serde_with 3.8.0
  • tokenizers ^0.21
pyproject.toml pypi
  • transformers >=4.40,<5