https://github.com/battmodels/smirk
An Atomically Complete Tokenizer for Molecular Foundation Models
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.1%) to scientific vocabulary
Repository
An Atomically Complete Tokenizer for Molecular Foundation Models
Basic Info
- Host: GitHub
- Owner: BattModels
- License: apache-2.0
- Language: Rust
- Default Branch: main
- Homepage: https://eeg.engin.umich.edu/smirk/
- Size: 400 KB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Smirk: A Tokenizer for OpenSMILES
Smirk is a chemistry-specific tokenizer that provides complete coverage of the OpenSMILES specification. It is built using Rust 🦀 and HuggingFace's tokenizers 🤗. Installation is easy, and Smirk works out-of-the-box with the HuggingFace ecosystem.
Check our documentation to see smirk in action, or read the paper to learn
about tokenization for molecular foundation models.
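To make "complete coverage" a little more concrete, here is a minimal Python sketch of one way a tokenizer can avoid unknown tokens on OpenSMILES input: break bracketed atoms such as `[NH4+]` into their constituent symbols instead of treating each bracket as a single vocabulary entry. This is an illustration only and is not Smirk's implementation or API. The names `sketch_tokenize`, `OUTER`, and `INNER` are hypothetical, and the sketch makes no attempt to validate chemistry or to treat extended chirality classes (such as `@TH1`) as units.

```python
import re

# Outside brackets: a whole bracket atom, two-letter organic atoms before
# one-letter ones, aromatic atoms, ring-bond labels, bonds, branches, wildcard.
OUTER = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d\d|\d|[-=#$:/\\.()*]")

# Inside brackets: single digits, two-letter aromatic symbols, element symbols,
# one-letter aromatic symbols, then chirality, charge, class, and wildcard marks.
INNER = re.compile(r"\d|se|as|[A-Z][a-z]?|[bcnops]|[@+\-:*]")


def sketch_tokenize(smiles):
    """Split a SMILES string into small symbol-level tokens (illustrative only)."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = OUTER.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {smiles[pos]!r}")
        text = match.group()
        if text.startswith("["):
            inner = text[1:-1]
            parts = INNER.findall(inner)
            # Reject bracket contents that the inner pattern cannot fully cover.
            if "".join(parts) != inner:
                raise ValueError(f"cannot tokenize bracket atom {text!r}")
            tokens.extend(["[", *parts, "]"])
        else:
            tokens.append(text)
        pos = match.end()
    return tokens


print(sketch_tokenize("C[NH4+]"))
print(sketch_tokenize("[C@@H](Cl)Br"))
```

Because every bracket atom is rebuilt from a small, fixed set of symbols, the vocabulary in this sketch stays bounded no matter which isotopes, charges, or elements appear in the input.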
Installation
pip install smirk
Owner
- Name: BatteryModels
- Login: BattModels
- Kind: organization
- Email: venkvis@cmu.edu
- Website: http://andrew.cmu.edu/~venkatv
- Twitter: venkvis
- Repositories: 11
- Profile: https://github.com/BattModels
This will consist of first-principles, multi-physics battery and electric mobility models developed in the group of V. Viswanathan at Carnegie Mellon.
GitHub Events
Total
- Watch event: 1
- Delete event: 2
- Push event: 6
- Pull request event: 3
- Create event: 3
Last Year
- Watch event: 1
- Delete event: 2
- Push event: 6
- Pull request event: 3
- Create event: 3
Dependencies
- Swatinem/rust-cache v2 composite
- actions/checkout v4 composite
- astral-sh/setup-uv v5 composite
- dtolnay/rust-toolchain stable composite
- actions/checkout v4 composite
- actions/deploy-pages v4 composite
- actions/upload-pages-artifact v4 composite
- astral-sh/setup-uv v5 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- astral-sh/setup-uv v5 composite
- tempfile 3.10.1 development
- clap 4.5.1
- const_format 0.2.32
- derive_builder 0.20.0
- dict_derive 0.6.0
- either 1.13.0
- macro_rules_attribute 0.2.0
- once_cell 1.19.0
- paste 1.0.14
- pyo3 ^0.23
- regex 1.10.3
- serde 1.0.197
- serde_json 1.0.114
- serde_with 3.8.0
- tokenizers ^0.21
- transformers >=4.40,<5