chemlift

Language-interfaced fine-tuning for chemistry

https://github.com/lamalab-org/chemlift

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary

Keywords

chemistry few-shot-learning fine-tuning hacktoberfest llm materials
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: lamalab-org
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 229 KB
Statistics
  • Stars: 43
  • Watchers: 1
  • Forks: 7
  • Open Issues: 11
  • Releases: 1
Topics
chemistry few-shot-learning fine-tuning hacktoberfest llm materials
Created over 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

chemlift

Chemical language interfaced predictions using large language models.

💪 Getting Started

With ChemLIFT you can use large language models to make predictions on chemical data. You can use two different approaches:

  • Few-shot learning: Provide a few labeled examples in the prompt along with the points you want to predict, and the model infers the property of interest from those examples in context.
  • Fine-tuning: Fine-tune a large language model on a dataset of your choice and use it to make predictions.

Fine-tuning updates the weights of the model, while few-shot learning does not.

Few-shot learning

```python
from chemlift.icl.fewshotclassifier import FewShotClassifier
from langchain.llms import OpenAI

llm = OpenAI()
fsc = FewShotClassifier(llm, property_name='bandgap')

# Train on a few examples
fsc.fit(['ethane', 'propane', 'butane'], [0, 1, 0])

# Predict on a few more
fsc.predict(['pentane', 'hexane', 'heptane'])
```
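To make the idea of in-context examples concrete, here is a minimal sketch of how labeled examples and a query could be assembled into one prompt. The function name `build_few_shot_prompt` and the prompt template are assumptions made for illustration; they are not ChemLIFT's actual template or API.

```python
from typing import List, Sequence


def build_few_shot_prompt(
    train_texts: Sequence[str],
    train_labels: Sequence[int],
    query: str,
    property_name: str,
) -> str:
    """Assemble a plain-text prompt from labeled examples and one query.

    Hypothetical template for illustration; not ChemLIFT's real prompt.
    """
    if len(train_texts) != len(train_labels):
        raise ValueError("train_texts and train_labels must have the same length")
    lines: List[str] = [f"Classify the {property_name} of each molecule as 0 or 1."]
    for text, label in zip(train_texts, train_labels):
        lines.append(f"Molecule: {text}\n{property_name}: {label}")
    # The query is left unanswered so that the model completes the label.
    lines.append(f"Molecule: {query}\n{property_name}:")
    return "\n\n".join(lines)


prompt = build_few_shot_prompt(
    ["ethane", "propane", "butane"], [0, 1, 0], "pentane", "bandgap"
)
print(prompt)
```

No model weights change in this approach: the examples only live in the prompt text that is sent to the model.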

Fine-tuning

```python
from chemlift.finetuning.classifier import ChemLIFTClassifierFactory

# X (molecule representations) and y (labels) must be defined beforehand
model = ChemLIFTClassifierFactory('property name', model_name='EleutherAI/pythia-1b-deduped').create_model()
model.fit(X, y)
model.predict(X)
```
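Language-interfaced fine-tuning treats each data point as text. The sketch below shows one way tabular `(X, y)` data could be turned into prompt/completion pairs, and how a generated completion could be mapped back to a label. The helper names and the question template are hypothetical and do not reflect ChemLIFT's internal formatting.

```python
from typing import Dict, List, Optional, Sequence


def to_prompt_completion_pairs(
    representations: Sequence[str],
    labels: Sequence[int],
    property_name: str,
) -> List[Dict[str, str]]:
    """Turn (representation, label) rows into text pairs for fine-tuning.

    Hypothetical template for illustration only.
    """
    if len(representations) != len(labels):
        raise ValueError("representations and labels must have the same length")
    return [
        {
            "prompt": f"What is the {property_name} of {rep}?",
            "completion": str(label),
        }
        for rep, label in zip(representations, labels)
    ]


def parse_completion(text: str) -> Optional[int]:
    """Map generated text back to an integer label, or None if unparsable."""
    stripped = text.strip()
    token = stripped.split()[0] if stripped else ""
    try:
        return int(token)
    except ValueError:
        return None


pairs = to_prompt_completion_pairs(["ethane", "propane"], [0, 1], "bandgap")
print(pairs[0])
print(parse_completion(" 1\n"))
```

Returning `None` for unparsable completions keeps invalid generations visible instead of silently assigning them to a class.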

🚀 Installation

The most recent code and data can be installed directly from GitHub with:

    $ pip install git+https://github.com/lamalab-org/chemlift.git

👐 Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

👋 Attribution

⚖️ License

The code in this package is licensed under the MIT License.

📖 Citation

@article{Jablonka_2023,
  doi = {10.26434/chemrxiv-2023-fw8n4},
  url = {https://doi.org/10.26434%2Fchemrxiv-2023-fw8n4},
  year = 2023,
  month = {feb},
  publisher = {American Chemical Society ({ACS})},
  author = {Kevin Maik Jablonka and Philippe Schwaller and Andres Ortega-Guerrero and Berend Smit},
  title = {Is {GPT}-3 all you need for low-data discovery in chemistry?}
}

🎁 Support

The work of the LAMALab is supported by the Carl-Zeiss foundation.

In addition, the work was supported by the MARVEL National Centre for Competence in Research funded by the Swiss National Science Foundation (grant agreement ID 51NF40-182892). We also acknowledge support by the USorb-DAC Project, which is funded by a grant from The Grantham Foundation for the Protection of the Environment to RMI’s climate tech accelerator program, Third Derivative.

🛠️ For Developers

This final section of the README is for those who want to get involved by making a code contribution.

### Development Installation

To install in development mode, use the following:

```bash
$ git clone https://github.com/lamalab-org/chemlift.git
$ cd chemlift
$ pip install -e .
```

### 🥼 Testing

After cloning the repository and installing `tox` with `pip install tox`, the unit tests in the `tests/` folder can be run reproducibly with:

```shell
$ tox
```

Additionally, these tests are automatically re-run with each commit in a [GitHub Action](https://github.com/lamalab-org/chemlift/actions?query=workflow%3ATests).

### 📖 Building the Documentation

The documentation can be built locally using the following:

```shell
$ git clone https://github.com/lamalab-org/chemlift.git
$ cd chemlift
$ tox -e docs
$ open docs/build/html/index.html
```

The documentation build automatically installs the package as well as the `docs` extra specified in the [`setup.cfg`](setup.cfg). `sphinx` plugins like `texext` can be added there. Additionally, they need to be added to the `extensions` list in [`docs/source/conf.py`](docs/source/conf.py).

### 📦 Making a Release

After installing the package in development mode and installing `tox` with `pip install tox`, the commands for making a new release are contained within the `finish` environment in `tox.ini`. Run the following from the shell:

```shell
$ tox -e finish
```

This script does the following:

1. Uses [Bump2Version](https://github.com/c4urself/bump2version) to switch the version number in the `setup.cfg`, `src/chemlift/version.py`, and [`docs/source/conf.py`](docs/source/conf.py) to not have the `-dev` suffix.
2. Packages the code in both a tar archive and a wheel using [`build`](https://github.com/pypa/build).
3. Uploads to PyPI using [`twine`](https://github.com/pypa/twine). Be sure to have a `.pypirc` file configured to avoid the need for manual input at this step.
4. Pushes to GitHub. You'll need to make a release that goes with the commit where the version was bumped.
5. Bumps the version to the next patch. If you made big changes and want to bump the version by minor, you can use `tox -e bumpversion -- minor` afterwards.
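The version changes in the release steps can be illustrated with a small sketch. It assumes a `MAJOR.MINOR.PATCH` scheme with an optional `-dev` suffix, as suggested by the `0.0.1-dev` version of this package. The function names are hypothetical, and the actual behavior of Bump2Version depends on the project's configuration.

```python
from typing import Tuple

DEV_SUFFIX = "-dev"


def parse_version(version: str) -> Tuple[int, int, int, bool]:
    """Split 'X.Y.Z' or 'X.Y.Z-dev' into its numeric parts and a dev flag."""
    is_dev = version.endswith(DEV_SUFFIX)
    core = version[: -len(DEV_SUFFIX)] if is_dev else version
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch, is_dev


def release_version(version: str) -> str:
    """Drop the '-dev' suffix, as done before packaging a release."""
    major, minor, patch, _ = parse_version(version)
    return f"{major}.{minor}.{patch}"


def next_dev_version(version: str, part: str = "patch") -> str:
    """Bump the chosen part and mark the result as a development version."""
    major, minor, patch, _ = parse_version(version)
    if part == "patch":
        patch += 1
    elif part == "minor":
        minor, patch = minor + 1, 0
    else:
        raise ValueError("part must be 'patch' or 'minor'")
    return f"{major}.{minor}.{patch}{DEV_SUFFIX}"


print(release_version("0.0.1-dev"))            # 0.0.1
print(next_dev_version("0.0.1"))               # 0.0.2-dev
print(next_dev_version("0.0.2-dev", "minor"))  # 0.1.0-dev
```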

Owner

  • Name: Laboratory for AI for Materials
  • Login: lamalab-org
  • Kind: organization

Research group led by Kevin Maik Jablonka

Citation (CITATION.cff)

cff-version: 1.0.2
message: "If you use this software, please cite it as below."
title: "chemlift"
authors:
  - name: "Kevin Maik Jablonka"
version: 0.0.1-dev
doi:
url: "https://github.com/lamalab-org/chemlift"

GitHub Events

Total
  • Watch event: 11
  • Fork event: 4
Last Year
  • Watch event: 11
  • Fork event: 4

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 35
  • Total Committers: 1
  • Avg Commits per committer: 35.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| Kevin Maik Jablonka | k****a@e****h | 35 |
Committer Domains (Top 20 + Academic)
epfl.ch: 1
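
The committer statistics above can be reproduced with a short sketch. It assumes the commonly used definition of the Development Distribution Score, namely one minus the top committer's share of all commits; this page does not state the formula, so treat the definition as an assumption. The function names are illustrative.

```python
from typing import Mapping


def average_commits(commits_by_author: Mapping[str, int]) -> float:
    """Mean number of commits per committer (0.0 if there are no committers)."""
    if not commits_by_author:
        return 0.0
    return sum(commits_by_author.values()) / len(commits_by_author)


def development_distribution_score(commits_by_author: Mapping[str, int]) -> float:
    """Assumed definition: 1 - (commits of top committer / total commits)."""
    total = sum(commits_by_author.values())
    if total == 0:
        return 0.0
    return 1.0 - max(commits_by_author.values()) / total


all_time = {"Kevin Maik Jablonka": 35}
print(average_commits(all_time))                 # 35.0
print(development_distribution_score(all_time))  # 0.0
```

With a single committer the top committer's share is 100%, so the score is 0.0, which matches the values listed for this repository.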

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 18
  • Total pull requests: 1
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 1 minute
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.22
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kjappelbaum (14)
  • demaxiya567 (1)
Pull Request Authors
  • kjappelbaum (1)
Top Labels
Issue Labels
Pull Request Labels