Fireworks

Fireworks: Reproducible Machine Learning and Preprocessing with PyTorch - Published in JOSS (2019)

https://github.com/kellylab/fireworks

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    2 of 6 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Biology Life Sciences - 84% confidence
Last synced: 4 months ago

Repository

PyTorch with DataFrames

Statistics
  • Stars: 14
  • Watchers: 5
  • Forks: 3
  • Open Issues: 0
  • Releases: 1
Created over 7 years ago · Last pushed over 2 years ago
Metadata Files
Readme License

README.md

Fireworks - PyTorch with DataFrames

Introduction

This library provides an implementation of DataFrames which are compatible with PyTorch Tensors. This means that you can construct models that refer to their inputs by column name, which makes it easier to keep track of your variables when working with complex datasets. This also makes it easier to integrate those models into your existing Pandas-based data science workflow. Additionally, we provide a complete machine learning framework built around DataFrames to facilitate model training, saving/loading, preprocessing, and hyperparameter optimization.

Overview

Data is represented in Fireworks using an object called a Message, which generalizes the concept of a DataFrame to include PyTorch tensors (analogous to a TensorFrame in other frameworks). The DataFrame is popular in statistical research because it organizes information well while remaining flexible enough to support a wide range of statistical analyses. By providing an implementation of a DataFrame that supports torch.Tensor objects, we can now use this data structure with PyTorch. You can easily move columns in a Message to and from torch.Tensor and to and from the GPU, all within one object.

We provide a set of abstract primitives (Pipes, Junctions, and Models) that can be used to implement specific operations and can be stacked together to construct a data pipeline. Because these primitives expect standardized input and output, the components are reusable. Rather than constructing a new data pipeline in an ad hoc manner for every project, you can compose your pipeline modularly from existing components provided by fireworks or ones that you have made previously.
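
The idea of stacking stages that share one input/output format can be illustrated with a minimal sketch. This is not the Fireworks API: the Pipe, Scale, and AddSum classes below, and the use of a plain dict of column lists in place of a Message, are hypothetical stand-ins chosen only to show how standardized stages compose.

```python
# Hypothetical sketch of composable pipeline stages.
# These classes are NOT part of Fireworks; they only illustrate the idea.
from typing import Dict, Iterable, Iterator, List

# Stand-in for a Message: column name -> list of values.
Batch = Dict[str, List[float]]


class Pipe:
    """A stage that pulls batches from an upstream source and transforms them."""

    def __init__(self, source: Iterable[Batch]):
        self.source = source

    def transform(self, batch: Batch) -> Batch:
        # Default behaviour: pass the batch through unchanged.
        return batch

    def __iter__(self) -> Iterator[Batch]:
        for batch in self.source:
            yield self.transform(batch)


class Scale(Pipe):
    """Multiply one named column by a constant factor."""

    def __init__(self, source: Iterable[Batch], column: str, factor: float):
        super().__init__(source)
        self.column = column
        self.factor = factor

    def transform(self, batch: Batch) -> Batch:
        out = dict(batch)  # shallow copy so the input batch is not mutated
        out[self.column] = [v * self.factor for v in batch[self.column]]
        return out


class AddSum(Pipe):
    """Add a new column holding the row-wise sum of two existing columns."""

    def __init__(self, source: Iterable[Batch], left: str, right: str, target: str):
        super().__init__(source)
        self.left, self.right, self.target = left, right, target

    def transform(self, batch: Batch) -> Batch:
        out = dict(batch)
        out[self.target] = [a + b for a, b in zip(batch[self.left], batch[self.right])]
        return out


batches = [
    {"x": [1.0, 2.0], "y": [10.0, 20.0]},
    {"x": [3.0], "y": [30.0]},
]

# Every stage consumes and produces the same Batch shape, so stages nest freely.
pipeline = AddSum(Scale(batches, "x", 2.0), "x", "y", "total")
results = list(pipeline)
print(results)
```

Because each stage refers to its inputs by column name and returns the same kind of object it received, a stage written for one project can be placed into another pipeline without modification.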

This library also provides a set of tools built around these components. There are modules here for training machine learning models, reading and writing to databases using Messages, hyperparameter optimization, and saving/loading snapshots/logs of your data pipeline for re-usability and reproducibility.

Getting Started

Installation

You can install fireworks from PyPI:

pip3 install fireworks-ml

Documentation

See documentation at https://fireworks.readthedocs.io

Contributing

Comments, questions, issues, and pull requests are always welcome! Feel free to open an issue with any feedback you have or reach out to me (smk508) directly with any questions. See our roadmap for an overview of features that we are looking to add (https://fireworks.readthedocs.io/en/development/Project.html#roadmap).

Acknowledgements

Development

fireworks was developed by Saad Khan, an MSTP student in the lab of Libusha Kelly at the Albert Einstein College of Medicine (https://www.kellylab.org/). We use this library to develop deep learning models to study the microbiome.

Funding

Saad Khan is funded in part by an NIH MSTP training grant 6T32GM007288-46. This work was funded in part by a Peer Reviewed Cancer Research Program Career Development Award from the United States Department of Defense to Libusha Kelly (CA171019).

Owner

  • Name: Kelly lab
  • Login: kellylab
  • Kind: organization
  • Email: libusha@gmail.com

Account for Libusha Kelly's lab, Albert Einstein College of Medicine

JOSS Publication

Fireworks: Reproducible Machine Learning and Preprocessing with PyTorch
Published
July 21, 2019
Volume 4, Issue 39, Page 1478
Authors
Saad M. Khan ORCID
Systems & Computational Biology, Albert Einstein College of Medicine
Libusha Kelly
Systems & Computational Biology, Albert Einstein College of Medicine
Editor
Ariel Rokem ORCID
Tags
PyTorch batch processing machine learning

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 251
  • Total Committers: 6
  • Avg Commits per committer: 41.833
  • Development Distribution Score (DDS): 0.044
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Saad Khan s****8@m****u 240
Saad-Khan s****n@d****c 6
Arman a****n@d****c 2
sturmianseq 8****q 1
Daniel S. Katz d****z@i****g 1
Ariel Rokem a****m@g****m 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 52
  • Total pull requests: 29
  • Average time to close issues: 5 months
  • Average time to close pull requests: 2 months
  • Total issue authors: 2
  • Total pull request authors: 4
  • Average comments per issue: 0.15
  • Average comments per pull request: 0.28
  • Merged pull requests: 24
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • smk508 (51)
  • dirmeier (1)
Pull Request Authors
  • smk508 (26)
  • sturmianseq (1)
  • danielskatz (1)
  • arokem (1)

Dependencies

doc-requirements.txt pypi
  • Babel >=2.6.0
  • Deprecated >=1.2.3
  • Flask >=1.0.2
  • Jinja2 >=2.10
  • MarkupSafe >=1.0
  • Pygments >=2.2.0
  • SQLAlchemy >=1.3.1
  • SQLAlchemy-Utils ==0.33.5
  • Sphinx >=1.8.4
  • alabaster >=0.7.12
  • atomicwrites >=1.1.5
  • attrs >=18.1.0
  • bleach >=3.1.0
  • certifi >=2018.4.16
  • chardet >=3.0.4
  • click >=6.7
  • docutils >=0.14
  • enum34 >=1.1.6
  • fancycompleter >=0.8
  • ffmpeg >=1.4
  • idna >=2.6
  • imagesize >=1.1.0
  • itsdangerous >=0.24
  • more-itertools ==4.1.0
  • numpy >=1.14.3
  • packaging >=19.0
  • pandas >=0.23.0
  • pbr >=5.1.3
  • pkginfo >=1.5.0.1
  • pluggy >=0.6.0
  • py >=1.5.3
  • pyparsing >=2.3.1
  • pytest >=3.6.0
  • pytorch-ignite >=0.1.0
  • pytz >=2018.4
  • pyzmq >=17.0.0
  • readme-renderer ==24.0
  • requests >=2.21.0
  • requests-toolbelt ==0.9.1
  • six >=1.11.0
  • snowballstemmer >=1.2.1
  • torch >=1.0.1.post2
  • torchfile >=0.1.0
  • urllib3 >=1.24.1
  • visdom ==0.1.8.4
full-requirements.txt pypi
  • Babel ==2.6.0
  • Deprecated ==1.2.3
  • Flask ==1.0.2
  • Jinja2 ==2.10.1
  • MarkupSafe ==1.0
  • Pillow ==5.1.0
  • Pygments ==2.2.0
  • SQLAlchemy ==1.3.1
  • SQLAlchemy-Utils ==0.33.5
  • Sphinx ==1.8.4
  • Werkzeug ==0.15.3
  • alabaster ==0.7.12
  • atomicwrites ==1.1.5
  • attrs ==18.1.0
  • bidict ==0.17.2
  • biopython ==1.71
  • bleach ==3.1.0
  • certifi ==2018.4.16
  • chardet ==3.0.4
  • click ==6.7
  • docutils ==0.14
  • enum34 ==1.1.6
  • fancycompleter ==0.8
  • ffmpeg ==1.4
  • idna ==2.6
  • imagesize ==1.1.0
  • itsdangerous ==0.24
  • more-itertools ==4.1.0
  • packaging ==19.0
  • pandas >=0.24.0
  • pbr ==5.1.3
  • pdbpp ==0.9.2
  • pkginfo ==1.5.0.1
  • pluggy ==0.6.0
  • py ==1.5.3
  • pyparsing ==2.3.1
  • pytest ==3.6.0
  • python-dateutil ==2.7.3
  • pytorch-ignite ==0.1.0
  • pytz ==2018.4
  • pyzmq ==17.0.0
  • readme-renderer ==24.0
  • requests ==2.21.0
  • requests-toolbelt ==0.9.1
  • scipy >=1.1.0
  • six ==1.11.0
  • snowballstemmer ==1.2.1
  • sphinxcontrib-websupport ==1.1.0
  • torch >=1.0.1.post2
  • torchfile ==0.1.0
  • torchvision ==0.3.0
  • tornado ==5.0.2
  • tqdm ==4.24.0
  • twine ==1.13.0
  • urllib3 ==1.24.2
  • visdom ==0.1.8.4
  • webencodings ==0.5.1
  • websocket-client ==0.47.0
  • wmctrl ==0.3
  • wrapt ==1.10.11
requirements.txt pypi
  • Deprecated >=1.2.5
  • SQLAlchemy >=1.3.4
  • SQLAlchemy-Utils >0.33.5
  • bidict >=0.18.0
  • biopython >=1.73
  • matplotlib >=3.0.3
  • numpy >=1.16.4
  • pandas >=0.24.2
  • pyarrow >=0.14.1
  • python-dateutil >=2.8.0
  • pytorch-ignite >=0.2.0
  • six >=1.12.0
  • torch >=1.1.0
  • torchfile >=0.1.0
  • torchvision >=0.3.0
  • visdom >=0.1.8.4
  • wrapt >=1.11.1