HARDy

HARDy: Handling Arbitrary Recognition of Data in Python - Published in JOSS (2022)

https://github.com/eisy-as-py/hardy

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    2 of 6 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Engineering Computer Science - 60% confidence
Last synced: 6 months ago · JSON representation

Repository

Handling Arbitrary Recognition of Data! y not?

Basic Info
  • Host: GitHub
  • Owner: EISy-as-Py
  • License: MIT
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 80.8 MB
Statistics
  • Stars: 10
  • Watchers: 2
  • Forks: 4
  • Open Issues: 5
  • Releases: 1
Created almost 6 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License

README.md


Project HARDy

"HARDy: Handling Arbitrary Recognition of Data in python" A package to assist in discovery, research, and classification of YOUR data, no matter who you are!

Project Objective

Numerical and visual transformation of experimental data to improve its classification and cataloging
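As a hypothetical illustration of the kind of numerical transformations meant here (this is not HARDy's actual API, just a sketch of the idea), raw x-y data can be re-expressed in several ways before being rendered for classification:

```python
import numpy as np

# Illustrative only: a few numerical transformations of the kind that can
# be applied to raw x-y experimental data before visual encoding.
x = np.linspace(1.0, 10.0, 100)      # e.g. a sweep variable
y = 1.0 / (1.0 + x**2)               # e.g. a measured response

transforms = {
    "raw":        y,
    "log10":      np.log10(y),       # compress dynamic range
    "reciprocal": 1.0 / y,           # emphasize small values
    "derivative": np.gradient(y, x), # highlight local changes
}

for name, series in transforms.items():
    print(f"{name:10s} min={series.min():.3g} max={series.max():.3g}")
```

Each transformed series can then be plotted (or pixel-encoded, as described under Visualization below) to give the classifier a different view of the same measurement.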

This project was part of the DIRECT Capstone Project at the University of Washington and was presented at the showcase; follow this link for the presentation.

Requirements:

The package HARDy has the following main dependencies:

  1. Python = 3.7
  2. TensorFlow = 2.0

The detailed list of dependencies is reflected in the environment.yml file

Installation:

The package HARDy can be installed using the following command:

conda install -c pozzorg hardy

Alternatively, you can install it from the GitHub repository in the following steps:

*Please note that v1.0 is currently the most stable release.

  1. In your terminal, run git clone https://github.com/EISy-as-Py/hardy.git
  2. Change the directory to hardy root directory, by running cd hardy
  3. Run git checkout v1.0
  4. Run python setup.py install
  5. To check the installation, run python -c "import hardy" in your terminal

For other methods of installation, such as using the environment file or pip, please visit the Installation page.

Usage:

HARDy uses Keras for training convolutional neural networks and Keras-Tuner for hyperparameter optimization. The flow of information is shown in the image below:

Information flow of how the package works

An example jupyter notebook to run HARDy using single script is available at this link Example Notebook

To perform the various transformations, train the neural network, and run hyperparameter optimization, HARDy utilizes .yaml configuration files.

The instructions for modifying or writing your own configuration files can be found in the documentation.
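As a minimal sketch of how such a run configuration might be parsed (the key names below are illustrative, not HARDy's actual configuration schema; PyYAML is one of the project's dependencies):

```python
import yaml  # PyYAML, listed among the project's dependencies

# Hypothetical configuration: the keys ("transformations", "training",
# etc.) are made up for illustration and are NOT HARDy's real schema.
config_text = """
transformations:
  - name: log10
    apply_to: y
training:
  epochs: 10
  batch_size: 32
"""

config = yaml.safe_load(config_text)
print(config["training"]["epochs"])  # -> 10
```

In practice the configuration would be read from a .yaml file on disk with `yaml.safe_load(open(path))` rather than from an inline string.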

The notebooks and documentation can also be accessed at this link: Documentation

Visualization

In order to increase the density of data presented to the convolutional neural network and add a visual transformation of the data, we adopted a new plotting technique that takes advantage of how images are read by computers. Using color images, we were able to encode the experimental data in the pixel values, using a different series per image channel. The results are data-dense images, which are also pretty to look at.

Details on the proposed visual transformation to increase the images' data density
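The channel-encoding idea above can be sketched as follows (illustrative only, not HARDy's exact implementation): up to three data series are normalized and written into the R, G, and B channels of a single image, one series per channel.

```python
import numpy as np

def encode_rgb(series_r, series_g, series_b, size=64):
    """Pack three 1-D series into one (size, size, 3) uint8 image,
    one normalized series per color channel."""
    img = np.zeros((size, size, 3), dtype=np.uint8)
    for ch, s in enumerate((series_r, series_g, series_b)):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        s = (s - s.min()) / (span if span else 1.0)  # normalize to [0, 1]
        # resample the series across the image rows, then broadcast each
        # row value across the columns as a pixel intensity
        rows = np.interp(np.linspace(0, len(s) - 1, size),
                         np.arange(len(s)), s)
        img[:, :, ch] = (rows[:, None] * 255).astype(np.uint8)
    return img

x = np.linspace(0, 1, 200)
image = encode_rgb(np.sin(2 * np.pi * x), np.cos(2 * np.pi * x), x)
print(image.shape)  # -> (64, 64, 3)
```

A single image therefore carries three series at once, which is the "data-dense" property described above.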

Mission:

We have been commissioned by Professor Lilo Pozzo to create a new tool for research and discovery, for her lab and for high-throughput researchers everywhere. Our vision of the final product:

  • A package which can approach any large, labeled dataset (such as those familiar to High Throughput Screening (HTS) researchers).
  • Perform a (procedurally generated and data-guided) wide array of transformations on the data, to produce completely novel ways of examining it, maybe not human-readable but certainly in a machine-readable format.
  • Train "A Machine Learning Algorithm" (we currently focus on visual-processing CNNs, but are open to anything!) to classify the existing labeled data based on each of the aforementioned transformations.
  • Report back to the user:
      • Which versions of the model/algorithm worked best?
      • Which transformations appeared the most useful (i.e., were used across many of the most successful models)?
      • Which data "fingerprints" should we pay the most attention to?
  • Present a user interface, to allow non-programmers to interact with and use the chosen classifier(s) in their work.

## Use Cases:

The package is designed to deal with a diverse set of labeled data. These are some of the use cases we see benefitting from using the HARDy package.

possible use cases for the HARDy package

## Modules Overview:

  • handling.py: functions related to configuration, importing/exporting, and other back-end tasks.
  • arbitrage.py: data pre-analysis, transformations, and other preparation to be fed into the learning algorithm.
  • recognition.py: setup, training, and testing of a single convolutional neural network (CNN), or hyperparameter optimization for CNNs.
  • data_reporting.py: output and reporting of any/all results; tabular summary of runs, visual performance comparison, as well as parallel coordinate plots and feature maps.

## Community Guidelines:

We welcome members of the open-source community to extend the functionalities of HARDy, submit feature requests, and report bugs.

### Feature Request:

If you would like to suggest a feature or start a discussion on a possible extension of HARDy, please feel free to raise an issue.

### Bug Report:

If you would like to report a bug, please follow this link.

### Contributions:

If you would like to contribute to HARDy, you can fork the repository, add your contribution, and generate a pull request. The complete guide to making contributions can be found at this link.

## Acknowledgment

Maria Politi acknowledges support from the National Science Foundation through NSF-CBET grant 1917340.

Owner

  • Name: EISy-as-Py
  • Login: EISy-as-Py
  • Kind: organization
  • Email: politim@uw.edu

JOSS Publication

HARDy: Handling Arbitrary Recognition of Data in Python
Published
March 14, 2022
Volume 7, Issue 71, Page 3829
Authors
Maria Politi ORCID
University of Washington, Department of Chemical Engineering, Seattle, WA, USA
Abdul Moeez ORCID
University of Washington, Department of Materials Science and Engineering, Seattle, WA, USA
David Beck ORCID
University of Washington, Department of Chemical Engineering, Seattle, WA, USA, eScience Institute, University of Washington, Seattle, WA, USA
Stuart Adler
University of Washington, Department of Chemical Engineering, Seattle, WA, USA
Lilo Pozzo ORCID
University of Washington, Department of Chemical Engineering, Seattle, WA, USA
Editor
George K. Thiruvathukal ORCID
Tags
Feature Engineering · Kernel methods · Machine Learning


Committers

Last synced: 7 months ago

All Time
  • Total Commits: 641
  • Total Committers: 6
  • Avg Commits per committer: 106.833
  • Development Distribution Score (DDS): 0.406
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name            Email            Commits
amoeezuw        a****z@u****u    381
Maria Politi    p****m@u****u    152
David Hurt      f****a@g****m    89
Moeez           5****w           16
Kyle Niemeyer   k****r@f****m    2
Abdul Moeez     a****z@A****l    1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 12
  • Total pull requests: 10
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 2 months
  • Total issue authors: 4
  • Total pull request authors: 2
  • Average comments per issue: 2.5
  • Average comments per pull request: 0.7
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 9
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • andrewtarzia (5)
  • FencerDave (3)
  • alberto-battistel (3)
  • dacb (1)
Pull Request Authors
  • dependabot[bot] (9)
  • kyleniemeyer (1)
Top Labels
Issue Labels
bug (5) question (4) enhancement (3) documentation (1)
Pull Request Labels
dependencies (9) python (9)

Dependencies

doc/requirements.txt pypi
  • PyYAML *
  • docutils ==0.17.1
  • keras *
  • keras-tuner *
  • matplotlib *
  • nbsphinx *
  • numpy *
  • numpydoc *
  • pandas *
  • plotly *
  • scikit-image *
  • scikit-learn *
  • tensorflow *
  • tqdm *
environment.yml pypi
  • PyYAML *
  • contextlib2 *
  • google-images-download *
  • h5py ==2.10.0
  • keras-tuner ==1.0.1
  • opt-einsum *
  • plotly ==4.8.1
  • selenium *
  • tensorflow ==2.0
  • tqdm *
  • urllib3 *
setup.py pypi
  • google-pasta *
  • grpcio ==1.24.3
  • h5py ==2.10.0
  • keras ==2.3.1
  • keras-tuner ==1.0.1
  • numpy ==1.18.1
  • opencv-python *
  • pandas ==1.0.3
  • plotly ==4.8.1
  • python-dateutil ==2.8.1
  • pytz ==2020.1
  • pyyaml ==5.4
  • scikit-image ==0.17.2
  • scikit-learn ==0.23.1
  • scipy ==1.4.1
  • tensorflow ==2.2.0