Recent Releases of ECNet

ECNet - alvaDesc bug fix

Sets alvaDesc na\r return values to 0.0

Scientific Software - Peer-reviewed - Python
Published by tjkessler almost 2 years ago

ECNet - Update dependencies

  • update all package dependencies

Published by tjkessler almost 2 years ago

ECNet - 4.1.2 - Update build/install method, add GitHub workflows, unittest -> pytest

  • Build/installation now uses pyproject.toml instead of the deprecated setup.py
  • Added GitHub workflows for PyPI publishing and unit testing
  • Unit tests now use pytest instead of unittest

Published by tjkessler almost 3 years ago

ECNet - Dependency update

Update to package dependencies, notably PyTorch 1.8.0 -> 2.0.0. ECNet now requires Python 3.11+.

Published by tjkessler about 3 years ago

ECNet - Updates to training runtime, tuning arguments

  • Added option to shuffle training/validation subsets every epoch
  • Update to docstrings/documentation
  • Added a "getting started" notebook in the examples directory
  • New argument format for ABC-based parameter tuning
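
The per-epoch shuffling option listed above can be pictured with a small, framework-free sketch. It is not ECNet's implementation: the function name `split_each_epoch`, the `valid_fraction` parameter, and the fixed seed are assumptions made for illustration only.

```python
import random


def split_each_epoch(samples, n_epochs, valid_fraction=0.25, seed=0):
    """Yield (epoch, train, valid), re-drawing the split every epoch."""
    rng = random.Random(seed)
    n_valid = int(len(samples) * valid_fraction)
    for epoch in range(n_epochs):
        shuffled = samples[:]  # copy so the caller's list is left untouched
        rng.shuffle(shuffled)
        # The first n_valid shuffled items validate, the rest train.
        yield epoch, shuffled[n_valid:], shuffled[:n_valid]


samples = list(range(8))
splits = list(split_each_epoch(samples, n_epochs=3))
```

Every epoch sees the same samples overall, but which ones are held out for validation changes from epoch to epoch.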

Published by tjkessler almost 5 years ago

ECNet - PyTorch rework, new API, bundled property sets

  • ECNet now leverages the PyTorch package for ML operations
    • This change presented an opportunity to overhaul ECNet from the ground up, allowing us to think about how the user will interact with this package. Ultimately, we wanted to make interactions easier.
  • Custom data structures were weird, and didn't belong in an ML toolkit. Instead, we offer PyTorch-based data structures, adjusted to house chemical data. Users can obtain SMILES strings and property values, or an ML-ready structure that can be passed directly to ECNet for training.
  • All these changes require documentation, so full API documentation is available. We also have an example script, and would like to include more examples in the future.

Published by tjkessler about 5 years ago

ECNet - Update to ECabc-based hyper-parameter tuning functions

Per ECabc's API changes in its 3.0.0 update, this ECNet update incorporates these changes into all relevant functions.

Published by tjkessler over 5 years ago

ECNet - Better implementation of TensorFlow 2.0

  • ecnet.models.mlp.MultilayerPerceptron's implementation now makes sense, and leads to faster training times
  • some database cleanup
  • input QSPR descriptor names are now also saved inside the project, in case the pre-trained project is used without ECNet (DataFrame object not required)

Published by tjkessler over 6 years ago

ECNet - ML back-end, workflows, database additions/encoding, and more

  • Addition of validated, PaDEL/alvaDesc-generated YSI databases
  • Update to repository links, author information
  • ecnet.utils.data_utils now forces UTF-8 encoding for all database creation/saving
  • ML back-end updated to TensorFlow 2.0.0

    • No API changes to ecnet.models.mlp.MultilayerPerceptron
    • Existing .h5 model files will not work with the updated class

    Note: PyTorch was initially considered as an alternative; however, after testing, both its performance and the viability of installing it on the high-performance machines available to the ECRL were deemed inadequate, so updating to TensorFlow 2.0.0 was deemed the most appropriate action.

  • Only the following hyper-parameters are tuned with the built-in functions:

    • Learning rate of Adam optimization function
    • Learning rate decay of Adam optimization function
    • Batch size during training
    • Patience (if validating, # epochs to wait for better validation loss, else terminate training)
    • Size of each hidden layer

    Note: with the relatively small number of samples our models are trained with, it does not make sense to adjust hyper-parameters such as beta_1, beta_2, and epsilon. The hyper-parameters listed above are theorized to play a much more important role in how the models train/perform.

  • Added the UML ECRL's general publication workflow as ecnet.workflows.ecrl_workflow.create_model

  • If using ecnet.Server and not creating a project, a single model's filename can now be specified as an additional argument (default: model.h5)

  • TensorFlow's verbose argument is now propagated from ecnet.Server.train to the model during training; added as an additional argument

  • ecnet.models.mlp.MultilayerPerceptron.fit now returns a tuple: (learn losses, validation losses). Both are lists containing loss values (mean squared error) at every epoch. If training a single model using ecnet.Server.train, this tuple is returned. If not performing validation, the validation losses list is populated with None elements, equal in length to the learn losses list.

  • If installing using setup.py, installing TensorFlow is optional; to skip the installation of the pre-compiled PyPI distribution of TensorFlow, run setup.py with python setup.py --omit_tf install

    Note: other methods of installing TensorFlow offer clear benefits (GPU support, different CPU instruction sets, etc.), therefore we want to provide an option for the user to use an existing installation of TensorFlow instead of forcing the PyPI-sourced version.
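
For the (learn losses, validation losses) tuple that MultilayerPerceptron.fit is described as returning in the list above, the following is a minimal sketch of how such a pair could be built and consumed. The helpers `pad_validation_losses` and `best_epoch` are hypothetical names that are not part of ECNet, and the loss values are made up for illustration.

```python
def pad_validation_losses(learn_losses, valid_losses=None):
    """Return (learn, valid); without validation, valid is a list of None."""
    if valid_losses is None:
        valid_losses = [None] * len(learn_losses)
    return learn_losses, valid_losses


def best_epoch(learn_losses, valid_losses):
    """Index of the lowest validation loss, or learn loss if not validating."""
    no_validation = all(v is None for v in valid_losses)
    tracked = learn_losses if no_validation else valid_losses
    return min(range(len(tracked)), key=tracked.__getitem__)


# Training without a validation set: the second list is all None.
learn, valid = pad_validation_losses([0.9, 0.5, 0.3])
epoch_without_validation = best_epoch(learn, valid)

# Training with a validation set: the validation losses decide.
epoch_with_validation = best_epoch([0.9, 0.5, 0.3], [0.8, 0.4, 0.6])
```

Keeping both lists the same length means callers can index them by epoch without checking whether validation was performed.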

Published by tjkessler over 6 years ago

ECNet - Bug fixes

  • If validation/test sets are empty, input parameter limiting processes will still run
  • Server.limit_inputs now correctly returns input parameter names, importances

Published by tjkessler almost 7 years ago

ECNet - Bug fixes, enhancements

  • ecnet.Server.remove_outliers and ecnet.tasks.remove_outliers have been removed
    • while detecting outliers may be beneficial in determining abnormalities in data, removing them entirely is likely not the right approach (in terms of fuel property prediction). Once a viable usage has been determined, outlier detection will be included.
  • Added the batch_size hyper-parameter, included in the default model configuration and hyper-parameter tuning process
    • Relevant unit tests updated
  • Any missing model configuration variables from config files generated with previous versions of ECNet will now be set to their default values
    • Additional unit tests added
  • Added option to convert SMILES to MDL during PaDEL-based database creation
    • Additional unit test added
  • Added PaDEL-generated databases for all properties
  • ecnet.tasks.limit_inputs.limit_rforest now relies on sklearn.ensemble.RandomForestRegressor as its only dependency
    • limit_rforest now returns list of parameter names/importances instead of a modified DataFrame
    • Server.limit_inputs also returns a list of parameter names/importances
    • Removed the ditto-lib dependency
  • Bug fixes:
    • Server._sets now loads when a PRJ file is opened via ecnet.Server
    • ecnet.utils.data_utils.DataFrame.set_inputs now immediately applies selected inputs to L/V/T sets
    • ParityPlot parity lines now scale to reflect data minimum/maximum
  • More robust unit tests for MultilayerPerceptron, database creation, input parameter limiting
  • All unit tests may now be run individually
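
The list above mentions that configuration variables missing from older config files are now set to their default values. A minimal sketch of that idea follows, assuming the configuration is a flat dictionary. The keys and default values shown are placeholders, not ECNet's actual defaults, and `fill_missing` is a hypothetical helper.

```python
# Hypothetical defaults; ECNet's real variable names and values may differ.
DEFAULT_CONFIG = {
    'batch_size': 32,
    'epochs': 300,
    'learning_rate': 0.001,
}


def fill_missing(config, defaults=DEFAULT_CONFIG):
    """Return a copy of config with absent keys filled in from defaults."""
    merged = dict(defaults)
    merged.update(config)  # values present in the old config take priority
    return merged


# A config written before batch_size existed.
old_config = {'epochs': 500, 'learning_rate': 0.01}
config = fill_missing(old_config)
```

Values already present in the old file are preserved, and only the gaps are filled.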

Published by tjkessler almost 7 years ago

ECNet - Better MLP validation, moved multiprocessing checks

  • Training an MLP using a validation set now uses Keras' early stopping callback to determine learning cutoff, preserves weights at best validation loss
  • Moved multiprocessing.set_start_method to multiprocessed tasks
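
The early-stopping behavior described above (stop once validation loss has not improved for a number of epochs, and keep the state from the best epoch) can be sketched without Keras. This is only an illustration of the general technique: the function name and the simulated loss values are assumptions, and a real training loop would snapshot the model weights where the comment indicates.

```python
def train_with_early_stopping(valid_losses_by_epoch, patience):
    """Simulate training; return (best_epoch, best_loss, epochs_run)."""
    best_loss = float('inf')
    best_epoch = None
    waited = 0  # epochs since the last improvement
    epochs_run = 0
    for epoch, loss in enumerate(valid_losses_by_epoch):
        epochs_run = epoch + 1
        if loss < best_loss:
            # A real loop would copy the model weights here.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs
    return best_epoch, best_loss, epochs_run


# Loss improves until epoch 2, then stalls for three epochs.
simulated = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.5]
result = train_with_early_stopping(simulated, patience=3)
```

Training stops before the final, lower loss at epoch 6 is ever reached, which is the trade-off a patience setting makes.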

Published by tjkessler almost 7 years ago

ECNet - Removal of conversion functions, slight Server rework

1.) The following conversions have been removed from ECNet:

  • get_smiles
  • smiles_to_descriptors
  • smiles_to_mdl
  • mdl_to_descriptors

*Note: these were adding clutter, and were not within the main scope of ECNet.

2.) PaDEL-Descriptor is no longer bundled into ECNet

*Note: with the removal of conversion functions, this is no longer needed.

3.) Database creation functions now rely on two separate packages:

  • PaDELPy (https://github.com/ECRL/PaDELPy) - QSPR descriptor generation using PaDEL-Descriptor
  • alvaDescPy (https://github.com/ECRL/alvaDescPy) - QSPR descriptor generation using alvaDesc

*Note: it made sense to create separate packages for interfacing with this software, since a Python interface for generating QSPR descriptors is generally quite handy.

4.) ecnet.tools.database.create_db's arguments have been changed:

```python
ecnet.tools.database.create_db(['CC', 'CCC'], 'my_database.csv', targets=[13, 47])
```

Construct using alvaDesc:

```python
ecnet.tools.database.create_db(['CC', 'CCC'], 'my_database.csv', targets=[13, 47], backend='alvadesc')
```

*Note: supplying SMILES strings and targets using lists makes more sense than requiring the user to create a separate file - this change allows the user to choose where the data comes from.

5.) ecnet.tools.project.predict's arguments have been changed:

```python
results = ecnet.tools.project.predict(['CC', 'CCC'], 'my_project.prj')
print(results)
# Output: [[13], [47]]
```

*Note: as with database creation, supplying inputs as lists makes more sense.

6.) ecnet.Server has been rearranged a bit:

  • project training has been moved to a separate function at ecnet.tasks.training.train_project
  • various functions have been moved to ecnet.utils.server_utils:
    • creating a project folder structure
    • saving a project as a .prj file
    • opening a .prj file to use
  • task-specific logging messages have been moved to their respective functions in ecnet.tasks

*Note: ecnet.Server needed to be shrunk down, and functions that were obviously utilities were moved into utility files. This should also provide more direct access to the "back-end" of ECNet (bypassing Server usage), allowing greater variation in experimental procedure.

7.) Added a suite of unit tests implemented with the unittest library:

  • in addition to Server unit tests, individual utilities of ECNet are tested
  • added a Python script, /tests/test_all.py, to automatically run all unit tests and report a summary of successes/failures

*Note: it's time for "proper" unit testing, and that means implementing a unit testing package. I'm looking forward to expanding ECNet's tests and introducing more automation into the testing process.

8.) Installation now forces TensorFlow 1.13.1 to be installed

*Note: I've encountered pip install tensorflow installing the 2.0.0 beta, which ECNet does not currently support - we'll make the change when we're ready (and so is Keras)

9.) Changed/added a variety of databases to the /databases/ directory

  • All databases constructed using alvaDesc
  • All SMILES strings have been validated with respect to compound name
    • PubChemPy (https://github.com/mcs07/PubChemPy) is a lifesaver
    • Compounds not found on PubChem were validated in-house by an ECRL research assistant

*Note: in order to ensure accurate QSPR-descriptor to experimental value correlation, accurate SMILES strings are necessary (assuming descriptors are being generated using them).

Published by tjkessler about 7 years ago

ECNet - Type checking, improved unit testing

  • All methods/functions now enforce specific types for arguments, return values
  • calc_r2 function now uses scikit-learn's r2_score function
  • Changed unit testing scheme, now uses unittest library
    • added a suite of unit tests
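
For readers unfamiliar with the metric mentioned above, the coefficient of determination that scikit-learn's r2_score reports can be written out by hand. The sketch below is a plain-Python illustration of the formula only. It is not ECNet's function, and it does not handle the edge case of constant true values, which scikit-learn treats specially.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
baseline = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A perfect prediction scores 1.0, while always predicting the mean of the true values scores 0.0.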

Published by tjkessler about 7 years ago

ECNet - Addition to conversion tools, update to database creation function

  • Addition of the "smiles_to_descriptors" function
  • Database creation functions now use the "smiles_to_descriptors" function, bypassing the use of OpenBabel (used for SMILES -> MDL -> descriptors)
  • Updated relevant documentation

Published by tjkessler about 7 years ago

ECNet - Updates to DataFrame, DataPoint classes and their functionality

  • STRING and GROUP attributes for DataPoints (rows in an ECNet-formatted database) can now be accessed as object attributes. For example:

```python
from ecnet.utils.data_utils import DataFrame

df = DataFrame('my_database.csv')
first_entry = df.data_points[0]

# SMILES is a STRING column in the supplied database
print(first_entry.SMILES)
# Output: C

# STRING names containing spaces are obtained with getattr
print(getattr(first_entry, 'Compound Name'))
# Output: Methane
```

  • Additional STRING columns can be supplied when creating an ECNet-formatted database
  • Fixed issue where YAML package was throwing a loader warning
  • Suppressed TensorFlow warnings about deprecation
  • Updates to documentation
  • Other minor changes
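
The attribute-style access to STRING and GROUP columns described at the top of this list can be mimicked with `setattr`. The class below is a hypothetical stand-in written for illustration, not ECNet's DataPoint, and the row contents are example values.

```python
class DataPoint:
    """Toy row object exposing each column as an attribute."""

    def __init__(self, row):
        for column, value in row.items():
            # setattr accepts any string, including names with spaces.
            setattr(self, column, value)


point = DataPoint({'SMILES': 'C', 'Compound Name': 'Methane'})

smiles = point.SMILES                    # valid identifier: dot access works
name = getattr(point, 'Compound Name')   # contains a space: needs getattr
```

Column names that are not valid Python identifiers are still stored on the object, but they can only be read back through `getattr`.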

Published by tjkessler about 7 years ago

ECNet - Bug fixes, database creation improvements

  • Added "setspawnmethod", fixes multiprocessing on Unix systems
  • Databases can now be constructed with fingerprints instead of descriptors
  • "get_smiles" function now returns an empty string if the molecule is not found on PubChem
  • Slight updates to logging
  • Hyperparameter tuning bug fix
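
The notes above do not say how the spawn-method fix is implemented. One plausible approach, shown here purely as an assumption, is to select the standard library's 'spawn' start method before any worker processes are created. The helper name `ensure_spawn` is hypothetical.

```python
import multiprocessing


def ensure_spawn():
    """Select the 'spawn' start method if it is not already active."""
    current = multiprocessing.get_start_method(allow_none=True)
    if current != 'spawn':
        # force=True allows changing a start method that was already set.
        multiprocessing.set_start_method('spawn', force=True)
    return multiprocessing.get_start_method()


method = ensure_spawn()
```

With 'spawn', each worker starts from a fresh interpreter rather than a forked copy of the parent, which avoids inheriting parent state that is not safe to copy with fork.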

Published by tjkessler about 7 years ago

ECNet - 3.0.0 Release

  • Server object refactor
    • Includes API changes
  • Update to ML back end (raw TensorFlow -> Keras)
  • Logging moved to separate module
  • Input descriptor limiting now uses random forest regression, via ditto-lib 1.0.0
  • Implemented ReadTheDocs page
  • Added classes for parity plot generation
  • Updated hyperparameter tuning for ECabc 2.2.2 release
  • Implemented methods for removing outliers, via ditto-lib 1.0.0

Published by tjkessler about 7 years ago

ECNet - Bug fixes, GA improvements, data sorting options, optimizations

  • Updated parameter limiting with GA, per 0.6.0 PyGenetics update
  • Fixed bug with MultilayerPerceptron returning "NaN" values
  • Changed default parameter bounds for ABC tuning
  • Added "sortstring" argument for Server.importdata
  • Added "Getting Started" tutorial for new users

Published by tjkessler over 7 years ago

ECNet - Tool integrations, database additions

  • Integrated various tools:
    • Database creation tool (wrappers for Open Babel, PaDEL-Descriptor)
    • Using project tool (supply text w/ molecules, ECNet .prj file)
    • Get SMILES from molecule name (PubChemPy)
    • Convert SMILES to MDL/SDF
    • Convert MDL/SDF to QSPR descriptors
  • Added unit tests for database creation tool, using project tool
  • Removed command line tools (integrated, above)
  • Added various databases:
    • Cloud point
    • Pour point
    • Yield sooting index

Published by tjkessler over 7 years ago

ECNet - Bug fixes, tool additions

  • Fixed bug where Server.train() did not obtain shuffled sets from its DataFrame if the user is shuffling sets for each candidate neural network
  • Server._sets is now public instead of private
  • Removed unnecessary import in ecnet\model.py
  • Added a suite of tools for creating an ECNet-formatted database:
    • tools\createecnetdb.py (aggregates SMILES, QSPR descriptors for database creation)
    • tools\nametosmiles.py (uses PubChemPy to obtain SMILES strings for supplied molecule names)
    • tools\smilestoqspr.py (generates QSPR descriptors from SMILES strings using Open Babel and PaDEL-Descriptor)
    • added version of PaDEL-Descriptor
    • added how-to-use documentation (tools\README.md)

Published by tjkessler over 7 years ago

ECNet - Multiprocessing and More

  • Rework of Server method arguments, functionality
    • This includes an API change (check your scripts!)
  • Rework of Server training/selection algorithms
  • Improvements to logging
  • Rework of model saving/loading
  • When input dimensionality is reduced, selected inputs are applied to current data immediately
  • Updates to hyperparameter tuning
    • Can now shuffle data sets for each bee
    • Supports ECabc 2.1.0

Published by tjkessler over 7 years ago

ECNet - Logging, DocStrings and Documentation

  • Added support for logging with the ColorLogging package
  • Updated docstrings, uses Google's documentation format
  • Modules now only import functions/classes they use instead of whole packages
  • Added unit testing for Server methods
  • Updated genetic limiting per PyGenetics 0.5.0 update
  • Made project variables accessible
  • Updated documentation
    • Moved Server method documentation to ecnet/README.md
    • Updated README.md to include more code block examples
    • Updated terminology, project scope, clarity

Published by tjkessler over 7 years ago

ECNet - Multiprocessing Support for Input Parameter Dimensionality Reduction

  • Added support for PyGenetics multiprocessing for limiting input parameter dimensionality

Published by tjkessler almost 8 years ago

ECNet - Bug fixes for saving/opening projects from other directories

  • Full paths for project_file supplied to Server will now be utilized
  • Model configuration file will not be extracted to the current working directory

Published by tjkessler almost 8 years ago

ECNet - Bug fixes for input parameter limiting

  • Fixed issue with data shuffling for genetic algorithm population members
  • Fixed case where feedback printing variable was not defined when calling limiting function

Published by tjkessler almost 8 years ago

ECNet - 1.5 update - pep8 styling, rework of server, model classes/methods

  • Changes to Server methods:
    • openproject -> _open_project, now called with optional argument on Server initialization
    • Updated default arguments for createproject, importdata, trainmodel, selectbest
    • Configuration .yml file now only contains model architecture/learning variables
      • Variable names renamed for clarity
    • tune_hyperparameters now scales to predefined number of hidden layers
  • Changes to ecnet/model.py methods:
    • Activation functions are now stored in ACTIVATION_FUNCTIONS, a dictionary of callable functions corresponding to supplied activation functions
    • Layer objects house activation functions corresponding to ACTIVATION_FUNCTIONS
    • _feedforward uses the activation function stored in Layer for computation
  • Updates to documentation/examples reflecting above changes
  • Code style == pep8
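
The ACTIVATION_FUNCTIONS arrangement described above (a dictionary of callables, with each Layer holding the function it was configured with) can be sketched in plain Python. The scalar functions, the `Layer` constructor arguments, and `feed_forward` below are illustrative assumptions; the original model code operates on tensors rather than Python floats.

```python
import math

# Map activation names to callables, so a config string selects a function.
ACTIVATION_FUNCTIONS = {
    'relu': lambda x: max(0.0, x),
    'sigmoid': lambda x: 1.0 / (1.0 + math.exp(-x)),
    'linear': lambda x: x,
}


class Layer:
    """Holds a layer size and the callable chosen from the dictionary."""

    def __init__(self, size, activation):
        self.size = size
        self.activation = ACTIVATION_FUNCTIONS[activation]


def feed_forward(layer, pre_activations):
    """Apply the layer's stored activation to each pre-activation value."""
    return [layer.activation(value) for value in pre_activations]


hidden = Layer(3, 'relu')
outputs = feed_forward(hidden, [-1.0, 0.0, 2.5])
```

Looking the function up once at construction time keeps the forward pass free of string comparisons, and an unknown name fails early with a KeyError.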

Published by tjkessler almost 8 years ago

ECNet - Fixed project opening (sets project_name before unzipping)

Published by tjkessler almost 8 years ago

ECNet - Update to project saving/using

Upon saving a project, the current Server configuration is now included in the .project file; when opening the .project file for use, the configuration found in the .project file is imported as the Server's configuration.
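
A minimal sketch of the round trip described above, assuming the project file is a zip archive that carries the configuration alongside the models. The archive member name `config.json`, the use of JSON rather than YAML, and both function names are assumptions made so the example runs with the standard library alone.

```python
import json
import os
import tempfile
import zipfile


def save_project(path, config):
    """Write the configuration into the project archive."""
    with zipfile.ZipFile(path, 'w') as archive:
        archive.writestr('config.json', json.dumps(config))


def open_project(path):
    """Read the configuration back out of the project archive."""
    with zipfile.ZipFile(path) as archive:
        return json.loads(archive.read('config.json'))


original = {'learning_rate': 0.1, 'hidden_layers': [5, 5]}
with tempfile.TemporaryDirectory() as folder:
    project_path = os.path.join(folder, 'example.project')
    save_project(project_path, original)
    restored = open_project(project_path)
```

Because the configuration travels inside the archive, whoever opens the project gets the settings it was trained with, not whatever happens to be configured locally.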

Published by tjkessler almost 8 years ago

ECNet - Update to docstrings and code styling

  • General cleanup for easy-reading and documentation

Published by tjkessler almost 8 years ago

ECNet - Shuffling for Input Parameter Dimensionality Reduction

  • Added option to shuffle data sets for each population member when limiting input parameter dimensionality using genetic algorithm
  • Removed shuffling functionality for iterative inclusion dimensionality reduction; caused indexing errors

Published by tjkessler almost 8 years ago

ECNet - Bug fixes for 1.4.1

Moved DataPoint class out of DataFrame class in ecnet/data_utils.py; having this inside caused errors in saving DataPoint objects using the save_project() Server method

Published by tjkessler almost 8 years ago

ECNet - Bug fixes and exception handling for 1.4.0

  • Removed extraneous lines in error_utils.py
  • Added exceptions for empty sets in model.py

Published by tjkessler about 8 years ago

ECNet - Overhaul of ECNet source files

  • Update to server.py syntax (some method names have been changed, compare your scripts to the new examples!)
  • Overhauled data_utils.py
  • Reformatted model.py
  • Inclusion of genetic algorithm for input feature dimensionality reduction
  • Artificial bee colony functionality for hyperparameter tuning now relies on ECabc Python package, it is no longer integrated into ECNet
  • Update to database format (your old databases will need to be edited to remain compatible!)

Published by tjkessler about 8 years ago

ECNet - Integration of Hyperparameter Tuning

Neural network training hyperparameters can now be tuned using an artificial bee colony optimization algorithm. For more on artificial bee colonies: https://en.wikipedia.org/wiki/Artificial_bee_colony_algorithm
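
To give a feel for the technique, here is a compact sketch of a basic artificial bee colony minimizer. It is a textbook-style illustration, not the code ECNet or ECabc uses: the function name, its parameters, and the toy cost function are all assumptions. In hyperparameter tuning, the cost would be a model's validation error for a given set of hyperparameter values.

```python
import random


def abc_minimize(cost, bounds, n_sources=20, n_cycles=100, limit=20, seed=0):
    """Minimize cost(x) within bounds using a basic artificial bee colony."""
    rng = random.Random(seed)
    n_dims = len(bounds)

    def random_source():
        return [rng.uniform(low, high) for low, high in bounds]

    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources  # cycles since each source last improved

    def try_neighbor(i):
        # Shift one coordinate of source i relative to another random source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(n_dims)
        candidate = sources[i][:]
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        low, high = bounds[d]
        candidate[d] = min(max(candidate[d], low), high)
        candidate_cost = cost(candidate)
        if candidate_cost < costs[i]:  # greedy selection
            sources[i], costs[i], trials[i] = candidate, candidate_cost, 0
        else:
            trials[i] += 1

    def fitness(c):
        return 1.0 / (1.0 + c) if c >= 0 else 1.0 + abs(c)

    best_index = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = sources[best_index][:], costs[best_index]

    for _ in range(n_cycles):
        # Employed bees: one neighbor search per food source.
        for i in range(n_sources):
            try_neighbor(i)
        # Onlooker bees: favor sources in proportion to their fitness.
        weights = [fitness(c) for c in costs]
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_neighbor(i)
        # Memorize the best source before any are abandoned.
        best_index = min(range(n_sources), key=costs.__getitem__)
        if costs[best_index] < best_cost:
            best, best_cost = sources[best_index][:], costs[best_index]
        # Scout bees: replace sources that have stopped improving.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = cost(sources[i])
                trials[i] = 0

    return best, best_cost


def sphere(x):
    """Toy cost function with its minimum of 0 at the origin."""
    return sum(value * value for value in x)


best, best_cost = abc_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

The three phases balance refinement of known good solutions (employed and onlooker bees) against exploration of new regions (scouts).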

Published by tjkessler about 8 years ago

ECNet - Limit parameter bug fix

Fixed issues related to limiting the input parameter dimensionality. Files edited:

  • ecnet/limit_parameters.py
  • ecnet/server.py
  • setup.py

Published by tjkessler over 8 years ago