Recent Releases of chemprop

chemprop - v2.2.1

Cuik-molmaker

Chemprop can now use cuik-molmaker (https://github.com/NVIDIA-Digital-Bio/cuik-molmaker), a C++/Python package that accelerates atom and bond featurization. Using cuik-molmaker accelerates Chemprop training by 1.6X and inference by 2.4X. In addition, memory usage is reduced by ~80%, which enables larger-scale training and inference workloads. It can be used as a drop-in replacement for the featurization classes implemented in Chemprop. Use chemprop/scripts/check_and_install_cuik_molmaker.py to install the correct version for your environment. Then specify --use-cuikmolmaker-featurization if using the command line. If you are using Chemprop in a Python script, import data.LazyMoleculeDatapoint, data.CuikmolmakerDataset, and featurizers.CuikmolmakerMolGraphFeaturizer.

Other notable changes

We continue to make the command line interface easier to use. The train command, chemprop train, no longer requires a test set. Additionally, datapoint descriptors (e.g., temperature, pressure) can now be included in the main input file using a command similar to chemprop train --data-path input.csv --descriptors-columns temperature pressure.
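As a rough illustration of the single-file layout described above, the sketch below builds a CSV that holds the descriptor columns next to the SMILES and target, then assembles the command quoted in the notes. The column names `smiles` and `target` and the row values are placeholders chosen for this example; only the flags `--data-path` and `--descriptors-columns` come from the release notes.

```python
import csv
import io
import shlex

# Assumed column names; only "temperature" and "pressure" appear in the notes.
COLUMNS = ["smiles", "target", "temperature", "pressure"]


def make_input_csv(rows):
    """Serialize rows (dicts keyed by COLUMNS) into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_train_command(data_path, descriptor_columns):
    """Assemble the argument list for the command quoted in the notes."""
    return ["chemprop", "train", "--data-path", data_path,
            "--descriptors-columns", *descriptor_columns]


# Placeholder rows, not real measurements.
rows = [
    {"smiles": "CCO", "target": 0.0, "temperature": 298.15, "pressure": 1.0},
    {"smiles": "CCC", "target": 1.0, "temperature": 310.0, "pressure": 2.0},
]
csv_text = make_input_csv(rows)
command = build_train_command("input.csv", ["temperature", "pressure"])
print(shlex.join(command))
```

With Chemprop installed, the `command` list could be handed to `subprocess.run`; any other flags your dataset needs are left out of this sketch.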

There are also several bug fixes in this release. See the detailed PR list below.

What's Changed

Zero-indexing refactoring of splits.json used in both tutorial & test by @jxl26 in https://github.com/chemprop/chemprop/pull/1244
Remove unneeded folders from python distribution by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1240
Standardizing the name Chemprop by @akshatzalte in https://github.com/chemprop/chemprop/pull/1259
Update suggested CheMeleon citation by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1267
Enable cuik-molmaker for accelerated molecule featurization by @sveccham in https://github.com/chemprop/chemprop/pull/1253
Fix randomly failing overfit test by @jxl26 in https://github.com/chemprop/chemprop/pull/1271
Bug Fix: Add handling for --from-foundation argument in hpopt by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1273
Add support for empty test set by @jxl26 in https://github.com/chemprop/chemprop/pull/1243
Ensure models loaded for transfer learning are on CPU by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1252
Fixes multiple quantile regression bugs by @craabreu in https://github.com/chemprop/chemprop/pull/1229
Support extra feature columns in input datasets by @jxl26 in https://github.com/chemprop/chemprop/pull/1250
Fix bug for molecule + reaction dataset with --molecule-featurizers (introduced in #1253) by @KnathanM in https://github.com/chemprop/chemprop/pull/1274
Make hpopt use --tracking-metric by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1281
Clean up code and fix bugs for cuik-molmaker (in #1253) by @KnathanM in https://github.com/chemprop/chemprop/pull/1275

New Contributors

@jxl26 made their first contribution in https://github.com/chemprop/chemprop/pull/1244
@sveccham made their first contribution in https://github.com/chemprop/chemprop/pull/1253

Full Changelog: https://github.com/chemprop/chemprop/compare/v2.2.0...v2.2.1

- Python
Published by KnathanM 11 months ago

chemprop - v2.2.0 Atom and bond property prediction + Foundation Models

With this release, we finish our reimplementation of chemprop v1 to be modern and maintainable. The last major feature from chemprop v1 that we plan to port to v2 is support for atom and bond property prediction. This was accomplished in #1136. Documentation for using this feature from the CLI is available here while examples for using it from a python script are available here and here. As a reminder for CLI users coming from v1, we have a helpful transition guide here. Also note that we do not support converting v1 models for atom and bond targets to v2 models as the model architecture has been simplified in v2. Now if there are multiple atom targets (or similarly multiple bond targets) a single feed forward network (FFN) with multiple outputs is used to make predictions for all those targets. A separate FFN is used for each of molecule, atom, and bond targets, as well as for each of atom prediction constraints and bond prediction constraints if those are used (see the example notebooks).

A notable new feature is the ability to use pretrained message passing layers with new predictor heads, added in #1226, with the CLI flag --from-foundation. This makes it possible to train large foundation-style chemprop models on many basic chemistry tasks and then use the message passing layer weights to initialize a new model for training on other, smaller datasets. An example of such a model, CheMeleon, is shown here.

CLI changes

The hyperparameter search space has been updated to include 6 message passing steps as an option in #1230. This option was included in chemprop v1 and was accidentally excluded during our reimplementation in v2.
The "scaled exponential linear unit" (SELU) activation function is removed from the hyperparameter search space in #1146 because it is normally used in self normalizing models, which chemprop does not support. In the same PR, all other torch activation modules are made available as an option via the CLI. In python scripts, customized activation functions may also now be used.
Stereochemical information (R/S and cis/trans) is included in the default featurization. If a model is trained on molecules that do not include this stereochemical info, some of the model weights will not be updated. This could cause erroneous predictions at inference time if molecules are used with stereochemical info. To remedy this, we have added a --ignore-stereo flag and corresponding function argument chemprop.utils.utils.make_mol(smi, ignore_stereo = True) that tells chemprop to ignore any stereochemical info in the input molecule. See #1196 and #1216.

Bug fixes

Previously a dataset could not be missing values if the values were bounded. This is fixed in #1203. Thank you to @lewismervin1 for the bug report and fix.
Also fixed the output shape of dropout uncertainty predictions in #1205 thanks to a bug report by @lewismervin1.
If the Matthews correlation coefficient (MCC) is used as a metric, a higher value is better, but the checkpointing callback was told that lower is better. This is fixed by #1218.
The paths of extra atom/bond features/descriptors would not save properly in a config file. This is fixed in #1189 and #1190.
Scalers from a pretrained GPU-trained model would not load correctly on a CPU-only machine. This is fixed in #1231. Thank you to @jonwzheng for the bug report.
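The MCC checkpointing fix listed above comes down to the direction of the comparison. The sketch below computes MCC from confusion-matrix counts with the standard formula and shows how the selected epoch flips when the metric is wrongly treated as lower-is-better. The per-epoch counts and the helper `best_epoch` are invented for illustration and are not Chemprop's checkpointing code.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def best_epoch(scores, higher_is_better):
    """Index of the epoch a checkpointing rule would keep."""
    chooser = max if higher_is_better else min
    return chooser(range(len(scores)), key=scores.__getitem__)


# Hypothetical validation counts per epoch: (tp, tn, fp, fn).
epochs = [(30, 30, 20, 20), (40, 40, 10, 10), (45, 45, 5, 5)]
scores = [matthews_corrcoef(*counts) for counts in epochs]

correct_choice = best_epoch(scores, higher_is_better=True)
buggy_choice = best_epoch(scores, higher_is_better=False)
print(scores, correct_choice, buggy_choice)
```

In this toy run the lower-is-better rule keeps the first, worst epoch, which is the kind of silent mis-selection the fix prevents.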

Finally, not exactly related to this version of chemprop, but we have also added lists of external dependency versions that are known to work with previous versions of chemprop in #1225. This is useful if you want to use a specific version of chemprop but are unsure if it is compatible with the most up to date versions of external dependencies. For example, torch v2.6 is incompatible with earlier versions of chemprop because torch.load now uses weights_only=True by default.

What's Changed

Allow Numpy 2+ by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1193
Adds CLI option to ignore chirality in SMILES by @craabreu in https://github.com/chemprop/chemprop/pull/1196
Make sure bounded targets dataframe is all strings by @KnathanM in https://github.com/chemprop/chemprop/pull/1203
Includes Chemprop logo for dark mode by @craabreu in https://github.com/chemprop/chemprop/pull/1198
Fix dropout uncertainty output shape by @KnathanM in https://github.com/chemprop/chemprop/pull/1205
(Data point) weights should be shape b or b x 1 by @KnathanM in https://github.com/chemprop/chemprop/pull/1210
Add warning to use v1 featurizer to converting script by @KnathanM in https://github.com/chemprop/chemprop/pull/1180
MCC metrics higher is better by @KnathanM in https://github.com/chemprop/chemprop/pull/1218
Expand chirality ignore to ignore bond stereochemistry by @KnathanM in https://github.com/chemprop/chemprop/pull/1216
Make converting 2.0 to 2.1 models easier and more obvious by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1191
Add Support for Fine-Tuning Foundation Models with --from-foundation and add CheMeleon by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1226
Add Known Working External Dependency Lists by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1225
Make features descriptors path config file-able by @KnathanM in https://github.com/chemprop/chemprop/pull/1189
Ensure device="cpu" when loading Scalers by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1231
Increases customizability of activation functions by @craabreu in https://github.com/chemprop/chemprop/pull/1146
Allow importing uncertainty as a subpackage by @craabreu in https://github.com/chemprop/chemprop/pull/1237
Fix hpopt write config by @KnathanM in https://github.com/chemprop/chemprop/pull/1190
v2.2: Atom and bond property predictions by @KnathanM in https://github.com/chemprop/chemprop/pull/1136
undo unintended v1 -> v2 hpopt search space changes by @KnathanM in https://github.com/chemprop/chemprop/pull/1230

Full Changelog: https://github.com/chemprop/chemprop/compare/v2.1.2...v2.2.0

- Python
Published by KnathanM about 1 year ago

chemprop - v2.1.2

What's Changed

Important changes

CLI implementation of RIGR as an option in --multi-hot-atom-featurizer-mode by @akshatzalte in https://github.com/chemprop/chemprop/pull/1172

A new featurization scheme, RIGR (Resonance Invariant Graph Representation), is now available. To access it via the CLI, use --multi-hot-atom-featurizer-mode rigr. This featurizer uses only resonance invariant features so it treats all resonance structures of a molecule identically. It uses a subset of the atom and bond features from the default v2 featurizer. With 60% fewer features, RIGR has shown comparable or superior performance across a variety of property prediction tasks in a forthcoming manuscript. An example Jupyter notebook is also provided.

Other changes

Apply task_weights to default loss function in CLI by @craabreu in https://github.com/chemprop/chemprop/pull/1170
Check if dropout prop needs to be restored by @KnathanM in https://github.com/chemprop/chemprop/pull/1178
Message Passing Error Message Fix by @twinbrian in https://github.com/chemprop/chemprop/pull/1161
Fix metrics problems - Cuda-> CPU, no _defaults by @KnathanM in https://github.com/chemprop/chemprop/pull/1179
Update convert script for v1.4 by @KnathanM in https://github.com/chemprop/chemprop/pull/1176

New Contributors

@craabreu made his first contribution in https://github.com/chemprop/chemprop/pull/1170

Full Changelog: https://github.com/chemprop/chemprop/compare/v2.1.1...v2.1.2

- Python
Published by akshatzalte over 1 year ago

chemprop - v2.1.1

Notable changes

In #1090, we started the process of integrating logging into the core code. This will make it easier for users to control what information Chemprop prints to output. It will also make it easier for developers to include more information outputs for potential debugging.

Scipy 1.15 subtly changed how logit works, which caused some of our tests to fail (as the values reported were slightly different than before). The expected test values have been updated. #1142

A new example notebook has been added which demonstrates how to adapt Chemprop to work with Shapley value analysis. This is another method to lend some interpretability to Chemprop models by highlighting which atom/bond features are most impactful to the final prediction value. #938

We continue to try to make chemprop easy to use. In #1091 and #1124 we added better warnings and error messages. And in #1151 we made it easy to open the example notebooks in Google Colab. This allows people reading the docs to immediately jump in and try chemprop without needing to set up a python environment.

Bug Fixes

In #1097, we fixed a bug where the transforms for scaling extra features/descriptors were turned off during validation. This caused models trained with these extra inputs to not report accurate metrics during training, which is a problem if the "best" model is selected instead of the last model as is done in hyperparameter optimization. Training a model and using the last model was unaffected as was doing inference.

#1084 fixed a bug where R2Score did not have the attribute `task_weights`. This attribute is not used but is needed for compatibility with other metrics.

In v2.1 we transitioned to using torchmetrics for our metrics and loss functions, in part because it takes care of training across multiple nodes (DDP) automatically. Our custom metric for the Matthews correlation coefficient, however, was not set up the way torchmetrics expected. This was fixed in #1131.

What's Changed

splits file is json by @KnathanM in https://github.com/chemprop/chemprop/pull/1083
add more helpful warnings about the splitting api change by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1091
Fix: Splits file can have multiple splitting schemes by @KnathanM in https://github.com/chemprop/chemprop/pull/1086
Set all transforms to train during validation by @KnathanM in https://github.com/chemprop/chemprop/pull/1097
updated warning to logger by @twinbrian in https://github.com/chemprop/chemprop/pull/1090
Add task weights to r2score by @KnathanM in https://github.com/chemprop/chemprop/pull/1084
Fix tracking_metric overwrite issue by @shihchengli in https://github.com/chemprop/chemprop/pull/1105
Fix save_individual_predictions with ensembling by @shihchengli in https://github.com/chemprop/chemprop/pull/1110
Add a helpful warning when invalid SMILES are passed by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1124
Fix batch size calculation for multicomponent by @KnathanM in https://github.com/chemprop/chemprop/pull/1098
Not use transform_variance for unscaled targets by @shihchengli in https://github.com/chemprop/chemprop/pull/1108
Add output size to attentive hparams by @KnathanM in https://github.com/chemprop/chemprop/pull/1133
Fix test failure due to scipy logit by @KnathanM in https://github.com/chemprop/chemprop/pull/1142
fix docs about extra atom descriptors by @KnathanM in https://github.com/chemprop/chemprop/pull/1139
Fix MCC for DDP and multitask by @KnathanM in https://github.com/chemprop/chemprop/pull/1131
V2: Add Shapley Value notebook for interpretability by @oscarwumit in https://github.com/chemprop/chemprop/pull/938
add notebooks to colab and docs by @KnathanM in https://github.com/chemprop/chemprop/pull/1151

Full Changelog: https://github.com/chemprop/chemprop/compare/v2.1.0...v2.1.1

- Python
Published by KnathanM over 1 year ago

chemprop - v2.1.0

The v2.1 release adds the uncertainty quantification modules, including estimation, calibration, and evaluation (#937). For more details on uncertainty quantification in Chemprop, please refer to the documentation and the example notebook. Additionally, we switched the loss functions and metrics to torchmetrics (#1022). With this change we also changed the "val_loss" reported to be calculated the same as the training loss to make them comparable (#1020). We also changed Chemprop to use replicates instead of cross validation (#994) and batch normalization is now disabled by default (#1058).

Core code changes

The validation_loss_function is removed in #1023.
The batch norm is disabled by default in #1058
A new predictor, QuantileFFN, is added in #963
BinaryDirichletLoss and MulticlassDirichletLoss are integrated into DirichletLoss in #1066
The split types CV and CV_NO_VAL are removed in #994
A model's list of metrics is now registered as child modules in #1020

CLI changes

Disable batch norm by default, and it can be turned on by --batch-norm #1058
Many CLI flags related to uncertainty quantification are added #1010
Quantile regression is now supported via -t regression-quantile #963
The cross validation (CV) is replaced with replicates. The number of replicates can be specified via --num-replicates and the flag --num-folds is deprecated #994
--tracking-metric is added which is the metric to track for early stopping and checkpointing #1020
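For readers new to the quantile regression mentioned in the list above, the sketch below shows the standard pinball (quantile) loss in plain Python. It is a textbook formulation; these notes do not say whether QuantileFFN uses exactly this form, so treat it as background rather than as Chemprop's implementation.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball loss for a single quantile level in (0, 1)."""
    total = 0.0
    for target, pred in zip(y_true, y_pred):
        error = target - pred
        # Under-prediction is weighted by `quantile`,
        # over-prediction by `1 - quantile`.
        total += max(quantile * error, (quantile - 1.0) * error)
    return total / len(y_true)


targets = [1.0, 2.0, 3.0]
under_predictions = [0.0, 0.0, 0.0]

high_quantile_loss = pinball_loss(targets, under_predictions, 0.9)
low_quantile_loss = pinball_loss(targets, under_predictions, 0.1)
print(high_quantile_loss, low_quantile_loss)
```

The same under-prediction costs far more at the 0.9 level than at the 0.1 level, which is what pushes a model's output toward the requested quantile.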

New notebooks

A notebook showing interoperability of Chemprop featurizer w/ other libraries (DGL and PyG) #1063
Active learning #910
Uncertainty quantification #1071

CI/CD

Ray can be tested on Python 3.12 #1064
USE_LIBUV: 0 is added into the CI workflow #1065

Backwards Compatibility Note

Models trained with v2.0 will not load properly in v2.1 due to the loss functions file being moved. A conversion script is provided to convert a v2.0 model to one compatible with v2.1. Its usage is python chemprop/utils/v2_0_to_v2_1.py <v2_0.pt> <v2_1.pt>

data.make_split_indices now always returns a nested list. Previously it would only return a nested list for cross validation. We encourage you to use data.make_split_indices(num_replicates=X) where X is some number greater than 1, to train on multiple splits of your data to get a better idea of the performance of your architecture. If you do use only one replicate, you will need to unnest the list like so:

    train_indices, val_indices, test_indices = data.make_split_indices(mols)
    train_data, val_data, test_data = data.split_data_by_indices(
        all_data, train_indices, val_indices, test_indices
    )
    train_data, val_data, test_data = train_data[0], val_data[0], test_data[0]
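To make the nested return shape concrete without requiring Chemprop, the sketch below uses a hypothetical stand-in, `make_replicate_splits`, that returns one list of indices per replicate. The function name, the 80/10/10 sizes, and the seeding are assumptions for illustration; only the nesting and the `[0]` unnesting mirror the note above.

```python
import random


def make_replicate_splits(n_items, num_replicates=1, sizes=(0.8, 0.1, 0.1), seed=0):
    """Hypothetical stand-in: returns (train, val, test), each a list with
    one list of indices per replicate."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    n_train = round(sizes[0] * n_items)
    n_val = round(sizes[1] * n_items)
    for _ in range(num_replicates):
        indices = list(range(n_items))
        rng.shuffle(indices)
        train.append(indices[:n_train])
        val.append(indices[n_train:n_train + n_val])
        test.append(indices[n_train + n_val:])
    return train, val, test


# Several replicates: iterate over the outer lists.
train_reps, val_reps, test_reps = make_replicate_splits(10, num_replicates=3)

# A single replicate still comes back nested, so take element 0.
train_one, val_one, test_one = make_replicate_splits(10)
train_idx, val_idx, test_idx = train_one[0], val_one[0], test_one[0]
print(len(train_reps), len(train_idx), len(val_idx), len(test_idx))
```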

What's Changed

change installed torch version on windows actions again by @shihchengli in https://github.com/chemprop/chemprop/pull/1062
.pt instead of .ckpt by @twinbrian in https://github.com/chemprop/chemprop/pull/1060
add ModelCheckpointing to training.ipynb so best model is used automatically by @donerancl in https://github.com/chemprop/chemprop/pull/1059
Add ray to tests on python 3.12 by @KnathanM in https://github.com/chemprop/chemprop/pull/1064
v2.1 Feature: Replicates Instead of Cross Validation Folds by @JacksonBurns in https://github.com/chemprop/chemprop/pull/994
disable libuv with env var rather than avoiding latest torch by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1065
Add new example notebook for active learning by @joelnkn in https://github.com/chemprop/chemprop/pull/910
Fix: splits column is a string not a list by @KnathanM in https://github.com/chemprop/chemprop/pull/1074
Update chemprop to v2.1 in https://github.com/chemprop/chemprop/pull/1038
- This PR included the following PRs:
- Rerun notebooks for v2.1 by @KnathanM in #1067
- Refactor with torchmetrics by @KnathanM in #1022
- update train docs for v2.1 by @KnathanM in #1069
- Disable batch norm by default by @jonwzheng in #1058
- Add notebook showing interoperability of Chemprop featurizer w/other libraries by @jonwzheng in #1063
- Add tracking metric options; make metrics ModuleList; other improvements by @KnathanM in #1020
- Remove old validate-loss-function function by @KnathanM in #1023
- V2: Uncertainty implementation in #937
  - This PR included the following PRs:
  - Improve the docstring for uncertainty modules by @shihchengli in #986
  - Add Platt calibrator by @KnathanM in #961
  - Add dropout and ensemble predictors by @joelnkn in #970
  - Add NLL and Spearman Uncertainty Evaluators by @am2145 in #984
  - Add quantile regression by @shihchengli in #963
  - Add miscalibration area and ence evaluators by @shihchengli in #1012
  - Add isotonic calibrators by @KnathanM in #1053
  - V2 conformal calibrators by @shihchengli in #989
  - V2 conformal evaluators by @shihchengli in #1005
  - Uncertainty regression calibrators (non-conformal) by @shihchengli in #1055
  - Adding Evidential, MVE, and Binary Dirichlet Uncertainty Predictors by @akshatzalte in #1061
  - Cleanup the uncertainty modules by @shihchengli in #1072
  - Multiclass dirichlet give uncertainty by @KnathanM in #1066
  - Rename uncertainty estimator by @KnathanM in #1070
  - Update uncertainty notebook by @shihchengli in #1071
  - Add uncertainty quantification to the predict CLI by @shihchengli in #1010

Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.5...v2.1.0

- Python
Published by shihchengli over 1 year ago

chemprop - v2.0.5

We continue to enhance and improve the functionality and usability of Chemprop. If there are things you'd like to see addressed in a future update, please open an issue or PR.

Core code changes

We discovered that our Noam learning rate scheduler does not match what was originally proposed. The current scheduler does work well though, so it was decided to not change the definition. Instead the scheduler was renamed and refactored to be more clear. By @shihchengli in https://github.com/chemprop/chemprop/pull/975
Work on uncertainty quantification methods revealed that our previous prediction tensor return dimensions would cause difficulty down the line. Now we have placed uncertainty into a separate dimension. By @hwpang in https://github.com/chemprop/chemprop/pull/959
The BinaryDirichletFFN and MulticlassDirichletFFN predictors were added early in the v2 development, but not tested. Now they have been tested and corrected. By @shihchengli in https://github.com/chemprop/chemprop/pull/1017
The RDKit 2D molecular featurizer was added back by popular demand. The versions used in v1 are available as well as a version that uses all available molecular features in rdkit.Chem.Descriptors. By @KnathanM in https://github.com/chemprop/chemprop/pull/877
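For context on the scheduler item above, the sketch below writes out the Noam schedule as originally proposed for the Transformer: the rate grows linearly during warmup and then decays with the inverse square root of the step. The defaults (`d_model=512`, `warmup_steps=4000`) are the values from that proposal, not Chemprop settings, and Chemprop's own scheduler is deliberately not reproduced here.

```python
def original_noam_lr(step, d_model=512, warmup_steps=4000):
    """Learning rate of the originally proposed Noam schedule at a given step."""
    step = max(step, 1)  # avoid raising zero to a negative power
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


during_warmup = original_noam_lr(2000)
peak = original_noam_lr(4000)
after_warmup = original_noam_lr(8000)
print(during_warmup, peak, after_warmup)
```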

CLI changes

Log statistical summary of training, validation, and test datasets by @donerancl in https://github.com/chemprop/chemprop/pull/882
Change the default verbose level to INFO by @shihchengli in https://github.com/chemprop/chemprop/pull/953
Save both probabilities and class label for multiclass classification by @shihchengli in https://github.com/chemprop/chemprop/pull/987
Add --remove-checkpoints flag to opt out of saving checkpoints by @shihchengli in https://github.com/chemprop/chemprop/pull/1014
Add --class-balance flag to train CLI by @shihchengli in https://github.com/chemprop/chemprop/pull/1011
Save target column names in model for use at inference by @hwpang in https://github.com/chemprop/chemprop/pull/935
Fix save-smiles-splits not working with rxn. columns as column header by @jonwzheng in https://github.com/chemprop/chemprop/pull/998

Transfer learning

Add new example notebook for transfer learning by @joelnkn in https://github.com/chemprop/chemprop/pull/904
Use pre-train output scaler to scale training data in CLI by @KnathanM in https://github.com/chemprop/chemprop/pull/1051
Add --checkpoint and --freeze-encoder flags in train CLI for transfer learning by @shihchengli in https://github.com/chemprop/chemprop/pull/1007

Documentation

Fixed typos in CLI reference and standardized formatting by @donerancl in https://github.com/chemprop/chemprop/pull/880
Example Notebook for Classification by @twinbrian in https://github.com/chemprop/chemprop/pull/1047
Improve frzn-ffn-layers description and update doc for transfer learning by @oscarwumit in https://github.com/chemprop/chemprop/pull/993
add transform tests by @KnathanM in https://github.com/chemprop/chemprop/pull/955
Add documentation for how to use a separate splits file (CLI) by @KnathanM in https://github.com/chemprop/chemprop/pull/1041

Other small bug fixes

Convert v1 models trained on GPU by @KnathanM in https://github.com/chemprop/chemprop/pull/978
Fix hpopting Notebook and CLI for Windows by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1034
Update multiclass data to be compatible with rdkit 2024.09.1 by @jonwzheng in https://github.com/chemprop/chemprop/pull/1037
Define task_weights if it is None in MulticlassClassificationFFN by @shihchengli in https://github.com/chemprop/chemprop/pull/988
change installed torch version on windows actions again by @KnathanM in https://github.com/chemprop/chemprop/pull/1016
Update batch norm freezing to freeze running stats by @joelnkn in https://github.com/chemprop/chemprop/pull/952
Pass map_location through load_submodules() to torch.load() by @shihchengli in https://github.com/chemprop/chemprop/pull/1029
fix no-header-rows in predict command error by @sunhwan in https://github.com/chemprop/chemprop/pull/1001

New Contributors

@sunhwan made their first contribution in https://github.com/chemprop/chemprop/pull/1001
@twinbrian made his first contribution in https://github.com/chemprop/chemprop/pull/1047

Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.4...v2.0.5

- Python
Published by shihchengli over 1 year ago

chemprop - v2.0.4

Enhancements and New Features

This release introduces several enhancements and new features to Chemprop. A notable addition is a new notebook demonstrating Monte Carlo Tree Search for model interpretability (see here). Enhancements have been made to the output transformation and prediction saving mechanisms for MveFFN and EvidentialFFN. Additionally, users can now perform predictions on CPU even if the models were trained on GPU. Users are now also warned when not using the TensorBoard logger, helping them to be aware of available logging tools for better monitoring.

Bug Fixes

Several bugs have been fixed in this release, including issues related to Matthews Correlation Coefficient (MCC) metrics and loss calculations, and the behavior of the CGR featurizer when the bond features matrix is empty. The task_weights parameter has been standardized across all loss functions and moved to the correct device for MCC metrics, preventing device mismatch errors.

What's Changed

Standardize task_weights in LossFunction across all loss functions by @shihchengli in https://github.com/chemprop/chemprop/pull/941
Improve output transformation and prediction saving for MveFFN and EvidentialFFN by @shihchengli in https://github.com/chemprop/chemprop/pull/943
Enable CPU prediction for GPU-trained models by @snaeppi in https://github.com/chemprop/chemprop/pull/950
Fix Issues in MCC Metrics and Loss Calculations by @shihchengli in https://github.com/chemprop/chemprop/pull/942
Fix docs building by pinning sphinx-argparse by @jonwzheng in https://github.com/chemprop/chemprop/pull/964
Add Monte Carlo Tree search notebook for interpretability by @hwpang in https://github.com/chemprop/chemprop/pull/924
Fix CGR featurizer behavior when bond features matrix is empty by @jonwzheng in https://github.com/chemprop/chemprop/pull/958
Fix Failing CI for torch==2.4.0 on Windows ray[tune] Tests by @JacksonBurns in https://github.com/chemprop/chemprop/pull/971
warn users when not using tensorboard logger by @JacksonBurns in https://github.com/chemprop/chemprop/pull/967
Bug: Move task_weights to 'device' for MCC metrics by @YoochanMyung in https://github.com/chemprop/chemprop/pull/973

New Contributors

@snaeppi made their first contribution in https://github.com/chemprop/chemprop/pull/950
@YoochanMyung made their first contribution in https://github.com/chemprop/chemprop/pull/973

Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.3...v2.0.4

- Python
Published by shihchengli almost 2 years ago

chemprop - v2.0.3

Notable changes

The mfs argument of MoleculeDatapoint was removed in #876. This argument accepted functions which generated molecular features to use as extra datapoint descriptors. When using chemprop in a notebook, users should first manually generate their molecule features and pass them into the datapoints using x_d which stands for (extra) datapoint descriptors. This is demonstrated in the extra_features_descriptors.ipynb notebook under examples. CLI users will see no change as the CLI will still automatically calculate molecule features using user specified featurizers. The --features-generators flag has been deprecated though in favor of the more descriptive --molecule-featurizers. Available molecule features can be found in the help text generated by chemprop train -h.

The default aggregation was changed to norm in #946. This was meant to be changed in version 2.0.0, but got missed. Norm aggregation was used in all the benchmarking of version 1 as it performs better than mean aggregation when predicting properties that are extensive in the number of atoms.
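The difference between the two aggregations can be seen with a few lines of plain Python. In this sketch, mean aggregation divides the summed atom vectors by the number of atoms, while norm aggregation divides by a fixed constant; the constant `100.0` and the toy atom vectors are assumptions for illustration.

```python
def mean_aggregation(atom_states):
    """Average the atom vectors; the result does not grow with molecule size."""
    n_atoms = len(atom_states)
    return [sum(column) / n_atoms for column in zip(*atom_states)]


def norm_aggregation(atom_states, norm=100.0):
    """Sum the atom vectors and divide by a fixed constant (assumed 100 here)."""
    return [sum(column) / norm for column in zip(*atom_states)]


# Two toy molecules made of identical atoms, one twice as large as the other.
small = [[1.0, 2.0]] * 3
large = [[1.0, 2.0]] * 6

mean_small, mean_large = mean_aggregation(small), mean_aggregation(large)
norm_small, norm_large = norm_aggregation(small), norm_aggregation(large)
print(mean_small, mean_large, norm_small, norm_large)
```

Mean aggregation gives both molecules the same vector, so size information is lost; norm aggregation doubles with the atom count, which is what an extensive property needs.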

More documentation for the CLI hpopt and fingerprint commands has been added and can be viewed here and here.

The individual predictions of an ensemble of models are now automatically averaged and the individual predictions are saved in a separate file. #919
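A minimal sketch of the averaging step described above, assuming predictions are stored as one list per model with one value per molecule. The numbers are placeholders and the file-writing side is omitted.

```python
def average_ensemble(individual_predictions):
    """Average per-molecule predictions across models.

    `individual_predictions[m][i]` is model m's prediction for molecule i.
    """
    n_models = len(individual_predictions)
    return [sum(per_molecule) / n_models
            for per_molecule in zip(*individual_predictions)]


# Placeholder predictions from three models for two molecules.
individual = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
averaged = average_ensemble(individual)
print(averaged)
```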

What's Changed

Change the installed numpy version in pyproject by @shihchengli in https://github.com/chemprop/chemprop/pull/922
Explicitly double save scalers/criterion by @KnathanM in https://github.com/chemprop/chemprop/pull/898
Add --show-individual-scores CLI flag by @shihchengli in https://github.com/chemprop/chemprop/pull/920
Set Ray Train's trainer resources to 0 by @hwpang in https://github.com/chemprop/chemprop/pull/928
Save individual and average predictions into different files by @shihchengli in https://github.com/chemprop/chemprop/pull/919
Add CLI pages for hpopt and fingerprint by @jonwzheng in https://github.com/chemprop/chemprop/pull/914
Make fingerprint CLI consistent with predict CLI by @hwpang in https://github.com/chemprop/chemprop/pull/927
Fix issue related to target column for fingerprint by @hwpang in https://github.com/chemprop/chemprop/pull/939
build molecule featurizer in parsing by @KnathanM in https://github.com/chemprop/chemprop/pull/875
Remove featurizing from datapoint by @KnathanM in https://github.com/chemprop/chemprop/pull/876
change aggregation default to norm by @KnathanM in https://github.com/chemprop/chemprop/pull/946
Use mol.GetBonds() instead of for loop by @KnathanM in https://github.com/chemprop/chemprop/pull/931

Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.2...v2.0.3

- Python
Published by KnathanM almost 2 years ago

chemprop - v2.0.2 Adding Document Modules and hpopt Enhancement

In this release, we have included numerous notebooks to document modules. Chemprop may be used in python scripts, allowing for greater flexibility and control than the CLI. We recommend first looking through some of the worked examples to get an overview of the workflow. Then further details about the creation, customization, and use of Chemprop modules can be found in the module tutorials.

New CLI Features

Improved `--model-path` CLI

Previously --model-path could take either a single model file or a directory containing model files. Now it can take any combination of checkpoint files (.ckpt), model files (.pt), and directories containing model files. Directories are recursively searched for model files (.pt). Chemprop will use all models given and found to make predictions (#731).
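The lookup rule described above can be sketched with `pathlib`. The helper `collect_model_files` is hypothetical and only mirrors the behavior stated in the notes (explicit .ckpt/.pt files are kept, directories are searched recursively for .pt files); it is not Chemprop's own code.

```python
import tempfile
from pathlib import Path


def collect_model_files(paths):
    """Hypothetical helper: expand files and directories into model files."""
    found = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Directories are searched recursively, for .pt files only.
            found.extend(sorted(path.rglob("*.pt")))
        elif path.suffix in {".pt", ".ckpt"}:
            found.append(path)
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ensemble" / "replicate_1").mkdir(parents=True)
    (root / "ensemble" / "model_0.pt").touch()
    (root / "ensemble" / "replicate_1" / "model_1.pt").touch()
    (root / "ensemble" / "notes.txt").touch()
    (root / "best.ckpt").touch()
    found = collect_model_files([root / "best.ckpt", root / "ensemble"])
    names = sorted(path.name for path in found)
print(names)
```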

Improvements for hpopt CLI

Some flags related to Ray Tune (i.e., --raytune-temp-dir, --raytune-num-cpus, --raytune-num-gpus, and --raytune-max-concurrent-trials) have been added. You can use the CLI to initiate your Ray instance using these flags. (#918)

Bug fix

An incorrect max learning rate was used when writing the config file after hyperparameter optimization. This is now fixed (#913).

What's Changed

Fix typos in docstrings and .rst files that led to rendering errors by @jonwzheng in https://github.com/chemprop/chemprop/pull/901
Add CLI transition guide link to RTD by @kevingreenman in https://github.com/chemprop/chemprop/pull/907
Add meaningful warning for warm up epoch search space by @hwpang in https://github.com/chemprop/chemprop/pull/909
Fixing small bug in hpopt for learning rate by @akshatzalte in https://github.com/chemprop/chemprop/pull/913
Add notebooks to document modules by @KnathanM in https://github.com/chemprop/chemprop/pull/834
V2: consolidate --checkpoint CLI by @hwpang in https://github.com/chemprop/chemprop/pull/731
Improvements for hpopt cli by @hwpang in https://github.com/chemprop/chemprop/pull/918

Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.1...v2.0.2

- Python
Published by shihchengli about 2 years ago

chemprop - v2.0.1 First Patch

New CLI Features

Caching in CLI

MolGraphs are now created (by featurizing molecules) and cached at the beginning of training by default in the CLI. If you wish to disable caching, you can use the --no-cache flag, which will featurize molecules on the fly instead. (#903)

Change the default trial scheduler in HPO

We changed the default trial scheduler for HPO from AsyncHyperBand to FIFO, as it is the default in Ray and was used in version 1. You can switch the trial scheduler back to AsyncHyperBand by using --raytune-trial-scheduler AsyncHyperBand if needed. (#896)

Support Optuna in HPO

You can use Optuna as an HPO search algorithm via --raytune-search-algorithm optuna. (#888)

CLI Bug Fixes

HPO-related bugs

In #873, we changed the search space for the initial and final learning rate ratios and max_lr to avoid very small (~10^-10) learning rates and also ensured that some hyperparameters are saved as integers instead of floating-point numbers (e.g., batch_size). In #881, we addressed the bug concerning the incompatibility of the saved config file with the training config. In #836, we shut down Ray processes after HPO completion to avoid zombie processes. For those encountering issues with Ray processes, we suggest you start Ray outside of the Python process.

DDP-related bugs

In #884, we resolved the issue where metrics were not synchronized across processes and disabled the distributed sampler during testing in DDP.

Backwards incompatibility note

In #883, we fixed the bug related to unused parameters in DDP. Models created via the CLI in v2.0.0 without additional atomic descriptors cannot be used via the CLI in v2.0.1. You will need to first remove message_passing.W_d.weight and message_passing.W_d.bias from the model file's state_dict to make it compatible with the current version.
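A minimal sketch of the key removal follows, assuming the model file loads as a dictionary with a "state_dict" entry (the release note names the state_dict and the two keys; the helper name and the example contents are hypothetical). It works on a plain dictionary so the logic can be checked without PyTorch.

```python
def strip_unused_descriptor_weights(model_file: dict) -> dict:
    """Return a copy of a loaded model file without the W_d entries in its state_dict."""
    unused = ("message_passing.W_d.weight", "message_passing.W_d.bias")
    cleaned = dict(model_file)
    cleaned["state_dict"] = {
        k: v for k, v in model_file["state_dict"].items() if k not in unused
    }
    return cleaned


# Stand-in for the dictionary that loading a v2.0.0 model file would return.
example = {
    "hyper_parameters": {},
    "state_dict": {
        "message_passing.W_i.weight": [0.1],
        "message_passing.W_d.weight": [0.2],
        "message_passing.W_d.bias": [0.3],
    },
}
fixed = strip_unused_descriptor_weights(example)
```

In practice one would presumably load the file with torch.load, apply the helper, and write the result back with torch.save under a new file name, keeping the original as a backup.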

What's Changed

update v2 installation instructions page in docs by @kevingreenman in https://github.com/chemprop/chemprop/pull/831
Remove Ray zombie processes by @shihchengli in https://github.com/chemprop/chemprop/pull/836
Docker images for v2 by @JacksonBurns in https://github.com/chemprop/chemprop/pull/841
Change Docker sytnax for MyBinder compatibility by @JacksonBurns in https://github.com/chemprop/chemprop/pull/872
[V2] Fix featurizer cli by @hwpang in https://github.com/chemprop/chemprop/pull/865
Fix hyperparameter predictorbase by @c-w-feldmann in https://github.com/chemprop/chemprop/pull/832
V2: Add all notebooks to test by @hwpang in https://github.com/chemprop/chemprop/pull/840
Fix small bugs in hpopt by @akshatzalte in https://github.com/chemprop/chemprop/pull/873
Add pip setup step to environment.yml install instructions by @cjmcgill in https://github.com/chemprop/chemprop/pull/889
Avoid scrambling target column name order by @JacksonBurns in https://github.com/chemprop/chemprop/pull/893
Fix unused parameters issue in DDP by @shihchengli in https://github.com/chemprop/chemprop/pull/883
Fix the inference issue related to the target columns by @shihchengli in https://github.com/chemprop/chemprop/pull/895
Change the default trial scheduler to FIFOScheduler by @shihchengli in https://github.com/chemprop/chemprop/pull/896
Add Optuna support for HPO by @shihchengli in https://github.com/chemprop/chemprop/pull/888
Fix Circular Import with isort by @JacksonBurns in https://github.com/chemprop/chemprop/pull/887
make LookupAction work with ConfigArgParse by @KnathanM in https://github.com/chemprop/chemprop/pull/900
V2: Fix typo in hpopt installation instruction by @hwpang in https://github.com/chemprop/chemprop/pull/897
V2: Make hpopt config compatible with training config by @hwpang in https://github.com/chemprop/chemprop/pull/881
Fix DDP prediction and checkpoint Issues by @shihchengli in https://github.com/chemprop/chemprop/pull/884
Add simple cache to CLI by @KnathanM in https://github.com/chemprop/chemprop/pull/903
V2: Fix small hpopt bugs and add example notebook by @hwpang in https://github.com/chemprop/chemprop/pull/842

New Contributors

@akshatzalte made his first contribution in https://github.com/chemprop/chemprop/pull/873

Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.0...v2.0.1

- Python
Published by shihchengli about 2 years ago

chemprop - v2.0.0 Stable Release

This is the first stable release of Chemprop v2.0.0, with updates since the v2.0.0-rc.1 release candidate in early March.

v2 documentation can be found here.

There are v2 tutorial notebooks in the examples/ directory.

A helpful transition guide from Chemprop v1 to v2 can be found here. This includes a side-by-side comparison of CLI argument options, a list of which arguments will be implemented in later versions of v2, and a list of changes to default hyperparameters.

Note that if you install from source, the primary branch of our repository has been renamed from master to main.

Due to limited development team bandwidth, Chemprop v1 will no longer be actively developed, so that we can focus our efforts on v2. Bug reports and questions about v1 are still welcome to benefit users who haven't yet made the switch to v2, but reported bugs will not be fixed by the development team.

Please let us know of any bugs you find, questions you have, or enhancements you want in Chemprop v2 by opening an issue.

- Python
Published by kevingreenman about 2 years ago

chemprop - Final Patch for Version 1

This is the final release of chemprop v1. All future development will be done on chemprop v2. The development team is still happy to answer questions about v1, but no new feature requests or PRs for v1 will be accepted. Users who identify bugs in v1 are still encouraged to open issues to report them - they will be tagged as v1-wontfix to signify that we won't be publishing fixes for them in official chemprop releases, but the bugs can still be open to community discussion.

We encourage all users to try migrating their workflows over to chemprop v2 (available now as a release candidate, stable version planned to be released within the next week) and let us know of any issues you encounter. All v1 releases will remain available on PyPI, and the v1 source code will remain available in this GitHub organization.

What's Changed

fix the uncal_vars for atom/bond property prediction by @shihchengli in https://github.com/chemprop/chemprop/pull/712
[v1]: Add Docker Image Building Action and Official Images to DockerHub by @JacksonBurns in https://github.com/chemprop/chemprop/pull/718
remove macos and windows from v1 ci by @JacksonBurns in https://github.com/chemprop/chemprop/pull/720
update docker build if to use correct upstream branch name by @JacksonBurns in https://github.com/chemprop/chemprop/pull/723
fix the task names by @shihchengli in https://github.com/chemprop/chemprop/pull/725
Fixed typo in README.md by @willspag in https://github.com/chemprop/chemprop/pull/745

New Contributors

@willspag made their first contribution in https://github.com/chemprop/chemprop/pull/745

Full Changelog: https://github.com/chemprop/chemprop/compare/v1.7.0...v1.7.1

- Python
Published by kevingreenman about 2 years ago

chemprop - v2.0.0 Release Candidate

This is a release candidate for Chemprop v2.0.0, to be released in April 2024.

The primary objectives of v2.0.0 are making Chemprop more usable from within Python scripts, more modular, easier to maintain and develop, more compute/memory efficient, and usable with PyTorch Lightning. Some features will not be migrated from v1 to v2 (e.g. web, sklearn). Some v1 features will be added in later versions of v2 (v2.1+) (e.g. uncertainty, interpret, atom- and bond-targets); see milestones here. The new version also has substantially faster featurization speeds and much higher unit test coverage, enables training on multiple GPUs, and works on Windows (in addition to Linux and Mac). Finally, the incorporation of a batch normalization layer is expected to result in smoother training and improved predictions. The label as a “release candidate” reflects its availability to be downloaded via PyPI and that only minor changes are expected for the Python API before the final release. We expect most remaining changes before the release of v2.0.0 in April to be focused on additional improvements to the command line interface (CLI), which does not yet have feature parity with v1. We encourage all Chemprop users to try using v2.0.0-rc.1 to see how it can improve their workflows.

The v2 documentation can be found here.

There are tutorial notebooks for v2 in the examples/ directory.

A helpful transition guide from v1 to v2 can be found here. This includes a side-by-side comparison of CLI argument options, a list of which arguments will be implemented in later versions of v2, and a list of changes to default hyperparameters.

You can subscribe to our development status and notes for this version: https://github.com/chemprop/chemprop/issues/517.

Ongoing work for this version is available on the v2/dev branch.

Please let us know of any bugs you find by opening an issue.

- Python
Published by kevingreenman over 2 years ago

chemprop - Conformal Calibration

What's Changed

new split per molecular weight by @soulios in https://github.com/chemprop/chemprop/pull/456
Specify license for Chemprop logos by @mliu49 in https://github.com/chemprop/chemprop/pull/461
Add todo.md by @davidegraff in https://github.com/chemprop/chemprop/pull/492
Update authors list in license file and alphabetically sort by @cjmcgill in https://github.com/chemprop/chemprop/pull/532
update authors in LICENSE and setup files for v1 by @kevingreenman in https://github.com/chemprop/chemprop/pull/533
Fix Transpose bug in Inequality Regression by @cjmcgill in https://github.com/chemprop/chemprop/pull/308
Add Dirichlet Evidential Uncertainty Quantification by @cjmcgill in https://github.com/chemprop/chemprop/pull/423
New metrics by @soulios in https://github.com/chemprop/chemprop/pull/542
Updating README with ADMET-AI details by @swansonk14 in https://github.com/chemprop/chemprop/pull/554
Improve error message when gilbrat is needed. by @KnathanM in https://github.com/chemprop/chemprop/pull/569
limit chempropv1 python version to 3.7, 3.8 only by @JacksonBurns in https://github.com/chemprop/chemprop/pull/618
Add a CITATIONS.bib by @JacksonBurns in https://github.com/chemprop/chemprop/pull/627
Limit Maximum Allowed flask Version in v1 by @JacksonBurns in https://github.com/chemprop/chemprop/pull/628
move numunctasks definition to ensure always defined by @kevingreenman in https://github.com/chemprop/chemprop/pull/632
Switching np.mean to np.nanmean to handle NaN metrics by @swansonk14 in https://github.com/chemprop/chemprop/pull/453
Fix the dtype for targets of different sizes by @shihchengli in https://github.com/chemprop/chemprop/pull/638
Add setters for atom and bond constraints by @shihchengli in https://github.com/chemprop/chemprop/pull/637
switch v1 readthedocs build from conda to mamba by @kevingreenman in https://github.com/chemprop/chemprop/pull/660
Fix v1 docs theme by @kevingreenman in https://github.com/chemprop/chemprop/pull/669
Conformal Calibration by @danielxu9393 in https://github.com/chemprop/chemprop/pull/304
add note on feature releases and instructions for ssl+ddp by @JacksonBurns in https://github.com/chemprop/chemprop/pull/685
remove unnecessary argument for reshape function by @shihchengli in https://github.com/chemprop/chemprop/pull/671
Fix atom/bond property prediction with atom-mapped SMILES and target classification by @shihchengli in https://github.com/chemprop/chemprop/pull/673
Pass num_workers to MoleculeDataLoader during interpretation by @kevingreenman in https://github.com/chemprop/chemprop/pull/691
conformal quantile prediction bug fix by @shihchengli in https://github.com/chemprop/chemprop/pull/693

New Contributors

@soulios made their first contribution in https://github.com/chemprop/chemprop/pull/456
@danielxu9393 made their first contribution in https://github.com/chemprop/chemprop/pull/304

Full Changelog: https://github.com/chemprop/chemprop/compare/v1.6.1...v1.7.0

- Python
Published by kevingreenman over 2 years ago

chemprop - Bug fix for reaction atom mapping

Bug fix

PR #383 unexpectedly broke the atom mapping for reaction mode. The issue is described in Issue #426 and fixed by PR #427.

What's Changed

Fix versioning issues - metadata and dependencies by @kevingreenman in https://github.com/chemprop/chemprop/pull/420
add job to tests action for PyPI package by @JacksonBurns in https://github.com/chemprop/chemprop/pull/422
added chemprop manuscript to readme by @hesther in https://github.com/chemprop/chemprop/pull/425
Keep Support for Python 3.7 and 3.8 when fixing gilbrat Issue by @JacksonBurns in https://github.com/chemprop/chemprop/pull/431
Fix reaction atom mapping by @shihchengli in https://github.com/chemprop/chemprop/pull/427

Full Changelog: https://github.com/chemprop/chemprop/compare/v1.6.0...v1.6.1

- Python
Published by kevingreenman almost 3 years ago

chemprop - Atomic/bond targets prediction

Major New Features

Atomic/bond targets prediction by @shihchengli in https://github.com/chemprop/chemprop/pull/280

What's Changed

Replace multiclass mcc with 1-mcc for loss by @cjmcgill in https://github.com/chemprop/chemprop/pull/332
Add chemprop logo by @shihchengli in https://github.com/chemprop/chemprop/pull/339
Add CodeQL workflow for GitHub code scanning by @lgtm-com in https://github.com/chemprop/chemprop/pull/344
Add to the description of evidential regularization by @cjmcgill in https://github.com/chemprop/chemprop/pull/353
Remove deprecated numpy float types by @cjmcgill in https://github.com/chemprop/chemprop/pull/357
Correct a bug in ENCE uncertainty evaluation by @cjmcgill in https://github.com/chemprop/chemprop/pull/360
Hyperopt Parallel Race Conditions and Manual Trial Load by @cjmcgill in https://github.com/chemprop/chemprop/pull/307
Simplified install with PyPI rdkit and git install in setup.py by @JacksonBurns in https://github.com/chemprop/chemprop/pull/364
Allow providing both loaded features and a features generator by @shihchengli in https://github.com/chemprop/chemprop/pull/318
For any multiclass task, make_predictions fails if option --individual_ensemble_predictions is on. by @piotr-semenov in https://github.com/chemprop/chemprop/pull/354
Save loaded molecular features into .npy files by @shihchengli in https://github.com/chemprop/chemprop/pull/337
Ignore invalid atom-mapped SMILES by @shihchengli in https://github.com/chemprop/chemprop/pull/367
Molecule fingerprinting with invalid SMILES in list by @shihchengli in https://github.com/chemprop/chemprop/pull/351
change calibrationfeaturespath from str to List[str] by @ceroth in https://github.com/chemprop/chemprop/pull/358
Change logo style by @shihchengli in https://github.com/chemprop/chemprop/pull/369
Clamp evidential 'v' parameter by @kevingreenman in https://github.com/chemprop/chemprop/pull/371
fix colab demo by @kevingreenman in https://github.com/chemprop/chemprop/pull/368
Avoid OverflowError when setting field size to sys.maxsize by @shihchengli in https://github.com/chemprop/chemprop/pull/373
Set atom and bond constraints when loading model by @shihchengli in https://github.com/chemprop/chemprop/pull/374
Readme updates by @kevingreenman in https://github.com/chemprop/chemprop/pull/385
Remove atom map numbers for scaffold splits by @shihchengli in https://github.com/chemprop/chemprop/pull/383
update bug report template - ask for full stack trace by @kevingreenman in https://github.com/chemprop/chemprop/pull/401
Fix t-SNE script by @kevingreenman in https://github.com/chemprop/chemprop/pull/403
Fixing skipped lines in csv writing when using a windows computer by @cjmcgill in https://github.com/chemprop/chemprop/pull/406

Full Changelog: https://github.com/chemprop/chemprop/compare/v1.5.2...v1.6.0

- Python
Published by kevingreenman almost 3 years ago

chemprop - Flexible hyperparameter search, missing uncertainty target values, evaluation of different magnitude multitask targets, empty test set assignment, and DockerFile updates

Features

Flexible hyperparameter search space

The parameters to be included in hyperparameter optimization can now be selected using the argument --search_parameter_keywords {list-of-keywords}. The parameters supported are: activation, aggregation, aggregation_norm, batch_size, depth, dropout, ffn_hidden_size, ffn_num_layers, final_lr, hidden_size, init_lr, max_lr, warmup_epochs. Some special keywords are also included for groups of keywords or different search behavior: basic, learning_rate, all, linked_hidden_size. PR #299

Missing targets in uncertainty calibration datasets

Added capabilities to the uncertainty calibration and evaluation methods to allow them to handle missing target values in multitask jobs. This capability was already included in the normal training of models and is now implemented in uncertainty calibration and evaluation as well. PR #295 Issue #292

Multitask evaluation for tasks of different magnitudes

For evaluation metrics that tend to scale with the magnitude of a task (e.g., rmse), the arithmetic average of metrics across tasks has been replaced with a geometric mean. This makes the average metric in multitask regression jobs less dominated by large-magnitude targets. This was previously an issue for hyperparameter optimization and the evaluation of the optimal epoch during model training, though the calculation of loss for gradient descent is on scaled targets and was already not scale dependent. PR #290
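To illustrate why the choice of mean matters, here is a small sketch with made-up per-task RMSE values (the numbers and function names are purely illustrative and not taken from Chemprop). One task has a large magnitude, the other a small one, and only the small task improves.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # Computed in log space; requires strictly positive metric values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-task RMSE values: one large-magnitude task, one small.
before = [100.0, 1.0]
after = [100.0, 0.5]  # the small-magnitude task improves by a factor of two

arith_change = arithmetic_mean(after) / arithmetic_mean(before)
geo_change = geometric_mean(after) / geometric_mean(before)
```

With these numbers the arithmetic mean moves by less than 1%, so the improvement is nearly invisible, whereas the geometric mean drops by a factor of sqrt(0.5), about 0.71, regardless of which task improved.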

Empty test set allowed

An empty test split can now be used during training. This was previously possible only using the cv-no-test split method, but now it is available more widely when specifying split sizes, for example with --split_sizes 0.8 0.2 0. PR #284, #260 related Issue #279

Updates to conda environment and docker file

Conda environment building will now prefer to use the pytorch channel over the conda-forge channel. The Dockerfile has been updated to use micromamba, allowing for faster environment solves than conda and removing a potential licensing issue. PR #276

Bug Fixes

Fix MCC loss for multiclass jobs

Corrected a calculation problem in the loss function that was returning infinite loss inappropriately. Also adopted the convention of returning loss of zero when infinite loss is returned, as often happens in very unbalanced datasets. Added appropriate unit testing. PR #309 Issue #306

Correct code error in ence uncertainty evaluation

Corrects an error in the ence uncertainty evaluation method that made that method unusable. Bug was introduced during PR #305. PR #302 Issue #301

Fixed link to MoleculeNet website

Corrected the link to the MoleculeNet benchmark dataset website in the readme, following MoleculeNet migrating to a new site location. PR #296

Multitarget uncertainty calibration mve weighting method

Previously, this method only worked for single-task jobs. It has now been extended to work for multitask models as well. PR #291

Remove unused version.py file

Version tracking in Chemprop no longer uses the version.py file and it was removed. PR #283

Multiclass argument typo in readme

Corrected a typo where the number of classes used in multiclass regression should have been indicated as --multiclass_num_classes. PR #281

Repair individual ensemble predictions

Refactoring of prediction file during the addition of uncertainty functions disabled the option to return the individual predictions of each member of an ensemble of models. Option is now available again. PR #274

- Python
Published by cjmcgill almost 4 years ago

chemprop - Quick Fix to Uncertainty Evaluation

Bugfix

Inconsistent Path For Uncertainty Evaluation

Fixed a bug in uncertainty evaluation where the uncertainty evaluator was using the path name originally used to train a checkpoint. This made the uncertainty evaluator only work in the case that the test data and training data used in initial model training had the same path.

- Python
Published by cjmcgill about 4 years ago

chemprop - Uncertainty Functions, Reaction-Solvent Models, Loss Function Options, Keyed Splitting, and Chemprop Colab Demo

Features

Uncertainty Tools

Tools added for uncertainty quantification, calibration, and evaluation as part of the chemprop predict function. Uncertainty predictions are saved as part of the predictions file. Uncertainty functions and outputs are triggered using the argument --uncertainty_method {method}.

Uncertainty outputs can be calibrated using an outside dataset (evaluation set from training is often suitable) in order to have better uncertainty estimates on new predictions. Can be activated using --calibration_method {method} and --calibration_path {path-to-csv}. For the regression dataset type, a calibrated output can provide either a standard deviation or one-sided interval bound, as set with the options --regression_calibrator_metric {stdev-or-interval} and --calibration_interval_percentile {int}.

If the data file containing smiles for the test path also contains target values, the uncertainty performance can be evaluated using various metrics, activated with the option --evaluation_methods {list-of-methods}.

Internally, this PR creates several classes for carrying out prediction tasks: UncertaintyEstimator, UncertaintyPredictor, UncertaintyCalibrator, UncertaintyEvaluator. Loss functions have been added that have auxiliary uncertainty outputs, mve and evidential for regression. PR #267 PR #269

Reaction-Solvent Option

Gives the option to train a chemprop model using one reaction and one molecule for each datapoint. Active when used with the option --reaction_solvent. Options for making the solvent mpnn use different parameters than that for the reaction are possible using --bias_solvent, --hidden_size_solvent {int}, and --depth_solvent {int}. PR #246

Multimolecule Fingerprinting

Added some new changes for fingerprint functions with multiple molecules. Models trained with a "shared-mpn" between two molecules can return a MPN fingerprint with only one molecule provided. Also, when multiple molecule models are used for MPN fingerprint generation, the output will indicate which molecule each element belongs to. PR #242 Issue #236

Colab Notebook Examples

Created a Jupyter notebook that runs examples of Chemprop jobs, specifically as the functions can be used in python. Good resource for new users, demonstrations, or tutorials. Linked to Google Colab so that it can be run remotely, not requiring any local install of Chemprop. PR #239 PR #273

Loss Function Options

Previously, loss functions were selected automatically based on the dataset type being used in model training. Now the loss function can be selected with --loss_function {function}. Some new specialty loss functions have been added with this capability.

* Matthews Correlation Coefficient (mcc) is a loss function for classification and multiclass that considers True Positives, True Negatives, False Positives, False Negatives separately in the loss function, avoiding domination by one class and making it well suited to unbalanced training sets.
* Bounded Mean Squared Error (bounded_mse) is a regression loss function that allows for training targets expressed as inequalities, e.g. ">5.0". Intended for use with experimental data with delimited ranges.
* Mean Variance Estimation (mve) and evidential loss are regression loss functions that maximize the likelihood of the target on an estimated uncertainty distribution. When used as loss functions, the outputs of these functions can be used in uncertainty estimation.

Appropriate metrics have been added along with these loss functions. PR #238 PR #267
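The idea behind a bounded squared-error loss can be sketched in a few lines. This is an illustration of the concept only, assuming string targets such as ">5.0" as in the description above; the function names are hypothetical, and Chemprop's own bounded_mse operates on tensors and may differ in detail.

```python
def bounded_squared_error(prediction: float, target: str) -> float:
    """Squared error for a target given as a number or an inequality such as '>5.0'."""
    if target.startswith(">"):
        bound = float(target[1:])
        # Any prediction above the lower bound is consistent with the data.
        return 0.0 if prediction >= bound else (prediction - bound) ** 2
    if target.startswith("<"):
        bound = float(target[1:])
        # Any prediction below the upper bound is consistent with the data.
        return 0.0 if prediction <= bound else (prediction - bound) ** 2
    return (prediction - float(target)) ** 2


def bounded_mse(predictions, targets):
    errors = [bounded_squared_error(p, t) for p, t in zip(predictions, targets)]
    return sum(errors) / len(errors)


# 6.2 satisfies ">5.0" (no penalty), 4.0 violates it, 3.0 vs 3.5 is a plain error.
loss = bounded_mse([6.2, 4.0, 3.0], [">5.0", ">5.0", "3.5"])
```

Predictions that already satisfy an inequality contribute nothing to the loss, so the model is only pushed when it contradicts the stated bound.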

Development Environment

GitHub Addons

Added a CONTRIBUTING.md file with guidelines for how users can contribute to Chemprop. New templates are now available for issue submission that distinguish between different issue types: bug report, feature request, and questions. New templates also suggested for PRs. Templates stored in the .github directory. PR #241

Unit Testing

Part of an ongoing effort to include a more complete set of automated tests for Chemprop. Unit tests added for data utils, uncertainty-related loss functions, and the uncertainty evaluation metrics. PR #232 PR #267 PR #269

Flake8 Formatting

Ongoing effort to standardize the formatting of incoming code. New PRs now request/require the new code to be flake8 compliant in formatting. The utils module and files significantly associated with the new uncertainty function are flake8 compliant. PR #241 PR #258 PR #267

Update Versioning

Changed the way that version numbers are stored and updated throughout the code. PR #247

Remove Assertion Errors

Removed many of the assertion errors throughout Chemprop and replaced them with more easily interpretable error types and messages. PR #257

Bug Fixes

Hyperopt Version Fix

Changed the way that random seeds are passed into hyperopt during hyperparameter optimization to avoid an error where hyperopt stopped supporting a previously supported way of passing numpy seeds. PR #245 Issue #243 Issue #254 Issue #264

- Python
Published by cjmcgill about 4 years ago

chemprop - Prediction function output options, multi-molecule splitting, and explicit H atoms in message passing

Features

Allow the inclusion of H atoms in message passing

Default model behavior is to treat H atoms implicitly with their neighbors. With the previously existing argument --explicit_h, explicit H atoms included in the SMILES string would be considered during message passing. This PR adds a new argument --adding_h, which causes all H atoms to be treated explicitly during message passing. PR #225 and #227

Allow splitting by different key molecules in multi-molecule models

The data-splitting methods scaffold_balanced and random_with_repeated_smiles can only consider one molecule per datapoint in adhering to the constraints of which data must share splits with each other. This PR creates an argument --split_key_molecule {int}, which is used to select which molecule in multi-molecule datasets will be used for the splitting determination. PR #230

Select split fractions when separate test data is provided

Previously, the split fractions for training/validation were hardcoded as 80/20 when test data was provided via --separate_test_path. Split fractions can now be specified in this case using --split_sizes as normal. PR #230

Additional output options for make_predictions function

This change affects usage of make_predictions as a python function, rather than in the whole Chemprop workflow. When used as a python function, make_predictions would return the predictions for a set of SMILES, but would skip the invalid SMILES without indicating which ones were skipped. Now this function has two new optional arguments: 1) return_invalid_smiles, which includes invalid SMILES in the output but with "Invalid SMILES" as the prediction value, and 2) return_index_dict, which returns predictions of the model in a dictionary keyed to the original data indices. PR #235

New utility functions for identifying invalid SMILES

New functions have been added to chemprop/data/utils.py to allow users to identify datapoints that have invalid SMILES. These functions are get_invalid_smiles_from_file and get_invalid_smiles_from_list. PR #235

Bug Fixes

Simultaneous use of extra atom features and extra bond features

A bug that prevented using extra atom features and extra bond features at the same time has been resolved. PR #215 Issue #213

Fixed install error with newer versions of pip

Newer versions of pip failed to install some chemprop dependencies properly. These dependencies (flake8, pytest, parameterized) were moved to an installation as part of the conda environment rather than by pip. Also, environment build for testing was changed from conda to mamba for better install speed. PR #215 and #216

Correction in tutorial file

Tutorial file changed to show the proper list of lists format for SMILES. PR #218

Predicting for a multiclass model with an improper SMILES

When making a prediction for an improper SMILES in a multiclass model, an error would be triggered instead of returning a prediction of "Invalid SMILES". This has been corrected for this case and the parallel case of improper SMILES used with --individual_ensemble_predictions. PR #229

Molecule fingerprints generated with extra atom features

Molecule fingerprints could not be predicted when extra atom features were provided as part of the model. This and the parallel issue with extra bond features have been addressed. PR #234 Issue #233

- Python
Published by cjmcgill over 4 years ago

chemprop - Model preloading, hyperparameter optimization improvements, spectra training, latent representations, and more.

Features

Spectra training

Introduces spectra as a new dataset type available for training, in which each target in a multitarget regression refers to a positive intensity value in one position of a spectrum. Training methods are consistent with https://github.com/gfm-collab/chemprop-IR. Default loss function is spectral information divergence (SID), but Wasserstein loss (earthmover distance) is also supported with --metric wasserstein --alternative_loss_function wasserstein. PR #197
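For readers unfamiliar with spectral information divergence, the sketch below uses its common symmetric Kullback-Leibler form. Normalizing each spectrum to sum to one is an assumption made here for illustration, and the function names are hypothetical; the exact preprocessing in Chemprop and chemprop-IR may differ.

```python
import math


def normalize(spectrum):
    total = sum(spectrum)
    return [x / total for x in spectrum]


def spectral_information_divergence(predicted, target):
    """Symmetric KL-style divergence between two strictly positive spectra."""
    p = normalize(predicted)
    q = normalize(target)
    return sum(
        pi * math.log(pi / qi) + qi * math.log(qi / pi) for pi, qi in zip(p, q)
    )


target = [0.1, 0.6, 0.3]
# Same shape at a different overall scale: divergence is (numerically) zero.
same = spectral_information_divergence([1.0, 6.0, 3.0], target)
# Peak moved to a different position: divergence is clearly positive.
different = spectral_information_divergence([0.6, 0.1, 0.3], target)
```

Because both spectra are normalized first, the measure compares spectral shape rather than absolute intensity, which is why all intensities need to be positive.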

Preloading model in predictions

Refactored make_predictions into smaller functions to make it easier to use chemprop functions as a python library. The refactoring is specifically designed to allow a model to be loaded a single time using the function chemprop.train.load_model and then used for multiple instances of predictions by feeding that model as an argument to chemprop.train.make_predictions. PR #200

Improved hyperparameter optimization

Added several new features to hyperparameter optimization, many related to hyperparameter checkpoints saved in the location specified by --hyperopt_checkpoint_dir <dir_path>. The new functionalities:

* Restarting failed hyperparameter optimization jobs by selecting the same checkpoint directory.
* Parallelizing multiple instances of hyperparameter optimization by setting a shared checkpoint directory among instances.
* Seeding hyperparameter optimizations with previously run jobs by indicating an old checkpoint directory and/or by specifying the save directories of relevant jobs trained with train.py using --manual_trial_dirs <list-of-directories>.
* Manually setting the number of hyperparameter trials that use randomized parameters before directed TPE search begins using --startup_random_iters <int, default=10>.

PR #208

Return results from all ensemble models

When making predictions from an ensemble of models, returns the mean prediction but also the individual predictions from the individual models when --individual_ensemble_predictions is specified. PR #190

Latent representations for ensembles and from FFN layers

Allows for the calculation of latent fingerprints from an ensemble of models by concatenating them together. Also allows for the return of either a latent representation from the MPNN output or from the next-to-last FFN layer using the argument --fingerprint_type <MPN or last_FFN>. PR #193

Target imputation for sklearn multitask models

Sklearn multitask training cannot proceed with missing targets among the data. Previously, such datasets would have needed to be run as multiple singletask models. This PR introduces target imputation for missing data to allow multitask sklearn training even when some data is missing, with the argument --impute_mode <model/linear/median/mean/frequent> indicating which method to use for imputation. PR #210 Issue #211

Reaction balancing

Adds options in reaction training for how to handle situations where reactants and products are not balanced. The argument --reaction_mode now also has the options reac_diff_balance, prod_diff_balance, and reac_prod_balance (in addition to the current options reac_diff, prod_diff, and reac_prod). Also fixes an error where atomic numbers are incorrect when an atom is present in the products but not in the reactants. PR #212 Issue #204

Bug Fixes

Interactions with git repos

Resolves a problem with TAP (typed-argument-parser) where running Chemprop from inside a different git repo would trigger an error related to the generation of a reproducibility hash. In this situation the reproducibility hash is not generated, but it logs the issue and does not stop Chemprop from running. PR #195

Global features structure

Changes the way that global variables related to model construction and feature vector size are handled. Resolves a problem in pytest where these variables wouldn't reset between runs. PR #206

- Python
Published by cjmcgill over 4 years ago

chemprop - Resume interrupted training, frozen layer pre-training, target/data weighted training, and more

Features

Resume training on multiple folds if interrupted

As training progresses through the folds of a multi-fold model, the results of each individual fold are stored in a JSON file. If training is interrupted and then restarted with the flag --resume_experiment, the completed fold results are read from the JSON file and training resumes on the first uncompleted fold. PR #164
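The resume logic can be sketched roughly as follows. This is an illustration under assumptions, not Chemprop's code: the function run_folds, the file name test_scores.json, and the fold_<n> directory layout are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_folds(save_dir, num_folds, train_fold, resume=False):
    """Train each fold, reusing stored results for folds already completed."""
    scores = []
    for fold in range(num_folds):
        # Hypothetical layout: one JSON result file per fold subfolder.
        result_path = Path(save_dir) / f"fold_{fold}" / "test_scores.json"
        if resume and result_path.exists():
            scores.append(json.loads(result_path.read_text())["score"])
            continue
        score = train_fold(fold)
        result_path.parent.mkdir(parents=True, exist_ok=True)
        result_path.write_text(json.dumps({"score": score}))
        scores.append(score)
    return scores


trained = []  # records which folds were actually trained


def fake_train(fold):
    trained.append(fold)
    return 0.1 * fold


with tempfile.TemporaryDirectory() as tmp:
    run_folds(tmp, 2, fake_train)  # stands in for a run interrupted after two folds
    scores = run_folds(tmp, 3, fake_train, resume=True)
```

In the second call only the third fold is trained; the first two scores come from disk.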

Frozen layers for pre-training

Added functionality to freeze the MPN or FFN layers in a model being trained at the values of a previously trained model. Freezes MPN values using a model indicated with --checkpoint_frzn <path>. FFN layers will also be frozen if indicated with --frzn_ffn_layers <number-of-layers>. Models with multiple molecules can select to only freeze the first molecule MPN using --freeze_first_only. PR #170

tSNE functionality

Added HDBScan clustering to the tSNE script. PR #172

Weighted training by target and by datapoint

Added training weights for different targets and different datapoints, with normalization of weight values. Target weights indicated with the argument --target_weights <list-of-values>. Data weights supplied through an input file indicated with the argument --data_weights_path <path>. PR #173, #175, #189 Issue #145
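As a rough illustration of how target and datapoint weights might enter a loss, consider the sketch below. It assumes the weights are normalized so that their mean is 1 and that the loss is mean squared error; both are assumptions made for the example, and the function names are hypothetical rather than taken from Chemprop.

```python
def normalize(weights):
    """Rescale weights so that their mean is 1 (an assumed convention)."""
    avg = sum(weights) / len(weights)
    return [w / avg for w in weights]


def weighted_mse(preds, targets, target_weights, data_weights):
    """Mean squared error with one weight per target and one per datapoint."""
    tw = normalize(target_weights)
    dw = normalize(data_weights)
    total = 0.0
    for row_pred, row_true, d in zip(preds, targets, dw):
        for p, t, w in zip(row_pred, row_true, tw):
            total += d * w * (p - t) ** 2
    return total / (len(preds) * len(tw))


preds = [[1.0, 2.0], [3.0, 4.0]]
targets = [[0.0, 2.0], [3.0, 2.0]]
uniform = weighted_mse(preds, targets, [1, 1], [1, 1])
emphasize_first_target = weighted_mse(preds, targets, [3, 1], [1, 1])
```

With uniform weights the result reduces to the ordinary mean squared error; raising a target's weight makes errors on that target count for more.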

Bug Fixes

MPNN input

Providing SMILES or RDKit molecules to the MPN's forward function failed (only BatchMolGraph worked) following other changes. Now, SMILES and RDKit molecules can once again be used as input. PR #164

Backwards compatibility with old checkpoints

Added backwards compatibility for features scaling when loading old checkpoints. PR #164 Issue #108

Updated readme

Added information to the readme and documentation on pre-training, the treatment of missing values in multitask models, and caching. PR #165 Issue #156

Multiclass classification

Corrected error when using the metric accuracy with multiclass classification. PR #169

RDKit Compatibility

Bugfix for compatibility issues of RDKit 2021.03.01 with the interpretation script. PR #182 Issue #178

- Python
Published by cjmcgill almost 5 years ago

chemprop - Custom atom/bond features, epistemic uncertainty, reaction option, bug fix for atom/bond features

New Features

Custom atom/bond features

Enabled custom input of atom and bond features, either in addition to or instead of the default features.

PR: https://github.com/chemprop/chemprop/pull/137

Epistemic uncertainty

Introduced the argument --ensemble_variance which calculates the epistemic uncertainty of predictions via an ensemble of models.
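The idea can be sketched in a few lines: each model in the ensemble predicts every molecule, and the spread of those predictions serves as the uncertainty estimate. The sketch below uses the population variance, which is an assumption; the function name ensemble_summary and the numbers are purely illustrative.

```python
from statistics import mean, pvariance


def ensemble_summary(model_preds):
    """model_preds[m][i] is the prediction of model m for molecule i."""
    per_molecule = list(zip(*model_preds))
    means = [mean(p) for p in per_molecule]
    # Population variance across ensemble members (assumed convention).
    variances = [pvariance(p) for p in per_molecule]
    return means, variances


# Three hypothetical models, two molecules.
preds = [[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]]
means, variances = ensemble_summary(preds)
```

Here the models disagree on the first molecule and agree on the second, so only the first receives a nonzero variance.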

PR: https://github.com/chemprop/chemprop/pull/140

Reaction option

Introduced the CGR option: input of atom-mapped reaction SMILES instead of molecules. This creates a pseudo-molecule of the graph transition state between reactants and products, and performs message passing on this pseudo-molecule.

PR: https://github.com/chemprop/chemprop/pull/152

Latent representation

Added functionality to save the latent representation of a molecule (the MPNN output) to file. It is used in much the same way as predicting with a given checkpoint file.

PR: https://github.com/chemprop/chemprop/pull/119

Preprocessing updates

Updates to the preprocessing, handling and saving of smiles strings. Removed redundant checks.

PR: https://github.com/chemprop/chemprop/pull/135

Resume experiments

Experiments with multiple folds can now be resumed using the --resume_experiment flag. Additionally, the test results of each fold are saved as a JSON file in the corresponding subfolder in save_dir.

PR: https://github.com/chemprop/chemprop/pull/164

Bug Fixes

Atom messages

Major bugfix for running Chemprop with the argument --atom_messages, where the wrong features were passed to the MPNN. This improves the performance of Chemprop in atom_messages mode, and causes backwards incompatibility with old checkpoint files if created in atom_messages mode. Since Chemprop is mainly used for directed message passing via bond messages, we hope not many users are affected.

Issue: https://github.com/chemprop/chemprop/issues/133 PR: https://github.com/chemprop/chemprop/pull/138

Backwards compatibility with old checkpoints

Backwards compatibility for correctly setting recently introduced training arguments for old models.

Issue: https://github.com/chemprop/chemprop/issues/148 and https://github.com/chemprop/chemprop/issues/108 PR: https://github.com/chemprop/chemprop/pull/149 and PR: https://github.com/chemprop/chemprop/pull/164

Sklearn scores

Bugfix in training sklearn models: Scores were not saved correctly previously.

PR: https://github.com/chemprop/chemprop/pull/162

Data split script

Bugfix in a standalone script to create data splits: Multi-molecule input had previously created incompatibilities with passing data to the scaffold split functionality. Update of docstring.

Issue: https://github.com/chemprop/chemprop/issues/158 PR: https://github.com/chemprop/chemprop/pull/159

MPNN sanity check

Bugfix for sanity checks for dimensions of batches within the MPNN forward pass: The introduction of multi-molecule input had previously caused an inconsistency in one of the checks.

Issue: https://github.com/chemprop/chemprop/issues/153 PR: https://github.com/chemprop/chemprop/pull/154

MPNN type annotations

Bugfix for type annotation in the MPNN forward pass + update of docstring.

PR: https://github.com/chemprop/chemprop/pull/151 and PR: https://github.com/chemprop/chemprop/pull/164

Tanimoto distance

Bugfix for calculating Tanimoto distances. The introduction of multi-molecule input had previously caused incompatibilities in the standalone script to find similar molecules in the training data.

Issue: https://github.com/chemprop/chemprop/issues/143 PR: https://github.com/chemprop/chemprop/pull/144

README typos

Fixed typos for a few arguments in the README

PR: https://github.com/chemprop/chemprop/pull/139

Sanitize script

Bugfix in standalone script sanitize.py - open output file with write access.

RDKit molecule caching

Bugfix for creating RDKit molecules from smiles strings. Previously the molecules were recreated even though they were already cached.

PR: https://github.com/chemprop/chemprop/pull/152

Saving SMILES

Bugfix for error occurring when --save_smiles_splits is used in conjunction with --separate_test_path. Now, the data split csv files are still generated, but split_indices.pkl is not generated if there are multiple data points with the same SMILES or if some of the data comes from a separate data file.

Issue: https://github.com/chemprop/chemprop/issues/157 PR: https://github.com/chemprop/chemprop/pull/163

SMILES/mols as input to MPNN

Bugfix for SMILES or RDKit molecules as input to MPNN model instead of BatchMolGraph.

PR: https://github.com/chemprop/chemprop/pull/164

- Python
Published by swansonk14 about 5 years ago

chemprop - New split type, cleaner predictions file, backward compatibility, bug fixes, and testing improvements

Features

New split type

The split type --split_type cv already existed to perform k-fold cross-validation (where k is set by --num_folds). In each fold, 1/k of the data is put in the test set, 1/k of the data is put in the validation set, and the remaining (k-2)/k of the data is put in the training set.

Now, a new split type --split_type cv-no-test exists which is essentially identical except that it assigns no data to the test set on each fold (https://github.com/chemprop/chemprop/commit/b56ca9866b303036eab61cab93188cccbaa24af2). Instead, 1/k of the data is put in the validation set and (k-1)/k of the data is put in the training set with no test data. The purpose of this split type is to maximize the training data when training a model in cases where the test performance is already known (or is not important) and doesn't need to be determined. Note that the validation set is still necessary to perform early stopping.
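The difference between the two split types can be sketched as follows. This is a simplified illustration, not Chemprop's splitting code: the function cv_splits is hypothetical, no shuffling is performed, and the choice of which piece serves as validation or test on a given fold is an assumption.

```python
def cv_splits(n_items, num_folds, with_test=True):
    """Yield (train, val, test) index lists for each fold."""
    # Break the data into num_folds pieces of (nearly) equal size.
    pieces = [list(range(i, n_items, num_folds)) for i in range(num_folds)]
    for fold in range(num_folds):
        val_id = fold
        test_id = (fold + 1) % num_folds if with_test else None
        train = sorted(
            i
            for piece_id, piece in enumerate(pieces)
            if piece_id not in (val_id, test_id)
            for i in piece
        )
        test = pieces[test_id] if with_test else []
        yield train, pieces[val_id], test


cv = list(cv_splits(10, 5, with_test=True))           # like --split_type cv
cv_no_test = list(cv_splits(10, 5, with_test=False))  # like --split_type cv-no-test
```

With 10 items and 5 folds, each cv fold holds 6 training, 2 validation, and 2 test items, while each cv-no-test fold holds 8 training and 2 validation items.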

Dropping extra columns during prediction

Previously, when using predict.py, all the columns from the test_path file were copied to the preds_path file and then the predictions were added as additional columns at the end. Now there is an option called --drop_extra_columns which will not copy over these extraneous columns to preds_path (https://github.com/chemprop/chemprop/commit/83ea4c06dda4231902777ea6776da922aeba2ad3 and https://github.com/chemprop/chemprop/commit/061339568045863c30c9bd8c2a143b674a0082d8). When --drop_extra_columns is used, preds_path will only contain columns with the SMILES and with the prediction values.

Bug Fixes

Backward compatibility for `load_checkpoint`

Previously, newer versions of Chemprop incorrectly loaded checkpoints that were trained using older versions of Chemprop due to a change in the names of the parameters. Backward compatibility has now been added to allow this version of Chemprop to load checkpoints with either set of names (https://github.com/chemprop/chemprop/commit/5371b29e7c65e41fa8b83d9c76ba2bfdd400b139 and https://github.com/chemprop/chemprop/commit/206950c6ec92a3646800f95bc69ae6d8dc7ca646).

Saving SMILES splits

Due to new Chemprop features such as the ability to load multiple molecules, the feature --save_smiles_splits, which saves the SMILES corresponding to the train, validation, and test splits, had broken (https://github.com/chemprop/chemprop/issues/110). This was fixed in https://github.com/chemprop/chemprop/pull/117.

Fixing `interpret.py`

Similar to the issue with saving SMILES splits, interpret.py broke due to the Chemprop feature that enables multiple molecules to be used as input (https://github.com/chemprop/chemprop/issues/107 and https://github.com/chemprop/chemprop/issues/113). This was fixed in https://github.com/chemprop/chemprop/pull/128.

Updating Dockerfile

The Dockerfile has been updated to address https://github.com/chemprop/chemprop/issues/100 and https://github.com/chemprop/chemprop/issues/129. This was fixed in https://github.com/chemprop/chemprop/pull/131.

Fixing atom descriptors

The atom_descriptors feature did not work in predict.py (https://github.com/chemprop/chemprop/issues/120). This was fixed in https://github.com/chemprop/chemprop/pull/114.

Logging

Logging to the terminal and to files (quiet.log and verbose.log in the save_dir) broke on some operating systems (https://github.com/chemprop/chemprop/issues/106). This was fixed in https://github.com/chemprop/chemprop/pull/118.

README additions

Some of the relatively new features, like custom atomic features, were missing from the README (https://github.com/chemprop/chemprop/issues/121). This was fixed in https://github.com/chemprop/chemprop/pull/122.

Infrastructure Changes

Migrating from Travis CI to GitHub Actions

Chemprop previously used Travis CI to run automated tests upon pushing to master or creating a pull request, but Travis changed its pricing structure and no longer offers unlimited free testing. For this reason, Chemprop now uses GitHub Actions to run automated tests. The results of the test runs can be seen in the Actions tab of the repo.

- Python
Published by swansonk14 over 5 years ago

chemprop - Multiple Molecules, Custom Atom Features, and More

Features

Multiple Input Molecules

[PR] Use multiple molecules as an input to chemprop. The number of molecules is specified with the keyword number_of_molecules. Those molecules are embedded with a separate D-MPNN by default. The latent representations are concatenated prior to the FFN.

The keyword mpn_shared allows you to use a shared D-MPNN. Note that, since the latent representations are concatenated, the order of the input molecules is important. This method is not invariant and there are better ways to use multiple molecules with shared D-MPNN, which will be implemented for the next release.

Custom Atom Features

[PR] Implemented custom atomic features as a counterpart of the custom molecular features in ChemProp. The new feature allows users to provide additional atomic features for each node in a given molecule. To use it, pass the keyword atom_descriptors. The custom atom features can be employed in two modes. In the first mode, --atom_descriptors feature, custom features are used as normal node features and are concatenated to the default node vector before the D-MPNN block. In the second mode, --atom_descriptors descriptor, custom atom features do not enter the model until the atom feature vector has been updated through the D-MPNN block. That is, the --atom_descriptors descriptor mode leaves the extra custom atom features largely undisturbed and preserves as much of their information as possible.

The extra custom descriptors can be put into ChemProp through a variety of pickle files (.pkl, .pickle, .pckl), Numpy save file (.npz), or a .sdf file.

`.pkl` format

The .pkl file must store a Pandas DataFrame with smiles as index and columns as descriptors. All descriptors must be a 1D numpy array or 2D numpy array. For example:

1 custom atomic feature for each atom provided in a 1D array

smiles                          descriptors
CCOc1ccc2nc(S(N)(=O)=O)sc2c1    [0.637781931055927, 0.7075571757878132, 0.7339...
CCN1C(=O)NC(c2ccccc2)C1=O       [0.09588231301387817, 0.6521911050735447, 0.45...

Multiple atomic features for each atom provided in multiple 1D arrays

smiles                          desc1                                          desc2
CCOc1ccc2nc(S(N)(=O)=O)sc2c1    [0.637781931055927, 0.7075571757878132...      [0.8266363223032338, 0.89641156703512 ...
CCN1C(=O)NC(c2ccccc2)C1=O       [0.09588231301387817, 0.6521911050735447...    [0.2847367042611851, 0.8410454963208516...

Note: mixing 1D and 2D arrays across different columns is not allowed.
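A file in this layout could be produced along the following lines. This is a sketch with placeholder values: the SMILES, the column names desc1 and desc2, the file name, and the array lengths (heavy-atom counts here) are all assumptions for illustration, and the array length Chemprop expects for each molecule should be checked against the README.

```python
import os
import tempfile

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Placeholder molecules mapped to the number of atoms described per molecule.
atom_counts = {"CCO": 3, "CC(=O)O": 4}

# One 1D array per molecule and per descriptor column, indexed by SMILES.
frame = pd.DataFrame(
    {
        "desc1": [rng.random(n) for n in atom_counts.values()],
        "desc2": [rng.random(n) for n in atom_counts.values()],
    },
    index=pd.Index(list(atom_counts), name="smiles"),
)

path = os.path.join(tempfile.mkdtemp(), "descriptors.pkl")
frame.to_pickle(path)

# Reading the file back recovers the same index and per-atom arrays.
loaded = pd.read_pickle(path)
```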

`.npz` file

Atomic descriptors for each molecule must be saved as one independent 2D numpy array ([number of atoms x number of descriptors]) in the .npz file for example by:

np.savez('descriptors.npz', *descriptors)

where descriptors is a list of 2D arrays of atomic descriptors, in the order of the molecules in the training/prediction data file.
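A slightly fuller sketch of the round trip is shown below. The array shapes and values are placeholders chosen for illustration. Because the arrays are passed positionally, NumPy stores them under the keys arr_0, arr_1, and so on, which preserves the molecule order.

```python
import os
import tempfile

import numpy as np

# One [number of atoms x number of descriptors] array per molecule,
# in the same order as the rows of the data file (placeholder values).
descriptors = [np.zeros((3, 2)), np.ones((4, 2))]

path = os.path.join(tempfile.mkdtemp(), "descriptors.npz")
np.savez(path, *descriptors)

with np.load(path) as archive:
    # Positional arrays are stored under the keys arr_0, arr_1, ...
    restored = [archive[f"arr_{i}"] for i in range(len(archive.files))]
```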

`.sdf` file

Each molecule is presented as a mol block in the .sdf file. Descriptors should be saved as entries for each mol block in the format of comma-separated values. Each molecule must have an entry named SMILES that stores the SMILES string. For example:

```
CHEMBL1308_loner5
     RDKit          3D

  6  6  0  0  1  0  0  0  0  0999 V2000
   -0.7579   -0.5337   -2.8744 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2229   -1.3763   -1.7558 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0046   -1.0089   -0.4029 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.4824   -2.0104    0.3280 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.5806   -3.0317   -0.5484 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.1735   -2.6999   -1.8031 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  6  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  1  0
M  END
>  <desc1>  (1)
-8.568031e-05,0.0001865207,-0.0002012379,-5.054658e-05,0.0002148434,-0.0003503839,1.970448e-05,3.081137e-05,2.997883e-05,9.446278e-05,-7.194711e-05,0.0001527364

>  <desc2>  (1)
5.462954e-05,-2.415399e-06,0.0001044788,-2.274438e-05,0.0001698836,5.206409e-06,4.5825e-06,-8.882181e-07,-1.08787e-05,2.993307e-05,-4.069051e-06,1.338413e-05

>  <SMILES>  (1)
Cc1cn[nH]c1

$$$$
```

where the name of descriptor entries desc1, desc2 can be arbitrary.

When using this feature, users are responsible for all atomic feature preprocessing works, including feature normalization and expansion.

Note: This feature is intended for small-to-medium sized training datasets, where extra QM descriptors have been shown to be powerful and to slow the degradation of model performance.

Options for Aggregation Function

[PR] By default, at the end of message passing, the D-MPNN aggregates atom hidden representations into a single hidden representation for the whole molecule by taking the mean of the atom representations. Now, this aggregation function can be changed by using --aggregate <mode>, which currently supports “mean” (the default), “sum”, and “norm” (which is equivalent to “sum” with normalization by the constant specified by --aggregation_norm).
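The three modes can be compared in a small sketch operating on plain Python lists. This is an illustration only, not the D-MPNN code: the function aggregate is hypothetical, and the default value used for aggregation_norm is a placeholder.

```python
def aggregate(atom_hiddens, mode="mean", aggregation_norm=100.0):
    """Combine per-atom hidden vectors into one molecule-level vector."""
    summed = [sum(column) for column in zip(*atom_hiddens)]
    if mode == "sum":
        return summed
    if mode == "mean":
        # Divide by the number of atoms in this molecule.
        return [s / len(atom_hiddens) for s in summed]
    if mode == "norm":
        # Divide by a fixed constant, independent of molecule size.
        return [s / aggregation_norm for s in summed]
    raise ValueError(f"unknown aggregation mode: {mode}")


# Two atoms with two hidden features each (placeholder values).
atoms = [[1.0, 2.0], [3.0, 4.0]]
as_mean = aggregate(atoms, "mean")
as_sum = aggregate(atoms, "sum")
as_norm = aggregate(atoms, "norm", aggregation_norm=4.0)
```

Unlike mean, the sum and norm modes keep the molecule representation dependent on the number of atoms, which can matter for properties that scale with molecule size.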

Cross-Validation

[commit] The default split type (i.e., --split_type random) randomly samples data into the train, validation, and test sets on each of the num_folds folds independently. This means that the same molecule can end up in the test split on more than one fold. The advantage of this method is that it can be used easily with an arbitrary number of folds, but the downside is that it does not perform strict cross-validation.

The new split type cv (--split_type cv) performs true cross-validation. The data is broken down into num_folds pieces, each of size len(data) / num_folds, and each piece serves as the test split once, the validation split once, and part of the train split on all other folds. The benefit of this method is that it is true cross-validation, but the downside is that the size of each split is dependent on the number of folds, meaning less flexibility (e.g., --num_folds 3 will result in train, validation, and test splits each with 33.3% of the data, which is perhaps too small for the train split and too large for the test split). --num_folds 10 is recommended.

Saving Test Predictions

[commit] The --save_preds option will save predictions on the test split of each fold in a file called “test_preds.csv” in the save_dir.

Multiple Metrics

[commit] The --metric argument still works as before and this is still the metric that is used for early stopping (i.e., selecting the model which performs best on the validation split), but now there is an additional --extra_metrics argument where additional metrics can be specified and will be recorded. The metrics should be space separated (e.g., --extra_metrics mae rmse r2).

Saving Test Scores

[commit] Scores on the test splits are now saved to file in the save_dir under the name “test_scores.csv”.

Fixes and Improvements

Undefined Rows

[commit] Rows in the input data file with target values that are all undefined are now correctly skipped. This is especially relevant when the row may contain some defined target values, but none of those targets are included in target_columns.

Data Loading

[commit] Data is now only loaded once to decrease training time.

Tests

[tests] Added more comprehensive tests to ensure correct functionality.

Train Loss

[commit] Fixed incorrect averaging of the train loss, which affects the train loss that is printed to screen and saved in tensorboard.

- Python
Published by swansonk14 over 5 years ago

chemprop - Fixing descriptastorus PyPi issue

Since descriptastorus isn't on PyPi, it can't be installed automatically via pip install chemprop. Instead, it must be installed separately via pip install git+https://github.com/bp-kelley/descriptastorus.

- Python
Published by swansonk14 almost 6 years ago

chemprop - Fixing PyPi Installation and Documentation

Fixing an issue with PyPi installation and updating relevant documentation.

- Python
Published by swansonk14 almost 6 years ago

chemprop - Release on PyPi

Chemprop is now available on PyPi: https://pypi.org/project/chemprop. Installation instructions are below.

conda create -n chemprop python=3.8
conda activate chemprop
conda install -c conda-forge rdkit
pip install git+https://github.com/bp-kelley/descriptastorus
pip install chemprop

After installing through PyPi, training and predicting are available via the chemprop_train and chemprop_predict commands, which are equivalent to python train.py and python predict.py. All the command line arguments for training and predicting apply as usual. Please see the README for more details.

- Python
Published by swansonk14 almost 6 years ago

chemprop -

- Python
Published by swansonk14 over 7 years ago
