Recent Releases of chemprop
chemprop - v2.2.1
Cuik-molmaker
Chemprop can now use cuik-molmaker (https://github.com/NVIDIA-Digital-Bio/cuik-molmaker), a C++/Python package that accelerates atom and bond featurization. Using cuik-molmaker accelerates Chemprop training by 1.6X and inference by 2.4X. In addition, memory usage is reduced by ~80%, which enables larger-scale training and inference workloads. It can be used as a drop-in replacement for the featurization classes implemented in Chemprop. Use chemprop/scripts/check_and_install_cuik_molmaker.py to install the correct version for your environment. Then, if you are using the command line, specify --use-cuikmolmaker-featurization. If you are using Chemprop in a Python script, import data.LazyMoleculeDatapoint, data.CuikmolmakerDataset, and featurizers.CuikmolmakerMolGraphFeaturizer.
Other notable changes
We continue to make the command line interface easier to use. The train command, chemprop train, no longer requires a test set. Additionally, datapoint descriptors (e.g., temperature, pressure) can now be included in the main input file using a command similar to chemprop train --data-path input.csv --descriptors-columns temperature pressure.
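As a rough illustration of the single-file layout described above, the sketch below builds CSV text in which the descriptor columns sit next to the SMILES and target columns, then assembles the command quoted in the paragraph. The column names smiles and solubility and all data values are made up for illustration; only --data-path and --descriptors-columns come from these notes.

```python
import csv
import io
import shlex

# Hypothetical rows: the "smiles" and "solubility" column names and all values
# are illustrative assumptions, not taken from the Chemprop documentation.
rows = [
    {"smiles": "CCO", "solubility": 1.10, "temperature": 298.15, "pressure": 1.0},
    {"smiles": "CCO", "solubility": 1.35, "temperature": 323.15, "pressure": 1.0},
    {"smiles": "c1ccccc1", "solubility": -1.64, "temperature": 298.15, "pressure": 2.0},
]


def to_csv_text(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Save this text as input.csv before running the command below.
csv_text = to_csv_text(rows)

# The descriptor columns are named on the command line, so they can live in
# the same file as the SMILES and targets.
descriptor_columns = ["temperature", "pressure"]
command = ["chemprop", "train", "--data-path", "input.csv",
           "--descriptors-columns", *descriptor_columns]

print(csv_text)
print(shlex.join(command))
```

The same molecule can appear on several rows with different descriptor values, which is the usual reason for keeping conditions such as temperature in the main input file.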
There are also several bug fixes in this release. See the detailed PR list below.
What's Changed
- Zero-indexing refactoring of splits.json used in both tutorial & test by @jxl26 in https://github.com/chemprop/chemprop/pull/1244
- Remove unneeded folders from python distribution by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1240
- Standardizing the name Chemprop by @akshatzalte in https://github.com/chemprop/chemprop/pull/1259
- Update suggested CheMeleon citation by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1267
- Enable cuik-molmaker for accelerated molecule featurization by @sveccham in https://github.com/chemprop/chemprop/pull/1253
- Fix randomly failing overfit test by @jxl26 in https://github.com/chemprop/chemprop/pull/1271
- Bug Fix: Add handling for `--from-foundation` argument in `hpopt` by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1273
- Add support for empty test set by @jxl26 in https://github.com/chemprop/chemprop/pull/1243
- Ensure models loaded for transfer learning are on CPU by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1252
- Fixes multiple quantile regression bugs by @craabreu in https://github.com/chemprop/chemprop/pull/1229
- Support extra feature columns in input datasets by @jxl26 in https://github.com/chemprop/chemprop/pull/1250
- Fix bug for molecule + reaction dataset with --molecule-featurizers (introduced in #1253) by @KnathanM in https://github.com/chemprop/chemprop/pull/1274
- Make `hpopt` use `--tracking-metric` by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1281
- Clean up code and fix bugs for cuik-molmaker (in #1253) by @KnathanM in https://github.com/chemprop/chemprop/pull/1275
New Contributors
- @jxl26 made their first contribution in https://github.com/chemprop/chemprop/pull/1244
- @sveccham made their first contribution in https://github.com/chemprop/chemprop/pull/1253
Full Changelog: https://github.com/chemprop/chemprop/compare/v2.2.0...v2.2.1
- Python
Published by KnathanM 7 months ago
chemprop - v2.2.0 Atom and bond property prediction + Foundation Models
With this release, we finish our reimplementation of chemprop v1 to be modern and maintainable. The last major feature from chemprop v1 that we plan to port to v2 is support for atom and bond property prediction. This was accomplished in #1136. Documentation for using this feature from the CLI is available here while examples for using it from a python script are available here and here. As a reminder for CLI users coming from v1, we have a helpful transition guide here. Also note that we do not support converting v1 models for atom and bond targets to v2 models as the model architecture has been simplified in v2. Now if there are multiple atom targets (or similarly multiple bond targets) a single feed forward network (FFN) with multiple outputs is used to make predictions for all those targets. A separate FFN is used for each of molecule, atom, and bond targets, as well as for each of atom prediction constraints and bond prediction constraints if those are used (see the example notebooks).
A notable new feature is the ability to use pretrained message passing layers with new predictor heads, added in #1226, with the CLI flag --from-foundation. This makes it possible to train large foundation-style chemprop models on many basic chemistry tasks and then use the message passing layer weights to initialize a new model for training on other, smaller datasets. An example of such a model, CheMeleon, is shown here.
CLI changes
- The hyperparameter search space has been updated to include 6 message passing steps as an option in #1230. This option was included in chemprop v1 and was accidentally excluded during our reimplementation in v2.
- The "scaled exponential linear unit" (SELU) activation function is removed from the hyperparameter search space in #1146 because it is normally used in self normalizing models, which chemprop does not support. In the same PR, all other torch activation modules are made available as an option via the CLI. In python scripts, customized activation functions may also now be used.
- Stereochemical information (R/S and cis/trans) is included in the default featurization. If a model is trained on molecules that do not include this stereochemical info, some of the model weights will not be updated. This could cause erroneous predictions at inference time if molecules are used with stereochemical info. To remedy this, we have added a `--ignore-stereo` flag and corresponding function argument `chemprop.utils.utils.make_mol(smi, ignore_stereo = True)` that tells chemprop to ignore any stereochemical info in the input molecule. See #1196 and #1216.
Bug fixes
- Previously a dataset could not be missing values if the values were bounded. This is fixed in #1203. Thank you to @lewismervin1 for the bug report and fix.
- Also fixed the output shape of dropout uncertainty predictions in #1205 thanks to a bug report by @lewismervin1.
- If Matthews correlation coefficient (MCC) is used as a metric, a higher value is better, but the checkpointing callback was told that lower is better. This is fixed by #1218.
- The paths of extra atom/bond features/descriptors would not save properly in a config file. This is fixed in #1189 and #1190.
- Scalers from a pretrained GPU-trained model would not load correctly on a CPU-only machine. This is fixed in #1231. Thank you to @jonwzheng for the bug report.
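To see why the MCC direction in the list above matters for checkpointing, here is a toy sketch in plain Python. It is not Chemprop's or Lightning's callback code; the helper name best_epoch and the scores are hypothetical.

```python
def best_epoch(scores, higher_is_better):
    """Return the index of the epoch a checkpoint callback would keep."""
    pick = max if higher_is_better else min
    return pick(range(len(scores)), key=lambda i: scores[i])


# Hypothetical validation MCC per epoch (MCC ranges from -1 to 1; higher is better).
val_mcc = [0.12, 0.48, 0.61, 0.55]

wrong = best_epoch(val_mcc, higher_is_better=False)  # direction used before the fix
right = best_epoch(val_mcc, higher_is_better=True)   # direction used after the fix

print(f"minimizing keeps epoch {wrong}, maximizing keeps epoch {right}")
```

With the wrong direction, the "best" checkpoint is the one with the lowest MCC, which here is the barely trained first epoch.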
Finally, not exactly related to this version of chemprop, but we have also added lists of external dependency versions that are known to work with previous versions of chemprop in #1225. This is useful if you want to use a specific version of chemprop but are unsure if it is compatible with the most up to date versions of external dependencies. For example, torch v2.6 is incompatible with earlier versions of chemprop because torch.load now uses weights_only=True by default.
What's Changed
- Allow Numpy 2+ by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1193
- Adds CLI option to ignore chirality in SMILES by @craabreu in https://github.com/chemprop/chemprop/pull/1196
- Make sure bounded targets dataframe is all strings by @KnathanM in https://github.com/chemprop/chemprop/pull/1203
- Includes Chemprop logo for dark mode by @craabreu in https://github.com/chemprop/chemprop/pull/1198
- Fix dropout uncertainty output shape by @KnathanM in https://github.com/chemprop/chemprop/pull/1205
- (Data point) weights should be shape `b` or `b x 1` by @KnathanM in https://github.com/chemprop/chemprop/pull/1210
- Add warning to use v1 featurizer to converting script by @KnathanM in https://github.com/chemprop/chemprop/pull/1180
- MCC metrics higher is better by @KnathanM in https://github.com/chemprop/chemprop/pull/1218
- Expand chirality ignore to ignore bond stereochemistry by @KnathanM in https://github.com/chemprop/chemprop/pull/1216
- Make converting 2.0 to 2.1 models easier and more obvious by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1191
- Add Support for Fine-Tuning Foundation Models with `--from-foundation` and add `CheMeleon` by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1226
- Add Known Working External Dependency Lists by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1225
- Make features descriptors path config file-able by @KnathanM in https://github.com/chemprop/chemprop/pull/1189
- Ensure `device="cpu"` when loading Scalers by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1231
- Increases customizability of activation functions by @craabreu in https://github.com/chemprop/chemprop/pull/1146
- Allow importing uncertainty as a subpackage by @craabreu in https://github.com/chemprop/chemprop/pull/1237
- Fix hpopt write config by @KnathanM in https://github.com/chemprop/chemprop/pull/1190
- v2.2: Atom and bond property predictions by @KnathanM in https://github.com/chemprop/chemprop/pull/1136
- undo unintended v1 -> v2 hpopt search space changes by @KnathanM in https://github.com/chemprop/chemprop/pull/1230
Full Changelog: https://github.com/chemprop/chemprop/compare/v2.1.2...v2.2.0
Published by KnathanM 9 months ago
chemprop - v2.1.2
What's Changed
Important changes
- CLI implementation of RIGR as an option in --multi-hot-atom-featurizer-mode by @akshatzalte in https://github.com/chemprop/chemprop/pull/1172
A new featurization scheme, RIGR (Resonance Invariant Graph Representation), is now available. To access it via the CLI, use --multi-hot-atom-featurizer-mode rigr. This featurizer uses only resonance invariant features so it treats all resonance structures of a molecule identically. It uses a subset of the atom and bond features from the default v2 featurizer. With 60% fewer features, RIGR has shown comparable or superior performance across a variety of property prediction tasks in a forthcoming manuscript. An example Jupyter notebook is also provided.
Other changes
- Apply task_weights to default loss function in CLI by @craabreu in https://github.com/chemprop/chemprop/pull/1170
- Check if dropout prop needs to be restored by @KnathanM in https://github.com/chemprop/chemprop/pull/1178
- Message Passing Error Message Fix by @twinbrian in https://github.com/chemprop/chemprop/pull/1161
- Fix metrics problems - Cuda-> CPU, no _defaults by @KnathanM in https://github.com/chemprop/chemprop/pull/1179
- Update convert script for v1.4 by @KnathanM in https://github.com/chemprop/chemprop/pull/1176
New Contributors
- @craabreu made his first contribution in https://github.com/chemprop/chemprop/pull/1170
Full Changelog: https://github.com/chemprop/chemprop/compare/v2.1.1...v2.1.2
Published by akshatzalte about 1 year ago
chemprop - v2.1.1
Notable changes
In #1090, we started the process of integrating logging into the core code. This will make it easier for users to control what information Chemprop prints to output. It will also make it easier for developers to include more information outputs for potential debugging.
SciPy 1.15 subtly changed how logit works, which caused some of our tests to fail (the values reported were slightly different than before). The expected test values have been updated. #1142
A new example notebook has been added which demonstrates how to adapt Chemprop to work with Shapley value analysis. This is another method to lend some interpretability to Chemprop models by highlighting which atom/bond features are most impactful to the final prediction value. #938
We continue to try to make chemprop easy to use. In #1091 and #1124 we added better warnings and error messages. In #1151 we made it easy to open the example notebooks in Google Colab. This allows people reading the docs to immediately jump in and try chemprop without needing to set up a Python environment.
Bug Fixes
In #1097, we fixed a bug where the transforms for scaling extra features/descriptors were turned off during validation. This caused models trained with these extra inputs to not report accurate metrics during training, which is a problem if the "best" model is selected instead of the last model as is done in hyperparameter optimization. Training a model and using the last model was unaffected as was doing inference.
#1084 fixed a bug where R2Score did not have the attribute task_weights. This attribute is not used but is needed for compatibility with other metrics.
In v2.1 we transitioned to using torchmetrics for our metrics and loss functions, in part because it takes care of training across multiple nodes (DDP) automatically. Our custom metric for Matthew's correlation coefficient however was not set up the way torchmetrics expected. This was fixed in #1131.
What's Changed
- splits file is json by @KnathanM in https://github.com/chemprop/chemprop/pull/1083
- add more helpful warnings about the splitting api change by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1091
- Fix: Splits file can have multiple splitting schemes by @KnathanM in https://github.com/chemprop/chemprop/pull/1086
- Set all transforms to train during validation by @KnathanM in https://github.com/chemprop/chemprop/pull/1097
- updated warning to logger by @twinbrian in https://github.com/chemprop/chemprop/pull/1090
- Add task weights to r2score by @KnathanM in https://github.com/chemprop/chemprop/pull/1084
- Fix `tracking_metric` overwrite issue by @shihchengli in https://github.com/chemprop/chemprop/pull/1105
- Fix `save_individual_predictions` with ensembling by @shihchengli in https://github.com/chemprop/chemprop/pull/1110
- Add a helpful warning when invalid SMILES are passed by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1124
- Fix batch size calculation for multicomponent by @KnathanM in https://github.com/chemprop/chemprop/pull/1098
- Not use `transform_variance` for unscaled targets by @shihchengli in https://github.com/chemprop/chemprop/pull/1108
- Add output size to attentive hparams by @KnathanM in https://github.com/chemprop/chemprop/pull/1133
- Fix test failure due to scipy logit by @KnathanM in https://github.com/chemprop/chemprop/pull/1142
- fix docs about extra atom descriptors by @KnathanM in https://github.com/chemprop/chemprop/pull/1139
- Fix MCC for DDP and multitask by @KnathanM in https://github.com/chemprop/chemprop/pull/1131
- V2: Add Shapley Value notebook for interpretability by @oscarwumit in https://github.com/chemprop/chemprop/pull/938
- add notebooks to colab and docs by @KnathanM in https://github.com/chemprop/chemprop/pull/1151
Full Changelog: https://github.com/chemprop/chemprop/compare/v2.1.0...v2.1.1
Published by KnathanM about 1 year ago
chemprop - v2.1.0
The v2.1 release adds the uncertainty quantification modules, including estimation, calibration, and evaluation (#937). For more details on uncertainty quantification in Chemprop, please refer to the documentation and the example notebook. Additionally, we switched the loss functions and metrics to torchmetrics (#1022). With this change we also changed the "val_loss" reported to be calculated the same as the training loss to make them comparable (#1020). We also changed Chemprop to use replicates instead of cross validation (#994) and batch normalization is now disabled by default (#1058).
Core code changes
- The `validation_loss_function` is removed in #1023.
- Batch norm is disabled by default in #1058.
- A new predictor, `QuantileFFN`, is added in #963.
- `BinaryDirichletLoss` and `MulticlassDirichletLoss` are integrated into `DirichletLoss` in #1066.
- The split types `CV` and `CV_NO_VAL` are removed in #994.
- A model's list of metrics is now registered as children modules in #1020.
CLI changes
- Batch norm is disabled by default and can be turned on with `--batch-norm` #1058
- Many CLI flags related to uncertainty quantification are added #1010
- Quantile regression is now supported via `-t regression-quantile` #963
- Cross validation (CV) is replaced with replicates. The number of replicates can be specified via `--num-replicates` and the flag `--num-folds` is deprecated #994
- `--tracking-metric` is added, which is the metric to track for early stopping and checkpointing #1020
New notebooks
- A notebook showing interoperability of Chemprop featurizer w/ other libraries (DGL and PyG) #1063
- Active learning #910
- Uncertainty quantification #1071
CI/CD
- Ray can be tested on Python 3.12 #1064
- `USE_LIBUV: 0` is added into the CI workflow #1065
Backwards Compatibility Note
Models trained with v2.0 will not load properly in v2.1 due to the loss functions file being moved. A conversion script is provided to convert a v2.0 model to one compatible with v2.1. Its usage is python chemprop/utils/v2_0_to_v2_1.py <v2_0.pt> <v2_1.pt>
data.make_split_indices now always returns a nested list. Previously it would only return a nested list for cross validation. We encourage you to use data.make_split_indices(num_replicates=X) where X is some number greater than 1, to train on multiple splits of your data to get a better idea of the performance of your architecture. If you do use only one replicate, you will need to unnest the list like so:
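```python
from chemprop import data

# `mols` and `all_data` are assumed to be defined earlier (the molecules and their datapoints).
# make_split_indices returns one list of indices per replicate, even for a single replicate.
train_indices, val_indices, test_indices = data.make_split_indices(mols)
train_data, val_data, test_data = data.split_data_by_indices(
    all_data, train_indices, val_indices, test_indices
)
# Unnest the single replicate.
train_data, val_data, test_data = train_data[0], val_data[0], test_data[0]
```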
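The nested return shape can be illustrated without Chemprop installed. The sketch below uses a toy stand-in, toy_split_indices, which is hypothetical and only mimics the "one index list per replicate" layout; it is not how data.make_split_indices is implemented.

```python
import random


def toy_split_indices(n_items, num_replicates=1, sizes=(0.8, 0.1, 0.1), seed=0):
    """Toy stand-in that mimics the nested return shape: one index list per replicate."""
    train, val, test = [], [], []
    for replicate in range(num_replicates):
        indices = list(range(n_items))
        # A different seed per replicate gives a different random split each time.
        random.Random(seed + replicate).shuffle(indices)
        n_train = int(sizes[0] * n_items)
        n_val = int(sizes[1] * n_items)
        train.append(indices[:n_train])
        val.append(indices[n_train:n_train + n_val])
        test.append(indices[n_train + n_val:])
    return train, val, test


# Several replicates: loop over the outer lists to train one model per split.
train_idx, val_idx, test_idx = toy_split_indices(20, num_replicates=3)
for rep, (tr, va, te) in enumerate(zip(train_idx, val_idx, test_idx)):
    print(f"replicate {rep}: {len(tr)} train / {len(va)} val / {len(te)} test")

# Single replicate: the outer list has length 1 and still needs unnesting.
single_train, _, _ = toy_split_indices(20)
first_train = single_train[0]
```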
What's Changed
- change installed torch version on windows actions again by @shihchengli in https://github.com/chemprop/chemprop/pull/1062
- .pt instead of .ckpt by @twinbrian in https://github.com/chemprop/chemprop/pull/1060
- add ModelCheckpointing to training.ipynb so best model is used automatically by @donerancl in https://github.com/chemprop/chemprop/pull/1059
- Add ray to tests on python 3.12 by @KnathanM in https://github.com/chemprop/chemprop/pull/1064
- `v2.1` Feature: Replicates Instead of Cross Validation Folds by @JacksonBurns in https://github.com/chemprop/chemprop/pull/994
- disable libuv with env var rather than avoiding latest torch by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1065
- Add new example notebook for active learning by @joelnkn in https://github.com/chemprop/chemprop/pull/910
- Fix: splits column is a string not a list by @KnathanM in https://github.com/chemprop/chemprop/pull/1074
- Update chemprop to v2.1 in https://github.com/chemprop/chemprop/pull/1038
- This PR included the following PRs:
- Rerun notebooks for v2.1 by @KnathanM in #1067
- Refactor with torchmetrics by @KnathanM in #1022
- update train docs for v2.1 by @KnathanM in #1069
- Disable batch norm by default by @jonwzheng in #1058
- Add notebook showing interoperability of Chemprop featurizer w/other libraries by @jonwzheng in #1063
- Add tracking metric options; make metrics ModuleList; other improvements by @KnathanM in #1020
- Remove old validate-loss-function function by @KnathanM in #1023
- V2: Uncertainty implementation in #937
- This PR included the following PRs:
- Improve the docstring for uncertainty modules by @shihchengli in #986
- Add Platt calibrator by @KnathanM in #961
- Add dropout and ensemble predictors by @joelnkn in #970
- Add NLL and Spearman Uncertainty Evaluators by @am2145 in #984
- Add quantile regression by @shihchengli in #963
- Add miscalibration area and ence evaluators by @shihchengli in #1012
- Add isotonic calibrators by @KnathanM in #1053
- V2 conformal calibrators by @shihchengli in #989
- V2 conformal evaluators by @shihchengli in #1005
- Uncertainty regression calibrators (non-conformal) by @shihchengli in #1055
- Adding Evidential, MVE, and Binary Dirichlet Uncertainty Predictors by @akshatzalte in #1061
- Cleanup the uncertainty modules by @shihchengli in #1072
- Multiclass dirichlet give uncertainty by @KnathanM in #1066
- Rename uncertainty estimator by @KnathanM in #1070
- Update uncertainty notebook by @shihchengli in #1071
- Add uncertainty quantification to the predict CLI by @shihchengli in #1010
Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.5...v2.1.0
Published by shihchengli over 1 year ago
chemprop - v2.0.5
We continue to enhance and improve the functionality and usability of Chemprop. If there are things you'd like to see addressed in a future update, please open an issue or PR.
Core code changes
We discovered that our Noam learning rate scheduler does not match what was originally proposed. The current scheduler does work well though, so it was decided to not change the definition. Instead the scheduler was renamed and refactored to be more clear. By @shihchengli in https://github.com/chemprop/chemprop/pull/975
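For readers unfamiliar with the distinction, the sketch below contrasts the schedule from "Attention Is All You Need" with a linear-warmup, exponential-decay schedule. The second function is an assumption about the general shape of Chemprop's scheduler, not a copy of its code, and all parameter names and default values are illustrative.

```python
def original_noam(step, d_model=512, warmup_steps=4000):
    """Schedule from "Attention Is All You Need": linear warmup, then decay as step**-0.5."""
    step = max(step, 1)  # avoid raising zero to a negative power
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def warmup_exp_decay(step, init_lr=1e-4, max_lr=1e-3, final_lr=1e-4,
                     warmup_steps=100, total_steps=1000):
    """Linear warmup from init_lr to max_lr, then exponential decay to final_lr.

    Assumed shape of the Chemprop-style schedule; names and defaults are illustrative.
    """
    if step <= warmup_steps:
        return init_lr + (max_lr - init_lr) * step / warmup_steps
    # Per-step decay factor chosen so the rate reaches final_lr at total_steps.
    gamma = (final_lr / max_lr) ** (1 / (total_steps - warmup_steps))
    return max_lr * gamma ** (step - warmup_steps)


for step in (0, 50, 100, 500, 1000):
    print(step, f"{warmup_exp_decay(step):.2e}")
```

Both schedules warm up linearly; they differ in the decay phase (inverse square root versus exponential) and in whether the start and end learning rates are set explicitly.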
Work on uncertainty quantification methods revealed that our previous prediction tensor return dimensions would cause difficulty down the line. Now we have placed uncertainty into a separate dimension. By @hwpang in https://github.com/chemprop/chemprop/pull/959
The BinaryDirichletFFN and MulticlassDirichletFFN predictors were added early in the v2 development, but not tested. Now they have been tested and corrected. By @shihchengli in https://github.com/chemprop/chemprop/pull/1017
The RDKit 2D molecular featurizer was added back by popular demand. The versions used in v1 are available as well as a version that uses all available molecular features in rdkit.Chem.Descriptors. By @KnathanM in https://github.com/chemprop/chemprop/pull/877
CLI changes
- Log statistical summary of training, validation, and test datasets by @donerancl in https://github.com/chemprop/chemprop/pull/882
- Change the default verbose level to INFO by @shihchengli in https://github.com/chemprop/chemprop/pull/953
- Save both probabilities and class label for multiclass classification by @shihchengli in https://github.com/chemprop/chemprop/pull/987
- Add `--remove-checkpoints` flag to opt out of saving checkpoints by @shihchengli in https://github.com/chemprop/chemprop/pull/1014
- Add `--class-balance` flag to `train` CLI by @shihchengli in https://github.com/chemprop/chemprop/pull/1011
- Save target column names in model for use at inference by @hwpang in https://github.com/chemprop/chemprop/pull/935
- Fix `save-smiles-splits` not working with rxn. columns as column header by @jonwzheng in https://github.com/chemprop/chemprop/pull/998
Transfer learning
- Add new example notebook for transfer learning by @joelnkn in https://github.com/chemprop/chemprop/pull/904
- Use pre-train output scaler to scale training data in CLI by @KnathanM in https://github.com/chemprop/chemprop/pull/1051
- Add `--checkpoint` and `--freeze-encoder` flags in train CLI for transfer learning by @shihchengli in https://github.com/chemprop/chemprop/pull/1007
Documentation
- Fixed typos in CLI reference and standardized formatting by @donerancl in https://github.com/chemprop/chemprop/pull/880
- Example Notebook for Classification by @twinbrian in https://github.com/chemprop/chemprop/pull/1047
- Improve frzn-ffn-layers description and update doc for transfer learning by @oscarwumit in https://github.com/chemprop/chemprop/pull/993
- add transform tests by @KnathanM in https://github.com/chemprop/chemprop/pull/955
- Add documentation for how to use a separate splits file (CLI) by @KnathanM in https://github.com/chemprop/chemprop/pull/1041
Other small bug fixes
- Convert v1 models trained on GPU by @KnathanM in https://github.com/chemprop/chemprop/pull/978
- Fix `hpopting` Notebook and CLI for Windows by @JacksonBurns in https://github.com/chemprop/chemprop/pull/1034
- Update multiclass data to be compatible with rdkit 2024.09.1 by @jonwzheng in https://github.com/chemprop/chemprop/pull/1037
- Define `task_weights` if it is `None` in `MulticlassClassificationFFN` by @shihchengli in https://github.com/chemprop/chemprop/pull/988
- change installed torch version on windows actions again by @KnathanM in https://github.com/chemprop/chemprop/pull/1016
- Update batch norm freezing to freeze running stats by @joelnkn in https://github.com/chemprop/chemprop/pull/952
- Pass `map_location` through `load_submodules()` to `torch.load()` by @shihchengli in https://github.com/chemprop/chemprop/pull/1029
- fix no-header-rows in predict command error by @sunhwan in https://github.com/chemprop/chemprop/pull/1001
New Contributors
- @sunhwan made their first contribution in https://github.com/chemprop/chemprop/pull/1001
- @twinbrian made his first contribution in https://github.com/chemprop/chemprop/pull/1047
Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.4...v2.0.5
Published by shihchengli over 1 year ago
chemprop - v2.0.4
Enhancements and New Features
This release introduces several enhancements and new features to Chemprop. A notable addition is a new notebook demonstrating Monte Carlo Tree Search for model interpretability (see here). Enhancements have been made to the output transformation and prediction saving mechanisms for MveFFN and EvidentialFFN. Additionally, users can now perform predictions on CPU even if the models were trained on GPU. Users are now also warned when not using the TensorBoard logger, helping them to be aware of available logging tools for better monitoring.
Bug Fixes
Several bugs have been fixed in this release, including issues related to Matthews Correlation Coefficient (MCC) metrics and loss calculations, and the behavior of the CGR featurizer when the bond features matrix is empty. The task_weights parameter has been standardized across all loss functions and moved to the correct device for MCC metrics, preventing device mismatch errors.
What's Changed
- Standardize `task_weights` in `LossFunction` across all loss functions by @shihchengli in https://github.com/chemprop/chemprop/pull/941
- Improve output transformation and prediction saving for `MveFFN` and `EvidentialFFN` by @shihchengli in https://github.com/chemprop/chemprop/pull/943
- Enable CPU prediction for GPU-trained models by @snaeppi in https://github.com/chemprop/chemprop/pull/950
- Fix Issues in MCC Metrics and Loss Calculations by @shihchengli in https://github.com/chemprop/chemprop/pull/942
- Fix docs building by pinning sphinx-argparse by @jonwzheng in https://github.com/chemprop/chemprop/pull/964
- Add Monte Carlo Tree search notebook for interpretability by @hwpang in https://github.com/chemprop/chemprop/pull/924
- Fix CGR featurizer behavior when bond features matrix is empty by @jonwzheng in https://github.com/chemprop/chemprop/pull/958
- Fix Failing CI for `torch==2.4.0` on Windows `ray[tune]` Tests by @JacksonBurns in https://github.com/chemprop/chemprop/pull/971
- warn users when not using tensorboard logger by @JacksonBurns in https://github.com/chemprop/chemprop/pull/967
- Bug: Move `task_weights` to 'device' for MCC metrics by @YoochanMyung in https://github.com/chemprop/chemprop/pull/973
New Contributors
- @snaeppi made their first contribution in https://github.com/chemprop/chemprop/pull/950
- @YoochanMyung made their first contribution in https://github.com/chemprop/chemprop/pull/973
Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.3...v2.0.4
Published by shihchengli over 1 year ago
chemprop - v2.0.3
Notable changes
The mfs argument of MoleculeDatapoint was removed in #876. This argument accepted functions which generated molecular features to use as extra datapoint descriptors. When using chemprop in a notebook, users should first manually generate their molecule features and pass them into the datapoints using x_d which stands for (extra) datapoint descriptors. This is demonstrated in the extra_features_descriptors.ipynb notebook under examples. CLI users will see no change as the CLI will still automatically calculate molecule features using user specified featurizers. The --features-generators flag has been deprecated though in favor of the more descriptive --molecule-featurizers. Available molecule features can be found in the help text generated by chemprop train -h.
The default aggregation was changed to norm in #946. This was meant to be changed in version 2.0.0, but got missed. Norm aggregation was used in all the benchmarking of version 1 as it performs better than mean aggregation when predicting properties that are extensive in the number of atoms.
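The effect on extensive properties can be seen in a small plain-Python sketch. It assumes norm aggregation means "sum over atoms divided by a fixed constant"; the constant 100 and the helper names are assumptions for illustration, not Chemprop's implementation.

```python
def mean_aggregate(atom_states):
    """Average the per-atom hidden states (a list of equal-length vectors)."""
    n_atoms = len(atom_states)
    return [sum(column) / n_atoms for column in zip(*atom_states)]


def norm_aggregate(atom_states, norm=100.0):
    """Sum the per-atom hidden states and divide by a fixed constant."""
    return [sum(column) / norm for column in zip(*atom_states)]


# Two hypothetical molecules whose atoms all share the same hidden state,
# differing only in size (3 atoms versus 6 atoms).
atom = [0.5, 2.0]
small, large = [atom] * 3, [atom] * 6

print(mean_aggregate(small), mean_aggregate(large))  # same vector for both sizes
print(norm_aggregate(small), norm_aggregate(large))  # scales with the atom count
```

Mean aggregation gives both molecules the same representation, so a downstream predictor cannot tell them apart by size. The fixed divisor keeps the size signal while still keeping values in a modest range.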
More documentation for the CLI hpopt and fingerprint commands has been added and can be viewed here and here.
The individual predictions of an ensemble of models are now automatically averaged and the individual predictions are saved in a separate file. #919
What's Changed
- Change the installed numpy version in pyproject by @shihchengli in https://github.com/chemprop/chemprop/pull/922
- Explicitly double save scalers/criterion by @KnathanM in https://github.com/chemprop/chemprop/pull/898
- Add `--show-individual-scores` CLI flag by @shihchengli in https://github.com/chemprop/chemprop/pull/920
- Set Ray Train's trainer resources to 0 by @hwpang in https://github.com/chemprop/chemprop/pull/928
- Save individual and average predictions into different files by @shihchengli in https://github.com/chemprop/chemprop/pull/919
- Add CLI pages for hpopt and fingerprint by @jonwzheng in https://github.com/chemprop/chemprop/pull/914
- Make fingerprint CLI consistent with predict CLI by @hwpang in https://github.com/chemprop/chemprop/pull/927
- Fix issue related to target column for fingerprint by @hwpang in https://github.com/chemprop/chemprop/pull/939
- build molecule featurizer in parsing by @KnathanM in https://github.com/chemprop/chemprop/pull/875
- Remove featurizing from datapoint by @KnathanM in https://github.com/chemprop/chemprop/pull/876
- change aggregation default to norm by @KnathanM in https://github.com/chemprop/chemprop/pull/946
- Use mol.GetBonds() instead of for loop by @KnathanM in https://github.com/chemprop/chemprop/pull/931
Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.2...v2.0.3
Published by KnathanM over 1 year ago
chemprop - v2.0.2 Adding Document Modules and hpopt Enhancement
In this release, we have included numerous notebooks to document modules. Chemprop may be used in python scripts, allowing for greater flexibility and control than the CLI. We recommend first looking through some of the worked examples to get an overview of the workflow. Then further details about the creation, customization, and use of Chemprop modules can be found in the module tutorials.
New CLI Features
Improved --model-path CLI
Previously --model-path could take either a single model file or a directory containing model files. Now it can take any combination of checkpoint files (.ckpt), model files (.pt), and directories containing model files. Directories are recursively searched for model files (.pt). Chemprop will use all models given and found to make predictions (#731).
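The lookup rule described above can be sketched with pathlib. The helper collect_model_paths and the file names are hypothetical; this is one way to express the rule, not the code Chemprop uses.

```python
from pathlib import Path
import tempfile


def collect_model_paths(paths):
    """Expand a mix of .ckpt files, .pt files, and directories into a flat list of model files."""
    found = []
    for entry in map(Path, paths):
        if entry.is_dir():
            # Directories are searched recursively for .pt files only.
            found.extend(sorted(entry.rglob("*.pt")))
        elif entry.suffix in {".pt", ".ckpt"}:
            found.append(entry)
        else:
            raise ValueError(f"unsupported model path: {entry}")
    return found


# Build a throwaway directory tree with hypothetical file names to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ensemble" / "replicate_0").mkdir(parents=True)
    (root / "ensemble" / "replicate_0" / "best.pt").touch()
    (root / "ensemble" / "notes.txt").touch()
    (root / "single.ckpt").touch()
    models = collect_model_paths([root / "single.ckpt", root / "ensemble"])
    model_names = [m.name for m in models]

print(model_names)
```

Every file in the resulting list would then contribute one model to the ensemble used for prediction.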
Improvements for hpopt CLI
Some flags related to Ray Tune (i.e., --raytune-temp-dir, --raytune-num-cpus, --raytune-num-gpus, and --raytune-max-concurrent-trials) have been added. You can use the CLI to initiate your Ray instance using these flags. (#918)
Bug fix
An incorrect max learning rate was used when writing the config file after hyperparameter optimization. This is now fixed (#913).
What's Changed
- Fix typos in docstrings and .rst files that led to rendering errors by @jonwzheng in https://github.com/chemprop/chemprop/pull/901
- Add CLI transition guide link to RTD by @kevingreenman in https://github.com/chemprop/chemprop/pull/907
- Add meaningful warning for warm up epoch search space by @hwpang in https://github.com/chemprop/chemprop/pull/909
- Fixing small bug in hpopt for learning rate by @akshatzalte in https://github.com/chemprop/chemprop/pull/913
- Add notebooks to document modules by @KnathanM in https://github.com/chemprop/chemprop/pull/834
- V2: consolidate `--checkpoint` CLI by @hwpang in https://github.com/chemprop/chemprop/pull/731
- Improvements for hpopt cli by @hwpang in https://github.com/chemprop/chemprop/pull/918
Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.1...v2.0.2
- Python
Published by shihchengli over 1 year ago
chemprop - v2.0.1 First Patch
New CLI Features
Caching in CLI
MolGraphs are now created (by featurizing molecules) and cached at the beginning of training by default in the CLI. If you wish to disable caching, you can use the --no-cache flag, which will featurize molecules on the fly instead. (#903)
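The trade-off between the two modes can be illustrated with a toy dataset class. This is a sketch under assumptions, not Chemprop's dataset code: ToyMolGraphDataset and its featurize callable are hypothetical stand-ins.

```python
class ToyMolGraphDataset:
    """Toy dataset contrasting eager caching with on-the-fly featurization.

    Hypothetical class for illustration; not part of the Chemprop API.
    """

    def __init__(self, smiles, featurize, cache=True):
        self.smiles = list(smiles)
        self.featurize = featurize
        # With caching, every molecule is featurized once, up front.
        self._cache = [featurize(s) for s in self.smiles] if cache else None

    def __len__(self):
        return len(self.smiles)

    def __getitem__(self, index):
        if self._cache is not None:
            return self._cache[index]
        # Without caching, the molecule is featurized again on every access.
        return self.featurize(self.smiles[index])
```

Caching trades memory for time: each molecule is featurized once instead of once per epoch, which is why disabling it may be preferable for datasets that do not fit in memory.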
Change the default trial scheduler in HPO
We changed the default trial scheduler for HPO from AsyncHyperBand to FIFO, as it is the default in Ray and was used in version 1. You can switch the trial scheduler back to AsyncHyperBand by using --raytune-trial-scheduler AsyncHyperBand if needed. (#896)
Support Optuna in HPO
You can use Optuna as an HPO search algorithm via --raytune-search-algorithm optuna. (#888)
CLI Bug Fixes
HPO-related bugs
In #873, we changed the search space for the initial and final learning rate ratios and max_lr to avoid very small (~10^-10) learning rates and also ensured that some hyperparameters are saved as integers instead of floating-point numbers (e.g., batch_size). In #881, we addressed the bug concerning the incompatibility of the saved config file with the training config. In #836, we shut down Ray processes after HPO completion to avoid zombie processes. For those encountering issues with Ray processes, we suggest you start Ray outside of the Python process.
DDP-related bugs
In #884, we resolved the issue where metrics were not synchronized across processes and disabled the distributed sampler during testing in DDP.
Backwards incompatibility note
In #883, we fixed the bug related to unused parameters in DDP. Models created via the CLI in v2.0.0 without additional atomic descriptors cannot be used via the CLI in v2.0.1. You will need to first remove message_passing.W_d.weight and message_passing.W_d.bias from the model file's state_dict to make it compatible with the current version.
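A minimal sketch of the key removal follows. It operates on a plain dictionary so that it runs without PyTorch; the toy keys other than the two named above are illustrative, and the helper name drop_unused_parameters is hypothetical.

```python
# Keys named in the note above; everything else is left untouched.
UNUSED_KEYS = ("message_passing.W_d.weight", "message_passing.W_d.bias")


def drop_unused_parameters(state_dict):
    """Return a copy of state_dict without the unused W_d parameters."""
    return {k: v for k, v in state_dict.items() if k not in UNUSED_KEYS}


# Toy stand-in for a v2.0.0 state_dict (values would be tensors in practice).
old_state = {
    "message_passing.W_i.weight": [0.1],
    "message_passing.W_d.weight": [0.2],
    "message_passing.W_d.bias": [0.3],
}
new_state = drop_unused_parameters(old_state)
print(sorted(new_state))
```

With a real model file, the contents would typically be loaded with torch.load, the state_dict entry replaced with the filtered copy, and the result written back with torch.save. The exact layout of the file is an assumption here and should be checked against your own model file.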
What's Changed
- update v2 installation instructions page in docs by @kevingreenman in https://github.com/chemprop/chemprop/pull/831
- Remove Ray zombie processes by @shihchengli in https://github.com/chemprop/chemprop/pull/836
- Docker images for v2 by @JacksonBurns in https://github.com/chemprop/chemprop/pull/841
- Change Docker sytnax for MyBinder compatibility by @JacksonBurns in https://github.com/chemprop/chemprop/pull/872
- [V2] Fix featurizer cli by @hwpang in https://github.com/chemprop/chemprop/pull/865
- Fix hyperparameter predictorbase by @c-w-feldmann in https://github.com/chemprop/chemprop/pull/832
- V2: Add all notebooks to test by @hwpang in https://github.com/chemprop/chemprop/pull/840
- Fix small bugs in hpopt by @akshatzalte in https://github.com/chemprop/chemprop/pull/873
- Add pip setup step to environment.yml install instructions by @cjmcgill in https://github.com/chemprop/chemprop/pull/889
- Avoid scrambling target column name order by @JacksonBurns in https://github.com/chemprop/chemprop/pull/893
- Fix unused parameters issue in DDP by @shihchengli in https://github.com/chemprop/chemprop/pull/883
- Fix the inference issue related to the target columns by @shihchengli in https://github.com/chemprop/chemprop/pull/895
- Change the default trial scheduler to FIFOScheduler by @shihchengli in https://github.com/chemprop/chemprop/pull/896
- Add Optuna support for HPO by @shihchengli in https://github.com/chemprop/chemprop/pull/888
- Fix Circular Import with isort by @JacksonBurns in https://github.com/chemprop/chemprop/pull/887
- make LookupAction work with ConfigArgParse by @KnathanM in https://github.com/chemprop/chemprop/pull/900
- V2: Fix typo in hpopt installation instruction by @hwpang in https://github.com/chemprop/chemprop/pull/897
- V2: Make hpopt config compatible with training config by @hwpang in https://github.com/chemprop/chemprop/pull/881
- Fix DDP prediction and checkpoint Issues by @shihchengli in https://github.com/chemprop/chemprop/pull/884
- Add simple cache to CLI by @KnathanM in https://github.com/chemprop/chemprop/pull/903
- V2: Fix small hpopt bugs and add example notebook by @hwpang in https://github.com/chemprop/chemprop/pull/842
New Contributors
- @akshatzalte made his first contribution in https://github.com/chemprop/chemprop/pull/873
Full Changelog: https://github.com/chemprop/chemprop/compare/v2.0.0...v2.0.1
- Python
Published by shihchengli over 1 year ago
chemprop - v2.0.0 Stable Release
This is the first stable release of Chemprop v2.0.0, with updates since the v2.0.0-rc.1 release candidate in early March.
The primary objectives of v2.0.0 are making Chemprop more usable from within Python scripts, more modular, easier to maintain and develop, more compute/memory efficient, and usable with PyTorch Lightning. Some features will not be migrated from v1 to v2 (e.g. web, sklearn). Some v1 features will be added in later versions of v2 (v2.1+) (e.g. uncertainty, interpret, atom- and bond-targets); see milestones here. The new version also has substantially faster featurization speeds and much higher unit test coverage, enables training on multiple GPUs, and works on Windows (in addition to Linux and Mac). Finally, the incorporation of a batch normalization layer is expected to result in smoother training and improved predictions. We encourage all Chemprop users to try using v2.0.0 to see how it can improve their workflows.
v2 documentation can be found here.
There are v2 tutorial notebooks in the examples/ directory.
A helpful transition guide from Chemprop v1 to v2 can be found here. This includes a side-by-side comparison of CLI argument options, a list of which arguments will be implemented in later versions of v2, and a list of changes to default hyperparameters.
Note that if you install from source, the primary branch of our repository has been renamed from master to main.
Due to limited development team bandwidth, Chemprop v1 will no longer be actively developed, so that we can focus our efforts on v2. Bug reports and questions about v1 are still welcome to benefit users who haven't yet made the switch to v2, but the reported bugs will not be fixed by the development team.
Please let us know of any bugs you find, questions you have, or enhancements you want in Chemprop v2 by opening an issue.
- Python
Published by kevingreenman almost 2 years ago
chemprop - Final Patch for Version 1
This is the final release of chemprop v1. All future development will be done on chemprop v2. The development team is still happy to answer questions about v1, but no new feature requests or PRs for v1 will be accepted. Users who identify bugs in v1 are still encouraged to open issues to report them - they will be tagged as v1-wontfix to signify that we won't be publishing fixes for them in official chemprop releases, but the bugs can still be open to community discussion.
We encourage all users to try migrating their workflows over to chemprop v2 (available now as a release candidate, stable version planned to be released within the next week) and let us know of any issues you encounter. All v1 releases will remain available on PyPI, and the v1 source code will remain available in this GitHub organization.
What's Changed
- fix the uncal_vars for atom/bond property prediction by @shihchengli in https://github.com/chemprop/chemprop/pull/712
- [v1]: Add Docker Image Building Action and Official Images to DockerHub by @JacksonBurns in https://github.com/chemprop/chemprop/pull/718
- remove macos and windows from v1 ci by @JacksonBurns in https://github.com/chemprop/chemprop/pull/720
- update docker build if to use correct upstream branch name by @JacksonBurns in https://github.com/chemprop/chemprop/pull/723
- fix the task names by @shihchengli in https://github.com/chemprop/chemprop/pull/725
- Fixed typo in README.md by @willspag in https://github.com/chemprop/chemprop/pull/745
New Contributors
- @willspag made their first contribution in https://github.com/chemprop/chemprop/pull/745
Full Changelog: https://github.com/chemprop/chemprop/compare/v1.7.0...v1.7.1
- Python
Published by kevingreenman almost 2 years ago
chemprop - v2.0.0 Release Candidate
This is a release candidate for Chemprop v2.0.0, to be released in April 2024.
The primary objectives of v2.0.0 are making Chemprop more usable from within Python scripts, more modular, easier to maintain and develop, more compute/memory efficient, and usable with PyTorch Lightning. Some features will not be migrated from v1 to v2 (e.g. web, sklearn). Some v1 features will be added in later versions of v2 (v2.1+) (e.g. uncertainty, interpret, atom- and bond-targets); see milestones here. The new version also has substantially faster featurization speeds and much higher unit test coverage, enables training on multiple GPUs, and works on Windows (in addition to Linux and Mac). Finally, the incorporation of a batch normalization layer is expected to result in smoother training and improved predictions. The label as a “release candidate” reflects its availability to be downloaded via PyPI and that only minor changes are expected for the Python API before the final release. We expect most remaining changes before the release of v2.0.0 in April to be focused on additional improvements to the command line interface (CLI), which does not yet have feature parity with v1. We encourage all Chemprop users to try using v2.0.0-rc.1 to see how it can improve their workflows.
The v2 documentation can be found here.
There are tutorial notebooks for v2 in the examples/ directory.
A helpful transition guide from v1 to v2 can be found here. This includes a side-by-side comparison of CLI argument options, a list of which arguments will be implemented in later versions of v2, and a list of changes to default hyperparameters.
You can subscribe to our development status and notes for this version: https://github.com/chemprop/chemprop/issues/517.
Ongoing work for this version is available on the v2/dev branch.
Please let us know of any bugs you find by opening an issue.
- Python
Published by kevingreenman about 2 years ago
chemprop - Conformal Calibration
What's Changed
- new split per molecular weight by @soulios in https://github.com/chemprop/chemprop/pull/456
- Specify license for Chemprop logos by @mliu49 in https://github.com/chemprop/chemprop/pull/461
- Add todo.md by @davidegraff in https://github.com/chemprop/chemprop/pull/492
- Update authors list in license file and alphabetically sort by @cjmcgill in https://github.com/chemprop/chemprop/pull/532
- update authors in LICENSE and setup files for v1 by @kevingreenman in https://github.com/chemprop/chemprop/pull/533
- Fix Transpose bug in Inequality Regression by @cjmcgill in https://github.com/chemprop/chemprop/pull/308
- Add Dirichlet Evidential Uncertainty Quantification by @cjmcgill in https://github.com/chemprop/chemprop/pull/423
- New metrics by @soulios in https://github.com/chemprop/chemprop/pull/542
- Updating README with ADMET-AI details by @swansonk14 in https://github.com/chemprop/chemprop/pull/554
- Improve error message when gilbrat is needed. by @KnathanM in https://github.com/chemprop/chemprop/pull/569
- limit chempropv1 python version to 3.7, 3.8 only by @JacksonBurns in https://github.com/chemprop/chemprop/pull/618
- Add a CITATIONS.bib by @JacksonBurns in https://github.com/chemprop/chemprop/pull/627
- Limit Maximum Allowed flask Version in v1 by @JacksonBurns in https://github.com/chemprop/chemprop/pull/628
- move numunctasks definition to ensure always defined by @kevingreenman in https://github.com/chemprop/chemprop/pull/632
- Switching np.mean to np.nanmean to handle NaN metrics by @swansonk14 in https://github.com/chemprop/chemprop/pull/453
- Fix the dtype for targets of different sizes by @shihchengli in https://github.com/chemprop/chemprop/pull/638
- Add setters for atom and bond constraints by @shihchengli in https://github.com/chemprop/chemprop/pull/637
- switch v1 readthedocs build from conda to mamba by @kevingreenman in https://github.com/chemprop/chemprop/pull/660
- Fix v1 docs theme by @kevingreenman in https://github.com/chemprop/chemprop/pull/669
- Conformal Calibration by @danielxu9393 in https://github.com/chemprop/chemprop/pull/304
- add note on feature releases and instructions for ssl+ddp by @JacksonBurns in https://github.com/chemprop/chemprop/pull/685
- remove unnecessary argument for reshape function by @shihchengli in https://github.com/chemprop/chemprop/pull/671
- Fix atom/bond property prediction with atom-mapped SMILES and target classification by @shihchengli in https://github.com/chemprop/chemprop/pull/673
- Pass num_workers to MoleculeDataLoader during interpretation by @kevingreenman in https://github.com/chemprop/chemprop/pull/691
- conformal quantile prediction bug fix by @shihchengli in https://github.com/chemprop/chemprop/pull/693
New Contributors
- @soulios made their first contribution in https://github.com/chemprop/chemprop/pull/456
- @danielxu9393 made their first contribution in https://github.com/chemprop/chemprop/pull/304
Full Changelog: https://github.com/chemprop/chemprop/compare/v1.6.1...v1.7.0
- Python
Published by kevingreenman about 2 years ago
chemprop - Bug fix for reaction atom mapping
Bug fix
PR #383 unexpectedly broke the atom mapping for reaction mode. The issue is described in Issue #426 and fixed by PR #427.
What's Changed
- Fix versioning issues - metadata and dependencies by @kevingreenman in https://github.com/chemprop/chemprop/pull/420
- add job to tests action for PyPI package by @JacksonBurns in https://github.com/chemprop/chemprop/pull/422
- added chemprop manuscript to readme by @hesther in https://github.com/chemprop/chemprop/pull/425
- Keep Support for Python 3.7 and 3.8 when fixing gilbrat Issue by @JacksonBurns in https://github.com/chemprop/chemprop/pull/431
- Fix reaction atom mapping by @shihchengli in https://github.com/chemprop/chemprop/pull/427
Full Changelog: https://github.com/chemprop/chemprop/compare/v1.6.0...v1.6.1
- Python
Published by kevingreenman over 2 years ago
chemprop - Atomic/bond targets prediction
Major New Features
- Atomic/bond targets prediction by @shihchengli in https://github.com/chemprop/chemprop/pull/280
What's Changed
- Replace multiclass mcc with 1-mcc for loss by @cjmcgill in https://github.com/chemprop/chemprop/pull/332
- Add chemprop logo by @shihchengli in https://github.com/chemprop/chemprop/pull/339
- Add CodeQL workflow for GitHub code scanning by @lgtm-com in https://github.com/chemprop/chemprop/pull/344
- Add to the description of evidential regularization by @cjmcgill in https://github.com/chemprop/chemprop/pull/353
- Remove deprecated numpy float types by @cjmcgill in https://github.com/chemprop/chemprop/pull/357
- Correct a bug in ENCE uncertainty evaluation by @cjmcgill in https://github.com/chemprop/chemprop/pull/360
- Hyperopt Parallel Race Conditions and Manual Trial Load by @cjmcgill in https://github.com/chemprop/chemprop/pull/307
- Simplified install with PyPI rdkit and git install in setup.py by @JacksonBurns in https://github.com/chemprop/chemprop/pull/364
- Allow providing both loaded features and a features generator by @shihchengli in https://github.com/chemprop/chemprop/pull/318
- For any multiclass task, make_predictions fails if option --individual_ensemble_predictions is on. by @piotr-semenov in https://github.com/chemprop/chemprop/pull/354
- Save loaded molecular features into .npy files by @shihchengli in https://github.com/chemprop/chemprop/pull/337
- Ignore invalid atom-mapped SMILES by @shihchengli in https://github.com/chemprop/chemprop/pull/367
- Molecule fingerprinting with invalid SMILES in list by @shihchengli in https://github.com/chemprop/chemprop/pull/351
- change calibrationfeaturespath from str to List[str] by @ceroth in https://github.com/chemprop/chemprop/pull/358
- Change logo style by @shihchengli in https://github.com/chemprop/chemprop/pull/369
- Clamp evidential 'v' parameter by @kevingreenman in https://github.com/chemprop/chemprop/pull/371
- fix colab demo by @kevingreenman in https://github.com/chemprop/chemprop/pull/368
- Avoid OverflowError when setting field size to sys.maxsize by @shihchengli in https://github.com/chemprop/chemprop/pull/373
- Set atom and bond constraints when loading model by @shihchengli in https://github.com/chemprop/chemprop/pull/374
- Readme updates by @kevingreenman in https://github.com/chemprop/chemprop/pull/385
- Remove atom map numbers for scaffold splits by @shihchengli in https://github.com/chemprop/chemprop/pull/383
- update bug report template - ask for full stack trace by @kevingreenman in https://github.com/chemprop/chemprop/pull/401
- Fix t-SNE script by @kevingreenman in https://github.com/chemprop/chemprop/pull/403
- Fixing skipped lines in csv writing when using a windows computer by @cjmcgill in https://github.com/chemprop/chemprop/pull/406
Full Changelog: https://github.com/chemprop/chemprop/compare/v1.5.2...v1.6.0
- Python
Published by kevingreenman over 2 years ago
chemprop - Flexible hyperparameter search, missing uncertainty target values, evaluation of different magnitude multitask targets, empty test set assignment, and DockerFile updates
Features
Flexible hyperparameter search space
The parameters to be included in hyperparameter optimization can now be selected using the argument --search_parameter_keywords {list-of-keywords}. The parameters supported are: activation, aggregation, aggregation_norm, batch_size, depth, dropout, ffn_hidden_size, ffn_num_layers, final_lr, hidden_size, init_lr, max_lr, warmup_epochs. Some special keywords are also included for groups of keywords or different search behavior: basic, learning_rate, all, linked_hidden_size.
PR #299
Missing targets in uncertainty calibration datasets
Added capabilities to the uncertainty calibration and evaluation methods to allow them to handle missing target values in multitask jobs. This capability was already included in the normal training of models and is now implemented in uncertainty calibration and evaluation as well. PR #295 Issue #292
Multitask evaluation for tasks of different magnitudes
When evaluation metrics tend to scale with the magnitude of a task (e.g., rmse), the arithmetic average of metrics across tasks has been replaced with a geometric mean. This makes the average metric in multitask regression jobs less dominated by large-magnitude targets. This was previously an issue for hyperparameter optimization and for selecting the optimal epoch during model training; the loss for gradient descent is calculated on scaled targets and was already not scale dependent. PR #290
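A short numeric sketch shows why the geometric mean helps. The RMSE values below are made up for illustration, and the helper functions are not Chemprop's own.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # Computed in log space; requires strictly positive metric values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Illustrative per-task RMSE values for two targets of very different scale.
task_rmse = [0.1, 100.0]
print(arithmetic_mean(task_rmse))  # dominated by the large-magnitude task
print(geometric_mean(task_rmse))   # both tasks contribute on a relative scale
```

With the geometric mean, halving the error on either task changes the aggregate by the same factor, whereas with the arithmetic mean an improvement on the small-magnitude task is barely visible.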
Empty test set allowed
An empty test split can now be used during training. This was previously possible only using the cv-no-test split method, but now it is available more widely when specifying split sizes, for example with --split_sizes 0.8 0.2 0.
PR #284, #260 related
Issue #279
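As a rough illustration of how split sizes with a zero test fraction translate into index ranges, consider the sketch below. The function split_indices is hypothetical and performs a simple ordered split; it is not the splitting code used by Chemprop.

```python
def split_indices(num_datapoints, split_sizes):
    """Cut range(num_datapoints) into train/val/test by cumulative fractions."""
    train_frac, val_frac, test_frac = split_sizes
    if abs(train_frac + val_frac + test_frac - 1.0) > 1e-9:
        raise ValueError("split sizes must sum to 1")
    train_end = int(train_frac * num_datapoints)
    val_end = int((train_frac + val_frac) * num_datapoints)
    indices = list(range(num_datapoints))
    # A test fraction of 0 leaves the final slice empty.
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


train, val, test = split_indices(10, (0.8, 0.2, 0.0))
print(len(train), len(val), len(test))
```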
Updates to conda environment and docker file
Conda environment building will now prefer to use the pytorch channel over the conda-forge channel. The Dockerfile has been updated to use micromamba, allowing for faster environment solves than conda and removing a potential licensing issue. PR #276
Bug Fixes
Fix MCC loss for multiclass jobs
Corrected a calculation problem in the loss function that was returning infinite loss inappropriately. Also adopted the convention of returning loss of zero when infinite loss is returned, as often happens in very unbalanced datasets. Added appropriate unit testing. PR #309 Issue #306
Correct code error in ence uncertainty evaluation
Corrects an error in the ence uncertainty evaluation method that made that method unusable. Bug was introduced during PR #305. PR #302 Issue #301
Fixed link to MoleculeNet website
Corrected the link to the MoleculeNet benchmark dataset website in the readme, following MoleculeNet migrating to a new site location. PR #296
Multitarget uncertainty calibration mve weighting method
Previously, this method only worked for single-task jobs; it has now been extended to work for multitask models as well. PR #291
Remove unused version.py file
Version tracking in Chemprop no longer uses the version.py file and it was removed. PR #283
Multiclass argument typo in readme
Corrected a typo where the number of classes used in multiclass classification should have been indicated as --multiclass_num_classes.
PR #281
Repair individual ensemble predictions
Refactoring of prediction file during the addition of uncertainty functions disabled the option to return the individual predictions of each member of an ensemble of models. Option is now available again. PR #274
- Python
Published by cjmcgill over 3 years ago
chemprop - Quick Fix to Uncertainty Evaluation
Bugfix
Inconsistent Path For Uncertainty Evaluation
Fixed a bug in uncertainty evaluation where the uncertainty evaluator was using the path name originally used to train a checkpoint. This made the uncertainty evaluator only work in the case that the test data and training data used in initial model training had the same path.
- Python
Published by cjmcgill almost 4 years ago
chemprop - Uncertainty Functions, Reaction-Solvent Models, Loss Function Options, Keyed Splitting, and Chemprop Colab Demo
Features
Uncertainty Tools
Tools added for uncertainty quantification, calibration, and evaluation as part of the chemprop predict function. Uncertainty predictions are saved as part of the predictions file. Uncertainty functions and outputs are triggered using the argument --uncertainty_method {method}.
Uncertainty outputs can be calibrated using an outside dataset (evaluation set from training is often suitable) in order to have better uncertainty estimates on new predictions. Can be activated using --calibration_method {method} and --calibration_path {path-to-csv}. For the regression dataset type, a calibrated output can provide either a standard deviation or one-sided interval bound, as set with the options --regression_calibrator_metric {stdev-or-interval} and --calibration_interval_percentile {int}.
If the data file containing smiles for the test path also contains target values, the uncertainty performance can be evaluated using various metrics, activated with the option --evaluation_methods {list-of-methods}.
Internally, this PR creates several classes for carrying out prediction tasks: UncertaintyEstimator, UncertaintyPredictor, UncertaintyCalibrator, UncertaintyEvaluator. Loss functions have been added that have auxiliary uncertainty outputs, mve and evidential for regression.
PR #267
PR #269
Reaction-Solvent Option
Gives the option to train a chemprop model using one reaction and one molecule for each datapoint. Active when used with the option --reaction_solvent. Options for making the solvent mpnn use different parameters than that for the reaction are possible using --bias_solvent, --hidden_size_solvent {int}, and --depth_solvent {int}.
PR #246
Multimolecule Fingerprinting
Added some new changes for fingerprint functions with multiple molecules. Models trained with a "shared-mpn" between two molecules can return a MPN fingerprint with only one molecule provided. Also, when multiple molecule models are used for MPN fingerprint generation, the output will indicate which molecule each element belongs to. PR #242 Issue #236
Colab Notebook Examples
Created a Jupyter notebook that runs examples of Chemprop jobs, specifically as the functions can be used in python. Good resource for new users, demonstrations, or tutorials. Linked to Google Colab so that it can be run remotely, not requiring any local install of Chemprop. PR #239 PR #273
Loss Function Options
Previously, loss functions were selected automatically based on the dataset type being used in model training. Now the loss function can be selected with --loss_function {function}. Some new specialty loss functions have been added with this capability.
* Matthews Correlation Coefficient (mcc) is a loss function for classification and multiclass that considers True Positives, True Negatives, False Positives, False Negatives separately in the loss function, avoiding domination by one class and making it well suited to unbalanced training sets.
* Bounded Mean Squared Error (bounded_mse) is a regression loss function that allows for training targets expressed as inequalities, e.g. ">5.0". Intended for use with experimental data with delimited ranges.
* Mean Variance Estimation (mve) and evidential loss are regression loss functions that maximize the likelihood of the target on an estimated uncertainty distribution. When used as loss functions, the outputs of these functions can be used in uncertainty estimation.
Appropriate metrics have been added along with these loss functions.
PR #238
PR #267
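To make the bounded MSE idea concrete, here is a minimal pure-Python sketch. It assumes targets arrive as strings such as ">5.0", "<2.0", or "3.1", and that a prediction already satisfying an inequality contributes zero error; the function names are hypothetical and this is not Chemprop's tensor-based implementation.

```python
def parse_target(raw):
    """Split a target such as ">5.0" into (bound_type, value)."""
    raw = raw.strip()
    if raw[0] in "<>":
        return raw[0], float(raw[1:])
    return "=", float(raw)


def bounded_squared_error(prediction, raw_target):
    kind, value = parse_target(raw_target)
    # A prediction that already satisfies the inequality incurs no penalty.
    if kind == ">" and prediction > value:
        return 0.0
    if kind == "<" and prediction < value:
        return 0.0
    # Otherwise the error is measured against the bound (or the exact target).
    return (prediction - value) ** 2


def bounded_mse(predictions, raw_targets):
    errors = [bounded_squared_error(p, t) for p, t in zip(predictions, raw_targets)]
    return sum(errors) / len(errors)
```

For a target of ">5.0", a prediction of 6.0 is treated as consistent with the data, while a prediction of 4.0 is penalized by its squared distance to the bound.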
Development Environment
GitHub Addons
Added a CONTRIBUTING.md file with guidelines for how users can contribute to Chemprop. New templates are now available for issue submission that distinguish between different issue types: bug report, feature request, and questions. New templates also suggested for PRs. Templates stored in the .github directory.
PR #241
Unit Testing
Part of an ongoing effort to include a more complete set of automated tests for Chemprop. Unit tests added for data utils, uncertainty-related loss functions, and the uncertainty evaluation metrics. PR #232 PR #267 PR #269
Flake8 Formatting
Ongoing effort to standardize the formatting of incoming code. New PRs now request/require the new code to be flake8 compliant in formatting. The utils module and files significantly associated with the new uncertainty function are flake8 compliant. PR #241 PR #258 PR #267
Update Versioning
Changed the way that version numbers are stored and updated throughout the code. PR #247
Remove Assertion Errors
Removed many of the assertion errors throughout Chemprop and replaced them with more easily interpretable error types and messages. PR #257
Bug Fixes
Hyperopt Version Fix
Changed the way that random seeds are passed into hyperopt during hyperparameter optimization to avoid an error where hyperopt stopped supporting a previously supported way of passing numpy seeds. PR #245 Issue #243 Issue #254 Issue #264
- Python
Published by cjmcgill almost 4 years ago
chemprop - Prediction function output options, multi-molecule splitting, and explicit H atoms in message passing
Features
Allow the inclusion of H atoms in message passing
Default model behavior is to treat H atoms implicitly with their neighbors. With the previously existing argument --explicit_h, explicit H atoms included in the SMILES string would be considered during message passing. This PR adds a new argument --adding_h, which makes all H atoms be treated explicitly during message passing.
PR #225 and #227
Allow splitting by different key molecules in multi-molecule models
The data-splitting methods scaffold_balanced and random_with_repeated_smiles can only consider one molecule per datapoint in adhering to the constraints of which data must share splits with each other. This PR creates an argument --split_key_molecule {int}, which is used to select which molecule in multi-molecule datasets will be used for the splitting determination.
PR #230
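The role of the key molecule can be sketched as a grouping step: datapoints that share the same key molecule are kept in the same split. The function below is a simplified, hypothetical illustration that groups on identical SMILES strings (closer in spirit to random_with_repeated_smiles than to scaffold-based splitting); only the parameter name split_key_molecule comes from the text above.

```python
import random
from collections import defaultdict


def split_by_key_molecule(datapoints, split_key_molecule=0, train_fraction=0.8, seed=0):
    """Assign whole groups of datapoints (sharing the key molecule) to train or validation."""
    groups = defaultdict(list)
    for index, smiles_list in enumerate(datapoints):
        # Only the molecule at position split_key_molecule decides the grouping.
        groups[smiles_list[split_key_molecule]].append(index)

    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    train, val = [], []
    target_train_size = train_fraction * len(datapoints)
    for key in keys:
        # Fill the training set first; a group is never divided between splits.
        fits = len(train) + len(groups[key]) <= target_train_size
        (train if fits else val).extend(groups[key])
    return train, val


# Each datapoint holds two molecules, e.g. a solute and a solvent.
data = [["CCO", "O"], ["CCO", "CC#N"], ["CCN", "O"], ["c1ccccc1", "O"], ["CCN", "CC#N"]]
train, val = split_by_key_molecule(data, split_key_molecule=0)
```

Choosing a different index changes which molecule's identity is protected from leaking across splits.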
Select split fractions when separate test data is provided
Previously, the split fractions for training/validation were hardcoded as 80/20 when test data was provided via --separate_test_path. Split fractions can now be specified in this case using --split_sizes as normal.
PR #230
Additional output options for make_predictions function
This change affects usage of make_predictions as a python function, rather than in the whole Chemprop workflow. When used as a python function, make_predictions would return the predictions for a set of SMILES, but would skip the invalid SMILES without indicating which ones were skipped. Now this function has two new optional arguments: 1) return_invalid_smiles, which includes invalid SMILES in the output but with "Invalid SMILES" as the prediction value, and 2) return_index_dict, which returns the predictions of the model in a dictionary keyed to the original data indices.
PR #235
New utility functions for identifying invalid SMILES
New functions have been added to chemprop/data/utils.py to allow users to identify datapoints that have invalid SMILES. These functions are get_invalid_smiles_from_file and get_invalid_smiles_from_list.
PR #235
Bug Fixes
Simultaneous use of extra atom features and extra bond features
Bug prevented using extra atom features and extra bond features at the same time and has been resolved. PR #215 Issue #213
Fixed install error with newer versions of pip
Newer versions of pip failed to install some chemprop dependencies properly. These dependencies (flake8, pytest, parameterized) are now installed as part of the conda environment rather than by pip. Also, the environment build for testing was changed from conda to mamba for better install speed. PR #215 and #216
Correction in tutorial file
Tutorial file changed to show the proper list of lists format for SMILES. PR #218
Predicting for a multiclass model with an improper SMILES
When making a prediction for an improper SMILES in a multiclass model, an error would be triggered instead of returning a prediction of "Invalid SMILES". This has been corrected for this case and the parallel case of improper SMILES used with --individual_ensemble_predictions.
PR #229
Molecule fingerprints generated with extra atom features
Molecule fingerprints could not be predicted when extra atom features were provided as part of the model. This and the parallel issue with extra bond features have been addressed. PR #234 Issue #233
- Python
Published by cjmcgill about 4 years ago
chemprop - Model preloading, hyperparameter optimization improvements, spectra training, latent representations, and more.
Features
Spectra training
Introduces spectra as a new dataset type available for training, in which each target in a multitarget regression refers to a positive intensity value in one position of a spectrum. Training methods are consistent with https://github.com/gfm-collab/chemprop-IR. Default loss function is spectral information divergence (SID), but Wasserstein loss (earthmover distance) is also supported with --metric wasserstein --alternative_loss_function wasserstein.
PR #197
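For reference, spectral information divergence between two normalized spectra p and q is commonly written as the symmetric sum of p*ln(p/q) and q*ln(q/p) over all positions. The sketch below implements that formula in plain Python; the normalization step and the small positive floor (threshold) are assumptions made to keep the logarithms finite, not values taken from Chemprop.

```python
import math


def normalize(spectrum, threshold=1e-8):
    # Clamp to a small positive floor, then scale so the spectrum sums to 1.
    clipped = [max(x, threshold) for x in spectrum]
    total = sum(clipped)
    return [x / total for x in clipped]


def sid(predicted, target):
    """Spectral information divergence between two intensity vectors."""
    p, q = normalize(predicted), normalize(target)
    return sum(pi * math.log(pi / qi) + qi * math.log(qi / pi) for pi, qi in zip(p, q))
```

The divergence is zero for identical spectra, symmetric in its arguments, and grows as intensity is placed at positions where the other spectrum has little.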
Preloading model in predictions
Refactored make_predictions into smaller functions to make chemprop functions easier to use as a python library. The refactoring is specifically designed to allow a model to be loaded a single time using the function chemprop.train.load_model and then used for multiple rounds of predictions by passing that model as an argument to chemprop.train.make_predictions.
PR #200
Improved hyperparameter optimization
Added several new features to hyperparameter optimization, many related to hyperparameter checkpoints saved in the location specified by --hyperopt_checkpoint_dir <dir_path>. The new functionalities:
* Restarting failed hyperparameter optimization jobs by selecting the same checkpoint directory.
* Parallelizing multiple instances of hyperparameter optimization by setting a shared checkpoint directory among instances.
* Seeding hyperparameter optimizations with previously run jobs by indicating an old checkpoint directory and/or by specifying the save directories of relevant jobs trained with train.py using --manual_trial_dirs <list-of-directories>.
* Manually setting the number of hyperparameter trials that use randomized parameters before directed TPE search begins using --startup_random_iters <int, default=10>.
PR #208
Return results from all ensemble models
When making predictions from an ensemble of models, returns the mean prediction but also the individual predictions from the individual models when --individual_ensemble_predictions is specified.
PR #190
Latent representations for ensembles and from FFN layers
Allows for the calculation of latent fingerprints from an ensemble of models by concatenating them together. Also allows for the return of either a latent representation from the MPNN output or from the next-to-last FFN layer using the argument --fingerprint_type <MPN or last_FFN>.
PR #193
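Concatenating the latent fingerprints from an ensemble can be pictured as follows. This is a minimal sketch with a hypothetical function name and nested-list inputs; it is not taken from the Chemprop code.

```python
def concat_fingerprints(per_model_fingerprints):
    """Join each molecule's latent vectors from every model into one long vector.

    per_model_fingerprints[m][i] is the latent vector of molecule i from model m.
    """
    return [
        [value for vector in vectors for value in vector]
        for vectors in zip(*per_model_fingerprints)
    ]
```

With two models that each produce a 2-dimensional vector per molecule, every molecule ends up with a 4-dimensional fingerprint.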
Target imputation for sklearn multitask models
Sklearn multitask training cannot proceed when targets are missing from the data; previously, such data had to be run as multiple single-task models. This PR introduces target imputation so that multitask sklearn training works even when some data is missing, with the argument --impute_mode <model/linear/median/mean/frequent> indicating which imputation method to use.
PR #210
Issue #211
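The three statistical imputation modes can be illustrated with the standard library. This sketch uses a hypothetical helper and does not cover the model and linear modes; how Chemprop implements any of the modes is not shown here.

```python
from statistics import fmean, median, mode


def impute_column(values, how="median"):
    """Replace None entries in one target column with a statistic of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": fmean, "median": median, "frequent": mode}[how](observed)
    return [fill if v is None else v for v in values]
```

Each target column is imputed independently, so a molecule that is missing one target can still contribute its other targets to multitask training.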
Reaction balancing
Adds options in reaction training for how to handle situations where reactants and products are not balanced. The argument --reaction_mode now also has the options reac_diff_balance, prod_diff_balance, and reac_prod_balance (in addition to the current options reac_diff, prod_diff, and reac_prod). Also fixes an error where atomic numbers are incorrect when an atom is present in the products but not in the reactants.
PR #212
Issue #204
Bug Fixes
Interactions with git repos
Resolves a problem with TAP (typed-argument-parser) where running Chemprop from inside a different git repo would trigger an error related to the generation of a reproducibility hash. In this situation the reproducibility hash is not generated, but it logs the issue and does not stop Chemprop from running. PR #195
Global features structure
Changes the way that global variables related to model construction and feature vector size are handled. Resolves a problem in pytest where these variables wouldn't reset between runs. PR #206
- Python
Published by cjmcgill over 4 years ago
chemprop - Resume interrupted training, frozen layer pre-training, target/data weighted training, and more
Features
Resume training on multiple folds if interrupted
As training progresses through the folds of a multiple-fold model, the results of each individual fold are stored in a JSON file. If training is interrupted and the flag --resume_experiment is used, the completed fold results are read from the JSON file and training resumes on the first uncompleted fold.
PR #164
Frozen layers for pre-training
Added functionality to freeze the MPN or FFN layers in a model being trained at the values of a previously trained model. Freezes MPN values using a model indicated with --checkpoint_frzn <path>. FFN layers will also be frozen if indicated with --frzn_ffn_layers <number-of-layers>. Models with multiple molecules can select to only freeze the first molecule MPN using --freeze_first_only.
PR #170
tSNE functionality
Added HDBScan clustering to the tSNE script. PR #172
Weighted training by target and by datapoint
Added training weights for different targets and different datapoints, with normalization of weight values. Target weights indicated with the argument --target_weights <list-of-values>. Data weights supplied through an input file indicated with the argument --data_weights_path <path>.
PR #173, #175, #189
Issue #145
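One way to picture target and datapoint weighting is a squared-error loss in which every term is scaled by both weights. The sketch below is an assumption-laden illustration: the function names are hypothetical, and normalizing target weights to a mean of 1 is an assumed convention rather than a documented one.

```python
def normalize_weights(weights):
    """Rescale positive weights so that their mean is 1 (assumed normalization)."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


def weighted_squared_error(preds, targets, target_weights, data_weights):
    """Mean squared error with each term scaled by its target and datapoint weight."""
    tw = normalize_weights(target_weights)
    total, count = 0.0, 0
    for row_pred, row_true, dw in zip(preds, targets, data_weights):
        for p, t, w in zip(row_pred, row_true, tw):
            total += dw * w * (p - t) ** 2
            count += 1
    return total / count
```

Normalizing the weights keeps the overall loss scale comparable to unweighted training, so the learning rate does not need to change when weights are introduced.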
Bug Fixes
MPNN input
Providing SMILES or RDKit molecules to the MPN's forward function failed (only BatchMolGraph worked) following other changes. Now, SMILES and RDKit molecules can once again be used as input.
PR #164
Backwards compatibility with old checkpoints
Backwards compatibility for features scaling PR #164 Issue #108
Updated readme
Added information to the readme and documentation of pre-training, treatment of missing values in multitask models and caching. PR #165 Issue #156
Multiclass classification
Corrected error when using the metric accuracy with multiclass classification.
PR #169
RDKit Compatibility
Bugfix for compatibility issues of RDKit 2021.03.01 with the interpretation script. PR #182 Issue #178
- Python
Published by cjmcgill over 4 years ago
chemprop - Custom atom/bond features, epistemic uncertainty, reaction option, bug fix for atom/bond features
New Features
Custom atom/bond features
Enabled custom input of atom and bond features, either in addition to or instead of the default features.
PR: https://github.com/chemprop/chemprop/pull/137
Epistemic uncertainty
Introduced the argument --ensemble_variance which calculates the epistemic uncertainty of predictions via an ensemble of models.
PR: https://github.com/chemprop/chemprop/pull/140
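The idea behind ensemble variance is that disagreement between independently trained models serves as an estimate of epistemic uncertainty. A minimal sketch follows; the function name is hypothetical and the use of the population variance is an assumption.

```python
from statistics import fmean, pvariance


def ensemble_summary(predictions_per_model):
    """Return per-molecule mean and variance across the models of an ensemble.

    predictions_per_model[m][i] is the prediction of model m for molecule i.
    """
    per_molecule = list(zip(*predictions_per_model))
    means = [fmean(values) for values in per_molecule]
    variances = [pvariance(values) for values in per_molecule]
    return means, variances
```

A molecule on which all models agree gets a variance of zero, while a molecule with scattered predictions gets a larger value.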
Reaction option
Introduced the CGR (condensed graph of reaction) option: input of atom-mapped reaction SMILES instead of molecules. This creates a pseudo-molecule of the graph transition state between reactants and products and performs message passing on this pseudo-molecule.
PR: https://github.com/chemprop/chemprop/pull/152
Latent representation
Added functionality that computes the latent representation of a molecule (the MPNN output) and saves it to file. It is used in a similar way to predicting with a given checkpoint file.
PR: https://github.com/chemprop/chemprop/pull/119
Preprocessing updates
Updates to the preprocessing, handling and saving of smiles strings. Removed redundant checks.
PR: https://github.com/chemprop/chemprop/pull/135
Resume experiments
Experiments with multiple folds can now be resumed using the --resume_experiment flag. Additionally, the test results of each fold are saved as a JSON file in the corresponding subfolder in save_dir.
PR: https://github.com/chemprop/chemprop/pull/164
Bug Fixes
Atom messages
Major bugfix for running Chemprop with the argument --atom_messages, where the wrong features were passed to the MPNN. This improves the performance of Chemprop in atom_messages mode, and causes backwards incompatibility with old checkpoint files if created in atom_messages mode. Since Chemprop is mainly used for directed message passing via bond messages, we hope not many users are affected.
Issue: https://github.com/chemprop/chemprop/issues/133 PR: https://github.com/chemprop/chemprop/pull/138
Backwards compatibility with old checkpoints
Backwards compatibility for correctly setting recently introduced training arguments for old models.
Issue: https://github.com/chemprop/chemprop/issues/148 and https://github.com/chemprop/chemprop/issues/108 PR: https://github.com/chemprop/chemprop/pull/149 and PR: https://github.com/chemprop/chemprop/pull/164
Sklearn scores
Bugfix in training sklearn models: Scores were not saved correctly previously.
PR: https://github.com/chemprop/chemprop/pull/162
Data split script
Bugfix in a standalone script to create data splits: Multi-molecule input had previously created incompatibilities with passing data to the scaffold split functionality. Update of docstring.
Issue: https://github.com/chemprop/chemprop/issues/158 PR: https://github.com/chemprop/chemprop/pull/159
MPNN sanity check
Bugfix for sanity checks for dimensions of batches within the MPNN forward pass: The introduction of multi-molecule input had previously caused an inconsistency in one of the checks.
Issue: https://github.com/chemprop/chemprop/issues/153 PR: https://github.com/chemprop/chemprop/pull/154
MPNN type annotations
Bugfix for type annotation in the MPNN forward pass + update of docstring.
PR: https://github.com/chemprop/chemprop/pull/151 and PR: https://github.com/chemprop/chemprop/pull/164
Tanimoto distance
Bugfix for calculating Tanimoto distances. The introduction of multi-molecule input had previously caused incompatibilities in the standalone script to find similar molecules in the training data.
Issue: https://github.com/chemprop/chemprop/issues/143 PR: https://github.com/chemprop/chemprop/pull/144
README typos
Fixed typos for a few arguments in the README
PR: https://github.com/chemprop/chemprop/pull/139
Sanitize script
Bugfix in standalone script sanitize.py - open output file with write access.
RDKit molecule caching
Bugfix for creating RDKit molecules from smiles strings. Previously the molecules were recreated even though they were already cached.
PR: https://github.com/chemprop/chemprop/pull/152
Saving SMILES
Bugfix for error occurring when --save_smiles_splits is used in conjunction with --separate_test_path. Now, the data split csv files are still generated, but split_indices.pkl is not generated if there are multiple data points with the same SMILES or if some of the data comes from a separate data file.
Issue: https://github.com/chemprop/chemprop/issues/157 PR: https://github.com/chemprop/chemprop/pull/163
SMILES/mols as input to MPNN
Bugfix for SMILES or RDKit molecules as input to MPNN model instead of BatchMolGraph.
PR: https://github.com/chemprop/chemprop/pull/164
- Python
Published by swansonk14 almost 5 years ago
chemprop - New split type, cleaner predictions file, backward compatibility, bug fixes, and testing improvements
Features
New split type
The split type --split_type cv already existed to perform k-fold cross-validation (where k is set by --num_folds). In each fold, 1/k of the data is put in the test set, 1/k of the data is put in the validation set, and the remaining (k-2)/k of the data is put in the training set.
Now, a new split type --split_type cv-no-test exists which is essentially identical except that it assigns no data to the test set on each fold (https://github.com/chemprop/chemprop/commit/b56ca9866b303036eab61cab93188cccbaa24af2). Instead, 1/k of the data is put in the validation set and (k-1)/k of the data is put in the training set with no test data. The purpose of this split type is to maximize the training data when training a model in cases where the test performance is already known (or is not important) and doesn't need to be determined. Note that the validation set is still necessary to perform early stopping.
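The difference between the two split types can be sketched with index lists. This is an illustration only: the function name is hypothetical, the data is not shuffled, and the choice of which chunk serves as validation or test on a given fold is an assumption.

```python
def cv_folds(n_items, k, use_test=True):
    """Yield (train, val, test) index lists for each of k folds.

    use_test=True mimics cv; use_test=False mimics cv-no-test.
    """
    indices = list(range(n_items))
    chunks = [indices[i::k] for i in range(k)]
    for fold in range(k):
        val_id = fold
        test_id = (fold + 1) % k if use_test else None
        val = chunks[val_id]
        test = chunks[test_id] if use_test else []
        train = [
            i
            for c, chunk in enumerate(chunks)
            if c not in (val_id, test_id)
            for i in chunk
        ]
        yield train, val, test
```

With 10 datapoints and k = 5, the cv variant trains on 6 datapoints per fold, while the cv-no-test variant trains on 8.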
Dropping extra columns during prediction
Previously, when using predict.py, all the columns from the test_path file were copied to the preds_path file and then the predictions were added as additional columns at the end. Now there is an option called --drop_extra_columns which will not copy over these extraneous columns to preds_path (https://github.com/chemprop/chemprop/commit/83ea4c06dda4231902777ea6776da922aeba2ad3 and https://github.com/chemprop/chemprop/commit/061339568045863c30c9bd8c2a143b674a0082d8). When --drop_extra_columns is used, preds_path will only contain columns with the SMILES and with the prediction values.
Bug Fixes
Backward compatibility for load_checkpoint
Previously, newer versions of Chemprop incorrectly loaded checkpoints that were trained using older versions of Chemprop due to a change in the names of the parameters. Backward compatibility has now been added to allow this version of Chemprop to load checkpoints with either set of names (https://github.com/chemprop/chemprop/commit/5371b29e7c65e41fa8b83d9c76ba2bfdd400b139 and https://github.com/chemprop/chemprop/commit/206950c6ec92a3646800f95bc69ae6d8dc7ca646).
Saving SMILES splits
Due to new Chemprop features such as the ability to load multiple molecules, the feature --save_smiles_splits, which saves the SMILES corresponding to the train, validation, and test splits, had broken (https://github.com/chemprop/chemprop/issues/110). This was fixed in https://github.com/chemprop/chemprop/pull/117.
Fixing interpret.py
Similar to the issue with saving SMILES splits, interpret.py broke due to the Chemprop feature that enables multiple molecules to be used as input (https://github.com/chemprop/chemprop/issues/107 and https://github.com/chemprop/chemprop/issues/113). This was fixed in https://github.com/chemprop/chemprop/pull/128.
Updating Dockerfile
The Dockerfile has been updated to address https://github.com/chemprop/chemprop/issues/100 and https://github.com/chemprop/chemprop/issues/129. This was fixed in https://github.com/chemprop/chemprop/pull/131.
Fixing atom descriptors
The atom_descriptors feature did not work in predict.py (https://github.com/chemprop/chemprop/issues/120). This was fixed in https://github.com/chemprop/chemprop/pull/114.
Logging
Logging to the terminal and to files (quiet.log and verbose.log in the save_dir) broke on some operating systems (https://github.com/chemprop/chemprop/issues/106). This was fixed in https://github.com/chemprop/chemprop/pull/118.
README additions
Some of the relatively new features, like custom atomic features, were missing from the README (https://github.com/chemprop/chemprop/issues/121). This was fixed in https://github.com/chemprop/chemprop/pull/122.
Infrastructure Changes
Migrating from Travis CI to GitHub Actions
Chemprop previously used Travis CI to run automated tests upon pushing to master or creating a pull request, but Travis changed its pricing structure and no longer offers unlimited free testing. For this reason, Chemprop now uses GitHub Actions to run automated tests. The results of the test runs can be seen in the Actions tab of the repo.
- Python
Published by swansonk14 about 5 years ago
chemprop - Multiple Molecules, Custom Atom Features, and More
Features
Multiple Input Molecules
[PR] Use multiple molecules as an input to chemprop. The number of molecules is specified with the keyword number_of_molecules. Those molecules are embedded with a separate D-MPNN by default. The latent representations are concatenated prior to the FFN.
The keyword mpn_shared allows you to use a shared D-MPNN. Note that, since the latent representations are concatenated, the order of the input molecules is important. This method is not invariant to that order, and there are better ways to use multiple molecules with a shared D-MPNN, which will be implemented for the next release.
Custom Atom Features
[PR] Implemented custom atomic features as a counterpart of the custom molecular features in ChemProp. The new feature allows users to provide additional atomic features for each node in a given molecule. To use the feature, use the keyword atom_descriptors. The custom atom features can be employed in two modes. In the first mode, --atom_descriptors feature, custom features are used as normal node features, which are concatenated to the default node vector before the D-MPNN block. In the second mode, --atom_descriptors descriptor, custom atom features do not participate in the model until the atom feature vector has been updated through the D-MPNN block. That is, in --atom_descriptors descriptor mode the extra custom atom features are not altered by message passing, so their information is retained to the maximum extent.
The extra custom descriptors can be put into ChemProp through a variety of pickle files (.pkl, .pickle, .pckl), Numpy save file (.npz), or a .sdf file.
.pkl format
The .pkl file must store a Pandas DataFrame with smiles as index and columns as descriptors. All descriptors must be a 1D numpy array or 2D numpy array. For example:
1 custom atomic feature for each atom provided in a 1D array
smiles descriptors
CCOc1ccc2nc(S(N)(=O)=O)sc2c1 [0.637781931055927, 0.7075571757878132, 0.7339...
CCN1C(=O)NC(c2ccccc2)C1=O [0.09588231301387817, 0.6521911050735447, 0.45...
Multiple atomic features for each atom provided in multiple 1D arrays
smiles desc1 desc2
CCOc1ccc2nc(S(N)(=O)=O)sc2c1 [0.637781931055927, 0.7075571757878132... [0.8266363223032338, 0.89641156703512 ...
CCN1C(=O)NC(c2ccccc2)C1=O [0.09588231301387817, 0.6521911050735447... [0.2847367042611851, 0.8410454963208516...
Note: mixing 1D and 2D arrays across different columns is not allowed.
.npz file
Atomic descriptors for each molecule must be saved as one independent 2D numpy array ([number of atoms x number of descriptors]) in the .npz file, for example by:
```python
import numpy as np
np.savez('descriptors.npz', *descriptors)
```
where descriptors is a list of 2D arrays of atomic descriptors, in the order of the molecules in the training/predicting data file.
.sdf file
Each molecule is presented as a mol block in the .sdf file. Descriptors should be saved as entries for each mol block in the format of comma-separated values. Each molecule must have an entry named SMILES that stores the SMILES string. For example:
```
CHEMBL1308_loner5
     RDKit          3D

  6  6  0  0  1  0  0  0  0  0999 V2000
   -0.7579   -0.5337   -2.8744 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2229   -1.3763   -1.7558 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0046   -1.0089   -0.4029 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.4824   -2.0104    0.3280 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.5806   -3.0317   -0.5484 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.1735   -2.6999   -1.8031 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  6  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  1  0
M  END
>  <desc1>  (1)
-8.568031e-05,0.0001865207,-0.0002012379,-5.054658e-05,0.0002148434,-0.0003503839,1.970448e-05,3.081137e-05,2.997883e-05,9.446278e-05,-7.194711e-05,0.0001527364

>  <desc2>  (1)
5.462954e-05,-2.415399e-06,0.0001044788,-2.274438e-05,0.0001698836,5.206409e-06,4.5825e-06,-8.882181e-07,-1.08787e-05,2.993307e-05,-4.069051e-06,1.338413e-05

>  <SMILES>  (1)
Cc1cn[nH]c1

$$$$
```
where the name of descriptor entries desc1, desc2 can be arbitrary.
When using this feature, users are responsible for all atomic feature preprocessing, including feature normalization and expansion.
Note: This feature is intended for small-to-medium-sized training datasets, where extra QM descriptors have been shown to be powerful and to slow the degradation in model performance.
Options for Aggregation Function
[PR] By default, at the end of message passing, the D-MPNN aggregates atom hidden representations into a single hidden representation for the whole molecule by taking the mean of the atom representations. Now, this aggregation function can be changed by using --aggregate <mode>, which currently supports “mean” (the default), “sum”, and “norm” (which is equivalent to “sum” with normalization by the constant specified by --aggregation_norm).
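The three aggregation modes differ only in how the per-atom sum is scaled. A minimal sketch, assuming atom hidden states are given as plain lists and using an illustrative default for the normalization constant:

```python
def aggregate(atom_hiddens, mode="mean", aggregation_norm=100.0):
    """Pool per-atom hidden vectors (one list per atom) into one molecule vector.

    The default aggregation_norm here is an illustrative assumption.
    """
    sums = [sum(column) for column in zip(*atom_hiddens)]
    if mode == "sum":
        return sums
    if mode == "mean":
        return [s / len(atom_hiddens) for s in sums]
    if mode == "norm":
        return [s / aggregation_norm for s in sums]
    raise ValueError(f"unknown aggregation mode: {mode}")
```

Unlike "mean", the "sum" and "norm" modes keep the molecule representation dependent on the number of atoms, which can matter for properties that scale with molecular size.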
Cross-Validation
[commit] The default split type (i.e., --split_type random) randomly samples data into the train, validation, and test sets on each of the num_folds folds independently. This means that the same molecule can end up in the test split on more than one fold. The advantage of this method is that it can be used easily with an arbitrary number of folds, but the downside is that it does not perform strict cross-validation.
The new split type cv (--split_type cv) performs true cross-validation. The data is broken down into num_folds pieces, each of size len(data) / num_folds, and each piece serves as the test split once, the validation split once, and part of the train split on all other folds. The benefit of this method is that it is true cross-validation, but the downside is that the size of each split is dependent on the number of folds, meaning less flexibility (e.g., --num_folds 3 will result in train, validation, and test splits each with 33.3% of the data, which is perhaps too small for the train split and too large for the test split). --num_folds 10 is recommended.
Saving Test Predictions
[commit] The --save_preds option will save predictions on the test split of each fold in a file called “test_preds.csv” in the save_dir.
Multiple Metrics
[commit] The --metric argument still works as before and this is still the metric that is used for early stopping (i.e., selecting the model which performs best on the validation split), but now there is an additional --extra_metrics argument where additional metrics can be specified and will be recorded. The metrics should be space separated (e.g., --extra_metrics mae rmse r2).
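For reference, the three metrics used in the example above can be written out in a few lines. These are the standard textbook definitions with hypothetical function names, not Chemprop's own metric code.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Lower is better for mae and rmse, whereas higher is better for r2, which is worth remembering when comparing the recorded extra metrics.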
Saving Test Scores
[commit] Scores on the test splits are now saved to file in the save_dir under the name “test_scores.csv”.
Fixes and Improvements
Undefined Rows
[commit] Rows in the input data file with target values that are all undefined are now correctly skipped. This is especially relevant when the row may contain some defined target values, but none of those targets are included in target_columns.
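The skipping rule can be illustrated with the standard csv module. The sketch assumes that an undefined target is an empty cell and uses a hypothetical function name and column names.

```python
import csv
import io


def rows_with_targets(csv_text, target_columns):
    """Keep only rows that define at least one of the selected target columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if any(row[c].strip() for c in target_columns)]
```

A row that has a value only in a column outside target_columns is dropped, which is the case highlighted above.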
Data Loading
[commit] Data is now only loaded once to decrease training time.
Tests
[tests] Added more comprehensive tests to ensure correct functionality.
Train Loss
[commit] Fixed incorrect averaging of the train loss, which affects the train loss that is printed to screen and saved in tensorboard.
- Python
Published by swansonk14 over 5 years ago
chemprop - Fixing descriptastorus PyPi issue
Since descriptastorus isn't on PyPi, it can't be installed automatically via pip install chemprop. Instead, it must be installed separately via pip install git+https://github.com/bp-kelley/descriptastorus.
- Python
Published by swansonk14 over 5 years ago
chemprop - Fixing PyPi Installation and Documentation
Fixing an issue with PyPi installation and updating relevant documentation.
- Python
Published by swansonk14 over 5 years ago
chemprop - Release on PyPi
Chemprop is now available on PyPi: https://pypi.org/project/chemprop. Installation instructions are below.
1. conda create -n chemprop python=3.8
2. conda activate chemprop
3. conda install -c conda-forge rdkit
4. pip install git+https://github.com/bp-kelley/descriptastorus
5. pip install chemprop
After installing through PyPi, training and predicting are available via the chemprop_train and chemprop_predict commands, which are equivalent to python train.py and python predict.py. All the command line arguments for training and predicting apply as usual. Please see the README for more details.
- Python
Published by swansonk14 over 5 years ago