Recent Releases of viprs

viprs - v0.1.3

Changed

  • Fixed bugs in the VIPRSGridSearch and VIPRSBMA models, specifically in how they handled _log_var_tau and the hyperparameter objects after selecting the best models or performing model averaging.
  • Fixed bug in how viprs_fit handles validation gdls when the user passes genotype data.
  • Updated interfaces in the HyperparameterSearch script to make it more flexible and efficient. Primarily, I added a shared memory object for the LD matrix to avoid redundant memory usage when fitting multiple models in parallel (** WORK IN PROGRESS **).
  • Updated implementation of pseudo_r2 to use square of pseudo correlation coefficient instead. The previous implementation can be problematic with highly sparsified LD matrices.
  • Updated implementation of VIPRSGrid to be better integrated with the VIPRS class. The new implementation also allows for fitting the grid in a pathwise fashion (now default behavior), where we use parameter estimates from previous grid points as warm-start initialization for the current grid point.
  • Removed the VIPRSGridSearch and VIPRSBMA classes for now. Their functionality is implemented in grid_utils.py instead, and it can be applied generically to any VIPRSGrid model.
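
To make the pseudo_r2 change above concrete, here is a minimal sketch of a squared pseudo-correlation computed from summary statistics. It assumes the common definition in which the covariance between the score and the phenotype is w^T b and the variance of the score is w^T R w. The function name, its arguments, and the dense list-of-rows LD matrix are hypothetical and chosen for illustration; this is not the viprs implementation or its signature.

```python
from math import sqrt


def pseudo_r2_sketch(weights, test_betas, ld):
    """Squared pseudo-correlation between a PRS and a held-out phenotype.

    Hypothetical helper for illustration, not the viprs `pseudo_r2` API.
    weights    : per-variant PRS effect sizes (standardized genotype scale)
    test_betas : standardized marginal effects from the validation GWAS
    ld         : dense LD (correlation) matrix given as a list of rows
    """
    # Numerator: covariance between the PRS and the phenotype, w^T b.
    covariance = sum(w * b for w, b in zip(weights, test_betas))

    # Denominator term: variance of the PRS, w^T R w.
    prs_variance = sum(
        w_i * sum(r_ij * w_j for r_ij, w_j in zip(row, weights))
        for w_i, row in zip(weights, ld)
    )

    # A score with no variance carries no predictive signal.
    if prs_variance <= 0.0:
        return 0.0

    pseudo_corr = covariance / sqrt(prs_variance)
    return pseudo_corr ** 2


example = pseudo_r2_sketch(
    weights=[0.1, 0.1],
    test_betas=[0.15, 0.15],
    ld=[[1.0, 0.5], [0.5, 1.0]],
)
```

Under this definition the result cannot be negative and does not change when all weights are multiplied by the same constant, since the scale cancels between numerator and denominator.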

Added

  • Added viprs-cli-example.ipynb notebook to demonstrate how to use the viprs commandline interface.
  • Added documentation page for Downloading LD matrices.
  • Added new utility function combine_coefficient_tables to combine the output from multiple VIPRS models.
  • Added more thorough tests for the various models + CLI scripts.
  • Added PeakMemoryProfiler to viprs_fit to more accurately track peak memory usage. This is a temporary solution; it will be moved to magenpy later on.
  • Added support for splitting GWAS sumstats to training/validation sets and exposed appropriate interfaces in the base class BayesPRSModel.
  • Added IterationConditionCounter class to keep track of the number of consecutive iterations where a certain condition is met. This is used to monitor convergence of the optimization routine.
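
The idea of counting consecutive iterations in which a condition holds can be sketched as follows. The class name ConsecutiveConditionCounter, its update method, and the helper iterations_until_stable are hypothetical stand-ins written for this example; the actual IterationConditionCounter interface in viprs may look different.

```python
class ConsecutiveConditionCounter:
    """Counts how many iterations in a row a condition has been met.

    Hypothetical stand-in, not the viprs IterationConditionCounter API.
    """

    def __init__(self):
        self.count = 0

    def update(self, condition_met):
        # Extend the streak when the condition holds, otherwise reset it.
        self.count = self.count + 1 if condition_met else 0
        return self.count


def iterations_until_stable(objective_values, tol=1e-3, patience=3):
    """Return the first index at which the objective has changed by less
    than `tol` for `patience` consecutive iterations, or None if never."""
    counter = ConsecutiveConditionCounter()
    for i in range(1, len(objective_values)):
        small_change = abs(objective_values[i] - objective_values[i - 1]) < tol
        if counter.update(small_change) >= patience:
            return i
    return None


trace = [0.0, 1.0, 1.5, 1.5001, 1.5002, 1.5002, 1.5002]
stop_at = iterations_until_stable(trace)
```

Requiring several consecutive small changes, rather than a single one, guards against stopping on an iteration where the objective happens to plateau briefly before moving again.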

- Python
Published by shz9 10 months ago

viprs - v0.1.2

Changed

  • Fixed bug in implementation of .fit method of VIPRS models. Specifically, there was an issue with the continued=True flag not working because the OptimizeResult object wasn't refreshed.
  • Replaced print statements with logging where appropriate (still needs some more work).
  • Updated the way we measure peak memory in viprs_fit.
  • Updated dict_concat to just return the element if there's a single entry.
  • Refactored parts of VIPRS to cache some recurring computations.
  • Updated VIPRSBMA & VIPRSGridSearch to only consider models that successfully converged.
  • Fixed bug in pseudo_metrics when extracting summary statistics data.
  • Streamlined evaluation code.
  • Refactored code to slightly reduce import/load time.
  • Fixed bug in viprs_evaluate.

Added

  • Added SNP position to output table from VIPRS objects.
  • Added measure of time taken to prepare data in viprs_fit.
  • Added option to keep long-range LD regions in viprs_fit.
  • Added convergence check based on parameter values.
  • Added min_iter parameter to .fit methods to ensure CAVI is run for at least min_iter iterations.
  • Added separate method for initializing optimization-related objects.
  • Added regularization penalty lambda_min.
  • Added Spearman R and residualized R-Squared metrics to continuous metrics.
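
How a min_iter floor interacts with a convergence check based on parameter values can be pictured with a generic fixed-point loop. This is a sketch under stated assumptions: the function iterate_until_converged, its x_tol and max_iter arguments, and the toy update rule are all hypothetical, and the loop is not the CAVI routine used by the .fit methods.

```python
def iterate_until_converged(update, params, min_iter=3, max_iter=1000, x_tol=1e-6):
    """Apply `update` repeatedly until the parameters stop moving.

    Hypothetical illustration, not the viprs `.fit` implementation.
    Returns (final_params, iterations_run, converged_flag).
    """
    for n_iter in range(1, max_iter + 1):
        new_params = update(params)

        # Parameter-based check: largest absolute change in any coordinate.
        max_change = max(abs(new - old) for new, old in zip(new_params, params))
        params = new_params

        # Convergence is only accepted once the minimum iteration count is met.
        if n_iter >= min_iter and max_change < x_tol:
            return params, n_iter, True

    return params, max_iter, False


def toy_update(params):
    # Contraction with fixed point 2.0 in every coordinate.
    return [0.5 * x + 1.0 for x in params]


# Starting at the fixed point, the change is zero immediately,
# but the loop still runs for min_iter iterations.
final_params, n_run, converged = iterate_until_converged(toy_update, [2.0], min_iter=5)
```

The floor matters when the first few updates barely move the parameters, for example after a warm start, since a tolerance check alone would stop before the optimizer has had a chance to leave its initialization.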

Additional files

Attached are LD matrices for 6 continental ancestry groups, as defined by the Pan-UKB project. The LD matrices are estimated from unrelated samples in the UK Biobank using block-diagonal masks.

- Python
Published by shz9 about 1 year ago

viprs - v0.1.1

Changed

  • Fixed bugs in the E-Step benchmarking script.
  • Re-wrote the logic for finding BLAS libraries in the setup.py script. :crossed_fingers:
  • Fixed bugs in CI / GitHub Actions scripts.

Added

  • Dockerfiles for both cli and jupyter modes.

- Python
Published by shz9 almost 2 years ago

viprs - v0.1.0

A large scale restructuring of the code base to improve efficiency and usability.

Changed

  • Moved plotting script to its own separate module.
  • Updated some method names / commandline flags to be consistent throughout.
  • Updated the VIPRS class to allow for more flexibility in the optimization process.
  • Removed the VIPRSAlpha model for now. This will be re-implemented in the future, using better interfaces / data structures.
  • Moved all hyperparameter search classes/models to their own directory.
  • Restructured the viprs_fit commandline script to make the code cleaner, do better sanity checking, and introduce process parallelism over chromosomes.

Added

  • Basic integration testing with pytest and GitHub workflows.
  • Documentation for the entire package using mkdocs.
  • Integration testing / automating building with GitHub workflows.
  • New self-contained implementation of E-Step in Cython and C++.
    • Uses OpenMP for parallelism across chunks of variants.
    • Allows for de-quantization on the fly of the LD matrix.
    • Uses BLAS linear algebra operations where possible.
    • Allows model fitting with only
  • Benchmarking scripts (benchmark_e_step.py) to compare computational performance of different implementations.
  • Added functionality to allow the user to track time / memory utilization in viprs_fit.
  • Added OptimizeResult class to keep track of the info/parameters of EM optimization.
  • New evaluation metrics
    • pseudo_metrics has been moved to its own module to allow for more flexibility in evaluation.
    • New evaluation metrics for binary traits: nagelkerke_r2, mcfadden_r2, cox_snell_r2, liability_r2, liability_probit_r2, liability_logit_r2.
    • New function to compute standard errors / test statistics for all R-Squared metrics.
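
For readers unfamiliar with the likelihood-based metrics named above, the sketch below computes McFadden, Cox-Snell, and Nagelkerke pseudo R-Squared values from their textbook definitions. It assumes predicted case probabilities are already available and uses an intercept-only null model at the sample prevalence. The function names and signatures are hypothetical; they are not the viprs functions of similar name, which may take different inputs.

```python
from math import exp, log


def bernoulli_log_likelihood(y_true, probs):
    """Log-likelihood of 0/1 outcomes given predicted case probabilities.
    Probabilities must lie strictly between 0 and 1."""
    return sum(log(p) if y == 1 else log(1.0 - p) for y, p in zip(y_true, probs))


def binary_pseudo_r2_sketch(y_true, probs):
    """Textbook pseudo R-Squared metrics for a binary trait.

    Hypothetical helper for illustration, not the viprs API.
    Requires both cases and controls to be present in `y_true`.
    """
    n = len(y_true)

    # Null model: every sample gets the observed prevalence.
    prevalence = sum(y_true) / n
    ll_null = bernoulli_log_likelihood(y_true, [prevalence] * n)
    ll_full = bernoulli_log_likelihood(y_true, probs)

    mcfadden = 1.0 - ll_full / ll_null
    cox_snell = 1.0 - exp(2.0 * (ll_null - ll_full) / n)
    # Nagelkerke rescales Cox-Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1.0 - exp(2.0 * ll_null / n))

    return {"mcfadden": mcfadden, "cox_snell": cox_snell, "nagelkerke": nagelkerke}


metrics = binary_pseudo_r2_sketch(
    y_true=[0, 0, 1, 1],
    probs=[0.2, 0.2, 0.8, 0.8],
)
```

Because the Cox-Snell value cannot reach 1 for a binary outcome, the Nagelkerke rescaling is often the one reported when a 0 to 1 range is wanted.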

- Python
Published by shz9 almost 2 years ago