Recent Releases of magenpy

magenpy - v0.1.5 pypi release

- Python
Published by shz9 11 months ago

magenpy - v0.1.5

Changed

  • Updated the behavior of the .load method of LDMatrix: by default, it now loads an LDLinearOperator object.
    • The cached loaded data is now stored as an LDLinearOperator object.
  • Moved a lot of the functionality of converting LD data to CSR format to the LDLinearOperator.
  • Removed printing where possible in the package and changed it to use the logging module.
  • Resolved some issues in how the pandas-plink genotype matrix is handled in case of splitting by chromosomes/variants.
  • Removed fill_na from the standardize method in stats.transforms.genotype.
  • Fixed how the package interfaces with tempfile to properly clean up temporary files/directories.
  • Made the tests for LDMatrix a bit more comprehensive.
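
Replacing print calls with the logging module lets callers filter or silence output by level. The following is a minimal sketch using only the standard library; the helper name configure_logging and the logger name "example_pkg" are hypothetical and are not taken from the package.

```python
import io
import logging


def configure_logging(name, level=logging.INFO, stream=None):
    """Attach a single stream handler to the named logger (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep messages out of the root logger
    return logger


# Capture output in memory so the effect of the level filter is visible.
buffer = io.StringIO()
log = configure_logging("example_pkg", level=logging.WARNING, stream=buffer)
log.info("Loading LD matrix")            # suppressed: below WARNING
log.warning("Temporary directory kept")  # emitted
captured = buffer.getvalue()
```

With print, the first message could not be suppressed without editing the calling code; with a logger, the caller only changes the level.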

Added

  • Added preliminary tests to the CLI scripts (magenpy_ld and magenpy_simulate)
  • Support for block iterator for the LDMatrix.
  • rank_one_update for the LDLinearOperator class.
  • Unified method to map variants to genomic blocks map_variants_to_genomic_blocks.
  • Added summary method to LDMatrix to provide a summary of the LD matrix.
  • Added __repr__ and __repr_html__ methods to LDMatrix.
  • Added functionality to allow slicing of LDLinearOperator and outputting subsets of the data as a numpy array directly.
  • Added implementation of the PUMAS procedure for sampling summary data conditional on the LD matrix. Relevant functions:
    • sumstats_train_test_split
    • multivariate_normal_conditional_sampling
  • Added a faster intersection implementation intersect_multiple_arrays.
  • Added preliminary bedReaderGenotypeMatrix to support using the bed-reader package as a backend (still needs more development and testing).
  • Added convenience method setup_logger to set up logging in the package.
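
The notes do not say how rank_one_update is implemented. As a generic illustration of the idea, a rank-one update of a linear operator can be applied to a vector without forming a dense matrix, since (A + alpha * u v^T) x = A x + alpha * (v . x) * u. The class and argument names below are hypothetical and the sketch uses plain Python lists.

```python
class RankOneUpdatedOperator:
    """Applies (A + alpha * u v^T) to a vector without building the matrix.

    `matvec` is any callable that returns A x for a list x (an assumption
    made for this sketch).
    """

    def __init__(self, matvec, u, v, alpha=1.0):
        self.matvec = matvec
        self.u = list(u)
        self.v = list(v)
        self.alpha = alpha

    def dot(self, x):
        base = self.matvec(x)  # A x
        # Scalar coefficient alpha * (v . x)
        coeff = self.alpha * sum(vi * xi for vi, xi in zip(self.v, x))
        return [b + coeff * ui for b, ui in zip(base, self.u)]


def diagonal_matvec(x):
    """Example base operator: A = diag(2, 3)."""
    return [2.0 * x[0], 3.0 * x[1]]


op = RankOneUpdatedOperator(diagonal_matvec, u=[1.0, 1.0], v=[1.0, 2.0])
result = op.dot([1.0, 1.0])
```

The update costs one dot product and one scaled vector addition per application, regardless of how A itself is stored.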

- Python
Published by shz9 11 months ago

magenpy - v0.1.4

Changed

  • Updated the data type for the index pointer in the LDMatrix object to be int64. int32 does not work well for very large datasets with millions of variants and it causes overflow errors.
  • Updated the way we determine the pandas chunksize when converting from plink tables to zarr.
  • Simplified the way we compute the quantization scale in model_utils.
  • Fixed major bug in how LD window thresholds that are passed to plink1.9 are computed.
  • Fixed in-place fillna in from_plink_table in LDMatrix to conform to latest pandas API.
  • Updated run_shell_script to check for and capture errors.
  • Refactored code to slightly reduce import/load times.
  • Cleaned up load_data method of LDMatrix and subsumed functionality in load_rows.
  • Fixed bugs in match_snp_tables.
  • Fixed bugs in, and rewrote, the computation of the block LD estimator for both the plink and xarray backends.
  • Updated from_plink_table method in LDMatrix to handle cases where boundaries are different from what plink computes.
  • Fixed a bug in the symmetrize_ut_csr_matrix utility function.
  • Changed default storage data type for LD matrices to int16.
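
Storing correlations as int16 relies on quantization. The sketch below assumes symmetric scaling of values in [-1, 1] by the largest int16 value. The function names and the scaling rule are illustrative assumptions and may differ from what model_utils does.

```python
INT16_MAX = 2**15 - 1  # 32767


def quantize(values, int_max=INT16_MAX):
    """Map floats in [-1, 1] to integer codes in [-int_max, int_max]."""
    if any(abs(v) > 1.0 for v in values):
        raise ValueError("values must lie in [-1, 1]")
    return [round(v * int_max) for v in values]


def dequantize(codes, int_max=INT16_MAX):
    """Map integer codes back to floats in [-1, 1]."""
    return [c / int_max for c in codes]


correlations = [-1.0, -0.25, 0.0, 0.1234, 1.0]
codes = quantize(correlations)
restored = dequantize(codes)

# The round-trip error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(correlations, restored))
error_bound = 0.5 / INT16_MAX
```

Under this scheme each entry takes 2 bytes instead of the 4 or 8 bytes of a float, at the cost of an absolute error of at most about 1.5e-5 per entry.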

Added

  • Added extra validation checks in LDMatrix to ensure that the index pointer is formatted correctly.
  • LDLinearOperator class to allow for efficient linear algebra operations on the LD matrix without representing the full symmetric matrix in memory.
  • Added utility methods to LDMatrix class to allow for computing eigenvalues, performing SVD, etc.
  • Added spectral properties to the attributes of LD matrices.
  • Added support for slicing/retrieving entries of the LD matrix by SNP rsID.
  • Added support for reading LD matrices from AWS S3 storage.
  • Added utility method to detect if a file contains header information.
  • Added utility method to generate overlapping windows over a sequence.
  • Added compute_extremal_eigenvalues to allow the user to compute extremal (minimum and maximum) eigenvalues of LD matrices.
  • Added the utility function combine_ld_matrices to allow for combining LD matrices from different sources.
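
The LDLinearOperator entry above refers to linear algebra on the LD matrix without holding the full symmetric matrix in memory. A minimal sketch of that general idea is shown here, assuming only the upper triangle (including the diagonal) is stored in CSR form. The function name and data layout are assumptions for illustration, not the class's actual implementation.

```python
def symmetric_matvec(indptr, indices, data, x):
    """Compute y = S x for a symmetric S stored as upper-triangular CSR.

    Each stored off-diagonal entry (i, j) is used twice: once for row i
    and once for the mirrored entry (j, i).
    """
    n = len(indptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            y[i] += data[k] * x[j]
            if j != i:
                y[j] += data[k] * x[i]  # mirrored lower-triangle entry
    return y


# Upper triangle of
# [[1.0, 0.5, 0.0],
#  [0.5, 1.0, 0.2],
#  [0.0, 0.2, 1.0]]
indptr = [0, 2, 4, 5]
indices = [0, 1, 1, 2, 2]
data = [1.0, 0.5, 1.0, 0.2, 1.0]

y = symmetric_matvec(indptr, indices, data, [1.0, 2.0, 3.0])
```

Only five values are stored for a matrix with seven nonzero entries; for banded LD matrices with millions of variants the saving approaches a factor of two.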

- Python
Published by shz9 about 1 year ago

magenpy - v0.1.3

Changed

  • Updated the logic for detect_outliers in phenotype transforms to match the function name (previously, it returned True for inliers).
  • Updated quantize and dequantize to minimize data copying as much as possible.
  • Updated LDMatrix.load_rows() method to minimize data copying.
  • Fixed bug in LDMatrix.n_neighbors implementation.
  • Updated dask version in requirements.txt to avoid installing dask-expr.
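
The detect_outliers fix concerns the polarity of the returned mask: True should mark outliers, not inliers. The notes do not describe the detection rule itself, so the sketch below assumes a simple standard-deviation threshold; the function name flag_outliers and the n_std parameter are hypothetical.

```python
import statistics


def flag_outliers(values, n_std=3.0):
    """Return a boolean mask that is True for outliers (not inliers)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [False] * len(values)  # no spread, so nothing is an outlier
    return [abs(v - mean) > n_std * sd for v in values]


phenotype = [1.0] * 19 + [100.0]
mask = flag_outliers(phenotype)

# Keep only the inliers by negating the mask.
inliers = [v for v, is_outlier in zip(phenotype, mask) if not is_outlier]
```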

Added

  • Added get_peak_memory_usage to system_utils to inspect peak memory usage of a process.
  • Placeholder method to perform QC on SumstatsTable objects (still needs to be implemented).
  • New attached dataset for long-range LD regions.
  • New method in SumstatsTable to impute rsID (if missing).
  • Preliminary support for matching with CHR+POS in SumstatsTable (still needs more work).
  • LDMatrix updates:
    • New method to filter long-range LD regions.
    • New method to prune LD matrix.
  • New algorithm for symmetrizing upper triangular and block diagonal LD matrices.
    • Much faster and more memory efficient than using scipy.
    • The new LDMatrix class loads data efficiently in its .load_data method.
    • We still retain load_rows because it is useful for loading a subset of rows.
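
For the get_peak_memory_usage entry above, one common way to inspect the peak memory of a process on Unix systems is the standard-library resource module. This is a sketch of that approach, not necessarily how system_utils implements it; the function name peak_memory_mb is hypothetical and the resource module is not available on Windows.

```python
import resource
import sys


def peak_memory_mb():
    """Peak resident set size of the current process in megabytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_memory_mb()
scratch = [0.0] * 1_000_000  # allocate some memory
after = peak_memory_mb()
```

Because the value is a high-water mark, it never decreases during the life of the process, so `after` is at least as large as `before` even if `scratch` is later freed.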

- Python
Published by shz9 over 1 year ago

magenpy - v0.1.2

Changed

  • Fixed the Manhattan plot implementation to support various new features.
  • Added a warning when accessing csr_matrix property of LDMatrix when it hasn't been loaded previously.

Added

  • reset_mask method for magenpy LDMatrix.
  • Dockerfiles for both cli and jupyter modes.
  • A helper script to convert LD matrices from old format to new format.

- Python
Published by shz9 almost 2 years ago

magenpy - v0.1.1

  • Fixed bugs in how covariates are processed in SampleTable.
  • Fixed bugs / issues in implementation of GWAS with xarray backend.
  • Streamlined implementation of manhattan plotting function.

- Python
Published by shz9 almost 2 years ago

magenpy - v0.1.0

Small updates / bug fixes to workflow scripts.

- Python
Published by shz9 almost 2 years ago

magenpy - v0.1.0

A large scale restructuring of the code base to improve efficiency and usability.

Changed

  • Bug fixes across the entire code base.
  • Simulator classes have been renamed from GWASimulator to PhenotypeSimulator.
  • Moved plotting script to its own separate module.
  • Updated some method names / command-line flags to be consistent throughout.

Added

  • Basic integration testing with pytest and GitHub workflows.
  • Documentation for the entire package using mkdocs.
  • Integration testing / automating building with GitHub workflows.
  • New implementation of the LD matrix that uses CSR matrix data structures.
    • Quantization / float precision specification when storing LD matrices.
    • Allow user to specify Compressor / Compressor options for Zarr storage.
  • New implementation of magenpy_simulate script.
    • Allow users to set random seed.
    • Now accepts --prop-causal instead of requiring full mixing proportions.
  • Tried to incorporate genome_build into various data structures. This will be useful in the future to ensure consistent genome builds across different data types.
  • Allow user to pass various metadata to magenpy_ld to save information about dataset characteristics.
  • New sumstats parsers:
    • Saige sumstats format.
    • plink1.9 sumstats format.
    • GWAS Catalog sumstats format.
  • Chained transform function for transforming phenotypes.
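
A chained transform applies several phenotype transformations in sequence. The sketch below shows one generic way to compose such functions. The names chain_transforms, center and scale_to_unit_variance are hypothetical and do not reflect the package's own API.

```python
import statistics
from functools import reduce


def chain_transforms(*transforms):
    """Compose transforms left to right: chain(f, g)(x) == g(f(x))."""
    def chained(values):
        return reduce(lambda acc, fn: fn(acc), transforms, values)
    return chained


def center(values):
    """Subtract the mean from every value."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]


def scale_to_unit_variance(values):
    """Divide every value by the population standard deviation."""
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot scale a constant phenotype")
    return [v / sd for v in values]


standardize = chain_transforms(center, scale_to_unit_variance)
standardized = standardize([1.0, 2.0, 3.0, 4.0])
```

Composing small functions this way keeps each step testable on its own and makes the order of operations explicit.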

- Python
Published by shz9 almost 2 years ago