Recent Releases of magenpy
magenpy - v0.1.5
Changed
- Updated behavior of `.load` of `LDMatrix`. Now, by default it loads an `LDLinearOperator` object.
  - Now the cached loaded data is in the form of a `LDLinearOperator` object.
- Moved a lot of the functionality of converting LD data to `CSR` format to the `LDLinearOperator`.
- Removed printing where possible in the package and changed it to use the `logging` module.
- Resolved some issues in how the `pandas-plink` genotype matrix is handled in case of splitting by chromosomes/variants.
- Removed `fill_na` from the `standardize` method in `stats.transforms.genotype`.
- Fixed how the package interfaces with `tempfile` to properly cleanup temporary files/directories.
- Made the tests for `LDMatrix` a bit more comprehensive.
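For readers unfamiliar with the `tempfile` cleanup pattern mentioned above, here is a minimal standard-library sketch. It does not reproduce magenpy's internals; the `ld_demo_` prefix and the file name are made up for illustration.

```python
import os
import tempfile

# The context manager removes the directory and its contents on exit,
# even if the body raises an exception.
with tempfile.TemporaryDirectory(prefix="ld_demo_") as tmp_dir:
    tmp_path = os.path.join(tmp_dir, "chunk.txt")  # hypothetical file name
    with open(tmp_path, "w") as handle:
        handle.write("intermediate data\n")
    existed_inside = os.path.exists(tmp_path)

# After the block, nothing is left behind on disk.
removed_after = not os.path.exists(tmp_dir)
```

Tying the lifetime of scratch files to a `with` block is one common way to avoid leaking temporary directories when a long-running job fails midway.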
Added
- Added preliminary tests to the CLI scripts (`magenpy_ld` and `magenpy_simulate`).
- Support for block iterator for the `LDMatrix`.
- `rank_one_update` for the `LDLinearOperator` class.
- Unified method to map variants to genomic blocks `map_variants_to_genomic_blocks`.
- Added `summary` method to `LDMatrix` to provide a summary of the LD matrix.
- Added `__repr__` and `__repr_html__` methods to `LDMatrix`.
- Added functionality to allow slicing of `LDLinearOperator` and outputting subsets of the data as a numpy array directly.
- Added implementation of the `PUMAS` procedure for sampling summary data conditional on the LD matrix. Relevant functions:
  - `sumstats_train_test_split`
  - `multivariate_normal_conditional_sampling`
- Added a faster intersection implementation `intersect_multiple_arrays`.
- Added preliminary `bedReaderGenotypeMatrix` to support using the `bed-reader` package as a backend (still needs more development and testing).
- Added convenience method `setup_logger` to set up logging in the package.
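The switch from printing to `logging` can be pictured with the standard library alone. The helper `configure_logging` and the logger name `magenpy_demo` below are hypothetical and are not the signature of magenpy's `setup_logger`; this is only a sketch of the general pattern.

```python
import io
import logging


def configure_logging(stream, level=logging.INFO):
    """Hypothetical helper (not magenpy's API): attach a single stream handler."""
    logger = logging.getLogger("magenpy_demo")
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep output confined to our handler
    return logger


buffer = io.StringIO()
log = configure_logging(buffer)
log.info("Loaded LD matrix")   # emitted: INFO meets the configured level
log.debug("Verbose details")   # suppressed: DEBUG is below INFO
```

Unlike `print`, messages routed this way can be silenced, redirected, or filtered by level without touching the call sites.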
- Python
Published by shz9 11 months ago
magenpy - v0.1.4
Changed
- Updated the data type for the index pointer in the `LDMatrix` object to be `int64`. `int32` does not work well for very large datasets with millions of variants and it causes overflow errors.
- Updated the way we determine the `pandas` chunksize when converting from `plink` tables to `zarr`.
- Simplified the way we compute the quantization scale in `model_utils`.
- Fixed major bug in how LD window thresholds that are passed to `plink1.9` are computed.
- Fixed in-place `fillna` in `from_plink_table` in `LDMatrix` to conform to latest `pandas` API.
- Updated `run_shell_script` to check for and capture errors.
- Refactored code to slightly reduce import/load times.
- Cleaned up `load_data` method of `LDMatrix` and subsumed functionality in `load_rows`.
- Fixed bugs in `match_snp_tables`.
- Fixed bugs and re-wrote how the `block` LD estimator is computed using both the `plink` and `xarray` backends.
- Updated `from_plink_table` method in `LDMatrix` to handle cases where boundaries are different from what `plink` computes.
- Fixed bug in `symmetrize_ut_csr_matrix` utility functions.
- Changed default storage data type for LD matrices to `int16`.
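To see why a 32-bit index pointer becomes a problem, consider the sketch below. The dataset dimensions are hypothetical and chosen only to illustrate the scale at which the total number of stored entries exceeds the `int32` range; it does not use magenpy itself.

```python
import numpy as np

# Hypothetical scale: 3 million variants with 1,000 stored neighbors each.
n_variants = 3_000_000
neighbors_per_variant = 1_000
total_entries = n_variants * neighbors_per_variant  # 3 billion stored values

# The last element of a CSR index pointer equals the total entry count,
# so it must be representable in the pointer's integer type.
int32_max = np.iinfo(np.int32).max
fits_in_int32 = total_entries <= int32_max

# Building a (tiny) index pointer with int64 from per-row entry counts.
row_counts = np.full(4, neighbors_per_variant, dtype=np.int64)
indptr = np.zeros(len(row_counts) + 1, dtype=np.int64)
np.cumsum(row_counts, out=indptr[1:])  # indptr[i] = offset where row i starts
```

With these made-up numbers the entry count does not fit in `int32`, which is the kind of situation the `int64` change addresses.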
Added
- Added extra validation checks in `LDMatrix` to ensure that the index pointer is formatted correctly.
- `LDLinearOperator` class to allow for efficient linear algebra operations on the LD matrix without representing the full symmetric matrix in memory.
- Added utility methods to `LDMatrix` class to allow for computing eigenvalues, performing SVD, etc.
- Added `Spectral properties` to the attributes of LD matrices.
- Added support to slice/retrieve entries of LD matrix by using SNP rsIDs.
- Added support for reading LD matrices from AWS S3 storage.
- Added utility method to detect if a file contains header information.
- Added utility method to generate overlapping windows over a sequence.
- Added `compute_extremal_eigenvalues` to allow the user to compute extremal (minimum and maximum) eigenvalues of LD matrices.
- Added the utility function `combine_ld_matrices` to allow for combining LD matrices from different sources.
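The idea behind operating on the LD matrix without materializing the full symmetric matrix can be sketched with SciPy's generic `LinearOperator`. This is not the implementation of magenpy's `LDLinearOperator`; the toy matrix, the variable names, and the choice to store the strictly upper triangle plus a separate diagonal are all assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import LinearOperator

# Toy symmetric LD-like matrix (values are made up for illustration).
full = np.array([
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.2],
    [0.0, 0.2, 1.0],
])
upper = csr_matrix(np.triu(full, k=1))  # strictly upper triangle only
diag = np.diag(full).copy()             # diagonal kept separately


def matvec(x):
    x = np.asarray(x).ravel()
    # (U + U^T + D) @ x, computed without building the symmetric matrix.
    return upper @ x + upper.T @ x + diag * x


# The operator is symmetric, so matvec doubles as rmatvec.
ld_op = LinearOperator(shape=full.shape, matvec=matvec, rmatvec=matvec,
                       dtype=full.dtype)

x = np.array([1.0, 2.0, 3.0])
result = ld_op.matvec(x)
```

An operator of this kind can be handed to iterative solvers and eigenvalue routines in `scipy.sparse.linalg`, which only need matrix-vector products.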
Published by shz9 about 1 year ago
magenpy - v0.1.3
Changed
- Updated the logic for `detect_outliers` in phenotype transforms to actually reflect the function name (before it was returning true for inliers...).
- Updated `quantize` and `dequantize` to minimize data copying as much as possible.
- Updated `LDMatrix.load_rows()` method to minimize data copying.
- Fixed bug in `LDMatrix.n_neighbors` implementation.
- Updated `dask` version in `requirements.txt` to avoid installing `dask-expr`.
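As a rough picture of what quantizing LD values involves, the sketch below maps correlations in [-1, 1] to `int16` codes and back. The function names `quantize_to_int` and `dequantize_to_float` and the scaling rule are hypothetical; magenpy's own `quantize` and `dequantize` may compute the scale differently.

```python
import numpy as np


def quantize_to_int(values, int_dtype=np.int16):
    """Map floats in [-1, 1] to integer codes (illustrative only)."""
    scale = np.iinfo(int_dtype).max
    return np.round(np.asarray(values) * scale).astype(int_dtype)


def dequantize_to_float(codes, float_dtype=np.float32):
    """Invert `quantize_to_int` up to rounding error."""
    scale = np.iinfo(codes.dtype).max
    return codes.astype(float_dtype) / scale


r = np.array([-1.0, -0.25, 0.0, 0.5, 1.0])
codes = quantize_to_int(r)
recovered = dequantize_to_float(codes)
max_error = float(np.max(np.abs(recovered - r)))
```

Under this assumed scheme each stored value takes two bytes, and the round-trip error stays on the order of one quantization step.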
Added
- Added `get_peak_memory_usage` to `system_utils` to inspect peak memory usage of a process.
- Placeholder method to perform QC on `SumstatsTable` objects (needs to be implemented still).
- New attached dataset for long-range LD regions.
- New method in `SumstatsTable` to impute rsID (if missing).
- Preliminary support for matching with CHR+POS in `SumstatsTable` (still needs more work).
- `LDMatrix` updates:
  - New method to filter long-range LD regions.
  - New method to prune LD matrix.
  - New algorithm for symmetrizing upper triangular and block diagonal LD matrices.
    - Much faster and more memory efficient than using `scipy`.
  - New `LDMatrix` class has efficient data loading in `.load_data` method.
    - We still retain `load_rows` because it is useful for loading a subset of rows.
Published by shz9 over 1 year ago
magenpy - v0.1.2
Changed
- Fixed `manhattan` plot implementation to support various new features.
- Added a warning when accessing `csr_matrix` property of `LDMatrix` when it hasn't been loaded previously.
Added
- `reset_mask` method for magenpy `LDMatrix`.
- `Dockerfile`s for both `cli` and `jupyter` modes.
- A helper script to convert LD matrices from old format to new format.
Published by shz9 almost 2 years ago
magenpy - v0.1.0
A large scale restructuring of the code base to improve efficiency and usability.
Changed
- Bug fixes across the entire code base.
- Simulator classes have been renamed from `GWASimulator` to `PhenotypeSimulator`.
- Moved plotting script to its own separate module.
- Updated some method names / commandline flags to be consistent throughout.
Added
- Basic integration testing with `pytest` and GitHub workflows.
- Documentation for the entire package using `mkdocs`.
- Integration testing / automating building with GitHub workflows.
- New implementation of the LD matrix that uses CSR matrix data structures.
- Quantization / float precision specification when storing LD matrices.
- Allow user to specify Compressor / Compressor options for Zarr storage.
- New implementation of `magenpy_simulate` script.
  - Allow users to set random seed.
  - Now accept `--prop-causal` instead of specifying full mixing proportions.
- Tried to incorporate `genome_build` into various data structures. This will be useful in the future to ensure consistent genome builds across different data types.
- Allow user to pass various metadata to `magenpy_ld` to save information about dataset characteristics.
- New sumstats parsers:
  - Saige sumstats format.
  - plink1.9 sumstats format.
  - GWAS Catalog sumstats format.
- Chained transform function for transforming phenotypes.
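A chained transform can be thought of as plain function composition. The helper `chain_transforms` and the two example transforms below are hypothetical stand-ins, not the functions shipped with magenpy; they only illustrate applying several phenotype transforms in sequence.

```python
from functools import reduce

import numpy as np


def chain_transforms(*transforms):
    """Compose functions left to right (hypothetical helper)."""
    def chained(values):
        # Feed the output of each transform into the next one.
        return reduce(lambda acc, fn: fn(acc), transforms, values)
    return chained


def center(values):
    return values - values.mean()


def scale_to_unit_variance(values):
    return values / values.std()


standardize = chain_transforms(center, scale_to_unit_variance)
phenotype = np.array([1.0, 2.0, 3.0, 4.0])  # made-up phenotype values
standardized = standardize(phenotype)
```

Composing small single-purpose transforms this way keeps each step testable on its own while still exposing one callable to the rest of a pipeline.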
Published by shz9 almost 2 years ago