Recent Releases of https://github.com/bbglab/oncodrive3d
https://github.com/bbglab/oncodrive3d - Publication (v1.0.5)
Oncodrive3D is a fast and accurate 3D-clustering algorithm for driver gene discovery.
This release corresponds to the Oncodrive3D version (v1.0.5) used for the analyses in the publication in Nucleic Acids Research.
- Python
Published by St3451 10 months ago
https://github.com/bbglab/oncodrive3d - Release v1.0.6
Oncodrive3D is a fast and accurate 3D-clustering algorithm for driver gene discovery.
Key Updates
This release enhances the build step of Oncodrive3D by letting users plug in their own predicted structures alongside the AlphaFold downloads. You can now also opt to use only structures associated with MANE Select transcripts, without automatically merging in structures associated with Ensembl’s canonical transcripts as before. Finally, the order of residues used to select elements in the PAE object is corrected.
- Fix PAE and pCMAP order of residues pairs: https://github.com/bbglab/oncodrive3d/pull/58
- Enable custom structures and MANE Select structure only: https://github.com/bbglab/oncodrive3d/pull/59
Full Changelog: https://github.com/bbglab/oncodrive3d/compare/v1.0.5...v1.0.6
- Python
Published by St3451 11 months ago
https://github.com/bbglab/oncodrive3d - Release v1.0.5
Oncodrive3D is a fast and accurate 3D-clustering algorithm for driver gene discovery.
Key Updates and Features
This release addresses bug fixes in the build annotations and plotting modules, introduces enhancements to association plots, updates documentation, and includes general code cleanups.
Bug Fixes
Features in association plots
- Added FDR to logistic regression analysis for association between clusters and annotations 1
- Added association plots to Nextflow 1
- Removed comparative plots 1
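The release notes do not state which FDR procedure was added. As a minimal sketch, assuming the Benjamini-Hochberg procedure and a hypothetical list of p-values (one per tested annotation), the correction could look like this:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values from per-annotation logistic regressions.
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
```

Associations would then be reported as significant when their q-value falls below a chosen threshold, rather than their raw p-value.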
Others
- Documentation update
- Linting
- Python
Published by St3451 over 1 year ago
https://github.com/bbglab/oncodrive3d - Release v1.0.4
Second release of Oncodrive3D, a fast and accurate 3D-clustering algorithm for driver gene discovery. It identifies mutation-enriched volumes by analyzing missense somatic mutations, leveraging AlphaFold's structural predictions to define residue contacts and mutation profiles to simulate neutral mutagenesis. The tool uses rank-based statistics and can process mutations from duplex sequencing studies, enabling the analysis of both cancer and normal tissue datasets across potentially any organism.
Key Updates and Features
This release mainly updates the README with important information and fixes a bug in the oncodrive3d build-datasets step.
Documentation Updates
- Generally improved the documentation for clarity and usability.
- Added steps to fulfill software requirements, addressing installation failures on older machines lacking updated C libraries.
- Provided detailed information on input and output data formats, including:
  - How to obtain the required input files.
  - In-depth descriptions of the main outputs, including gene-level and residue-level clustering results.
Bug Fixes and Refactoring
- Fixed bug in scripts/datasets/build_datasets.py and scripts/datasets/seq_for_mut_prob.py:
  - Disabled downloading and integrating MANE structures if --mane flag is not enabled.
  - Removed usage of files related to the MANE downloads when computing the seq_for_mut_prob.py for a non-MANE Human proteome.
- Updated scripts/datasets/utils.py to increase the timeout for sock_read in PyPdl, preventing errors during the download of AlphaFold structures.
- Refactored scripts/main.py by moving the import of specific modules into their corresponding functions for better modularity and efficiency.
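Moving imports into the functions that need them is a common pattern for command-line tools with several subcommands. The following is only an illustrative sketch: the function names are hypothetical and the standard-library statistics module stands in for a heavier dependency.

```python
def run_plot_command(values):
    """Hypothetical subcommand; its import is deferred until the command runs."""
    # 'statistics' stands in for a heavy plotting or analysis dependency.
    import statistics

    return statistics.mean(values)


def run_version_command():
    """Hypothetical lightweight subcommand that needs no heavy imports."""
    return "version-placeholder"


# Only the second call triggers the deferred import.
version = run_version_command()
mean_value = run_plot_command([1, 2, 3])
```

With this layout, invoking a lightweight subcommand does not pay the start-up cost of modules used only by other subcommands.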
- Python
Published by St3451 over 1 year ago
https://github.com/bbglab/oncodrive3d - Release v1.0.3
First release of Oncodrive3D, a fast and accurate 3D-clustering algorithm for driver gene discovery. It identifies mutation-enriched volumes by analyzing missense somatic mutations, leveraging AlphaFold's structural predictions to define residue contacts and mutation profiles to simulate neutral mutagenesis. The tool uses rank-based statistics and can process mutations from duplex sequencing studies, enabling the analysis of both cancer and normal tissue datasets across potentially any organism.
Key Updates and Features
Packaging and Linting
- Added Python package build using uv.
- Published the package to PyPI, enabling installation via pip install oncodrive3d.
- Updated the Dockerfile.
- Applied code linting to improve code quality and maintainability.
- Added LICENCE
NextFlow Pipeline Updates
- Restructured the pipeline according to best practices for enhanced performance and maintainability and moved it to oncodrive3d_pipeline/.
Documentation Updates
- Updated the README file:
  - Added instructions for installation.
  - Added instructions for running the provided NextFlow pipeline.
Bug Fixes, Refactoring, and Others
- Removed preprocessing scripts in build/preprocessing.
- Updated URLs in scripts/datasets/seq_for_mut_prob.py and scripts/plotting/pfam.py to use the January 2024 Ensembl archive.
- Changed output column from Cluster to Clump in the residue-level output (<cohort>.3d_clustering_pos.csv).
- Changed oncodrive3d run input argument from input_maf_path to input_path in scripts/main.py.
- Refactored scripts/datasets/utils.py to improve download functionality and logging.
- Python
Published by St3451 over 1 year ago
https://github.com/bbglab/oncodrive3d - Release v1.0.2
This is the second release of Oncodrive3D, a fast and accurate 3D-clustering algorithm for driver gene discovery. It identifies mutation-enriched volumes by analyzing missense somatic mutations, leveraging AlphaFold's structural predictions to define residue contacts and mutation profiles to simulate neutral mutagenesis. The tool uses rank-based statistics and can process mutations from duplex sequencing studies, enabling the analysis of both cancer and normal tissue datasets across potentially any organism.
Key Updates and Features
New Modules for Annotation and Plotting:
- Introduced a comprehensive plotting module, including summary plots, gene plots, comparative plots, association plots, and ChimeraX plots.
Nextflow Pipeline:
- Added a minimal Nextflow pipeline to perform 3D clustering analysis across multiple cohorts and generate all relevant plots.
MANE Transcripts Support:
- Built datasets prioritizing MANE AF-predicted structures.
- Tracked transcript IDs from input data, including mismatch, match, or missing status compared to Oncodrive3D datasets.
Mutation Filtering:
- Filtered mutations with wild-type (WT) structure-AA mismatches and genes exceeding a threshold ratio of mapping issues.
- Added an option to disable WT AA mismatch filtering, particularly useful for mouse data where VEP and Uniprot isoform inconsistencies occur.
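The two filters described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the record layout, the function name, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (gene, position, WT amino acid in the input, amino acid in the structure).
mutations = [
    ("GENE_A", 10, "R", "R"),
    ("GENE_A", 25, "G", "G"),
    ("GENE_A", 40, "L", "P"),
    ("GENE_B", 5, "K", "E"),
    ("GENE_B", 7, "S", "T"),
    ("GENE_B", 9, "A", "A"),
]


def filter_mutations(mutations, max_mismatch_ratio=0.5, filter_mismatches=True):
    """Drop WT-mismatched mutations and genes with too many mismatches."""
    if not filter_mismatches:
        # Mirrors the option to disable the filter (e.g. for mouse data).
        return list(mutations)

    total = defaultdict(int)
    mismatched = defaultdict(int)
    for gene, _pos, wt_input, wt_structure in mutations:
        total[gene] += 1
        if wt_input != wt_structure:
            mismatched[gene] += 1

    # Genes whose mismatch ratio exceeds the (assumed) threshold are removed entirely.
    dropped_genes = {g for g in total if mismatched[g] / total[g] > max_mismatch_ratio}

    return [m for m in mutations if m[0] not in dropped_genes and m[2] == m[3]]


kept = filter_mutations(mutations)
```

In this toy input, GENE_A loses only its single mismatched mutation, while GENE_B is removed as a whole because two of its three mutations disagree with the structure.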
Direct VEP Output Support:
- Enabled direct VEP output processing, allowing filtering of transcripts based on Oncodrive3D-built datasets.
Enhanced Outputs:
- Included processed input mutations (<cohort>.mutations.processed.tsv), missense mutation probabilities (<cohort>.miss_prob.processed.tsv), and Oncodrive3D sequence dataframes (<cohort>.seq_df.processed.tsv).
Mouse Data Support:
- Fully enabled and tested processing of mouse data (mm39) across all steps, including dataset building, annotations, and plotting.
Bug Fixes and Improvements:
- Resolved bug affecting the identification of the most significant volume per gene.
- Changed sorting of position-level results from rank-based (Gene, Rank) to significance-based (Gene, p-value, Score).
- Refactored main.py, offloading unnecessary code to module-specific scripts for better organization.
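The change in sort order for position-level results can be illustrated with a small example. The field names and values below are hypothetical, and sorting Score in descending order is an assumption, since the notes only name the sort keys.

```python
# Hypothetical position-level rows; the field names are illustrative only.
rows = [
    {"Gene": "TP53", "Pos": 175, "pval": 0.001, "Score": 8.2},
    {"Gene": "KRAS", "Pos": 12, "pval": 0.001, "Score": 9.5},
    {"Gene": "TP53", "Pos": 248, "pval": 0.001, "Score": 9.1},
    {"Gene": "KRAS", "Pos": 61, "pval": 0.020, "Score": 4.0},
]

# Sort by gene, then ascending p-value, then descending score (assumed direction).
rows_sorted = sorted(rows, key=lambda r: (r["Gene"], r["pval"], -r["Score"]))
positions = [(r["Gene"], r["Pos"]) for r in rows_sorted]
```

Within each gene, the most significant positions come first, and ties on p-value are broken by the score.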
Example usage
To run the examples provided, the <input_path> directory should be organized as follows:
<input_path>/
├── vep/
│   ├── <cohort_1>.vep.tsv.gz
│   └── <cohort_2>.vep.tsv.gz
└── mut_profile/
    ├── <cohort_1>.sig.json
    └── <cohort_2>.sig.json
vep/: Contains the VEP output files for each cohort, compressed as .tsv.gz.
mut_profile/: Contains the Bgsignature output files (mutation profile in trinucleotide context) for each cohort, saved as .sig.json.
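Before launching the pipeline, it can help to check that every cohort has both files. The helper below is a hypothetical convenience script, not part of Oncodrive3D; it relies only on the directory names and file suffixes listed above.

```python
from pathlib import Path
import tempfile


def list_cohorts(input_path):
    """Return (cohorts with both files, cohorts missing one of the two files)."""
    input_path = Path(input_path)
    vep = {p.name[: -len(".vep.tsv.gz")] for p in (input_path / "vep").glob("*.vep.tsv.gz")}
    sig = {p.name[: -len(".sig.json")] for p in (input_path / "mut_profile").glob("*.sig.json")}
    return sorted(vep & sig), sorted(vep ^ sig)


# Build a throwaway directory with the expected layout to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "vep").mkdir()
    (root / "mut_profile").mkdir()
    for cohort in ("cohort_1", "cohort_2"):
        (root / "vep" / f"{cohort}.vep.tsv.gz").touch()
    # cohort_2 deliberately lacks a mutation profile.
    (root / "mut_profile" / "cohort_1.sig.json").touch()
    complete, incomplete = list_cohorts(root)
```

Here complete lists the cohorts ready to run and incomplete lists those with a VEP file or a mutation profile but not both.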
Human MANE
build_datasets -o <datasets_path> --mane
build_annotations -o <annotations_path> -d <datasets_path>
nextflow run main.nf --indir <input_path> --outdir <output_path> --data_dir <datasets_path> --annotations_dir <annotations_path> --vep_input true --verbose true --plot true --chimerax_plot true --mane true --seed 64 -profile container
Mouse
build_datasets -o <datasets_path> --organism mouse
build_annotations -o <annotations_path> -d <datasets_path> --organism mouse
nextflow run main.nf --indir <input_path> --outdir <output_path> --data_dir <datasets_path> --annotations_dir <annotations_path> --ignore_mapping_issues true --plot true --chimerax_plot true --vep_input true -profile container
- Python
Published by St3451 over 1 year ago
https://github.com/bbglab/oncodrive3d - Release v0.1
This is the first release of Oncodrive3D, a fast and accurate novel 3D-clustering algorithm for driver gene discovery. The approach analyses patterns of observed missense somatic mutations (in cancer or normal tissue) to identify volumes with a higher frequency of mutations than expected under neutral mutagenesis. Oncodrive3D leverages AlphaFold's structure predictions and Predicted Aligned Error (PAE) to construct contact probability maps. If provided, the mutation profile of the cohort is used to simulate neutral mutagenesis, and rank-based statistics are employed to determine empirical p-values for the volumes of each mutated residue. The tool can also process mutation profile and sequencing depth information together: when provided as a mutability file, this allows it to process mutations obtained from duplex sequencing studies, which are commonly used in normal tissue sequencing at the time of this release.
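The idea of comparing an observed volume against simulated neutral mutagenesis can be sketched generically. The example below is not Oncodrive3D's scoring: it reduces the score to a plain mutation count, ignores contact probabilities and rank-based statistics, and uses hypothetical function names and toy numbers throughout.

```python
import random


def empirical_p_value(observed, simulated):
    """Add-one corrected fraction of simulated scores at least as large as the observed one."""
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


def simulate_volume_counts(n_mutations, residue_probs, volume, n_iter, seed=0):
    """For each iteration, count how many simulated mutations land inside `volume`."""
    rng = random.Random(seed)
    residues = list(range(len(residue_probs)))
    counts = []
    for _ in range(n_iter):
        draws = rng.choices(residues, weights=residue_probs, k=n_mutations)
        counts.append(sum(1 for r in draws if r in volume))
    return counts


# Toy protein of 50 residues with uniform per-residue mutation probabilities.
residue_probs = [1 / 50] * 50
volume = {10, 11, 12, 30}   # residues assumed to be in contact with residue 11
observed_in_volume = 6      # 6 of the 10 observed mutations fall in the volume
simulated = simulate_volume_counts(10, residue_probs, volume, n_iter=1000, seed=64)
p_value = empirical_p_value(observed_in_volume, simulated)
```

A cohort-specific mutation profile would replace the uniform residue_probs, so that residues more likely to mutate under neutrality contribute more to the simulated counts.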
Input
input.maf (required): Mutation Annotation Format (MAF) file annotated with consequences (e.g., by using Ensembl Variant Effect Predictor (VEP)).
mut_profile.json (optional): Dictionary including the normalized frequencies of mutations (values) in every possible trinucleotide context (keys), such as 'ACA>A', 'ACC>A', and so on.
mut_config.json (optional): Dictionary including the path and parsing information for the mutability file, which includes information about mutation profile integrated with sequencing depth.
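To make the mut_profile.json layout concrete, the sketch below builds a placeholder profile. It assumes the common 96-context convention (reference trinucleotides centred on C or T), which matches the example keys above but is not confirmed by these notes; the uniform frequencies are purely illustrative.

```python
import json
from itertools import product

BASES = "ACGT"


def trinucleotide_contexts():
    """Return 96 keys of the form 'ACA>A' (pyrimidine-centred reference assumed)."""
    keys = []
    for five, ref, three in product(BASES, "CT", BASES):
        for alt in BASES:
            if alt != ref:
                keys.append(f"{five}{ref}{three}>{alt}")
    return keys


# Uniform profile as a placeholder; a real profile would come from the cohort's data.
contexts = trinucleotide_contexts()
mut_profile = {key: 1 / len(contexts) for key in contexts}
mut_profile_json = json.dumps(mut_profile, indent=2)
```

The resulting string could be written to a file and passed as the optional mutation profile, with the values replaced by the cohort's normalized frequencies.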
Output
cohortfilename.3dclustering_genes.csv: This is a Comma-Separated Values (CSV) file containing the results of the analysis at the gene level.
cohortfilename.3dclustering_pos.csv: This is a Comma-Separated Values (CSV) file containing the results of the analysis at the level of mutated positions.
- Python
Published by St3451 over 2 years ago