bcell-lymphomas-mutational-signatures

B-Cell Lymphomas Mutational Signatures

https://github.com/catg-umag/bcell-lymphomas-mutational-signatures

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: mdpi.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary

Keywords

b-cell lymphoma mutational-signatures reproducible-research
Last synced: 6 months ago

Repository

B-Cell Lymphomas Mutational Signatures

Basic Info
  • Host: GitHub
  • Owner: catg-umag
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 1.47 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 6 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation

README.md

Mutational Signatures in B-cell lymphomas

Software repository for our article Integration of mutational signature analysis with 3D chromatin data unveils differential AID-related mutagenesis in indolent lymphomas, for reproducibility purposes.

You can also run it on your own data: everything is automated, so it is easy to obtain a general landscape of the mutational signatures in your samples.

What is included?

  • Creation of mutation list from VCF files (optional)
  • Collection of variants extra info (context, AID motifs, SNV in Ig loci)
  • SBS signature extraction using SigProfiler
  • Sample fitting against COSMIC Signatures using deconstructSigs
  • Signature reconstruction using NNLS approach
  • A report including plots to graphically visualize the obtained results
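As an illustration of the "context" and "AID motifs" annotations listed above, here is a minimal Python sketch. It is not the pipeline's code. It assumes the usual SBS-96 convention (mutations reported relative to the pyrimidine base) and the canonical AID hotspot motif WRCY and its reverse complement RGYW; the README does not say which motifs the pipeline actually checks. The function names `sbs96_channel` and `is_aid_hotspot` are hypothetical.

```python
# Hypothetical helpers; the README does not specify the pipeline's own implementation.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string made of A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_channel(context, alt):
    """Name the SBS-96 channel of a substitution.

    context: 3-base reference sequence centred on the mutated base.
    alt: the alternate base on the same strand as `context`.
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "AG":
        # Purine reference base: report the change on the opposite strand.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def is_aid_hotspot(window):
    """Check a 5-base reference window centred on the mutated base.

    Assumption: hotspot = WRCY (mutated C) or RGYW (mutated G),
    with W = A/T, R = A/G, Y = C/T.
    """
    w = window.upper()
    if len(w) != 5:
        raise ValueError("expected a 5-base window")
    if w[2] == "C":
        return w[0] in "AT" and w[1] in "AG" and w[3] in "CT"
    if w[2] == "G":
        return w[1] in "AG" and w[3] in "CT" and w[4] in "AT"
    return False
```

For example, `sbs96_channel("CGT", "A")` and `sbs96_channel("ACG", "T")` describe the same event seen from opposite strands, so both map to the same channel.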

How to use it?

Requirements

First, you need to have installed Nextflow (>=20.07) and Singularity.

Preparation of inputs

You have two options: starting from the VCFs or starting from a list of variants.

  • If you want to start from VCFs:

Prepare a CSV file with 3 columns:

  • name: used as the sample name for the corresponding file
  • group: used to separate your samples in the general representations (for example, pathology or sample origin)
  • file: path to the VCF; absolute paths are recommended to avoid path resolution issues

It should look like this:

    name,group,file
    CLL_01,CLL/MBL,/home/catg/vcf/CLL_01.snp.filter.som.recode.vcf.hg38_multianno.vcf
    CLL_02,CLL/MBL,/home/catg/vcf/CLL_02.snp.filter.som.recode.vcf.hg38_multianno.vcf
    FL_01_1,FL,/home/catg/vcf/FL_01_1.filter.som.recode.vcf.hg38_multianno.vcf
    FL_01_2,FL,/home/catg/vcf/FL_01_2.filter.som.recode.vcf.hg38_multianno.vcf
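If you have many samples, a CSV in the format above can be generated rather than typed by hand. The following Python sketch is one possible way to do it; the function name `vcf_list_csv` and the example paths are hypothetical. It converts every path to an absolute one, following the recommendation above.

```python
import csv
import io
import os


def vcf_list_csv(rows):
    """Build the VCF list CSV text from (name, group, vcf_path) tuples."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "group", "file"])
    for name, group, vcf_path in rows:
        # Absolute paths avoid surprises when the pipeline runs elsewhere.
        writer.writerow([name, group, os.path.abspath(vcf_path)])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample sheet; replace with your own samples and paths.
    samples = [
        ("CLL_01", "CLL/MBL", "vcf/CLL_01.vcf"),
        ("FL_01_1", "FL", "vcf/FL_01_1.vcf"),
    ]
    with open("vcf_list.csv", "w", newline="") as handle:
        handle.write(vcf_list_csv(samples))
```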

  • If you already have a list with your variants:

Basically you need to create a CSV file with this format:

    sample,group,chrom,pos,ref,alt
    CLL_01,CLL/MBL,4,89250352,T,C
    CLL_01,CLL/MBL,5,49600750,T,C
    CLL_01,CLL/MBL,5,49600906,A,C
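Before launching a run, it can help to sanity-check a variant list against the format above. This Python sketch is only an illustration: `check_snv_list` is a hypothetical helper, and it assumes every row is a single-base substitution with a positive integer position, which the README implies but does not state as a rule.

```python
import csv
import io

EXPECTED_HEADER = ["sample", "group", "chrom", "pos", "ref", "alt"]
BASES = {"A", "C", "G", "T"}


def _is_positive_int(text):
    """True if `text` parses as an integer greater than zero."""
    try:
        return int(text) > 0
    except (TypeError, ValueError):
        return False


def check_snv_list(text):
    """Return a list of problems found in the variant list CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    problems = []
    # Data rows start on line 2, right after the header.
    for line_no, row in enumerate(reader, start=2):
        ref = (row["ref"] or "").upper()
        alt = (row["alt"] or "").upper()
        if not _is_positive_int(row["pos"]):
            problems.append(f"line {line_no}: pos is not a positive integer")
        if ref not in BASES or alt not in BASES:
            problems.append(f"line {line_no}: ref/alt must be one of A, C, G, T")
        elif ref == alt:
            problems.append(f"line {line_no}: ref and alt are identical")
    return problems
```

An empty list means no problems were found under these assumptions.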

Run the pipeline

To run the pipeline, execute:

    nextflow run CATG-UMAG/bcell-lymphomas-mutational-signatures -r main <params>

In <params>, you need to provide inputs and other options. These are:

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| --vcf_list | yes* | | Input CSV if you want to start with the VCFs (according to previous section). Ignored if --snv_list is not empty. |
| --snv_list | yes* | | Input CSV if you want to start with the list of variants (according to previous section). |
| --reference | yes | | Reference in 2bit format. Must be the same used in the variant calling. For example: hg19 or hg38 |
| --ig_list | yes | | BED file containing the ranges for the Ig loci. Check data/iglist_hg38.bed for an example. |
| --nsignatures_min | no | 2 | Minimum number of signatures to test with SigProfiler. |
| --nsignatures_max | no | 5 | Maximum number of signatures to test with SigProfiler. |
| --nsignatures_force | no | | Ignore the recommendation from SigProfiler regarding the optimal number of signatures, and use a fixed number of signatures as final output. Must be a number between the nsignatures_min and nsignatures_max values (both inclusive). |
| --cosmic_version | no | 3.2 | Version of COSMIC signatures to use. Check data/cosmic_signatures_urls.csv for possible options. |
| --cosmic_genome | no | GRCh38 | COSMIC signatures genome. Check data/cosmic_signatures_urls.csv for possible options. |
| --fitting_selected_signatures | no | | Select only a set of reference signatures for the fitting. The value should be a string containing valid signature names from the COSMIC version selected, separated by commas. Example: "SBS1,SBS3,SBS5,SBS6,SBS9,SBS84" |
| --fitting_extra_signatures | no | | Provide additional (local) signatures for the fitting. Must be a CSV file, check data/extra_signatures.csv for the format. |
| --results_dir | no | results | Output directory to store the results. |
| --sigprofiler_cpus | no | 8 | Number of CPUs to use with SigProfiler. |
| --sigprofiler_gpu | no | False | Use a GPU in SigProfiler. It must be a supported CUDA device. |

* Provide either --vcf_list or --snv_list, depending on which input you start from.

So, for example, a full execution command should look like this:

    nextflow run CATG-UMAG/bcell-lymphomas-mutational-signatures -r main \
        --snv_list data/snv_list.csv --reference data/hg38.2bit --ig_list data/iglist_hg38.bed \
        --nsignatures_min 2 --nsignatures_max 10 --fitting_selected_signatures 'SBS1,SBS3,SBS5,SBS6,SBS9,SBS84'

Alternatively, you can provide a YAML file containing all the parameters you want to set, so you don't have to write everything on the command line. Download params.example.yml and edit it to your needs (you can delete any parameters you don't want to use). Then execute the pipeline like this:

    nextflow run CATG-UMAG/bcell-lymphomas-mutational-signatures -r main -params-file params.yml
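A parameters file can also be produced by a script, which gives you a place to check the constraints from the table before launching. The Python sketch below is an illustration only. It assumes the YAML keys are the parameter names without the leading dashes (the contents of params.example.yml are not shown here), and the helper names `check_params` and `to_yaml` are hypothetical. The rule that nsignatures_min must not exceed nsignatures_max is also an assumption.

```python
import json


def check_params(params):
    """Return a list of problems with a parameter dictionary."""
    errors = []
    if not params.get("vcf_list") and not params.get("snv_list"):
        errors.append("provide vcf_list or snv_list")
    for key in ("reference", "ig_list"):
        if not params.get(key):
            errors.append(f"{key} is required")
    low = params.get("nsignatures_min", 2)   # default from the table
    high = params.get("nsignatures_max", 5)  # default from the table
    if low > high:
        errors.append("nsignatures_min is greater than nsignatures_max")
    forced = params.get("nsignatures_force")
    if forced is not None and not low <= forced <= high:
        errors.append("nsignatures_force is outside [nsignatures_min, nsignatures_max]")
    return errors


def to_yaml(params):
    """Serialise flat parameters as 'key: value' lines.

    JSON scalars (quoted strings, numbers, true/false) are valid YAML scalars.
    """
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in params.items())


if __name__ == "__main__":
    params = {
        "snv_list": "data/snv_list.csv",
        "reference": "data/hg38.2bit",
        "ig_list": "data/iglist_hg38.bed",
        "nsignatures_min": 2,
        "nsignatures_max": 10,
        "fitting_selected_signatures": "SBS1,SBS3,SBS5,SBS6,SBS9,SBS84",
    }
    problems = check_params(params)
    if problems:
        raise SystemExit("\n".join(problems))
    with open("params.yml", "w") as handle:
        handle.write(to_yaml(params))
```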

You can also use any option available in Nextflow.

It is also easy to run on a computing cluster, as long as Singularity is available. A profile for SLURM is included (-profile slurm); if your cluster uses a different scheduler, check the Nextflow documentation on executors to find the corresponding configuration.

Results

Once the pipeline has finished running, you will find a set of files. These are:

  • snv_list.csv: a CSV file with all the variants (if you used a variant list as input, it will be the same file with extra columns)
  • extraction/
    • signatures.csv: the signatures extracted from your samples
    • contributions.csv: a list containing the number of mutations contributed by each signature to every one of your samples
    • statistics.csv: metrics collected from the extraction of the different number of signatures
    • sigprofiler_out: the raw output from SigProfiler
  • fitting.csv: the results of the sample fitting process using reference signatures
  • reconstruction/: reconstruction of each one of the extracted de novo signatures using reference signatures
  • report/: a summary of all the obtained information with plots, in .html for easy visualization and .ipynb (Jupyter Notebook) for editing
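To give an idea of what the reconstruction step computes, the Python sketch below expresses one profile as a non-negative weighted sum of reference profiles, which is the non-negative least squares (NNLS) problem. It is a toy illustration, not the pipeline's code: the pipeline lists the R NNLS library in its acknowledgements, whereas this sketch swaps in a plain projected gradient solver so that it runs without dependencies. The function name and the 4-channel profiles are hypothetical; real SBS profiles have 96 channels.

```python
def nnls_projected_gradient(references, target, iterations=2000):
    """Find weights w >= 0 minimising ||sum_i w[i] * references[i] - target||^2.

    references: list of profiles (lists of floats), each as long as `target`.
    """
    n = len(references)
    # Gram matrix and right-hand side of the normal equations.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in references]
            for ri in references]
    rhs = [sum(a * b for a, b in zip(ri, target)) for ri in references]
    trace = sum(gram[i][i] for i in range(n))
    if trace == 0:
        return [0.0] * n  # all reference profiles are zero
    # The trace bounds the largest eigenvalue, so this step size is safe.
    step = 1.0 / trace
    weights = [0.0] * n
    for _ in range(iterations):
        gradient = [sum(gram[i][j] * weights[j] for j in range(n)) - rhs[i]
                    for i in range(n)]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    return weights


if __name__ == "__main__":
    # Toy 4-channel reference profiles (hypothetical values).
    refs = [
        [0.7, 0.1, 0.1, 0.1],
        [0.1, 0.7, 0.1, 0.1],
        [0.1, 0.1, 0.7, 0.1],
    ]
    observed = [1.5, 0.3, 0.9, 0.3]
    print(nnls_projected_gradient(refs, observed))
```

Projected gradient descent is slower than a dedicated NNLS solver but is short enough to read in full; for real data, an established implementation is the better choice.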

How to cite

If this repository was useful for you, please cite it as below:

Sepulveda-Yanez JH, Alvarez-Saravia D, Fernandez-Goycoolea J, Aldridge J, van Bergen CAM, Posthuma W, Uribe-Paredes R, Veelken H, Navarrete MA. Integration of Mutational Signature Analysis with 3D Chromatin Data Unveils Differential AID-Related Mutagenesis in Indolent Lymphomas. International Journal of Molecular Sciences. 2021; 22(23):13015. https://doi.org/10.3390/ijms222313015

Acknowledgements

  • Python libraries: cyvcf2, twobitreader, SigProfilerExtractor and all of its dependencies
  • R libraries: cluster, cowplot, deconstructSigs, factoextra, IRkernel, NNLS, R.utils, tidyverse
  • Others: Jupyter, Nextflow, Singularity

In containers/ you can find the recipes used to build the containers for the pipeline (hosted in GitHub Container Registry). These are the ones configured in nextflow.config, alongside others from BioContainers.

Owner

  • Name: Centro Austral de Tecnología Genómica
  • Login: catg-umag
  • Kind: organization
  • Location: Punta Arenas, Chile

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If this repository was useful for you, please cite it as below."
authors:
- family-names: "Sepúlveda"
  given-names: "Julieta"
  orcid: "https://orcid.org/0000-0001-6354-3609"
- family-names: "Alvarez"
  given-names: "Diego"
  orcid: "https://orcid.org/0000-0003-0753-274X"
title: "Mutational Signatures in B-cell lymphomas"
url: "https://github.com/catg-umag/bcell-lymphomas-mutational-signatures"
preferred-citation:
  type: article
  authors:
  - family-names: "Sepulveda-Yanez"
    given-names: "Julieta H."
    orcid: "https://orcid.org/0000-0001-6354-3609"
  - family-names: "Alvarez-Saravia"
    given-names: "Diego"
    orcid: "https://orcid.org/0000-0003-0753-274X"
  - family-names: "Fernandez-Goycoolea"
    given-names: "Jose"
    orcid: "https://orcid.org/0000-0001-5349-4348"
  - family-names: "Aldridge"
    given-names: "Jacqueline"
    orcid: "https://orcid.org/0000-0003-3491-6589"
  - family-names: "M. van Bergen"
    given-names: "Cornelis A."
    orcid: "https://orcid.org/0000-0001-6386-4517"
  - family-names: "Posthuma"
    given-names: "Ward"
  - family-names: "Uribe-Paredes"
    given-names: "Roberto"
    orcid: "https://orcid.org/0000-0001-9519-8862"
  - family-names: "Veelken"
    given-names: "Hendrik"
    orcid: "https://orcid.org/0000-0002-9108-3125"
  - family-names: "Navarrete"
    given-names: "Marcelo A."
    orcid: "https://orcid.org/0000-0002-2044-9548"
  title: "Integration of Mutational Signature Analysis with 3D Chromatin Data Unveils Differential AID-Related Mutagenesis in Indolent Lymphomas"
  doi: "10.3390/ijms222313015"
  journal: "International Journal of Molecular Sciences"
  publisher:
    name: "MDPI AG"
  volume: 22
  issue: 23
  start: 13015
  year: 2021
  month: 12

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 4
  • Average time to close issues: about 13 hours
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • esavage111 (1)
Pull Request Authors
  • dialvarezs (3)
  • jsepulvedayanez (1)

Dependencies

.github/workflows/container_sig_analysis.yml actions
  • actions/checkout v2 composite
  • docker/build-push-action v3 composite
  • docker/login-action v2 composite
.github/workflows/container_sigprofiler.yml actions
  • actions/checkout v2 composite
  • docker/build-push-action v3 composite
  • docker/login-action v2 composite
containers/sig_profiler/Dockerfile docker
  • pytorch/pytorch 1.11.0-cuda11.3-cudnn8-runtime build
containers/signature_analysis/Dockerfile docker
  • r-base 4.2.2 build