https://github.com/dincalcilab/samurai_paper_scripts

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.3%) to scientific vocabulary
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: DIncalciLab
  • License: MIT
  • Language: Shell
  • Default Branch: main
  • Size: 15.6 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.md

SAMURAIpaperscripts

This repository includes all the scripts used for the paper "SAMURAI: Shallow Analysis of copy nuMber Using a Reproducible And Integrated bioinformatics pipeline".

Case Study 1 - Evaluation of SAMURAI on simulated data: Download and dilution of test data from Smolander et al.

Step 1: Download Simulated Sample

The original simulated sample files (simulated_L001_R1_001.fastq.gz, simulated_L001_R2_001.fastq.gz) can be downloaded from Zenodo.

Step 2: Align FASTQ Files

Align the downloaded FASTQ files to hg38 using BWA-MEM. You can use the following Singularity container for BWA-MEM.
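A minimal alignment sketch, assuming bwa and samtools are available (whether locally or inside the Singularity container); the reference path, thread count, and output names are placeholders, not taken from the original scripts:

```shell
# Align paired-end reads to hg38 with BWA-MEM, then coordinate-sort and index.
# hg38.fa, the thread count (-t/-@ 8), and output names are placeholders.
bwa mem -t 8 hg38.fa \
    simulated_L001_R1_001.fastq.gz simulated_L001_R2_001.fastq.gz \
    | samtools sort -@ 8 -o simulated.sorted.bam -
samtools index simulated.sorted.bam
```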

Step 3: Downsample BAM Files

Downsampling is performed using Picard DownsampleSam. You can install Picard locally or use a Singularity container:

Step 4: Produce Diluted Samples

To produce diluted samples, use the following command, changing the parameter P to simulate different coverages (e.g., 0.1, 0.3, 0.5, 0.7):

    java -jar picard.jar DownsampleSam \
        I=input.bam \
        O=downsampled.bam \
        P=0.5
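The per-coverage runs can be scripted as a loop. This dry-run sketch only prints the Picard invocations (the picard.jar path and input name are placeholders); remove the leading echo to actually execute them:

```shell
# Dry run: print one DownsampleSam command per dilution fraction.
# Drop the leading "echo" to execute; picard.jar and input.bam are placeholders.
for P in 0.1 0.3 0.5 0.7; do
    echo java -jar picard.jar DownsampleSam I=input.bam O="downsampled_P${P}.bam" P="$P"
done
```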

Case Study 1 - Evaluation of SAMURAI on simulated data: Dilution of normal samples to build the Panel of normals (PoN) for liquid biopsy test

The script download_normal_gatk.sh can be used to download GATK data to build a simulated panel of normals. The data then need to be downsampled to different coverages.

The script automatically downloads three Singularity images (sambamba, samtools, and bedtools) that are needed for the in-silico dilution.

The function Subsample takes as input:
1. input_bam: the original downloaded normal BAM file (SM-74NEG.bam)
2. desired_read_count: the desired read count for subsampling
3. output_bam: the final diluted normal BAM file
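One way a desired read count might translate into a subsampling fraction, sketched with placeholder numbers (the actual logic lives inside download_normal_gatk.sh and may differ):

```shell
# Derive the fraction of reads to keep from the desired read count.
# total_reads would normally come from the BAM itself, e.g.:
#   samtools view -c -F 0x900 input.bam
total_reads=100000000        # placeholder value
desired_read_count=5000000   # placeholder value
frac=$(awk -v d="$desired_read_count" -v t="$total_reads" 'BEGIN { printf "%.4f", d / t }')
echo "$frac"                 # prints 0.0500
```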

Within the script, you can adjust the following parameters:
- CORES: the number of cores to use
- READ_COUNT: the number of reads for subsampling
- NUM_SAMPLES: the number of samples to generate

The script then converts the diluted samples from BAM back to FASTQ format.
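A conversion along these lines could be done with samtools (one of the images the script downloads); the file names here are placeholders, and the script's own commands may differ:

```shell
# Name-collate the diluted BAM, then split it into paired FASTQ files.
# diluted.bam and the output names are placeholders.
samtools collate -u -O diluted.bam \
    | samtools fastq -1 diluted_R1.fastq.gz -2 diluted_R2.fastq.gz \
          -0 /dev/null -s /dev/null -n -
```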

You can use the script by launching bash download_normal_gatk.sh after adjusting the parameters as you like. Alternatively, you can download the data and Singularity images on your own and use the different parts of the script separately.

Owner

  • Name: DIncalciLab
  • Login: DIncalciLab
  • Kind: organization
  • Location: Italy

GitHub Events

Total
  • Push event: 7
Last Year
  • Push event: 7