https://github.com/dincalcilab/samurai_paper_scripts

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.3%) to scientific vocabulary
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: DIncalciLab
  • License: MIT
  • Language: Shell
  • Default Branch: main
  • Size: 15.6 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.md

SAMURAIpaperscripts

This repository includes all the scripts used for the paper "SAMURAI: Shallow Analysis of copy nuMber Using a Reproducible And Integrated bioinformatics pipeline".

Case Study 1 - Evaluation of SAMURAI on simulated data: Download and dilution of test data from Smolander et al.

Step 1: Download Simulated Sample

The original simulated sample files (simulated_L001_R1_001.fastq.gz, simulated_L001_R2_001.fastq.gz) can be downloaded from Zenodo.

Step 2: Align FASTQ Files

Align the downloaded FASTQ files to hg38 using BWA-MEM. You can use the following Singularity container for BWA-MEM.
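A minimal alignment sketch, assuming bwa and samtools are available (whether locally or inside the Singularity container); the reference path, thread count, and output names are placeholders, not taken from the original scripts:

```shell
# Align paired-end reads to hg38 with BWA-MEM, then coordinate-sort and index.
# hg38.fa, the thread count (-t/-@ 8), and output names are placeholders.
bwa mem -t 8 hg38.fa \
    simulated_L001_R1_001.fastq.gz simulated_L001_R2_001.fastq.gz \
    | samtools sort -@ 8 -o simulated.sorted.bam -
samtools index simulated.sorted.bam
```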

Step 3: Downsample BAM Files

Downsampling is performed using Picard DownsampleSam. You can install Picard locally or use a Singularity container:

Step 4: Produce Diluted Samples

To produce diluted samples, use the following command, changing the parameter P to simulate different coverages (e.g., 0.1, 0.3, 0.5, 0.7):

    java -jar picard.jar DownsampleSam \
        I=input.bam \
        O=downsampled.bam \
        P=0.5
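The per-coverage runs can be scripted as a loop. This dry-run sketch only prints the Picard invocations (the picard.jar path and input name are placeholders); remove the leading echo to actually execute them:

```shell
# Dry run: print one DownsampleSam command per dilution fraction.
# Drop the leading "echo" to execute; picard.jar and input.bam are placeholders.
for P in 0.1 0.3 0.5 0.7; do
    echo java -jar picard.jar DownsampleSam I=input.bam O="downsampled_P${P}.bam" P="$P"
done
```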

Case Study 1 - Evaluation of SAMURAI on simulated data: Dilution of normal samples to build the Panel of normals (PoN) for liquid biopsy test

The script download_normal_gatk.sh can be used to download GATK data to build a simulated panel of normals. The data then need to be downsampled to different coverages.

The script automatically downloads three Singularity images (sambamba, samtools, and bedtools) that are needed for the in-silico dilution.

The function Subsample takes as input:
1. input_bam: the original downloaded normal BAM file (SM-74NEG.bam)
2. desired_read_count: the desired read count for subsampling
3. output_bam: the final diluted normal BAM file
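One way a desired read count might translate into a subsampling fraction, sketched with placeholder numbers (the actual logic lives inside download_normal_gatk.sh and may differ):

```shell
# Derive the fraction of reads to keep from the desired read count.
# total_reads would normally come from the BAM itself, e.g.:
#   samtools view -c -F 0x900 input.bam
total_reads=100000000        # placeholder value
desired_read_count=5000000   # placeholder value
frac=$(awk -v d="$desired_read_count" -v t="$total_reads" 'BEGIN { printf "%.4f", d / t }')
echo "$frac"                 # prints 0.0500
```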

Within the script, you can adjust the following parameters:
- CORES: the number of cores to use
- READ_COUNT: the number of reads for subsampling
- NUM_SAMPLES: the number of samples to generate

The script then converts the diluted samples from BAM back to FASTQ format.
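A conversion along these lines could be done with samtools (one of the images the script downloads); the file names here are placeholders, and the script's own commands may differ:

```shell
# Name-collate the diluted BAM, then split it into paired FASTQ files.
# diluted.bam and the output names are placeholders.
samtools collate -u -O diluted.bam \
    | samtools fastq -1 diluted_R1.fastq.gz -2 diluted_R2.fastq.gz \
          -0 /dev/null -s /dev/null -n -
```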

You can use the script by launching bash download_normal_gatk.sh after adjusting the parameters as you like. Alternatively, you can download the data and Singularity images on your own and use the different parts of the script separately.

Owner

  • Name: DIncalciLab
  • Login: DIncalciLab
  • Kind: organization
  • Location: Italy

GitHub Events

Total
  • Push event: 7
Last Year
  • Push event: 7