https://github.com/dincalcilab/samurai_paper_scripts
Repository
Basic Info
- Host: GitHub
- Owner: DIncalciLab
- License: mit
- Language: Shell
- Default Branch: main
- Size: 15.6 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
SAMURAI_paper_scripts
This repository includes all the scripts used for the paper "SAMURAI: Shallow Analysis of copy nuMber Using a Reproducible And Integrated bioinformatics pipeline".
Case Study 1 - Evaluation of SAMURAI on simulated data: Download and dilution of test data from Smolander et al.
Step 1: Download Simulated Sample
The original simulated sample files (simulated_L001_R1_001.fastq.gz, simulated_L001_R2_001.fastq.gz) can be downloaded from Zenodo.
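Before aligning, it can help to confirm that the download is intact and to record how many reads each file contains, since that number is useful later when choosing downsampling fractions. A minimal sketch, assuming bash, gzip and awk are available; `count_fastq_reads` is a hypothetical helper, not part of the repository:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the repository): check that a gzipped
# FASTQ is intact, then print its number of reads (4 lines per record).
count_fastq_reads() {
  local fq=$1
  gzip -t "$fq" || { echo "corrupt or incomplete gzip: $fq" >&2; return 1; }
  # Exit non-zero if the line count is not a multiple of 4 (truncated record).
  gzip -dc "$fq" | awk 'END { if (NR % 4 != 0) exit 1; print NR / 4 }'
}
```

For example, running `count_fastq_reads` on simulated_L001_R1_001.fastq.gz and simulated_L001_R2_001.fastq.gz should report the same number for both files if the pair is complete.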
Step 2: Align FASTQ Files
Align the downloaded FASTQ files to hg38 using BWA-MEM. You can use the following Singularity container for BWA-MEM.
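The alignment step could look like the following minimal sketch, assuming the hg38 FASTA has already been indexed with `bwa index` and that `bwa` and `samtools` are on the PATH (or run inside the container). The hypothetical `align_cmd` helper only prints the pipeline so it can be inspected before running; the thread count and file names are illustrative assumptions, not the paper's settings.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a BWA-MEM + samtools pipeline that aligns a
# paired-end FASTQ pair, coordinate-sorts the alignments and indexes the BAM.
# Nothing is executed; pipe the output to bash to run it.
align_cmd() {
  local ref=$1 r1=$2 r2=$3 out_bam=$4 threads=${5:-8}
  echo "bwa mem -t $threads $ref $r1 $r2 | samtools sort -@ $threads -o $out_bam - && samtools index $out_bam"
}
```

Usage: `align_cmd hg38.fa simulated_L001_R1_001.fastq.gz simulated_L001_R2_001.fastq.gz simulated.bam 8 | bash`. Sorting and indexing are included because most downstream BAM tools expect a coordinate-sorted, indexed file.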
Step 3: Downsample BAM Files
Downsampling is performed using Picard DownsampleSam. You can install Picard locally or use a Singularity container:
Step 4: Produce Diluted Samples
To produce diluted samples, use the following command, changing the parameter P (the probability of keeping each read) to simulate different coverages (e.g., 0.1, 0.3, 0.5, 0.7):
```shell
java -jar picard.jar DownsampleSam \
  I=input.bam \
  O=downsampled.bam \
  P=0.5
```
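Because P is the probability of keeping each read (DownsampleSam keeps or drops all reads of a template together), the resulting coverage is roughly P times the original coverage. The sketch below uses two hypothetical helpers, not taken from the repository: `downsample_p` derives P from a measured and a target coverage, and `dilution_cmds` prints one DownsampleSam command per target. The original coverage is assumed to be measured beforehand (for example from the meandepth column of `samtools coverage`); file names and coverage values are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: fraction of reads to keep so that a BAM at coverage
# ORIG ends up at roughly coverage TARGET (P = TARGET / ORIG).
downsample_p() {
  awk -v orig="$1" -v target="$2" 'BEGIN {
    if (orig <= 0 || target <= 0 || target > orig) exit 1
    printf "%.4f\n", target / orig
  }'
}

# Hypothetical helper: print one Picard DownsampleSam command per target
# coverage. RANDOM_SEED is set explicitly so reruns select the same reads.
dilution_cmds() {
  local in_bam=$1 orig_cov=$2; shift 2
  local target p
  for target in "$@"; do
    p=$(downsample_p "$orig_cov" "$target") || return 1
    echo "java -jar picard.jar DownsampleSam I=$in_bam O=${in_bam%.bam}_${target}x.bam P=$p RANDOM_SEED=1"
  done
}
```

For example, `dilution_cmds input.bam 1.0 0.1 0.3 0.5 0.7` prints four commands with P=0.1000, 0.3000, 0.5000 and 0.7000.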
Case Study 1 - Evaluation of SAMURAI on simulated data: Dilution of normal samples to build the panel of normals (PoN) for the liquid biopsy test
The script download_normal_gatk.sh can be used to download GATK data to build a simulated panel of normals.
The data must then be downsampled to different coverages.
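To translate a target coverage into a read count (for example for READ_COUNT or desired_read_count), the usual approximation relates mean coverage $C$ to the number of reads $N$, the read length $L$ and the genome size $G$. A worked example, assuming 150 bp reads and an hg38 size of roughly 3.1 Gb (both assumptions, not values from the paper):

$$
C = \frac{N\,L}{G}
\quad\Longrightarrow\quad
N = \frac{C\,G}{L} = \frac{0.1 \times 3.1\times10^{9}}{150} \approx 2.07\times10^{6}\ \text{reads}
$$

For paired-end data $N$ counts individual reads, so 0.1x corresponds to about 1.03 million read pairs under these assumptions.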
The script automatically downloads three Singularity images (sambamba, samtools and bedtools) that are needed for the in-silico dilution.
The function Subsample takes as input:
1. input_bam: Original downloaded normal BAM file (SM-74NEG.bam)
2. desired_read_count: Desired read count for subsampling
3. output_bam: Final diluted normal BAM file
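The repository's own Subsample implementation is not reproduced here. As a hypothetical sketch of one common approach with samtools: count the reads, convert the desired count into a fraction, and pass it to `samtools view -s`, whose integer part is the random seed and whose fractional part is the fraction of templates kept. `subsample_arg` and `subsample_sketch` are illustrative names; only `subsample_arg` runs without samtools installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build the SEED.FRACTION argument for `samtools view -s`
# from the total and desired read counts. Fails if no subsampling is needed
# or if the fraction cannot be written as 0.xxxxxx.
subsample_arg() {
  local seed=$1 total=$2 desired=$3
  awk -v s="$seed" -v t="$total" -v d="$desired" 'BEGIN {
    if (t <= 0 || d <= 0 || d >= t) exit 1
    frac = sprintf("%.6f", d / t)               # e.g. "0.250000"
    if (frac + 0 == 0 || frac + 0 >= 1) exit 1  # rounds to 0 or 1
    printf "%d.%s\n", s, substr(frac, 3)        # e.g. "42.250000"
  }'
}

# Hypothetical Subsample-like step (requires samtools on the PATH).
subsample_sketch() {
  local input_bam=$1 desired_read_count=$2 output_bam=$3 seed=${4:-42}
  local total arg
  # Count primary records only: -F 0x900 excludes secondary (0x100)
  # and supplementary (0x800) alignments.
  total=$(samtools view -c -F 0x900 "$input_bam")
  arg=$(subsample_arg "$seed" "$total" "$desired_read_count") || return 1
  samtools view -b -s "$arg" -o "$output_bam" "$input_bam"
}
```

Because samtools selects reads by hashing the read name, both mates of a pair are kept or dropped together, so the output holds roughly desired_read_count reads rather than exactly that many.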
Within the script, you can adjust the following parameters:
- CORES: Number of cores to use
- READ_COUNT: Number of reads for subsampling
- NUM_SAMPLES: Number of samples to generate
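One way these three parameters might interact is sketched below: each simulated normal gets a different seed, so the samples are distinct random subsets of the same BAM rather than identical copies. This is a hypothetical dry run, not the script's code; the variable names mirror the README, but the values, the `plan_dilutions` helper and the output names are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Parameter names mirror the README; the values are placeholders.
CORES=4
READ_COUNT=2000000
NUM_SAMPLES=3

# Hypothetical sketch: print one samtools subsampling command per simulated
# normal. The seed (integer part of -s) changes per sample; the fraction
# (decimal part) is READ_COUNT / total reads in the input BAM.
plan_dilutions() {
  local input_bam=$1 total_reads=$2 frac i
  frac=$(awk -v n="$READ_COUNT" -v t="$total_reads" 'BEGIN {
    if (n <= 0 || n >= t) exit 1
    f = sprintf("%.6f", n / t)
    if (f + 0 == 0 || f + 0 >= 1) exit 1  # not representable as 0.xxxxxx
    printf "%s", f
  }') || return 1
  for ((i = 1; i <= NUM_SAMPLES; i++)); do
    # ${frac#0} drops the leading 0, so seed 1 and 0.250000 give 1.250000.
    echo "samtools view -@ $CORES -b -s ${i}${frac#0} -o normal_${i}.bam $input_bam"
  done
}
```

For example, with 8,000,000 reads in SM-74NEG.bam, `plan_dilutions SM-74NEG.bam 8000000` prints three commands using -s 1.250000, 2.250000 and 3.250000.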
The script then converts the diluted samples from BAM to FASTQ format.
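A paired-end BAM could be converted back to FASTQ as in the following hypothetical dry-run sketch, which uses `bedtools bamtofastq` because bedtools is one of the images the script downloads; whether the script uses exactly these commands is an assumption. bedtools expects mates to be adjacent, hence the name sort first. `bam_to_fastq_cmds` and the output names are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the commands that turn a diluted BAM into a
# pair of gzipped FASTQ files. Nothing is executed; pipe to bash to run.
bam_to_fastq_cmds() {
  local bam=$1 threads=${2:-4}
  local prefix=${bam%.bam}
  # Name-sort so that the two mates of each pair are adjacent.
  echo "samtools sort -n -@ $threads -o ${prefix}.namesorted.bam $bam"
  # Write read 1 and read 2 to separate files.
  echo "bedtools bamtofastq -i ${prefix}.namesorted.bam -fq ${prefix}_R1.fastq -fq2 ${prefix}_R2.fastq"
  echo "gzip ${prefix}_R1.fastq ${prefix}_R2.fastq"
}
```

Usage: `bam_to_fastq_cmds normal_1.bam 4 | bash`.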
To run the script, adjust the parameters as needed and launch bash download_normal_gatk.sh. Alternatively, you can download the data and Singularity images yourself and run the individual parts of the script separately.
Owner
- Name: DIncalciLab
- Login: DIncalciLab
- Kind: organization
- Location: Italy
- Repositories: 1
- Profile: https://github.com/DIncalciLab