https://github.com/databio/pepatac_paper_data

Information on reproducing plots from the PEPATAC paper

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

- CITATION.cff file: not detected
- codemeta.json file: detected
- .zenodo.json file: not detected
- DOI references: not detected
- Academic publication links: not detected
- Academic email domains: not detected
- Institutional organization owner: not detected
- JOSS paper metadata: not detected
- Scientific vocabulary similarity: low (11.4%)

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: databio
Language: R
Default Branch: master
Size: 43.9 KB

Statistics

Stars: 2
Watchers: 3
Forks: 0
Open Issues: 0
Releases: 0

Created over 5 years ago · Last pushed about 5 years ago

Metadata Files

Readme

PEPATAC paper analyses

Get the data files

The data used in the paper are available from public repositories. The metadata/ subfolder includes the "paper_sra_accessions.txt" file, which lists the Sequence Read Archive (SRA) accession numbers.

To obtain the files en masse, pass the entire file to the fasterq-dump tool from NCBI's sra-tools:

cat paper_sra_accessions.txt | xargs -n1 fasterq-dump -p -O /path/to/output_dir

To simplify the downstream configuration files, you can also set an environment variable (SRAFQ) that points to the output directory containing your FASTQ files:

export SRAFQ=/path/to/output_dir
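If some downloads fail partway through, it can help to re-run fasterq-dump only for the accessions that are still missing. The helper below is a minimal sketch and is not part of this repository: the function name missing_fastq is hypothetical, and it assumes fasterq-dump's default output naming (<accession>.fastq for single-end data, <accession>_1.fastq and <accession>_2.fastq for paired-end data). It only checks that a file exists, not that it is complete.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print each accession in
# an accession list that has no FASTQ output yet in the given directory.
# ASSUMPTION: fasterq-dump's default naming, i.e. <acc>.fastq for single-end
# data and <acc>_1.fastq / <acc>_2.fastq for paired-end data.
# Only file existence is checked, not whether a file is complete.
missing_fastq() {
    local accessions_file=$1 outdir=$2 acc
    # The "|| [ -n "$acc" ]" also handles a final line without a newline.
    while IFS= read -r acc || [ -n "$acc" ]; do
        acc=${acc%$'\r'}                # drop a Windows-style carriage return
        [ -z "$acc" ] && continue       # skip blank lines
        if [ ! -e "$outdir/$acc.fastq" ] && [ ! -e "$outdir/${acc}_1.fastq" ]; then
            printf '%s\n' "$acc"
        fi
    done < "$accessions_file"
}
```

For example, to retry only the missing accessions with the same fasterq-dump options as above:

missing_fastq paper_sra_accessions.txt "$SRAFQ" | xargs -n1 fasterq-dump -p -O "$SRAFQ"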

Run the pipeline

After downloading, process the samples with the pipeline:

looper run paper_config.yaml
looper run paper_none_config.yaml

When processing completes, generate summary statistics:

looper report paper_config.yaml
looper report paper_none_config.yaml

This produces two sets of outputs, one with prealignments and one without, for downstream comparison.

To reproduce the paper's plots in R, follow the included R Markdown file.

Prealignment comparisons

Producing the prealignment timing comparison plots requires five steps.

1. Obtain source files

The mitochondrial (mtDNA) and human nuclear genome (hg38) aligning reads were originally derived from the following GEO accessions:

- GSM2471255
- GSM2471300
- GSM2471249
- GSM2471269
- GSM2471245

2. Run source files through PEPATAC and keep prealignment BAM files

Because we want to extract mitochondrial reads, we keep all prealignment files; by default, the pipeline removes them to save disk space. The included "source_library_config.yaml" is our PEP for these samples.

looper run source_library_config.yaml -x " --keep"

3. Extract mitochondrial and nuclear genome aligning reads

After these samples finish, generate mtDNA- and hg38-aligning read sets of every size needed to combine them in various ratios, producing 10-100% mixtures with 10M to 200M total reads per mixture.

./generate_libraries.sh "mtDNA_reads" "hg38_reads" "/path/to/source_library_output/results_pipeline/"

This step is best run on a cluster or on a machine with more than 100 GB of available RAM.
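The read budget for step 3 comes down to simple arithmetic. The sketch below assumes that a mixture's percentage is the share of mtDNA-aligning reads, with hg38-aligning reads making up the rest, and it uses a hypothetical grid of totals (10M, 50M, 100M, 200M) in 10% steps; the README only gives the 10-100% and 10M-200M ranges, and generate_libraries.sh may compute its counts differently. The function names mixture_counts and print_mixture_table are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of generate_libraries.sh).
# ASSUMPTION: a mixture's percentage is the share of mtDNA-aligning reads;
# hg38-aligning reads make up the remainder.

# Print "<mtDNA reads> <hg38 reads>" for one mixture.
mixture_counts() {
    local total=$1 pct_mt=$2
    local mt=$(( total * pct_mt / 100 ))  # integer division rounds down
    local nuc=$(( total - mt ))           # remainder keeps the total exact
    printf '%s %s\n' "$mt" "$nuc"
}

# Tabulate read counts over a HYPOTHETICAL grid: totals of 10M, 50M, 100M,
# and 200M reads, with mtDNA shares from 10% to 100% in steps of 10.
print_mixture_table() {
    local total pct
    printf 'total\tpct_mt\tmt_reads\thg38_reads\n'
    for total in 10000000 50000000 100000000 200000000; do
        for pct in $(seq 10 10 100); do
            # Split "<mt> <nuc>" into $1 and $2.
            set -- $(mixture_counts "$total" "$pct")
            printf '%s\t%s\t%s\t%s\n' "$total" "$pct" "$1" "$2"
        done
    done
}
```

Under these assumptions, the largest mixtures set the size of each source pool: a 100% mixture at 200M total reads needs 200M mtDNA reads, and a 10% mixture of the same size needs 180M hg38 reads.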

4. Analyze using prealignments and without

Set an environment variable named DATA that points to the directory containing your generated libraries.

export DATA=/your/path/to/mtDNA_reads/
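Assuming the prealignment configurations locate the generated libraries through DATA, a quick check before launching looper can catch an unset or mistyped path. This is a minimal sketch with a hypothetical function name (check_data_dir); it only verifies that DATA is set, is a directory, and contains at least one non-hidden entry, without assuming how the generated libraries are named.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this repository): confirm that
# DATA is set, is a directory, and is not empty before launching looper.
check_data_dir() {
    if [ -z "${DATA:-}" ]; then
        echo "DATA is not set" >&2
        return 1
    fi
    if [ ! -d "$DATA" ]; then
        echo "DATA=$DATA is not a directory" >&2
        return 1
    fi
    # An unmatched glob stays literal, so test whether the first match exists.
    set -- "$DATA"/*
    if [ ! -e "$1" ]; then
        echo "DATA=$DATA is empty" >&2
        return 1
    fi
}
```

For example:

check_data_dir && looper run prealignment_config.yaml --compute cores=8 mem=16000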

Run each version of the PEP project using the same compute resources:

looper run prealignment_config.yaml --compute cores=8 mem=16000
looper run prealignment_none_config.yaml --compute cores=8 mem=16000

5. Produce comparison plots

Adjust the paths below, including the $PROCESSED prefix, to match the locations of your configuration files and pipeline results:

Rscript PEPATAC_profile_aggregator.R /path/to/prealignment_config.yaml /path/to/prealignment_none_config.yaml $PROCESSED/pepatac/prealignment_comparison/yes/results_pipeline/ $PROCESSED/pepatac/prealignment_comparison/no/results_pipeline/ /path/to/your/output_dir

Owner

Name: Databio
Login: databio
Kind: organization
Location: University of Virginia

Website: https://databio.org
Repositories: 88
Profile: https://github.com/databio

Solving problems in computational biology

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
