rnmp_hotspots

Finding common and highly incorporated rNMP locations using rNMP Enrichment Factor

https://github.com/dkundnani/rnmp_hotspots

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 15 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.8%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: DKundnani
  • License: mit
  • Language: R
  • Default Branch: main
  • Size: 92.8 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 3
Created almost 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

rNMP Hotspots

Identify rNMP hotspots (locations of highly abundant rNMPs) using a percentage or Poisson p-value threshold, and map them onto the respective reference genomes.

Table of Contents
  1. Installation
  2. Usage
  3. Contributing
  4. License
  5. Contact
  6. Citations

Installation

Getting the code

Clone the development version from GitHub:

```sh
git clone https://github.com/DKundnani/rNMP_hotspots.git
```

Creating the environment with required dependencies

```sh
conda env create --name rNMPhotspots_env --file /rNMP_hotspots/yml/r_env.yml
```

Additional Dependencies

  • Input files (BED) containing single-nucleotide locations, mainly for rNMP data (other single-nucleotide data can also be tried).
  • Reference genome files (.fa and .fai) of the organism being used (also used to generate the BED files).
  • BAM files (optional) from DNA-seq pipelines. See https://github.com/DKundnani/Omics-pipelines
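Since the input BED files are expected to hold single-nucleotide locations, a quick sanity check before running the pipeline can catch malformed input. The sketch below is not part of the repository: `check_single_nt_bed` is a hypothetical helper, and it assumes standard BED coordinates (0-based start, exclusive end), so that each record satisfies end - start = 1.

```bash
# Hypothetical helper (not in the repository): list BED records that are not
# exactly one base wide. Assumes 0-based, end-exclusive BED coordinates.
check_single_nt_bed() {
  awk -F'\t' '
    /^(#|track|browser)/ { next }   # skip comment and header lines
    NF < 3 || ($3 - $2) != 1 { bad++; printf "line %d: %s\n", NR, $0 }
    END { exit (bad > 0) }          # non-zero exit status if any record failed
  ' "$1"
}

# Example usage:
# check_single_nt_bed path/to/library.bed || echo "non single-nucleotide records found"
```

The function prints the offending lines and returns a non-zero status, so it can gate the later steps in a script.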

Usage

Defining variables

```bash
lib=path/to/AGS/ribo-DNA-order  # first column: library name; third column: basename of BAM files from the DNA-seq pipeline
bed=path/to/AGS/bed
dna=path/to/AGS/DNAseq/aligned
normbed=path/to/AGS/norm_counts
script=path/to/AGS/rNMP_hotspots/scripts
genome=path/to/reference/sacCer2/sacCer2-nucl.fa.fai  # size file of the genome
```
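Because every later step depends on these paths, it can help to verify them up front. The following is a minimal sketch, not something the repository provides: `require_path` is a hypothetical helper, and which variables are files and which are directories is inferred from how they are used in this README.

```bash
# Hypothetical helper (not in the repository): fail early if a path is missing.
# $1 is "f" for a regular file or "d" for a directory; $2 is the path to check.
require_path() {
  if [ "$1" = f ] && [ -f "$2" ]; then return 0; fi
  if [ "$1" = d ] && [ -d "$2" ]; then return 0; fi
  echo "missing ($1): $2" >&2
  return 1
}

# Example usage with the variables defined above:
# require_path f "$lib" && require_path f "$genome" &&
#   require_path d "$bed" && require_path d "$dna" && require_path d "$script"
```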

Normalization of bed files for coverage (optional)

```bash
conda activate rNMPhotspots_env  # activate the environment
mkdir $normbed                   # create the output directory

# Get coverage files from the BAM files
for sample in $(seq 1 6 | xargs -I aa echo Yaa); do samtools depth -a ${sample}.bam > ${sample}.cov & done
wait  # let the background samtools jobs finish before the coverage files are read

while read -r line; do
  FS=$(echo $line | tr " " "\t" | cut -f1)
  sam=$(echo $line | sed 's/\r$//' | awk '{print $3}')
  Rscript $script/count_norm.R -r $bed/*.bed -c ${dna}/${sam}.cov -g $genome -o $normbed
done < $lib > $normbed/norm.log
```
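The loop above reads the library name from the first column of `$lib` and the BAM basename from the third, stripping a trailing carriage return along the way. As an illustration only, the same parsing can be done in one pass; `read_library_table` is a hypothetical helper, and it assumes the order file is whitespace-separated with at least three columns per row.

```bash
# Hypothetical helper (not in the repository): print "library<TAB>bam_basename"
# for each row of the order file.
read_library_table() {
  # tr removes Windows carriage returns; awk splits on any run of whitespace.
  tr -d '\r' < "$1" | awk 'NF >= 3 { print $1 "\t" $3 }'
}

# Example usage:
# read_library_table "$lib" | while read -r library sam; do
#   echo "$library uses ${dna}/${sam}.cov"
# done
```

Rows with fewer than three columns are skipped silently, which may or may not be the behavior you want for a malformed order file.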

Generating matrix

```bash
mkdir $normbed/hotspots  # place files into the normbed folder
cd $normbed

# Getting files per genotype
filelist=$(cut -f3 files | uniq | tr "\n" "\t")
for f in $filelist; do grep $f files > ${f}_files; done

thresh=2  # minimum of 2 libraries in each subtype used as threshold
# ${f}_files contains the library information per genotype, grouped for finding hotspots
for f in $filelist; do $script/dfmatrix.R -f ${f}_files -a -t $thresh -c 8 -s -o ${f}_files${thresh}commonEF.tsv & done
wait  # let the background jobs finish before moving the output
```
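One caveat with `grep $f files` is that it matches substrings anywhere on the line, so a genotype label that is a prefix of another label would pull in rows from both. A hedged alternative is to split on an exact match of the third column. `split_by_genotype` below is a hypothetical helper, and it assumes `files` is tab-separated with the genotype in column 3, as implied by `cut -f3`.

```bash
# Hypothetical helper (not in the repository): write one <genotype>_files table
# per distinct value of column 3, using exact matching on that column.
split_by_genotype() {
  awk -F'\t' '{ out = $3 "_files"; print > out }' "$1"
}

# Example usage (run inside $normbed, where the `files` table lives):
# split_by_genotype files
```

The output files are created in the current directory with the same `<genotype>_files` naming that the loop above produces.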

Getting common hotspots for each genotype using different thresholds and visualization

```bash
mv *.tsv ./hotspots/
cd hotspots
top=25  # getting top 25 hotspots
for f in $filelist; do Rscript $script/plothotspots.R -m ${f}_files*all -c -g $genome -r BSgenome.Scerevisiae.UCSC.sacCer2 -t $top -v -o . & done
wait

# Getting top fraction of hotspots
for thresh in 0.05 0.02 0.01; do
  for f in $filelist; do
    Rscript $script/plothotspots.R -m ${f}_files*all -c -g $genome -r BSgenome.Scerevisiae.UCSC.sacCer2 -t $thresh -o . &
  done
done
wait
```

GGseqlogo plots (MEME plots)

```bash
for thresh in 0.05 0.02 0.01; do
  for f in $filelist; do
    Rscript $script/meme.R -f ${f}_files*${thresh}*top* -c 9 &  # ggseqlogo plots
  done
done
```

Additional visualizations

See stacked barplots for composition in RPA-wrapper

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Deepali L. Kundnani - deepali.kundnani@gmail.com LinkedIn

Citations

  • Light-strand bias and enriched zones of embedded ribonucleotides are associated with DNA replication and transcription in the human-mitochondrial genome. Penghao Xu, Taehwan Yang, Deepali L Kundnani, Mo Sun, Stefania Marsili, Alli L Gombolay, Youngkyu Jeon, Gary Newnam, Sathya Balachander, Veronica Bazzani, Umberto Baccarani, Vivian S Park, Sijia Tao, Adriana Lori, Raymond F Schinazi, Baek Kim, Zachary F Pursell, Gianluca Tell, Carlo Vascotto, Francesca Storici. Nucleic Acids Research, 2023, gkad1204. https://doi.org/10.1093/nar/gkad1204
  • Distinct features of ribonucleotides within genomic DNA in Aicardi-Goutières syndrome (AGS)-ortholog mutants of Saccharomyces cerevisiae. Deepali L. Kundnani, Taehwan Yang, Alli L. Gombolay, Kuntal Mukherjee, Gary Newnam, Chance Meers, Zeel H. Mehta, Celine Mouawad, Francesca Storici. bioRxiv 2023.10.02.560505. https://doi.org/10.1101/2023.10.02.560505
  • Kundnani, D. (2024). rNMP_hotspots:2.0.0 (2.0.0). Zenodo. https://doi.org/10.5281/zenodo.8152090

Owner

  • Name: Deepali Kundnani
  • Login: DKundnani
  • Kind: user
  • Location: Atlanta
  • Company: Georgia Institute of Technology

Ph.D. Student, Bioinformatics

Citation (CITATION.cff)

cff-version: 2.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Kundnani"
  given-names: "Deepali"
  orcid: "https://orcid.org/0000-0002-2289-3554"
title: "rNMP_hotspots:2.0.0"
version: 2.0.0
date-released: 2024-01-03
url: "https://github.com/DKundnani/rNMP_hotspots/"
