rnmp_hotspots
Finding common and highly incorporated rNMP locations using rNMP Enrichment Factor
Repository
Basic Info
- Host: GitHub
- Owner: DKundnani
- License: mit
- Language: R
- Default Branch: main
- Size: 92.8 KB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
rNMP Hotspots
Scripts to identify rNMP hotspots (locations of highly abundant rNMPs) using a percentage or Poisson p-value threshold, and to map them onto the respective reference genomes.
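The Poisson criterion can be illustrated with a minimal sketch. It assumes a tab-separated table of per-position rNMP counts (chrom, start, end, count) and takes the expected count λ to be the mean count over the listed positions; both choices are assumptions for illustration and may differ from how the scripts in this repository compute the threshold. The hypothetical helper `poisson_tail` appends the upper-tail probability P(X ≥ count) to each row.

```bash
# Sketch (assumption): Poisson upper-tail p-value per position, with the
# expected count lambda taken as the mean count over all input rows.
# Input (stdin): tab-separated chrom, start, end, count.
# Output: each input row plus a 5th column with P(X >= count).
# Note: exp(-lambda) underflows for very large lambda; intended for small counts.
poisson_tail() {
  awk 'BEGIN { FS = OFS = "\t" }
       { row[NR] = $0; cnt[NR] = $4; total += $4 }
       END {
         if (NR == 0) exit
         lambda = total / NR                 # assumed expected count
         for (r = 1; r <= NR; r++) {
           k = cnt[r]
           term = exp(-lambda)               # P(X = 0)
           cdf = 0
           for (i = 0; i < k; i++) {         # accumulate P(X = 0..k-1)
             cdf += term
             term *= lambda / (i + 1)        # P(X = i + 1)
           }
           p = 1 - cdf
           if (p < 0) p = 0                  # guard against rounding error
           print row[r], p
         }
       }'
}
```

For example, `poisson_tail < counts.tsv | awk -F'\t' '$5 < 0.01'` would keep positions with p < 0.01 (`counts.tsv` is a placeholder name).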
Installation
Getting the code
Clone the development version from GitHub:
```sh
git clone https://github.com/DKundnani/rNMP_hotspots.git
```
Creating the environment with required dependencies
```sh
conda env create --name rNMPhotspots_env --file rNMP_hotspots/yml/r_env.yml
```
Additional Dependencies
- Input BED files containing single-nucleotide locations, mainly for rNMP data (other single-nucleotide data can also be tried).
- Reference genome files (.fa and .fai) of the organism being studied (also used to generate the BED files).
- BAM files (optional) from DNA-seq pipelines; see https://github.com/DKundnani/Omics-pipelines
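Because the scripts expect single-nucleotide BED records, a quick sanity check can be useful before running them. The sketch below uses a hypothetical helper, `check_single_nt_bed`, which counts records that do not span exactly one base (end - start != 1) or that fall outside the chromosome lengths listed in the `.fai` file. It assumes a tab-separated BED and prints 0 when every record passes.

```bash
# Sketch: count BED records that are not single-nucleotide (end - start != 1)
# or that fall outside the chromosome lengths in a .fai index.
# Usage: check_single_nt_bed file.bed genome.fa.fai   (prints 0 if all pass)
check_single_nt_bed() {
  local bed=$1 fai=$2
  awk 'BEGIN { FS = "\t" }
       NR == FNR { size[$1] = $2; next }             # .fai: name, length
       ($3 - $2 != 1) { bad++; next }                 # spans more or less than 1 nt
       !($1 in size) || $3 > size[$1] + 0 { bad++ }   # unknown chrom or past its end
       END { print bad + 0 }' "$fai" "$bed"
}
```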
Usage
Defining variables
```bash
lib=path/to/AGS/ribo-DNA-order #Library table: 1st col library name, 3rd col basename of BAM files from the DNA-seq pipeline
bed=path/to/AGS/bed
dna=path/to/AGS/DNAseq/aligned
normbed=path/to/AGS/norm_counts
script=path/to/AGS/rNMP_hotspots/scripts
genome=path/to/reference/sacCer2/sacCer2-nucl.fa.fai #size (.fai index) file of the genome
```
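Before running the steps below, it may help to confirm that these paths exist. A minimal sketch with a hypothetical helper, `check_paths`:

```bash
# Sketch: warn about any path that does not exist yet.
# Returns 0 when every argument exists, 1 otherwise.
check_paths() {
  local p missing=0
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      printf 'missing: %s\n' "$p" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_paths "$lib" "$bed" "$dna" "$script" "$genome"`; `$normbed` is left out because it is created in the next step.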
Normalization of bed files for coverage (optional)
```bash
conda activate rNMPhotspots_env #activating environment
mkdir $normbed #Creating output directory
for sample in $(seq 1 6 | xargs -I aa echo Yaa); do samtools depth -a ${dna}/${sample}.bam > ${dna}/${sample}.cov & done #Get coverage file from bam files
wait #Let all coverage jobs finish before normalizing

while read -r line; do
  FS=$(echo $line | tr " " "\t" | cut -f1) #Library name (1st col)
  sam=$(echo $line | sed 's/\r$//' | awk '{print $3}') #BAM basename (3rd col)
  Rscript $script/count_norm.R -r $bed/*.bed -c ${dna}/${sam}.cov -g $genome -o $normbed
done < $lib > $normbed/norm.log
```
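The idea behind coverage normalization can be sketched as dividing each position's rNMP count by the DNA-seq read depth at that base. This is an assumption about what `count_norm.R` does, not its actual formula. The hypothetical helper `norm_by_depth` expects a `samtools depth -a` file (chrom, 1-based position, depth) and a BED-like count table (chrom, 0-based start, end, count); positions with zero or missing depth get `NA`.

```bash
# Sketch (assumption): per-base coverage normalization, dividing each
# position's rNMP count by the DNA-seq depth at that base.
# $1: output of `samtools depth -a` (chrom, 1-based position, depth)
# $2: count table (chrom, 0-based start, end, count), tab-separated
# Output: chrom, start, end, count/depth (NA when depth is 0 or missing)
norm_by_depth() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { depth[$1 ":" $2] = $3; next }   # load the depth table
       {
         d = depth[$1 ":" ($2 + 1)] + 0            # BED start is 0-based
         print $1, $2, $3, (d > 0 ? $4 / d : "NA")
       }' "$1" "$2"
}
```

Loading the whole depth table into memory is fine for a small genome such as sacCer2; larger genomes would call for a sorted merge instead.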
Generating matrix
```bash
mkdir $normbed/hotspots
cd $normbed #Place the "files" table (library information per genotype) in the normbed folder

#Getting files per genotype (3rd col of "files")
filelist=$(cut -f3 files | sort -u)
for f in $filelist; do awk -F'\t' -v g="$f" '$3 == g' files > ${f}_files; done

thresh=2 #Minimum 2 libraries in each subtype used as threshold
for f in $filelist; do Rscript $script/dfmatrix.R -f ${f}_files -a -t $thresh -c 8 -s -o ${f}_files_${thresh}_commonEF.tsv & done #files contain library information per genotype to be grouped for finding hotspots
wait
```
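The `thresh=2` setting keeps only positions shared by at least two libraries of a genotype. Assuming each library is a BED file of rNMP positions, that filter alone can be sketched with a hypothetical helper, `common_positions`. Strand is ignored here, and `dfmatrix.R` additionally builds the enrichment-factor matrix, which this sketch does not reproduce.

```bash
# Sketch (assumption): report positions present in at least MIN of the given
# BED files, each file counted at most once per position (strand ignored).
# Usage: common_positions MIN lib1.bed lib2.bed ...
# Output: chrom, start, end, number of files containing the position
common_positions() {
  local min=$1
  shift
  awk -v min="$min" 'BEGIN { FS = OFS = "\t" }
       FNR == 1 { file++ }                         # new input file
       {
         key = $1 OFS $2 OFS $3
         if (seen[key] != file) {                  # first hit in this file
           seen[key] = file
           n[key]++
         }
       }
       END { for (k in n) if (n[k] >= min) print k, n[k] }' "$@" |
    sort -k1,1 -k2,2n
}
```

For example, `common_positions 2 lib1.bed lib2.bed lib3.bed`, where the file names are placeholders.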
Getting common hotspots for each genotype using different thresholds, and visualizing them
```bash
mv *.tsv ./hotspots/
cd hotspots
top=25 #Getting top 25 hotspots
for f in $filelist; do Rscript $script/plothotspots.R -m ${f}_files*all -c -g $genome -r BSgenome.Scerevisiae.UCSC.sacCer2 -t $top -v -o . & done
wait

#Getting top fraction of hotspots
for thresh in 0.05 0.02 0.01 ; do
  for f in $filelist; do Rscript $script/plothotspots.R -m ${f}_files*all -c -g $genome -r BSgenome.Scerevisiae.UCSC.sacCer2 -t $thresh -o . & done
done
wait
```
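The `-t` values above are used in two ways: an integer (25) for the top N hotspots and a fraction (0.05, 0.02, 0.01) for the top share. A sketch of that selection on a headerless, tab-separated table, ranking rows by a numeric score column such as an enrichment factor; `top_hotspots` and its argument order are hypothetical, and `plothotspots.R` may rank and break ties differently.

```bash
# Sketch (assumption): keep the highest-scoring rows of a headerless,
# tab-separated table. T >= 1 keeps the top T rows; T < 1 keeps the top
# fraction T of rows (at least one row).
# Usage: top_hotspots T SCORE_COLUMN file.tsv
top_hotspots() {
  local t=$1 col=$2 file=$3 total n
  total=$(wc -l < "$file")
  if awk -v t="$t" 'BEGIN { exit !(t >= 1) }'; then
    n=$(awk -v t="$t" 'BEGIN { print int(t) }')
  else
    # small epsilon avoids losing a row to floating-point error (e.g. 0.29 * 100)
    n=$(awk -v t="$t" -v tot="$total" 'BEGIN { n = int(t * tot + 1e-9); print (n < 1 ? 1 : n) }')
  fi
  sort -t "$(printf '\t')" -k "${col},${col}nr" "$file" | head -n "$n"
}
```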
ggseqlogo plots (MEME plots)
```bash
for thresh in 0.05 0.02 0.01 ; do
  for f in $filelist; do Rscript $script/meme.R -f ${f}_files*${thresh}*top* -c 9 & done #ggseqlogo plots
done
wait
```
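Sequence logos summarize the nucleotides around each hotspot. Assuming the logos are built from a fixed window of K bases on each side of the rNMP (the window used by `meme.R` is not stated here), the regions could be prepared as below. The helpers `flank_regions` and `revcomp` are hypothetical; the regions use the `chrom:start-end` form accepted by `samtools faidx`, and minus-strand windows would need reverse-complementing.

```bash
# Sketch (assumption): convert single-nucleotide hotspot BED records into
# 1-based, inclusive regions (chrom:start-end) covering K bases on each side,
# clipped to the chromosome length from a .fai index.
# Usage: flank_regions K genome.fa.fai hotspots.bed
# Output: region, strand (column 6 of the BED, "+" if absent)
flank_regions() {
  local k=$1 fai=$2 bed=$3
  awk -v k="$k" 'BEGIN { FS = OFS = "\t" }
       NR == FNR { size[$1] = $2; next }           # .fai: name, length
       !($1 in size) { next }                      # skip unknown chromosomes
       {
         s = $2 + 1 - k                            # BED start is 0-based
         if (s < 1) s = 1
         e = $3 + k
         if (e > size[$1] + 0) e = size[$1] + 0
         strand = (NF >= 6 ? $6 : "+")
         print $1 ":" s "-" e, strand
       }' "$fai" "$bed"
}

# Reverse-complement each line of DNA read from stdin (for "-" strand windows).
revcomp() {
  tr 'ACGTacgt' 'TGCAtgca' |
    awk '{ out = ""; for (i = length($0); i >= 1; i--) out = out substr($0, i, 1); print out }'
}
```

A possible loop is `flank_regions 3 "$genome" hotspots.bed | while read -r region strand; do samtools faidx ref.fa "$region"; done`, where `hotspots.bed` and `ref.fa` are placeholders.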
Additional visualizations
For stacked bar plots of rNMP composition, see RPA-wrapper.
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE.txt for more information.
Contact
Deepali L. Kundnani - deepali.kundnani@gmail.com
Citations
* Light-strand bias and enriched zones of embedded ribonucleotides are associated with DNA replication and transcription in the human-mitochondrial genome.
Penghao Xu, Taehwan Yang, Deepali L Kundnani, Mo Sun, Stefania Marsili, Alli L Gombolay, Youngkyu Jeon, Gary Newnam, Sathya Balachander, Veronica Bazzani, Umberto Baccarani, Vivian S Park, Sijia Tao, Adriana Lori, Raymond F Schinazi, Baek Kim, Zachary F Pursell, Gianluca Tell, Carlo Vascotto, Francesca Storici
Nucleic Acids Research 2023, gkad1204, https://doi.org/10.1093/nar/gkad1204
* Distinct features of ribonucleotides within genomic DNA in Aicardi-Goutières syndrome (AGS)-ortholog mutants of Saccharomyces cerevisiae
Deepali L. Kundnani, Taehwan Yang, Alli L. Gombolay, Kuntal Mukherjee, Gary Newnam, Chance Meers, Zeel H. Mehta, Celine Mouawad, Francesca Storici
bioRxiv 2023.10.02.560505, https://doi.org/10.1101/2023.10.02.560505
* Kundnani, D. (2024). rNMP_hotspots:2.0.0 (2.0.0). Zenodo. https://doi.org/10.5281/zenodo.8152090
Owner
- Name: Deepali Kundnani
- Login: DKundnani
- Kind: user
- Location: Atlanta
- Company: Georgia Institute of Technology
- Website: dkundnani.bio
- Repositories: 1
- Profile: https://github.com/DKundnani
Ph.D. Student, Bioinformatics
Citation (CITATION.cff)
```yaml
cff-version: 2.0.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Kundnani"
    given-names: "Deepali"
    orcid: "https://orcid.org/0000-0002-2289-3554"
title: "rNMP_hotspots:2.0.0"
version: 2.0.0
date-released: 2024-01-03
url: "https://github.com/DKundnani/rNMP_hotspots/"
```