cv_coverage
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
✓Academic publication links
Links to: zenodo.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (12.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ulelab
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 84 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
cv_coverage
This code compares the coverages of a k-mer group around crosslink events (landmarks) across multiple CLIP datasets. The comparison is conducted within user-specified genomic regions.
Author: aram.amalietti@gmail.com
Dependencies (these are the versions the script was developed with, pandas >= 1 introduced breaking changes, please use these versions):
python=3.7
pandas=0.24.2
numpy=1.19.2
pybedtools=0.8.1
matplotlib=3.3.2
seaborn=0.11.0
Usage:
python3 <path_to_script> <xl_in> <motifs> <regions> <kmer_len> <fasta> <fai> <regions_file> <smothing> <percentile> <window> <use_scores> <n_cores> <chunk_size> <cap>
xl_inlist of BED files containing landmarks around which to display the motifs, given aspath_to_file1,path_to_file2,path_to_file3 .... It can be a single file or a list of files, the plots will be adjusted accordingly;motifsgroup of motifs to be displayed around the input landmarks, for exampleAAAA,CCCC,GGGG ...;regionsgroup of regions to be considered for analysis. Available regions aregenome(intron, CDS, UTR3, UTR5, ncRNA, intergenic),whole_gene(intron, CDS, UTR3, UTR5),intergenic,intron,ncRNA,other_exon(UTR5, CDS),UTR3andUTR5. Single or multiple regions can be analysed and the plots will adjust accordingly. For example,intron,ncRNA;kmer_lenthe length of analysed motifs (in bases);fastais the path to the genome in fasta format;faiis the path to the genome index file;regions_fileis a segmentation file in GTF format, usually obtained with iCount segment but can also come from other sources;smoothingis the size of the smoothing window, usually 12;percentiledefines the percentile for thresholding genomic landmarks by score, if percentile is set to None no thresholding is used and all landmarks will be used for the analisys;windowflanking distance around the landmarks that is analysed and displayed;use_scoresIf True, each landmark is weighted by its score, else all landmarks will have equal weigths of 1;n_coresis the number of threads used in the process (4 is the usual value);chunk_sizeis the number of rows per thread (10000 is the usual value);capis the max value for any landmarks score. It is only used if use_scores is set to True (20 is the recommended value);
Common issues
The script needs writing permission in the staging directory to make results directory and environment variable TMPDIR for temporary files. If you get KeyError: 'TMPDIR' a solution would be to type export TMPDIR=<path_to_folder> in terminal where you want to run the script.
Outputs - A pdf file with graphs showing % k-mer group coverage around landmarks for each analyzed CLIP dataset; - A tsv file with % coverage values (raw values used for plotting); - A text file that saves the number of analyzed landmarks in each sample; - A tsv file with saved run parameters.
Owner
- Name: Ulelab
- Login: ulelab
- Kind: organization
- Location: London
- Repositories: 10
- Profile: https://github.com/ulelab
Citation (CITATION.cff)
cff-version: 1.1.0 message: "If you use this software, please cite it as below." authors: - family-names: "Amalietti" given-names: "Aram Gustav" title: "Comparative Visualisation of Average Motif Coverage" version: 1.1.0 doi: 10.5281/zenodo.8386510 date-released: 2021-12-05 url: "https://github.com/ulelab/cv_coverage"