motifcounter
This is a read-only mirror of the git repos at https://bioconductor.org
Basic Info
- Host: GitHub
- Owner: bioc
- Language: C
- Default Branch: devel
- Size: 7.21 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
motifcounter - R package for analysing TFBSs in DNA sequences.
This software package grew out of the work that I did to obtain my PhD.
If it is of help for your analysis, please cite
```
@Manual{,
  title = {motifcounter: R package for analysing TFBSs in DNA sequences},
  author = {Wolfgang Kopp},
  year = {2017},
  doi = {10.18129/B9.bioc.motifcounter}
}
```
A detailed description of the compound Poisson model is available in
```
@article{improvedcompound,
  title={An improved compound Poisson model for the number of motif hits in DNA sequences},
  author={Kopp, Wolfgang and Vingron, Martin},
  journal={Bioinformatics},
  pages={btx539},
  year={2017},
  publisher={Oxford University Press}
}
```
Usage
```R
# Estimate a background model on a set of sequences
bg <- readBackground(sequences, order)

# Normalize a given PFM
new_motif <- normalizeMotif(motif)

# Evaluate the scores along a given sequence
scores <- scoreSequence(sequence, motif, bg)

# Evaluate the motif hits along a given sequence
hits <- motifHits(sequence, motif, bg)

# Evaluate the average score profile
score_profile <- scoreProfile(sequences, motif, bg)

# Evaluate the average motif hit profile
hit_profile <- motifHitProfile(sequences, motif, bg)

# Compute the motif hit enrichment
enrichment <- motifEnrichment(sequences, motif, bg)
```
Hallmarks of motifcounter
The motifcounter package facilitates the analysis of
transcription factor binding sites (TFBSs) in DNA sequences.
It can be used to scan a set of DNA sequences for known motifs
(e.g. from TRANSFAC or JASPAR) in order to determine the positions
and enrichment of TFBSs in the sequences.
An analysis with motifcounter therefore requires the following inputs:
1. a position frequency matrix (PFM), which represents the TF affinity towards the DNA.
2. a background model, which is estimated from a given DNA sequence and which
serves as a reference for the statistical analysis.
3. a desired false positive level for identifying putative TFBSs in DNA sequences. For example, a reasonable choice is a level at which only one in 1000 positions is falsely called a TFBS.
4. a given DNA sequence, which is subject to the TFBS analysis.
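How these four inputs fit together can be sketched in base R. This is a toy illustration, not the package's implementation: it assumes a log-likelihood ratio score and an order-0 background, and the helper names (`normalize_pfm`, `score_word`, `score_threshold`, `find_hits`) and the PFM counts are hypothetical. The threshold is found by enumerating all words of the motif length, which is only feasible for short motifs. Because a 3 bp toy motif cannot reach a level of one in 1000, the example uses `alpha = 0.02`.

```R
# Toy PFM with hypothetical counts: rows are bases, columns are motif positions.
pfm <- matrix(c(8, 1, 1,
                1, 8, 1,
                1, 1, 8,
                2, 2, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))
bg <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)  # order-0 background (assumed)

# Add a small pseudocount and normalize each column to probabilities.
normalize_pfm <- function(pfm, pseudo = 0.01) {
  p <- pfm + pseudo
  sweep(p, 2, colSums(p), "/")
}

# Log-likelihood ratio (motif versus background) of one word,
# given as a character vector of bases.
score_word <- function(word, motif, bg) {
  rows <- match(word, rownames(motif))
  sum(log(motif[cbind(rows, seq_along(word))])) - sum(log(bg[word]))
}

# Smallest score threshold t such that P(score >= t) <= alpha under the
# background, obtained by enumerating every word of the motif length.
score_threshold <- function(motif, bg, alpha) {
  words <- as.matrix(expand.grid(rep(list(rownames(motif)), ncol(motif)),
                                 stringsAsFactors = FALSE))
  scores <- apply(words, 1, score_word, motif = motif, bg = bg)
  probs <- apply(words, 1, function(word) prod(bg[word]))
  cand <- sort(unique(scores))
  tail_prob <- vapply(cand, function(t) sum(probs[scores >= t]), numeric(1))
  min(cand[tail_prob <= alpha])
}

# Score every window of the sequence and return the start positions of hits.
find_hits <- function(seq, motif, bg, alpha) {
  bases <- strsplit(seq, "")[[1]]
  w <- ncol(motif)
  starts <- seq_len(length(bases) - w + 1)
  scores <- vapply(starts,
                   function(i) score_word(bases[i:(i + w - 1)], motif, bg),
                   numeric(1))
  which(scores >= score_threshold(motif, bg, alpha))
}

motif <- normalize_pfm(pfm)
hits <- find_hits("TTACGTTACGA", motif, bg, alpha = 0.02)
print(hits)
```

For real analyses, the package functions listed under Usage should be preferred, since they handle higher-order backgrounds and both strands.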
The package aims to improve motif hit enrichment analysis. To this end,
the package offers a number of features:
1. motifcounter supports the use of higher-order Markov models
to account for the sequence composition in unbound DNA segments.
This improves the reliability of the enrichment analysis, because higher-order
sequence features occur commonly in natural DNA sequences (e.g. CpG islands).
2. The package automatically accounts for self-overlapping motif
structures (see footnote 1). This is important
for reducing the false positives obtained from the enrichment test, which are
prevalent for repeat-like and palindromic motifs.
motifcounter not only determines self-overlapping motif hit occurrences
on a single DNA strand, but (by default)
also with respect to the reverse strand.
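To make the idea of a higher-order Markov background concrete, the following base R sketch estimates order-k transition probabilities from a single sequence by counting each context and the base that follows it. It is an illustration under simplifying assumptions (one strand only, sequence restricted to A, C, G and T, a pseudocount of 1); the function name `estimate_markov` is hypothetical and this is not how `readBackground` is implemented.

```R
# Estimate order-k transition probabilities P(next base | previous k bases).
# Assumes the sequence contains only A, C, G and T.
estimate_markov <- function(seq, order = 1, pseudo = 1) {
  alphabet <- c("A", "C", "G", "T")
  bases <- strsplit(seq, "")[[1]]
  n <- length(bases)
  stopifnot(order >= 1, n > order)

  # Enumerate all contexts of length `order`, e.g. "AA", "CA", ... for order 2.
  grid <- expand.grid(rep(list(alphabet), order), stringsAsFactors = FALSE)
  contexts <- unname(apply(grid, 1, paste, collapse = ""))

  # Start every cell at the pseudocount so unseen transitions stay possible.
  counts <- matrix(pseudo, nrow = length(contexts), ncol = length(alphabet),
                   dimnames = list(contexts, alphabet))

  # Count each (context, next base) pair along the sequence.
  for (i in seq_len(n - order)) {
    ctx <- paste(bases[i:(i + order - 1)], collapse = "")
    nxt <- bases[i + order]
    counts[ctx, nxt] <- counts[ctx, nxt] + 1
  }

  # Normalize each row so it sums to one.
  counts / rowSums(counts)
}

tm <- estimate_markov("ACGCGCGTACGCG", order = 1)
print(round(tm, 2))
```

In this CG-rich toy sequence every C is followed by a G, so the row for context C puts most of its probability on G, a dependency that an order-0 model could not represent.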
Enrichment model
motifcounter implements two analytic approximations of the
distribution of the number of motif hits
in random DNA sequences that can optionally be used for the
enrichment test:
- A compound Poisson approximation
- A combinatorial approximation
Both approximations yield highly accurate results for stringent false positive levels. If you intend to analyse long DNA sequences or a large set of individual sequences (total sequence length > 10 kb), we recommend using the compound Poisson approximation. On the other hand, we recommend the combinatorial approximation if a relaxed false positive level is preferred to identify TFBSs.
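The effect of clumping on an enrichment test can be illustrated with a generic compound Poisson distribution: the number of clumps is Poisson distributed and each clump contributes a random number of hits. The sketch below computes the probability mass function with the standard recursion for compound Poisson sums and compares the resulting tail probability with a plain Poisson of the same mean. The clump rate `lambda`, the clump size distribution `theta` and the observed count are hypothetical values chosen for illustration; in motifcounter these quantities are derived from the motif and the background model, and the package's own approximation may differ in detail.

```R
# P(X = 0), ..., P(X = n_max) for X = sum of N clump sizes, N ~ Poisson(lambda),
# where theta[k] is the probability that a clump contains k hits.
compound_poisson_pmf <- function(lambda, theta, n_max) {
  stopifnot(abs(sum(theta) - 1) < 1e-12, n_max >= 0)
  p <- numeric(n_max + 1)          # p[n + 1] holds P(X = n)
  p[1] <- exp(-lambda)             # no clumps at all
  for (n in seq_len(n_max)) {
    k <- seq_len(min(n, length(theta)))
    # Recursion: P(X = n) = (lambda / n) * sum_k k * theta[k] * P(X = n - k)
    p[n + 1] <- lambda / n * sum(k * theta[k] * p[n - k + 1])
  }
  p
}

lambda <- 2                        # expected number of clumps (hypothetical)
theta <- c(0.8, 0.15, 0.05)        # P(clump size = 1, 2, 3) (hypothetical)
observed <- 8                      # observed number of motif hits (hypothetical)

# Tail probability P(X >= observed) under the compound Poisson model.
pmf <- compound_poisson_pmf(lambda, theta, observed - 1)
p_compound <- 1 - sum(pmf)

# Same tail under a plain Poisson with identical mean, which ignores clumping.
mean_hits <- lambda * sum(seq_along(theta) * theta)
p_poisson <- ppois(observed - 1, mean_hits, lower.tail = FALSE)

print(c(compound = p_compound, poisson = p_poisson))
```

With these values the plain Poisson gives the smaller p-value, which is the kind of overstated significance that accounting for self-overlapping hits is meant to avoid.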
Installation
An easy way to install motifcounter is to use
the devtools R package.
```R
install.packages("devtools")
library(devtools)
install_github("wkopp/motifcounter", build_vignettes=TRUE)
```
Alternatively, the package can be cloned or
downloaded from this GitHub repository,
built via R CMD build,
and installed via R CMD INSTALL.
Getting started
The motifcounter package contains a tutorial that illustrates:
1. how to determine position- and strand-specific TF motif binding sites,
2. how to analyse the profile of motif hit occurrences across a set of
aligned sequences, and
3. how to test for motif enrichment in a given set of sequences.
The tutorial can be found in the package vignette:
```R
library(motifcounter)
vignette("motifcounter")
```
Acknowledgements
Thanks to matthuska for reviewing and commenting on the package.
1: Self-overlapping motifs induce clumps of motif hits (that is, mutually overlapping motif hits) when a DNA sequence is scanned for hits. As a consequence of motif clumping, the distribution of the number of motif hits, and thus the enrichment test, is affected.