motifcounter

This is a read-only mirror of the git repos at https://bioconductor.org

https://github.com/bioc/motifcounter

Repository

Basic Info

Host: GitHub
Owner: bioc
Language: C
Default Branch: devel
Size: 7.21 MB

motifcounter - R package for analysing TFBSs in DNA sequences.

This software package grew out of the work that I did to obtain my PhD.

If it helps your analysis, please cite:

```
@Manual{,
  title = {motifcounter: R package for analysing TFBSs in DNA sequences},
  author = {Wolfgang Kopp},
  year = {2017},
  doi = {10.18129/B9.bioc.motifcounter}
}
```

A detailed description of the compound Poisson model is available in:

```
@article{improvedcompound,
  title = {An improved compound Poisson model for the number of motif hits in DNA sequences},
  author = {Kopp, Wolfgang and Vingron, Martin},
  journal = {Bioinformatics},
  pages = {btx539},
  year = {2017},
  publisher = {Oxford University Press}
}
```

Usage

```R
library(motifcounter)

# `sequences`: a set of DNA sequences (e.g. a Biostrings DNAStringSet),
# `sequence`: a single DNA sequence, `motif`: a PFM, `order`: Markov order

# Estimate a background model on a set of sequences
bg <- readBackground(sequences, order)

# Normalize a given PFM
new_motif <- normalizeMotif(motif)

# Evaluate the scores along a given sequence
scores <- scoreSequence(sequence, motif, bg)

# Evaluate the motif hits along a given sequence
hits <- motifHits(sequence, motif, bg)

# Evaluate the average score profile
score_profile <- scoreProfile(sequences, motif, bg)

# Evaluate the average motif hit profile
hit_profile <- motifHitProfile(sequences, motif, bg)

# Compute the motif hit enrichment
enrichment <- motifEnrichment(sequences, motif, bg)
```

Hallmarks of `motifcounter`

The motifcounter package facilitates the analysis of transcription factor binding sites (TFBSs) in DNA sequences. It can be used to scan a set of DNA sequences for known motifs (e.g. from TRANSFAC or JASPAR) in order to determine the positions and enrichment of TFBSs in the sequences.
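To illustrate what scanning a sequence with a PFM involves, here is a minimal base-R sketch of position-wise log-odds scoring on a single strand. It assumes an order-0 (i.i.d.) background and a small pseudocount; the function `score_along` and its arguments are hypothetical and not part of motifcounter, whose `scoreSequence` uses its own background model and score computation, so its values will differ.

```r
# Hypothetical sketch (not motifcounter's scoreSequence): log-odds scores of
# a PFM along one strand of a DNA sequence, using an i.i.d. background.
score_along <- function(sequence, pfm,
                        bg = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25),
                        pseudocount = 0.01) {
  # PFM: rows named A, C, G, T; one column per motif position.
  # Add a pseudocount and normalize each column to probabilities.
  counts <- pfm + pseudocount
  pwm <- sweep(counts, 2, colSums(counts), "/")
  # Log-odds of each base at each position relative to the background;
  # the background vector is recycled down the columns, i.e. row-wise
  lo <- log(pwm / bg[rownames(pfm)])
  x <- strsplit(toupper(sequence), "")[[1]]
  w <- ncol(pfm)
  n <- length(x) - w + 1
  if (n < 1) return(numeric(0))
  # Sum the log-odds over every window of width w (non-ACGT letters give NA)
  vapply(seq_len(n), function(i) {
    rows <- match(x[i:(i + w - 1)], rownames(pfm))
    sum(lo[cbind(rows, seq_len(w))])
  }, numeric(1))
}

# A toy PFM for the word "ACG"
pfm <- matrix(c(10, 0, 0, 0,
                0, 10, 0, 0,
                0, 0, 10, 0),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
score_along("TACGT", pfm)  # highest score at window 2 ("ACG")
```

Calling a position a hit then amounts to comparing its score to a threshold; motifcounter chooses that threshold from the desired false positive level rather than from a fixed score cutoff.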

An analysis with motifcounter requires the following inputs:

1. a position frequency matrix (PFM), which represents the affinity of the TF towards the DNA;
2. a background model, which is estimated from given DNA sequences and serves as the reference for the statistical analysis;
3. a desired false positive level for identifying putative TFBSs in DNA sequences. For example, a reasonable choice is a false positive level at which only about one in 1000 positions is falsely called a TFBS;
4. the DNA sequence(s) subject to the TFBS analysis.
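As a rough sanity check on the false positive level, the expected number of false hits in unbound DNA can be worked out by hand. The sketch below assumes that the false positive level is applied independently at every scanned position on each strand; the helper `expected_false_hits` is hypothetical. By linearity of expectation, clumping of self-overlapping hits does not change this expected value, only the spread of the hit count around it.

```r
# Hypothetical helper (not part of motifcounter): expected number of
# false-positive motif hits when each scanned position and strand is
# called a hit with probability alpha in unbound (random) DNA.
expected_false_hits <- function(seq_lengths, motif_width, alpha = 0.001,
                                both_strands = TRUE) {
  # Number of start positions at which the motif fits in each sequence
  positions <- pmax(seq_lengths - motif_width + 1, 0)
  strands <- if (both_strands) 2 else 1
  sum(positions) * strands * alpha
}

# Two sequences of 1 kb and 2 kb, a motif of width 10, alpha = 0.001
expected_false_hits(c(1000, 2000), motif_width = 10)  # 5.964
```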

The package aims to improve motif hit enrichment analysis. To this end, it offers the following features:

1. motifcounter supports higher-order Markov models as background models, to account for the sequence composition of unbound DNA segments. This improves the reliability of the enrichment analysis, because higher-order sequence features (e.g. CpG islands) are common in natural DNA sequences.
2. The package automatically accounts for self-overlapping motif structures¹. This is important for reducing false positives in the enrichment test, a problem that is particularly prevalent for repeat-like and palindromic motifs. motifcounter determines self-overlapping motif hit occurrences not only on a single DNA strand but, by default, also with respect to the reverse strand.
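For intuition on what a higher-order background captures, here is a minimal base-R sketch that estimates an order-1 Markov model, i.e. the probability of each base given the preceding base, from a single DNA string. The function `markov1` and its pseudocount are hypothetical illustrations; motifcounter's `readBackground` estimates the background from a set of sequences with a chosen `order`.

```r
# Hypothetical sketch (not motifcounter's readBackground): estimate an
# order-1 Markov transition matrix P(next base | previous base).
markov1 <- function(s, pseudocount = 1) {
  bases <- c("A", "C", "G", "T")
  x <- strsplit(toupper(s), "")[[1]]
  # Consecutive (previous base, next base) pairs
  from <- x[-length(x)]
  to <- x[-1]
  # Skip transitions that involve non-ACGT letters such as N
  ok <- from %in% bases & to %in% bases
  counts <- unclass(table(factor(from[ok], levels = bases),
                          factor(to[ok], levels = bases))) + pseudocount
  # Normalize each row so that it sums to one
  counts / rowSums(counts)
}

# In this CpG-rich toy string, C is always followed by G
round(markov1("ACGCGCGTTACGCG", pseudocount = 0), 2)
```

An order-0 model would only record that C and G are frequent; the order-1 model additionally records that they occur as CG dinucleotides, which matters when scoring motifs that contain CpGs.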

Enrichment model

motifcounter implements two analytic approximations of the distribution of the number of motif hits in random DNA sequences that can optionally be used for the enrichment test:

1. A compound Poisson approximation
2. A combinatorial approximation

Both approximations yield highly accurate results for stringent false positive levels. If you intend to analyse long DNA sequences or a large set of individual sequences (total sequence length > 10 kb), we recommend using the compound Poisson approximation. On the other hand, if a relaxed false positive level is preferred for identifying TFBSs, we recommend the combinatorial approximation.
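The general compound Poisson idea can be sketched as follows: hits arrive in clumps, the number of clumps is Poisson with mean `lambda`, and the number of hits per clump follows a clump-size distribution. The base-R sketch below computes the resulting distribution of the total hit count with the Panjer recursion and derives a one-sided enrichment p-value. It is not motifcounter's implementation: `lambda` and `clump_size` are assumed inputs here, whereas the package derives the corresponding quantities from the motif and the background model (see Kopp and Vingron, 2017).

```r
# Hypothetical sketch, not motifcounter's implementation.
# P(X = k) for X = Y_1 + ... + Y_N, with N ~ Poisson(lambda) clumps and
# i.i.d. clump sizes Y with P(Y = j) = clump_size[j], j = 1, 2, ...
compound_poisson_pmf <- function(lambda, clump_size, kmax) {
  g <- numeric(kmax + 1)  # g[k + 1] holds P(X = k)
  g[1] <- exp(-lambda)    # X = 0 only if there are no clumps (sizes >= 1)
  for (k in seq_len(kmax)) {
    j <- seq_len(min(k, length(clump_size)))
    # Panjer recursion for a Poisson-distributed number of clumps
    g[k + 1] <- lambda / k * sum(j * clump_size[j] * g[k - j + 1])
  }
  g
}

# One-sided enrichment p-value P(X >= observed)
enrichment_pvalue <- function(observed, lambda, clump_size) {
  if (observed <= 0) return(1)
  pmf <- compound_poisson_pmf(lambda, clump_size, observed - 1)
  max(0, 1 - sum(pmf))
}

# Example: 2 clumps expected, geometric clump sizes P(Y = j) = 0.7 * 0.3^(j - 1)
sizes <- dgeom(0:49, prob = 0.7)
enrichment_pvalue(observed = 8, lambda = 2, clump_size = sizes)
```

With `clump_size = 1` (every clump is a single hit) the recursion reduces to the ordinary Poisson distribution, which ignores clumping; for self-overlapping motifs such a model tends to understate the upper tail of the hit count. Computing the p-value as `1 - sum(pmf)` also loses precision for very small p-values, so this is only a sketch.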

Installation

An easy way to install motifcounter is with the devtools R package:

```R
install.packages("devtools")

library(devtools)
install_github("wkopp/motifcounter", build_vignettes = TRUE)
```

Alternatively, the package can be cloned or downloaded from this GitHub repository, built via `R CMD build` and installed via `R CMD INSTALL`.

Getting started

The motifcounter package contains a tutorial that illustrates:

1. how to determine position- and strand-specific TF motif binding sites,
2. how to analyse the profile of motif hit occurrences across a set of aligned sequences, and
3. how to test for motif enrichment in a given set of sequences.

The tutorial can be found in the package vignette:

```R
library(motifcounter)
vignette("motifcounter")
```

Acknowledgements

Thanks to matthuska for reviewing and commenting on the package.

¹ Self-overlapping motifs induce clumps of motif hits (that is, mutually overlapping motif hits) when a DNA sequence is scanned for hits. As a consequence of motif clumping, both the distribution of the number of motif hits and, thus, the enrichment test are affected.
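The footnote can be made concrete for the simplest case of a fixed word (a consensus sequence rather than a full PFM). The base-R sketch below lists the shifts at which a word overlaps a shifted copy of itself and checks whether the word equals its own reverse complement. Both helpers are hypothetical and much simpler than motifcounter's treatment, which works with the PFM, the background model and both strands.

```r
# Shifts s (0 < s < nchar(word)) at which the word overlaps itself,
# i.e. its suffix of length n - s equals its prefix of length n - s
self_overlap_shifts <- function(word) {
  n <- nchar(word)
  shifts <- seq_len(max(n - 1, 0))
  shifts[vapply(shifts, function(s) {
    substr(word, s + 1, n) == substr(word, 1, n - s)
  }, logical(1))]
}

# Reverse complement of a DNA word (A<->T, C<->G, then reversed)
revcomp <- function(word) {
  chartr("ACGT", "TGCA", paste(rev(strsplit(word, "")[[1]]), collapse = ""))
}

self_overlap_shifts("AAAA")    # 1 2 3: repeat-like, hits clump heavily
self_overlap_shifts("ACGTAC")  # 4
self_overlap_shifts("ACGT")    # none on the same strand
revcomp("ACGT") == "ACGT"      # TRUE: palindromic, so a forward-strand hit is also a reverse-strand hit
```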
