memes

An R interface to the MEME Suite

https://github.com/snystrom/memes

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.0%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
Statistics
  • Stars: 50
  • Watchers: 4
  • Forks: 5
  • Open Issues: 40
  • Releases: 0
Created over 6 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  fig.align = "center"
  #out.width = "100%"
)
```

# memes


[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![Codecov test coverage](https://codecov.io/gh/snystrom/memes/branch/master/graph/badge.svg)](https://codecov.io/gh/snystrom/memes?branch=master)
![R-CMD-check-bioc](https://github.com/snystrom/memes/workflows/R-CMD-check-bioc/badge.svg)
![Bioconductor Build Status](https://bioconductor.org/shields/build/devel/bioc/memes.svg)
![Bioconductor Lifetime](https://bioconductor.org/shields/years-in-bioc/memes.svg)


An R interface to the [MEME Suite](http://meme-suite.org/) family of tools,
which provides several utilities for performing motif analysis on DNA, RNA, and
protein sequences. memes works by detecting a local install of the MEME Suite,
running its commands, then importing the results directly into R.

## Installation

### Bioconductor

memes is currently available on the Bioconductor `devel` branch:

```{r, eval=F}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# The following initializes usage of Bioc devel
BiocManager::install(version='devel')

BiocManager::install("memes")
```

### Development Version (GitHub)
You can install the development version of memes from [GitHub](https://github.com/snystrom/memes) with:

```{r, eval=F}
if (!requireNamespace("remotes", quietly=TRUE))
  install.packages("remotes")
remotes::install_github("snystrom/memes")

# To temporarily bypass the R version 4.1 requirement, you can pull from the following branch:
remotes::install_github("snystrom/memes", ref = "no-r-4")
```

### Docker Container
```{shell}
# Get development version from dockerhub
docker pull snystrom/memes_docker:devel
# the -v flag is used to mount an analysis directory, 
# it can be excluded for demo purposes
docker run -e PASSWORD= -p 8787:8787 -v //:/mnt/ snystrom/memes_docker:devel
```


## Detecting the MEME Suite

memes relies on a local install of the [MEME Suite](http://meme-suite.org/).
For MEME Suite installation instructions, see the [MEME Suite
Installation Guide](http://meme-suite.org/doc/install.html?man_type=web).

memes needs to know the location of the `meme/bin/` directory on your local machine.
You can tell memes the location of your MEME Suite install in four ways. memes
always prefers the most specific definition that is a valid path. Here they
are ranked from most to least specific:

1. Manually passing the install path to the `meme_path` argument of all memes functions
2. Setting the path using `options(meme_bin = "/path/to/meme/bin/")` inside your R script
3. Setting `MEME_BIN=/path/to/meme/bin/` in your `.Renviron` file
4. memes will try the default MEME install location `~/meme/bin/`

If memes fails to detect your install at the specified location, it will fall
back to the next option.
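
The search order above can be sketched in base R. This is only an illustration of the fallback logic, not the code memes uses internally: `resolve_meme_path()` is a hypothetical helper, and the demo directory is created just so the example has a valid path to find.

```r
# Hypothetical helper (not part of memes) illustrating the search order.
resolve_meme_path <- function(meme_path = NULL) {
  candidates <- c(
    meme_path,                            # 1. explicit function argument
    getOption("meme_bin"),                # 2. options(meme_bin = ...)
    Sys.getenv("MEME_BIN", unset = NA),   # 3. MEME_BIN from .Renviron
    path.expand("~/meme/bin")             # 4. default install location
  )
  # Drop unset entries, then keep only directories that exist
  candidates <- candidates[!is.na(candidates) & nzchar(candidates)]
  valid <- candidates[dir.exists(candidates)]
  if (length(valid) == 0) {
    stop("No valid MEME install found")
  }
  valid[[1]]
}

# Demo: a temporary directory stands in for a real meme/bin directory
demo_bin <- file.path(tempdir(), "meme", "bin")
dir.create(demo_bin, recursive = TRUE, showWarnings = FALSE)
options(meme_bin = demo_bin)

resolved_from_option <- resolve_meme_path()
# An invalid explicit path is skipped and the next option is used
resolved_after_fallback <- resolve_meme_path(meme_path = "bad/path")
```

In this sketch both calls resolve to the directory set through `options()`, because the explicit `"bad/path"` is not a valid directory and is skipped.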

To verify memes can detect your MEME install, use `check_meme_install()`, which
uses the search hierarchy above to find a valid MEME install. It reports
whether any tools are missing and prints the path to MEME that it sees. This can
be useful for troubleshooting issues with your install.
```{r check_install_works}
library(memes)

# Verify that memes detects your meme install
# (returns all green checks if so)
check_meme_install()
```

```{r check_install_fails}
# You can manually input a path to meme_path
# If no meme/bin is detected, will return a red X
check_meme_install(meme_path = 'bad/path')
```

## The Core Tools

| Function Name | Use              | Sequence Input | Motif Input | Output |
|:-------------:|:----------------:|:--------------:|:-----------:|:-------------------------------------------------------|
| `runStreme()` | Motif Discovery (short motifs)  | Yes | No      | `universalmotif_df`                                    |
| `runDreme()`  | Motif Discovery (short motifs)  | Yes | No      | `universalmotif_df`                                    |
| `runAme()`    | Motif Enrichment                | Yes | Yes     | data.frame (optional: `sequences` column)              |
| `runFimo()`   | Motif Scanning                  | Yes | Yes     | GRanges of motif positions                             |
| `runTomTom()` | Motif Comparison                | No  | Yes     | `universalmotif_df` w/ `best_match_motif` and `tomtom` columns* |
| `runMeme()`   | Motif Discovery (long motifs)   | Yes | No      | `universalmotif_df`                                    |

\* **Note:** if `runTomTom()` is run using a `universalmotif_df`
the results will be joined with the `universalmotif_df` results as extra
columns. This allows easy comparison of *de-novo* discovered motifs with their
matches.

**Sequence Inputs** can be any of:

1. Path to a .fasta formatted file
2. `Biostrings::XStringSet` (can be generated from GRanges using `get_sequence()` helper function)
3. A named list of `Biostrings::XStringSet` objects (generated by `get_sequence()`)

**Motif Inputs** can be any of:

1. A path to a .meme formatted file of motifs to scan against
2. A `universalmotif` object, or list of `universalmotif` objects
3. A `runDreme()` results object (this allows the results of `runDreme()` to pass directly to `runTomTom()`)
4. A combination of all of the above passed as a `list()` (e.g. `list("path/to/database.meme", "dreme_results" = dreme_res)`)

**Output Types**:

`runDreme()`, `runStreme()`, `runMeme()` and `runTomTom()` return
`universalmotif_df` objects which are data.frames with special columns. The
`motif` column contains a `universalmotif` object, with 1 entry per row. The
remaining columns describe the properties of each returned motif. The following
column names are special in that their values are used when running
`update_motifs()` and `to_list()` to alter the properties of the motifs stored
in the `motif` column. Be careful about changing these values as these changes
will propagate to the `motif` column when calling `update_motifs()` or
`to_list()`.

 - name
 - altname
 - family
 - organism
 - strand
 - nsites
 - bkgsites
 - pval
 - qval
 - eval

memes is built around the [universalmotif package](https://www.bioconductor.org/packages/release/bioc/html/universalmotif.html)
which provides a framework for manipulating motifs in R. `universalmotif_df`
objects can interconvert between data.frame and `universalmotif` list format
using the `to_df()` and `to_list()` functions, respectively. This allows use of
`memes` results with all other Bioconductor motif packages, as `universalmotif`
objects can convert to any other motif type using `convert_motifs()`.

`runTomTom()` returns a special column: `tomtom` which is a `data.frame` of all
match data for each input motif. This can be expanded out using
`tidyr::unnest(tomtom_results, "tomtom")`, and renested with `nest_tomtom()`.
The `best_match_` prefixed columns returned by `runTomTom()` indicate values for
the motif which was the best match to the input motif.
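
The shape of the nested `tomtom` column can be illustrated with a toy table. Everything here is a stand-in: the column names inside `tomtom` are illustrative, and `tidyr::nest()` is used in place of `nest_tomtom()` so the example runs without a MEME install.

```r
library(tidyr)

# Toy stand-in for runTomTom() output: one row per input motif,
# with all of that motif's matches stored in a list-column
toy_results <- tibble::tibble(
  name = c("m01", "m02"),
  tomtom = list(
    data.frame(match_name = c("tfA", "tfB"), match_pval = c(1e-6, 1e-3)),
    data.frame(match_name = "tfC", match_pval = 1e-4)
  )
)

# Expand to one row per (input motif, match) pair
unnested <- tidyr::unnest(toy_results, "tomtom")

# Collapse back to one row per input motif
renested <- tidyr::nest(unnested, tomtom = c(match_name, match_pval))
```

The unnested form is convenient for filtering matches, and renesting restores the one-row-per-motif layout.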

## Quick Examples
### Motif Discovery with DREME

```{r}
suppressPackageStartupMessages(library(magrittr))
suppressPackageStartupMessages(library(GenomicRanges))

# Example transcription factor peaks as GRanges
data("example_peaks", package = "memes")

# Genome object
dm.genome <- BSgenome.Dmelanogaster.UCSC.dm6::BSgenome.Dmelanogaster.UCSC.dm6
```
The `get_sequence()` function takes a `GRanges` or `GRangesList` as input and
returns the sequences as a `Biostrings::XStringSet`, or list of `XStringSet`
objects, respectively. `get_sequence()` will name each fasta entry by the genomic
coordinates each sequence is from.
```{r}
# Generate sequences from 200bp about the center of my peaks of interest
sequences <- example_peaks %>% 
  resize(200, "center") %>% 
  get_sequence(dm.genome)
```

`runDreme()` accepts XStringSet or a path to a fasta file as input. You can use
other sequences or shuffled input sequences as the control dataset.
```{r}
# runDreme accepts all arguments that the commandline version of dreme accepts
# here I set e = 50 to detect motifs in the limited example peak list
# In a real analysis, e should typically be < 1
dreme_results <- runDreme(sequences, control = "shuffle", e = 50)
```
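
To give an intuition for what a shuffled control is, here is a minimal base R sketch that shuffles the letters of each sequence independently, which preserves length and nucleotide composition. This is an assumption-laden simplification: `shuffle_letters()` is a hypothetical helper, the sequences are made up, and the shuffling memes performs for `control = "shuffle"` may differ (for example in which k-mer frequencies it preserves).

```r
# Hypothetical helper: permute the letters within each sequence
shuffle_letters <- function(seqs) {
  vapply(
    strsplit(seqs, ""),
    function(chars) paste(sample(chars), collapse = ""),
    character(1)
  )
}

set.seed(1)
targets <- c(peak_1 = "ACGTTGCAAT", peak_2 = "GGGATCCATA")
control <- shuffle_letters(targets)
```

Each control sequence has the same letters as its target in a random order, so any motif found in the targets but not the control reflects letter arrangement rather than base composition.
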
memes is built around the
[universalmotif](https://www.bioconductor.org/packages/release/bioc/html/universalmotif.html)
package. The results are returned in `universalmotif_df` format, which is an R data.frame that can seamlessly interconvert between data.frame and `universalmotif` format using `to_list()` to convert to `universalmotif` list format, and `to_df()` to convert back to data.frame format. Using `to_list()` allows using `memes` results with all `universalmotif` functions like so:

```{r}
library(universalmotif)

dreme_results %>% 
  to_list() %>% 
  view_motifs()
```

### Matching motifs using TOMTOM
Discovered motifs can be matched to known TF motifs using `runTomTom()`, which can accept as input a path to a .meme formatted file, a `universalmotif` list, or the results of `runDreme()`.

TomTom uses a database of known motifs which can be passed to the `database` parameter as a path to a .meme format file, or a `universalmotif` object.

Optionally, you can set the environment variable `MEME_DB` in `.Renviron` to a file on disk, or
the `meme_db` value in `options` to a valid .meme format file and memes will
use that file as the database. memes will always prefer user input to the
function call over a global variable setting.

```{r}
options(meme_db = system.file("extdata/flyFactorSurvey_cleaned.meme", package = "memes"))
m <- create_motif("CMATTACN", altname = "testMotif")
tomtom_results <- runTomTom(m)
```
```{r}
tomtom_results
```

### Using runDreme results as TOMTOM input
`runTomTom()` will add its results as columns to a `runDreme()` results data.frame.
```{r}
full_results <- dreme_results %>% 
  runTomTom()
```


### Motif Enrichment using AME

AME is used to test for enrichment of known motifs in target sequences. `runAme()`
will use the `MEME_DB` entry in `.Renviron` or `options(meme_db =
"path/to/database.meme")` as the motif database. Alternatively, it accepts the
same database inputs as `runTomTom()`.
```{r}
# here I set the evalue_report_threshold = 30 to detect motifs in the limited example sequences
# In a real analysis, evalue_report_threshold should be carefully selected
ame_results <- runAme(sequences, control = "shuffle", evalue_report_threshold = 30)
ame_results
```


## Visualizing Results

`view_tomtom_hits()` allows comparing the input motifs to the top hits from
TomTom. Manual inspection of these matches is important, as the top
match is not always the correct assignment. Altering `top_n` allows you to show
additional matches in descending order of their rank.

```{r}
full_results %>% 
  view_tomtom_hits(top_n = 1)
```

It can be useful to view the results from `runAme()` as a heatmap. 
`plot_ame_heatmap()` can create complex visualizations for analysis of enrichment
between different region types (see vignettes for details). Here is a simple
example heatmap.
```{r, fig.height=3, fig.width=5}
ame_results %>% 
  plot_ame_heatmap()
```

# Scanning for motif occurrences using FIMO

The FIMO tool is used to identify matches to known motifs. `runFimo()` will return
these hits as a `GRanges` object containing the genomic coordinates of each motif
match.
```{r, fig.height=4, fig.width=3}
# Query MotifDb for a motif
e93_motif <- MotifDb::query(MotifDb::MotifDb, "Eip93F") %>% 
  universalmotif::convert_motifs()

# Scan for the E93 motif within given sequences
fimo_results <- runFimo(sequences, e93_motif, thresh = 1e-3)

# Visualize the sequences matching the E93 motif
plot_sequence_heatmap(fimo_results$matched_sequence)  
```

## Importing Data from previous runs

memes also supports importing results generated using the MEME Suite outside of
R (for example, running jobs on [meme-suite.org](http://meme-suite.org/), or running on
the command line). This enables use of preexisting MEME Suite results with
downstream memes functions.

| MEME Tool | Function Name       | File Type        |
|:---------:|:-------------------:|:----------------:|
| Streme    | `importStremeXML()` | streme.xml       |
| Dreme     | `importDremeXML()`  | dreme.xml        |
| TomTom    | `importTomTomXML()` | tomtom.xml       |
| AME       | `importAme()`       | ame.tsv*         |
| FIMO      | `importFimo()`      | fimo.tsv         | 
| Meme      | `importMeme()`      | meme.txt         | 

\* `importAme()` can also use the "sequences.tsv" output when AME was run with `method = "fisher"`; this is optional.
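
A minimal sketch of importing results from a previous run, using two of the functions from the table above. The `meme_output` directory is a hypothetical location for files produced by the command-line tools or downloaded from the web server, and the guard means the import only runs when memes is installed and both files exist.

```r
# Hypothetical directory holding output from an earlier DREME + TomTom run
results_dir <- "meme_output"
dreme_xml <- file.path(results_dir, "dreme.xml")
tomtom_xml <- file.path(results_dir, "tomtom.xml")

# Only attempt the import when memes is available and the files are present
if (requireNamespace("memes", quietly = TRUE) &&
    all(file.exists(dreme_xml, tomtom_xml))) {
  dreme_results <- memes::importDremeXML(dreme_xml)
  tomtom_results <- memes::importTomTomXML(tomtom_xml)
}
```

The imported objects can then be passed to downstream memes functions in the same way as results generated from within R.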


# FAQs
### How do I use memes/MEME on Windows?
The MEME Suite does not currently support Windows, although it can be
installed under [Cygwin](https://www.cygwin.com/) or the [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10) (WSL).
Please note that if MEME is installed on Cygwin or WSL, you must also run R
inside Cygwin or WSL to use memes.

An alternative solution is to use [Docker](https://www.docker.com/get-started)
to run a virtual environment with the MEME Suite installed. We provide a [memes docker container](https://github.com/snystrom/memes_docker)
that ships with the MEME Suite, RStudio, and all `memes` dependencies
pre-installed.

# Citation

memes is a wrapper for a select few tools from the MEME Suite, which were
developed by another group. In addition to citing memes, please cite the MEME
Suite tools corresponding to the tools you use.

If you use `runDreme()` in your analysis, please cite:

Timothy L. Bailey, "DREME: Motif discovery in transcription factor ChIP-seq data", Bioinformatics, 27(12):1653-1659, 2011. [full text](https://academic.oup.com/bioinformatics/article/27/12/1653/257754)

If you use `runTomTom()` in your analysis, please cite:

Shobhit Gupta, JA Stamatoyannopoulos, Timothy Bailey and William Stafford Noble, "Quantifying similarity between motifs", Genome Biology, 8(2):R24, 2007. [full text](http://genomebiology.com/2007/8/2/R24)

If you use `runAme()` in your analysis, please cite:

Robert McLeay and Timothy L. Bailey, "Motif Enrichment Analysis: A unified framework and method evaluation", BMC Bioinformatics, 11:165, 2010, doi:10.1186/1471-2105-11-165. [full text](http://www.biomedcentral.com/1471-2105/11/165)

If you use `runFimo()` in your analysis, please cite:

Charles E. Grant, Timothy L. Bailey, and William Stafford Noble, "FIMO: Scanning for occurrences of a given motif", Bioinformatics, 27(7):1017-1018, 2011. [full text](http://bioinformatics.oxfordjournals.org/content/early/2011/02/16/bioinformatics.btr064.full)

## Licensing Restrictions
The MEME Suite is free for non-profit use, but for-profit users should purchase a
license. See the [MEME Suite Copyright Page](http://meme-suite.org/doc/copyright.html) for details.

Owner

  • Name: Spencer Nystrom
  • Login: snystrom
  • Kind: user
  • Company: Recast

Data science & genetics.

GitHub Events

Total
  • Issues event: 3
  • Watch event: 7
  • Issue comment event: 4
  • Push event: 3
  • Pull request event: 1
  • Pull request review event: 1
  • Fork event: 1
Last Year
  • Issues event: 3
  • Watch event: 7
  • Issue comment event: 4
  • Push event: 3
  • Pull request event: 1
  • Pull request review event: 1
  • Fork event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 544
  • Total Committers: 3
  • Avg Commits per committer: 181.333
  • Development Distribution Score (DDS): 0.425
Past Year
  • Commits: 3
  • Committers: 1
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
snystrom s****m@a****u 313
snystrom s****m 225
Nitesh Turaga n****a@g****m 6

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 100
  • Total pull requests: 2
  • Average time to close issues: 3 months
  • Average time to close pull requests: 2 months
  • Total issue authors: 28
  • Total pull request authors: 2
  • Average comments per issue: 1.92
  • Average comments per pull request: 0.5
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 9
  • Pull requests: 1
  • Average time to close issues: 10 days
  • Average time to close pull requests: N/A
  • Issue authors: 8
  • Pull request authors: 1
  • Average comments per issue: 2.78
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • snystrom (70)
  • mniederhuber (2)
  • seb-mueller (2)
  • eladzis (2)
  • huang-sh (2)
  • SorenHeidelbach (1)
  • sshen82 (1)
  • mdozmorov (1)
  • ivangalvan (1)
  • danli349 (1)
  • whywhowhat (1)
  • sa95078 (1)
  • RaphMet (1)
  • dfernandezperez (1)
  • sugiYag (1)
Pull Request Authors
  • mniederhuber (1)
  • karawoo (1)
  • seb-mueller (1)
Top Labels
Issue Labels
enhancement (17) bug (12) documentation (4) experimental / feedback needed (1) wontfix (1)

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 17,832 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
bioconductor.org: memes

motif matching, comparison, and de novo discovery using the MEME Suite

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 17,832 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Stargazers count: 4.7%
Forks count: 14.4%
Average: 17.3%
Downloads: 67.6%
Maintainers (1)
Last synced: 7 months ago

Dependencies

DESCRIPTION cran
  • R >= 4.1 depends
  • Biostrings * imports
  • GenomicRanges * imports
  • cmdfun >= 1.0.2 imports
  • dplyr * imports
  • ggplot2 * imports
  • ggseqlogo * imports
  • magrittr * imports
  • matrixStats * imports
  • methods * imports
  • patchwork * imports
  • processx * imports
  • purrr * imports
  • readr * imports
  • rlang * imports
  • stats * imports
  • tibble * imports
  • tidyr * imports
  • tools * imports
  • universalmotif >= 1.9.3 imports
  • usethis * imports
  • utils * imports
  • xml2 * imports
  • BSgenome.Dmelanogaster.UCSC.dm3 * suggests
  • BSgenome.Dmelanogaster.UCSC.dm6 * suggests
  • MotifDb * suggests
  • PMCMRplus * suggests
  • covr * suggests
  • cowplot * suggests
  • forcats * suggests
  • knitr * suggests
  • pheatmap * suggests
  • plyranges >= 1.9.1 suggests
  • rmarkdown * suggests
  • testthat >= 2.1.0 suggests