SeedMatchR

Off-target analysis of siRNA knock down paired with RNAseq

https://github.com/tacazares/seedmatchr

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to ncbi.nlm.nih.gov
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (17.1%) to scientific vocabulary

Keywords

deseq2-analysis mirna rna-seq sirna transcriptomics

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: tacazares
License: other
Language: R
Default Branch: main
Homepage: https://tacazares.github.io/SeedMatchR/
Size: 10.6 MB

Statistics

Stars: 7
Watchers: 1
Forks: 3
Open Issues: 1
Releases: 2

Created over 2 years ago · Last pushed 7 months ago

Metadata Files

Readme License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "50%"
)
```

# SeedMatchR version 2.0.0

The goal of SeedMatchR is to help users identify potential seed-mediated effects in their RNA-seq data.

The changes in this forked repository add biological target bulges and wobbles to the search space.

## Installation

This version of SeedMatchR requires R ≥ 4.3.0, but it is recommended to use the latest version of R to avoid issues with annotation retrieval for newer genomes.

You can install the development version of SeedMatchR from [GitHub](https://github.com/) or the stable build from CRAN.

```{r eval = FALSE}
# Install devtools, which provides install_github()
install.packages("devtools")

# Install the development version from the public GitHub repository
devtools::install_github("tacazares/SeedMatchR")
```

## Quick start examples with public siRNA data

```{r include = FALSE}
# Import library
library(SeedMatchR)
```

This example uses the siRNA sequence, D1, targeting the Ttr gene in rat liver from the publication:

> Schlegel MK, Janas MM, Jiang Y, Barry JD, Davis W, Agarwal S, Berman D, Brown CR, Castoreno A, LeBlanc S, Liebow A, Mayo T, Milstein S, Nguyen T, Shulga-Morskaya S, Hyde S, Schofield S, Szeto J, Woods LB, Yilmaz VO, Manoharan M, Egli M, Charissé K, Sepp-Lorenzino L, Haslett P, Fitzgerald K, Jadhav V, Maier MA. From bench to bedside: Improving the clinical safety of GalNAc-siRNA conjugates using seed-pairing destabilization. Nucleic Acids Res. 2022 Jul 8;50(12):6656-6670. doi: 10.1093/nar/gkac539. PMID: 35736224; PMCID: PMC9262600.

The guide sequence of interest is 23 bp long and oriented 5' -\> 3'.

```{r}
# siRNA sequence of interest targeting a 23 bp region of the Ttr gene
guide.seq = "UUAUAGAGCAAGAACACUGUUUU"
```

### Load rat-specific annotation data

We use `AnnotationHub` to derive the `GTF` and `DNA` sequence files for the species of interest. Running `load_annotations()` takes several minutes, so if you plan to reuse the annotations in a repeated workflow, save them as an RData object after the first run and reload them later.

#### Load annotation databases

```{r}
# Use TRUE rather than T, which can be reassigned
annodb = load_annotations(reference.name = "rnor6",
                          canonical = FALSE,
                          min.feature.width = 8,
                          longest.utr = TRUE)
```
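
The save-and-reload advice above can be sketched with a small caching helper. This is a minimal illustration in base R: `load_or_build()` and `slow_builder()` are hypothetical names, not part of SeedMatchR, and the toy list stands in for the real annotation object.

```r
# Hypothetical helper (not part of SeedMatchR): build an object once and
# reload it from an .rds file on later runs.
load_or_build <- function(path, build) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  obj <- build()
  saveRDS(obj, path)
  obj
}

# Toy demonstration with a cheap stand-in for the slow annotation step
cache.file <- tempfile(fileext = ".rds")
n.builds <- 0
slow_builder <- function() {
  n.builds <<- n.builds + 1
  list(seqs = c(tx1 = "ACGT"))
}

first <- load_or_build(cache.file, slow_builder)   # builds and saves
second <- load_or_build(cache.file, slow_builder)  # reloads from disk
```

In a real workflow the builder would wrap the `load_annotations()` call shown above. One caveat, stated as an assumption about the returned object: if it holds database-backed components (for example a `TxDb`), those may not survive `saveRDS()` and might need to be saved separately.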

## Example 1: Perform a comprehensive transcriptome search

The most straightforward way of using SeedMatchR is to search a reference set of transcripts given an input sequence. 

### Output match ranges as GRanges

```{r}
res.df = SeedMatchR(seqs = annodb$seqs,
                    sequence = guide.seq,
                    seed.name = "mer7m8",
                    res.format = "granges")
```
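
To make the search above more concrete, the following base R sketch derives the sequence that a seed search would look for in a transcript. It assumes that `mer7m8` refers to guide positions 2 to 8, following the common 7mer-m8 convention; the README does not define it, so treat that as an assumption. `reverse_complement_dna()` is a hypothetical helper.

```r
guide.seq <- "UUAUAGAGCAAGAACACUGUUUU"

# Assumed seed definition: nucleotides 2-8 of the guide strand
seed <- substr(guide.seq, 2, 8)

# Hypothetical helper: convert RNA to DNA and take the reverse complement
reverse_complement_dna <- function(x) {
  dna <- chartr("U", "T", x)
  comp <- chartr("ACGT", "TGCA", dna)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Sequence expected in a target transcript (written as DNA)
target.site <- reverse_complement_dna(seed)
print(seed)         # "UAUAGAG"
print(target.site)  # "CTCTATA"
```

A transcript containing `target.site` would pair with the seed region of the guide, which is the kind of site the seed search reports.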

### Output match ranges for many different types of views of the siRNA

```{r}
res.df = full_search(guide.seq, annodb$seqs, group.name = "Ttr")
```

## Example 2: Analyze RNA-seq data with SeedMatchR

### Prepare DESeq2 Results

The test data provided with `SeedMatchR` was derived from the 2022 publication by Schlegel et al. The data set represents a DESeq2 analysis of rat liver treated with Ttr-targeting siRNA. We will use this example to explore seed-mediated activity.

Notes:
> When the `res` argument is used to map seed matches to differential expression data, the `SeedMatchR` function looks for specific columns in the input. The input must contain the columns `gene_id`, `log2FoldChange`, and `padj`.

#### Download data (only need to perform once, can skip to loading if done)

We start by downloading the example data set. The `get_example_data()` function downloads three files from the GEO accession [GSE184929](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE184929). These files represent three samples with different siRNA treatments at two dosages.

```{r echo = TRUE, results = 'hide', message=FALSE, warning=FALSE, error=FALSE}
get_example_data("sirna")
```

#### Load example data

We can load the example data into the environment.

```{r}
sirna.data = load_example_data("sirna")
```

The DESeq2 results are available through the names `Schlegel_2022_Ttr_D1_30mkg`, `Schlegel_2022_Ttr_D4_30mkg` and `Schlegel_2022_Ttr_D1_10mkg`. The data set name is long, so it will be renamed to `res`.

```{r}
res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg
```
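
Before going further, it can help to confirm that a results table carries the required columns (`gene_id`, `log2FoldChange`, and `padj`). The sketch below is a hypothetical check in base R on a toy data frame; `missing_res_columns()` is not a SeedMatchR function.

```r
required.cols <- c("gene_id", "log2FoldChange", "padj")

# Hypothetical helper: return the required columns missing from a data frame
missing_res_columns <- function(res, required = required.cols) {
  setdiff(required, colnames(res))
}

# Toy results table standing in for a DESeq2 results data frame
toy.res <- data.frame(
  gene_id = c("g1", "g2"),
  log2FoldChange = c(-1.2, 0.3),
  padj = c(0.01, 0.8)
)

missing.ok <- missing_res_columns(toy.res)
missing.bad <- missing_res_columns(toy.res[, c("gene_id", "padj")])
print(missing.bad)  # "log2FoldChange"
```

An empty result means the table has every column the seed-matching step expects.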

#### Filter example results

The DESeq2 results are then filtered. The function `filter_res()` filters a results table by log2FoldChange, padj, and baseMean, and removes NA entries.

```{r}
# Dimensions before filtering
dim(res) # [1] 32883    8

# Filter DESeq2 results for SeedMatchR
res = filter_res(res, fdr_cutoff=1, fc_cutoff=0)

# Dimensions after filtering
dim(res) # [1] 13582     8
```
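
As a rough picture of this kind of filtering, here is a base R sketch on a toy table. `filter_res_sketch()` is a hypothetical re-implementation for illustration only; the comparison directions and the `basemean_cutoff` argument are assumptions and may differ from what `filter_res()` actually does.

```r
# Hypothetical illustration; not the package's code.
filter_res_sketch <- function(res, fdr_cutoff = 1, fc_cutoff = 0,
                              basemean_cutoff = 0) {
  keep <- !is.na(res$padj) &
    !is.na(res$log2FoldChange) &
    !is.na(res$baseMean) &
    res$padj <= fdr_cutoff &
    abs(res$log2FoldChange) >= fc_cutoff &
    res$baseMean >= basemean_cutoff
  res[keep, , drop = FALSE]
}

toy.res <- data.frame(
  gene_id = c("g1", "g2", "g3", "g4"),
  baseMean = c(100, 50, 5, 80),
  log2FoldChange = c(-1.5, 0.2, NA, 2.0),
  padj = c(0.001, 0.9, 0.5, NA)
)

# Permissive cutoffs: in this sketch only rows with NA values are dropped
permissive <- filter_res_sketch(toy.res)

# Stricter cutoffs: keep significant genes with at least a 2-fold change
strict <- filter_res_sketch(toy.res, fdr_cutoff = 0.05, fc_cutoff = 1)
```

With `fdr_cutoff = 1` and `fc_cutoff = 0`, the sketch keeps every gene that has complete statistics, which keeps non-significant genes available as background for later comparisons.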

### Counting seed matches in transcripts

You can perform a seed match for a single seed using the `SeedMatchR()` function. 

Notes:

> The names of the sequences in `seqs` determine whether you need the `tx.id.col` argument. If your sequence names are gene IDs, no additional flags need to be set. If the sequence names are transcript IDs, set `tx.id.col` to `TRUE`. This summarizes the transcript matches to the gene level using information in the GTF file.

```{r}
res = SeedMatchR(res = res, 
                 seqs = annodb$seqs, 
                 sequence = guide.seq, 
                 seed.name = "mer7m8")

head(res, 2)
```
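
The idea of counting matches per transcript and then summarizing to the gene level can be sketched in base R. Everything here is illustrative: the toy sequences, the `tx2gene` map, and the choice to sum counts per gene are assumptions, and `count_site_matches()` is a hypothetical helper rather than SeedMatchR's implementation.

```r
# Site to search for, written as DNA (assumed reverse complement of the seed)
target.site <- "CTCTATA"

# Toy transcript sequences and a transcript-to-gene map
tx.seqs <- c(
  tx1 = "AAACTCTATAGGG",
  tx2 = "GGGGGG",
  tx3 = "CTCTATATTCTCTATA"
)
tx2gene <- c(tx1 = "geneA", tx2 = "geneA", tx3 = "geneB")

# Hypothetical helper: count non-overlapping exact matches per sequence
count_site_matches <- function(site, seqs) {
  vapply(seqs, function(s) {
    hits <- gregexpr(site, s, fixed = TRUE)[[1]]
    if (hits[1] == -1L) 0L else length(hits)
  }, integer(1))
}

tx.counts <- count_site_matches(target.site, tx.seqs)

# Summarize transcript-level counts to the gene level by summing
gene.counts <- tapply(tx.counts, tx2gene[names(tx.counts)], sum)
print(gene.counts)
```

A gene-level count column like this is what allows genes to be split into seed-match and background sets.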

### Comparing the expression profiles of seed targets to background

Many factors that perturb gene expression, such as miRNAs, cause cumulative changes in the expression of their target genes. These cumulative changes can be visualized with the empirical cumulative distribution function (ECDF) and tested with a statistical test such as the Kolmogorov-Smirnov test.

`SeedMatchR` provides functions for comparing the log2(Fold Change) of two gene sets. The function `deseq_fc_ecdf` is designed to work directly with a DESeq2 results data frame.

Required Inputs:

-   `res`: DESeq2 results data frame
-   `gene.lists`: A list of lists containing gene names

```{r fig.height=5, fig.width=5, out.retina=1}
# Gene set 1 
mer7m8.list = res$gene_id[res$mer7m8 >= 1]

# Gene set 2
background.list = res$gene_id[res$mer7m8 == 0]

ecdf.results = deseq_fc_ecdf(res, 
                             list("Background" = background.list, 
                                  "mer7m8" = mer7m8.list))

ecdf.results$plot
```
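
The comparison behind such a plot can be illustrated with base R alone. The sketch below uses simulated log2 fold changes, not the Schlegel et al. data, and it assumes a two-sample Kolmogorov-Smirnov test; whether `deseq_fc_ecdf()` runs exactly this test with these settings is not confirmed here.

```r
set.seed(1)

# Simulated log2 fold changes: targets are shifted toward down-regulation
background.fc <- rnorm(500, mean = 0, sd = 1)
target.fc <- rnorm(200, mean = -0.5, sd = 1)

# Empirical cumulative distribution functions for each gene set
background.ecdf <- ecdf(background.fc)
target.ecdf <- ecdf(target.fc)

# Fraction of genes in each set with log2 fold change at or below zero
frac.down.background <- background.ecdf(0)
frac.down.target <- target.ecdf(0)

# Two-sample Kolmogorov-Smirnov test comparing the two distributions
ks <- ks.test(target.fc, background.fc)
print(ks$statistic)
print(ks$p.value)
```

When seed targets are repressed, their ECDF curve sits to the left of the background curve, and the KS statistic measures the largest vertical gap between the two curves.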

Owner

Name: Tareian Cazares
Login: tacazares
Kind: user

Website: tareian.com
Repositories: 2
Profile: https://github.com/tacazares

Post Doctoral Scientist @EliLillyCo. My research interests include immunology, genomics, computational biology, and systems biology.

GitHub Events

Total

Watch event: 1
Push event: 3
Fork event: 1

Last Year

Watch event: 1
Push event: 3
Fork event: 1

Packages

Total packages: 1
Total downloads:
- cran 118 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2
Total maintainers: 1

cran.r-project.org: SeedMatchR

Find Matches to Canonical SiRNA Seeds in Genomic Features

Homepage: https://tacazares.github.io/SeedMatchR/
Documentation: http://cran.r-project.org/web/packages/SeedMatchR/SeedMatchR.pdf
License: MIT + file LICENSE
Status: removed
Latest release: 1.1.1
published over 2 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 118 Last month

Rankings

Forks count: 28.7%

Dependent packages count: 29.2%

Stargazers count: 31.7%

Dependent repos count: 34.9%

Average: 42.8%

Downloads: 89.6%

Maintainers (1)

tareian.cazares@lilly.com

Last synced: 7 months ago

Dependencies

.github/workflows/R-CMD-check.yaml actions

actions/checkout v3 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/pkgdown.yaml actions

JamesIves/github-pages-deploy-action v4.4.1 composite
actions/checkout v3 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

DESCRIPTION cran

R >= 4.1.0 depends
AnnotationHub * imports
Biostrings * imports
GenomeInfoDb * imports
GenomicFeatures * imports
cowplot * imports
dplyr * imports
ggmsa * imports
ggplot2 * imports
lifecycle * imports
msa * imports
stats * imports
stringr * imports
testit * imports
utils * imports
knitr * suggests
org.Rn.eg.db * suggests
rmarkdown * suggests
testthat >= 3.0.0 suggests
