https://github.com/const-ae/prodd_old
Differential Detection for Label-free (LFQ) Mass Spec Data
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
○.zenodo.json file
-
○DOI references
-
○Academic publication links
-
○Committers with academic emails
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (13.7%) to scientific vocabulary
Last synced: 10 months ago
·
JSON representation
Repository
Differential Detection for Label-free (LFQ) Mass Spec Data
Basic Info
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Created over 8 years ago
· Last pushed over 8 years ago
Metadata Files
Readme
README.Rmd
---
title: "proDD"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
cache = TRUE,
fig.path = "tools/README-fig/",
cache.path = "tools/README-cache/",
message = FALSE,
warning = FALSE
)
```
Differential Detection with Label-free Mass Spec Data
## Overview
This package provides a framework to find proteins in mass spec data that are differentially detected between groups.
It is designed to deal with high number of missing values (i.e. zeros) and can nonetheless give reliable significance
estimates.
It is thus applicable to data from affinity purification experiments such as BioID.
## Method
The algorithm is build around the fact that a missing values are more likely to occur if the intensity of protein
is low, which means that a missing observation can tell us something. In the first step the algorithm quantifies this dependency by
estimating a logistic regression of the chance to miss a value depending on the underlying intensity. This model
is fitted using Hamiltonian Monte Carlo method, because precise estimates of the sigmoid are necessary for reliable
downstream calculations. In the second step the group means for each condition and protein are estimated using a
maximum likelihood approach. To find which groups are actually significantly expressed in the last step a moderated
t-test is applied to each protein.
Unlike other approaches that have been suggested in the literature that rely on imputing missing values using _ad hoc_
methods, such as just using half the global minimum, proDD exploits the information provided by the zeros in a
structured way and focuses on the MLE of the group means, which are sufficient to establish significance.
## Workflow
Installation
```{r}
# Install directly from github
devtools::github("const-ae/proDD")
```
Let's assume that `X` is a matrix where each row contains the intensity for one protein and each column is one
sample, which can be grouped into conditions.
```{r, echo=FALSE}
library(proDD)
source("tests/testthat/helper_datageneration.R")
data <- generate_zero_inflated_data_with_effect(N_genes=100, N_rep=3, perc_changed = 0, mu0=8.5, nu0=5, sigma0=0.4, location=8, scale=-0.3)
X <- cbind(data$X, data$Y)
colnames(X) <- c(paste0("A_", 1:3), paste0("B_", 1:3))
X <- X[rowSums(X) != 0, ]
```
```{r}
library(proDD)
head(X, n=10)
```
```{r, echo=FALSE}
ComplexHeatmap::Heatmap((X != 0)*1.0, cluster_rows=FALSE, cluster_columns= FALSE,
col=c("black", "lightgrey"), name="Value Observed")
```
For subsequent steps a description of the samples is necessary, i.e. which sample belongs to which condition.
For this we will create a dataframe containing that information:
```{r}
data_description <- data.frame(Condition=as.factor(c(rep("A", 3), rep("B", 3))),
Replicate=c(1:3, 1:3))
data_description$Sample <- paste0(data_description$Condition, data_description$Replicate)
data_description
design <- model.matrix(Sample ~ Condition - 1, data_description)
design
```
Now we can apply the algorithm that consists of three steps to that data
1. Estimate the parameters for the variance moderation:
```{r}
vm_est <- estimate_variance_moderation(X, design)
```
2. Estimate the sigmoid that describes the chance to miss an observation:
```{r}
sig_est <- estimate_sigmoid(X, data_description, vm_est$nu_est, vm_est$sigma2_est, chains=1)
```
3. Estimate the means of each condition per protein
```{r}
group_locations <- estimate_group_means(X, design, vm_est$nu_est, vm_est$sigma2_est, sig_est$location_est, sig_est$scale_est)
```
4. Lastly, apply the moderated t-test to the group means to find differentially detected proteins:
```{r}
result <- detect_differences(X, design, data_description, d0=vm_est$nu_est, s0=vm_est$sigma2_est,
group_locations=group_locations, comparison=c("A", "B"))
head(result, n=10)
```
## Note
This project is still work in progress and although the algorithm is working well, the API will probably change dramatically.
Owner
- Name: Constantin
- Login: const-ae
- Kind: user
- Location: Heidelberg, Germany
- Company: EMBL
- Website: https://twitter.com/const_ae
- Repositories: 64
- Profile: https://github.com/const-ae
PhD Student, Biostats, R
GitHub Events
Total
Last Year
Committers
Last synced: about 1 year ago
Top Committers
| Name | Commits | |
|---|---|---|
| const-ae | a****5@g****m | 21 |
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
DESCRIPTION
cran
- R >= 3.0.2 depends
- Rcpp >= 0.12.11 depends
- methods * depends
- limma * imports
- maxLik * imports
- rstan >= 2.16.2 imports
- rstantools >= 1.2.0 imports
- testthat * suggests