https://github.com/const-ae/prodd_old

Differential Detection for Label-free (LFQ) Mass Spec Data

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: const-ae
  • Language: C++
  • Default Branch: master
  • Homepage:
  • Size: 7.07 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 8 years ago · Last pushed over 8 years ago
Metadata Files
Readme

README.Rmd

---
title: "proDD"
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = TRUE,
  fig.path = "tools/README-fig/",
  cache.path = "tools/README-cache/",
  message = FALSE,
  warning = FALSE
)
```

Differential Detection with Label-free Mass Spec Data

## Overview

This package provides a framework to find proteins in mass spec data that are differentially detected between groups.
It is designed to deal with a high number of missing values (i.e. zeros) and can nonetheless give reliable significance
estimates.

It is thus applicable to data from affinity purification experiments such as BioID.

## Method

The algorithm is built around the fact that missing values are more likely to occur when the intensity of a protein
is low, which means that a missing observation carries information. In the first step the algorithm quantifies this
dependency by fitting a logistic regression of the probability of missing a value on the underlying intensity. This model
is fitted with Hamiltonian Monte Carlo, because precise estimates of the sigmoid are necessary for reliable
downstream calculations. In the second step the group means for each condition and protein are estimated using a
maximum likelihood approach. In the last step a moderated t-test is applied to each protein to find which proteins
differ significantly between groups.
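
The dependency between intensity and missingness can be pictured with a small base R sketch. The helper `prob_missing()` and its parametrisation are assumptions made for illustration and are not part of the proDD API; the values `location = 8` and `scale = -0.3` are borrowed from the data-generation call further down in this README.

```r
# Hypothetical helper (not a proDD function): probability that a value is
# missing, modelled as a sigmoid of the underlying log-intensity.
# A negative scale makes the probability fall as the intensity rises.
prob_missing <- function(intensity, location, scale) {
  1 / (1 + exp(-(intensity - location) / scale))
}

intensities <- c(6, 7, 8, 9, 10)
p <- prob_missing(intensities, location = 8, scale = -0.3)

# Low intensities are almost always missing, high ones almost never
round(p, 3)
```

With this parametrisation `location` is the intensity at which half of the values are expected to be missing, and the magnitude of `scale` controls how sharp the transition is.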

Unlike approaches from the literature that impute missing values with _ad hoc_ methods, such as using half the
global minimum, proDD exploits the information provided by the zeros in a structured way and focuses on the
maximum likelihood estimates (MLE) of the group means, which are sufficient to establish significance.
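
How a zero can inform the estimate of a group mean is easiest to see in a toy likelihood. The following is a minimal sketch under stated assumptions, not the package's implementation: the standard deviation `sigma` is treated as known, the dropout curve uses the assumed parameters `location = 8` and `scale = -0.3`, and the function names are hypothetical.

```r
# Assumed dropout curve: probability that an intensity x is recorded as zero
prob_missing <- function(x, location = 8, scale = -0.3) {
  1 / (1 + exp(-(x - location) / scale))
}

# Log-likelihood of a group mean `mu`, given the observed values and the
# number of missing values in that group (sigma is assumed to be known)
loglik_mu <- function(mu, observed, n_missing, sigma = 0.5) {
  ll_obs <- sum(dnorm(observed, mu, sigma, log = TRUE) +
                log(1 - prob_missing(observed)))
  # Marginal probability of a missing value: the dropout curve averaged
  # over the intensity distribution N(mu, sigma^2)
  p_mis <- integrate(function(x) dnorm(x, mu, sigma) * prob_missing(x),
                     lower = mu - 8 * sigma, upper = mu + 8 * sigma)$value
  ll_obs + n_missing * log(p_mis)
}

observed <- c(8.4, 8.6)
mle_without_zero <- optimize(loglik_mu, interval = c(4, 12), observed = observed,
                             n_missing = 0, maximum = TRUE)$maximum
mle_with_zero <- optimize(loglik_mu, interval = c(4, 12), observed = observed,
                          n_missing = 1, maximum = TRUE)$maximum
c(without_zero = mle_without_zero, with_zero = mle_with_zero)
```

Without a missing value the estimate is just the mean of the observed values. Adding one zero pulls the estimate towards lower intensities, without having to pick an imputed value for it.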

## Workflow

Installation

```{r}
# Install directly from GitHub
devtools::install_github("const-ae/proDD")
```


Let's assume that `X` is a matrix where each row contains the intensity for one protein and each column is one 
sample, which can be grouped into conditions.

```{r, echo=FALSE}
library(proDD)
source("tests/testthat/helper_datageneration.R")
data <- generate_zero_inflated_data_with_effect(N_genes=100, N_rep=3, perc_changed = 0, mu0=8.5, nu0=5, sigma0=0.4, location=8, scale=-0.3)
X <- cbind(data$X, data$Y)
colnames(X) <- c(paste0("A_", 1:3), paste0("B_", 1:3))
X <- X[rowSums(X) != 0, ]
```

```{r}
library(proDD)
head(X, n=10)
```

```{r, echo=FALSE}
ComplexHeatmap::Heatmap((X != 0)*1.0, cluster_rows=FALSE, cluster_columns= FALSE,
                        col=c("black", "lightgrey"), name="Value Observed")
```
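
To experiment without the data-generation helper from the package's tests, a matrix in the layout described above can be put together by hand. This is only a toy sketch: the object name `X_toy`, the protein names and the simulated values are made up for illustration.

```r
set.seed(1)

# Toy stand-in for real data: 5 proteins x 6 samples of log-intensities
X_toy <- matrix(rnorm(30, mean = 8.5, sd = 0.5), nrow = 5,
                dimnames = list(paste0("protein_", 1:5),
                                c(paste0("A_", 1:3), paste0("B_", 1:3))))

# Missing observations are coded as zeros
X_toy[cbind(c(1, 3, 3), c(2, 4, 5))] <- 0

# Drop proteins that were not observed in any sample
X_toy <- X_toy[rowSums(X_toy) != 0, , drop = FALSE]

# Number of missing values per sample
colSums(X_toy == 0)
```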


For subsequent steps a description of the samples is necessary, i.e. which sample belongs to which condition.
For this we will create a data frame containing that information:

```{r}
data_description <-  data.frame(Condition=as.factor(c(rep("A", 3), rep("B", 3))), 
                                Replicate=c(1:3, 1:3))
data_description$Sample <- paste0(data_description$Condition, data_description$Replicate)
data_description

design <-  model.matrix(Sample ~ Condition - 1, data_description)
design
```
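
A quick sanity check of such a design matrix can catch mismatches early. The sketch below uses only base R; the object names `desc` and `design_check` are chosen here for illustration. It verifies that each sample is assigned to exactly one condition and that every condition has three replicates.

```r
desc <- data.frame(Condition = factor(c(rep("A", 3), rep("B", 3))),
                   Replicate = c(1:3, 1:3))

# One indicator column per condition, no intercept
design_check <- model.matrix(~ Condition - 1, desc)

colnames(design_check)   # "ConditionA" "ConditionB"
rowSums(design_check)    # every sample belongs to exactly one condition
colSums(design_check)    # three replicates per condition
```

The rows of the design matrix should be in the same order as the columns of `X`, so that each sample is matched to the right condition.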

Now we can apply the algorithm, which consists of four steps, to the data:

1. Estimate the parameters for the variance moderation:

    ```{r}
    vm_est <- estimate_variance_moderation(X, design)
    ```

2. Estimate the sigmoid that describes the chance to miss an observation:

    ```{r}
    sig_est <- estimate_sigmoid(X, data_description, vm_est$nu_est, vm_est$sigma2_est, chains=1)
    ```

3. Estimate the means of each condition per protein:

    ```{r}
    group_locations <- estimate_group_means(X, design, vm_est$nu_est, vm_est$sigma2_est, sig_est$location_est, sig_est$scale_est)
    ```

4. Lastly, apply the moderated t-test to the group means to find differentially detected proteins:

    ```{r}
    result <- detect_differences(X, design, data_description, d0=vm_est$nu_est, s0=vm_est$sigma2_est,
                                 group_locations=group_locations, comparison=c("A", "B"))
    
    head(result, n=10)
    ```
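
The idea behind variance moderation and the moderated t-test can be sketched for a single protein with fully observed values. The function below follows the usual limma-style shrinkage of the sample variance towards a prior variance. It is a simplified illustration and not the code behind `detect_differences()`; the function name, the argument names and the prior values are assumptions.

```r
# Hypothetical helper: moderated two-sample t-test for one protein
moderated_t <- function(a, b, prior_df, prior_var) {
  n_a <- length(a)
  n_b <- length(b)
  resid_df <- n_a + n_b - 2
  pooled_var <- (sum((a - mean(a))^2) + sum((b - mean(b))^2)) / resid_df
  # Shrink the sample variance towards the prior variance
  post_var <- (prior_df * prior_var + resid_df * pooled_var) /
    (prior_df + resid_df)
  t_stat <- (mean(a) - mean(b)) / sqrt(post_var * (1 / n_a + 1 / n_b))
  # The prior contributes extra degrees of freedom
  total_df <- prior_df + resid_df
  p_value <- 2 * pt(-abs(t_stat), df = total_df)
  c(t = t_stat, df = total_df, p = p_value)
}

res <- moderated_t(a = c(8.4, 8.6, 8.5), b = c(9.4, 9.7, 9.5),
                   prior_df = 5, prior_var = 0.16)
res
```

With `prior_df = 0` the statistic reduces to the ordinary equal-variance t-test. Larger prior degrees of freedom stabilise the variance estimate, which matters when only a few replicates are available.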



## Note

This project is still a work in progress, and although the algorithm works well, the API will probably change dramatically.







Owner

  • Name: Constantin
  • Login: const-ae
  • Kind: user
  • Location: Heidelberg, Germany
  • Company: EMBL

PhD Student, Biostats, R

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 21
  • Total Committers: 1
  • Avg Commits per committer: 21.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
const-ae a****5@g****m 21

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

DESCRIPTION cran
  • R >= 3.0.2 depends
  • Rcpp >= 0.12.11 depends
  • methods * depends
  • limma * imports
  • maxLik * imports
  • rstan >= 2.16.2 imports
  • rstantools >= 1.2.0 imports
  • testthat * suggests