https://github.com/arunbodd/serum-proteomics

This repository Contains analysis pipeline for processing and analyzing mass spectrometry proteomics data

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
○
DOI references
○
Academic publication links
○
Committers with academic emails
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (9.6%) to scientific vocabulary

Keywords

differential-expression mass-spectrometry

Last synced: 6 months ago · JSON representation

Repository

This repository Contains analysis pipeline for processing and analyzing mass spectrometry proteomics data

Basic Info

Host: GitHub
Owner: arunbodd
License: mit
Language: HTML
Default Branch: main
Homepage:
Size: 1.08 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Topics

differential-expression mass-spectrometry

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License

Proteomics Analysis Pipeline

Overview

This repository contains an analysis pipeline for processing and analyzing proteomics data. The workflow includes data preprocessing, normalization, PCA analysis, outlier detection, differential expression analysis, and visualization of significant results. The analysis is tailored for comparing different groups, such as controls and patients at various time points.

Features

Data Preprocessing:
- Handles missing values.
- Converts raw proteomics counts to integers for downstream analysis.
- Ensures alignment between metadata and dataset.
Normalization:
- Implements Total Ion Current (TIC) normalization.
- Filters out proteins with high proportions of zero values.
- Log2 normalization for stabilization.
Principal Component Analysis (PCA):
- Performs PCA for both pre- and post-outlier removal datasets.
- Generates interactive 3D PCA plots.
Outlier Detection and Removal:
- Identifies and removes outlier samples to improve data quality.
Differential Expression Analysis:
- Utilizes limma and voom for linear modeling and statistical testing.
- Defines contrasts for multiple group comparisons.
Visualization:
- Generates PCA plots, volcano plots, boxplots, and heatmaps.
- Creates Venn diagrams to compare significant results across groups.
Output Files:
- Exports significant results, unique proteins, and differential expression results.

Data Requirements

Proteomics Data:
- CSV file containing raw proteomics counts. Each row represents a protein, and each column represents a sample.
Metadata:
- CSV file containing sample annotations, such as group labels and time points.

Workflow

1. Input Data Preparation

Input Files:
- ../Raw_Counts_serum_proteomics.csv: Raw proteomics data.
- ../Metadata_TimePoints.csv: Metadata for the samples.
Preprocess the raw counts by handling missing values, reshaping, and aligning metadata with the dataset.

2. Normalization

Apply Total Ion Current (TIC) normalization to scale sample counts.
Filter out proteins with more than 80% zero values.
Apply log2 normalization for downstream analysis.

3. Outlier Detection and Removal

Identify outlier samples using PCA and remove them.
Reapply normalization steps to the filtered dataset.

4. Differential Expression Analysis

Define contrasts for group comparisons:
- Control vs Patient groups at different time points.
Use voom and limma to identify differentially expressed proteins.
Extract results for each contrast, including p-values, adjusted p-values, and log fold changes.

5. Visualization

PCA Plots:
- Generate pre- and post-normalization PCA plots.
- Create interactive 3D PCA plots using plotly.
Volcano Plots:
- Highlight significant up- and down-regulated proteins.
Heatmaps:
- Visualize significant protein expression across groups.
Boxplots:
- Show expression levels of significant proteins.
Venn Diagrams:
- Compare significant results across multiple contrasts.

6. Output Files

Differentially expressed proteins for each comparison:
- ../Results_Files/Differential_Proteins_<Comparison>.csv
Unique proteins for each comparison:
- ../Results_Files/Unique_Proteins_<Comparison>.csv
Venn diagrams and other plots:
- Saved as PDFs in the ../Plots/ directory.

Repository Structure

. ├── Scripts │ └── analysis_pipeline.R # Main analysis script ├── Data │ ├── Raw_Counts_serum_proteomics.csv │ └── Metadata_TimePoints.csv ├── Results_Files │ ├── Differential_Proteins_<Comparison>.csv │ └── Unique_Proteins_<Comparison>.csv ├── Plots │ ├── PCA_Plots.pdf │ ├── Volcano_Plots.pdf │ └── Venn_TopGenes_ggvenn.pdf └── README.md

Dependencies

The pipeline requires the following R libraries: - readxl - dplyr - limma - ggplot2 - tidyr - pheatmap - RColorBrewer - bit64 - ggrepel - plotly - data.table - htmlwidgets - tidyverse - edgeR - ggvenn - ComplexHeatmap - circlize

Install dependencies using: R install.packages(c("readxl", "dplyr", "ggplot2", "tidyr", "pheatmap", "RColorBrewer", "bit64", "ggrepel", "plotly", "data.table", "htmlwidgets", "tidyverse", "edgeR")) BiocManager::install(c("limma", "ComplexHeatmap", "circlize", "ggvenn"))

Usage

Clone the repository: bash git clone <repository_url>
Set the working directory to the location of the scripts and data files.
Run the analysis pipeline in R: R source("RScripts/DEP_Analysis.R")
Check the Results_Files and Plots directories for outputs.

Outputs

PCA Plots: Visualizations of variance among samples.
Volcano Plots: Highlighting significant proteins with fold changes and p-values.
Heatmaps: Clustered visualizations of significant protein expression.
Differential Expression Results: CSV files containing p-values, adjusted p-values, and log fold changes.
Venn Diagrams: Overlap of significant proteins across comparisons.

Citation

If you use this pipeline in your research, please cite this repository and relevant R libraries.

Contact

For questions or issues, please contact: - Name: Arun Boddapati - Email: arunbodd (at) outlook.com

Owner

Name: Arun boddapati
Login: arunbodd
Kind: user
Location: Reston
Company: Leidos

Repositories: 30
Profile: https://github.com/arunbodd

GitHub Events

Total

Push event: 1
Create event: 2

Last Year

Push event: 1
Create event: 2

Committers

Last synced: 9 months ago

All Time

Total Commits: 3
Total Committers: 1
Avg Commits per committer: 3.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 3
Committers: 1
Avg Commits per committer: 3.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
arunbodd	a**d@o**m	3

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

https://github.com/arunbodd/serum-proteomics

Science Score: 26.0%

Keywords

Repository

Basic Info

Statistics

Topics

Metadata Files

README.md

Proteomics Analysis Pipeline

Overview

Features

Data Requirements

Workflow

1. Input Data Preparation

2. Normalization

3. Outlier Detection and Removal

4. Differential Expression Analysis

5. Visualization

6. Output Files

Repository Structure

Dependencies

Usage

Outputs

Citation

Contact

Owner

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels