deg-pipeline-assistant

This pipeline takes normalized RNA-seq data and outputs differentially expressed genes and pathways.

https://github.com/shaan7071/deg-pipeline-assistant

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: Shaan7071
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 204 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

DEG Pipeline Assistant

A comprehensive tool for analyzing RNA-seq data, identifying differentially expressed genes (DEGs), and performing pathway enrichment analysis.

Overview

This pipeline takes normalized RNA-seq data and runs a complete analysis workflow: quality-control plots, differential expression calls, DEG visualizations, ortholog mapping, and GO/KEGG pathway enrichment. The tool offers both a command-line interface and a Streamlit web application, so it suits bioinformaticians as well as researchers with limited programming experience.

Features

  • Data Processing: Creates standardized metadata from normalized counts
  • Quality Control: Generates boxplots, correlation heatmaps, and PCA plots
  • Differential Expression Analysis: Processes DESeq2 files to identify DEGs
  • Visualization: Creates MA plots, volcano plots, and DEG heatmaps
  • Ortholog Mapping: Maps genes to human orthologs for cross-species analysis
  • Enrichment Analysis: Performs GO and KEGG pathway enrichment analysis
  • Interactive UI: Streamlit-based web interface for easy parameter configuration
  • AI Assistant: Integrated AI functionality to help with parameter selection
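The quality-control feature is described above only at a high level. The sketch below shows what correlation and PCA checks commonly involve, using pandas and scikit-learn. It log-transforms a normalized gene × sample matrix, then computes a sample-to-sample correlation matrix and PCA coordinates. Several details are assumptions for illustration and are not taken from DEGpipeline.py: the helper name `qc_summary`, the log2(x + 1) transform, and the genes-as-rows layout.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def qc_summary(norm_counts: pd.DataFrame, n_components: int = 2):
    """Hypothetical QC helper: sample correlations and PCA coordinates.

    Assumes rows are genes and columns are samples.
    """
    # log2(x + 1) compresses the long right tail of normalized counts
    log_counts = np.log2(norm_counts + 1)
    # Pearson correlation between every pair of samples (columns)
    corr = log_counts.corr(method="pearson")
    # PCA treats rows as observations, so transpose to samples x genes
    pca = PCA(n_components=n_components)
    coords = pca.fit_transform(log_counts.T.to_numpy())
    pcs = pd.DataFrame(
        coords,
        index=log_counts.columns,
        columns=[f"PC{i + 1}" for i in range(n_components)],
    )
    return corr, pcs, pca.explained_variance_ratio_


# Toy matrix: 5 genes x 4 samples (made-up values for illustration)
counts = pd.DataFrame(
    {
        "Control_1": [10, 200, 35, 0, 80],
        "Control_2": [12, 180, 40, 1, 75],
        "Treatment1_1": [50, 20, 38, 30, 82],
        "Treatment1_2": [55, 25, 36, 28, 79],
    },
    index=["g1", "g2", "g3", "g4", "g5"],
)
corr, pcs, var_ratio = qc_summary(counts)
print(corr.round(2))
print(pcs.round(2))
print("explained variance:", var_ratio.round(3))
```

In a PCA plot, replicates of the same condition should cluster together. A replicate that sits far from its group is worth checking before differential expression.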

Installation

Prerequisites

  • Python 3.13+
  • Required Python packages (installed via pip from requirements.txt, which also sets minimum versions): pandas, numpy, matplotlib, seaborn, scipy, statsmodels, scikit-learn, streamlit, click, requests, gseapy, mygene, openai, pydeseq2

Setup

  1. Clone the repository:
       git clone https://github.com/shaan7071/deg-pipeline-assistant.git
  2. Navigate to the project directory:
       cd deg-pipeline-assistant
  3. Install dependencies:
       pip install -r requirements.txt
  4. Set up your OpenAI API key (needed only for the AI assistant):
       export OPENAI_API_KEY="your-api-key"

Usage

Command Line Interface

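The CLI provides two subcommands, setup and run-all. Shared options (input files, output directory, conditions, replicates) come before the subcommand name, and analysis options follow run-all: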

  1. Setup: initialize the directory structure:
       python pipeline_CLI.py --norm-file "path/to/normalized_data.csv" \
         --pw-data "path/to/pairwise_data" --base-dir "output_directory" \
         --num-replicates 3 --conditions "Control" --conditions "Treatment1" \
         --conditions "Treatment2" setup

  2. Run analysis: execute the full analysis pipeline:
       python pipeline_CLI.py --norm-file "path/to/normalized_data.csv" \
         --pw-data "path/to/pairwise_data" --base-dir "output_directory" \
         --num-replicates 3 --conditions "Control" --conditions "Treatment1" \
         --conditions "Treatment2" run-all --model-organism "drerio" \
         --pw-interest "Treatment1_vs_Control" --pw-interest "Treatment2_vs_Control" \
         --log2fc-threshold 1.0 --padj-threshold 0.05 --enrich-sig-cutoff 0.05

Web Interface

For a more user-friendly experience, run the Streamlit app:

    streamlit run app.py

This will open a web interface where you can:
  • Input all parameters through form fields
  • Generate and review pipeline commands
  • Execute commands directly from the interface
  • View real-time command output
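Generating a command from form fields comes down to three things: turning each value into argument tokens, repeating the flag for multi-valued options, and placing the shared options before the subcommand. The sketch below shows one way to do this. It assumes hypothetical form-field names that are not taken from app.py. Building a list of arguments rather than a single string keeps paths with spaces safe, and `shlex.join` produces a quoted, copy-pasteable version for review.

```python
import shlex


def build_run_all_argv(params: dict) -> list[str]:
    """Turn hypothetical form-field values into a pipeline_CLI.py argv list."""
    argv = ["python", "pipeline_CLI.py",
            "--norm-file", params["norm_file"],
            "--pw-data", params["pw_data"],
            "--base-dir", params["base_dir"],
            "--num-replicates", str(params["num_replicates"])]
    # Multi-valued options are repeated once per value
    for condition in params["conditions"]:
        argv += ["--conditions", condition]
    argv += ["run-all", "--model-organism", params["model_organism"]]
    for comparison in params["pw_interest"]:
        argv += ["--pw-interest", comparison]
    # Fall back to the documented defaults when a field is left empty
    argv += ["--log2fc-threshold", str(params.get("log2fc_threshold", 1.0)),
             "--padj-threshold", str(params.get("padj_threshold", 0.05)),
             "--enrich-sig-cutoff", str(params.get("enrich_sig_cutoff", 0.05))]
    return argv


form = {
    "norm_file": "my data/normalized_data.csv",
    "pw_data": "pairwise_data",
    "base_dir": "output_directory",
    "num_replicates": 3,
    "conditions": ["Control", "Treatment1"],
    "model_organism": "drerio",
    "pw_interest": ["Treatment1_vs_Control"],
}
argv = build_run_all_argv(form)
print(shlex.join(argv))  # quoted string suitable for display
```

To execute the command, pass the list directly to `subprocess.run(argv, capture_output=True, text=True)` rather than the joined string. This avoids shell-quoting issues.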

Alternatively, deploy this repository on Streamlit Community Cloud (https://streamlit.io/) to handle data that requires more memory.

Parameters

Setup Parameters

  • norm-file: Path to the normalized data file (CSV format)
  • pw-data: Directory containing pairwise comparison files
  • base-dir: Base output directory for results
  • conditions: Experimental conditions (specify multiple with repeated flags)
  • num-replicates: Number of replicates per condition
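The conditions and the replicate count are what the setup step would need to build sample metadata from the normalized counts. The sketch below assumes, hypothetically, that samples are named and ordered condition by condition (Control_1, Control_2, ..., Treatment1_1, ...). The real pipeline's naming and column order may differ, so the helper also accepts explicit sample names and checks that their count equals conditions × replicates. The name `make_metadata` is illustrative.

```python
import pandas as pd


def make_metadata(conditions, num_replicates, sample_names=None):
    """Build a sample -> condition table (hypothetical helper)."""
    rows = [
        (f"{condition}_{rep}", condition, rep)
        for condition in conditions
        for rep in range(1, num_replicates + 1)
    ]
    meta = pd.DataFrame(rows, columns=["sample", "condition", "replicate"])
    meta = meta.set_index("sample")
    if sample_names is not None:
        # Use the real column names, assuming they follow the same order
        if len(sample_names) != len(meta):
            raise ValueError(
                f"expected {len(meta)} samples "
                f"({len(conditions)} conditions x {num_replicates} replicates), "
                f"got {len(sample_names)}"
            )
        meta.index = pd.Index(sample_names, name="sample")
    return meta


print(make_metadata(["Control", "Treatment1", "Treatment2"], 3))
```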

Analysis Parameters

  • model-organism: Model organism code (e.g., "drerio" for zebrafish)
  • pw-interest: Pairwise comparisons of interest (e.g., "Treatment_vs_Control")
  • log2fc-threshold: Log2 fold change threshold for DEG identification (default: 1.0)
  • padj-threshold: Adjusted p-value threshold (default: 0.05)
  • enrich-sig-cutoff: Significance cutoff for enrichment analysis (default: 0.05)
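The two DEG thresholds are typically applied together to a DESeq2 results table. A gene is kept when its adjusted p-value is below padj-threshold and its absolute log2 fold change is at least log2fc-threshold. The pandas sketch below assumes the standard DESeq2 column names `log2FoldChange` and `padj`. Whether the pipeline uses strict or inclusive comparisons at each threshold is an assumption. Genes with a missing `padj` are dropped; DESeq2 leaves it as NA, for example, for genes removed by independent filtering.

```python
import numpy as np
import pandas as pd


def call_degs(res: pd.DataFrame, log2fc_threshold: float = 1.0,
              padj_threshold: float = 0.05) -> pd.DataFrame:
    """Filter a DESeq2-style results table to significant DEGs."""
    # NA padj values cannot pass the significance test, so drop them first
    res = res.dropna(subset=["padj"])
    keep = (res["padj"] < padj_threshold) & (
        res["log2FoldChange"].abs() >= log2fc_threshold
    )
    degs = res.loc[keep].copy()
    # Positive log2FoldChange means higher in the numerator condition
    degs["direction"] = np.where(degs["log2FoldChange"] > 0, "up", "down")
    return degs.sort_values("padj")


results = pd.DataFrame(
    {"log2FoldChange": [2.5, -1.8, 0.4, 3.0, -2.2],
     "padj": [0.001, 0.01, 0.0001, 0.2, np.nan]},
    index=["geneA", "geneB", "geneC", "geneD", "geneE"],
)
print(call_degs(results))
```

In this toy table, geneC fails the fold-change cut, geneD fails the padj cut, and geneE has no padj, so only geneA (up) and geneB (down) remain.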

Optional Flags

Skip specific analysis steps with these flags:
  • --skip-boxplots: Skip boxplot generation
  • --skip-correlation-heatmap: Skip correlation heatmap
  • --skip-pca: Skip PCA analysis
  • --skip-ma-plots: Skip MA plot generation
  • --skip-volcano-plots: Skip volcano plot generation
  • --skip-heatmap: Skip DEG heatmap
  • --skip-go: Skip GO enrichment analysis
  • --skip-kegg: Skip KEGG enrichment analysis
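Each --skip-* flag removes one step from the run. One simple way to model this is an ordered registry of step names from which the flags are subtracted. The sketch below assumes the steps run in the order the flags are listed. The registry, the step names, and `run_steps` are hypothetical stand-ins for the pipeline's actual plotting and enrichment functions.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical step names, one per --skip-* flag, in the order listed above
STEP_ORDER = [
    "boxplots", "correlation-heatmap", "pca", "ma-plots",
    "volcano-plots", "heatmap", "go", "kegg",
]


def select_steps(skip: Iterable[str]) -> List[str]:
    """Return the steps to run, preserving pipeline order."""
    skip = set(skip)
    unknown = skip - set(STEP_ORDER)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    return [step for step in STEP_ORDER if step not in skip]


def run_steps(registry: Dict[str, Callable[[], None]],
              skip: Iterable[str]) -> List[str]:
    """Call each selected step's function and report what ran."""
    ran = []
    for step in select_steps(skip):
        registry[step]()
        ran.append(step)
    return ran


# Placeholder step functions that only print their own name
registry = {name: (lambda name=name: print("running", name)) for name in STEP_ORDER}
# Equivalent of passing --skip-go --skip-kegg
print(run_steps(registry, skip=["go", "kegg"]))
```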

Output Structure

The pipeline creates a standardized directory structure:

    base_dir/
    ├── data/
    │   ├── DESeq2/
    │   ├── ortholog_mapping/
    │   └── GO_and_KEGG_enrichments/
    ├── plots/
    │   ├── boxplots/
    │   ├── correlation_heatmap/
    │   ├── PCA/
    │   ├── MA_plots/
    │   ├── volcano_plots/
    │   ├── DEG_heatmap/
    │   ├── GO_enrichment/
    │   └── KEGG_enrichment/
    └── results/
        ├── DEGs/
        ├── GO_enrichment/
        └── KEGG_enrichment/
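A directory layout like this one can be created idempotently with `pathlib`. The sketch below hard-codes the folder names from the tree. The `LAYOUT` mapping and the `create_layout` function are hypothetical and may not match how the setup command actually builds the tree.

```python
import tempfile
from pathlib import Path

# Folder names copied from the output structure
LAYOUT = {
    "data": ["DESeq2", "ortholog_mapping", "GO_and_KEGG_enrichments"],
    "plots": ["boxplots", "correlation_heatmap", "PCA", "MA_plots",
              "volcano_plots", "DEG_heatmap", "GO_enrichment", "KEGG_enrichment"],
    "results": ["DEGs", "GO_enrichment", "KEGG_enrichment"],
}


def create_layout(base_dir) -> list[Path]:
    """Create every leaf folder (and its parents); safe to call repeatedly."""
    base = Path(base_dir)
    created = []
    for top, subdirs in LAYOUT.items():
        for sub in subdirs:
            path = base / top / sub
            # parents=True builds base_dir/top too; exist_ok=True makes reruns no-ops
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    leaves = create_layout(tmp)
    create_layout(tmp)  # a second call does not raise
    print(len(leaves), "leaf folders, e.g.", leaves[0].relative_to(tmp))
```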

File Descriptions

  • app.py: Streamlit web application interface for entering parameters, generating pipeline commands, and running them
  • ai_assistant.py: AI functionality that collects parameters from the user in natural language and transforms them into valid commands
  • pipeline_CLI.py: Command-line interface for the pipeline
  • DEGpipeline.py: Core pipeline functionality and analysis modules

Example Workflow

  1. Prepare normalized RNA-seq data in CSV format
  2. Prepare pairwise comparison files
  3. Run the setup command to create directory structure
  4. Run the analysis command with appropriate parameters
  5. Examine the generated plots and results files
  6. Interpret biological significance of DEGs and enriched pathways

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For questions or support, please contact i.banwait7@gmail.com.

Acknowledgments

  • This pipeline uses several open-source libraries including pandas, matplotlib, seaborn, and gseapy
  • Ortholog mapping is performed using the g:Profiler API
  • Pathway enrichment analysis uses the Enrichr API through gseapy
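Enrichr-style over-representation analysis asks whether a gene set contains more DEGs than chance would predict. Enrichr reports a Fisher's exact test p-value, and the one-sided form of that test equals the hypergeometric upper tail. The SciPy sketch below computes that tail and cross-checks it against a direct sum of binomial coefficients. The function names and the example counts are illustrative only; in this pipeline the reported p-values come from Enrichr through gseapy.

```python
from math import comb

from scipy.stats import hypergeom


def ora_pvalue(n_background: int, n_in_set: int, n_degs: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when drawing n_degs genes from n_background,
    of which n_in_set belong to the gene set."""
    # scipy's hypergeom(M, n, N): M = population, n = successes, N = draws.
    # sf(k) is P(X > k), so P(X >= k) is sf(k - 1).
    return float(hypergeom.sf(n_overlap - 1, n_background, n_in_set, n_degs))


def ora_pvalue_exact(n_background: int, n_in_set: int, n_degs: int, n_overlap: int) -> float:
    """Same tail probability summed directly from binomial coefficients."""
    upper = min(n_in_set, n_degs)
    hits = sum(
        comb(n_in_set, i) * comb(n_background - n_in_set, n_degs - i)
        for i in range(n_overlap, upper + 1)
    )
    return hits / comb(n_background, n_degs)


# Illustrative counts: 1,000 background genes, a 50-gene pathway,
# 100 DEGs, 10 of which fall in the pathway (5 expected by chance)
print(ora_pvalue(1000, 50, 100, 10))
print(ora_pvalue_exact(1000, 50, 100, 10))
```

Because many gene sets are tested at once, enrichment tools also report multiple-testing-adjusted p-values alongside the raw ones.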

Owner

  • Name: Ishaan Banwait
  • Login: Shaan7071
  • Kind: user

Bioinformatics student aiming to have an impact in the global healthcare industry.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Banwait"
    given-names: "Ishaan"
    orcid: "https://orcid.org/0009-0002-1431-4599"
title: "DEG Pipeline Assistant"
version: 1.0.0
date-released: 2025-04-17
url: "https://github.com/Shaan7071/DEG-pipeline-assistant"
repository-code: "https://github.com/yourusername/rnaseq-pipeline-assistant"
abstract: "A comprehensive tool for analyzing RNA-seq data, identifying differentially expressed genes (DEGs), and performing pathway enrichment analysis."
keywords:
  - RNA-seq
  - bioinformatics
  - differential expression
  - pathway analysis
  - transcriptomics
license: MIT

GitHub Events

Total
  • Watch event: 1
  • Delete event: 2
  • Push event: 27
  • Create event: 2
Last Year
  • Watch event: 1
  • Delete event: 2
  • Push event: 27
  • Create event: 2

Dependencies

requirements.txt pypi
  • click >=8.1.8
  • gseapy >=1.1.7
  • matplotlib >=3.10.1
  • matplotlib-inline >=0.1.7
  • mygene >=3.2.2
  • numpy >=2.2.3
  • openai >=1.70.0
  • pandas >=2.2.3
  • pydeseq2 >=0.5.0
  • requests >=2.32.3
  • scikit-learn >=1.6.1
  • scipy >=1.15.2
  • seaborn >=0.13.2
  • statsmodels >=0.14.4
  • streamlit >=1.44.1