cd20-binder-design

A computational pipeline for designing protein binders targeting CD20

https://github.com/klundquist/cd20-binder-design

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 14 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: klundquist
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 84 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

CD20 Protein Binder Design Pipeline

A computational pipeline for designing novel protein binders targeting CD20 using deep learning and molecular simulation approaches. Developed for the BioML Challenge 2024: Bits to Binders competition.

Overview

This project implements a computational approach for designing protein binders that specifically target CD20, a membrane protein expressed on B cells and a key target for various immunotherapies. The pipeline integrates several state-of-the-art computational tools:

  • RFDiffusion for de novo protein structure generation
  • ProteinMPNN for deep learning-based protein sequence design
  • AlphaFold2 for protein structure prediction
  • Rosetta for energy calculations and structural refinement
  • OpenMM for molecular dynamics simulations
  • Prodigy for binding affinity prediction
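Prodigy reports a predicted binding free energy (ΔG) together with a corresponding dissociation constant (Kd). The two are linked by the standard relation ΔG = RT ln Kd (Kd in molar units), so a more negative ΔG means tighter binding. The sketch below converts in both directions. It assumes a temperature of 25 °C, and the function names are illustrative only, not part of Prodigy or this repository.

```python
import math

# Molar gas constant in kcal/(mol*K)
R_KCAL_PER_MOL_K = 1.987204e-3


def dg_to_kd(dg_kcal_per_mol: float, temp_celsius: float = 25.0) -> float:
    """Convert a binding free energy (kcal/mol) to a dissociation constant (M).

    Uses dG = RT ln(Kd), i.e. Kd = exp(dG / RT); a more negative dG gives a smaller Kd.
    """
    rt = R_KCAL_PER_MOL_K * (temp_celsius + 273.15)
    return math.exp(dg_kcal_per_mol / rt)


def kd_to_dg(kd_molar: float, temp_celsius: float = 25.0) -> float:
    """Inverse conversion: dissociation constant (M) to dG (kcal/mol)."""
    rt = R_KCAL_PER_MOL_K * (temp_celsius + 273.15)
    return rt * math.log(kd_molar)


if __name__ == "__main__":
    # Show how a few predicted dG values map onto Kd at 25 C
    for dg in (-8.0, -10.0, -12.0):
        print(f"dG = {dg:.1f} kcal/mol -> Kd = {dg_to_kd(dg):.2e} M")
```

Because the relation is exponential, each additional -1.4 kcal/mol or so corresponds to roughly a tenfold drop in Kd at room temperature, which is why small differences in predicted ΔG can matter when ranking designs.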

Competition Context

This pipeline was developed for the BioML Challenge 2024: Bits to Binders competition organized by the University of Texas at Austin BioML Society. The competition required teams to:

  • Design the antigen binding domain of a Chimeric Antigen Receptor (CAR) targeting CD20
  • Adhere to an 80-amino-acid length constraint (due to DNA synthesis limitations)
  • Create designs that would activate CAR-T cell killing and proliferation responses
  • Submit sequences for experimental testing by LEAH Laboratories

Our designs are currently in the testing stage with results expected in early 2025.
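The 80-amino-acid constraint is easy to check before submission. The sketch below flags sequences that are too long or that contain non-canonical residues. It assumes the limit is a maximum length and that only the 20 standard amino acids are allowed; `check_binder_sequence` is a hypothetical helper, not a function from this repository.

```python
from typing import List

# The 20 canonical amino acids (one-letter codes)
CANONICAL_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def check_binder_sequence(seq: str, max_len: int = 80) -> List[str]:
    """Return a list of problems with a designed binder sequence (empty if none).

    Assumes the length limit is a maximum and that only canonical residues
    are allowed; adjust both if the official rules differ.
    """
    problems = []
    seq = seq.strip().upper()
    if not seq:
        problems.append("empty sequence")
    if len(seq) > max_len:
        problems.append(f"length {len(seq)} exceeds {max_len}")
    bad = sorted(set(seq) - CANONICAL_AA)
    if bad:
        problems.append(f"non-canonical residues: {''.join(bad)}")
    return problems
```

For example, `check_binder_sequence("M" * 81)` reports a single length problem, while a valid design returns an empty list.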

Project Structure

cd20-binder-design/
├── data/
│   ├── input/            # Input files for the pipeline
│   ├── output/           # Output files from each stage
│   ├── pdb/              # PDB files including target CD20 structure
│   └── results/          # Final results and rankings
├── src/
│   ├── analysis/         # Scripts for analyzing and filtering designs
│   ├── proteinmpnn_af2/  # ProteinMPNN and AlphaFold2 integration
│   ├── rfdiffusion/      # RFDiffusion setup and configuration
│   ├── scripts/          # Shell scripts for pipeline execution
│   └── main.py           # Main orchestration script
├── notebooks/
│   └── view_pdbs.ipynb   # Jupyter notebook for visualizing PDB structures
├── Dockerfile            # Main Dockerfile for the project
├── CITATION.cff          # Citation information
├── LICENSE               # Project license
└── README.md             # This file

Installation

Prerequisites

  • Docker (for containerized execution)
  • Python 3.8 or higher
  • CUDA-capable GPU (recommended for AlphaFold2 and RFDiffusion)

Setup

  1. Clone this repository:

         git clone https://github.com/klundquist/cd20-binder-design.git
         cd cd20-binder-design

  2. Set up the environment:

     ```bash
     # Install RFDiffusion
     src/scripts/setup_rfdiffusion.sh

     # Set up ProteinMPNN and AlphaFold2
     src/proteinmpnn_af2/setup.sh
     ```

Usage

Full Pipeline

Run the complete pipeline with:

    python src/main.py

This will execute all steps in sequence:

  1. Generate initial binder structures with RFDiffusion
  2. Design sequences with ProteinMPNN and Rosetta FastRelax
  3. Predict structures with AlphaFold2
  4. Filter designs based on structural criteria
  5. Run MD simulations and analyze binding energies
  6. Rank and select top designs

Selective Execution

You can skip specific steps using command-line flags:

```bash
# Skip the RFDiffusion step (if you already have initial structures)
python src/main.py --skip-rfdiffusion

# Skip ProteinMPNN and AlphaFold2 (if you already have designed sequences and predicted structures)
python src/main.py --skip-proteinmpnn-af2

# Skip analysis (if you just want to generate designs)
python src/main.py --skip-analysis
```
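The three skip flags suggest that the six steps are grouped into three stages. The following is a minimal sketch of how an orchestrator could map those flags onto stages with `argparse`. It assumes that steps 4 to 6 belong to the analysis stage, and the stage labels are hypothetical, so the actual `src/main.py` may be organized differently.

```python
import argparse
from typing import List, Optional

# Ordered pipeline stages paired with the argparse attribute that skips each one.
# Stage labels are illustrative, not functions from src/main.py.
STAGES = [
    ("rfdiffusion", "skip_rfdiffusion"),
    ("proteinmpnn_af2", "skip_proteinmpnn_af2"),
    ("analysis", "skip_analysis"),
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CD20 binder design pipeline (sketch)")
    parser.add_argument("--skip-rfdiffusion", action="store_true",
                        help="reuse existing RFDiffusion backbones")
    parser.add_argument("--skip-proteinmpnn-af2", action="store_true",
                        help="reuse existing sequences and AlphaFold2 models")
    parser.add_argument("--skip-analysis", action="store_true",
                        help="stop after generating designs")
    return parser


def stages_to_run(argv: Optional[List[str]] = None) -> List[str]:
    """Return the stages that remain after applying the skip flags, in order."""
    args = build_parser().parse_args(argv)
    # argparse stores "--skip-proteinmpnn-af2" as the attribute "skip_proteinmpnn_af2"
    return [name for name, flag in STAGES if not getattr(args, flag)]


if __name__ == "__main__":
    for stage in stages_to_run():
        print(f"running stage: {stage}")
```

Keeping the stage order in one list means the flags can only remove stages, never reorder them, which matches the sequential description above.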

Visualization

Use the provided Jupyter notebook to visualize PDB structures:

    jupyter notebook notebooks/view_pdbs.ipynb

Design Strategy

The pipeline implements an iterative optimization strategy balancing exploration and exploitation:

  1. Initial Structure Generation: RFDiffusion creates scaffolds with complementary binding interfaces to CD20
  2. Sequence Design: ProteinMPNN optimizes sequences for both stability and binding
  3. Structure Validation: AlphaFold2 predicts structures to ensure design accuracy
  4. Filtering: Removes designs with poor predictions or impractical structural properties
  5. Binding Assessment: MD simulations and energy calculations evaluate binding stability
  6. Iterative Optimization: Top designs are carried forward through multiple rounds

Multiple design strategies were explored:

  • Hotspot-focused designs targeting specific CD20 residues (168-175)
  • Broader interface designs covering larger regions (residues 46-210)
  • Beta-model variants with enhanced structural constraints
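Steps 4 to 6 of the design strategy amount to a filter, rank and repeat loop. The sketch below shows one way to express it. The thresholds (mean pLDDT of at least 80 and interface PAE of at most 10 Å) are placeholder values of the kind commonly used to filter AlphaFold2 binder predictions, ranking by predicted ΔG is an assumption, and the `propose` callback is a hypothetical stand-in for re-running ProteinMPNN and AlphaFold2. None of these names come from this repository.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Design:
    name: str
    plddt: float            # mean AlphaFold2 pLDDT of the binder (0-100)
    pae_interaction: float  # mean inter-chain predicted aligned error (Angstrom)
    dg: float               # predicted binding free energy (kcal/mol; more negative is better)


def filter_designs(designs: List[Design], min_plddt: float = 80.0,
                   max_pae: float = 10.0) -> List[Design]:
    """Drop designs whose AlphaFold2 prediction looks unreliable (placeholder thresholds)."""
    return [d for d in designs if d.plddt >= min_plddt and d.pae_interaction <= max_pae]


def select_top(designs: List[Design], k: int) -> List[Design]:
    """Rank designs by predicted dG (most negative first) and keep the top k."""
    return sorted(designs, key=lambda d: d.dg)[:k]


def optimize(seeds: List[Design], propose: Callable[[Design], List[Design]],
             rounds: int = 3, k: int = 5) -> List[Design]:
    """Iteratively propose variants of the current best designs and keep the top k.

    Parents stay in the candidate pool, so the best score never gets worse
    from one round to the next.
    """
    pool = select_top(filter_designs(seeds), k)
    for _ in range(rounds):
        candidates = list(pool)
        for parent in pool:
            candidates.extend(propose(parent))
        pool = select_top(filter_designs(candidates), k)
    return pool
```

Carrying the parents forward (elitism) is a design choice that trades some diversity for stability; dropping them would let each round explore further but could lose good designs.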

Results

The pipeline produced a set of candidate binders targeting the extracellular region of CD20, selected for strong predicted binding affinity, predicted structural stability, and specificity for the target epitope while minimizing interaction with the cell membrane.

Our designs have been submitted to the BioML Challenge and are currently being experimentally tested alongside approximately 12,000 other designs from participating teams. Results are expected in early 2025.

Advanced Configuration

RFDiffusion Parameters

Different contig configurations can be used to target specific regions:

```bash
# Quote contig overrides so the shell passes brackets and spaces as one argument

# Target specific hotspot residues
'contigmap.contigs=[C168-175/0 D168-175/0 80-80]'

# Target a broader interface
'contigmap.contigs=[C46-210/0 D46-210/0 80-80]'

# Use the beta model for enhanced structural constraints
inference.ckpt_override_path=$HOME/models/Complex_beta_ckpt.pt
```
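The contig strings above are Hydra-style overrides for RFdiffusion's `scripts/run_inference.py`. The sketch below assembles a full command from the target segments and the binder length, and uses `shlex.join` to show the quoting the shell needs. The input PDB path, output prefix and `num_designs` value are hypothetical placeholders, not the settings used for the submitted designs.

```python
import os
import shlex
from typing import List, Optional, Sequence


def contig_string(target_segments: Sequence[str], binder_len: int) -> str:
    """Build a contig: each target segment followed by a chain break (/0), then the binder length."""
    parts = [f"{seg}/0" for seg in target_segments]
    parts.append(f"{binder_len}-{binder_len}")
    return "[" + " ".join(parts) + "]"


def rfdiffusion_command(input_pdb: str, output_prefix: str,
                        target_segments: Sequence[str], binder_len: int = 80,
                        num_designs: int = 10,
                        beta_ckpt: Optional[str] = None) -> List[str]:
    """Return an argument list for RFdiffusion's run_inference.py (paths are placeholders)."""
    cmd = [
        "python", "scripts/run_inference.py",
        f"inference.input_pdb={input_pdb}",
        f"inference.output_prefix={output_prefix}",
        f"inference.num_designs={num_designs}",
        f"contigmap.contigs={contig_string(target_segments, binder_len)}",
    ]
    if beta_ckpt is not None:
        # Swap in the beta complex checkpoint instead of the default model
        cmd.append(f"inference.ckpt_override_path={beta_ckpt}")
    return cmd


if __name__ == "__main__":
    cmd = rfdiffusion_command(
        "data/pdb/cd20.pdb",          # hypothetical file name for the CD20 target
        "data/output/hotspot/design",  # hypothetical output prefix
        ["C168-175", "D168-175"],
        beta_ckpt=os.path.expanduser("~/models/Complex_beta_ckpt.pt"),
    )
    print(shlex.join(cmd))
```

Passing the argument list directly to `subprocess.run` avoids shell quoting altogether; `shlex.join` is only used here to display an equivalent shell command.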

ProteinMPNN Options

Modify sequence design parameters in src/proteinmpnn_af2/beta_model_mpnn.py:

```python
# Number of sequence designs per scaffold
num_seq_per_target = 4

# Temperature for sampling (higher = more diverse)
sampling_temp = 0.1
```
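ProteinMPNN samples each residue from a distribution over amino acids, and the sampling temperature divides the model's logits before the softmax. Low values such as 0.1 concentrate probability on the top-scoring residue, while higher values flatten the distribution and yield more diverse sequences. The toy sketch below, using made-up logits for four amino acids, illustrates that effect; it is not ProteinMPNN code.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax: divide logits by T, then normalize."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; higher means a flatter, more diverse distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, 0.0]  # toy per-residue scores for four amino acids
    for t in (0.1, 1.0):
        probs = softmax_with_temperature(logits, t)
        print(f"T={t}: probs={[round(p, 3) for p in probs]}, entropy={entropy(probs):.3f}")
```

At T = 0.1 almost all probability lands on the first amino acid, so repeated sampling mostly returns the same sequence; at T = 1.0 the alternatives keep meaningful probability.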

References

  • RFDiffusion:

    • Watson, J.L., Juergens, D., Bennett, N.R. et al. (2023). De novo design of protein structure and function with RFdiffusion. Nature, 620, 1089–1100. https://doi.org/10.1038/s41586-023-06415-8
  • ProteinMPNN & AlphaFold2:

    • Bennett, N.R., Coventry, B., Goreshnik, I., et al. (2023). Improving de novo protein binder design with deep learning. Nature Communications, 14, 2625. https://doi.org/10.1038/s41467-023-38328-5
    • Dauparas, J., Anishchenko, I., Bennett, N., et al. (2022). Robust deep learning–based protein sequence design using ProteinMPNN. Science, 378(6615), 49–56. https://doi.org/10.1126/science.add2187
    • Jumper, J., Evans, R., Pritzel, A., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589. https://doi.org/10.1038/s41586-021-03819-2
  • Molecular Dynamics & Energy Analysis:

    • Eastman, P., Swails, J., Chodera, J.D., et al. (2017). OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Computational Biology, 13(7), e1005659. https://doi.org/10.1371/journal.pcbi.1005659
    • Vangone, A., & Bonvin, A.M.J.J. (2015). Contacts-based prediction of binding affinity in protein–protein complexes. eLife, 4, e07454. https://doi.org/10.7554/eLife.07454
    • Xue, L.C., Rodrigues, J.P., Kastritis, P.L., Bonvin, A.M.J.J., & Vangone, A. (2016). PRODIGY: a web server for predicting the binding affinity of protein–protein complexes. Bioinformatics, 32(23), 3676–3678. https://doi.org/10.1093/bioinformatics/btw514

Team

  • Karl Philip Lundquist
  • Abel Gurung
  • Amardeep Singh
  • Arjun Singh
  • Dion Whitehead

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this code in your research, please cite this repository:

    @software{lundquist2025cd20,
      author = {Lundquist, Karl Philip and Gurung, Abel and Singh, Amardeep and Singh, Arjun and Whitehead, Dion},
      title  = {CD20 Protein Binder Design Pipeline},
      year   = {2025},
      url    = {https://github.com/klundquist/cd20-binder-design}
    }

Owner

  • Name: Karl Lundquist, PhD
  • Login: klundquist
  • Kind: user
  • Location: Minneapolis
  • Company: Calyxt

Data Scientist

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it using these metadata."
title: "CD20 Protein Binder Design Pipeline"
abstract: "A computational pipeline for designing novel protein binders targeting CD20 using deep learning and molecular simulation approaches."
authors:
  - family-names: "Lundquist"
    given-names: "Karl Philip"
    orcid: "https://orcid.org/0000-0000-0000-0000"  # Replace with your actual ORCID if available
  - family-names: "Gurung"
    given-names: "Abel"
  - family-names: "Singh"
    given-names: "Amardeep"
  - family-names: "Singh"
    given-names: "Arjun"
  - family-names: "Whitehead"
    given-names: "Dion"
date-released: 2025-03-26
version: "1.0.0"
repository-code: "https://github.com/klundquist/cd20-binder-design"
license: "MIT"
references:
  - type: article
    authors:
      - family-names: "Watson"
        given-names: "J.L."
      - family-names: "Juergens"
        given-names: "D."
      - family-names: "Bennett"
        given-names: "N.R."
    title: "De novo design of protein structure and function with RFdiffusion"
    journal: "Nature"
    volume: 620
    start: 1089
    end: 1100
    year: 2023
    doi: "10.1038/s41586-023-06415-8"
  - type: article
    authors:
      - family-names: "Bennett"
        given-names: "N.R."
      - family-names: "Coventry"
        given-names: "B."
      - family-names: "Goreshnik"
        given-names: "I."
    title: "Improving de novo protein binder design with deep learning"
    journal: "Nature Communications"
    volume: 14
    issue: 2625
    year: 2023
    doi: "10.1038/s41467-023-38328-5"
  - type: article
    authors:
      - family-names: "Jumper"
        given-names: "J."
      - family-names: "Evans"
        given-names: "R."
      - family-names: "Pritzel"
        given-names: "A."
    title: "Highly accurate protein structure prediction with AlphaFold"
    journal: "Nature"
    volume: 596
    issue: 7873
    start: 583
    end: 589
    year: 2021
    doi: "10.1038/s41586-021-03819-2"

GitHub Events

Total
  • Watch event: 2
  • Push event: 1
  • Fork event: 1
  • Create event: 2
Last Year
  • Watch event: 2
  • Push event: 1
  • Fork event: 1
  • Create event: 2

Dependencies

Dockerfile docker
  • nvidia/cuda 11.8.0-cudnn8-runtime-ubuntu22.04 build
src/proteinmpnn_af2/Dockerfile docker
  • nvidia/cuda 11.8.0-cudnn8-devel-ubuntu20.04 build
src/rfdiffusion/Dockerfile docker
  • nvcr.io/nvidia/cuda 11.6.2-cudnn8-runtime-ubuntu20.04 build
environment.yml pypi
  • torch ==1.9.0
  • torchvision ==0.10.0