cd20-binder-design
A computational pipeline for designing protein binders targeting CD20
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 14 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: klundquist
- License: mit
- Language: Python
- Default Branch: main
- Size: 84 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
CD20 Protein Binder Design Pipeline
A computational pipeline for designing novel protein binders targeting CD20 using deep learning and molecular simulation approaches. Developed for the BioML Challenge 2024: Bits to Binders competition.
Overview
This project implements a computational approach for designing protein binders that specifically target CD20, a membrane protein expressed on B cells and a key target for various immunotherapies. The pipeline integrates several state-of-the-art computational tools:
- RFDiffusion for de novo protein structure generation
- ProteinMPNN for deep learning-based protein sequence design
- AlphaFold2 for protein structure prediction
- Rosetta for energy calculations and structural refinement
- OpenMM for molecular dynamics simulations
- Prodigy for binding affinity prediction
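Binding-affinity predictors such as PRODIGY report a binding free energy ΔG, and the corresponding dissociation constant follows from ΔG = RT ln K_d. Below is a minimal sketch of that conversion, assuming ΔG in kcal/mol at 25 °C. `dg_to_kd` is a hypothetical helper, not part of this repository or of PRODIGY's API.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def dg_to_kd(dg_kcal_per_mol: float, temp_c: float = 25.0) -> float:
    """Convert a binding free energy (kcal/mol) to a dissociation constant (M).

    Uses dG = RT * ln(Kd), so Kd = exp(dG / RT).
    A more negative dG means tighter binding (smaller Kd).
    """
    temp_k = temp_c + 273.15
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))


# Example: a predicted dG of -10 kcal/mol corresponds to a Kd in the tens of nanomolar.
kd = dg_to_kd(-10.0)
print(f"Kd = {kd:.2e} M")
```

Each additional -1.4 kcal/mol of predicted ΔG lowers K_d roughly tenfold at room temperature. This is why small differences in predicted energies can matter when ranking designs.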
Competition Context
This pipeline was developed for the BioML Challenge 2024: Bits to Binders competition organized by the University of Texas at Austin BioML Society. The competition required teams to:
- Design the antigen binding domain of a Chimeric Antigen Receptor (CAR) targeting CD20
- Adhere to an 80-amino-acid length constraint (due to DNA synthesis limitations)
- Create designs that would activate CAR-T cell killing and proliferation responses
- Submit sequences for experimental testing by LEAH Laboratories
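A quick pre-submission check for the length constraint takes only a few lines. This sketch assumes the 80-residue limit is an upper bound on the binder sequence and that designs may use only the 20 canonical amino acids. `check_design` is a hypothetical helper, not code from this repository.

```python
from typing import List

# The 20 canonical amino acids (one-letter codes)
CANONICAL_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 80  # competition length limit (assumed to be a maximum)


def check_design(seq: str, max_length: int = MAX_LENGTH) -> List[str]:
    """Return a list of problems with a designed sequence (empty if it passes)."""
    problems = []
    seq = seq.strip().upper()
    if not seq:
        problems.append("empty sequence")
    if len(seq) > max_length:
        problems.append(f"length {len(seq)} exceeds {max_length}")
    # Any character outside the canonical alphabet is flagged
    bad = sorted(set(seq) - CANONICAL_AA)
    if bad:
        problems.append(f"non-canonical residues: {''.join(bad)}")
    return problems


print(check_design("M" + "A" * 79))  # []
print(check_design("A" * 81 + "X"))  # ['length 82 exceeds 80', 'non-canonical residues: X']
```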
Our designs are currently in the testing stage with results expected in early 2025.
Project Structure
cd20-binder-design/
├── data/
│ ├── input/ # Input files for the pipeline
│ ├── output/ # Output files from each stage
│ ├── pdb/ # PDB files including target CD20 structure
│ └── results/ # Final results and rankings
├── src/
│ ├── analysis/ # Scripts for analyzing and filtering designs
│ ├── proteinmpnn_af2/ # ProteinMPNN and AlphaFold2 integration
│ ├── rfdiffusion/ # RFDiffusion setup and configuration
│ ├── scripts/ # Shell scripts for pipeline execution
│ └── main.py # Main orchestration script
├── notebooks/
│ └── view_pdbs.ipynb # Jupyter notebook for visualizing PDB structures
├── Dockerfile # Main Dockerfile for the project
├── CITATION.cff # Citation information
├── LICENSE # Project license
└── README.md # This file
Installation
Prerequisites
- Docker (for containerized execution)
- Python 3.8 or higher
- CUDA-capable GPU (recommended for AlphaFold2 and RFDiffusion)
Setup
Clone this repository:
```bash
git clone https://github.com/klundquist/cd20-binder-design.git
cd cd20-binder-design
```
Set up the environment:
```bash
# Install RFDiffusion
src/scripts/setup_rfdiffusion.sh

# Set up ProteinMPNN and AlphaFold2
src/proteinmpnn_af2/setup.sh
```
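Before running the setup scripts, you may want to confirm the prerequisites listed above. The sketch below uses only the standard library. `check_prerequisites` is hypothetical. Treating `nvidia-smi` on the PATH as evidence of a CUDA-capable GPU is a heuristic assumption, not a guarantee.

```python
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report which of the README's prerequisites appear to be available locally."""
    return {
        # README requires Python 3.8 or higher
        "python>=3.8": sys.version_info >= (3, 8),
        # Docker CLI found on PATH (does not check that the daemon is running)
        "docker on PATH": shutil.which("docker") is not None,
        # Heuristic for a CUDA-capable GPU: the NVIDIA driver utility is installed
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{'OK     ' if ok else 'MISSING'} {name}")
```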
Usage
Full Pipeline
Run the complete pipeline with:
```bash
python src/main.py
```
This will execute all steps in sequence:
1. Generate initial binder structures with RFDiffusion
2. Design sequences with ProteinMPNN and Rosetta FastRelax
3. Predict structures with AlphaFold2
4. Filter designs based on structural criteria
5. Run MD simulations and analyze binding energies
6. Rank and select top designs
Selective Execution
You can skip specific steps using command-line flags:
```bash
# Skip RFDiffusion step (if you already have initial structures)
python src/main.py --skip-rfdiffusion

# Skip ProteinMPNN and AlphaFold2 (if you already have designed sequences and predicted structures)
python src/main.py --skip-proteinmpnn-af2

# Skip analysis (if you just want to generate designs)
python src/main.py --skip-analysis
```
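The selective-execution flags above map naturally onto a list of stages, each guarded by a boolean. Below is a minimal sketch of that pattern using `argparse`. The three stage functions are hypothetical placeholders, and this is not the actual contents of `src/main.py`.

```python
import argparse
from typing import Callable, List, Optional, Tuple


def run_rfdiffusion() -> None:
    print("stage: RFDiffusion backbone generation")  # placeholder


def run_proteinmpnn_af2() -> None:
    print("stage: ProteinMPNN/FastRelax design + AlphaFold2 prediction")  # placeholder


def run_analysis() -> None:
    print("stage: filtering, MD, and ranking")  # placeholder


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="CD20 binder design pipeline (sketch)")
    parser.add_argument("--skip-rfdiffusion", action="store_true")
    parser.add_argument("--skip-proteinmpnn-af2", action="store_true")
    parser.add_argument("--skip-analysis", action="store_true")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> List[str]:
    """Run the stages in order, skipping any whose flag is set; return the stages run."""
    args = parse_args(argv)
    # argparse converts dashes in option names to underscores in attribute names
    stages: List[Tuple[str, bool, Callable[[], None]]] = [
        ("rfdiffusion", args.skip_rfdiffusion, run_rfdiffusion),
        ("proteinmpnn_af2", args.skip_proteinmpnn_af2, run_proteinmpnn_af2),
        ("analysis", args.skip_analysis, run_analysis),
    ]
    ran = []
    for name, skip, stage in stages:
        if skip:
            print(f"skipping {name}")
            continue
        stage()
        ran.append(name)
    return ran
```

Accepting an explicit `argv` list makes the orchestration easy to test, for example `main(["--skip-rfdiffusion"])`. A script entry point would call `main()` under `if __name__ == "__main__":`.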
Visualization
Use the provided Jupyter notebook to visualize PDB structures:
```bash
jupyter notebook notebooks/view_pdbs.ipynb
```
Design Strategy
The pipeline implements an iterative optimization strategy balancing exploration and exploitation:
- Initial Structure Generation: RFDiffusion creates scaffolds with complementary binding interfaces to CD20
- Sequence Design: ProteinMPNN optimizes sequences for both stability and binding
- Structure Validation: AlphaFold2 predicts structures to ensure design accuracy
- Filtering: Removes designs with poor predictions or impractical structural properties
- Binding Assessment: MD simulations and energy calculations evaluate binding stability
- Iterative Optimization: Top designs are carried forward through multiple rounds
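The filtering and carry-forward steps above amount to a threshold-then-rank operation. The sketch below assumes each design has been summarized by three values: a mean AlphaFold2 pLDDT, an interface PAE, and a predicted ΔG. The `Design` fields, the cutoffs (pLDDT ≥ 80, interface PAE ≤ 10 Å), and `filter_and_rank` are illustrative assumptions in the spirit of the AlphaFold2-based filters described by Bennett et al. (2023). They are not this pipeline's actual criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Design:
    name: str
    plddt: float            # mean AlphaFold2 pLDDT of the binder (0-100, higher is better)
    pae_interaction: float  # mean inter-chain PAE in Angstroms (lower is better)
    dg: float               # predicted binding free energy in kcal/mol (more negative is better)


def filter_and_rank(designs: List[Design], min_plddt: float = 80.0,
                    max_pae: float = 10.0, top_k: int = 10) -> List[Design]:
    """Drop poorly predicted designs, then keep the top_k by predicted dG."""
    passing = [d for d in designs
               if d.plddt >= min_plddt and d.pae_interaction <= max_pae]
    passing.sort(key=lambda d: d.dg)  # most negative dG first
    return passing[:top_k]


designs = [
    Design("d1", plddt=91.0, pae_interaction=6.5, dg=-11.2),
    Design("d2", plddt=72.0, pae_interaction=5.0, dg=-13.0),   # fails pLDDT cutoff
    Design("d3", plddt=88.0, pae_interaction=14.0, dg=-12.5),  # fails PAE cutoff
    Design("d4", plddt=85.0, pae_interaction=8.0, dg=-12.0),
]
print([d.name for d in filter_and_rank(designs, top_k=2)])  # ['d4', 'd1']
```

Filtering on prediction confidence before ranking on energy keeps a design with an attractive ΔG but an unreliable predicted complex (like `d2` or `d3`) from being carried into the next round.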
Multiple design strategies were explored:
- Hotspot-focused designs targeting specific CD20 residues (168-175)
- Broader interface designs covering larger regions (residues 46-210)
- Beta-model variants with enhanced structural constraints
Results
The pipeline generated a set of candidate protein binders targeting the extracellular region of CD20. In silico, these designs showed strong predicted binding affinity, structural stability, and specificity for the target epitope while minimizing predicted interaction with the cell membrane.
Our designs have been submitted to the BioML Challenge and are currently being experimentally tested alongside approximately 12,000 other designs from participating teams. Results are expected in early 2025.
Advanced Configuration
RFDiffusion Parameters
Different contig configurations can be used to target specific regions:
```bash
# Target specific hotspot residues
'contigmap.contigs=[C168-175/0 D168-175/0 80-80]'

# Target a broader interface
'contigmap.contigs=[C46-210/0 D46-210/0 80-80]'

# Use the beta model checkpoint for enhanced structural constraints
inference.ckpt_override_path=$HOME/models/Complex_beta_ckpt.pt
```
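The two contig strings above differ only in the residue range taken from chains C and D. In RFdiffusion contig syntax, `/0` marks a chain break and the trailing `80-80` requests a binder of exactly 80 residues. The small helper below reproduces both strings. `build_contig` is hypothetical and not part of RFdiffusion or this repository.

```python
from typing import Iterable


def build_contig(target_chains: Iterable[str], start: int, end: int, binder_len: int) -> str:
    """Build an RFdiffusion contig override: target segments, chain breaks (/0), then the binder.

    Each target chain contributes residues start-end; the binder length is fixed
    by using binder_len as both the minimum and the maximum.
    """
    target = " ".join(f"{chain}{start}-{end}/0" for chain in target_chains)
    return f"contigmap.contigs=[{target} {binder_len}-{binder_len}]"


print(build_contig(["C", "D"], 168, 175, 80))  # hotspot-focused
print(build_contig(["C", "D"], 46, 210, 80))   # broader interface
```

On a shell command line the override must be quoted, because it contains spaces and brackets. You can avoid the quoting issue by passing the returned string as a single element of a `subprocess` argument list.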
ProteinMPNN Options
Modify sequence design parameters in src/proteinmpnn_af2/beta_model_mpnn.py:
```python
# Number of sequence designs per scaffold
num_seq_per_target = 4

# Temperature for sampling (higher = more diverse)
sampling_temp = 0.1
```
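ProteinMPNN draws each residue from a softmax over amino-acid logits divided by the sampling temperature. This is why lower `sampling_temp` values produce less diverse sequences. The toy sketch below illustrates that effect in plain Python. The logits are made up, and `softmax_with_temperature` is a hypothetical helper, not ProteinMPNN code.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # toy scores for three amino acids at one position
for t in (0.1, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.1, almost all probability mass falls on the top-scoring residue, so repeated samples are nearly identical. At 1.0, the alternatives keep substantial probability, which yields more varied sequences per scaffold.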
References
RFDiffusion:
- Watson, J.L., Juergens, D., Bennett, N.R., et al. (2023). De novo design of protein structure and function with RFdiffusion. Nature, 620, 1089–1100. https://doi.org/10.1038/s41586-023-06415-8
ProteinMPNN & AlphaFold2:
- Bennett, N.R., Coventry, B., Goreshnik, I., et al. (2023). Improving de novo protein binder design with deep learning. Nature Communications, 14, 2625. https://doi.org/10.1038/s41467-023-38328-5
- Dauparas, J., Anishchenko, I., Bennett, N., et al. (2022). Robust deep learning–based protein sequence design using ProteinMPNN. Science, 378(6615), 49–56. https://doi.org/10.1126/science.add2187
- Jumper, J., Evans, R., Pritzel, A., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589. https://doi.org/10.1038/s41586-021-03819-2
Molecular Dynamics & Energy Analysis:
- Eastman, P., Swails, J., Chodera, J.D., et al. (2017). OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Computational Biology, 13(7), e1005659. https://doi.org/10.1371/journal.pcbi.1005659
- Vangone, A., & Bonvin, A.M.J.J. (2015). Contacts-based prediction of binding affinity in protein–protein complexes. eLife, 4, e07454. https://doi.org/10.7554/eLife.07454
- Xue, L.C., Rodrigues, J.P., Kastritis, P.L., Bonvin, A.M.J.J., & Vangone, A. (2016). PRODIGY: a web server for predicting the binding affinity of protein–protein complexes. Bioinformatics, 32(23), 3676–3678. https://doi.org/10.1093/bioinformatics/btw514
Team
- Karl Philip Lundquist
- Abel Gurung
- Amardeep Singh
- Arjun Singh
- Dion Whitehead
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use this code in your research, please cite this repository:
```bibtex
@software{lundquist2025cd20,
  author = {Lundquist, Karl Philip and Gurung, Abel and Singh, Amardeep and Singh, Arjun and Whitehead, Dion},
  title = {CD20 Protein Binder Design Pipeline},
  year = {2025},
  url = {https://github.com/klundquist/cd20-binder-design}
}
```
Owner
- Name: Karl Lundquist, PhD
- Login: klundquist
- Kind: user
- Location: Minneapolis
- Company: Calyxt
- Website: https://nycdatascience.com/blog/author/klundquistgmail-com/
- Repositories: 1
- Profile: https://github.com/klundquist
Data Scientist
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it using these metadata."
title: "CD20 Protein Binder Design Pipeline"
abstract: "A computational pipeline for designing novel protein binders targeting CD20 using deep learning and molecular simulation approaches."
authors:
  - family-names: "Lundquist"
    given-names: "Karl Philip"
    orcid: "https://orcid.org/0000-0000-0000-0000" # Replace with your actual ORCID if available
  - family-names: "Gurung"
    given-names: "Abel"
  - family-names: "Singh"
    given-names: "Amardeep"
  - family-names: "Singh"
    given-names: "Arjun"
  - family-names: "Whitehead"
    given-names: "Dion"
date-released: 2025-03-26
version: "1.0.0"
repository-code: "https://github.com/klundquist/cd20-binder-design"
license: "MIT"
references:
  - type: article
    authors:
      - family-names: "Watson"
        given-names: "J.L."
      - family-names: "Juergens"
        given-names: "D."
      - family-names: "Bennett"
        given-names: "N.R."
    title: "De novo design of protein structure and function with RFdiffusion"
    journal: "Nature"
    volume: 620
    pages: 1089-1100
    year: 2023
    doi: "10.1038/s41586-023-06415-8"
  - type: article
    authors:
      - family-names: "Bennett"
        given-names: "N.R."
      - family-names: "Coventry"
        given-names: "B."
      - family-names: "Goreshnik"
        given-names: "I."
    title: "Improving de novo protein binder design with deep learning"
    journal: "Nature Communications"
    volume: 14
    issue: 2625
    year: 2023
    doi: "10.1038/s41467-023-38328-5"
  - type: article
    authors:
      - family-names: "Jumper"
        given-names: "J."
      - family-names: "Evans"
        given-names: "R."
      - family-names: "Pritzel"
        given-names: "A."
    title: "Highly accurate protein structure prediction with AlphaFold"
    journal: "Nature"
    volume: 596
    issue: 7873
    pages: 583-589
    year: 2021
    doi: "10.1038/s41586-021-03819-2"
GitHub Events
Total
- Watch event: 2
- Push event: 1
- Fork event: 1
- Create event: 2
Last Year
- Watch event: 2
- Push event: 1
- Fork event: 1
- Create event: 2
Dependencies
- nvidia/cuda 11.8.0-cudnn8-runtime-ubuntu22.04 build
- nvidia/cuda 11.8.0-cudnn8-devel-ubuntu20.04 build
- nvcr.io/nvidia/cuda 11.6.2-cudnn8-runtime-ubuntu20.04 build
- torch ==1.9.0
- torchvision ==0.10.0