colbuilder
Building collagen fibrils from amino acid sequences
ColBuilder
Generate atomistic models of collagen microfibrils from single collagen molecules
📋 Table of Contents
- 📋 Table of Contents
- 📚 About
- 🚀 Installation
- 🚀 Quick Start
- ⚙️ Operation Modes & Workflow
- 📖 Usage Guide
  - Basic Usage
  - Configuration Options
  - Example Workflows
    - Creating a Basic Human Collagen Microfibril
    - Generating a Crosslinked Bovine Microfibril
    - Creating a Mixed Crosslinked (80% Divalent + 20% Trivalent) Human Collagen Microfibril from Collagen Molecules
    - Generating a Coarse-Grained Topology File for MD Simulation
- 📚 Documentation
- 🤝 Contributing
- 📚 Publications & Citation
- 🙏 Acknowledgements
📚 About
ColBuilder is a specialized tool for generating atomistic models of collagen microfibrils from single collagen molecules. Developed by the Gräter group at the Max Planck Institute for Polymer Research, it provides researchers with a flexible framework to create biologically relevant collagen structures for molecular dynamics simulations and structural studies.
Key Features
- Custom microfibril generation: Create collagen microfibrils from individual molecules or amino acid sequences with precise control over structural parameters
- Highly configurable: Adjust collagen sequence, fibril geometry, crosslink types and density to match your custom conditions
- Simulation-ready output: Generate atomistic and coarse-grained topology files compatible with major molecular dynamics packages
- Reproducible research: Standardized approach to collagen modeling to ensure consistency across studies
🚀 Installation
Prerequisites
- Python 3.9 or later
- Git
- Conda package manager (we recommend miniforge)
Step-by-Step Installation
Create and activate a conda environment:
```bash
conda create -n colbuilder python=3.9
conda activate colbuilder
```
Clone the repository:
```bash
git clone git@github.com:graeter-group/colbuilder.git
cd colbuilder
```
Install ColBuilder:
```bash
pip install .
```
Dependencies
ColBuilder requires several external tools to function properly:
PyMOL
```bash
conda install conda-forge::pymol-open-source
```
Note: If PyMOL fails due to a missing libnetcdf.so, install:
```bash
conda install -c conda-forge libnetcdf==4.7.3
```
muscle (Multiple Sequence Alignment)
```bash
conda install bioconda::muscle
```
UCSF Chimera
- Download the latest version of UCSF Chimera (64-bit recommended)
- Make the binary executable and run the installer:
```bash
cd ~/Downloads  # or wherever you downloaded the file
chmod +x chimera*.bin
./chimera*.bin
```
- Follow the installation prompts, preferably creating a symlink in a directory in your $PATH
Note: ColBuilder specifically requires UCSF Chimera, not the newer ChimeraX.
Modeller
- Download Modeller version 10.5
- Follow the installation instructions provided
- Add the following environment variables to your .bashrc or .bash_profile (adjust the paths according to your installation location):
```bash
export PYTHONPATH="/home/user/bin/modeller10.5/lib/x86_64-intel8/python3.3:$PYTHONPATH"
export PYTHONPATH="/home/user/bin/modeller10.5/modlib:$PYTHONPATH"
export LD_LIBRARY_PATH="/home/user/bin/modeller10.5/lib/x86_64-intel8:$LD_LIBRARY_PATH"
```
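Before the first run, it can help to confirm that the interpreter and the external tools are discoverable. The following is a minimal sketch and not part of ColBuilder: the executable names `pymol`, `muscle`, and `chimera`, and the helper name `check_environment`, are assumptions about a typical installation and may need adjusting.

```python
import importlib.util
import shutil
import sys

# Assumed executable names; adjust them to match your installation.
EXECUTABLES = ("pymol", "muscle", "chimera")
# Import name of the Modeller Python package.
MODULES = ("modeller",)


def check_environment(executables=EXECUTABLES, modules=MODULES):
    """Return a dict mapping each requirement to True (found) or False."""
    report = {"python>=3.9": sys.version_info >= (3, 9)}
    for exe in executables:
        # shutil.which returns None when the name is not on $PATH.
        report[exe] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[mod] = importlib.util.find_spec(mod) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

A `MISSING` entry for `modeller` usually points at the `PYTHONPATH` exports above, while a missing executable points at `$PATH`.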
🚀 Quick Start
To verify your installation and run a basic example:
Verify installation:
```bash
colbuilder --help
```
Create a basic configuration file (save as config.yaml):
```yaml
# Basic human collagen microfibril configuration
species: "homo_sapiens"
sequence_generator: true
geometry_generator: true
crosslink: true
fibril_length: 60.0
contact_distance: 20
n_term_type: "HLKNL"
c_term_type: "HLKNL"
n_term_combination: "9.C - 947.A"
c_term_combination: "1047.C - 104.C"
```
Run ColBuilder:
```bash
colbuilder --config_file config.yaml
```
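When many configurations are generated programmatically, the same Quick Start settings can be written from Python. This is a sketch only: `to_yaml` and `colbuilder_command` are hypothetical helpers, the serializer handles flat scalar settings and nothing else, and the key names are assumed to use underscores as in the Mode Summary Table (for example `sequence_generator`).

```python
def to_yaml(settings):
    """Serialize a flat dict of scalars as YAML (no nesting, no lists)."""
    lines = []
    for key, value in settings.items():
        # bool must be tested before other types because bool is an int subclass.
        if isinstance(value, bool):
            text = "true" if value else "false"
        elif isinstance(value, str):
            text = f'"{value}"'
        else:
            text = str(value)
        lines.append(f"{key}: {text}")
    return "\n".join(lines) + "\n"


# Same values as the Quick Start configuration.
quick_start = {
    "species": "homo_sapiens",
    "sequence_generator": True,
    "geometry_generator": True,
    "crosslink": True,
    "fibril_length": 60.0,
    "contact_distance": 20,
    "n_term_type": "HLKNL",
    "c_term_type": "HLKNL",
    "n_term_combination": "9.C - 947.A",
    "c_term_combination": "1047.C - 104.C",
}


def colbuilder_command(config_path):
    """Build the argument list for the documented --config_file invocation."""
    return ["colbuilder", "--config_file", str(config_path)]


if __name__ == "__main__":
    print(to_yaml(quick_start), end="")
    print(" ".join(colbuilder_command("config.yaml")))
```

Writing the returned text to `config.yaml` and passing the command list to `subprocess.run` reproduces the manual steps.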
⚙️ Operation Modes & Workflow
ColBuilder operates through modular modes, each responsible for a different part of the collagen model-building pipeline. These modes can be combined in various ways or run separately using different configuration files.
🧠 Understanding PDB Types
ColBuilder produces or requires two kinds of PDB files:
- Collagen triple helix molecule PDB: a single ~334 nm-long collagen molecule (usually with specified crosslink residues). Output of Mode 1, input to Modes 2 and 4.
- Collagen fibril PDB: a full microfibril model composed of multiple triple helices arranged based on crystal geometry, length, and crosslinking. Output of Modes 2, 4, or 5, input to Modes 3 and 5.
Understanding this distinction is crucial for organizing your workflow correctly.
📊 Mode Summary Table
| # | Mode | Purpose | Input(s) | Output | Can Run With Other Modes? |
|---|------------------------|-------------------------------------------------------------------------|----------------------------------------------------------------|------------------------------------|------------------------------|
| 1 | sequence_generator | Generate a collagen triple helix molecule via homology modeling | species or custom FASTA | Triple helix PDB | Yes: with 2, 3, 5 |
| 2 | geometry_generator | Assemble a collagen fibril from a single triple helix | PDB from Mode 1 or custom PDB | Fibril PDB | Yes: with 1, 3, 5 |
| 3 | topology_generator | Generate topology files for GROMACS simulations | Fibril PDB (from Mode 2, 4, or 5) | .top, .itp, .gro | Yes: with 2, 4, 5 |
| 4 | mix_bool | Generate a fibril by mixing two crosslink types | Two triple helix PDBs from Mode 1 | Mixed fibril PDB | No, requires separate script |
| 5 | replace_bool | Replace crosslinks in an existing fibril | Fibril PDB from Mode 2 or 4 | Modified fibril PDB | Yes: with 2, 3 |
🔁 Valid Mode Combinations
To combine modes, enable their flags together in one config file:
```yaml
# Example combination
sequence_generator: true
geometry_generator: true
topology_generator: true  # (optional)
replace_bool: true        # (optional)
```
✅ Valid Workflows
These mode combinations can be run in a single configuration file:
- ✅ 1 + 2
- ✅ 1 + 2 + 3 (example)
- ✅ 2 + 3 (starting from a custom triple helix PDB)
- ✅ 1 + 2 + 5 + 3
- ✅ 1 + 2 + 5
- ✅ 2 + 5 (example)
- ✅ 2 + 5 + 3
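The list above can be turned into a quick pre-flight check for a configuration. The sketch below is an illustration rather than ColBuilder's own validation: `enabled_modes` and `is_listed_combination` are hypothetical names, and the flag names are taken from the Mode Summary Table.

```python
# Mode numbers and flag names follow the Mode Summary Table.
MODE_FLAGS = {
    1: "sequence_generator",
    2: "geometry_generator",
    3: "topology_generator",
    4: "mix_bool",
    5: "replace_bool",
}

# Combinations listed above as runnable from a single configuration file.
LISTED_COMBINATIONS = {
    frozenset({1, 2}),
    frozenset({1, 2, 3}),
    frozenset({2, 3}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 5}),
    frozenset({2, 5}),
    frozenset({2, 3, 5}),
}


def enabled_modes(config):
    """Return the set of mode numbers whose flag is true in a config dict."""
    return frozenset(n for n, flag in MODE_FLAGS.items() if config.get(flag))


def is_listed_combination(config):
    """True if the enabled modes match one of the listed workflows."""
    return enabled_modes(config) in LISTED_COMBINATIONS
```

Single-mode runs, such as the triple helix generation used for mixing, are not in the list, so this check reports them as unlisted rather than as invalid.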
🔧 Mode 4 (Mixing Crosslinks): Run Separately via Script
Mixing crosslinks (Mode 4) currently requires a separate workflow using two config files for triple helix generation and one for fibril construction:
```bash
# Example bash script for mixing crosslinks
colbuilder --config_file triplehelixA.yaml
colbuilder --config_file triplehelixB.yaml
colbuilder --config_file mixgeometry.yaml  # sets mix_bool: true and includes both PDBs
```
You can also chain this with replace_bool (Mode 5) or topology_generator (Mode 3) in the third config.
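The same three-step sequence can be driven from Python so that a failed step stops the chain. This is a sketch under the assumption that `colbuilder` exits with a non-zero status on failure; `run_pipeline` is a hypothetical helper and the config file names are the ones from the script above.

```python
import subprocess


def run_pipeline(config_files, dry_run=False):
    """Run colbuilder once per config file, in order, stopping on failure."""
    commands = [["colbuilder", "--config_file", path] for path in config_files]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so the mixing step is never started without both input PDBs.
            subprocess.run(command, check=True)
    return commands


steps = ["triplehelixA.yaml", "triplehelixB.yaml", "mixgeometry.yaml"]

# dry_run=True only builds the commands, which is useful for inspection.
planned = run_pipeline(steps, dry_run=True)
for command in planned:
    print(" ".join(command))
```

Calling `run_pipeline(steps)` without `dry_run` executes the commands in order.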
📖 Usage Guide
Basic Usage
The general syntax for running ColBuilder is:
```bash
colbuilder --config_file config.yaml [OPTIONS]
```
Configuration Options
ColBuilder uses YAML configuration files to define parameters. Here's a complete template with all available options:
```yaml
# Operation Mode
mode: null                  # Specific operation mode if needed
config_file: null           # Path to another config file (for nested configs)
sequence_generator: true    # Generate sequence from species
geometry_generator: true    # Generate fibril geometry
topology_generator: false   # Generate topology files
debug: false                # Enable debug mode

# Input Configuration
species: "homo_sapiens"     # Species for collagen sequence
# Available species options:
# Mammals (Primates): homo_sapiens, pan_troglodytes, pongo_abelii, callithrix_jacchus, otolemur_garnettii
# Mammals (Rodents): mus_musculus, rattus_norvegicus
# Mammals (Other): bos_taurus, canis_lupus, ailuropoda_melanoleuca, mustela_putorius, myotis_lucifugus, loxodonta_africana
# Fish: danio_rerio, oreochromis_niloticus, oryzias_latipes, tetraodon_nigroviridis, xiphophorus_maculatus
# Reptiles: pelodiscus_sinensis

# Sequence Settings
fasta_file: null            # Custom FASTA file path (if null, auto-generated based on species)
crosslink: true             # Enable crosslinking in the model
# Check available crosslinks and respective combinations at src/colbuilder/data/sequence/crosslinks.csv
n_term_type: "HLKNL"        # N-terminal crosslink type (Options: "DPD", "DPL", "HLKNL", "LKNL", "PYD", "PYL", "deHHLNL", "deHLNL", "NONE")
c_term_type: "HLKNL"        # C-terminal crosslink type (Options: "DPD", "DPL", "HLKNL", "LKNL", "PYD", "PYL", "deHHLNL", "deHLNL", "NONE")
n_term_combination: "9.C - 947.A"     # N-terminal residue combination
c_term_combination: "1047.C - 104.C"  # C-terminal residue combination

# Geometry Parameters
pdb_file: null              # Input PDB file (set to null if sequence_generator is true)
contact_distance: 20        # Distance threshold for contacts (Å)
fibril_length: 70.0         # Length of the generated fibril (nm)
crystalcontacts_file: null  # File with crystal contacts
connect_file: null          # File with connection information
crystalcontacts_optimize: false  # Optimize crystal contacts during generation

# Mixing Options (for mixed crosslinked microfibril)
mix_bool: false             # Enable mixing of different crosslink types
ratio_mix: "A:70 B:30"      # Format: "Type:percentage Type:percentage"
files_mix:                  # Required if mix_bool is true
  - "collagen-molecule-crosslinkA.pdb"  # PDB file of collagen molecule with type A crosslinks (created with only sequence_generator and crosslink set to true; see the examples)
  - "collagen-molecule-crosslinkB.pdb"  # PDB file of collagen molecule with type B crosslinks

# Replacement Options (for fewer crosslinks)
replace_bool: false         # Enable crosslink replacement
ratio_replace: 30           # Percentage of crosslinks to replace
replace_file: null          # File with crosslinks to be replaced (set to null if geometry_generator is true)

# Topology Options
force_field: "amber99"      # Force field for topology generation (Options: "amber99", "martini3")
```
For a complete list of configuration options, see the detailed documentation.
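Several of the options in the template accept only a fixed set of values or depend on one another. The sketch below checks a configuration dict against the constraints stated in the template comments. It is not ColBuilder's own validation, `check_options` is a hypothetical name, and the underscore spellings of the keys are assumed.

```python
# Allowed values as listed in the template comments.
CROSSLINK_TYPES = {
    "DPD", "DPL", "HLKNL", "LKNL", "PYD", "PYL", "deHHLNL", "deHLNL", "NONE",
}
FORCE_FIELDS = {"amber99", "martini3"}


def check_options(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for key in ("n_term_type", "c_term_type"):
        value = config.get(key)
        if value is not None and value not in CROSSLINK_TYPES:
            problems.append(f"{key}: unknown crosslink type {value!r}")
    force_field = config.get("force_field")
    if force_field is not None and force_field not in FORCE_FIELDS:
        problems.append(f"force_field: unknown force field {force_field!r}")
    # The template asks for pdb_file to be null when a sequence is generated.
    if config.get("sequence_generator") and config.get("pdb_file"):
        problems.append("pdb_file should be null when sequence_generator is true")
    # The template marks files_mix as required for mixing.
    if config.get("mix_bool") and not config.get("files_mix"):
        problems.append("files_mix is required when mix_bool is true")
    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that ColBuilder will accept the file.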
Example Workflows
Creating a Basic Human Collagen Microfibril
```yaml
# config_human_basic.yaml
species: "homo_sapiens"
sequence_generator: true
geometry_generator: true
crosslink: false
fibril_length: 40.0
contact_distance: 25
```
```bash
colbuilder --config_file config_human_basic.yaml
```
Generating a Crosslinked Bovine Microfibril
```yaml
# config_bovine_crosslinked.yaml
species: "bos_taurus"
sequence_generator: true
geometry_generator: true
crosslink: true
n_term_type: "HLKNL"
c_term_type: "HLKNL"
n_term_combination: "9.C - 946.A"
c_term_combination: "1046.C - 103.C"
fibril_length: 80.0
contact_distance: 15
```
```bash
colbuilder --config_file config_bovine_crosslinked.yaml
```
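The residue combination strings differ between species (compare the human and bovine values), so it can be useful to parse them before editing. The sketch below assumes each string has the layout `<residue number>.<chain ID> - <residue number>.<chain ID>`; that reading is inferred from the examples here and is not confirmed by the ColBuilder documentation. `parse_combination` is a hypothetical helper.

```python
import re

# Assumed layout: "<residue number>.<chain ID> - <residue number>.<chain ID>"
_PAIR = re.compile(r"^\s*(\d+)\.([A-Za-z])\s*-\s*(\d+)\.([A-Za-z])\s*$")


def parse_combination(text):
    """Split a combination string into two (residue_number, chain_id) tuples."""
    match = _PAIR.match(text)
    if match is None:
        raise ValueError(f"unrecognized combination: {text!r}")
    res_a, chain_a, res_b, chain_b = match.groups()
    return (int(res_a), chain_a), (int(res_b), chain_b)


if __name__ == "__main__":
    print(parse_combination("9.C - 946.A"))
    print(parse_combination("1046.C - 103.C"))
```

The valid combinations for each crosslink type are listed in src/colbuilder/data/sequence/crosslinks.csv, which remains the authoritative source.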
Creating a Mixed Crosslinked (80% Divalent + 20% Trivalent) Human Collagen Microfibril from Collagen Molecules
```yaml
# config_mixed_crosslinks.yaml
species: "homo_sapiens"
sequence_generator: false
geometry_generator: false
contact_distance: 25
fibril_length: 40
mix_bool: true
ratio_mix: "D:80 T:20"
files_mix:
  - "human-D.pdb"
  - "human-T.pdb"
```
```bash
colbuilder --config_file config_mixed_crosslinks.yaml
```
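A mismatch between the mixing ratio and the list of input files is easy to introduce by hand. The following sketch parses the `Type:percentage` format and pairs each percentage with a file. It rests on two assumptions that the text does not state: that the percentages should sum to 100, and that files are listed in the same order as the types. `parse_ratio_mix` and `check_mix` are hypothetical helpers.

```python
def parse_ratio_mix(text):
    """Parse a string such as 'D:80 T:20' into {'D': 80.0, 'T': 20.0}."""
    ratios = {}
    for token in text.split():
        label, _, percent = token.partition(":")
        if not label or not percent:
            raise ValueError(f"expected 'Type:percentage', got {token!r}")
        ratios[label] = float(percent)
    return ratios


def check_mix(ratio_mix, files_mix):
    """Pair each input PDB file with its percentage, validating both inputs."""
    ratios = parse_ratio_mix(ratio_mix)
    # Assumption: the percentages should add up to 100.
    total = sum(ratios.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"percentages sum to {total}, expected 100")
    # Assumption: one PDB file per crosslink type, listed in the same order.
    if len(ratios) != len(files_mix):
        raise ValueError("number of files does not match number of types")
    return dict(zip(files_mix, ratios.values()))


if __name__ == "__main__":
    print(check_mix("D:80 T:20", ["human-D.pdb", "human-T.pdb"]))
```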
Generating a Coarse-Grained Topology File for MD Simulation
```yaml
# config_topology.yaml
species: "homo_sapiens"
sequence_generator: false
geometry_generator: true
topology_generator: true
pdb_file: "path/to/template_collagen_molecule.pdb"
contact_distance: 30
fibril_length: 40
force_field: "martini3"
```
```bash
colbuilder --config_file config_topology.yaml
```
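After a topology run, a quick check for the expected GROMACS files can catch an incomplete run early. The Mode Summary Table lists `.top`, `.itp`, and `.gro` as the outputs of topology_generator. The sketch below assumes the files land in a single directory that you pass in, since the actual output location is not specified here. `find_topology_outputs` and `missing_suffixes` are hypothetical helpers.

```python
from pathlib import Path

# Extensions listed for topology_generator in the Mode Summary Table.
TOPOLOGY_SUFFIXES = (".top", ".itp", ".gro")


def find_topology_outputs(directory="."):
    """Group the files in `directory` by the expected topology suffixes."""
    found = {suffix: [] for suffix in TOPOLOGY_SUFFIXES}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix in found:
            found[path.suffix].append(path.name)
    return found


def missing_suffixes(directory="."):
    """Return the expected suffixes for which no file was found."""
    outputs = find_topology_outputs(directory)
    return [suffix for suffix, names in outputs.items() if not names]


if __name__ == "__main__":
    print("missing:", missing_suffixes("."))
```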
📚 Documentation
For detailed API documentation, advanced usage examples, and theoretical background:
🤝 Contributing
We welcome contributions to ColBuilder! Please see our contributing guidelines for details on how to submit issues, pull requests, and code reviews.
📚 Publications & Citation
If you use ColBuilder in your research, please cite our paper:
https://www.biorxiv.org/content/10.1101/2024.12.10.627782v1
Citation metadata is provided in the CITATION.cff file.
🙏 Acknowledgements
ColBuilder is developed and maintained by the Gräter group at the Max Planck Institute for Polymer Research. We thank all contributors that have supported this work.
For questions, feedback, or support, please open an issue on our GitHub repository.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: colbuilder
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Debora
    family-names: Monego
    email: monegod@mpip-mainz.mpg.de
    affiliation: Max Planck Institute for Polymer Research
  - given-names: Johanna
    family-names: Buck
    email: buckj@mpip-mainz.mpg.de
    affiliation: Max Planck Institute for Polymer Research
  - given-names: Matthias
    family-names: Brosz
repository-code: 'https://github.com/graeter-group/colbuilder'
url: 'https://colbuilder.mpip-mainz.mpg.de/home'
license: Apache-2.0