3d-genome-builder

3DGB is a workflow to build 3D models of genomes from HiC data

https://github.com/data-fun/3d-genome-builder

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary

Keywords

3d-genome hic vizualization
Last synced: 6 months ago

Repository

3DGB is a workflow to build 3D models of genomes from HiC data

Basic Info
  • Host: GitHub
  • Owner: data-fun
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 32.5 MB
Statistics
  • Stars: 14
  • Watchers: 3
  • Forks: 3
  • Open Issues: 6
  • Releases: 2
Topics
3d-genome hic vizualization
Created almost 4 years ago · Last pushed 12 months ago
Metadata Files
Readme License Authors Codemeta

README.md

[Image: 3D structure of the Neurospora crassa genome at 50 kb resolution]

3D genome builder (3DGB)

3D genome builder (3DGB) is a workflow to build 3D models of genomes from raw HiC data and to integrate omics data into the resulting models for further visual exploration. 3DGB bundles HiC-Pro, PASTIS and custom Python scripts into a unified Snakemake workflow that requires only a few inputs (see Prepare required files). 3DGB produces annotated 3D genome models in PDB and G3D formats.


Download this repository

```bash
git clone https://github.com/data-fun/3d-genome-builder.git
cd 3d-genome-builder
```

Install dependencies

Singularity

Download the latest Singularity release (the commands below use the Debian package of version 3.8.7).

Install Singularity:

```bash
sudo apt install -y ./singularity-container_3.8.7_amd64.deb
```

Verify the version:

```bash
$ singularity --version
singularity version 3.8.7
```

Conda environment

Install conda.

Install mamba:

```bash
conda install mamba -n base -c conda-forge
```

Create the conda environment and install dependencies:

```bash
mamba env create -f binder/environment.yml
```

Activate the conda environment:

```bash
conda activate 3DGB
```

Download HiC-Pro Singularity image

```bash
wget --ciphers=DEFAULT:@SECLEVEL=1 https://zerkalo.curie.fr/partage/HiC-Pro/hicpro_3.1.0_ubuntu.img -P images
```

If this command fails, try this alternate download link:

```bash
wget https://zenodo.org/record/8376626/files/hicpro_3.1.0_ubuntu.img -P images
```

Check the integrity of the image:

```bash
$ md5sum images/hicpro_3.1.0_ubuntu.img
d480e636397c14e187608e50309eb9af  images/hicpro_3.1.0_ubuntu.img
```
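If md5sum is not available, the same check can be done from Python. The sketch below uses only the standard library; the helper name file_md5 is ours, not part of 3DGB, and the expected checksum is the one shown above.

```python
import hashlib
from pathlib import Path

# Checksum published above for the HiC-Pro 3.1.0 Singularity image.
EXPECTED_MD5 = "d480e636397c14e187608e50309eb9af"


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    image = Path("images/hicpro_3.1.0_ubuntu.img")
    if not image.exists():
        print(f"{image} not found: download it first")
    else:
        status = "OK" if file_md5(image) == EXPECTED_MD5 else "MISMATCH"
        print(f"{image}: {status}")
```

Reading in chunks matters here because the image is large; hashing it in one read() would load the whole file into memory.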

Verify the HiC-Pro version with:

```bash
$ singularity exec images/hicpro_3.1.0_ubuntu.img HiC-Pro --version
[...]
HiC-Pro version 3.1.0
```

and the bowtie2 version:

```bash
$ singularity exec images/hicpro_3.1.0_ubuntu.img bowtie2 --version 2>/dev/null | head -n 1
/usr/local/conda/envs/hicpro/bin/bowtie2-align-s version 2.4.4
```

Prepare required files

Create the config file

Create and edit a configuration file in YAML format, using for instance the template config_template.yml as a starting point.

Add the reference genome

The reference genome fasta file must be located at WORKING_DIR/genome.fasta, where WORKING_DIR is the name of the working directory as specified in your config file.

Add FASTQ files (optional)

[Image: 3D structure of chromosome 13 of S. cerevisiae at 5 kb resolution]

If you already have fastq files stored locally, or if some fastq files are not available on GEO or SRA, you can use these files provided they follow this directory structure:

```
WORKING_DIR/
├── fastq_files
│   ├── ID1
│   │   ├── ID1_R1.fastq.gz
│   │   └── ID1_R2.fastq.gz
│   ├── ID2
│   │   ├── ID2_R1.fastq.gz
│   │   └── ID2_R2.fastq.gz
│   ├── ID3
│   │   ├── ID3_R1.fastq.gz
│   │   └── ID3_R2.fastq.gz
│   └── ID4
│       ├── ID4_R1.fastq.gz
│       └── ID4_R2.fastq.gz
└── genome.fasta
```

  • WORKING_DIR is the name of the working directory as specified in your config file.
  • Paired-end fastq files are stored in WORKING_DIR/fastq_files/IDx, where IDx is the identifier of the paired fastq files. Fastq identifiers are listed in the config file. Note that fastq file names must follow the pattern <sample ID>_R<1 or 2>.fastq.gz.

Note

Please strictly follow this file organization as it is required by the 3DGB workflow.
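Before launching the workflow, it can help to check this layout automatically. The following is a minimal sketch, not part of 3DGB: check_working_dir is a hypothetical helper, and because the config file keys are not described here, the sample identifiers are passed in by hand rather than read from the config.

```python
import re
from pathlib import Path

# Naming pattern required by 3DGB: <sample ID>_R<1 or 2>.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def check_working_dir(working_dir: Path, sample_ids: list[str]) -> list[str]:
    """Return a list of problems found in WORKING_DIR (an empty list means the layout looks fine)."""
    problems = []
    genome = working_dir / "genome.fasta"
    if not genome.is_file():
        problems.append(f"missing reference genome: {genome}")
    for sample in sample_ids:
        sample_dir = working_dir / "fastq_files" / sample
        # Both mates of the pair must be present.
        for read in ("1", "2"):
            fastq = sample_dir / f"{sample}_R{read}.fastq.gz"
            if not fastq.is_file():
                problems.append(f"missing paired-end file: {fastq}")
        # Flag any other file whose name breaks the pattern or belongs to another sample.
        if sample_dir.is_dir():
            for path in sorted(sample_dir.iterdir()):
                match = FASTQ_PATTERN.match(path.name)
                if match is None or match["sample"] != sample:
                    problems.append(f"unexpected file name: {path}")
    return problems
```

For instance, check_working_dir(Path("my_working_dir"), ["ID1", "ID2"]) (with a hypothetical directory name) returns an empty list when genome.fasta and both paired-end files of each sample are in place and no file breaks the naming pattern.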

Build model

Run 3DGB:

```bash
snakemake --profile smk_profile -j 4 --configfile YOUR-CONFIG.yml
```

Note
  • Replace YOUR-CONFIG.yml with the exact name of the config file you created.
  • Option -j 4 tells Snakemake to use up to 4 cores. If you have more cores available, you can increase this value (e.g. -j 16).

Or with debugging options:

```bash
snakemake --profile smk_profile_debug -j 4 --configfile YOUR-CONFIG.yml --verbose
```
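The command can also be assembled from Python, for example to set -j to the number of CPUs on the machine. This is only a sketch: snakemake_command is a hypothetical helper that reproduces the two command lines above (regular and debug profiles), and it assumes snakemake is on the PATH of the activated 3DGB environment.

```python
import os
from typing import Optional


def snakemake_command(configfile: str, cores: Optional[int] = None, debug: bool = False) -> list[str]:
    """Build the 3DGB Snakemake command line as a list of arguments.

    cores defaults to the number of CPUs reported by the OS (1 if unknown).
    debug switches to the smk_profile_debug profile and adds --verbose.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    profile = "smk_profile_debug" if debug else "smk_profile"
    command = ["snakemake", "--profile", profile, "-j", str(cores), "--configfile", configfile]
    if debug:
        command.append("--verbose")
    return command
```

Running the workflow is then a matter of subprocess.run(snakemake_command("YOUR-CONFIG.yml"), check=True). Passing a list of arguments rather than a single string avoids shell quoting issues with config file names.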

Depending on the number and size of the fastq files, building the 3D model can take a couple of hours.

For troubleshooting, have a look at the log files in WORKING_DIR/logs, where WORKING_DIR is the name of the working directory as specified in your config file.
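When a run fails, the most recently written logs are a reasonable first place to look. The sketch below lists them and returns their last lines; recent_logs and tail are hypothetical helpers, and since the naming of the files in WORKING_DIR/logs is not described here, it simply scans every file under that directory.

```python
from pathlib import Path


def recent_logs(working_dir: Path, n: int = 5) -> list[Path]:
    """Return up to n files from WORKING_DIR/logs (subdirectories included), newest first."""
    log_dir = working_dir / "logs"
    if not log_dir.is_dir():
        return []
    files = [p for p in log_dir.rglob("*") if p.is_file()]
    # Sort by modification time so the files written last come first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)[:n]


def tail(path: Path, lines: int = 20) -> str:
    """Return the last lines of a text file, replacing undecodable bytes."""
    with open(path, errors="replace") as handle:
        return "".join(handle.readlines()[-lines:])
```

For example, printing tail(log) for each log in recent_logs(Path("my_working_dir")) (hypothetical directory name) shows the end of the latest logs first.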

Map quantitative values on the 3D model

To map quantitative values onto the model, run:

```bash
python ./scripts/map_parameter.py \
    --pdb path/to/structure.pdb \
    --bedgraph path/to/annotation.bedgraph \
    --output path/to/output.pdb
```

Quantitative values should be formatted in a 4-column bedgraph file (chromosome/start/stop/value):

```
chr1 0 50000 116.959
chr1 50000 100000 48.4495
chr1 100000 150000 22.8726
chr1 150000 200000 84.3106
chr1 200000 250000 113.109
```

Each bead of the model is assigned a quantitative value. The bin size in the bedgraph file should match the resolution used to build the model (50 kb bins in the example above).
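map_parameter.py performs this mapping for you. To illustrate what matching the resolution means, here is a minimal sketch (not the script's actual code) that reads a bedgraph and lays its values out one per bead, under the assumption that bead i of a chromosome covers the interval [i × resolution, (i + 1) × resolution). The function name bead_values is hypothetical.

```python
import math
from collections import defaultdict
from typing import Iterable


def bead_values(lines: Iterable[str], resolution: int) -> dict[str, list[float]]:
    """Parse 4-column bedgraph lines into one value per bead, per chromosome.

    Assumption: bead i covers [i * resolution, (i + 1) * resolution).
    Raises ValueError when an interval does not fit that binning.
    """
    bins: dict[str, dict[int, float]] = defaultdict(dict)
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue  # skip blank, header and comment lines
        if len(fields) < 4:
            raise ValueError(f"line {number}: expected 4 columns, got {len(fields)}")
        chrom, start, end, value = fields[0], int(fields[1]), int(fields[2]), float(fields[3])
        # Each interval must start on a bin boundary and be at most one bin wide
        # (the last bin of a chromosome may be shorter).
        if start % resolution != 0 or not 0 < end - start <= resolution:
            raise ValueError(f"line {number}: {chrom}:{start}-{end} does not match resolution {resolution}")
        bins[chrom][start // resolution] = value
    result = {}
    for chrom, by_index in bins.items():
        # Bins absent from the bedgraph get NaN so bead indices stay aligned.
        result[chrom] = [by_index.get(i, math.nan) for i in range(max(by_index) + 1)]
    return result
```

With the example file above and resolution=50000, chr1 gets five values in file order; with resolution=5000 the first line is rejected, because a 50 kb interval cannot map onto a single 5 kb bead.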

Get results

Upon completion, WORKING_DIR should look like this:

```
WORKING_DIR/
├── contact_maps
├── dense_matrix
├── fastq_files
├── HiC-Pro
├── logs
├── pastis
├── sequence
└── structure
```

The following paths contain the most interesting results:

  • WORKING_DIR/contact_maps/*.png : contact maps.
  • WORKING_DIR/HiC-Pro/output/hic_results/pic/*/*.pdf : graphical summaries of read alignments produced by HiC-Pro.
  • WORKING_DIR/pastis/structure_RESOLUTION.pdb : raw 3D models (in PDB format) produced by PASTIS.
  • WORKING_DIR/structure/RESOLUTION/structure_cleaned.* : final (annotated) 3D models in PDB and G3D formats.

Note
  • WORKING_DIR is the name of the working directory as specified in your config file.
  • RESOLUTION is the resolution of the HiC data specified in the config file.
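To collect the final models programmatically, the sketch below relies only on the WORKING_DIR/structure/RESOLUTION/structure_cleaned.* layout described above and returns one entry per RESOLUTION directory it finds. The function name final_structures is hypothetical and not part of 3DGB.

```python
from pathlib import Path


def final_structures(working_dir: Path) -> dict[str, list[Path]]:
    """Map each RESOLUTION directory name to its structure_cleaned.* files (PDB, G3D, ...)."""
    results: dict[str, list[Path]] = {}
    structure_dir = working_dir / "structure"
    if not structure_dir.is_dir():
        return results
    for resolution_dir in sorted(p for p in structure_dir.iterdir() if p.is_dir()):
        files = sorted(resolution_dir.glob("structure_cleaned.*"))
        if files:  # skip directories without a final model
            results[resolution_dir.name] = files
    return results
```

Keys are the directory names as written on disk, so they are kept as strings rather than converted to integers.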

Examples

Visualize 3D model structures

To visualize 3D model structures (.pdb and .g3d files), follow this quick tutorial.

Build DAG graph

For visualization purposes, you can build the graph of all computational steps involved in building the 3D model of the genome.

```bash
snakemake --profile smk_profile --configfile YOUR-CONFIG.yml --rulegraph | dot -Tpdf > rules.pdf
```

where YOUR-CONFIG.yml should be replaced by the name of the config file you created.

To draw the full job graph, including wildcard values:

```bash
snakemake --profile smk_profile --configfile YOUR-CONFIG.yml --dag | dot -Tpdf > dag.pdf
```

Owner

  • Name: data-fun
  • Login: data-fun
  • Kind: organization

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "license": "https://spdx.org/licenses/BSD-3-Clause",
  "codeRepository": "https://github.com/data-fun/3d-genome-builder",
  "dateCreated": "2022-04-07",
  "datePublished": "2023-11-08",
  "name": "3DGB",
  "description": "3D genome builder",
  "applicationCategory": "Biology",
  "funding": "ANR-19-CE45-0017",
  "funder": {
    "@type": "Organization",
    "name": "Agence Nationale de la Recherche"
  },
  "keywords": [
    "HiC",
    "chromatine structure",
    "fungal genomes",
    "3D model",
    "workflow",
    "HiC-Pro",
    "Pastis"
  ],
  "programmingLanguage": [
    "Python"
  ],
  "operatingSystem": [
    "Linux"
  ],
  "author": [
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0002-5079-5417",
      "givenName": "Thibault",
      "familyName": "Poinsignon",
      "email": "thibault.poinsignon@protonmail.com",
      "affiliation": {
        "@type": "Organization",
        "name": "Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris-Saclay, 91198 Gif-sur-Yvette, France"
      }
    },
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0002-2431-7825",
      "givenName": "Mélina",
      "familyName": "Gallopin",
      "email": "melina.gallopin@universite-paris-saclay.fr",
      "affiliation": {
        "@type": "Organization",
        "name": "Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris-Saclay, 91198 Gif-sur-Yvette, France"
      }
    },
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0002-2842-6172",
      "givenName": "Gaëlle",
      "familyName": "Lelandais",
      "email": "gaelle.lelandais@universite-paris-saclay.fr",
      "affiliation": {
        "@type": "Organization",
        "name": "Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris-Saclay, 91198 Gif-sur-Yvette, France"
      }
    },
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0003-4177-3619",
      "givenName": "Pierre",
      "familyName": "Poulain",
      "email": "pierre.poulain@cupnet.net",
      "affiliation": {
        "@type": "Organization",
        "name": "Université Paris Cité, CNRS, Institut Jacques Monod, F-75013 Paris, France"
      }
    }
  ]
}

GitHub Events

Total
  • Issues event: 5
  • Watch event: 2
  • Issue comment event: 9
  • Push event: 1
Last Year
  • Issues event: 5
  • Watch event: 2
  • Issue comment event: 9
  • Push event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 3
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ChloeQuignot (2)
  • allaigle (1)
  • TobyBaril (1)
  • hanshanmengqi (1)
  • Fadwa7 (1)
  • gaellelelandais (1)
  • acalchera (1)
Pull Request Authors
  • pierrepo (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

binder/environment.yml conda
  • biopandas
  • biopython
  • black
  • graphviz
  • python 3.9.*
  • seqkit
  • snakemake-minimal