genome_tracks

A Snakemake workflow and MrBiomics module for easy visualization of genome browser tracks of aligned BAM files (e.g., RNA-seq, ATAC-seq, scRNA-seq, ...) powered by the wrapper gtracks for the package pyGenomeTracks, and IGV-reports.

https://github.com/epigen/genome_tracks

Keywords

atac-seq bioinformatics biomedical-data-science genome-browser genome-track genomic-regions pipeline python rna-seq scrna-seq snakemake visualization workflow

Last synced: 11 months ago · JSON representation ·

Repository

A Snakemake workflow and MrBiomics module for easy visualization of genome browser tracks of aligned BAM files (e.g., RNA-seq, ATAC-seq, scRNA-seq, ...) powered by the wrapper gtracks for the package pyGenomeTracks, and IGV-reports.

Basic Info

Host: GitHub
Owner: epigen
License: mit
Language: Python
Default Branch: main
Homepage: https://epigen.github.io/genome_tracks/
Size: 98.6 KB

Statistics

Stars: 25
Watchers: 2
Forks: 1
Open Issues: 2
Releases: 9

Topics

atac-seq bioinformatics biomedical-data-science genome-browser genome-track genomic-regions pipeline python rna-seq scrna-seq snakemake visualization workflow

Created over 4 years ago · Last pushed 12 months ago

Metadata Files

Readme License Citation

README.md

GitHub Release

Genome Browser Track Visualization Workflow

A Snakemake 8 workflow for easy visualization of genome browser tracks of aligned/mapped BAM files (e.g., RNA-seq, ATAC-seq, scRNA-seq, ...) powered by the wrapper gtracks for the package pyGenomeTracks and IGV-reports.

[!NOTE]
This workflow adheres to the module specifications of MrBiomics, an effort to augment research by modularizing (biomedical) data science. For more details, instructions, and modules check out the project's repository.

⭐️ Star and share modules you find valuable 📤 - help others discover them, and guide our future work!

[!IMPORTANT]
If you use this workflow in a publication, please don't forget to give credit to the authors by citing it using this DOI 10.5281/zenodo.10849097.

Workflow Rulegraph

🖋️ Authors

💿 Software

This project wouldn't be possible without the following software and their dependencies:

🔬 Methods

This is a template for the Methods section of a scientific publication and is intended to serve as a starting point. Only retain paragraphs relevant to your analysis. References [ref] to the respective publications are curated in the software table above. Versions (ver) have to be read out from the respective conda environment specifications (workflow/envs/*.yaml file) or post-execution in the result directory (genome_tracks/envs/*.yaml). Parameters that have to be adapted depending on the data or workflow configurations are denoted in squared brackets e.g., [X].

(optional) Single-cell preprocessing. Each single-cell BAM file was split into [group]-wise BAM files according to it's cell barcode metadata using filterbarcodes from the command line tool sinto (ver) [ref].

Processing. Aligned (filtered, and indexed) BAM files were merged by [group] using samtools (ver) [ref]. Each merged BAM file's coverage was determined for dowmnstream analysis and visualization using bamCoverage from the command line tool deepTools (ver) [ref] and saved in the bigWig format. Finally, we extracted coordinates, extended start and end by [basebuffer] bases, and number of isoforms of all relevant genes/genomic regions [genelist] from the 12 column BED file genome [genome] annotation [genome_bed].

Visualization. Visualizations for each relevant gene/genomic region and [category] were generated by using the generated bigWig coverage files and vertically stacking genome browser tracks with their annotation at the [xaxis] and each track scaled by [ymax] reads. The plotting was performed using the python wrapper gtracks (ver) [ref] for the package pyGenomeTracks (ver) [ref]. Additionally, an interactive self-contained IGV-report containing all merged samples and gene/genomic regions of interest was generated using igv-reports (ver) [ref]. Finally, a UCSC genome browser track hub was created for online sharing and inspection using UCSC Genome Browser. Both the plotted tracks and the UCSC genome browser tracks were color coded according to [group].

The processing and visualizations described here were performed using a publicly available Snakemake ver workflow [10.5281/zenodo.10849097].

🚀 Features

The workflow performs the following steps to produce the outlined results (genome_tracks/).

Processing
1. (optional) Single-cell BAM files are split into groups according to their metadata file using sinto::filterbarcodes and saved in folders named by the md5 hash of the input BAM file path (sc_bams/{md5hash}/{group}.bam). Downstream they are processed and visualized the same as bulk samples.
2. BAM files of the same group are merged and indexed using samtools::merge. (merged_bams/{group}.bam)
3. A bigWig file per merged BAM file is generated using deepTools::bamCoverage. (bigWigs/{group}.bw)
4. Annotations per gene are retrieved from the 12-column BED file (not necessary for genomic regions).
  - coordinates from the 12-column BED file are extracted and extended at start/end by the parameter base_buffer.
  - the number of isoforms i.e. number of lines in the BED file is determined (only for genes, for genomic regions it is hardcoded to 1) to plot below the tracks.
  - genes that were not found in the provided genome BED file are reported (genes_not_found.csv).
5. A BED file of all genes and genomic regions is generated for the IGV-report (genes.bed).
Visualization
- One plot including all groups visualizing the coverage (i.e., bigWigs) per gene/genomic region with the determined gene-parameters i.e, coordinates and gene-rows is generated (tracks/{gene|region}.{svg|pdf|png}). The track height (ymax) and colors (track_colors) are configurable within the gene_list or config file, respectively. Default color is black.
- An interactive self-contained IGV-report from merged BAM files with all genes/regions is generated for inspection for sharing (igv-report.html).
- A UCSC genome browser track hub for all bigWigs is set up (bigWigs/), color coded according to configuration. See detailed instructions for usage and sharing below.

🛠️ Usage

Here are some tips for the usage of this workflow: - Start with the 1-5 most interesting genes (e.g., marker genes for cell types as quality control or the most differentially expressed between conditions) and few/relevant samples for a test run. - Set y-max to auto (i.e., 0 in the gene_list CSV table) for the first run to get a feeling for the magnitudes in your samples/groups and adapt to the highest peaks afterward to make the tracks comparable within a gene/region of interest. - Splitting single-cell BAM files, merging BAM files and generating bigWig files take the longest, but are performed only once (multithreaded 4x configured threads). The plot generation for different genes/genomic regions afterward is very fast. Therefore, it is recommended to get the workflow going with a few samples and then increase the number of samples and genes/regions afterward. - For simple quality control (QC) of all/individual samples just provide a unique sample name in the group column of the annnotation file. Then no samples will be merged into groups, but only renamed (useful before sharing).

This workflow is written with Snakemake and its usage is described in the Snakemake Workflow Catalog.

⚙️ Configuration

Detailed specifications can be found here ./config/README.md

📖 Examples

Explore detailed examples showcasing module usage in our comprehensive end-to-end MrBiomics Recipes, including data, configuration, annotation and results: - ATAC-seq Analysis Recipe - RNA-seq Analysis Recipe - scRNA-seq Analysis Recipe - scCRISPR-seq Analysis Recipe

Runtime examples for different data modalities: - 78 ATAC-seq samples in 31 groups took 22 minutes with max 4 cores and 4GB memory. - 64 RNA-seq samples in 31 groups took 16 minutes with max 4 cores and 4GB memory. - 2 10x genomics 5' scRNA-seq samples/reactions each ~10k cells split in 2 small subset groups took 31 minutes with max 4 cores and 8GB memory.

🧬 Genome Browser Tracks

The bigWigs directory contains the read coverage per sample/group in bigWig format ({group}.bw) for visual inspection of each sample e.g., during QC or group e.g., comparison of conditions. Below are instructions for two different approaches (online/local).

UCSC Genome Browser Track Hub (online)

Requirement: web server.
Copy (or symlink) the bigWigs directory to an externally accessible location on your web server (=web_server_location).
Create a UCSC Genome Browser hyperlink
- the general formula is: ucscurl + genome + webserver_location + bigWigs/hub.txt
- concretely: http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db= + {genome} + &hubUrl= + {webserverlocation} + bigWigs/hub.txt
- mm10 example: http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=mm10&hubUrl=https://medical-epigenomics.org/data/genome_tracks/mm10test/hub/hub.txt
- hg38 example: http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg38&hubUrl=https://medical-epigenomics.org/data/genome_tracks/hg38test/hub/hub.txt
Share the link with the world e.g., collaborators or upon publication of your data.

A new feature (2024-08-30) allows users to download all visible data in the current region directly from our tracks display. This facilitates reproducibility when writing reports or publications as data can update and change over time. This feature can be found in the blue bar menu by going to Downloads > Download Current Track Data. The resulting pop-up dialogue box (see screenshot below) can configure the exact tracks to download from all visible tracks, as well as the file name and the output format (JSON, csv, tsv). UCSC_download

IGV: Integrative Genomics Viewer (local/offline)

Requirement: IGV Desktop application.
Open IGV.
Select genome.
Drag and drop all/selected bigWig files from the bigWigs directory directly into the IGV application.

🔗 Links

📚 Resources

Recommended compatible MrBiomics modules for upstream processing:
- ATAC-seq Data Processing & Quantification Pipeline for processing, quantification and annotation of ATAC-seq samples.
UCSC Genome Browser annotation track database
- recommended source for the required 12 column BED file annotation of the respective genome.
- e.g., for mm10 from UCSC as gzip https://genome.ucsc.edu/cgi-bin/hgTables assembly:mm10 -> track:NCBI RefSeq -> table:refFlat; output format: BED

📑 Publications

The following publications successfully used this module for their analyses. - Traxler, Reichl et al. (2025) Cell Systems - Integrated time-series analysis and high-content CRISPR screening delineate the dynamics of macrophage immune regulation - ...

⭐ Star History

Owner

Name: Computational Epigenetics
Login: epigen
Kind: organization
Location: Vienna, Austria

Website: http://www.medical-epigenomics.org/
Repositories: 17
Profile: https://github.com/epigen

Computational Epigenetics Research and Software

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Genome Browser Track Visualization Workflow
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - affiliation: CeMM Research Center for Molecular Medicine
    given-names: Stephan
    family-names: Reichl
    orcid: 'https://orcid.org/0000-0001-8555-7198'
  - given-names: Christoph
    family-names: Bock
    orcid: 'https://orcid.org/0000-0001-6091-3088'
    affiliation: CeMM Research Center for Molecular Medicine
identifiers:
  - type: doi
    value: 10.5281/zenodo.10849097
    description: >-
      This DOI represents all versions, and will always
      resolve to the latest one.
repository-code: 'https://github.com/epigen/genome_tracks'
url: 'https://epigen.github.io/genome_tracks'
abstract: >-
  A Snakemake workflow for easy visualization of genome
  browser tracks of aligned BAM files (e.g., RNA-seq,
  ATAC-seq, scRNA-seq, ...) powered by the wrapper gtracks
  for the package pyGenomeTracks, and IGV-reports.
keywords:
  - Bioinformatics
  - Genome tracks
  - Visualization
  - Workflow
  - Snakemake
license: MIT

GitHub Events

Total

Create event: 2
Issues event: 3
Release event: 1
Watch event: 9
Issue comment event: 5
Push event: 5
Fork event: 1

Last Year

Create event: 2
Issues event: 3
Release event: 1
Watch event: 9
Issue comment event: 5
Push event: 5
Fork event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 13
Total pull requests: 0
Average time to close issues: 3 months
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.31
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: 6 months
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0