Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.5%) to scientific vocabulary
Repository
Tobias implementation for ATAC seq data.
Basic Info
- Host: GitHub
- Owner: CCBR
- License: mit
- Language: Python
- Default Branch: main
- Size: 62.2 MB
Statistics
- Stars: 1
- Watchers: 4
- Forks: 1
- Open Issues: 2
- Releases: 5
Metadata Files
README.md
CCBR TOBIAS snakemake pipeline
TOBIAS, or "Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal", is a framework of tools for investigating transcription factor (TF) binding from ATAC-seq signal. The analysis involves numerous sequential steps (tasks) that must be executed in order to predict TF occupancy footprints from deduplicated alignment BAM files derived from raw ATAC-seq data (fastq files). Here we use Snakemake to automate the sequential execution on any HPC. Most tools used by the pipeline are containerized in Docker format and can be invoked using Singularity on the HPC. The minimum requirements for running this pipeline are:
- Python (>=3.5)
- Snakemake (>=5.24.1)
- Singularity (>=3.7.4)
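The minimum versions listed above can be checked programmatically. The following is a minimal sketch, not part of the pipeline: the helper names `parse_version` and `meets_minimum` are hypothetical, and the version strings would have to be obtained separately (for example from each tool's `--version` output).

```python
import re

# Minimum versions taken from the requirements list above.
MINIMUM_VERSIONS = {
    "python": (3, 5),
    "snakemake": (5, 24, 1),
    "singularity": (3, 7, 4),
}


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(tool, reported):
    """Check a reported version string against the minimum for the given tool."""
    return parse_version(reported) >= MINIMUM_VERSIONS[tool]
```

Versions are compared as integer tuples, so `5.4.0` is correctly treated as older than `5.24.1`, which a plain string comparison would get wrong.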
This pipeline was built using the CCBR_SnakemakePipelineCookiecutter.
Please visit the following pages for more details directly from the authors of TOBIAS:
- https://github.molgen.mpg.de/pages/loosolab/www/software/TOBIAS/
- https://github.com/loosolab/TOBIAS
- https://github.molgen.mpg.de/loosolab/TOBIAS_snakemake
Quick start instructions for running CCBR_tobias on Biowulf
Various versions of the pipeline have been checked out at /data/CCBR_Pipeliner/Pipelines/CCBR_tobias on Biowulf. You can get help about running the pipeline using:
bash
% bash /data/CCBR_Pipeliner/Pipelines/CCBR_tobias/v0.2/run_tobias.bash --help
Pipeline Dir: /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CCBR_tobias/v0.2
Git Commit/Tag: 6c8726023269ace0fd8fe886a1213859b363f9fd v0.2
/data/CCBR_Pipeliner/Pipelines/CCBR_tobias/v0.2/run_tobias.bash: run CCBR TOBIAS workflow for ATAC seq data
USAGE:
bash /data/CCBR_Pipeliner/Pipelines/CCBR_tobias/v0.2/run_tobias.bash -m/--runmode=<MODE> -w/--workdir=<path_to_workdir>
Required Arguments:
1. RUNMODE: [Type: String] Valid options:
*) init : initialize workdir
*) run : run with slurm
*) reset : DELETE workdir dir and re-init it
*) dryrun : dry run snakemake to generate DAG
*) unlock : unlock workdir if locked by snakemake
*) runlocal : run without submitting to sbatch
2. WORKDIR: [Type: String]: Absolute or relative path to the output folder with write permissions.
The pipeline requires only 2 arguments:
- Runmode
- Working dir
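For scripted use, the two required arguments can be validated before the wrapper is invoked. This is a sketch only: `build_command` is a hypothetical helper, the default script path is simply the v0.2 location shown in the help text above, and the long options follow the `--runmode=` and `--workdir=` forms from the usage message.

```python
import shlex

# Run modes listed in the help output above.
VALID_RUNMODES = ("init", "run", "reset", "dryrun", "unlock", "runlocal")

# Path copied from the help text above; adjust to the version you use.
DEFAULT_SCRIPT = "/data/CCBR_Pipeliner/Pipelines/CCBR_tobias/v0.2/run_tobias.bash"


def build_command(runmode, workdir, script=DEFAULT_SCRIPT):
    """Return the argument list for one run_tobias.bash invocation."""
    if runmode not in VALID_RUNMODES:
        raise ValueError(f"unknown runmode {runmode!r}; expected one of {VALID_RUNMODES}")
    if not workdir:
        raise ValueError("workdir must be a non-empty path")
    return ["bash", script, f"--runmode={runmode}", f"--workdir={workdir}"]


def as_shell_string(command):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(part) for part in command)
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with unusual working directory names.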
Generally, we anticipate CCBR_tobias to be run in 3 steps:
1. Initialize
bash
% bash /data/CCBR_Pipeliner/Pipelines/CCBR_tobias/dev/run_tobias.bash -m=init -w=/path/to/outfolder
This creates the output folder, so the folder should not exist before running init. Along with other scripts and files, init copies config.yaml and cluster.json to the output folder, where they can be edited by the user. Some key input values that need to be edited before running the pipeline are as follows:
- data: points to the CCBR_ATACseq dedup.bam replicate files per sample. The sample names should match those later used in contrasts
- contrasts: which contrasts to perform using TOBIAS. The 2 groups should already be defined under data
- peaks: areas of interest to query for differential footprinting. This should be manually curated before running the CCBR_tobias pipeline
- genome: currently supports mm10 for mouse with Gencode M21 annotation and hg38 for human with Gencode v30 annotation
- motifs: motif database to use for analysis. The choices are:
| database    | organism    | version          |
| ----------- | ----------- | ---------------- |
| HOCOMOCOv11 | Human       | Core             |
| HOCOMOCOv11 | Human       | Full             |
| HOCOMOCOv11 | Mouse       | Core             |
| HOCOMOCOv11 | Mouse       | Full             |
| HOCOMOCOv11 | Human+Mouse | Core             |
| HOCOMOCOv11 | Human+Mouse | Full             |
| JASPAR2020  | -           | corenonredundant |
| JASPAR2020  | -           | coreredundant    |
| JASPAR2020  | vertebrate  | corenonredundant |
| JASPAR2020  | vertebrate  | coreredundant    |
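The consistency rules described for these keys (each contrast names 2 groups that are defined under data, and the genome is one of the supported builds) can be checked before a run. The sketch below works on an already-parsed config as a plain dictionary. The exact layout of config.yaml is not shown here, so the structure assumed in the code (`data` as a mapping from group name to BAM paths, `contrasts` as a list of pairs) is hypothetical and would need to be adapted to the real file.

```python
# Genome builds named in the description above.
SUPPORTED_GENOMES = ("mm10", "hg38")


def validate_config(config):
    """Return a list of human-readable problems; an empty list means no problem found.

    Assumed (hypothetical) layout:
      config["data"]      -> {group_name: [bam_path, ...]}
      config["contrasts"] -> [[group_a, group_b], ...]
      config["genome"]    -> "mm10" or "hg38"
    """
    problems = []
    groups = config.get("data") or {}
    if not groups:
        problems.append("no groups defined under 'data'")
    for contrast in config.get("contrasts") or []:
        if len(contrast) != 2:
            problems.append(f"contrast {contrast!r} does not name exactly 2 groups")
            continue
        for group in contrast:
            if group not in groups:
                problems.append(f"contrast group {group!r} is not defined under 'data'")
    genome = config.get("genome")
    if genome not in SUPPORTED_GENOMES:
        problems.append(f"genome {genome!r} is not one of {SUPPORTED_GENOMES}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a user fix every issue in a single edit of the config.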
2. Dryrun
bash
% bash /data/CCBR_Pipeliner/Pipelines/CCBR_tobias/dev/run_tobias.bash -m=dryrun -w=/path/to/outfolder
Running the above command:
- ensures that the output folder exists and contains the required files
- examines the config.yaml file and makes sure that we have appropriate permissions to the input files and output locations
- runs snakemake in dry-run mode using the cluster.json to list a table of rules/tasks to be run
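The kind of pre-flight checks described for dryrun can be sketched as follows. This is an illustration of the idea, not the pipeline's actual implementation: `preflight` is a hypothetical function, and the list of input files would come from the user's config.yaml.

```python
import os

# Files that init is described as copying into the output folder.
REQUIRED_FILES = ("config.yaml", "cluster.json")


def preflight(workdir, input_files):
    """Return a list of problems that would prevent a run; empty if none were found."""
    if not os.path.isdir(workdir):
        return [f"output folder {workdir} does not exist (run init first)"]
    problems = []
    for name in REQUIRED_FILES:
        if not os.path.isfile(os.path.join(workdir, name)):
            problems.append(f"{name} is missing from {workdir}")
    if not os.access(workdir, os.W_OK):
        problems.append(f"no write permission on {workdir}")
    for path in input_files:
        # os.access also returns False when the file does not exist.
        if not os.access(path, os.R_OK):
            problems.append(f"cannot read input file {path}")
    return problems
```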
3. Run
After successfully running dryrun, the user can run the same command with the -m=run option to submit jobs to the slurm job scheduler on Biowulf. By default, the norm partition is used for running jobs, but that and other job parameters can be changed by editing the cluster.json file in the output folder.
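Editing cluster.json can also be scripted. The sketch below assumes the file follows the common Snakemake cluster-config convention of a top-level `__default__` entry holding a `partition` field; the actual layout of the cluster.json copied by init is not shown here, so treat both key names as assumptions and inspect the file first.

```python
import json


def set_partition(cluster_json_path, partition, rule="__default__"):
    """Set the partition for one rule entry (by default the shared defaults) and save the file."""
    with open(cluster_json_path) as handle:
        cluster = json.load(handle)
    # Create the entry if it is absent, leaving all other settings untouched.
    cluster.setdefault(rule, {})["partition"] = partition
    with open(cluster_json_path, "w") as handle:
        json.dump(cluster, handle, indent=2)
        handle.write("\n")
    return cluster
```

Passing a rule name instead of the default changes the partition for that rule only, assuming per-rule entries sit next to `__default__` in the file.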
Expected Outputs:
The following folders are expected upon successful completion.
bams
Individual replicate alignment BAMs are merged together and pre-sorted. This folder will contains the merged BAMs
coverage
The merged BAMs are converted to normalized bigwigs for visualization with IGV. The bigwigs can be found here.
bias_correction
The merged BAMs from the bams folder are corrected for Tn5 insertion bias. 4 separate bigwigs are expected as output on a per-condition basis:
- uncorrected bigwig: The uncorrected cutsite signal representing observed reads in basepair resolution. This track is normalized for sequencing depth but not corrected in terms of Tn5 bias.
- corrected bigwig: This is the corrected cutsite signal and will contain both positive and negative values for positions respectively more or less cut than expected. Remember, bigwigs cannot have positive and negative values at the same coordinate.
- biased bigwig: The raw bias score against the PWM/DWM bias matrix. This is purely based on sequence.
- expected bigwig: Given the cutsite preferences of the Tn5 enzyme, this track reports the cutsite signal expected from bias alone. It is the raw bias score scaled towards the sum of cuts in the region, and can be directly compared to the uncorrected signal.
- pdf: Plot showing the observed Tn5 bias before and after correction can be found here.
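The relationship between the tracks can be illustrated with a small arithmetic sketch. This is a deliberate simplification and not the TOBIAS algorithm: it scales non-negative bias scores over a single region so that they sum to the observed cuts, then subtracts that expectation from the observed signal. The function names and the toy numbers in the usage note are made up for illustration.

```python
def expected_signal(bias_scores, observed_cuts):
    """Scale non-negative bias scores so that they sum to the number of observed cuts."""
    if len(bias_scores) != len(observed_cuts):
        raise ValueError("bias_scores and observed_cuts must have the same length")
    total_bias = sum(bias_scores)
    total_cuts = sum(observed_cuts)
    if total_bias == 0:
        return [0.0 for _ in bias_scores]
    return [score * total_cuts / total_bias for score in bias_scores]


def corrected_signal(observed_cuts, expected_cuts):
    """Observed minus expected: positive where cut more than expected, negative where less."""
    return [obs - exp for obs, exp in zip(observed_cuts, expected_cuts)]
```

With bias scores `[1, 3, 4, 2]` and observed cuts `[2, 2, 10, 6]`, the expected signal is `[2.0, 6.0, 8.0, 4.0]` and the corrected signal is `[0.0, -4.0, 2.0, 2.0]`, which shows how a corrected track ends up with both positive and negative values.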
footprinting
Using the corrected bigwig from bias_correction, a per-condition footprinting bigwig is created, limited to the "regions of interest" defined by the peaks in the config.yaml.
peaks
Supplied peaks are annotated using UROPA and annotations are stored here.
TFBS_{contrast}
One TFBS folder is created for each contrast by running BINDetect. Each TFBS folder contains numerous (100s of) subfolders, one for each motif in the motif database selected using the motifs parameter in config.yaml. Each of these per-TF-motif subfolders has a standard folder structure, including a subfolder named beds. This contains:
- a bed file of the TFBS for the motif in consideration which fall within the "regions of interest" as declared by the peaks parameter in config.yaml
- the TFBS in the above bed file split into "bound" and "unbound" sites for each condition in the contrast separately, resulting in a total of 4 bed files
For more details, see https://github.com/loosolab/TOBIAS/wiki/BINDetect
Caution This folder has a large digital footprint. Approximately, each contrast produces files amounting to about 40-60 GB. Hence, only run those contrasts that are interesting. DO NOT RUN ALL JUST BECAUSE YOU CAN!
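A quick estimate of the space needed can help decide how many contrasts to run. The sketch below multiplies the approximate 40-60 GB per contrast quoted above by the number of contrasts and compares the upper bound against the free space reported by `shutil.disk_usage`. The function names are hypothetical, and group quotas on shared filesystems are not taken into account.

```python
import shutil

GIB = 1024 ** 3


def estimate_tfbs_footprint_gb(n_contrasts, low_gb=40, high_gb=60):
    """Return the (low, high) estimate in GB for the TFBS folders of n_contrasts contrasts."""
    if n_contrasts < 0:
        raise ValueError("n_contrasts must not be negative")
    return n_contrasts * low_gb, n_contrasts * high_gb


def has_room(path, n_contrasts):
    """True if the filesystem holding path has at least the upper estimate free."""
    _, high_gb = estimate_tfbs_footprint_gb(n_contrasts)
    free_gb = shutil.disk_usage(path).free / GIB
    return free_gb >= high_gb
```

For example, 5 contrasts would be estimated at 200 to 300 GB for the TFBS folders alone.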
This folder also contains:
- bindetect_results.txt
- bindetect_figures.pdf
which are the key results for this contrast as a table and as plots.
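The results table can be inspected with the standard library. The sketch below assumes bindetect_results.txt is a tab-delimited text file with a header row; the column names are not listed here, so the column to rank by is passed in by the caller and any names used when calling these helpers are placeholders to be replaced with the real headers from your file.

```python
import csv


def read_table(handle):
    """Read a tab-delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def top_rows(rows, column, n=10):
    """Return the n rows with the largest absolute numeric value in the given column."""
    return sorted(rows, key=lambda row: abs(float(row[column])), reverse=True)[:n]
```

A typical use would be `with open("bindetect_results.txt") as fh: rows = read_table(fh)`, followed by `top_rows(rows, some_column)` once the column names have been read from the header.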
overview_{contrast}
All "bound" bed files for all the TF motifs considered are concatenated and reported here as 2 sorted and indexed bed files. As these are indexed, they can be easily loaded into an IGV session for visual inspection.
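The concatenate-and-sort step can be sketched in plain Python. This is an illustration of the idea, not the pipeline's own code: it assumes tab-delimited BED text with at least 3 columns, sorts chromosomes lexicographically (as `sort -k1,1 -k2,2n` would), and leaves out indexing, which needs an external tool.

```python
def merge_bed_records(bed_texts):
    """Concatenate several BED texts and return lines sorted by chromosome, start and end."""
    records = []
    for text in bed_texts:
        for line in text.splitlines():
            # Skip blank lines and the usual header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split("\t")
            records.append((fields[0], int(fields[1]), int(fields[2]), fields[3:]))
    records.sort(key=lambda rec: (rec[0], rec[1], rec[2]))
    return ["\t".join([chrom, str(start), str(end)] + rest)
            for chrom, start, end, rest in records]
```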
network
TF-TF binding networks are created with TOBIAS CreateNetwork for the first condition in each contrast.
An adjacency matrix and a list of edges are reported individually for each TF motif and summarized overall for each network.
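The two representations carry the same information, as the following sketch shows by converting a directed edge list into an adjacency matrix. It is a generic illustration and does not reproduce the CreateNetwork output format; the TF names used when calling it are placeholders.

```python
def adjacency_matrix(edges):
    """Convert directed (source, target) edges into (sorted node names, 0/1 matrix).

    matrix[i][j] is 1 when there is an edge from nodes[i] to nodes[j].
    """
    nodes = sorted({name for edge in edges for name in edge})
    index = {name: position for position, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return nodes, matrix
```

The edge list is the more compact form for sparse networks, while the matrix makes it easy to look up whether a given pair of TFs is connected.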
Owner
- Name: CCR Collaborative Bioinformatics Resource
- Login: CCBR
- Kind: organization
- Email: nciccbr@mail.nih.gov
- Location: United States of America
- Website: https://bioinformatics.ccr.cancer.gov/ccbr/
- Repositories: 92
- Profile: https://github.com/CCBR
CCR Collaborative Bioinformatics Resource, Center for Cancer Research (NCI), National Institutes of Health
Citation (CITATION.cff)
cff-version: 1.2.0
message: Please cite CCBR_tobias as below.
authors:
- family-names: Koparde
given-names: Vishal
orcid: https://orcid.org/0000-0001-8978-8495
affiliation:
Advanced Biomedical Computational Science, Frederick National Laboratory
for Cancer Research, Frederick, MD 21702, USA
- family-names: Sovacool
given-names: Kelly
orcid: https://orcid.org/0000-0003-3283-829X
affiliation:
Advanced Biomedical Computational Science, Frederick National Laboratory
for Cancer Research, Frederick, MD 21702, USA
title: CCBR TOBIAS snakemake pipeline
repository-code: https://github.com/CCBR/CCBR_tobias
license: MIT
type: software
identifiers:
- description: Archived snapshots of all versions
type: doi
value: 10.5281/zenodo.13327935
version: v0.3.1
date-released: "2025-06-23"
GitHub Events
Total
- Issues event: 4
- Watch event: 1
- Delete event: 1
- Issue comment event: 1
- Push event: 10
- Pull request event: 5
- Create event: 3
Last Year
- Issues event: 4
- Watch event: 1
- Delete event: 1
- Issue comment event: 1
- Push event: 10
- Pull request event: 5
- Create event: 3
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 2
- Total pull requests: 3
- Average time to close issues: 6 months
- Average time to close pull requests: 21 minutes
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 3
- Average time to close issues: 7 months
- Average time to close pull requests: 21 minutes
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- kelly-sovacool (7)
Pull Request Authors
- kelly-sovacool (6)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/add-to-project v1.0.2 composite
- actions/checkout v4 composite
- pre-commit/action v3.0.1 composite
- CCBR/actions/draft-release v0.2 composite
- actions/checkout v4 composite
- CCBR/actions/post-release v0.2 composite
- actions/checkout v4 composite