g4discovery

Pipeline for annotating/predicting G-Quadruplexes (G4s) in a genome sequence, combining pqsfinder and G4Hunter.

https://github.com/makovalab-psu/g4discovery


Keywords

bioinformatics dna-sequences g-quadruplexes g4hunter genomics pipeline pqsfinder


Repository


Basic Info

Host: GitHub
Owner: makovalab-psu
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 267 KB

Statistics

Stars: 0
Watchers: 4
Forks: 1
Open Issues: 0
Releases: 0


Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

G4 Discovery Pipeline (requires Docker)

Scripts for annotating/predicting G-Quadruplexes (G4s) in a genome sequence, combining pqsfinder and G4Hunter.

For a non-dockerized implementation (also with PanSN support), visit the stable fork: g4Discovery.PanSN

Overview
Prerequisites
Features
Usage
Notes
References
Citation

Overview

This repository provides a Python script for predicting G-quadruplex (G4) structures in any FASTA sequence. The tool reads a FASTA file and writes a BED file containing the non-overlapping G4s found on each strand.

Workflow:

The script first runs pqsfinder (through the Docker container) on the user-provided sequence to identify all potential, possibly overlapping G4s. It filters these G4s by a user-defined minimum pqsfinder score (-psd, e.g., 30) and writes them to an output file with the extension .fa.pqs.
The script then takes the output of the previous step as input and calculates the G4Hunter score for each identified G4, ensuring that the G4 motif is consistent with those defined in the g4DiscoveryFuncs.py script.
G4s with fewer than a specified number of tetrads (-t, e.g., 3) and scores below the set thresholds for both pqsfinder (-ps, e.g., 40) and G4Hunter (-hs, absolute score, e.g., 1.5) are filtered out.
Finally, G4s are grouped by starting position, and the highest-scoring, shortest G4 is selected from each region so that the output contains non-overlapping, stable G4s.
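The scoring, filtering and selection steps of the workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the code in g4Discovery.py or g4DiscoveryFuncs.py. Assumptions: the Candidate record is hypothetical; the G4Hunter score follows Bedrat et al. (2016), where each base in a G run scores +min(run length, 4), each base in a C run scores the negative, and other bases score 0; a G4 is kept only if it passes all three thresholds; "highest-scoring" means the pqsfinder score; and a "region" is treated as a chain of overlapping candidates on the same strand.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Candidate:
    """Hypothetical record for one pqsfinder hit (0-based, half-open coordinates)."""
    start: int
    end: int
    strand: str
    pqs_score: float
    tetrads: int
    sequence: str


def g4hunter_score(seq: str) -> float:
    """Mean G4Hunter score: each base in a G run scores +min(run, 4), in a C run -min(run, 4), others 0."""
    seq = seq.upper()
    total = 0
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        run = j - i
        if seq[i] == "G":
            total += min(run, 4) * run
        elif seq[i] == "C":
            total -= min(run, 4) * run
        i = j
    return total / len(seq) if seq else 0.0


def passes_thresholds(c: Candidate, min_tetrads=3, min_pqs=40.0, min_g4h=1.5) -> bool:
    # Assumption: a G4 is kept only if it passes all three thresholds.
    return (
        c.tetrads >= min_tetrads
        and c.pqs_score >= min_pqs
        and abs(g4hunter_score(c.sequence)) >= min_g4h
    )


def _best(cluster):
    # Highest pqsfinder score first; among ties, the shortest G4.
    return max(cluster, key=lambda c: (c.pqs_score, -(c.end - c.start)))


def select_non_overlapping(cands):
    """Keep one G4 per cluster of overlapping candidates on the same strand."""
    chosen = []
    ordered = sorted(cands, key=lambda c: (c.strand, c.start, c.end))
    for _, group in groupby(ordered, key=lambda c: c.strand):
        cluster, cluster_end = [], 0
        for c in group:
            if cluster and c.start >= cluster_end:
                chosen.append(_best(cluster))  # c starts a new, non-overlapping region
                cluster = []
            if not cluster:
                cluster_end = c.end
            cluster.append(c)
            cluster_end = max(cluster_end, c.end)
        if cluster:
            chosen.append(_best(cluster))
    return chosen
```

Collapsing a whole chain of overlaps to one winner is the simplest reading of "one G4 per region"; it can discard a candidate at one end of the chain that does not itself overlap the winner, which the real pipeline may handle differently.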

Prerequisites

Before using this package, ensure the following prerequisites are met:

1. Docker installed:
   - Install Docker if it has not already been installed on your system.
   - Refer to the Docker Installation Guide for platform-specific instructions.
2. Required Docker container:
   - The package requires the container kxk302/pqsfinder:1.0.0.
   - To pull the container after installing Docker, run:
     docker pull kxk302/pqsfinder:1.0.0
   - To verify that the required container is installed, run:
     docker images

For more information on the dockerized version of pqsfinder, please refer to the repository at: kxk302/pqsfinder-docker
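If you want to script the prerequisite check rather than read the output of docker images by eye, a small Python helper can ask Docker directly whether the image is present. This is a sketch and not part of g4discovery: docker_image_available is a hypothetical name, and the check relies on docker image inspect exiting with a non-zero code when the image is missing or the daemon is unreachable.

```python
import shutil
import subprocess

PQSFINDER_IMAGE = "kxk302/pqsfinder:1.0.0"


def docker_image_available(image: str = PQSFINDER_IMAGE, timeout: float = 30.0) -> bool:
    """True if the docker CLI exists, the daemon answers, and the image is present locally."""
    if shutil.which("docker") is None:
        return False  # Docker is not installed or not on PATH
    try:
        result = subprocess.run(
            ["docker", "image", "inspect", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return False  # daemon did not respond in time
    return result.returncode == 0
```

A False result does not say which prerequisite is missing; running docker pull kxk302/pqsfinder:1.0.0 by hand will surface the underlying error.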

Features

Dockerized Execution: pqsfinder runs inside a Docker container, so R and its packages do not need to be installed locally.
Flexible Motif Detection: Supports both standard ((G{3,}[ATCG]{1,12}){3,}G{3,}) and bulged ((G([ATC]{0,1})G([ATC]{0,1})G([ATCG]{1,3})){3,}G([ATC]{0,1})G([ATC]{0,1})G) G4 motifs.
Non-overlapping G4 Detection: Identifies non-overlapping G4 motifs on a given strand and prioritizes the most stable G4s within a region.
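The two motif regexes above can be tried directly with Python's re module. The sketch below is illustrative and assumes nothing about how g4DiscoveryFuncs.py applies them: find_motifs and reverse_complement are hypothetical helpers, minus-strand G4s are found by running the same G-based pattern on the reverse complement and mapping coordinates back, and re.finditer returns only non-overlapping, leftmost matches, unlike pqsfinder's exhaustive search.

```python
import re

# The two motif regexes listed under Features, compiled as-is.
STANDARD_G4 = re.compile(r"(G{3,}[ATCG]{1,12}){3,}G{3,}")
BULGED_G4 = re.compile(
    r"(G([ATC]{0,1})G([ATC]{0,1})G([ATCG]{1,3})){3,}G([ATC]{0,1})G([ATC]{0,1})G"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_motifs(seq: str, pattern: re.Pattern = STANDARD_G4):
    """Return (strand, start, end) hits as 0-based, half-open plus-strand coordinates.

    Minus-strand G4s appear as C-rich runs on the plus strand, so the same
    G-based pattern is applied to the reverse complement and mapped back.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [("+", m.start(), m.end()) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [("-", n - m.end(), n - m.start()) for m in pattern.finditer(rc)]
    return hits
```

For example, the telomeric repeat GGGTTAGGGTTAGGGTTAGGG matches the standard motif on the plus strand across its full 21 nt, while its reverse complement CCCTAACCCTAACCCTAACCC yields the same span on the minus strand.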

Usage

Command-line Usage

Running G4 Discovery:

Usage: g4Discovery.py [-h] -fa FASTA_FILE -chr CHROMOSOME -o OUTPUT [-t TETRAD] [-ps PQSSCORE] [-hs G4HUNTER] [-psd DOCKER_MIN_PQSSCORE]

Options:
  -h, --help            Show this help message and exit
  -fa FASTA_FILE, --fasta_file FASTA_FILE
                        Path to the input FASTA file
  -chr CHROMOSOME, --chromosome CHROMOSOME
                        Chromosome identifier, either an integer or a single letter
  -o OUTPUT, --output OUTPUT
                        Path to the output BED file
  -t TETRAD, --tetrad TETRAD
                        Minimum number of tetrads for a G4 to be considered
  -ps PQSSCORE, --pqsscore PQSSCORE
                        Minimum pqsfinder score for a G4 to be considered
  -hs G4HUNTER, --g4hunter G4HUNTER
                        Minimum absolute G4Hunter score for a G4 to be considered
  -psd DOCKER_MIN_PQSSCORE, --docker_min_pqsscore DOCKER_MIN_PQSSCORE
                        Minimum pqsfinder score used when running pqsfinder in Docker

Example Use Case

python3 g4Discovery.py -fa ../test/test.fa -chr 1 -o ../output/out.bed
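When the pipeline has to be run many times (for instance once per chromosome), it can be driven from Python. The sketch below only assembles the documented command-line flags and calls the script with subprocess; build_g4discovery_cmd and run_g4discovery are hypothetical names, the script is assumed to be in the current directory, and because default threshold values are not documented here, unset options are simply omitted so the script's own defaults apply.

```python
import subprocess
import sys


def build_g4discovery_cmd(fasta_file, chromosome, output, tetrad=None, pqsscore=None,
                          g4hunter=None, docker_min_pqsscore=None, script="g4Discovery.py"):
    """Build the argument list for one g4Discovery.py run using the documented flags."""
    cmd = [sys.executable, script, "-fa", fasta_file, "-chr", str(chromosome), "-o", output]
    # Optional thresholds are passed only when set, so the script's own defaults apply otherwise.
    for flag, value in (("-t", tetrad), ("-ps", pqsscore), ("-hs", g4hunter),
                        ("-psd", docker_min_pqsscore)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_g4discovery(**kwargs) -> None:
    """Run g4Discovery.py and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_g4discovery_cmd(**kwargs), check=True)
```

For example, run_g4discovery(fasta_file="../test/test.fa", chromosome=1, output="../output/out.bed") is equivalent to the example command above; adding tetrad=3 or g4hunter=1.5 appends -t 3 or -hs 1.5.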

Notes

The input FASTA file should contain only one sequence (e.g., the sequence of one chromosome), with a single header line that starts with the > symbol (e.g., >chr1 human CHM13).
The Docker daemon must be running in the background for the Python script to run successfully.
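Because the script expects exactly one sequence per FASTA file, a whole-genome or multi-contig FASTA has to be checked and split first. The following is a minimal sketch using only the Python standard library; count_records, split_records and write_single_record_files are hypothetical helpers, each output file is named after the first word of its header (records with duplicate names would overwrite each other), and any text before the first header is ignored.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def count_records(lines: Iterable[str]) -> int:
    """Number of FASTA records, i.e. header lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def split_records(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split FASTA text into (name, record_text) pairs; name is the first word of the header."""
    records: List[Tuple[str, List[str]]] = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else f"record{len(records) + 1}"
            records.append((name, [line]))
        elif records:  # skip anything before the first header
            records[-1][1].append(line)
    return [(name, "".join(chunk)) for name, chunk in records]


def write_single_record_files(fasta_path, out_dir) -> List[Path]:
    """Write each record of fasta_path to out_dir/<name>.fa and return the new paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(fasta_path) as fh:
        records = split_records(fh)
    paths = []
    for name, text in records:
        path = out / f"{name}.fa"
        path.write_text(text)
        paths.append(path)
    return paths
```

Each resulting file holds a single record and can be passed to g4Discovery.py with its own -chr value.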

References

Hon, J., Martínek, T., Zendulka, J., & Lexa, M. (2017). pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R. Bioinformatics, 33(21), 3373-3379. doi: 10.1093/bioinformatics/btx413
Bedrat, A., Lacroix, L., & Mergny, J. L. (2016). Re-evaluation of G-quadruplex propensity with G4Hunter. Nucleic Acids Research, 44(4), 1746-1759. doi: 10.1093/nar/gkw006

Citation

If you use this tool in your research, please cite the following paper:

Mohanty, S. K., Chiaromonte, F., & Makova, K. D. (2025). Evolutionary dynamics of predicted G-quadruplexes in human and other great apes. Genome Biology, 26, 161. doi: 10.1186/s13059-025-03635-1


Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this repository, please cite it as below."
title: "Evolutionary Dynamics of Predicted G-quadruplexes in Human and Other Great Apes"
authors:
  - family-names: "Mohanty"
    given-names: "Saswat K."
  - family-names: "Chiaromonte"
    given-names: "Francesca"
  - family-names: "Makova"
    given-names: "Kateryna D."
year: 2025
doi: "10.1186/s13059-025-03635-1"
url: "https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03635-1"
date-released: 2025-06-11
