g4discovery
Pipeline for annotating/predicting G-Quadruplexes (G4s) in a genome sequence, combining pqsfinder and G4Hunter.
G4 Discovery Pipeline (requires Docker)
Scripts for annotating/predicting G-Quadruplexes (G4s) in a genome sequence, combining pqsfinder and G4Hunter.
For a non-dockerized implementation (also with PanSN support), visit the stable fork: g4Discovery.PanSN
Overview
This repository provides a Python script that predicts G-quadruplex (G4) structures in a FASTA sequence. It takes a FASTA file as input and writes a BED file containing non-overlapping G4s for each strand.
Workflow:
- The script first runs the `pqsfinder` tool on the user-provided sequence to identify all potential overlapping G4s. It filters the G4s based on a user-defined `pqsfinder` score threshold (e.g., 30) and generates an output file with the extension `.fa.pqs`.
- The script then takes the output of the previous step as input and calculates the `G4Hunter` score for each identified G4, ensuring that the G4 motif is consistent with those defined in the `g4DiscoveryFuncs.py` script.
- G4s with fewer than a specified number of tetrads (e.g., 3) and scores below the set thresholds for both `pqsfinder` (e.g., 40) and `G4Hunter` (e.g., 1.5) are filtered out.
- Finally, G4s are grouped by starting position, with the highest-scoring, shortest G4 selected from each region to ensure non-overlapping, stable G4s.
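The scoring and selection steps above can be sketched in a few lines of Python. This is an illustration only, not the code in `g4DiscoveryFuncs.py`: the `Candidate` record and the function names are hypothetical, the G4Hunter score follows the published definition (each G or C in a run of length n contributes +min(n, 4) or -min(n, 4), averaged over the sequence), and two points are assumptions: a G4 is kept only when it meets all three minimums, and "highest-scoring" refers to the pqsfinder score.

```python
from dataclasses import dataclass
from itertools import groupby


def g4hunter_score(seq: str) -> float:
    """Mean per-base G4Hunter score as defined by Bedrat et al. (2016)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    total = 0
    for base, run in groupby(seq):
        n = len(list(run))
        weight = min(n, 4)  # runs longer than 4 are capped at 4 per base
        if base == "G":
            total += weight * n
        elif base == "C":
            total -= weight * n
    return total / len(seq)


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one pqsfinder hit (not the script's own type)."""
    start: int
    end: int
    strand: str
    pqs_score: int
    tetrads: int
    sequence: str


def passes_filters(c, min_tetrads=3, min_pqs=40, min_g4hunter=1.5):
    # Assumption: a G4 is kept only if it meets every minimum.
    return (
        c.tetrads >= min_tetrads
        and c.pqs_score >= min_pqs
        and abs(g4hunter_score(c.sequence)) >= min_g4hunter
    )


def best_per_start(candidates):
    """Keep one G4 per (strand, start): highest pqsfinder score, then shortest."""
    best = {}
    for c in candidates:
        key = (c.strand, c.start)
        rank = (-c.pqs_score, c.end - c.start)  # smaller tuple is better
        if key not in best or rank < best[key][0]:
            best[key] = (rank, c)
    return sorted((c for _, c in best.values()), key=lambda c: (c.strand, c.start))
```

Taking the absolute value of the G4Hunter score mirrors the `-hs` option, which is described as a minimum absolute score, so C-rich sequences (G4s on the opposite strand) are treated symmetrically. Grouping by start position alone does not by itself rule out overlaps between hits with different starts; how the script resolves those is not shown here.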
Prerequisites
Before using this package, ensure the following prerequisites are met:
1. Docker Installed:
- Install Docker if it has not already been installed on your system.
- Refer to the Docker Installation Guide for platform-specific instructions.
2. Required Docker Container:
- The package requires the container kxk302/pqsfinder:1.0.0.
- To pull the container once Docker is installed, run: docker pull kxk302/pqsfinder:1.0.0
- To verify the installation of the required container, run: docker images
For more information on the dockerized version of pqsfinder, please refer to the repository at: kxk302/pqsfinder-docker
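For scripted setups, the same check can be done from Python before launching the pipeline. The sketch below is not part of this repository; the function name `image_available` and its injectable `which` and `run` parameters are hypothetical and exist only to make the check easy to test. It relies on the `docker image inspect` subcommand, which exits with a non-zero status when the image is not present locally.

```python
import shutil
import subprocess

IMAGE = "kxk302/pqsfinder:1.0.0"


def image_available(image=IMAGE, which=shutil.which, run=subprocess.run):
    """Return True if the Docker CLI is on PATH and the image exists locally."""
    if which("docker") is None:
        return False  # Docker CLI not installed or not on PATH
    result = run(
        ["docker", "image", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    if not image_available():
        print(f"Missing image; try: docker pull {IMAGE}")
```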
Features
- Dockerized Execution: Fully containerized to run independently without requiring R language/packages.
- Flexible Motif Detection: Supports both standard `((G{3,}[ATCG]{1,12}){3,}G{3,})` and bulged `((G([ATC]{0,1})G([ATC]{0,1})G([ATCG]{1,3})){3,}G([ATC]{0,1})G([ATC]{0,1})G)` G4 motifs.
- Non-overlapping G4 Detection: Identifies non-overlapping G4 motifs on a given strand and prioritizes the most stable G4s within a region.
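To see what the two motif patterns accept, they can be tried directly with Python's `re` module. This is a minimal sketch: the patterns are copied from the list above, while the `classify` helper and the choice of `fullmatch` (rather than scanning with `search`) are assumptions made for illustration and may differ from how the script applies them.

```python
import re

# Patterns as listed in the Features section.
STANDARD = re.compile(r"((G{3,}[ATCG]{1,12}){3,}G{3,})")
BULGED = re.compile(
    r"((G([ATC]{0,1})G([ATC]{0,1})G([ATCG]{1,3})){3,}"
    r"G([ATC]{0,1})G([ATC]{0,1})G)"
)


def classify(seq: str):
    """Return 'standard', 'bulged', or None for a candidate G4 sequence."""
    seq = seq.upper()
    if STANDARD.fullmatch(seq):
        return "standard"
    if BULGED.fullmatch(seq):
        return "bulged"
    return None


if __name__ == "__main__":
    for s in ("GGGTTAGGGTTAGGGTTAGGG", "GAGGTTGGAGTTGGGTTGGTG", "ATATATAT"):
        print(s, classify(s))
```

The first sequence has four uninterrupted G-runs separated by short loops, so it fits the standard pattern. The second interrupts some G-runs with a single non-G base, which only the bulged pattern tolerates. Both patterns describe G-rich sequences as written; a G4 on the opposite strand appears C-rich in the input and would need to be reverse-complemented first.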
Usage
Command-line Usage
Running G4 Discovery:
Use case: g4Discovery.py [-h] -fa FASTA_FILE -chr CHROMOSOME -o OUTPUT [-t TETRAD] [-ps PQSSCORE] [-hs G4HUNTER] [-psd DOCKER_MIN_PQSSCORE]

options:
  -h, --help            show this help message and exit
  -fa FASTA_FILE, --fasta_file FASTA_FILE
                        Path to the input FASTA file
  -chr CHROMOSOME, --chromosome CHROMOSOME
                        Chromosome identifier, either an integer or a single-letter
  -o OUTPUT, --output OUTPUT
                        Path to the output BED file
  -t TETRAD, --tetrad TETRAD
                        Minimum number of tetrads for a G4 to be considered
  -ps PQSSCORE, --pqsscore PQSSCORE
                        Minimum pqsfinder score for a G4 to be considered
  -hs G4HUNTER, --g4hunter G4HUNTER
                        Minimum absolute G4Hunter score for a G4 to be considered
  -psd DOCKER_MIN_PQSSCORE, --docker_min_pqsscore DOCKER_MIN_PQSSCORE
                        Minimum pqsfinder score for the docker to run
Example Use Case
python3 g4Discovery.py -fa ../test/test.fa -chr 1 -o ../output/out.bed
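When the script has to be run for several chromosomes, it can help to assemble the command line programmatically. The helper below is a hypothetical wrapper, not something shipped with this repository. It only uses the flags documented in the options list; the threshold values in the usage note are the example values from the Workflow section, not confirmed defaults.

```python
import subprocess


def build_command(fasta, chromosome, output, tetrad=None, pqsscore=None,
                  g4hunter=None, docker_min_pqsscore=None,
                  script="g4Discovery.py"):
    """Assemble the argument list for one g4Discovery.py run."""
    cmd = ["python3", script,
           "-fa", str(fasta), "-chr", str(chromosome), "-o", str(output)]
    optional = {
        "-t": tetrad,
        "-ps": pqsscore,
        "-hs": g4hunter,
        "-psd": docker_min_pqsscore,
    }
    for flag, value in optional.items():
        if value is not None:  # omitted flags fall back to the script's defaults
            cmd += [flag, str(value)]
    return cmd


def run_discovery(*args, **kwargs):
    """Run the pipeline and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(*args, **kwargs), check=True)
```

For example, `build_command("../test/test.fa", 1, "../output/out.bed", tetrad=3, pqsscore=40, g4hunter=1.5, docker_min_pqsscore=30)` yields the command from the example above with all four thresholds set explicitly. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.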
Notes
- The input FASTA file should contain only one sequence (e.g. the sequence from one chromosome), with a single identifier that starts with the `>` symbol (e.g. `>chr1 human CHM13`).
- The Docker daemon must be active in the background for the Python script to run successfully.
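The single-sequence requirement is easy to check before starting a long run. The following sketch is not part of the pipeline; `fasta_headers` and `single_header` are hypothetical helper names. It counts header lines and rejects inputs that do not contain exactly one.

```python
def fasta_headers(lines):
    """Return the header text (without '>') of every record in a FASTA stream."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def single_header(lines):
    """Return the only header, or raise ValueError if there is not exactly one."""
    headers = fasta_headers(lines)
    if len(headers) != 1:
        raise ValueError(
            f"expected exactly one FASTA record, found {len(headers)}"
        )
    return headers[0]


def check_fasta(path):
    """Validate a FASTA file on disk and return its single header."""
    with open(path) as handle:
        return single_header(handle)
```

A multi-record FASTA can be split into one file per sequence first, and each file then passed to the script separately.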
References
- Hon, J., Martínek, T., Zendulka, J., & Lexa, M. (2017). pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R. Bioinformatics, 33(21), 3373-3379. doi: 10.1093/bioinformatics/btx413
- Bedrat, A., Lacroix, L., & Mergny, J. L. (2016). Re-evaluation of G-quadruplex propensity with G4Hunter. Nucleic acids research, 44(4), 1746-1759. doi: 10.1093/nar/gkw006
Citation
If you use this tool in your research, please cite the following paper:
Mohanty, S. K., Chiaromonte, F., & Makova, K. D. (2025). Evolutionary dynamics of predicted G-quadruplexes in human and other great apes. Genome Biology, 26(161). doi: 10.1186/s13059-025-03635-1
Owner
- Name: makovalab-psu
- Login: makovalab-psu
- Kind: organization
- Repositories: 13
- Profile: https://github.com/makovalab-psu
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this repository, please cite it as below."
title: "Evolutionary Dynamics of Predicted G-quadruplexes in Human and Other Great Apes"
authors:
  - family-names: "Mohanty"
    given-names: "Saswat K."
  - family-names: "Chiaromonte"
    given-names: "Francesca"
  - family-names: "Makova"
    given-names: "Kateryna D."
year: 2025
doi: "10.1186/s13059-025-03635-1"
url: "https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03635-1"
date-released: 2025-06-11