https://github.com/alejandrogzi/postoga

The post-TOGA processing pipeline


Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.9%) to scientific vocabulary

Keywords

bioinformatics gene-annotation pipeline post-processing
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: alejandrogzi
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 4.75 MB
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 4
  • Open Issues: 0
  • Releases: 3
Topics
bioinformatics gene-annotation pipeline post-processing
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License

README.md

[!WARNING]

postoga depends on TOGA, so any change in TOGA may have repercussions here. If you find any bugs or errors, please report them here. This project is under constant development, and feature requests are welcome!

postoga

The post-TOGA processing pipeline.


What's new in version 0.9.3-devel

  • Re-implementation of postoga to match the TOGA2.0 output
  • Includes in-house Rust BED-to-GTF converters through rustools
  • Forces installation of bed2gtf, bed2gff and gxf2bed for quick access
  • Adds --extract to extract filtered projections from codon and protein alignments
  • Implements the --engine option to use polars
  • Now manages configuration and testing through 'make configure' and 'make test'
  • Adds a license
  • Drops the --skip argument and adds a --plot argument to plot stats [currently broken]
  • Adds BUSCO and completeness stats to the main log file
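The converters listed above (bed2gtf and related tools via rustools) are Rust programs. As a rough illustration of what a BED-to-GTF conversion involves, here is a minimal Python sketch that turns one BED12 record into GTF `exon` and `CDS` lines. It is not the rustools/bed2gtf implementation: the function name `bed12_to_gtf`, the `source` value, and reusing the BED name as both `gene_id` and `transcript_id` are assumptions, and UTR, start-codon and stop-codon features are deliberately omitted.

```python
from typing import List


def bed12_to_gtf(line: str, source: str = "sketch") -> List[str]:
    """Convert one BED12 line into GTF exon and CDS lines (illustrative only)."""
    f = line.rstrip("\n").split("\t")
    chrom, start, name, strand = f[0], int(f[1]), f[3], f[5]
    thick_start, thick_end = int(f[6]), int(f[7])
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    offsets = [int(x) for x in f[11].rstrip(",").split(",")]

    # BED blocks are 0-based, half-open and relative to chromStart.
    blocks = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    # Hypothetical attributes: the BED name stands in for both IDs.
    attrs = f'gene_id "{name}"; transcript_id "{name}";'
    out = []
    for b0, b1 in blocks:
        # GTF is 1-based and inclusive: shift the start by one, keep the end.
        out.append("\t".join([chrom, source, "exon", str(b0 + 1), str(b1),
                              ".", strand, ".", attrs]))

    # CDS pieces are the parts of each block inside [thickStart, thickEnd).
    cds = [(max(b0, thick_start), min(b1, thick_end)) for b0, b1 in blocks]
    cds = [(c0, c1) for c0, c1 in cds if c0 < c1]
    # Frame depends on how many coding bases precede a piece in
    # transcription order, so walk minus-strand pieces from the 3' genomic end.
    ordered = cds if strand == "+" else list(reversed(cds))
    coding_so_far = 0
    for c0, c1 in ordered:
        frame = (3 - coding_so_far % 3) % 3
        out.append("\t".join([chrom, source, "CDS", str(c0 + 1), str(c1),
                              ".", strand, str(frame), attrs]))
        coding_so_far += c1 - c0
    return out
```

For example, a two-block plus-strand record whose first coding piece is 20 bp long yields a second `CDS` line with frame 1, because one more base is needed to complete the split codon.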

Usage

To use postoga:

Clone the repository:

```bash
# clone the repository
git clone --recursive https://github.com/alejandrogzi/postoga.git
cd postoga
```

Activate the environment and configure the binaries:

```bash
pip install hatch
make configure
```

Run the test to confirm functionality (you can also run the test directly from the configuration step):

```bash
make test
```

If you see something like this at the end, postoga is ready:

```text
postoga: the post-TOGA processing pipeline
version: 0.9.3-devel

[2024-10-15 12:25:05] - INFO: postoga started!
[2024-10-15 12:25:05] - INFO: running in mode base with arguments: {'mode': 'base', 'outdir': '/Users/alejandrogzi/Documents/projects/postoga/POSTOGATEST', 'togadir': '/Users/alejandrogzi/Documents/projects/postoga/supply/test', 'byclass': 'I,PI', 'byrel': None, 'threshold': 0.95, 'to': 'gtf', 'assemblyqual': PosixPath('/Users/alejandrogzi/Documents/projects/postoga/supply/Ancestralplacentalcomplete.txt'), 'species': 'human', 'source': 'ensembl', 'phylo': 'mammals', 'plot': False, 'paralog': None, 'isoforms': None, 'engine': 'pandas'}
[2024-10-15 12:25:05] - INFO: found 52 projections, 49 unique transcripts, 44 unique genes
```
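The last log line reports three different counts because several projections can come from one transcript, and several transcripts (isoforms) can belong to one gene. A minimal sketch of that bookkeeping follows. It assumes, hypothetically, that a projection ID has the form `<transcript>.<chain_id>` and that an isoform table maps transcripts to genes; neither the ID layout nor the function name `summarize_projections` is taken from postoga.

```python
from typing import Dict, Iterable, Tuple


def summarize_projections(
    projection_ids: Iterable[str], isoforms: Dict[str, str]
) -> Tuple[int, int, int]:
    """Return (projections, unique transcripts, unique genes)."""
    projections = set(projection_ids)
    # Assumed ID layout "<transcript>.<chain_id>": drop the trailing chain ID.
    transcripts = {pid.rsplit(".", 1)[0] for pid in projections}
    # Collapse transcripts into genes through a transcript -> gene table.
    genes = {isoforms[t] for t in transcripts if t in isoforms}
    return len(projections), len(transcripts), len(genes)


print(summarize_projections(
    ["T1.10", "T1.22", "T2.5", "T3.7"],
    {"T1": "G1", "T2": "G1", "T3": "G2"},
))  # (4, 3, 2)
```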

Here is a description of postoga's features:

[!TIP]

If the only thing you want to do is apply some filters to a TOGA result or convert results to GTF/GFF files, I recommend the following command:

```bash
./postoga.py base \
    --togadir /your/TOGA/dir \
    --outdir /your/out/dir \
    -bc [YOUR CLASSES] \
    -br [YOUR RELATIONS] \
    -th [YOUR THRESHOLD] \
    -to [YOUR FORMAT GTF/GFF/BED]
```
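The filters in that command keep projections whose orthology class is in `-bc`, whose relationship is in `-br`, and whose orthology score is at least `-th`; `-par` keeps paralog projection probabilities at or below its value. A minimal sketch of those semantics on plain Python records follows. The `Projection` fields and the `apply_filters` name are hypothetical, not TOGA or postoga column names; the comma-separated class string mirrors the `'byclass': 'I,PI'` value in the test log.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Projection:
    # Hypothetical record: field names are illustrative only.
    projection_id: str
    orth_class: str      # orthology class, e.g. "I", "PI", "UL", "L"
    relation: str        # orthology relationship, e.g. "o2o", "o2m"
    orth_score: float    # orthology score in [0.0, 1.0]
    paralog_prob: float  # paralog projection probability in [0.0, 1.0]


def apply_filters(
    rows: Iterable[Projection],
    by_class: Optional[str] = None,     # comma-separated, e.g. "I,PI"
    by_rel: Optional[str] = None,       # comma-separated, e.g. "o2o,o2m"
    threshold: Optional[float] = None,  # keep orth_score >= threshold
    paralog: Optional[float] = None,    # keep paralog_prob <= paralog
) -> List[Projection]:
    classes = set(by_class.split(",")) if by_class else None
    relations = set(by_rel.split(",")) if by_rel else None
    kept = []
    for row in rows:
        if classes is not None and row.orth_class not in classes:
            continue
        if relations is not None and row.relation not in relations:
            continue
        if threshold is not None and row.orth_score < threshold:
            continue
        if paralog is not None and row.paralog_prob > paralog:
            continue
        kept.append(row)
    return kept
```

Each filter is optional and they combine with AND, so omitting a flag simply skips that check.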

```text
usage: postoga.py [-h] {base,haplotype}

positional arguments:
  {base,haplotype}      Select mode
    base                Base mode
    haplotype           Haplotype mode

postoga.py base [-h] --outdir OUTDIR --togadir TOGADIR [-bc BY_CLASS] [-br BY_REL] [-th THRESHOLD] -to {gtf,gff,bed} [-aq ASSEMBLY_QUAL] [-sp {human,mouse,chicken}] [-src {ensembl,gene_name,entrez}] [-phy {mammals,birds}] [-s] [-par PARALOG]

optional arguments:
  -h, --help            show this help message and exit
  --outdir OUTDIR, -o OUTDIR
                        Path to posTOGA output directory
  --togadir TOGADIR, --td TOGADIR
                        Path to TOGA results directory
  -bc BY_CLASS, --by-class BY_CLASS
                        Filter parameter to only include certain orthology classes (I, PI, UL, M, PM, L, UL)
  -br BY_REL, --by-rel BY_REL
                        Filter parameter to only include certain orthology relationships (o2o, o2m, m2m, m2m, o2z)
  -th THRESHOLD, --threshold THRESHOLD
                        Filter parameter to preserve orthology scores greater than or equal to a given threshold (0.0 - 1.0)
  -to {gtf,gff,bed}, --to {gtf,gff,bed}
                        Specify the conversion format for the .bed (query_annotation/filtered) file (gtf, gff3) or just keep it as .bed (bed)
  -aq ASSEMBLY_QUAL, --assemblyqual ASSEMBLY_QUAL
                        Calculate assembly quality based on a list of genes provided by the user (default: Ancestralplacental.txt)
  -sp {human,mouse,chicken}, --species {human,mouse,chicken}
                        Species name to be used as a reference for the assembly quality calculation (default: human)
  -src {ensembl,gene_name,entrez}, --source {ensembl,gene_name,entrez}
                        Source of the ancestral gene names (default: ENSG)
  -phy {mammals,birds}, --phylo {mammals,birds}
                        Phylogenetic group of your species (default: mammals)
  -par PARALOG, --paralog PARALOG
                        Filter parameter to preserve transcripts with paralog projection probabilities less than or equal to a given threshold (0.0 - 1.0)
  -iso ISOFORMS, --isoforms ISOFORMS
                        Path to a custom isoform table (default: None)
  -e {pandas,polars}, --engine {pandas,polars}
                        Database engine to create inner db representations (default: pandas)
  -p, --plot            Flag to plot statistics about the filtered genes (default: False)
  -ext [{query,reference}], --extract [{query,reference}]
                        Flag or option to extract sequences (only codon and protein alignments) from the filtered genes. Can be 'query', 'reference', or just set as a flag (default: False). When used as a flag, extracting 'query' sequences is the default.

postoga.py haplotype [-h] --outdir OUTDIR -hp HAPLOTYPE_DIR [-r RULE] [-s {query,loss}]

optional arguments:
  -h, --help            show this help message and exit
  --outdir OUTDIR, -o OUTDIR
                        Path to posTOGA output directory
  -hp HAPLOTYPE_DIR, --haplotypedir HAPLOTYPE_DIR
                        Path to TOGA results directories separated by commas (path1,path2,path3)
  -r RULE, --rule RULE  Rule to merge haplotype assemblies (default: I>PI>UL>L>M>PM>PG>abs)
  -s {query,loss}, --source {query,loss}
                        Source of the haplotype classes (query, loss)
```
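The haplotype `--rule` string reads as a priority order over classes, from best (`I`) to worst (`abs`). One plausible reading, sketched below, is that for each transcript the merged class is the best-ranked class seen across haplotypes. This is an assumption about the semantics, not postoga's merging code: the `merge_haplotypes` name, the dict-per-haplotype input, and treating a transcript missing from a haplotype as `abs` are all hypothetical.

```python
from typing import Dict, List


def merge_haplotypes(
    haplotypes: List[Dict[str, str]], rule: str = "I>PI>UL>L>M>PM>PG>abs"
) -> Dict[str, str]:
    """Pick, per transcript, the highest-priority class across haplotypes."""
    # Lower index means higher priority; unknown classes rank last.
    rank = {cls: i for i, cls in enumerate(rule.split(">"))}
    transcripts = set().union(*(h.keys() for h in haplotypes))
    merged = {}
    for t in sorted(transcripts):
        # Assumption: a transcript absent from a haplotype counts as "abs".
        classes = [h.get(t, "abs") for h in haplotypes]
        merged[t] = min(classes, key=lambda c: rank.get(c, len(rank)))
    return merged


hap1 = {"T1": "PI", "T2": "L"}
hap2 = {"T1": "I", "T3": "M"}
print(merge_haplotypes([hap1, hap2]))  # {'T1': 'I', 'T2': 'L', 'T3': 'M'}
```

Passing a different `rule` string reorders the priorities without touching the merge logic.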

Owner

  • Name: Alejandro Gonzales-Irribarren
  • Login: alejandrogzi
  • Kind: user

GitHub Events

Total
  • Issues event: 4
  • Watch event: 2
  • Issue comment event: 7
  • Push event: 11
  • Pull request event: 3
  • Fork event: 1
Last Year
  • Issues event: 4
  • Watch event: 2
  • Issue comment event: 7
  • Push event: 11
  • Pull request event: 3
  • Fork event: 1

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 98
  • Total Committers: 2
  • Avg Commits per committer: 49.0
  • Development Distribution Score (DDS): 0.061
Past Year
  • Commits: 98
  • Committers: 2
  • Avg Commits per committer: 49.0
  • Development Distribution Score (DDS): 0.061
Top Committers
Name Email Commits
alejandrogzi j****1@u****e 92
mahajrod m****d@g****m 6
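The committer statistics can be reproduced from the per-committer counts above. Assuming the Development Distribution Score is defined as one minus the share of commits made by the most active committer, the reported values follow directly: 98 commits over 2 committers give 49.0 on average, and 1 - 92/98 is about 0.061.

```python
# Commit counts taken from the Top Committers table.
commits = {"alejandrogzi": 92, "mahajrod": 6}

total = sum(commits.values())
avg = total / len(commits)
# Assumed definition: DDS = 1 - (top committer's commits / total commits).
dds = 1 - max(commits.values()) / total

print(total, avg, round(dds, 3))  # 98 49.0 0.061
```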

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 12 hours
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 4.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 12 hours
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 4.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mili-ai (1)
  • MauriAndresMU1313 (1)
Pull Request Authors
  • shjenkins94 (4)
  • alejandrogzi (2)
  • mahajrod (1)
  • ning-y (1)
Top Labels
Issue Labels
Pull Request Labels
enhancement (1)