https://github.com/bionf/fdog

Feature-aware Directed OrtholoG search

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.3%) to scientific vocabulary

Keywords

orthologs orthology orthology-inference python
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: BIONF
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 28.5 MB
Statistics
  • Stars: 10
  • Watchers: 4
  • Forks: 4
  • Open Issues: 5
  • Releases: 1
Created over 5 years ago · Last pushed 8 months ago
Metadata Files
Readme Changelog License

README.md

fDOG - Feature-aware Directed OrtholoG search

published in: MBE PyPI version License: GPL v3 Github Build

How to install

The fDOG tool is distributed as a Python package called fdog. It is compatible with Python ≥ 3.12.

Install the fDOG package

You can install fdog using pip:

python3 -m pip install fdog

If you do not have admin rights and do not use a package system such as Anaconda to manage environments, use the --user option instead:

python3 -m pip install --user fdog

Then add the following line to the end of your ~/.bashrc or ~/.bash_profile file, and restart the current terminal (or type source ~/.bashrc) to apply the change:

export PATH=$HOME/.local/bin:$PATH
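Before moving on to the setup, the two prerequisites mentioned above (a Python version of at least 3.12 and, for --user installs, ~/.local/bin on PATH) can be checked with a few lines of Python. This is only a minimal sketch; the function names python_is_supported and dir_on_path are made up for illustration and are not part of fdog.

```python
import os
import shutil
import sys
from pathlib import Path


def python_is_supported(version_info=sys.version_info, minimum=(3, 12)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = Path(directory).expanduser()
    return any(
        Path(entry).expanduser() == target
        for entry in path_value.split(os.pathsep)
        if entry
    )


if __name__ == "__main__":
    print("Python >= 3.12:", python_is_supported())
    print("~/.local/bin on PATH:", dir_on_path("~/.local/bin"))
    # shutil.which returns None when the command cannot be found on PATH.
    print("fdog.run found at:", shutil.which("fdog.run"))
```

If the last line prints None after a --user installation, the PATH entry above is most likely still missing from the current shell session.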

Setup fDOG

After installing fdog, you need to set it up to obtain its dependencies and pre-calculated data.

NOTE: If greedyFAS is not installed yet, it will be installed automatically during the fDOG setup. However, you need to run setupFAS after the fDOG setup has finished and before actually using fDOG!

You can set up fDOG by running this command:

fdog.setup -d /output/path/for/fdog/data

The pre-calculated data set of fdog will be saved in /output/path/for/fdog/data. After the setup has run successfully, you can start using fdog. Please make sure to check whether you need to run setupFAS first.

You will get a warning if any of the dependencies is not ready to use. Please resolve those issues and rerun fdog.setup.

To debug the setup, please create a log file by running the setup as, e.g.,

fdog.setup | tee log.txt

and send us that log file so that we can troubleshoot the issues. Most problems can be solved by simply re-running the setup.
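On systems where tee is not available, the same kind of log can be produced from Python. The helper below is a hypothetical sketch (run_and_log is not an fdog function): it runs a command, echoes its output to the terminal and copies it into a log file. Unlike the plain pipe above, it also merges stderr into the log, which is an assumption about what is useful for troubleshooting.

```python
import subprocess
import sys


def run_and_log(command, log_path):
    """Run `command`, echo its output and copy it into `log_path`.

    stderr is merged into stdout so that warnings end up in the log as well.
    Returns the exit code of the command.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        # Forward the output line by line while the command is running.
        for line in process.stdout:
            sys.stdout.write(line)
            log.write(line)
        return process.wait()
```

For the setup, this could be called as run_and_log(["fdog.setup", "-d", "/output/path/for/fdog/data"], "log.txt"), where the data path is the placeholder used above.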

Usage

fdog will run smoothly with the provided sample input file 'infile.fa' if everything is set up correctly:

fdog.run --seqFile infile.fa --jobName test --refspec HUMAN@9606@qfo24_02

The output files with the prefix test will be saved in your current working directory. You can get an overview of all available options with the command

fdog.run -h

Please find more information in our wiki to learn about the input and output files of fdog.
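When fdog is called from a Python script rather than the shell, the command above can be assembled as an argument list and passed to subprocess. The following is a minimal sketch that uses only the three options shown above; the function names build_fdog_run_command and run_fdog are hypothetical wrappers, not part of the fdog package.

```python
import subprocess


def build_fdog_run_command(seq_file, job_name, refspec):
    """Assemble the argument list for the fdog.run call shown above."""
    return [
        "fdog.run",
        "--seqFile", seq_file,
        "--jobName", job_name,
        "--refspec", refspec,
    ]


def run_fdog(seq_file, job_name, refspec):
    """Run fdog.run and raise CalledProcessError on a non-zero exit code."""
    command = build_fdog_run_command(seq_file, job_name, refspec)
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems, for example when the job name or the file name contains spaces.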

fDOG data set

Within the data package we provide a set of 81 reference taxa. They will be automatically downloaded during the setup. This data comes "ready to use" with the fdog framework. Species data must be present in the three directories listed below:

  • searchTaxa_dir (Contains sub-directories for proteome fasta files for each species)
  • coreTaxa_dir (Contains sub-directories for BLAST databases made with makeblastdb out of your proteomes)
  • annotation_dir (Contains feature annotation files for each proteome)

For each species/taxon there is a sub-directory named according to the naming scheme [Species acronym]@[NCBI ID]@[Proteome version], e.g. HUMAN@9606@qfo24_02.
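The naming scheme can be taken apart with a small regular expression, for example to check directory names before a run. This is an illustrative sketch only: TaxonName and parse_taxon_name are hypothetical names, and the pattern assumes that the acronym and the version contain no "@" and that the NCBI ID is purely numeric.

```python
import re
from typing import NamedTuple


class TaxonName(NamedTuple):
    acronym: str
    ncbi_id: int
    version: str


# Assumption: exactly three "@"-separated parts, the middle one numeric.
_TAXON_PATTERN = re.compile(r"^([^@]+)@(\d+)@([^@]+)$")


def parse_taxon_name(name):
    """Split '[Species acronym]@[NCBI ID]@[Proteome version]' into its parts."""
    match = _TAXON_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid taxon name: {name!r}")
    acronym, ncbi_id, version = match.groups()
    return TaxonName(acronym, int(ncbi_id), version)
```

Applied to the reference species used in the usage example, parse_taxon_name("HUMAN@9606@qfo24_02") yields the acronym HUMAN, the NCBI ID 9606 and the version qfo24_02.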

fdog is not limited to those 81 reference taxa. If needed, the user can manually add further gene sets (in multi-FASTA format) using the provided functions.

Adding a new gene set into fDOG

To add a single gene set, please use the fdog.addTaxon function:

fdog.addTaxon -f newTaxon.fa -i tax_id [-o /output/directory] [-n abbr_tax_name] [-c] [-v protein_version] [-a]

The first two arguments are required:

  • -f newTaxon.fa is the gene set that needs to be added
  • -i tax_id is its NCBI taxonomy ID

The other arguments are optional:

  • -o /output/directory is where the sub-directories (searchTaxa_dir, coreTaxa_dir and annotation_dir) can be found; if not given, the new taxon will be added to the directory of the pre-calculated data
  • -n specifies your own taxon name (if not given, an abbreviated name will be suggested based on the NCBI taxon name of the input tax_id)
  • -c calculates the BLAST DB (only needed if you want to include your new taxon in the list of taxa used for compiling the core set)
  • -v specifies the genome/proteome version (default is the current date)
  • -a turns off the annotation step (not recommended)

Adding a list of gene sets into fDOG

To add more than one gene set, please use the fdog.addTaxa script:

fdog.addTaxa -i /path/to/newtaxa/fasta -m mapping_file [-o /output/directory] [-c]

Here, /path/to/newtaxa/fasta is a folder that contains the FASTA files of all new taxa, and mapping_file is a tab-delimited text file in which you provide the taxonomy IDs that belong to the FASTA files:

```
#filename	tax_id	abbr_tax_name	version
filename1.fa	12345678
filename2.faa	9606
filename3.fasta	4932	my_fungi
...
```

The header line (starting with #) is mandatory. The values of the last two columns (abbreviated taxon name and genome version) are optional. However, if you want to specify a version for a genome, you must also define the abbreviated taxon name, so that the genome version is always in the 4th column of the mapping file.
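The rules for the mapping file can be checked before calling fdog.addTaxa. The following sketch is not part of fdog; parse_mapping is a hypothetical helper that simply encodes the rules stated above (header starting with #, tab-delimited columns, a version only together with a taxon name) and additionally assumes that taxonomy IDs are plain integers.

```python
def parse_mapping(text):
    """Parse mapping-file text into a list of dicts; raise ValueError on problems."""
    # Keep the original line numbers for error messages, skip blank lines.
    rows = [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip()
    ]
    if not rows or not rows[0][1].startswith("#"):
        raise ValueError("the first line must be a header starting with '#'")

    entries = []
    for number, line in rows[1:]:
        fields = [field.strip() for field in line.split("\t")]
        if not 2 <= len(fields) <= 4:
            raise ValueError(f"line {number}: expected 2 to 4 tab-separated columns")
        filename, tax_id = fields[0], fields[1]
        if not filename or not tax_id.isdecimal():
            raise ValueError(f"line {number}: need a file name and a numeric taxonomy ID")
        name = fields[2] if len(fields) > 2 else ""
        version = fields[3] if len(fields) > 3 else ""
        if version and not name:
            raise ValueError(f"line {number}: a version requires an abbreviated taxon name")
        entries.append({
            "filename": filename,
            "tax_id": int(tax_id),
            "name": name or None,
            "version": version or None,
        })
    return entries
```

Note that a literal "..." line, as used in the schematic example above, would be rejected by this check because it lacks a taxonomy ID.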

NOTE: After adding new taxa into fdog, you should check for the validity of the new data before running fdog.
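A first, very rough sanity check is to confirm that a newly added taxon shows up in the three data directories described earlier. The sketch below is a hypothetical helper and not a replacement for a proper validation: it only tests for presence, and it assumes that an entry is a file or sub-directory whose name is the taxon name itself or starts with the taxon name followed by a dot.

```python
from pathlib import Path

# Directory names as listed in the data set description.
DATA_DIRS = ("searchTaxa_dir", "coreTaxa_dir", "annotation_dir")


def taxon_presence(data_root, taxon):
    """Return {directory name: bool} telling where an entry for `taxon` exists."""
    root = Path(data_root)
    result = {}
    for dir_name in DATA_DIRS:
        folder = root / dir_name
        # Assumption: "<taxon>" or "<taxon>.<extension>" counts as an entry.
        result[dir_name] = folder.is_dir() and any(
            entry.name == taxon or entry.name.startswith(taxon + ".")
            for entry in folder.iterdir()
        )
    return result
```

A taxon added without the -c option would be expected to be missing from coreTaxa_dir, so a False value there is not necessarily an error.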

Bugs

Bug reports, comments and suggestions are highly appreciated. Please open an issue on GitHub or get in touch via email.

How to cite

Tran V, Langschied F, Muelbaier H, Dosch J, Arthen F, Balint M, Ebersberger I. 2025. Feature architecture-aware ortholog search with fDOG reveals the distribution of plant cell wall-degrading enzymes across life. Molecular Biology and Evolution:msaf120. https://doi.org/10.1093/molbev/msaf120

Contact

For further support or bug reports please contact: ebersberger@bio.uni-frankfurt.de

Owner

  • Name: BIONF
  • Login: BIONF
  • Kind: organization

GitHub Events

Total
  • Issues event: 3
  • Watch event: 2
  • Delete event: 3
  • Issue comment event: 15
  • Push event: 23
  • Gollum event: 10
  • Pull request event: 5
  • Fork event: 1
  • Create event: 13
Last Year
  • Issues event: 3
  • Watch event: 2
  • Delete event: 3
  • Issue comment event: 15
  • Push event: 23
  • Gollum event: 10
  • Pull request event: 5
  • Fork event: 1
  • Create event: 13

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 474
  • Total Committers: 3
  • Avg Commits per committer: 158.0
  • Development Distribution Score (DDS): 0.236
Top Committers
Name Email Commits
mueli94 h****r@g****m 362
trvinh t****h@g****m 108
mueli94 4****4@u****m 4

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 22
  • Total pull requests: 16
  • Average time to close issues: 7 months
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 10
  • Total pull request authors: 3
  • Average comments per issue: 2.86
  • Average comments per pull request: 0.25
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 3
  • Average time to close issues: 4 days
  • Average time to close pull requests: about 14 hours
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 12.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • trvinh (8)
  • majssssa (4)
  • elfring (2)
  • bheimbu (2)
  • paulmenzel (1)
  • Somnous1998 (1)
  • hyyuu (1)
  • Dream-sugar (1)
  • erya-song (1)
  • sashulkaSh (1)
  • wojiaonjp (1)
  • swttalyan (1)
  • huangziyan11111 (1)
Pull Request Authors
  • mueli94 (19)
  • trvinh (3)
  • HannahBioI (1)
Top Labels
Issue Labels
enhancement (6) bug (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 352 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 1
  • Total versions: 83
  • Total maintainers: 1
pypi.org: fdog

Feature-aware Directed OrtholoG search tool

  • Versions: 83
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 352 Last month
Rankings
Dependent packages count: 4.8%
Average: 16.5%
Downloads: 17.9%
Forks count: 19.1%
Stargazers count: 19.4%
Dependent repos count: 21.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/github_build.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • casperdcl/deploy-pypi v2 composite
setup.py pypi
  • PyYAML *
  • biopython *
  • ete3 *
  • greedyFAS >=1.11.2
  • pyhmmer *
  • pysam *
  • six *
  • tqdm *