vicinityanalyzer

Check the neighbouring genes in KEGG genomes for KEGG Orthology (KO), domains and keywords

https://github.com/gaenssle/vicinityanalyzer

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file (found)
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (8.7%) to scientific vocabulary

Keywords

filtering kegg kegg-genome kegg-ortholog


Repository

Basic Info

Host: GitHub
Owner: gaenssle
License: apache-2.0
Language: Python
Default Branch: main
Homepage:
Size: 61.5 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Topics

filtering kegg kegg-genome kegg-ortholog

Created over 2 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License Citation

Vicinity Analyzer

Created in 2023 by gaenssle, written in Python 3.8

Process

It accepts three types of input:
- A KEGG Orthology (KO) ID (e.g. K22276)
- One or more KEGG gene IDs (e.g. cak:Caul_3276), separated by ',' without spaces
- A file (table) with gene IDs (.txt or .csv), with the column header 'ID'
If the input is a KO ID, all associated gene IDs are first downloaded from KEGG
Determine the gene label increment (1, 5 or 10) for each corresponding KEGG genome
Download all neighbouring genes within the given range (default = +/-5)
Count the occurrences of the provided targets:
- KO ID
- Pfam domain
- Keyword in assigned name
- Targets may also be provided as a file of target-type pairs (type=[KO-ID, Domain, Name])
Export the accumulated neighbours and the occurrence counts
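
The neighbour lookup can be illustrated with a short sketch. It is not taken from the project's code: it assumes that a gene ID ends in a numeric label (as in cak:Caul_3276), that the label increment for the genome is already known, and that the zero-padding of the label is kept. The function name `neighbour_ids` is hypothetical.

```python
import re


def neighbour_ids(gene_id, increment=1, gene_range=5):
    """Return {offset: gene ID} for all positions within +/- gene_range.

    Assumption: the gene ID ends in a numeric label and neighbouring
    genes differ by a fixed increment (1, 5 or 10).
    """
    match = re.fullmatch(r"(.*?)(\d+)", gene_id)
    if match is None:
        raise ValueError(f"no numeric label found in {gene_id!r}")
    prefix, digits = match.groups()
    centre = int(digits)
    neighbours = {}
    for offset in range(-gene_range, gene_range + 1):
        if offset == 0:
            continue  # skip the input gene itself
        number = centre + offset * increment
        if number < 0:
            continue  # labels cannot be negative
        # Keep the original zero-padding width of the label
        neighbours[offset] = f"{prefix}{number:0{len(digits)}d}"
    return neighbours


for offset, gene in neighbour_ids("cak:Caul_3276", increment=1, gene_range=2).items():
    print(offset, gene)
```

Whether a generated ID actually exists in the genome still has to be checked against KEGG; the sketch only produces candidate IDs.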

Data

Downloaded sequence data:
- Assigned description
- Organism and taxonomy
- Sequence
- Domain architecture
- Available IDs from other databases
Count each target:
- Per entry (occurrences for each gene ID)
- Per range (occurrences at each range position)
Save data in the following files:
- Gene IDs (provided or downloaded via entered KO-ID)
- Gene details for each neighbor (organism, architecture, sequence, etc)
- Count files (entry and range)
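
The two kinds of counts can be sketched as follows. This is an illustration only, not the project's implementation: it uses `collections.Counter` instead of pandas to stay dependency-free, the record fields (`entry`, `offset`, `ko`, `domains`, `name`) are hypothetical, the sample records are invented, and case-insensitive keyword matching is an assumption.

```python
from collections import Counter


def matches(neighbour, target, target_type):
    """Return True if one neighbouring gene matches one target."""
    if target_type == "KO-ID":
        return neighbour["ko"] == target
    if target_type == "Domain":
        return target in neighbour["domains"]
    if target_type == "Name":
        # Assumption: keywords are matched case-insensitively
        return target.lower() in neighbour["name"].lower()
    raise ValueError(f"unknown target type: {target_type}")


def count_targets(neighbours, targets):
    """Count target hits per input gene ID and per range position."""
    per_entry = Counter()
    per_position = Counter()
    for neighbour in neighbours:
        for target, target_type in targets:
            if matches(neighbour, target, target_type):
                per_entry[(neighbour["entry"], target)] += 1
                per_position[(neighbour["offset"], target)] += 1
    return per_entry, per_position


# Invented example records, one per neighbouring gene
neighbours = [
    {"entry": "cak:Caul_3276", "offset": -1, "ko": "K21572",
     "domains": ["SusD-like"], "name": "SusD family protein"},
    {"entry": "cak:Caul_3276", "offset": 1, "ko": "",
     "domains": ["RagB"], "name": "hypothetical protein"},
]
targets = [("K21572", "KO-ID"), ("SusD", "Name")]

per_entry, per_position = count_targets(neighbours, targets)
print(dict(per_entry))
print(dict(per_position))
```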

Dependencies

The program uses Python 3.8 and the following modules:
- pandas
- argparse
- Bio (Bio.KEGG)
- ssl, urllib.request
- os, re, io

How to use

```
Main.py [-h] [-ask] [-ti TARGETID] [-td TARGETDOMAIN] [-tn TARGETNAME]
        [-tf TARGETFILE] [-a ACTION] [-r RANGE] [-n NAME] [-f FOLDER]
        [-cs CLUSTERSIZE] [-ft FILETYPE] [-sep SEPARATOR]
        input

VICINITY ANALYZER
This program downloads neighboring genes from KEGG genomes via KEGG.API
The input is either a KO-ID or a list of gene IDs (in file or as list)
All gene IDs within the given range of the provided ID(s) are obtained from KEGG
It then downloads relevant details for each gene ID, e.g. organism and domain architecture
The occurrences of each provided target is counted per entry and per position

positional arguments:
  input                 KO ID, KEGG gene ID(s) or file containing KEGG gene IDs
                        (e.g. blb:BBMN681454,blf:BLIF1909)

optional arguments:
  -h, --help            show this help message and exit
  -ask, --askoverwrite  ask before overwriting files
  -ti TARGETID, --targetID TARGETID
                        target KO ID(s) used for filtering (e.g. K21572)
  -td TARGETDOMAIN, --targetDomain TARGETDOMAIN
                        target domain name(s) (Pfam) used for filtering
                        (e.g. RagB,SusD-like)
  -tn TARGETNAME, --targetName TARGETNAME
                        target keword(s) in name/annotation used for filtering
                        (e.g. RagB,SusD)
  -tf TARGETFILE, --targetFile TARGETFILE
                        File containing target;type pairs used for filtering
                        (types=[KO-ID, Name, Domain], sep=-sep)
  -a ACTION, --action ACTION
                        add actions to be conducted: a=all, i=retrieve IDs,
                        g=get neighbors, f=filter with target(default: a)
  -r RANGE, --range RANGE
                        +/- range in which genes will be searched (default: 5)
  -n NAME, --name NAME  name of files (default: same as 'input')
  -f FOLDER, --folder FOLDER
                        name of the parent folder (default: same as 'input')
  -cs CLUSTERSIZE, --clustersize CLUSTERSIZE
                        entries/frament files (default: 25)
  -ft FILETYPE, --filetype FILETYPE
                        type of the generated files (default: .csv)
  -sep SEPARATOR, --separator SEPARATOR
                        separator between columns in the output files
                        (default: ;)
```
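
To show how a few of these options fit together, here is a minimal `argparse` sketch. It is not the project's Main.py: it declares only a subset of the flags listed above, and the helpers `classify_input` and `collect_targets` are hypothetical. Telling the input types apart by a `K` + five digits pattern and by file extension is an assumption.

```python
import argparse
import re


def build_parser():
    """Declare a subset of the documented command-line options."""
    parser = argparse.ArgumentParser(prog="Main.py")
    parser.add_argument("input", help="KO ID, KEGG gene ID(s) or file with gene IDs")
    parser.add_argument("-ti", "--targetID", default="", help="target KO ID(s)")
    parser.add_argument("-td", "--targetDomain", default="", help="target Pfam domain(s)")
    parser.add_argument("-tn", "--targetName", default="", help="target keyword(s)")
    parser.add_argument("-r", "--range", type=int, default=5, help="+/- search range")
    parser.add_argument("-sep", "--separator", default=";", help="column separator")
    return parser


def classify_input(value):
    """Guess the input type (assumed rules, not the project's own)."""
    if re.fullmatch(r"K\d{5}", value):
        return "KO"
    if value.lower().endswith((".txt", ".csv")):
        return "file"
    return "genes"


def collect_targets(args):
    """Turn the comma-separated target options into (target, type) pairs."""
    pairs = (
        (args.targetID, "KO-ID"),
        (args.targetDomain, "Domain"),
        (args.targetName, "Name"),
    )
    targets = []
    for raw, target_type in pairs:
        targets.extend((item, target_type) for item in raw.split(",") if item)
    return targets


# Equivalent to: Main.py K22276 -r 3 -td RagB,SusD-like
args = build_parser().parse_args(["K22276", "-r", "3", "-td", "RagB,SusD-like"])
print(classify_input(args.input), args.range, collect_targets(args))
```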

Owner

Login: gaenssle
Kind: user

Repositories: 1
Profile: https://github.com/gaenssle

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Vicinity Analyzer
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - orcid: 'https://orcid.org/0000-0002-9488-5086'
    given-names: Lucie
    family-names: Gaenssle
    email: a.l.o.gaenssle@rug.nl
    affiliation: University of Groningen
