domainanalyzer

Analyze domains regarding their taxonomic distribution, motifs with other domains and export their fasta sequence

https://github.com/gaenssle/domainanalyzer

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.8%) to scientific vocabulary

Keywords

domain fasta kegg taxonomy uniprot
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: gaenssle
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 207 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
domain fasta kegg taxonomy uniprot
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Domain Analyzer

Created in 2023 by gaenssle, written in Python 3.8

Process

  • The input is an ID, e.g. a domain name (Pfam or UniProt ID)
  • Various databases are accessed via Genome.jp, e.g. KEGG or UniProt
  • All available protein data associated with the input ID are downloaded
  • The downloaded data is extracted, accumulated and counted
  • Various output files are generated, including FASTA
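The extract, accumulate and count step can be illustrated with a minimal sketch. It assumes each downloaded entry has already been parsed into a dict. The field names (`phylum`, `organism`, `architecture`), the placeholder values, and the helpers `summarize` and `to_delimited` are hypothetical, and the standard-library `collections.Counter` and `csv` modules stand in for whatever the project actually does with pandas.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical parsed entries (one dict per downloaded protein entry).
# Field names and values are placeholders, not real Domain Analyzer output.
records: List[Dict[str, str]] = [
    {"id": "P1", "phylum": "Phylum_A", "organism": "Organism_1", "architecture": "DomX-DomY"},
    {"id": "P2", "phylum": "Phylum_A", "organism": "Organism_2", "architecture": "DomX"},
    {"id": "P3", "phylum": "Phylum_B", "organism": "Organism_3", "architecture": "DomX-DomY"},
]


def summarize(entries: List[Dict[str, str]], field: str) -> List[Tuple[str, int]]:
    """Count each distinct value of `field`, most frequent first (ties keep first-seen order)."""
    return Counter(entry[field] for entry in entries).most_common()


def to_delimited(rows: Sequence[Tuple[str, int]], header: Sequence[str], sep: str = ";") -> str:
    """Render a summary table as text with `sep` between columns (';' mirrors the CLI default)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=sep, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


print(to_delimited(summarize(records, "phylum"), ["phylum", "count"]))
```

The same `summarize` call works for the organism and domain-architecture counts; only the field name changes.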

Data

  • Downloaded sequence data:
    • Assigned description
    • Organism and taxonomy
    • Sequence
    • Domain architecture
    • Available IDs from other databases
  • Count:
    • Taxonomy (Phylum and organism)
    • Domain architecture
  • Save data in the following files:
    • Gene IDs
    • Gene details (organism, architecture, sequence, etc.)
    • Summary domain architecture
    • Summary of taxonomic distribution
    • FASTA files (entire sequences)
    • FASTA files (target domain sequences only)
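The two FASTA outputs differ only in which part of each sequence is written. A minimal sketch, assuming the domain boundaries are available as 1-based, inclusive start/end positions (a common convention for Pfam-style annotations, not confirmed for this tool). The helpers `to_fasta` and `domain_slice`, the `ID/start-end` header style, and the toy sequence are hypothetical.

```python
from typing import Iterable, List, Tuple


def domain_slice(seq: str, start: int, end: int) -> str:
    """Return the domain region, assuming 1-based inclusive start/end positions."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"invalid domain bounds {start}-{end} for length {len(seq)}")
    return seq[start - 1:end]


def to_fasta(entries: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (header, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    lines: List[str] = []
    for header, seq in entries:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy entry: made-up ID, sequence, and domain bounds.
entry_id, sequence, dom_start, dom_end = "P1", "MKTAYIAKQRQISFVKSHFSRQ", 5, 10

# Full-length record vs. domain-only record.
full_fasta = to_fasta([(entry_id, sequence)])
domain_fasta = to_fasta(
    [(f"{entry_id}/{dom_start}-{dom_end}", domain_slice(sequence, dom_start, dom_end))]
)
print(full_fasta + domain_fasta)
```

Writing each string to disk (e.g. with `pathlib.Path.write_text`) would then yield the two kinds of FASTA file.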

Dependencies

  • The program uses Python 3.8 and the following modules:
    • pandas
    • argparse
    • multiprocessing (optional, Linux only)
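Because multiprocessing is optional and restricted to Linux, one way to gate it is to fall back to a plain loop everywhere else. The sketch below is hypothetical (`run_jobs` and `square` are not taken from the repository) and only illustrates the pattern with `multiprocessing.Pool` and `sys.platform`.

```python
import sys
from multiprocessing import Pool
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_jobs(func: Callable[[T], R], items: Iterable[T],
             multiprocess: bool = False, processes: int = 4) -> List[R]:
    """Apply `func` to each item; use a process pool only if requested and on Linux."""
    jobs = list(items)
    if multiprocess and sys.platform.startswith("linux"):
        # `func` must be picklable, i.e. defined at module level.
        with Pool(processes=processes) as pool:
            return pool.map(func, jobs)
    # Default (and non-Linux) path: plain sequential loop.
    return [func(job) for job in jobs]


def square(x: int) -> int:
    """Placeholder task; in practice this would be the per-item work to parallelize."""
    return x * x
```

Calling `run_jobs(square, range(5), multiprocess=True)` from a script's `if __name__ == "__main__":` block uses the pool on Linux and the sequential loop elsewhere; both return the results in input order.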

How to use

```
python3 Main.py name [-h] [-m] [-ask] [-db DBLIST] [-a ACTION] [-st SEARCHTYPE] [-c CUTOFF] [-sam SAMPLESIZE] [-f FOLDER] [-cs CLUSTERSIZE] [-ft FILETYPE] [-sep SEPARATOR]

positional arguments:
  name                  name of the domain

optional arguments:
  -h, --help            show this help message and exit
  -m, --multiprocess    turn on multiprocessing (only for Linux)
  -ask, --askoverwrite  ask before overwriting files
  -db DBLIST, --dblist DBLIST
                        list databases to be searched, separated by ',' (default: UniProt;KEGG;PDB;swissprot)
  -a ACTION, --action ACTION
                        add actions to be conducted: a=all, i=entry IDs, d=protein data, m=KEGG motif, e=extract (default: a)
  -st SEARCHTYPE, --searchtype SEARCHTYPE
                        type of the searched id (default: pf)
  -c CUTOFF, --cutoff CUTOFF
                        min E-Value of Pfam domains (default: 0.0001)
  -sam SAMPLESIZE, --samplesize SAMPLESIZE
                        max number of downloaded entries (default: 0)
  -f FOLDER, --folder FOLDER
                        name of the parent folder (default: same as 'name')
  -cs CLUSTERSIZE, --clustersize CLUSTERSIZE
                        entries/fragment files (default: 100)
  -ft FILETYPE, --filetype FILETYPE
                        type of the generated files (default: .csv)
  -sep SEPARATOR, --separator SEPARATOR
                        separator between columns in the output files (default: ;)
```
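The help text implies a few conventions worth spelling out: action codes can be selected, the database list is split into names, and entries are grouped into fragment files of `CLUSTERSIZE` entries. The following hypothetical sketch mirrors a subset of the documented options with `argparse`; it is not the project's `Main.py`. Because the help text says `','` while the default value uses `';'`, the sketch accepts either separator, and combining codes such as `id` in one `-a` value is an assumption about how the option is meant to be used.

```python
import argparse
import re
from typing import List, Sequence

# Action codes as listed in the help text; 'a' means all of them.
ACTION_CODES = {"i": "entry IDs", "d": "protein data", "m": "KEGG motif", "e": "extract"}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the documented options."""
    parser = argparse.ArgumentParser(prog="Main.py")
    parser.add_argument("name", help="name of the domain")
    parser.add_argument("-m", "--multiprocess", action="store_true")
    parser.add_argument("-ask", "--askoverwrite", action="store_true")
    parser.add_argument("-db", "--dblist", default="UniProt;KEGG;PDB;swissprot")
    parser.add_argument("-a", "--action", default="a")
    parser.add_argument("-cs", "--clustersize", type=int, default=100)
    parser.add_argument("-sep", "--separator", default=";")
    return parser


def expand_actions(action: str) -> List[str]:
    """Expand 'a' to all codes, or e.g. 'di' to ['i', 'd'] (combining codes is assumed)."""
    unknown = set(action) - set(ACTION_CODES) - {"a"}
    if unknown:
        raise ValueError(f"unknown action code(s): {sorted(unknown)}")
    if "a" in action:
        return list(ACTION_CODES)
    return [code for code in ACTION_CODES if code in action]


def split_dblist(dblist: str) -> List[str]:
    """Split a database list on ',' or ';' (the help text and default disagree)."""
    return [db.strip() for db in re.split(r"[,;]", dblist) if db.strip()]


def chunk(ids: Sequence[str], size: int) -> List[List[str]]:
    """Group entry IDs into fragments of at most `size` entries."""
    return [list(ids[i:i + size]) for i in range(0, len(ids), size)]


args = build_parser().parse_args(
    ["example_domain", "-a", "id", "-db", "KEGG,UniProt", "-cs", "2"]
)
print(expand_actions(args.action), split_dblist(args.dblist),
      chunk(["e1", "e2", "e3"], args.clustersize))
```

Here `example_domain`, `e1` to `e3`, and the chosen option values are placeholders; with them, the sketch selects the entry-ID and protein-data steps, searches KEGG and UniProt, and splits three IDs into fragments of two and one.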

Owner

  • Login: gaenssle
  • Kind: user

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Domain Analyzer
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - orcid: 'https://orcid.org/0000-0002-9488-5086'
    given-names: Lucie
    family-names: Gaenssle
    email: a.l.o.gaenssle@rug.nl
    affiliation: University of Groningen

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 33
  • Total Committers: 2
  • Avg Commits per committer: 16.5
  • Development Distribution Score (DDS): 0.091
Past Year
  • Commits: 33
  • Committers: 2
  • Avg Commits per committer: 16.5
  • Development Distribution Score (DDS): 0.091
Top Committers
  • gaenssle (a****e@g****t): 30 commits
  • gaenssle (1****e): 3 commits
Committer Domains (Top 20 + Academic)
gmx.at: 1

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0