symcla

symcla: symbiont classifier

https://github.com/nelli-team/symcla

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.8%) to scientific vocabulary

Repository

symcla: symbiont classifier

Basic Info
  • Host: GitHub
  • Owner: NeLLi-team
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 130 MB
Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md


⚠️⚠️⚠️ symcla is being archived and its development will continue as part of the symclatron project ⚠️⚠️⚠️

symcla: symbiont classifier

💾 Installation

Clone the symcla repository:

```{shell}
git clone https://github.com/NeLLi-team/symcla.git
```

```{bash}
cd symcla/
chmod u+x symcla
```

Create conda environment and install requirements:

```{bash}
conda create -c conda-forge -c bioconda --name symcla --file requirements.txt
```

💽 Setup data (run only once)

Run inside the symcla/ folder:

```{shell}
conda activate symcla
```

```{shell}
./symcla setup
```


🚀 Example run

```{shell}
conda activate symcla
```

👷🏻‍♀️ Run the classifier

```{shell}
path_to_symcla/symcla classify --genomedir data/test_genomes --savedir test_output --ncpus 32
```

To get help

```{bash}
./symcla classify --help

Usage: symcla classify [OPTIONS]

╭─ Options ──────────────────────────────────────────────────────────╮
│ --genomedir                  TEXT     [default: input_genomes]     │
│ --savedir                    TEXT     [default: output_symcla]     │
│ --ncpus                      INTEGER  [default: 16]                │
│ --deltmp    --no-deltmp               [default: deltmp]            │
│ --help                                Show this message and exit.  │
╰────────────────────────────────────────────────────────────────────╯
```
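When symcla is called from a script, the options listed in the help output can be assembled programmatically. The following is a minimal sketch, not part of symcla itself: `build_classify_command` is a hypothetical helper, and only the flags and defaults shown in the help text above are used.

```python
import shlex


def build_classify_command(
    genomedir: str = "input_genomes",
    savedir: str = "output_symcla",
    ncpus: int = 16,
    deltmp: bool = True,
    executable: str = "./symcla",  # assumed path to the symcla script
) -> list[str]:
    """Assemble the argument list for `symcla classify` (hypothetical helper)."""
    if ncpus < 1:
        raise ValueError("ncpus must be a positive integer")
    return [
        executable,
        "classify",
        "--genomedir", genomedir,
        "--savedir", savedir,
        "--ncpus", str(ncpus),
        # The help text lists --deltmp/--no-deltmp as a paired boolean flag.
        "--deltmp" if deltmp else "--no-deltmp",
    ]


cmd = build_classify_command("data/test_genomes", "test_output", ncpus=32)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when directory names contain spaces.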

🕺🏻 Results

Expected results from the test data:

| taxon_oid                             | completeness_UNI56 | features_gt0 | features_ge20 | features_ge100 | symcla_score |
|---------------------------------------|--------------------|--------------|---------------|----------------|--------------|
| IMGI2140918011                        | 98.214             | 396          | 291           | 128            | 0.000        |
| IMGI2645727657                        | 100.000            | 287          | 197           | 70             | -0.003       |
| IMGI651324087                         | 100.000            | 368          | 252           | 106            | -0.009       |
| IMGM3300027739BIN74                   | 64.286             | 310          | 234           | 95             | 0.001        |
| SCISO2808607008                       | 98.214             | 406          | 276           | 124            | 2.000        |
| SDISOGCA003484685.1                   | 83.929             | 193          | 126           | 43             | 2.000        |
| SHISO2654587767                       | 98.214             | 423          | 309           | 134            | 2.000        |
| SLISOGCF900639865.1                   | 100.000            | 569          | 429           | 234            | 0.999        |
| SRISO640427127                        | 92.857             | 296          | 197           | 103            | 2.000        |
| SXGCA000019745.1                      | 98.214             | 353          | 259           | 106            | 0.126        |
| SXGCA902860225.1Azoamicus_ciliaticola | 91.071             | 117          | 83            | 36             | 1.055        |
| SXISO642555114                        | 96.429             | 333          | 243           | 108            | 1.995        |

🧐 Interpretation of results:

  • completeness_UNI56: The percentage of the 56 universal bacterial and archaeal marker genes found in the genome. We advise against trusting any result for a genome with completeness below 50%. Confidence in the symbiont prediction increases with UNI56 completeness.
  • features_gt0: Number of features found with a bitscore greater than 0. Confidence in the symbiont prediction increases as more features are found.
  • features_ge20: Number of features found with a bitscore greater than or equal to 20. Confidence in the symbiont prediction increases as more features are found.
  • features_ge100: Number of features found with a bitscore greater than or equal to 100. Confidence in the symbiont prediction increases as more features are found.
  • symcla_score: After tuning the classification thresholds over thousands of experiments, we recommend the following cutoffs:
    • symcla_score <= 0.42: Free-living
    • 0.42 < symcla_score < 1.21: Symbiont;Host-associated
    • symcla_score >= 1.21: Symbiont;Intracellular
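
The cutoffs above can be applied to a results table with a few lines of Python. This is a minimal sketch rather than code from symcla: the function and constant names are hypothetical, and the example rows are copied from the expected test results.

```python
# Thresholds taken from the interpretation guidance above.
MIN_COMPLETENESS = 50.0      # results below this UNI56 completeness are not advised
HOST_ASSOCIATED_ABOVE = 0.42
INTRACELLULAR_FROM = 1.21


def interpret_symcla_score(score: float) -> str:
    """Map a symcla_score to its recommended label (hypothetical helper)."""
    if score <= HOST_ASSOCIATED_ABOVE:
        return "Free-living"
    if score < INTRACELLULAR_FROM:
        return "Symbiont;Host-associated"
    return "Symbiont;Intracellular"


def is_trustworthy(completeness_uni56: float) -> bool:
    """Return False for genomes below the advised 50% UNI56 completeness."""
    return completeness_uni56 >= MIN_COMPLETENESS


# (genome id, completeness_UNI56, symcla_score) from the expected test results
rows = [
    ("IMGI2140918011", 98.214, 0.000),
    ("SLISOGCF900639865.1", 100.000, 0.999),
    ("SHISO2654587767", 98.214, 2.000),
]

for genome, completeness, score in rows:
    label = interpret_symcla_score(score)
    note = "" if is_trustworthy(completeness) else " (low completeness)"
    print(f"{genome}: {label}{note}")
```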

🤖 Note: by design symcla minimizes the rate of false positives for symbionts, at the expense of increased false negatives (i.e. some Symbiont;Host-associated might still get a symcla score lower than 0.42, and some Symbiont;Intracellular might still get a symcla score lower than 1.21).

🐳 symcla container

Apptainer

```bash
apptainer pull \
  docker://docker.io/jvillada/symcla:latest

apptainer run \
  docker://docker.io/jvillada/symcla:latest \
  symcla \
  classify \
  --genomedir path_to_dir_with_faa_files \
  --savedir path_to_output_dir \
  --ncpus 16
```
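
The placeholder name in the command above suggests that the input directory holds `.faa` (protein FASTA) files. Before launching a long run, it can help to check which files would be picked up. The sketch below rests on that assumption; `list_faa_files` is a hypothetical helper, and symcla's own file discovery may differ.

```python
import tempfile
from pathlib import Path


def list_faa_files(genomedir: str) -> list[Path]:
    """List the .faa files directly inside genomedir (assumed input format)."""
    directory = Path(genomedir)
    if not directory.is_dir():
        raise NotADirectoryError(f"{genomedir} is not a directory")
    return sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix == ".faa"
    )


# Demonstration on a throwaway directory with one genome and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "genome_a.faa").write_text(">protein_1\nMKV\n")
    (Path(tmp) / "notes.txt").write_text("not a genome\n")
    found = [p.name for p in list_faa_files(tmp)]

print(found)
```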

Owner

  • Name: NeLLi: New Lineages of Life
  • Login: NeLLi-team
  • Kind: organization
  • Location: United States of America

New Lineages of Life - US DOE Joint Genome Institute

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Villada"
  given-names: "Juan C."
  orcid: "https://orcid.org/0000-0003-2216-4279"
- family-names: "Schulz"
  given-names: "Frederik"
  orcid: "https://orcid.org/0000-0002-4932-4677"
title: "symcla: symbiont classifier"
version: 0.1.0
date-released: 2024-07-08
url: "https://github.com/NeLLi-team/symcla"

GitHub Events

Total
  • Issues event: 1
  • Watch event: 2
  • Push event: 12
  • Pull request event: 2
  • Fork event: 1
Last Year
  • Issues event: 1
  • Watch event: 2
  • Push event: 12
  • Pull request event: 2
  • Fork event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 2 days
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 2 days
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • willboulton (1)
Pull Request Authors
  • Tsaranoga (1)

Dependencies

Dockerfile docker
  • ubuntu 23.10 build
requirements.txt pypi
  • brotli =1.0.9
  • brotli-bin =1.0.9
  • bzip2 =1.0.8
  • ca-certificates =2023.5.7
  • certifi =2023.5.7
  • charset-normalizer =3.1.0
  • click =8.1.3
  • colorama =0.4.6
  • hmmer =3.3.2
  • idna =3.4
  • joblib =1.2.0
  • ld_impl_linux-64 =2.40
  • libblas =3.9.0
  • libbrotlicommon =1.0.9
  • libbrotlidec =1.0.9
  • libbrotlienc =1.0.9
  • libcblas =3.9.0
  • libexpat =2.5.0
  • libffi =3.4.2
  • libgcc-ng =12.2.0
  • libgfortran-ng =12.2.0
  • libgfortran5 =12.2.0
  • libgomp =12.2.0
  • liblapack =3.9.0
  • libnsl =2.0.0
  • libopenblas =0.3.21
  • libsqlite =3.42.0
  • libstdcxx-ng =12.2.0
  • libuuid =2.38.1
  • libxgboost =1.7.4
  • libzlib =1.2.13
  • markdown-it-py =2.2.0
  • mdurl =0.1.0
  • ncurses =6.3
  • numpy =1.24.3
  • openssl =3.1.1
  • packaging =23.1
  • pandas =2.0.2
  • pip =23.1.2
  • platformdirs =3.5.1
  • pooch =1.7.0
  • py-xgboost =1.7.4
  • pygments =2.15.1
  • pysocks =1.7.1
  • python =3.11.3
  • python-dateutil =2.8.2
  • python-tzdata =2023.3
  • python_abi =3.11
  • pytz =2023.3
  • readline =8.2
  • requests =2.31.0
  • rich =13.4.1
  • scikit-learn =1.2.2
  • scipy =1.10.1
  • setuptools =67.7.2
  • shap =0.45.0
  • shellingham =1.5.1
  • six =1.16.0
  • threadpoolctl =3.1.0
  • tk =8.6.12
  • typer =0.9.0
  • typing-extensions =4.6.2
  • tzdata =2023c
  • urllib3 =2.0.2
  • wheel =0.40.0
  • xgboost =1.7.4
  • xz =5.2.6