https://github.com/csbiology/chlamyatlas

Chlamy Atlas is an AI-powered web application that predicts the localization of proteins from the green alga Chlamydomonas reinhardtii.

Keywords

chlamydomonas data-science ml plant science target tool
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 1
Created about 2 years ago · Last pushed over 1 year ago
ChlamyAtlas

A web UI for optimised versions of the models published in Wang et al. 2023.

Supported formats

ChlamyAtlas expects input either in the FASTA format or as a plain amino acid sequence. A FASTA record consists of two parts. The first is a description of the sequence that follows; it starts with ">" and occupies a single line. The amino acid sequence begins on the next line and can span multiple lines. An example of this format is:

```
>sp|A0A178WF56|CSTM3_ARATH Protein CYSTEINE-RICH TRANSMEMBRANE MODULE 3 OS=Arabidopsis thaliana OX=3702 GN=CYSTM3 PE=1 SV=1
MAQYHQQHEMKQTMAETQYVTAPPPMGYPVMMKDSPQTVQPPHEGQSKGSGGFLRGCLAA
MCCCCVLDCVF
>sp|A1YKT1|TCP18_ARATH Transcription factor TCP18 OS=Arabidopsis thaliana OX=3702 GN=TCP18 PE=1 SV=1
MNNNIFSTTTTINDDYMLFPYNDHYSSQPLLPFSPSSSINDILIHSTSNTSNNHLDHHHQ
FQQPSPFSHFEFAPDCALLTSFHPENNGHDDNQTIPNDNHHPSLHFPLNNTIVEQPTEPS
ETINLIEDSQRISTSQDPKMKKAKKPSRTDRHSKIKTAKGTRDRRMRLSLDVAKELFGLQ
DMLGFDKASKTVEWLLTQAKPEIIKIATTLSHHGCFSSGDESHIRPVLGSMDTSSDLCEL
ASMWTVDDRGSNTNTTETRGNKVDGRSMRGKRKRPEPRTPILKKLSKEERAKARERAKGR
TMEKMMMKMKGRSQLVKVVEEDAHDHGEIIKNNNRSQVNRSSFEMTHCEDKIEELCKNDR
FAVCNEFIMNKKDHISNESYDLVNYKPNSSFPVINHHRSQGAANSIEQHQFTDLHYSFGA
KPRDLMHNYQNMY
```

ChlamyAtlas assumes that the description follows the standard used by the Universal Protein Resource (UniProt) and returns only the UniProt ID as the description in the output table. To have the complete description returned instead, remove the "|" characters from the description.
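The header handling described above can be illustrated with a minimal sketch. It is not the project's parser: the function names `parse_fasta` and `short_label` are hypothetical, and the rule "take the second `|`-separated field as the UniProt ID" is an assumption based on the UniProt header layout shown in the example.

```python
def short_label(description):
    """Return the UniProt ID for 'db|ID|NAME ...' headers, else the full text.

    Assumption: a header with at least two '|' follows the UniProt layout.
    """
    parts = description.split("|")
    if len(parts) >= 3:
        return parts[1]
    return description.strip()


def parse_fasta(text):
    """Yield (label, sequence) pairs from FASTA-formatted text."""
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = short_label(line[1:]), []
        elif label is not None:
            # Sequence lines are concatenated; line breaks carry no meaning.
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


example = """>sp|A0A178WF56|CSTM3_ARATH Protein CYSTEINE-RICH TRANSMEMBRANE MODULE 3
MAQYHQQHEMKQTMAETQYVTAPPPMGYPVMMKDSPQTVQPPHEGQSKGSGGFLRGCLAA
MCCCCVLDCVF
"""
records = list(parse_fasta(example))
```

With a header that contains no "|", `short_label` returns the whole description, which mirrors the workaround mentioned above.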

The only other supported format is a plain amino acid sequence. An example of this format is:

    MAQYHQQHEMKQTMAETQYVTAPPPMGYPVMMKDSPQTVQPPHEGQSKGSGGFLRGCLAA
    MCCCCVLDCVF

This format can only be used for a single amino acid sequence. Multiple amino acid sequences must be in the following format:

```
!MAQYHQQHEMKQTMAETQYVTAPPPMGYPVMMKDSPQTVQPPHEGQSKGSGGFLRGCLAA
MCCCCVLDCVF
!MNNNIFSTTTTINDDYMLFPYNDHYSSQPLLPFSPSSSINDILIHSTSNTSNNHLDHHHQ
FQQPSPFSHFEFAPDCALLTSFHPENNGHDDNQTIPNDNHHPSLHFPLNNTIVEQPTEPS
ETINLIEDSQRISTSQDPKMKKAKKPSRTDRHSKIKTAKGTRDRRMRLSLDVAKELFGLQ
DMLGFDKASKTVEWLLTQAKPEIIKIATTLSHHGCFSSGDESHIRPVLGSMDTSSDLCEL
ASMWTVDDRGSNTNTTETRGNKVDGRSMRGKRKRPEPRTPILKKLSKEERAKARERAKGR
TMEKMMMKMKGRSQLVKVVEEDAHDHGEIIKNNNRSQVNRSSFEMTHCEDKIEELCKNDR
FAVCNEFIMNKKDHISNESYDLVNYKPNSSFPVINHHRSQGAANSIEQHQFTDLHYSFGA
KPRDLMHNYQNMY
```
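As an illustration of the plain-sequence input, the sketch below splits such text into individual sequences. It assumes, based only on the example above, that "!" marks the start of each sequence and that whitespace inside a sequence is insignificant; the function name `split_plain_sequences` is hypothetical and not part of ChlamyAtlas.

```python
def split_plain_sequences(text):
    """Split plain amino acid input into a list of sequences.

    Assumptions: '!' starts each sequence in multi-sequence input, and
    spaces or line breaks inside a sequence can be dropped.
    """
    if "!" not in text:
        # Single-sequence input: just remove all whitespace.
        sequence = "".join(text.split())
        return [sequence] if sequence else []
    return ["".join(part.split()) for part in text.split("!") if part.strip()]


single = split_plain_sequences("MAQYHQ\nMCCCC")
multiple = split_plain_sequences("!MAQYHQ\nMCCCC\n!MNNNIF\nKPRDL")
```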

Result

Explanations of Chloropred, Qchloro, Mitopred, Qmito, Secrpred, Qsecr, and FinalPred.

Chloropred

Prediction score indicating the likelihood of the protein being localized to the chloroplast. A higher score suggests a stronger prediction that the protein is localized in the chloroplast.

Qchloro

q-value associated with the Chloroplast prediction score. Provides a measure of statistical significance for the Chloroplast prediction. Lower q-values indicate higher statistical significance.

Mitopred

Prediction score for the localization of the protein to the mitochondria. A higher score suggests a stronger prediction of mitochondrial localization.

Qmito

q-value associated with the Mitochondria prediction score. Indicates the statistical significance of the Mitochondria localization prediction. Lower q-values suggest a more reliable prediction.

Secrpred

Prediction score for identifying the protein as a secretory protein. A higher score indicates a stronger likelihood that the protein functions as a secretory protein.

Qsecr

q-value for the Secretory Protein prediction. Provides a measure of the statistical significance of the Secretory Protein prediction. Lower q-values are indicative of more statistically significant predictions.

FinalPred

Represents the model's final prediction of the protein's localization based on the highest score and its corresponding q-value. The final localization is determined by comparing the q-values and prediction scores against preset cutoffs. If all q-values exceed the cutoff, the protein is classified as "Cytoplasmic."

Cutoff

The threshold q-value below which a prediction is considered statistically significant. Set to 0.05 by default, meaning that predictions with q-values below this threshold are classified as significant. This parameter helps in distinguishing between statistically significant and non-significant predictions, reducing the chance of false-positive localizations.
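The decision rule described for FinalPred and Cutoff can be sketched as follows. This is a simplified illustration, not the project's implementation: it applies only the q-value cutoff (the score cutoffs mentioned above are not specified here, so they are left out), and the function name and the compartment labels other than "Cytoplasmic" are assumptions.

```python
def final_prediction(scores, q_values, cutoff=0.05):
    """Pick the significant compartment with the highest prediction score.

    scores and q_values map a compartment label to its prediction score
    and q-value. If no q-value is below the cutoff, fall back to
    "Cytoplasmic".
    """
    significant = [name for name, q in q_values.items() if q < cutoff]
    if not significant:
        return "Cytoplasmic"
    return max(significant, key=lambda name: scores[name])


scores = {"Chloroplast": 0.91, "Mitochondria": 0.40, "Secretory": 0.05}
q_values = {"Chloroplast": 0.01, "Mitochondria": 0.03, "Secretory": 0.70}
prediction = final_prediction(scores, q_values)
```

In this example both the chloroplast and mitochondria predictions are significant, and the chloroplast score is higher, so the sketch returns "Chloroplast".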

Docker

Environment Variables

  • NET_EMAIL_EMAIL: Email address to send emails from

    Default: Set via user secrets

  • NET_EMAIL_ACCOUNTNAME: Email account name to send emails from

    Default: Set via user secrets

  • NET_EMAIL_PASSWORD: Email account password to send emails from

    Default: Set via user secrets

  • NET_EMAIL_SERVER: Email server to send emails from

    Default: Set via user secrets

  • NET_EMAIL_PORT: Email server port to send emails from

    Default: Set via user secrets

  • PYTHON_SERVICE_TIMEOUT: Time in minutes before the connection between the UI and the API service times out

    Default: 30 minutes

  • PYTHON_SERVICE_URL: Sets the URL of the API predictor backend.

    Default: http://localhost:8000

    Remarks: In Docker Compose this could be http://host.docker.internal:8000

    Remarks: On Linux this might require:

    extra_hosts:
      - "host.docker.internal:host-gateway"

  • PYTHON_SERVICE_STORAGE_TIMESPAN: How long the user data should be stored

    Default: 1 hour

Docker Compose

```yaml
version: '3.7'

name: chlamyatlas

services:
  api:
    image: csbdocker/chlamyatlas-api:latest
    ports:
      - 8000:80
    environment:
      GUNICORN_CMD_ARGS: "-k uvicorn.workers.UvicornWorker --preload"
      MAX_WORKERS: "4"
      TIMEOUT: "0"
  ui:
    image: csbdocker/chlamyatlas-ui:latest
    environment:
      PYTHON_SERVICE_URL: "http://host.docker.internal:8000"
      PYTHON_SERVICE_STORAGE_TIMESPAN: "7"
    ports:
      - 5000:5000
    # Use this to make host.docker.internal accessible on linux docker
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

Local Development

Install pre-requisites

You'll need to install the following pre-requisites in order to build SAFE applications

Install

  • run `setup.cmd`

.. or ..

  1. `dotnet tool restore`
  2. `py -m venv .venv`
  3. `.\.venv\Scripts\python.exe -m pip install -r .\src\FastAPI\requirements.txt`

Run

  • `.\build.cmd run` starts the SAFE stack

plus, in another terminal, run:

  1. activate the local Python environment: `.\.venv\Scripts\Activate.ps1`
  2. navigate to the FastAPI folder: `cd .\src\FastAPI\`
  3. start the FastAPI backend: `./run.cmd`

Activate Email notification (optional)

Set user-secrets in the following schema:

```json
{
  "email": {
    "NET_EMAIL_EMAIL": "placeholder@mail.de",
    "NET_EMAIL_ACCOUNTNAME": "PlaceholderAccountName",
    "NET_EMAIL_PASSWORD": "HelloWorld1234",
    "NET_EMAIL_SERVER": "smtp.placeholdermail.de",
    "NET_EMAIL_PORT": 587
  }
}
```

Publish

Test Publish

  1. `.\build.cmd dockerbundle [--uionly]` creates `:new` Docker image(s). Pass `--uionly` to skip the FastAPI image.
  2. `.\build.cmd dockertest` uses the local docker-compose file to start the `:new` images.

To docker-hub

  1. Login to CSB-Docker
  2. Ensure correct Versions, both for python and dotnet service.
    • .\build.cmd versions
    • Remarks: Versions are defined in project files. Paths can be found in build project ProjectInfo.fs. Accessed via regex parsing.
  3. Run Test Publish steps. The following step requires built :new images.
  4. .\build.cmd dockerpublish

Request Workflow

```mermaid
sequenceDiagram
    participant py as Python ML
    participant net as F#35; Server
    participant c as Client
    actor u as User
    u -->> c: Gives data
    c -->>+net: sends user data
    par start analysis
        loop
            net-)+py: send sequence
            py->py: predict target
            py-)net: return predicted target
        end
    and return request information
        net -) c: returns `request-ID`
    end
    critical ⚠️
        u -->> c: copies and stores `request-ID`
    end
    opt email
        u -->> c: give email address
        c -->> net: give id + email to store
    end
    opt check status
        u -->> c: use `request-ID` to check status
    end
    py-)net: send last package
    deactivate py
    net-->>net: run q-value calculation
    net-->>net: store results
    deactivate net
    opt gave email
        net-)u: send email
    end
    u -->> c: request data
    c-->>net: get data
    net-->>c: return data
    c-->>u: download data
```
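The request-ID bookkeeping in this workflow can be illustrated with a small in-memory sketch. It is a hypothetical stand-in and not the F# server's implementation: the class `RequestStore`, its method names, and the status strings are all assumptions made for illustration, and the q-value calculation and email delivery are omitted.

```python
import uuid


class RequestStore:
    """In-memory stand-in for the server-side request bookkeeping (hypothetical)."""

    def __init__(self):
        self._requests = {}

    def submit(self, sequences):
        """Register a new analysis and hand back its request-ID."""
        request_id = str(uuid.uuid4())
        self._requests[request_id] = {
            "total": len(sequences),
            "results": [],
            "email": None,
        }
        return request_id

    def set_email(self, request_id, email):
        """Optionally store an email address alongside the request."""
        self._requests[request_id]["email"] = email

    def add_result(self, request_id, result):
        """Store one prediction as the predictor returns it."""
        self._requests[request_id]["results"].append(result)

    def status(self, request_id):
        """Report progress for a stored request-ID."""
        entry = self._requests[request_id]
        done = len(entry["results"])
        if done >= entry["total"]:
            return "finished"
        return f"running ({done}/{entry['total']})"

    def download(self, request_id):
        """Return all results once every sequence has been processed."""
        if self.status(request_id) != "finished":
            raise RuntimeError("results are not ready yet")
        return list(self._requests[request_id]["results"])


store = RequestStore()
request_id = store.submit(["MAQYHQ", "MNNNIF"])
store.add_result(request_id, {"sequence": "MAQYHQ", "FinalPred": "Cytoplasmic"})
progress = store.status(request_id)
```

The point of the sketch is the one the diagram marks as critical: the request-ID is the only handle for checking status and downloading results later, so the user has to keep it.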

Owner

  • Name: Computational Systems Biology
  • Login: CSBiology
  • Kind: organization
  • Location: Kaiserslautern

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 45
  • Total Committers: 2
  • Avg Commits per committer: 22.5
  • Development Distribution Score (DDS): 0.244
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Kevin F F****r@g****e 34
fejung 9****g 11
Committer Domains (Top 20 + Academic)
gmx.de: 1

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 25
  • Total pull requests: 0
  • Average time to close issues: 1 day
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 0.36
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 25
  • Pull requests: 0
  • Average time to close issues: 1 day
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 0.36
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Freymaurer (23)
  • muehlhaus (1)