https://github.com/csbiology/chlamyatlas
Chlamy Atlas is an AI-powered web application that predicts the localization of proteins from the green alga Chlamydomonas reinhardtii.
Repository
Basic Info
- Host: GitHub
- Owner: CSBiology
- Language: F#
- Default Branch: main
- Homepage: https://csb-chlamyatlas.bio.rptu.de
- Size: 135 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
ChlamyAtlas
A web UI for optimised versions of the models published in Wang et al. 2023.

Supported formats
ChlamyAtlas expects input either in FASTA format or as a plain amino acid sequence. A FASTA record consists of two parts. The first is a description of the sequence that follows; it starts with ">" and occupies a single line. The amino acid sequence begins on the next line and can span multiple lines. An example of this format:
```
>sp|A0A178WF56|CSTM3_ARATH Protein CYSTEINE-RICH TRANSMEMBRANE MODULE 3 OS=Arabidopsis thaliana OX=3702 GN=CYSTM3 PE=1 SV=1
MAQYHQQHEMKQTMAETQYVTAPPPMGYPVMMKDSPQTVQPPHEGQSKGSGGFLRGCLAA
MCCCCVLDCVF
>sp|A1YKT1|TCP18_ARATH Transcription factor TCP18 OS=Arabidopsis thaliana OX=3702 GN=TCP18 PE=1 SV=1
MNNNIFSTTTTINDDYMLFPYNDHYSSQPLLPFSPSSSINDILIHSTSNTSNNHLDHHHQ
FQQPSPFSHFEFAPDCALLTSFHPENNGHDDNQTIPNDNHHPSLHFPLNNTIVEQPTEPS
ETINLIEDSQRISTSQDPKMKKAKKPSRTDRHSKIKTAKGTRDRRMRLSLDVAKELFGLQ
DMLGFDKASKTVEWLLTQAKPEIIKIATTLSHHGCFSSGDESHIRPVLGSMDTSSDLCEL
ASMWTVDDRGSNTNTTETRGNKVDGRSMRGKRKRPEPRTPILKKLSKEERAKARERAKGR
TMEKMMMKMKGRSQLVKVVEEDAHDHGEIIKNNNRSQVNRSSFEMTHCEDKIEELCKNDR
FAVCNEFIMNKKDHISNESYDLVNYKPNSSFPVINHHRSQGAANSIEQHQFTDLHYSFGA
KPRDLMHNYQNMY
```
ChlamyAtlas was developed on the assumption that the description follows the standard used by the Universal Protein Resource (UniProt), and it returns only the UniProt ID as the description in the output table. To have the complete description returned instead, remove the "|" characters from the description.
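The behaviour described above can be illustrated with a minimal sketch. It is not the ChlamyAtlas implementation: the function names `parse_fasta` and `display_id` are hypothetical, and the rule "take the second `|`-separated field, otherwise keep the whole description" is an assumption based on the UniProt header layout shown in the example.

```python
def parse_fasta(text):
    """Split FASTA text into (description, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may span multiple lines; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def display_id(header):
    """Assumed rule: UniProt-style headers yield the accession, others stay whole."""
    parts = header.split("|")
    if len(parts) >= 3:
        return parts[1]
    return header


example = """>sp|A0A178WF56|CSTM3_ARATH Protein CYSTEINE-RICH TRANSMEMBRANE MODULE 3
MAQYHQQHEMKQTMAETQYVTAPPPMGYPVMMKDSPQTVQPPHEGQSKGSGGFLRGCLAA
MCCCCVLDCVF
"""
records = parse_fasta(example)
```

With this sketch, a header without "|" characters is passed through unchanged, which mirrors the workaround described above.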
The only other supported format is a plain amino acid sequence. An example of this format:
MAQYHQQHEMKQTMAETQYVTAPPPMGYPVMMKDSPQTVQPPHEGQSKGSGGFLRGCLAA
MCCCCVLDCVF
This format can only be used for a single amino acid sequence. Multiple amino acid sequences must be in the following format:
```
!MAQYHQQHEMKQTMAETQYVTAPPPMGYPVMMKDSPQTVQPPHEGQSKGSGGFLRGCLAA
MCCCCVLDCVF
!MNNNIFSTTTTINDDYMLFPYNDHYSSQPLLPFSPSSSINDILIHSTSNTSNNHLDHHHQ
FQQPSPFSHFEFAPDCALLTSFHPENNGHDDNQTIPNDNHHPSLHFPLNNTIVEQPTEPS
ETINLIEDSQRISTSQDPKMKKAKKPSRTDRHSKIKTAKGTRDRRMRLSLDVAKELFGLQ
DMLGFDKASKTVEWLLTQAKPEIIKIATTLSHHGCFSSGDESHIRPVLGSMDTSSDLCEL
ASMWTVDDRGSNTNTTETRGNKVDGRSMRGKRKRPEPRTPILKKLSKEERAKARERAKGR
TMEKMMMKMKGRSQLVKVVEEDAHDHGEIIKNNNRSQVNRSSFEMTHCEDKIEELCKNDR
FAVCNEFIMNKKDHISNESYDLVNYKPNSSFPVINHHRSQGAANSIEQHQFTDLHYSFGA
KPRDLMHNYQNMY
```
Result

Explanations of Chloropred, Qchloro, Mitopred, Qmito, Secrpred, Qsecr, and FinalPred.
Chloropred
Prediction score indicating the likelihood of the protein being localized to the Chloroplast. A higher score suggests a stronger prediction that the protein is localized in the Chloroplast.
Qchloro
q-value associated with the Chloroplast prediction score. Provides a measure of statistical significance for the Chloroplast prediction. Lower q-values indicate higher statistical significance.
Mitopred
Prediction score for the localization of the protein to the Mitochondria. A higher score suggests a stronger prediction of Mitochondrial localization.
Qmito
q-value associated with the Mitochondria prediction score. Indicates the statistical significance of the Mitochondria localization prediction. Lower q-values suggest a more reliable prediction.
Secrpred
Prediction score for identifying the protein as a Secretory Protein. A higher score indicates a stronger likelihood that the protein functions as a Secretory Protein.
Qsecr
q-value for the Secretory Protein prediction. Provides a measure of the statistical significance of the Secretory Protein prediction. Lower q-values are indicative of more statistically significant predictions.
FinalPred
Represents the model's final prediction of the protein's localization based on the highest score and its corresponding q-value. The final localization is determined by comparing the q-values and prediction scores against preset cutoffs. If all q-values exceed the cutoff, the protein is classified as "Cytoplasmic."
Cutoff
The threshold q-value below which a prediction is considered statistically significant. Set to 0.05 by default, meaning that predictions with q-values below this threshold are classified as significant. This parameter helps in distinguishing between statistically significant and non-significant predictions, reducing the chance of false-positive localizations.
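One way to read the FinalPred and Cutoff descriptions is sketched below. This is an illustration under stated assumptions, not the rule ChlamyAtlas actually applies: the text does not specify how score cutoffs enter the decision, so the sketch only uses the q-value cutoff and then picks the highest score among the significant compartments. The `Prediction` class, the `final_pred` function, and the labels other than "Cytoplasmic" are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    chloropred: float
    qchloro: float
    mitopred: float
    qmito: float
    secrpred: float
    qsecr: float


def final_pred(p: Prediction, cutoff: float = 0.05) -> str:
    """Assumed rule: best-scoring compartment among those with q < cutoff."""
    candidates = [
        ("Chloroplast", p.chloropred, p.qchloro),
        ("Mitochondria", p.mitopred, p.qmito),
        ("Secretory", p.secrpred, p.qsecr),
    ]
    # Keep only compartments whose q-value is below the cutoff.
    significant = [(label, score) for label, score, q in candidates if q < cutoff]
    if not significant:
        # No prediction is significant: fall back to the default class.
        return "Cytoplasmic"
    # Among significant compartments, return the one with the highest score.
    return max(significant, key=lambda item: item[1])[0]
```

Under this reading, raising the cutoff lets more proteins leave the "Cytoplasmic" fallback class, at the price of more false-positive localizations.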
Docker
Environment Variables
- NET_EMAIL_EMAIL: Email address to send emails from
  Default: Set via user secrets
- NET_EMAIL_ACCOUNTNAME: Email account name to send emails from
  Default: Set via user secrets
- NET_EMAIL_PASSWORD: Email account password to send emails from
  Default: Set via user secrets
- NET_EMAIL_SERVER: Email server to send emails from
  Default: Set via user secrets
- NET_EMAIL_PORT: Email server port to send emails from
  Default: Set via user secrets
- PYTHON_SERVICE_TIMEOUT: Time in minutes before the connection between the UI and the API service times out
  Default: 30 minutes
- PYTHON_SERVICE_URL: Sets the URL for the API predictor backend.
  Default: http://localhost:8000
  Remarks: In docker compose this could be http://host.docker.internal:8000
  Remarks: On Linux this might require: extra_hosts: - "host.docker.internal:host-gateway"
- PYTHON_SERVICE_STORAGE_TIMESPAN: How long the user data should be stored
  Default: 1 Hour
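The fallback to documented defaults can be sketched as follows. This is only an illustration of how such settings resolve, not code from the repository (the UI service is written in F#). The function name `load_settings` is hypothetical, and the underscored spellings `PYTHON_SERVICE_URL` and `PYTHON_SERVICE_TIMEOUT` are an assumption about how the variable names are written.

```python
import os


def load_settings(env=None):
    """Resolve two of the documented settings, falling back to their defaults."""
    env = os.environ if env is None else env
    return {
        # Default taken from the list above: http://localhost:8000
        "service_url": env.get("PYTHON_SERVICE_URL", "http://localhost:8000"),
        # Default taken from the list above: 30 minutes
        "timeout_minutes": int(env.get("PYTHON_SERVICE_TIMEOUT", "30")),
    }


defaults = load_settings({})
in_compose = load_settings({"PYTHON_SERVICE_URL": "http://host.docker.internal:8000"})
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without changing the real environment.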
Docker Compose
```yaml
version: '3.7'
name: chlamyatlas
services:
  api:
    image: csbdocker/chlamyatlas-api:latest
    ports:
      - 8000:80
    environment:
      GUNICORN_CMD_ARGS: "-k uvicorn.workers.UvicornWorker --preload"
      MAX_WORKERS: "4"
      TIMEOUT: "0"
  ui:
    image: csbdocker/chlamyatlas-ui:latest
    environment:
      PYTHON_SERVICE_URL: "http://host.docker.internal:8000"
      PYTHON_SERVICE_STORAGE_TIMESPAN: "7"
    ports:
      - 5000:5000
    # Use this to make host.docker.internal accessible on linux docker
    extra_hosts:
      - "host.docker.internal:host-gateway"
```
Local Development
Install pre-requisites
You'll need to install the following pre-requisites in order to build SAFE applications
- .NET SDK 8.0 or higher
- Node 18 or higher
- NPM 9 or higher
- Python 3.11 or higher
Install
- run setup.cmd
.. or ..
- dotnet tool restore
- py -m venv .venv
- .\.venv\Scripts\python.exe -m pip install -r .\src\FastAPI\requirements.txt
Run
.\build.cmd run starts the SAFE stack
plus in another terminal run:
- activate local python environment: .\.venv\Scripts\Activate.ps1
- navigate to fastapi folder: cd .\src\FastAPI\
- start fastapi backend: ./run.cmd
Activate Email notification (optional)
Set user-secrets in the following schema:
{
  "email": {
    "NET_EMAIL_EMAIL": "placeholder@mail.de",
    "NET_EMAIL_ACCOUNTNAME": "PlaceholderAccountName",
    "NET_EMAIL_PASSWORD": "HelloWorld1234",
    "NET_EMAIL_SERVER": "smtp.placeholdermail.de",
    "NET_EMAIL_PORT": 587
  }
}
Publish
Test Publish
- .\build.cmd dockerbundle [--uionly], creates :new docker image(s). Skip fastapi image with --uionly.
- .\build.cmd dockertest, uses local docker-compose file to start :new images.
To docker-hub
- Login to CSB-Docker
- Ensure correct Versions, both for python and dotnet service.
  .\build.cmd versions
  - Remarks: Versions are defined in project files. Paths can be found in build project ProjectInfo.fs. Accessed via regex parsing.
- Run Test Publish steps. The following step requires built :new images.
- .\build.cmd dockerpublish
Request Workflow
mermaid
sequenceDiagram
participant py as Python ML
participant net as F#35; Server
participant c as Client
actor u as User
u -->> c: Gives data
c -->>+net: sends user data
par start analysis
loop
net-)+py: send sequence
py->py: predict target
py-)net: return predicted target
end
and return request information
net -) c: returns `request-ID`
end
critical ⚠️
u -->> c: copies and stores `request-ID`
end
opt email
u -->> c: give email address
c -->> net: give id + email to store
end
opt check status
u -->> c: use `request-ID` to check status
end
py-)net: send last package
deactivate py
net-->>net: run q-value calculation
net-->>net: store results
deactivate net
opt gave email
net-)u: send email
end
u -->> c: request data
c-->>net: get data
net-->>c: return data
c-->>u: download data
Owner
- Name: Computational Systems Biology
- Login: CSBiology
- Kind: organization
- Location: Kaiserslautern
- Website: https://csb.bio.uni-kl.de/
- Twitter: cs_biology
- Repositories: 48
- Profile: https://github.com/CSBiology
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 25
- Total pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 0
- Average comments per issue: 0.36
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 25
- Pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 0.36
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Freymaurer (23)
- muehlhaus (1)