alphafind
AlphaFind: Discover structure similarity across the entire known proteome
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 7 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (18.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Coda-Research-Group
- License: mit
- Language: TypeScript
- Default Branch: main
- Homepage: https://alphafind.fi.muni.cz
- Size: 6.75 MB
Statistics
- Stars: 22
- Watchers: 2
- Forks: 2
- Open Issues: 2
- Releases: 2
Metadata Files
README.md
AlphaFind: Discover structure similarity across the entire known proteome
AlphaFind is a web-based search engine that allows for structure-based search of the entire AlphaFold Protein Structure Database. A UniProt ID, PDB ID, or gene symbol is accepted as input, and the engine returns the most similar proteins found within AlphaFold DB, with an option for an additional search to extend and refine the results. The search results are grouped by their source organism and displayed along with several similarity metrics. 3D visualizations of the structural superposition of the proteins are provided, and text filters can be used to find specific organisms or UniProt IDs. For details about the methodology and usage, please see the manual. This website is free and open to all users, and there is no login requirement.
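The grouping and filtering of results described above can be illustrated with a small sketch. It is not AlphaFind's code: it assumes each hit is a record with hypothetical fields uniprot_id, organism, and tm_score, and the scores in the example are made up. It groups hits by source organism (best score first within each group) and applies a case-insensitive text filter over organism names and UniProt IDs.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    # Hypothetical result record: field names are illustrative, not AlphaFind's schema.
    uniprot_id: str
    organism: str
    tm_score: float


def filter_hits(hits: list[Hit], query: str) -> list[Hit]:
    """Case-insensitive substring filter over organism names and UniProt IDs."""
    q = query.casefold()
    return [h for h in hits if q in h.organism.casefold() or q in h.uniprot_id.casefold()]


def group_by_organism(hits: list[Hit]) -> dict[str, list[Hit]]:
    """Group hits by source organism; within a group, the highest score comes first."""
    groups: dict[str, list[Hit]] = defaultdict(list)
    for h in hits:
        groups[h.organism].append(h)
    return {org: sorted(g, key=lambda h: h.tm_score, reverse=True) for org, g in groups.items()}


# Illustrative input with made-up scores (not real AlphaFind output).
hits = [
    Hit("P69905", "Homo sapiens", 0.98),
    Hit("P01942", "Mus musculus", 0.95),
    Hit("P68871", "Homo sapiens", 0.71),
]
grouped = group_by_organism(hits)
mouse_only = filter_hits(hits, "mus")
```

Groups keep the order in which organisms first appear in the hit list, which mirrors a ranked result list where the organism of the best hit is shown first.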
Vector embeddings and model weights used in AlphaFind are available in the Czech National Repository as the dataset "AlphaFind: Discover structure similarity across the entire known proteome data and model" (https://data.narodni-repozitar.cz/general/datasets/egsm2-7a369). This project uses USalign.
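To make the role of vector embeddings concrete: once every protein is represented by a fixed-length vector, finding similar structures becomes a nearest-neighbour search over those vectors. The sketch below is a generic brute-force cosine-similarity k-NN in pure Python with toy 3-dimensional vectors. It is an illustration of the idea only; AlphaFind's actual embedding model, vector dimensionality, and index structure are not reproduced here, and the function names are hypothetical.

```python
from __future__ import annotations

import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query: list[float], database: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Exhaustively score every vector and return the k best (id, similarity) pairs."""
    scored = ((cosine(query, vec), pid) for pid, vec in database.items())
    return [(pid, sim) for sim, pid in heapq.nlargest(k, scored)]


# Toy 3-dimensional "embeddings"; real protein embeddings have more dimensions.
database = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.9, 0.1, 0.0],
    "C": [0.0, 1.0, 0.0],
}
result = knn([1.0, 0.0, 0.0], database, k=2)
```

Exhaustive scoring is linear in the database size, which is why a search over all of AlphaFold DB needs an index (the dependency list includes faiss-cpu) rather than this brute-force loop.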
Code Structure
The codebase is divided into three folders:
- training (model training, index building)
- api (backend)
- ui (frontend)
See the README.md files in each folder for more details.
Installation and execution
Prerequisites / Dependencies:
- Docker (version 20.10 or later)
- Git
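Because Docker 20.10 or later is required, one way to check an installation is to parse the output of docker --version. The sketch below assumes the usual "Docker version 24.0.7, build afdd53b" output format; the helper names are hypothetical and the parsing is best-effort.

```python
from __future__ import annotations

import re
import shutil
import subprocess

MIN_DOCKER = (20, 10)  # "Docker (version 20.10 or later)" from the prerequisites


def parse_docker_version(text: str) -> tuple[int, int] | None:
    """Extract (major, minor) from output such as 'Docker version 24.0.7, build afdd53b'."""
    match = re.search(r"Docker version (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def docker_is_recent_enough(text: str) -> bool:
    """True if the parsed version is at least MIN_DOCKER (tuples compare element-wise)."""
    version = parse_docker_version(text)
    return version is not None and version >= MIN_DOCKER


if __name__ == "__main__" and shutil.which("docker"):
    out = subprocess.run(["docker", "--version"], capture_output=True, text=True).stdout
    print(out.strip(), "->", "OK" if docker_is_recent_enough(out) else "too old or unrecognised")
```

Comparing (major, minor) tuples avoids the classic string-comparison bug where "9.0" would sort after "20.10".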
Steps
1. Clone this repository:
```sh
git clone https://github.com/Coda-Research-Group/AlphaFind.git
```
2. Run Docker Compose, which will do the following:
   - build the Docker images for api/, ui/, and training/,
   - run the training/ container to prepare the necessary data structures,
   - run the api/ container (the backend),
   - run the ui/ container (the frontend).

   Use the -d switch for a detached process.
```sh
docker compose up --build
```
3. Open http://localhost:8081 in your browser.
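Since the training/ container has to prepare data structures before the stack is fully usable, the UI may not answer immediately, especially when Compose runs detached with -d. A minimal polling helper, assuming the default address http://localhost:8081 and an arbitrary 300-second budget (both parameters are assumptions, and the function name is hypothetical), could look like this:

```python
import time
import urllib.request


def wait_for_ui(url: str = "http://localhost:8081", timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll url until it returns a non-error HTTP response or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # urlopen raises HTTPError for 4xx/5xx, so reaching here means success
        except OSError:
            # URLError and HTTPError subclass OSError: refused connection, timeout, or error status.
            pass
        time.sleep(interval)
    return False
```

Calling wait_for_ui() after docker compose up --build -d returns True once the frontend responds, or False if it is still unreachable after the timeout.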
Data use
The training/data/cifs folder contains a small subset of the AlphaFold DB comprising 109 proteins.
To use your own protein data:
1. Place your .cif files in the training/data/cifs directory before running run.sh.
2. Ensure your files follow the naming convention AF-[UniProtID]-F1-model_v4.cif.
To use the full AlphaFold DB, download it from here and place the files in the same directory.
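Before starting the containers, it can help to check that every file in training/data/cifs follows the AF-[UniProtID]-F1-model_v4.cif convention. The sketch below is a hypothetical helper, not part of the repository; it assumes the UniProt ID can be matched loosely as uppercase letters and digits.

```python
from __future__ import annotations

import re
from pathlib import Path

# Naming convention from the README: AF-[UniProtID]-F1-model_v4.cif.
# Assumption: the UniProt ID is matched loosely as uppercase letters and digits.
AF_CIF_NAME = re.compile(r"AF-([A-Z0-9]+)-F1-model_v4\.cif")


def uniprot_id_from_filename(name: str) -> str | None:
    """Return the UniProt ID encoded in a conforming file name, else None."""
    match = AF_CIF_NAME.fullmatch(name)
    return match.group(1) if match else None


def nonconforming_cifs(directory: str | Path) -> list[str]:
    """Names of .cif files in directory that do not follow the convention."""
    return sorted(
        p.name for p in Path(directory).glob("*.cif") if uniprot_id_from_filename(p.name) is None
    )


if __name__ == "__main__":
    # Run from the repository root before starting the containers.
    bad = nonconforming_cifs("training/data/cifs")
    print("all files conform" if not bad else f"rename these files: {bad}")
```

Using fullmatch rather than search rejects names that merely contain a valid-looking fragment, such as a second model version appended to the file name.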
Tested on: Ubuntu 22.04 LTS, Fedora Linux 40 (Workstation Edition)
Cite Us
If you use AlphaFind in your research, please cite the following publication:
@article{prochazka2024alphafind,
title={AlphaFind: discover structure similarity across the proteome in AlphaFold DB},
author={Proch{\'a}zka, David and Slanin{\'a}kov{\'a}, Ter{\'e}zia and Olha, Jaroslav and Ro{\v{s}}inec, Adri{\'a}n and Gre{\v{s}}ov{\'a}, Katar{\'\i}na and J{\'a}no{\v{s}}ov{\'a}, Miriama and {\v{C}}ill{\'\i}k, Jakub and Porubsk{\'a}, Jana and Svobodov{\'a}, Radka and Dohnal, Vlastislav and others},
journal={Nucleic Acids Research},
pages={gkae397},
year={2024},
publisher={Oxford University Press}
}
Additional Information
- Publisher: Intelligent Systems for Complex Data Research Group
- Object Identifier: doi/10.5281/zenodo.11085862
- Keywords: similarity search, protein structure, AlphaFold, protein database, web application
- Creators and Active Contributors: Terézia Slanináková, David Procházka
- Inactive Contributors: Jakub Čillík
- Object Type: Software
- Title: AlphaFind: discover structure similarity across the proteome in AlphaFold DB
- Publication Date: 2024-05-15
- Publication: doi.org/10.1093/nar/gkae397
License
MIT license
Owner
- Name: Complex Data Research Group
- Login: Coda-Research-Group
- Kind: organization
- Website: https://disa.fi.muni.cz/complex-data-analysis/
- Repositories: 1
- Profile: https://github.com/Coda-Research-Group
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: AlphaFind
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Terézia
    family-names: Slanináková
    email: slaninakova@ics.muni.cz
    affiliation: >-
      Institute of Computer Science, Masaryk University,
      Brno, 602 00, Czech Republic
    orcid: 'https://orcid.org/0000-0003-0502-1145'
    credit: 0.4
  - given-names: David
    family-names: Procházka
    email: davidprochazka@mail.muni.cz
    orcid: 'https://orcid.org/0009-0000-2765-8329'
    affiliation: >-
      Faculty of Informatics, Masaryk University, Brno, 602
      00, Czech Republic
    credit: 0.4
  - given-names: Jakub
    family-names: Čillík
    email: 524749@mail.muni.cz
    affiliation: >-
      Institute of Computer Science, Masaryk University,
      Brno, 602 00, Czech Republic
    orcid: 'https://orcid.org/0009-0001-7780-3317'
    credit: 0.2
identifiers:
  - type: doi
    value: 10.5281/zenodo.11085862
  - type: url
    value: 'https://alphafind.fi.muni.cz/'
  - type: url
    value: 'https://github.com/Coda-Research-Group/AlphaFind/'
repository-code: 'https://github.com/Coda-Research-Group/AlphaFind/'
url: 'https://alphafind.fi.muni.cz/'
repository-artifact: >-
  https://data.narodni-repozitar.cz/general/datasets/egsm2-7a369
abstract: >-
  AlphaFind is a web-based search engine that allows for
  structure-based search of the entire AlphaFold Protein
  Structure Database. Uniprot ID, PDB ID, or Gene Symbol is
  accepted as input – the engine will return the most
  similar proteins found within AlphaFold DB, with an option
  for additional search to extend and refine the results.
  The search results are grouped by their source organism
  and displayed along with several similarity metrics. 3D
  visualizations of the structural superposition of the
  proteins are provided, and text filters can be used to
  find specific organisms or Uniprot IDs. For details about
  the methodology and usage, please see the manual. This
  website is free and open to all users and there is no
  login requirement.
keywords:
  - similarity search
  - protein similarity
  - AlphaFold
  - proteins
  - web application
  - AFDB
license: MIT
commit: 976511d1ad489b222346bc81e1d89ad49d388924
version: 1.0.0
date-released: '2024-04-29'
GitHub Events
Total
- Create event: 2
- Issues event: 1
- Watch event: 8
- Delete event: 8
- Issue comment event: 1
- Member event: 1
- Push event: 13
- Gollum event: 1
- Pull request review event: 10
- Pull request review comment event: 5
- Pull request event: 7
- Fork event: 1
Last Year
- Create event: 2
- Issues event: 1
- Watch event: 8
- Delete event: 8
- Issue comment event: 1
- Member event: 1
- Push event: 13
- Gollum event: 1
- Pull request review event: 10
- Pull request review comment event: 5
- Pull request event: 7
- Fork event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 27 days
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.33
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 27 days
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.33
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- TerkaSlan (3)
Pull Request Authors
- TerkaSlan (7)
- xcillik (2)
- oeb-fairsoft-evaluator[bot] (1)
- Markek1 (1)
Dependencies
- registry.gitlab.ics.muni.cz:443/alphafind/alphafind-api/alphafind-base build
- black ==23.7.0 development
- flake8 ==6.1.0 development
- isort ==5.12.0 development
- pre-commit * development
- faiss-cpu *
- flask *
- gunicorn *
- h5py *
- numpy *
- pandas *
- prometheus-client *
- psutil *
- torch-summary *
- wandb *
- python 3.8 build
- nginx 1.25.3 build
- node 21.6.0-alpine3.18 build
- 838 dependencies
- @tanstack/eslint-plugin-query ^5.12.1 development
- @testing-library/jest-dom ^6.2.1 development
- @testing-library/react ^14.1.2 development
- @types/node ^20.10.5 development
- @types/react ^18.2.43 development
- @types/react-dom ^18.2.17 development
- @typescript-eslint/eslint-plugin ^6.14.0 development
- @typescript-eslint/parser ^6.14.0 development
- @vitejs/plugin-react-swc ^3.5.0 development
- eslint ^8.55.0 development
- eslint-plugin-react-hooks ^4.6.0 development
- eslint-plugin-react-refresh ^0.4.5 development
- jsdom ^24.0.0 development
- sass ^1.69.5 development
- typescript ^5.3.3 development
- vite ^5.0.8 development
- vitest ^1.2.1 development
- @fortawesome/fontawesome-svg-core ^6.5.1
- @fortawesome/free-brands-svg-icons ^6.5.1
- @fortawesome/free-regular-svg-icons ^6.5.1
- @fortawesome/free-solid-svg-icons ^6.5.1
- @fortawesome/react-fontawesome ^0.2.0
- @tanstack/react-query ^5.14.2
- @tanstack/react-query-devtools ^5.14.5
- @tanstack/react-table ^8.11.2
- bootstrap ^5.3.2
- localforage ^1.10.0
- match-sorter ^6.3.1
- molstar ^4.0.1
- ngl ^2.2.1
- react ^18.2.0
- react-autocomplete-input ^1.0.29
- react-bootstrap ^2.9.1
- react-dom ^18.2.0
- react-router-dom ^6.21.1
- react-top-loading-bar ^2.3.1
- sort-by ^1.2.0
- black ==23.7.0 development
- flake8 ==6.1.0 development
- isort ==5.12.0 development
- pre-commit * development
- faiss-cpu *
- h5py *
- kaleido *
- keras *
- matplotlib *
- numpy *
- pandas *
- pillow *
- psutil *
- py-cpuinfo *
- pyyaml *
- scikit-learn *
- torch-summary *
- tqdm *
- wandb *
