alphafind

AlphaFind: Discover structure similarity across the entire known proteome

https://github.com/coda-research-group/alphafind

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.0%) to scientific vocabulary

Keywords

alphafind alphafold lmi proteins structural-similarity
Last synced: 6 months ago

Repository

AlphaFind: Discover structure similarity across the entire known proteome

Basic Info
  • Host: GitHub
  • Owner: Coda-Research-Group
  • License: MIT
  • Language: TypeScript
  • Default Branch: main
  • Homepage: https://alphafind.fi.muni.cz
  • Size: 6.75 MB
Statistics
  • Stars: 22
  • Watchers: 2
  • Forks: 2
  • Open Issues: 2
  • Releases: 2
Created about 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog Contributing License Citation

README.md




AlphaFind: Discover structure similarity across the entire known proteome


AlphaFind is a web-based search engine for structure-based search of the entire AlphaFold Protein Structure Database. It accepts a UniProt ID, PDB ID, or gene symbol as input and returns the most similar proteins found in AlphaFold DB, with an option to run an additional search that extends and refines the results. The search results are grouped by source organism and displayed along with several similarity metrics. 3D visualizations of the structural superposition of the proteins are provided, and text filters can be used to find specific organisms or UniProt IDs. For details about the methodology and usage, please see the manual. The website is free and open to all users, with no login requirement.

Vector embeddings and model weights used in AlphaFind are available in the Czech national repository under the title "AlphaFind: Discover structure similarity across the entire known proteome data and model" (https://data.narodni-repozitar.cz/general/datasets/egsm2-7a369). This project uses USalign.

Code Structure

The codebase is divided into three folders:

  • training (model training, index building)
  • api (backend)
  • ui (frontend)

See the README.md files in each folder for more details.

Installation and execution

Prerequisites / Dependencies:

  • Docker (version 20.10 or later)
  • Git

Steps

  1. Clone this repository:

         git clone https://github.com/Coda-Research-Group/AlphaFind.git

  2. Run Docker Compose, which will do the following:

    • build the Docker images for api/, ui/ and training/,
    • run the training/ container to prepare the necessary data structures,
    • run the api/ container (the backend),
    • run the ui/ container (the frontend).

         docker compose up --build

     Add the -d switch to run it as a detached process.

  3. Open http://localhost:8081 in your browser.
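Because the training/ container has to finish before the frontend is useful, it can help to poll the UI address rather than refresh the browser by hand. The following is a minimal sketch, not part of the repository: the function name wait_for_ui and its timeout and interval parameters are hypothetical, and it only assumes that the frontend answers plain HTTP at the URL given in the steps above.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url: str = "http://localhost:8081",
                timeout: float = 60.0,
                interval: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Hypothetical helper: the URL is the one from the installation steps,
    the timeout and interval defaults are arbitrary choices.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or an HTTP error status:
            # the service is not ready yet, so retry after a pause.
            pass
        time.sleep(interval)
    return False
```

Calling wait_for_ui() after docker compose up -d --build returns True once the page is served and False if the timeout expires first.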

Data use

The training/data/cifs folder contains a small subset of the AlphaFold DB comprising 109 proteins. The full AlphaFold DB is available for download from the AlphaFold Protein Structure Database website.

To use your own protein data:

  1. Place your .cif files in the training/data/cifs directory before running run.sh.
  2. Ensure your files follow the naming convention: AF-[UniProtID]-F1-model_v4.cif.

To use the full AlphaFold DB, download it and place the files in the same directory.
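The naming convention above can be checked before starting the build, so that misnamed files are caught early. This is an illustrative sketch rather than code from the repository: the helper names are hypothetical, and the UniProt ID part is matched loosely as 6 to 10 uppercase letters and digits, which is an assumption and not a full UniProt accession check.

```python
import re
from pathlib import Path

# Assumed pattern for the convention AF-[UniProtID]-F1-model_v4.cif.
# The ID is matched loosely (6 to 10 uppercase alphanumerics); this is
# an illustrative assumption, not the official UniProt accession grammar.
AFDB_NAME = re.compile(r"AF-([A-Z0-9]{6,10})-F1-model_v4\.cif")


def uniprot_id_from_filename(filename: str):
    """Return the UniProt ID embedded in `filename`, or None if it does not match."""
    match = AFDB_NAME.fullmatch(filename)
    return match.group(1) if match else None


def find_misnamed_cifs(directory: str):
    """List the .cif files in `directory` that do not follow the convention."""
    return sorted(
        path.name
        for path in Path(directory).glob("*.cif")
        if uniprot_id_from_filename(path.name) is None
    )
```

For example, uniprot_id_from_filename("AF-P12345-F1-model_v4.cif") returns "P12345", and find_misnamed_cifs("training/data/cifs") returns an empty list when every file in that folder is named correctly.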

Tested on: Ubuntu 22.04 LTS, Fedora Linux 40 (Workstation Edition)

Cite Us

If you use AlphaFind in your research, please cite the following publication:

@article{prochazka2024alphafind,
  title={AlphaFind: discover structure similarity across the proteome in AlphaFold DB},
  author={Proch{\'a}zka, David and Slanin{\'a}kov{\'a}, Ter{\'e}zia and Olha, Jaroslav and Ro{\v{s}}inec, Adri{\'a}n and Gre{\v{s}}ov{\'a}, Katar{\'\i}na and J{\'a}no{\v{s}}ov{\'a}, Miriama and {\v{C}}ill{\'\i}k, Jakub and Porubsk{\'a}, Jana and Svobodov{\'a}, Radka and Dohnal, Vlastislav and others},
  journal={Nucleic Acids Research},
  pages={gkae397},
  year={2024},
  publisher={Oxford University Press}
}

Additional Information

  • Publisher: Intelligent Systems for Complex Data Research Group
  • Object Identifier: doi/10.5281/zenodo.11085862
  • Keywords: similarity search, protein structure, AlphaFold, protein database, web application
  • Creators and Active Contributors: Terézia Slanináková, David Procházka
  • Inactive Contributors: Jakub Čillík
  • Object Type: Software
  • Title: AlphaFind: discover structure similarity across the proteome in AlphaFold DB
  • Publication Date: 2024-05-15
  • Publication: doi.org/10.1093/nar/gkae397

License

MIT license

Owner

  • Name: Complex Data Research Group
  • Login: Coda-Research-Group
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: AlphaFind
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Terézia
    family-names: Slanináková
    email: slaninakova@ics.muni.cz
    affiliation: >-
      Institute of Computer Science, Masaryk University,
      Brno, 602 00, Czech Republic
    orcid: 'https://orcid.org/0000-0003-0502-1145'
    credit: 0.4
  - given-names: David
    family-names: Procházka
    email: davidprochazka@mail.muni.cz
    orcid: 'https://orcid.org/0009-0000-2765-8329'
    affiliation: >-
      Faculty of Informatics, Masaryk University, Brno, 602
      00, Czech Republic
    credit: 0.4
  - given-names: Jakub
    family-names: Čillík
    email: 524749@mail.muni.cz
    affiliation: >-
      Institute of Computer Science, Masaryk University,
      Brno, 602 00, Czech Republic
    orcid: 'https://orcid.org/0009-0001-7780-3317'
    credit: 0.2
identifiers:
  - type: doi
    value: 10.5281/zenodo.11085862
  - type: url
    value: 'https://alphafind.fi.muni.cz/'
  - type: url
    value: 'https://github.com/Coda-Research-Group/AlphaFind/'
repository-code: 'https://github.com/Coda-Research-Group/AlphaFind/'
url: 'https://alphafind.fi.muni.cz/'
repository-artifact: >-
  https://data.narodni-repozitar.cz/general/datasets/egsm2-7a369
abstract: >-
  AlphaFind is a web-based search engine that allows for
  structure-based search of the entire AlphaFold Protein
  Structure Database. Uniprot ID, PDB ID, or Gene Symbol is
  accepted as input – the engine will return the most
  similar proteins found within AlphaFold DB, with an option
  for additional search to extend and refine the results.
  The search results are grouped by their source organism
  and displayed along with several similarity metrics. 3D
  visualizations of the structural superposition of the
  proteins are provided, and text filters can be used to
  find specific organisms or Uniprot IDs. For details about
  the methodology and usage, please see the manual. This
  website is free and open to all users and there is no
  login requirement.
keywords:
  - similarity search
  - protein similarity
  - AlphaFold
  - proteins
  - web application
  - AFDB
license: MIT
commit: 976511d1ad489b222346bc81e1d89ad49d388924
version: 1.0.0
date-released: '2024-04-29'

GitHub Events

Total
  • Create event: 2
  • Issues event: 1
  • Watch event: 8
  • Delete event: 8
  • Issue comment event: 1
  • Member event: 1
  • Push event: 13
  • Gollum event: 1
  • Pull request review event: 10
  • Pull request review comment event: 5
  • Pull request event: 7
  • Fork event: 1
Last Year
  • Create event: 2
  • Issues event: 1
  • Watch event: 8
  • Delete event: 8
  • Issue comment event: 1
  • Member event: 1
  • Push event: 13
  • Gollum event: 1
  • Pull request review event: 10
  • Pull request review comment event: 5
  • Pull request event: 7
  • Fork event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: 27 days
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.33
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: 27 days
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.33
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • TerkaSlan (3)
Pull Request Authors
  • TerkaSlan (7)
  • xcillik (2)
  • oeb-fairsoft-evaluator[bot] (1)
  • Markek1 (1)
Top Labels
Issue Labels
bug (3)
Pull Request Labels

Dependencies

api/server/Dockerfile docker
  • registry.gitlab.ics.muni.cz:443/alphafind/alphafind-api/alphafind-base build
api/pyproject.toml pypi
api/requirements-dev.txt pypi
  • black ==23.7.0 development
  • flake8 ==6.1.0 development
  • isort ==5.12.0 development
  • pre-commit * development
api/requirements.txt pypi
  • faiss-cpu *
  • flask *
  • gunicorn *
  • h5py *
  • numpy *
  • pandas *
  • prometheus-client *
  • psutil *
  • torch-summary *
  • wandb *
training/Dockerfile docker
  • python 3.8 build
ui/Dockerfile docker
  • nginx 1.25.3 build
  • node 21.6.0-alpine3.18 build
ui/package-lock.json npm
  • 838 dependencies
ui/package.json npm
  • @tanstack/eslint-plugin-query ^5.12.1 development
  • @testing-library/jest-dom ^6.2.1 development
  • @testing-library/react ^14.1.2 development
  • @types/node ^20.10.5 development
  • @types/react ^18.2.43 development
  • @types/react-dom ^18.2.17 development
  • @typescript-eslint/eslint-plugin ^6.14.0 development
  • @typescript-eslint/parser ^6.14.0 development
  • @vitejs/plugin-react-swc ^3.5.0 development
  • eslint ^8.55.0 development
  • eslint-plugin-react-hooks ^4.6.0 development
  • eslint-plugin-react-refresh ^0.4.5 development
  • jsdom ^24.0.0 development
  • sass ^1.69.5 development
  • typescript ^5.3.3 development
  • vite ^5.0.8 development
  • vitest ^1.2.1 development
  • @fortawesome/fontawesome-svg-core ^6.5.1
  • @fortawesome/free-brands-svg-icons ^6.5.1
  • @fortawesome/free-regular-svg-icons ^6.5.1
  • @fortawesome/free-solid-svg-icons ^6.5.1
  • @fortawesome/react-fontawesome ^0.2.0
  • @tanstack/react-query ^5.14.2
  • @tanstack/react-query-devtools ^5.14.5
  • @tanstack/react-table ^8.11.2
  • bootstrap ^5.3.2
  • localforage ^1.10.0
  • match-sorter ^6.3.1
  • molstar ^4.0.1
  • ngl ^2.2.1
  • react ^18.2.0
  • react-autocomplete-input ^1.0.29
  • react-bootstrap ^2.9.1
  • react-dom ^18.2.0
  • react-router-dom ^6.21.1
  • react-top-loading-bar ^2.3.1
  • sort-by ^1.2.0
training/pyproject.toml pypi
training/requirements-dev.txt pypi
  • black ==23.7.0 development
  • flake8 ==6.1.0 development
  • isort ==5.12.0 development
  • pre-commit * development
training/requirements.txt pypi
  • faiss-cpu *
  • h5py *
  • kaleido *
  • keras *
  • matplotlib *
  • numpy *
  • pandas *
  • pillow *
  • psutil *
  • py-cpuinfo *
  • pyyaml *
  • scikit-learn *
  • torch-summary *
  • tqdm *
  • wandb *