https://github.com/google-deepmind/alphamissense

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: science.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary

Keywords from Contributors

jax deep-neural-networks distributed mujoco research
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: google-deepmind
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 202 KB
Statistics
  • Stars: 572
  • Watchers: 27
  • Forks: 71
  • Open Issues: 1
  • Releases: 0
Archived
Created over 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme License

README.md

AlphaMissense

This package provides the AlphaMissense model implementation. This implementation is provided for reference alongside the AlphaMissense 2023 publication and will not be actively maintained moving forward.

We forked the AlphaFold repository and modified it to implement AlphaMissense.

What we provide:

  • Detailed implementation of the AlphaMissense model and training losses (modules_missense.py).

  • The data pipeline to create input features for inference (pipeline_missense.py). The data pipeline requires access to genetic databases for multiple sequence alignments and, if using spatial cropping, protein structures of the AlphaFold Database hosted in Google Cloud Storage. Please see the Genetic databases section and the AFDB readme file to learn how to access these datasets.

  • Pre-computed predictions for all possible human amino acid substitutions and missense variants (hosted here).

What we don’t provide:

  • The trained AlphaMissense model weights.

Access AlphaMissense predictions:

Predictions for human major transcripts and isoforms are provided here. You can use these files with the Ensembl VEP tool and AlphaMissense plug-in.
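
As a rough illustration of working with a predictions file directly in Python, the sketch below parses a small in-memory sample. The column names (`uniprot_id`, `protein_variant`, `am_pathogenicity`, `am_class`), the `#`-prefixed header line, and the sample rows are assumptions made for illustration and are not taken from this README; check the header of the file you download before relying on them. The helper names `read_predictions` and `scores_for_protein` are hypothetical.

```python
import io

# Hypothetical sample mimicking a tab-separated predictions file.
# The layout and the values are illustrative assumptions, not real predictions.
SAMPLE_TSV = (
    "# Illustrative comment line\n"
    "#uniprot_id\tprotein_variant\tam_pathogenicity\tam_class\n"
    "P00001\tM1A\t0.25\tlikely_benign\n"
    "P00001\tM1C\t0.91\tlikely_pathogenic\n"
    "P00002\tK7R\t0.40\tambiguous\n"
)


def read_predictions(handle):
    """Yields one dict per variant row, skipping leading comment lines."""
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # The last '#' line before the data is treated as the header.
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("No header line found before the first data row.")
        row = dict(zip(header, line.split("\t")))
        row["am_pathogenicity"] = float(row["am_pathogenicity"])
        yield row


def scores_for_protein(handle, uniprot_id):
    """Returns {protein_variant: score} for a single protein accession."""
    return {
        row["protein_variant"]: row["am_pathogenicity"]
        for row in read_predictions(handle)
        if row["uniprot_id"] == uniprot_id
    }


scores = scores_for_protein(io.StringIO(SAMPLE_TSV), "P00001")
print(scores)
```

For a gzip-compressed download, a text handle from `gzip.open(path, "rt")` could be passed in place of the `io.StringIO` object.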

Installation

  1. Install all dependencies:

         sudo apt install python3.11-venv aria2 hmmer

  2. Clone this repository and cd into it:

         git clone https://github.com/deepmind/alphamissense.git
         cd ./alphamissense

  3. Set up a Python virtual environment and install the Python dependencies:

         python3 -m venv ./venv
         venv/bin/pip install -r requirements.txt
         venv/bin/pip install -e .

  4. Test the installation:

         venv/bin/python test/test_installation.py
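
Before running the data pipeline, it can help to confirm that the command-line tools installed in step 1 are actually on the `PATH`. The following is a minimal sketch, not part of this repository: the helper name `find_tools` is hypothetical, and the executable names (`jackhmmer` from the HMMER suite, `aria2c` from the aria2 package) are assumptions about what those packages install on your system.

```python
import shutil


def find_tools(names):
    """Maps each executable name to its resolved path, or None if not found."""
    return {name: shutil.which(name) for name in names}


# Assumed executable names; adjust to match your system.
tools = find_tools(["jackhmmer", "aria2c"])
for name, path in tools.items():
    print(f"{name}: {path if path else 'NOT FOUND'}")
```

The resolved `jackhmmer` path, if found, is the kind of value the data pipeline expects for its jackhmmer binary path argument.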

Usage

Because we are not releasing the trained model weights, the code is not meant to be used for making new predictions, but to serve as an implementation reference. We are releasing the data pipeline, model and loss function code.

The data pipeline requires a FASTA file (i.e. `protein_sequence_file`) containing all target sequences, and the genetic sequence databases outlined in the Genetic databases section below.

```python
from alphamissense.data import pipeline_missense

DATABASES_DIR = ...  # Root directory of the downloaded genetic databases.
msa_output_dir = ...  # Directory where the computed MSAs are written.

protein_sequence_file = ...
pipeline = pipeline_missense.DataPipeline(
    jackhmmer_binary_path=...,  # Typically '/usr/bin/jackhmmer'.
    protein_sequence_file=protein_sequence_file,
    uniref90_database_path=DATABASES_DIR + '/uniref90/uniref90.fasta',
    mgnify_database_path=DATABASES_DIR + '/mgnify/mgy_clusters_2022_05.fa',
    small_bfd_database_path=DATABASES_DIR + '/small_bfd/bfd-first_non_consensus_sequences.fasta',
)

sample = pipeline.process(
    protein_id=...,  # Sequence identifier in the FASTA file.
    reference_aa=...,  # Single capital letter, e.g. 'A'.
    alternate_aa=...,
    position=...,  # Integer, note that the position is 1-based!
    msa_output_dir=msa_output_dir,
)
```
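
Because `position` is 1-based while Python indexing is 0-based, an off-by-one error is easy to make when preparing inputs. The sketch below checks a variant against its target sequence before it is handed to the pipeline. The helpers `read_fasta` and `check_variant` are hypothetical and not part of the `alphamissense` package, and the sequence is made up for illustration.

```python
import io


def read_fasta(handle):
    """Parses FASTA text into {identifier: sequence}."""
    sequences = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            current_id = line[1:].split()[0]
            sequences[current_id] = []
        elif current_id is not None:
            sequences[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in sequences.items()}


def check_variant(sequences, protein_id, reference_aa, alternate_aa, position):
    """Raises ValueError if the variant is inconsistent with the sequence."""
    sequence = sequences[protein_id]
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside 1..{len(sequence)}")
    observed = sequence[position - 1]  # Convert 1-based to 0-based.
    if observed != reference_aa:
        raise ValueError(
            f"expected {reference_aa} at position {position}, found {observed}")
    if alternate_aa == reference_aa:
        raise ValueError("alternate_aa must differ from reference_aa")


fasta_text = ">example_protein made-up sequence\nMKTAYIAK\nQRQISFVK\n"
sequences = read_fasta(io.StringIO(fasta_text))
check_variant(sequences, "example_protein", "K", "R", position=2)  # Passes.
```

Running a check like this first means a mismatched reference residue surfaces as a clear error rather than as a silently mislabelled variant.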

The model is implemented as a JAX module and can be instantiated, for example, as:

```python
import haiku as hk
import jax

from alphamissense.model import config
from alphamissense.model import modules_missense


def _forward_fn(batch):
    model = modules_missense.AlphaMissense(config.model_config().model)
    return model(batch, is_training=False, return_representations=False)


random_seed = 0
prng = jax.random.PRNGKey(random_seed)

# Initialise parameters from the sample, then run a jitted forward pass.
params = hk.transform(_forward_fn).init(prng, sample)
apply = jax.jit(hk.transform(_forward_fn).apply)
output = apply(params, prng, sample)
```

For example, at this point the score of the variant would be stored in `output['logit_diff']['variant_pathogenicity']`.

Genetic databases

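AlphaMissense used multiple genetic (sequence) databases for multiple sequence alignments, including the UniRef90, MGnify and small BFD databases referenced in the data pipeline example above.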

We refer to the AlphaFold repository for instructions on how to download these databases.

Citing this work

Any publication that discloses findings arising from using this source code should cite:

    @article{AlphaMissense2023,
      author  = {Jun Cheng and Guido Novati and Joshua Pan and Clare Bycroft and Akvilė Žemgulytė and Taylor Applebaum and Alexander Pritzel and Lai Hong Wong and Michal Zielinski and Tobias Sargeant and Rosalia G. Schneider and Andrew W. Senior and John Jumper and Demis Hassabis and Pushmeet Kohli and Žiga Avsec},
      journal = {Science},
      title   = {Accurate proteome-wide missense variant effect prediction with AlphaMissense},
      year    = {2023},
      doi     = {10.1126/science.adg7492},
      URL     = {https://www.science.org/doi/10.1126/science.adg7492},
    }

Acknowledgements

AlphaMissense communicates with and/or references the following separate libraries and packages:

  • Abseil
  • Biopython
  • HMMER Suite
  • Haiku
  • Immutabledict
  • JAX
  • Matplotlib
  • NumPy
  • Pandas
  • SciPy
  • Tree
  • Zstandard

We thank all their contributors and maintainers!

License and Disclaimer

This is not an officially supported Google product.

The AlphaMissense Database contains predictions with varying levels of confidence; caution should be exercised in use. The information provided is not intended to be a substitute for professional medical advice, diagnosis, or treatment, and does not constitute medical or other professional advice. AlphaMissense has not been validated for, and is not approved for, any clinical use.

Copyright 2023 DeepMind Technologies Limited.

AlphaMissense Code License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

AlphaMissense predictions License

AlphaMissense predictions are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0) (“CC License”). You may obtain a copy of the CC License at https://creativecommons.org/licenses/by/4.0/legalcode.

Third-party software

Use of the third-party software, libraries or code referred to in the Acknowledgements section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.

Owner

  • Name: Google DeepMind
  • Login: google-deepmind
  • Kind: organization

GitHub Events

Total
  • Watch event: 82
  • Fork event: 11
Last Year
  • Watch event: 82
  • Fork event: 11

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 15
  • Total Committers: 8
  • Avg Commits per committer: 1.875
  • Development Distribution Score (DDS): 0.667
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Guido Novati g****n@g****m 5
Ziga Avsec a****c@g****m 3
Jun Cheng j****g@g****m 2
Peter Hawkins p****s@g****m 1
Matthew Johnson m****j@g****m 1
DeepMind n****y@g****m 1
Jake VanderPlas v****s@g****m 1
DeepMind n****y@d****m 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 9 minutes
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 3.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • andrewyatz (1)
  • eunos-1128 (1)