2025-asr-plms

Exploring the intersection of Ancestral Sequence Reconstruction, Protein Language Models, and Consensus Bias

https://github.com/arcadia-science/2025-asr-plms

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Arcadia-Science
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 85 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created 10 months ago · Last pushed 7 months ago
Metadata Files
Readme License Citation Authors

README.md

Do protein language models understand evolution? Mixed evidence from ancestral sequences and ESM2.

This repository contains, or points to, all materials required to create and host the publication "Do protein language models understand evolution? Mixed evidence from ancestral sequences and ESM2."

The publication is available under DOI 10.57844/arcadia-5cwu-spn8.

Data Description

Ancestral Sequence Generation

Ancestral sequences were reconstructed using the workflow in ASR/ASR_notebook.ipynb. All sequence inputs and outputs are in the ASR/ directory.

ESM2 Pseudo-perplexity Calculation

ESM2 pseudo-perplexity scores were calculated using ESM2_scoring/esm2_pppl_calculator.py on a GPU-enabled AWS EC2 instance. All inputs and outputs are in the ESM2_scoring/ directory.
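
For readers unfamiliar with the metric, here is a minimal sketch of how a pseudo-perplexity is commonly defined for a masked language model: the exponential of the negative mean log-probability of each residue when that residue alone is masked. This is an illustration under stated assumptions, not the repository's script. The names `pseudo_perplexity`, `masked_log_prob`, and `uniform_scorer` are hypothetical, and the toy scorer stands in for a real ESM2 forward pass.

```python
import math
from typing import Callable


def pseudo_perplexity(
    sequence: str,
    masked_log_prob: Callable[[str, int], float],
) -> float:
    """Return exp(-(1/L) * sum_i log p(x_i | sequence with position i masked)).

    `masked_log_prob(sequence, i)` is a hypothetical callable that returns the
    natural-log probability a model assigns to the true residue at position i
    when that position is masked. With ESM2 this would require one masked
    forward pass per position; here it is left abstract.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    total_log_prob = sum(masked_log_prob(sequence, i) for i in range(len(sequence)))
    return math.exp(-total_log_prob / len(sequence))


def uniform_scorer(sequence: str, position: int) -> float:
    """Toy stand-in for a model: every one of the 20 amino acids is equally likely."""
    return math.log(1.0 / 20.0)


if __name__ == "__main__":
    # A uniform model is maximally uncertain, so the score is close to 20.
    print(pseudo_perplexity("MKTAYIAK", uniform_scorer))
```

Lower values mean the model finds the sequence more predictable. A score near 1 corresponds to near-certain predictions at every position, and a score near 20 corresponds to guessing uniformly among the standard amino acids.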

Reproduce

Please see SETUP.qmd.

Contribute

Please see CONTRIBUTING.qmd.

Owner

  • Name: Arcadia Science
  • Login: Arcadia-Science
  • Kind: organization
  • Location: United States of America

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite the associated publication.
title: Do protein language models understand evolution? Mixed evidence from ancestral
  sequences and ESM2
doi: 10.57844/arcadia-5cwu-spn8
authors:
- family-names: Kiefl
  given-names: Evan
  affiliation: Arcadia Science
  orcid: https://orcid.org/0000-0002-6473-0921
- family-names: Nocedal
  given-names: Isabel
  affiliation: Arcadia Science
  orcid: https://orcid.org/0000-0002-4706-1113
- family-names: York
  given-names: Ryan
  affiliation: Arcadia Science
  orcid: https://orcid.org/0000-0002-1073-1494
preferred-citation:
  title: Do protein language models understand evolution? Mixed evidence from ancestral
    sequences and ESM2
  type: article
  doi: 10.57844/arcadia-5cwu-spn8
  authors:
  - family-names: Nocedal
    given-names: Isabel
    affiliation: Arcadia Science
    orcid: https://orcid.org/0000-0002-4706-1113
  - family-names: York
    given-names: Ryan
    affiliation: Arcadia Science
    orcid: https://orcid.org/0000-0002-1073-1494
  year: 2025

GitHub Events

Total
  • Release event: 1
  • Delete event: 1
  • Issue comment event: 5
  • Push event: 9
  • Pull request event: 10
  • Create event: 4
Last Year
  • Release event: 1
  • Delete event: 1
  • Issue comment event: 5
  • Push event: 9
  • Pull request event: 10
  • Create event: 4

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • astral-sh/setup-uv v5 composite
  • peter-evans/create-pull-request v7 composite
.github/workflows/publish.yml actions
  • actions/checkout v4 composite
  • quarto-dev/quarto-actions/publish v2 composite
  • quarto-dev/quarto-actions/setup v2 composite
pyproject.toml pypi