2025-asr-plms
Exploring the intersection of Ancestral Sequence Reconstruction, Protein Language Models, and Consensus Bias
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (4.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Arcadia-Science
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 85 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Do protein language models understand evolution? Mixed evidence from ancestral sequences and ESM2.
This code repository contains, or points to, all materials required to create and host the publication "Do protein language models understand evolution? Mixed evidence from ancestral sequences and ESM2".
The publication is hosted at this URL.
Data Description
Ancestral Sequence Generation
Ancestral sequences were reconstructed using the workflow in ASR/ASR_notebook.ipynb. All sequence inputs and outputs are in the ASR/ directory.
ESM2 Pseudo-perplexity Calculation
ESM2 pseudo-perplexity scores were calculated using ESM2_scoring/esm2_pppl_calculator.py on a GPU-enabled AWS EC2 instance. All inputs and outputs are in the ESM2_scoring/ directory.
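For readers unfamiliar with the metric, here is a minimal sketch of how a pseudo-perplexity score is commonly defined for a masked language model: mask each position in turn, take the log-probability the model assigns to the true residue, and exponentiate the negative mean. This is not the repository's calculator script. The `predict_masked` callable and the `uniform_predictor` stand-in are hypothetical names introduced for illustration; a real run would wrap ESM2 behind the same interface.

```python
import math
from typing import Callable, Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical interface (an assumption, not the repository's API):
# given a sequence and the index of the masked position, return a
# probability for each candidate residue at that position.
MaskedPredictor = Callable[[str, int], Dict[str, float]]


def pseudo_perplexity(sequence: str, predict_masked: MaskedPredictor) -> float:
    """Return exp(-(1/L) * sum_i log p(x_i | sequence with position i masked))."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    total_log_prob = 0.0
    for i, residue in enumerate(sequence):
        # Probability the model assigns to the true residue when it is masked.
        probs = predict_masked(sequence, i)
        total_log_prob += math.log(probs[residue])
    return math.exp(-total_log_prob / len(sequence))


def uniform_predictor(sequence: str, position: int) -> Dict[str, float]:
    """Toy stand-in for a language model: every residue is equally likely."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    print(pseudo_perplexity("MKTAYIAKQR", uniform_predictor))
```

Lower values mean the model finds the sequence more predictable. With the uniform stand-in the score is approximately the alphabet size, which makes a convenient sanity check before swapping in a real model.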
Reproduce
Please see SETUP.qmd.
Contribute
Please see CONTRIBUTING.qmd.
Owner
- Name: Arcadia Science
- Login: Arcadia-Science
- Kind: organization
- Location: United States of America
- Website: https://www.arcadiascience.com/
- Twitter: ArcadiaScience
- Repositories: 16
- Profile: https://github.com/Arcadia-Science
Citation (CITATION.cff)
cff-version: 1.2.0
message: If you use this software, please cite the associated publication.
title: Do protein language models understand evolution? Mixed evidence from ancestral sequences and ESM2
doi: 10.57844/arcadia-5cwu-spn8
authors:
  - family-names: Kiefl
    given-names: Evan
    affiliation: Arcadia Science
    orcid: https://orcid.org/0000-0002-6473-0921
  - family-names: Nocedal
    given-names: Isabel
    affiliation: Arcadia Science
    orcid: https://orcid.org/0000-0002-4706-1113
  - family-names: York
    given-names: Ryan
    affiliation: Arcadia Science
    orcid: https://orcid.org/0000-0002-1073-1494
preferred-citation:
  title: Do protein language models understand evolution? Mixed evidence from ancestral sequences and ESM2
  type: article
  doi: 10.57844/arcadia-5cwu-spn8
  authors:
    - family-names: Nocedal
      given-names: Isabel
      affiliation: Arcadia Science
      orcid: https://orcid.org/0000-0002-4706-1113
    - family-names: York
      given-names: Ryan
      affiliation: Arcadia Science
      orcid: https://orcid.org/0000-0002-1073-1494
  year: 2025
GitHub Events
Total
- Release event: 1
- Delete event: 1
- Issue comment event: 5
- Push event: 9
- Pull request event: 10
- Create event: 4
Last Year
- Release event: 1
- Delete event: 1
- Issue comment event: 5
- Push event: 9
- Pull request event: 10
- Create event: 4
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- astral-sh/setup-uv v5 composite
- peter-evans/create-pull-request v7 composite
- actions/checkout v4 composite
- quarto-dev/quarto-actions/publish v2 composite
- quarto-dev/quarto-actions/setup v2 composite