https://github.com/google-deepmind/ithaca

Restoring and attributing ancient texts using deep neural networks

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: google-deepmind
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 2.38 MB
Statistics
  • Stars: 567
  • Watchers: 15
  • Forks: 62
  • Open Issues: 1
  • Releases: 0
Created over 4 years ago · Last pushed over 2 years ago
Metadata Files
Readme Contributing License Authors

README.md

Restoring and attributing ancient texts using deep neural networks

Yannis Assael¹*, Thea Sommerschield²˒³*, Brendan Shillingford¹, Mahyar Bordbar¹, John Pavlopoulos⁴, Marita Chatzipanagiotou⁴, Ion Androutsopoulos⁴, Jonathan Prag³, Nando de Freitas¹

¹ DeepMind, United Kingdom
² Ca’ Foscari University of Venice, Italy
³ University of Oxford, United Kingdom
⁴ Athens University of Economics and Business, Greece
\* Authors contributed equally to this work



Ancient History relies on disciplines such as Epigraphy, the study of inscribed texts known as "inscriptions", for evidence of the thought, language, society and history of past civilizations. However, over the centuries many inscriptions have been damaged to the point of illegibility or transported far from their original location, and their date of writing is steeped in uncertainty. We present Ithaca, the first Deep Neural Network for the textual restoration and the geographical and chronological attribution of ancient Greek inscriptions. Ithaca is designed to assist and expand the historian’s workflow: its architecture focuses on collaboration, decision support, and interpretability.

Restoration of damaged inscription: this inscription (IG I³ 4B) records a decree concerning the Acropolis of Athens and dates to 485/4 BCE. (CC BY-SA 3.0, Wikimedia)

While Ithaca alone achieves 62% accuracy when restoring damaged texts, historians who use Ithaca improve their own accuracy from 25% to 72%, confirming the synergistic effect of this research aid. Ithaca can attribute inscriptions to their original location with 71% accuracy and can date them to less than 30 years from their ground-truth ranges, redating key texts of Classical Athens and contributing to topical debates in Ancient History. This work shows how models like Ithaca can unlock the cooperative potential between AI and historians, transforming the way we study and write about one of the most significant periods in human history.
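To make the dating figure concrete, here is a minimal sketch of one way to measure how far a predicted year falls from a ground-truth date range. It assumes the distance is zero inside the range and otherwise the gap to the nearest bound; the function name `years_from_range` and the signed-year convention (BCE as negative numbers, ignoring the missing year 0) are illustrative assumptions, not the metric code used by the authors.

```python
def years_from_range(predicted_year: float, range_start: int, range_end: int) -> float:
    """Distance in years between a predicted year and a ground-truth interval.

    Years BCE are written as negative numbers (assumption for this sketch).
    Returns 0.0 when the prediction falls inside the interval.
    """
    if range_start > range_end:
        raise ValueError("range_start must not be later than range_end")
    if predicted_year < range_start:
        return float(range_start - predicted_year)
    if predicted_year > range_end:
        return float(predicted_year - range_end)
    return 0.0


# A text dated 485/4 BCE, with a hypothetical prediction of 470 BCE.
distance = years_from_range(-470, -485, -484)
print(distance)
```

Under this convention a prediction anywhere inside the ground-truth interval counts as an exact hit, so only predictions outside the interval contribute to the average distance.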

Ithaca's architecture processing the phrase "δήμο το αθηναίων" ("the people of Athens"). The first 3 characters of the phrase were hidden and their restoration is proposed. In tandem, Ithaca also predicts the inscription’s region and date.
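The masking step described in the caption can be sketched as follows. This is only an illustration of hiding the first three characters of the phrase; the mask symbol `"?"` and the helper name `hide_prefix` are assumptions made for this sketch and are not taken from the ithaca library.

```python
import unicodedata

# Assumption: "?" stands for a character the model is asked to restore.
MASK_CHAR = "?"


def hide_prefix(text: str, n_hidden: int, mask_char: str = MASK_CHAR) -> str:
    """Return `text` with its first `n_hidden` characters replaced by `mask_char`."""
    # Normalise to NFC so an accented letter such as "ή" counts as one character.
    text = unicodedata.normalize("NFC", text)
    if not 0 <= n_hidden <= len(text):
        raise ValueError("n_hidden must be between 0 and len(text)")
    return mask_char * n_hidden + text[n_hidden:]


phrase = "δήμο το αθηναίων"
masked = hide_prefix(phrase, 3)
print(masked)
```

The masked string keeps the same length as the input, so each proposed restoration can be aligned character by character with the hidden span.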

References

When using any of this project's source code, please cite:

@article{asssome2022restoring,
  title = {Restoring and attributing ancient texts using deep neural networks},
  author = {Assael*, Yannis and Sommerschield*, Thea and Shillingford, Brendan and Bordbar, Mahyar and Pavlopoulos, John and Chatzipanagiotou, Marita and Androutsopoulos, Ion and Prag, Jonathan and de Freitas, Nando},
  doi = {10.1038/s41586-022-04448-z},
  journal = {Nature},
  year = {2022}
}

Ithaca inference online

To aid further research in the field, we created an online interactive Python notebook where researchers can query one of our trained models to get text restorations, visualise attention weights, and more.

Ithaca inference offline

Advanced users who want to perform inference using the trained model may want to do so manually using the ithaca library directly.

First, to install the ithaca library and its dependencies, run:

```sh
pip install .
```

Then, download the model via:

```sh
curl --output checkpoint.pkl https://storage.googleapis.com/ithaca-resources/models/checkpoint_v1.pkl
```

An example of using the library can be run via:

```sh
python inference_example.py --input_file=example_input.txt
```

which will run restoration and attribution on the text in example_input.txt.

To run it with different input text, run:

```sh
python inference_example.py --input="..."
```

or using text in a UTF-8 encoded text file:

```sh
python inference_example.py --input_file=some_other_input_file.txt
```

The restoration or attribution JSON can be saved to a file:

```sh
python inference_example.py \
  --input_file=example_input.txt \
  --attribute_json=attribute.json \
  --restore_json=restore.json
```

For full help, run:

```sh
python inference_example.py --help
```
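For batch use, the same command line can be assembled from Python. The sketch below only builds the argument list from the flags shown above (`--input_file`, `--attribute_json`, `--restore_json`); the helper name `build_inference_command` is hypothetical, and the sketch assumes it is run from the repository root where `inference_example.py` lives.

```python
import shlex
import sys
from typing import List, Optional


def build_inference_command(
    input_file: str,
    attribute_json: Optional[str] = None,
    restore_json: Optional[str] = None,
    script: str = "inference_example.py",
) -> List[str]:
    """Assemble the argument list for one run of the inference example script."""
    cmd = [sys.executable, script, f"--input_file={input_file}"]
    if attribute_json is not None:
        cmd.append(f"--attribute_json={attribute_json}")
    if restore_json is not None:
        cmd.append(f"--restore_json={restore_json}")
    return cmd


cmd = build_inference_command("example_input.txt", "attribute.json", "restore.json")
# Print the command instead of running it; pass `cmd` to subprocess.run to execute.
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces or non-ASCII characters.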

Dataset generation

Ithaca was trained on The Packard Humanities Institute’s "Searchable Greek Inscriptions" public dataset. The processing workflow for generating the machine-actionable text and metadata, as well as further details on the train, validation and test splits are available at I.PHI dataset.

Training Ithaca

See train/README.md for instructions.

License

Apache License, Version 2.0

Owner

  • Name: Google DeepMind
  • Login: google-deepmind
  • Kind: organization

GitHub Events

Total
  • Watch event: 20
  • Fork event: 10
Last Year
  • Watch event: 20
  • Fork event: 10

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 11
  • Total Committers: 4
  • Avg Commits per committer: 2.75
  • Development Distribution Score (DDS): 0.364
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| Yannis Assael | y****l@g****m | 7 |
| Brendan Shillingford | s****d@g****m | 2 |
| Peter Hawkins | p****s@g****m | 1 |
| Jake VanderPlas | v****s@g****m | 1 |
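The all-time figures above can be reproduced from the per-committer counts. This sketch assumes the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is an assumption here, although it does reproduce the listed value of 0.364.

```python
# Commit counts copied from the Top Committers listing.
commits = {
    "Yannis Assael": 7,
    "Brendan Shillingford": 2,
    "Peter Hawkins": 1,
    "Jake VanderPlas": 1,
}

total_commits = sum(commits.values())
avg_per_committer = total_commits / len(commits)

# Assumed definition: DDS = 1 - (commits by the top committer / total commits).
dds = 1 - max(commits.values()) / total_commits

print(total_commits, round(avg_per_committer, 2), round(dds, 3))
```

A score near 0 means one person wrote almost every commit, while a score near 1 means the work is spread evenly across many committers.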

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 6
  • Total pull requests: 0
  • Average time to close issues: 21 days
  • Average time to close pull requests: N/A
  • Total issue authors: 4
  • Total pull request authors: 0
  • Average comments per issue: 1.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • RobbeW (3)
  • AK391 (1)
  • brunocgf (1)

Dependencies

requirements.txt pypi
  • absl-py ==0.13.0
  • chex ==0.0.8
  • dm-haiku ==0.0.5
  • flax ==0.3.6
  • jax ==0.2.21
  • ml-collections ==0.1.1
  • numpy >=1.18.0
  • tqdm >=4.62.2
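The pins above are displayed with a space before the operator. As a small sketch, the entries can be normalised into the compact `name==version` form that pip requirement files conventionally use; the helper name `to_pip_requirement` is hypothetical, and the pattern handles only the two operators that appear in this listing.

```python
import re

# Entries copied from the requirements.txt listing.
LISTED = [
    "absl-py ==0.13.0",
    "chex ==0.0.8",
    "dm-haiku ==0.0.5",
    "flax ==0.3.6",
    "jax ==0.2.21",
    "ml-collections ==0.1.1",
    "numpy >=1.18.0",
    "tqdm >=4.62.2",
]

# Package name, then "==" or ">=", then a version; whitespace is optional.
_PATTERN = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(==|>=)\s*([0-9][0-9A-Za-z.]*)\s*$"
)


def to_pip_requirement(entry: str) -> str:
    """Turn 'name ==1.2.3' into 'name==1.2.3'."""
    match = _PATTERN.match(entry)
    if match is None:
        raise ValueError(f"unrecognised requirement: {entry!r}")
    name, operator, version = match.groups()
    return f"{name}{operator}{version}"


requirements = [to_pip_requirement(entry) for entry in LISTED]
print("\n".join(requirements))
```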
setup.py pypi