duosearch
Search engine for historical documents that uses ElasticSearch and deep neural networks to cope with OCR errors and historical spelling variation.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.7%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
DuoSearch
A novel search engine for historical newspapers utilizing ElasticSearch and machine learning methods. Code for the paper https://arxiv.org/abs/2305.19392
Purpose
The purpose of this research is to build a proof-of-concept search engine that addresses two issues: OCR errors and the orthographic variation introduced by Bulgarian language reforms between the 1850s and 1945.
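One common way to make a search tolerant of OCR errors is a fuzzy match query. The following is a minimal sketch of an Elasticsearch request body that uses the standard `fuzziness` parameter of the `match` query. It is not taken from the DuoSearch code: the function name `build_fuzzy_query` and the field name `content` are assumptions made for illustration.

```python
def build_fuzzy_query(text, field="content"):
    """Build an Elasticsearch request body for a typo-tolerant match query.

    `field` is a hypothetical field name; use whatever field holds the
    OCR text in the actual index mapping.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    # "AUTO" lets Elasticsearch choose the allowed edit
                    # distance based on the length of each term.
                    "fuzziness": "AUTO",
                }
            }
        }
    }


# The resulting dictionary can be sent as the body of a search request.
body = build_fuzzy_query("вестникъ")
```

Fuzzy matching only covers small character-level differences, so it would likely need to be combined with other techniques to handle systematic spelling changes.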
Scope
This is a proof-of-concept (PoC) version that can be used with collections of digitised historical documents from the same period. The tool uses dictionaries for Bulgarian, but it can be easily adapted to other languages by replacing them.
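To illustrate how a dictionary-driven approach can be swapped between languages, here is a minimal sketch of a normaliser that maps historical spellings to modern ones. It is an assumption-laden illustration rather than the DuoSearch implementation: the names `VARIANTS`, `CHAR_RULES`, and `normalize` are hypothetical, the dictionary holds a single sample entry, and the rules cover only two well-known changes of the 1945 reform (ѫ replaced by ъ, and silent word-final ъ/ь dropped).

```python
# Hypothetical whole-word dictionary: historical form -> modern form.
# A real dictionary would be far larger; this one entry is only a sample.
VARIANTS = {"млѣко": "мляко"}

# Hypothetical character-level fallback rules, deliberately simplified.
CHAR_RULES = {"ѫ": "ъ"}

# Word-final letters that were silent in the pre-1945 orthography.
SILENT_FINAL = "ъь"


def normalize(word, variants=VARIANTS, char_rules=CHAR_RULES):
    """Map a historical spelling to a modern one (simplified sketch)."""
    w = word.lower()
    # 1. Prefer an exact dictionary entry when one exists.
    if w in variants:
        return variants[w]
    # 2. Otherwise apply character substitutions.
    for old, new in char_rules.items():
        w = w.replace(old, new)
    # 3. Drop a silent word-final er.
    if len(w) > 1 and w[-1] in SILENT_FINAL:
        w = w[:-1]
    return w


print(normalize("градъ"))
print(normalize("пѫть"))
print(normalize("млѣко"))
```

Because the dictionary and the rule table are passed in as parameters, supporting another language in this sketch would mean supplying different tables rather than changing the function.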
Target audience
This research is useful for anyone interested in search tools for collections of historical documents or newspapers that contain errors and/or linguistic variation. The target user of the engine is a library in Bulgaria, but the engine can be adapted for use by external users as well.
Architecture

Citation
Beshirov, A., Hadzhieva, S., Dobreva, M., & Koychev, I. (2022). DuoSearch: A Novel Search Engine for Bulgarian Historical Documents. Proceedings of the European Conference on Information Retrieval. https://doi.org/10.1007/978-3-030-99739-7_31
Owner
- Login: angelbeshirov
- Kind: user
- Location: Sofia, Bulgaria
- Repositories: 3
- Profile: https://github.com/angelbeshirov
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this code, please cite our paper:"
title: "DuoSearch: A Novel Search Engine for Bulgarian Historical Documents"
authors:
  - family-names: Beshirov
    given-names: Angel
    orcid: https://orcid.org/0000-0002-0684-2730
  - family-names: Hadzhieva
    given-names: Suzan
    orcid: https://orcid.org/0000-0002-1480-1437
  - family-names: Dobreva
    given-names: Milena
    orcid: https://orcid.org/0000-0002-2579-7541
  - family-names: Koychev
    given-names: Ivan
    orcid: https://orcid.org/0000-0003-3919-030X
date-released: 2022-04-05
doi: 10.1007/978-3-030-99739-7_31
preferred-citation:
  type: article
  title: "DuoSearch: A Novel Search Engine for Bulgarian Historical Documents"
  authors:
    - family-names: Beshirov
      given-names: Angel
    - family-names: Hadzhieva
      given-names: Suzan
    - family-names: Dobreva
      given-names: Milena
    - family-names: Koychev
      given-names: Ivan
  journal: "Proceedings of the European Conference on Information Retrieval"
  year: 2022
  doi: 10.1007/978-3-030-99739-7_31
GitHub Events
Total
- Push event: 2
Last Year
- Push event: 2