https://github.com/acdh-oeaw/bd-data

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.2%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: acdh-oeaw
  • Language: Python
  • Default Branch: main
  • Size: 19.6 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 11 months ago
Metadata Files
Readme

README.md

BD-DATA

Repo to process data from:

Czeitschner, U., & Krautgartner, B. (2018). travel!digital Collection (U. Czeitschner & B. Krautgartner, Eds.; Version 1) [Data set]. ARCHE. https://hdl.handle.net/21.11115/0000-000C-29F3-4

The idea is to process the data so that it can easily be (re)published using dse-static-cookiecutter.

Tasks

  • Split files: Each page should be contained in a single TEI/XML document
  • Add proper headers: Each TEI/XML file should have a proper TEI header
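
The two tasks above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: it uses only the standard library instead of the project's dependencies, the function name split_pages and the placeholder header texts are hypothetical, and it assumes that every <pb/> is a direct child of <body>. Real TEI files usually nest page breaks deeper, which is presumably why the repo ships a dedicated Milestone class.

```python
import copy
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def split_pages(source: str, title: str) -> list[str]:
    """Split one TEI document into one minimal TEI document per page.

    Simplifying assumption: every <pb/> is a direct child of <body>.
    Content that precedes the first <pb/> is ignored.
    """
    root = ET.fromstring(source)
    body = root.find(f".//{tei('body')}")
    if body is None:
        raise ValueError("no <body> element found")

    # Group the children of <body> into pages, starting a new page at each <pb/>.
    pages: list[list[ET.Element]] = []
    for child in body:
        if child.tag == tei("pb"):
            pages.append([child])
        elif pages:
            pages[-1].append(child)

    documents = []
    for number, elements in enumerate(pages, start=1):
        doc = ET.Element(tei("TEI"))
        # Minimal header: titleStmt, publicationStmt and sourceDesc in a fileDesc.
        header = ET.SubElement(doc, tei("teiHeader"))
        file_desc = ET.SubElement(header, tei("fileDesc"))
        title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
        ET.SubElement(title_stmt, tei("title")).text = f"{title}, page {number}"
        publication = ET.SubElement(file_desc, tei("publicationStmt"))
        ET.SubElement(publication, tei("p")).text = "Placeholder publication statement"
        source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
        ET.SubElement(source_desc, tei("p")).text = "Derived from a multi-page TEI file"
        # Copy the page content into a fresh <text>/<body>.
        text = ET.SubElement(doc, tei("text"))
        new_body = ET.SubElement(text, tei("body"))
        new_body.extend([copy.deepcopy(element) for element in elements])
        documents.append(ET.tostring(doc, encoding="unicode"))
    return documents


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<pb n="1"/><p>First page</p>
<pb n="2"/><p>Second page</p><p>More text</p>
</body></text></TEI>"""

pages = split_pages(SAMPLE, "Sample volume")
print(f"{len(pages)} page documents created")
```

Each returned string is a standalone TEI document that could be written to its own file; a real header would carry the bibliographic details of the source volume rather than placeholders.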

How to

  • clone the repo
  • download Baedeker TEI/XML files from ARCHE and move them into a folder xml in the repo's root directory
  • run ./build.sh
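
Before running the build, it can help to confirm that the input folder from the second step is in place. The following is a small sketch of such a check; the function name find_tei_files is hypothetical, and it assumes the downloaded files carry an .xml extension. What build.sh itself does is not described here, so the sketch only covers the precondition.

```python
from pathlib import Path


def find_tei_files(repo_root: str = ".") -> list[Path]:
    """Return the XML files in <repo_root>/xml, sorted by name.

    Assumption: the downloaded TEI files use the .xml extension.
    """
    xml_dir = Path(repo_root) / "xml"
    if not xml_dir.is_dir():
        raise FileNotFoundError(f"expected input folder {xml_dir} does not exist")
    files = sorted(xml_dir.glob("*.xml"))
    if not files:
        raise FileNotFoundError(f"no .xml files found in {xml_dir}")
    return files
```

Calling find_tei_files() from the repo's root directory either returns the list of input files or raises an error naming the missing folder.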

License

  • The Milestone class in milestone.py, written by Zeth and adapted by Daniel Elsner, is licensed under the BSD Licence.
  • All other code in the repo is under the MIT License.
  • All data in the Repo is under https://creativecommons.org/licenses/by/4.0/.

Owner

  • Name: Austrian Centre for Digital Humanities & Cultural Heritage
  • Login: acdh-oeaw
  • Kind: organization
  • Email: acdh@oeaw.ac.at
  • Location: Vienna, Austria

GitHub Events

Total
  • Issues event: 1
  • Push event: 5
  • Pull request event: 2
  • Create event: 3
Last Year
  • Issues event: 1
  • Push event: 5
  • Pull request event: 2
  • Create event: 3

Dependencies

pyproject.toml pypi
  • acdh-tei-pyutils >=1.6
  • acdh-xml-pyutils >=1.1.1
  • pandas >=2.3.1
  • tqdm >=4.67.1
uv.lock pypi
  • acdh-handle-pyutils 0.4.2
  • acdh-tei-pyutils 1.6
  • acdh-xml-pyutils 1.1.1
  • appnope 0.1.4
  • asttokens 3.0.0
  • bd-data 0.1.0
  • certifi 2025.7.14
  • cffi 1.17.1
  • charset-normalizer 3.4.2
  • click 8.2.1
  • colorama 0.4.6
  • comm 0.2.2
  • debugpy 1.8.15
  • decorator 5.2.1
  • executing 2.2.0
  • idna 3.10
  • ipykernel 6.29.5
  • ipython 9.4.0
  • ipython-pygments-lexers 1.1.1
  • jedi 0.19.2
  • jupyter-client 8.6.3
  • jupyter-core 5.8.1
  • lxml 6.0.0
  • matplotlib-inline 0.1.7
  • nest-asyncio 1.6.0
  • numpy 2.3.1
  • packaging 25.0
  • pandas 2.3.1
  • parso 0.8.4
  • pexpect 4.9.0
  • platformdirs 4.3.8
  • prompt-toolkit 3.0.51
  • psutil 7.0.0
  • ptyprocess 0.7.0
  • pure-eval 0.2.3
  • pycparser 2.22
  • pygments 2.19.2
  • python-dateutil 2.9.0.post0
  • python-slugify 8.0.4
  • pytz 2025.2
  • pywin32 311
  • pyzmq 27.0.0
  • requests 2.32.4
  • ruff 0.12.3
  • six 1.17.0
  • stack-data 0.6.3
  • text-unidecode 1.3
  • tornado 6.5.1
  • tqdm 4.67.1
  • traitlets 5.14.3
  • tzdata 2025.2
  • urllib3 2.5.0
  • wcwidth 0.2.13