https://github.com/acdh-oeaw/shawi-data

Data of the project "The Shawi-type Arabic dialects (FWF P 33574)".

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.8%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: acdh-oeaw
  • License: other
  • Language: HTML
  • Default Branch: main
  • Homepage:
  • Size: 435 MB
Statistics
  • Stars: 1
  • Watchers: 6
  • Forks: 2
  • Open Issues: 27
  • Releases: 0
Created about 4 years ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

SHAWI Transcription Repository

This git repository hosts the transcription data of the project The Shawi-type Arabic dialects (FWF P 33574).

PI: Stephan Procházka (University of Vienna)
National Cooperation Partner: Charly Mörth (Austrian Academy of Sciences)

Status

THIS IS PRELIMINARY DATA AND COPYRIGHTED MATERIAL!

If you want to use any material in this repository, please contact the PI, Stephan Procházka (University of Vienna).

This will change at the end of the project.

Directory Structure

| Directory | Content | Remarks |
| --- | --- | --- |
| 001_src | Original sources | Source documents (e.g. raw transcriptions) |
| 080_scripts_generic | Conversion scripts | Mostly the ELAN2TEI conversion script (implemented in Python), which generates the initial TEI data prior to tokenization from the ELAN transcription documents in 122_elan |
| 082_scripts_xsl | XSLT scripts | XSLT scripts |
| 103_tei_w | TEI-XML with tokens | This is where ELAN2TEI puts its output. Re-running ELAN2TEI overwrites all content in this directory, so do not make manual changes here; copy the file to 010_manannot first. |
| 010_manannot | Manually annotated TEI-XML | Tokenized TEI documents from 103_tei_w which are manually annotated |
| 802_tei_odd | TEI customization (ODD) | This is the source of truth for the SHAWI schema and the HTML documentation generated from it |
| 130_vert_plain | NoSketch Engine verticals | NoSketch Engine text verticals |
| 803_RNG-schematron | Schemas | Derived from the ODD in 802_tei_odd |
| 804_xsd | Schemas | Derived from the ODD in 802_tei_odd |
| 850_docs | Documentation | Further data documentation, esp. the HTML documentation of the ODD |

The oXygen project shawi.xpr contains the configuration for various transformation scenarios.

The directories css, html, js and xsl are used by the TEI Enricher.
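
Because the conversion overwrites everything in 103_tei_w, a file has to be copied to 010_manannot before it is annotated by hand. The following is a minimal sketch of that copy step, not a script from this repository: the helper name `copy_for_annotation` is hypothetical, and the directory name is taken from the table above. It refuses to overwrite a file that may already carry manual annotations.

```python
import shutil
from pathlib import Path


def copy_for_annotation(tei_file, target_dir="010_manannot"):
    """Copy a generated TEI file into the manual-annotation directory.

    Hypothetical helper, not part of the repository. Raises FileExistsError
    instead of overwriting a file that may already be annotated.
    """
    source = Path(tei_file)
    target = Path(target_dir) / source.name
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting")
    # Create the annotation directory if it is missing, then copy the file
    # together with its timestamps.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target
```

Failing loudly on an existing target is a deliberate choice here: losing manual annotation work is worse than having to resolve a name clash by hand.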

Other data locations

  • Master files of the audio recordings are stored on the project's network share at the University of Vienna.
  • The metadata spreadsheet is hosted on SharePoint.
  • The SHAWI Dictionary is curated in [BaseX Curation](https://redmine.acdh.oeaw.ac.at/issues/11318).

General Workflow

For more information, refer to the SHAWI Data Processing and Curation Document.

The following steps happen before data is ingested into this repository:

  • Fieldwork (recording audio etc.) – The recordings so far cover only material collected in previous campaigns.
  • Collecting metadata – Metadata is collected and curated in the metadata spreadsheet.

Workflow steps reflected in the data in this repository:

  • Transcription and translation – Curators segment the audio recordings into sensible sets of "utterances" and transcribe and translate them using ELAN. When transcription has finished, the curator adds the ELAN document(s) to 122_elan and pushes the changes to git.
  • Tokenization – This push triggers the ELAN2TEI conversion workflow, which takes all *.eaf files in 122_elan and transforms them into tokenized standalone TEI documents, storing them under 103_tei_w. Additionally, a TEI Corpus file is generated which includes corpus-level metadata and controlled vocabularies.
  • Annotation – After transformation to TEI, curators annotate the texts using the TEI Enricher and store the results under 010_manannot.
  • Conversion to NoSke verticals – During the tokenization process, a NoSke-compatible vertical is created which incorporates the annotations found in 010_manannot.
  • Deployment – Integration of deployment into the workflow is TBD.
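
The first part of the tokenization step, finding the ELAN documents and reading their time-aligned utterances, can be sketched as follows. This is an illustration under stated assumptions, not the project's ELAN2TEI script: the function names, the recursive search and the `<stem>.xml` output naming are hypothetical. The element and attribute names (`TIME_SLOT`, `TIER`, `ALIGNABLE_ANNOTATION`, `ANNOTATION_VALUE`) are those of the ELAN annotation format.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_eaf_files(elan_dir="122_elan"):
    """Return all ELAN documents (*.eaf) below elan_dir, sorted by path.

    Searching subdirectories is an assumption of this sketch.
    """
    return sorted(Path(elan_dir).rglob("*.eaf"))


def tei_target(eaf_path, tei_dir="103_tei_w"):
    """Map an ELAN document to an output path (assumed naming scheme)."""
    return Path(tei_dir) / (Path(eaf_path).stem + ".xml")


def read_utterances(eaf_path):
    """Yield (tier_id, start_ms, end_ms, text) for time-aligned annotations."""
    root = ET.parse(eaf_path).getroot()
    # TIME_SLOT elements map slot ids to millisecond offsets; slots without
    # a TIME_VALUE are unaligned and are skipped here.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            yield tier.get("TIER_ID"), start, end, text
```

A caller would loop over `find_eaf_files()`, pass each path to `read_utterances()`, and write the result to `tei_target(path)`; building the TEI document itself and tokenizing it are left out of this sketch.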

Re-Deploy SHAWI Website

  • Start GitHub Workflow in the vicav-app repository https://github.com/acdh-oeaw/vicav-app:
    • choose generate-workflow_vars-shawi and
    • click re-run this job
    • wait until it is done.
  • Go to ACDH-CH Rancher https://rancher.acdh-dev.oeaw.ac.at/dashboard/home and
    • click on AC2 at the upper left corner of the screen or acdh-ch-cluster-2
    • then search for vicav-test in the window in the upper right corner of the screen
    • click on workloads (menu on the left) and on deployments
    • now choose shawi-app-devel and
    • click redeploy (three dots on the right)
    • wait until it is done

Owner

  • Name: Austrian Centre for Digital Humanities & Cultural Heritage
  • Login: acdh-oeaw
  • Kind: organization
  • Email: acdh@oeaw.ac.at
  • Location: Vienna, Austria

GitHub Events

Total
  • Issues event: 81
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 85
  • Push event: 550
  • Pull request review event: 1
  • Pull request event: 2
  • Create event: 1
Last Year
  • Issues event: 81
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 85
  • Push event: 550
  • Pull request review event: 1
  • Pull request event: 2
  • Create event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 125
  • Total pull requests: 11
  • Average time to close issues: 3 months
  • Average time to close pull requests: 2 days
  • Total issue authors: 6
  • Total pull request authors: 3
  • Average comments per issue: 1.42
  • Average comments per pull request: 0.09
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 58
  • Pull requests: 2
  • Average time to close issues: about 1 month
  • Average time to close pull requests: about 1 hour
  • Issue authors: 4
  • Pull request authors: 1
  • Average comments per issue: 1.45
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • rausch-supola (75)
  • dasch124 (32)
  • charlymo (8)
  • VeronikaEngler (5)
  • simar0at (4)
  • kisram (1)
Pull Request Authors
  • rausch-supola (9)
  • simar0at (1)
  • MauPalantir (1)
Top Labels
Issue Labels
  • dictionary (51)
  • corpus (27)
  • schema (12)
  • data-processing (7)
  • bug (3)
  • standoff (3)
  • data curation (3)
  • enhancement (2)
  • documentation (1)
  • question (1)
Pull Request Labels

Dependencies

080_scripts_generic/080_01_ELAN2TEI/Pipfile pypi
  • importlib-resources *
  • jupyterlab *
  • nbconvert *
  • pexpect *
  • saxonpy ==0.0.2
  • sharepy *
  • zipp *
080_scripts_generic/080_01_ELAN2TEI/Pipfile.lock pypi
  • anyio ==3.6.1
  • argon2-cffi ==21.3.0
  • argon2-cffi-bindings ==21.2.0
  • asttokens ==2.0.5
  • attrs ==21.4.0
  • babel ==2.10.1
  • backcall ==0.2.0
  • beautifulsoup4 ==4.11.1
  • bleach ==5.0.0
  • certifi ==2022.5.18.1
  • cffi ==1.15.0
  • charset-normalizer ==2.0.12
  • colorama ==0.4.4
  • debugpy ==1.6.0
  • decorator ==5.1.1
  • defusedxml ==0.7.1
  • entrypoints ==0.4
  • executing ==0.8.3
  • fastjsonschema ==2.15.3
  • idna ==3.3
  • importlib-resources ==5.7.1
  • ipykernel ==6.13.1
  • ipython ==8.4.0
  • ipython-genutils ==0.2.0
  • jedi ==0.18.1
  • jinja2 ==3.1.2
  • json5 ==0.9.8
  • jsonschema ==4.6.0
  • jupyter-client ==7.3.4
  • jupyter-core ==4.10.0
  • jupyter-server ==1.17.1
  • jupyterlab ==3.4.3
  • jupyterlab-pygments ==0.2.2
  • jupyterlab-server ==2.14.0
  • markupsafe ==2.1.1
  • matplotlib-inline ==0.1.3
  • mistune ==0.8.4
  • nbclassic ==0.3.7
  • nbclient ==0.6.4
  • nbconvert ==6.5.0
  • nbformat ==5.4.0
  • nest-asyncio ==1.5.5
  • notebook ==6.4.12
  • notebook-shim ==0.1.0
  • packaging ==21.3
  • pandocfilters ==1.5.0
  • parso ==0.8.3
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • prometheus-client ==0.14.1
  • prompt-toolkit ==3.0.29
  • psutil ==5.9.1
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • pycparser ==2.21
  • pygments ==2.12.0
  • pyparsing ==3.0.9
  • pyrsistent ==0.18.1
  • python-dateutil ==2.8.2
  • pytz ==2022.1
  • pywin32 ==304
  • pywinpty ==2.0.5
  • pyzmq ==23.1.0
  • requests ==2.28.0
  • saxonpy ==0.0.2
  • send2trash ==1.8.0
  • setuptools ==62.3.4
  • sharepy ==2.0.0
  • six ==1.16.0
  • sniffio ==1.2.0
  • soupsieve ==2.3.2.post1
  • stack-data ==0.2.0
  • terminado ==0.15.0
  • tinycss2 ==1.1.1
  • tornado ==6.1
  • traitlets ==5.2.2.post1
  • urllib3 ==1.26.9
  • wcwidth ==0.2.5
  • webencodings ==0.5.1
  • websocket-client ==1.3.2
  • zipp ==3.8.0
080_scripts_generic/080_01_ELAN2TEI/requirements.txt pypi
  • Jinja2 ==3.0.3
  • MarkupSafe ==2.1.0
  • Pygments ==2.11.2
  • QtPy ==2.0.1
  • Send2Trash ==1.8.0
  • argon2-cffi ==21.3.0
  • argon2-cffi-bindings ==21.2.0
  • asttokens ==2.0.5
  • attrs ==21.4.0
  • backcall ==0.2.0
  • beautifulsoup4 ==4.10.0
  • bleach ==4.1.0
  • certifi ==2021.10.8
  • cffi ==1.15.0
  • charset-normalizer ==2.0.12
  • debugpy ==1.5.1
  • decorator ==5.1.1
  • defusedxml ==0.7.1
  • entrypoints ==0.4
  • executing ==0.8.3
  • idna ==3.3
  • ipykernel ==6.9.1
  • ipython ==8.1.1
  • ipython-genutils ==0.2.0
  • ipywidgets ==7.6.5
  • jedi ==0.18.1
  • jsonschema ==4.4.0
  • jupyter ==1.0.0
  • jupyter-client ==7.1.2
  • jupyter-console ==6.4.3
  • jupyter-core ==4.9.2
  • jupyterlab-pygments ==0.1.2
  • jupyterlab-widgets ==1.0.2
  • matplotlib-inline ==0.1.3
  • mistune ==0.8.4
  • nbclient ==0.5.13
  • nbconvert ==6.4.4
  • nbformat ==5.2.0
  • nest-asyncio ==1.5.4
  • notebook ==6.4.12
  • packaging ==21.3
  • pandocfilters ==1.5.0
  • parso ==0.8.3
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • prometheus-client ==0.13.1
  • prompt-toolkit ==3.0.28
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • pycparser ==2.21
  • pyparsing ==3.0.7
  • pyrsistent ==0.18.1
  • python-dateutil ==2.8.2
  • pyzmq ==22.3.0
  • qtconsole ==5.2.2
  • requests ==2.27.1
  • saxonpy ==0.0.2
  • sharepy ==2.0.0
  • six ==1.16.0
  • soupsieve ==2.3.1
  • stack-data ==0.2.0
  • terminado ==0.13.3
  • testpath ==0.6.0
  • tornado ==6.1
  • traitlets ==5.1.1
  • urllib3 ==1.26.8
  • wcwidth ==0.2.5
  • webencodings ==0.5.1
  • widgetsnbextension ==3.5.2
.github/workflows/convert-to-tei.yaml actions
  • actions/checkout v3 composite
  • ad-m/github-push-action master composite
  • docker/build-push-action v3 composite
  • docker/login-action v2 composite
  • docker/metadata-action v4 composite
  • docker/setup-buildx-action v2 composite
nosketchengine/Dockerfile docker
  • acdhch/noske 5.58.1-2.214.1-open build