gt-mufilevelrules
OCR-D-Level-Rules can be created automatically with gt-MufiLevelRules from the encodings published by MUFI: The Medieval Unicode Font Initiative.
Repository
Basic Info
- Host: GitHub
- Owner: OCR-D
- License: gpl-3.0
- Language: XSLT
- Default Branch: main
- Homepage: https://tboenig.github.io/gt-MufiLevelRules/
- Size: 1.21 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
gt-MufiLevelRules
Creates OCR-D Ground-Truth Transcription Level Rules automatically from the encodings published by MUFI: The Medieval Unicode Font Initiative.
The resulting OCR-D level rules conform to the OCR-D specification. These rules can be used for substitutions or level checks, among other things.
Note:
- There may not always be a definition for every level, esp. on level 1.
- OCR-D will try to fill in these gaps manually or automatically. The automated completion is based on the unicruft program.
- For this reason, using the rules for automatic character normalization from level 3 or level 2 to level 1 is currently not recommended before manually checking and correcting the corresponding rules.
Download the Rules
🚦 You can download the set of rules here. 🚦
- select the corresponding rule file: rules directory
- as zip release file: latest Releases
Recreation of the rules
Copy or clone the repository:
git clone https://github.com/tboenig/gt-MufiLevelRules.git
Install Saxon for XSL Transformations v3.0. Then simply run with:
java -jar saxon-he-XX.jar -xsl:scripts/MufiGTLevelRules2.xsl -s:scripts/MufiGTLevelRules.xsl output=characters merge=yes
Parameters:
- output characters -> creates the rules; all rules are saved under the directory [directory]/rules/characters
- merge yes -> creates the megarules (all rules in one file); the megarules are saved under the directory [directory]/rules
The result of the conversion can be found in the directory: [directory]/rules/characters.
- Output Format:
- xml
- json
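For scripted regeneration, the Saxon call above can be wrapped in a small helper. This is only a sketch: the function names `saxon_command` and `recreate_rules` are hypothetical, the jar name `saxon-he-XX.jar` is the placeholder from the command above and has to be replaced with the actual file, and the call is assumed to run from the repository root.

```python
import subprocess


def saxon_command(jar, output="characters", merge="yes"):
    """Build the argument list for the Saxon call shown in this README.

    `jar` is the path to the Saxon HE jar (placeholder: saxon-he-XX.jar).
    The stylesheet paths and the two parameters are copied from the README.
    """
    return [
        "java", "-jar", jar,
        "-xsl:scripts/MufiGTLevelRules2.xsl",
        "-s:scripts/MufiGTLevelRules.xsl",
        f"output={output}",
        f"merge={merge}",
    ]


def recreate_rules(jar, repo_dir="."):
    """Run the transformation inside the cloned repository (assumed layout)."""
    subprocess.run(saxon_command(jar), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Only prints the command; call recreate_rules(...) to actually run it.
    print(" ".join(saxon_command("saxon-he-XX.jar")))
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` makes a failed transformation raise an error instead of passing silently.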
The script uses:
- the MUFI rules (new version) and the MUFI rules (old version)
- a summary of the following additional rules from the OCR-D Ground-Truth Transcription Guide, which take precedence over the MUFI rules where applicable:
Description of the rules
JSON Format
All JSON files (both the pure MUFI rules and the final result) follow the same schema.
Example:
JSON
{"ruleset":[
...
{"rule": ["ä", "aͤ", ""], "type": "level"}
...
]}
- Each rule has a key "rule" and a list of values
- The values define the character representation on each of the 3 transcription levels:
- Level 1 is in the first position
- Level 2 is in the second position
- Level 3 is in the third position
- Additional key-value combinations: ...
- Character values can be empty to signify there is no definition (representation) at that level.
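As a rough illustration of how this schema could be used for substitutions, the sketch below builds a mapping from one level to another and applies it to a string. The helper names `build_mapping` and `substitute` are hypothetical, and the second rule (long s) is made up for the example and not taken from the published rule files. Rules with an empty value at either level are skipped, in line with the note above that not every level has a definition.

```python
import json

# Illustrative ruleset in the schema described above.
# \u00e4 = ä, a\u0364 = a + COMBINING LATIN SMALL LETTER E, \u017f = long s.
# The long-s rule is a made-up example, not copied from the rule files.
RULESET_JSON = """
{"ruleset": [
  {"rule": ["\u00e4", "a\u0364", ""], "type": "level"},
  {"rule": ["s", "\u017f", "\u017f"], "type": "level"}
]}
"""

RULES = json.loads(RULESET_JSON)


def build_mapping(ruleset, source_level, target_level):
    """Map each source-level representation to its target-level one.

    Levels are 1-based, matching the positions in the "rule" list.
    Entries that are empty at either level, or identical, are skipped.
    """
    mapping = {}
    for entry in ruleset["ruleset"]:
        levels = entry["rule"]
        src = levels[source_level - 1]
        dst = levels[target_level - 1]
        if src and dst and src != dst:
            mapping[src] = dst
    return mapping


def substitute(text, mapping):
    """Apply the mapping, replacing longer source sequences first."""
    for src in sorted(mapping, key=len, reverse=True):
        text = text.replace(src, mapping[src])
    return text


if __name__ == "__main__":
    level2_to_1 = build_mapping(RULES, 2, 1)
    print(substitute("Wa\u017f\u017fer ma\u0364chtig", level2_to_1))
```

The replacements are applied one after another, so a rule whose output contains another rule's source would be rewritten again; for real rule files that case should be checked before relying on the result.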
XML Format
XML
<levelrules>
<ruleset>
<range>AlphPresForm</range>
<desc>LATIN SMALL LIGATURE FF</desc>
<rule>ff</rule>
<rule>ff</rule>
<rule>ff</rule>
<type>level</type>
</ruleset>
</levelrules>
- Elements
- <levelrules> = root element of a gt-MufiLevelRules dataset
- <ruleset> = root element of a ruleset
- <range> = category of characters
- <desc> = general description of the sign or symbol
- <rule>
- Level 1: rule[position() = 1]
- Level 2: rule[position() = 2]
- Level 3: rule[position() = 3]
The category of characters <range> and the general description of the sign or symbol <desc> were imported from the MUFI dataset.
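The XML structure can be read with Python's standard library. The following sketch parses the example above into plain dictionaries; the function name `read_rulesets` and the dictionary keys are choices made for this illustration, not part of the dataset.

```python
import xml.etree.ElementTree as ET

# The example from this section, used as inline test data.
EXAMPLE_XML = """<levelrules>
  <ruleset>
    <range>AlphPresForm</range>
    <desc>LATIN SMALL LIGATURE FF</desc>
    <rule>ff</rule>
    <rule>ff</rule>
    <rule>ff</rule>
    <type>level</type>
  </ruleset>
</levelrules>"""


def read_rulesets(xml_text):
    """Return one dictionary per <ruleset> element.

    "levels" holds the <rule> values in document order, so index 0 is
    level 1, index 1 is level 2 and index 2 is level 3. Empty <rule>
    elements become empty strings.
    """
    root = ET.fromstring(xml_text)
    rulesets = []
    for ruleset in root.iter("ruleset"):
        rulesets.append({
            "range": ruleset.findtext("range", default=""),
            "desc": ruleset.findtext("desc", default=""),
            "levels": [rule.text or "" for rule in ruleset.findall("rule")],
            "type": ruleset.findtext("type", default=""),
        })
    return rulesets


if __name__ == "__main__":
    for entry in read_rulesets(EXAMPLE_XML):
        print(entry["desc"], entry["levels"])
```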
The JSONPaths are:
- range : $['..']['range']
- desc : $['..']['description']
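The paths above read like a recursive-descent lookup, that is, "every range or description value anywhere in the document". Under that assumption, a lookup can be sketched without a JSONPath library. The helper name `find_key` and the sample data are hypothetical; in particular, where exactly the "range" and "description" keys sit in the real files is not specified here, which is why the sketch walks the whole structure.

```python
def find_key(node, key):
    """Collect every value stored under `key`, at any depth, in document order."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            # Keep descending: the key may also occur further down.
            found.extend(find_key(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_key(item, key))
    return found


# Hypothetical sample: the placement of "range" and "description" next to
# "rule" is an assumption made for this illustration.
SAMPLE = {
    "ruleset": [
        {
            "rule": ["ff", "ff", "ff"],
            "type": "level",
            "range": "AlphPresForm",
            "description": "LATIN SMALL LIGATURE FF",
        }
    ]
}

if __name__ == "__main__":
    print(find_key(SAMPLE, "range"))
    print(find_key(SAMPLE, "description"))
```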
See Also
- MUFI: The Medieval Unicode Font Initiative https://mufi.info/
- MUFI's data as JSON export https://gefin.ku.dk/q.php?q=mufiexport
- OCR-D Ground Truth Transcription Guidelines https://ocr-d.de/en/gt-guidelines/trans/
- Ground Truth level overview https://ocr-d.de/en/gt-guidelines/trans/trLevels.html
Owner
- Name: OCR-D
- Login: OCR-D
- Kind: organization
- Website: https://ocr-d.de
- Twitter: OCR_D_community
- Repositories: 27
- Profile: https://github.com/OCR-D
DFG-Koordinierungsprojekt zur Weiterentwicklung von Verfahren der Optical Character Recognition
Citation (CITATION.cff)
cff-version: 1.2.0
title: gt-MufiLevelRules
message: If you use this dataset, please cite it using the metadata from this file.
type: dataset
authors:
  - given-names: Matthias
    family-names: Boenig
    orcid: 'https://orcid.org/0000-0003-4615-4753'
repository-code: 'https://github.com/OCR-D/gt-MufiLevelRules'
url: 'https://github.com/OCR-D/gt-MufiLevelRules'
abstract: Creates OCR-D Ground-Truth Transcription Level Rules automatically from the encodings published by the Medieval Unicode Font Initiative (MUFI). The generated OCR-D level rules conform to the OCR-D specification. These rules can be used for substitutions or level checks, among other things.
keywords:
- ocr-d
- repository
- ground-truth
- level classification
- level checks
- guidelines
- transcription
license: LGPL-3.0
commit: v1.2.5
version: 65_v1.2.5
date-released: '2024-04-18'
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Matthias Boenig | m****g@g****t | 175 |
| github-actions[bot] | 4****] | 9 |
| Robert Sachunsky | 3****y | 3 |
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 1
- Total pull requests: 2
- Average time to close issues: 8 days
- Average time to close pull requests: 25 days
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- JamesIves/github-pages-deploy-action v4 composite
- actions/checkout v3 composite
- actions/create-release v1 composite
- actions/upload-release-asset v1 composite
- thedoctor0/zip-release master composite