Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.2%) to scientific vocabulary
Repository
HTRVX : HTR Validation with XSD
Statistics
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 5
- Releases: 2
Metadata Files
README.md

HTRVX : HTR Validation for eXtra-quality controlled documents
HTRVX - pronounced Ashterux - allows for quality control of XML using XSD schema validation, Segmonto validation and other verifications.
How to install
Simply run `pip install htrvx`.
How to run
The basic way to run the script is `htrvx PATHTOFILES --format FORMAT`, e.g. `htrvx ./tests/test_data/page/*.xml --format page`.
Every verification is opt-in: a check runs only if you pass the corresponding flag.
- `--segmonto` will check for Segmonto compliance.
  - You can use your own vocabulary or a restricted Segmonto vocabulary by using `--zone ZONENAME` and `--line LINENAME`, such as `htrvx [...] --line DefaultLine --line HeadingLine --zone MainZone`.
  - You can use `--allow-untagged` with either `line`, `zone` or `both` so that lines and/or zones without a type are allowed. If you want to limit such lines or zones, combine it with `--max-untagged-zones N` or `--max-untagged-lines N`, where N is the number of allowed occurrences.
- `--xsd` will check whether the data comply with the XML Schemas.
- `--check-empty` will check if regions have no lines or if lines have no text.
  - `--check-empty` can be refined with `--raise-empty` to throw an error if empty elements are found; otherwise they are simply reported.
- `--check-image` checks the image link in the XML. Links are resolved relative to the XML file, i.e. if the XML file `./data/element.xml` points to `file.jpeg`, the file `./data/file.jpeg` is expected to exist.
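The relative-link rule used by `--check-image` can be illustrated with a minimal Python sketch. This is not HTRVX's actual implementation: the helper names `resolve_image_link` and `image_exists` are hypothetical and only show how a link found in an XML file would be resolved against that file's directory.

```python
from pathlib import Path


def resolve_image_link(xml_path: str, image_link: str) -> Path:
    """Return the path an image link refers to, relative to the XML file.

    Hypothetical helper for illustration only, not part of HTRVX.
    """
    # The link is interpreted relative to the directory containing the XML file.
    return Path(xml_path).parent / image_link


def image_exists(xml_path: str, image_link: str) -> bool:
    """Tell whether the linked image is present on disk."""
    return resolve_image_link(xml_path, image_link).is_file()


expected = resolve_image_link("./data/element.xml", "file.jpeg")
print(expected.as_posix())  # data/file.jpeg
```

Under this assumption, moving an XML file without moving its image would make the check fail, since the link is never resolved against the current working directory.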
Other parameters mainly deal with verbosity: `--verbose` displays details about errors, and `--group` groups errors by type instead of showing one line per error.
| Parameters | Default | Function |
|---|---|---|
| `-v, --verbose` | False | Prints more information |
| `-f, --format [alto,page]` | alto | Format of files |
| `-s, --segmonto` | False | Apply Segmonto Zoning verification |
| `-e, --check-empty` | False | Check for empty lines or empty zones |
| `-r, --raise-empty` | False | Fail, rather than only warn, if empty lines or empty zones are found |
| `-x, --xsd` | False | Apply XSD Schema verification |
| `-g, --group` | False | Group error types (reduce verbosity) |
| `-i, --check-image` | False | Check if the image link in the XML points to the right path |
| `-l, --verbose-level` | zen | Level of details and amount of color shown in the logs (see below) |
| `--zone TEXT` | None | Provide a custom zone to control zone types instead of Segmonto |
| `--line TEXT` | None | Provide a custom line to control line types instead of Segmonto |
Verbosity levels
- `minimal`: shows only failing tests, no details.
- `low`: shows only failing tests and their details, such as which lines fail in a file.
- `zen` (default): shows all tests and their details, but displays only one color (red for errors).
- `all`: shows everything.
GitHub Action code
If you want to add this to your GitHub repository as a continuous integration workflow, add a file `htrux.yml` in the `.github/workflows` directory of your repository.
```yaml
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: HTRVX

on: [push, pull_request]  # You can edit this of course !

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
        python-version: 3.8
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install htrvx
    - name: Run HTRVX
      run: |
        htrvx --verbose --group --format alto --segmonto --xsd --check-empty --raise-empty UNIX/Path/to/*/your/.xml
```
Logo by Alix Chagué.
Owner
- Name: HTR United
- Login: HTR-United
- Kind: organization
- Location: France
- Website: https://htr-united.github.io
- Repositories: 21
- Profile: https://github.com/HTR-United
Citation (CITATION.CFF)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Clérice
    given-names: Thibault
    orcid: https://orcid.org/0000-0002-0136-4434
  - family-names: Pinche
    given-names: Ariane
    orcid: https://orcid.org/0000-0002-7843-5050
title: "HTRVX, HTR Validation with XSD"
version: 0.0.1
doi: 10.5281/zenodo.5359963
date-released: 2021-09-01
url: "https://github.com/HTR-United/HTRVX"
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 61
- Total Committers: 4
- Avg Commits per committer: 15.25
- Development Distribution Score (DDS): 0.098
Top Committers
| Name | Email | Commits |
|---|---|---|
| Thibault Clérice | l****e@g****m | 55 |
| Alix Chagué | 3****z@u****m | 3 |
| Thibault Clérice | 1****e@u****m | 2 |
| Pauline Jacsont | 6****c@u****m | 1 |
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 14
- Total pull requests: 9
- Average time to close issues: 24 days
- Average time to close pull requests: about 1 hour
- Total issue authors: 6
- Total pull request authors: 3
- Average comments per issue: 1.71
- Average comments per pull request: 0.33
- Merged pull requests: 9
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- PonteIneptique (7)
- gabays (3)
- matgille (1)
- alix-tz (1)
- elodiepaupe (1)
- sven-nm (1)
Pull Request Authors
- PonteIneptique (7)
- PaulineJac (1)
- alix-tz (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 285 last month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 19
- Total maintainers: 1
pypi.org: htrvx
HTRVX, HTR Validation with XSD
- Homepage: https://github.com/htr-united/htrvx
- Documentation: https://htrvx.readthedocs.io/
- License: MIT
- Latest release: 0.0.19 (published over 2 years ago)