htrvx

HTRVX : HTR Validation with XSD

https://github.com/htr-united/htrvx

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.2%) to scientific vocabulary

Keywords

github-actions tool
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: HTR-United
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 511 KB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 1
  • Open Issues: 5
  • Releases: 2
Topics
github-actions tool
Created over 4 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

HTRVX : HTR Validation for eXtra-quality controlled documents

HTRVX - pronounced Ashterux - allows for quality control of XML using XSD schema validation, Segmonto validation and other verifications.

How to install

Simply run pip install htrvx

How to run

The basic way to run the script is htrvx PATHTOFILES --format FORMAT, e.g. htrvx ./tests/test_data/page/*.xml --format page

Each verification is opt-in: you must explicitly request every check you want to run.

  • --segmonto will check for Segmonto compliance
    • You can use your own vocabulary or a restricted Segmonto vocabulary by using --zone ZONENAME and --line LINENAME such as htrvx [...] --line DefaultLine --line HeadingLine --zone MainZone
    • You can use --allow-untagged with either line, zone or both so that lines or zones without a type are allowed. If you want to limit the number of such lines or zones, combine it with --max-untagged-zones N or --max-untagged-lines N, where N is the number of allowed occurrences.
  • --xsd will check if the data are compliant with XML Schemas
  • --check-empty will check if regions have no lines or if lines have no text
    • --check-empty can be refined with --raise-empty to throw an error if empty elements are found; otherwise they are simply reported.
  • --check-image checks the image link in the XML. Links are checked relative to the XML file, i.e. if the XML file ./data/element.xml points to file.jpeg, the file ./data/file.jpeg is expected to exist.
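To make the last two checks concrete, here is a minimal Python sketch of what "empty" elements and a relative image link could mean for an ALTO file. It is not HTRVX's implementation: the function names (find_empty_alto, expected_image_path) are hypothetical, only ALTO's TextBlock, TextLine and String CONTENT are considered, and the real tool may apply different or stricter rules.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def find_empty_alto(xml_text: str) -> dict:
    """Count TextBlocks without any TextLine and TextLines without any text.

    Assumption: a zone is an ALTO TextBlock and line text lives in the
    CONTENT attribute of String children.
    """
    root = ET.fromstring(xml_text)
    empty_zones = 0
    empty_lines = 0
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "TextBlock":
            children = [local_name(child.tag) for child in elem]
            if "TextLine" not in children:
                empty_zones += 1
        elif name == "TextLine":
            text = "".join(
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            )
            if not text.strip():
                empty_lines += 1
    return {"empty_zones": empty_zones, "empty_lines": empty_lines}


def expected_image_path(xml_path: str, image_link: str) -> Path:
    """Resolve an image link relative to the directory holding the XML file."""
    return Path(xml_path).parent / image_link
```

With this sketch, expected_image_path("./data/element.xml", "file.jpeg") resolves to data/file.jpeg, matching the example given for --check-image; calling .exists() on the result would then tell whether the linked file is present.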

Other parameters mainly have to do with verbosity: --verbose displays details about errors, --group groups errors (instead of showing one line per error, groups by error types).
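The idea behind --group, one entry per error type instead of one line per error, can be illustrated with a few lines of Python. This is only a sketch of the grouping principle: the error records and the error type names below are invented for illustration and do not reflect HTRVX's internal data or its actual log format.

```python
from collections import Counter
from typing import Iterable, Tuple


def group_errors(errors: Iterable[Tuple[str, str]]) -> Counter:
    """Collapse (file, error_type) records into a count per error type."""
    return Counter(error_type for _file, error_type in errors)


# Hypothetical records: these file names and error labels are made up.
errors = [
    ("a.xml", "empty-line"),
    ("a.xml", "empty-line"),
    ("b.xml", "untagged-zone"),
]

for error_type, count in sorted(group_errors(errors).items()):
    print(f"{error_type}: {count}")
```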

| Parameters               | Default | Function                                                              |
|--------------------------|---------|-----------------------------------------------------------------------|
| -v, --verbose            | False   | Prints more information                                               |
| -f, --format [alto,page] | alto    | Format of files                                                       |
| -s, --segmonto           | False   | Apply Segmonto Zoning verification                                    |
| -e, --check-empty        | False   | Check for empty lines or empty zones                                  |
| -r, --raise-empty        | False   | Fail, rather than only warn, if empty lines or empty zones are found  |
| -x, --xsd                | False   | Apply XSD Schema verification                                         |
| -g, --group              | False   | Group error types (reduce verbosity)                                  |
| -i, --check-image        | False   | Check if the image link in the XML points to the right path           |
| -l, --verbose-level      | zen     | Level of details and amount of color shown in the logs (see below).   |
| --zone TEXT              | None    | Provide a custom zone to control zone types instead of Segmonto       |
| --line TEXT              | None    | Provide a custom line to control Line types instead of Segmonto       |
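When htrvx is called from a script rather than typed by hand, the flags in the table above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of HTRVX; it only uses the long flag names documented here and follows the htrvx PATHTOFILES --format FORMAT pattern.

```python
from typing import List, Sequence


def build_htrvx_command(
    files: Sequence[str],
    fmt: str = "alto",
    segmonto: bool = False,
    xsd: bool = False,
    check_empty: bool = False,
    raise_empty: bool = False,
    check_image: bool = False,
    group: bool = False,
    verbose: bool = False,
    zones: Sequence[str] = (),
    lines: Sequence[str] = (),
) -> List[str]:
    """Assemble an htrvx argument list; every check is opt-in, so all default to off."""
    command = ["htrvx", *files, "--format", fmt]
    switches = [
        ("--segmonto", segmonto),
        ("--xsd", xsd),
        ("--check-empty", check_empty),
        ("--raise-empty", raise_empty),
        ("--check-image", check_image),
        ("--group", group),
        ("--verbose", verbose),
    ]
    command += [flag for flag, enabled in switches if enabled]
    # --zone and --line may be repeated, once per allowed type.
    for zone in zones:
        command += ["--zone", zone]
    for line in lines:
        command += ["--line", line]
    return command
```

The resulting list can be handed to subprocess.run(command). Because no shell is involved in that case, wildcards such as *.xml are not expanded automatically and the file list has to be built beforehand, for example with glob.glob.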

Verbosity levels

  • minimal: shows only failing tests, no details.
  • low: shows only failing tests and their details, such as which lines fail in a file.
  • zen (default): shows all tests and their details, but displays only one color (red for errors).
  • all: shows everything.

Github Action code

If you want to add this to your GitHub repository as a continuous integration workflow, add a file htrux.yml in the path .github/workflows of your repository.

```yaml
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: HTRVX

on: [push, pull_request]  # You can edit this of course !

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
        python-version: 3.8
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install htrvx
    - name: Run HTRVX
      run: |
        htrvx --verbose --group --format alto --segmonto --xsd --check-empty --raise-empty UNIX/Path/to/*/your/.xml
```


Logo by Alix Chagué.

Owner

  • Name: HTR United
  • Login: HTR-United
  • Kind: organization
  • Location: France

Citation (CITATION.CFF)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Clérice
    given-names: Thibault
    orcid: https://orcid.org/0000-0002-0136-4434
  - family-names: Pinche
    given-names: Ariane
    orcid: https://orcid.org/0000-0002-7843-5050
title: "HTRVX, HTR Validation with XSD"
version: 0.0.1
doi: 10.5281/zenodo.5359963
date-released: 2021-09-01
url: "https://github.com/HTR-United/HTRVX"

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 61
  • Total Committers: 4
  • Avg Commits per committer: 15.25
  • Development Distribution Score (DDS): 0.098
Top Committers
Name Email Commits
Thibault Clérice l****e@g****m 55
Alix Chagué 3****z@u****m 3
Thibault Clérice 1****e@u****m 2
Pauline Jacsont 6****c@u****m 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 14
  • Total pull requests: 9
  • Average time to close issues: 24 days
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 6
  • Total pull request authors: 3
  • Average comments per issue: 1.71
  • Average comments per pull request: 0.33
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • PonteIneptique (7)
  • gabays (3)
  • matgille (1)
  • alix-tz (1)
  • elodiepaupe (1)
  • sven-nm (1)
Pull Request Authors
  • PonteIneptique (7)
  • PaulineJac (1)
  • alix-tz (1)
Top Labels
Issue Labels
Pull Request Labels
enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 285 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 19
  • Total maintainers: 1
pypi.org: htrvx

HTRVX, HTR Validation with XSD

  • Versions: 19
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 285 Last month
Rankings
Dependent packages count: 10.1%
Downloads: 18.5%
Average: 20.9%
Dependent repos count: 21.6%
Forks count: 22.6%
Stargazers count: 31.9%
Maintainers (1)
Last synced: 4 months ago