https://github.com/centrefordigitalhumanities/historic-hebrew-dates

Python library and console application for extracting Hebrew and Aramaic dates from historic texts.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: CentreForDigitalHumanities
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: develop
  • Size: 3.11 MB
Statistics
  • Stars: 1
  • Watchers: 4
  • Forks: 0
  • Open Issues: 36
  • Releases: 0
Created over 7 years ago · Last pushed 12 months ago
Metadata Files
Readme License

README.md

WARNING: this library is very rough around the edges

Historic Hebrew Dates

Python library and console application for extracting Hebrew and Aramaic dates from historic texts. It includes a graphical editor to specify, modify and test the search patterns.

Running from the Console

```bash
$ python -m historic_hebrew_dates שבע מאות וחמישים וארבע

7*100+5*10+4
754
```

From Code

```python
from historic_hebrew_dates import create_parsers

hebrew = create_parsers('hebrew')

result = hebrew['numerals'].parse('שבע מאות וחמישים וארבע')
print(result[0][0].value)      # ((7*100+5*10)+4)
print(result[0][0].evaluated)  # 754
```

Getting the Editor to Work

Using Vagrant

On a machine without developer tools it's probably most convenient to use Vagrant for running the editor.

Once Vagrant has been set up, run the following from a terminal in the root directory of this project:

```bash
vagrant up
```

Wait for everything to be ready (this can take a few minutes); the editor should then be available at http://localhost:4200. Changes made to the patterns can be sent back to this repository using Git.

Locally

(Optionally, set up a virtual environment first.)

```bash
yarn
yarn start
```

Go to http://localhost:4200.

How Does it Work?

Dates consist of different formats and constituent parts, e.g.:

  • thirteenth of September, 2019
  • September 13, twenty-nineteen

These different formats can be matched using a list of patterns:

  • {day:ordinal} of {month}, {year:number}
  • {month} {day:number}, {year:number}

Patterns can also be derived automatically using an annotated corpus (see annotated_corpus.py).

The patterns are text strings with optional placeholders that reference other patterns. Each reference consists of a name (e.g. month, day, year) and a type (number, ordinal, month, ...). It is also possible to reference preceding patterns by their type name alone (e.g. {month}), or all preceding patterns using a numbered reference (e.g. {1}).
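To make the placeholder syntax concrete, here is a minimal sketch that splits a pattern string into literal text and placeholders. The regular expression, the `Placeholder` class and the `split_pattern` helper are hypothetical and only reflect the syntax shown in the examples above; they are not this library's API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# Assumed placeholder syntax, inferred from the examples above:
#   {name:type}  named reference to patterns of a type
#   {type}       reference by type name only (e.g. {month})
#   {1}          numbered reference
PLACEHOLDER = re.compile(r"\{([^{}:]+)(?::([^{}]+))?\}")


@dataclass(frozen=True)
class Placeholder:
    name: str
    type: Optional[str]  # None for {type} and {1} style references


def split_pattern(pattern: str) -> List[Union[str, Placeholder]]:
    """Split a pattern string into literal text and placeholders (hypothetical helper)."""
    parts: List[Union[str, Placeholder]] = []
    pos = 0
    for m in PLACEHOLDER.finditer(pattern):
        if m.start() > pos:
            parts.append(pattern[pos:m.start()])  # literal text before the placeholder
        parts.append(Placeholder(m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(pattern):
        parts.append(pattern[pos:])  # trailing literal text
    return parts


print(split_pattern("{month} {day:number}, {year:number}"))
```

A bare reference such as {month} keeps its text in `name` and leaves `type` empty, so a caller can decide whether it denotes a type name or a number.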

The matched values are then available to the pattern's value expression, which is evaluated using the evaluation function specified for the pattern type.
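As an illustration, one possible evaluation function for numerals could substitute the matched values into the value expression and then evaluate the resulting arithmetic. This is a sketch under assumptions: `evaluate` is a hypothetical name, only `+`, `-` and `*` are supported, and Python's `ast` module is used so that arbitrary code is never executed.

```python
import ast
import operator
import re
from typing import Dict

# Only +, - and * are supported in this sketch.
OPERATORS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _evaluate_node(node: ast.AST) -> int:
    """Recursively evaluate a restricted arithmetic syntax tree."""
    if isinstance(node, ast.Expression):
        return _evaluate_node(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        return OPERATORS[type(node.op)](_evaluate_node(node.left), _evaluate_node(node.right))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def evaluate(expression: str, values: Dict[str, int]) -> int:
    """Fill {name} placeholders with matched values, then evaluate the arithmetic."""
    filled = re.sub(r"\{(\w+)\}", lambda m: str(values[m.group(1)]), expression)
    return _evaluate_node(ast.parse(filled, mode="eval"))


print(evaluate("({big}+{small})", {"big": 40, "small": 2}))  # 42
```

Anything other than integer constants and the three operators, such as a function call, raises `ValueError` instead of being executed.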

For example for numbers:

| type | pattern | value |
| ---- | ------- | ----- |
| A | one | 1 |
| A | two | 2 |
| A | ... | ... |
| A | nine | 9 |
| B | twenty | 20 |
| B | thirty | 30 |
| B | ... | ... |
| B | ninety | 90 |
| C | {big:B}-{small:A} | ({big}+{small}) |

This would match forty-two with the type C pattern, and its value expression evaluates to (40+2) = 42.
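The table can be turned into a toy matcher that tries every pattern of a requested type against a piece of text and fills in the value expression. This is a simplified sketch, not how the library itself matches: it works on raw characters instead of tokens, uses a hypothetical subset of the table (the "forty" row stands in for one of the elided "..." rows), and would recurse forever on left-recursive patterns.

```python
import re
from typing import Dict, Iterator, List, Tuple

# A subset of the example table; "forty" fills in one of the elided "..." rows.
PATTERNS: List[Tuple[str, str, str]] = [
    ("A", "one", "1"),
    ("A", "two", "2"),
    ("B", "twenty", "20"),
    ("B", "forty", "40"),
    ("C", "{big:B}-{small:A}", "({big}+{small})"),
]

# Each pattern part is either a {name:type} placeholder or a run of literal text.
PART = re.compile(r"\{(\w+):(\w+)\}|([^{]+)")


def match(kind: str, text: str) -> Iterator[str]:
    """Yield the filled-in value expression for every way `text` matches `kind`."""
    for row_kind, pattern, value in PATTERNS:
        if row_kind == kind:
            for bindings in _match_parts(PART.findall(pattern), text, {}):
                yield value.format(**bindings)


def _match_parts(parts: List[Tuple[str, str, str]], text: str,
                 bindings: Dict[str, str]) -> Iterator[Dict[str, str]]:
    if not parts:
        if not text:  # the whole text must be consumed
            yield bindings
        return
    name, kind, literal = parts[0]
    if literal:
        if text.startswith(literal):
            yield from _match_parts(parts[1:], text[len(literal):], bindings)
        return
    # Placeholder: try every non-empty prefix as an instance of the referenced type.
    for end in range(1, len(text) + 1):
        for sub_value in match(kind, text[:end]):
            yield from _match_parts(parts[1:], text[end:], {**bindings, name: sub_value})


print(list(match("C", "forty-two")))  # ['(40+2)']
```

The result is the filled-in expression (40+2), which the evaluation function for the type then reduces to 42.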

The text is tokenized using all the tokens found in the patterns. If a word is not a known token, the system tries to split it into multiple known tokens. This way it can handle words that are written together.
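Splitting an unknown word can be sketched as enumerating every way to write it as a sequence of known tokens. The `split_word` function and the small Hebrew token set below are hypothetical; the example assumes the conjunction ו ("and") is available as a separate token, as in וחמישים ("and fifty").

```python
from functools import lru_cache
from typing import FrozenSet, List, Tuple


def split_word(word: str, tokens: FrozenSet[str]) -> List[List[str]]:
    """Return every way to write `word` as a sequence of known tokens."""
    if word in tokens:
        return [[word]]  # known words are kept as-is

    @lru_cache(maxsize=None)
    def splits(start: int) -> Tuple[Tuple[str, ...], ...]:
        # All segmentations of word[start:] into known tokens.
        if start == len(word):
            return ((),)
        results = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in tokens:
                results.extend((piece,) + rest for rest in splits(end))
        return tuple(results)

    return [list(segmentation) for segmentation in splits(0)]


# Hypothetical token set: the conjunction "ו" ("and") written together with a numeral.
HEBREW_TOKENS = frozenset({"ו", "חמישים", "ארבע", "שבע", "מאות"})
print(split_word("וחמישים", HEBREW_TOKENS))
```

A word can have several valid splits; returning all of them leaves it to the parser to decide which interpretations lead to a match.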

It is also possible to mark missing parts of the search text with a question mark. Each question mark is filled in with every possible known token.
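A naive way to picture this is to expand every whole-token question mark into all known tokens and enumerate the combinations. The `expand_wildcards` helper is hypothetical, and eager expansion grows as (number of tokens) to the power of (number of question marks), so a parser would more likely treat ? as matching any token at that position.

```python
from itertools import product
from typing import Iterable, List, Sequence


def expand_wildcards(tokens: Sequence[str], known_tokens: Iterable[str]) -> List[List[str]]:
    """Replace each "?" token with every known token, yielding all combinations."""
    vocabulary = sorted(set(known_tokens))  # sorted for a deterministic order
    choices = [vocabulary if token == "?" else [token] for token in tokens]
    return [list(combination) for combination in product(*choices)]


print(expand_wildcards(["forty", "-", "?"], {"one", "two"}))
```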

Search is then done using a chart parser. The parser goes through all the patterns, matches them against all the possible (sub) token interpretations and finally returns all the possible matches.
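The chart idea can be sketched as a bottom-up loop: every match is stored in a chart keyed by the token span it covers, patterns with placeholders are filled from entries already in the chart, and the loop repeats until nothing new is found. This is a minimal illustration and not the library's parser: rules work on whole tokens, the `Rule`/`Chart` types and the `parse` function are hypothetical, and rules whose values can grow without bound (e.g. a type that references only itself) would never terminate here.

```python
from typing import Dict, Iterator, List, Set, Tuple

# (type, pattern parts, value expression); parts are literal tokens or {name:type}.
Rule = Tuple[str, List[str], str]
# Maps a token span (start, end) to the (type, value) entries found for it.
Chart = Dict[Tuple[int, int], Set[Tuple[str, str]]]

RULES: List[Rule] = [
    ("A", ["two"], "2"),
    ("B", ["forty"], "40"),
    ("C", ["{big:B}", "-", "{small:A}"], "({big}+{small})"),
]


def _extend(parts: List[str], tokens: List[str], chart: Chart, pos: int,
            bindings: Dict[str, str]) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (end, bindings) for every way `parts` matches starting at `pos`."""
    if not parts:
        yield pos, bindings
        return
    part = parts[0]
    if part.startswith("{"):
        name, kind = part[1:-1].split(":")
        # A placeholder can be filled by any chart entry of the right type at `pos`.
        for (start, end), entries in chart.items():
            if start == pos:
                for entry_kind, value in entries:
                    if entry_kind == kind:
                        yield from _extend(parts[1:], tokens, chart, end,
                                           {**bindings, name: value})
    elif pos < len(tokens) and tokens[pos] == part:
        yield from _extend(parts[1:], tokens, chart, pos + 1, bindings)


def parse(tokens: List[str], rules: List[Rule]) -> Chart:
    """Apply all rules bottom-up until no new chart entries appear."""
    chart: Chart = {}
    changed = True
    while changed:
        # Collect first, then add, so the chart is not modified while being read.
        found = [((start, end), (kind, value.format(**bindings)))
                 for kind, parts, value in rules
                 for start in range(len(tokens))
                 for end, bindings in _extend(parts, tokens, chart, start, {})
                 if end > start]
        changed = False
        for span, entry in found:
            entries = chart.setdefault(span, set())
            if entry not in entries:
                entries.add(entry)
                changed = True
    return chart


chart = parse(["forty", "-", "two"], RULES)
print(chart[(0, 3)])  # {('C', '(40+2)')}
```

Because every partial match is kept, the chart also contains the smaller matches (forty as B, two as A), and all complete interpretations of a span can be read off at the end.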

The patterns are specified in historic_hebrew_dates/patterns and can be edited using a graphical web interface.

Supporting Another Language

Copying, renaming and editing the .json file of another language is enough to get started. Once this has been done you can specify the patterns using the editor.

Owner

  • Name: Centre for Digital Humanities
  • Login: CentreForDigitalHumanities
  • Kind: organization
  • Email: cdh@uu.nl
  • Location: Netherlands

Interdisciplinary centre for research and education in computational and data-driven methods in the humanities.

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Dependencies

package.json npm
  • cross-env ^5.2.0
web-ui/package.json npm
  • @angular-devkit/build-angular ~0.13.0 development
  • @angular/cli ~7.3.8 development
  • @angular/compiler-cli ~7.2.0 development
  • @angular/language-service ~7.2.0 development
  • @types/jasmine ~2.8.8 development
  • @types/jasminewd2 ~2.0.3 development
  • @types/node ~8.9.4 development
  • codelyzer ~4.5.0 development
  • jasmine-core ~2.99.1 development
  • jasmine-spec-reporter ~4.2.1 development
  • karma ~4.0.0 development
  • karma-chrome-launcher ~2.2.0 development
  • karma-coverage-istanbul-reporter ~2.0.1 development
  • karma-jasmine ~1.1.2 development
  • karma-jasmine-html-reporter ^0.2.2 development
  • protractor ~5.4.0 development
  • ts-node ~7.0.0 development
  • tslint ~5.11.0 development
  • typescript ~3.2.2 development
  • @angular/animations ~7.2.0
  • @angular/cdk ^7.3.7
  • @angular/common ~7.2.0
  • @angular/compiler ~7.2.0
  • @angular/core ~7.2.0
  • @angular/forms ~7.2.0
  • @angular/platform-browser ~7.2.0
  • @angular/platform-browser-dynamic ~7.2.0
  • @angular/router ~7.2.0
  • @fortawesome/angular-fontawesome ^0.3.0
  • @fortawesome/fontawesome-svg-core ^1.2.18
  • @fortawesome/free-solid-svg-icons ^5.8.2
  • add ^2.0.6
  • bulma ^0.7.5
  • core-js ^2.5.4
  • primeicons ^1.0.0
  • primeng ^7.1.3
  • rxjs ~6.3.3
  • service ^0.1.4
  • tslib ^1.9.0
  • yarn ^1.17.3
  • zone.js ~0.8.26
web-ui/yarn.lock npm
  • 984 dependencies
yarn.lock npm
  • cross-env 5.2.0
  • cross-spawn 6.0.5
  • is-windows 1.0.2
  • isexe 2.0.0
  • nice-try 1.0.5
  • path-key 2.0.1
  • semver 5.7.0
  • shebang-command 1.2.0
  • shebang-regex 1.0.0
  • which 1.3.1
requirements.in pypi
  • flask *
  • mypy *
  • pandas *
  • ply *
  • python-bidi *
  • pyyaml *
  • xlrd *
requirements.txt pypi
  • click ==7.0
  • flask ==1.1.1
  • itsdangerous ==1.1.0
  • jinja2 ==2.10.1
  • markupsafe ==1.1.1
  • mypy ==0.720
  • mypy-extensions ==0.4.1
  • numpy ==1.17.2
  • pandas ==0.25.1
  • ply ==3.11
  • python-bidi ==0.4.2
  • python-dateutil ==2.8.0
  • pytz ==2019.2
  • pyyaml ==5.1.2
  • six ==1.12.0
  • typed-ast ==1.4.0
  • typing-extensions ==3.7.4
  • werkzeug ==0.15.6
  • xlrd ==1.2.0
setup.py pypi