https://github.com/centrefordigitalhumanities/parseport
Dutch sentence parser for Spindle + Æthel (and maybe others in the future).
Repository
Basic Info
- Host: GitHub
- Owner: CentreForDigitalHumanities
- License: bsd-3-clause
- Language: TypeScript
- Default Branch: develop
- Size: 22.7 MB
Statistics
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 30
- Releases: 3
Metadata Files
README.md
ParsePort
ParsePort is a web interface for two natural language processing (NLP) parsers and their two associated pre-parsed text corpora, all developed at Utrecht University.
The Spindle parser is used to produce type-logical parses of Dutch sentences. It features a pre-parsed corpus of around 65,000 sentences (based on Lassy Small) called Æthel. These tools have been developed by dr. Konstantinos Kogkalidis as part of a research project conducted with prof. dr. Michaël Moortgat at Utrecht University.
The Minimalist Parser produces syntactic tree models of English sentences based on user input, creating syntax trees in the style of Chomskyan Minimalist Grammar. The parser has been developed by dr. Meaghan Fowlie at Utrecht University and comes with a pre-parsed corpus of 100 sentences taken from the Wall Street Journal. The tool used to visualize these syntax trees in an interactive way is Vulcan, developed by dr. Jonas Groschwitz, also at Utrecht University.
Running this application in Docker
In order to run this application you need a working installation of Docker and an internet connection. You will also need the source code from four other repositories. These must be located in the same directory as the parseport source code.
- spindle-server hosts the source code for a server with the Spindle parser;
- latex-service contains a LaTeX compiler that is used to export the Spindle parse results in PDF format;
- mg-parser-server has the source code for the Minimalist Grammar parser;
- vulcan-parseport is needed for the websocket-based webserver that hosts Vulcan, the visualization tool for MGParser parse results.
See the instructions in the README files of these repositories for more information on these codebases.
In addition, you need to add a configuration file named .env to the root directory of this project with at least the following settings. Use generated keys for the _KEY settings. Use 0 or 1 for the DJANGO_DEBUG setting, depending on whether you want to run the backend server in production or development mode.
```properties
DJANGO_SECRET_KEY=<secret_key_here>
DJANGO_DEBUG=<0 for production, 1 for development>
MG_PARSER_SECRET_KEY=<secret_key_here>
VULCAN_SECRET_KEY=<secret_key_here>
```
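One way to produce the generated keys is sketched below. It is not part of this project: the function name `render_env` is hypothetical, and the sketch assumes that any sufficiently long random string is acceptable for the three `_KEY` settings. It only uses Python's standard `secrets` module.

```python
import secrets

# The three key settings named in the .env template above.
KEY_SETTINGS = ("DJANGO_SECRET_KEY", "MG_PARSER_SECRET_KEY", "VULCAN_SECRET_KEY")


def render_env(debug: bool = True) -> str:
    """Return the text of a .env file with freshly generated keys.

    Assumption: each *_KEY setting accepts an arbitrary URL-safe random string.
    """
    lines = []
    for name in KEY_SETTINGS:
        # token_urlsafe only emits letters, digits, '-' and '_',
        # so the value needs no quoting in a .env file.
        lines.append(f"{name}={secrets.token_urlsafe(50)}")
        if name == "DJANGO_SECRET_KEY":
            # Keep DJANGO_DEBUG in the same position as in the template.
            lines.append(f"DJANGO_DEBUG={1 if debug else 0}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print to stdout; redirect to .env in the project root if the output looks right.
    print(render_env(debug=True), end="")
```

Printing instead of writing the file directly avoids overwriting an existing `.env` by accident.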
In overview, your file structure should be as follows.
```
┌── parseport (this project)
|   ├── compose.yaml
|   ├── .env
|   ├── frontend
|   |   └── Dockerfile
|   └── backend
|       ├── Dockerfile
|       └── aethel_db
|           └── data
|               └── aethel.pickle
|
├── spindle-server
|   ├── Dockerfile
|   └── model_weights.pt
|
├── latex-service
|   └── Dockerfile
|
├── mg-parser-server
|   └── Dockerfile
|
└── vulcan-parseport
    ├── Dockerfile
    └── app
        └── standard.pickle
```
Note that you will need three data files in order to run this project.
- model_weights.pt should be put in the root directory of the spindle-server project. It can be downloaded from Yoda-link here.
- aethel.pickle contains the pre-parsed data for Æthel and should live at parseport/backend/aethel_db/data. You can find it in the zip archive here.
- standard.pickle contains the pre-parsed corpus for the Minimalist Parser. It should be placed in the vulcan-parseport/app directory. You can download it from Yoda-link here.
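Before starting the containers, it can help to confirm that the layout described above is in place. The following is a minimal sketch, not a script shipped with this project. It assumes that this repository is checked out in a directory named `parseport` and that `workspace` is the directory holding all five repositories. The helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Paths taken from the file structure and data file notes above,
# relative to the directory that contains all five repositories.
REQUIRED_FILES = (
    "parseport/compose.yaml",
    "parseport/.env",
    "parseport/backend/aethel_db/data/aethel.pickle",
    "spindle-server/Dockerfile",
    "spindle-server/model_weights.pt",
    "latex-service/Dockerfile",
    "mg-parser-server/Dockerfile",
    "vulcan-parseport/Dockerfile",
    "vulcan-parseport/app/standard.pickle",
)


def missing_files(workspace: Path) -> list[str]:
    """Return the required relative paths that do not exist under `workspace`."""
    return [rel for rel in REQUIRED_FILES if not (workspace / rel).is_file()]
```

For example, calling `missing_files(Path.cwd().parent)` from the root of this project returns an empty list when nothing is missing.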
This application can be run in both production and development mode. Either mode will start a network of seven containers.
| Name | Description |
|-------------------|---------------------------------------------------|
| pp-nginx | Entry point and reverse proxy, exposes port 5001. |
| pp-ng | The frontend server (Angular). |
| pp-dj | The backend/API server (Django). |
| pp-spindle | The server hosting the Spindle parser. |
| pp-latex | The server hosting a LaTeX compiler. |
| pp-mg-parser | The server hosting the Minimalist Grammar parser. |
| pp-vulcan | The server hosting the Vulcan visualization tool. |
Start the Docker network in development mode by running the following command in your terminal.
```bash
docker compose --profile dev up --build -d
```
For production mode, run the following instead.
```bash
docker compose --profile prod up --build -d
```
The Spindle server needs to download several files before the parser is ready to receive input. You should wait a few minutes until the message App is ready! appears in the Spindle container logs.
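The wait can be automated by polling the container logs. The sketch below is an illustration under assumptions: it takes the container name `pp-spindle` from the table above and reads its output with `docker logs`. The function names and the default timeout are hypothetical choices, not part of this project.

```python
import subprocess
import time
from typing import Callable

READY_MESSAGE = "App is ready!"


def fetch_spindle_logs(container: str = "pp-spindle") -> str:
    """Return everything `docker logs <container>` has printed so far."""
    result = subprocess.run(
        ["docker", "logs", container],
        capture_output=True,
        text=True,
        check=False,
    )
    # Container output may arrive on either stream, so combine both.
    return result.stdout + result.stderr


def wait_until_ready(
    fetch_logs: Callable[[], str] = fetch_spindle_logs,
    timeout: float = 600.0,
    interval: float = 5.0,
) -> bool:
    """Poll the logs until the ready message appears or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if READY_MESSAGE in fetch_logs():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the log fetcher as a parameter keeps the polling logic testable without a running Docker daemon.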
Open your browser and visit your project at http://localhost:5001 to view the application.
Preparing for development
Note that the Aethel dataset is loaded every time the backend server restarts. To avoid slow feedback loops in a development environment, consider running python manage.py create_aethel_subset before starting the development server. This will create a much smaller subset that takes less than a second to load.
Before you start
You need to install the following software:
- PostgreSQL >= 10, client, server and C libraries
- Python >= 3.10
- virtualenv
- WSGI-compatible webserver (deployment only)
- Visual C++ for Python (Windows only)
- Node.js >= 14.20.0 (>=20 for Macbook users, see below)
- Yarn
- WebDriver for at least one browser (only for functional testing)
How it works
This project integrates three isolated subprojects, each inside its own subdirectory with its own code, package dependencies and tests:
- backend: the server-side web application based on Django and DRF
- frontend: the client-side web application based on Angular
- functional-tests: the functional test suite based on Selenium and pytest
Each subproject is configurable from the outside. Integration is achieved using "magic configuration" which is contained inside the root directory together with this README. In this way, the subprojects can stay truly isolated from each other.
If you are reading this README, you'll likely be working with the integrated project as a whole rather than with one of the subprojects in isolation. In this case, this README should be your primary source of information on how to develop or deploy the project. However, we recommend that you also read the "How it works" section in the README of each subproject.
Development
Quickstart
First time after cloning this project:
```console
$ python bootstrap.py
```
This will perform several setup steps:
- create a Python virtual environment,
- install backend requirements in the virtual environment,
- install frontend requirements,
- create a PostgreSQL database,
- create a Django superuser,
- run Django migrations,
- set up git flow.

This is just a preliminary script to get you started; check bootstrap.log in the parseport directory to see which of these steps you need to complete manually.
Running the application in development mode (hit ctrl-C to stop):
```console
$ yarn start
```
This will run the backend and frontend applications, as well as their unittests, and watch all source files for changes. You can visit the frontend on http://localhost:8000/, the browsable backend API on http://localhost:8000/api/ and the backend admin on http://localhost:8000/admin/. On every change, unittests rerun, frontend code rebuilds and open browser tabs refresh automatically (livereload).
Installation for ARM-chips (Macbooks M1+)
When installing this application, users of ARM chips also need to run:
```shell
brew install cmake llvm libomp
```
You will need to have Homebrew installed to run this. These are the additional packages required to install PyTorch on ARM chips.
Recommended order of development
For each new feature, we suggest that you work through the steps listed below. This could be called a back-to-front or "bottom up" order. Of course, you may have reasons to choose otherwise. For example, if very precise specifications are provided, you could move step 8 to the front for a more test-driven approach.
Steps 1–5 also include updating the unittests. Only functions should be tested, especially critical and nontrivial ones.
1. Backend model changes including migrations.
2. Backend serializer changes and backend admin changes.
3. Backend API endpoint changes.
4. Frontend model changes.
5. Other frontend unit changes (templates, views, routers, FSMs).
6. Frontend integration (globals, event bindings).
7. Run functional tests, repair broken functionality and broken tests.
8. Add functional tests for the new feature.
9. Update technical documentation.
For release branches, we suggest the following checklist.
1. Bump the version number in the package.json next to this README.
2. Run the functional tests in production mode, fix bugs if necessary.
3. Try using the application in production mode, look for problems that may have escaped the tests.
4. Add regression tests (unit or functional) that detect problems from step 3.
5. Work on the code until new regression tests from step 4 pass.
6. Optionally, repeat steps 2–5 with the application running in a real deployment setup (see Deployment).
Commands for common tasks
The package.json next to this README defines several shortcut commands to help streamline development. In total, there are over 30 commands. Most may be regarded as implementation details of other commands, although each command could be used directly. Below, we discuss the commands that are most likely to be useful to you. For full details, consult the package.json.
Install the pinned versions of all package dependencies in all subprojects:
```console
$ yarn
```
Run backend and frontend in production mode:
```console
$ yarn start-p
```
Run the functional test suite:
```console
$ yarn test-func [FUNCTIONAL TEST OPTIONS]
```
The functional test suite by default assumes that you have the application running locally in production mode (i.e., on port 4200). See Configuring the browsers and Configuring the base address in functional-tests/README for options.
Run all tests (mostly useful for continuous integration):
```console
$ yarn test [FUNCTIONAL TEST OPTIONS]
```
Run an arbitrary command from within the root of a subproject:
```console
$ yarn back [ARBITRARY BACKEND COMMAND HERE]
$ yarn front [ARBITRARY FRONTEND COMMAND HERE]
$ yarn func [ARBITRARY FUNCTIONAL TESTS COMMAND HERE]
```
For example,
```console
$ yarn back less README.md
```
is equivalent to
```console
$ cd backend
$ less README.md
$ cd ..
```
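The same "run a command from within a subproject" behaviour can be expressed in Python by setting the working directory of the child process. This is only an illustration of the idea. It is not how the yarn shortcuts in package.json are implemented, and the function name `run_in_subproject` is hypothetical.

```python
import subprocess
from pathlib import Path


def run_in_subproject(root: Path, subproject: str, command: list[str]) -> int:
    """Run `command` with `root/subproject` as its working directory.

    The caller's own working directory is left untouched, which mirrors
    the `cd backend ... cd ..` sequence shown above.
    """
    result = subprocess.run(command, cwd=root / subproject, check=False)
    return result.returncode
```

For example, `run_in_subproject(Path("."), "backend", ["less", "README.md"])` corresponds to the `yarn back less README.md` example.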
Run python manage.py within the backend directory:
```console
$ yarn django [SUBCOMMAND] [OPTIONS]
```
yarn django is a shorthand for yarn back python manage.py. This command is useful for managing database migrations, among other things.
Manage the frontend package dependencies:
```console
$ yarn fyarn (add|remove|upgrade|...) (PACKAGE ...) [OPTIONS]
```
Notes on Python package dependencies
Both the backend and the functional test suite are Python-based and package versions are pinned using pip-tools in both subprojects. For ease of development, you most likely want to use the same virtualenv for both and this is also what the bootstrap.py assumes.
This comes with a small catch: the subprojects each have their own separate requirements.txt. If you run pip-sync in one subproject, the dependencies of the other will be uninstalled. To avoid this, run pip install -r requirements.txt instead. The yarn command does this correctly by default.
Another thing to be aware of is that pip-compile takes the old contents of your requirements.txt into account when building the new version based on your requirements.in. You can use the following trick to keep the requirements in both projects aligned so the versions of common packages don't conflict:
```console
$ yarn back pip-compile
# append contents of backend/requirements.txt to functional-tests/requirements.txt
$ yarn func pip-compile
```
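The manual "append" step in the middle can be scripted. The sketch below is one possible way to do it, assuming it is run from the project root. The function name `append_requirements` is hypothetical and the script is not part of this project.

```python
from pathlib import Path


def append_requirements(source: Path, target: Path) -> None:
    """Append the contents of `source` to `target`, keeping line boundaries intact."""
    addition = source.read_text(encoding="utf-8")
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    # Make sure the last existing pin and the first appended pin
    # do not end up on the same line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    target.write_text(existing + separator + addition, encoding="utf-8")
```

For example, `append_requirements(Path("backend/requirements.txt"), Path("functional-tests/requirements.txt"))` performs the step between the two pip-compile runs.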
Development mode vs production mode
The purpose of development mode is to facilitate live development, as the name implies. The purpose of production mode is to simulate deployment conditions as closely as possible, in order to check whether everything still works under such conditions. A complete overview of the differences is given below.
| dimension | Development mode | Production mode |
| ----------------------------- | ------------------------------------------- | ------------------------------------------- |
| command | yarn start | yarn start-p |
| base address | http://localhost:8000 | http://localhost:4200 |
| backend server (Django) | in charge of everything | serves backend only |
| frontend server (angular-cli) | serves | watch and build |
| static files | served directly by Django's staticfiles app | collected by Django, served by gulp-connect |
| backend DEBUG setting | True | False |
| backend ALLOWED_HOSTS | - | restricted to localhost |
| frontend sourcemaps | yes | no |
| frontend optimization | no | yes |
Deployment
Both the backend and frontend applications have a section dedicated to deployment in their own READMEs. You should read these sections entirely before proceeding. All instructions in these sections still apply, though it is good to know that you can use the following shorthand commands from the integrated project root:
```console
# collect static files of both backend and frontend, with overridden settings
$ yarn django collectstatic --settings SETTINGS --pythonpath path/to/SETTINGS.py
```
You should build the frontend before collecting all static files.
Owner
- Name: Centre for Digital Humanities
- Login: CentreForDigitalHumanities
- Kind: organization
- Email: cdh@uu.nl
- Location: Netherlands
- Website: https://cdh.uu.nl/
- Repositories: 39
- Profile: https://github.com/CentreForDigitalHumanities
Interdisciplinary centre for research and education in computational and data-driven methods in the humanities.
GitHub Events
Total
- Create event: 19
- Release event: 3
- Issues event: 20
- Watch event: 1
- Delete event: 17
- Member event: 1
- Issue comment event: 7
- Push event: 63
- Pull request review comment event: 17
- Pull request review event: 29
- Pull request event: 33
Last Year
- Create event: 19
- Release event: 3
- Issues event: 20
- Watch event: 1
- Delete event: 17
- Member event: 1
- Issue comment event: 7
- Push event: 63
- Pull request review comment event: 17
- Pull request review event: 29
- Pull request event: 33
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Xander Vertegaal | a****l@u****l | 246 |
| Ben | b****l@u****l | 30 |
| Xander Vertegaal | x****r@v****l | 15 |
| Meesch | m****t@u****l | 9 |
| dependabot[bot] | 4****] | 1 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 60
- Total pull requests: 48
- Average time to close issues: 3 months
- Average time to close pull requests: about 1 month
- Total issue authors: 2
- Total pull request authors: 3
- Average comments per issue: 0.52
- Average comments per pull request: 0.38
- Merged pull requests: 45
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 7
- Pull requests: 26
- Average time to close issues: 7 days
- Average time to close pull requests: about 2 months
- Issue authors: 1
- Pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.15
- Merged pull requests: 23
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- XanderVertegaal (33)
- bbonf (1)
Pull Request Authors
- XanderVertegaal (50)
- bbonf (6)
- Meesch (3)
- dependabot[bot] (1)