https://github.com/centrefordigitalhumanities/vulcan-parseport

Fork of VULCAN (Visualizations for Understanding Language Corpora and model predictioNs) for use in the ParsePort project

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: CentreForDigitalHumanities
License: apache-2.0
Language: Python
Default Branch: main
Size: 6.31 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 5
Releases: 0

Fork of jgroschwitz/vulcan

Created over 1 year ago · Last pushed about 1 year ago

https://github.com/CentreForDigitalHumanities/vulcan-parseport/blob/main/

# VULCAN -- ParsePort

This is a fork of the original repository of VULCAN (Visualizations for Understanding Language Corpora and model predictioNs), developed and maintained by [dr. Jonas Groschwitz (jgroschwitz)](https://github.com/jgroschwitz). Please refer to the [original repo](https://github.com/jgroschwitz/vulcan) for the original documentation, including setup and usage instructions.

This fork adapts VULCAN to be used within the ParsePort project as a visualization tool for syntactic parses made by the Minimalist Parser developed by [dr. Meaghan Fowlie (megodoonch)](https://github.com/megodoonch) at Utrecht University. More documentation on ParsePort, developed at the Centre For Digital Humanities at Utrecht University, can be found [here](https://github.com/CentreForDigitalHumanities/parseport).

This project contains the code for a web-based visualization tool, run on a Flask webserver. The server accepts both regular HTTP requests and WebSocket connections. In addition, there is a small SQLite database to keep track of individual parse results.

## Aim and functionality

The server is designed to receive parse results from the Minimalist Parser and turn them into so-called 'Layout' objects that can be sent to a client. In the browser, these Layouts are used to build navigable trees representing the parse result in the shape of syntactic tree structures.

The server is designed to be run in a Docker container, but it can also be run locally for development purposes. Instructions for both are given below.

The client-side files for Vulcan are also available in this repository (in `/vulcan/client`) for reference purposes only. They are not served by the Flask server. In the context of the ParsePort project, these files are served by the NGINX server that is part of the ParsePort container network.

## API architecture

### HTTP

The server's main HTTP endpoint is `/`, which is used to register new parse results and create new Layout objects. It only accepts POST requests with a JSON object of the following form.

```json
{
  "id": "unique-identifier-for-parse",
  "parse_data": "base64-encoded-parse-result"
}
```

The server decodes the parse result data, turns it into a Layout object and stores it in a SQLite database together with the ID and a timestamp. The server then sends a JSON response of the shape `{"ok": true}` with a status code of 200.
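As an illustration, the request described above could be built with Python's standard library roughly as follows. This is a minimal sketch, not code from this repository: the function names `build_payload` and `register_parse` are hypothetical, and the raw bytes passed in are assumed to be whatever serialized form the Minimalist Parser produces.

```python
import base64
import json
import urllib.request


def build_payload(parse_id: str, parse_result: bytes) -> bytes:
    """Return the JSON request body with the parse result base64-encoded."""
    encoded = base64.b64encode(parse_result).decode("ascii")
    return json.dumps({"id": parse_id, "parse_data": encoded}).encode("utf-8")


def register_parse(base_url: str, parse_id: str, parse_result: bytes) -> dict:
    """POST the payload to the server's `/` endpoint and return the JSON reply."""
    request = urllib.request.Request(
        base_url,
        data=build_payload(parse_id, parse_result),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a local development server running, a call such as `register_parse("http://localhost:5000/", "my-parse-id", raw_bytes)` would be expected to return the `ok` response described above.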

In addition, HTTP GET requests to the `/status/` endpoint will return `{"ok": "true"}` if the server is running and ready to receive connections.
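A readiness check against `/status/` might look like the sketch below. The helper names `is_ok` and `server_is_ready` are hypothetical. Because it is an assumption whether the `ok` value arrives as a JSON boolean or as the string `"true"`, the sketch accepts both.

```python
import json
import urllib.error
import urllib.request


def is_ok(body: bytes) -> bool:
    """Interpret a status body, accepting both `true` and the string "true"."""
    try:
        value = json.loads(body).get("ok")
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return False
    return value is True or value == "true"


def server_is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/status/ answers 200 with an ok body."""
    try:
        with urllib.request.urlopen(f"{base_url}/status/", timeout=timeout) as response:
            return response.status == 200 and is_ok(response.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

For a local server, `server_is_ready("http://localhost:5000")` could be polled before sending parse results.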

### WebSocket

As soon as a user downloads and opens the Vulcan client-side HTML + JS in their browser, the client will establish a WebSocket connection with the server. All communications go through the `/socket.io/` endpoint, with an optional ID route parameter used to identify Layouts in the SQLite database. If an ID is provided, the server will look up the corresponding Layout and send it back to the client. If no ID is provided, the server will instead return a standard Layout object, based on a pre-parsed corpus containing sentences from the Wall Street Journal.

The server handles the following WebSocket events:

- `connect`: tells the server to establish a connection. The server will return a stored Layout (if an ID is provided) or the standard Layout to the client, where it can be rendered on screen.
- `disconnect`: tells the server to close the connection.
- `instance_requested`: provides a page number, or index, to the server. The server will return a Layout object based on the sentence at that index in the pre-parsed corpus.
- `perform_search`: provides search parameters. The server then uses these parameters to perform a search on the standard Layout. The resulting new Layout object is stored in the database alongside a unique identifier, which is sent back to the client. The client uses the identifier to construct a new URL where it can find the search result. This URL can be shared with other users, who will then see the same search result.
- `clear_search`: retrieves the base Layout for the current Layout. This is used to clear the search results and return the standard Layout.
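The relationship between `connect`, `perform_search` and `clear_search` can be modelled in a few lines of plain Python. This is a simplified sketch of the behaviour described above, not the repository's actual handlers: `LayoutStore`, the `on_*` functions and the dictionary-shaped Layouts are all hypothetical stand-ins, and an in-memory dictionary takes the place of the SQLite database.

```python
import uuid

# Stand-in for the standard Layout built from the pre-parsed corpus.
STANDARD_LAYOUT = {"source": "standard", "instance": 0}


class LayoutStore:
    """In-memory stand-in for the SQLite table of stored Layouts."""

    def __init__(self):
        self._layouts = {}

    def save(self, layout: dict) -> str:
        """Store a Layout under a fresh unique identifier and return it."""
        layout_id = uuid.uuid4().hex
        self._layouts[layout_id] = layout
        return layout_id

    def get(self, layout_id: str):
        return self._layouts.get(layout_id)


def on_connect(store: LayoutStore, layout_id=None):
    """Return the stored Layout for the ID, or the standard Layout without one."""
    if layout_id is None:
        return STANDARD_LAYOUT
    # Behaviour for unknown IDs is not documented; this sketch returns None.
    return store.get(layout_id)


def on_perform_search(store: LayoutStore, query: dict) -> str:
    """Store a search-result Layout and return its identifier for sharing."""
    result = {"source": "search", "query": query, "base": STANDARD_LAYOUT}
    return store.save(result)


def on_clear_search(layout: dict) -> dict:
    """Return the base Layout of a search result (the standard Layout here)."""
    return layout.get("base", layout)
```

The identifier returned by `on_perform_search` plays the role of the shareable ID: a second client that connects with it receives the same search result.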

## Layout cleanup

Layouts that are stored in the database will be marked for cleanup if they have not been consulted for 90 days, as measured by the timestamp associated with the Layout in the database. Whenever a Layout is requested, its timestamp is updated to the current time.

It is recommended to clean up the database periodically by running `remove_old_layouts.py`, which removes all Layouts that have been marked for cleanup. The file `Crontab` can be used to schedule this script to run automatically on Linux-based machines that host the ParsePort Docker network.
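The timestamp-based policy can be sketched with the standard `sqlite3` module. This is an illustration only and does not reproduce `remove_old_layouts.py`: the table name `layouts`, its columns, and the choice to store timestamps as Unix epoch seconds are all assumptions. For brevity the sketch deletes stale rows directly instead of marking them first.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)


def create_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding one row per stored Layout."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS layouts ("
        "id TEXT PRIMARY KEY, data BLOB, last_accessed REAL)"
    )


def touch(conn: sqlite3.Connection, layout_id: str, now: datetime) -> None:
    """Refresh a Layout's timestamp, as happens whenever it is requested."""
    conn.execute(
        "UPDATE layouts SET last_accessed = ? WHERE id = ?",
        (now.timestamp(), layout_id),
    )


def remove_old_layouts(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete Layouts not consulted within MAX_AGE; return how many were removed."""
    cutoff = (now - MAX_AGE).timestamp()
    cursor = conn.execute("DELETE FROM layouts WHERE last_accessed < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount
```

Passing `now` explicitly, for example `datetime.now(timezone.utc)`, keeps the functions easy to test with fixed dates.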

## Running a local development server

The server can be run in three different ways:

1. Locally, using Flask's built-in development server.
2. In a standalone Docker container.
3. As part of the ParsePort container network, using Docker Compose.

To run the server locally, you need to have Python 3.12 or higher installed (lower versions may work but have not been tested).

### Running the server locally

1. Recommended: set up a virtual environment. You can do this by running the following commands in the root directory of the project:

```bash
python3 -m venv venv
source venv/bin/activate
```

On Windows, you can use the following commands:

```bash
python -m venv venv
venv\Scripts\activate
```

This will create a virtual environment in the `venv` directory and activate it. You can deactivate the virtual environment by running `deactivate`.

2. Install the required dependencies. You can do this by running the following command:

```bash
pip install -r requirements.txt
```

3. Generate a secret key and set it as the value of the `VULCAN_SECRET_KEY` environment variable, placing your key between the quotes in the commands below. How this is done differs per operating system.

On Linux-based systems:
```bash
export VULCAN_SECRET_KEY=''
```

In PowerShell on Windows:
```powershell
$Env:VULCAN_SECRET_KEY=''
```

4. Start the development server by running the following command in the `/app` folder:

```bash
flask run --host 0.0.0.0
```

This will start the server on `http://localhost:5000`. Visit `http://localhost:5000/status/` to check if the server is running.
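Step 3 above does not say how to generate the secret key. One common option is Python's standard `secrets` module, as in the sketch below. It assumes that any sufficiently random string is acceptable to the server, since no required format is documented here. The function name `generate_secret_key` is hypothetical.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hexadecimal string with two characters per byte."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Print a key that can be pasted into the export command or an .env file.
    print(generate_secret_key())
```

The printed value can then be used as the value of `VULCAN_SECRET_KEY`.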

### Running the server in a Docker container

The server can be run in a Docker container in two ways, either as a standalone container or as part of the ParsePort container network. This requires Docker to be installed on your machine.

#### Running Vulcan-ParsePort in a standalone Docker container

The project expects an `.env` file in the root directory of the project. This file should contain at least the following line. (Consult `.env.example` for an example.)

```properties
VULCAN_SECRET_KEY='your-secret-key-here'
```

Optionally, you may add:

```properties
# Starts the server in debug mode.
FLASK_DEBUG=1

# Specifies the port on which the application will run (default is 32771).
VULCAN_PORT=your-port-here
```

Then, build and run your container using the following commands:

```bash
docker build -t vulcan-parseport .
docker run -d -p 5000:32771 --env-file .env --name vulcan-parseport vulcan-parseport
```

If you specified a different port in the `.env` file, replace `32771` with that port number.

**Tip:** The `-d` flag in the command above runs the container in detached mode and keeps your terminal clean. Omit it to see the server output in the foreground.

**Tip:** If you are running the Flask server in debug mode (with live reloading), adding `-v .:/app:rw` to the `docker run` command above will mount the current directory into the container. When you change the code, the server will reload automatically.

The server should now be running and reachable on `http://localhost:5000`.

#### Running Vulcan-ParsePort within the ParsePort container network

No `.env` file is needed in this case, as the ParsePort container network will provide the necessary environment variables. Please consult the [ParsePort documentation](https://github.com/CentreForDigitalHumanities/parseport) for more information.

Owner

Name: Centre for Digital Humanities
Login: CentreForDigitalHumanities
Kind: organization
Email: cdh@uu.nl
Location: Netherlands

Website: https://cdh.uu.nl/
Repositories: 39
Profile: https://github.com/CentreForDigitalHumanities

Interdisciplinary centre for research and education in computational and data-driven methods in the humanities.

GitHub Events

Total

Create event: 6
Release event: 1
Issues event: 4
Delete event: 4
Issue comment event: 4
Member event: 1
Push event: 16
Pull request review comment event: 4
Pull request review event: 8
Pull request event: 7

Last Year

Create event: 6
Release event: 1
Issues event: 4
Delete event: 4
Issue comment event: 4
Member event: 1
Push event: 16
Pull request review comment event: 4
Pull request review event: 8
Pull request event: 7

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 1
Total pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: 3 months
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: 3 months
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

XanderVertegaal (4)

Pull Request Authors

XanderVertegaal (4)

Top Labels

Issue Labels

bug (1) help wanted (1) data (1) documentation (1) enhancement (1)

Pull Request Labels

enhancement (2) documentation (1)
