lwmdb

A django-based library for managing the Living with Machines newspapers metadata database schema

https://github.com/living-with-machines/lwmdb

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
    Organization living-with-machines has institutional domain (livingwithmachines.ac.uk)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.1%) to scientific vocabulary

Keywords

hut23
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 45
  • Releases: 3
Topics
hut23
Created about 4 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation

README.md

Living With Machines Database: lwmdb


A package providing database access to the Living with Machines newspaper collection’s metadata, designed to facilitate quicker and easier humanities research on heterogeneous and complex newspaper data.

Background on the development of the database is available in "Metadata Enrichment in the Living with Machines Project: User-focused Collaborative Database Development in a Digital Humanities Context", published in the Digital Humanities 2023 book of abstracts.

Installation

Install Docker

It is possible to run this code without Docker, but at present we only maintain it via Docker containers, so we highly recommend installing Docker to run and/or test this code locally. Instructions for most operating systems are available at https://docs.docker.com/desktop/

Clone the repository

Clone the repository via either

```console
git clone https://github.com/living-with-machines/lwmdb.git
```

or (using a GitHub SSH key):

```console
git clone git@github.com:living-with-machines/lwmdb.git
```

followed by:

```console
cd lwmdb
```

Local deploy of documentation

If you have Poetry installed locally, you can serve the documentation without Docker:

```console
poetry install
poetry run mkdocs serve
```

Running locally

```console
docker compose -f local.yml up --build
```

Note: this uses the .envs/local file provided in the repository. That file must not be used in production; it exists only for local development and to demonstrate what is required for .envs/production, which must be generated separately for deploying via production.yml.

Downloading the Docker images required to run locally will take some time, after which Compose should attempt to start the server in the django container. If successful, the console should print logs resembling:

```console
lwmdb_local_django | WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
lwmdb_local_django | * Running on all addresses (0.0.0.0)
lwmdb_local_django | * Running on http://127.0.0.1:8000
lwmdb_local_django | * Running on http://172.20.0.4:8000
lwmdb_local_django | Press CTRL+C to quit
lwmdb_local_django | * Restarting with stat
lwmdb_local_django | Performing system checks...
lwmdb_local_django |
lwmdb_local_django | System check identified no issues (0 silenced).
lwmdb_local_django |
lwmdb_local_django | Django version 4.2.1, using settings 'lwmdb.settings'
lwmdb_local_django | Development server is running at http://0.0.0.0:8000/
lwmdb_local_django | Using the Werkzeug debugger (http://werkzeug.pocoo.org/)
lwmdb_local_django | Quit the server with CONTROL-C.
lwmdb_local_django | * Debugger is active!
lwmdb_local_django | * Debugger PIN: 139-826-693
```

This indicates the server is up and running. You should then be able to open http://127.0.0.1:8000 in your local browser and see a start page.
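If you would rather script this check than watch the logs, the following is a minimal sketch using only the Python standard library. It assumes the default address http://127.0.0.1:8000 shown in the logs; the wait_for_server name, its timeout and its polling interval are illustrative choices, not part of lwmdb.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://127.0.0.1:8000", timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `url` until it returns any HTTP response, or give up after `timeout` seconds."""
    # Bypass any configured HTTP proxy, since the target is a local container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=5):
                return True
        except urllib.error.HTTPError as err:
            # An error status (e.g. 404 or 500) still means the server is answering.
            err.close()
            return True
        except OSError:
            # Connection refused, reset or timed out: the server is not ready yet.
            time.sleep(interval)
    return False
```

For example, call wait_for_server() from a second terminal after running the up command; it returns False if nothing answers within the timeout, which usually means the containers are still building or failed to start.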

To stop the app, run the down command:

```console
docker compose -f local.yml down
```

Importing data

A previous version of the database can be imported if it is available either as JSON fixtures or as raw SQL produced by pg_dump (or a similar command).

json import

JSON fixtures need to be placed in a fixtures folder in your local checkout:

```console
cd lwmdb
mkdir fixtures
cp DataProvider-1.json Ingest-1.json Item-1.json Newspaper-1.json Digitisation-1.json Issue-1.json Item-2.json fixtures/
```

The files can then be imported via:

```console
docker compose -f local.yml exec django /app/manage.py loaddata fixtures/Newspaper-1.json
docker compose -f local.yml exec django /app/manage.py loaddata fixtures/Issue-1.json
docker compose -f local.yml exec django /app/manage.py loaddata fixtures/Item-2.json
...
```

Warning: the import order matters. Load the Newspaper, Issue and any other data JSON files before the Item JSON files.
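Because the order matters, it can help to generate the loaddata commands rather than type them by hand. The sketch below is a hypothetical helper, not part of lwmdb: it assumes fixture files are named <Model>-<n>.json as in the example, places Newspaper first, Issue second and Item last as the warning requires, and (as an assumption) orders any other models alphabetically in between.

```python
import re

# Assumption: fixture files are named "<Model>-<n>.json", e.g. "Issue-1.json".
FIXTURE_PATTERN = re.compile(r"^(?P<model>[A-Za-z]+)-(?P<index>\d+)\.json$")

# Newspaper first, Issue second, Item last; other models fall in between.
MODEL_RANK = {"Newspaper": 0, "Issue": 1, "Item": 3}
OTHER_RANK = 2


def fixture_sort_key(name: str) -> tuple[int, str, int]:
    """Sort key: model rank, then model name, then numeric file index."""
    match = FIXTURE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unexpected fixture filename: {name}")
    model = match["model"]
    return MODEL_RANK.get(model, OTHER_RANK), model, int(match["index"])


def order_fixtures(names) -> list[str]:
    """Return fixture filenames in a load order that keeps Item files last."""
    return sorted(names, key=fixture_sort_key)


def loaddata_commands(names, fixture_dir: str = "fixtures", compose_file: str = "local.yml") -> list[str]:
    """Build one loaddata command per fixture file, in load order."""
    return [
        f"docker compose -f {compose_file} exec django /app/manage.py loaddata {fixture_dir}/{name}"
        for name in order_fixtures(names)
    ]
```

For example, pass it the names found by pathlib's Path("fixtures").glob("*.json") and run the printed commands one after another. Sorting the numeric index as an integer keeps Item-2.json ahead of Item-10.json.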

Importing a postgres database

Importing from JSON can be very slow. If you have a Postgres backup file, you can import it directly. First copy the backup file(s) to a backups folder in the postgres container (this assumes you have already run the build command):

```console
docker cp backups $(docker compose -f local.yml ps -q postgres):/backups
```

Next, make sure the app is shut down, then start only the postgres container:

```console
docker compose -f local.yml down
docker compose -f local.yml up postgres
```

Then run the restore command with the filename of the backup. By default, backup filenames indicate when the backup was made, and backups are compressed (the example below, backup_2023_04_03T07_22_10.sql.gz, uses gzip compression):

Warning: the default disk space allocated to Docker may not be large enough for a full version of the dataset (especially when running on a desktop). If so, you may need to increase the allocated disk space; see, for example, the Docker Mac FAQs for instructions.

```console
docker compose -f local.yml exec postgres restore backup_2023_04_03T07_22_10.sql.gz
```

Warning: if the version of the database you are loading is not compatible with the current version of the Python package, this can cause significant errors.
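When several backups are available, the timestamp in the filename can be used to pick the newest one. This is a minimal sketch, not part of lwmdb, assuming every backup follows the backup_YYYY_MM_DDTHH_MM_SS.sql.gz pattern of the example; the helper names are hypothetical.

```python
import re
from datetime import datetime

# Assumption: backup names follow the pattern of the example,
# e.g. "backup_2023_04_03T07_22_10.sql.gz".
BACKUP_PATTERN = re.compile(r"^backup_(\d{4}_\d{2}_\d{2}T\d{2}_\d{2}_\d{2})\.sql\.gz$")
TIMESTAMP_FORMAT = "%Y_%m_%dT%H_%M_%S"


def backup_timestamp(filename: str) -> datetime:
    """Parse the creation time encoded in a backup filename."""
    match = BACKUP_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Not a recognised backup filename: {filename}")
    return datetime.strptime(match.group(1), TIMESTAMP_FORMAT)


def latest_backup(filenames) -> str:
    """Return the most recent backup, ignoring files that do not match the pattern."""
    candidates = [name for name in filenames if BACKUP_PATTERN.match(name)]
    if not candidates:
        raise ValueError("No backup files found")
    return max(candidates, key=backup_timestamp)


def restore_command(filename: str, compose_file: str = "local.yml") -> str:
    """Build the restore command for a filename that matches the expected pattern."""
    backup_timestamp(filename)  # raises ValueError for unexpected names
    return f"docker compose -f {compose_file} exec postgres restore {filename}"
```

The list of filenames could come, for example, from running ls /backups inside the postgres container; parsing the timestamp rather than sorting the strings also rejects stray files such as notes or partial downloads.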

Querying the database

Jupyter Notebook

To run the Django framework inside a notebook, first start the app via Docker as described above, then open another terminal window and run:

```console
docker compose -f local.yml exec django /app/manage.py shell_plus --notebook
```

This should launch a standard Jupyter Notebook in your browser, where you can create notebooks and query the database in different ways.

Important: before importing any models or working with the database, run the line import django_initialiser in a notebook cell; this sets up all the required dependencies.

Note: for some users, we provide two Jupyter notebooks:

  • getting-started.ipynb
  • explore-newspapers.ipynb

Both give an overview of how to access the database’s information and what you can do with it. They only scratch the surface of what is possible, but they are a good entry point for anyone who wants to get oriented with the database and Django database querying.

Upgrade development version

To upgrade your local development version, first synchronise the repository with your local copy, then rebuild:

Step 1: git pull

Step 2: docker compose -f local.yml up --build

Run on a server

To run in production, an .envs/production env file must be created. Fill it in with new passwords for each key rather than copying .envs/local. It needs the same keys as .envs/local, plus the following two:

  • TRAEFIK_EMAIL="email.register.for.traefik.account@test.com"
  • HOST_URL="host.for.lwmdb.deploy.org"

A domain name (in this example "host.for.lwmdb.deploy.org") must be registered for HTTPS (encrypted) usage, and a TLS certificate is needed. See the Traefik docs for details.
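Because .envs/production needs every key from .envs/local plus TRAEFIK_EMAIL and HOST_URL, each with fresh values, a quick pre-deploy check can catch omissions. This is a minimal sketch, not a tool shipped with lwmdb: parse_env is a deliberately simple KEY=VALUE reader (no export prefixes, multi-line values or interpolation), and flagging every value identical to the local file is an assumption, since some non-secret settings may legitimately match.

```python
REQUIRED_PRODUCTION_ONLY = ("TRAEFIK_EMAIL", "HOST_URL")


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser; skips blank lines and comments, strips simple quotes."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_production_env(local_text: str, production_text: str) -> dict[str, list[str]]:
    """Report keys missing from production and values copied unchanged from local."""
    local = parse_env(local_text)
    production = parse_env(production_text)
    required = set(local) | set(REQUIRED_PRODUCTION_ONLY)
    # A key counts as missing if it is absent or has an empty value.
    missing = sorted(key for key in required if not production.get(key))
    # Values identical to the local file should be reviewed and regenerated.
    reused = sorted(key for key in local if production.get(key) == local[key])
    return {"missing": missing, "reused": reused}
```

Assuming both env files are single files, you might call it with Path(".envs/local").read_text() and Path(".envs/production").read_text() from pathlib; an empty "missing" list and a reviewed "reused" list suggest the production file is complete.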

Contributors

  • Kalle Westerling: 💻 🤔 📖
  • griff-rees: 💻 🤔 🧑‍🏫 🚧 📖
  • Aoife Hughes: 💻
  • Tim Hobson: 💻 🤔
  • Nilo Pedrazzini: 💻 🤔
  • Christina Last: 💻 🤔
  • claireaustin01: 🤔
  • Mia: 💻 🤔 📖
  • Andy Smith: 💻 🤔
  • Katie McDonough: 💻 🤔
  • Mariona: 💻 🤔
  • Kaspar Beelen: 💻 🤔
  • David Beavan: 💻 🤔

Owner

  • Name: Living with Machines
  • Login: Living-with-machines
  • Kind: organization

A radical collaboration between computational linguists, curators, data scientists, software engineers, geographers and historians

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Living With Machines Database
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Griffith
    family-names: Rees
    orcid: 'https://orcid.org/0000-0001-7281-4116'
    affiliation: The Alan Turing Institute
  - given-names: Adam
    family-names: Farquhar
    orcid: 'https://orcid.org/0000-0001-5331-6592'
  - given-names: Amy
    family-names: Krause
    orcid: 'https://orcid.org/0000-0002-6173-6738'
  - given-names: Andre
    family-names: Piza
    orcid: 'https://orcid.org/0000-0003-1563-9125'
  - given-names: Andrew
    family-names: Smith
    orcid: 'https://orcid.org/0000-0002-4465-2284'
    affiliation: The Alan Turing Institute
  - given-names: Barbara
    family-names: McGillivray
    orcid: 'https://orcid.org/0000-0003-3426-8200'
  - given-names: Christina
    family-names: Last
    affiliation: The Alan Turing Institute
  - given-names: Daniel
    name-particle: van
    family-names: Strien
    orcid: 'https://orcid.org/0000-0003-1684-6556'
  - given-names: Daniel
    family-names: Wilson
    orcid: 'https://orcid.org/0000-0001-6886-775X'
  - given-names: David
    family-names: Beavan
    orcid: 'https://orcid.org/0000-0002-0347-6659'
    affiliation: The Alan Turing Institute
  - given-names: Emma
    family-names: Griffin
    orcid: 'https://orcid.org/0000-0002-0245-6082'
  - given-names: Giovanni
    family-names: Colavizza
    orcid: 'https://orcid.org/0000-0002-9806-084X'
    affiliation: 0000-0002-9806-084X
  - given-names: James
    orcid: 'https://orcid.org/0000-0001-6993-0319'
    affiliation: The Alan Turing Institute
    family-names: Hetherington
  - given-names: Jon
    family-names: Lawrence
    orcid: 'https://orcid.org/0000-0001-6561-6381'
  - given-names: Kalle
    family-names: Westerling
    orcid: 'https://orcid.org/0000-0002-2014-332X'
    affiliation: The Alan Turing Institute
  - given-names: Kaspar
    name-particle: von
    family-names: Beelen
    orcid: 'https://orcid.org/0000-0001-7331-1174'
  - given-names: Kasra
    family-names: Hosseini
    orcid: 'https://orcid.org/0000-0003-4396-6019'
    affiliation: The Alan Turing Institute
  - given-names: Mariona
    family-names: Coll-Ardanuy
    orcid: 'https://orcid.org/0000-0001-8455-7196'
  - given-names: Mia
    family-names: Ridge
    orcid: 'https://orcid.org/0000-0003-3733-8120'
    affiliation: The British Library
  - given-names: Nilo
    family-names: Pedrazzini
    orcid: 'https://orcid.org/0000-0003-3757-2961'
  - given-names: Olivia
    family-names: Vane
    orcid: 'https://orcid.org/0000-0002-3777-4910'
  - given-names: Ruth
    family-names: Ahnert
    affiliation: The Alan Turing Institute
    orcid: 'https://orcid.org/0000-0002-8503-1580'
  - given-names: Tim
    family-names: Hobson
    orcid: 'https://orcid.org/0000-0002-5653-527X'
    affiliation: The Alan Turing Institute
  - given-names: Aoife
    family-names: Hughes
    orcid: 'https://orcid.org/0000-0002-4572-5828'
    affiliation: The Alan Turing Institute
  - given-names: Alan
    family-names: Wilson
    affiliation: The Alan Turing Institute
  - given-names: Bowan
    family-names: Zhang
    orcid: 'https://orcid.org/0000-0002-5562-609X'
    affiliation: The Alan Turing Institute and King's College London
  - given-names: Giorgia
    family-names: Tolfo
    affiliation: The British Library
    orcid: 'https://orcid.org/0000-0002-2821-4049'
  - given-names: Guy
    family-names: Solomon
    orcid: 'https://orcid.org/0000-0002-4394-1498'
    affiliation: The Alan Turing Institute
  - given-names: Joshua
    orcid: 'https://orcid.org/0000-0002-4017-2777'
    family-names: Rhodes
  - given-names: Katherine
    family-names: McDonough
    affiliation: The Alan Turing Institute and Lancaster University
    orcid: 'https://orcid.org/0000-0001-7506-1025'
  - given-names: Léllé
    family-names: Demertzi
    affiliation: The Alan Turing Institute
  - given-names: Lucy
    family-names: Havens
    affiliation: The University of Edinburgh
  - given-names: Luke
    family-names: Hare
    affiliation: The Alan Turing Institute
  - given-names: Rosie
    family-names: Wood
    affiliation: The Alan Turing Institute
  - given-names: Sarah
    family-names: Gibson
    orcid: 'https://orcid.org/0000-0003-0356-2765'
    affiliation: The Alan Turing Institute
  - given-names: Sherman
    family-names: Lo
    affiliation: Queen Mary University of London
identifiers:
  - type: doi
    value: 10.5281/zenodo.8208204
repository-code: 'https://github.com/Living-with-machines/lwmdb'
url: 'https://livingwithmachines.ac.uk/'
license: MIT

GitHub Events

Total
  • Delete event: 1
  • Push event: 28
Last Year
  • Delete event: 1
  • Push event: 28

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 63
  • Total pull requests: 38
  • Average time to close issues: 2 months
  • Average time to close pull requests: 7 days
  • Total issue authors: 7
  • Total pull request authors: 9
  • Average comments per issue: 3.49
  • Average comments per pull request: 1.11
  • Merged pull requests: 32
  • Bot issues: 0
  • Bot pull requests: 13
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kallewesterling (24)
  • griff-rees (22)
  • ChristinaLast (9)
  • mialondon (4)
  • mcollardanuy (2)
  • rwood-97 (1)
  • claireaustin01 (1)
Pull Request Authors
  • dependabot[bot] (17)
  • kallewesterling (14)
  • ChristinaLast (5)
  • pre-commit-ci[bot] (3)
  • kasparvonbeelen (2)
  • griff-rees (1)
  • npedrazzini (1)
  • mcollardanuy (1)
  • thobson88 (1)
Top Labels
Issue Labels
documentation (8) enhancement (5) high-priority (4) bug (4) blocked (1) help wanted (1) deploy (1)
Pull Request Labels
dependencies (17)

Dependencies

.github/workflows/ci.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/checkout main composite
  • actions/download-artifact v3 composite
  • actions/setup-python main composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • pre-commit-ci/lite-action v1.0.1 composite
  • pre-commit/action main composite
compose/local/django/Dockerfile docker
  • python ${PYTHON_VERSION} build
  • python latest build
compose/local/docs/Dockerfile docker
  • python ${PYTHON_VERSION} build
  • python latest build
compose/local/node/Dockerfile docker
  • node 19-bullseye-slim build
compose/production/django/Dockerfile docker
  • node 16-bullseye-slim build
  • python ${PYTHON_VERSION} build
  • python latest build
compose/production/postgres/Dockerfile docker
  • postgis/postgis 15-3.3 build
compose/production/traefik/Dockerfile docker
  • traefik v2.9.10 build
package.json npm
  • @popperjs/core ^2.10.2 development
  • autoprefixer ^10.4.0 development
  • bootstrap ^5.1.3 development
  • browser-sync ^2.27.7 development
  • cssnano ^5.0.11 development
  • gulp ^4.0.2 development
  • gulp-concat ^2.6.1 development
  • gulp-imagemin ^7.1.0 development
  • gulp-plumber ^1.2.1 development
  • gulp-postcss ^9.0.1 development
  • gulp-rename ^2.0.0 development
  • gulp-sass ^5.0.0 development
  • gulp-uglify-es ^3.0.0 development
  • pixrem ^5.0.0 development
  • postcss ^8.3.11 development
  • sass ^1.43.4 development
poetry.lock pypi
  • 196 dependencies
pyproject.toml pypi
  • GDAL 3.5.3
  • azure-storage-blob ^12.17.0
  • colorama ^0.4.6
  • django ^4.2.4
  • django-anymail ^9.2
  • django-debug-toolbar ^4.1.0
  • django-extensions ^3.2.3
  • django-pandas ^0.6.6
  • gunicorn ^20.1.0
  • ipython ^8.14.0
  • jupyter ^1.0.0
  • openpyxl ^3.1.2
  • pandas ^2.0.3
  • psutil 5.9.4
  • psycopg ^3.1.9
  • pyopenssl ^23.2.0
  • python >=3.11,<3.12
  • python-dotenv ^1.0.0
  • seaborn ^0.12.2
  • tqdm ^4.65.0
  • uvicorn ^0.22.0
  • validators ^0.20.0
  • whitenoise ^6.5.0