holofood-database
HoloFood is a project investigating sustainable food production through hologenomics. This Django app is the data portal where samples and other datasets from the project are made publicly available.
Science Score: 62.0%
This score indicates how likely this project is to be science-related, based on these indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to: zenodo.org)
- ○ Academic email domains
- ✓ Institutional organization owner (organization ebi-metagenomics has institutional domain www.ebi.ac.uk)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 13.6%, to scientific vocabulary)
Repository
Basic Info
- Host: GitHub
- Owner: EBI-Metagenomics
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://www.holofooddata.org
- Size: 21.4 MB
Statistics
- Stars: 1
- Watchers: 6
- Forks: 0
- Open Issues: 1
- Releases: 3
Topics
Metadata Files
README.md
Holofood Data Portal / Database
The database, website, and API to present Holofood samples, and unify the datasets stored in supporting services.
Background
HoloFood is a consortium and project focussed on understanding the biomolecular and physiological processes triggered by incorporating feed additives and novel sustainable feeds in farmed animals.
This codebase is the public website and API for browsing the Samples and datasets created by the project, which are stored in publicly-accessible data repositories.
The website is built with Django.
Management commands are used to import a cache of Samples and their metadata from ENA and Biosamples.
There is a normal Django Admin panel as well.
Development
Install development tools (including pre-commit hooks to run Black code formatting):

```shell
pip install -r requirements-dev.txt
pre-commit install
```
Code style
Use Black and djLint. Both are configured if you install the pre-commit tools as above. To run them manually:

```shell
black .
djlint . --extension=html --lint  # or --reformat
```
Fake data
Once a database is created and migrated (see below), there is a management command to fill the database with some minimal fake data for development ease:

```shell
python manage.py generate_dev_data
```
Testing
```shell
# You most likely need (see below):
brew install chromedriver
pip install -r requirements-dev.txt
pytest
```
Chrome Driver for web interface tests
The web interface tests need the Chrome browser and chromedriver to communicate with the browser.
To install chromedriver on a Mac or Linux machine, use the Homebrew formula or any other sensible installation method. On GitHub Actions, a "Setup Chromedriver" action step exists for this.
On a Mac, you'll probably hit Gatekeeper permission problems running chromedriver, so:

```shell
which chromedriver  # probably: /usr/local/bin/chromedriver
spctl --add /usr/local/bin/chromedriver
```

If this doesn't work, open /usr/local/bin, find chromedriver in Finder, right-click it, and choose Open.
Configuration
We use Pydantic to formalise config files.
Configuration is split between:
- config/local.env: a convenience for environment variables.
- config/data_config.json: config options that are expected to change occasionally.
- config/secrets.env: needed during development, whilst some data are private.
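As a rough illustration of how such layered configuration behaves, here is a stdlib-only sketch (the project itself uses Pydantic; the keys shown are hypothetical, only the file names come from the list above):

```python
import json
import os


def load_config(json_path="config/data_config.json"):
    """Sketch of layered config: JSON file values, overridden by env vars.

    Env vars (e.g. sourced from config/local.env or config/secrets.env)
    take precedence over values in the JSON file.
    """
    config = {}
    if os.path.exists(json_path):
        with open(json_path) as f:
            config.update(json.load(f))
    for key in list(config):
        env_val = os.environ.get(key.upper())
        if env_val is not None:
            config[key] = env_val
    return config
```

This is only meant to show the precedence order (env over file); the real app validates and types these values via Pydantic models.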
Use:

```shell
source config/secrets.env
python manage.py migrate
python manage.py fetch_project_samples
python manage.py refresh_external_data
python manage.py runserver
```
TODO: update README once importer supports hierarchical samples
Refreshing external data
refresh_external_data has several options, for fetching data for some or all samples/projects,
and for fetching data from some or all supporting APIs.
To (re)fetch sample metadata from ENA and Biosamples only, for a specific sample:

```shell
python manage.py refresh_external_data --samples SAMEA7687881 --types METADATA
```
Or to refresh metagenomic data for all samples in a project:

```shell
python manage.py refresh_external_data --projects PRJEB39110 --types METAGENOMIC
```
(Note that it is much more efficient to fetch metagenomics data on a per-project basis than a per-sample basis – this is because it uses many fewer API calls and the MGnify API has strict rate-limits.)
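The efficiency difference can be pictured as grouping samples by project before talking to the API, so one request covers a whole project. A hypothetical sketch (not the portal's actual code):

```python
from collections import defaultdict


def plan_api_calls(samples):
    """Group samples by project so one MGnify API call can cover a whole
    project, instead of one call per sample.

    `samples` is a list of (project_accession, sample_accession) pairs.
    Returns a mapping of project accession -> list of sample accessions,
    i.e. one planned API call per project.
    """
    by_project = defaultdict(list)
    for project, sample in samples:
        by_project[project].append(sample)
    return dict(by_project)
```

With rate-limited APIs, the number of requests scales with the number of projects rather than the (much larger) number of samples.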
Or to (re)fetch sample metadata and metagenomic existence data from ENA, Biosamples, and MGnify, for samples with accessions in a certain range:

```shell
python manage.py refresh_external_data --sample_filters accession__gt=SAMEA7687880 accession__lt=SAMEA7687900 --types METADATA METAGENOMIC
```
--sample_filters is expected to be useful when refreshing a large number of samples fails and you need to retry from a certain accession onwards (samples are iterated through in ascending order of accession).
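The accession-range retry idea can be sketched in plain Python (a hypothetical helper; the real command passes these `accession__gt` / `accession__lt` filters straight to the Django ORM):

```python
def samples_in_range(accessions, gt=None, lt=None):
    """Mimic Django-style accession__gt / accession__lt filters.

    Plain string comparison works here because the accessions share a
    common prefix and equal length (e.g. SAMEA7687881).
    """
    selected = sorted(accessions)  # iterate in ascending accession order
    if gt is not None:
        selected = [a for a in selected if a > gt]
    if lt is not None:
        selected = [a for a in selected if a < lt]
    return selected
```

So if a bulk refresh died partway through, re-running with `gt` set to the last successfully refreshed accession resumes roughly where it stopped.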
Import catalogues
MAG and Viral catalogues can be imported using management commands. Each MAG catalogue needs to relate to a single public MAG catalogue on MGnify; in that the species representative of each MAG must exist on MGnify. Each Viral Catalogue must relate (biome-wise) to a single MAG catalogue on the portal. In other words, the order of data insertion needs to be:
- public MAG catalogue into MGnify
- MAG catalogue into this data portal
- Viral catalogue into this data portal
The uploaders expect TSV files (and a folder of GFFs in the case of viral catalogue).
For the format / column naming, inspect the files in holofood/tests/static_fixtures.
Run python manage.py import_mag_catalogue or import_viral_catalogue with --help for usage, but essentially:

```shell
python manage.py import_mag_catalogue hf-salmon-mags-v1 ./salmon.tsv "HoloFood Salmon V1" mgnify-salmon-v1-0 "Some:Biome:String" salmon
python manage.py import_viral_catalogue hf-salmon-vir-v1 './salmon_viral_cat.tsv' './salmon_viral_annotations.gff' --title="HoloFood Salmon Viruses V1" --related_mag_catalogue_id=hf-salmon-mags-v1
```
The import_viral_catalogue command can be run multiple times to populate the catalogue with several TSV/GFF combinations if needed –
fragments are appended to the existing catalogue if it already exists.
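A minimal sketch of reading such a TSV with the stdlib (the column names here are hypothetical; inspect the fixture files in holofood/tests/static_fixtures for the real ones):

```python
import csv
import io


def read_catalogue_tsv(tsv_text):
    """Parse a tab-separated catalogue file into a list of row dicts,
    keyed by the header row's column names."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)
```

The importers would then validate each row (e.g. checking that the species representative exists on MGnify) before inserting it into the portal's database.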
Adding users
Superusers can do everything in the admin panel, including managing other users.
```shell
python manage.py createsuperuser
```
Superusers can go to e.g. http://localhost:8000/admin and create other users there.
"Staff" users can access the admin panel, but won't by default have permissions to do anything there. Add them to the "authors" user group to give them permissions to author "Analysis Summary" documents via the admin panel.
Deployment
AWS Elastic Beanstalk
The Django application can be deployed to AWS Cloud via Elastic Beanstalk, a scalable web app deployment service.
There is an Elastic Beanstalk configuration in .ebextensions/django.config.
This config will migrate the db on deployment, as well as compile scss styles and collect static files.
Run pip install -r requirements-aws.txt to install the CLI tool for EB.
Create an Elastic Beanstalk environment.
You need a Python 3.8 on Amazon Linux 2 platform, and an RDS Postgres database (a db.t4g.micro instance is fine).
Run eb use <whatever-the-name-of-your-elastic-beanstalk-environment-is>, e.g. eb use holofood-data-portal-dev-env.
Deploy the latest git commit with eb deploy
To log into the lead instance, e.g. to run management commands:

```shell
eb ssh
cd /var/app/current
source ../venv/*/bin/activate
# Source the environment variables:
export $(/opt/elasticbeanstalk/bin/get-config --output YAML environment | sed -r 's/: /=/' | xargs)
python manage.py ...
```
Secret environment variables can be configured in the AWS EB Console.
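The `sed`/`xargs` step above just turns `KEY: value` lines into `KEY=value` exports. The equivalent parsing logic in Python, as a sketch (assuming the flat, unquoted output format that pipeline expects):

```python
def parse_env_lines(text):
    """Turn 'KEY: value' lines (as emitted by the get-config YAML output)
    into a dict of environment variables. Assumes flat, unquoted values
    with no nested YAML structures."""
    env = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            env[key.strip()] = value.strip()
    return env
```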
EBI-specific info
If you are using the EBI AWS cloud via an SSO login, refer to the Confluence page on AWS SSO for the SSO parameters. Use `aws configure sso --profile eb-cli` to sign in. Occasionally you'll need `aws sso login --profile eb-cli` to get a new token.
Kubernetes
Local
- Use `minikube` or `kind`.
- Make a secrets .env file at `k8s/secrets-k8s.env` with e.g. `DJANGO_SECRET_KEY`.
- `kubectl create secret generic holofood-secret --from-env-file=k8s/secrets-k8s.env`
- `minikube image build -t holofood -f k8s/Dockerfile .`
- `kubectl apply -f k8s`
- `kubectl get pods -A` and find the pod ID for `holofood-app-...`
- `kubectl exec --stdin --tty holofood-app-......... -- /bin/bash`
- `python manage.py migrate` will make the `/app/data/db.sqlite3`
- `minikube service holofood`
EBI WebProd k8s
- EBI operates a two-clusters-per-service policy (primary in the "HL" data centre, a.k.a. "HH" in some places, fallback in "HX"). The app needs to be deployed to both. There are stub configs in `k8s-hl` and `k8s-hx` for these.
- K8s cluster configurations are provided as YML files by EBI's webprod team. You need these to deploy.
- Build the image (with some customisation for EBI's NFS filesystem): `docker build -f k8s-hl/Dockerfile -t quay.io/microbiome-informatics/holofood-data-portal:ebi-k8s-hl .` then `docker push quay.io/microbiome-informatics/holofood-data-portal:ebi-k8s-hl` (you need appropriate Quay credentials for this).
- Make a secrets .env file at `k8s-hl/secrets-k8s.env` with e.g. `DJANGO_SECRET_KEY=.....`
- Push it with e.g.: `kubectl --kubeconfig ~/webprod-configs/mgnify-k8s-team-admin-hh.conf --namespace holofood-hl-prod create secret generic holofood-secret --from-env-file=k8s-hl/secrets-k8s.env`
- Get authentication credentials for quay.io (the built image is private). You can get a Kubernetes secrets yaml file from your Quay.io user settings, in the "CLI Password" section. Download the secrets yaml, name the secret `name: quay-pull-secret` in the metadata section, and put it into the `k8s-hl` folder.
- Deploy: `kubectl --kubeconfig ~/webprod-configs/mgnify-k8s-team-admin-hh.conf apply -f k8s-hl`. If the namespace doesn't exist, you might need to apply twice.
Documentation
There is a Quarto-based documentation pack in the docs/ folder, configured by docs/_quarto.yml.
This uses a mixture of Markdown and rendered Jupyter Notebooks. (This choice allows a code-based tutorial to be included in the docs as well as run by users.)
To develop documentation:
To make some small text changes
Just edit the .qmd (essentially just Markdown) files, and commit to GitHub. GitHub Actions will render your changes to the GitHub Pages site (via the .github/workflows/docs.yml action).
To preview changes, or change a Jupyter Notebook
Install Quarto on your system, then:

```shell
pip install -r requirements-docs.txt
jupyter lab
```

and edit the .qmd or .ipynb files in docs/. Run

```shell
quarto preview docs
```

to open a live preview of the documentation site that updates as you save changes.
If you add a new file (page), add it to the navigation by editing the
website.sidebar.contents list in docs/_quarto.yml.
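A sidebar entry looks something like this (a hypothetical fragment; mirror the structure of the entries already present in docs/_quarto.yml):

```yaml
website:
  sidebar:
    contents:
      - index.qmd
      - tutorial.ipynb
      - your-new-page.qmd  # newly added page
```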
Note: any Jupyter Notebooks will be rendered to the documentation site exactly as you leave them, because Quarto defaults to not executing Jupyter Notebooks during rendering (which is a good thing).
If you've added any executable Quarto docs other than Jupyter Notebooks (like an R script...), run quarto render docs and commit the docs/_freeze/ dir.
Owner
- Name: MGnify
- Login: EBI-Metagenomics
- Kind: organization
- Email: metagenomics-help@ebi.ac.uk
- Location: Genome Campus, UK
- Website: https://www.ebi.ac.uk/metagenomics/
- Twitter: MGnifyDB
- Repositories: 153
- Profile: https://github.com/EBI-Metagenomics
MGnify (formerly known as EBImetagenomics) is a free resource for the assembly, analysis, archiving, and browsing of all types of microbiome-derived sequence data.
Citation (CITATION.cff)
```yaml
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: HoloFood Data Portal
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Alexander B
    family-names: Rogers
    email: sandyr@ebi.ac.uk
    affiliation: EMBL-EBI
    orcid: 'https://orcid.org/0000-0002-4283-6135'
identifiers:
  - type: doi
    value: 10.1093/database/baae112
    description: Journal article describing the software
  - type: doi
    value: 10.5281/zenodo.12698157
    description: Software source code
repository-code: 'https://github.com/EBI-Metagenomics/holofood-database/'
url: 'https://www.holofooddata.org'
abstract: >-
  The HoloFood Data Portal is a django web application
  presenting a unified index of hologenomic, multi-omic data
  from the HoloFood project, which studied the effects of
  feed additives on chicken and salmon. The data themselves
  are deposited in existing open science repositories. This
  application makes them collectively searchable,
  interlinked, and retrievable in a schema that is aware of
  the project's experimental design.
keywords:
  - python
  - django
  - multiomics
  - holobiont
license: Apache-2.0
```
GitHub Events
Total
- Push event: 4
- Pull request event: 2
Last Year
- Push event: 4
- Pull request event: 2
Dependencies
- awsebcli ==3.20.3
- black ==22.6.0 development
- django-debug-toolbar ==3.5.0 development
- djlint ==1.7.0 development
- pre-commit ==2.19.0 development
- pytest ==7.1.2 development
- pytest-cov ==3.0.0 development
- pytest-django ==4.5.2 development
- pytest-env ==0.6.2 development
- pytest-html ==3.1.1 development
- requests-mock ==1.9.3 development
- selenium ==4.3.0 development
- selenium-wire ==4.6.5 development
- jupyterlab ==3.4.4
- matplotlib ==3.5.2
- pandas ==1.4.3
- django ==4.1
- django-admin-inline-paginator ==0.3.0
- django-compressor ==4.1
- django-filter ==22.1
- django-ninja ==0.17.0
- django-sass-processor ==1.2.1
- gunicorn ==20.1.0
- libsass ==0.21.0
- martor ==1.6.13
- psycopg2-binary ==2.9.3
- pydantic ==1.9.1
- python-dotenv ==0.20.0
- requests ==2.28.1
- xsdata ==22.7
- actions/checkout v2 composite
- quarto-dev/quarto-actions/publish v2 composite
- quarto-dev/quarto-actions/setup v2 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- codecov/codecov-action v2 composite
- nanasess/setup-chromedriver master composite
- python 3.11 build
- python 3.11 build