holofood-database
HoloFood is a project investigating sustainable food production through hologenomics. This Django app is the data portal where samples and other datasets from the project are made publicly available.
Science Score: 62.0%
This score indicates how likely this project is to be science-related, based on these indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to: zenodo.org)
- ○ Academic email domains
- ✓ Institutional organization owner (organization ebi-metagenomics has institutional domain www.ebi.ac.uk)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 13.6%, to scientific vocabulary)
Repository
Basic Info
- Host: GitHub
- Owner: EBI-Metagenomics
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://www.holofooddata.org
- Size: 21.4 MB
Statistics
- Stars: 1
- Watchers: 6
- Forks: 0
- Open Issues: 1
- Releases: 3
Topics
Metadata Files
README.md
Holofood Data Portal / Database
The database, website, and API to present Holofood samples, and unify the datasets stored in supporting services.
Background
HoloFood is a consortium and project focussed on understanding the biomolecular and physiological processes triggered by incorporating feed additives and novel sustainable feeds in farmed animals.
This codebase is the public website and API for browsing the Samples and datasets created by the project, which are stored in publicly-accessible data repositories.
The website is built with Django.
Management commands are used to import a cache of Samples and their metadata from ENA and Biosamples.
There is a normal Django Admin panel as well.
Development
Install development tools (including pre-commit hooks to run Black code formatting):

```shell
pip install -r requirements-dev.txt
pre-commit install
```
Code style
Use Black and djLint. Both are configured if you install the pre-commit tools as above. To run them manually:

```shell
black .
djlint . --extension=html --lint  # or --reformat
```
Fake data
Once a database is created and migrated (see below), there is a management command to fill the database with some minimal fake data for development ease:

```shell
python manage.py generate_dev_data
```
Testing
```shell
# You most likely need (see below):
brew install chromedriver
pip install -r requirements-dev.txt
pytest
```
Chrome Driver for web interface tests
The web interface tests need the Chrome browser and chromedriver to communicate with the browser.
To install chromedriver on a Mac or Linux machine, use the Homebrew formula or any other sensible installation method. On GitHub Actions, a "Setup Chromedriver" action step exists for this.
On a Mac, you'll probably hit Gatekeeper permission problems running chromedriver, so:

```shell
which chromedriver  # probably: /usr/local/bin/chromedriver
spctl --add /usr/local/bin/chromedriver
```

If this doesn't work, open /usr/local/bin, find chromedriver in Finder, right-click it, and choose Open.
Configuration
We use Pydantic to formalise config files.
Configuration is split between:
- config/local.env: a convenience for environment variables.
- config/data_config.json: config options that are expected to change occasionally.
- config/secrets.env: needed during development, whilst some data are private.
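As a rough illustration of how such layered configuration behaves, here is a stdlib-only sketch (the project itself uses Pydantic; the keys shown are hypothetical, only the file names come from the list above):

```python
import json
import os


def load_config(json_path="config/data_config.json"):
    """Sketch of layered config: JSON file values, overridden by env vars.

    Env vars (e.g. sourced from config/local.env or config/secrets.env)
    take precedence over values in the JSON file.
    """
    config = {}
    if os.path.exists(json_path):
        with open(json_path) as f:
            config.update(json.load(f))
    for key in list(config):
        env_val = os.environ.get(key.upper())
        if env_val is not None:
            config[key] = env_val
    return config
```

This is only meant to show the precedence order (env over file); the real app validates and types these values via Pydantic models.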
Use:

```shell
source config/secrets.env
python manage.py migrate
python manage.py fetch_project_samples
python manage.py refresh_external_data
python manage.py runserver
```
TODO: update README once importer supports hierarchical samples
Refreshing external data
refresh_external_data has several options, for fetching data for some or all samples/projects,
and for fetching data from some or all supporting APIs.
To (re)fetch sample metadata from ENA and Biosamples only, for a specific sample:

```shell
python manage.py refresh_external_data --samples SAMEA7687881 --types METADATA
```
Or to refresh metagenomic data for all samples in a project:

```shell
python manage.py refresh_external_data --projects PRJEB39110 --types METAGENOMIC
```
(Note that it is much more efficient to fetch metagenomics data on a per-project basis than a per-sample basis – this is because it uses many fewer API calls and the MGnify API has strict rate-limits.)
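The efficiency difference can be pictured as grouping samples by project before talking to the API, so one request covers a whole project. A hypothetical sketch (not the portal's actual code):

```python
from collections import defaultdict


def plan_api_calls(samples):
    """Group samples by project so one MGnify API call can cover a whole
    project, instead of one call per sample.

    `samples` is a list of (project_accession, sample_accession) pairs.
    Returns a mapping of project accession -> list of sample accessions,
    i.e. one planned API call per project.
    """
    by_project = defaultdict(list)
    for project, sample in samples:
        by_project[project].append(sample)
    return dict(by_project)
```

With rate-limited APIs, the number of requests scales with the number of projects rather than the (much larger) number of samples.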
Or to (re)fetch sample metadata and metagenomic existence data from ENA, Biosamples, and MGnify, for samples with accessions in a certain range:

```shell
python manage.py refresh_external_data --sample_filters accession__gt=SAMEA7687880 accession__lt=SAMEA7687900 --types METADATA METAGENOMIC
```
--sample_filters is expected to be useful when refreshing a large number of samples fails and you need to retry from a certain accession onwards (samples are iterated through in ascending order of accession).
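The accession-range retry idea can be sketched in plain Python (a hypothetical helper; the real command passes these `accession__gt` / `accession__lt` filters straight to the Django ORM):

```python
def samples_in_range(accessions, gt=None, lt=None):
    """Mimic Django-style accession__gt / accession__lt filters.

    Plain string comparison works here because the accessions share a
    common prefix and equal length (e.g. SAMEA7687881).
    """
    selected = sorted(accessions)  # iterate in ascending accession order
    if gt is not None:
        selected = [a for a in selected if a > gt]
    if lt is not None:
        selected = [a for a in selected if a < lt]
    return selected
```

So if a bulk refresh died partway through, re-running with `gt` set to the last successfully refreshed accession resumes roughly where it stopped.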
Import catalogues
MAG and Viral catalogues can be imported using management commands. Each MAG catalogue needs to relate to a single public MAG catalogue on MGnify; in that the species representative of each MAG must exist on MGnify. Each Viral Catalogue must relate (biome-wise) to a single MAG catalogue on the portal. In other words, the order of data insertion needs to be:
- public MAG catalogue into MGnify
- MAG catalogue into this data portal
- Viral catalogue into this data portal
The uploaders expect TSV files (and a folder of GFFs in the case of viral catalogue).
For the format / column naming, inspect the files in holofood/tests/static_fixtures.
Run python manage.py import_mag_catalogue or import_viral_catalogue with --help for usage, but essentially:

```shell
python manage.py import_mag_catalogue hf-salmon-mags-v1 ./salmon.tsv "HoloFood Salmon V1" mgnify-salmon-v1-0 "Some:Biome:String" salmon
python manage.py import_viral_catalogue hf-salmon-vir-v1 './salmon_viral_cat.tsv' './salmon_viral_annotations.gff' --title="HoloFood Salmon Viruses V1" --related_mag_catalogue_id=hf-salmon-mags-v1
```
The import_viral_catalogue command can be run multiple times to populate the catalogue with several TSV/GFF combinations if needed –
fragments are appended to the existing catalogue if it already exists.
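A minimal sketch of reading such a TSV with the stdlib (the column names here are hypothetical; inspect the fixture files in holofood/tests/static_fixtures for the real ones):

```python
import csv
import io


def read_catalogue_tsv(tsv_text):
    """Parse a tab-separated catalogue file into a list of row dicts,
    keyed by the header row's column names."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)
```

The importers would then validate each row (e.g. checking that the species representative exists on MGnify) before inserting it into the portal's database.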
Adding users
Superusers can do everything in the admin panel, including managing other users.
```shell
python manage.py createsuperuser
```
Superusers can go to e.g. http://localhost:8000/admin and create other users there.
"Staff" users can access the admin panel, but won't by default have permissions to do anything there. Add them to the "authors" user group to give them permissions to author "Analysis Summary" documents via the admin panel.
Deployment
AWS Elastic Beanstalk
The Django application can be deployed to AWS Cloud via Elastic Beanstalk, a scalable web app deployment service.
There is an Elastic Beanstalk configuration in .ebextensions/django.config.
This config will migrate the db on deployment, as well as compile scss styles and collect static files.
Run pip install -r requirements-aws.txt to install the CLI tool for EB.
Create an Elastic Beanstalk environment.
You need a Python 3.8 on Amazon Linux 2 platform, and an RDS Postgres database (a db.t4g.micro instance is fine).
Run eb use <whatever-the-name-of-your-elastic-beanstalk-environment-is>, e.g. eb use holofood-data-portal-dev-env.
Deploy the latest git commit with eb deploy
To log into the lead instance, e.g. to run management commands:

```shell
eb ssh
cd /var/app/current
source ../venv/*/bin/activate
# Source the environment variables:
export $(/opt/elasticbeanstalk/bin/get-config --output YAML environment | sed -r 's/: /=/' | xargs)
python manage.py ...
```
Secret environment variables can be configured in the AWS EB Console.
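The `sed`/`xargs` step above just turns `KEY: value` lines into `KEY=value` exports. The equivalent parsing logic in Python, as a sketch (assuming the flat, unquoted output format that pipeline expects):

```python
def parse_env_lines(text):
    """Turn 'KEY: value' lines (as emitted by the get-config YAML output)
    into a dict of environment variables. Assumes flat, unquoted values
    with no nested YAML structures."""
    env = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            env[key.strip()] = value.strip()
    return env
```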
EBI-specific info
If you are using the EBI AWS cloud via an SSO login, refer to the Confluence page on AWS SSO for the SSO parameters. Use `aws configure sso --profile eb-cli` to sign in. Occasionally you'll need `aws sso login --profile eb-cli` to get a new token.
Kubernetes
Local
- Use `minikube` or `kind`.
- Make a secrets .env file at `k8s/secrets-k8s.env` with e.g. `DJANGO_SECRET_KEY`.
- `kubectl create secret generic holofood-secret --from-env-file=k8s/secrets-k8s.env`
- `minikube image build -t holofood -f k8s/Dockerfile .`
- `kubectl apply -f k8s`
- `kubectl get pods -A` and find the pod ID for `holofood-app-...`
- `kubectl exec --stdin --tty holofood-app-......... -- /bin/bash`
- `python manage.py migrate` will make the `/app/data/db.sqlite3`
- `minikube service holofood`
EBI WebProd k8s
- EBI operates a two-clusters-per-service policy (primary in the "HL" data centre, a.k.a. "HH" in some places, fallback in "HX"). The app needs to be deployed to both. There are stub configs in `k8s-hl` and `k8s-hx` for these.
- K8s cluster configurations are provided as YML files by EBI's webprod team. You need these to deploy.
- Build the image (with some customisation for EBI's NFS filesystem): `docker build -f k8s-hl/Dockerfile -t quay.io/microbiome-informatics/holofood-data-portal:ebi-k8s-hl .` then `docker push quay.io/microbiome-informatics/holofood-data-portal:ebi-k8s-hl` (you need appropriate Quay credentials for this).
- Make a secrets .env file at `k8s-hl/secrets-k8s.env` with e.g. `DJANGO_SECRET_KEY=.....`
- Push it with e.g.: `kubectl --kubeconfig ~/webprod-configs/mgnify-k8s-team-admin-hh.conf --namespace holofood-hl-prod create secret generic holofood-secret --from-env-file=k8s-hl/secrets-k8s.env`
- Get authentication credentials for quay.io (the built image is private). You can get a Kubernetes secrets yaml file from your Quay.io user settings, in the "CLI Password" section. Download the secrets yaml, name the secret `name: quay-pull-secret` in the metadata section, and put it into the `k8s-hl` folder.
- Deploy: `kubectl --kubeconfig ~/webprod-configs/mgnify-k8s-team-admin-hh.conf apply -f k8s-hl`. If the namespace doesn't exist, you might need to apply twice.
Documentation
There is a Quarto-based documentation pack in the docs/ folder, configured by docs/_quarto.yml.
This uses a mixture of Markdown and rendered Jupyter Notebooks. (This choice allows a code-based tutorial to be included in the docs as well as run by users.)
To develop documentation:
To make some small text changes
Just edit the .qmd (essentially just Markdown) files, and commit to GitHub. GitHub Actions will render your changes to the GitHub Pages site (via the .github/workflows/docs.yml action).
To preview changes, or change a Jupyter Notebook
Install Quarto on your system, then:

```shell
pip install -r requirements-docs.txt
jupyter lab
```

and edit the .qmd or .ipynb files in docs/. Run

```shell
quarto preview docs
```

to open a live preview of the documentation site that updates as you save changes.
If you add a new file (page), add it to the navigation by editing the
website.sidebar.contents list in docs/_quarto.yml.
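A sidebar entry looks something like this (a hypothetical fragment; mirror the structure of the entries already present in docs/_quarto.yml):

```yaml
website:
  sidebar:
    contents:
      - index.qmd
      - tutorial.ipynb
      - your-new-page.qmd  # newly added page
```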
Note: any Jupyter Notebooks will be rendered to the documentation site exactly as you leave them, because Quarto defaults to not executing Jupyter Notebooks during rendering (which is a good thing).
If you've added any executable Quarto docs other than Jupyter Notebooks (like an R script...), run quarto render docs and commit the docs/_freeze/ dir.
Owner
- Name: MGnify
- Login: EBI-Metagenomics
- Kind: organization
- Email: metagenomics-help@ebi.ac.uk
- Location: Genome Campus, UK
- Website: https://www.ebi.ac.uk/metagenomics/
- Twitter: MGnifyDB
- Repositories: 153
- Profile: https://github.com/EBI-Metagenomics
MGnify (formerly known as EBImetagenomics) is a free resource for the assembly, analysis, archiving, and browsing of all types of microbiome-derived sequence data.
Citation (CITATION.cff)
```yaml
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: HoloFood Data Portal
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Alexander B
    family-names: Rogers
    email: sandyr@ebi.ac.uk
    affiliation: EMBL-EBI
    orcid: 'https://orcid.org/0000-0002-4283-6135'
identifiers:
  - type: doi
    value: 10.1093/database/baae112
    description: Journal article describing the software
  - type: doi
    value: 10.5281/zenodo.12698157
    description: Software source code
repository-code: 'https://github.com/EBI-Metagenomics/holofood-database/'
url: 'https://www.holofooddata.org'
abstract: >-
  The HoloFood Data Portal is a django web application
  presenting a unified index of hologenomic, multi-omic data
  from the HoloFood project, which studied the effects of
  feed additives on chicken and salmon. The data themselves
  are deposited in existing open science repositories. This
  application makes them collectively searchable,
  interlinked, and retrievable in a schema that is aware of
  the project's experimental design.
keywords:
  - python
  - django
  - multiomics
  - holobiont
license: Apache-2.0
```
GitHub Events
Total
- Push event: 4
- Pull request event: 2
Last Year
- Push event: 4
- Pull request event: 2
Dependencies
- awsebcli ==3.20.3
- black ==22.6.0 development
- django-debug-toolbar ==3.5.0 development
- djlint ==1.7.0 development
- pre-commit ==2.19.0 development
- pytest ==7.1.2 development
- pytest-cov ==3.0.0 development
- pytest-django ==4.5.2 development
- pytest-env ==0.6.2 development
- pytest-html ==3.1.1 development
- requests-mock ==1.9.3 development
- selenium ==4.3.0 development
- selenium-wire ==4.6.5 development
- jupyterlab ==3.4.4
- matplotlib ==3.5.2
- pandas ==1.4.3
- django ==4.1
- django-admin-inline-paginator ==0.3.0
- django-compressor ==4.1
- django-filter ==22.1
- django-ninja ==0.17.0
- django-sass-processor ==1.2.1
- gunicorn ==20.1.0
- libsass ==0.21.0
- martor ==1.6.13
- psycopg2-binary ==2.9.3
- pydantic ==1.9.1
- python-dotenv ==0.20.0
- requests ==2.28.1
- xsdata ==22.7
- actions/checkout v2 composite
- quarto-dev/quarto-actions/publish v2 composite
- quarto-dev/quarto-actions/setup v2 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- codecov/codecov-action v2 composite
- nanasess/setup-chromedriver master composite
- python 3.11 build
- python 3.11 build