biospecdb

Biosample Spectral Repository

https://github.com/rispadd/biospecdb

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 3 DOI reference(s) in README
✓ Academic publication links: Links to: zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.3%) to scientific vocabulary

Last synced: 10 months ago

Repository

Biosample Spectral Repository

Basic Info

Host: GitHub
Owner: RISPaDD
License: bsd-3-clause
Language: Python
Default Branch: main
Size: 11.9 MB

Statistics

Stars: 0
Watchers: 4
Forks: 1
Open Issues: 7
Releases: 9

Created about 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Code of conduct Citation Codeowners Zenodo

SSEC-JHU BioSpecDB


Spectroscopy for Patient Diagnosis Database (SPaDDa)

Research Project

This database application serves as an online community collection point for patient biosample spectral data and metadata. The collected data is intended for use as an AI training set, from which to build a tool for easy disease detection from future patient biosamples. Further details on the science goals of this application can be found at SSEC@JHU research projects: BSR.

Installation, Build, & Run instructions

Conda:

For additional cmds see the Conda cheat-sheet.

Download and install either miniconda or anaconda.
Create a new environment (env): conda create -n <environment_name>
Activate/switch to the new env: conda activate <environment_name>
cd into repo dir.
Install Python and pip: conda install python=3.11 pip
Install all required dependencies (assuming local dev work): pip install -r requirements/dev.txt

Build:

#### with Docker:

* Follow the run instructions below using docker-compose.

#### with Python ecosystem:

* cd into repo dir.
* conda activate <environment_name>
* Build and install the package in the conda env: pip install .
* Alternatively, install in dev/editable mode (changes to the repo will be reflected in the env installation upon Python kernel restart): pip install -e .
  NOTE: This is the preferred installation method for dev work.
  NOTE: If you didn't install dependencies from requirements/dev.txt, you can install a more loosely constrained set of deps using: pip install -e .[dev]

Run:

#### using Docker (for production):

* Download & install Docker - see Docker install docs.
* cd into repo dir.
* Setup env vars:
  * export DJANGO_SUPERUSER_PASSWORD=admin
  * source scripts/prd/gen_secret_key.sh
* Copy the SSL certificate and the private key to nginx/cert.crt and nginx/cert.key respectively. For local testing of the prd service, see Self signed SSL certificates for instructions on how to create "fake" ones.
* Build and run with docker compose up; add -d to run in the background.
* The site can then be accessed using any browser at https://localhost

#### with Python ecosystem (for development):

* Follow the above Build with Python ecosystem instructions.
* For a completely fresh start and rebuild of the database: ./scripts/dev/rebuild_sqlite.sh
* Run DJANGO_SETTINGS_MODULE=biospecdb.settings.dev python manage.py runserver 0.0.0.0:8000
* The site can then be accessed using any browser at http://localhost:8000

### Self signed SSL certificates:

Warning! This is intended for local testing only and not for use in production.

* Install openssl
* cd nginx
* openssl req -newkey rsa:4096 -nodes -x509 -out cert.crt -keyout cert.key -days 365

Note: This certificate is valid for 365 days. Also, when accessing https://localhost your browser will flag the site as unsafe and the certificate as invalid.

Custom Deployment Settings:

EXPLORER_CHARTS_ENABLED: Include the spectral data files, if present in query results, for download as zip file.
EXPLORER_DATA_EXPORTERS_ALLOW_DATA_FILE_ALIAS: Exhaustively scan query result values for relevant filepaths to collect data files. Does nothing when EXPLORER_DATA_EXPORTERS_INCLUDE_DATA_FILES is False.
AUTO_ANNOTATE: Automatically run "default" annotators when new spectral data is added. See Quality Control Annotations.
RUN_DEFAULT_ANNOTATORS_WHEN_SAVED: Run a newly added/updated annotator on all spectral data if annotator.default is True. WARNING: This may be time-consuming if the annotators take a while to run and there are a lot of spectral data samples in the database. See Quality Control Annotations.

DB Management

We're currently using SQLite, which requires the following setup:

For a quickstart run the provided script rebuild_sqlite.sh, otherwise follow the instructions below. The default superuser credentials are username: admin, password: admin. Set the env var DJANGO_SUPERUSER_PASSWORD to override the default given in rebuild_sqlite.sh.

NOTE: For PostgreSQL usage, run the provided script rebuild_postgres.sh.

cd into repo
python manage.py migrate
python manage.py migrate --database=bsr
python manage.py createsuperuser
python manage.py loaddata centers queries
python manage.py loaddata --database=bsr centers observables instruments qcannotators biosampletypes spectrameasurementtypes
python manage.py update_sql_views flat_view
python manage.py runserver
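
The setup sequence above can also be scripted. The following is a minimal sketch, not part of the repository, that assembles the same manage.py invocations in order and runs them only when asked to. The helper names `setup_commands` and `run_setup` are hypothetical, and runserver is left out because it blocks.

```python
import subprocess
import sys

# Fixture names copied from the instructions above.
DEFAULT_FIXTURES = ["centers", "queries"]
BSR_FIXTURES = ["centers", "observables", "instruments", "qcannotators",
                "biosampletypes", "spectrameasurementtypes"]


def setup_commands(python=sys.executable):
    """Return the manage.py invocations, in the order listed above."""
    manage = [python, "manage.py"]
    return [
        manage + ["migrate"],
        manage + ["migrate", "--database=bsr"],
        manage + ["createsuperuser"],
        manage + ["loaddata", *DEFAULT_FIXTURES],
        manage + ["loaddata", "--database=bsr", *BSR_FIXTURES],
        manage + ["update_sql_views", "flat_view"],
    ]


def run_setup(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in setup_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_setup(dry_run=True)
```

Run it from the repo dir. With dry_run=False, createsuperuser will still prompt interactively, since the child process inherits the terminal.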

For running the Quality Control Annotators (QCAnnotators) use the following:

python manage.py run_qc_annotators

Use the --no_reruns option to NOT run annotators on existing annotations, leaving their existing computed values as they are. This will, however, still run all "default" annotators on all SpectralData entries that have not yet been annotated.

On subsequent deployments only python manage.py runserver is needed, unless the db (db.sqlite) is nuked from disk.

When the models are changed, only the following migration commands are required:

* python manage.py makemigrations user
* python manage.py makemigrations uploader
* python manage.py makemigrations catalog
* git add biospecdb/apps/uploader/migrations
* git add biospecdb/apps/user/migrations
* git commit -asm"Update model migrations"
* python manage.py migrate
* python manage.py migrate --database=bsr

The DB can be dumped to a file using the following:

python manage.py dumpdata --indent 4 uploader --exclude uploader.uploadedfile --output test_data.json
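
Django's dumpdata writes a JSON list in which each record carries `model`, `pk`, and `fields` keys. As a quick sanity check of a dump, the sketch below counts records per model. The function name `count_records_by_model` and the two sample records are illustrative assumptions, not part of the repository.

```python
import json
from collections import Counter


def count_records_by_model(fixture_text):
    """Count fixture records per model label (e.g. 'uploader.patient')."""
    records = json.loads(fixture_text)
    return Counter(record["model"] for record in records)


# Hypothetical two-record sample in the dumpdata JSON layout.
sample = json.dumps([
    {"model": "uploader.patient", "pk": 1, "fields": {}},
    {"model": "uploader.patient", "pk": 2, "fields": {}},
])
print(count_records_by_model(sample))
```

For a real dump, pass the contents of test_data.json instead of the sample string.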

Custom commands:

python manage.py prune_files [--dry_run]: Delete any and all orphaned data files.
- --dry_run: Output files to be deleted but don't actually delete anything.
python manage.py update_sql_views <view>: Create/update the custom SQL view provided and its view dependencies if any.
- --drop_only: Drop SQL view (and dependencies) but don't re-create.
python manage.py run_qc_annotators [--no_reruns]: Run all Quality Control annotators on the SpectralData database table.
- --no_reruns: Don't run annotators on existing annotations, leave computed values as is.
python manage.py get_column_names [--exclude_observables] [--exclude_non_observables] [--center=<name|id>]
- --exclude_observables: Only output column names for non-observables.
- --exclude_non_observables: Only output column names for all observables currently in the database.
- --center=<name|id>: Filter observables by center name or center ID.
- --category=<category>: Filter observables by category.
- --descriptions: Also print field.help_text and observation.description.
- --include_instrument_fields: Also include Instrument fields. Note: These are not used for bulk uploads, only the database Instrument ID is used. Therefore these aren't that useful to list. Does nothing when used with --exclude_non_observables.
python manage.py send_test_email <send_to_email_address>: Send a test email to "send_to_email_address" to test email setup.
python manage.py makesuperuser: This is a wrapper of Django's builtin createsuperuser command except that it doesn't fail when the user already exists.
- See python manage.py createsuperuser --help for options.
- --fail: Revert to createsuperuser behavior and fail when the user already exists.

NOTE: These commands must be run from the /app/ directory on the server.
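
To illustrate what pruning orphaned files with a dry-run option involves, here is a simplified sketch. It is not the project's implementation: `find_orphans`, `prune`, and the idea of passing in a set of referenced paths are assumptions made for the example, whereas the real command presumably derives that set from the database.

```python
import tempfile
from pathlib import Path


def find_orphans(data_dir, referenced):
    """Return files under data_dir whose relative path is not referenced."""
    root = Path(data_dir)
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.relative_to(root).as_posix() not in referenced
    )


def prune(data_dir, referenced, dry_run=True):
    """List orphaned files, deleting them only when dry_run is False."""
    orphans = find_orphans(data_dir, referenced)
    for path in orphans:
        print(f"{'would delete' if dry_run else 'deleting'}: {path.name}")
        if not dry_run:
            path.unlink()
    return orphans


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "kept.csv").write_text("a,b\n")
    (Path(tmp) / "orphan.csv").write_text("a,b\n")
    prune(tmp, referenced={"kept.csv"}, dry_run=True)   # reports only
    prune(tmp, referenced={"kept.csv"}, dry_run=False)  # deletes orphan.csv
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

Running the dry-run pass first, as in the demonstration, gives a chance to review the list before anything is removed.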

AWS:

The above management commands (and others) can be run in production from a correctly configured EC2 instance. To aid in shell setup, the following script can be executed:

* source repo/biospecdb/scripts/prd/ec2_init.sh

NOTE: scripts/prd/ec2_init.sh will export all AWS secrets as shell environment variables.

Usage

URL Paths:

catalog/: Access cataloged datasets to download and explore for research purposes.
data/: Data ingestion and editing. The following paths are the principal ingestion methods:
- data/uploader/patient/add/: Add all data associated with a given patient, new or existing.
- data/uploader/uploadedfile/add/: Bulk upload data in a tabulated format, e.g., .csv, .xlsx, & .json.
explorer/: SQL UI for direct data exploration. Privileged user permissions required.
admin/: Access to patient data, explorer queries, center info, user info, and general database admin. Privileged user permissions required. Note that the data input forms are simpler (not nested) than those at data/.
healthz/: Display simple health check system status for both the web application and backend infrastructure.
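
A deployment script might poll healthz/ before routing traffic to the site. The sketch below treats any HTTP 200 response as healthy; the response body format is not documented here, so it is ignored. The function names and the injectable `opener` parameter are assumptions for the example.

```python
import urllib.error
import urllib.request


def healthz_url(base_url):
    """Join the site root and the healthz/ path without doubling slashes."""
    return base_url.rstrip("/") + "/healthz/"


def is_healthy(base_url, opener=urllib.request.urlopen, timeout=5.0):
    """Return True when GET <base_url>/healthz/ answers with HTTP 200."""
    try:
        with opener(healthz_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure, or an HTTP error status.
        return False
```

Against the development server this would be called as is_healthy("http://localhost:8000"). With a self-signed certificate, the default opener rejects https://localhost and the function returns False.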

Quality Control Annotations

Entries in the SpectralData table can be annotated by running QCAnnotators on them, each producing a value stored as a QCAnnotation associated with the SpectralData entry. The SpectralData table contains the actual spectral data file containing the wavelength and intensity values. It may be desirable to annotate this data with certain quality control metrics that can later be used to filter the data. Such quality control functions are to be implemented as a subclass of biospecdb.apps.uploader.qc.qcfilter.QcFilter. They can then be added to the QCAnnotator table in the database. Annotations from these annotators can then be associated with a SpectralData entry either manually via the admin form, or by "default" if QCAnnotator.default = True. They can also be run using the run_qc_annotators Django management command. The behavior for running these annotators and populating the QCAnnotation table is configurable and is described below.
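
As a rough illustration of what such a quality control function might look like, the sketch below subclasses a stand-in base class. The real QcFilter interface is not shown in this README, so the `QcFilter` defined here, its `run` method, the `Spectrum` container, and the `MaxIntensity` metric are all assumptions for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Spectrum:
    """Hypothetical container for one spectral data file's contents."""
    wavelength: list[float]
    intensity: list[float]


class QcFilter(ABC):
    """Stand-in for the project's base class; its real interface may differ."""

    @abstractmethod
    def run(self, spectrum: Spectrum) -> float:
        """Compute one quality control value for a spectrum."""


class MaxIntensity(QcFilter):
    """Example metric: the largest intensity value in the spectrum."""

    def run(self, spectrum: Spectrum) -> float:
        if not spectrum.intensity:
            raise ValueError("spectrum has no intensity values")
        return max(spectrum.intensity)


spectrum = Spectrum(wavelength=[400.0, 401.0, 402.0], intensity=[0.1, 0.7, 0.3])
value = MaxIntensity().run(spectrum)
print(value)
```

A value computed this way is the kind of number that would be stored as an annotation and later used to filter spectra.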

Settings:

The following QC annotator settings are available in biospecdb.settings.base:

AUTO_ANNOTATE: If True and if default annotators exist in the DB, they will be automatically run upon adding/updating SpectralData entries. (Default: True)
RUN_DEFAULT_ANNOTATORS_WHEN_SAVED: If True and SpectralData entries exist in the DB, newly added/updated default annotators will be run on all SpectralData entries. (Default: False)

Management Command:

Running python manage.py run_qc_annotators will run all existing default annotators and re-run existing annotations on all relevant SpectralData table entries. The option --no_reruns can be used to prevent re-running existing annotations and only run default annotators that have not yet been run. NOTE: If AUTO_ANNOTATE = RUN_DEFAULT_ANNOTATORS_WHEN_SAVED = False, using the run_qc_annotators management command is the only mechanism for creating quality control annotations.
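
The interaction between the two settings and the management command can be summarised as a small decision table. The sketch below encodes only what is stated above; the function name and the returned labels are made up for illustration.

```python
def annotation_mechanisms(auto_annotate=True, run_default_annotators_when_saved=False):
    """List the ways QC annotations get created for the given settings.

    The defaults mirror those documented for biospecdb.settings.base.
    """
    mechanisms = []
    if auto_annotate:
        # Default annotators run when SpectralData is added or updated.
        mechanisms.append("on SpectralData save")
    if run_default_annotators_when_saved:
        # A saved default annotator runs over all SpectralData entries.
        mechanisms.append("on default QCAnnotator save")
    # The management command is available regardless of the settings.
    mechanisms.append("run_qc_annotators command")
    return mechanisms


print(annotation_mechanisms())
print(annotation_mechanisms(auto_annotate=False))
```

With both settings False, the only entry left is the management command, matching the NOTE above.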

Testing

NOTE: The following steps require pip install -r requirements/dev.txt.

Linting:

Facilitates checking for typos, syntax, style, and other simple code analysis issues.

* cd into repo dir.
* Switch/activate the correct environment: conda activate <environment_name>
* Run ruff .
* This can be run automatically (recommended for devs) every time you git push by installing the provided pre-push git hook available in ./githooks. Instructions are in that file - just cp ./githooks/pre-push .git/hooks/;chmod +x .git/hooks/pre-push

Security Checks:

Facilitates checking for security concerns using Bandit.

* cd into repo dir.
* bandit -c pyproject.toml --severity-level=medium -r biospecdb

Unit Tests:

Facilitates testing core package functionality at a modular level.

* cd into repo dir.
* Run all available tests: pytest .
* Run a specific test: pytest tests/test_util.py::test_base_dummy

Regression tests:

Facilitates testing whether core data results differ during development.

* WIP

Smoke Tests:

Facilitates testing at the application and infrastructure level.

* WIP

Build Docs:

Facilitates building, testing & viewing the docs.

* cd into repo dir.
* pip install -r requirements/docs.txt
* cd docs
* make clean
* make html
* To view the docs in your default browser run open docs/_build/html/index.html

The DB Model:

BioSpecDB Model.jpg

Column Names:

When constructing bulk data files to be uploaded, the server must match column names to model fields. To obtain a list of these column names, run python manage.py get_column_names. See Custom commands.
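
Before uploading, a bulk file's header can be checked against the names reported by get_column_names. The sketch below assumes an exact, case-sensitive match, which may be stricter or looser than what the server does. The function name `unknown_columns` and the column names in the demo are invented.

```python
import csv
import io


def unknown_columns(csv_text, expected):
    """Return header names in the CSV that are not in the expected set."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in header if name.strip() not in expected]


# Invented names for the demo; use the real output of get_column_names.
expected = {"patient id", "measurement"}
bulk_file = "patient id,measurement,typo column\n1,x,y\n"
print(unknown_columns(bulk_file, expected))
```

An empty result means every header name was recognised; anything listed should be renamed or removed before upload.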

Owner

Name: Spectroscopy for Patient Diagnosis Database
Login: RISPaDD
Kind: organization

Repositories: 1
Profile: https://github.com/RISPaDD

GitHub Events

Total

Delete event: 3
Issue comment event: 9
Pull request event: 9
Create event: 4

Last Year

Delete event: 3
Issue comment event: 9
Pull request event: 9
Create event: 4

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 0
Total pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: about 1 month
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 1.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 4

Past Year

Issues: 0
Pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: about 1 month
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 1.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 4


Top Authors

Issue Authors

jamienoss (11)
Alvaro-FG (5)

Pull Request Authors

jamienoss (49)
dependabot[bot] (33)

Top Labels

Issue Labels

bug (2) enhancement (2)

Pull Request Labels

dependencies (39) bug (10) AWS (6) documentation (5) fix (4) security (2) enhancement (2) wontfix (1) refactor (1) deprecation (1) CI (1)

Dependencies

.github/workflows/ci.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite
actions/upload-artifact v3 composite
codecov/codecov-action v3 composite
docker/build-push-action f2a1d5e99d037542a71f64918e516c093c6f3fc4 composite
docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1 composite
docker/metadata-action 9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7 composite

.github/workflows/security.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

Dockerfile docker

python 3.11-slim build

docker-compose.yml docker

bsr latest

pyproject.toml pypi

django *
django-sql-explorer [charts]
kaleido *
openpyxl *
pandas *
plotly *
uvicorn *
xlsxwriter *

requirements/dev.txt pypi

bandit ==1.7.5 development
build ==1.0.3 development
factory-boy ==3.3.0 development
httpx ==0.25.0 development
pytest ==7.4.3 development
pytest-cov ==4.1.0 development
pytest-django ==4.6.0 development
ruff ==0.0.288 development
setuptools ==68.2.2 development
setuptools_scm ==8.0.3 development
tox ==4.11.3 development

requirements/docs.txt pypi

nbsphinx ==0.9.3
sphinx ==6.2.1
sphinx-automodapi ==0.16.0
sphinx-issues ==3.0.1
sphinx_rtd_theme ==1.3.0

requirements/prd.txt pypi

django ==4.2.7
django-sql-explorer ==3.2.1
kaleido ==0.2.1
openpyxl ==3.1.2
pandas ==2.1.0
plotly ==5.17.0
uvicorn ==0.23.2
xlsxwriter ==3.1.9