Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 3 DOI reference(s) in README -
✓Academic publication links
Links to: zenodo.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (15.3%) to scientific vocabulary
Repository
Biosample Spectral Repository
Basic Info
- Host: GitHub
- Owner: RISPaDD
- License: bsd-3-clause
- Language: Python
- Default Branch: main
- Size: 11.9 MB
Statistics
- Stars: 0
- Watchers: 4
- Forks: 1
- Open Issues: 7
- Releases: 9
Metadata Files
README.md
SSEC-JHU BioSpecDB

Spectroscopy for Patient Diagnosis Database (SPaDDa)
Research Project
This database application stands as an online community collection point for patient biosample spectral data and metadata, to be used as an AI training set, generating a tool for easy disease detection from future patient biosamples. Further details for the science goals of this application can be found at SSEC@JHU research projects: BSR.
Installation, Build, & Run instructions
Conda:
For additional cmds see the Conda cheat-sheet.
- Download and install either miniconda or anaconda.
- Create new environment (env) and install
conda create -n <environment_name>
- Activate/switch to new env
conda activate <environment_name>
- cd into repo dir.
- Install python and pip
conda install python=3.11 pip
- Install all required dependencies (assuming local dev work)
pip install -r requirements/dev.txt
Build:
#### with Docker:
* Follow the run instructions below using docker-compose.
#### with Python ecosystem:
* cd into repo dir.
* conda activate <environment_name>
* Build and install the package: pip install .
* Or do the same in dev/editable mode (changes to the repo will be reflected in the env installation upon python kernel restart): pip install -e .
NOTE: This is the preferred installation method for dev work.
NOTE: If you didn't install dependencies from requirements/dev.txt, you can install
a more loosely constrained set of deps using: pip install -e .[dev]
Run:
#### using Docker (for production):
* Download & install Docker - see Docker install docs.
* cd into repo dir.
* Setup env vars:
* export DJANGO_SUPERUSER_PASSWORD=admin
* source scripts/prd/gen_secret_key.sh
* Copy both the SSL certificate and the private key to nginx/cert.crt & nginx/cert.key respectively. For local
testing of the prd service see creating self-signed SSL certificates for
instructions on how to create "fake" ones.
* Build and run with docker compose up, add -d to run in the background.
* The site can then be accessed using any browser from https://localhost
#### with Python ecosystem (for development):
* Follow the above Build with Python ecosystem instructions.
* For a completely fresh start and rebuild of the database: ./scripts/dev/rebuild_sqlite.sh.
* Run DJANGO_SETTINGS_MODULE=biospecdb.settings.dev python manage.py runserver 0.0.0.0:8000
* The site can then be accessed using any browser from http://localhost:8000
### Self signed SSL certificates:
Warning! This is intended for local testing only and not for use in production.
* Install openssl
* cd nginx
* openssl req -newkey rsa:4096 -nodes -x509 -out cert.crt -keyout cert.key -days 365
Note: This certificate is valid for 365 days. Also, when accessing https://localhost your browser will flag the site
as unsafe and the certificate as invalid.
Custom Deployment Settings:
- EXPLORER_CHARTS_ENABLED: Include the spectral data files, if present in query results, for download as zip file.
- EXPLORER_DATA_EXPORTERS_ALLOW_DATA_FILE_ALIAS: Exhaustively scan query result values for relevant filepaths to collect data files. Does nothing when EXPLORER_DATA_EXPORTERS_INCLUDE_DATA_FILES is False.
- AUTO_ANNOTATE: Automatically run "default" annotators when new spectral data is added. See Quality Control Annotations.
- RUN_DEFAULT_ANNOTATORS_WHEN_SAVED: Run newly added/updated annotator on all spectral data if annotator.default is True. WARNING: This may be time-consuming if the annotators take a while to run and there are a lot of spectral data samples in the database. See Quality Control Annotations.
DB Management
We're currently using sqlite requiring the following setup instructions:
For a quickstart run the provided script rebuild_sqlite.sh, otherwise follow the instructions
below. The default superuser credentials are username: admin, password: admin. Set the env var
DJANGO_SUPERUSER_PASSWORD to override the default given in rebuild_sqlite.sh.
NOTE: For postgresql usage, run the provided script rebuild_postgres.sh.
- cd into repo
- python manage.py migrate
- python manage.py migrate --database=bsr
- python manage.py createsuperuser
- python manage.py loaddata centers queries
- python manage.py loaddata --database=bsr centers observables instruments qcannotators biosampletypes spectrameasurementtypes
- python manage.py update_sql_views flat_view
- python manage.py runserver
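The manual setup sequence above can also be scripted. The following is a minimal sketch and not part of the repository: the names SETUP_STEPS, build_commands, and run_setup are hypothetical, and it assumes it is run from the repo root with the correct environment active. It only wraps the commands listed above; runserver is left out so that the script terminates, and createsuperuser will still prompt interactively.

```python
import subprocess
import sys

# The manage.py sub-commands from the list above, in order.
# runserver is intentionally omitted so the script terminates.
SETUP_STEPS = [
    ["migrate"],
    ["migrate", "--database=bsr"],
    ["createsuperuser"],
    ["loaddata", "centers", "queries"],
    ["loaddata", "--database=bsr", "centers", "observables", "instruments",
     "qcannotators", "biosampletypes", "spectrameasurementtypes"],
    ["update_sql_views", "flat_view"],
]


def build_commands(python=sys.executable):
    """Return the full argument list for every setup step."""
    return [[python, "manage.py", *step] for step in SETUP_STEPS]


def run_setup(dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    Execution stops at the first failing step because check=True
    raises subprocess.CalledProcessError on a non-zero exit code.
    """
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_setup(dry_run=True)
```

Calling run_setup(dry_run=False) executes the steps; the default only prints them, which is a safer way to review the sequence first.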
For running the Quality Control Annotators (QCAnnotators) use the following:
python manage.py run_qc_annotators
Use the option --no_reruns to NOT run annotators on existing annotations and instead leave their
existing computed values as they are. This will, however, still run all "default" annotators on all SpectralData
entries that have not yet been annotated.
On subsequent deployments only python manage.py runserver is needed, unless the db (db.sqlite) is nuked from
disk.
When the models are changed only the following migration commands are required:
* python manage.py makemigrations user
* python manage.py makemigrations uploader
* python manage.py makemigrations catalog
* git add biospecdb/apps/uploader/migrations
* git add biospecdb/apps/user/migrations
* git commit -asm"Update model migrations"
* python manage.py migrate
* python manage.py migrate --database=bsr
The DB can be dumped to a file using the following:
python manage.py dumpdata --indent 4 uploader --exclude uploader.uploadedfile --output test_data.json
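To sanity-check a dump, it can help to count the records per model. The sketch below relies only on Django's standard JSON fixture layout (a list of objects with "model", "pk", and "fields" keys). The function name count_records and the sample model labels are made up for illustration; in practice the text would be read from test_data.json.

```python
import json
from collections import Counter


def count_records(fixture_text):
    """Count fixture records per model label, e.g. 'uploader.patient'."""
    records = json.loads(fixture_text)
    return Counter(record["model"] for record in records)


# Tiny made-up sample in Django's JSON fixture layout.
# The model labels here are illustrative, not taken from a real dump.
sample = json.dumps([
    {"model": "uploader.patient", "pk": 1, "fields": {}},
    {"model": "uploader.patient", "pk": 2, "fields": {}},
    {"model": "uploader.visit", "pk": 1, "fields": {}},
])

counts = count_records(sample)
for model, n in sorted(counts.items()):
    print(f"{model}: {n}")
```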
Custom commands:
- python manage.py prune_files [--dry_run]: Delete any and all orphaned data files.
  - --dry_run: Output files to be deleted but don't actually delete anything.
- python manage.py update_sql_views <view>: Create/update the custom SQL view provided and its view dependencies if any.
  - --drop_only: Drop SQL view (and dependencies) but don't re-create.
- python manage.py run_qc_annotators [--no_reruns]: Run all Quality Control annotators on the SpectralData database table.
  - --no_reruns: Don't run annotators on existing annotations, leave computed values as is.
- python manage.py get_column_names [--exclude_observables] [--exclude_non_observables] [--center=<name|id>]
  - --exclude_observables: Only output column names for all observables currently in the database.
  - --exclude_non_observables: Only output column names for non-observables.
  - --center=<name|id>: Filter observables by center name or center ID.
  - --category=<category>: Filter observables by category.
  - --descriptions: Also print field.help_text and observation.description.
  - --include_instrument_fields: Also include Instrument fields. Note: These are not used for bulk uploads, only the database Instrument ID is used. Therefore these aren't that useful to list. Does nothing when used with --exclude_non_observables.
- python manage.py send_test_email <send_to_email_address>: Send a test email to "send_to_email_address" to test email setup.
- python manage.py makesuperuser: This is a wrapper of Django's builtin createsuperuser command except that it doesn't fail when the user already exists.
  - See python manage.py createsuperuser --help for options.
  - --fail: Revert to createsuperuser behavior and fail when the user already exists.
NOTE: These commands must be run from the /app/ directory on the server.
AWS:
The above management commands (and others) can be run in production from an EC2 instance correctly configured. To
aid in shell setup, the following script can be executed:
* source repo/biospecdb/scripts/prd/ec2_init.sh
NOTE: scripts/prd/ec2_init.sh will export all AWS secrets as shell environment variables.
Usage
URL Paths:
- catalog/: Access cataloged datasets to download and explore for research purposes.
- data/: Data ingestion and editing. The following paths are the principal ingestion methods:
  - data/uploader/patient/add/: Add all data associated with a given patient, new or existing.
  - data/uploader/uploadedfile/add/: Bulk upload data in a tabulated format, e.g., .csv, .xlsx, & .json.
- explorer/: SQL UI interface for direct data exploration. Privileged user permissions required.
- admin/: Access to patient data, explorer queries, center info, user info, and general database admin. Privileged user permissions required. Note that the data input forms are simpler (not nested) than those at data/.
- healthz/: Display simple health check system status for both the web application and backend infrastructure.
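The paths above are relative to the site root. As a minimal sketch, the snippet below builds full URLs from a base address and polls healthz/ using only the standard library. The helper names build_url and is_healthy are hypothetical, and treating an HTTP 200 response as "healthy" is an assumption about how the endpoint reports status.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def build_url(base, path):
    """Join a base URL and a relative path, tolerating a missing trailing slash."""
    return urljoin(base.rstrip("/") + "/", path)


def is_healthy(base, timeout=5.0):
    """Return True if healthz/ answers with HTTP 200, False on any network or HTTP error.

    Assumption: a 200 status means the health check passed.
    """
    try:
        with urlopen(build_url(base, "healthz/"), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Base address of the development server from the Run instructions.
    base = "http://localhost:8000"
    print(build_url(base, "data/uploader/patient/add/"))
    print("healthy:", is_healthy(base))
```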
Quality Control Annotations
Entries in the SpectralData table can be annotated by running QCAnnotators on them, thus producing a value
stored as a QCAnnotation associated with the SpectralData entry. The SpectralData table contains the actual
spectral data file containing the wavelength and intensity values. It may be desirable to annotate this data with
certain quality control metrics that can later be used to filter the data. Such quality control functions are to be
implemented as a subclass of biospecdb.apps.uploader.qc.qcfilter.QcFilter.
They can then be added to the database as entries in the QCAnnotator
table. Annotations from these annotators can then be associated with a SpectralData entry either manually via
the admin form, or by "default" if QCAnnotator.default = True. They can also be run by using the
run_qc_annotators Django management command. The behavior for running these annotators and
populating the QCAnnotation table is configurable and is described below.
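To illustrate the subclassing pattern described above, here is a self-contained sketch. It does not import the real base class: the QcFilter defined below is a hypothetical stand-in, and its run(wavelengths, intensities) signature is an assumption, so the actual interface in the package may differ. The SignalToNoise metric is likewise only an example.

```python
from abc import ABC, abstractmethod
from statistics import mean, pstdev


class QcFilter(ABC):
    """Hypothetical stand-in for the real base class; its actual interface may differ."""

    @abstractmethod
    def run(self, wavelengths, intensities):
        """Return a single quality control value for one spectrum."""


class SignalToNoise(QcFilter):
    """Illustrative metric: mean intensity divided by its population standard deviation."""

    def run(self, wavelengths, intensities):
        if len(wavelengths) != len(intensities):
            raise ValueError("wavelengths and intensities must have equal length")
        spread = pstdev(intensities)
        if spread == 0:
            # A perfectly flat spectrum has no measurable noise in this metric.
            return float("inf")
        return mean(intensities) / spread


# Made-up spectrum for demonstration only.
wavelengths = [400.0, 500.0, 600.0, 700.0]
intensities = [1.0, 3.0, 1.0, 3.0]
value = SignalToNoise().run(wavelengths, intensities)
print(value)
```

Returning a single scalar per spectrum keeps the result easy to store and to filter on later, which matches the purpose stated above.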
Settings:
The following QC annotator settings are available in biospecdb.settings.base:
- AUTO_ANNOTATE: If True and if default annotators exist in the DB, they will be automatically run upon adding/updating SpectralData entries. (Default: True)
- RUN_DEFAULT_ANNOTATORS_WHEN_SAVED: If True and SpectralData entries exist in the DB, newly added/updated default annotators will be run on all SpectralData entries. (Default: False)
Management Command:
Running python manage.py run_qc_annotators will run all existing default annotators and re-run existing annotations on
all relevant SpectralData table entries. The option --no_reruns can be used to prevent re-running existing
annotations and only run default annotators that have not yet been run.
NOTE: If AUTO_ANNOTATE = RUN_DEFAULT_ANNOTATORS_WHEN_SAVED = False, using the run_qc_annotators management command
is the only mechanism for creating quality control annotations.
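The effect of --no_reruns can be pictured with a small model. This is a simplified sketch of the behavior described above, not the command's implementation: plan_runs and its arguments are hypothetical, entries are represented by plain IDs, and annotators by name.

```python
def plan_runs(entries, default_annotators, existing, no_reruns=False):
    """Return the sorted (entry, annotator) pairs that would be computed.

    entries: iterable of SpectralData identifiers
    default_annotators: iterable of annotator names flagged as default
    existing: set of (entry, annotator) pairs that already have an annotation
    """
    wanted = {(e, a) for e in entries for a in default_annotators}
    if no_reruns:
        # Only fill in default annotations that are still missing.
        return sorted(wanted - existing)
    # Re-run every existing annotation and add any missing defaults.
    return sorted(wanted | existing)


# Made-up state: entry 1 already has two annotations, entry 2 has none.
entries = [1, 2]
defaults = ["snr"]
existing = {(1, "snr"), (1, "manual_check")}

full = plan_runs(entries, defaults, existing)
partial = plan_runs(entries, defaults, existing, no_reruns=True)
print(full)
print(partial)
```

In this model the non-default annotation on entry 1 is recomputed only in the full run, while entry 2 receives its missing default annotation in both cases.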
Testing
NOTE: The following steps require pip install -r requirements/dev.txt.
Linting:
Checks for typos, syntax errors, style issues, and other problems found by simple code analysis.
* cd into repo dir.
* Switch/activate correct environment: conda activate <environment_name>
* Run ruff .
* This can be automatically run (recommended for devs) every time you git push by installing the provided
pre-push git hook available in ./githooks.
Instructions are in that file - just cp ./githooks/pre-push .git/hooks/;chmod +x .git/hooks/pre-push.
Security Checks:
Checks for security concerns using Bandit.
* cd into repo dir.
* bandit -c pyproject.toml --severity-level=medium -r biospecdb
Unit Tests:
Tests core package functionality at a modular level.
* cd into repo dir.
* Run all available tests: pytest .
* Run specific test: pytest tests/test_util.py::test_base_dummy.
Regression tests:
Tests whether core data results differ during development.
* WIP
Smoke Tests:
Tests at the application and infrastructure level.
* WIP
Build Docs:
For building, testing & viewing the docs.
* cd into repo dir.
* pip install -r requirements/docs.txt
* cd docs
* make clean
* make html
* To view the docs in your default browser run open docs/_build/html/index.html.
The DB Model:

Column Names:
When constructing bulk data files to be uploaded, the server must parse matching column names with model fields. To
obtain a list of these column names run python manage.py get_column_names.
See Custom Commands
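Before uploading a bulk file, its header can be checked locally against the expected names. The sketch below is an illustration only: check_header is a hypothetical helper, the column names in the example are invented, and matching case-insensitively with whitespace stripped is an assumption that may not mirror how the server matches columns. In practice the expected list would come from the output of get_column_names.

```python
import csv
import io


def check_header(csv_text, expected_columns):
    """Compare a CSV header row against the expected column names.

    Returns (missing, unexpected) as sorted lists. Matching ignores case
    and surrounding whitespace, which is an assumption of this sketch.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    found = {name.strip().lower() for name in header}
    expected = {name.strip().lower() for name in expected_columns}
    return sorted(expected - found), sorted(found - expected)


# Hypothetical column names; use the output of get_column_names in practice.
expected = ["Patient ID", "Gender", "Days Symptomatic"]
sample = "patient id,gender,age\n1,F,42\n"

missing, unexpected = check_header(sample, expected)
print("missing:", missing)
print("unexpected:", unexpected)
```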
Owner
- Name: Spectroscopy for Patient Diagnosis Database
- Login: RISPaDD
- Kind: organization
- Repositories: 1
- Profile: https://github.com/RISPaDD
GitHub Events
Total
- Delete event: 3
- Issue comment event: 9
- Pull request event: 9
- Create event: 4
Last Year
- Delete event: 3
- Issue comment event: 9
- Pull request event: 9
- Create event: 4
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: about 1 month
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 4
Past Year
- Issues: 0
- Pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: about 1 month
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 4
Top Authors
Issue Authors
- jamienoss (11)
- Alvaro-FG (5)
Pull Request Authors
- jamienoss (49)
- dependabot[bot] (33)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/upload-artifact v3 composite
- codecov/codecov-action v3 composite
- docker/build-push-action f2a1d5e99d037542a71f64918e516c093c6f3fc4 composite
- docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1 composite
- docker/metadata-action 9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- python 3.11-slim build
- bsr latest
- django *
- django-sql-explorer [charts]
- kaleido *
- openpyxl *
- pandas *
- plotly *
- uvicorn *
- xlsxwriter *
- bandit ==1.7.5 development
- build ==1.0.3 development
- factory-boy ==3.3.0 development
- httpx ==0.25.0 development
- pytest ==7.4.3 development
- pytest-cov ==4.1.0 development
- pytest-django ==4.6.0 development
- ruff ==0.0.288 development
- setuptools ==68.2.2 development
- setuptools_scm ==8.0.3 development
- tox ==4.11.3 development
- nbsphinx ==0.9.3
- sphinx ==6.2.1
- sphinx-automodapi ==0.16.0
- sphinx-issues ==3.0.1
- sphinx_rtd_theme ==1.3.0
- django ==4.2.7
- django-sql-explorer ==3.2.1
- kaleido ==0.2.1
- openpyxl ==3.1.2
- pandas ==2.1.0
- plotly ==5.17.0
- uvicorn ==0.23.2
- xlsxwriter ==3.1.9