kso
Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
Science Score: 39.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.8%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 7
- Watchers: 2
- Forks: 13
- Open Issues: 64
- Releases: 1
Topics
Metadata Files
README.md
KSO System
The Koster Seafloor Observatory is an open-source, citizen science and machine learning approach to analyse subsea movies.
KSO overview
The KSO system has been developed to:
* move and process underwater footage and its associated data (e.g. location, date, sampling device).
* make this data available to citizen scientists in Zooniverse to annotate the data.
* train and evaluate machine learning models (customise Yolov5 or Yolov8 models).

The system is built around a series of easy-to-use Jupyter Notebooks. Each notebook allows users to perform a specific task of the system (e.g. upload footage to the citizen science platform or analyse the classified data).
Users can run these notebooks via Google Colab (by clicking on the Colab links in the table below), locally or on a high-performance computing (HPC) environment.
Notebooks
Our notebooks are modular and grouped into four main task categories: Set up, Classify, Analyse and Publish.
| Task | Notebook | Description |
| -------- | -------------------------- | ------------------------------------------------------------------------------------------- |
| Set up | Checkmetadata | Check format and contents of footage and sites, media and species csv files |
| Classify | UploadsubjectstoZooniverse | Prepare original footage and upload short clips to Zooniverse, extract frames of interest from the original footage and upload them to Zooniverse |
| Classify | Processclassifications | Pull and process up-to-date classifications from Zooniverse |
| Analyse | Trainmodels | Prepare the training and test data, set model parameters and train models |
| Analyse | Evaluatemodels | Use ecologically relevant metrics to test the models |
| Publish | Publishmodels | Publish the model to a public repository |
| Publish | Publishobservations | Automatically classify new footage and export observations to GBIF |
Local Installation
Docker Installation
Requirements
Pull KSO Docker image
```bash
docker pull ghcr.io/ocean-data-factory-sweden/kso:dev
```
Conda Installation
Requirements
Download this repository
Clone this repository using:
```bash
git clone https://github.com/ocean-data-factory-sweden/kso.git
```
Prepare your system
Depending on your system (Windows/Linux/MacOS), you might need to install some extra tools. If this is the case, you will get a message about what you need to install in the next steps. For example, Microsoft Build Tools C++ with a version higher than 14.0 is required for Windows systems.
Set up the environment with Conda
1. Open the Anaconda Prompt.
2. Navigate to the folder where you have cloned the repository or unzipped the manually downloaded repository. Then go into the kso folder.
```
cd kso
```
3. Create an Anaconda environment with Python 3.8. Remember to replace `<name env>` with a name of your choice.
```
conda create -n <name env> python=3.8
```
4. Enter the environment:
```
conda activate <name env>
```
5. Specify your GPU details.
   - 5a. Find out the PyTorch installation you need. Navigate to the system options and select your device/platform details.
   - 5b. Add the recommended command to KSO's gpu_requirements_user.txt file.
6. Install all the requirements:
```
pip install -r requirements.txt -r gpu_requirements_user.txt
```
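Once the requirements are installed, it can be useful to confirm which device PyTorch will actually use. The following is a minimal sketch, not part of KSO itself; the helper name `describe_torch_device` is hypothetical, and it only relies on the standard `torch.cuda` calls.

```python
import importlib.util


def describe_torch_device() -> str:
    """Return a one-line summary of the device PyTorch would use."""
    # Avoid a hard failure when PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    if torch.cuda.is_available():
        # Report the name of the first visible CUDA device.
        name = torch.cuda.get_device_name(0)
        return f"PyTorch {torch.__version__} with GPU: {name}"
    return f"PyTorch {torch.__version__} on CPU only"


if __name__ == "__main__":
    print(describe_torch_device())
```

If the summary reports CPU only on a machine with a GPU, the command added in step 5b most likely does not match your device/platform details.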
Cloudina
Cloudina is a hosted version of KSO (powered by JupyterHub) on NAISS Science Cloud. It allows users to scale and automate larger workflows using a powerful processing backend. This is currently an invitation-only service. To access the platform, please contact jurie.germishuys[at]combine.se.
The current portals are accessible as:
1. Console (object storage) - storage
2. Album (JupyterHub) - notebooks
3. Vendor (MLFlow) - mlflow
Starting a new project
To start a new project you will need to:
1. Create initial information for the database: Input the information about the underwater footage files, sites and species of interest. You can use a template of the csv files and move the directory to the `dbstarter` folder.
2. Link your footage to the database: You will need files of underwater footage to run this system. You can download some samples and move them to `dbstarter`. You can also store your own files and specify their directory in the notebooks.
Please remember that the underwater media must be in a standard format (typically .mp4 or .jpg), and that the associated metadata, captured in three CSV files (“movies”, “sites” and “species”), should follow the Darwin Core (DwC) standard.
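Before opening the notebooks, a quick check that the three CSV files are present and have a header row can save time. This is a rough sketch using only the Python standard library; the file names `movies.csv`, `sites.csv` and `species.csv` and the helper `summarise_metadata` are assumptions for illustration, not names defined by KSO, and the sketch does not validate Darwin Core terms.

```python
import csv
from pathlib import Path

# Hypothetical file names; adjust them to match your own project.
EXPECTED_FILES = ("movies.csv", "sites.csv", "species.csv")


def summarise_metadata(folder):
    """Map each expected CSV file to its column names (None if missing)."""
    summary = {}
    for name in EXPECTED_FILES:
        path = Path(folder) / name
        if not path.is_file():
            summary[name] = None
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            # The first row of each CSV is assumed to hold the column names.
            summary[name] = next(csv.reader(handle), [])
    return summary


if __name__ == "__main__":
    for name, columns in summarise_metadata("dbstarter").items():
        print(name, "missing" if columns is None else columns)
```

A file reported as missing, or one with an empty column list, is worth fixing before running the metadata-checking notebook.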
Developer instructions
If you would like to expand and improve the KSO capabilities, please follow the instructions above to set the project up on your local computer.
When you add any changes, please create your branch on top of the current 'dev' branch. Before submitting a Pull Request, please:
* Run Black on the code you have edited:
```shell
black filename
```
* Clean up the commit history on your branch so that every commit represents a logical change (squash and edit commits until the history is easy for others to follow).
* Follow the conventional commits guidelines (table below) for commit messages to facilitate code sharing, and describe the logic behind the commit in the body of the message.
#### Commit types
| Commit Type | Title | Description | Emoji |
|:-----------:|--------------------------|-------------------------------------------------------------------------------------------------------------|:-----:|
| feat | Features | A new feature | ✨ |
| fix | Bug Fixes | A bug fix | 🐛 |
| docs | Documentation | Documentation only changes | 📚 |
| style | Styles | Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc) | 💎 |
| refactor | Code Refactoring | A code change that neither fixes a bug nor adds a feature | 📦 |
| perf | Performance Improvements | A code change that improves performance | 🚀 |
| test | Tests | Adding missing tests or correcting existing tests | 🚨 |
| build | Builds | Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm) | 🛠 |
| ci | Continuous Integrations | Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs) | ⚙️ |
| chore | Chores | Other changes that don't modify src or test files | ♻️ |
| revert | Reverts | Reverts a previous commit | 🗑 |
* Rebase on top of dev (never merge; only rebase).
* Submit a Pull Request and request at least 2 reviewers.
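To check a commit message against the commit types in the table above before pushing, a small helper along these lines may help. It is only a sketch: the function name `check_commit_header` is hypothetical, and the pattern assumes the usual Conventional Commits header shape of `type(optional scope): description`.

```python
import re

# Commit types taken from the table above.
COMMIT_TYPES = (
    "feat", "fix", "docs", "style", "refactor", "perf",
    "test", "build", "ci", "chore", "revert",
)

# Header shape: type, optional "(scope)", optional "!", then ": description".
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def check_commit_header(header: str) -> bool:
    """Return True if the first line of a commit message uses a listed type."""
    match = HEADER_PATTERN.match(header)
    return bool(match) and match.group("type") in COMMIT_TYPES


if __name__ == "__main__":
    for header in ("feat: add clip upload widget", "Fixed stuff"):
        print(header, "->", check_commit_header(header))
```

Only the header line is checked here; the body, where the logic behind the commit is described, still needs a human read.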
Citation
If you use this code or its models in your research, please cite:
Anton V, Germishuys J, Bergström P, Lindegarth M, Obst M (2021) An open-source, citizen science and machine learning approach to analyse subsea movies. Biodiversity Data Journal 9: e60548. https://doi.org/10.3897/BDJ.9.e60548
Collaborations/Questions
You can find out more about the project at https://subsim.se.
We are always excited to collaborate and help other marine scientists. Please feel free to contact us (matthias.obst(at)marine.gu.se) with your questions.
Troubleshooting
If you have trouble importing panoptes_client on Windows, this is a known issue with the libmagic package. Pmason's suggestions on the Zooniverse Talk board can be useful for troubleshooting it.
Owner
- Name: Ocean Data Factory Sweden
- Login: ocean-data-factory-sweden
- Kind: organization
- Email: torsten.linders@gu.se
- Repositories: 4
- Profile: https://github.com/ocean-data-factory-sweden
GitHub Events
Total
- Create event: 9
- Issues event: 21
- Watch event: 3
- Delete event: 1
- Member event: 2
- Issue comment event: 30
- Push event: 30
- Pull request review comment event: 13
- Pull request review event: 10
- Pull request event: 8
- Fork event: 1
Last Year
- Create event: 9
- Issues event: 21
- Watch event: 3
- Delete event: 1
- Member event: 2
- Issue comment event: 30
- Push event: 30
- Pull request review comment event: 13
- Pull request review event: 10
- Pull request event: 8
- Fork event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jurie Germishuys | j****s@c****e | 618 |
| Victor | 5****e | 88 |
| Diewertje11 | d****r@c****e | 63 |
| Jannes | 3****g | 10 |
| Pablo Correa Gómez | p****z@c****e | 10 |
| PilarNavarro | p****r@h****s | 5 |
| Jurie Germishuys | j****g@a****e | 2 |
| dependabot[bot] | 4****] | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 194
- Total pull requests: 119
- Average time to close issues: about 1 month
- Average time to close pull requests: 7 days
- Total issue authors: 10
- Total pull request authors: 7
- Average comments per issue: 1.48
- Average comments per pull request: 1.66
- Merged pull requests: 68
- Bot issues: 0
- Bot pull requests: 26
Past Year
- Issues: 32
- Pull requests: 7
- Average time to close issues: 18 days
- Average time to close pull requests: 17 days
- Issue authors: 6
- Pull request authors: 4
- Average comments per issue: 0.78
- Average comments per pull request: 1.71
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Bergylta (68)
- victor-wildlife (40)
- jannesgg (37)
- Diewertje11 (21)
- donkyjohn (5)
- ShrimpFather7 (3)
- KalindiFonda (2)
- pabloyoyoista (2)
- pilarnavarro (2)
- XhD98 (1)
Pull Request Authors
- victor-wildlife (57)
- Diewertje11 (25)
- dependabot[bot] (25)
- jannesgg (19)
- pilarnavarro (5)
- trossi (2)
- pabloyoyoista (2)
Dependencies
- actions/checkout v3 composite
- psf/black stable composite
- nvcr.io/nvidia/pytorch 21.05-py3 build
- PIMS ==0.6.1
- PyYAML >=5.3.1
- av ==8.1.0
- boto3 ==1.26.64
- dataclass-csv ==1.4.0
- easydict ==1.9.0
- fastapi ==0.73.0
- ffmpeg-python ==0.2.0
- gdown ==3.13.0
- imagesize ==1.4.1
- ipyfilechooser ==0.4.4
- itables ==0.3.0
- jupyter ==1.0.0
- jupyter_bbox_widget ==0.5.0
- matplotlib >=3.2.2
- moviepy ==1.0.3
- natsort ==8.1.0
- numpy >=1.18.5
- opencv-contrib-python *
- opencv-python ==4.6.0.66
- opencv-python-headless *
- openpyxl ==3.1.0
- pandas ==1.1.4
- panoptes-client ==1.5.0
- protobuf ==3.15.8
- pyopenssl >=23
- python-magic ==0.4.24
- python-multipart ==0.0.5
- scipy >=1.4.1
- scp ==0.14.1
- seaborn >=0.11.0
- split-folders ==0.5.1
- tensorboard >=2.4.1
- thop *
- tqdm >=4.41.0
- uvicorn ==0.17.2
- wandb *
- actions/checkout v3 composite
- docker/login-action v2 composite
- tj-actions/changed-files v37 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- torch *
- torchaudio *
- torchvision *
- boto3 1.26.64
- csv-diff ^1.1
- dataclass-csv 1.4.0
- ffmpeg 1.4
- ffmpeg-python 0.2.0
- folium 0.12.1
- ftfy 6.1.1
- gdown 4.6.4
- imagesize 1.4.1
- ipyfilechooser 0.4.4
- ipysheet 0.4.4
- ipython 8.11.0
- ipywidgets 7.7.2
- jupyter-bbox-widget 0.5.0
- natsort 8.1.0
- opencv-python 4.5.4.60
- pandas 1.5.3
- panoptes-client 1.6.0
- pillow 9.4.0
- pims 0.6.1
- python ^3.8
- pyyaml 6.0
- requests 2.28.2
- scikit-learn 1.2.2
- scp 0.14.1
- split-folders 0.5.1
- torch 1.8.0
- tqdm 4.64.1
- wandb 0.13.2
- PIMS ==0.6.1
- PyYAML ==6.0
- SQLAlchemy ==2.0.20
- av ==8.1.0
- boto3 ==1.26.64
- boxmot ==10.0.43
- csv-diff ==1.1
- dataclass-csv ==1.4.0
- ffmpeg ==1.4
- ffmpeg-python ==0.2.0
- fiftyone ==0.20.0
- fiftyone_db ==0.4.0
- folium ==0.12.1
- ftfy ==6.1.1
- gdown ==4.7.1
- imagesize ==1.4.1
- ipyfilechooser ==0.4.4
- ipysheet ==0.7.0
- ipython ==8.11.0
- ipywidgets ==8.1.1
- jupyter ==1.0.0
- jupyter-bbox-widget ==0.5.0
- jupyter_contrib_nbextensions ==0.7.0
- mlflow ==2.7.1
- more-itertools ==9.1.0
- moviepy ==1.0.3
- natsort ==8.1.0
- notebook ==7.0.4
- numpy >=1.22.0,<1.24.1
- opencv-contrib-python ==4.6.0.66
- opencv-python ==4.6.0.66
- opencv-python-headless ==4.6.0.66
- pandas ==1.4.0
- scikit_learn ==1.3.0
- scp ==0.14.1
- setuptools ==67.6.1
- split-folders ==0.5.1
- tqdm ==4.64.1
- traitlets ==5.9.0
- ultralytics ==8.0.200
- wandb ==0.15.11
- yolov5 ==7.0.13
- av ==10.0.0
- boto3 ==1.28.80
- csv_diff ==1.1
- dataclass_csv ==1.4.0
- ffmpeg ==1.4
- fiftyone ==0.22.3
- ipysheet ==0.7.0
- jupyter_bbox_widget ==0.5.0
- lida ==0.0.10
- mlflow ==2.8.0
- pims ==0.6.1
- torch ==2.1.1
- typing-extensions ==4.5.0
- ultralytics ==8.0.200
- wandb ==0.16.0
- yolov5 ==7.0.13