genome-portal
This is the repository for the Swedish Reference Genome Portal, a service facilitating access and discovery of genome data of non-model eukaryotic species studied in Sweden
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ScilifelabDataCentre
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://genomes.scilifelab.se/
- Size: 68.5 MB
Statistics
- Stars: 0
- Watchers: 5
- Forks: 0
- Open Issues: 2
- Releases: 11
Metadata Files
README.md
Swedish Reference Genome Portal
This repository contains the source code for the Swedish Reference Genome Portal, which:
- Showcases genome research performed in Sweden on non-model eukaryotic species.
- Lowers the barrier of entry to access, visualise, and interpret genome data.
- Encourages sharing of genomic annotations, even the seldom-published kind.
- Strives to present FAIR data, available in public repositories.
Overview
The Swedish Reference Genome Portal website is built using the Hugo static web generator.
The JBrowse2 genome browser is embedded within the website to visually explore genome datasets.
Primary data file sources are available in public repositories (such as ENA), and prepared for display on JBrowse by our Makefile recipes (essentially compressing and indexing).
The code for the Genome Portal is available under an MIT (open source) license.
The Genome Portal website is currently hosted by the KTH Royal Institute of Technology in Stockholm.
Cite this portal
See 'Cite this repository' in the "About" section at the top right of this page.
Contributing
Two types of contributions are especially welcome:
Datasets for display in the portal: Consult our requirements for including a genome dataset in the portal, and contact us if you have any questions.
Source code and documentation: We welcome contributions, small and large, to our codebase and documentation. They will be published after review and approval by the Genome Portal team. Fork, open a PR, or contact us to discuss ideas!
Funding
This service is supported by SciLifeLab and the Knut and Alice Wallenberg Foundation through the Data-Driven Life Science (DDLS) program, as well as by the Swedish Foundation for Strategic Research (SSF).
Contact us
We welcome all questions and suggestions (including feature requests or bug reports).
- Email us at dsn-eb@scilifelab.se.
- Create an issue on GitHub.
Technical overview
This section contains high-level technical documentation about the source code.
Repository layout
- The config/ directory contains information about data sources (tracks and assemblies) displayed in the genome browser. Each species subdirectory includes:
  - config.yml: specifies the assembly and tracks to be displayed in JBrowse2.
  - config.json: starting point from which to generate a complete JBrowse2 configuration, based on config.yml. A common use is to define default browsing sessions.
- Different make recipes prepare the material described in config/ for use by JBrowse2. The main operations are downloading data files, compressing using bgzip and indexing with samtools.
- The website content resides in the hugo directory. Most importantly, each species gets:
  - A content subdirectory in hugo/content/species/ (e.g. hugo/content/species/clupea_harengus).
  - A data directory in hugo/data/ (taxonomic information and statistics).
  - An assets directory in hugo/assets (data inventory).
- The scripts folder contains executables to help:
  - Build and serve the website using Docker.
  - Add a new species to the website content.
  - Add new datasets to the portal.
- The tests folder contains tests and fixtures, mainly covering the data preparation scripts.
- The docker folder contains two Docker files:
  - docker/data.dockerfile used for data preparation (everything that make needs).
  - docker/hugo.dockerfile used to build and serve the website.
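As a rough illustration of the "compressing using bgzip and indexing with samtools" step, here is a minimal Python sketch for a single FASTA file. It is not taken from the repository's make recipes: the function name `prepare_fasta_commands` and the example file path are hypothetical, and the sketch only builds the command lines instead of running them, so it works without bgzip or samtools installed.

```python
from pathlib import Path


def prepare_fasta_commands(fasta: Path) -> list[list[str]]:
    """Return the commands that would compress and index one FASTA file.

    Hypothetical helper: the commands are built, not executed.
    """
    compressed = fasta.with_name(fasta.name + ".gz")
    return [
        # bgzip replaces the input with a block-compressed <name>.gz
        ["bgzip", str(fasta)],
        # samtools faidx writes <name>.gz.fai and <name>.gz.gzi next to it
        ["samtools", "faidx", str(compressed)],
    ]


# Hypothetical file name, for illustration only.
commands = prepare_fasta_commands(Path("data/clupea_harengus/assembly.fna"))
for argv in commands:
    print(" ".join(argv))
```

Other file types (annotation tracks, for instance) would need different indexing commands; the actual recipes in the repository remain the reference.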
Local development
The steps described below require Docker to be installed.
1. Clone the repository
git clone git@github.com:ScilifelabDataCentre/genome-portal.git
cd genome-portal
2. Build and install the genomic data
```bash
# Build local image from docker/data.dockerfile
./scripts/dockerbuild data
# Run the dockermake script to build the assets and install them locally.
./scripts/dockermake
```
You may need to be patient, some files are tens of Gigabytes. Should only a subset of species be of interest, you can restrict the scope of the build:
./scripts/dockermake SPECIES=clupea_harengus,linum_tenue
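If you drive the build from a Python script, the same invocation can be assembled programmatically. This is only a sketch: the helper `dockermake_argv` is hypothetical, and it assumes nothing beyond the `SPECIES=` variable shown above.

```python
from typing import Optional


def dockermake_argv(species: Optional[list[str]] = None) -> list[str]:
    """Build the argument list for ./scripts/dockermake.

    With no species, the full build is requested; otherwise SPECIES
    restricts the build to a comma-separated subset.
    """
    argv = ["./scripts/dockermake"]
    if species:
        argv.append("SPECIES=" + ",".join(species))
    return argv


print(dockermake_argv(["clupea_harengus", "linum_tenue"]))
# ['./scripts/dockermake', 'SPECIES=clupea_harengus,linum_tenue']
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the repository root.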
3. Run the web application container
Then to run the website locally, you have several options:
Using the latest development image
docker pull ghcr.io/scilifelabdatacentre/swg-hugo-site:dev
./scripts/dockerserve -t dev
Using a local build
./scripts/dockerbuild -t local -k hugo
./scripts/dockerserve -t local
Using the Hugo development server
This last method is adequate when you want to see changes to the source immediately reflected in the web browser.
It requires the additional step of installing the JBrowse static bundle in hugo/static/browser:
./scripts/download_jbrowse v2.15.4 hugo/static/browser
./scripts/dockerserve -d
Any of these methods will serve the website at http://localhost:8080/.
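The container can take a moment to start. A small polling helper such as the one below can tell you when the site is reachable. It is a sketch using only the Python standard library; the function name `wait_for_site` and its default timings are assumptions, not part of this repository.

```python
import time
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy, since the target is a local server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_site(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll url until it answers with HTTP 200 or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _opener.open(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet (connection refused, timeout, HTTP error)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_site("http://localhost:8080/")` returns True once the local website answers, and False if it has not responded within 30 seconds.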
Making a new release/updating the dev cluster
We use Kubernetes to deploy and manage both the production and development instances of the genome portal.
This repository is responsible for building the two Docker images needed for the deployment. The build is controlled by a GitHub Actions workflow file.
To update the production instance we need to create a new release with GitHub:
- Identify a commit to base the release on.
- Agree with the team on:
  - The commit to tag.
  - The planned version number (we use semantic versioning).
  - The contents of the release (use the previous releases as inspiration).
- Once you have the go-ahead, either:
  - Create an annotated tag locally (e.g. git tag -a v1.3.1 -m "v1.3.1") and push the tag, then create the release (on that tag) using GitHub's interface.
  - Create the release using GitHub's interface, specify the commit you want to use, and let GitHub create the tag automatically.
- Once the release is published, a GH actions workflow will be triggered automatically to build the two images. The docker images will be tagged with the same string as used for the git tag (i.e. vX.X.X). They will also be given the tag "latest". You can see the docker images created from this repository here.
- With the 2 images made, you can follow the instructions in the README of our private repository that contains the kubernetes manifest files which we use in combination with ArgoCD to define the desired state of the cluster.
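The tagging convention described above (a vX.X.X git tag, reused as the Docker image tag alongside "latest") can be sketched in Python as follows. The helper names `release_image_tags` and `next_version` are hypothetical and only illustrate the convention; they are not part of the release workflow itself.

```python
import re

# Release tags look like v1.3.1: a leading "v" and three dot-separated numbers.
TAG_PATTERN = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def release_image_tags(git_tag: str) -> list[str]:
    """Return the Docker tags given to the images built for a release."""
    if not TAG_PATTERN.fullmatch(git_tag):
        raise ValueError(f"not a vX.Y.Z release tag: {git_tag!r}")
    return [git_tag, "latest"]


def next_version(git_tag: str, part: str) -> str:
    """Bump the major, minor or patch component of a vX.Y.Z tag."""
    match = TAG_PATTERN.fullmatch(git_tag)
    if match is None:
        raise ValueError(f"not a vX.Y.Z release tag: {git_tag!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(next_version("v1.3.1", "minor"))  # v1.4.0
print(release_image_tags("v1.4.0"))     # ['v1.4.0', 'latest']
```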
To update the development instance:
- Identify the commit you want the Docker images to be built from.
- If the commit is on the main branch, a GitHub Actions workflow run will already have built the images (unless the commit message was prefixed to skip CI). The images will be tagged with the full commit hash. If the images are already built, your job on this repository is done.
- If the commit is on any other branch, you'll need to trigger a workflow_dispatch to create the Docker images:
  - Head to the Actions tab on GitHub and select the action "Build and push both docker images to the GitHub Container Registry". From there, click "Run workflow". You can specify the name of the image tag if you want; otherwise leave the input blank and the images will be tagged with the full commit hash.
Once the images are built, head over to our private repository that contains the Kubernetes manifest files and follow the instructions there on how to apply your changes to the cluster.
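The decision described above can be summarised in a short Python sketch. The names `DevImagePlan` and `plan_dev_images` are hypothetical, and treating a CI-skipped commit on main as needing a manual run is an assumption, since the text above does not spell out that case.

```python
from typing import NamedTuple, Optional


class DevImagePlan(NamedTuple):
    needs_manual_run: bool  # True when workflow_dispatch must be triggered
    image_tag: str          # tag the images carry once built


def plan_dev_images(
    branch: str,
    commit_sha: str,
    ci_skipped: bool = False,
    custom_tag: Optional[str] = None,
) -> DevImagePlan:
    """Decide whether a manual workflow run is needed and which tag to expect."""
    if branch == "main" and not ci_skipped:
        # Built automatically on push, tagged with the full commit hash.
        return DevImagePlan(False, commit_sha)
    # Assumption: any other case requires a manual run; a blank tag input
    # falls back to the full commit hash.
    return DevImagePlan(True, custom_tag or commit_sha)


# Hypothetical commit hash, for illustration only.
sha = "0123456789abcdef0123456789abcdef01234567"
print(plan_dev_images("main", sha))
print(plan_dev_images("my-feature", sha, custom_tag="my-test"))
```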
Credits
The Swedish Reference Genome Portal is developed and maintained by the DDLS Data Science Node in Evolution and Biodiversity (DSN-EB) team as part of the SciLifeLab Data Platform, operated by the SciLifeLab Data Centre. Members of the DSN-EB team are affiliated with SciLifeLab Data Centre and the National Bioinformatics Infrastructure Sweden (NBIS), based at Uppsala University and the Swedish Museum of Natural History.
Owner
- Name: SciLifeLab Data Centre
- Login: ScilifelabDataCentre
- Kind: organization
- Location: Stockholm and Uppsala, Sweden
- Website: https://www.scilifelab.se/data
- Twitter: SciLifeLab_DC
- Repositories: 27
- Profile: https://github.com/ScilifelabDataCentre
The SciLifeLab Data Centre provides the SciLifeLab platforms with services for IT and data management.
Citation (CITATION.cff)
cff-version: 1.2.0
title: The Swedish Reference Genome Portal
message: "If you use or reuse this software, please cite it using the following metadata."
type: software
authors:
  - family-names: Lantz
    given-names: Henrik
  - family-names: Brink
    given-names: Daniel P.
    orcid: "https://orcid.org/0000-0003-4041-0250"
  - family-names: Crean
    given-names: Rory
  - family-names: Ågren
    given-names: Quentin
  - family-names: Fuentes-Pardo
    given-names: Angela P.
    orcid: "https://orcid.org/0000-0002-5734-9030"
  - family-names: Kochari
    given-names: Arnold
    orcid: "https://orcid.org/0000-0003-1373-5121"
  - family-names: Kultima
    given-names: Hanna
    orcid: "https://orcid.org/0000-0001-7724-2567"
  - family-names: Persson
    given-names: Bengt
  - family-names: Rung
    given-names: Johan
    orcid: "https://orcid.org/0000-0001-5875-8429"
repository-code: "https://github.com/ScilifelabDataCentre/genome-portal"
url: "https://genomes.scilifelab.se"
license: MIT
identifiers:
  - description: "This is the collection of archived snapshots of all versions of the Swedish Reference Genome Portal"
    type: doi
    value: 10.5281/zenodo.14049736
references:
  - authors:
      - family-names: Diesh
        given-names: Colin
      - family-names: Stevens
        given-names: Garrett J.
      - family-names: Xie
        given-names: Peter
      - family-names: De Jesus Martinez
        given-names: Teresa
      - family-names: Hershberg
        given-names: Elliot A.
      - family-names: Leung
        given-names: Angel
      - family-names: Guo
        given-names: Emma
      - family-names: Dider
        given-names: Shihab
      - family-names: Zhang
        given-names: Junjun
      - family-names: Bridge
        given-names: Caroline
      - family-names: Hogue
        given-names: Gregory
      - family-names: Duncan
        given-names: Andrew
      - family-names: Morgan
        given-names: Matthew
      - family-names: Flores
        given-names: Tia
      - family-names: Bimber
        given-names: Benjamin N.
      - family-names: Haw
        given-names: Robin
      - family-names: Cain
        given-names: Scott
      - family-names: Buels
        given-names: Robert M.
      - family-names: Stein
        given-names: Lincoln D.
      - family-names: Holmes
        given-names: Ian H.
    doi: 10.1186/s13059-023-02914-z
    issue: 1
    journal: Genome Biology
    scope: "Please cite this paper for referencing JBrowse 2, the genome browser software utilized on the Swedish Reference Genome Portal."
    title: "JBrowse 2: a modular genome browser with views of synteny and structural variation"
    type: article
    volume: 24
    year: 2023
GitHub Events
Total
- Create event: 62
- Commit comment event: 2
- Release event: 6
- Delete event: 54
- Member event: 1
- Issue comment event: 122
- Push event: 402
- Pull request review comment event: 174
- Pull request event: 87
- Pull request review event: 213
Last Year
- Create event: 62
- Commit comment event: 2
- Release event: 6
- Delete event: 54
- Member event: 1
- Issue comment event: 122
- Push event: 402
- Pull request review comment event: 174
- Pull request event: 87
- Pull request review event: 213
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 110
- Average time to close issues: 20 days
- Average time to close pull requests: 6 days
- Total issue authors: 1
- Total pull request authors: 5
- Average comments per issue: 1.0
- Average comments per pull request: 1.84
- Merged pull requests: 97
- Bot issues: 0
- Bot pull requests: 2
Past Year
- Issues: 0
- Pull requests: 71
- Average time to close issues: N/A
- Average time to close pull requests: 8 days
- Issue authors: 0
- Pull request authors: 5
- Average comments per issue: 0
- Average comments per pull request: 2.27
- Merged pull requests: 59
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- RMCrean (1)
- brinkdp (1)
- apfuentes (1)
Pull Request Authors
- RMCrean (65)
- kwentine (34)
- brinkdp (32)
- apfuentes (3)
- dependabot[bot] (2)
Dependencies
- alpine 3.19.1 build
- nginxinc/nginx-unprivileged alpine build
- actions/checkout v4 composite
- docker/build-push-action v5 composite
- docker/login-action v3 composite
- docker/metadata-action v5 composite
- actions/checkout v4 composite
- actions/setup-node v4 composite
- peaceiris/actions-hugo v3 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/checkout v4 composite
- aquasecurity/trivy-action master composite
- github/codeql-action/upload-sarif v3 composite
- pre-commit ==3.7.0
- requests ==2.31.0