https://github.com/australianbiocommons/gen3schemadev
Gen3 Schema Development tools
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary
Repository
Statistics
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 1
- Releases: 1
Metadata Files
Readme.md
Tools for Gen3 Data Dictionary Development
This repository facilitates Gen3 data modeling using Google Sheets. It includes tools to convert Google Sheets into YAML files and then into a bundled JSON format. Additionally, it offers tools for schema validation and local data model visualization.
Setup
1. Set up environment
```bash
git clone --recurse-submodules "https://github.com/AustralianBioCommons/gen3schemadev.git"
cd gen3schemadev
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .
# Install the acdc-tools submodule, then return to the repository root
cd acdc-tools && pip install -e . && cd ..
```
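After activating the environment, a quick check can confirm that the interpreter actually sees the installed packages. The `check_modules` helper below is hypothetical (it is not part of this repository). Keep in mind that import names can differ from pip package names: PyYAML, for example, is imported as `yaml`.

```bash
# Hypothetical helper: try to import each named module with the python3 found
# on PATH (the .venv interpreter once the environment is activated).
# Prints one line per module and returns non-zero if any import fails.
check_modules() {
  local missing=0 mod
  for mod in "$@"; do
    if python3 -c "import ${mod}" 2>/dev/null; then
      echo "ok: ${mod}"
    else
      echo "missing: ${mod}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, `check_modules yaml jsonschema pandas openpyxl` covers a few of the dependencies listed further down this page.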
2. Install Docker
Download Docker Desktop from the Docker website and follow the installation instructions for your operating system. After installation, verify it by running `docker --version` in a terminal.
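`docker --version` only confirms that the Docker client is installed; the containers started in step 3 also need the Docker daemon (for example Docker Desktop) to be running. The hypothetical `check_docker` helper below is a sketch that checks both, using `docker info`, which fails when the daemon cannot be reached.

```bash
# Hypothetical helper: check for the Docker CLI, then for a reachable daemon.
# Returns 1 if the CLI is missing, 2 if the daemon is not reachable.
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH" >&2
    return 1
  fi
  docker --version
  # `docker info` talks to the daemon, so it fails if the daemon is not running
  if ! docker info >/dev/null 2>&1; then
    echo "docker daemon not reachable; is Docker Desktop running?" >&2
    return 2
  fi
  echo "docker daemon is running"
}
```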
3. Spin up containers
```bash
cd umccr-dictionary
make down
make pull
make up
make ps
cd ..
```
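The containers may take a moment to start after `make up`. A minimal sketch for waiting until a service answers before moving on, assuming `curl` is installed; `wait_for_url` is a hypothetical helper, and port 8080 is the one the visualisation step (step 5) opens.

```bash
# Hypothetical helper: poll URL until it returns a successful HTTP response.
# Arguments: URL, number of attempts (default 30), seconds between tries (default 2).
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    # --fail makes curl exit non-zero on HTTP errors; --max-time bounds each try
    if curl --silent --fail --max-time 5 --output /dev/null "${url}"; then
      echo "up: ${url}"
      return 0
    fi
    sleep "${delay}"
  done
  echo "gave up waiting for ${url}" >&2
  return 1
}
```

For example: `wait_for_url http://localhost:8080/`.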
Usage
You can work through the steps interactively in the Schema Development Framework notebook, run each step manually as described below, or use the helper script:
```bash
# Show the script's options
bash scripts/generate_schema.sh --help
bash scripts/generate_schema.sh
```
1. Pull Data Schema from Google Sheets
This step pulls the schema design from a Google Sheet template. The template can be accessed here. To use your own design, duplicate the spreadsheet and pass your own Google Sheet ID, together with the tab IDs (`gid` values) of the objects, links, properties, and enums tabs.
```bash
# Remove output from any previous run
[ -d "schema_out" ] && rm -rf "schema_out"
python3 sheet2yaml-CLI.py --google-id '1zjDBDvXgb0ydswFBwy47r2c8V1TFnpUj1jcG0xsY7ZI' --objects-gid 0 --links-gid 270346573 --properties-gid 613332252 --enums-gid 1807456496
```
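To see how the sheet ID and the four tab `gid` values fit together, the sketch below builds a CSV export URL for each tab. It assumes Google Sheets' public CSV export endpoint and a sheet shared so it can be read without signing in. It is only an illustration and does not claim to show how `sheet2yaml-CLI.py` fetches the data; `sheet_csv_url` is a hypothetical helper.

```bash
# Hypothetical helper: CSV export URL for one tab (gid) of a Google Sheet.
sheet_csv_url() {
  local sheet_id="$1" gid="$2"
  printf 'https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%s\n' \
    "${sheet_id}" "${gid}"
}

# Tab name and gid pairs taken from the command above
SHEET_ID='1zjDBDvXgb0ydswFBwy47r2c8V1TFnpUj1jcG0xsY7ZI'
for tab in objects:0 links:270346573 properties:613332252 enums:1807456496; do
  name="${tab%%:*}"   # text before the colon
  gid="${tab#*:}"     # text after the colon
  echo "${name}: $(sheet_csv_url "${SHEET_ID}" "${gid}")"
done
```

Opening one of these URLs (or fetching it with `curl -L`) is a quick way to check that a tab ID points at the tab you expect.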
2. Copy Schema Output
Copy the generated schema files into the schema_dev program folder inside the umccr-dictionary directory:
```bash
mkdir -p umccr-dictionary/dictionary/schema_dev/gdcdictionary/schemas
cp schema_out/* umccr-dictionary/dictionary/schema_dev/gdcdictionary/schemas/
# List the copied files
ls -lsha umccr-dictionary/dictionary/schema_dev/gdcdictionary/schemas/
```
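Listing the directory shows what is there but not whether it matches the freshly generated output. A minimal sketch, using the hypothetical `verify_copy` helper, that compares every file in the source directory byte-for-byte with its copy:

```bash
# Hypothetical helper: verify that every regular file in SRC exists, identical,
# in DST. The defaults mirror the paths used in the command above.
verify_copy() {
  local src="${1:-schema_out}"
  local dst="${2:-umccr-dictionary/dictionary/schema_dev/gdcdictionary/schemas}"
  local f name n=0 status=0
  for f in "${src}"/*; do
    [ -f "${f}" ] || continue   # skip subdirectories and an unmatched glob
    n=$((n + 1))
    name=$(basename "${f}")
    # cmp -s is silent and exits non-zero if files differ or one is missing
    if ! cmp -s "${f}" "${dst}/${name}"; then
      echo "missing or different in ${dst}: ${name}" >&2
      status=1
    fi
  done
  if [ "${n}" -eq 0 ]; then
    echo "no files found in ${src}" >&2
    return 1
  fi
  return "${status}"
}
```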
3. Compile and Bundle into JSON
Compile the YAML schema files and bundle them into JSON:
```bash
# The subshell leaves you in the repository root afterwards
(cd umccr-dictionary && make compile program=schema_dev)
```
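A quick way to confirm that a bundled file is well-formed is to parse it with Python's built-in `json.tool` module. This README does not state where `make compile` writes the bundle, so the sketch takes the path as an argument; `check_json` and any path you pass it are assumptions.

```bash
# Hypothetical helper: succeed only if FILE parses as JSON.
check_json() {
  local file="$1"
  # json.tool exits non-zero on a parse error or an unreadable file
  if python3 -m json.tool "${file}" >/dev/null 2>&1; then
    echo "valid JSON: ${file}"
  else
    echo "invalid or unreadable JSON: ${file}" >&2
    return 1
  fi
}
```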
4. Run Validation
Validate the compiled schema:
```bash
(cd umccr-dictionary && make validate program=schema_dev)
```
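Steps 3 and 4 are usually run together. The sketch below wraps both `make` targets for one program and stops at the first failure. The subshell keeps the caller's working directory unchanged, so the function can be called repeatedly. `build_dictionary` and its defaults are hypothetical.

```bash
# Hypothetical helper: compile, then validate, one program's dictionary.
# Arguments: dictionary directory (default umccr-dictionary),
#            program name (default schema_dev).
build_dictionary() {
  local dict_dir="${1:-umccr-dictionary}" program="${2:-schema_dev}"
  (
    cd "${dict_dir}" || exit 1
    make compile "program=${program}" || exit 1
    make validate "program=${program}" || exit 1
  )
}
```

For example: `build_dictionary umccr-dictionary schema_dev`.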
5. Visualize Data Dictionary
Open the data dictionary visualization in your web browser:
```bash
# `open` is macOS-specific; on most Linux desktops use `xdg-open` instead
open http://localhost:8080/#schema/schema_dev.json
```
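Because `open` only exists on macOS, a small wrapper can pick the right opener per platform and build the viewer URL from the program name. This is a sketch; `opener_for` and `open_dictionary` are hypothetical helpers. On platforms other than macOS and Linux, it falls back to printing the URL.

```bash
# Hypothetical helper: name of a URL-opening command for a given `uname -s`
# value (defaults to the current system).
opener_for() {
  case "${1:-$(uname -s)}" in
    Darwin) echo open ;;
    Linux) echo xdg-open ;;
    *) echo echo ;;   # fallback: just print the URL
  esac
}

# Open the data dictionary viewer for one program (default schema_dev).
open_dictionary() {
  local program="${1:-schema_dev}"
  "$(opener_for)" "http://localhost:8080/#schema/${program}.json"
}
```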
6. View Outputs
After running the script, you can view the generated outputs in the output folder. This folder contains the schema files pulled from the Google Sheets. You can access this folder directly in your file system or use the following command to open it:
Owner
- Name: AustralianBioCommons
- Login: AustralianBioCommons
- Kind: organization
- Email: systems@biocommons.org.au
- Website: https://www.biocommons.org.au/
- Repositories: 17
- Profile: https://github.com/AustralianBioCommons
Documentation for the development, deployment and/or optimisation of key community-endorsed bioinformatics tools and workflows
GitHub Events
Total
- Create event: 3
- Issues event: 5
- Release event: 1
- Delete event: 4
- Issue comment event: 1
- Push event: 37
- Fork event: 2
Last Year
- Create event: 3
- Issues event: 5
- Release event: 1
- Delete event: 4
- Issue comment event: 1
- Push event: 37
- Fork event: 2
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 3
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- JoshuaHarris391 (7)
Pull Request Authors
- JoshuaHarris391 (8)
Dependencies
- PyYAML *
- argparse *
- boto3 *
- dictionaryutils *
- ete3 *
- gen3 *
- jsonschema *
- ldap3 *
- matplotlib *
- networkx *
- numpy *
- openpyxl *
- oyaml *
- pandas *
- requests *
- setuptools *