dashboard-queries-biomedical
Biomedical Open Science project
https://github.com/curtin-open-knowledge-initiative/dashboard-queries-biomedical
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 2 DOI reference(s) in README -
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (13.0%) to scientific vocabulary
Repository
Biomedical Open Science project
Basic Info
- Host: GitHub
- Owner: Curtin-Open-Knowledge-Initiative
- License: apache-2.0
- Language: Jinja
- Default Branch: main
- Size: 1.89 MB
Statistics
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
Workflows for Biomedical Open Science Dashboards
Data Workflows for Open Science Dashboards for Biomedical Research Organizations
Contacts
coki@curtin.edu.au
Rebecca Handcock, Kathryn Napier, Cameron Neylon
Description
Our objective is to create a digital tool that can automatically curate information, thus providing an audit, about Open Science research practices. Specifically, we have designed and implemented an automated dashboard that reports Open Science metrics. Our tool has been developed for and is specific to the discipline of biomedicine. The dashboard displays metrics and benchmarks to visualize institutional and individual performance regarding Open Science practices.
This repository contains version controlled SQL queries and processing scripts used to create data for these dashboards.

How to generate data and update these dashboards
The dashboards for this project are created in Google LookerStudio, using data stored in BigQuery and processed with SQL scripts. Additional scripts in 'R' are used to create flowcharts documenting project processes.
Naming conventions for SQL scripts
There are 4 SQL scripts stored as “Project Queries” within the BigQuery project that need to be run in sequence due to interdependencies between the datasets Filenames contain a version number (eg 1o) which corresponds to a “sprint” of work, and a corresponding Jira ticket. Script filenames also contain a suffix of “YYYYMMMDD”, which is incremented whenever a new version of the script is made. At the end of the sprint, all intermediate copies will be deleted, leaving only the final copies of the scripts to represent the sprint.
Steps to run these scripts and update data on the dashboards - this assumes that the raw data has already been uploaded into BigQuery.
Step 1 - Backup existing dashboard
If making a new dashboard version, export the dashboard as a PDF and upload to the GoogleDrive folder, and increment the LookerStudio dashboard version at the bottom of the FAQ page.
Step 2A - Run SQL for PubMed data
This SQL script takes a data extract from the Academic Observatory of Crossref and PubMed data and creates a combined list of Clinical Trials from these. The reason for creating this data extract is that the data is reused downstream in the workflow, so it makes sense to create it once and re-use it. This SQL script is listed as being required to be run first, but if the data already exists on disc then it does not need to be re-run. If the script does need to be run to make changes or for some other reason, then do the following steps:
In BigQuery, make a copy of the most recent SQL script and save it as a ‘Project’ query. Increment the naming to reflect the current sprint and creation date, eg
neuro_ver1p_query1_alltrials_2024_05_17Set variables at the top of the SQL script. These variables are not used to select data locations, but are 'text tags' that will be added as fields in the output file:
- varSQLscript_name: name of the SQL script, eg
neuro_ver1p_query1_alltrials_2024_05_17 - varSQLyear_cutoff: earliest year that data is extracted from the Academic Observatory, eg 2000, or use 1 for all data
- varAcademicObservatorydoi: name of the DOI table version used, eg
doi20240512
- varSQLscript_name: name of the SQL script, eg
In the script, make sure that you are happy with the version of the BigQuery input dataset, as these may have been updated since the last time that the script was run:
- Academic Observatory - the version of the Academic Observatory DOI table, eg
doi20240512
- Academic Observatory - the version of the Academic Observatory DOI table, eg
Check that the output table has a similar naming convention to the script name, eg:
neuro_ver1p_query1_alltrials_2024_05_17
Make any other changes to the SQL script and save the changes.
Run the SQL script.
Note: See information on the overloaded field containing both Clinical Trial Registries and Databanks here
Step 2B - Run SQL for Trial data
This Trial Data query SQL script and should be run second due to dependencies between the files.
In BigQuery, make a copy of the most recent script and save it as a ‘Project’ query. Increment the naming to reflect the current sprint and creation date, eg
neuro_ver1p_query2_trials_2024_05_17Set variables at the top of the SQL script. These variables are not used to select data locations, but are 'text tags' that will be added as fields in the output file:
- varSQLscript_name: name of the SQL script, eg
neuro_ver1p_query2_trials_2024_05_17 - vardatatrials:
PROJ_trials_YYYYMMTable name of clinical trials output from the Charite processing, egtheneuro_trials_20231111 - vardatadois:
ORG_dois_YYYYMMTable name containing DOIs from the partner institution, egtheneuro_dois_20230217
- varSQLscript_name: name of the SQL script, eg
In the script, make sure that you are happy with the versions of the BigQuery input datasets, as these may have been updated since the last time that the script was run:
- Charite Processing by the Project -
PROJ_trials_YYYYMMTable of output from the Charite processing, egtheneuro_trials_20231111 - Publication DOIs -
ORG_dois_YYYYMMTable containing DOIs from the partner institution, egtheneuro_dois_20230217 - Clinical Trial data extract - This table was created in Step 1, eg
neuro_ver1p_query1_alltrials_2024_05_17
- Charite Processing by the Project -
Check that the output table has a similar naming convention to the script name, eg:
OUTPUT_ver1p_query2_trials_2024_05_17
Make any other changes you want to make to the script and save the changes.
Run the script.
Step 2C - Run SQL for Publication data
This is the main dashboard SQL query for The Neuro's publications and should be run third due to dependencies between the files.
In BigQuery, make a copy of the most recent script and save it as a ‘Project’ query. Increment the naming to reflect the current sprint and creation date, eg
neuro_ver1p_query3_pubs_2024_05_17Set variables at the top of the SQL script. These variables are not used to select data locations, but are 'text tags' that will be added as fields in the output file:
- varSQLscript_name: name of the SQL script, eg
neuro_ver1p_query3_pubs_2024_05_17 - vardatadois:
ORG_dois_YYYYMMTable name containing DOIs from the partner institution, egtheneuro_dois_20230217 - vardatatrials:
PROJ_trials_YYYYMMTable name of clinical trials output from the Charite processing, egtheneuro_trials_20231111 - vardataoddpub:
PROJ_oddpub_YYYYMMTable name of output from the Oddpub processing, egtheneuro_oddpub_20231017
- varSQLscript_name: name of the SQL script, eg
In the script, make sure that you are happy with the versions of the BigQuery input datasets, as these may have been updated since the last time that the script was run:
- Publication DOIs -
ORG_dois_YYYYMMTable containing DOIs from the partner institution,theneuro_dois_20230217 - Institution Clinical Trials -
ORG_trials_YYYYMMTable containing list of clinical trials from the partner institution, egtheneuro_trials_20231111 - Oddpub Processing by the Project -
PROJ_oddpub_YYYYMMTable of output from the Oddpub processing, egtheneuro_oddpub_20231017 - Clinical Trial data extract - This table was created in Step 1, eg
neuro_ver1p_query1_alltrials_2024_05_17 - Academic Observatory - the version of the Academic Observatory DOI table, eg
academic-observatory.observatory.doi20240512
- Publication DOIs -
Check that the output table has a similar naming convention to the script name, eg:
university-of-ottawa.neuro_dashboard_data_archive.OUTPUT_ver1p_query3_pubs_2024_05_17
Make any other changes you want to make to the script and save the changes.
Run the script.
Step 2D - Run SQL for ORCID data
The Researcher ORCID Data query SQL script and should be run fourth due to dependencies between the files.
In BigQuery, make a copy of the most recent script and save it as a ‘Project’ query. Increment the naming to reflect the current sprint and creation date, eg
neuro_ver1p_query4_orcid_2024_05_17Set variables at the top of the SQL script. These variables are not used to select data locations, but are 'text tags' that will be added as fields in the output file:
- varSQLscript_name: name of the SQL script, eg
neuro_ver1p_query4_orcid_2024_05_17 - varORCIDDataset_name: name of the table containing list of researcher ORCIDs from the partner institution, eg
theneuro_orcids_20230906 - varoutputtable: name of the output table, eg
OUTPUT_ver1p_query4_orcid_2024_05_17
- varSQLscript_name: name of the SQL script, eg
In the script, make sure that you are happy with the versions of the BigQuery input datasets, as these may have been updated since the last time that the script was run:
- The contributed Researcher ORCID data
- Researcher ORCIDs -
ORG_orcid_YYYYMMTable containing list of researcher ORCIDs from the partner institution, egtheneuro_orcids_20230906
Check that the output table has a similar naming convention to the script name, eg:
university-of-ottawa.neuro_dashboard_data_archive.OUTPUT_ver1p_query4_orcid_2024_05_17
Make any other changes you want to make to the script and save the changes.
Run the script.
Step 3 - Update the BigQuery views with the new data
Edit the following views to point at the tables created in Steps 2-4”. Do this by opening the view, going to the ‘Details’ tab, and clicking ‘Edit Query’. In the tab that opens, edit the query and select ‘Save View’. Back in the view, click ’Refresh’ at the top right:
university-of-ottawa.neuro_dashboard_data.dashboard_data_trialsuniversity-of-ottawa.neuro_dashboard_data.dashboard_data_pubsuniversity-of-ottawa.neuro_dashboard_data.dashboard_data_orcid
Note: the *PubMed data extract from Step 1 is not used in the dashboard
Step 4 - Update the data connections in Looker Studio
In LookerStudio refresh the data connections to look at the new files:
- Have the dashboard in edit mode and go to “Resource” > “Manage added data sources”
- For each table, refresh the link by going “Edit” then “Edit Connection”
- Check the correct view is still selected (it should not have changed) and click “Reconnect”
- For “Apply connection changes” click “Yes”, then “Done”
Step 5 - QC ...
Check all dashboard pages that everything looks OK.
Step 6 - Refresh the data extract in Google Sheets
Refresh the data extract for the Publications output that is made available the linked Google Sheet, “Data”, “Data Connectors”, “Refresh Options”, “Refresh All”. Copy/paste the main publication SQL into the dashboard page too.
Step 7 - Back-up
Back-up the scripts to Github
Funding
The Biomedical institutional open science dashboard program was supported by a Wellcome Trust Open Research Fund (223828/Z/21/Z). The results of the Delphi in step one were published in PloS Biology.
Please contact, Dr. Kelly Cobey (kcobey@ottawaheart.ca), the Primary Investigator, with additional questions.
Internal project resources
The following sites require authentication to access
Owner
- Name: Curtin Open Knowledge Initiative
- Login: Curtin-Open-Knowledge-Initiative
- Kind: organization
- Website: http://openknowledge.community
- Repositories: 6
- Profile: https://github.com/Curtin-Open-Knowledge-Initiative
Code and resources from the Curtin Open Knowledge Initiative
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Workflows for Biomedical Open Science Dashboards"
url: "https://github.com/Curtin-Open-Knowledge-Initiative/dashboard-queries-biomedical"
doi: 10.5281/zenodo.11648786
license: Apache-2.0 License
authors:
- given-names: "Rebecca N"
family-names: "Handcock"
- given-names: "Kathryn R"
family-names: "Napier"
- given-names: "Cameron"
family-names: "Neylon"
GitHub Events
Total
- Watch event: 1
- Delete event: 7
- Push event: 50
- Pull request review event: 1
- Pull request event: 11
- Create event: 6
- Commit comment event: 1
Last Year
- Watch event: 1
- Delete event: 7
- Push event: 50
- Pull request review event: 1
- Pull request event: 11
- Create event: 6
- Commit comment event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 8 days
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 8 days
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- bechandcock (3)
- keegansmith21 (2)