dashboard-queries-biomedical

Biomedical Open Science project

https://github.com/curtin-open-knowledge-initiative/dashboard-queries-biomedical

Last synced: 6 months ago · JSON representation ·

Repository

Biomedical Open Science project

Basic Info

Host: GitHub
Owner: Curtin-Open-Knowledge-Initiative
License: apache-2.0
Language: Jinja
Default Branch: main
Size: 1.89 MB

Statistics

Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 3

Created almost 3 years ago · Last pushed 6 months ago

Metadata Files

Readme License Code of conduct Citation Zenodo

Workflows for Biomedical Open Science Dashboards

Data Workflows for Open Science Dashboards for Biomedical Research Organizations

Contacts

coki@curtin.edu.au

Rebecca Handcock, Kathryn Napier, Cameron Neylon

Description

Our objective is to create a digital tool that can automatically curate information, thus providing an audit, about Open Science research practices. Specifically, we have designed and implemented an automated dashboard that reports Open Science metrics. Our tool has been developed for and is specific to the discipline of biomedicine. The dashboard displays metrics and benchmarks to visualize institutional and individual performance regarding Open Science practices.

This repository contains version controlled SQL queries and processing scripts used to create data for these dashboards.

Alt text

How to generate data and update these dashboards

The dashboards for this project are created in Google LookerStudio, using data stored in BigQuery and processed with SQL scripts. Additional scripts in 'R' are used to create flowcharts documenting project processes.

Naming conventions for SQL scripts

There are 4 SQL scripts stored as “Project Queries” within the BigQuery project that need to be run in sequence due to interdependencies between the datasets Filenames contain a version number (eg 1o) which corresponds to a “sprint” of work, and a corresponding Jira ticket. Script filenames also contain a suffix of “YYYYMMMDD”, which is incremented whenever a new version of the script is made. At the end of the sprint, all intermediate copies will be deleted, leaving only the final copies of the scripts to represent the sprint.

Steps to run these scripts and update data on the dashboards - this assumes that the raw data has already been uploaded into BigQuery.

Step 1 - Backup existing dashboard

If making a new dashboard version, export the dashboard as a PDF and upload to the GoogleDrive folder, and increment the LookerStudio dashboard version at the bottom of the FAQ page.

Step 2A - Run SQL for PubMed data

This SQL script takes a data extract from the Academic Observatory of Crossref and PubMed data and creates a combined list of Clinical Trials from these. The reason for creating this data extract is that the data is reused downstream in the workflow, so it makes sense to create it once and re-use it. This SQL script is listed as being required to be run first, but if the data already exists on disc then it does not need to be re-run. If the script does need to be run to make changes or for some other reason, then do the following steps:

In BigQuery, make a copy of the most recent SQL script and save it as a ‘Project’ query. Increment the naming to reflect the current sprint and creation date, eg neuro_ver1p_query1_alltrials_2024_05_17
Set variables at the top of the SQL script. These variables are not used to select data locations, but are 'text tags' that will be added as fields in the output file:
- varSQLscript_name: name of the SQL script, eg neuro_ver1p_query1_alltrials_2024_05_17
- varSQLyear_cutoff: earliest year that data is extracted from the Academic Observatory, eg 2000, or use 1 for all data
- varAcademicObservatorydoi: name of the DOI table version used, eg doi20240512
In the script, make sure that you are happy with the version of the BigQuery input dataset, as these may have been updated since the last time that the script was run:
- Academic Observatory - the version of the Academic Observatory DOI table, eg doi20240512
Check that the output table has a similar naming convention to the script name, eg:
- neuro_ver1p_query1_alltrials_2024_05_17
Make any other changes to the SQL script and save the changes.
Run the SQL script.

Note: See information on the overloaded field containing both Clinical Trial Registries and Databanks here

Step 2B - Run SQL for Trial data

This Trial Data query SQL script and should be run second due to dependencies between the files.

In BigQuery, make a copy of the most recent script and save it as a ‘Project’ query. Increment the naming to reflect the current sprint and creation date, eg neuro_ver1p_query2_trials_2024_05_17
Set variables at the top of the SQL script. These variables are not used to select data locations, but are 'text tags' that will be added as fields in the output file:
- varSQLscript_name: name of the SQL script, eg neuro_ver1p_query2_trials_2024_05_17
- vardatatrials: PROJ_trials_YYYYMM Table name of clinical trials output from the Charite processing, eg theneuro_trials_20231111
- vardatadois: ORG_dois_YYYYMM Table name containing DOIs from the partner institution, eg theneuro_dois_20230217
In the script, make sure that you are happy with the versions of the BigQuery input datasets, as these may have been updated since the last time that the script was run:
- Charite Processing by the Project - PROJ_trials_YYYYMM Table of output from the Charite processing, eg theneuro_trials_20231111
- Publication DOIs - ORG_dois_YYYYMM Table containing DOIs from the partner institution, eg theneuro_dois_20230217
- Clinical Trial data extract - This table was created in Step 1, eg neuro_ver1p_query1_alltrials_2024_05_17
Check that the output table has a similar naming convention to the script name, eg:
- OUTPUT_ver1p_query2_trials_2024_05_17
Make any other changes you want to make to the script and save the changes.
Run the script.

Step 2C - Run SQL for Publication data

This is the main dashboard SQL query for The Neuro's publications and should be run third due to dependencies between the files.

In BigQuery, make a copy of the most recent script and save it as a ‘Project’ query. Increment the naming to reflect the current sprint and creation date, eg neuro_ver1p_query3_pubs_2024_05_17
Set variables at the top of the SQL script. These variables are not used to select data locations, but are 'text tags' that will be added as fields in the output file:
- varSQLscript_name: name of the SQL script, eg neuro_ver1p_query3_pubs_2024_05_17
- vardatadois: ORG_dois_YYYYMM Table name containing DOIs from the partner institution, eg theneuro_dois_20230217
- vardatatrials: PROJ_trials_YYYYMM Table name of clinical trials output from the Charite processing, eg theneuro_trials_20231111
- vardataoddpub: PROJ_oddpub_YYYYMM Table name of output from the Oddpub processing, eg theneuro_oddpub_20231017
In the script, make sure that you are happy with the versions of the BigQuery input datasets, as these may have been updated since the last time that the script was run:
- Publication DOIs - ORG_dois_YYYYMM Table containing DOIs from the partner institution, theneuro_dois_20230217
- Institution Clinical Trials - ORG_trials_YYYYMM Table containing list of clinical trials from the partner institution, eg theneuro_trials_20231111
- Oddpub Processing by the Project - PROJ_oddpub_YYYYMM Table of output from the Oddpub processing, eg theneuro_oddpub_20231017
- Clinical Trial data extract - This table was created in Step 1, eg neuro_ver1p_query1_alltrials_2024_05_17
- Academic Observatory - the version of the Academic Observatory DOI table, eg academic-observatory.observatory.doi20240512
Check that the output table has a similar naming convention to the script name, eg:
- university-of-ottawa.neuro_dashboard_data_archive.OUTPUT_ver1p_query3_pubs_2024_05_17
Make any other changes you want to make to the script and save the changes.
Run the script.

Step 2D - Run SQL for ORCID data

The Researcher ORCID Data query SQL script and should be run fourth due to dependencies between the files.

In BigQuery, make a copy of the most recent script and save it as a ‘Project’ query. Increment the naming to reflect the current sprint and creation date, eg neuro_ver1p_query4_orcid_2024_05_17
Set variables at the top of the SQL script. These variables are not used to select data locations, but are 'text tags' that will be added as fields in the output file:
- varSQLscript_name: name of the SQL script, eg neuro_ver1p_query4_orcid_2024_05_17
- varORCIDDataset_name: name of the table containing list of researcher ORCIDs from the partner institution, eg theneuro_orcids_20230906
- varoutputtable: name of the output table, eg OUTPUT_ver1p_query4_orcid_2024_05_17
In the script, make sure that you are happy with the versions of the BigQuery input datasets, as these may have been updated since the last time that the script was run:
- The contributed Researcher ORCID data
- Researcher ORCIDs - ORG_orcid_YYYYMM Table containing list of researcher ORCIDs from the partner institution, eg theneuro_orcids_20230906
Check that the output table has a similar naming convention to the script name, eg:
- university-of-ottawa.neuro_dashboard_data_archive.OUTPUT_ver1p_query4_orcid_2024_05_17
Make any other changes you want to make to the script and save the changes.
Run the script.

Step 3 - Update the BigQuery views with the new data

Edit the following views to point at the tables created in Steps 2-4”. Do this by opening the view, going to the ‘Details’ tab, and clicking ‘Edit Query’. In the tab that opens, edit the query and select ‘Save View’. Back in the view, click ’Refresh’ at the top right:

university-of-ottawa.neuro_dashboard_data.dashboard_data_trials
university-of-ottawa.neuro_dashboard_data.dashboard_data_pubs
university-of-ottawa.neuro_dashboard_data.dashboard_data_orcid

Note: the *PubMed data extract from Step 1 is not used in the dashboard

Step 4 - Update the data connections in Looker Studio

In LookerStudio refresh the data connections to look at the new files:

Have the dashboard in edit mode and go to “Resource” > “Manage added data sources”
For each table, refresh the link by going “Edit” then “Edit Connection”
Check the correct view is still selected (it should not have changed) and click “Reconnect”
For “Apply connection changes” click “Yes”, then “Done”

Step 5 - QC ...

Check all dashboard pages that everything looks OK.

Step 6 - Refresh the data extract in Google Sheets

Refresh the data extract for the Publications output that is made available the linked Google Sheet, “Data”, “Data Connectors”, “Refresh Options”, “Refresh All”. Copy/paste the main publication SQL into the dashboard page too.

Step 7 - Back-up

Back-up the scripts to Github

Funding

The Biomedical institutional open science dashboard program was supported by a Wellcome Trust Open Research Fund (223828/Z/21/Z). The results of the Delphi in step one were published in PloS Biology.

Please contact, Dr. Kelly Cobey (kcobey@ottawaheart.ca), the Primary Investigator, with additional questions.

Internal project resources

The following sites require authentication to access

Jira ticket and GoogleDrive folder

Owner

Name: Curtin Open Knowledge Initiative
Login: Curtin-Open-Knowledge-Initiative
Kind: organization

Website: http://openknowledge.community
Repositories: 6
Profile: https://github.com/Curtin-Open-Knowledge-Initiative

Code and resources from the Curtin Open Knowledge Initiative

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Workflows for Biomedical Open Science Dashboards"
url: "https://github.com/Curtin-Open-Knowledge-Initiative/dashboard-queries-biomedical"
doi: 10.5281/zenodo.11648786
license: Apache-2.0 License
authors:
  - given-names: "Rebecca N"
    family-names: "Handcock"
  - given-names: "Kathryn R"
    family-names: "Napier"
  - given-names: "Cameron"
    family-names: "Neylon"

GitHub Events

Total

Watch event: 1
Delete event: 7
Push event: 50
Pull request review event: 1
Pull request event: 11
Create event: 6
Commit comment event: 1

Last Year

Watch event: 1
Delete event: 7
Push event: 50
Pull request review event: 1
Pull request event: 11
Create event: 6
Commit comment event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 5
Average time to close issues: N/A
Average time to close pull requests: 8 days
Total issue authors: 0
Total pull request authors: 2
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 5
Average time to close issues: N/A
Average time to close pull requests: 8 days
Issue authors: 0
Pull request authors: 2
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0