https://github.com/alan-turing-institute/cvd-net-wrangle

Data Wrangling pipeline for the CVD-Net project (Network of Cardiovascular Digital Twins)


Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: alan-turing-institute
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 7.32 MB
Statistics
  • Stars: 3
  • Watchers: 6
  • Forks: 0
  • Open Issues: 7
  • Releases: 0
Created over 1 year ago · Last pushed 12 months ago
Metadata Files
Readme License

README.md

CVD-Net Data Wrangling

CVD-Net (Network of Cardiovascular Digital Twins) is a multi-year collaboration between The Alan Turing Institute, Imperial College London, University of Nottingham, and University of Sheffield. Its aim is to create digital twins of the hearts of a group of patients suffering from pulmonary arterial hypertension and to demonstrate their use in a clinical care pathway.

This work is supported by the UK Engineering and Physical Sciences Research Council (EPSRC) Grant EP/Z531297/1.

This repository contains outputs from Work Package 2 of 6 ("Digital Tapestry and Infrastructure").

This repository is a work in progress, currently containing a draft data wrangling pipeline. We are sharing this code base early to promote collaboration and enhance transparency.

Navigate the directories

  • dummy data: Data dictionaries for ASPIRE and FIT-PH (based on our best guess of what the variables are, pending confirmation from the dataset provider), and scripts to generate dummy datasets for both.
  • headers: Blank header files for ASPIRE and FIT-PH showing the structure of the raw data.
  • pipeline: The codebase for the data wrangling pipeline. This contains a PostgreSQL database schema and a Python codebase for validating data, loading data and inserting it into the database.
    • Extensive documentation gives a detailed overview of how the pipeline works, how it is run, and why certain decisions were made.
    • An interactive tutorial notebook shows how the pipeline is run from start to finish.
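
The validate, load and insert steps described for the pipeline can be illustrated with a minimal sketch. This is not the repository's code: the column names, the `measurement` table, and the helper functions are all hypothetical, and the standard-library `sqlite3` module stands in for PostgreSQL so that the example runs without a database server.

```python
import csv
import io
import sqlite3

# Hypothetical column names; the real ASPIRE and FIT-PH headers are not shown here.
EXPECTED_HEADERS = ["patient_id", "visit_date", "mpap_mmhg"]


def validate_headers(fieldnames, expected=EXPECTED_HEADERS):
    """Return a list of problems found in a CSV header row (empty if none)."""
    fieldnames = list(fieldnames or [])
    problems = [f"missing column: {c}" for c in expected if c not in fieldnames]
    problems += [f"unexpected column: {c}" for c in fieldnames if c not in expected]
    return problems


def load_rows(csv_text):
    """Parse CSV text, validate its header, and return the rows as dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = validate_headers(reader.fieldnames)
    if problems:
        raise ValueError("; ".join(problems))
    return list(reader)


def insert_rows(conn, rows):
    """Insert validated rows into a (hypothetical) table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurement ("
        "patient_id TEXT NOT NULL, visit_date TEXT NOT NULL, mpap_mmhg REAL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO measurement VALUES (:patient_id, :visit_date, :mpap_mmhg)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0]


# Made-up sample data for illustration only.
SAMPLE_CSV = (
    "patient_id,visit_date,mpap_mmhg\n"
    "P001,2024-01-15,42.5\n"
    "P002,2024-02-03,38.0\n"
)

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(insert_rows(connection, load_rows(SAMPLE_CSV)))
```

Validating the header before any row is inserted means a malformed file is rejected as a whole rather than partially loaded. With a real PostgreSQL database the same structure would apply, using a PostgreSQL driver and the schema shipped in the pipeline directory.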

These files are currently blank (but are needed for the pipeline to run):

  • aspiredtdataframeheaders.csv
  • fitphlinqcmdataframeheaders.csv
  • ASPIREdictionaryto_template.csv
  • FIT-PHdictionaryto_template.csv
  • generateASPIREdummy_data.py
  • generateFIT-PHdummy_data.py
  • transformrawdata.py
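
Since the dummy-data generation scripts are blank in this repository, the following is only a sketch of how a dummy dataset could be produced from a data dictionary. The dictionary layout, variable names, value ranges, and function names are all assumptions made for illustration; they do not reflect the real ASPIRE or FIT-PH dictionaries.

```python
import csv
import io
import random

# Hypothetical data dictionary: variable name -> (type, range or categories).
DATA_DICTIONARY = {
    "patient_id": ("id", None),
    "age_years": ("int", (18, 90)),
    "who_functional_class": ("category", ["I", "II", "III", "IV"]),
    "mpap_mmhg": ("float", (20.0, 80.0)),
}


def dummy_value(kind, spec, row_index, rng):
    """Draw one synthetic value for a variable of the given type."""
    if kind == "id":
        return f"P{row_index:04d}"
    if kind == "int":
        return rng.randint(spec[0], spec[1])
    if kind == "float":
        return round(rng.uniform(spec[0], spec[1]), 1)
    if kind == "category":
        return rng.choice(spec)
    raise ValueError(f"unknown variable type: {kind}")


def generate_dummy_rows(dictionary, n_rows, seed=0):
    """Generate n_rows synthetic records; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    return [
        {name: dummy_value(kind, spec, i, rng) for name, (kind, spec) in dictionary.items()}
        for i in range(1, n_rows + 1)
    ]


def to_csv(rows):
    """Serialise the rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(generate_dummy_rows(DATA_DICTIONARY, n_rows=3)))
```

Values are drawn independently for each variable, so the output preserves only the structure and plausible ranges of a dataset, not any clinical relationships between variables.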

Contributors

Past Contributors

  • Daniel Delbarre
    • Led the planning and implementation of the first working pipeline, based on dummy data for ASPIRE and FIT-PH, and thoroughly documented both the design decisions behind the code base and how to use it. Designed the pipeline so that further datasets can be added at later stages, once transformed to the harmonised format. Took into account the needs of both database developers and users (researchers using the data for model development).

Current Contributors

  • Rachael Stickland (until 6th March 2025)
  • May Young
  • Mahwish Mohammad
  • Luis Santos

Contact

For any questions, contact Camila Rangel Smith (crangelsmith@turing.ac.uk) and/or Mahwish Mohammad (mmohammad@turing.ac.uk), or feel free to submit a GitHub Issue.

License

This project is licensed under the GNU General Public License v3.0; see the LICENSE file for details.
For more information, refer to the GNU General Public License.

Owner

  • Name: The Alan Turing Institute
  • Login: alan-turing-institute
  • Kind: organization
  • Email: info@turing.ac.uk

The UK's national institute for data science and artificial intelligence.

GitHub Events

Total
  • Issues event: 13
  • Watch event: 3
  • Issue comment event: 20
  • Member event: 2
  • Push event: 18
  • Pull request event: 2
  • Create event: 5
Last Year
  • Issues event: 13
  • Watch event: 3
  • Issue comment event: 20
  • Member event: 2
  • Push event: 18
  • Pull request event: 2
  • Create event: 5

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 10
  • Total pull requests: 2
  • Average time to close issues: 28 days
  • Average time to close pull requests: 1 minute
  • Total issue authors: 4
  • Total pull request authors: 1
  • Average comments per issue: 0.4
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 10
  • Pull requests: 2
  • Average time to close issues: 28 days
  • Average time to close pull requests: 1 minute
  • Issue authors: 4
  • Pull request authors: 1
  • Average comments per issue: 0.4
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Rainiefantasy (4)
  • myyong (4)
  • H-Sax (1)
  • LevanBokeria (1)
Pull Request Authors
  • RayStick (2)
Top Labels
Issue Labels
  • question (2)
  • documentation (2)
  • communication (1)
Pull Request Labels