cvddqchecker

CvdDqChecker: A Software Solution for Explainable and Traceable Assessments of Cardiovascular Disease Data Quality

https://github.com/kaistahar/cvddqchecker

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary

Keywords

cardiovascular-diseases data-quality data-quality-assessment
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: KaisTahar
  • License: gpl-3.0
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 512 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
cardiovascular-diseases data-quality data-quality-assessment
Created 12 months ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

CvdDqChecker: A Software Solution for Explainable and Traceable Assessments of Cardiovascular Disease Data Quality

1. Description

This repository provides a set of metrics and harmonized methods for assessing the quality of cardiovascular disease (CVD) data. The developed software CvdDqChecker enables explainable and traceable data quality (DQ) assessments. Specifically, the generated reports provide detailed information to explain the detected DQ issues and help users trace them back to their sources and underlying causes. CvdDqChecker also enables the detection and visualization of plausibility issues based on predefined logical and mathematical rules. To improve usability, CvdDqChecker allows users to specify the DQ rules using spreadsheets. The current version was validated using synthetic and real-world data on CVDs. Exemplary DQ reports and visualizations are available in section 4.
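To make the rule-based plausibility checks more concrete, here is a minimal Python sketch of how range rules (yielding outliers) and logical rules (yielding contradictions) could be evaluated on a single patient record. It is not CvdDqChecker's R implementation. The RangeRule and LogicalRule classes, the item names, and the thresholds are hypothetical illustrations, not the contents of dq_rules.xlsx, and the thresholds are not clinical reference ranges.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A patient record maps item names to values; None marks a missing value.
Record = Dict[str, Optional[float]]


@dataclass
class RangeRule:
    """Hypothetical range rule: a value outside [min_value, max_value] is an outlier."""
    item: str
    min_value: float
    max_value: float


@dataclass
class LogicalRule:
    """Hypothetical logical rule: `check` must hold when all referenced items are present."""
    rule_id: str
    items: List[str]
    check: Callable[[Record], bool]


def find_outliers(record: Record, rules: List[RangeRule]) -> List[str]:
    """Return the items whose non-missing values violate their range rule."""
    issues = []
    for rule in rules:
        value = record.get(rule.item)
        if value is not None and not (rule.min_value <= value <= rule.max_value):
            issues.append(rule.item)
    return issues


def find_contradictions(record: Record, rules: List[LogicalRule]) -> List[str]:
    """Return ids of violated logical rules; rules with missing inputs are skipped."""
    issues = []
    for rule in rules:
        inputs_present = all(record.get(item) is not None for item in rule.items)
        if inputs_present and not rule.check(record):
            issues.append(rule.rule_id)
    return issues


# Hypothetical rules and record, for illustration only.
range_rules = [RangeRule("heart_rate", 20, 250), RangeRule("ejection_fraction", 5, 80)]
logical_rules = [
    LogicalRule("sbp_gt_dbp", ["systolic_bp", "diastolic_bp"],
                lambda r: r["systolic_bp"] > r["diastolic_bp"]),
]

patient = {"heart_rate": 300, "ejection_fraction": 55, "systolic_bp": 70, "diastolic_bp": 90}
print(find_outliers(patient, range_rules))          # ['heart_rate']
print(find_contradictions(patient, logical_rules))  # ['sbp_gt_dbp']
```

Skipping logical rules whose inputs are missing keeps missing data from being reported twice, once as a missing value and again as a contradiction. This is a design assumption of the sketch.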

2. Local Execution

To conduct local DQ assessments, please follow these instructions:

  1. Clone the repository and check out the master branch

    • Run the git command: git clone --branch master https://github.com/KaisTahar/cvdDqChecker
  2. Edit the file config.yaml with your local configuration parameters (p1...p7)

    • Define your study name (p1) and organization name (p2)
    • Set the data input path (p3). This parameter specifies which data set should be imported. By default, this path is set as follows:
      dataPath: "./data/medData/syntheticData.csv"
    • Define the list of codes that represent missing data values (p4). By default, this parameter is defined as follows:
      missingCode: !expr list (NA, "NULL", "")
    • Set the path to the spreadsheet file containing the DQ rules (p5). By default, this path is set as follows:
      rulePath: "./data/refData/dq_rules.xlsx"
    • Set the path to the metadata and semantic annotations (p6). By default, this parameter is set as follows:
      domainMetadata: "./data/refData/domain_metadata.xlsx"
    • Specify the export path for the generated visualizations and DQ reports (p7). By default, the export path is set as follows:
      exportPath: "./data/export"
  3. Once the local configuration parameters are defined, you can run CvdDqChecker either in RStudio or with Docker. To avoid local dependency issues, simply run the command sudo docker-compose up, which installs the required packages and runs the DQ assessment.

  4. As a result, CvdDqChecker generates visualizations of detected outliers and contradictions, as well as an Excel file containing reports on DQ metrics, detected DQ issues, and the DQ rules applied. The generated visualizations and DQ reports are saved in the export folder (p7), ./data/export by default.
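The missing-data code list (p4) determines which raw entries count as missing. The following Python sketch shows one way such a list could be applied. It assumes the data have already been read into a list of row dictionaries. Python's None stands in for R's NA, and is_missing and count_missing are hypothetical helpers, not functions of CvdDqChecker or dqLib.

```python
import math
from typing import Any, Dict, Iterable, List, Sequence

# Mirrors the default p4 list NA, "NULL", "" (Python None stands in for R's NA).
DEFAULT_MISSING_CODES = (None, "NULL", "")


def is_missing(value: Any, missing_codes: Sequence[Any] = DEFAULT_MISSING_CODES) -> bool:
    """Return True if the value matches one of the configured missing-data codes."""
    if isinstance(value, float) and math.isnan(value):
        # NaN, as produced by many CSV readers, is treated like NA.
        return None in missing_codes
    return value in missing_codes


def count_missing(rows: List[Dict[str, Any]], items: Iterable[str],
                  missing_codes: Sequence[Any] = DEFAULT_MISSING_CODES) -> Dict[str, int]:
    """Count missing values per item. An item absent from a row yields None,
    so it counts as missing whenever None (NA) is among the codes."""
    return {
        item: sum(1 for row in rows if is_missing(row.get(item), missing_codes))
        for item in items
    }


rows = [
    {"age": 54, "sex": "NULL"},
    {"age": "", "sex": "f"},
    {"age": None, "sex": "m"},
]
print(count_missing(rows, ["age", "sex"]))  # {'age': 2, 'sex': 1}
```

With this approach, a site-specific code such as a placeholder number could be added to the list without changing the counting logic.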

3. DQ Metrics and Reports

CvdDqChecker uses the data quality library dqLib, an R package, to report on DQ issues and metrics. dqLib provides multiple metrics for assessing different DQ aspects. Appropriate indicators were selected from this library to generate specific DQ reports. The following generic indicators were employed in this study:


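| Abbreviation | DQ Indicator | DQ Dimension |
|--------------|----------------------------|--------------|
| dqicoicr | Item Completeness Rate | Completeness |
| dqicovcr | Value Completeness Rate | Completeness |
| dqiplrpr | Range Plausibility Rate | Plausibility |
| dqiplspr | Semantic Plausibility Rate | Plausibility |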
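Each indicator can be read as the rate of checked units that are free of the corresponding issue. The Python sketch below assumes that reading (rate = 1 - issues / checked units). It also assumes the denominators named in the comments. dqLib's exact definitions may differ, so treat this as an illustration rather than a reimplementation of dqLib 1.32.1. The DqCounts container and the numbers in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DqCounts:
    """Hypothetical container for the counts behind the four indicators."""
    mandatory_items: int          # mandatory data items expected in the data set
    missing_items: int            # mandatory items missing from the data set (immisg)
    mandatory_values: int         # mandatory values expected across all records
    missing_values: int           # mandatory values that are missing (vmmisg)
    range_checked_values: int     # values covered by range rules
    outliers: int                 # values outside their permitted range (vo)
    semantic_checked_values: int  # values covered by logical rules
    contradictions: int           # contradictory values (vc)


def rate(issues: int, checked: int) -> Optional[float]:
    """Share of checked units without issues; None if nothing was checked."""
    if checked == 0:
        return None
    return 1 - issues / checked


def indicators(c: DqCounts) -> Dict[str, Optional[float]]:
    """Map each indicator abbreviation to its (assumed) rate."""
    return {
        "dqicoicr": rate(c.missing_items, c.mandatory_items),
        "dqicovcr": rate(c.missing_values, c.mandatory_values),
        "dqiplrpr": rate(c.outliers, c.range_checked_values),
        "dqiplspr": rate(c.contradictions, c.semantic_checked_values),
    }


# Illustrative counts only, not results from real data.
counts = DqCounts(10, 1, 1000, 50, 400, 8, 200, 5)
print({name: round(value, 3) for name, value in indicators(counts).items()})
# {'dqicoicr': 0.9, 'dqicovcr': 0.95, 'dqiplrpr': 0.98, 'dqiplspr': 0.975}
```

Returning None rather than 1.0 when nothing was checked avoids reporting perfect quality for a dimension that was never assessed.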


In addition to DQ indicators, the reports include the resulting DQ parameters and provide adequate information to help users address the detected DQ issues. CvdDqChecker enables reporting on the following DQ issues and associated parameters:

| Abbreviation | DQ Parameter | Description |
|--------------|-------------------------------|----------------------------------------------|
| vo | outlier values | number of detected outlier values |
| vc | contradictory values | number of detected contradictory data values |
| immisg | missing mandatory data items | number of missing mandatory data items |
| vmmisg | missing mandatory data values | number of missing mandatory data values |
| patdqiss | patient records with DQ issues | number of patient records with DQ issues |
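The parameters above are counts. The Python code below tallies vo, vc, and vmmisg and derives patdqiss as the number of distinct patients with at least one issue. It assumes that detected issues are collected as one record per issue, each carrying a patient identifier and an issue type. immisg is computed separately, assuming a mandatory item counts as missing when its column is absent from the data set. The record format, item names, and function names are hypothetical, not CvdDqChecker's internal representation.

```python
from typing import Dict, Iterable, List, Set

ISSUE_TYPES = ("vo", "vc", "vmmisg")


def summarize_issues(issues: List[Dict[str, str]]) -> Dict[str, int]:
    """Tally record-level issues by type and count patients with at least one issue.

    Each issue is a dict with hypothetical keys 'patient_id' and 'type',
    where 'type' is one of ISSUE_TYPES; other types raise a ValueError.
    """
    summary = {issue_type: 0 for issue_type in ISSUE_TYPES}
    patients: Set[str] = set()
    for issue in issues:
        if issue["type"] not in ISSUE_TYPES:
            raise ValueError(f"unknown issue type: {issue['type']}")
        summary[issue["type"]] += 1
        patients.add(issue["patient_id"])
    summary["patdqiss"] = len(patients)  # distinct patients, not number of issues
    return summary


def count_missing_items(columns: Iterable[str], mandatory_items: Iterable[str]) -> int:
    """immisg, assuming an item is missing when its column is absent."""
    return len(set(mandatory_items) - set(columns))


issues = [
    {"patient_id": "P1", "type": "vo"},
    {"patient_id": "P1", "type": "vc"},
    {"patient_id": "P2", "type": "vmmisg"},
    {"patient_id": "P3", "type": "vo"},
]
print(summarize_issues(issues))  # {'vo': 2, 'vc': 1, 'vmmisg': 1, 'patdqiss': 3}
print(count_missing_items(["patient_id", "age", "sex"], ["age", "sex", "nyha_class"]))  # 1
```

Counting distinct patient identifiers ensures that a patient with several issues, such as P1 above, contributes only once to patdqiss.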

4. Examples

  • Exemplary reports on detected outliers and contradictions can be found in the ./data/export folder
  • The ./data/export folder also contains exemplary visualizations generated using synthetic data

5. Notes

  • The developed software CvdDqChecker is compatible with dqLib 1.32.1. To install all required packages, please use the script installPackages.R located in the folder ./R or just run the command sudo docker-compose up. This command will install the necessary packages and run the DQ assessment software.

  • To cite CvdDqChecker, please use the citation file CITATION.cff

Owner

  • Login: KaisTahar
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this tool, please cite it as below."
authors:
- family-names: "Tahar"
  given-names: "Kais"
  orcid: "https://orcid.org/0000-0001-9683-0575"
title: "CvdDqChecker: A Set of Metrics and Methods for Harmonized Quality Assessments on Cardiovascular Disease Data"
year: 2025
url: "https://github.com/KaisTahar/cvdDqChecker"

GitHub Events

Total
  • Delete event: 4
  • Push event: 17
  • Public event: 1
  • Create event: 2
Last Year
  • Delete event: 4
  • Push event: 17
  • Public event: 1
  • Create event: 2