eucanscreenwp5_demo_aspire

Demo version for the 2025_02_06 meeting. The tools are shown with an example model and a simple R script.

https://github.com/iacs-biocomp/eucanscreenwp5_demo_aspire

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.6%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: iacs-biocomp
  • License: cc-by-4.0
  • Language: Python
  • Default Branch: main
  • Size: 115 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md


This project follows the structure built using the Common Data Model Builder, a tool that allows you to create common data models to facilitate interoperability and reproducibility of the analyses.

Your project title


Outputs

The structure and content of the outputs are described below, including the files and folders generated when creating a research project with the cdmb Python library. There are four main folders, plus several top-level files:

  • docs/CDM/
    • cdmb_config.json: Configuration file.
    • cohortdefinitioninclusion.csv: CSV file that defines the criteria (i.e., codes) for inclusion in a cohort.
    • cohortdefinitionexclusion.csv: CSV file that defines the criteria (i.e., codes) for exclusion from a cohort.
    • common_datamodel.xlsx: The definition of the common data model in Excel format.
    • entities/: Folder structure where, for each defined entity, the catalogs and the established validation rules are stored.
    • ER.gv, ER.gv.png: an Entity-Relationship Diagram of the entities included in the CDM.
    • synthetic-data/: Folder structure containing an automatically generated set of 1000 synthetic records per entity included in the CDM.
    • hashedfileslist.json: List of the files generated or used when generating the project, together with their MD5 hashes. This file must be kept hidden and should be used to verify that the analysis results were obtained from the original input files.
  • inputs/
    • data.duckdb: Database that temporarily stores the data provided by the user (synthetic data by default).
  • outputs/
    • (Default directory of all the outputs produced in the project execution)
  • src/
    • analysis-scripts/: Directory where the analysis scripts developed by the user are stored.
      • rreporttemplate.qmd: Quarto document with an example analysis, showing how to interact with the folder structure and files generated in the project.
      • _quarto.yml: File containing the metadata needed to execute Quarto documents.
    • check_load-scripts/
      • check_load.py: Script that maps the files provided by the user (./inputs) to the defined entities and loads them into inputs/data.duckdb. During loading, it checks that the variable names match and that the format/type of each variable matches the one set in the configuration.
      • inputs/: Auxiliary folder for the script 'check_load.py'.
    • dqa-scripts/
      • dqa.py: Default Data Quality Assessment script.
    • validation-scripts/
      • validator.py: Script that applies the validation rules to the data.
      • valididator_report.qmd: Quarto document that generates an HTML report from the results of 'validator.py'.
      • _quarto.yml: File containing the metadata needed to execute Quarto documents.
  • ro-crate-metadata.json: An accessible, practical and formal description of the project metadata, usable in a wide variety of situations, from an individual researcher working with a folder of data to large data-intensive computational research environments. For more information, visit RO-Crate.
  • mancontainerdeployment.md: The Data Science for Health Services and Policy Research group provides, in a GitHub repository, a solution for deploying the generated project. This step is optional.
  • LICENSE.md: Project license (CC BY 4.0 by default).
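The hashedfileslist.json entry above says the stored MD5 hashes should be used to confirm that results come from the original input files. The exact layout of that file is not documented here, so the following is only a minimal sketch that assumes a JSON object mapping relative file paths to MD5 hex digests; the function names `md5_of_file` and `cross_check` are hypothetical and not part of cdmb.

```python
import hashlib
import json
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cross_check(hash_list_path, base_dir="."):
    """Compare stored MD5 hashes against the files currently on disk.

    ASSUMPTION: the hash list is a JSON object {relative_path: md5_hex};
    the real hashedfileslist.json layout may differ.
    Returns a dict mapping each listed path to 'ok', 'changed' or 'missing'.
    """
    expected = json.loads(Path(hash_list_path).read_text())
    status = {}
    for rel_path, stored_md5 in expected.items():
        file_path = Path(base_dir) / rel_path
        if not file_path.is_file():
            status[rel_path] = "missing"
        elif md5_of_file(file_path) == stored_md5:
            status[rel_path] = "ok"
        else:
            status[rel_path] = "changed"
    return status
```

Under that assumption, `cross_check("docs/CDM/hashedfileslist.json", base_dir=".")` would report each listed file as ok, changed or missing.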
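check_load.py is described above as checking two things while loading user files: that the variable names match, and that each variable's format/type matches the configuration. The sketch below illustrates only those two checks on a CSV string using the standard library. The `EXPECTED_SCHEMA` dictionary, its type names and the `check_entity_csv` function are hypothetical; the real script reads its definitions from the project configuration and loads the data into inputs/data.duckdb, which this sketch does not do.

```python
import csv
import io
from datetime import date

# HYPOTHETICAL schema: the real configuration may describe variables differently.
EXPECTED_SCHEMA = {"patient_id": "integer", "sex": "string", "birth_date": "date"}

# Parsers used to test whether a raw CSV value matches the declared type.
PARSERS = {
    "integer": int,
    "float": float,
    "string": str,
    "date": date.fromisoformat,
}


def check_entity_csv(text, schema):
    """Return a list of problems found when comparing a CSV against a schema."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    found = set(reader.fieldnames or [])
    expected = set(schema)
    # Check 1: variable names must match the schema.
    for name in sorted(expected - found):
        problems.append(f"missing variable: {name}")
    for name in sorted(found - expected):
        problems.append(f"unexpected variable: {name}")
    # Check 2: each value must parse as its declared type.
    for row_number, row in enumerate(reader, start=2):  # line 1 is the header
        for name in sorted(expected & found):
            value = row[name]
            if not value:
                continue  # empty or absent cells are treated as missing, not as type errors
            try:
                PARSERS[schema[name]](value)
            except ValueError:
                problems.append(f"line {row_number}: {name}={value!r} is not {schema[name]}")
    return problems
```

Collecting every problem instead of stopping at the first one makes it easier to fix an input file in a single pass.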
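The cohortdefinitioninclusion.csv and cohortdefinitionexclusion.csv files are described above as lists of codes defining who enters a cohort. As an illustration only, this sketch assumes each file has a column named `code` with one code per row, and that each record carries a list of codes; the column name, record layout and functions `load_codes` and `select_cohort` are hypothetical and not taken from cdmb.

```python
import csv
import io


def load_codes(text, column="code"):
    """Read a criteria CSV and return its non-empty codes as a set.

    ASSUMPTION: codes are stored in a column named "code".
    """
    reader = csv.DictReader(io.StringIO(text))
    return {row[column].strip() for row in reader if row[column] and row[column].strip()}


def select_cohort(records, inclusion_codes, exclusion_codes):
    """Keep records with at least one inclusion code and no exclusion code."""
    cohort = []
    for record in records:
        codes = set(record["codes"])
        if (codes & inclusion_codes) and not (codes & exclusion_codes):
            cohort.append(record["patient_id"])
    return cohort
```

Exclusion takes precedence here: a record that matches both lists is left out, which is a common convention but an assumption for this project.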

Requirements/Dependencies

Note that dependencies may vary depending on user modifications!

R dependencies

Version of R used: 4.1

Version of Quarto used: 1.1.149

| library | version | link |
|------------|---------|-----------------------------------------------------------------------------------------|
| DuckDB | 0.8.1 | https://duckdb.org/ |
| jsonlite | 1.8.7 | https://cran.r-project.org/web/packages/jsonlite/index.html |
| kableExtra | 1.3.4 | https://cran.r-project.org/web/packages/kableExtra/vignettes/awesometablein_html.html |
| Hmisc | 4.7.1 | https://cran.r-project.org/package=Hmisc |

Python dependencies

Version of Python used: 3.8

| library | version | link |
|-----------------|---------|---------------------------------------------------------|
| pandas | 1.3.4 | https://pandas.pydata.org/ |
| DuckDB | 0.8.1 | https://duckdb.org/ |
| ydata_profiling | 4.1.2 | https://ydata-profiling.ydata.ai/docs/master/index.html |

Authoring

| Surname, name | Affiliation | ORCID |
|---------------|-------------|-------|

Previous version(s):

How to contribute

  • Repository: https://github.com/youruser/yourrepository/
  • Issue tracker: https://github.com/youruser/yourrepository/issues

References

  • Data Science for Health Services and Policy Research group: https://cienciadedatosysalud.org/en/
  • Common Data Model Builder library: https://github.com/cienciadedatosysalud/cdmb
  • Analytic Software Pipeline Interface for Reproducible Execution (ASPIRE): https://github.com/cienciadedatosysalud/ASPIRE
  • Atlas VPM community in Zenodo: https://zenodo.org/communities/atlasvpm
  • Research Object Crate (RO-Crate): https://www.researchobject.org/ro-crate/
  • ORCID: https://orcid.org/

License: CC-BY 4.0

Owner

  • Login: iacs-biocomp
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below:"
title: "eucc_cdm"
version: "0.1.0"
date-released: "2025-02-21" 

url: ""
repository-code: "https://github.com/user/repository"
license: "CC BY 4.0 https://creativecommons.org/licenses/by/4.0/"
    

GitHub Events

Total
  • Release event: 4
  • Delete event: 3
  • Push event: 3
  • Create event: 7
Last Year
  • Release event: 4
  • Delete event: 3
  • Push event: 3
  • Create event: 7

Dependencies

.github/workflows/docker-publish.yml actions
  • actions/checkout v4 composite
  • docker/build-push-action v5 composite
  • docker/login-action v2 composite
  • docker/metadata-action v4 composite
Dockerfile docker
  • ghcr.io/cienciadedatosysalud/aspire latest build