Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low (4.7%)
Repository
Master Thesis
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Methodology for generating and evaluating realistic synthetic healthcare data
This repository contains the code, data, and documentation for my Master's Thesis on generating and evaluating realistic synthetic healthcare datasets. It demonstrates a reproducible pipeline for:
- Data cleaning: preparing clinical (EHR) and proteomic (TCGA) datasets.
- Synthetic data generation: creating synthetic patient records with several methods (synthpop, vine copula, CTGAN).
- Evaluation: comparing univariate distributions and other metrics between the real and synthetic data.
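The README does not specify which metrics are used to compare univariate distributions. As an illustration only, the sketch below assumes two common choices: the two-sample Kolmogorov-Smirnov (KS) statistic for numeric columns and the total variation distance (TVD) for categorical columns. Both return 0 when the real and synthetic marginals are identical. The function names and the example column values are hypothetical and are not taken from the thesis code.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: the largest absolute gap between the
    empirical CDFs of two numeric samples (0 = identical marginals)."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    n, m = len(real_sorted), len(synth_sorted)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The ECDF difference only changes at observed values, so checking
    # every pooled value is enough to find the maximum gap.
    for x in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, x) / n
        cdf_synth = bisect_right(synth_sorted, x) / m
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


def total_variation_distance(real, synthetic):
    """Half the L1 distance between category frequencies (0 = identical)."""
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    real_freq, synth_freq = Counter(real), Counter(synthetic)
    n, m = len(real), len(synthetic)
    categories = set(real_freq) | set(synth_freq)
    # Counter returns 0 for categories missing from one of the samples.
    return 0.5 * sum(abs(real_freq[c] / n - synth_freq[c] / m) for c in categories)


if __name__ == "__main__":
    # Hypothetical columns from a real and a synthetic patient table.
    real_age = [34, 45, 51, 62, 70]
    synth_age = [33, 47, 50, 65, 71]
    real_sex = ["F", "M", "F", "M"]
    synth_sex = ["F", "F", "F", "M"]
    print("KS (age):", ks_statistic(real_age, synth_age))
    print("TVD (sex):", total_variation_distance(real_sex, synth_sex))
```

Scores close to 0 indicate that a generator reproduces a variable's marginal distribution well. In practice one would compute these per column and compare them across synthpop, vine copula, and CTGAN outputs; these statistics say nothing about relationships between variables, which need separate multivariate metrics.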
The study was approved by the Ethics Committee of the Faculty of Social and Behavioural Sciences of Utrecht University (FETC reference: 24-2032).
This archive will remain accessible on GitHub indefinitely. I am responsible for this research archive; if you have any questions, feel free to contact me at l.jochim@students.uu.nl.
Owner
- Login: laurajochim
- Kind: user
- Location: Netherlands
- Repositories: 1
- Profile: https://github.com/laurajochim
Citation (CITATION.cff)
cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Jochim
    given-names: Laura
    orcid: https://orcid.org/0009-0001-4577-4152
title: "laurajochim/Methodology-for-generating-and-evaluating-realistic-synthetic-healthcare-data: First thesis release"
version: v0.1.0
date-released: 2025-05-10
Dependencies
- absl-py ==2.1.0
- astunparse ==1.6.3
- boto3 ==1.37.18
- botocore ==1.37.18
- contourpy ==1.3.1
- cycler ==0.12.1
- faker ==35.2.0
- flatbuffers ==25.1.24
- fonttools ==4.56.0
- fsspec ==2024.12.0
- gast ==0.6.0
- google-pasta ==0.2.0
- grpcio ==1.70.0
- h5py ==3.12.1
- keras ==3.8.0
- kiwisolver ==1.4.8
- libclang ==18.1.1
- markdown ==3.7
- markdown-it-py ==3.0.0
- matplotlib ==3.10.0
- mdurl ==0.1.2
- ml-dtypes ==0.4.1
- namex ==0.0.8
- narwhals ==1.26.0
- numpy ==2.2.4
- opt-einsum ==3.4.0
- optree ==0.14.0
- plotly ==6.0.0
- protobuf ==5.29.3
- pyparsing ==3.2.1
- rich ==13.9.4
- seaborn ==0.13.2
- sympy ==1.13.1
- tensorboard ==2.18.0
- tensorboard-data-server ==0.7.2
- termcolor ==2.5.0
- threadpoolctl ==3.5.0
- tzdata ==2025.1
- werkzeug ==3.1.3
- wrapt ==1.17.2