MasterThesis

Master Thesis

https://github.com/laurajochim/MasterThesis

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (4.7%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: laurajochim
License: gpl-3.0
Language: R
Default Branch: main
Homepage:
Size: 36.8 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

README.md

Methodology for generating and evaluating realistic synthetic healthcare data

This repository contains the code, data, and documentation for my Master's Thesis on generating and evaluating realistic synthetic healthcare datasets. It demonstrates a reproducible pipeline for:

Data Cleaning — Preparing clinical (EHR) and proteomic (TCGA) datasets.

Synthetic Data Generation — Creating synthetic patient records using multiple methods (synthpop, vine copula, ctGAN).

Evaluation — Comparing univariate distributions and other metrics between real and synthetic data.
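As an illustration of the evaluation step, a univariate comparison between a real and a synthetic variable could be sketched as below. The README does not say which metrics the thesis uses, so the two-sample Kolmogorov-Smirnov statistic here is an assumption, and the function name `ks_statistic` and the simulated "age" columns are hypothetical stand-ins rather than the repository's code or data.

```python
import numpy as np


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0 means identical distributions, 1 means no overlap.
    """
    real = np.sort(np.asarray(real, dtype=float))
    synthetic = np.sort(np.asarray(synthetic, dtype=float))
    # Evaluate both empirical CDFs at every observed value.
    grid = np.concatenate([real, synthetic])
    cdf_real = np.searchsorted(real, grid, side="right") / real.size
    cdf_synth = np.searchsorted(synthetic, grid, side="right") / synthetic.size
    return float(np.max(np.abs(cdf_real - cdf_synth)))


# Hypothetical stand-in data (NOT from the repository): one "real" column
# and two synthetic versions of it, one faithful and one distorted.
rng = np.random.default_rng(0)
real_age = rng.normal(60, 12, size=500)
good_synth = rng.normal(60, 12, size=500)
poor_synth = rng.normal(75, 5, size=500)

ks_good = ks_statistic(real_age, good_synth)
ks_poor = ks_statistic(real_age, poor_synth)
print(f"KS (faithful synthetic):  {ks_good:.3f}")
print(f"KS (distorted synthetic): {ks_poor:.3f}")
```

A lower statistic indicates that the synthetic variable tracks the real one more closely. In practice such a check would be repeated per variable and per generation method.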

This research was approved by the Ethics Committee of the Faculty of Social and Behavioural Sciences of Utrecht University:

FETC: 24-2032

This archive will remain accessible on GitHub indefinitely. I am responsible for the research archive. For questions, contact me at l.jochim@students.uu.nl

Owner

Login: laurajochim
Kind: user
Location: Netherlands

Repositories: 1
Profile: https://github.com/laurajochim

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Jochim
  given-names: Laura
  orcid: https://orcid.org/0009-0001-4577-4152
title: "laurajochim/Methodology-for-generating-and-evaluating-realistic-synthetic-healthcare-data: First thesis release"
version: v0.1.0
date-released: 2025-05-10

Dependencies

environment.yml pypi

absl-py ==2.1.0
astunparse ==1.6.3
boto3 ==1.37.18
botocore ==1.37.18
contourpy ==1.3.1
cycler ==0.12.1
faker ==35.2.0
flatbuffers ==25.1.24
fonttools ==4.56.0
fsspec ==2024.12.0
gast ==0.6.0
google-pasta ==0.2.0
grpcio ==1.70.0
h5py ==3.12.1
keras ==3.8.0
kiwisolver ==1.4.8
libclang ==18.1.1
markdown ==3.7
markdown-it-py ==3.0.0
matplotlib ==3.10.0
mdurl ==0.1.2
ml-dtypes ==0.4.1
namex ==0.0.8
narwhals ==1.26.0
numpy ==2.2.4
opt-einsum ==3.4.0
optree ==0.14.0
plotly ==6.0.0
protobuf ==5.29.3
pyparsing ==3.2.1
rich ==13.9.4
seaborn ==0.13.2
sympy ==1.13.1
tensorboard ==2.18.0
tensorboard-data-server ==0.7.2
termcolor ==2.5.0
threadpoolctl ==3.5.0
tzdata ==2025.1
werkzeug ==3.1.3
wrapt ==1.17.2
