MasterThesis

Master Thesis

https://github.com/laurajochim/MasterThesis

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: laurajochim
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 36.8 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

Methodology for generating and evaluating realistic synthetic healthcare data

This repository contains the code, data, and documentation for my Master's Thesis on generating and evaluating realistic synthetic healthcare datasets. It demonstrates a reproducible pipeline for:

  • Data Cleaning: preparing clinical (EHR) and proteomic (TCGA) datasets.
  • Synthetic Data Generation: creating synthetic patient records using multiple methods (synthpop, vine copula, ctGAN).
  • Evaluation: comparing univariate distributions and other metrics between real and synthetic data.
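
The evaluation step above compares univariate distributions between real and synthetic data. The README does not say which metrics the thesis uses, so the following is only a minimal sketch under one assumption: that a two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) is an acceptable per-column measure. The function names `ks_statistic` and `compare_columns`, the column name `age`, and the sample values are all hypothetical and are not taken from the repository.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic for one numeric column.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical ECDFs, 1.0 means no overlap.
    """
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    n, m = len(real_sorted), len(synth_sorted)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # Both ECDFs are step functions that only change at observed values,
    # so it is enough to evaluate the gap at every observed value.
    for x in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, x) / n
        cdf_synth = bisect_right(synth_sorted, x) / m
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


def compare_columns(real_table, synthetic_table):
    """Map each column shared by both tables to its KS statistic.

    Tables are plain dicts of column name -> list of numbers
    (a hypothetical stand-in for a data frame).
    """
    return {
        col: ks_statistic(real_table[col], synthetic_table[col])
        for col in real_table
        if col in synthetic_table
    }


if __name__ == "__main__":
    # Made-up illustrative values, not data from the thesis.
    real = {"age": [34, 51, 67, 45]}
    synthetic = {"age": [36, 49, 70, 44]}
    print(compare_columns(real, synthetic))
```

A lower value suggests the synthetic column tracks the real column more closely. This sketch only handles numeric columns; categorical variables would need a different comparison, such as differences in category frequencies.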

The research was approved by the Ethics Committee of the Faculty of Social and Behavioural Sciences of Utrecht University:

FETC: 24-2032

This research archive will remain accessible on GitHub indefinitely. I am responsible for the archive. If you have any questions, feel free to contact me at l.jochim@students.uu.nl

Owner

  • Login: laurajochim
  • Kind: user
  • Location: Netherlands

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Jochim
  given-names: Laura
  orcid: https://orcid.org/0009-0001-4577-4152
title: "laurajochim/Methodology-for-generating-and-evaluating-realistic-synthetic-healthcare-data: First thesis release"
version: v0.1.0
date-released: 2025-05-10

Dependencies

environment.yml pypi
  • absl-py ==2.1.0
  • astunparse ==1.6.3
  • boto3 ==1.37.18
  • botocore ==1.37.18
  • contourpy ==1.3.1
  • cycler ==0.12.1
  • faker ==35.2.0
  • flatbuffers ==25.1.24
  • fonttools ==4.56.0
  • fsspec ==2024.12.0
  • gast ==0.6.0
  • google-pasta ==0.2.0
  • grpcio ==1.70.0
  • h5py ==3.12.1
  • keras ==3.8.0
  • kiwisolver ==1.4.8
  • libclang ==18.1.1
  • markdown ==3.7
  • markdown-it-py ==3.0.0
  • matplotlib ==3.10.0
  • mdurl ==0.1.2
  • ml-dtypes ==0.4.1
  • namex ==0.0.8
  • narwhals ==1.26.0
  • numpy ==2.2.4
  • opt-einsum ==3.4.0
  • optree ==0.14.0
  • plotly ==6.0.0
  • protobuf ==5.29.3
  • pyparsing ==3.2.1
  • rich ==13.9.4
  • seaborn ==0.13.2
  • sympy ==1.13.1
  • tensorboard ==2.18.0
  • tensorboard-data-server ==0.7.2
  • termcolor ==2.5.0
  • threadpoolctl ==3.5.0
  • tzdata ==2025.1
  • werkzeug ==3.1.3
  • wrapt ==1.17.2