https://github.com/clevelandclinicqhs/neocare_documentation

Repository

Basic Info
  • Host: GitHub
  • Owner: ClevelandClinicQHS
  • Default Branch: master
  • Size: 194 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 6 years ago · Last pushed over 5 years ago
Metadata Files
Readme

README.md

Data_Dictionary.xlsx is the data dictionary for the NEOCARE database.

Each .docx file here documents one part of the process of setting up the NEOCARE registry. The general steps are as follows:

  1. Initial raw EHR data extraction at CCF and MH.

This step is not documented in any of these files, but the queries used and other details can be obtained from certain members of the NEOCARE team.

  2. Deduplication

We use this term for determining which CCF and MH records correspond to the same individual, so that each patient's data can be linked across the two institutions.

Together, deduplicationccfside.docx and deduplicationmhside.docx explain this multistage process. They describe the actions taken on CCF's and MH's systems, respectively.
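The idea of linking records across institutions can be sketched as follows. This is only an illustration and not the NEOCARE team's actual protocol, which is described in the .docx files: the field names (`ccf_id`, `mh_id`, `last_name`, `first_name`, `dob`) and the exact-match rule on normalized name and date of birth are assumptions made for the example.

```python
from itertools import count


def normalize(record):
    """Build a comparison key from identifiers (hypothetical fields)."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["dob"],  # ISO date string, e.g. "2001-05-17"
    )


def assign_study_ids(ccf_records, mh_records):
    """Give every distinct comparison key one arbitrary study_id.

    Returns two dicts mapping each institution's local ID to a study_id.
    Records from CCF and MH that share a key receive the same study_id.
    """
    next_id = count(1)
    key_to_study_id = {}
    ccf_map, mh_map = {}, {}
    for records, id_field, out in (
        (ccf_records, "ccf_id", ccf_map),
        (mh_records, "mh_id", mh_map),
    ):
        for rec in records:
            key = normalize(rec)
            if key not in key_to_study_id:
                key_to_study_id[key] = next(next_id)
            out[rec[id_field]] = key_to_study_id[key]
    return ccf_map, mh_map


# Made-up example records; no real patient data.
ccf = [
    {"ccf_id": "C1", "last_name": "Doe", "first_name": "Jane", "dob": "2001-05-17"},
    {"ccf_id": "C2", "last_name": "Roe", "first_name": "Sam", "dob": "2003-11-02"},
]
mh = [
    {"mh_id": "M9", "last_name": " doe", "first_name": "JANE", "dob": "2001-05-17"},
]
ccf_map, mh_map = assign_study_ids(ccf, mh)
```

Here `C1` and `M9` end up with the same study_id because their normalized keys agree. A real linkage would likely need several stages and fuzzier matching than this exact comparison.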

  3. CCF-side data management

CCFcohortcreation.docx describes the cleaning of the raw CCF data and the subsequent creation of tables and views that contain only NEOCARE cohort data dated 1999-2017. Because all the raw CCF data already reside in the Teradata database, most of this is handled via SQL queries that manipulate existing tables and views.
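As a rough sketch of what "a view that contains only cohort data dated 1999-2017" might look like, the example below uses an in-memory SQLite database as a stand-in for Teradata. The table name `raw_encounters`, its columns, the `in_cohort` flag, and the inclusive date boundaries are all hypothetical; the real queries are in CCFcohortcreation.docx.

```python
import sqlite3

# In-memory SQLite stands in for Teradata; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_encounters (
        patient_id TEXT,
        encounter_date TEXT,   -- ISO dates, so string comparison orders correctly
        in_cohort INTEGER      -- 1 if the patient belongs to the cohort
    );
    INSERT INTO raw_encounters VALUES
        ('A', '1998-12-31', 1),
        ('A', '1999-01-01', 1),
        ('B', '2017-12-31', 1),
        ('B', '2018-01-01', 1),
        ('C', '2005-06-15', 0);

    -- The view keeps only cohort members and only rows inside the study window.
    CREATE VIEW cohort_encounters AS
        SELECT patient_id, encounter_date
        FROM raw_encounters
        WHERE in_cohort = 1
          AND encounter_date BETWEEN '1999-01-01' AND '2017-12-31';
""")

rows = conn.execute(
    "SELECT patient_id, encounter_date FROM cohort_encounters ORDER BY encounter_date"
).fetchall()
```

Defining the restriction as a view rather than copying rows means the raw table stays untouched and the filter is applied each time the view is queried.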

  4. MH-side data management

Handling the MH data is more involved because the data originated at MH and had to be transferred to CCF and then uploaded to Teradata.

mhdataprep.docx describes the preparation of MH data for transfer to CCF. All MH data were indexed by the arbitrary study_id that was assigned at the end of deduplication (see step 2 above). All other identifiers were removed prior to transfer, except for basic demographic information, which was kept only in the designated "demographics" files.
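The re-indexing described above can be illustrated with a small sketch. It assumes a lookup from a local identifier to study_id already exists from deduplication; the field names (`mrn`, `name`, `ssn`, `address`, `sex`, `birth_date`) and the function `prepare_for_transfer` are hypothetical and are not taken from mhdataprep.docx.

```python
# Hypothetical field names, chosen only for illustration.
IDENTIFIER_FIELDS = {"mrn", "name", "ssn", "address"}
DEMOGRAPHIC_FIELDS = {"sex", "birth_date"}


def prepare_for_transfer(record, mrn_to_study_id, is_demographics_file=False):
    """Return a copy of `record` indexed by study_id with identifiers removed.

    Demographic fields survive only when the record belongs to a
    designated demographics file.
    """
    out = {"study_id": mrn_to_study_id[record["mrn"]]}
    for field, value in record.items():
        if field in IDENTIFIER_FIELDS:
            continue  # never transferred
        if field in DEMOGRAPHIC_FIELDS and not is_demographics_file:
            continue  # demographics stay out of non-demographics files
        out[field] = value
    return out


# Made-up example record; no real patient data.
lookup = {"M9": 1}
record = {
    "mrn": "M9",
    "name": "Jane Doe",
    "sex": "F",
    "birth_date": "2001-05-17",
    "medication": "amoxicillin",
}
clinical_row = prepare_for_transfer(record, lookup)
demographics_row = prepare_for_transfer(record, lookup, is_demographics_file=True)
```

In this sketch `clinical_row` carries only the study_id and the medication, while `demographics_row` also keeps sex and birth date.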

mhdeathdata_prep.docx describes the preparation of MH death data for transfer to CCF. Death data are handled separately because they involve a matching protocol on personal identifiers.

MHcohortcreation.docx describes secondary manipulation of the MH data on the CCF servers after transfer from MH, followed by the upload of the data to Teradata. Like CCFcohortcreation.docx (see step 3 above), it results in tables and views that contain only NEOCARE cohort data dated 1999-2017.

  5. Harmonization

We use this term for combining corresponding CCF data and MH data into single tables (e.g., combining CCF's medications data and MH's medications data into a single "NEOCARE_COHORT" medications view).

NEOCARE_harmonization.docx describes this process. Since all the source data are on Teradata, most of this is handled via SQL queries that manipulate existing tables and views.
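A harmonized view of this kind could be sketched as a `UNION ALL` over the two institutions' tables, with column names aligned and a source label added. The example again uses in-memory SQLite as a stand-in for Teradata; every table and column name here, and the `source` column itself, is an assumption made for illustration rather than the schema documented in NEOCARE_harmonization.docx.

```python
import sqlite3

# In-memory SQLite stands in for Teradata; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ccf_medications (study_id INTEGER, med_name TEXT, order_date TEXT);
    CREATE TABLE mh_medications  (study_id INTEGER, drug TEXT, ordered_on TEXT);

    INSERT INTO ccf_medications VALUES (1, 'amoxicillin', '2004-03-02');
    INSERT INTO mh_medications VALUES
        (1, 'ibuprofen', '2006-07-19'),
        (2, 'albuterol', '2010-01-05');

    -- One view over both sources: MH columns are renamed to match CCF's,
    -- and a source column records where each row came from.
    CREATE VIEW neocare_cohort_medications AS
        SELECT study_id, med_name, order_date, 'CCF' AS source
        FROM ccf_medications
        UNION ALL
        SELECT study_id, drug AS med_name, ordered_on AS order_date, 'MH' AS source
        FROM mh_medications;
""")

rows = conn.execute(
    "SELECT study_id, med_name, source FROM neocare_cohort_medications ORDER BY order_date"
).fetchall()
```

Because both sources are keyed by the shared study_id, a patient seen at both institutions (study_id 1 here) appears in the combined view with rows from each.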

Owner

  • Name: Cleveland Clinic Quantitative Health Sciences
  • Login: ClevelandClinicQHS
  • Kind: organization
  • Email: QHSGitHubAdmin@ccf.org
  • Location: United States of America

Open-source projects from the Department of Quantitative Health Sciences at Cleveland Clinic
