PhenotypeR

https://github.com/ohdsi/phenotyper

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (16.2%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: OHDSI
License: apache-2.0
Language: R
Default Branch: main
Homepage: https://ohdsi.github.io/PhenotypeR/
Size: 32.6 MB

Statistics

Stars: 5
Watchers: 5
Forks: 1
Open Issues: 25
Releases: 7

Created almost 2 years ago · Last pushed 10 months ago

Metadata Files

Readme License

README.Rmd

---
output: github_document
editor_options: 
  markdown: 
    wrap: 72
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE, warning = FALSE, message = FALSE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# PhenotypeR 



[![CRAN
status](https://www.r-pkg.org/badges/version/PhenotypeR)](https://CRAN.R-project.org/package=PhenotypeR)
[![R-CMD-check](https://github.com/ohdsi/PhenotypeR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ohdsi/PhenotypeR/actions/workflows/R-CMD-check.yaml)
[![Lifecycle:Experimental](https://img.shields.io/badge/Lifecycle-Experimental-339999)](https://lifecycle.r-lib.org/articles/stages.html#experimental)



The PhenotypeR package helps us to assess the research-readiness of a
set of cohorts we have defined. This assessment includes:

-   ***Database diagnostics*** which help us to better understand the
    database in which the cohorts have been created. This includes
    information about the size of the data, the time period covered, and
    the number of people in the data as a whole. More granular
    information that may influence analytic decisions, such as the
    number of observation periods per person, is also described.\
-   ***Codelist diagnostics*** which help to answer questions like which
    concepts from our codelist are used in the database? Which concepts
    led to individuals' entry into the cohort? Are there any concepts
    being used in the database that we didn't include in our codelist
    but maybe should have?\
-   ***Cohort diagnostics*** which help to answer questions like how
    many individuals did we include in our cohort and how many were
    excluded because of our inclusion criteria? If we have multiple
    cohorts, is there overlap between them and when do people enter one
    cohort relative to another? What is the incidence of cohort entry
    and what is the prevalence of the cohort in the database? It can also
    compare our study cohorts to the general population by matching people
    with similar age and sex.\
-   ***Population diagnostics*** which estimate the frequency of our
    study cohorts in the database in terms of their incidence rates and
    prevalence.
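
As a rough illustration of the two quantities mentioned under population
diagnostics, the sketch below computes an incidence rate and a point
prevalence by hand. All numbers and variable names are invented for
illustration; PhenotypeR estimates these from the database, and the
"per 100,000 person-years" scaling is just one common reporting
convention assumed here.

```{r}
# Toy numbers, invented for illustration (not from any real database)
new_cases    <- 25      # people entering the cohort during follow-up
person_years <- 5000    # total time at risk contributed by the population
cases_on_day <- 120     # people in the cohort on a chosen date
population   <- 8000    # people under observation on that date

# Incidence rate: new cases per 100,000 person-years at risk
incidence_per_100k_py <- new_cases / person_years * 1e5

# Point prevalence: proportion of the observed population in the cohort
point_prevalence <- cases_on_day / population

incidence_per_100k_py
point_prevalence
```

Incidence describes how quickly new people enter the cohort, while
prevalence describes how many are in it at a given time, so the two can
differ considerably for long-lasting conditions.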

## Installation

You can install PhenotypeR from CRAN:

```{r, eval = FALSE}
install.packages("PhenotypeR")
```

Or you can install the development version from GitHub:

```{r, eval = FALSE}
# install.packages("remotes")
remotes::install_github("OHDSI/PhenotypeR")
```

## Example usage

To illustrate the functionality of PhenotypeR, let's create a cohort
using the Eunomia Synpuf dataset. We'll first load the required packages and
create the cdm reference for the data.

```{r, message=FALSE, warning=FALSE}
library(dplyr)
library(CohortConstructor)
library(PhenotypeR)
library(CodelistGenerator)
library(duckdb)
library(CDMConnector)
library(DBI)
```

```{r, message=FALSE, warning=FALSE}
# Connect to the database and create the cdm object
con <- dbConnect(duckdb(), dbdir = eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con, 
                                cdmName = "Eunomia Synpuf",
                                cdmSchema   = "main",
                                writeSchema = "main",
                                achillesSchema = "main")
```

Note that we've included Achilles results in our cdm reference. Where we can, we'll use these precomputed counts to speed up our analysis.

```{r, message=TRUE, warning=FALSE}
cdm
```

```{r, message=FALSE, warning=FALSE}
# Create the codelists
codes <- list("user_of_warfarin" = c(1310149L, 40163554L),
              "user_of_acetaminophen" = c(1125315L, 1127078L, 1127433L, 40229134L, 40231925L, 40162522L, 19133768L),
              "user_of_morphine" = c(1110410L, 35605858L, 40169988L),
              "measurements_cohort" = c(40660437L, 2617206L, 4034850L,  2617239L, 4098179L))

# Instantiate cohorts with CohortConstructor
cdm$my_cohort <- conceptCohort(cdm = cdm,
                               conceptSet = codes, 
                               exit = "event_end_date",
                               overlap = "merge",
                               name = "my_cohort")
```

We can easily run all the analyses explained above (**database diagnostics**, **codelist diagnostics**, **cohort diagnostics**, and **population diagnostics**) using
`phenotypeDiagnostics()`:

```{r, message = FALSE}
result <- phenotypeDiagnostics(cdm$my_cohort, survival = TRUE)
```

You can also create a table of expected results, so that you can compare them with the actual results later.

```{r, message = FALSE}
expectations <- tibble(
  "cohort_name" = c("user_of_warfarin", "user_of_acetaminophen", "user_of_morphine", "measurements_cohort"),
  "estimate" = c("Male percentage", "Survival probability after 5y", "Median age", "Median age"),
  "value" = c("56%", "96%", "57-58", "42-45"),
  "source" = c("A clinician", "A clinician", "A clinician", "A clinician"),
  "diagnostic" = c("cohort_characteristics", "cohort_survival", "cohort_characteristics", "cohort_characteristics") 
)
```
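
Before passing such a table on, it can help to check it against the
cohort names. The sketch below is a minimal, self-contained check written
for illustration: it assumes that expectations are linked to cohorts
through `cohort_name` and that the five columns used in the example above
are the ones expected. The objects `cohort_names`, `toy_expectations`,
`expected_cols`, `missing_cols`, and `unmatched` are hypothetical and not
part of PhenotypeR.

```{r}
library(dplyr)

# Cohort names as given by the codelist names in the example above
cohort_names <- c("user_of_warfarin", "user_of_acetaminophen",
                  "user_of_morphine", "measurements_cohort")

# Hypothetical expectations table with one deliberately mismatched name
toy_expectations <- tibble(
  cohort_name = c("user_of_warfarin", "morphine"),
  estimate    = c("Male percentage", "Median age"),
  value       = c("56%", "57-58"),
  source      = c("A clinician", "A clinician"),
  diagnostic  = c("cohort_characteristics", "cohort_characteristics")
)

# Columns used in the example above (assumed to be the expected ones)
expected_cols <- c("cohort_name", "estimate", "value", "source", "diagnostic")
missing_cols  <- setdiff(expected_cols, colnames(toy_expectations))

# Rows whose cohort_name does not match any cohort name
unmatched <- toy_expectations |>
  filter(!cohort_name %in% cohort_names)

missing_cols
unmatched
```

Here `missing_cols` is empty and `unmatched` holds the single row named
`morphine`, which would need renaming to `user_of_morphine` to line up
with the cohort.
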

Alternatively, you can use AI to generate expectations:

```{r, message = FALSE}
library(ellmer)
# Note that you may need to generate a Google Gemini API key at https://aistudio.google.com/app/apikey and add it to your R environment:
# usethis::edit_r_environ()
# GEMINI_API_KEY = "your API key"

chat <- chat("google_gemini")

expectations <- getCohortExpectations(chat = chat, 
                      phenotypes = result)
```

Once we have our results we can quickly view them in an interactive
application. Here we'll apply a minimum cell count of 2 to our results and save our Shiny app to a temporary directory.

```{r, eval=FALSE}
shinyDiagnostics(result = result, minCellCount = 2, directory = tempdir(), expectations = expectations)
```
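
To make the idea of a minimum cell count concrete, the sketch below
applies one common suppression rule to made-up counts: values above zero
but below the threshold are hidden. This is an illustration of the
concept only, using the `minCellCount = 2` value from the call above; the
counts and the names `counts`, `min_cell_count`, and `suppressed` are
invented, and the rule PhenotypeR actually applies may differ in detail.

```{r}
# Toy counts, invented for illustration
counts <- c(cohort_a = 150L, cohort_b = 1L, cohort_c = 0L, cohort_d = 2L)
min_cell_count <- 2L

# Hide counts that are above zero but below the threshold
suppressed <- ifelse(counts > 0 & counts < min_cell_count, NA_integer_, counts)
suppressed
```

Only `cohort_b` is hidden here: a count of 1 could point to a single
individual, whereas zero counts and counts at or above the threshold are
left as they are.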

See the Shiny app generated from the example cohort
[here](https://dpa-pde-oxford.shinyapps.io/PhenotypeRShiny/).

### More information

To see more details regarding each one of the analyses, please refer to
the package vignettes.

Owner

Name: Observational Health Data Sciences and Informatics
Login: OHDSI
Kind: organization

Website: http://ohdsi.org
Repositories: 285
Profile: https://github.com/OHDSI

GitHub Events

Total

Create event: 138
Release event: 5
Issues event: 269
Watch event: 4
Delete event: 101
Member event: 3
Issue comment event: 77
Push event: 460
Pull request review event: 12
Pull request review comment event: 16
Pull request event: 294
Fork event: 1

Last Year

Create event: 138
Release event: 5
Issues event: 269
Watch event: 4
Delete event: 101
Member event: 3
Issue comment event: 77
Push event: 460
Pull request review event: 12
Pull request review comment event: 16
Pull request event: 294
Fork event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 177
Total pull requests: 272
Average time to close issues: 22 days
Average time to close pull requests: about 17 hours
Total issue authors: 12
Total pull request authors: 6
Average comments per issue: 0.39
Average comments per pull request: 0.03
Merged pull requests: 228
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 165
Pull requests: 259
Average time to close issues: 19 days
Average time to close pull requests: about 18 hours
Issue authors: 12
Pull request authors: 6
Average comments per issue: 0.39
Average comments per pull request: 0.03
Merged pull requests: 215
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

edward-burn (118)
martaalcalde (37)
catalamarti (10)
xihang-chen (8)
nmercadeb (5)
daniellenewby (4)
daniprietoalhambra (3)
martapineda (2)
wanningwang (2)
elinrow (2)
ablack3 (1)
albertpratsu (1)

Pull Request Authors

edward-burn (191)
martaalcalde (108)
catalamarti (14)
xihang-chen (10)
nmercadeb (9)
cecicampanile (2)
daniellenewby (1)

Top Labels

Issue Labels

enhancement (9)
documentation (4)
bug (2)
needs discussion (1)
duplicate (1)
good first issue (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 403 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 8
Total maintainers: 1

cran.r-project.org: PhenotypeR

Assess Study Cohorts Using a Common Data Model

Homepage: https://ohdsi.github.io/PhenotypeR/
Documentation: http://cran.r-project.org/web/packages/PhenotypeR/PhenotypeR.pdf
License: Apache License (≥ 2)
Latest release: 0.2.0
published 11 months ago

Versions: 8
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 403 Last month

Rankings

Dependent packages count: 27.6%

Dependent repos count: 34.0%

Average: 49.5%

Downloads: 86.9%

Maintainers (1)

edward.burn@ndorms.ox.ac.uk