Releases | Open Source Science

https://github.com/bayer-group/phenex - v0.7.0

MAJOR ADDITIONS

Introducing lazy execution : cohort execution becomes a lot smarter with a 'lazy execution' keyword argument. When set to true, cohorts minimize computation: only cohort components whose definition has changed since the previous run are executed, and unmodified components reuse the results of that run. This is great when interactively developing a cohort; you can quickly build up a cohort and see how each change affects the results while minimizing wait time.
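
The idea behind lazy execution can be illustrated with a small, self-contained sketch. This is not PhenEx's implementation; the `LazyRunner` class, its `run` method, and the hashing scheme are hypothetical names chosen for illustration. Each component's definition is hashed, and a component is recomputed only when its hash differs from the one stored on the previous run.

```python
import hashlib
import json


class LazyRunner:
    """Hypothetical illustration of lazy execution; not the PhenEx API."""

    def __init__(self):
        self._hashes = {}   # component name -> hash of its last definition
        self._results = {}  # component name -> cached result
        self.executed = []  # names recomputed during the most recent run

    @staticmethod
    def _fingerprint(definition):
        # Serialize deterministically so equal definitions hash equally.
        payload = json.dumps(definition, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def run(self, components, compute):
        """components: {name: definition dict}; compute: definition -> result."""
        self.executed = []
        for name, definition in components.items():
            digest = self._fingerprint(definition)
            if self._hashes.get(name) != digest:
                # Definition is new or changed: recompute and cache.
                self._results[name] = compute(definition)
                self._hashes[name] = digest
                self.executed.append(name)
        return {name: self._results[name] for name in components}


runner = LazyRunner()
cohort = {"entry": {"codes": ["I48"]}, "age": {"min_age": 18}}
runner.run(cohort, compute=lambda d: sorted(d.items()))
first_run = list(runner.executed)   # both components computed

cohort["age"] = {"min_age": 65}     # change only one definition
runner.run(cohort, compute=lambda d: sorted(d.items()))
second_run = list(runner.executed)  # only the modified component recomputed
```

A real implementation would also need to invalidate components that depend on a changed component; this sketch ignores dependencies between components.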

NOT BACKWARDS COMPATIBLE

This release is mostly compatible with 0.6.0; however, the call signature of Cohort.execute() has changed slightly.

CHANGES PRIOR BEHAVIOR

MINOR ADDITIONS

- Implement to_list method on Codelist
- LogicPhenotype now returns a VALUE corresponding to the returned DATE : previously, LogicPhenotype did not return a value. Prior behavior allowed selection of a date using return_date = last, first or all. Now you can also return the value associated with the returned date.
- BinPhenotype now works with categorical values : previously, BinPhenotype only worked on numeric-valued phenotypes. Now BinPhenotype works on non-numerical VALUE columns as well, allowing multiple categorical values to be mapped to user-defined bins.
- CodelistPhenotype now returns matching codes : previously, CodelistPhenotype only returned person ids and event dates, with all null values. Now it returns a VALUE: the code that resulted in a person fulfilling the phenotype criteria, thus answering the question 'what code did person x have from codelist y'.
- Improvements to ibis_connect : previously, SnowflakeConnector required two authentications. Now it requires only one.
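
The categorical binning added to BinPhenotype amounts to mapping several category values onto one user-defined label. A minimal sketch in plain Python, assuming a hypothetical `value_mapping` dictionary and `bin_categorical` helper (neither is taken from the PhenEx API):

```python
def bin_categorical(values, value_mapping, default=None):
    """Map each categorical value to the bin label that lists it.

    value_mapping: {bin_label: [category, ...]} (hypothetical structure).
    Values not listed in any bin receive `default`.
    """
    # Invert the mapping once so each lookup is a single dict access.
    lookup = {
        category: label
        for label, categories in value_mapping.items()
        for category in categories
    }
    return [lookup.get(value, default) for value in values]


mapping = {
    "ANTICOAGULANT": ["warfarin", "apixaban"],
    "ANTIPLATELET": ["aspirin", "clopidogrel"],
}
binned = bin_categorical(["aspirin", "apixaban", "metformin"], mapping)
```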

BUG FIXES

Fix to computation graph phenotype : the boolean column was not being added for score and arithmetic phenotypes.

- Python
Published by sprivite 10 months ago

https://github.com/bayer-group/phenex - v0.6.0

MAJOR ADDITIONS

Introducing BinPhenotype : BinPhenotype converts numeric values into categorical bin labels. To use, pass it a numeric-valued phenotype such as AgePhenotype, MeasurementPhenotype, ArithmeticPhenotype, or ScorePhenotype.
Introducing UserDefinedPhenotype!! 🎉🎉 : UserDefinedPhenotype allows users of PhenEx to implement custom functionality within a single phenotype. To use, the user must pass a function that returns an ibis table. Fully implemented with tests and documentation. UserDefinedPhenotype is especially useful for two use cases:
1. Hybrid workflows: If you have performed cohort extraction outside of PhenEx (e.g. in R, SQL) but would like to use PhenEx to calculate baseline characteristics and outcomes, you can set the entry criterion to a UserDefinedPhenotype and read a dataframe of PERSONIDS and INDEXDATES. In this way, PhenEx flexibly allows multiple tools to be used in one analysis.
2. Custom event definitions: If you need to define events based on complex logic that is not easily expressed using the built-in PhenEx functionality, you can use UserDefinedPhenotype to implement this logic in a custom function.
Introducing EventCountPhenotype!! : EventCountPhenotype allows users of PhenEx to
1. count the number of distinct days on which an event defined by another phenotype occurs
2. filter by the number of events that occur, allowing detection of e.g. 'at least three instances of AF code within 90 days prior to index date'
3. filter by the number of days between any pair of events, allowing detection of e.g. 'two occurrences of AF code separated by more than 90 days'.
Full implementation, unit tests and documentation added.
Added complete implementation of DuckDBConnector class : previously, only the SnowflakeConnector was fully functional. DuckDBConnector now offers equivalent functionality.
Introducing DerivedTables and CombineOverlappingPeriods 🎉🍾 : we require ADT feeds (admission discharge transfer). An ADT feed takes overlapping and consecutive visits from the visitsoccurrence table and combines them into a single time period with a single start and end date. Added here is a proposal for how derived tables can be implemented in PhenEx, as well as an initial, highly imperfect implementation of combining overlapping periods.
- A DerivedTable is any table that is generated from the source data and does not require patient-level specification (i.e. derived tables are not phenotypes, as phenotypes subset the data using patient-level criteria). ADT feeds are one example, but one can imagine data cleaning steps implemented in this manner. Derived tables are defined by the user in a manner similar to phenotypes; the user specifies the source table domain key. The difference is that the user also defines the output destination table domain key. Derived tables are then generated during cohort execution and appended to subset_tables_entry, and are thus also present in subset_tables_index, for use by all phenotypes except the entry criterion, accessible by the output destination domain key.
- CombineOverlappingPeriods is our first DerivedTable. It contains a non-performant implementation as a placeholder until a more performant implementation is written. It uses pandas rather than ibis and thus will have performance issues with large cohorts. It has been executed on cohorts of up to 300k patients without problems.
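
Combining overlapping and consecutive periods is a classic interval-merging problem. The sketch below shows the idea for a single patient in plain Python; it is not the pandas-based placeholder described above, and the function name `combine_overlapping_periods` and the `max_gap_days` parameter are assumptions made for illustration.

```python
from datetime import date, timedelta


def combine_overlapping_periods(periods, max_gap_days=1):
    """Merge (start, end) date pairs that overlap or are consecutive.

    Two periods are merged when the next one starts no more than
    `max_gap_days` after the current one ends (1 = adjacent days).
    """
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1] + timedelta(days=max_gap_days):
            # Extend the current period if this visit ends later.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


visits = [
    (date(2020, 1, 1), date(2020, 1, 5)),
    (date(2020, 1, 4), date(2020, 1, 8)),    # overlaps the first visit
    (date(2020, 1, 9), date(2020, 1, 10)),   # starts the day after
    (date(2020, 3, 1), date(2020, 3, 2)),    # separate stay
]
stays = combine_overlapping_periods(visits)
```

Sorting first means each period only has to be compared with the most recently merged one, so the merge itself is a single pass.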

NOT BACKWARDS COMPATIBLE

Change EventCountPhenotype keyword argument return_event to component_date_select : previously, the EventCountPhenotype keyword for selecting the date of the first or second event was called 'return_event'. To harmonize the interface, it is now called 'component_date_select', which is the term used by MeasurementChangePhenotype.
Updated interface for CategoricalPhenotype : previously, CategoricalPhenotype duplicated the keyword parameters of CategoricalFilter, i.e. column_name and allowed_values. Now CategoricalPhenotype takes a CategoricalFilter directly via the keyword argument categorical_filter. This harmonizes the interface with TimeRangePhenotype and AgePhenotype, i.e. we always pass filters and do not duplicate filter keyword arguments. This also adds new functionality, allowing CategoricalPhenotype to operate on multiple columns by taking advantage of the logical operations provided by CategoricalFilter. Updated tests and added tests for time range filtering.
Update ContinuousCoveragePhenotype : renamed to TimeRangePhenotype!! : we have updated our interface guidelines; we no longer duplicate keyword arguments within phenotypes if they are passed by filters.
- ContinuousCoveragePhenotype previously had a duplicate implementation of relative time range filtering. We now pass a RelativeTimeRangeFilter directly using the keyword argument relative_time_range.
- ContinuousCoveragePhenotype has been updated to work with any table with a start_date and an end_date (either directly or provided by mappers). This allows for usage with the CombineOverlappingPeriods derived table. It has been renamed to TimeRangePhenotype to reflect this.

CHANGES PRIOR BEHAVIOR

Added new keyword parameter allow_null_end_date to TimeRangePhenotype : TimeRangePhenotype requires that the event date of interest (usually index date) is within the start_date and end_date. As this is often used to identify patients with continuous insurance coverage, we found that patients who continue to be enrolled in the data source / continue to have coverage often have a null end_date. We now allow the end_date to be null; in fact, the default value of allow_null_end_date is set to True. This may change the results of previously executed studies.
Update all naming of tables and columns to uppercase : ibis allows lowercase in table names and column names. This is messy because snowpark and R don't allow this (and SQL doesn't care). We now require all table and column names to be uppercase to improve compatibility with downstream and other tool usage. This implementation ensures uppercase table names and column names as follows :
1. all phenotype names are now uppercase, meaning that all column names using phenotype names will also be uppercase.
2. all tables written by the cohort will be uppercase, as table creation enforces uppercase when writing.
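
The effect of allow_null_end_date can be sketched in plain Python. The helper `covers_date` below is hypothetical and only mirrors the behavior described in these notes: a null end_date is treated as an open-ended period when the flag is true.

```python
from datetime import date


def covers_date(start_date, end_date, event_date, allow_null_end_date=True):
    """Return True if event_date falls within [start_date, end_date].

    A None end_date models a patient who is still enrolled; it counts as
    open-ended coverage only when allow_null_end_date is True.
    """
    if end_date is None:
        return allow_null_end_date and start_date <= event_date
    return start_date <= event_date <= end_date


index_date = date(2021, 6, 1)
still_enrolled = (date(2019, 1, 1), None)  # (start_date, end_date)

included_now = covers_date(*still_enrolled, index_date)
included_before = covers_date(
    *still_enrolled, index_date, allow_null_end_date=False
)
```

The same patient is included with the new default and excluded when the flag is false, which is why results of earlier studies may shift.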

MINOR ADDITIONS

Add pretty display to Waterfall Reporter : Waterfall Reporter previously output a pandas dataframe with all numeric data types. This made displays of the waterfall table rather unpleasing, with NaNs displayed where values were not applicable (i.e. all summary statistics for a binary variable). We have now added
- a pretty_display keyword argument, set to true by default, which casts the table to strings and fills nulls with empty strings.
- a percentage column, showing how many patients remain after each criterion is applied.
Updated TimeToEvent plotting : previously created one plot with all outcomes. Now added class methods to create :
1. plot_multiple_kaplan_meier : a figure with multiple KM curves for selected outcomes. No risk counts displayed.
2. plot_single_kaplan_meier : a figure with a single selected outcome with risk counts.
Additionally, both methods can write figures to disk when the path_dir keyword argument is passed.

BUG FIXES

Fixed reporting of BinPhenotype, ScorePhenotype and EventCountPhenotype in Table1 Reporter :
- BinPhenotype is reported as a categorical value, so each bin is displayed in table1 automatically with count of patients in that bin
- ScorePhenotype is now reported as a categorical value, so each score and count of patients with that score are automatically added to Table1
- EventCountPhenotype is now reported as a numerical value, so the summary statistics are displayed automatically in Table1
Fixed strange behavior in Table1 Reporter : table 1 had strange behavior, placing counts with wrong phenotypes; the counts were correct for a phenotype, but assigned to the incorrect label. The implementation of Table1 Reporter was changed from using the baseline characteristics table to using the phenotype tables themselves. This solves the issue. Additional changes to logging to prevent multiple log statements.
Fix to OMOPObservationTable mapper : the mapper previously defined the incorrect OMOP column name for the 'code' key. The 'code' key is now corrected to OBSERVATION_CONCEPT_ID. This allows CodelistPhenotype to work correctly on the OBSERVATION_CONCEPT_ID column, e.g. to filter for encounter types in the ObservationTable.
Fixed Waterfall 'waterfall' column bug : There was an issue where the 'remaining' column could increase because it was not counting distinct patient ids. If an inclusion/exclusion phenotype was returning non-distinct patient ids, it could appear that the waterfall count was increasing, when this was by definition impossible. Now displaying distinct patient ids in the waterfall column, thus displaying the correct count of patients remaining in the cohort at each row in the attrition table.
Fix to TimeToEvent Reporter : TimeToEventReporter was incorrectly identifying patients who had an event due to a bug in the selection of the 'first event date'. This was due to ibis.least returning null if any column was null. The new implementation fixes this, correctly identifying the first event date.
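
The null-propagation problem can be reproduced without ibis. In the sketch below, `sql_least` is a hypothetical stand-in that mimics a least function returning null when any argument is null, as described above, and `first_event_date` shows one way to ignore nulls instead. Neither is the PhenEx implementation.

```python
from datetime import date


def sql_least(*values):
    """Mimic a least() that yields null (None) if any input is null."""
    if any(v is None for v in values):
        return None
    return min(values)


def first_event_date(*values):
    """Earliest non-null date, or None when every input is null."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


outcome_date = date(2022, 3, 15)
death_date = None                      # patient has no recorded death
end_of_followup = date(2022, 12, 31)

naive = sql_least(outcome_date, death_date, end_of_followup)
fixed = first_event_date(outcome_date, death_date, end_of_followup)
```

With the naive version, a single missing date hides an outcome that did occur; the null-aware version still finds the outcome date.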

Fix to Table1 Reporter : Added ScorePhenotype and ArithmeticPhenotype to Table1 Reporter's value reporting. This allows Table1 to display descriptive statistics for the value column of ScorePhenotype and ArithmeticPhenotype.

Fixed Waterfall reporter count of N : Previously, waterfall reporter was counting the number of events and not the number of distinct patient ids. The N column now displays the number of unique patient ids that fulfill the inclusion/exclusion criterion on that row.
Fixed bug in CategoricalPhenotype : CategoricalPhenotype with new implementation was always attempting to perform filtering, even when allowed_values was set to null. This is counter to how CategoricalPhenotype is often used as a baseline characteristic, to allow reporting of counts of all categories present. Fixed bug to allow non-filtering by CategoricalPhenotype.
Fixed LogicPhenotype negation of single item : LogicPhenotype was failing when the expression was the negation of a single phenotype and a Snowflake backend was used. Negation of a single phenotype was tested using DuckDB in the unit tests, but produced an error when run in Snowflake. This is related to the coalesce used to fill in non-null dates for horizontal date selection; the Snowflake backend did not accept coalesce with a single column name. The fix: if only a single column is present, return a selection of that column.
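
Both waterfall fixes in this list come down to counting distinct patient ids rather than rows. A minimal sketch in plain Python, with hypothetical names (`waterfall` and the criteria dictionary) that are not part of PhenEx:

```python
def waterfall(entry_ids, inclusion_events):
    """Build attrition rows of (criterion, N, remaining).

    entry_ids: patient ids that meet the entry criterion.
    inclusion_events: {criterion: [patient_id, ...]}, one id per event,
        so the same patient may appear several times.
    """
    remaining = set(entry_ids)
    rows = []
    for criterion, event_ids in inclusion_events.items():
        patients = set(event_ids)      # distinct patients, not events
        remaining &= patients          # can only shrink or stay the same
        rows.append((criterion, len(patients), len(remaining)))
    return rows


rows = waterfall(
    entry_ids=[1, 2, 3, 4],
    inclusion_events={
        "AF_DIAGNOSIS": [1, 1, 2, 3],   # patient 1 has two events
        "ADULT": [1, 2, 2, 5],
    },
)
```

Because each step intersects sets of patient ids, the remaining count cannot increase from one row to the next, whatever the number of events per patient.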

- Python
Published by a-hartens 11 months ago

https://github.com/bayer-group/phenex - v0.5.0

Added time to event reporter (KM curves)
Added ISTH Major Bleeding phenotype
Bug fixes to MeasurementChangePhenotype
Fixes to Table1 (ensure reporting of categorical phenotypes)
Updates to documentation and roadmap

- Python
Published by sprivite about 1 year ago

https://github.com/bayer-group/phenex - v0.4.3

Updates:
- Categorical Filter : Added isnull, notnull, isin, notin operators.
- Phenotypes : Added description to phenotypes.

Fixes:
- Table One : Corrected table display issues.
- Ibis Null Handling : Fixed date/datetime null issues.
- General Code Cleanup : Formatted with black, removed unused print statements.