MatSurv

MatSurv: Survival analysis and visualization in MATLAB - Published in JOSS (2020)

https://github.com/aebergl/matsurv

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    3 of 6 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Materials Science Physical Sciences - 40% confidence
Last synced: 4 months ago

Repository

Survival analysis in MATLAB

Basic Info
  • Host: GitHub
  • Owner: aebergl
  • License: mit
  • Language: MATLAB
  • Default Branch: master
  • Size: 11.5 MB
Statistics
  • Stars: 22
  • Watchers: 2
  • Forks: 13
  • Open Issues: 3
  • Releases: 2
Created over 8 years ago · Last pushed 5 months ago
Metadata Files
Readme Contributing License Code of conduct

README.md

MatSurv

MatSurv is a simple survival analysis function for MATLAB (version 2016b and later) that creates a KM plot with a risk table. Survival statistics, such as the log-rank p-value and hazard ratio (HR), are also calculated. The log-rank test has been verified to give the same results as SAS and R. The style of the KM plot is easily changed with input parameters. No additional toolboxes are required. MatSurv was inspired by the survminer R-package.

The general usage is: [p, fh, stats] = MatSurv(TimeVar, EventVar, GroupVar, 'param', value, ...)

Why MatSurv

MatSurv allows MATLAB users to create KM plots with a risk table and to perform a log-rank test between two or more groups. An event is, for example, death, relapse of disease, or a new metastatic tumor. If none of these events occurs during the study period, the time-to-event is unknown and the observation is called censored. A risk table describes the number of patients that are still "at risk" at a specific time point. MatSurv also provides fine-grained customization of the KM plots, making them suitable for publications. MatSurv will hopefully make life easier for fellow bioinformaticians (and other professionals) who prefer MATLAB over R.
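To make the terms "censored" and "at risk" concrete, here is a minimal sketch of the textbook Kaplan-Meier product-limit estimate on made-up data. It is not taken from the MatSurv source; the variable names (time, event, atRisk, nEvents, S) are illustrative assumptions.

```matlab
% Toy data, made up for illustration: time in months, 1 = event, 0 = censored
time  = [2 3 3 5 6 8 9 12];
event = [1 1 0 1 0 1 0 0];

eventTimes = unique(time(event == 1));   % distinct times with at least one event
nTimes     = numel(eventTimes);
atRisk     = zeros(1, nTimes);           % subjects still under observation
nEvents    = zeros(1, nTimes);           % events observed at each distinct time

for k = 1:nTimes
    atRisk(k)  = sum(time >= eventTimes(k));
    nEvents(k) = sum(time == eventTimes(k) & event == 1);
end

% Kaplan-Meier product-limit estimate: S(t) = prod(1 - d_i / n_i)
S = cumprod(1 - nEvents ./ atRisk);

% Columns: event time, number at risk, number of events, survival estimate
disp([eventTimes(:) atRisk(:) nEvents(:) S(:)]);
```

Censored subjects never lower the curve themselves; they only leave the at-risk count, which is why the risk table and the curve are usually read together.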

Citing MatSurv

Creed et al., (2020). MatSurv: Survival analysis and visualization in MATLAB. Journal of Open Source Software, 5(46), 1830, https://doi.org/10.21105/joss.01830

MATLAB Release Compatibility

Compatible with R2016b to R2024b

Recent Improvements

2025-08-04 : Updated calculation of covariance matrix to check and adjust for dividing by zero

2022-02-17 : Added option to use all four quartile groups, use 'CutPoint','QuartileAll'

2020-04-08 : Added logrank test for trend, use 'LogRankTrend',true

Simple Example

The following code loads the data from "Freireich, EJ et al. 1963, Blood, 21, 699-716" and creates a KM plot with a risk table. The time unit is weeks and the x-axis step length is changed to 4. The risk table shows how many subjects are at risk (alive) at each time point. Censored points are marked with a vertical line.

```matlab
[p,fh,stats]=MatSurv([], [], [],'Xstep',4);
```

[Figure: MatSurv example]

Using MatSurv

Installation

Simply put MatSurv.m in any directory of your choice and make sure it is added to your path.

Usage

MatSurv(TimeVar, EventVar, GroupVar,'param', value, ...) creates a Kaplan-Meier plot with a risk table and calculates a log-rank p-value.

[p] = MatSurv( ... ) returns the log-rank p-value

[p, fh] = MatSurv( ... ) returns both p-value and figure handle

[p, fh, stats] = MatSurv( ... ) returns additional statistics from the log-rank test

[p, fh, stats] = MatSurv([], [], [], ... ) loads a test dataset from "Freireich, EJ et al. 1963, Blood, 21, 699-716"
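For readers who want to see what a log-rank p-value is made of, the following is a minimal sketch of the standard two-group Mantel-Cox log-rank test on made-up data. It is an illustration under stated assumptions, not MatSurv's internal code; all variable names are hypothetical. The p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)), so no toolbox is needed.

```matlab
% Toy data, made up for illustration
time  = [6 7 10 15 19 25 1 2 3 4 5 8];
event = [1 1  0  1  0  1 1 1 1 1 1 1];   % 1 = event, 0 = censored
group = [1 1  1  1  1  1 2 2 2 2 2 2];   % two groups labelled 1 and 2

eventTimes = unique(time(event == 1));
O1 = 0;   % observed events in group 1
E1 = 0;   % expected events in group 1 under the null hypothesis
V  = 0;   % variance of (O1 - E1)

for k = 1:numel(eventTimes)
    t  = eventTimes(k);
    n1 = sum(time >= t & group == 1);            % at risk in group 1
    n2 = sum(time >= t & group == 2);            % at risk in group 2
    n  = n1 + n2;
    d  = sum(time == t & event == 1);            % events at time t, both groups
    d1 = sum(time == t & event == 1 & group == 1);
    O1 = O1 + d1;
    E1 = E1 + d * n1 / n;
    if n > 1                                     % avoid dividing by zero
        V = V + d * (n1 / n) * (n2 / n) * (n - d) / (n - 1);
    end
end

chi2 = (O1 - E1)^2 / V;
p    = erfc(sqrt(chi2 / 2));    % chi-square upper tail, 1 degree of freedom
fprintf('chi2 = %.3f, p = %.4f\n', chi2, p);
```

With more than two groups the same idea generalizes to a vector of observed-minus-expected counts and a covariance matrix, which this sketch deliberately leaves out.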

INPUTS:

  • TimeVar is a vector with numeric time to event, either observed or censored. Values equal to or less than zero will be removed by default.

  • EventVar is a vector or cell array defining events or censored observations. Events are defined with a 1 and censored points with a 0. By default 'Dead', 'Deceased', 'Relapsed', 'Yes' are considered as events. 'Alive', 'Living', 'Not Relapsed', 'DiseaseFree', 'No' are considered as censored. 'EventDefinition' can be used to define other types of events.

  • GroupVar is a vector or cell array defining the different groups. If GroupVar is a numeric vector, median-cut will be used as a default.
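As a hedged illustration of the three inputs described above, the sketch below builds a small made-up dataset with text event labels and group names, then calls MatSurv only if it is found on the path. The data values are invented, and the 'NoPlot' option is taken from the list of input options in this README.

```matlab
% Made-up data for illustration only
TimeVar  = [5 8 12 14 20 22 30 31 36 40]';
EventVar = {'Dead','Dead','Alive','Dead','Alive', ...
            'Dead','Alive','Dead','Alive','Alive'}';   % default event labels
GroupVar = {'A','A','A','A','A','B','B','B','B','B'}';

if exist('MatSurv', 'file') == 2
    % 'NoPlot' suppresses the figure so only the statistics are computed
    [p, fh, stats] = MatSurv(TimeVar, EventVar, GroupVar, 'NoPlot', true);
    fprintf('log-rank p = %.3g\n', p);
else
    disp('MatSurv.m was not found on the MATLAB path.');
end
```

The three inputs must have one entry per subject, in the same order.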

OUTPUTS:

  • p : Log-rank p-value

  • fh : Figure handle for KM plot figure

  • stats : Structure with additional statistics in the following fields:

```matlab
struct with fields:

              GroupNames: Cell with group names
                    p_MC: log rank p-value (Mantel-Cox)
                 Chi2_MC: Chi square (Mantel-Cox)
              HR_logrank: Hazard Ratio (log rank)
        HR_95_CI_logrank: 95% Confidence Interval [lower upper]
          HR_logrank_Inv: Inverted Hazard Ratio (log rank)
    HR_95_CI_logrank_Inv: Inverted 95% Confidence Interval [lower upper]
                   HR_MH: Hazard Ratio (Mantel-Haenszel)
             HR_95_CI_MH: 95% Confidence Interval [lower upper]
               HR_MH_Inv: Inverted Hazard Ratio (Mantel-Haenszel)
         HR_95_CI_MH_Inv: Inverted 95% Confidence Interval [lower upper]
      MedianSurvivalTime: Median survival time for each group
```

More Examples

Additional options

Below are some examples for how to create different styles of KM plots and also how one can make changes using the figure handle.

In the example below, we show how we can change some of the properties of the KM plot via various name-value pair arguments.

```matlab
[p,fh,stats]=MatSurv([],[],[],'Xstep',4,...
    'TitleOptions',{'Color','r','Interpreter','none'},'InvHR',1,...
    'Xlim',32,'XMinorTick',3,'LineColor',[0 0 1;1 0 1],'LineStyle',{'-',':'},...
    'LineWidth',3,'CensorLineColor','k','RT_KMplot',1);
```

[Figure: MatSurv example]

Example with multiple groups

This example is taken from the TCGA LAML dataset. The code for obtaining the data from cBioPortal is in the MatSurv/Article/MATLAB/get_laml_RC_data.m script. The samples are divided into three groups based on their Cyto score. It is clear from the KM plot below that these groups have different outcomes.

For this example we will load the data directly.

```matlab
load laml_RC_data.mat

[p,fh,stats]=MatSurv(laml_RC_TimeVar, laml_RC_EventVar, laml_RC_GroupVar,...
    'GroupsToUse', {'Good','Intermediate','Poor'},'Xstep',24);
```

[Figure: Multiple groups MatSurv example]

Example with merging groups

Groups can be merged using a multilevel cell as the 'GroupsToUse' input. This example merges the Poor and N.D. groups. The first element in the nested cell array defines the name of the merged group and can be either the name of an existing group or a new group name.

```matlab
load laml_RC_data.mat

[p,fh,stats]=MatSurv(laml_RC_TimeVar, laml_RC_EventVar, laml_RC_GroupVar,...
    'GroupsToUse', {'Good','Intermediate',{'Poor + N.D.','Poor','N.D.'}},'Xstep',24);
```

[Figure: Multiple merged groups MatSurv example]
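The effect of a nested 'GroupsToUse' entry can be pictured as a simple relabelling of the group variable before the analysis. The sketch below shows that idea on made-up labels; it illustrates the concept only and is not how MatSurv implements merging internally. The names groups, mergeSpec, and merged are hypothetical.

```matlab
% Made-up group labels for illustration
groups = {'Good','Poor','N.D.','Intermediate','Poor','Good'};

% Same layout as a nested GroupsToUse entry:
% first element = new name, remaining elements = groups to merge
mergeSpec = {'Poor + N.D.','Poor','N.D.'};

merged = groups;
toMerge = ismember(groups, mergeSpec(2:end));   % subjects in any merged group
merged(toMerge) = mergeSpec(1);                 % give them the new name

disp(merged);
```

After relabelling, the merged subjects are treated as one group with its own curve and risk-table row.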

Example with gene expression data

This example is also taken from the TCGA LAML dataset, but here we use RNAseq gene expression data for the hepatocyte growth factor (HGF) gene. HGF gene expression has been related to outcome in a variety of cancers, including cancers of the lung, pancreas, thyroid, colon, and breast. The code for obtaining the data from cBioPortal is in the MatSurv/Article/MATLAB/get_laml_HGF_gene_data.m script. The expression level of a gene is continuous, and if no prior knowledge is available, the median is frequently used to divide the samples into two groups (see the first graph below). Using the top 25% and bottom 25% (the outer quartiles) is also common (see the second graph below). Finally, if one or several cut points are known, these can also be used (see the third graph below). For this example we will load the data directly.

```matlab
load laml_HGF_gene_data.mat

% Using median cut
[p,fh,stats]=MatSurv(laml_HGF_gene_TimeVar,laml_HGF_gene_EventVar,HGF_gene,'Xstep',12,'InvHR',1);

% Using quartile
[p,fh,stats]=MatSurv(laml_HGF_gene_TimeVar,laml_HGF_gene_EventVar,HGF_gene,'Xstep',12,'InvHR',1,...
    'CutPoint','quartile');

% Using two cut points
[p,fh,stats]=MatSurv(laml_HGF_gene_TimeVar,laml_HGF_gene_EventVar,HGF_gene,'Xstep',12,'InvHR',1,...
    'CutPoint',[6 12]);
```

Median cut

[Figure: MatSurv example, median cut]

Quartile

[Figure: MatSurv example, quartile cut]

Two cut points

[Figure: MatSurv example, two cut points]
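The grouping step behind these plots can be sketched in a few lines. The example below splits a made-up continuous variable by its median and by two fixed cut points. It is a conceptual sketch: the values are invented, and the handling of samples exactly equal to a cut point (assigned to the lower group here) is an assumption, not a statement about MatSurv's behavior.

```matlab
% Made-up expression values (odd number of samples on purpose)
x = [2.1 7.4 3.3 9.8 5.0 6.2 1.7 8.9 4.4];

% Median cut: with an odd sample count one value equals the median,
% so '<' versus '>' decides which group that sample joins
m        = median(x);
nLowLess = sum(x < m);     % low group size when splitting on x < median
nHighGt  = sum(x > m);     % high group size when splitting on x > median

% Fixed cut points: group index = 1 + number of cut points exceeded
cuts = [4 8];
grp  = ones(size(x));
for k = 1:numel(cuts)
    grp = grp + (x > cuts(k));
end
counts = [sum(grp == 1) sum(grp == 2) sum(grp == 3)];

fprintf('median = %.1f, group sizes for cuts [4 8]: %s\n', m, mat2str(counts));
```

Two cut points produce three groups, which is why the third plot shows three curves.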

Unit Test

A test script for MatSurv can be found in the UnitTest directory.

List of all input options

  • NoPlot: A true/false value which, if true, no figure is created (default: false)

  • NoRiskTable: A true/false value which, if true, no risk table is included in the KM plot. (default: false)

  • CutPoint: Either a string or scalar/vector with cut points to be used for defining groups based on a continuous GroupVar input variable. Allowed names are: 'Median', 'Quartile', 'QuartileAll' or 'Tertile'. If a scalar or vector is given, the groups will be defined based on the cut points. (default: 'Median')

  • GroupsToUse: Cell array defining what groups to use from the GroupVar variable. Groups can be merged using a multilevel cell structure, for example: {{'Group 1+2','Group1','Group2'},'Group3','Group4'}. Group 1 & 2 will be merged and called Group 1+2. (default: all groups are used)

  • GroupOrder: A cell array or vector defining the group order to be used in the legend. The vector needs to have the same number of elements as groups while the cell array does not have that requirement. (default: Groups are sorted by GroupsToUse if defined, else alphabetically)

  • EventDefinition: Two element cell array where the first cell defines the event and the second defines censored values. Example: {'Dead','Alive'}

  • TimeMin: Scalar defining minimum valid time point. Subjects with time values below this will be removed. (default: 0)

  • MinNumSamples: Scalar defining the minimum number of samples for a group. Groups with fewer samples will be removed. (default: 2)

  • TimeMax: Scalar value defining right censoring time. Subjects with TimeVar > TimeMax will be set to TimeMax and considered as censored. (default: [])

  • LogRankTrend: A true/false value for performing a log-rank test for trend; requires equally spaced, ordered groups. (default: false)

  • PairWiseP: A true/false value for calculating pairwise log-rank test between group pairs; useful if there are more than two groups. (default: false)

  • NoWarnings: A true/false value which, if true, no warnings are printed if subjects are removed. (default: false)

  • MedianLess: A true/false value. If true, 'x < median' is used for the median cut; if false, 'x > median' is used instead. This only affects the results when there is an odd number of samples. (default: true)
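The TimeMax rule in the list above (times beyond TimeMax are set to TimeMax and treated as censored) can be sketched on made-up data as follows. This is a sketch of the documented rule, not code copied from MatSurv; the variable names are illustrative.

```matlab
% Made-up follow-up data: time in months, 1 = event, 0 = censored
time    = [3 10 14 25 40];
event   = [1  0  1  1  0];
TimeMax = 24;                 % administrative right-censoring time

late        = time > TimeMax; % subjects followed beyond TimeMax
time(late)  = TimeMax;        % truncate their follow-up time
event(late) = 0;              % and treat them as censored

disp([time; event]);
```

An event that happened after TimeMax (month 25 here) no longer counts as an event, which is the point of restricting follow-up.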

KM plot options

  • legend: Whether to show the group legend. (default: true)

  • LineColor: Either a matrix of size numLevels-by-3 representing the colormap to be used, a string for a MATLAB colormap (lines, parula, cool, prism), one of 'JCO', 'nejm', 'Lancet', 'Science', 'Nature', 'lines' for a set of journal-dependent palettes, or the custom default 'aeb01'. (default: 'aeb01')

  • FlipGroupOrder: Flips the order of the groups in the legend. (default: false)

  • FlipColorOrder: Flips the color order of the groups. (default: false)

  • KM_position: Vector defining the KM axes for the KM plot. (default: [0.3 0.4 0.68 0.45])

  • RT_position: Vector defining the risk table axes for the KM plot. (default: [0.3 0.05 0.68 0.20])

  • TimeUnit: String defining the time unit displayed on the x-axis. (default: 'Months')

  • BaseFontSize: Base font size for all text in the plot. (default: 16)

  • DispP: A true/false value which, if true, the log-rank test p-value is displayed on the KM plot. (default: true)

  • DispHR: A true/false value which, if true, the HR is displayed on the KM plot. (default: true)

  • Use_HR_MH: A true/false value which, if true, Mantel-Haenszel HR is displayed instead of the logrank HR. (default: true)

  • InvHR: A true/false value which, if true, the inverted HR value is displayed on the KM plot. (default: false)

  • DrawMSL: A true/false value which, if true, a line for the median survival time is drawn in the KM-plot. (default: false)

  • XLim: Vector defining the x-limit. Does not affect the log-rank test. (default: automatic)

  • LineWidth: Scalar defining the line width used in the KM plot. (Default: 2)

  • LineStyle: Cell array defining the line style for the KM plot. If an array is given, each group will have a different line style, for example 'LineStyle',{'-','--',':','-.'} (Default: {'-'})

  • CensorLineWidth: Scalar defining the line width of the censored ticks. (default: 2)

  • CensorLineLength: Scalar defining the length of the censored ticks. (Default: 0.02)

  • CensorLineColor: Text string defining color of censor ticks. 'same' gives the same colors as the lines while 'k' makes them black (Default: 'same')

  • Xstep: Scalar defining the x-tick step length. (default: automatic)

  • XTicks: Vector defining the position of the x-tick marks. (Default: automatic)

  • XMinorTick: Scalar defining the number of minor ticks between major x-ticks. (Default: 1)

  • Xlabel: Text string for x-label (Default: 'Time (Months)')

  • XlabelOptions: MATLAB Name-value pair arguments for x-label. (Default: '')

  • XLabelFontSize: Scalar describing x-label font size change compared to base font size. (Default: 0)

  • XTickFontSize: Scalar describing x-tick font size change compared to base font size. (Default: -2)

  • YLim: Vector defining the range of the Y-axis (Default: [0 1])

  • YTicks: Vector defining the position of the y-tick marks. (Default: [0:0.2:1])

  • YMinorTick: Scalar defining the number of minor ticks between major y-ticks. (Default: 1)

  • Ylabel: Text string for y-label. (Default: 'Survival Probability' )

  • YlabelOptions: MATLAB name-value pair arguments for y-label. (Default: '')

  • YLabelFontSize: Scalar describing y-label font size change compared to base font size. (Default: 0)

  • YTickFontSize: Scalar describing y-tick font size change compared to base font size. (Default: -2)

  • Title: Text string for title. (Default:'')

  • TitleOptions: MATLAB name-value pair arguments for title. (Default:'')

  • LegendFontSize: Scalar describing legend font size change compared to base font size. (Default: -2)

  • PvalFontSize: Scalar describing p-value font size change compared to base font size. (Default: 0)

Risk table plot options

  • RT_KMplot: A true/false value which, if true, the risk table is placed as a part of the KM-plot. (default: False)

  • RT_XAxis: A true/false value which, if true, a X-axis line is included in the risk table. (default: True)

  • RT_FontSize: Scalar describing risk table font size change compared to base font size. (Default: 0)

  • RT_Color: Text string defining color of risk table text. 'same' gives the same colors as the groups in the KM plot while 'k' would make them black. (Default: 'same')

  • RT_Title: Text string for risk table title. (Default: '' )

  • RT_TitleOptions: MATLAB name-value pair arguments for risk table title. (Default: '')

  • RT_YLabel: A true/false value for displaying the group names on the risk table y axis. (Default: true )

Owner

  • Login: aebergl
  • Kind: user
  • Location: Tampa, USA
  • Company: Moffitt

JOSS Publication

MatSurv: Survival analysis and visualization in MATLAB
Published
February 13, 2020
Volume 5, Issue 46, Page 1830
Authors
Jordan H. Creed
Moffitt Cancer Center, Tampa, Florida, United States
Travis A. Gerke
Moffitt Cancer Center, Tampa, Florida, United States
Anders E. Berglund
Moffitt Cancer Center, Tampa, Florida, United States
Editor
Christopher R. Madan ORCID
Tags
MATLAB survival analysis

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 214
  • Total Committers: 6
  • Avg Commits per committer: 35.667
  • Development Distribution Score (DDS): 0.29
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
aebergl a****l@g****m 152
jhcreed J****d@m****g 54
Travis Gerke t****e@h****u 5
zmi z****k@g****m 1
pjl54 p****4@c****u 1
Patrick Leo p****4@a****u 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 8
  • Total pull requests: 5
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 6
  • Total pull request authors: 4
  • Average comments per issue: 1.25
  • Average comments per pull request: 0.2
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ManuelaS (2)
  • ashrafinia (2)
  • bbjiang (1)
  • zhenweishi (1)
  • malkhodari (1)
  • iahncajigas (1)
Pull Request Authors
  • zmiimz (2)
  • pjl54 (1)
  • tgerke (1)
Top Labels
Issue Labels
bug (1) enhancement (1) question (1)