In the mlr3 ecosystem, it is more common to refer to tasks and learners than to datasets and models. There are real differences between these terms, which we unfortunately use inconsistently.

Measures

The following table provides a brief overview of the measures used in this benchmark. Unfortunately, the identifiers used under the hood do not directly correspond to measure names and abbreviations as used in the paper.

  • ID refers to the shorthand used in the result files listed above.
  • mlr3 ID refers to the measure as it is implemented in mlr3proba.
  • Label refers to the measure as it is named consistently throughout the paper and resulting plots.
Type            ID           mlr3 ID           Label
Discrimination  harrell_c    surv.cindex       Harrell's C
Discrimination  uno_c        surv.cindex       Uno's C
Scoring Rule    isll         surv.isll         Integrated Survival Log-Likelihood (ISLL)
Scoring Rule    isll_erv     surv.isll         Integrated Survival Log-Likelihood (ISLL) [ERV]
Scoring Rule    isbs         surv.brier        Integrated Survival Brier Score (ISBS)
Scoring Rule    isbs_erv     surv.brier        Integrated Survival Brier Score (ISBS) [ERV]
Calibration     dcalib       surv.dcalib       D-Calibration
Calibration     alpha_calib  surv.calib_alpha  Van Houwelingen's Alpha

Integrated measures are always integrated up to the conventional 80% quantile.
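Here is a minimal sketch of how such a cutoff could be computed. It assumes that "80% quantile" refers to all observed times (events and censorings) in the evaluation data, which this section does not specify, and it uses the lung data only as a placeholder.

```r
library(survival)

# Assumption: the cutoff is the 80% quantile of all observed times
# (events and censorings) in the evaluation data; lung is a placeholder
t_max <- quantile(lung$time, probs = 0.8, names = FALSE)

# Time grid over which an integrated score would be evaluated,
# truncated at the cutoff
eval_times <- sort(unique(lung$time[lung$time <= t_max]))

t_max
range(eval_times)
```

Truncating at such a quantile is a common way to avoid the tail of the follow-up period, where few subjects remain at risk and the estimated survival and censoring distributions become unstable.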

Tasks (Datasets)

The following table gives a summary of the included tasks (datasets) in the benchmark.

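Violation of the proportional hazards (PH) assumption is derived from the p-value of the global test conducted with cox.zph() at the 5% significance level. Since the null hypothesis of this test is that the PH assumption holds, a small p-value indicates a violation (i.e., ph_violated = p.value < 0.05).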
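The PH check can be sketched with the survival package as follows. The lung data and the two covariates are placeholders. We assume a Cox model with the task's features as covariates, which this section does not spell out.

```r
library(survival)

# Placeholder model: Cox PH fit on the lung data (Surv() accepts 1/2-coded status)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Score tests for PH per covariate plus a global test in the "GLOBAL" row
zph <- cox.zph(fit)
p_global <- zph$table["GLOBAL", "p"]

# Null hypothesis: PH holds, so a small p-value indicates a violation
ph_violated <- p_global < 0.05
ph_violated
```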

Learners (Models)

This table shows the learners (models) used in the benchmark with their mlr3 IDs and additional metadata.

  • “Parameters” is the number of tuned hyperparameters. It is 0 for learners such as KM, NEL, and CPH, which do not have any hyperparameters, and also for CoxB (CoxBoost), which uses its own tuning method.
  • “Survival Prediction” indicates whether the learner provides a survival probability prediction (a distr object), which allows evaluation with measures like the ISBS.
  • “Internal CV” indicates whether the learner internally performs cross-validation (CV) for tuning, i.e., GLMN (via cv.glmnet()) and CoxB (via optimCoxBoostPenalty()).
  • “Exhaustive Search” indicates whether the tuning space was small enough to perform an exhaustive grid search with fewer than 50 evaluations.
  • “Scale” indicates whether features are scaled to zero mean and unit variance as part of the pre-processing pipeline.
  • “Encode” indicates whether treatment encoding is performed as part of the pre-processing pipeline before the learner sees the data.
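The following base R sketch shows what the “Scale” and “Encode” steps do. It applies treatment encoding (k - 1 dummy columns, with the first factor level as reference), then centering and scaling, to a small hypothetical data frame. It illustrates only the two transformations. The benchmark applies them within its mlr3 pre-processing pipeline, and the order used here (encode, then scale) is an assumption.

```r
# Hypothetical toy data: one numeric and one factor feature
df <- data.frame(
  age   = c(50, 60, 70, 80),
  stage = factor(c("I", "II", "III", "II"))
)

# Treatment encoding: k - 1 dummy columns, first level ("I") is the reference;
# model.matrix() uses treatment contrasts for unordered factors by default
X <- model.matrix(~ age + stage, data = df)[, -1]  # drop the intercept column

# Scaling: center each column to mean 0 and scale it to unit variance
X_scaled <- scale(X)

colnames(X)  # "age" "stageII" "stageIII"
```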
Learner IDs in benchmark with associated mlr3 identifiers
Learner mlr3 ID Survival Prediction Parameters Internal CV Exhaustive Search Scale Encode
KM surv.kaplan 0 --- --- --- ---
NEL surv.nelson 0 --- --- --- ---
AK surv.akritas 1 --- --- --- ---
CPH surv.coxph 0 --- --- --- ---
GAM surv.gam.cox 0 --- --- --- ---
GLMN surv.cv_glmnet 1 --- ---
Pen surv.penalized 2 --- --- --- ---
NCV surv.cv_ncvsurv 1 --- ---
AFT surv.parametric 1 --- --- ---
Flex surv.flexible 1 --- --- ---
RFSRC surv.rfsrc 5 --- --- --- ---
RAN surv.ranger 5 --- --- --- ---
CIF surv.cforest 5 --- --- --- ---
ORSF surv.aorsf 2 --- --- --- ---
RRT surv.rpart --- 1 --- --- ---
MBSTCox surv.mboost 4 --- --- --- ---
MBSTAFT surv.mboost --- 4 --- --- --- ---
CoxB surv.cv_coxboost 0 --- ---
XGBCox surv.xgboost.cox 5 --- --- ---
XGBAFT surv.xgboost.aft --- 7 --- --- ---
SSVM surv.svm --- 4 --- ---
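The “Exhaustive Search” criterion amounts to a simple check on the size of a fully discrete tuning space. The helper name can_search_exhaustively() and the example parameters below are hypothetical. They illustrate the fewer-than-50-evaluations rule, not the benchmark's actual tuning setup.

```r
# Hypothetical helper: TRUE if the full Cartesian grid of a discrete tuning
# space needs fewer than `max_evals` evaluations (50 in the benchmark)
can_search_exhaustively <- function(space, max_evals = 50) {
  n_evals <- prod(lengths(space))  # number of grid points
  n_evals < max_evals
}

# Hypothetical discrete tuning spaces (parameter names are illustrative)
small_space <- list(a = c("x", "y", "z"), b = 1:5)  # 3 * 5 = 15 points
large_space <- list(a = 1:10, b = 1:10)              # 10 * 10 = 100 points

can_search_exhaustively(small_space)  # TRUE
can_search_exhaustively(large_space)  # FALSE

# The grid itself can be enumerated with expand.grid()
grid <- expand.grid(small_space, stringsAsFactors = FALSE)
nrow(grid)  # 15
```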