Measures

The following table gives a brief overview of the measures used in this benchmark. Note that the identifiers used internally do not correspond directly to the measure names and abbreviations used in the paper.

  • ID refers to the shorthand used in the result files listed above.
  • mlr3 ID refers to the measure as it is implemented in mlr3proba.
  • Label refers to the measure as it is named consistently throughout the paper and resulting plots.

ID            mlr3 ID            Label
harrell_c     surv.cindex        Harrell’s C
uno_c         surv.cindex        Uno’s C
isll          surv.isll          Integrated Survival Log-Likelihood (ISLL)
isll_erv      surv.isll          Integrated Survival Log-Likelihood (ISLL) [ERV]
isbs          surv.brier         Integrated Survival Brier Score (ISBS)
isbs_erv      surv.brier         Integrated Survival Brier Score (ISBS) [ERV]
dcalib        surv.dcalib        D-Calibration
alpha_calib   surv.calib_alpha   Van Houwelingen’s Alpha
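
Because the three naming schemes differ, a small lookup table can help when post-processing result files. The following is a minimal sketch in base R, transcribed from the table above; the `measures` data frame and the `measure_label()` helper are hypothetical names introduced for illustration and are not part of the benchmark code.

```r
# Lookup table transcribed from the measures table (ASCII apostrophes used).
measures <- data.frame(
  id = c("harrell_c", "uno_c", "isll", "isll_erv",
         "isbs", "isbs_erv", "dcalib", "alpha_calib"),
  mlr3_id = c("surv.cindex", "surv.cindex", "surv.isll", "surv.isll",
              "surv.brier", "surv.brier", "surv.dcalib", "surv.calib_alpha"),
  label = c("Harrell's C", "Uno's C",
            "Integrated Survival Log-Likelihood (ISLL)",
            "Integrated Survival Log-Likelihood (ISLL) [ERV]",
            "Integrated Survival Brier Score (ISBS)",
            "Integrated Survival Brier Score (ISBS) [ERV]",
            "D-Calibration", "Van Houwelingen's Alpha"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate result-file IDs into the labels used in the paper.
measure_label <- function(ids) {
  idx <- match(ids, measures$id)
  if (anyNA(idx)) {
    stop("Unknown measure ID: ", paste(ids[is.na(idx)], collapse = ", "))
  }
  measures$label[idx]
}

measure_label(c("isbs_erv", "uno_c"))
```

Several IDs share one mlr3 ID (for example, `harrell_c` and `uno_c` both map to `surv.cindex`), so the mapping only works in the direction from ID to mlr3 ID, not the reverse.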

Tasks

The following table gives a summary of the included datasets (tasks) in the benchmark.

# tasks = load_task_data()
tasktab = load_tasktab()
ℹ Loading '/home/burk/projects/paper_2023_survival_benchmark/tables/tasktab.csv'
tasktab |>
  dplyr::select(task_id, n, p, events, censprop, n_uniq_t) |>
  dplyr::arrange(dplyr::desc(n)) |>
  dplyr::mutate(
    censprop = round(100 * censprop, 1),  # proportion -> percentage
    repeats = assign_repeats(events)      # number of CV repeats, derived from the event count
  ) |>
  setNames(c("Dataset", "N", "p", "Events", "Censoring %", "# Unique Time Points", "# CV Repeats")) |>
  reactable::reactable(sortable = TRUE, filterable = TRUE, pagination = FALSE)
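
To make the summary columns concrete, here is a minimal sketch of how they could be derived from a raw survival dataset. It rests on several assumptions: the toy data are invented, the column names `time` and `status` are placeholders, `summarise_task()` is a hypothetical helper, and the definitions used (censoring proportion as one minus the event fraction, unique time points counted over all observed times) may differ from how `tasktab.csv` was actually built.

```r
# Toy data, invented for illustration: follow-up time and event indicator
# (status = 1 means the event was observed, 0 means censored).
toy <- data.frame(
  time   = c(5, 8, 8, 12, 20, 20, 31, 40),
  status = c(1, 0, 1, 1, 0, 1, 0, 0),
  age    = c(61, 54, 70, 66, 48, 59, 73, 50),
  sex    = factor(c("f", "m", "m", "f", "f", "m", "f", "m"))
)

# Hypothetical helper computing the columns shown in the task table.
summarise_task <- function(data, time = "time", status = "status") {
  n <- nrow(data)
  events <- sum(data[[status]] == 1)
  data.frame(
    n        = n,
    p        = ncol(data) - 2,                  # features, excluding time and status
    events   = events,
    censprop = 1 - events / n,                  # assumed definition of censoring proportion
    n_uniq_t = length(unique(data[[time]]))     # assumed: all observed times, not only event times
  )
}

summarise_task(toy)
```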

Learners

This table shows the models (learners) used in the benchmark with their mlr3 IDs and additional metadata.

  • “Parameters” is the number of tuned hyperparameters. It is 0 for learners such as KM, NEL, and CPH, which do not have any hyperparameters. It is also 0 for CoxB (CoxBoost), which uses its own tuning method.
  • “Survival Prediction” indicates whether the learner provides a survival probability prediction (a distr object), which allows evaluation with measures like the ISBS.
  • “Internal CV” indicates learners that perform cross-validation internally, i.e., GLMN (via cv.glmnet) and CoxB (via optimCoxBoostPenalty()).
  • “Exhaustive Search” indicates whether the tuning space was small enough to perform exhaustive grid search with fewer than 50 evaluations.
  • “Scale” indicates whether features are scaled to zero mean and unit variance as part of the pre-processing pipeline.
  • “Encode” indicates whether treatment encoding is performed as part of the pre-processing pipeline before the learner sees the data.
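
The “Scale” and “Encode” steps described above can be illustrated with base R. This is a minimal sketch on invented toy data, not the benchmark's actual pipeline; in an mlr3 setting these steps would presumably be expressed with mlr3pipelines operators such as `po("scale")` and `po("encode", method = "treatment")`, which is an assumption about the implementation rather than something stated here.

```r
# Toy feature data, invented for illustration.
toy <- data.frame(
  age   = c(61, 54, 70, 66, 48, 59),
  bmi   = c(22.5, 27.1, 30.4, 24.8, 26.0, 21.9),
  stage = factor(c("I", "II", "III", "II", "I", "III"))
)

# "Scale": centre each numeric feature to mean 0 and rescale to unit variance.
num_cols <- vapply(toy, is.numeric, logical(1))
scaled <- toy
scaled[num_cols] <- lapply(toy[num_cols], function(x) as.numeric(scale(x)))

# "Encode": treatment encoding turns a k-level factor into k - 1 dummy columns;
# the first level ("I") acts as the reference category.
encoded <- model.matrix(~ ., data = scaled)[, -1, drop = FALSE]  # drop the intercept column

colnames(encoded)
```

With treatment encoding, the three-level factor `stage` becomes two indicator columns, and an observation in the reference level has zeros in both.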
lrntab = load_lrntab()
ℹ Loading '/home/burk/projects/paper_2023_survival_benchmark/tables/learners.csv'
lrntab |>
  dplyr::select(id, base_lrn, surv_pred, params, internal_cv, grid, scale, encode) |>
  dplyr::mutate(dplyr::across(dplyr::where(is.logical), \(x) ifelse(x, "\u2705", ""))) |>
  kableExtra::kbl(
    align = "llcccccc",  # one alignment letter per column (8 columns)
    caption = "Learner IDs in benchmark with associated mlr3 identifiers",
    col.names = c(
      "Learner",
      "mlr3 ID",
      "Survival Prediction",
      "Parameters",
      "Internal CV",
      "Exhaustive Search",
      "Scale",
      "Encode"
    )
  ) |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::column_spec(2, width = "5%") |>
  kableExtra::column_spec(3, width = "5%") |>
  kableExtra::column_spec(4, width = "20%") |>
  kableExtra::column_spec(7, width = "25%")
Learner IDs in benchmark with associated mlr3 identifiers

Learner   mlr3 ID            Parameters
KM        surv.kaplan        0
NEL       surv.nelson        0
AK        surv.akritas       1
CPH       surv.coxph         0
GLMN      surv.cv_glmnet     1
Pen       surv.penalized     2
AFT       surv.parametric    1
Flex      surv.flexible      1
RFSRC     surv.rfsrc         5
RAN       surv.ranger        5
CIF       surv.cforest       5
ORSF      surv.aorsf         2
RRT       surv.rpart         1
MBSTCox   surv.mboost        4
MBSTAFT   surv.mboost        4
CoxB      surv.cv_coxboost   0
XGBCox    surv.xgboost.cox   5
XGBAFT    surv.xgboost.aft   7
SSVM      surv.svm           4
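
The “Exhaustive Search” criterion from the list above (a tuning space small enough to be fully evaluated in fewer than 50 evaluations) can be sketched as follows. The parameter names and values are invented for illustration and do not correspond to any learner's actual search space.

```r
# Hypothetical discrete tuning space; names and values are placeholders.
grid <- expand.grid(
  param_a = 1:5,
  param_b = c(5, 10, 20, 50),
  stringsAsFactors = FALSE
)

max_evals <- 50  # evaluation threshold mentioned in the text

# Exhaustive grid search is feasible if every configuration fits under the threshold.
use_grid_search <- nrow(grid) < max_evals

nrow(grid)
use_grid_search
```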