| Type | ID | mlr3 ID | Label |
|---|---|---|---|
| Discrimination | harrell_c | surv.cindex | Harrell's C |
| Discrimination | uno_c | surv.cindex | Uno's C |
| Scoring Rule | isll | surv.isll | Integrated Survival Log-Likelihood (ISLL) |
| Scoring Rule | isll_erv | surv.isll | Integrated Survival Log-Likelihood (ISLL) [ERV] |
| Scoring Rule | isbs | surv.brier | Integrated Survival Brier Score (ISBS) |
| Scoring Rule | isbs_erv | surv.brier | Integrated Survival Brier Score (ISBS) [ERV] |
| Calibration | dcalib | surv.dcalib | D-Calibration |
| Calibration | alpha_calib | surv.calib_alpha | Van Houwelingen's Alpha |

In the mlr3 ecosystem, it is more common to speak of tasks and learners than of datasets and models. The terms are not strictly interchangeable, but we unfortunately use them inconsistently.
Measures
The following table provides a brief overview of the measures used in this benchmark. Unfortunately, the identifiers used under the hood do not directly correspond to measure names and abbreviations as used in the paper.
- “ID” refers to the shorthand used in the result files listed above.
- “mlr3 ID” refers to the measure as it is implemented in mlr3proba.
- “Label” refers to the measure as it is named consistently throughout the paper and resulting plots.
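Because the result-file identifiers differ from the labels used in the paper, a small lookup table can help when relabeling results. The following is a minimal sketch in base R that transcribes the measures table; the object `measure_info` and the helper `label_for()` are hypothetical names chosen for illustration and are not part of the benchmark code.

```r
# Lookup table transcribed from the measures overview.
# `measure_info` and `label_for()` are hypothetical, for illustration only.
measure_info <- data.frame(
  id = c("harrell_c", "uno_c", "isll", "isll_erv",
         "isbs", "isbs_erv", "dcalib", "alpha_calib"),
  mlr3_id = c("surv.cindex", "surv.cindex", "surv.isll", "surv.isll",
              "surv.brier", "surv.brier", "surv.dcalib", "surv.calib_alpha"),
  label = c("Harrell's C", "Uno's C",
            "Integrated Survival Log-Likelihood (ISLL)",
            "Integrated Survival Log-Likelihood (ISLL) [ERV]",
            "Integrated Survival Brier Score (ISBS)",
            "Integrated Survival Brier Score (ISBS) [ERV]",
            "D-Calibration", "Van Houwelingen's Alpha"),
  stringsAsFactors = FALSE
)

# Translate result-file IDs into paper labels; unknown IDs yield NA.
label_for <- function(ids) {
  measure_info$label[match(ids, measure_info$id)]
}

label_for(c("isbs", "uno_c"))
```

Note that the mlr3 ID alone is not a unique key: several rows share the same mlr3 ID, so lookups should go through the ID column.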
Integrated measures are always integrated up to the conventional 80% quantile.
Tasks (Datasets)
The following table gives a summary of the included tasks (datasets) in the benchmark.
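Violation of the proportional hazards (PH) assumption is derived from the p-value of the global significance test conducted with cox.zph() at the 5% level. Because the test's null hypothesis is that the PH assumption holds, a small p-value indicates a violation (i.e., ph_violated = p.value < 0.05).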
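As an illustration of how such a flag can be derived, here is a minimal sketch using the survival package. The `lung` dataset and the chosen covariates are assumptions made for the example and are not necessarily part of the benchmark; the null hypothesis of cox.zph() is that hazards are proportional, so the flag is set when the global p-value falls below 0.05.

```r
library(survival)

# Example data: `lung` ships with the survival package (an assumption for
# illustration; not necessarily one of the benchmark tasks).
fit <- coxph(Surv(time, status) ~ age + sex + ph.karno, data = lung)

# cox.zph() tests the null hypothesis that hazards are proportional.
zph <- cox.zph(fit)

# The "GLOBAL" row of the result table holds the global test.
p_global <- zph$table["GLOBAL", "p"]

# A small p-value is evidence against the PH assumption.
ph_violated <- p_global < 0.05
ph_violated
```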
Learners (Models)
This table shows the learners (models) used in the benchmark with their mlr3 IDs and additional metadata.
- “Parameters” is 0 for learners such as KM, NEL, and CPH, which do not have any hyperparameters. It is also 0 for CoxBoost (CoxB), which uses its own tuning method.
- “Survival Prediction” indicates whether the learner provides a survival probability prediction (a distr object), which allows evaluation with measures like the ISBS.
- “Internal CV” indicates learners that perform cross-validation internally, e.g., GLMN (via cv.glmnet) and CoxB (via optimCoxBoostPenalty()).
- “Exhaustive Search” indicates whether the tuning space was small enough to perform an exhaustive grid search with fewer than 50 evaluations.
- “Scale” indicates whether features are scaled to zero mean and unit variance as part of the pre-processing pipeline before the learner sees the data.
- “Encode” indicates whether treatment encoding of categorical features is performed as part of the same pre-processing pipeline.
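
To make the “Scale” and “Encode” columns concrete, the following base R sketch shows what the two transformations do to a feature. It uses the base functions scale() and model.matrix() as stand-ins and a hypothetical toy data frame; the benchmark's actual pre-processing pipeline is not shown here and may be implemented differently.

```r
# Toy data (hypothetical) with one numeric and one categorical feature.
toy <- data.frame(
  age = c(50, 60, 70, 80),
  stage = factor(c("I", "II", "III", "II"))
)

# "Scale": center to mean 0 and scale to unit variance.
age_scaled <- as.numeric(scale(toy$age))

# "Encode": treatment (dummy) encoding with the first level as reference.
# model.matrix() applies treatment contrasts to unordered factors by default;
# the intercept column is dropped afterwards.
stage_encoded <- model.matrix(~ stage, data = toy)[, -1, drop = FALSE]

colnames(stage_encoded)  # "stageII" "stageIII"
```

With treatment encoding, a factor with k levels yields k - 1 indicator columns, and the reference level ("I" in this toy example) is represented by all zeros.
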
| Learner | mlr3 ID | Survival Prediction | Parameters | Internal CV | Exhaustive Search | Scale | Encode |
|---|---|---|---|---|---|---|---|
| KM | surv.kaplan | ✅ | 0 | --- | --- | --- | --- |
| NEL | surv.nelson | ✅ | 0 | --- | --- | --- | --- |
| AK | surv.akritas | ✅ | 1 | --- | --- | --- | --- |
| CPH | surv.coxph | ✅ | 0 | --- | --- | --- | --- |
| GAM | surv.gam.cox | ✅ | 0 | --- | --- | --- | --- |
| GLMN | surv.cv_glmnet | ✅ | 1 | ✅ | --- | --- | ✅ |
| Pen | surv.penalized | ✅ | 2 | --- | --- | --- | --- |
| NCV | surv.cv_ncvsurv | ✅ | 1 | ✅ | --- | --- | ✅ |
| AFT | surv.parametric | ✅ | 1 | --- | ✅ | --- | --- |
| Flex | surv.flexible | ✅ | 1 | --- | ✅ | --- | --- |
| RFSRC | surv.rfsrc | ✅ | 5 | --- | --- | --- | --- |
| RAN | surv.ranger | ✅ | 5 | --- | --- | --- | --- |
| CIF | surv.cforest | ✅ | 5 | --- | --- | --- | --- |
| ORSF | surv.aorsf | ✅ | 2 | --- | --- | --- | --- |
| RRT | surv.rpart | --- | 1 | --- | ✅ | --- | --- |
| MBSTCox | surv.mboost | ✅ | 4 | --- | --- | --- | --- |
| MBSTAFT | surv.mboost | --- | 4 | --- | --- | --- | --- |
| CoxB | surv.cv_coxboost | ✅ | 0 | ✅ | --- | --- | ✅ |
| XGBCox | surv.xgboost.cox | ✅ | 5 | --- | --- | --- | ✅ |
| XGBAFT | surv.xgboost.aft | --- | 7 | --- | --- | --- | ✅ |
| SSVM | surv.svm | --- | 4 | --- | --- | ✅ | ✅ |