Divya Dennis1, Preethi Sara George1, Roshni S2, Aleyamma Mathew1, Rosewin Mariya Joy1, Jagathnath Krishna K M1*
1Division of Cancer Epidemiology & Biostatistics, Regional Cancer Centre, Thiruvananthapuram, India
2Department of Radiation Oncology, Regional Cancer Centre, Thiruvananthapuram, India
*Corresponding Author Jagathnath Krishna K M, Division of Cancer Epidemiology & Biostatistics, Regional Cancer Centre, Thiruvananthapuram, India.
Received: July 28, 2022
Accepted: September 28, 2022
Published: September 28, 2022
Citation: Divya Dennis, Preethi Sara George, Roshni S, Aleyamma Mathew, Rosewin Mariya Joy and Jagathnath Krishna K M. (2022) “Modelling Unobserved Random Heterogeneity in Time-To-Event Cancer Survival Data”, J Oncology and Cancer Screening, 4(3); DOI: http://doi.org/009.2022/1.1058.
Copyright: © 2022 Jagathnath Krishna K M. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background: In time-to-event cancer survival studies, unobserved random heterogeneity, known as frailty, plays a vital role. Different probability distributions are used to model this hidden heterogeneity, so model selection is crucial for identifying the risk for survival. The present study aimed to identify the best fitted frailty model for estimating the risk for colo-rectal cancer (CRC) survival and to compare it with the Cox proportional hazards model (CPHM).
Materials & Methods:
We considered 390 CRC patients with the covariates age, sex, stage at diagnosis, comorbidity and recurrence status. Risk for survival was assessed using frailty and CPH models. The Akaike information criterion (AIC) was used to identify the best fitted frailty distribution. The predictability of the models was assessed using the survival concordance measure (C-index). Sub-sampling with computer-generated random samples was done to assess the consistency of the frailty model selection.
Results:
The log-normal frailty model was identified as the best fitted model for the random heterogeneity. The predictability measure was higher for the log-normal frailty model (C-index: 0.795 to 0.946) than for CPHM (C-index: 0.495 to 0.726). Stage and recurrence status were the significant predictors, with similar hazard ratios in both models. Model selection using the sub-sampling approach showed the log-normal frailty model to be consistent.
Conclusion: The log-normal frailty model was identified as the best predictive and consistent model for incorporating the effect of random heterogeneity in cancer survival.
Introduction:
In time-to-event analysis, unobserved random heterogeneity plays a significant role in predicting the risk for survival. The frailty model (Clayton 1978; Vaupel et al. 1979) extends the Cox proportional hazards model (CPHM) (Cox 1972) to accommodate this random heterogeneity, and different parametric distributions (gamma, log-normal, inverse Gaussian, positive stable and compound Poisson) are used for the frailty. The most popular frailty distribution in time-to-event analysis is the gamma distribution, in which the relative heterogeneity remains constant (Hougaard, 1986). The log-normal frailty model, however, is useful for random effects or mixed models (McGilchrist and Aisbett, 1991). Such a model extends to more than two levels and to more general random-intercept and random-slope models. These kinds of models have been little explored in the literature.
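For orientation, a shared frailty model is commonly written in the form below. The notation is an assumption made for illustration and is not taken from the study: i indexes a cluster (or an individual, in the unshared case), j a subject within it, h_0 is the baseline hazard, x_ij the covariate vector and z_i the unobserved frailty. Setting z_i = 1 for every i recovers the CPHM.

```latex
% Conditional hazard given the frailty z_i (illustrative notation)
h_{ij}(t \mid z_i) = z_i \, h_0(t) \, \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_{ij}\right),
\qquad z_i > 0 .

% Log-normal frailty: the log-frailty is a normal random effect
z_i = \exp(w_i), \qquad w_i \sim \mathcal{N}(0, \sigma^{2}) .

% Gamma frailty: usually scaled to mean 1 with variance \theta
z_i \sim \mathrm{Gamma}\!\left(\tfrac{1}{\theta}, \tfrac{1}{\theta}\right),
\qquad \mathbb{E}[z_i] = 1, \quad \operatorname{Var}(z_i) = \theta .
```

In this form the log-normal model places a normally distributed random intercept on the log-hazard scale, which is why it fits naturally into the random-effects and mixed-model framework mentioned above.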
Further, Jiang et al. (2020) studied the impact of misspecification of the frailty distribution on the prediction of shared frailties under different estimation methods, such as the expectation-maximization (EM) algorithm, penalized partial likelihood and penalized likelihood, considering various frailty distributions such as gamma, compound Poisson, inverse Gaussian, log-normal, positive stable and power variance function (PVF) distributions. Also, multivariate failure time distributions have been derived from shared frailty and copulas (Wang et al. 2021; Marshall and Olkin 1988). Gamma frailty models have been used for identifying the risk factors for cancer survival (Krishna et al. 2021). Camilleri et al. (2022) presented the gamma and inverse Gaussian frailty models because of their flexibility in modeling heterogeneity. They also considered shared and unshared frailty models and identified the best fitted model using the AIC. They found that the unshared gamma and inverse Gaussian models fitted better. Other studies have assessed the consistency of best fitted models using simulated data (Austin 2012; Kim 2019).
As different distributions exist to model frailty, identifying the best model is crucial in estimating the risk. The present study aimed to identify the best frailty model based on the AIC, to compare the concordance of the best fitted frailty model with that of the traditionally used Cox model, and to assess the consistency of frailty model selection using sub-populations. All the models were illustrated using data on colo-rectal cancer patients with a 5-year follow-up, registered in the Regional Cancer Centre, Thiruvananthapuram.
Materials:
A total of 390 colo-rectal cancer cases were considered for the present study. Of these, 217 patients died during the 5-year period and the remaining patients were treated as censored observations. The covariates considered in this study were age (≤ 50 years, > 50 years), sex (male, female), any comorbidity (Yes, No), stage at diagnosis (stage I, II, III & IV), metastasis (Yes, No), lymph node involvement (Yes, No) and recurrence status (Yes, No). The follow-up times were recorded in months.
Methods:
To model the random heterogeneity, we used the gamma, log-normal, inverse Gaussian and power variance function (PVF) distributions for the frailty variable. The Akaike information criterion (AIC) was used to identify the best fitted model. The frailty model was then compared with the Cox proportional hazards model using the survival concordance measure, the C-index (Antolini, Boracchi and Biganzoli, 2005). Sub-populations of the given data were chosen at random using computer-generated random numbers to assess the consistency of frailty model selection. Sub-populations of sample size 150 and 250 were considered. The data analysis was done using R software, and the packages used were “survival”, “parfm”, “frailtySurv”, “dplyr” and “finalfit”.
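The AIC-based selection step can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the helper names `aic` and `best_by_aic` are hypothetical, and the scores are the composite-stage values reported in Table 1.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = 2k - 2 log L; a lower value indicates a better fit/complexity trade-off."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(aic_scores: dict) -> str:
    """Return the name of the model with the smallest AIC."""
    return min(aic_scores, key=aic_scores.get)


# AIC scores for the composite-stage models, copied from Table 1.
composite_stage = {
    "Gamma": 2121.02,
    "Inverse Gaussian": 2121.502,
    "Log-normal": 2120.878,
    "PVF": 2122.878,
}

best = best_by_aic(composite_stage)
print(best)
```

The differences between the candidate distributions are small for most covariates, so in practice the ranking would be read together with the concordance results rather than on its own.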
Results:
The log-normal distribution was identified as the best fitted frailty model and was used for further analysis (Table 1). The risk for survival among the colo-rectal cancer patients based on individual factors is given in Table 2. In the univariable analysis, the covariates stage, metastasis and recurrence were significantly associated with survival (p-value < 0.05) under both the CPH model and the log-normal frailty model with a random effect. The concordance of the log-normal frailty model was higher than that of the CPH model for all the variables (Table 2). Both the Cox model and the frailty model showed that patients with metastasis had a higher risk of mortality than patients without metastasis (HR = 4.4, 95% CI: 3.25 - 5.95). The C-index was higher for the log-normal frailty model (range 0.795 to 0.946) than for the CPH model (range 0.495 to 0.726) (Table 2).
The HR for stage III compared with stage I was 3.4 (95% CI: 1.62 - 7.19) in CPHM and 3.6 (95% CI: 1.65 - 7.69) in the log-normal frailty model. For stage IV compared with stage I, HR = 10.8 (95% CI: 5.04 - 23.24) in CPHM and HR = 11.2 (95% CI: 5.51 - 26.99) in the log-normal frailty model.
Table 3 shows that the results obtained from the Cox model and the frailty model are similar. In the univariable analysis, both the Cox model and the frailty model gave HR = 2.5 (95% CI: 1.61 - 3.89) for patients with recurrence compared with patients without recurrence (Table 2).
The multi-variable analyses are given in Table 3. For composite stage, the HR for stage III compared with stage I was 3.169 (95% CI: 1.49 - 6.70) in the CPH model and 3.332 (95% CI: 1.53 - 7.25) in the log-normal frailty model (Table 3). Patients in stage IV compared with stage I had a high risk, with HR = 10.306 (95% CI: 4.78 - 22.21) in the CPH model and HR = 12.313 (95% CI: 5.47 - 27.66) in the log-normal frailty model. The concordance of the frailty model was higher than that of the CPH model (Table 3). In the randomly selected sub-population analyses with sample sizes 150 and 250, the AIC showed that inverse Gaussian frailty was comparable with the log-normal frailty model; however, log-normal frailty was slightly better than the other models (Table 4).
| Variables | AIC: Gamma | AIC: Inverse Gaussian | AIC: Log-normal | AIC: PVF |
|---|---|---|---|---|
| Age | 2296.592 | 2296.51 | 2296.508 | 2298.534 |
| Sex | 2294.554 | 2294.468 | 2294.466 | 2296.492 |
| Metastasis | 2288.652 | 2288.602 | 2288.602 | 2290.634 |
| Lymph node involvement | 2295.208 | 2295.13 | 2295.128 | 2297.156 |
| Comorbidity | 2296.626 | 2296.544 | 2296.542 | 2298.568 |
| Composite stage | 2121.02 | 2121.502 | 2120.878 | 2122.878 |
| Recurrence | 2282.714 | 2282.298 | 2282.299 | 2284.24 |
Table 1: Frailty model identification using AIC criteria
| Variables | CPH: HR (95% CI) | CPH: P value | CPH: C-index (SE) | Log-normal frailty: HR (95% CI) | Log-normal frailty: P value | Log-normal frailty: C-index (SE) | Frailty (P value) |
|---|---|---|---|---|---|---|---|
| Age (<=50 vs >50) | 1.043 (0.75-1.44) | 0.797 | 0.499 (0.018) | 1.041 (0.74-1.45) | 0.810 | 0.946 (0.007) | 0.103 (0.41) |
| Sex (Male vs Female) | 0.925 (0.68-1.24) | 0.611 | 0.504 (0.02) | 0.925 (0.68-1.25) | 0.620 | 0.915 (0.01) | 0.083 (0.41) |
| Metastasis (Yes vs No) | 4.399 (3.254-5.949) | <0.001 | 0.618 (0.017) | 4.399 (3.254-5.949) | <0.001 | 0.885 (0.011) | 0.001 (0.92) |
| Lymph node involvement (Yes vs No) | 1.209 (0.90-1.62) | 0.206 | 0.523 (0.02) | 1.216 (0.89-1.64) | 0.210 | 0.795 (0.016) | 0.120 (0.40) |
| Comorbidity (Yes vs No) | 1.04 (0.77-1.39) | 0.793 | 0.495 (0.02) | 1.037 (0.76-1.40) | 0.810 | 0.926 (0.007) | 0.083 (0.41) |
| Stage (2 vs 1) | 1.231 (0.59-2.56) | 0.577 | 0.694 (0.02) | 1.232 (0.58-2.60) | 0.590 | 0.856 (0.016) | 0.141 (0.33) |
| Stage (3 vs 1) | 3.414 (1.62-7.19) | 0.001 | | 3.573 (1.65-7.69) | 0.001 | 0.856 (0.016) | |
| Stage (4 vs 1) | 10.826 (5.04-23.24) | <0.001 | | 11.203 (5.51-26.99) | <0.001 | 0.856 (0.016) | |
| Recurrence (Yes vs No) | 2.506 (1.61-3.89) | <0.001 | 0.526 (0.008) | 2.506 (1.61-3.89) | <0.001 | 0.945 (0.01) | 0.0013 (0.38) |
Table 2: Comparison of predictability of the fitted models using concordance measure
| Variables | CPH: HR (CI) | CPH: P value | Log-normal frailty: HR (CI) | Log-normal frailty: P value |
|---|---|---|---|---|
| Metastasis | 0.752 (0.52-1.079) | 0.121 | 0.755 (0.51-1.10) | 0.150 |
| Stage (2 vs 1) | 1.218 (0.58-2.53) | 0.597 | 1.22 (0.57-2.61) | 0.110 |
| Stage (3 vs 1) | 3.169 (1.49-6.70) | 0.0025 | 3.332 (1.53-7.25) | 0.0024 |
| Stage (4 vs 1) | 10.306 (4.78-22.210) | <0.001 | 12.313 (5.47-27.66) | <0.001 |
| Recurrence | 2.105 (1.33-3.31) | 0.0013 | 2.26 (1.36-3.74) | 0.0015 |
| Frailty | NIL | | 0.183 (P value = 0.290) | |
| Concordance | 0.726 (SE = 0.0241) | | 0.822 (SE = 0.016) | |
Table 3: Comparison of predictability of fitted models using concordance measure: A multi-variable analysis
| Variables | Gamma (n=150) | Gamma (n=250) | Inverse Gaussian (n=150) | Inverse Gaussian (n=250) | Log-normal (n=150) | Log-normal (n=250) | PVF (n=150) | PVF (n=250) |
|---|---|---|---|---|---|---|---|---|
| Age | 643.52 | 1365.03 | 643.46 | 1365.01 | 643.45 | 1365.01 | 644.95 | 1366.85 |
| Sex | 641.46 | 1362.85 | 641.41 | 1362.77 | 641.41 | 1362.76 | 643.41 | 1364.77 |
| Lymph node involvement | 635.62 | 1363.10 | 635.56 | 1363.05 | 635.56 | 1363.04 | 637.56 | 1365.06 |
| Comorbidity | 642.58 | 1362.82 | 642.63 | 1362.77 | 642.53 | 1362.76 | 644.53 | 1364.78 |
| Composite stage | 575.56 | 1265.38 | 575.41 | 1265.21 | 575.38 | 1265.21 | 577.36 | 1267.19 |
| Recurrence | 670.71 | 1357.47 | 910.43 | 1357.21 | 910.5 | 1357.21 | 658.65 | 1359.18 |
| Metastasis | 614.88 | 1316.99 | 614.29 | 1316.10 | 614.28 | 1316.10 | 616.14 | 1317.90 |
Table 4: Assessment of consistency of frailty model selection using subpopulations of size 150 and 250 (AIC values)
Discussion:
In the present study, the log-normal frailty model was found to be the best fit for modelling unobserved random heterogeneity in colo-rectal cancer survival. The choice of frailty distribution is one of the critical phases in frailty modeling. The selection of different frailty distributions for survival data has been discussed extensively by Hougaard (2000). Zhou et al. (2015) used a covariate-adjusted frailty model for clustered time-to-event data and showed the superiority of the frailty model over the CPH model. Saeedi et al. (2017) considered cirrhosis patient mortality data and used gamma frailty to assess the risk. Frailty models and their merits over the CPH model have been discussed by Yazdani (2019), Faradmal et al. (2012) and Talebi et al. (2020) using cancer survival data. Perperoglou et al. (2007) fitted frailty, CPH, time-dependent Cox and cure rate models to breast cancer data with long-term survival and identified frailty as the significant model. Gurmu (2018) used a parametric frailty model to assess the survival time among cervical cancer patients. Frailty models have been examined by several scholars (Hougaard, 1995, 2000; Therneau and Grambsch 2000; Noh et al., 2006; Liu et al., 2011; Zhou et al. 2015). Monaco et al. (2018) introduced a general semi-parametric shared frailty model with gamma, log-normal, inverse Gaussian and power variance function as the frailty distributions, and provided consistent estimators of the standard errors of the parameter estimators. That study indicated the applicability of frailty models in survival studies and their advantages over the Cox model. Another study, on risk assessment for breast cancer using the gamma frailty model (Krishna et al., 2021), also reported that frailty models were better predictive models than the CPH model. All these studies recommended frailty models over CPHM, which supports our finding of the superiority of the frailty model.
Callegaro et al. (2012) introduced a new class of frailty model, the log-skew-normal frailty model, which extends the log-normal model. They compared the Cox model with the log-normal and log-skew-normal frailty models and illustrated the comparison with a case study of multiple myeloma patients who underwent autologous stem cell transplantation. They found that log-skew-normal frailty has a significant role in predicting the risk for survival. Nath et al. (2020) studied risk assessment in liver transplantation patients using frailty models. They compared the risk for survival under different parametric frailty models, namely gamma, inverse Gaussian, positive stable and log-normal, with Weibull and exponential distributions as the baseline hazard. They also used the AIC and BIC to identify the better model.
Legesse et al. (2022) used a shared log-normal frailty model to assess whether the recurrent event (time to recovery) is an associated factor of type 2 diabetes. That study also found the log-normal frailty model to be better than the CPH model. In line with these studies, our study also identified the log-normal frailty model as a better model than the CPH model.
Among the covariates considered in the present study, metastasis at presentation, stage at diagnosis and recurrence status were identified as significant factors for overall survival in colo-rectal cancer. The survival concordance was also higher for the log-normal frailty model in the multivariable analysis. In the multivariable analysis, the covariates stage and recurrence jointly turned out to be significant prognostic factors, and the estimated HRs from the log-normal frailty model were higher than those obtained using CPHM. Though the frailty coefficient was not significant, the concordance was always higher for the log-normal frailty model. Hence, frailty models are suggested for cancer survival studies.
Jiang et al. (2020) used frailty models to identify the risk factors for hospital readmissions of colo-rectal cancer patients. They considered covariates such as patient gender (male or female), tumour stage by Dukes' classification (Stage A-B, Stage C, or Stage D), type of treatment (chemotherapy or radiotherapy), and Charlson comorbidity index (index = 0, 1 or 2, ≥3; time-dependent). They identified stage as one of the prognostic factors. From the simulation study, they concluded that the shared gamma frailty model can provide reliable prediction of frailties even when the frailty distribution is misspecified.
The present study used survival concordance to identify the best predictive model, and the log-normal frailty model was found to be better than the CPH model in both the uni-variable and multi-variable analyses. The C-index of the log-normal frailty model ranged from 0.795 to 0.946 and that of CPHM from 0.495 to 0.726, indicating lower predictability for CPHM. Model discrimination using concordance measures has not been discussed much in the literature. To our knowledge, the only article that highlighted the use of the concordance measure for model identification was by Krishna et al. (2021), which compared the predictability of the gamma frailty model with CPHM and found the gamma frailty model to be a better predictive model with a high C-index.
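To make the concordance measure concrete, the sketch below computes a Harrell-type C-index for right-censored data in plain Python. It is a simplified illustration and may differ from the time-dependent measure of Antolini et al. (2005) used in the study; the function name `c_index` and the toy data are hypothetical.

```python
def c_index(times, events, risk_scores):
    """Harrell-type concordance for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event and
    fails strictly before subject j's follow-up time. The pair is concordant
    when the earlier failure also has the higher predicted risk; ties in
    predicted risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in months, event indicator, predicted risk.
times = [5, 10, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]
print(c_index(times, events, risk))
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking, which is the scale on which the reported ranges for CPHM and the log-normal frailty model are compared.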
Several studies have assessed the consistency of the derived model using simulated data. Kim (2019) used simulation methods to assess posterior consistency in dispersion parameters and frailty coefficients. Austin (2012) used a Monte Carlo simulation method to study the effect of time-dependent covariates on CPH models. The present study assessed the consistency of the fitted frailty model using computer-generated random sub-populations. In the randomly selected sub-populations of sample sizes 150 and 250, the AIC showed that inverse Gaussian frailty was comparable with the log-normal frailty model; however, log-normal frailty was slightly better than the other frailty models, which supports the consistency of the log-normal frailty model.
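The sub-sampling step can be sketched as below, assuming simple random sampling without replacement from the 390 patients. The seed value and the helper name `draw_subsample` are hypothetical; the study does not report how the random numbers were generated.

```python
import random


def draw_subsample(n_total: int, size: int, seed: int) -> list:
    """Draw `size` distinct row indices from range(n_total), reproducibly."""
    rng = random.Random(seed)  # local generator so the draw is repeatable
    return sorted(rng.sample(range(n_total), size))


N_PATIENTS = 390  # cohort size reported in the study
SEED = 2022       # hypothetical seed, chosen only for reproducibility

# One sub-population for each sample size considered in the study.
subsamples = {size: draw_subsample(N_PATIENTS, size, SEED) for size in (150, 250)}

for size, rows in subsamples.items():
    print(size, len(rows), rows[:5])
```

Each set of indices would then be used to refit the candidate frailty models on the corresponding rows and compare their AIC values, as summarised in Table 4.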
In conclusion, this study observed that frailty modeling is the best approach to incorporate the hidden random heterogeneity in cancer survival. Randomly selected sub-sample analysis would help to assess the consistency of the best fitted model.
Acknowledgement: The authors are grateful to the Department of Health Research (ICMR), Government of India (Grant id: R.11012/03/2021-GIA/HR) for the funding support to carry out the study.
Conflict of interest: There exists no conflict among the authors.