Modelling Unobserved Random Heterogeneity in Time-to-Event Cancer Survival Data

Authors

Divya Dennis1, Preethi Sara George1, Roshni S2, Aleyamma Mathew1, Rosewin Mariya Joy1, Jagathnath Krishna K M1*
1Division of Cancer Epidemiology & Biostatistics, Regional Cancer Centre, Thiruvananthapuram, India
2Department of Radiation Oncology, Regional Cancer Centre, Thiruvananthapuram, India

Article Information

*Corresponding Author: Jagathnath Krishna K M, Division of Cancer Epidemiology & Biostatistics, Regional Cancer Centre, Thiruvananthapuram, India.

Received: July 28, 2022                                      
Accepted: September 28, 2022
Published: September 28, 2022

Citation: Divya Dennis, Preethi Sara George, Roshni S, Aleyamma Mathew, Rosewin Mariya Joy and Jagathnath Krishna K M. (2022) “Modelling Unobserved Random Heterogeneity in Time-to-Event Cancer Survival Data”, J Oncology and Cancer Screening, 4(3); DOI: http://doi.org/009.2022/1.1058.
Copyright: © 2022 Jagathnath Krishna K M. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: In time-to-event cancer survival studies, unobserved random heterogeneity (frailty) plays a vital role. Different probability distributions are used to model this hidden heterogeneity, so model selection is crucial for identifying the risk for survival. The present study aimed to identify the best-fitting frailty model for estimating the risk for colorectal cancer (CRC) survival and to compare it with the Cox proportional hazards model (CPHM).


Materials & Methods:

We considered 390 CRC patients with the covariates age, sex, stage at diagnosis, comorbidity and recurrence status. The risk for survival was assessed using frailty and CPH models, and the Akaike information criterion (AIC) was used to identify the best-fitting frailty distribution. The predictive ability of the models was assessed using the survival concordance measure (C-index). Sub-sampling with computer-generated random samples was used to assess the consistency of frailty model selection.

Results:

The log-normal frailty model was identified as the best-fitting model for random heterogeneity. Predictive ability was higher for the log-normal frailty model (C-index: 0.795 to 0.946) than for the CPHM (C-index: 0.495 to 0.726). Stage and recurrence status were significant predictors, with almost similar hazard ratios in both models. Model selection with the sub-sampling approach showed the log-normal frailty model to be consistent.

Conclusion: The log-normal frailty model was identified as the best predictive and consistent model for incorporating the effect of random heterogeneity in cancer survival.


Keywords: time-to-event data; frailty model; Cox proportional hazards model; log-normal distribution; colorectal cancer

Introduction:

In time-to-event analysis, unobserved random heterogeneity plays a significant role in predicting the risk for survival. The frailty model (Clayton 1978; Vaupel et al. 1979) extends the Cox proportional hazards model (CPHM) (Cox 1972) to accommodate this random heterogeneity, using different parametric distributions for the frailty (gamma, log-normal, inverse Gaussian, positive stable and compound Poisson). The most popular frailty model in time-to-event analysis uses the gamma distribution, under which the relative heterogeneity remains constant (Hougaard, 1986). The log-normal frailty model, however, is useful for random-effects or mixed models (McGilchrist and Aisbett, 1991); it extends to more than two levels and to more general random-intercept and random-slope models. Such models have been little explored in the literature.
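For reference, one common way of writing the frailty model described above is shown below. The notation (baseline hazard h_0, frailty z_i, coefficients β, frailty variance θ or σ²) is ours and is not defined in the original text; z_i is the unobserved frailty of subject (or cluster) i, and letting the frailty variance go to zero recovers the Cox model.

$$ h_i(t \mid \mathbf{x}_i, z_i) = z_i \, h_0(t) \exp\left(\boldsymbol{\beta}^{\top} \mathbf{x}_i\right) $$

$$ \text{Gamma frailty: } z_i \sim \operatorname{Gamma}\left(\text{shape} = 1/\theta,\ \text{rate} = 1/\theta\right), \quad E(z_i) = 1, \quad \operatorname{Var}(z_i) = \theta $$

$$ \text{Log-normal frailty: } z_i = e^{w_i}, \quad w_i \sim N(0, \sigma^2) \;\Longrightarrow\; h_i(t \mid \mathbf{x}_i, w_i) = h_0(t) \exp\left(\boldsymbol{\beta}^{\top} \mathbf{x}_i + w_i\right) $$

In the log-normal form the frailty enters as a normally distributed random intercept on the log-hazard scale, which is why it fits naturally into the random-effects (mixed model) framework mentioned above.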

Further, Jiang et al. (2020) examined the impact of misspecifying the frailty distribution on the prediction of shared frailties under different estimation methods, such as the expectation-maximization (EM) algorithm, penalized partial likelihood and penalized likelihood, considering gamma, compound Poisson, inverse Gaussian, log-normal, positive stable and power variance function (PVF) frailty distributions. Multivariate failure time distributions have also been derived from shared frailty models and copulas (Wang and Emura 2021; Marshall and Olkin 1988). Gamma frailty models have been used to identify risk factors for cancer survival (Krishna et al. 2021). Camilleri et al. (2022) used gamma and inverse Gaussian frailty models because of their flexibility in modelling heterogeneity. They considered both shared and unshared frailty models, identified the best-fitting model using the AIC, and found that the unshared gamma and inverse Gaussian models fitted better. Other studies have assessed the consistency of best-fitting models using simulated data (Austin 2012; Kim 2019).

As different distributions can be used to model frailty, identifying the best model is crucial for estimating the risk. The present study aims to identify the best frailty model based on the AIC, to compare the concordance of the best-fitting frailty model with that of the traditionally used Cox model, and to assess the consistency of frailty model selection using sub-populations. All models are illustrated using data on colorectal cancer patients registered at the Regional Cancer Centre, Thiruvananthapuram, with a 5-year follow-up.

Materials:

A total of 390 colorectal cancer cases were considered for the present study, of whom 217 died during the 5-year period; the remaining patients were treated as censored observations. The covariates considered were age (≤ 50 years, > 50 years), sex (male, female), any comorbidity (yes, no), stage at diagnosis (stage I, II, III and IV), metastasis (yes, no), lymph node involvement (yes, no) and recurrence status (yes, no). Follow-up times were recorded in months.

Methods:

To model the random heterogeneity, we used gamma, log-normal, inverse Gaussian and power variance function (PVF) distributions for the frailty variable, and the Akaike information criterion (AIC) was used to identify the best-fitting model. The frailty model was then compared with the Cox proportional hazards model using the survival concordance measure, the C-index (Antolini, Boracchi and Biganzoli, 2005). To assess the consistency of frailty model selection, sub-populations of sizes 150 and 250 were randomly drawn from the data using computer-generated random numbers. Data analysis was done in R software using the packages “survival”, “parfm”, “frailtySurv”, “dplyr” and “finalfit”.
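The idea behind a concordance measure can be illustrated with Harrell's C-index: among usable pairs of patients, it is the fraction in which the patient who died first was also assigned the higher predicted risk. The sketch below is a plain-Python illustration only. The helper name `harrell_c_index` is hypothetical, it is not the code of the R packages listed above, and it computes the simpler Harrell-type index rather than the time-dependent index of Antolini et al. (2005).

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (a, b) is usable when the subject with the shorter follow-up time
    had the event. It is concordant when that subject also has the higher
    predicted risk; tied risk scores count as one half. Pairs with tied
    follow-up times are skipped (a simplification of the full definition).
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or earlier subject censored: not usable
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (follow-up in months, 1 = death, 0 = censored), not study data.
print(harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.6, 0.4, 0.1]))
```

A value of 0.5 corresponds to random ordering of risks and 1.0 to perfect ordering, which is the scale on which the C-index values in this study are compared.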

Results:

The log-normal distribution was identified as the best-fitting frailty model and was used for further analysis (Table 1). The risk for survival among colorectal cancer patients by individual factors is given in Table 2. In the univariable analysis, stage, metastasis and recurrence were significantly associated with survival (p-value < 0.05) in both the CPH and log-normal frailty models, although the frailty term itself was not significant for any covariate (Table 2). Both the Cox and frailty models showed that patients with metastasis had a higher risk of mortality than patients without metastasis (HR = 4.4, 95% CI: 3.25 - 5.95). The concordance of the log-normal frailty model was higher than that of the CPH model for every variable, with the C-index ranging from 0.795 to 0.946 for the log-normal frailty model and from 0.495 to 0.726 for the CPH model (Tables 2 and 3).
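The hazard ratios and confidence intervals in Tables 2 and 3 are on the exponentiated scale of the model coefficients. As a reading aid, the sketch below recovers the log-hazard ratio, its standard error and an approximate two-sided Wald p-value from a reported HR and 95% CI, assuming the interval is a Wald interval of the form exp(β ± 1.96·SE). The function names are hypothetical, and the example uses the univariable metastasis estimate from Table 2.

```python
import math


def log_hr_and_se(hr, lower, upper, z=1.959964):
    """Recover the coefficient (log-HR) and its standard error from a reported
    hazard ratio and 95% interval, assuming the interval is exp(beta +/- z*SE)."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def wald_p_value(beta, se):
    """Two-sided p-value for H0: beta = 0 under the normal approximation,
    2 * (1 - Phi(|beta / se|)) written via the complementary error function."""
    return math.erfc(abs(beta / se) / math.sqrt(2))


# Metastasis (yes vs no), univariable Cox model, Table 2: HR 4.399 (3.254-5.949).
beta, se = log_hr_and_se(4.399, 3.254, 5.949)
p = wald_p_value(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}, z = {beta / se:.2f}, p = {p:.1e}")
```

The recovered values (β ≈ 1.48, SE ≈ 0.154, z ≈ 9.6) are consistent with the reported p < 0.001.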

For stage III compared with stage I, the HR was 3.4 (95% CI: 1.62 - 7.19) in the CPHM and 3.6 (95% CI: 1.65 - 7.69) in the log-normal frailty model; for stage IV compared with stage I, the HR was 10.8 (95% CI: 5.04 - 23.24) in the CPHM and 11.2 (95% CI: 5.51 - 26.99) in the log-normal frailty model.

The results of the Cox and frailty models were similar (Table 2). In both models, patients with recurrence had HR = 2.5 (95% CI: 1.61 - 3.89) compared with patients without recurrence.

Multivariable analyses are given in Table 3. For composite stage, the HR for stage III compared with stage I was 3.169 (95% CI: 1.49 - 6.70) in the CPH model and 3.332 (95% CI: 1.53 - 7.25) in the log-normal model. Patients in stage IV had a higher risk than those in stage I, with HR = 10.306 (95% CI: 4.78 - 22.21) in the CPH model and HR = 12.313 (95% CI: 5.47 - 27.66) in the log-normal frailty model. The concordance of the frailty model (0.822) was higher than that of the CPH model (0.726) (Table 3). In the randomly selected sub-populations of sizes 150 and 250, the inverse Gaussian frailty model had AIC values comparable to those of the log-normal frailty model; however, the log-normal frailty model was slightly better than all other models (Table 4).

| Variables | Gamma | Inverse Gaussian | Log-normal | PVF |
|---|---|---|---|---|
| Age | 2296.592 | 2296.51 | 2296.508 | 2298.534 |
| Sex | 2294.554 | 2294.468 | 2294.466 | 2296.492 |
| Metastasis | 2288.652 | 2288.602 | 2288.602 | 2290.634 |
| Lymph node involvement | 2295.208 | 2295.13 | 2295.128 | 2297.156 |
| Comorbidity | 2296.626 | 2296.544 | 2296.542 | 2298.568 |
| Composite stage | 2121.02 | 2121.502 | 2120.878 | 2122.878 |
| Recurrence | 2282.714 | 2282.298 | 2282.299 | 2284.24 |

Table 1: Frailty model identification using the AIC criterion (AIC scores by frailty distribution)
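The selection rule behind Table 1 is to prefer the frailty distribution with the lowest AIC, where AIC = -2 × log-likelihood + 2 × (number of estimated parameters). The Python sketch below reproduces that comparison from the reported scores. It hard-codes the Table 1 values, the helper names are hypothetical, and exact ties are reported rather than broken arbitrarily.

```python
# Sketch: AIC-based choice of frailty distribution using the Table 1 scores.
# Lower AIC indicates a better trade-off between fit and model complexity.

TABLE1_AIC = {
    "Age": {"Gamma": 2296.592, "Inverse Gaussian": 2296.51, "Log-normal": 2296.508, "PVF": 2298.534},
    "Sex": {"Gamma": 2294.554, "Inverse Gaussian": 2294.468, "Log-normal": 2294.466, "PVF": 2296.492},
    "Metastasis": {"Gamma": 2288.652, "Inverse Gaussian": 2288.602, "Log-normal": 2288.602, "PVF": 2290.634},
    "Lymph node involvement": {"Gamma": 2295.208, "Inverse Gaussian": 2295.13, "Log-normal": 2295.128, "PVF": 2297.156},
    "Comorbidity": {"Gamma": 2296.626, "Inverse Gaussian": 2296.544, "Log-normal": 2296.542, "PVF": 2298.568},
    "Composite stage": {"Gamma": 2121.02, "Inverse Gaussian": 2121.502, "Log-normal": 2120.878, "PVF": 2122.878},
    "Recurrence": {"Gamma": 2282.714, "Inverse Gaussian": 2282.298, "Log-normal": 2282.299, "PVF": 2284.24},
}


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion of a fitted model."""
    return -2.0 * log_likelihood + 2.0 * n_params


def best_models(aics: dict, tol: float = 1e-9) -> list:
    """Return every distribution whose AIC lies within `tol` of the minimum,
    so that exact ties are listed instead of silently broken."""
    lowest = min(aics.values())
    return [name for name, value in aics.items() if value - lowest <= tol]


for covariate, scores in TABLE1_AIC.items():
    print(f"{covariate:<24} lowest AIC: {', '.join(best_models(scores))}")
```

Apart from composite stage, the inverse Gaussian and log-normal scores differ by at most 0.002; they are identical for metastasis, and for recurrence the inverse Gaussian score is lower by 0.001.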

| Variables | Cox PH model: HR (95% CI) | P value | C-index (SE) | Log-normal frailty model: HR (95% CI) | P value | C-index (SE) | Frailty (P value) |
|---|---|---|---|---|---|---|---|
| Age (≤50 vs >50) | 1.043 (0.75-1.44) | 0.797 | 0.499 (0.018) | 1.041 (0.74-1.45) | 0.810 | 0.946 (0.007) | 0.103 (0.41) |
| Sex (Male vs Female) | 0.925 (0.68-1.24) | 0.611 | 0.504 (0.02) | 0.925 (0.68-1.25) | 0.620 | 0.915 (0.01) | 0.083 (0.41) |
| Metastasis (Yes vs No) | 4.399 (3.254-5.949) | <0.001 | 0.618 (0.017) | 4.399 (3.254-5.949) | <0.001 | 0.885 (0.011) | 0.001 (0.92) |
| Lymph node involvement (Yes vs No) | 1.209 (0.90-1.62) | 0.206 | 0.523 (0.02) | 1.216 (0.89-1.64) | 0.210 | 0.795 (0.016) | 0.120 (0.40) |
| Comorbidity (Yes vs No) | 1.04 (0.77-1.39) | 0.793 | 0.495 (0.02) | 1.037 (0.76-1.40) | 0.810 | 0.926 (0.007) | 0.083 (0.41) |
| Stage (2 vs 1) | 1.231 (0.59-2.56) | 0.577 | 0.694 (0.02) | 1.232 (0.58-2.60) | 0.590 | 0.856 (0.016) | 0.141 (0.33) |
| Stage (3 vs 1) | 3.414 (1.62-7.19) | 0.001 | | 3.573 (1.65-7.69) | 0.001 | 0.856 (0.016) | |
| Stage (4 vs 1) | 10.826 (5.04-23.24) | <0.001 | | 11.203 (5.51-26.99) | <0.001 | 0.856 (0.016) | |
| Recurrence (Yes vs No) | 2.506 (1.61-3.89) | <0.001 | 0.526 (0.008) | 2.506 (1.61-3.89) | <0.001 | 0.945 (0.01) | 0.0013 (0.38) |

Table 2: Comparison of predictability of the fitted models using the concordance measure

 

| Variables | Cox PH model: HR (95% CI) | P value | Log-normal frailty model: HR (95% CI) | P value |
|---|---|---|---|---|
| Metastasis | 0.752 (0.52-1.079) | 0.121 | 0.755 (0.51-1.10) | 0.150 |
| Stage (2 vs 1) | 1.218 (0.58-2.53) | 0.597 | 1.22 (0.57-2.61) | 0.110 |
| Stage (3 vs 1) | 3.169 (1.49-6.70) | 0.0025 | 3.332 (1.53-7.25) | 0.0024 |
| Stage (4 vs 1) | 10.306 (4.78-22.210) | <0.001 | 12.313 (5.47-27.66) | <0.001 |
| Recurrence | 2.105 (1.33-3.31) | 0.0013 | 2.26 (1.36-3.74) | 0.0015 |
| Frailty | NIL | | 0.183 (P value = 0.290) | |
| Concordance (SE) | 0.726 (0.0241) | | 0.822 (0.016) | |

Table 3: Comparison of predictability of fitted models using the concordance measure: a multivariable analysis

 

| Variables | Gamma (n = 150) | Gamma (n = 250) | Inverse Gaussian (n = 150) | Inverse Gaussian (n = 250) | Log-normal (n = 150) | Log-normal (n = 250) | PVF (n = 150) | PVF (n = 250) |
|---|---|---|---|---|---|---|---|---|
| Age | 643.52 | 1365.03 | 643.46 | 1365.01 | 643.45 | 1365.01 | 644.95 | 1366.85 |
| Sex | 641.46 | 1362.85 | 641.41 | 1362.77 | 641.41 | 1362.76 | 643.41 | 1364.77 |
| Lymph node involvement | 635.62 | 1363.10 | 635.56 | 1363.05 | 635.56 | 1363.04 | 637.56 | 1365.06 |
| Comorbidity | 642.58 | 1362.82 | 642.63 | 1362.77 | 642.53 | 1362.76 | 644.53 | 1364.78 |
| Composite stage | 575.56 | 1265.38 | 575.41 | 1265.21 | 575.38 | 1265.21 | 577.36 | 1267.19 |
| Recurrence | 670.71 | 1357.47 | 910.43 | 1357.21 | 910.5 | 1357.21 | 658.65 | 1359.18 |
| Metastasis | 614.88 | 1316.99 | 614.29 | 1316.10 | 614.28 | 1316.10 | 616.14 | 1317.90 |

Table 4: Assessment of consistency of frailty model selection using sub-populations of sizes 150 and 250 (AIC values)
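The sub-populations behind Table 4 were drawn with computer-generated random numbers, but the sampling scheme and seed are not reported. A minimal Python sketch of one way to do this, assuming simple random sampling without replacement from the 390 patient records and an arbitrary seed (both are assumptions, and `draw_subsamples` is a hypothetical helper), is:

```python
import random


def draw_subsamples(n_total, sizes, seed=2022):
    """Draw one simple random sub-sample (without replacement) of each
    requested size from patient indices 0..n_total-1.

    The seed is an arbitrary choice that makes the draw reproducible;
    it is not taken from the study."""
    rng = random.Random(seed)
    return {size: sorted(rng.sample(range(n_total), size)) for size in sizes}


subsamples = draw_subsamples(390, [150, 250])
for size, indices in subsamples.items():
    print(size, len(indices), indices[:5])
```

Each index set would then be used to refit the gamma, inverse Gaussian, log-normal and PVF frailty models and to compare their AIC values, as in Table 4. Repeating the draw with several seeds would show how stable the selected distribution is across sub-samples.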

Discussion:

In the present study, the log-normal frailty model was found to be the best fit for modelling unobserved random heterogeneity in colorectal cancer survival. The choice of frailty distribution is one of the critical phases in frailty modelling. The selection of different frailty distributions for survival data has been discussed extensively by Hougaard (2000). Zhou et al. (2015) used a covariate-adjusted frailty model for clustered time-to-event data and showed the superiority of the frailty model over the CPH model. Saeedi et al. (2017) used a gamma frailty model to assess the risk of mortality in patients with liver cirrhosis. Frailty models and their merits over the CPH model have been discussed by Yazdani et al. (2019), Faradmal et al. (2012) and Talebi et al. (2020) using cancer survival data.

Perperoglou et al. (2007) fitted frailty, CPH, time-dependent Cox and cure rate models to breast cancer data with long-term survival and identified the frailty model as the preferred model. Gurmu (2018) used parametric frailty models to assess survival time among cervical cancer patients. Frailty models have been studied by several authors (Hougaard, 1995, 2000; Therneau and Grambsch 2000; Noh et al., 2006; Liu et al., 2011; Zhou et al. 2015). Monaco et al. (2018) introduced a general semi-parametric shared frailty model with gamma, log-normal, inverse Gaussian and power variance function frailty distributions, and provided consistent estimators of the standard errors of the parameter estimators. This study indicated the applicability of frailty models in survival studies and their advantages over the Cox model. Another study, on risk assessment for breast cancer using a gamma frailty model (Krishna et al. 2021), also reported the frailty model to be a better predictive model than the CPH model. All these studies favoured frailty models over the CPHM, which supports our finding that the frailty model is superior. Callegaro et al. (2012) introduced a new class of frailty model, the log-skew-normal frailty model, which extends the log-normal model. They compared the Cox model with log-normal and log-skew-normal frailty models and illustrated them with a case study of multiple myeloma patients undergoing autologous stem cell transplantation, finding that log-skew-normal frailty had a significant role in predicting the risk for survival. Nath et al. (2016) studied risk assessment in liver transplantation patients using frailty models. They compared the risk of survival under different parametric frailty models, namely gamma, inverse Gaussian, positive stable and log-normal, with Weibull and exponential distributions as the baseline hazard, and used the AIC and BIC to identify the better model.
Legesse et al. (2022) used a shared log-normal frailty model to study recurrence (time to recovery) and its associated factors in patients with type 2 diabetes. This study also found the log-normal frailty model to be better than the CPH model. Similar to these studies, our study identified the log-normal frailty model as a better model than the CPH model.

Among the covariates considered in the present study, metastasis at presentation, stage at diagnosis and recurrence status were identified as significant factors for the overall survival of colorectal cancer. The survival concordance was also higher for the log-normal frailty model in the multivariable analysis. In the multivariable analysis, stage and recurrence were jointly significant prognostic factors, and the estimated HRs from the log-normal frailty model were higher than those from the CPHM. Although the frailty coefficient was not significant, the concordance was always higher for the log-normal frailty model. Hence, frailty models are suggested for cancer survival studies.

Jiang et al. (2020) used frailty models to identify risk factors for hospital readmission of colorectal cancer patients. They considered covariates such as patient gender (male or female), tumour stage by Dukes' classification (stage A-B, stage C or stage D), type of treatment (chemotherapy or radiotherapy) and the Charlson comorbidity index (0, 1 or 2, ≥3; time-dependent), and identified stage as one of the prognostic factors. From their simulation study, they concluded that the shared gamma frailty model can provide reliable predictions of frailties even when the frailty distribution is misspecified.

The present study used survival concordance to identify the best predictive model and found the log-normal frailty model to be better than the CPH model in both the univariable and multivariable analyses. The C-index of the log-normal frailty model ranged from 0.795 to 0.946, whereas that of the CPHM ranged from 0.495 to 0.726, indicating lower predictability for the CPHM. Model discrimination using concordance measures has not been discussed much in the literature. The only article that has highlighted the use of the concordance measure for model identification is by Krishna et al. (2021), which compared the predictability of the gamma frailty model with that of the CPHM and found the gamma frailty model to be the better predictive model, with a higher C-index.

Several studies have assessed the consistency of derived models using simulated data. Kim (2019) used simulation methods to study the posterior consistency of dispersion parameters and frailty coefficients. Austin (2012) used Monte Carlo simulation methods to generate survival times for CPH models with time-varying covariates. The present study assessed the consistency of the fitted frailty model using computer-generated random sub-populations. In the randomly selected sub-populations of sizes 150 and 250, the inverse Gaussian frailty model had AIC values comparable to those of the log-normal frailty model; however, the log-normal frailty model was slightly better than all other frailty models, which establishes the consistency of the log-normal frailty model.
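Simulation studies of this kind generate survival times from a known model and then check whether a procedure recovers it. As an illustration only, the sketch below draws right-censored survival times from an exponential proportional hazards model with an individual log-normal frailty, using inverse-transform sampling T = -log(U) / hazard. Every parameter value is an assumption chosen for illustration: a baseline rate of 0.01 per month, frailty standard deviation σ = 0.5, a binary covariate with conditional HR 4.4 (borrowed from the univariable metastasis estimate in Table 2), and administrative censoring at 60 months to mirror the 5-year follow-up. The function name is hypothetical and this is not the procedure of any study cited here.

```python
import math
import random


def simulate_lognormal_frailty(n, beta, baseline_rate, sigma, follow_up=60.0, seed=1):
    """Simulate (time, event, x) triples from the hazard
        h(t | x, z) = z * baseline_rate * exp(beta * x),  log z ~ N(0, sigma^2),
    with times administratively censored at `follow_up` months."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randint(0, 1)                  # hypothetical binary covariate
        z = math.exp(rng.gauss(0.0, sigma))    # unobserved log-normal frailty
        hazard = z * baseline_rate * math.exp(beta * x)
        u = 1.0 - rng.random()                 # u in (0, 1], avoids log(0)
        t = -math.log(u) / hazard              # inverse of S(t) = exp(-hazard * t)
        event = 1 if t <= follow_up else 0
        data.append((min(t, follow_up), event, x))
    return data


data = simulate_lognormal_frailty(390, math.log(4.4), 0.01, 0.5)
n_events = sum(event for _, event, _ in data)
print(f"{n_events} events among {len(data)} simulated patients")
```

Data generated this way could be fitted with each candidate frailty distribution to see how often AIC-based selection recovers the log-normal frailty that produced them.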

In conclusion, this study found that frailty modelling is the best approach to incorporating hidden random heterogeneity in cancer survival. Analysis of randomly selected sub-samples can help assess the consistency of the best-fitting model.

Acknowledgement: The authors are grateful to the Department of Health Research (ICMR), Government of India (Grant id: R.11012/03/2021-GIA/HR) for funding support to carry out the study.

Conflict of interest: The authors declare no conflict of interest.

References

  1. Aalen OO. Modelling heterogeneity in survival analysis by the compound Poisson distribution. The Annals of Applied Probability. 1992; 2:951-72.
  2. Antolini L, Boracchi P, Biganzoli E. A time‐dependent discrimination index for survival data. Statistics in medicine. 2005; 24(24):3927-44.
  3. Austin PC. Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Statistics in medicine. 2012; 31: 3946-3958.
  4. Callegaro A, Iacobelli S. The Cox shared frailty model with log-skew-normal frailties. Statistical Modelling. 2012; 12(5):399-418.
  5. Camilleri L, Grech L, Manche A. Identifying Risk Factors of Aortic Valve Replacement Using Frailty Models. XJENZA. 2022; 2022:03.
  6. Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika. 1978; 65(1):141-51.
  7. Cox DR. Regression models and life‐tables. Journal of the Royal Statistical Society: Series B (Methodological). 1972; 34(2):187-202.
  8. Gurmu SE. Assessing survival time of women with cervical cancer using various parametric frailty models: a case study at Tikur anbessa specialized hospital, Addis Ababa, Ethiopia. Annals of Data Science. 2018; 5(4):513-27.
  9. Faradmal J, Talebi A, Rezaianzadeh A, Mahjub H. Survival analysis of breast cancer patients using cox and frailty models. Journal of Research in Health Sciences. 2012; 12(2):127-30.
  10. Hougaard P. A class of multivariate failure time distributions. Biometrika. 1986; 73(3):671-8.
  11. Hougaard P. Analysis of multivariate survival data. Springer, New York, 2000.
  12. Hougaard P. Frailty models for survival data. Lifetime data analysis. 1995; 1(3):255-73.
  13. Jiang X, Liu W, Zhang B. A note on the prediction of frailties with misspecified shared frailty models. Journal of Statistical Computation and Simulation. 2021; 91(2):219-41.
  14. Kim G. Posterior consistency in frailty models and simulation studies to test the presence of random effects. Journal of Korean Statistical Society. 2019; 48(1):146-168.
  15. Krishna KM, Traison T, Sebastian SM, George PS, Mathew A. Gamma frailty model for survival risk estimation: an application to cancer data. Epidemiologic Methods. 2021;10(1).
  16. Legesse A. Retrospective Study of Recurrence and Associated Factors of Type 2 Diabetes Treated at Adama General Hospital, Oromia, Ethiopia: A Comparison of Cox-PH and Shared Lognormal Frailty Models. International Journal of Endocrinology. 2022; 2022.
  17. Liu D, Kalbfleisch JD, Schaubel DE. A positive stable frailty model for clustered failure time data with covariate‐dependent frailty. Biometrics. 2011; 67(1):8-17.
  18. Marshall AW, Olkin I. Families of multivariate distributions. Journal of the American statistical association. 1988; 83(403):834-41.
  19. McGilchrist CA, Aisbett CW. Regression with frailty in survival analysis. Biometrics. 1991; 461-6.
  20. Monaco JV, Gorfine M, Hsu L. General semiparametric shared frailty model: estimation and simulation with frailtySurv. Journal of statistical software. 2018;86.
  21. Nath DC, Bhattacharjee A, Vishwakarma RK. Risk assessment in liver transplantation patients: A shared frailty parametric approach. Clinical Epidemiology and Global Health. 2016; 4(1):1-5.
  22. Noh M, Ha ID, Lee Y. Dispersion frailty models and HGLMs. Statistics in medicine. 2006; 25(8):1341-54.
  23. Perperoglou A, Keramopoullos A, van Houwelingen HC. Approaches in modelling long-term survival: an application to breast cancer. Statistics in medicine. 2007; 26(13):2666-85.
  24. Saeedi E, Abolaghasemi J, Tousi MN, Khosravi S. Application of Gamma Frailty Model in Survival of Liver Cirrhosis Patients. International Journal of Health and Medical Engineering. 2017; 11(5):278-81.
  25. Schoenfeld D. Chi-squared goodness-of-fit tests for the proportional hazards regression model. Biometrika. 1980; 67(1):145-53.
  26. Talebi A, Mohammadnejad A, Akbari A, Pourhoseingholi MA, Doosti H, Moghimi-Dehkordi B, Agah S, Bahardoust M. Survival analysis in gastric cancer: a multi-center study among Iranian patients. BMC surgery. 2020; 20(1):1-8.
  27. Therneau TM, Grambsch PM. Modeling Survival Data: Extending the Cox Model. Springer, New York, 2000.
  28. Vaupel JW, Manton KG, Stallard E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography. 1979; 16(3):439-54.
  29. Wang YC, Emura T. Multivariate failure time distributions derived from shared frailty and copulas. Japanese Journal of Statistics and Data Science. 2021; 4(2):1105-31.
  30. Yazdani A, Yaseri M, Haghighat S, Kaviani A, Zeraati H. Investigation of prognostic factors of survival in breast cancer using a frailty model: a multicenter study. Breast Cancer: Basic and Clinical Research. 2019; 13:1-10.
  31. Zhou H, Hanson T, Jara A, Zhang J. Modeling county level breast cancer survival data using a covariate-adjusted frailty proportional hazards model. Annals of Applied Statistics. 2015; 9(1):43-68.