The mechanism of action of immuno-oncology (IO) therapies may lead to an overall survival (OS) hazard that changes over time, producing hazard shapes that standard parametric extrapolation methods may struggle to reflect. Furthermore, the most appropriate extrapolation method for health technology assessment often has to be selected on the basis of trial data with limited follow-up.
To examine this problem, we fitted a range of extrapolation methods to patient-level survival data from CheckMate 025 (NCT01668784, CM-025), a phase III trial comparing nivolumab with everolimus for previously treated advanced renal cell carcinoma (aRCC), to assess their predictive accuracy over time.
Six extrapolation methods were examined: standard parametric models, natural cubic splines, piecewise models combining Kaplan-Meier data with an exponential or non-exponential distribution, response-based landmark models, and parametric mixture models. We produced three database locks (DBLs) at minimum follow-ups of 15, 27, and 39 months to align with previously published CM-025 data. A three-step evaluation process was adopted: (1) selection of the distribution family for each method in each of the three DBLs, (2) internal validation comparing extrapolation-based landmark and mean survival with the latest CM-025 dataset (minimum follow-up, 64 months), and (3) external validation of survival projections using clinical expert opinion and long-term follow-up data from other nivolumab studies in aRCC (CheckMate 003 and CheckMate 010).
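As an illustration of one of the six methods, the piecewise approach (Kaplan-Meier data followed by an exponential tail) and the two validation metrics (landmark and mean survival) can be sketched as below. This is a minimal sketch, not the study's implementation: the patient data, the 6-month cut point, and all function names are hypothetical, and the mean is restricted to a fixed horizon for simplicity.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    n_at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def km_at(curve, t):
    """Kaplan-Meier survival probability at time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s

def exponential_rate(times, events, cut):
    """Constant-hazard MLE beyond the cut: deaths / person-time after cut."""
    deaths = sum(e for t, e in zip(times, events) if t > cut)
    exposure = sum(t - cut for t in times if t > cut)
    return deaths / exposure

def piecewise_survival(curve, rate, cut, t):
    """Kaplan-Meier up to the cut point, exponential tail afterwards."""
    if t <= cut:
        return km_at(curve, t)
    return km_at(curve, cut) * math.exp(-rate * (t - cut))

def restricted_mean(curve, rate, cut, horizon):
    """Area under the piecewise curve from 0 to horizon (horizon >= cut)."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for time, surv in curve:
        if time > cut:
            break
        area += prev_s * (time - prev_t)
        prev_t, prev_s = time, surv
    area += prev_s * (cut - prev_t)
    # Closed-form integral of the exponential tail between cut and horizon.
    area += prev_s * (1.0 - math.exp(-rate * (horizon - cut))) / rate
    return area

# Hypothetical follow-up times in months; event = 1 (death) or 0 (censored).
times = [2, 3, 5, 6, 8, 9, 12, 14, 15, 15]
events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]
cut = 6  # hypothetical cut point between the two pieces

curve = kaplan_meier(times, events)
rate = exponential_rate(times, events, cut)

for landmark in (12, 36, 60):
    print(f"S({landmark} months) = {piecewise_survival(curve, rate, cut, landmark):.3f}")
print(f"Restricted mean OS to 60 months = {restricted_mean(curve, rate, cut, 60):.2f}")
```

In an internal validation of the kind described above, the extrapolated landmark and mean values from an early database lock would then be compared with the observed values from the longest available follow-up.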
All extrapolation methods, with the exception of mixture models, underestimated landmark and mean OS for nivolumab compared with CM-025 long-term follow-up data. OS estimates for everolimus tended to be more accurate, with four of the six methods providing landmark OS estimates within the 95% confidence interval of the OS observed in the latest dataset. The predictive accuracy of the extrapolation methods also varied more for nivolumab than for everolimus. The proportional hazards assumption held for all DBLs, and a dependent log-logistic model provided reliable estimates of longer-term survival for both nivolumab and everolimus across the DBLs. Although mixture models and response-based landmark models provided reasonable estimates of OS based on the 39-month DBL, this was not the case for the two earlier DBLs. The piecewise exponential models consistently underestimated OS for both nivolumab and everolimus at clinically meaningful, pre-specified landmark time points.
This aRCC case study identified marked differences in the predictive accuracy of survival extrapolation methods for nivolumab but less so for everolimus. The dependent log-logistic model did not suffer from overfitting to early DBLs to the same extent as more complex methods. Methods that provide more degrees of freedom may accurately represent survival for IO therapy, particularly if data are more mature or external data are available to inform the long-term extrapolations.