Data from: Developing and validating a survival prediction model for NSCLC patients through distributed learning across three countries

Arthur Jochems; Timo M. Deist; Issam El Naqa; Marc Kessler; Chuck Mayo; Jackson Reeves; Shruti Jolly; Martha Matuszak; Randall Ten Haken; Johan van Soest; Cary Oberije; Corinne Faivre-Finn; Gareth Price; Dirk De Ruysscher; Philippe Lambin; André Dekker

Title	Data from: Developing and validating a survival prediction model for NSCLC patients through distributed learning across three countries
Publication Type	Dataset
Year of Publication	2017
Authors	Jochems, A, Deist, TM, El Naqa, I, Kessler, M, Mayo, C, Reeves, J, Jolly, S, Matuszak, M, Ten Haken, R, van Soest, J, Oberije, C, Faivre-Finn, C, Price, G, De Ruysscher, D, Lambin, P, Dekker, A
Publisher	CancerData
Publication Language	eng
Keywords	Bayesian network, lung cancer, NSCLC, prediction model
Abstract	Purpose: Tools for survival prediction for non-small cell lung cancer (NSCLC) patients treated with (chemo)radiotherapy are of limited quality. In this work, as a proof of concept, we develop a model predicting survival at two years from a large volume of historical patient data using a distributed learning approach. Patients and methods: Clinical data from 698 lung cancer patients treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone were collected and stored at two cancer institutes (559 patients at Maastro clinic, Netherlands; 139 at the University of Manchester, UK). The model was further validated on 196 patients from the University of Michigan (USA). A Bayesian network model was adapted for distributed learning. Two-year post-treatment survival was chosen as the endpoint. The Institute 1 cohort data are publicly available, and the developed models can be found at PredictCancer.org. Results: Variables included in the final model were T and N stage, age, performance status, and total tumor dose. The model has an AUC of 0.66 on the external validation set and an AUC of 0.62 in 5-fold cross-validation. A model based on T and N stage achieved an AUC of 0.47 on the validation set, significantly worse than our model (P<0.001). The model presented in this study identifies high- and low-risk groups whose overall survival differs significantly (P<0.01). Conclusion: Distributed learning from federated databases allows predictive models to be learned from data originating from multiple institutions while avoiding many of the data-sharing barriers. We believe that distributed learning is the future of sharing data in health care.
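The abstract does not spell out how the Bayesian network is learned across institutions. One common way to fit the conditional probability tables of a discrete Bayesian network with a fixed structure, without moving patient records, is for each site to share only aggregate counts, which a central party then sums. The Python sketch below assumes exactly that scheme, plus binned variables and additive (Laplace) smoothing; the function names, the variable names `T` and `alive_2y`, and the toy records are hypothetical, and this is not necessarily the protocol the authors used.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# A patient record maps (hypothetical) variable names to discrete, binned states.
Record = Dict[str, str]


def local_counts(records: Iterable[Record], child: str, parents: Sequence[str]) -> Counter:
    """Run inside each institution; only these aggregate counts are shared.

    Keys are (tuple of parent states, child state); values are patient counts.
    """
    counts: Counter = Counter()
    for r in records:
        counts[(tuple(r[p] for p in parents), r[child])] += 1
    return counts


def aggregate(site_counts: Iterable[Counter]) -> Counter:
    """Central step: sum per-site counts; no patient-level rows are needed."""
    total: Counter = Counter()
    for c in site_counts:
        total.update(c)
    return total


def conditional_table(
    counts: Counter, child_states: Sequence[str], alpha: float = 1.0
) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """Estimate P(child | parents) from summed counts with additive smoothing alpha."""
    table: Dict[Tuple[str, ...], Dict[str, float]] = {}
    for pa in {pa for (pa, _) in counts}:
        n = sum(counts[(pa, s)] for s in child_states)  # missing keys count as 0
        denom = n + alpha * len(child_states)
        table[pa] = {s: (counts[(pa, s)] + alpha) / denom for s in child_states}
    return table


# Toy, made-up records for two hypothetical sites (not taken from the dataset).
site_a = [
    {"T": "T1", "alive_2y": "yes"},
    {"T": "T1", "alive_2y": "no"},
    {"T": "T3", "alive_2y": "no"},
]
site_b = [
    {"T": "T1", "alive_2y": "yes"},
    {"T": "T3", "alive_2y": "no"},
]

merged = aggregate(local_counts(s, "alive_2y", ["T"]) for s in (site_a, site_b))
cpt = conditional_table(merged, child_states=["yes", "no"])
print(cpt[("T1",)])  # {'yes': 0.6, 'no': 0.4}
```

Because the summed counts are identical to the counts from pooling all records, the resulting tables match what centralized training on the same binned data would give, while only per-site counts leave each institution.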
DOI	10.17195/candat.2017.02.2
Original Publication	10.1016/j.ijrobp.2017.04.021

File:

Attachment	Size
Jochems-2017-MaastroDataUnbinned.csv	62.76 KB