Multitask Learning with Convolutional Neural Networks and Vision Transformers Can Improve Outcome Prediction for Head and Neck Cancer Patients


Starke, S.; Zwanenburg, A.; Leger, K.; Lohaus, F.; Linge, A.; Schreiber, A.; Kalinauskaite, G.; Tinhofer, I.; Guberina, N.; Guberina, M.; Balermpas, P.; von der Grün, J.; Ganswindt, U.; Belka, C.; Peeken, J. C.; Combs, S. E.; Böke, S.; Zips, D.; Richter, C.; Troost, E. G. C.; Krause, M.; Baumann, M.; Löck, S.

Neural-network-based outcome predictions may enable further treatment personalization of patients with head and neck cancer. The development of neural networks can prove challenging when a limited number of cases is available. Therefore, we investigated whether multitask learning strategies, implemented through the simultaneous optimization of two distinct outcome objectives (multi-outcome) and combined with a tumor segmentation task, can lead to improved performance of convolutional neural networks (CNN) and vision transformers (ViT). Model training was conducted on two distinct multicenter datasets for the endpoints loco-regional control (LRC) and progression-free survival (PFS), respectively. The first dataset consisted of pre-treatment computed tomography (CT) imaging for 290 patients and the second dataset contained combined positron emission tomography (PET)/CT for 224 patients. Discriminative performance was assessed by the concordance index (C-index). Risk stratification was evaluated using log-rank tests. Across both datasets, CNN and ViT model ensembles achieved similar results. Multitask approaches showed favorable performance in most investigations. Multi-outcome CNN models trained with segmentation loss were identified as the optimal strategy across cohorts. On the PET/CT dataset, an ensemble of multi-outcome CNNs trained with segmentation loss achieved the best discrimination (C-index: 0.29, 95% confidence interval (CI): 0.22-0.36) and successfully stratified patients into groups with low and high risk of disease progression (p=0.003). On the CT dataset, ensembles of multi-outcome CNNs and of single-outcome ViTs trained with segmentation loss performed best (C-index: 0.26 and 0.26, CI: 0.18-0.34 and 0.18-0.35, respectively), both with significant risk stratification for LRC in independent validation (p=0.002 and p=0.011). Further validation of the developed multitask-learning models is planned based on a prospective validation study, which is currently ongoing.
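
The C-index values reported above lie below 0.5, which suggests a convention in which predicted risk is compared against observed time, so that values near 0 indicate good discrimination and 0.5 corresponds to chance. This is an assumption; the abstract does not state the convention. The following is a minimal sketch of a Harrell-style C-index under that assumed convention. The function name `concordance_index` and the toy data are illustrative only and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style C-index between observed times and predicted risks.

    Assumed convention: a pair counts as concordant when the patient with
    the LONGER observed time has the HIGHER predicted risk, so a good risk
    model scores close to 0 and a random one close to 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored: ordering is unknown
        comparable += 1
        if risks[b] > risks[a]:
            concordant += 1.0
        elif risks[b] == risks[a]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (made-up values): risk decreases as observed time increases.
times = [2, 4, 6, 8]          # follow-up times
events = [1, 1, 0, 1]         # 1 = event observed, 0 = censored
risks = [0.9, 0.7, 0.4, 0.1]  # predicted risk per patient
print(concordance_index(times, events, risks))  # 0.0
```

Under the more common opposite convention, the same model would score 1 minus this value.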

Keywords: survival analysis; vision transformer; convolutional neural network; multitask learning; tumor segmentation; head and neck cancer; Cox proportional hazards; loco-regional control; progression-free survival; discrete-time survival models

Permalink: https://www.hzdr.de/publications/Publ-37388
Publ.-Id: 37388