Results from the Image Biomarker Standardisation Initiative

Zwanenburg, A.; Abdalah, M.; Ashrafinia, S.; Beukinga, J.; Bogowicz, M.; Dinh, C. V.; Götz, M.; Hatt, M.; Leijenaar, R.; Lenkowicz, J.; Morin, O.; Rao, A.; Socarras Fernandez, J.; Vallieres, M.; van Dijk, L.; van Griethuysen, J.; van Velden, F. H. P.; Whybra, P.; Troost, E.; Richter, C.; Löck, S.

Purpose: Radiomics is the high-throughput analysis of medical images for treatment individualisation. It conventionally relies on the quantification of different characteristics of a region of interest (ROI) delineated in the image, such as the mean intensity, volume and textural heterogeneity. The lack of standardisation of image features is one of the major limitations for reproducing and validating radiomic studies, and thus a major hurdle for further developments in the field and for clinical translation. To overcome this challenge, a large international collaboration of 19 teams from 8 countries was initiated to establish an image feature ontology, and to provide definitions of commonly used features, benchmarks for testing feature extraction and image processing software, and reporting guidelines.

Methods: The initiative consisted of two phases. In phase 1, 351 commonly used features were specified and benchmarked against a simple digital phantom, without any requirement for image pre-processing steps. The feature set consisted of commonly used radiomic features and encompassed statistical, morphological and texture characteristics of the ROI, both slice-by-slice (2D) and as a volume (3D). In phase 2, image pre-processing steps were introduced, and features were benchmarked by evaluating five pre-processing configurations on a lung cancer patient CT image. The configurations differed in the treatment of the image stack (2D: A-B; 3D: C-E), the interpolation method (none: A; bilinear/trilinear: B-D; tricubic: E) and the grey-level discretisation method (fixed bin size: A, C; fixed number of bins: B, D-E).
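The two discretisation methods named above differ in what they hold constant: a fixed bin size keeps the intensity width of each grey level constant across images, whereas a fixed number of bins rescales each ROI's intensity range to the same grey-level count. A minimal sketch of both (function names and the exact edge handling are illustrative, not taken from the initiative's benchmark definitions):

```python
import numpy as np

def discretise_fixed_bin_number(x, n_bins):
    """Rescale ROI intensities to grey levels 1..n_bins."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    g = np.floor(n_bins * (x - lo) / (hi - lo)) + 1
    # The maximum intensity would map to n_bins + 1; clip it back.
    return np.clip(g, 1, n_bins).astype(int)

def discretise_fixed_bin_size(x, bin_width, lo=None):
    """Assign grey level 1 to [lo, lo + w), 2 to [lo + w, lo + 2w), ..."""
    x = np.asarray(x, dtype=float)
    if lo is None:
        lo = x.min()  # in practice, lo is often a fixed value such as a HU threshold
    return (np.floor((x - lo) / bin_width) + 1).astype(int)

roi = [-12.5, 0.0, 25.0, 37.5]  # hypothetical ROI intensities (e.g. HU)
print(discretise_fixed_bin_number(roi, 4))   # [1 2 4 4]
print(discretise_fixed_bin_size(roi, 25.0))  # [1 1 2 3]
```

Because texture features are computed on these grey levels, two implementations that disagree on discretisation will disagree on most downstream feature values, which is why the configuration (A-E) must be reported.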

Both phases were iterative, and participants had the opportunity to compare results and update their workflow implementation. We set the most frequently contributed value of each feature as its benchmark value, and subsequently determined its reliability based on the number of contributing groups and the consensus level.
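The consensus step described above amounts to taking the mode of the values submitted by the participating teams, with the number of agreeing teams as a reliability measure. A hedged sketch of that idea (the tolerance-based rounding and function name are assumptions for illustration, not the initiative's exact procedure):

```python
from collections import Counter

def benchmark_value(submissions, tolerance=0.05):
    """Return the most frequently contributed value and the number of
    teams that agree with it, after rounding values to a tolerance."""
    rounded = [round(v / tolerance) * tolerance for v in submissions]
    value, n_agree = Counter(rounded).most_common(1)[0]
    return value, n_agree

# Hypothetical feature values from four teams:
value, n_agree = benchmark_value([1.00, 1.01, 1.00, 0.50])
print(value, n_agree)  # 1.0 3
```

A feature whose mode is supported by too few teams (here, fewer than three) cannot be assigned a reliable benchmark value, mirroring the criterion used in the results below.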

Results: 19 different software implementations were tested. In both phases, only a small number of features were initially found to be reliable. The number of reliable features increased over time as problems were identified and resolved (see Figure 1 and Table 1). The remaining features for which no agreement was reached were not commonly implemented (fewer than 3 agreeing teams) and could therefore not be reliably assessed.

Conclusion: We addressed the lack of standardised feature definitions, implementations and image pre-processing steps for radiomics by providing reliable benchmark values for commonly used features. During the initiative, the 19 teams showed large initial differences, yet converged to common reference values by increasing adherence to standardised definitions. The use of our standardised definitions and benchmarks to test and update radiomics software is therefore imperative to increase the reproducibility of future radiomics studies.

  • Lecture (Conference)
    ESTRO 37, 20.-24.04.2018, Barcelona, Spain
  • Abstract in refereed journal
    Radiotherapy and Oncology 127(2018), S543-S544
    DOI: 10.1016/S0167-8140(18)31291-X

Publ.-Id: 26220