A Truly Spatial Random Forests Algorithm for Geoscience Data Analysis and Modelling


Talebi, H.; Peeters, L. J. M.; Otto, A.; Tolosana Delgado, R.

Spatial data mining helps to find hidden but potentially informative patterns in large, high-dimensional geoscience data. Non-spatial learners generally relate observations only through their positions in feature space, so they cannot account for spatial relationships between regionalised variables. This study introduces a novel spatial random forests technique, based on higher-order spatial statistics, for the analysis and modelling of spatial data. Unlike the classical random forests algorithm, which uses pixelwise spectral information as predictors, the proposed spatial random forests algorithm uses local spatial-spectral information (i.e., vectorised spatial patterns) to learn intrinsic heterogeneity, spatial dependencies, and complex spatial patterns. Algorithms for supervised (i.e., regression and classification) and unsupervised (i.e., dimension reduction and clustering) learning are presented. Approaches for dealing with big data, multi-resolution data, and missing values are discussed. The superior performance and usefulness of the proposed algorithm relative to the classical random forests method are illustrated with synthetic and real cases, in which remotely sensed geophysical covariates from the North West Minerals Province of Queensland, Australia, are used as input spatial data for geology mapping, geochemical prediction, and process discovery analysis.
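
The contrast between pixelwise predictors and vectorised spatial patterns can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the higher-order spatial statistics of the paper; it only assumes that a "vectorised spatial pattern" can be approximated by flattening a square window around each grid cell. The function name `vectorised_patches` and the `radius` parameter are hypothetical, introduced here for illustration.

```python
def vectorised_patches(grid, radius=1):
    """Flatten the (2*radius+1) x (2*radius+1) window around each interior cell.

    Assumption: a square moving window stands in for the local spatial
    pattern; cells closer than `radius` to the border are skipped.
    """
    rows, cols = len(grid), len(grid[0])
    features = {}
    for i in range(radius, rows - radius):
        for j in range(radius, cols - radius):
            # Row-major flattening of the neighbourhood centred on (i, j).
            features[(i, j)] = [
                grid[i + di][j + dj]
                for di in range(-radius, radius + 1)
                for dj in range(-radius, radius + 1)
            ]
    return features


# A toy single-band covariate on a 3 x 4 grid.
grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

patches = vectorised_patches(grid, radius=1)

# Pixelwise predictor for cell (1, 1): a single value.
pixelwise = [grid[1][1]]
# Spatial predictor for the same cell: its nine-value neighbourhood.
spatial = patches[(1, 1)]
```

Under this assumption, each row of the predictor matrix handed to a tree-based learner would hold a whole neighbourhood rather than one pixel value, which is what lets the splits respond to local spatial structure.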

Keywords: Geostatistical learning; Higher-order spatial statistics; Random forests; Spatial correlation; Spatial data

Permalink: https://www.hzdr.de/publications/Publ-34024