Geostatistics for compositional data: from spatial interpolation to high dimensional prediction


Tolosana Delgado, R.; van den Boogaart, K. G.

Geostatistics is the name given to a family of statistical and machine learning tools devised to treat a spatially dependent variable with the goal of interpolating it. The key tool of classical geostatistics is the covariance function, which captures the covariance (matrix) between the values of the variable (vector) observed at two locations in space. Pawlowsky-Glahn and Olea (2004; "Geostatistical Analysis of Compositional Data") already extended this framework to spatially dependent compositional data by applying a logratio transformation, i.e. by working with the covariance function of the logratio-transformed scores. Given a spatially dependent compositional data set, if a model for the covariance function were available, it would be possible to predict the composition at a new location by means of multivariate multiple linear regression. The typical approach to obtaining this covariance is to restrict it to be location-independent (while still depending on the lag difference between locations) and to give it a parametric form. The vector of parameters is then either estimated via maximum likelihood or fitted in a data-driven way to specific collections of spread statistics of the sample. Similar approaches can be followed with compositions. Several such data-driven methods have been proposed for compositions, which can be seen as choosing an \emph{oblique logratio} such that the covariance function becomes a diagonal matrix for all lags (and, by extension, for all pairs of locations), with the resulting diagonal elements easily modelled separately. In this contribution we discuss the implications of these methodologies for obtaining a parametric model of the covariance function, how to use this function to predict the composition at any location, the subcompositional properties of this predictor, and how the whole framework can be used beyond spatial statistics to establish (almost) non-parametric predictive models for compositional responses with high-dimensional regressors.
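The prediction workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the isometric logratio (ilr) basis stands in for the oblique logratio, the exponential covariance model with known sills and ranges stands in for a fitted parametric model, the sample mean of the scores is treated as a known mean, and each logratio score is interpolated separately by simple kriging, which corresponds to the case where the covariance function is diagonal for all lags. All function names are hypothetical.

```python
import numpy as np

def ilr_basis(D):
    """Orthonormal (D x (D-1)) basis of the clr hyperplane (Helmert-type contrasts)."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V

def ilr(X, V):
    """Isometric logratio scores of the compositions stored as rows of X."""
    logX = np.log(X)
    clr = logX - logX.mean(axis=1, keepdims=True)
    return clr @ V

def ilr_inv(Z, V):
    """Back-transform ilr scores to compositions that sum to one."""
    Y = np.exp(Z @ V.T)
    return Y / Y.sum(axis=1, keepdims=True)

def exp_cov(h, sill, corr_range):
    """Exponential covariance as a function of lag distance h (assumed model)."""
    return sill * np.exp(-h / corr_range)

def pairwise_dist(a, b):
    """Euclidean distances between the rows of a and the rows of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def krige_composition(coords, X, new_coords, sills, ranges):
    """Simple kriging of each ilr score separately, then back-transformation.

    Assumes the scores are spatially uncorrelated with each other
    (diagonal covariance function) and that sills/ranges are known.
    """
    V = ilr_basis(X.shape[1])
    Z = ilr(X, V)
    H = pairwise_dist(coords, coords)       # lags between data locations
    H0 = pairwise_dist(coords, new_coords)  # lags to the target locations
    Z_pred = np.empty((new_coords.shape[0], Z.shape[1]))
    for k in range(Z.shape[1]):
        C = exp_cov(H, sills[k], ranges[k])
        c0 = exp_cov(H0, sills[k], ranges[k])
        weights = np.linalg.solve(C, c0)    # one column of weights per target
        mean_k = Z[:, k].mean()             # sample mean used as the known mean
        Z_pred[:, k] = mean_k + weights.T @ (Z[:, k] - mean_k)
    return ilr_inv(Z_pred, V)

# Usage with synthetic three-part compositions at 20 random locations.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(20, 2))
X = rng.dirichlet([2.0, 3.0, 5.0], size=20)
target = np.array([[5.0, 5.0]])
prediction = krige_composition(coords, X, target, sills=[1.0, 0.5], ranges=[3.0, 2.0])
print(prediction)
```

Because the prediction is computed on logratio scores and then back-transformed, the result is always a valid composition with positive parts summing to one. With a non-diagonal covariance function the per-score loop would have to be replaced by a joint (cokriging) system over all scores.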

Keywords: variogram; auto-covariance; cross-covariance; minimum-maximum autocorrelation factors; kriging

  • Contribution to proceedings (open access)
    CoDaWork2022: 9th International Workshop on Compositional Data Analysis, 28.06.-01.07.2022, Toulouse, France

Permalink: https://www.hzdr.de/publications/Publ-34162