On the Interpretation of Principal Balances for Compositional Data Sets

Martin-Fernandez, J. A.; Pawlowsky-Glahn, V.; Egozcue, J. J.; Tolosana-Delgado, R.

To analyse compositional data (CoDa) sets, it is advisable to work on orthonormal coordinates. A Sequential Binary Partition (SBP) is a technique for constructing an interpretable basis. The resulting coordinates are called balances, and they are complemented by a descriptive tool, the CoDa-dendrogram. The goal of a Principal Balances method is to identify an orthonormal basis of the simplex whose coordinates are balances that approximate the properties of the principal components (PCs) obtained from a principal component analysis (PCA) of CoDa. PCA is one of the main tools for exploratory analysis and modelling of CoDa. However, the main shortcoming of PCs is that the resulting coordinates are difficult to interpret, because each PC is a function of all the original parts. Balances are log-contrasts given by the logratio of the geometric means of two groups of parts. Their interpretation is considerably simpler than that of PCs. The resulting procedures provide tools that improve interpretability and allow intuitive dimension reduction. The algorithm to compute principal balances requires an exhaustive search over all possible sets of orthonormal balances. The computational time may be considerable even for a small number of parts. Here, to reduce this time, the sets of possible partitions for up to 15 parts are stored. For comparison, two suboptimal, but feasible, algorithms are introduced: (i) searching for balances following a constrained PCs approach; and (ii) a hierarchical cluster analysis of variables based on the Aitchison distance between parts. The properties and performance of these three algorithms are illustrated using a data set of the geochemical composition of glacial sediments. The results corroborate the theoretical properties of the methods: they approximate the PCs reasonably well while improving interpretability. However, the price paid is a smaller amount of variance explained by the first balances and the loss of uncorrelatedness between the coordinates.
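
As an illustration of the definition of a balance given in the abstract (a logratio of the geometric means of two groups of parts), the following minimal Python sketch computes one balance coordinate. It is not the authors' implementation. The function name `balance`, the index arguments `num` and `den`, and the example composition are hypothetical. The normalising factor sqrt(r*s/(r+s)) is the convention commonly used for balances in the CoDa literature and is assumed here, since the abstract does not state it.

```python
import numpy as np

def balance(x, num, den):
    """Balance between two groups of parts of a composition.

    x   : array of strictly positive parts (last axis indexes the parts)
    num : indices of the parts in the numerator group (hypothetical name)
    den : indices of the parts in the denominator group (hypothetical name)
    """
    x = np.asarray(x, dtype=float)
    r, s = len(num), len(den)
    # The log of a geometric mean is the arithmetic mean of the logs.
    log_gm_num = np.mean(np.log(x[..., num]), axis=-1)
    log_gm_den = np.mean(np.log(x[..., den]), axis=-1)
    # Assumed normalising factor, giving unit-norm balance coordinates.
    return np.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)

# Made-up four-part composition, for illustration only.
composition = np.array([0.1, 0.2, 0.3, 0.4])
b = balance(composition, num=[0, 1], den=[2, 3])
```

Because only ratios of parts enter the computation, rescaling the composition (for example from proportions to percentages) leaves the balance unchanged.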

  • Poster
    18th Annual Conference IAMG2017, 02.-09.09.2017, Perth, Australia

Permalink: https://www.hzdr.de/publications/Publ-25091