Modelling and Evaluation
Dr. Raimon Tolosana Delgado
Senior scientist / Geostatistics, Compositional Data Analysis, Process Modelling, Predictive Geomet
Phone: +49 351 260 4415
Address: Chemnitzer Str. 40
- Predictive geometallurgy: Geometallurgy, understood as the interface discipline between ore geology and mineralogy, mining and processing engineering, and metallurgy, is the backbone of the research at HIF. In this area, we are establishing methods for the statistical analysis of the wealth of data produced in a geometallurgical characterisation campaign, in order to derive predictive models that quantitatively forecast the behaviour of ores and waste along the value chain. A technology platform (databases, analysis software, interfaces to existing software) is also being developed. Particular focus is placed on developing geostatistical methods and 3D models for geometallurgical properties, and on using big data methods to build the forecasting models.
- Compositional data analysis (CoDa). If the variables of a data set describe the relative importance of a set of parts in a whole, then the data set should be considered a composition. Compositions are typically expressed in relative units (%, mg/l, ppm, molarity, etc.); they are always positive, and their total sum is equal to or smaller than a constant (100%, 10⁶, etc., the so-called closure). Because of these constraints, compositional data should not be analysed with classical correlation-based statistical techniques. Instead, one should work with a one-to-one set of log-ratios, which captures the relative character of compositions described above. The main research in this topic is adapting and applying statistical methods to compositions. Most classical multivariate statistical methods can be modified to work meaningfully with log-ratio-transformed data: new and old descriptive statistics and diagrams, linear models, geostatistics, latent variable models (factor analysis, endmember unmixing problems), time series, etc. can all be readily adapted. The only general requirements are to use multivariate methods (a composition always has several variables) and to avoid interpreting results in terms of "absolute increment" or "absolute decrement". Because it conveys only relative information, CoDa can only assess relative increments and decrements, i.e. the enrichment or depletion of one component with respect to another. Lessons learnt from CoDa can also be useful for many other kinds of constrained data (positive variables, grain-size distributions, orientations and other spherical data being the most relevant). Current research focuses on the multivariate calibration of several kinds of bulk and locally resolved analytical methods.
- Geostatistics. Data in the geosciences are often georeferenced, i.e. we know and use the positions in space (and/or time) where the samples were taken. In these cases, it is natural to assume that data taken at neighbouring locations are more similar than data taken at locations far apart. This idea of variability increasing with spatial distance underlies the concept of the variogram, a function describing how the variance of the difference between pairs of data increases with the distance between their sampling locations. Knowing the variogram of a data set allows us to map the variable in space by optimal interpolation (kriging), optimal in the sense that it minimizes the interpolation error variance. The main interest in this field is compositional geostatistics, i.e. the derivation of consistent spatial models and maps of compositional data. The key idea here is to work with the set of all possible pairwise log-ratios. This gives a flexible set of tools and solutions, consistent with both conventional geostatistics and log-ratio CoDa methodologies. CoDa geostatistics finds applications ranging from intra-crystal variability analysis to national-scale geochemical surveys. Current research in this field focuses on block kriging for CoDa and on geostatistical simulation of mineralogically consistent compositional data.
- Bayesian statistics. In the Bayesian paradigm, we want to know the probable values of some physically meaningful model parameters that control the available data. In this framework, data are treated as random variables whose distribution is a known function of the parameters, and the goal is to estimate the distribution of the uncertainty about the parameters conditional on the data (the posterior distribution). In most applications of practical interest this posterior distribution cannot be derived analytically, and one must resort to computationally intensive Markov chain Monte Carlo (MCMC) methods. This is a very general methodology with many varied applications in all fields of science. Within the scope of HIF activities, endmember problems stand out. In an endmember problem, one assumes that a given sampled signal (chemical composition, XRD or Raman spectrum, etc.) is an additive mixture of some pure endmember signals (known or unknown), and the goal is to unmix the signal and estimate the proportion of each endmember in the samples. In most cases, some or all of the quantities involved, whether available or sought (sampled signals, endmember signals and endmember proportions), are compositional, and the probabilistic models must therefore be adapted to this fact.
- Model-data merging techniques. Sometimes the physically meaningful parameters do not control the available data directly, but through their influence on some state variables of a system of differential equations. These typically model reaction-diffusion-advection processes or Lotka-Volterra-like dynamic systems. Bayesian analysis offers a framework to understand the relations between the parameters, state variables and data. Parameters can be estimated from available data (calibration), competing alternative models can be ranked by their goodness of fit (validation), and predicted state variables can be perturbed to fit the data (assimilation). Most often these systems are regionalized, thus requiring geostatistical tools in several intermediate steps. Current research along this line concerns physically sound forecasting models for minerals processing, multi-point simulation algorithms for complex data types under known physical regimes (e.g. sedimentation processes), and resource model updating methods.
- R programming. R is a multi-platform, free and open-source statistical environment that has become a de facto standard in statistics. We have been working since 2003 on a package for compositional data analysis (called compositions), and are now working on geostatistical applications (with a new package, gmGeostats), latent variable models, grain-size distribution applications and textural analysis. The user-friendliness of these R packages is also a line of work. Currently, tailored packages for the analysis of geometallurgical data (particle-based measurements such as MLA and QEMSCAN; bulk measurements such as XRD, XRF, AES and OES; spot measurements such as EPMA and LA-ICP-MS; geometallurgical tests, etc.) are being developed.
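As a minimal illustration of the log-ratio idea in the compositional data analysis item, the plain-Python sketch below applies the centred log-ratio (clr) transform to a hypothetical three-part composition. The clr is one common choice of log-ratio set among several (alr, ilr, pairwise log-ratios). This is not the API of the compositions package: the function names `closure` and `clr` and the sample values are our own assumptions. Expressing the same sample in % or in ppm (closure constant 10⁶) gives identical clr coordinates, which is the sense in which only relative information is retained.

```python
import math


def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they sum to `total` (the closure constant)."""
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part minus the mean log (log of the geometric mean)."""
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [lg - mean_log for lg in logs]


# Hypothetical three-part composition in % (values are illustrative only)
sample_pct = [60.0, 30.0, 10.0]
# The same composition re-expressed in ppm, i.e. with closure constant 10^6
sample_ppm = closure(sample_pct, total=1e6)

# Both give the same clr coordinates: the closure constant drops out
print([round(v, 6) for v in clr(sample_pct)])
print([round(v, 6) for v in clr(sample_ppm)])
# clr coordinates always sum to zero (up to rounding)
print(abs(sum(clr(sample_pct))) < 1e-12)
```

Differences between clr vectors of two samples therefore describe changes in ratios (relative enrichment or depletion), not absolute increments.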
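The pairwise log-ratios mentioned in the geostatistics item can be summarised, before any spatial modelling, in a variation matrix: for each pair of parts i and j, the variance of log(x_i/x_j) across samples. The plain-Python sketch below computes it for hypothetical data in which parts 0 and 1 keep a constant ratio, so their entry is zero. Replacing each variance by an experimental variogram of the corresponding pairwise log-ratio gives a spatial counterpart; that extension is not shown, and the function name `variation_matrix` is our own.

```python
import math
from statistics import pvariance


def variation_matrix(samples):
    """Return T with T[i][j] = variance of log(x_i / x_j) over all samples.

    A zero entry means parts i and j keep a constant ratio (perfect
    proportionality); large entries flag pairs whose ratio varies a lot.
    """
    n_parts = len(samples[0])
    T = [[0.0] * n_parts for _ in range(n_parts)]
    for i in range(n_parts):
        for j in range(i + 1, n_parts):
            log_ratios = [math.log(s[i] / s[j]) for s in samples]
            T[i][j] = T[j][i] = pvariance(log_ratios)
    return T


# Hypothetical compositions (in %): part 0 is always twice part 1
samples = [
    [20.0, 10.0, 70.0],
    [30.0, 15.0, 55.0],
    [10.0, 5.0, 85.0],
    [40.0, 20.0, 40.0],
]
for row in variation_matrix(samples):
    print([round(v, 4) for v in row])
```

Because only ratios enter, the matrix does not change if every sample is rescaled, e.g. from % to ppm.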
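The variogram described in the geostatistics item is usually estimated from data with the classical (Matheron) estimator, which averages half the squared difference of all data pairs falling in each distance class, giving the semivariogram γ(h). The plain-Python sketch below applies it to a synthetic random-walk profile along a hypothetical transect. The bin width, number of lag classes and the data are illustrative assumptions, and fitting a variogram model to the result is not shown.

```python
import math
import random


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Classical (Matheron) estimator: gamma(h) is the mean of 0.5*(z_i - z_j)**2
    over all pairs whose separation distance falls in the lag class of h.

    Returns the mean pair distance per class, the semivariogram values
    and the number of pairs per class (NaN where a class is empty).
    """
    sq_sums = [0.0] * n_bins
    dist_sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            k = int(d // bin_width)
            if k < n_bins:
                sq_sums[k] += 0.5 * (values[i] - values[j]) ** 2
                dist_sums[k] += d
                counts[k] += 1
    lags = [ds / c if c else float("nan") for ds, c in zip(dist_sums, counts)]
    gammas = [s / c if c else float("nan") for s, c in zip(sq_sums, counts)]
    return lags, gammas, counts


# Synthetic data: a random walk sampled every metre along a hypothetical transect,
# so that neighbouring values are more similar than distant ones
rng = random.Random(42)
coords, values, z = [], [], 0.0
for x in range(200):
    z += rng.gauss(0.0, 1.0)
    coords.append((float(x), 0.0))
    values.append(z)

lags, gammas, counts = empirical_semivariogram(coords, values, bin_width=5.0, n_bins=10)
for h, g, c in zip(lags, gammas, counts):
    print(f"lag {h:5.2f}  gamma {g:7.3f}  pairs {c}")
```

For a random walk the semivariogram grows roughly linearly with the lag, which the printed table should reflect up to sampling noise.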
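Once a variogram model is available, the optimal interpolation mentioned in the geostatistics item is kriging. The sketch below implements ordinary kriging at a single target point in one dimension. The weights solve a linear system built from the variogram, with a Lagrange multiplier enforcing that they sum to one, and the same solution yields the kriging (error) variance. The exponential variogram with its sill and range, the data, and all helper names are hypothetical; a small Gaussian elimination is included so that only the standard library is needed. Block kriging and compositional (log-ratio) kriging, the current research topics, are not covered.

```python
import math


def exponential_variogram(h, sill=1.0, vrange=10.0):
    """Exponential semivariogram without nugget; sill and range are hypothetical."""
    return sill * (1.0 - math.exp(-h / vrange))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ordinary_kriging(xs, zs, x0, gamma=exponential_variogram):
    """Ordinary kriging estimate, variance and weights at location x0 (1-D coordinates)."""
    n = len(xs)
    # Variogram between all data pairs, bordered by ones for the constraint sum(w) = 1
    A = [[gamma(abs(xs[i] - xs[j])) for j in range(n)] + [1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])
    # Right-hand side: variogram between each datum and the target location
    g0 = [gamma(abs(x - x0)) for x in xs]
    sol = solve_linear(A, g0 + [1.0])
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * z for w, z in zip(weights, zs))
    variance = sum(w * g for w, g in zip(weights, g0)) + mu
    return estimate, variance, weights


# Hypothetical 1-D data (e.g. grades along a drillhole, positions in metres)
xs = [0.0, 3.0, 7.0, 12.0]
zs = [1.2, 1.8, 1.1, 2.5]
est, var, w = ordinary_kriging(xs, zs, 5.0)
print(f"estimate {est:.3f}, kriging variance {var:.3f}, weights {[round(v, 3) for v in w]}")
```

Kriging at a data location returns the datum itself with zero variance (without a nugget effect), which is a quick sanity check of any implementation.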
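For the endmember problems in the Bayesian statistics item, a random-walk Metropolis sampler (the simplest MCMC algorithm) already illustrates the idea. The sketch below assumes two known endmember compositions and one sampled composition, places a uniform prior on the proportion p of the first endmember and, in line with the CoDa argument above, measures misfit on clr coordinates (an isotropic normal model in log-ratio space). The endmembers, the noise scale `sigma`, the step size and the synthetic, noise-free observation are all hypothetical; real problems involve more endmembers, possibly unknown endmember signals, and spectra rather than simple three-part compositions.

```python
import math
import random
from statistics import fmean, stdev


def clr(x):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    m = sum(logs) / len(logs)
    return [lg - m for lg in logs]


def mix(p, e1, e2):
    """Additive mixture: proportion p of endmember e1 and 1 - p of endmember e2."""
    return [p * a + (1.0 - p) * b for a, b in zip(e1, e2)]


def log_likelihood(p, obs, e1, e2, sigma):
    """Gaussian log-likelihood (up to a constant) of the clr residuals."""
    r = [a - b for a, b in zip(clr(obs), clr(mix(p, e1, e2)))]
    return -sum(v * v for v in r) / (2.0 * sigma**2)


def metropolis(obs, e1, e2, sigma=0.05, n_iter=20000, step=0.05, seed=1):
    """Random-walk Metropolis sampler for p under a uniform prior on (0, 1)."""
    rng = random.Random(seed)
    p = 0.5
    ll = log_likelihood(p, obs, e1, e2, sigma)
    chain = []
    for _ in range(n_iter):
        q = p + rng.gauss(0.0, step)
        # Proposals outside (0, 1) have zero prior density and are always rejected
        if 0.0 < q < 1.0:
            ll_q = log_likelihood(q, obs, e1, e2, sigma)
            # Accept with probability min(1, exp(ll_q - ll)); 1 - random() lies in (0, 1]
            if math.log(1.0 - rng.random()) < ll_q - ll:
                p, ll = q, ll_q
        chain.append(p)
    return chain


# Hypothetical endmember compositions (fractions summing to 1)
e1 = [0.7, 0.2, 0.1]
e2 = [0.1, 0.3, 0.6]
obs = mix(0.3, e1, e2)  # synthetic observation with true p = 0.3

posterior = metropolis(obs, e1, e2)[2000:]  # discard burn-in
print(f"posterior mean of p: {fmean(posterior):.3f} (sd {stdev(posterior):.3f})")
```

The chain is a sample from the posterior of p, so its spread directly expresses the unmixing uncertainty rather than a single best-fit proportion.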
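The calibration step in the model-data merging item can be sketched for the simplest possible case: a single rate parameter k of a hypothetical first-order reaction, dc/dt = -k·c, whose state variable c is observed at a few times. The forward model is integrated numerically (explicit Euler), and the posterior of k under a uniform prior and Gaussian measurement errors is evaluated on a grid. A grid is only feasible because there is one parameter; with many parameters one would turn to MCMC as in the Bayesian statistics item. The initial concentration, noise level, grid and synthetic data are all assumptions for illustration.

```python
import math


def simulate_decay(k, c0, times, dt=0.01):
    """Integrate dc/dt = -k*c from c(0) = c0 with explicit Euler steps.

    `times` must be increasing; returns the state c at each requested time.
    """
    out = []
    c, t = c0, 0.0
    for target in times:
        while t < target - 1e-12:
            h = min(dt, target - t)
            c -= k * c * h
            t += h
        out.append(c)
    return out


def grid_posterior(times, obs, c0, sigma, k_grid):
    """Normalised posterior weights of k on a grid (uniform prior, Gaussian errors)."""
    log_post = []
    for k in k_grid:
        pred = simulate_decay(k, c0, times)
        log_post.append(-sum((o - p) ** 2 for o, p in zip(obs, pred)) / (2.0 * sigma**2))
    m = max(log_post)  # subtract the maximum to avoid underflow in exp
    w = [math.exp(lp - m) for lp in log_post]
    s = sum(w)
    return [wi / s for wi in w]


# Synthetic, noise-free observations from the exact solution with k = 0.3
c0, k_true = 10.0, 0.3
times = [1.0, 2.0, 4.0, 6.0, 8.0]
obs = [c0 * math.exp(-k_true * t) for t in times]

k_grid = [0.002 * i for i in range(1, 501)]  # k from 0.002 to 1.0
post = grid_posterior(times, obs, c0, sigma=0.2, k_grid=k_grid)
k_mean = sum(k * p for k, p in zip(k_grid, post))
k_map = k_grid[post.index(max(post))]
print(f"posterior mean of k: {k_mean:.4f}, posterior mode: {k_map:.3f}")
```

Ranking competing models (validation) would compare how well each explains the same data, e.g. through their marginal likelihoods; that step is not shown here.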