Antonio Lijoi and colleagues propose a model to study heterogeneous data

Non-parametric Bayesian inference is a flexible and effective approach for analyzing complex phenomena. It has proven successful in several applied fields, ranging from genomics to functional data analysis and from clinical trials to topic modeling, to mention just a few. Some interesting recent developments concern the analysis of DNA sequencing data, where non-parametric Bayesian models yield simple and intuitive tools for two tasks: predicting, from only a fraction of a genomic library, the number of new genes that an additional sample would reveal, and estimating the so-called sample coverage, that is, the share of the library accounted for by the genes already observed.

Things get much more complicated when the data are heterogeneous, as with DNA sequences coming from different tissues of the same organism. This is the problem that Antonio Lijoi, Igor Prünster, Federico Camerlenghi, and Peter Orbanz address in Distribution Theory for Hierarchical Processes, forthcoming in the Annals of Statistics. The authors propose a general model for data affected by a source of heterogeneity, the typical setting of meta-analysis experiments and one of great interest in machine learning applications. Patients treated in different hospitals and documents issued by different areas of the same organization are both examples of heterogeneous populations that share common features.

"Hierarchical processes are useful to address this problem. They arise as the composition of discrete random probability measures", Antonio Lijoi says. "Our paper presents novel theoretical results and describes two classes of algorithms that can be readily implemented. On the one hand, the so-called 'marginal' algorithms provide approximate samples from predictive laws in heterogeneous populations. On the other hand, 'conditional' algorithms generate realizations of the underlying random probability measures conditionally on the data. They not only allow us to make predictions but also yield a more reliable evaluation of the uncertainty associated with them".
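
To give a flavor of what sampling from a hierarchical process looks like, here is a minimal sketch in Python. It is not the authors' algorithm: it covers only the best-known special case, the hierarchical Dirichlet process, and simulates labels from the prior through the so-called Chinese restaurant franchise rather than from predictive laws given observed data. The function name `sample_franchise` and the parameters `theta`, `theta0`, and `seed` are chosen here for illustration.

```python
import random


def sample_franchise(group_sizes, theta=1.0, theta0=1.0, seed=0):
    """Simulate species labels for several groups (hypothetical helper).

    Each group is a "restaurant": an observation joins an existing table
    with probability proportional to the table size, or opens a new table
    with probability proportional to theta. A new table picks a "dish"
    (a species shared across groups) with probability proportional to the
    number of tables already serving it, or a brand-new dish with
    probability proportional to theta0.
    """
    rng = random.Random(seed)
    dish_tables = []  # dish_tables[k] = tables, over all groups, serving dish k
    labels = []
    for n in group_sizes:
        table_counts = []  # observations seated at each table of this group
        table_dish = []    # dish served at each table of this group
        group_labels = []
        for _ in range(n):
            weights = table_counts + [theta]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_counts):
                # New table: choose its dish at the top level of the hierarchy.
                dish_weights = dish_tables + [theta0]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dish_tables):
                    dish_tables.append(0)  # a dish never seen in any group
                dish_tables[k] += 1
                table_counts.append(1)
                table_dish.append(k)
            else:
                table_counts[t] += 1
            group_labels.append(table_dish[t])
        labels.append(group_labels)
    return labels


# Two "tissues" with 50 observations each; labels index shared species.
labels = sample_franchise([50, 50], theta=2.0, theta0=2.0, seed=42)
shared = set(labels[0]) & set(labels[1])
print(len(set(labels[0])), len(set(labels[1])), len(shared))
```

Because new tables draw their dish from a pool common to all groups, the same species can appear in more than one group, which is how the model lets heterogeneous populations share features.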

Some promising new developments in this study concern survival analysis in the presence of covariate-dependent data.

Read more about this topic:
Riccardo Zecchina. Teaching Machines How to Learn to Improve Business and Life
Carlo Baldassi. Learning Is a Quantum Question
Daniele Durante. How to Study Dynamic Networks
Dirk Hovy. The Algorithm that Prevents Suicide
Alessia Melegaro. The Network Modeling Chip that Fights Influenza
Raffaella Piccarreta, Marco Bonetti. A (Statistical) Model for Life