Finding the Most Valuable Data Points for Predicting Extreme Event Statistics with Likelihood-Weighted Active Learning

Summary
Understanding how, when, and why extreme events occur is an important problem in many fields (e.g., extreme climate events, rogue ocean waves, materials science). However, because these events are infrequent, severe, or both, a large amount of data is often needed to characterize them and their relationship to the underlying system. As a result, data sets created from experiments and simulations often contain many nearly repetitive, and thus unnecessary, points. These data sets grow even larger for high-dimensional problems, and training predictive models on them requires substantial computing time and power. The question becomes how to select a subset of points that reduces training time while still accurately representing the distribution of the original data. We present an active learning framework that uses a likelihood-weighted sampling criterion to sequentially select optimal training input points, namely those that give rise to outputs in the tails of the distribution (i.e., those most relevant to the dynamics of extreme events). To compute the criterion, we use neural network architectures capable of making probabilistic predictions. We test the method by predicting the maximum future wave height of a 1D dispersive nonlinear wave model from a high-dimensional set of initial conditions. The likelihood-weighted search algorithm accurately reproduces the probability density function of the original data sets using a fraction of the original points.
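The abstract does not state the exact form of the sampling criterion. As a minimal sketch only, the Python below assumes one form that is common in likelihood-weighted active learning: the surrogate's predictive variance multiplied by the ratio of the input density to the estimated density of the predicted output. The function names, the histogram density estimate, and the assumption that the candidate pool is drawn from the input distribution are all illustrative choices, not the authors' implementation.

```python
import numpy as np

def likelihood_weighted_scores(mean, var, p_x, bins=30, eps=1e-12):
    """Hypothetical acquisition score: var(x) * p_x(x) / p_mu(mu(x)).

    mean, var : predictive mean and variance of a probabilistic surrogate
                (e.g., a neural network ensemble) at each candidate input
    p_x       : input probability density evaluated at each candidate
    Assumes the candidates are drawn from the input distribution, so a
    histogram of the predicted means approximates the output density p_mu.
    """
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    p_x = np.asarray(p_x, dtype=float)

    # Histogram estimate of the density of the predicted outputs.
    hist, edges = np.histogram(mean, bins=bins, density=True)
    # Map each predicted mean to its histogram bin (0 .. bins - 1).
    idx = np.clip(np.digitize(mean, edges[1:-1]), 0, bins - 1)
    p_mu = hist[idx]

    # Rare predicted outputs (small p_mu) receive a large weight.
    return var * p_x / (p_mu + eps)

def select_next(mean, var, p_x, **kwargs):
    """Return the index of the candidate with the highest score."""
    return int(np.argmax(likelihood_weighted_scores(mean, var, p_x, **kwargs)))
```

In a sequential loop, one would retrain the surrogate after each selection, recompute `mean` and `var` on the remaining candidates, and call `select_next` again. Under this assumed criterion, candidates whose predicted outputs fall in sparsely populated regions of the output distribution are favored, provided the surrogate is still uncertain about them.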
Abstract ID: 125
