Scalable Algorithms for Optimal Experimental Design for Infinite-Dimensional Nonlinear Bayesian Inverse Problems


Scalable algorithms for optimal experimental design for infinite-dimensional nonlinear Bayesian inverse problems

Alen Alexanderian (Math/NC State), Omar Ghattas (ICES/UT-Austin), Noémi Petra (Applied Math/UC Merced), Georg Stadler (Courant/NYU)

Data Assimilation and Inverse Problems, Mathematics Institute, University of Warwick, Coventry, United Kingdom
February 23, 2016

The conceptual optimal experimental design problem

Data: $d \in \mathbb{R}^q$
Model parameter field: $m = \{m(x)\}_{x \in D}$

Bayesian inverse problem: use the data $d$ to infer the model parameter field $m$ in the Bayesian sense, i.e., update the prior state of knowledge on $m$.

Optimal experimental design: how do we design the observation system for $d$ to "optimally" infer $m$?

Some relevant references on OED for PDEs

E. Haber, L. Horesh, L. Tenorio, Numerical methods for experimental design of large-scale linear ill-posed inverse problems, Inverse Problems, 24:125–137, 2008.
E. Haber, L. Horesh, L. Tenorio, Numerical methods for the design of large-scale nonlinear discrete ill-posed inverse problems, Inverse Problems, 26, 025002, 2010.
E. Haber, Z. Magnant, C. Lucero, L. Tenorio, Numerical methods for A-optimal designs with a sparsity constraint for ill-posed inverse problems, Computational Optimization and Applications, 1–22, 2012.
Q. Long, M. Scavino, R. Tempone, S. Wang, Fast estimation of expected information gains for Bayesian experimental designs based on Laplace approximations, Computer Methods in Applied Mechanics and Engineering, 259:24–39, 2013.
Q. Long, M. Scavino, R. Tempone, S. Wang, A Laplace method for under-determined Bayesian optimal experimental designs, Computer Methods in Applied Mechanics and Engineering, 285:849–876, 2015.
F. Bisetti, D. Kim, O. Knio, Q. Long, R. Tempone, Optimal Bayesian experimental design for priors of compact support with application to shock-tube experiments for combustion kinetics, International Journal for Numerical Methods in Engineering, 2016.
X. Huan, Y. M. Marzouk, Simulation-based optimal Bayesian experimental design for nonlinear systems, Journal of Computational Physics, 232:288–317, 2013.
X. Huan, Y. M. Marzouk, Gradient-based stochastic optimization methods in Bayesian experimental design, International Journal for Uncertainty Quantification, 4(6), 2014.

How we define an optimal design

Given locations $x_i$ of possible sensor sites, choose optimal weights $w_i$ for a subset (using sparsity constraints):

design: $\{x_1, \ldots, x_{n_s}\}$, $\{w_1, \ldots, w_{n_s}\}$

Obtain data, then solve the Bayesian inverse problem: combine the data likelihood with the prior to obtain the posterior.

We seek an A-optimal design: minimize the average posterior variance.

[Figures: ground truth and sensor locations; MAP solution; posterior variance]

Bayesian inverse problem in Hilbert space (A. Stuart, 2010)

$D$: bounded domain; $H = L^2(D)$; $m \in H$: parameter
Parameter-to-observable map: $f : H \to \mathbb{R}^q$
Additive Gaussian noise: $d = f(m) + \eta$, $\eta \sim N(0, \Gamma_{\mathrm{noise}})$
Likelihood:
$\pi_{\mathrm{like}}(d \mid m) \propto \exp\left\{ -\tfrac{1}{2} \left(f(m) - d\right)^T \Gamma_{\mathrm{noise}}^{-1} \left(f(m) - d\right) \right\}$
Gaussian prior measure: $\mu_{\mathrm{prior}} = N(m_{\mathrm{pr}}, C_{\mathrm{prior}})$
Bayes' theorem (in infinite dimensions):
$\dfrac{d\mu_{\mathrm{post}}}{d\mu_{\mathrm{prior}}} \propto \pi_{\mathrm{like}}(d \mid m)$, informally "$d\mu_{\mathrm{post}} \propto \pi_{\mathrm{like}}(d \mid m)\, d\mu_{\mathrm{prior}}$"

A. M. Stuart, Inverse problems: A Bayesian perspective, Acta Numerica, 2010.

The experimental design and a weighted inference problem

design: $\{x_1, \ldots, x_{n_s}\}$ with weights $\{w_1, \ldots, w_{n_s}\}$
Want $w_i \in \{0, 1\}$; relax to $0 \le w_i \le 1$ for optimization, then threshold back to $w_i \in \{0, 1\}$.

$w$-weighted data likelihood:
$\pi_{\mathrm{like}}(d \mid m; w) \propto \exp\left\{ -\tfrac{1}{2} \left(f(m) - d\right)^T W_\sigma \left(f(m) - d\right) \right\}$

$f$: parameter-to-observable map; $W = \mathrm{diag}(w_i)$; $W_\sigma = \frac{1}{\sigma_{\mathrm{noise}}^2} W$
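A minimal numpy sketch of this weighted misfit (the negative log of the $w$-weighted likelihood, up to a constant), assuming uncorrelated noise with a single standard deviation; all names here are hypothetical:

```python
import numpy as np

def weighted_misfit(f_m, d, w, sigma_noise):
    """0.5 * (f(m) - d)^T W_sigma (f(m) - d), with W_sigma = diag(w) / sigma_noise**2.
    Zero weights switch the corresponding candidate sensors off."""
    r = f_m - d                          # residual at the candidate sensor sites
    return 0.5 * np.sum(w * r**2) / sigma_noise**2
```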

A-optimal design in Hilbert space

Covariance function: $c_{\mathrm{post}}(x, y) = \mathrm{Cov}\{m(x), m(y)\}$
Average posterior variance: $\frac{1}{|D|} \int_D c_{\mathrm{post}}(x, x)\, dx$
Covariance operator: $[C_{\mathrm{post}} u](x) = \int_D c_{\mathrm{post}}(x, y)\, u(y)\, dy$
Mercer's theorem: $\int_D c_{\mathrm{post}}(x, x)\, dx = \mathrm{tr}(C_{\mathrm{post}})$

A-optimal design criterion: choose a sensor configuration that minimizes $\mathrm{tr}(C_{\mathrm{post}})$.

Relationship to some other OED criteria

For Gaussian prior and noise and a linear parameter-to-observable map:

Maximizing the expected information gain is equivalent to D-optimal design:
$E_{\mu_{\mathrm{prior}}} E_{d \mid m} \left\{ D_{\mathrm{kl}}(\mu_{\mathrm{post}}, \mu_{\mathrm{prior}}) \right\} = -\tfrac{1}{2} \log\det \Gamma_{\mathrm{post}} + \tfrac{1}{2} \log\det \Gamma_{\mathrm{prior}}$

Minimizing the Bayes risk of the posterior mean is equivalent to A-optimal design:
$E_{\mu_{\mathrm{prior}}} E_{d \mid m} \left\{ \|m - m_{\mathrm{post}}(d)\|^2 \right\} = \int_H \int_Y \|m - m_{\mathrm{post}}(d)\|^2\, \pi_{\mathrm{like}}(d \mid m)\, dd\, \mu_{\mathrm{prior}}(dm) = \mathrm{tr}(\Gamma_{\mathrm{post}})$

For infinite dimensions, see: A. Alexanderian, P. Gloor, and O. Ghattas, On Bayesian A- and D-optimal experimental designs in infinite dimensions, Bayesian Analysis, to appear, 2016. http://dx.doi.org/10.1214/15-BA969
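The second identity is easy to check numerically. Below is a small Monte Carlo sketch for a finite-dimensional linear-Gaussian model (zero prior mean assumed for simplicity; all names hypothetical): the sampled Bayes risk of the posterior mean should match $\mathrm{tr}(\Gamma_{\mathrm{post}})$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 5, 3                                   # parameter and data dimensions
F = rng.standard_normal((q, n))               # linear parameter-to-observable map
G_pr = np.eye(n)                              # prior covariance (zero prior mean)
G_noise = 0.1 * np.eye(q)                     # noise covariance

# Posterior covariance and posterior-mean map for the linear-Gaussian model
G_post = np.linalg.inv(F.T @ np.linalg.inv(G_noise) @ F + np.linalg.inv(G_pr))
K = G_post @ F.T @ np.linalg.inv(G_noise)     # d -> m_post(d)

# Monte Carlo estimate of E_m E_{d|m} ||m - m_post(d)||^2
N, risk = 50_000, 0.0
for _ in range(N):
    m = rng.multivariate_normal(np.zeros(n), G_pr)
    d = F @ m + rng.multivariate_normal(np.zeros(q), G_noise)
    risk += np.sum((m - K @ d) ** 2)
print(risk / N, np.trace(G_post))             # the two numbers should agree
```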

Challenges

The infinite-dimensional Bayesian inverse problem is "merely" an inner problem within OED.
We need the trace of the posterior covariance operator, which would be intractable in the large-scale setting.
We approximate the posterior covariance by the inverse of the Hessian at the maximum a posteriori (MAP) point, but explicit construction of the Hessian is still very expensive.
For a nonlinear inverse problem, the Hessian depends on the data, which are not available a priori.
For efficient optimization, we need derivatives of the trace of the inverse Hessian with respect to the sensor weights.
We seek scalable algorithms, i.e., those whose cost (in terms of forward PDE solves) is independent of the data dimension and the parameter dimension.

A-optimal design problem: Linear case

Linear parameter-to-observable map: $f(m) := Fm$
Gaussian prior measure: $\mu_{\mathrm{prior}} = N(m_{\mathrm{prior}}, C_{\mathrm{prior}})$
Gaussian prior and noise imply a Gaussian posterior:
$m_{\mathrm{post}} = \arg\min_m \underbrace{\tfrac{1}{2} \big\| W^{1/2} (Fm - d) \big\|_{\Gamma_{\mathrm{noise}}^{-1}}^2 + \tfrac{1}{2} \big\langle C_{\mathrm{prior}}^{-1} (m - m_{\mathrm{prior}}),\, m - m_{\mathrm{prior}} \big\rangle}_{\mathcal{J}(m):\ \text{negative log-posterior}}$
$C_{\mathrm{post}}(w) = H^{-1}(w), \qquad H(w) = F^* W_\sigma F + C_{\mathrm{prior}}^{-1}$

OED problem:
$\min_w\; \mathrm{tr}\big[ \big( F^* W_\sigma F + C_{\mathrm{prior}}^{-1} \big)^{-1} \big] + \text{penalty}$

A. Alexanderian, N. Petra, G. Stadler, O. Ghattas, A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized $\ell_0$-sparsification, SIAM Journal on Scientific Computing, 36(5):A2122–A2148, 2014.
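A dense-algebra sketch of this objective and its weight gradient (small problems only; in the large-scale setting the explicit inverses become matrix-free Hessian solves, and the helper names here are hypothetical):

```python
import numpy as np

def aopt_objective_and_gradient(w, F, C_prior_inv, sigma_noise, gamma, P, grad_P):
    """tr[H(w)^{-1}] + gamma*P(w) and its gradient, H(w) = F^T W_sigma F + C_prior^{-1}."""
    H = F.T @ (np.diag(w) / sigma_noise**2) @ F + C_prior_inv
    H_inv = np.linalg.inv(H)
    obj = np.trace(H_inv) + gamma * P(w)
    # d tr(H^{-1})/dw_j = -tr(H^{-1} (f_j f_j^T / sigma^2) H^{-1}) = -||H^{-1} f_j||^2 / sigma^2,
    # where f_j is the j-th row of F (the j-th candidate sensor's observation functional).
    G = F @ H_inv                                 # row j is (H^{-1} f_j)^T
    grad = -np.sum(G * G, axis=1) / sigma_noise**2 + gamma * grad_P(w)
    return obj, grad
```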

A-optimal design problem: Nonlinear case

In the nonlinear case, Gaussian prior and noise do not imply a Gaussian posterior.
Use a Gaussian approximation to the posterior: for given $w$ and $d$, compute the maximum a posteriori probability (MAP) estimate
$m_{\mathrm{MAP}}(w; d) := \arg\min_m\, \mathcal{J}(m, w, d)$
and take the Gaussian approximation to the posterior at the MAP point:
$N\big( m_{\mathrm{MAP}}(w; d),\; H^{-1}\big(m_{\mathrm{MAP}}(w; d), w; d\big) \big)$

Problem: the data $d$ are not available a priori.
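In finite dimensions this Gaussian (Laplace) approximation amounts to a Newton solve for the MAP point followed by a Hessian inversion. A sketch under that assumption, with hypothetical callables for the negative log-posterior and its derivatives:

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_approximation(J, grad_J, hess_J, m0):
    """Find the MAP point of the negative log-posterior J, then use the
    inverse Hessian there as the covariance of the Gaussian approximation."""
    res = minimize(J, m0, jac=grad_J, hess=hess_J, method="Newton-CG")
    m_map = res.x
    cov = np.linalg.inv(hess_J(m_map))   # H^{-1} at the MAP point
    return m_map, cov
```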

A-optimal designs for nonlinear inverse problems

General formulation (built up in stages on the slides): minimize the expected average posterior variance,
$\min_w\; E_{\mu_{\mathrm{prior}}} E_{d \mid m} \left\{ \mathrm{tr}\big[ H^{-1}\big(m_{\mathrm{MAP}}(w; d), w; d\big) \big] \right\} + \gamma P(w)$
$P(w)$: sparsifying penalty function

In practice: get data from a few training models $m_1, \ldots, m_{n_d}$, where the $m_i$ are draws from the prior, with training data
$d_i = f(m_i) + \eta_i, \quad i = 1, \ldots, n_d$
(a sketch of this sampling step follows below).

The problem to solve in practice:
$\min_w\; \frac{1}{n_d} \sum_{i=1}^{n_d} \mathrm{tr}\big[ H^{-1}\big(m_{\mathrm{MAP}}(w; d_i), w; d_i\big) \big] + \gamma P(w)$
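Generating the training set is straightforward once the discretized prior is available; a sketch with hypothetical names (finite-dimensional stand-ins for the prior mean and covariance):

```python
import numpy as np

def training_data(f, m_prior, C_prior, G_noise, n_d, rng):
    """Draw training models m_i ~ N(m_prior, C_prior) and synthesize
    noisy data d_i = f(m_i) + eta_i for i = 1, ..., n_d."""
    pairs = []
    for _ in range(n_d):
        m_i = rng.multivariate_normal(m_prior, C_prior)            # prior draw
        eta = rng.multivariate_normal(np.zeros(G_noise.shape[0]), G_noise)
        pairs.append((m_i, f(m_i) + eta))
    return pairs
```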

Randomized trace estimation

$A \in \mathbb{R}^{n \times n}$ symmetric. Trace estimator:
$\mathrm{tr}(A) \approx \frac{1}{n_{\mathrm{tr}}} \sum_{i=1}^{n_{\mathrm{tr}}} z_i^T A z_i$, with $z_i$ random vectors.

Gaussian trace estimator: $z_i$ independent draws from $N(0, I)$. For $z \sim N(0, I)$:
$E\{z^T A z\} = \mathrm{tr}(A), \qquad \mathrm{Var}\{z^T A z\} = 2\|A\|_F^2$

Clustered eigenvalues yield a good approximation with small $n_{\mathrm{tr}}$. This gives an efficient means of approximating the trace of the inverse Hessian (see the sketch below).

M. Hutchinson, A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines (1990).
H. Avron and S. Toledo, Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix (2011).
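A minimal implementation of the Gaussian estimator; apply_A is a matrix-free action $v \mapsto Av$, which in the OED setting would be an inverse-Hessian solve (the small explicit test matrix is just for checking):

```python
import numpy as np

def gaussian_trace_estimate(apply_A, n, n_tr, rng):
    """Average of z^T A z over n_tr draws z ~ N(0, I); A is never formed."""
    total = 0.0
    for _ in range(n_tr):
        z = rng.standard_normal(n)
        total += z @ apply_A(z)
    return total / n_tr

# Quick sanity check against the exact trace of a small SPD matrix
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T
print(gaussian_trace_estimate(lambda v: A @ v, 50, 200, rng), np.trace(A))
```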

Optimization problem for computing A-optimal design

OED objective function, with the trace replaced by its randomized estimate:
$\Psi(w) = \frac{1}{n_d} \sum_{i=1}^{n_d} \mathrm{tr}\big[ H^{-1}(w; d_i) \big] \approx \frac{1}{n_d\, n_{\mathrm{tr}}} \sum_{i=1}^{n_d} \sum_{k=1}^{n_{\mathrm{tr}}} \big\langle z_k,\, H^{-1}(w; d_i)\, z_k \big\rangle$

Auxiliary variable: $y_{ik} = H^{-1}(w; d_i)\, z_k$

OED optimization problem:
$\min_w\; \frac{1}{n_d\, n_{\mathrm{tr}}} \sum_{i=1}^{n_d} \sum_{k=1}^{n_{\mathrm{tr}}} \langle z_k, y_{ik} \rangle + \gamma P(w)$
where
$m_{\mathrm{MAP}}(w; d_i) = \arg\min_m \mathcal{J}(m, w; d_i), \quad i = 1, \ldots, n_d$
$H\big(m_{\mathrm{MAP}}(w; d_i), w; d_i\big)\, y_{ik} = z_k, \quad i = 1, \ldots, n_d,\; k = 1, \ldots, n_{\mathrm{tr}}$
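Each auxiliary variable is a Hessian solve; since the Hessian is available only through its action (each application costing a pair of incremental PDE solves), a Krylov method is the natural tool. A sketch, with apply_H as a hypothetical matrix-free Hessian action:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_auxiliary(apply_H, z):
    """Compute y = H^{-1} z with conjugate gradients; H is SPD and only
    its action v -> H v is available."""
    n = z.shape[0]
    H_op = LinearOperator((n, n), matvec=apply_H)
    y, info = cg(H_op, z)
    if info != 0:
        raise RuntimeError("CG did not converge")
    return y
```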

Application: subsurface flow

Forward problem:
$-\nabla \cdot (e^m \nabla u) = f \quad \text{in } D$
$u = g \quad \text{on } \Gamma_D$
$e^m \nabla u \cdot n = h \quad \text{on } \Gamma_N$

$u$: pressure field; $m$: log-permeability field (inversion parameter)

[Figures: left, true parameter field; right, pressure field]

Bayesian inverse problem: Gaussian approximation

The MAP point is the solution to
$\min_m\; \mathcal{J}(m, w; d) := \tfrac{1}{2} (Bu - d)^T W_\sigma (Bu - d) + \tfrac{1}{2} \langle m - m_{\mathrm{prior}},\, m - m_{\mathrm{prior}} \rangle_E$
where $u$ solves
$-\nabla \cdot (e^m \nabla u) = f \text{ in } D, \quad u = g \text{ on } \Gamma_D, \quad e^m \nabla u \cdot n = 0 \text{ on } \Gamma_N$
and $B$ is the observation operator.

Cameron-Martin inner product:
$\langle m_1, m_2 \rangle_E = \big\langle C_{\mathrm{prior}}^{-1/2} m_1,\, C_{\mathrm{prior}}^{-1/2} m_2 \big\rangle, \quad m_1, m_2 \in \mathrm{range}\big(C_{\mathrm{prior}}^{1/2}\big)$

Covariance of the Gaussian approximation to the posterior:
$C = H(m, w; d)^{-1} = \big[ \nabla_m^2 \mathcal{J}(m, w; d) \big]^{-1} \quad \text{with } m = m_{\mathrm{MAP}}(w; d)$

Optimization problem for A-optimal design

$\min_{w \in [0,1]^{n_s}}\; \frac{1}{n_d\, n_{\mathrm{tr}}} \sum_{i=1}^{n_d} \sum_{k=1}^{n_{\mathrm{tr}}} \langle z_k, y_{ik} \rangle + \gamma P(w)$

where, for $i = 1, \ldots, n_d$ and $k = 1, \ldots, n_{\mathrm{tr}}$:

$-\nabla \cdot (e^{m_i} \nabla u_i) = f$
$-\nabla \cdot (e^{m_i} \nabla p_i) = -B^* W_\sigma (B u_i - d_i)$
$C_{\mathrm{prior}}^{-1} (m_i - m_{\mathrm{pr}}) - e^{m_i} \nabla u_i \cdot \nabla p_i = 0 \qquad [\text{i.e., } \nabla_m \mathcal{J}(m_i) = 0]$

$-\nabla \cdot (e^{m_i} \nabla v_{ik}) = \nabla \cdot (y_{ik}\, e^{m_i} \nabla u_i)$
$-\nabla \cdot (e^{m_i} \nabla q_{ik}) = \nabla \cdot (y_{ik}\, e^{m_i} \nabla p_i) - B^* W_\sigma B v_{ik}$
$C_{\mathrm{prior}}^{-1} y_{ik} + y_{ik}\, e^{m_i} \nabla u_i \cdot \nabla p_i + e^{m_i} \nabla v_{ik} \cdot \nabla p_i + e^{m_i} \nabla u_i \cdot \nabla q_{ik} = z_k \qquad [\text{i.e., } H(m_i)\, y_{ik} = z_k]$

The Lagrangian for the OED problem

$\mathcal{L}^{\mathrm{OED}}\big(w, \{u_i\}, \{m_i\}, \{p_i\}, \{v_{ik}\}, \{q_{ik}\}, \{y_{ik}\}, \{u_i^*\}, \{m_i^*\}, \{p_i^*\}, \{v_{ik}^*\}, \{q_{ik}^*\}, \{y_{ik}^*\}\big) :=$
$\frac{1}{n_d\, n_{\mathrm{tr}}} \sum_{i=1}^{n_d} \sum_{k=1}^{n_{\mathrm{tr}}} \langle z_k, y_{ik} \rangle + \gamma P(w)$
$+ \sum_{i=1}^{n_d} \Big[ \big( e^{m_i} \nabla u_i, \nabla u_i^* \big) - \big( f, u_i^* \big) - \big( h, u_i^* \big)_{\Gamma_N} \Big]$
$+ \sum_{i=1}^{n_d} \Big[ \big( e^{m_i} \nabla p_i, \nabla p_i^* \big) + \big( B^* W_\sigma (B u_i - d_i), p_i^* \big) \Big]$
$+ \sum_{i=1}^{n_d} \Big[ \big\langle m_i - m_{\mathrm{pr}}, m_i^* \big\rangle_E - \big( m_i^*\, e^{m_i} \nabla u_i, \nabla p_i \big) \Big]$
$+ \sum_{i=1}^{n_d} \sum_{k=1}^{n_{\mathrm{tr}}} \Big[ \big( e^{m_i} \nabla v_{ik}, \nabla v_{ik}^* \big) + \big( y_{ik}\, e^{m_i} \nabla u_i, \nabla v_{ik}^* \big) \Big]$
$+ \sum_{i=1}^{n_d} \sum_{k=1}^{n_{\mathrm{tr}}} \Big[ \big( e^{m_i} \nabla q_{ik}, \nabla q_{ik}^* \big) + \big( y_{ik}\, e^{m_i} \nabla p_i, \nabla q_{ik}^* \big) + \big( B^* W_\sigma B v_{ik}, q_{ik}^* \big) \Big]$
$+ \sum_{i=1}^{n_d} \sum_{k=1}^{n_{\mathrm{tr}}} \Big[ \big\langle y_{ik}, y_{ik}^* \big\rangle_E + \big( y_{ik}^*\, y_{ik}\, e^{m_i} \nabla u_i, \nabla p_i \big) + \big( y_{ik}^*\, e^{m_i} \nabla v_{ik}, \nabla p_i \big) + \big( y_{ik}^*\, e^{m_i} \nabla u_i, \nabla q_{ik} \big) - \big\langle z_k, y_{ik}^* \big\rangle \Big]$

The adjoint OED problem and the gradient

The adjoint OED problem for $\{u_i^*\}, \{m_i^*\}, \{p_i^*\}, \{v_{ik}^*\}, \{q_{ik}^*\}, \{y_{ik}^*\}$:

$B^* W_\sigma B q_{ik}^* - \nabla \cdot (y_{ik}^*\, e^{m_i} \nabla p_i) - \nabla \cdot (e^{m_i} \nabla v_{ik}^*) = 0$
$C_{\mathrm{prior}}^{-1} y_{ik}^* + y_{ik}^*\, e^{m_i} \nabla u_i \cdot \nabla p_i + e^{m_i} \nabla q_{ik}^* \cdot \nabla p_i + e^{m_i} \nabla u_i \cdot \nabla v_{ik}^* = \frac{1}{n_d\, n_{\mathrm{tr}}} z_k$
$-\nabla \cdot (e^{m_i} \nabla q_{ik}^*) - \nabla \cdot (y_{ik}^*\, e^{m_i} \nabla u_i) = 0$
$B^* W_\sigma B p_i^* - \nabla \cdot (m_i^*\, e^{m_i} \nabla p_i) - \nabla \cdot (e^{m_i} \nabla u_i^*) = b_{1i}$
$C_{\mathrm{prior}}^{-1} m_i^* + e^{m_i} \nabla p_i^* \cdot \nabla p_i + m_i^*\, e^{m_i} \nabla u_i \cdot \nabla p_i + e^{m_i} \nabla u_i \cdot \nabla u_i^* = b_{2i}$
$-\nabla \cdot (e^{m_i} \nabla p_i^*) - \nabla \cdot (m_i^*\, e^{m_i} \nabla u_i) = b_{3i}$

Gradient for the OED problem:
$g(w) = \sum_{i=1}^{n_d} \Gamma_{\mathrm{noise}}^{-1} (B u_i - d_i) \odot B p_i^* + \sum_{i=1}^{n_d} \sum_{k=1}^{n_{\mathrm{tr}}} \Gamma_{\mathrm{noise}}^{-1} B v_{ik} \odot B q_{ik}^*$

$q_{ik}^*$, $y_{ik}^*$, and $v_{ik}^*$ can be eliminated: $q_{ik}^* = v_{ik}$, $y_{ik}^* = y_{ik}$, $v_{ik}^* = q_{ik}$.

Computing the OED objective function and its gradient

1. Solve the inverse problems for $m_i(w)$, $u_i(m_i(w))$, and $p_i(m_i(w))$, $i = 1, \ldots, n_d$.

2. Solve for $(v_{ik}, y_{ik}, q_{ik})$, $i = 1, \ldots, n_d$, $k = 1, \ldots, n_{\mathrm{tr}}$:
$-\nabla \cdot (e^{m_i} \nabla v_{ik}) = \nabla \cdot (y_{ik}\, e^{m_i} \nabla u_i)$
$-\nabla \cdot (e^{m_i} \nabla q_{ik}) = \nabla \cdot (y_{ik}\, e^{m_i} \nabla p_i) - B^* W_\sigma B v_{ik}$
$C_{\mathrm{prior}}^{-1} y_{ik} + y_{ik}\, e^{m_i} \nabla u_i \cdot \nabla p_i + e^{m_i} \nabla v_{ik} \cdot \nabla p_i + e^{m_i} \nabla u_i \cdot \nabla q_{ik} = z_k$

3. Objective function evaluation: $\Psi(w) = \frac{1}{n_d\, n_{\mathrm{tr}}} \sum_{i=1}^{n_d} \sum_{k=1}^{n_{\mathrm{tr}}} \langle z_k, y_{ik} \rangle$

4. Solve for $(p_i^*, m_i^*, u_i^*)$, $i = 1, \ldots, n_d$:
$B^* W_\sigma B p_i^* - \nabla \cdot (m_i^*\, e^{m_i} \nabla p_i) - \nabla \cdot (e^{m_i} \nabla u_i^*) = b_{1i}$
$C_{\mathrm{prior}}^{-1} m_i^* + e^{m_i} \nabla p_i^* \cdot \nabla p_i + m_i^*\, e^{m_i} \nabla u_i \cdot \nabla p_i + e^{m_i} \nabla u_i \cdot \nabla u_i^* = b_{2i}$
$-\nabla \cdot (e^{m_i} \nabla p_i^*) - \nabla \cdot (m_i^*\, e^{m_i} \nabla u_i) = b_{3i}$

5. Gradient:
$g(w) = \sum_{i=1}^{n_d} \Gamma_{\mathrm{noise}}^{-1} (B u_i - d_i) \odot B p_i^* + \frac{1}{n_d\, n_{\mathrm{tr}}} \sum_{i=1}^{n_d} \sum_{k=1}^{n_{\mathrm{tr}}} \Gamma_{\mathrm{noise}}^{-1} B v_{ik} \odot B v_{ik}$

Computational cost: the number of PDE solves

$r$: rank of the Hessian of the data misfit
Independent of the mesh (beyond a certain resolution)
Independent of the sensor dimension (beyond a certain dimension)

Cost (in PDE solves) of evaluating the OED objective function and gradient:
1. Cost of solving the inverse problems: $2\, r\, n_d\, N_{\mathrm{newton}}$
2. Cost of computing the OED objective: $2\, r\, n_{\mathrm{tr}}\, n_d$
3. Cost of computing the OED gradient: $2\, n_{\mathrm{tr}}\, n_d + 2\, r\, n_d$
A low-rank approximation to the misfit Hessian reduces the costs in (2) and (3) to $2\, r\, n_d$ (see the sketch below).

OED optimization problem: solved via a quasi-Newton interior point method; the number of quasi-Newton iterations is insensitive to the parameter/sensor dimension.
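The low-rank approximation of the data-misfit Hessian that drives these savings can be built with randomized methods, since the Hessian is available only through its action. A sketch after Halko, Martinsson, and Tropp (the matrix-free apply_Hm is a hypothetical stand-in for one incremental forward plus one incremental adjoint PDE solve per application):

```python
import numpy as np

def randomized_low_rank(apply_Hm, n, r, rng, oversample=10):
    """Approximate the r dominant eigenpairs of a symmetric PSD operator
    given only its action v -> Hm v."""
    Omega = rng.standard_normal((n, r + oversample))       # random probes
    Y = np.column_stack([apply_Hm(Omega[:, j]) for j in range(Omega.shape[1])])
    Q, _ = np.linalg.qr(Y)                                 # basis for the range
    T = Q.T @ np.column_stack([apply_Hm(Q[:, j]) for j in range(Q.shape[1])])
    lam, S = np.linalg.eigh(T)                             # small dense eigensolve
    U = Q @ S
    return lam[-r:], U[:, -r:]                             # dominant eigenpairs
```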