
Random Matrix Theory, Numerical Computation and Applications

Alan Edelman, Brian D. Sutton, and Yuyang Wang

Abstract. This paper serves to prove the thesis that a computational trick can open entirely new approaches to theory. We illustrate by describing such random matrix techniques as the stochastic operator approach, the method of ghosts and shadows, and the method of "Riccati diffusion/Sturm sequences," giving new insights into the deeper mathematics underneath random matrix theory.

Contents

1. Introduction: A Computational Trick Can Also Be a Theoretical Trick
2. Random Matrix Factorization
3. Stochastic Operators
4. Sturm Sequences and Riccati Diffusion
5. Ghosts and Shadows
6. Universality of the Smallest Singular Value
Acknowledgement
References

1. Introduction: A Computational Trick Can Also Be a Theoretical Trick

We advise mathematicians not to dismiss an efficient computation as mere "implementation details"; it may be where the next theory comes from. This note will supply examples (the real case is outlined in Table 1). Throughout the notes, matlab codes are in typewriter font; in Table 1, trideig and maxeig can be downloaded from [Per].

We start with the famous semicircle distribution

    f(x) = \frac{1}{2\pi}\sqrt{4 - x^2}, \qquad |x| \le 2,

illustrated at the bottom (a) of Algorithm 1. This distribution depicts the histogram of the n eigenvalues of a symmetric random n \times n matrix S = (A + A^T)/\sqrt{2n} obtained by symmetrizing a matrix whose elements follow the standard normal distribution, i.e., in matlab notation: A = randn(n).

Key words and phrases: Random Matrix Theory, Numerical Linear Algebra, Stochastic Operator, Ghosts and Shadows. The first author was supported in part by DMS 1035400 and DMS 1016125. Note to our readers: these notes are in large part a precursor to a book on Random Matrix Theory that will be forthcoming. We reserve the right to reuse materials in the book. All codes were tested in matlab2012a.

All eigenvalues (Semicircle Law):
  Naive computation (space O(n^2), time O(n^3)):
    A = randn(n);
    v = eig((A+A')/sqrt(2*n));
  Clever computational trick, via the tridiagonal model (2.3) (space O(n), time O(n^2)):
    A = sqrt(chi2rnd((n-1):-1:1));
    v = trideig(randn(n,1),A);

Largest eigenvalue (Tracy-Widom Law):
  Naive computation (space O(n^2), time O(n^3)):
    A = randn(n);
    vs = eig((A+A')/sqrt(2*n));
    v = max(vs);
  Clever computational trick, via truncated storage, bisection, Sturm sequences, and a sparse eigensolver (space O(10 n^{1/3}), time O((10 n^{1/3})^2)):
    k = round(n - 10*n^(1/3) - 1);
    A = sqrt(chi2rnd((n-1):-1:k));
    v = maxeig(randn(n-k+1,1),A);

Theories inspired by computation: tridiagonal and bidiagonal models (Section 2), stochastic operators (Section 3), Sturm sequences and Riccati diffusion (Section 4), and the method of ghosts and shadows (Section 5).

Table 1. A Computational Trick Can Also Be a Theoretical Trick.

The complex version starts with A = randn(n) + sqrt(-1)*randn(n) and forms S = (A + A^H)/(2\sqrt{n}) to get the semicircle law. The Tracy-Widom distribution (illustrated in Algorithm 1, bottom (b)) describes the normalized largest eigenvalue, which, in the complex case, is

(1.1)    f(x) = \frac{d}{dx}\exp\!\left(-\int_x^\infty (t-x)\,q(t)^2\,dt\right),

where q(t) is the solution of the so-called Painlevé II differential equation \ddot{q}(t) = t\,q(t) + 2q(t)^3, with the boundary condition that as t \to \infty, q(t) is asymptotic to the Airy function Ai(t). Algorithm 1 shows Monte Carlo experiments for the semicircle law and the Tracy-Widom law.

We recommend Bornemann's code as the current best practice for computing the Tracy-Widom density f(x) [Bor10]. Alternatively, we present a simpler method in Algorithm 2, showing that even the formidable Tracy-Widom distribution is but a few lines of matlab.

The semicircle and Tracy-Widom laws are theorems as n \to \infty, but computations for small n suffice for illustration. The real S is known as the Gaussian Orthogonal Ensemble (GOE) and the "complex S" as the Gaussian Unitary Ensemble (GUE). In general, they are instances of the β-Hermite ensemble, where β = 1, 2 correspond to the real and complex cases respectively.

As we can see in Algorithm 1, direct random matrix experiments usually involve calculating the eigenvalues of random matrices, i.e., eig(s). Since many linear algebra computations require O(n^3) operations, it seems more feasible to take n relatively small and take a large number of Monte Carlo instances. This is our strategy in Algorithm 1.
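To make the "clever trick" column of Table 1 concrete, the following self-contained sketch (ours, not from the original table) samples the β = 1 tridiagonal model (2.3) and recovers the semicircle. It assumes chi2rnd from the Statistics Toolbox; eig(full(T)) stands in for the external trideig routine of [Per], which would keep the storage at O(n). The normalization matches Algorithm 1 below.

    % Semicircle law via the beta = 1 tridiagonal model -- a sketch
    n = 2000;
    c = sqrt(chi2rnd((n-1):-1:1))';                   % chi off-diagonals
    T = spdiags([[c;0] sqrt(2)*randn(n,1) [0;c]], -1:1, n, n)/sqrt(2);
    v = eig(full(T))/sqrt(n/2);                       % normalize as in Algorithm 1
    hist(v, -2:.2:2)                                  % approaches the semicircle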

Algorithm 1 Semicircle Law (β = 1) and the Tracy-Widom distribution (β = 2)

    % Experiment:  Demonstration of Semicircle and Tracy-Widom distributions
    % Plot:        Histogram of the eigenvalues and the largest eigenvalue
    % Theory:      Semicircle and Tracy-Widom as n -> infinity

    %% Parameters
    n = 100;          % matrix size
    t = 5000;         % trials
    v = [];           % eigenvalue samples
    vl = [];          % largest eigenvalue samples
    dx = .2;          % binsize

    %% Experiment
    for i = 1:t
        %% Sample GOE and collect their eigenvalues
        a = randn(n);                       % n by n matrix of random Gaussians
        s = (a + a')/2;                     % symmetrize matrix
        v = [v; eig(s)];                    % eigenvalues
        %% Sample GUE and collect their largest eigenvalues
        a = randn(n) + sqrt(-1)*randn(n);   % random nxn complex matrix
        s = (a + a')/2;                     % Hermitian matrix
        vl = [vl; max(eig(s))];             % largest eigenvalue
    end

    %% Semicircle law
    v = v/sqrt(n/2);                        % normalize eigenvalues
    % Plot
    [count, x] = hist(v, -2:dx:2);
    bar(x, count/(t*n*dx), 'y')
    % Theory
    hold on
    plot(x, sqrt(4 - x.^2)/(2*pi), 'LineWidth', 2)
    axis([-2.5 2.5 -.1 .5])
    hold off

    %% Tracy-Widom distribution
    vl = n^(1/6)*(vl - 2*sqrt(n));          % normalized eigenvalues
    % Plot
    figure; [count, x] = hist(vl, -5:dx:2);
    bar(x, count/(t*dx), 'y')
    % Theory
    hold on

[Figure: (a) Semicircle Law; (b) Tracy-Widom distribution]

Algorithm 2 Tracy-Widom distribution (β = 2)

    % Theory: Compute and plot the Tracy-Widom distribution

    %% Parameters
    t0 = 5;           % right endpoint
    tn = -8;          % left endpoint
    dx = .005;        % discretization

    %% Theory: The differential equation solver
    deq = @(t,y) [y(2); t*y(1) + 2*y(1)^3; y(4); y(1)^2];
    opts = odeset('reltol', 1e-12, 'abstol', 1e-15);
    y0 = [airy(t0); airy(1,t0); 0; airy(t0)^2];   % boundary conditions
    [t, y] = ode45(deq, t0:-dx:tn, y0, opts);     % solve
    F2 = exp(-y(:,3));                            % the distribution
    f2 = gradient(F2, t);                         % the density

    %% Plot
    plot(t, f2, 'LineWidth', 2)
    axis([-5 2 0 .5])

In fact, sophisticated matrix computations involve a series of reductions. With normally distributed matrices, the most expensive reduction steps can be avoided on the computer, because they can be done with mathematics! All of a sudden O(n^3) computations become O(n^2) or even better. The resulting matrix requires less storage, using either sparse formats or data structures with even less overhead.

The story gets better. Random matrix experiments involving complex numbers or even quaternions reduce to real matrices even before they need to be stored on a computer.

The story gets even better yet. On one hand, for finite n, the reduced form leads to the notion of a "ghost" random matrix quantity that exists for every β (not only the real, complex, and quaternion cases), and a "shadow" quantity, which may be real or complex, that allows for computation. On the other hand, the reduced forms connect random matrices to their continuous limits, stochastic operators, which in some ways represent a truer view of why random matrices behave as they do.

The rest of the notes are organized as follows. In Chapter 2, we prepare our readers with preliminaries of matrix factorization for random matrices. In Chapter 3, the stochastic operator approach is introduced with applications, and we discuss Sturm sequences and Riccati diffusion in Chapter 4. We introduce "ghost" and "shadow" techniques for random matrices in Chapter 5. The final chapter is devoted to the smallest singular value of randn(n).

Note: it has now been eight years since the first author wrote a large survey for Acta Numerica [ER05], and two years since the applications survey [EW13]. This survey is meant to be different, as we mean to demonstrate the thesis in the very name of this section.

2. Random Matrix Factorization

In this section, we provide the details of matrix reductions that do not require a computer. We then derive the reduced forms of the β-Hermite and β-Laguerre ensembles, which are summarized in Table 2; Table 3 shows how to generate them as sparse matrices (a preview sketch follows below). Later in this section, we give an overview of how these reductions lead to various computational and theoretical impacts.
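As a preview of the Laguerre row of Tables 2 and 3 below, the following sketch (ours, assuming the Statistics Toolbox; dense diag is used instead of spdiags for clarity, and the parameter choice corresponds to a = m/2, the Wishart case) checks the bidiagonal model against dense Wishart samples:

    % Sketch: bidiagonal Laguerre model (beta = 1) versus dense Wishart;
    % the average eigenvalue of both should be close to m.
    m = 12; n = 6; trials = 1e4; t1 = zeros(trials,1); t2 = zeros(trials,1);
    for i = 1:trials
        d = sqrt(chi2rnd(m:-1:(m-n+1)))';   % diagonal: chi_m, ..., chi_{m-n+1}
        s = sqrt(chi2rnd((n-1):-1:1))';     % subdiagonal: chi_{n-1}, ..., chi_1
        B = diag(d) + diag(s,-1);           % bidiagonal factor
        t1(i) = mean(eig(B*B'));            % bidiagonal Laguerre model
        g = randn(m,n);
        t2(i) = mean(eig(g'*g));            % dense Wishart
    end
    [mean(t1) mean(t2) m]                   % all three should nearly agree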

  Ensemble | matlab (β = 1)                | Numeric | Models
  Wigner   | g = randn(n,n); H = (g+g')/2; | eig     | Tridiagonal (2.3)
  Wishart  | g = randn(m,n); L = (g'*g)/m; | svd     | Bidiagonal (2.4)

Table 2. Hermite and Laguerre ensembles.

  Ensemble | matlab commands (Statistics Toolbox required)

  Hermite:
    % Pick n, beta
    d = sqrt(chi2rnd(beta*[n:-1:1]))';
    H = spdiags(d, 1, n, n) + spdiags(randn(n,1), 0, n, n);
    H = (H + H')/sqrt(2);

  Laguerre:
    % Pick m, n, beta
    % Pick a > beta*(n-1)/2
    d = sqrt(chi2rnd(2*a - beta*[0:1:n-1]))';
    s = sqrt(chi2rnd(beta*[n:-1:1]))';
    B = spdiags(s, -1, n, n) + spdiags(d, 0, n, n);
    L = B*B';

Table 3. Generating the Hermite and Laguerre ensembles as sparse matrices.

2.1. The chi distribution and orthogonal invariance. There are two key facts to know about a vector of independent standard normals. Let v_n denote such a vector. In matlab this would be randn(n,1). Mathematically, we say that the n elements are iid standard normals (i.e., mean 0, variance 1).

- Chi distribution: the Euclidean length \|v_n\|, which is the square root of the sum of the n squares of Gaussians, has what is known as the \chi_n distribution.

- Orthogonal invariance: for any fixed orthogonal matrix Q, or if Q is random and independent of v_n, the distribution of Qv_n is identical to that of v_n. In other words, it is impossible to tell the difference between a computer-generated v_n or Qv_n upon inspecting only the output. It is easy to see that the density of v_n is (2\pi)^{-n/2} e^{-\|v_n\|^2/2}, which only depends on the length of v_n.

We shall see that these two facts allow us to transform matrices involving standard normals to simpler forms.

For reference, we mention that the \chi_n distribution has the probability density

    f(x) = \frac{x^{n-1} e^{-x^2/2}}{2^{n/2-1}\,\Gamma(n/2)}.

Notice that there is no specific requirement that n be an integer, despite our original motivation via the length of a Gaussian vector. The square of \chi_n is the distribution that underlies the well-known chi-squared test, and the mean of \chi_n^2 is n. For integer n, it is the sum of the squares of n standard normal variables. We have that v_n is the product of the random scalar \chi_n, which serves as the length, and an independent vector that is uniform on the sphere, which serves as the direction.
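Both facts are easy to check numerically. Here is a minimal sketch of ours (base matlab only; the sign correction that makes Q Haar-distributed anticipates the QR discussion in Section 2.2):

    % Sketch: ||v_n|| follows the chi_n density, and Q*v_n is
    % indistinguishable from v_n when Q is Haar-distributed orthogonal.
    n = 5; trials = 1e5; len = zeros(trials,1); w1 = zeros(trials,1);
    for i = 1:trials
        v = randn(n,1);
        len(i) = norm(v);               % length: should be chi_n
        [Q,R] = qr(randn(n));
        Q = Q*diag(sign(diag(R)));      % sign fix: makes Q Haar-distributed
        w = Q*v;
        w1(i) = w(1);                   % one entry of Q*v: should be N(0,1)
    end
    dx = .1; [c,x] = hist(len, 0:dx:2*sqrt(n));
    bar(x, c/(trials*dx), 'y'); hold on
    plot(x, x.^(n-1).*exp(-x.^2/2)/(2^(n/2-1)*gamma(n/2)), 'LineWidth', 2)
    hold off    % w1 can be compared to the standard normal the same way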

2.2. The QR decomposition of randn(n). Given a vector v_n, we can readily construct an orthogonal reflection or rotation H_n such that H_n v_n = \|v_n\| e_1, where e_1 denotes the first column of the identity. We do this using the standard technique of Householder transformations [TB97] (see Lec. 10) in numerical linear algebra, which is a reflection across the external angle bisector of these two vectors. In this case,

    H_n = I - 2\,\frac{ww^T}{w^Tw}, \qquad \text{where } w = v_n - \|v_n\| e_1.

Therefore, if v_n follows a multivariate standard normal distribution, H_n v_n yields a chi distribution for the first element and 0 otherwise. Furthermore, let randn(n) be an n \times n matrix of iid standard normals. It is easy to see now that through successive Householder reflections of size n, n-1, \ldots, 1 we can orthogonally transform randn(n) into the upper triangular matrix

    H_1 H_2 \cdots H_{n-1} H_n \, randn(n) = R_n = \begin{pmatrix} \chi_n & G & \cdots & G & G \\ & \chi_{n-1} & \cdots & G & G \\ & & \ddots & \vdots & \vdots \\ & & & \chi_2 & G \\ & & & & \chi_1 \end{pmatrix}.

Here all elements are independent and represent a distribution, and each G is an iid standard normal. It is helpful to watch a 3 \times 3 matrix turn into R_3:

    \begin{pmatrix} G & G & G \\ G & G & G \\ G & G & G \end{pmatrix} \to \begin{pmatrix} \chi_3 & G & G \\ 0 & G & G \\ 0 & G & G \end{pmatrix} \to \begin{pmatrix} \chi_3 & G & G \\ 0 & \chi_2 & G \\ 0 & 0 & G \end{pmatrix} \to \begin{pmatrix} \chi_3 & G & G \\ 0 & \chi_2 & G \\ 0 & 0 & \chi_1 \end{pmatrix}.

The Gs as the computation progresses are not the same numbers, but merely indicate that the distributions remain unchanged. With a bit of care we can say that

    randn(n) = (orthogonal, uniform with Haar measure) \cdot R_n

is the QR decomposition of randn(n). Notice that in earlier versions of lapack and matlab, [Q, R] = qr(randn(n)) did not always yield Q with Haar measure. Random matrix theory provided the impetus to fix this!

One immediate consequence is the following interesting fact:

(2.1)    \mathbb{E}\big[\det[randn(n)]^2\big] = n!,

since |\det randn(n)| = \det R_n = \chi_n \chi_{n-1} \cdots \chi_1 with independent factors and \mathbb{E}[\chi_k^2] = k.

2.3. The tridiagonal reduction of the GOE. Eigenvalues are usually introduced for the first time as the roots of the characteristic polynomial. Many people just assume that this is the definition used during a computation, but it is well established that this is not a good method for computing eigenvalues. Rather, a matrix factorization is used. In the case that S is symmetric, an orthogonal matrix Q is found such that Q^T S Q = \Lambda is diagonal. The columns of Q are the eigenvectors, and the diagonal of \Lambda contains the eigenvalues.

Mathematically, the construction of Q is an iterative procedure, requiring infinitely many steps to converge. In practice, S is first tridiagonalized through a finite process, which usually takes the bulk of the time. The tridiagonal matrix is then iteratively diagonalized. Usually, this tridiagonal-to-diagonal step takes a negligible amount of time to converge in finite precision.
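A short experiment makes this two-stage pipeline concrete. The sketch below is ours (hess and eig are standard matlab; hess of a symmetric matrix returns its orthogonal tridiagonalization):

    % Sketch: the two-stage eigenvalue pipeline on one GOE sample
    n = 500; A = randn(n); S = (A + A')/sqrt(2);
    T = hess(S);                   % stage 1: finite Householder tridiagonalization
    e = eig(T);                    % stage 2: iterative diagonalization of T
    norm(sort(e) - sort(eig(S)))   % agreement up to roundoff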

Suppose A = randn(n) and S = (A + A^T)/\sqrt{2}. We can tridiagonalize S with the finite Householder procedure (see [TB97]).
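The punchline of this reduction can be previewed numerically: after tridiagonalization, the subdiagonal entries of S behave like \chi_{n-1}, \ldots, \chi_1. A minimal sketch of ours, using matlab's hess in place of an explicit Householder loop:

    % Sketch: squared subdiagonals of the tridiagonalized GOE have means
    % n-1, n-2, ..., 1, matching chi_{n-1}, ..., chi_1 (E[chi_k^2] = k).
    n = 6; trials = 2e4; off2 = zeros(trials, n-1);
    for i = 1:trials
        A = randn(n); S = (A + A')/sqrt(2);
        T = hess(S);                   % orthogonal tridiagonalization
        off2(i,:) = diag(T,-1)'.^2;    % squared subdiagonal entries
    end
    [mean(off2); (n-1):-1:1]           % the two rows should nearly agree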