Sparse Semidefinite Programming Relaxations for Large Scale Polynomial Optimization and Their Applications to Differential Equations

Martin Mevissen

Submitted in partial fulfillment of the requirements for the degree of Doctor of Science
Department of Mathematical and Computing Sciences
Tokyo Institute of Technology
August 2010

Acknowledgements

The endeavor of pursuing a PhD in Tokyo and completing this thesis would not have been possible without the support of a number of great people. My greatest appreciation goes to my advisor Masakazu Kojima for his encouragement, guidance and continued support. I am grateful for his interest in my studies and the insight I gained by collaborating with him. I am indebted to him for providing me with an environment in which to enjoy research to the fullest ever since I joined his group to write my Diploma thesis back in 2006.

Moreover, I would like to express my gratitude to Nobuki Takayama, who hosted me twice at Kobe University. I am thankful for his sincere interest and his engagement in our joint work with Kosuke Yokoyama. My gratitude extends to Yasumasa Nishiura and Takashi Teramoto for inviting me to Hokkaido University in 2008. They offered me a great environment in which to learn more about reaction-diffusion equations. I am also thankful to Jean Lasserre and Didier Henrion for hosting me at LAAS in 2009, and I am looking forward to continuing our joint work in the future. I would like to thank Sunyoung Kim, Makoto Yamashita and Jiawang Nie for our fruitful collaborations and exciting discussions. In particular, I would like to express my gratitude to Makoto for his constant and patient technical support. Many thanks go to Hans-Jakob Lüthi for supporting this venture in Japan from early on and for his encouraging advice. Also, I would like to thank the German Academic Exchange Service for enabling me to pursue this journey with its three-year Doctoral Scholarship.

The stay at Tokyo Institute of Technology would have been inconceivable without the people with whom I shared this time, many thoughts and the bond of friendship. In particular, I would like to thank Paul Sheridan, for his unshakable optimism; Ken Shackleton, for his large-heartedness; Matthias Hietland Heie, for his open mind; Hiroshi Sugimoto, for his noble heart; Kojiro Akiba, for all our conversations; Mikael Onsjö, for being a great host; and Tomohiko Mizutani, for helping me out with a lot of things when I was a newcomer. In my life in Japan, I was glad to find many friends whom I can count on and who gave me the chance to call this place home. I enjoyed greatly everything I shared with them. Thank you so much, Shuji, Yoko, Jif, Azra, Masa, Mari, Moe, Hiroshi, Shota, Chiaki, Soji, Naomi, Tomoko.

Finally, my deepest gratitude goes to my parents. Their encouragement and love have been with me all the time. They stood by me every day of my life. I dedicate this thesis to them.

To my parents

Contents

1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Outline of the thesis

2 Semidefinite Programming and Polynomial Optimization
2.1 Positive polynomials and polynomial optimization
2.1.1 Decomposition of globally nonnegative polynomials
2.1.2 Decomposition of polynomials positive on closed semialgebraic sets
2.1.3 Dense SDP relaxations for polynomial optimization problems
2.1.4 Sparse SDP relaxations for polynomial optimization problems
2.2 Exploiting sparsity in linear and nonlinear matrix inequalities
2.2.1 An SDP example
2.2.2 Positive semidefinite matrix completion
2.2.3 Exploiting the domain-space sparsity
2.2.4 Duality in positive semidefinite matrix completion
2.2.5 Exploiting the range-space sparsity
2.2.6 Enhancing the correlative sparsity
2.2.7 Examples of d- and r-space sparsity in quadratic SDP
2.3 Reduction techniques for SDP relaxations for large scale POP
2.3.1 Transforming a POP into a QOP
2.3.2 Quality of SDP relaxations for QOP
2.3.3 Numerical examples

3 SDP Relaxations for Solving Differential Equations
3.1 Numerical analysis of differential equations
3.1.1 The finite difference method
3.1.2 The finite element method and other numerical solvers
3.2 Differential equations and the SDPR method
3.2.1 Transforming a differential equation into a sparse POP
3.2.2 The SDPR method
3.2.3 Enumeration algorithm
3.2.4 Discrete approximations to solutions of differential equations
3.3 Numerical experiments
3.3.1 A nonlinear elliptic equation with bifurcation
3.3.2 Illustrative nonlinear PDE problems
3.3.3 Reaction-diffusion equations
3.3.4 Differential algebraic equations
3.3.5 The steady cavity flow problem
3.3.6 Optimal control problems

4 Concluding Remarks and Future Research
4.1 Conclusion
4.2 Outlook on future research directions

Notation
N  natural numbers
N^n  n-dimensional vector space with entries in N
Z  integers
R  real numbers
R^n  n-dimensional vector space with real entries
R^{n×m}  vector space of n by m matrices with real entries
S^n  space of symmetric matrices in R^{n×n}
S^n_+  cone of symmetric, positive semidefinite matrices in R^{n×n}
S^n(E, ?)  partial symmetric matrices with entries specified in E
S^n_+(E, ?)  matrices in S^n(E, ?) that can be completed to positive semidefinite matrices
S^n(E, 0)  symmetric matrices with nonzero entries only on the diagonal and in E
S^n_+(E, 0)  positive semidefinite matrices in S^n(E, 0)
R[x]  ring of multivariate polynomials in the n-dimensional variable x with coefficients in R
R[x]_d  set of polynomials of degree less or equal d
R[x, A]  set of polynomials supported on A ⊂ N^n, R[x, A] = {p ∈ R[x] | supp(p) ⊂ A}
ΣR[x]²  set of sums of squares of polynomials, ΣR[x]² := {p ∈ R[x] | p = Σ_{i=1}^{r} p_i², p_i ∈ R[x] for some r ∈ N}
Λ(d)  set of multivariate indices of degree less or equal d, Λ(d) = {α ∈ N^n : |α| ≤ d}
u(x, A)  monomial vector for A ⊂ N^n, u(x, A) = (x^α | α ∈ A)
G(N, E)  graph with vertex set N and edge set E
⪰  positive semidefinite matrix
≻  positive definite matrix
•  inner product on S^n, A • B = Σ_{i=1}^{n} Σ_{j=1}^{n} A_{i,j} B_{i,j}
det(·)  determinant of a matrix
rank(·)  rank of a matrix
Tr(·)  trace of a matrix, Tr(A) := Σ_{i=1}^{n} A_{i,i}
deg(·)  degree of a polynomial
supp(·)  support of a polynomial, supp(p) := {α ∈ N^n | p_α ≠ 0}
I  imaginary unit, i.e., I² = −1
K(g_1,...,g_m), K  basic, closed semialgebraic set generated by g_1,...,g_m ∈ R[x], K(g_1,...,g_m) := {x ∈ R^n | g_1(x) ≥ 0,...,g_m(x) ≥ 0}
M(K)  quadratic module defined by g_1,...,g_m, M(K) = ΣR[x]² + g_1 ΣR[x]² + ... + g_m ΣR[x]²
M_d(K)  approximation of M(K) of order d, M_d(K) = {Σ_{i=0}^{m} σ_i g_i | σ_i ∈ ΣR[x]², deg(σ_i g_i) ≤ 2d}
Σ²⟨g_1,...,g_m⟩  multiplicative convex cone generated by ΣR[x]² and g_1,...,g_m, Σ²⟨g_1,...,g_m⟩ = M(K) + g_1 g_2 ΣR[x]² + ... + g_1 g_2 ··· g_m ΣR[x]²
O(g_1,...,g_m)  multiplicative monoid generated by g_1,...,g_m, O(g_1,...,g_m) = {∏_{i=1}^{r} t_i | t_i ∈ {g_1,...,g_m}, r ∈ N}
I(g_1,...,g_m)  ideal generated by g_1,...,g_m, I(g_1,...,g_m) = g_1 R[x] + ... + g_m R[x]
M_w(y)  moment matrix of order w for the vector y
M_w(y, I)  partial moment matrix, containing only those components y_α of y with α ∈ I
M_w(y g)  localizing matrix of order w for the vector y and g ∈ R[x]
M_w(y g, I)  partial localizing matrix
M^S_k  higher monomial set of a POP
M_k  higher monomial list of a POP
α k_{i,j}  dividing coefficient
k_0  number of substitutions required by an algorithm to transform a given POP into a QOP
t_C  total computation time of an algorithm
ε_sc  scaled feasibility error of a numerical solution for a POP
ε_obj  optimality error of an SDP relaxation solution for a POP
ω  relaxation order of the dense or sparse SDP relaxation
m_e  maximum eigenvalue of a Jacobian of a system of polynomial equations
N_x  number of grid points in x-direction in a discretized domain
h_x  distance of two grid points in x-direction in a discretized domain
u_{i,j}  approximation of u at grid point (x_i, y_j) in a finite difference scheme
max, min  maximum, minimum
sup, inf  supremum, infimum
lbd, ubd  lower bound, upper bound
SDP  semidefinite program
POP  polynomial optimization problem
QOP  quadratic optimization problem
ODE  ordinary differential equation
PDE  partial differential equation
OCP  optimal control problem
FDM  finite difference method
FEM  finite element method
FVM  finite volume method
SQP  sequential quadratic programming
QSDP  quadratic semidefinite program
Chapter 1

Introduction

1.1 Motivation

A wide variety of problems arising in mathematics, physics, engineering, control and computer science can be formulated as optimization problems in which all functions in the objective and the constraints are multivariate polynomials over the field of real numbers, so-called polynomial optimization problems (POPs). In general, polynomial optimization problems are severely non-convex and NP-hard to solve. In recent years there has been active research in semidefinite programming (SDP) relaxation methods for POPs. That is, a general, non-convex POP is relaxed to a convex optimization problem, an SDP. Solving the SDP provides either a lower bound for the minimum, or approximations for the minimum and the global minimizers of the POP.

The first convexification techniques for general POPs were proposed by Shor [91] and Nesterov [70]. Since the pioneering work of Shor, convexification and in particular SDP relaxation techniques have been used in an ever increasing number of applications and problems. One of the classical examples is the SDP relaxation for a non-convex quadratic programming formulation of the NP-hard max-cut problem [26]. Other NP-hard problems that can be formulated as POPs are {0,1}-linear programming and testing whether a symmetric matrix is copositive [67].

A breakthrough in this field was Lasserre's seminal paper [51]. Given that the feasible set of a general POP is a basic, compact semialgebraic set, a hierarchy of SDP relaxations can be constructed which provides a monotonically increasing sequence of lower bounds for the minimum of the POP. Lasserre showed that this sequence converges to the minimum of the POP under a fairly general condition. Lasserre's relaxation and also the approach [79] by Parrilo rely on the representation of nonnegative polynomials as sums of squares of polynomials and on the dual theory of moments. Despite being a powerful theoretical tool for approximating the minimum and minimizers of general polynomial optimization problems, the Lasserre relaxation is not practical even for medium scale POPs. In fact, the size of the matrix inequality constraints in the SDP relaxations grows rapidly with increasing order of the hierarchy. Thus, in the case of medium or large scale POPs the SDP relaxations become intractable for current SDP solvers such as SeDuMi [95] or SDPA [104], even for small choices of the relaxation order. However, large scale POPs arise from challenging problems, and efficient methods to solve them are in high demand. For instance, one problem which has received much attention recently is the sensor network localization problem [6, 71, 43].

A first approach to reduce the size of SDP relaxations for POPs has been the concept of correlative sparsity of a POP [47, 102, 52]. Exploiting structured sparsity makes it possible to attack POPs of larger dimension by a hierarchy of sparse SDP relaxations. Still, the capacity of current SDP solvers limits the applicability of sparse SDP relaxations for large scale POPs. As one way to take advantage of sparsity more efficiently, we develop a general notion of sparsity in linear and nonlinear matrix inequalities and show how to exploit this sparsity via positive semidefinite matrix completion. We demonstrate how so-called domain-space and range-space sparsity can be used to decrease the size of SDP relaxations for large scale POPs substantially.
Another technique for attacking large scale POPs is based on the idea of reducing the size of the sparse SDP relaxations by transforming a general POP into an equivalent quadratic optimization problem (QOP). For an important class of large scale POPs, the size of the sparse SDP relaxations for the equivalent QOPs is far smaller than the size of the sparse SDP relaxations for the original POPs.

The second topic of this thesis is to investigate how to efficiently apply sparse SDP relaxation techniques to an important class of challenging problems, the numerical analysis of differential equations. For most problems involving ordinary or partial differential equations it is not possible to find analytic solutions, in particular if the equations are nonlinear in the unknown function. Even the problem of finding approximations to the solutions of ODEs or PDEs by numerical methods is well known to be hard, and it has begun to attract attention from researchers working on moment, SDP and numerical algebra techniques. On the one hand, a moment based approach [5] has been proposed to find tight bounds for linear functionals defined on linear PDEs. On the other hand, a homotopy continuation based approach [2, 33, 34] has been proposed to find all solutions of a discretized, possibly nonlinear PDE.

We will show how to transform a problem involving differential equations into a POP by using standard finite difference schemes. The dimension of these POPs is determined by the discretization of the domain of the differential equation. Thus, for fine discretizations we obtain a challenging class of large scale POPs. These POPs exhibit both correlative and domain-space sparsity, which enables us to apply sparse SDP relaxation techniques efficiently. The sparse SDP relaxation method is of particular interest for PDE problems with several solutions. We propose different algorithms based on the sparse SDP relaxation method to detect several or even all solutions of a system of nonlinear PDEs. It is a strength of this method that a wide variety of nonlinear PDE problems can be solved: nonlinear elliptic, parabolic and hyperbolic equations, reaction-diffusion equations, steady state Navier-Stokes equations in fluid dynamics, differential algebraic equations and nonlinear optimal control problems.

1.2 Contribution

This thesis is largely based on the content of prior publications of the author. Its contributions can be summarized as follows.

• We present a general framework to detect, characterize and exploit sparsity in an optimization problem with linear and nonlinear matrix inequality constraints via positive semidefinite matrix completion. We distinguish two types of sparsity: domain-space sparsity for the symmetric matrix variable in the objective and constraint functions of the problem, and range-space sparsity. Four conversion methods are proposed to exploit these two types of sparsity. We demonstrate the efficiency of these conversion methods on SDP relaxations for sparse, large-scale POPs derived from discretizing partial differential equations and from the sensor network localization problem. This result is based on our work [42].

• Based on the observation dating back to Shor [92] that any POP can be written as an equivalent QOP, we develop four heuristics for transforming a POP into a QOP.
We show that sparsity of the POP is maintained under our transformation procedures, and we propose different techniques to improve the quality of the sparse SDP relaxations for the QOP, which are weaker than the more expensive sparse SDP relaxations for the equivalent POP. This technique is shown to be very efficient for large-scale POPs: we are able to obtain highly accurate approximations to the global optimizers of the POP by solving SDP relaxations of vastly reduced size. This work is presented in detail in [62].

• We are the first to introduce a method based on sparse SDP relaxations to solve systems of ordinary and partial differential equations [61, 63]. Unlike the approach [5], we are able to approximate the actual solutions of an ordinary or partial differential equation. Moreover, our approach is applicable to nonlinear differential equations, whereas the technique [5] is limited to linear PDEs. Furthermore, compared to the numerical algebraic approach [2, 33, 34], we can solve a system of polynomial equations derived from a PDE for a much finer discretization by exploiting sparsity. Also, we do not aim at finding all complex solutions; instead, we detect the real solutions of that system of equations one by one.

• Compared to existing PDE solvers, the sparse SDP relaxation method for solving differential equations has the following advantages: (a) We can add polynomial inequality constraints for the unknown solutions of the differential equations to the system of equations obtained by the finite difference discretization of the PDE, which can be understood as restricting the space of functions in which we search for solutions. (b) We can detect particular solutions of a PDE by choosing an appropriate objective function for the sparse POP derived from the PDE problem or by adding inequality constraints to that POP. (c) We are able to systematically enumerate all solutions of a discretized PDE problem by iteratively applying the SDP relaxation method. (d) We exploit the fact that the sparse SDP relaxations provide an approximation to the global optimizer of a POP. Thus, even if the accuracy of the solution of the SDP relaxation is not high, it is a good initial guess for locally convergent solvers for many PDE problems. This fact is of particular interest for PDE problems with many solutions. These results are based on our work in [61, 63, 62].

• The sparse SDP relaxation method for solving large scale POPs derived from differential equations can be applied to solve nonlinear optimal control problems. Unlike the moment method in [54], our method yields approximations to the optimal control, trajectories and value of a control problem, in addition to providing lower bounds for the optimal value of the control problem.

1.3 Outline of the thesis

This thesis consists of two main parts. In the first part, given by Chapter 2, we introduce approaches that use methods from convex optimization to solve general, nonconvex polynomial optimization problems. We begin in 2.1 by introducing the historical background of characterizing positive polynomials, the problem of minimizing multivariate polynomials over basic, closed semialgebraic sets, and the dense Lasserre relaxation, a sequence of semidefinite programs whose minima converge to the minimum of a polynomial optimization problem under fairly general conditions. Finally, we review the method of exploiting correlative sparsity of a POP to construct a sequence of sparse SDP relaxations.
In 2.2 we present a general framework to exploit domain- and range-space sparsity in problems involving linear or nonlinear matrix inequalities. This technique can be applied to the large scale SDP relaxations arising from large scale POPs. In 2.3 we introduce the approach of reducing the size of dense or sparse SDP relaxations for large scale POPs, which is based on the idea of transforming a general POP into an equivalent QOP.

In the second part, presented in Chapter 3, we show how to use the methods and techniques from Chapter 2 for the numerical analysis of ordinary and partial differential equations. First, in 3.1, we give an overview of existing numerical methods for solving partial differential equations, in particular the two most common approaches, the finite difference method and the finite element method. In 3.2 we introduce our method to transform a problem involving partial differential equations into a POP via finite difference discretization, and to solve the resulting large scale POP by the SDP relaxation techniques from Chapter 2. In 3.3 we apply our SDP relaxation method to a variety of different PDE problems such as nonlinear elliptic, parabolic and hyperbolic equations, differential algebraic equations, reaction-diffusion equations, fluid dynamics and nonlinear optimal control.

Finally, we summarize the thesis in Chapter 4 with some concluding remarks and give an outlook on possible future research directions based on the methods and results presented here.

Chapter 2

Semidefinite Programming and Polynomial Optimization

2.1 Positive polynomials and polynomial optimization

Polynomial optimization and the problem of global nonnegativity of polynomials are active fields of research and remain in the focus of researchers from various areas such as real algebra, semidefinite programming and operator theory. Shor [91] was the first to introduce the idea of applying a convex optimization technique to minimize an unconstrained multivariate polynomial. Nesterov [70] was one of the first to discuss exploiting the duality of moment cones and cones of nonnegative polynomials in a convex optimization framework. He showed that a moment cone can be characterized by linear matrix inequalities, i.e., semidefinite constraints, if the elements of the corresponding cone of nonnegative polynomials can be written as sums of squares. The next milestone in minimizing multivariate polynomials was set by Lasserre [51], who applied real algebraic results by Putinar [81] to construct a sequence of semidefinite programming relaxations whose optima converge to the optimum of a polynomial optimization problem. Another approach applying real algebraic results to the problem of nonnegativity of polynomials was introduced by Parrilo [79].

We attempt to solve the following polynomial optimization problem:

min p(x) s.t. g_i(x) ≥ 0 for all i = 1,...,m,   (2.1)

where p, g_1,...,g_m ∈ R[x]. Problem (2.1) can also be written as

min_{x∈K} p(x),   (2.2)

where K is the basic, closed semialgebraic set defined by the polynomials g_1,...,g_m. Let p⋆ denote the optimal value of problem (2.2) and K⋆ := {x⋆ ∈ K | p(x⋆) ≤ p(x) for all x ∈ K}. In the case where K is compact, K⋆ ≠ ∅ if K ≠ ∅.

2.1.1 Decomposition of globally nonnegative polynomials

The origin of research in characterizing nonnegative and positive polynomials lies in Hilbert's 17th problem: whether it is possible to express a nonnegative rational function as a sum of squares of rational functions.
This question was answered positively by Artin in 1927. Moreover, the question arises whether it is possible to express any nonnegative polynomial as a sum of squares of polynomials. In the case of univariate polynomials the answer to this question is yes, as stated in the following theorem.

Theorem 2.1 Let p ∈ R[x], x ∈ R. Then p(x) ≥ 0 for all x ∈ R if and only if p ∈ ΣR[x]².

Proof "⇐": Trivial. "⇒": Let p(x) ≥ 0 for all x ∈ R. It is obvious that deg(p) = 2k for some k ∈ N. The real roots of p(x) must have even multiplicity; otherwise p(x) would change its sign in a neighborhood of a root. Let λ_i, i = 1,...,r, be its real roots with corresponding multiplicities 2m_i. Its complex roots can be arranged in conjugate pairs a_j + I b_j, a_j − I b_j, j = 1,...,h. Then

p(x) = C ∏_{i=1}^{r} (x − λ_i)^{2m_i} ∏_{j=1}^{h} ((x − a_j)² + b_j²).

Note that the leading coefficient C needs to be positive. Thus, by expanding the terms in the products, we see that p(x) can be written as a sum of squares of polynomials of the form

p(x) = Σ_{i=0}^{k} ( Σ_{j=0}^{k} v_{ij} x^j )².

However, Hilbert himself already noted that not every nonnegative polynomial can be written as a sum of squares. For instance, the Motzkin form M,

M(x, y, z) = x⁴y² + x²y⁴ + z⁶ − 3x²y²z²,

is nonnegative but not a sum of squares; its nonnegativity follows from the arithmetic-geometric mean inequality applied to the terms x⁴y², x²y⁴ and z⁶. In fact, Hilbert gave a complete characterization of the cases where nonnegativity and the existence of a sum of squares decomposition are equivalent.

Definition 2.1 A form is a polynomial in which all monomials have the same total degree m. P_{n,m} denotes the set of nonnegative forms of degree m in n variables, Σ_{n,m} the set of forms p such that p = Σ_k h_k², where the h_k are forms of degree m/2.

There is a correspondence between forms in n variables of degree m and polynomials in n − 1 variables of degree less than or equal to m. In fact, a form in n variables of degree m can be dehomogenized to a polynomial in n − 1 variables by fixing any of the n variables to the constant value 1. Conversely, a polynomial in n − 1 variables can be homogenized by multiplying each monomial by powers of a new variable such that the degree of all monomials equals m. Obviously, Σ_{n,m} ⊆ P_{n,m} holds for all n and m. The following theorem is due to Hilbert.

Theorem 2.2 Σ_{n,m} ⊆ P_{n,m} holds with equality only in the following cases: (i) bivariate forms: n = 2, (ii) quadratic forms: m = 2, (iii) ternary quartic forms: n = 3, m = 4.

We interpret the three cases in Theorem 2.2 in terms of polynomials. The first corresponds to the equivalence of nonnegativity and the sum of squares condition in the univariate case, as in Theorem 2.1. The second is the case of quadratic polynomials, where the sum of squares decomposition follows from an eigenvalue/eigenvector factorization. The third case corresponds to quartic polynomials in two variables.

Relevance of sum of squares characterizations

Recall that the constraints of our original polynomial optimization problem are nonnegativity constraints for polynomials of the type g_i(x) ≥ 0 (i = 1,...,m). The question whether a given polynomial is globally nonnegative is decidable, for instance by the Tarski-Seidenberg decision procedure [7]. Nonetheless, regarding complexity, the general problem of testing global nonnegativity of a polynomial function is NP-hard [67] if the degree of the polynomial is at least four.
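Deciding whether a polynomial admits a sum of squares decomposition, by contrast, reduces to a semidefinite feasibility problem, as Theorem 2.3 below makes precise. As an illustration of the Gram-matrix construction used in its proof, the following minimal sketch tests the dehomogenized Motzkin polynomial x⁴y² + x²y⁴ + 1 − 3x²y² for a sum of squares decomposition; it assumes Python with the cvxpy modeling package and the SCS solver, neither of which is prescribed by the thesis.

```python
import cvxpy as cp

# Monomial basis u(x, Λ(3)): exponents (i, j) with i + j <= 3, so s = 10.
basis = [(i, j) for i in range(4) for j in range(4 - i)]
s = len(basis)

# Coefficients of the dehomogenized Motzkin polynomial (z = 1).
target = {(4, 2): 1.0, (2, 4): 1.0, (0, 0): 1.0, (2, 2): -3.0}

V = cp.Variable((s, s), PSD=True)  # Gram matrix of Theorem 2.3

# Coefficient matching: for every monomial x^a y^b of degree <= 6, the sum
# of the Gram entries over basis pairs multiplying to x^a y^b must equal
# the coefficient of x^a y^b in the target polynomial.
constraints = []
for a in range(7):
    for b in range(7 - a):
        terms = [V[k, l]
                 for k, (i1, j1) in enumerate(basis)
                 for l, (i2, j2) in enumerate(basis)
                 if (i1 + i2, j1 + j2) == (a, b)]
        if terms:
            constraints.append(sum(terms) == target.get((a, b), 0.0))

prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve(solver=cp.SCS)
print(prob.status)  # expected: 'infeasible' (up to solver accuracy),
                    # certifying that the Motzkin polynomial is not SOS
```

The same test applied to a polynomial that is a sum of squares returns a feasible Gram matrix V, from which a decomposition can be read off via a factorization V = W W^T, exactly as in the proof below.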
Therefore it is reasonable to substitute the nonnegativity constraints by expressions that can be decided more easily. It was shown by Parrilo that deciding whether a polynomial is a sum of squares is equivalent to solving a semidefinite program, as stated in the following theorem.

Theorem 2.3 The existence of a sum of squares decomposition of a polynomial in n variables of degree 2d can be decided by solving a semidefinite programming feasibility problem [79]. If the polynomial is dense, the dimensions of the matrix inequality are \binom{n+d}{d} × \binom{n+d}{d}.

Proof Let p ∈ R[x] with degree 2d. Recall that u(x, Λ(d)) denotes the ordered vector of monomials x_1^{α_1} x_2^{α_2} ··· x_n^{α_n} with Σ_{i=1}^{n} α_i ≤ d. The length of u(x, Λ(d)) is s := s(d) = \binom{n+d}{d}.

Claim: p ∈ ΣR[x]² if and only if there exists V ∈ S^s_+ such that p = u(x, Λ(d))^T V u(x, Λ(d)).

Proof of the claim. "⇒": If p ∈ ΣR[x]², i.e.,

p = Σ_{i=1}^{r} q_i² = Σ_{i=1}^{r} (w_i^T u(x, Λ(d)))² = u(x, Λ(d))^T ( Σ_{i=1}^{r} w_i w_i^T ) u(x, Λ(d)),

then V = Σ_{i=1}^{r} w_i w_i^T and V ∈ S^s_+. "⇐": As V ∈ S^s_+, there exists a Cholesky factorization V = W W^T with W ∈ R^{s×s}; let w_i denote the i-th column of W. We have

p = u(x, Λ(d))^T V u(x, Λ(d)) = u(x, Λ(d))^T ( Σ_{i=1}^{s} w_i w_i^T ) u(x, Λ(d)) = Σ_{i=1}^{s} (w_i^T u(x, Λ(d)))²,

i.e., p ∈ ΣR[x]². Thus the claim follows.

Expanding the quadratic form gives p = Σ_{i,j=1}^{s} V_{i,j} u(x, Λ(d))_i u(x, Λ(d))_j. Equating the coefficients in this expression with the coefficients of the corresponding monomials in the original form of p generates a set of linear equalities for the variables V_{i,j} (i, j = 1,...,s). Adding the constraint V ∈ S^s_+ to those linear equality constraints, we obtain conditions for p which are equivalent to claiming p ∈ ΣR[x]². Therefore, deciding whether p ∈ ΣR[x]² is equivalent to solving a semidefinite programming feasibility problem.

2.1.2 Decomposition of polynomials positive on closed semialgebraic sets

Real algebraic geometry deals with the analysis of the real solution set of a system of polynomial equations. The main difference to algebraic geometry in the complex case lies in the fact that R is not algebraically closed. Among the main results of real algebra are the Positivstellensätze, which provide certificates in the case that a semialgebraic set is empty. Improved versions of the Positivstellensätze can be obtained in the case of compact semialgebraic sets.

General semialgebraic sets

The Positivstellensatz below is due to Stengle; a proof can be found in [7].

Theorem 2.4 (Stengle) Let (f_j)_{j=1,...,t}, (g_k)_{k=1,...,m}, (h_l)_{l=1,...,k} be finite families of polynomials in R[x]. The following properties are equivalent:

(i) {x ∈ R^n | g_j(x) ≥ 0, j = 1,...,m; f_s(x) ≠ 0, s = 1,...,t; h_i(x) = 0, i = 1,...,k} = ∅.

(ii) There exist g ∈ Σ²⟨g_1,...,g_m⟩, f ∈ O(f_1,...,f_t), the multiplicative monoid generated by f_1,...,f_t, and h ∈ I(h_1,...,h_k), the ideal generated by h_1,...,h_k, such that g + f² + h = 0.

To understand the differences between the real and the complex case, and the use of the Positivstellensatz 2.4, consider the following example.

Example 2.1 Consider the very simple quadratic equation x² + ax + b = 0. By the fundamental theorem of algebra, the equation always has solutions in C. When the solution is required to be real, the solution set is empty if and only if the discriminant D satisfies

D := b − a²/4 > 0.

In this case, taking

g := ( (1/√D)(x + a/2) )², f := 1, h := −(1/D)(x² + ax + b),
the identity g + f² + h = 0 is satisfied.

It should be remarked that the Positivstellensatz represents the most general deductive system for which inferences from the given equations can be made. It guarantees the existence of infeasibility certificates given by the polynomials f, g and h. For complexity reasons these certificates cannot be polynomial time checkable for every possible instance, unless NP = co-NP. Parrilo showed that the problem of finding infeasibility certificates is equivalent to a semidefinite program if the degree of the possible multipliers is restricted [79].

Theorem 2.5 Consider a system of polynomial equalities and inequalities as in Theorem 2.4. Then the search for bounded degree Positivstellensatz infeasibility certificates can be carried out using semidefinite programming. If the degree bound is chosen large enough, the SDPs will be feasible, and the certificates are obtained from their solutions.

Proof Consequence of the Positivstellensatz and Theorem 2.3, cf. [79].

As the feasible set of (2.2) is a closed semialgebraic set, we are interested in characterizations of these sets and of polynomials positive on semialgebraic sets. The Positivstellensatz allows us to deduce conditions for the positivity or the nonnegativity of a polynomial over a semialgebraic set. A direct consequence of the Positivstellensatz is the following corollary [7], pp. 92.

Corollary 2.1 Let g_1,...,g_m ∈ R[x], K = {x ∈ R^n | g_1(x) ≥ 0,...,g_m(x) ≥ 0} and f ∈ R[x]. Then:

(i) f(x) ≥ 0 for all x ∈ K ⇔ there exist s ∈ N and g, h ∈ Σ²⟨g_1,...,g_m⟩ such that f g = f^{2s} + h.

(ii) f(x) > 0 for all x ∈ K ⇔ there exist g, h ∈ Σ²⟨g_1,...,g_m⟩ such that f g = 1 + h.

Proof (i) Apply the Positivstellensatz to the set {x ∈ R^n | g_1(x) ≥ 0,...,g_m(x) ≥ 0, −f(x) ≥ 0, f(x) ≠ 0}. (ii) Apply the Positivstellensatz to the set {x ∈ R^n | g_1(x) ≥ 0,...,g_m(x) ≥ 0, −f(x) ≥ 0}.

These conditions for the nonnegativity and positivity of polynomials on semialgebraic sets can be improved under additional assumptions. We present these improved conditions for compact semialgebraic sets in the following section.

Compact semialgebraic sets

It is our aim to characterize polynomials that are positive or nonnegative on compact semialgebraic sets. A first characterization is a theorem due to Schmüdgen [87]:

Theorem 2.6 (Schmüdgen) Let K = {x ∈ R^n | g_1(x) ≥ 0,...,g_m(x) ≥ 0} be a compact semialgebraic subset of R^n and let p be a positive polynomial on K. Then p ∈ Σ²⟨g_1,...,g_m⟩.

It was Putinar [81] who simplified this characterization under an additional assumption.

Definition 2.2 A quadratic module M(K) is called archimedean if N − Σ_{i=1}^{n} x_i² ∈ M(K) for some N ∈ N.

Theorem 2.7 (Putinar) Let p be a polynomial positive on the compact semialgebraic set K, and let M(K) be archimedean. Then p ∈ M(K).

Thus, under the additional assumption of M(K) being archimedean, we obtain the stricter characterization p ∈ M(K) ⊆ Σ²⟨g_1,...,g_m⟩ instead of p ∈ Σ²⟨g_1,...,g_m⟩. The original proof of Theorem 2.7 is due to Putinar [81]. In this proof Putinar applies the separation theorem for convex sets and some arguments from functional analysis. A newer proof due to Schweighofer [88] avoids the arguments from functional analysis and requires only results from elementary analysis. A further theorem by Schmüdgen [87] provides equivalent conditions for M(K) being archimedean.
Theorem 2.8 The following are equivalent:

(i) There exist finitely many t_1,...,t_s ∈ M(K) such that the set {x ∈ R^n | t_1(x) ≥ 0,...,t_s(x) ≥ 0} (which contains K) is compact and ∏_{i∈I} t_i ∈ M(K) for all I ⊂ {1,...,s}.

(ii) There exists some p ∈ M(K) such that {x ∈ R^n | p(x) ≥ 0} is compact.

(iii) There exists an N ∈ N such that N − Σ_{i=1}^{n} x_i² ∈ M(K), i.e., M(K) is archimedean.

(iv) For all p ∈ R[x], there is some N ∈ N such that N ± p ∈ M(K).

Thus, for any polynomial p positive on K, p ∈ M(K) holds if one of the conditions in Theorem 2.8 is satisfied. Whether it is decidable that one of the equivalent conditions holds is not known and is a subject of current research. However, for a given polynomial optimization problem with compact feasible set K it is easy to make the corresponding quadratic module M(K) archimedean: we just need to add a redundant constraint N − Σ_{i=1}^{n} x_i² ≥ 0 for a sufficiently large N.

Example 2.2 Consider the compact semialgebraic set K = {x ∈ R² | g(x) = 1 − x_1² − x_2² ≥ 0}. The quadratic module M(K) is archimedean, as 1 − x_1² − x_2² = 0² + 1² · g(x) ∈ M(K). The polynomials f_1(x) := x_1 + 2 and f_2(x) := 2x_1³ + 3 are positive on K. Thus f_1, f_2 ∈ M(K) by Theorem 2.7. Their decompositions can be derived as

f_1(x) = x_1 + 2 = (1/2)(x_1 + 1)² + (1/2)x_2² + 1 + (1/2)(1 − x_1² − x_2²),
f_2(x) = 2x_1³ + 3 = (x_1³ + 1)² + (x_1²x_2)² + (x_1x_2)² + x_2² + 1 + (x_1⁴ + x_1² + 1)(1 − x_1² − x_2²).

The next example demonstrates that, in general, not every polynomial nonnegative on a compact semialgebraic set K is contained in M(K), even if M(K) is archimedean.

Example 2.3 Consider the compact semialgebraic set K = {x ∈ R | g_1(x) := x² ≥ 0, g_2(x) := −x² ≥ 0}. It is obvious that M(K) is archimedean. Also, it is easy to see that there are no q, r, s ∈ ΣR[x]² such that

p(x) := x = q(x) + r(x) x² + s(x) (−x²),

although p is nonnegative on K. However, the polynomial p_a ∈ R[x] defined by p_a(x) = x + a for a > 0 can be decomposed as

p_a(x) = x + a = (1/(4a))(x + 2a)² − (1/(4a)) x².

Thus p_a ∈ M(K) for all a > 0.

Remark 2.1 Given a compact semialgebraic set K, it holds that every positive polynomial on K belongs to the cone M(K) if and only if M(K) is archimedean.

Theorem 2.7 is called Putinar's Positivstellensatz. Obviously, it does not really characterize the polynomials positive on K, since the polynomials in M(K) need only be nonnegative on K. Also, it does not fully describe the polynomials nonnegative on K, since they are not always contained in M(K). However, it is Theorem 2.7 that is exploited by Lasserre in order to attack the polynomial optimization problem.

2.1.3 Dense SDP relaxations for polynomial optimization problems

The idea of applying convex optimization techniques to solve polynomial optimization problems was first proposed in the pioneering work of Shor [91]. Shor introduced lower bounds for the global minimum of a polynomial function p; these bounds are derived by minimizing a quadratic function subject to quadratic constraints. Nesterov also discussed the minimization of univariate polynomials and mentioned the problem of minimizing multivariate polynomials in [70]. It was Lasserre [51] who first realized the possibility of applying Putinar's Positivstellensatz, Theorem 2.7, to solve a broader class of polynomial optimization problems that goes beyond the case where p − p⋆ can be described as a sum of squares of polynomials.
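Before deriving the relaxations, it is worth noting that Putinar certificates such as those in Example 2.2 can be verified mechanically by expanding the claimed decomposition. A minimal sketch, assuming Python with the sympy symbolic algebra package (not part of the thesis software):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
g = 1 - x1**2 - x2**2  # K is the unit disc {g >= 0}

# Certificates from Example 2.2: an SOS part plus an SOS multiple of g.
f1 = sp.Rational(1, 2)*(x1 + 1)**2 + sp.Rational(1, 2)*x2**2 + 1 \
     + sp.Rational(1, 2)*g
f2 = (x1**3 + 1)**2 + (x1**2*x2)**2 + (x1*x2)**2 + x2**2 + 1 \
     + (x1**4 + x1**2 + 1)*g

print(sp.expand(f1))  # x1 + 2
print(sp.expand(f2))  # 2*x1**3 + 3
```

Finding such certificates, rather than merely checking them, is the semidefinite feasibility problem of Theorem 2.3, and it is this search that Lasserre's relaxation organizes into a hierarchy.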
First, we introduce Lasserre's approach for deriving semidefinite relaxations for minimizing a polynomial over a semialgebraic set, as Putinar's theorem is applied directly there. Second, we present the unconstrained case; since semialgebraic sets enter through the backdoor there, in order to be able to apply Putinar's Positivstellensatz, we present it after the constrained case.

Lasserre's relaxation in the constrained case

After studying positivity and nonnegativity of polynomials and the related problem of moments, we attack the initial polynomial optimization problem (2.2) over a compact semialgebraic set K, min_{x∈K} p(x). One of the major obstacles to finding the optimum p⋆ is the fact that the set K and the function p are far from being convex. The basic idea of Lasserre's approach [51] is to convexify problem (2.2). We outline this procedure of convexification. It has to be emphasized that Lasserre's approach is based on two assumptions: first, we require the semialgebraic set K to be compact, and second, we assume M(K) is archimedean. These assumptions imply that we are able to apply Putinar's Positivstellensatz to polynomials positive on K.

First we note

p⋆ = sup {a ∈ R | p − a ≥ 0 on K} = sup {a ∈ R | p − a > 0 on K}.   (2.3)

Since we assume that M(K) is archimedean, we can apply Theorem 2.7 to (2.3). Thus

p⋆ ≤ sup {a ∈ R | p − a ∈ M(K)} ≤ sup {a ∈ R | p − a ≥ 0 on K} = p⋆.

Finally we obtain

p⋆ = sup {a ∈ R | p − a ∈ M(K)}.   (2.4)

As a second approach, we note that for the minimum p⋆ of (2.1),

p⋆ = inf { ∫ p dµ | µ ∈ MP(K) },   (2.5)

where MP(K) ⊆ M(K) denotes the set of all Borel measures on K which are also probability measures. "≤" holds since p(x) ≥ p⋆ on K implies ∫ p dµ ≥ p⋆, and "≥" follows as each x feasible in (2.1) corresponds to a µ = δ_x ∈ MP(K), where δ_x is the Dirac measure at x.

In order to get rid of the set MP(K) in (2.5), we exploit the following theorem by Putinar [81].

Theorem 2.9 For any map L : R[x] → R, the following are equivalent:

(i) L is linear, L(1) = 1 and L(M(K)) ⊆ [0, ∞).

(ii) L is integration with respect to a probability measure µ on K, i.e., there exists µ ∈ MP(K) such that L(p) = ∫ p dµ for all p ∈ R[x].

Proof Cf. [88], pp. 10.

This theorem does not really characterize MP(K), but rather all real families (y_α)_{α∈N^n} that are sequences of moments of probability measures on K, i.e.,

y_α = ∫ x^α dµ for all α ∈ N^n,

where x^α = x_1^{α_1} ··· x_n^{α_n}. This statement is true, as every linear map L : R[x] → R is given uniquely by its values L(x^α) on the basis (x^α)_{α∈N^n} of R[x]. With Theorem 2.9 we obtain

p⋆ = inf { L(p) | L : R[x] → R is linear, L(1) = 1, L(M(K)) ⊆ [0, ∞) }.   (2.6)

Recall (2.4), p⋆ = sup {a ∈ R | p − a ∈ M(K)}. Thus (2.6) can be understood as a primal approach to the original problem (2.1) and (2.4) as a dual approach. For complexity reasons it is necessary to introduce relaxations of this primal-dual pair of optimization problems in order to solve problem (2.1). Therefore we approximate M(K) by the sets M_ω(K) ⊆ R[x], where

M_ω(K) := { Σ_{i=0}^{m} σ_i g_i | σ_i ∈ ΣR[x]², deg(σ_i g_i) ≤ 2ω }, with g_0 := 1,

for ω ∈ 𝒩 := {s ∈ N | s ≥ ω_max := max{ω_0, ω_1,...,ω_m}}, where ω_i := ⌈deg g_i / 2⌉ (i = 1,...,m) and ω_0 := ⌈deg p / 2⌉. Replacing M(K) by M_ω(K) motivates the following pair of optimization problems for ω ∈ 𝒩:

(P_ω)  min L(p) subject to L : R[x]_{2ω} → R linear, L(1) = 1 and L(M_ω(K)) ⊆ [0, ∞),
(D_ω)  max a subject to a ∈ R and p − a ∈ M_ω(K).   (2.7)
The optimal values of (P_ω) and (D_ω) are denoted by P⋆_ω and D⋆_ω, respectively. The parameter ω ∈ 𝒩 is called the relaxation order of (2.7). It determines the size of the relaxations (P_ω) and (D_ω) of (2.2), and therefore also the numerical effort necessary to solve them.

Theorem 2.10 (Lasserre) Assume M(K) is archimedean. (P⋆_ω)_{ω∈𝒩} and (D⋆_ω)_{ω∈𝒩} are increasing sequences that converge to p⋆ and satisfy D⋆_ω ≤ P⋆_ω ≤ p⋆ for all ω ∈ 𝒩. Moreover, if p − p⋆ ∈ M(K), then D⋆_ω = P⋆_ω = p⋆ for a sufficiently large relaxation order ω, i.e., strong duality holds.

Proof Since the feasible set of (2.6) is a subset of the feasible set of (P_ω), P⋆_ω ≤ p⋆. Moreover, if L is feasible for (P_ω) and a for (D_ω), then L(p) ≥ a, since p − a ∈ M_ω(K) implies L(p) − a = L(p) − a L(1) = L(p − a) ≥ 0. Thus D⋆_ω ≤ P⋆_ω. Obviously, every a feasible for (D_ω) is feasible for (D_{ω+1}), and every L feasible for (P_{ω+1}) is feasible for (P_ω). This implies that (P⋆_ω)_{ω∈𝒩} and (D⋆_ω)_{ω∈𝒩} are increasing. Furthermore, as for any ε > 0 there exists a sufficiently large ω ∈ 𝒩 such that p − p⋆ + ε ∈ M_ω(K) by Theorem 2.7, i.e., p⋆ − ε is feasible for (D_ω), the convergence follows. If p − p⋆ ∈ M(K), then p − p⋆ ∈ M_ω(K) for ω sufficiently large. Thus p⋆ is feasible for (D_ω) and therefore D⋆_ω = P⋆_ω = p⋆.

If M(K) is not archimedean, we are still able to exploit Schmüdgen's Positivstellensatz to characterize p − a in (D_ω).

As a next step we follow Lasserre's observation and translate (D_ω) and (P_ω) into a pair of primal-dual semidefinite programs.

Definition 2.3 Let L : R[x] → R be a linear functional, let the sequence y = (y_α)_{α∈N^n} be given by y_α := L(x^α), and let d ∈ N be fixed. The moment matrix M_d(y) of order d is the matrix with rows and columns indexed by u(x, Λ(d)) such that

M_d(y)_{α,β} = L(x^α x^β) = y_{α+β} for all α, β ∈ N^n with |α|, |β| ≤ d.

The size of M_d(y) is given by |u(x, Λ(d))| = \binom{n+d}{d}; the number of components of y needed for constructing M_d(y) is given by |(y_α)_{|α|≤2d}| = \binom{n+2d}{2d}.

In the case n = d = 2, the moment matrix is given by

M_2(y) =
[ y00 y10 y01 y20 y11 y02 ]
[ y10 y20 y11 y30 y21 y12 ]
[ y01 y11 y02 y21 y12 y03 ]
[ y20 y30 y21 y40 y31 y22 ]
[ y11 y21 y12 y31 y22 y13 ]
[ y02 y12 y03 y22 y13 y04 ]

Let g ∈ R[x] with g(x) = Σ_{γ∈N^n} g_γ x^γ. The localizing matrix M_d(g y) of order d associated with g and y is the matrix with rows and columns indexed by u(x, Λ(d)), obtained from the moment matrix by

M_d(g y)_{α,β} := L(g(x) x^α x^β) = Σ_{γ∈N^n} g_γ y_{γ+α+β} for all α, β ∈ N^n with |α|, |β| ≤ d.

For g(x) = x_1² + 2x_2 + 3, n = 2 and d = 1, the localizing matrix is given by

M_1(g y) =
[ y20 + 2y01 + 3y00   y30 + 2y11 + 3y10   y21 + 2y02 + 3y01 ]
[ y30 + 2y11 + 3y10   y40 + 2y21 + 3y20   y31 + 2y12 + 3y11 ]
[ y21 + 2y02 + 3y01   y31 + 2y12 + 3y11   y22 + 2y03 + 3y02 ]

We will exploit the following key lemma [88].

Lemma 2.1 Suppose L : R[x] → R is a linear map. Then L(M_ω(K)) ⊆ [0, ∞) if and only if the m + 1 matrices satisfy

M_{ω−ω_i}(y g_i) ⪰ 0 for all i ∈ {0,...,m},
We call dSDPω the dense Lasserre relaxation or the dense SDP relaxation of relaxation order ω for the polynomial optimization problem (2.1). By sorting the monomials in the moment and localizing matrix inequality constraints in (2.8), we can express Mω (u(x, Λ(2ω))) and Mω−ωi (u(x, Λ(2Ω) gi )) as Mω (u(x, Λ(2ω))) = X α∈Λ(2ω) Bα xα , Mω−ωi (u(x, Λ(2ω − 2ωi )) gi ) = X Cαi xα , α∈Λ(2ω) for some matrices Bα ∈ Ss(ω) and Cαi ∈ Ss(ω−ωi ) . Thus we can rewrite the primal-dual pair of SDP (2.8) as the primal-dual pair of equivalent SDP in standard form P (PωSDP ) min α∈Λ(2ω) pα yα |Λ(2ω)| s.t. y ∈ RP , y0 = 1, and B0 + α∈Λ(2ω)\{0} yα Bα < 0, P C0i + α∈Λ(2ω)\{0} yα Cαi < 0, i = 1, . . . , m (2.9) Pm (DωSDP ) max −G0 (1, 1) − i=1 hC0i , Gi i s(ω) s(ω−ωi ) s.t. a ∈ R, G0 ∈PS+ , Gi ∈ S+ for i ∈ {1, . . . , m} and m hBα , G0 i + i=1 hCαi , Gi i = pα , 0 6= α ∈ Λ(2ω) In general SDP can be solved in polynomial time. Efficient solvers for the SDP (2.9) in standard form are provided by the software packages SeDuMi [95] and SDPA [104]. Lasserre’s relaxation in the unconstrained case The procedure to derive a sequence of convergent SDP relaxations in the case of an unconstrained polynomial optimization problem minn p(x), (2.10) x∈R where p ∈ R [x] and p⋆ := minx p(x), is similar to the constrained case, which we discussed in the previous subsection. Let p be of even degree 2l, otherwise inf p = −∞. Moreover, we will exploit the characterization of sum of squares decompositions by semidefinite matrices and Putinar’s Positivstellensatz. In order to apply this theorem, it is necessary to construct an appropriate semialgebraic set. First, we derive the following relaxation, R p⋆ = inf n p dµ | µ ∈ MP (Rn ) o (2.11) s(l) ≥ inf L(p) | L : R [x] → R, L(1) = 1, Ml (L(x)) ∈ S+ . 22 CHAPTER 2. SEMIDEFINITE PROGRAMMING AND POLYNOMIAL OPTIMIZATION We order the expression Ml (L(x)) and introduce symmetric matrices Bα ∈ Ss(l) such that Ml (L(x)) = P α α α∈Λ(2l) Bα L(x ). Finally we identify yα = L(x ) for α ∈ Λ(2l) \ {0} and y0 = 1 to obtain the following relaxation for (2.10) P (Pl ) min Pα pα yα (2.12) s.t. α6=0 yα Bα < −B0 . As in the constrained case we can apply a dual approach to (2.10), P p⋆ = sup n {a ∈ R | p(x) − a ≥ 0 ∀ x ∈ Rn } ≥ sup o a ∈ R | p(x) − a ∈ R[x]2 s(l) . = sup a | p(x) − a = hMl (x) , Gi, G ∈ S+ (2.13) Thus, we derive another relaxation to problem (2.10), (Dl ) max −G(1, 1) s.t. hBα , Gi = pα , G < 0. α 6= 0 (2.14) With the duality theory of convex optimization it can be shown easily, that the two convex programs (2.12) and (2.14) are dual to each other. In the case (2.14) has an interior feasible solution, strong duality holds, that is Pl⋆ = Dl⋆ . The idea of the following theorem was proposed by Shor [91] first. The presented version is due to Lasserre [51]. Theorem 2.11 (Shor) If the nonnegative polynomial p−p⋆ is a sum of squares of polynomials, then (2.10) is equivalent to (2.12). More precisely, p⋆ = ZP and, if x⋆ is a global minimizer of (2.10), then y ⋆ := x⋆1 , . . . , x⋆n , (x⋆1 )2 , x⋆1 x⋆2 , . . . , (x⋆1 )2m , . . . , (x⋆n )2m is a minimizer of (2.12). Next, we treat the general case, that is, when p − p⋆ is not sum of squares. As mentioned at the beginning we have to construct a semialgebraic set in order to be able to apply Putinar’s Positivstellensatz. Suppose we know that a global minimizer x⋆ of p(x) has norm less than a for some a > 0, that is, p(x⋆ ) = p⋆ and || x⋆ ||2 ≤ a. Then, with x → qa (x) = a2 − || x ||22 , we have p(x) − p⋆ ≥ 0 on Ka := {qa (x) ≥ 0}. 
Obviously, M (Ka ) is archimedian, as the condition (iii) in Theorem 2.8 is satisfied for N = a2 . Now, we can use that every polynomial f , strictly positive on the semialgebraic set Ka is contained in the quadratic module M (Ka ). For every ω ≥ l, consider the following semidefinite program P (Pωa ) min α pα y α , s.t. Mω (y) < 0, (2.15) Mω−1 (qa y) ≥ 0. Writing Mω−1 (qa y) = nite program P α yα Dα , for appropriate matrices Dα (| α |≤ 2ω), the dual of (Pωa ) is the semidefi(Dωa ) max s.t. −G(1, 1) − a2 H(1, 1), hG, Bα i + hH, Dα i = pα , α 6= 0. (2.16) Then, the following theorem is due to Lasserre [51]. Theorem 2.12 (Lasserre) Given (Pωa ) and (Dωa ) for some a > 0 such that || x⋆ ||2 ≤ a for some global minimizer x⋆ . Then 23 2.1. POSITIVE POLYNOMIALS AND POLYNOMIAL OPTIMIZATION (a) as ω → ∞, one has inf(Pωa ) ↑ p⋆ . Moreover, for ω sufficiently large, there is no duality gap between (Pωa ) and its dual (Dωa ), and (Dωa ) is solvable. (b) min(Pωa ) = p⋆ if and only if p − p⋆ ∈ Mω (Ka ). In this case, the vector y ⋆ := x⋆1 , . . . , x⋆n , (x⋆1 )2 , x⋆1 x⋆2 , . . . , (x⋆1 )2ω , . . . , (x⋆n )2ω is a minimizer of (Pωa ). In addition, max(Pωa ) = min(Dωa ). Proof (a) From x⋆ ∈ Ka and with y ⋆ := x⋆1 , . . . , x⋆n , (x⋆1 )2 , x⋆1 x⋆2 , . . . , (x⋆1 )2ω , . . . , (x⋆n )2ω it follows that Mω (y ⋆ ), Mω−1 (qa y ⋆ ) < 0 so that y ⋆ is feasible for (Pωa ) and thus inf(Pωa ) ≤ p⋆ . Now, fix ǫ > 0 arbitrary. Then, p − p⋆ + ǫ > 0 and therefore, with Theorem 2.7 there is some N0 such that r2 r1 X X tj (x)2 qi (x)2 + q(x) p − p⋆ + ǫ = i=1 j=1 for some polynomials qi (x), i = 1, . . . , r1 , of degree at most N0 , and some polynomials tj (x), j = 1, . . . , r2 , of degree at most N0 − 1. Let qi ∈ Rs(N0 ) , tj ∈ Rs(N0 −1) be the corresponding vectors of coefficients, and let r2 r1 X X T tj tTj qi qi , Z := G := i=1 j=1 so that G, H < 0. It is immediate to check that (G, H) feasible for (Dωa ) with value −G(1, 1) − a2 H(1, 1) = (p⋆ − ǫ). From weak duality follows convergence as p⋆ − ǫ ≤ inf(Pωa ) ≤ p⋆ . For strong duality and for (b), c.f. [51]. We needed to add the constraint qa (x) ≥ 0, in order to show convergence of the SDP relaxation (Pωa ). For applications it is often sufficient to consider an SDP relaxation, which does not take into account this constraint. Thus, we denote the primal-dual pair of SDP dSDPω dSDP⋆ω P min α pα y α s.t. Mω (y) < 0, max −G(1, 1) s.t. hG, Bα i = pα ∀α 6= 0, as the dense SDP relaxation of relaxation order ω for polynomial optimization problem (2.10), which is consistent with the dense SDP relaxation for the constrained case. This sequence of SDP is not guaranteed to converge to the minimum of (2.10) for ω → ∞. However, it provides a non-decreasing sequence of lower bounds to p⋆ , min(dSDPωmax ) ≤ min(dSDPωmax +1 ) ≤ . . . ≤ p⋆ . 24 CHAPTER 2. SEMIDEFINITE PROGRAMMING AND POLYNOMIAL OPTIMIZATION Global minimizer Usually one is not only interested in finding the minimum value p⋆ of p on K, but also in obtaining a global minimizer x⋆ ∈ K ⋆ with p(x⋆ ) = p⋆ . It will be shown that in Lasserre’s procedure not only min(dSDPω ) converges to the infimum p⋆ , but also a convergence to the minimizer x⋆ of (2.2) in the case it is unique. Definition 2.4 Lω solves (Pω ) nearly to optimality (ω ∈ N ) if Lω is a feasible solution of (Pω ) (ω ∈ N ) such that limω→∞ Lω (p) = limω→∞ Pω⋆ . This notation is useful because (Pω ) might not possess an optimal solution, and even if it does, we might not be able to compute it exactly. For an example, c.f. [88], Example 22. 
Obviously, L_ω solves (P_ω) nearly to optimality (ω ∈ 𝒩) if and only if lim_{ω→∞} L_ω(p) = p⋆. The following theorem is the basis for the convergence to a minimizer in the case where K⋆ is a singleton.

Theorem 2.13 Suppose K ≠ ∅ and L_ω solves (P_ω) nearly to optimality (ω ∈ 𝒩). Then for all d ∈ N and all ε > 0 there exists ω_0 ∈ 𝒩 ∩ [d, ∞) such that for all ω ≥ ω_0 there is a µ ∈ M(K⋆) with

max_{α∈Λ(2d)} | L_ω(x^α) − ∫ x^α dµ | < ε.

Proof [88], p. 11.

In the convenient case where K⋆ is a singleton it is possible to guarantee convergence to the minimizer:

Corollary 2.2 Suppose K⋆ = {x⋆} is a singleton and L_ω solves (P_ω) nearly to optimality (ω ∈ 𝒩). Then

lim_{ω→∞} (L_ω(x_1),...,L_ω(x_n)) = x⋆.

Proof We set d = 1 in Theorem 2.13 and note that M(K⋆) contains only the Dirac measure δ_{x⋆} at the point x⋆.

It is possible to apply Corollary 2.2 to certify that p⋆ has almost been reached after successively solving the relaxations (P_ω).

Corollary 2.3 Suppose M(K) is archimedean, p has a unique minimizer on the compact semialgebraic set K, and L_ω solves (P_ω) nearly to optimality for all ω ∈ 𝒩. Then for all ω ∈ 𝒩,

L_ω(p) ≤ p⋆ ≤ p(L_ω(x_1),...,L_ω(x_n)),

and the lower and upper bounds converge to p⋆ for ω → ∞.

Proof L_ω(p) ≤ p⋆ follows from Theorem 2.10. The convergence of p(L_ω(x_1),...,L_ω(x_n)) is a consequence of Corollary 2.2. To see that p⋆ is a lower bound for the right-hand side, observe that g_i(L_ω(x_1),...,L_ω(x_n)) = L_ω(g_i) ≥ 0, whence (L_ω(x_1),...,L_ω(x_n)) ∈ K for all ω ∈ 𝒩.

The case where several optimal solutions exist is more difficult to handle. In fact, as soon as there are two or more global minimizers, it often occurs that symmetry in the problem prevents the nearly optimal solutions of the SDP relaxations from converging to a particular minimizer. Henrion and Lasserre established a sufficient condition for the dense SDP relaxations to detect all optimal solutions [35]. Given the dense SDP relaxation dSDP_ω for some order ω ≥ ω_max and an optimal solution y⋆ of this semidefinite program, if

rank M_ω(y⋆) = rank M_{ω−ω_max}(y⋆)   (2.17)

holds, the SDP relaxation dSDP_ω is exact, that is, min(dSDP_ω) = p⋆. Moreover, Henrion and Lasserre provided an algorithm for extracting all globally optimal solutions of the POP (2.1) if (2.17) holds; see [35] for details. Note that (2.17) is not necessary: dSDP_ω may already be exact for some relaxation order ω with rank M_ω(y⋆_ω) > rank M_{ω−ω_max}(y⋆_ω). For many POPs it may not be practical to increase ω until (2.17) holds, as the size of the moment and localizing matrix constraints in dSDP_ω, given by \binom{n+ω}{n}, grows rapidly for increasing ω.

2.1.4 Sparse SDP relaxations for polynomial optimization problems

The dense SDP relaxation (2.8) by Lasserre is a powerful theoretical result, since it allows one to approximate the solutions of polynomial optimization problems (2.1) as closely as desired by solving a finite sequence of SDP relaxations. However, since the size of the SDP relaxation grows as \binom{n+ω}{ω}, even for medium scale POPs the SDP relaxations become intractable for present SDP solvers, already for small choices of the relaxation order ω. Therefore it is crucial to reduce the size of the semidefinite programs to be solved in order to be able to attack large scale POPs. In this section we review the approach [102] of exploiting sparsity in a large scale POP by introducing a sequence of sparse SDP relaxations, which are of much smaller size than the dense SDP relaxations (2.8).
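As a concrete point of reference for the size reductions discussed in this section, the following sketch solves the dense relaxation of order ω = 2 for the toy unconstrained problem min_x x⁴ − 2x², whose global minimum is −1 at x = ±1. It assumes Python with cvxpy and the SCS solver; since x⁴ − 2x² + 1 = (x² − 1)² is a sum of squares, Theorem 2.11 predicts that the relaxation is exact here.

```python
import cvxpy as cp

# Moment vector y = (y_0, ..., y_4); y_k plays the role of L(x^k).
y = cp.Variable(5)

# Hankel moment matrix M_2(y) for n = 1, d = 2 (Definition 2.3).
M = cp.bmat([[y[0], y[1], y[2]],
             [y[1], y[2], y[3]],
             [y[2], y[3], y[4]]])

# dSDP_2: minimize the moment expression of p(x) = x^4 - 2 x^2.
prob = cp.Problem(cp.Minimize(y[4] - 2 * y[2]),
                  [M >> 0, y[0] == 1])
prob.solve(solver=cp.SCS)
print(prob.value)  # approximately -1.0: the relaxation is exact here
```

For n variables and relaxation order ω the moment matrix has dimension \binom{n+ω}{ω}; it is precisely this quantity that the sparse relaxations below replace by clique-sized blocks.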
A second method for exploiting sparsity in a general optimization problem with linear and/or nonlinear matrix inequality constraints is presented in [42].

In many problems of type (2.1) the involved polynomials p, g_1,...,g_m are sparse. Waki, Kojima, Kim and Muramatsu constructed a sequence of SDP relaxations which exploits the sparsity of such polynomial optimization problems [102]. This method shows strong numerical performance in comparison with Lasserre's relaxations (2.8). The convergence of the sparse SDP relaxations to the optimum of the original problem (2.1) was shown by Lasserre [52] and by Kojima and Muramatsu [49]. In the following we give a review of the sparse SDP relaxations for POPs with structured sparsity.

Let the polynomial optimization problem be given as in (2.2), min_{x∈K} p(x), where K is a compact semialgebraic set defined by the m inequality constraints g_1 ≥ 0,...,g_m ≥ 0. We characterize sparsity of a POP (2.2) with the following definition.

Definition 2.5 Given a POP (2.2), the n × n symbolic matrix R defined by

R_{i,j} = ⋆ if x_i x_j occurs in some monomial of p,
R_{i,j} = ⋆ if x_i and x_j occur in the same g_l (l = 1,...,m),
R_{i,j} = 0 otherwise,

is called the correlative sparsity pattern matrix of the POP. The graph G = (V, E) with vertex set V := {1,...,n} and edge set E := {{i,j} ∈ V² | R_{i,j} = ⋆} is called the corresponding correlative sparsity pattern graph. A POP is defined to be correlatively sparse if R is sparse.

We will construct a sequence of SDP relaxations for this polynomial optimization problem which exploits the sparsity pattern characterized by the correlative sparsity pattern matrix R. Under a certain condition on the sparsity pattern of the problem, the optima of these SDP relaxations converge to the optimum of the polynomial optimization problem (2.2).

First, let {1,...,n} be the union ∪_{k=1}^{q} I_k of subsets I_k ⊂ {1,...,n} such that every g_j, j ∈ {1,...,m}, involves only the variables {x_i | i ∈ I_k} for some k. Moreover, it is required that the objective p can be written as p = p_1 + ... + p_q, where each p_k uses only the variables {x_i | i ∈ I_k}. A possible choice for the sets I_1,...,I_q are the maximal cliques of the correlative sparsity pattern graph G.

In order to formulate the sparse SDP relaxations we need some further definitions.

Definition 2.6 Given a subset I of {1,...,n}, we define the sets

A^I = {α ∈ N^n : α_i = 0 if i ∉ I},
A^I_ω = {α ∈ N^n : α_i = 0 if i ∉ I and Σ_{i∈I} α_i ≤ ω}.

For a set G ⊂ N^n of exponents, we define R[x, G] := {f ∈ R[x] : supp(f) ⊆ G}. Also, restricted moment matrices M_r(y, I) and localizing matrices M_r(g y, I) are defined for I ⊆ {1,...,n}, r ∈ N and g ∈ R[x]. They are
Assumption 1: Let K ⊆ R^n be as in (2.23). The index set J = {1, ..., m′} is partitioned into q disjoint sets J_k, k = 1, ..., q, and the collections {I_k} and {J_k} satisfy:

1. For every j ∈ J_k, g_j ∈ R[x, A^{I_k}], that is, for every j ∈ J_k, the constraint g_j(x) ≥ 0 involves only the variables x(I_k). Equivalently, viewing g_j as a polynomial in R[x], g_{j,α} ≠ 0 ⇒ α ∈ A^{I_k}.

2. The objective function p ∈ R[x] can be written as p = Σ_{k=1}^q p_k with p_k ∈ R[x, A^{I_k}], k = 1, ..., q. Equivalently, p_α ≠ 0 ⇒ α ∈ ∪_{k=1}^q A^{I_k}.

Example 2.4 For n = 6 and m = 6, let g_1(x) = x_1 x_2 − 1, g_2(x) = x_1^2 + x_2 x_3 − 1, g_3(x) = x_2 + x_3^2 x_4, g_4(x) = x_3 + x_5, g_5(x) = x_3 x_6, and g_6(x) = x_2 x_3. Then we can construct {I_k} and {J_k} for q = 4 with
I_1 = {1, 2, 3}, I_2 = {2, 3, 4}, I_3 = {3, 5}, I_4 = {3, 6},
J_1 = {1, 2, 6}, J_2 = {3}, J_3 = {4}, J_4 = {5}.

Now we can construct sparse SDP relaxations in analogy to the dense SDP relaxations (2.8). For each j = 1, ..., m′ write ω_j = ⌈deg g_j / 2⌉. Then, for some ω ∈ N, define the following semidefinite program:
\[
(\mathrm{sSDP}_\omega)\quad
\begin{array}{ll}
\inf_y & \sum_\alpha p_\alpha y_\alpha \\
\text{s.t.} & M_\omega(y, I_k) \succeq 0, \quad k = 1, \ldots, q, \\
 & M_{\omega - \omega_j}(g_j y, I_k) \succeq 0, \quad j \in J_k;\ k = 1, \ldots, q, \\
 & y_0 = 1.
\end{array} \tag{2.18}
\]
Program (2.18) is well defined under Assumption 1, and it is easy to see that it is an SDP relaxation of problem (2.2). In fact, it is also easy to see that sSDP_ω is a weaker relaxation for (2.2) than dSDP_ω, as the partial moment and localizing matrices in the constraints of (2.18) are principal submatrices of the full moment and localizing matrices in the constraints of (2.8), i.e.,
\[
\min(\mathrm{sSDP}_\omega) \leq \min(\mathrm{dSDP}_\omega) \leq \min(\mathrm{POP}) \quad \forall \omega \in \mathbb{N}.
\]
We call (2.18) the sparse Lasserre relaxations, or sparse SDP relaxations for polynomial optimization problems.

There are symmetric matrices B^k_α and C^{jk}_α such that
\[
M_\omega(y, I_k) = \sum_{\alpha \in \mathbb{N}^n} y_\alpha B_\alpha^k, \quad k = 1, \ldots, q, \qquad
M_{\omega - \omega_j}(g_j y, I_k) = \sum_{\alpha \in \mathbb{N}^n} y_\alpha C_\alpha^{jk}, \quad k = 1, \ldots, q,\ j \in J_k, \tag{2.19}
\]
with B^k_α = 0 and C^{jk}_α = 0 whenever α ∉ A^{I_k}. Then we can rewrite (2.18) as
\[
\begin{array}{ll}
\inf_y & \sum_\alpha p_\alpha y_\alpha \\
\text{s.t.} & \sum_{0 \neq \alpha \in \mathbb{N}^n} y_\alpha B_\alpha^k \succeq -B_0^k, \quad k = 1, \ldots, q, \\
 & \sum_{0 \neq \alpha \in \mathbb{N}^n} y_\alpha C_\alpha^{jk} \succeq -C_0^{jk}, \quad j \in J_k;\ k = 1, \ldots, q,
\end{array} \tag{2.20}
\]
and we derive the dual of this semidefinite program as
\[
(\mathrm{sSDP}_\omega^\star)\quad
\begin{array}{ll}
\sup_{Y_k, Z_{jk}, \lambda} & \lambda \\
\text{s.t.} & \sum_{k:\, \alpha \in A^{I_k}} \Big[ \langle Y_k, B_\alpha^k \rangle + \sum_{j \in J_k} \langle Z_{jk}, C_\alpha^{jk} \rangle \Big] + \lambda \delta_{\alpha 0} = p_\alpha \quad \forall\, \alpha \in \Gamma_\omega, \\
 & Y_k \succeq 0,\ Z_{jk} \succeq 0, \quad j \in J_k,\ k = 1, \ldots, q,
\end{array} \tag{2.21}
\]
where Γ_ω := {α ∈ N^n : α ∈ ∪_{k=1}^q A^{I_k}, |α| ≤ 2ω}.

The main advantage of the sparse SDP relaxations is the reduction of the size of the matrix inequality constraints. In order to understand the improved efficiency, let us compare the computational complexity of the dense relaxation dSDP_ω and the sparse relaxation sSDP_ω. The number of variables in sSDP_ω is bounded by Σ_{k=1}^q \binom{n_k + 2ω}{2ω}. Supposing n_k ≈ n/q for all k, the number of variables is bounded by O(q (n/q)^{2ω}), a strong improvement compared with O(n^{2ω}), the number of variables in dSDP_ω. Also, sSDP_ω has q LMI constraints of size O((n/q)^ω) and m + q LMI constraints of size O((n/q)^{ω−ω_max}), to be compared with a single LMI constraint of size O(n^ω) and m LMI constraints of size O(n^{ω−ω_max}) in dSDP_ω.

As pointed out, the sparse SDP relaxations are weaker than the dense ones. The question arises whether we still have convergence to the minimum of the POP. This question was answered positively by Lasserre [52]. We need two further conditions to show convergence.

Assumption 2: For all k = 1, ..., q − 1,
\[
I_{k+1} \cap \bigcup_{j=1}^{k} I_j \subseteq I_s \quad \text{for some } s \leq k. \tag{2.22}
\]
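The partition of Example 2.4 can be verified mechanically. A minimal MATLAB sketch, with the variable lists of g_1, ..., g_6 hard-coded from the example:

    % Minimal sketch: check item 1 of Assumption 1 on the data of Example 2.4.
    % varsG{j} lists the variables occurring in g_j; I{k}, J{k} as in the text.
    varsG = {[1 2], [1 2 3], [2 3 4], [3 5], [3 6], [2 3]};
    I = {[1 2 3], [2 3 4], [3 5], [3 6]};
    J = {[1 2 6], [3], [4], [5]};
    ok = true;
    for k = 1:numel(J)
      for j = J{k}                                  % every g_j with j in J_k ...
        ok = ok && all(ismember(varsG{j}, I{k}));   % ... uses only x(I_k)
      end
    end
    fprintf('Assumption 1, item 1 satisfied: %d\n', ok);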
The property (2.22) of Assumption 2 is called the running intersection property. Note that (2.22) is always satisfied for q = 2. Since property (2.22) depends on the ordering, it can possibly be satisfied after some relabelling of the {I_k}. In the case of Example 2.4 it is easy to check that Assumption 2 is satisfied, but in general it is not obvious. However, Waki et al. [102] presented a general procedure to guarantee that Assumption 2 is satisfied. Given the correlative sparsity pattern graph G = (V, E), we denote by G̃ = (V, Ẽ) its chordal extension. A graph is said to be chordal if every (simple) cycle of the graph with more than three edges has a chord. A graph G̃(V, Ẽ) is a chordal extension of G(V, E) if it is a chordal graph and E ⊆ Ẽ. See [4] for basic properties of chordal graphs. Then, the maximal cliques C_1, ..., C_q of G̃ satisfy the running intersection property, and the number q of maximal cliques in a chordal graph is bounded by n. Furthermore, there are efficient algorithms to determine the maximal cliques of a chordal graph, whereas it is NP-hard to determine the maximal cliques of an arbitrary graph.

Assumption 3: Let K ⊆ R^n be a closed semialgebraic set. Then, there is M > 0 such that ||x||_∞ < M for all x ∈ K.

This assumption implies ||x(I_k)||_∞^2 < n_k M^2, k = 1, ..., q, where x(I_k) := {x_i | i ∈ I_k}, and therefore we add to K the q redundant quadratic constraints
\[
g_{m+k}(x) := n_k M^2 - \| x(I_k) \|^2 \geq 0, \quad k = 1, \ldots, q,
\]
and set m′ = m + q, so that K is now defined by
\[
K := \{ x \in \mathbb{R}^n \mid g_j(x) \geq 0,\ j = 1, \ldots, m' \}. \tag{2.23}
\]
Notice that g_{m+k} ∈ R[x, A^{I_k}_2] for every k = 1, ..., q. With Assumption 3, K is a compact semialgebraic set. Moreover, Assumption 3 is needed to guarantee that the quadratic module M(K) is archimedean, the condition of Putinar's Positivstellensatz. Finally, we obtain the following convergence result.

Theorem 2.14 Let p⋆ denote the global minimum of (2.2) and let Assumptions 1–3 hold. Then:

(a) inf(sSDP_ω) ↑ p⋆ as ω → ∞.

(b) If K has nonempty interior, then strong duality holds and (sSDP⋆_ω) is solvable for sufficiently large ω, i.e., inf(sSDP_ω) = max(sSDP⋆_ω).

(c) Let y^ω be a nearly optimal solution of (sSDP_ω), with e.g.
\[
\sum_\alpha p_\alpha y_\alpha^\omega \leq \inf(\mathrm{sSDP}_\omega) + \frac{1}{\omega} \quad \forall \omega \geq \omega_0,
\]
and let ŷ^ω := {y^ω_α : |α| = 1}. If (2.2) has a unique global minimizer x⋆ ∈ K, then ŷ^ω → x⋆ as ω → ∞.

Proof Cf. [52].

As in the dense case, it is also possible to extract global minimizers of the POP from the sparse SDP relaxations in certain cases where the minimizer of the POP is not unique. In fact, Lasserre derived the following sparse version of condition (2.17): Let y⋆ be an optimal solution of the sparse SDP relaxation sSDP_ω for some order ω ≥ ω_max. If the rank conditions
\[
\begin{array}{ll}
\operatorname{rank} M_\omega(y^\star, I_h) = \operatorname{rank} M_{\omega - a_h}(y^\star, I_h) & \forall\, h \in \{1, \ldots, q\}, \\
\operatorname{rank} M_\omega(y^\star, I_h \cap I_{h'}) = 1 & \forall\, h \neq h' \text{ with } I_h \cap I_{h'} \neq \emptyset,
\end{array} \tag{2.24}
\]
with a_h := max_{j∈J_h} ω_j, hold, then sSDP_ω is exact and all global minimizers can be extracted. However, (2.24) are very restrictive sufficient conditions for the SDP relaxations to be exact, and it is not practical to apply them to large scale POP in most cases.

The software SparsePOP [103] is an implementation of the sparse SDP relaxations. The running intersection property is guaranteed by choosing the maximal cliques of the chordal extension of the correlative sparsity pattern graph as the index sets I_1, ..., I_q in (2.18).
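The procedure of Waki et al. can be sketched in MATLAB with the built-in functions symamd and chol, on which the numerical experiments in 2.2.7 also rely: the sparsity pattern of the Cholesky factor of the (reordered) correlative sparsity pattern matrix gives a chordal extension, and its column patterns give cliques. The clique extraction below is deliberately simplified (contained cliques are merely filtered, not merged, as a production code would do).

    % Minimal sketch: chordal extension and (candidate) maximal cliques of the
    % correlative sparsity pattern graph with sparse symmetric adjacency R.
    function cliques = chordalCliques(R)
      n = size(R, 1);
      perm = symamd(R);                          % fill-reducing ordering
      Rp = R(perm, perm) + n * speye(n);         % shift to positive definite
      L = chol(Rp, 'lower');                     % pattern of L = chordal extension
      cliques = {};
      for i = 1:n
        C = sort(perm(find(L(:, i))));           % column pattern = clique of G~
        contained = any(cellfun(@(D) all(ismember(C, D)), cliques));
        if ~contained, cliques{end+1} = C; end   %#ok<AGROW>
      end
    end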
Instead of imposing the additional constraints of Assumption 3, SparsePOP imposes linear box constraints for each component of x ∈ R^n,
\[
\mathrm{lbd}_i \leq x_i \leq \mathrm{ubd}_i \quad \forall\, i \in \{1, \ldots, n\}. \tag{2.25}
\]
Moreover, SparsePOP adds small linear perturbation terms to the objective function of the POP, in order to enforce the POP to have a unique global minimizer.

2.2 Exploiting sparsity in linear and nonlinear matrix inequalities

Optimization problems with nonlinear matrix inequalities, including quadratic and polynomial matrix inequalities, are known to be hard problems, and they frequently occur as large-scale optimization problems. We present a basic framework for exploiting sparsity characterized in terms of a chordal graph structure via positive semidefinite matrix completion [28]. Depending on where the sparsity is observed, two types of sparsity are studied: the domain-space sparsity (d-space sparsity) for a symmetric matrix X that appears as a variable in objective and/or constraint functions of a given optimization problem and is required to be positive semidefinite, and the range-space sparsity (r-space sparsity) for a matrix inequality involved in the constraint of the problem. The d-space sparsity is basically equivalent to the sparsity studied by Fukuda et al. [21, 68] for an equality standard form SDP. One of the two d-space conversion methods proposed in this section corresponds to an extension of their conversion method, and the other d-space conversion method is an extension of the method used for the sparse SDP relaxation of polynomial optimization problems in [102, 103] and for the sparse SDP relaxation of a sensor network localization problem in [43].

The r-space sparsity concerns a matrix inequality
\[
M(y) \succeq 0, \tag{2.26}
\]
involved in a general nonlinear optimization problem. Here M denotes a mapping from R^s into S^n. If M is linear, (2.26) is known as a linear matrix inequality (LMI), which appears in the constraint of a dual standard form of SDP. If each element of M(y) is a multivariate polynomial function in y ∈ R^s, (2.26) is called a polynomial matrix inequality, and the SDP relaxation [36, 37, 46, 48, 49, 52], which is an extension of the SDP relaxation [51] for POP, can be applied to (2.26). We assume a chordal graph structured sparsity, similar to the d-space sparsity, on the set of row and column index pairs (i, j) of the mapping M such that M_{ij} is not identically zero, i.e., M_{ij}(y) ≠ 0 for some y ∈ R^s. A representative example satisfying the r-space sparsity is a tridiagonal M. We do not impose any additional assumption on (2.26) to derive an r-space conversion method. When M is polynomial in y ∈ R^s, we can effectively combine it with the sparse SDP relaxation method [46, 49] for polynomial optimization problems over symmetric cones to solve (2.26).

We propose two methods to exploit the r-space sparsity. One may be regarded as a dual of the d-space conversion method by Fukuda et al. [21]. More precisely, it exploits the sparsity of the mapping M in the range space via a dual of the positive semidefinite matrix completion, to transform the matrix inequality (2.26) into a system of multiple matrix inequalities with smaller sizes and an auxiliary vector variable z ∈ R^q. The resulting matrix inequality system is of the form
\[
\tilde M_k(y) - \tilde L_k(z) \succeq 0 \quad (k = 1, 2, \ldots, p), \tag{2.27}
\]
and y ∈ R^s is a solution of (2.26) if and only if it satisfies (2.27) for some z.
Here M̃_k denotes a mapping from R^s into the space of symmetric matrices of some size, and L̃_k a linear mapping from R^q into the space of symmetric matrices of the same size. The sizes of the symmetric matrix valued mappings M̃_k (k = 1, 2, ..., p) and the dimension q of the auxiliary variable vector z are determined by the r-space sparsity pattern of M. For example, if M is tridiagonal, the sizes of M̃_k are all 2 × 2 and q = n − 2. The other r-space conversion method corresponds to a dual of the second d-space conversion method mentioned previously.

We discuss how the d-space and r-space conversion methods enhance the correlative sparsity for POP introduced in the previous section. Furthermore, we present numerical results to demonstrate how the size of problems involving large scale matrix inequalities is reduced under the four proposed conversion methods.

2.2.1 An SDP example

A simple SDP example is shown to illustrate the two types of sparsity considered in this section, the d-space sparsity and the r-space sparsity, and to compare them to the correlative sparsity from 2.1.4, which characterizes the sparsity of the Schur complement matrix. Let A^0 be a tridiagonal matrix in S^n such that A^0_{ij} = 0 if |i − j| > 1, and define a mapping M from S^n into S^n by
\[
M(X) = \begin{pmatrix}
1 - X_{11} & X_{12} & & & \\
X_{21} & 1 - X_{22} & X_{23} & & \\
 & X_{32} & 1 - X_{33} & \ddots & \\
 & & \ddots & \ddots & X_{n-1,n} \\
 & & & X_{n,n-1} & 1 - X_{nn}
\end{pmatrix}
\]
for every X ∈ S^n. Consider an SDP
\[
\text{minimize } A^0 \bullet X \quad \text{subject to } M(X) \succeq 0,\ X \succeq 0. \tag{2.28}
\]
Among the elements X_{ij} (i = 1, ..., n, j = 1, ..., n) of the matrix variable X ∈ S^n, the elements X_{ij} with |i − j| ≤ 1 are relevant, and all other elements X_{ij} with |i − j| > 1 are unnecessary in evaluating the objective function A^0 • X and the matrix inequality M(X) ⪰ 0. Hence, we can describe the d-space sparsity pattern as a symbolic tridiagonal matrix, with the nonzero symbol ⋆ on the diagonal and the first off-diagonals and 0 elsewhere. The r-space sparsity pattern is described by a tridiagonal ⋆-pattern of exactly the same form.

Applying the d-space conversion method using basis representation described in 2.2.3 and the r-space conversion method using clique trees presented in 2.2.5, we can reduce the SDP (2.28) to
\[
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{n-1} \big( A^0_{ii} X_{ii} + 2 A^0_{i,i+1} X_{i,i+1} \big) + A^0_{nn} X_{nn} \\[4pt]
\text{subject to} &
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} - \begin{pmatrix} X_{11} & -X_{12} \\ -X_{21} & -z_1 \end{pmatrix} \succeq 0, \\[4pt]
& \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} - \begin{pmatrix} X_{ii} & -X_{i,i+1} \\ -X_{i+1,i} & z_{i-1} - z_i \end{pmatrix} \succeq 0 \quad (i = 2, 3, \ldots, n-2), \\[4pt]
& \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} X_{n-1,n-1} & -X_{n-1,n} \\ -X_{n,n-1} & X_{n,n} + z_{n-2} \end{pmatrix} \succeq 0, \\[4pt]
& \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} - \begin{pmatrix} -X_{ii} & -X_{i,i+1} \\ -X_{i+1,i} & -X_{i+1,i+1} \end{pmatrix} \succeq 0 \quad (i = 1, 2, \ldots, n-1).
\end{array} \tag{2.29}
\]
This problem has 3n − 3 real variables X_{ii} (i = 1, ..., n), X_{i,i+1} (i = 1, ..., n−1) and z_i (i = 1, ..., n−2), and 2n − 1 linear matrix inequalities of size 2 × 2. Since the original SDP (2.28) involves an n × n matrix variable X and an n × n matrix inequality M(X) ⪰ 0, we can expect to solve the SDP (2.29) much more efficiently than the SDP (2.28) as n becomes larger. We can formulate both SDPs in terms of a dual standard form for SeDuMi [95]: maximize b^T y subject to c − A^T y ⪰ 0, where b ∈ R^l, A ∈ R^{l×m} and c ∈ R^m for some positive integers l and m. Table 2.1 shows numerical results on the SDPs (2.28) and (2.29) solved by SeDuMi. We observe that the SDP (2.29) greatly reduces the size of the coefficient matrix A, the number of nonzeros in A and the maximum SDP block size compared to the original SDP (2.28).
In addition, it should be emphasized that the l × l Schur complement matrix is sparse in the SDP (2.29), while it is fully dense in the original SDP (2.28). As shown in Figure 2.1, the Schur complement matrix in the SDP (2.29) allows a very sparse Cholesky factorization. The sparsity of the Schur complement matrix is characterized by the correlative sparsity from 2.1.4. Notice a hidden correlative sparsity in the SDP (2.28): each element X_{ij} of the matrix variable X appears at most once in the elements of M(X). This leads to the correlative sparsity when the SDP (2.28) is decomposed into the SDP (2.29). The sparsity of the Schur complement matrix and the reduction of the size of the matrix variable from 10000 to 2 are the main reasons that SeDuMi can solve the largest SDP in Table 2.1, with a 29997 × 79992 coefficient matrix A, in less than 100 seconds.

    n     | the SDP (2.28)                            | the SDP (2.29)
    10    | 0.2 (55×200, 128, 10, 3025)               | 0.1 (27×72, 80, 2, 161)
    100   | 1091.4 (5050×20000, 10298, 100, 25502500) | 0.6 (297×792, 890, 2, 1871)
    1000  | OOM                                       | 6.3 (2997×7992, 8990, 2, 18971)
    10000 | OOM                                       | 99.2 (29997×79992, 89990, 2, 189971)

Table 2.1: Numerical results on the SDPs (2.28) and (2.29). Each entry gives the SeDuMi CPU time in seconds followed by (sizeA, nnzA, maxBl, nnzSchur), where sizeA denotes the size of the coefficient matrix A, nnzA the number of nonzero elements in A, maxBl the maximum SDP block size, and nnzSchur the number of nonzeros in the Schur complement matrix. OOM means an out of memory error.

[Figure 2.1: The sparsity pattern of the Cholesky factor of the Schur complement matrix for the SDP (2.29) with n = 10 (nz = 94) and n = 100 (nz = 1084).]

2.2.2 Positive semidefinite matrix completion

A problem of positive semidefinite matrix completion is: given an n × n partial symmetric matrix X with entries specified in a proper subset F of N × N, where N = {1, ..., n}, find an X̄ ∈ S^n_+ satisfying X̄_{ij} = X_{ij} ((i, j) ∈ F), if it exists. If X̄ is a solution of this problem, we say that X is completed to the positive semidefinite symmetric matrix X̄. For example, the 3 × 3 partial symmetric matrix
\[
X = \begin{pmatrix} 3 & 3 & \\ 3 & 3 & 2 \\ & 2 & 2 \end{pmatrix}
\]
is completed to the 3 × 3 positive semidefinite symmetric matrix
\[
\bar X = \begin{pmatrix} 3 & 3 & 2 \\ 3 & 3 & 2 \\ 2 & 2 & 2 \end{pmatrix}.
\]
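A quick numerical check of this example in MATLAB, with the completed matrix entered directly:

    % Minimal check: the completion above is PSD; the unspecified entry (1,3)
    % was the only one free to choose, all specified entries are preserved.
    Xbar = [3 3 2; 3 3 2; 2 2 2];
    assert(min(eig(Xbar)) >= -1e-12)   % all eigenvalues are nonnegative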
For a class of problems of positive semidefinite matrix completion, we discuss the existence of a solution and its characterization in this section. This provides a theoretical basis for both d- and r-space conversion methods.

Let us use a graph G(N, E) with the node set N and an edge set E ⊆ N × N to describe a class of n × n partial symmetric matrices. We assume that (i, i) ∉ E, i.e., the graph G(N, E) has no loops. We also assume that if (i, j) ∈ E, then (j, i) ∈ E, and (i, j) and (j, i) are interchangeably identified. Define
E• = E ∪ {(i, i) : i ∈ N},
S^n(E, ?) = the set of n × n partial symmetric matrices with entries specified in E•,
S^n_+(E, ?) = {X ∈ S^n(E, ?) : ∃X̄ ∈ S^n_+ ; X̄_{ij} = X_{ij} if (i, j) ∈ E•} (the set of n × n partial symmetric matrices with entries specified in E• that can be completed to positive semidefinite symmetric matrices).

For the graph G(N, E) shown in Figure 2.2 as an illustrative example, we have
\[
S^6(E, ?) = \left\{ \begin{pmatrix}
X_{11} & & & & & X_{16} \\
 & X_{22} & & & & X_{26} \\
 & & X_{33} & X_{34} & & X_{36} \\
 & & X_{43} & X_{44} & X_{45} & \\
 & & & X_{54} & X_{55} & X_{56} \\
X_{61} & X_{62} & X_{63} & & X_{65} & X_{66}
\end{pmatrix} : X_{ij} \in \mathbb{R}\ \big((i,j) \in E^\bullet\big) \right\}, \tag{2.30}
\]
where blank entries are unspecified. Let
#C = the number of elements in C, for every C ⊆ N,
S^C = {X ∈ S^n : X_{ij} = 0 if (i, j) ∉ C × C}, for every C ⊆ N,
S^C_+ = {X ∈ S^C : X ⪰ 0}, for every C ⊆ N,
X(C) = the X̃ ∈ S^C such that X̃_{ij} = X_{ij} ((i, j) ∈ C × C), for every X ∈ S^n and every C ⊆ N,
J(C) = {(i, j) ∈ C × C : 1 ≤ i ≤ j ≤ n}, for every C ⊆ N.

Note that X ∈ S^C is an n × n matrix although X_{ij} = 0 for every (i, j) ∉ C × C. Thus, X ∈ S^C and X′ ∈ S^{C′} can be added even when C and C′ are distinct subsets of N. When all matrices involved in an equality or a matrix inequality belong to S^C, matrices in S^C are frequently identified with the #C × #C matrix whose elements are indexed with (i, j) ∈ C × C. If N = {1, 2, 3} and C = {1, 3}, then a matrix variable X ∈ S^C ⊂ S^n has full and compact representations as follows:
\[
X = \begin{pmatrix} X_{11} & 0 & X_{13} \\ 0 & 0 & 0 \\ X_{31} & 0 & X_{33} \end{pmatrix}
\quad \text{and} \quad
X = \begin{pmatrix} X_{11} & X_{13} \\ X_{31} & X_{33} \end{pmatrix}.
\]
It should be noted that X ∈ S^C ⊂ S^n has elements X_{ij} with (i, j) ∈ C × C in the 2 × 2 compact representation on the right.

Let E_{ij} = the n × n symmetric matrix with 1 in the (i, j)th and (j, i)th elements and 0 elsewhere, for every (i, j) ∈ N × N. Then E_{ij} (1 ≤ i ≤ j ≤ n) form a basis of S^n. Obviously, if i, j ∈ C ⊆ N, then E_{ij} ∈ S^C. We also observe the identity
\[
X(C) = \sum_{(i,j) \in J(C)} E_{ij} X_{ij} \quad \text{for every } C \subseteq N. \tag{2.31}
\]
This identity is utilized in 2.2.3. With these notations we can now state the result from matrix completion which forms the basis for our d-space and r-space conversion techniques. Let G(N, E) be a graph and C_k (k = 1, ..., p) be its maximal cliques. We assume that X ∈ S^n(E, ?). The condition X(C_k) ∈ S^{C_k}_+ (k = 1, 2, ..., p) is necessary for X ∈ S^n_+(E, ?). For the graph G(N, E) shown in Figure 2.2, the maximal cliques are C_1 = {1, 6}, C_2 = {2, 6}, C_3 = {3, 4}, C_4 = {3, 6}, C_5 = {4, 5} and C_6 = {5, 6}. Hence, the necessary condition for X ∈ S^6(E, ?) to be completed to a positive semidefinite matrix is that its 6 principal submatrices X(C_k) (k = 1, 2, ..., 6) are positive semidefinite. Although this condition is not sufficient in general, it is a sufficient condition for X ∈ S^n_+(E, ?) when G(N, E) is chordal. As stated in 2.1.4, in this case the number of maximal cliques is bounded by the number of nodes of G(N, E), i.e., p ≤ n. In general we have the following result.

Lemma 2.2 Let C_k (k = 1, 2, ..., p) be the maximal cliques of a chordal graph G(N, E). Suppose that X ∈ S^n(E, ?). Then X ∈ S^n_+(E, ?) if and only if X(C_k) ∈ S^{C_k}_+ (k = 1, 2, ..., p).

Proof: Cf. [28].

Since the graph G(N, E) in Figure 2.2 is not a chordal graph, we cannot apply Lemma 2.2 to determine whether X ∈ S^6(E, ?) of the form (2.30) belongs to S^6_+(E, ?). In such a case, we need to introduce a chordal extension of the graph G(N, E) to use the lemma effectively. Figure 2.3 shows two chordal extensions. If we choose the left graph as a chordal extension G(N, Ē) of G(N, E), the maximal cliques are C_1 = {3, 4, 6}, C_2 = {4, 5, 6}, C_3 = {1, 6} and C_4 = {2, 6}; consequently, X ∈ S^6_+(Ē, ?) is characterized by X(C_k) ∈ S^{C_k}_+ (k = 1, 2, 3, 4).

[Figure 2.2: A graph G(N, E) with N = {1, 2, 3, 4, 5, 6}.]

[Figure 2.3: Chordal extensions of the graph G(N, E) given in Figure 2.2. (a) The maximal cliques are C_1 = {3, 4, 6}, C_2 = {4, 5, 6}, C_3 = {1, 6} and C_4 = {2, 6}. (b) The maximal cliques are C_1 = {3, 4, 5}, C_2 = {3, 5, 6}, C_3 = {1, 6} and C_4 = {2, 6}.]
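A minimal MATLAB sketch of the "only if" direction of Lemma 2.2 for the chordal extension (a): sampling a completable partial matrix from a full PSD matrix, all clique submatrices must be PSD. The sampling construction is for illustration only.

    % Minimal sketch: clique condition of Lemma 2.2 for extension (a).
    C = {[3 4 6], [4 5 6], [1 6], [2 6]};    % maximal cliques of extension (a)
    B = randn(6); Xfull = B * B';             % PSD, hence completable, sample
    ok = true;
    for k = 1:numel(C)
      ok = ok && min(eig(Xfull(C{k}, C{k}))) > -1e-10;   % X(C_k) PSD?
    end
    fprintf('All clique submatrices PSD: %d\n', ok);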
Remark 2.2 To compute the positive definite matrix completion of a matrix, we can recursively apply Lemma 2.6 of [21]. A numerical example is shown on page 657 of [21].

2.2.3 Exploiting the domain-space sparsity

In this section, we consider a general nonlinear optimization problem involving a matrix variable X ∈ S^n:
\[
\text{minimize } f_0(x, X) \quad \text{subject to } f(x, X) \in \Omega \text{ and } X \in S^n_+, \tag{2.32}
\]
where f_0 : R^s × S^n → R, f : R^s × S^n → R^m and Ω ⊂ R^m. Let E denote the set of distinct row and column index pairs (i, j) such that the value of X_{ij} is necessary to evaluate f_0(x, X) and/or f(x, X). More precisely, (i, j) belongs to E if and only if f_0(x, X^1) ≠ f_0(x, X^2) and/or f(x, X^1) ≠ f(x, X^2) hold for some x ∈ R^s and some X^1, X^2 ∈ S^n with X^1_{kl} = X^2_{kl} for all (k, l) ≠ (i, j). Consider the graph G(N, E). We call E the d-space sparsity pattern and G(N, E) the d-space sparsity pattern graph. If G(N, Ē) is an extension of G(N, E), then we may replace the condition X ∈ S^n_+ by X ∈ S^n_+(Ē, ?). To apply Lemma 2.2, we choose a chordal extension G(N, Ē) of G(N, E). Let C_1, C_2, ..., C_p be its maximal cliques. Then we may regard f_0 and f as functions in x ∈ R^s and X(C_k) (k = 1, 2, ..., p), i.e., there are functions f̃_0 and f̃ in the variables x and X(C_k) (k = 1, 2, ..., p) such that
\[
\begin{array}{ll}
f_0(x, X) = \tilde f_0(x, X(C_1), X(C_2), \ldots, X(C_p)) & \text{for every } (x, X) \in \mathbb{R}^s \times S^n, \\
f(x, X) = \tilde f(x, X(C_1), X(C_2), \ldots, X(C_p)) & \text{for every } (x, X) \in \mathbb{R}^s \times S^n.
\end{array} \tag{2.33}
\]
Therefore, the problem (2.32) is equivalent to
\[
\begin{array}{ll}
\text{minimize} & \tilde f_0(x, X(C_1), X(C_2), \ldots, X(C_p)) \\
\text{subject to} & \tilde f(x, X(C_1), X(C_2), \ldots, X(C_p)) \in \Omega \ \text{ and } \ X(C_k) \in S^{C_k}_+ \ (k = 1, 2, \ldots, p).
\end{array} \tag{2.34}
\]

As an illustrative example, we consider the problem whose d-space sparsity pattern graph G(N, E) is the graph shown in Figure 2.2:
\[
\begin{array}{ll}
\text{minimize} & -\sum_{(i,j) \in E,\ i<j} X_{ij} \\
\text{subject to} & \sum_{i=1}^{6} (X_{ii} - \alpha_i)^2 \leq 6, \quad X \in S^6_+,
\end{array} \tag{2.35}
\]
where α_i > 0 (i = 1, 2, ..., 6). As a chordal extension, we choose the graph G(N, Ē) in (a) of Figure 2.3. Then the problem (2.34) becomes
\[
\begin{array}{ll}
\text{minimize} & \sum_{k=1}^{4} \tilde f_{0k}(X(C_k)) \\
\text{subject to} & \sum_{k=1}^{4} \tilde f_k(X(C_k)) \leq 6, \quad X(C_k) \in S^{C_k}_+ \ (k = 1, 2, 3, 4),
\end{array} \tag{2.36}
\]
where
\[
\begin{array}{l}
\tilde f_{01}(X(C_1)) = -X_{34} - X_{36}, \quad \tilde f_{02}(X(C_2)) = -X_{45} - X_{56}, \\
\tilde f_{03}(X(C_3)) = -X_{16}, \quad \tilde f_{04}(X(C_4)) = -X_{26}, \\
\tilde f_1(X(C_1)) = (X_{33} - \alpha_3)^2 + (X_{44} - \alpha_4)^2 + (X_{66} - \alpha_6)^2, \\
\tilde f_2(X(C_2)) = (X_{55} - \alpha_5)^2, \quad \tilde f_3(X(C_3)) = (X_{11} - \alpha_1)^2, \quad \tilde f_4(X(C_4)) = (X_{22} - \alpha_2)^2.
\end{array} \tag{2.37}
\]

The positive semidefinite condition X(C_k) ∈ S^{C_k}_+ (k = 1, 2, ..., p) in the problem (2.34) is not an ordinary positive semidefinite condition, in the sense that overlapping variables X_{ij} ((i, j) ∈ (C_k ∩ C_l) × (C_k ∩ C_l)) exist in two distinct positive semidefinite constraints X(C_k) ∈ S^{C_k}_+ and X(C_l) ∈ S^{C_l}_+ whenever C_k ∩ C_l ≠ ∅. We describe two methods to transform the condition into an ordinary positive semidefinite condition. The first one was given in the papers [21, 68], where a d-space conversion method was proposed, and the second one was originally used for the sparse SDP relaxation of polynomial optimization problems [102, 103] and also in the paper [43], where a d-space conversion method was applied to an SDP relaxation of a sensor network localization problem. We call the first one the d-space conversion method using clique trees and the second one the d-space conversion method using basis representation.
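Before turning to the two conversion methods, a small numerical sanity check of the clique-wise splitting (2.36)–(2.37) of the objective of (2.35), on random symmetric test data:

    % Minimal check: the clique-wise objective terms of (2.37) sum to the
    % original objective of (2.35).
    E = [1 6; 2 6; 3 4; 3 6; 4 5; 5 6];        % edges of G(N, E) in Figure 2.2
    X = randn(6); X = (X + X') / 2;             % random symmetric test data
    orig = 0;
    for e = 1:size(E, 1)
      orig = orig - X(E(e, 1), E(e, 2));        % objective of (2.35)
    end
    dec = (-X(3,4) - X(3,6)) + (-X(4,5) - X(5,6)) + (-X(1,6)) + (-X(2,6));
    assert(abs(orig - dec) < 1e-12)             % f0 = f01 + f02 + f03 + f04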
The d-space conversion method using clique trees

We can replace X(C_k) (k = 1, 2, ..., p) by p independent matrix variables X^k (k = 1, 2, ..., p) if we add all equality constraints X^k_{ij} = X^l_{ij} for every (i, j) ∈ (C_k ∩ C_l) × (C_k ∩ C_l) with i ≤ j and every pair of C_k and C_l such that C_k ∩ C_l ≠ ∅. For the chordal graph G(N, Ē) given in (a) of Figure 2.3, those equalities turn out to be the 8 equalities
\[
X^k_{66} - X^l_{66} = 0 \ (1 \leq k < l \leq 4), \quad X^1_{44} = X^2_{44}, \quad X^1_{46} = X^2_{46}.
\]
These equalities are linearly dependent, and we can choose a maximal number of linearly independent equalities that are equivalent to the original equalities. For example, either the set of 5 equalities
\[
X^1_{44} - X^2_{44} = 0,\ X^1_{46} - X^2_{46} = 0,\ X^1_{66} - X^2_{66} = 0,\ X^1_{66} - X^3_{66} = 0,\ X^1_{66} - X^4_{66} = 0 \tag{2.38}
\]
or the set of 5 equalities
\[
X^1_{44} - X^2_{44} = 0,\ X^1_{46} - X^2_{46} = 0,\ X^1_{66} - X^2_{66} = 0,\ X^2_{66} - X^3_{66} = 0,\ X^3_{66} - X^4_{66} = 0 \tag{2.39}
\]
is equivalent to the set of 8 equalities above.

In general, we use a clique tree T(K, E) with K = {C_1, C_2, ..., C_p} and E ⊆ K × K to consistently choose a set of a maximal number of linearly independent equalities. Here T(K, E) is called a clique tree if it satisfies the clique-intersection property, that is, for each pair of nodes C_k ∈ K and C_l ∈ K, the set C_k ∩ C_l is contained in every node on the (unique) path connecting C_k and C_l. See [4] for basic properties of clique trees. We fix one clique as the root node of the tree T(K, E), say C_1. For simplicity, we assume that the nodes C_2, ..., C_p are indexed so that if a sequence of nodes C_1, C_{l_2}, ..., C_{l_k} forms a path from the root node C_1 to a leaf node C_{l_k}, then 1 < l_2 < ... < l_k, and each edge is directed from the node with the smaller index to the node with the larger index. Thus, the clique tree T(K, E) is directed from the root node C_1 to its leaf nodes. Each edge (C_k, C_l) of the clique tree T(K, E) induces a set of equalities
\[
X^k_{ij} - X^l_{ij} = 0 \ \big((i,j) \in J(C_k \cap C_l)\big), \quad \text{or equivalently,} \quad E_{ij} \bullet X^k - E_{ij} \bullet X^l = 0 \ \big((i,j) \in J(C_k \cap C_l)\big),
\]
where J(C) = {(i, j) ∈ C × C : i ≤ j} for every C ⊆ N. We add equalities of the form above for all (C_k, C_l) ∈ E when we replace X(C_k) (k = 1, 2, ..., p) by p independent matrix variables X^k (k = 1, 2, ..., p). We thus obtain a problem
\[
\begin{array}{ll}
\text{minimize} & \tilde f_0(x, X^1, X^2, \ldots, X^p) \\
\text{subject to} & \tilde f(x, X^1, X^2, \ldots, X^p) \in \Omega, \\
 & E_{ij} \bullet X^k - E_{ij} \bullet X^l = 0 \ \big((i,j,k,l) \in \Lambda\big), \\
 & X^k \in S^{C_k}_+ \ (k = 1, 2, \ldots, p),
\end{array} \tag{2.40}
\]
where
\[
\Lambda = \{(g, h, k, l) : (g, h) \in J(C_k \cap C_l),\ (C_k, C_l) \in \mathcal{E}\}. \tag{2.41}
\]
This is equivalent to the problem (2.34). See Section 4 of [68] for more details.

Now we illustrate the conversion process above with the simple example (2.35). Figure 2.4 shows two clique trees for the graph given in (a) of Figure 2.3. The left clique tree in Figure 2.4 leads to the 5 equalities in (2.38), while the right clique tree induces the 5 equalities in (2.39). In both cases, the problem (2.40) takes the form
\[
\begin{array}{ll}
\text{minimize} & \sum_{k=1}^{4} \hat f_{0k}(X^k) \\
\text{subject to} & \sum_{k=1}^{4} \hat f_k(X^k) \leq 6, \ \text{the 5 equalities in (2.38) or (2.39)}, \ X^k \in S^{C_k}_+ \ (k = 1, 2, 3, 4),
\end{array}
\]
where
\[
\begin{array}{l}
\hat f_{01}(X^1) = -X^1_{34} - X^1_{36}, \quad \hat f_{02}(X^2) = -X^2_{45} - X^2_{56}, \\
\hat f_{03}(X^3) = -X^3_{16}, \quad \hat f_{04}(X^4) = -X^4_{26}, \\
\hat f_1(X^1) = (X^1_{33} - \alpha_3)^2 + (X^1_{44} - \alpha_4)^2 + (X^1_{66} - \alpha_6)^2, \\
\hat f_2(X^2) = (X^2_{55} - \alpha_5)^2, \quad \hat f_3(X^3) = (X^3_{11} - \alpha_1)^2, \quad \hat f_4(X^4) = (X^4_{22} - \alpha_2)^2.
\end{array}
\]

[Figure 2.4: Two clique trees with K = {C_1 = {3, 4, 6}, C_2 = {4, 5, 6}, C_3 = {1, 6}, C_4 = {2, 6}}.]
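The index set Λ of (2.41) can be enumerated directly from the clique-tree edges. A minimal MATLAB sketch for the left clique tree of Figure 2.4, reproducing the 5 equalities (2.38):

    % Minimal sketch: enumerate Lambda of (2.41) from the clique-tree edges
    % (here the left tree of Figure 2.4, with edges C1->C2, C1->C3, C1->C4).
    C = {[3 4 6], [4 5 6], [1 6], [2 6]};
    treeEdges = [1 2; 1 3; 1 4];               % (C_k, C_l) pairs
    Lambda = [];
    for e = 1:size(treeEdges, 1)
      k = treeEdges(e, 1); l = treeEdges(e, 2);
      S = intersect(C{k}, C{l});               % shared indices C_k ∩ C_l
      for a = 1:numel(S)
        for b = a:numel(S)                     % pairs (i, j) with i <= j
          Lambda(end+1, :) = [S(a) S(b) k l];  %#ok<AGROW>
        end
      end
    end
    disp(Lambda)   % rows (i, j, k, l): equality E_ij . X^k - E_ij . X^l = 0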
Remark 2.3 The d-space conversion method using clique trees can be implemented in many different ways. The fact that the chordal extension G(N, Ē) of G(N, E) is not unique offers flexibility in constructing an optimization problem of the form (2.40). More precisely, the choice of a chordal extension G(N, Ē) of G(N, E) decides how "small" and "sparse" an optimization problem of the form (2.40) is, which is an important issue for solving the problem efficiently. For the size of the problem (2.40), we need to consider the sizes of the matrix variables X^k (k = 1, 2, ..., p) and the number of equalities in (2.40). Note that the sizes of the matrix variables X^k (k = 1, 2, ..., p) are determined by the sizes of the maximal cliques C_k (k = 1, 2, ..., p). This indicates that a chordal extension G(N, Ē) with smaller maximal cliques C_k (k = 1, 2, ..., p) may be better theoretically. (In computation, however, this is not necessarily true because of the overhead of processing too many small positive semidefinite matrix variables.) The number of equalities in (2.40), or the cardinality of Λ, is also determined by the chordal extension G(N, Ē) of G(N, E). Choosing a chordal extension G(N, Ē) with smaller maximal cliques increases the number of equalities. Balancing these two contradicting targets, decreasing the sizes of the matrix variables and decreasing the number of equalities, was studied in the paper [68] by combining some adjacent cliques along the clique tree T(K, E). See Section 4 of [68] for more details. In addition to the choice of a chordal extension G(N, Ē) of G(N, E), the representation of the functions and the choice of a clique tree add flexibility to the construction of the problem (2.40). That is, the representation of the functions f_0 : R^s × S^n → R and f : R^s × S^n → R^m in the vector variable x and the matrix variables X(C_k) (k = 1, 2, ..., p) as in (2.33) is not unique; for example, we could move the term (X_{66} − α_6)^2 from f̃_1(x, X(C_1)) to any of the f̃_k(x, X(C_k)) (k = 2, 3, 4). These choices of the functions f_0, f and of a clique tree affect the sparse structure of the resulting problem (2.40), which is also important for efficient computation.

The domain-space conversion method using basis representation

Define
\[
\begin{array}{rcl}
\bar J &=& \bigcup_{k=1}^{p} J(C_k), \\
(X_{ij} : (i,j) \in \bar J) &=& \text{the vector variable consisting of } X_{ij} \ ((i,j) \in \bar J), \\
\bar f_0(x, (X_{ij} : (i,j) \in \bar J)) &=& f_0(x, X) \quad \text{for every } (x, X) \in \mathbb{R}^s \times S^n, \\
\bar f(x, (X_{ij} : (i,j) \in \bar J)) &=& f(x, X) \quad \text{for every } (x, X) \in \mathbb{R}^s \times S^n.
\end{array}
\]
We represent each X(C_k) as a linear combination of the basis E_{ij} ((i, j) ∈ J(C_k)) of the space S^{C_k}, as in (2.31) with C = C_k (k = 1, 2, ..., p). Substituting this basis representation into the problem (2.34), we obtain
\[
\begin{array}{ll}
\text{minimize} & \bar f_0(x, (X_{ij} : (i,j) \in \bar J)) \\
\text{subject to} & \bar f(x, (X_{ij} : (i,j) \in \bar J)) \in \Omega, \\
 & \sum_{(i,j) \in J(C_k)} E_{ij} X_{ij} \in S^{C_k}_+ \ (k = 1, 2, \ldots, p).
\end{array} \tag{2.42}
\]
We observe that the illustrative example (2.35) is converted into the problem
\[
\begin{array}{ll}
\text{minimize} & -\sum_{(i,j) \in E,\ i<j} X_{ij} \\
\text{subject to} & \sum_{i=1}^{6} (X_{ii} - \alpha_i)^2 \leq 6, \\
 & \sum_{(i,j) \in J(C_k)} E_{ij} X_{ij} \in S^{C_k}_+ \ (k = 1, 2, 3, 4).
\end{array} \tag{2.43}
\]

Remark 2.4 Compared to the d-space conversion method using clique trees, the d-space conversion method using basis representation described above provides limited flexibility. To make the size of the problem (2.42) smaller, we need to select a chordal extension G(N, Ē) of G(N, E) with smaller maximal cliques C_k (k = 1, 2, ..., p). As a result, the sizes of the semidefinite constraints become smaller.
As we mentioned in Remark 2.3, however, too many small positive semidefinite matrix variables may yield heavy overhead in computation.

2.2.4 Duality in positive semidefinite matrix completion

In order to present the r-space conversion methods in the next section, we need to derive some results which can be understood as a dual approach to the positive semidefinite matrix completion of 2.2.2. Throughout this section, we assume that G(N, E) denotes a chordal graph. In Lemma 2.2, we have described a necessary and sufficient condition for a partial symmetric matrix X ∈ S^n(E, ?) to be completed to a positive semidefinite symmetric matrix. Let
\[
S^n(E, 0) = \{A \in S^n : A_{ij} = 0 \text{ if } (i,j) \notin E^\bullet\}, \qquad
S^n_+(E, 0) = \{A \in S^n(E, 0) : A \succeq 0\}.
\]
In this section, we derive a necessary and sufficient condition for a symmetric matrix A ∈ S^n(E, 0) to be positive semidefinite, i.e., A ∈ S^n_+(E, 0). This condition is used for the range-space conversion methods in 2.2.5. We note that these two issues have a primal-dual relationship:
\[
A \in S^n_+(E, 0) \quad \text{if and only if} \quad \sum_{(i,j) \in E^\bullet} A_{ij} X_{ij} \geq 0 \ \text{ for every } X \in S^n_+(E, ?). \tag{2.44}
\]
Suppose A ∈ S^n(E, 0). Let C_1, C_2, ..., C_p be the maximal cliques of G(N, E). Then, we can consistently decompose A ∈ S^n(E, 0) into Ã^k ∈ S^{C_k} (k = 1, 2, ..., p) such that A = Σ_{k=1}^p Ã^k. We know that A is positive semidefinite if and only if A • X ≥ 0 for every X ∈ S^n_+. This relation and Lemma 2.2 are used in the following. Since A ∈ S^n(E, 0), this condition can be relaxed to the condition (2.44). Therefore, A is positive semidefinite if and only if the following SDP has the optimal value 0:
\[
\text{minimize } \sum_{(i,j) \in E^\bullet} \Big[ \sum_{k=1}^{p} \tilde A^k \Big]_{ij} X_{ij} \quad \text{subject to } X \in S^n_+(E, ?). \tag{2.45}
\]
We can rewrite the objective function as
\[
\sum_{(i,j) \in E^\bullet} \Big[ \sum_{k=1}^{p} \tilde A^k \Big]_{ij} X_{ij}
= \sum_{k=1}^{p} \sum_{(i,j) \in E^\bullet} \tilde A^k_{ij} X_{ij}
= \sum_{k=1}^{p} \tilde A^k \bullet X(C_k) \quad \text{for every } X \in S^n(E, ?).
\]
Note that the second equality follows from Ã^k ∈ S^{C_k} (k = 1, 2, ..., p). Applying Lemma 2.2 to the constraint X ∈ S^n_+(E, ?) of the SDP (2.45), we obtain an SDP
\[
\text{minimize } \sum_{k=1}^{p} \tilde A^k \bullet X(C_k) \quad \text{subject to } X(C_k) \in S^{C_k}_+ \ (k = 1, 2, \ldots, p), \tag{2.46}
\]
which is equivalent to the SDP (2.45). The SDP (2.46) involves multiple positive semidefinite matrix variables with overlapping elements. We have described two methods in 2.2.3 to convert such multiple matrix variables into independent ones with no overlapping elements. We apply the conversion method using clique trees to the SDP (2.46). Let T(K, E) be a clique tree with K = {C_1, C_2, ..., C_p} and E ⊆ K × K. Then, we obtain an SDP
\[
\begin{array}{ll}
\text{minimize} & \sum_{k=1}^{p} \tilde A^k \bullet X^k \\
\text{subject to} & E_{ij} \bullet X^k - E_{ij} \bullet X^l = 0 \ \big((i,j,k,l) \in \Lambda\big), \quad X^k \in S^{C_k}_+ \ (k = 1, 2, \ldots, p),
\end{array} \tag{2.47}
\]
which is equivalent to the SDP (2.46). Here Λ is given in (2.41).

Theorem 2.15 A ∈ S^n(E, 0) is positive semidefinite if and only if the system of LMIs
\[
\tilde A^k - \tilde L_k(z) \succeq 0 \quad (k = 1, 2, \ldots, p) \tag{2.48}
\]
has a solution z = (z_{ghkl} : (g, h, k, l) ∈ Λ). Here z denotes a vector variable consisting of z_{ghkl} ((g, h, k, l) ∈ Λ), and
\[
\tilde L_k(z) = - \sum_{(i,j,h):\, (i,j,h,k) \in \Lambda} E_{ij}\, z_{ijhk} + \sum_{(i,j,l):\, (i,j,k,l) \in \Lambda} E_{ij}\, z_{ijkl}
\quad \text{for every } z = (z_{ijkl} : (i,j,k,l) \in \Lambda) \ (k = 1, 2, \ldots, p). \tag{2.49}
\]

Proof: In the previous discussion, we have shown that A ∈ S^n(E, 0) is positive semidefinite if and only if the SDP (2.47) has the optimal value 0. The dual of the SDP (2.47) is
\[
\text{maximize } 0 \quad \text{subject to } (2.48). \tag{2.50}
\]
The primal SDP (2.47) attains the objective value 0 at the trivial feasible solution (X^1, X^2, ..., X^p) = (O, O, ..., O). If the dual SDP (2.50) is feasible, or equivalently the system of LMIs (2.48) has a solution, then the primal SDP (2.47) has the optimal value 0 by the weak duality theorem. Thus we have shown the "if" part of the theorem. Now suppose that the primal SDP (2.47) has the optimal value 0. The primal SDP (2.47) has an interior feasible solution; for example, take X^k to be the #C_k × #C_k identity matrix in S^{C_k} (k = 1, 2, ..., p). By the strong duality theorem (Theorem 4.2.1 of [69]), the optimal value of the dual SDP (2.50) is zero, which implies that (2.50) is feasible.

As a corollary, we obtain the following result (Theorem 2.3 of [1]).

Theorem 2.16 A ∈ S^n(E, 0) is positive semidefinite if and only if there exist Y^k ∈ S^{C_k}_+ (k = 1, 2, ..., p) which decompose A as A = Σ_{k=1}^p Y^k.

Proof: Since the "if" part is straightforward, we prove the "only if" part. Assume that A is positive semidefinite. By Theorem 2.15, the LMI (2.48) has a solution z̃. Let Y^k = Ã^k − L̃_k(z̃) (k = 1, 2, ..., p). Then Y^k ∈ S^{C_k}_+ (k = 1, 2, ..., p). Since Σ_{k=1}^p L̃_k(z̃) = 0 by construction, we obtain the desired result.

Conversely, Theorem 2.15 can be derived from Theorem 2.16. In the paper [1], Theorem 2.16 was proved by Theorem 7 of Grone et al. [28] (Lemma 2.2 in this thesis).

We conclude this section by applying Theorem 2.15 to the case of the chordal graph G(N, Ē) given in (a) of Figure 2.3. The maximal cliques are C_1 = {3, 4, 6}, C_2 = {4, 5, 6}, C_3 = {1, 6} and C_4 = {2, 6}, so that A ∈ S^6(Ē, 0) is decomposed into the 4 matrices
\[
\tilde A^1 = \begin{pmatrix} A_{33} & A_{34} & A_{36} \\ A_{43} & A_{44} & A_{46} \\ A_{63} & A_{64} & A_{66} \end{pmatrix} \in S^{\{3,4,6\}}, \quad
\tilde A^2 = \begin{pmatrix} 0 & A_{45} & 0 \\ A_{54} & A_{55} & A_{56} \\ 0 & A_{65} & 0 \end{pmatrix} \in S^{\{4,5,6\}},
\]
\[
\tilde A^3 = \begin{pmatrix} A_{11} & A_{16} \\ A_{61} & 0 \end{pmatrix} \in S^{\{1,6\}}, \quad
\tilde A^4 = \begin{pmatrix} A_{22} & A_{26} \\ A_{62} & 0 \end{pmatrix} \in S^{\{2,6\}} \tag{2.51}
\]
in the compact representation; as elements of S^6, each Ã^k is the 6 × 6 matrix carrying these entries in the rows and columns indexed by C_k and zeros elsewhere. We note that this decomposition is not unique. For example, we can move the (6, 6) element A_{66} from Ã^1 to any other Ã^k. We showed two clique trees with K = {C_1, C_2, C_3, C_4} in Figure 2.4. For the left clique tree, we have
\[
\Lambda = \{(4,4,1,2),\ (4,6,1,2),\ (6,6,1,2),\ (6,6,1,3),\ (6,6,1,4)\}.
\]
Thus, the system of LMIs (2.48) becomes
\[
\begin{pmatrix} A_{33} & A_{34} & A_{36} \\ A_{43} & A_{44} - z_{4412} & A_{46} - z_{4612} \\ A_{63} & A_{64} - z_{4612} & A_{66} - z_{6612} - z_{6613} - z_{6614} \end{pmatrix} \succeq 0, \quad
\begin{pmatrix} z_{4412} & A_{45} & z_{4612} \\ A_{54} & A_{55} & A_{56} \\ z_{4612} & A_{65} & z_{6612} \end{pmatrix} \succeq 0,
\]
\[
\begin{pmatrix} A_{11} & A_{16} \\ A_{61} & z_{6613} \end{pmatrix} \succeq 0, \quad
\begin{pmatrix} A_{22} & A_{26} \\ A_{62} & z_{6614} \end{pmatrix} \succeq 0. \tag{2.52}
\]
For the right clique tree, we have Λ = {(4,4,1,2), (4,6,1,2), (6,6,1,2), (6,6,2,3), (6,6,3,4)} and
\[
\begin{pmatrix} A_{33} & A_{34} & A_{36} \\ A_{43} & A_{44} - z_{4412} & A_{46} - z_{4612} \\ A_{63} & A_{64} - z_{4612} & A_{66} - z_{6612} \end{pmatrix} \succeq 0, \quad
\begin{pmatrix} z_{4412} & A_{45} & z_{4612} \\ A_{54} & A_{55} & A_{56} \\ z_{4612} & A_{65} & z_{6612} - z_{6623} \end{pmatrix} \succeq 0,
\]
\[
\begin{pmatrix} A_{11} & A_{16} \\ A_{61} & z_{6623} - z_{6634} \end{pmatrix} \succeq 0, \quad
\begin{pmatrix} A_{22} & A_{26} \\ A_{62} & z_{6634} \end{pmatrix} \succeq 0. \tag{2.53}
\]

2.2.5 Exploiting the range-space sparsity

In this section, we present two range-space conversion methods: the r-space conversion method using clique trees, based on Theorem 2.15, and the r-space conversion method using matrix decomposition, based on Theorem 2.16.
The range-space conversion method using clique trees

Let F = {(i, j) ∈ N × N : M_{ij}(y) ≠ 0 for some y ∈ R^s, i ≠ j}. We call F the r-space sparsity pattern and G(N, F) the r-space sparsity pattern graph of the mapping M : R^s → S^n. Apparently, M(y) ∈ S^n(F, 0) for every y ∈ R^s, but the graph G(N, F) may not be chordal. Let G(N, E) be a chordal extension of G(N, F). Then
\[
M(y) \in S^n(E, 0) \quad \text{for every } y \in \mathbb{R}^s. \tag{2.54}
\]
Let C_1, C_2, ..., C_p be the maximal cliques of G(N, E). To apply Theorem 2.15, we choose mappings M̃_k (k = 1, 2, ..., p) that decompose the mapping M : R^s → S^n such that
\[
M(y) = \sum_{k=1}^{p} \tilde M_k(y) \ \text{ for every } y \in \mathbb{R}^s, \qquad \tilde M_k : \mathbb{R}^s \to S^{C_k} \ (k = 1, 2, \ldots, p). \tag{2.55}
\]
Let T(K, E) be a clique tree where K = {C_1, C_2, ..., C_p} and E ⊆ K × K. By Theorem 2.15, y is a solution of (2.26) if and only if it is a solution of
\[
\tilde M_k(y) - \tilde L_k(z) \succeq 0 \quad (k = 1, 2, \ldots, p) \tag{2.56}
\]
for some z = (z_{ghkl} : (g, h, k, l) ∈ Λ), where Λ is given in (2.41) and L̃_k in (2.49).

We may regard the r-space conversion method using clique trees described above as a dual of the d-space conversion method using clique trees applied to the SDP
\[
\text{minimize } M(y) \bullet X \quad \text{subject to } X \succeq 0, \tag{2.57}
\]
where X ∈ S^n denotes a matrix variable and y ∈ R^s a fixed vector. We know that M(y) ⪰ 0 if and only if the optimal value of the SDP (2.57) is zero, so that (2.57) serves as a dual of the matrix inequality M(y) ⪰ 0. Each element z_{ijkl} of the vector variable z corresponds to a dual variable of the equality constraint E_{ij} • X^k − E_{ij} • X^l = 0 in the problem (2.40), while each matrix variable X^k ∈ S^{C_k} in the problem (2.40) corresponds to a dual matrix variable of the kth matrix inequality M̃_k(y) − L̃_k(z) ⪰ 0.

Remark 2.5 Concerning the flexibility in implementing the r-space conversion method using clique trees, the comments in Remark 2.3 remain valid if we replace the sizes of the matrix variables X^k by the sizes of the mappings M̃_k : R^s → S^{C_k}, and the number of equalities by the number of elements z_{ijkl} of the vector variable z. The correlative sparsity of (2.56) depends on the choice of the clique tree and of the decomposition (2.55).

As an example, we consider the case where M is tridiagonal, i.e., the (i, j)th element M_{ij} of M is zero if |i − j| ≥ 2, to illustrate the range-space conversion of the matrix inequality (2.26) into the system of matrix inequalities (2.56). By letting E = {(i, j) : |i − j| = 1}, we have a simple chordal graph G(N, E) with no cycle satisfying (2.54), with maximal cliques C_k = {k, k+1} (k = 1, 2, ..., n−1), and a clique tree T(K, E) with K = {C_1, C_2, ..., C_{n−1}} and E = {(C_k, C_{k+1}) ∈ K × K : k = 1, 2, ..., n−2}. For every y ∈ R^s, let
\[
\tilde M_k(y) = \begin{pmatrix} M_{kk}(y) & M_{k,k+1}(y) \\ M_{k+1,k}(y) & 0 \end{pmatrix} \in S^{C_k} \ \text{ if } 1 \leq k \leq n-2, \qquad
\tilde M_{n-1}(y) = \begin{pmatrix} M_{n-1,n-1}(y) & M_{n-1,n}(y) \\ M_{n,n-1}(y) & M_{nn}(y) \end{pmatrix} \in S^{C_{n-1}}.
\]
Then, we can decompose M : R^s → S^n(E, 0) into M̃_k : R^s → S^{C_k} (k = 1, 2, ..., n−1) as in (2.55) with p = n − 1. We also see that
\[
\Lambda = \{(k+1, k+1, k, k+1) : k = 1, 2, \ldots, n-2\},
\]
\[
\tilde L_k(z) = \begin{cases}
E_{22}\, z_{2,2,1,2} \in S^{C_1} & \text{if } k = 1, \\
-E_{k,k}\, z_{k,k,k-1,k} + E_{k+1,k+1}\, z_{k+1,k+1,k,k+1} \in S^{C_k} & \text{if } k = 2, 3, \ldots, n-2, \\
-E_{n-1,n-1}\, z_{n-1,n-1,n-2,n-1} \in S^{C_{n-1}} & \text{if } k = n-1.
\end{cases}
\]
Thus the resulting system of matrix inequalities (2.56) is
\[
\begin{pmatrix} M_{11}(y) & M_{12}(y) \\ M_{21}(y) & -z_{2,2,1,2} \end{pmatrix} \succeq 0, \quad
\begin{pmatrix} M_{kk}(y) + z_{k,k,k-1,k} & M_{k,k+1}(y) \\ M_{k+1,k}(y) & -z_{k+1,k+1,k,k+1} \end{pmatrix} \succeq 0 \ (k = 2, 3, \ldots, n-2),
\]
\[
\begin{pmatrix} M_{n-1,n-1}(y) + z_{n-1,n-1,n-2,n-1} & M_{n-1,n}(y) \\ M_{n,n-1}(y) & M_{nn}(y) \end{pmatrix} \succeq 0.
\]
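For a numerically given, strictly positive definite tridiagonal matrix A (a constant mapping M), a feasible z for this system can be written down explicitly by a Schur complement recursion; the pivots s_k below are those of the LDL^T factorization of A. A minimal MATLAB sketch, valid under this positivity assumption:

    % Minimal sketch: explicit z for the converted system when M is a
    % constant, positive definite tridiagonal matrix A.
    n = 6;
    A = full(gallery('tridiag', n, -1, 2.5, -1));  % a PD tridiagonal test matrix
    s = zeros(n-1, 1);                 % pivots s_k of the LDL^T factorization
    z = zeros(n-2, 1);                 % z(k) stands for z_{k+1,k+1,k,k+1}
    s(1) = A(1, 1);
    for k = 1:n-2
      z(k) = -A(k, k+1)^2 / s(k);      % smallest feasible corner entry
      s(k+1) = A(k+1, k+1) + z(k);     % next pivot (Schur complement)
      blk = [s(k), A(k, k+1); A(k+1, k), -z(k)];
      assert(min(eig(blk)) >= -1e-12)  % k-th 2x2 block of the system is PSD
    end
    last = [s(n-1), A(n-1, n); A(n, n-1), A(n, n)];
    assert(min(eig(last)) >= -1e-12)   % final block is PSD as well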
The range-space conversion method using matrix decomposition

By Theorem 2.16, y ∈ R^s is a solution of the matrix inequality (2.26) if and only if there exist Y^k ∈ S^{C_k} (k = 1, 2, ..., p) such that
\[
\sum_{k=1}^{p} Y^k = M(y) \quad \text{and} \quad Y^k \in S^{C_k}_+ \ (k = 1, 2, \ldots, p).
\]
Let J = ∪_{k=1}^p J(C_k) and Γ(i, j) = {k : i ∈ C_k, j ∈ C_k} ((i, j) ∈ J). Then we can rewrite the condition above as
\[
\sum_{k \in \Gamma(i,j)} E_{ij} \bullet Y^k - E_{ij} \bullet M(y) = 0 \ \big((i,j) \in J\big) \quad \text{and} \quad Y^k \in S^{C_k}_+ \ (k = 1, 2, \ldots, p). \tag{2.58}
\]
We may regard the r-space conversion method using matrix decomposition as a dual of the d-space conversion method using basis representation applied to the SDP (2.57) with a fixed y ∈ R^s. Each variable X_{ij} ((i, j) ∈ J) in the problem (2.42) corresponds to a dual real variable of the (i, j)th equality constraint of the problem (2.58), while each matrix variable Y^k in the problem (2.58) corresponds to a dual matrix variable of the constraint Σ_{(i,j)∈J(C_k)} E_{ij} X_{ij} ∈ S^{C_k}_+.

Remark 2.6 Concerning the flexibility in implementing the r-space conversion method using matrix decomposition, the comments in Remark 2.4 remain valid if we replace the sizes of the semidefinite constraints by the sizes of the matrix variables Y^k (k = 1, 2, ..., p).

We illustrate the r-space conversion method using matrix decomposition with the same tridiagonal example as above. In this case, we see that p = n − 1 and
\[
\begin{array}{rcl}
C_k &=& \{k, k+1\} \ (k = 1, 2, \ldots, n-1), \\
J(C_k) &=& \{(k,k),\ (k,k+1),\ (k+1,k+1)\} \ (k = 1, 2, \ldots, n-1), \\
J &=& \{(k,k) : k = 1, 2, \ldots, n\} \cup \{(k,k+1) : k = 1, 2, \ldots, n-1\}, \\
\Gamma(i,j) &=& \begin{cases}
\{1\} & \text{if } i = j = 1, \\
\{k\} & \text{if } i = k,\ j = k+1 \text{ and } 1 \leq k \leq n-1, \\
\{k-1, k\} & \text{if } i = j = k \text{ and } 2 \leq k \leq n-1, \\
\{n-1\} & \text{if } i = j = n.
\end{cases}
\end{array}
\]
Hence, the matrix inequality (2.26) with the tridiagonal M : R^s → S^n is converted into
\[
\begin{array}{l}
E_{11} \bullet Y^1 - E_{11} \bullet M(y) = 0, \\
E_{k,k+1} \bullet Y^k - E_{k,k+1} \bullet M(y) = 0 \ (k = 1, 2, \ldots, n-1), \\
E_{kk} \bullet Y^{k-1} + E_{kk} \bullet Y^k - E_{kk} \bullet M(y) = 0 \ (k = 2, \ldots, n-1), \\
E_{nn} \bullet Y^{n-1} - E_{nn} \bullet M(y) = 0, \\
Y^k = \begin{pmatrix} Y^k_{kk} & Y^k_{k,k+1} \\ Y^k_{k+1,k} & Y^k_{k+1,k+1} \end{pmatrix} \in S^{C_k}_+ \ (k = 1, 2, \ldots, n-1).
\end{array}
\]

2.2.6 Enhancing the correlative sparsity

When we are concerned with the SDP relaxation of polynomial SDPs (including ordinary polynomial optimization problems) and linear SDPs, another type of sparsity, called the correlative sparsity, plays an important role in solving the SDPs efficiently. The correlative sparsity was dealt with extensively in the paper [45] and was introduced in 2.1.4. It is known that the sparse SDP relaxation [51, 102] of a correlatively sparse polynomial optimization problem leads to an SDP that can maintain the sparsity for primal-dual interior-point methods; see Section 6 of [45]. In this section, we focus on how the d-space and r-space conversion methods enhance the correlative sparsity. We consider a polynomial SDP of the form
\[
\text{maximize } f_0(y) \quad \text{subject to } F_k(y) \in S^{m_k}_+ \ (k = 1, \ldots, p). \tag{2.59}
\]
Here f_0 ∈ R[y], and F_k is a mapping from R^n into S^{m_k} with all components polynomial in y ∈ R^n. For simplicity, we assume that f_0 is a linear function of the form f_0(y) = b^T y for some b ∈ R^n. In this case, with the definition from 2.1.4, the correlative sparsity pattern graph is given by the graph G(N, E) with node set N = {1, 2, ..., n} and edge set
\[
E = \{(i, j) \in N \times N : i \neq j, \text{ both values } y_i \text{ and } y_j \text{ are necessary to evaluate the value of } F_k(y) \text{ for some } k\}.
\]
When a chordal extension G(N, Ē) of the correlative sparsity pattern graph G(N, E) is sparse, or all the maximal cliques of G(N, Ē) are small, we can effectively apply the sparse SDP relaxation [51, 102] to the polynomial SDP (2.59). As a result, we obtain a linear SDP satisfying a correlative sparsity characterized by the same chordal graph structure as G(N, Ē). More details can be found in Section 6 of [45]. Even when the correlative sparsity pattern graph G(N, E) or its chordal extension G(N, Ē) is not sparse, the polynomial SDP may have a "hidden correlative sparsity" that can be recognized by applying the d-space and/or r-space conversion methods to the problem, in order to decompose a large matrix variable (and/or inequality) into multiple smaller matrix variables (and/or inequalities). To illustrate this, let us consider a polynomial SDP of the form
\[
\text{minimize } b^T y \quad \text{subject to } F(y) \in S^n_+,
\]
where F denotes a mapping from R^n into S^n defined by
\[
F(y) = \begin{pmatrix}
1 - y_1^4 & y_1 y_2 & & & \\
y_1 y_2 & 1 - y_2^4 & y_2 y_3 & & \\
 & y_2 y_3 & 1 - y_3^4 & \ddots & \\
 & & \ddots & \ddots & y_{n-1} y_n \\
 & & & y_{n-1} y_n & 1 - y_n^4
\end{pmatrix}.
\]
This polynomial SDP is not correlatively sparse at all (i.e., G(N, E) becomes a complete graph), because all variables y_1, y_2, ..., y_n are involved in the single matrix inequality F(y) ∈ S^n_+. Hence, the sparse SDP relaxation (2.18) is not effective for this problem. Applying the r-space conversion method using clique trees to the polynomial SDP under consideration, we obtain a polynomial SDP
\[
\begin{array}{ll}
\text{minimize} & b^T y \\
\text{subject to} &
\begin{pmatrix} 1 - y_1^4 & y_1 y_2 \\ y_1 y_2 & z_1 \end{pmatrix} \succeq 0, \\[4pt]
& \begin{pmatrix} 1 - y_i^4 & y_i y_{i+1} \\ y_i y_{i+1} & -z_{i-1} + z_i \end{pmatrix} \succeq 0 \quad (i = 2, 3, \ldots, n-2), \\[4pt]
& \begin{pmatrix} 1 - y_{n-1}^4 & y_{n-1} y_n \\ y_{n-1} y_n & 1 - y_n^4 - z_{n-2} \end{pmatrix} \succeq 0,
\end{array} \tag{2.60}
\]
which is equivalent to the original polynomial SDP. The resulting polynomial SDP now satisfies the correlative sparsity, as shown in Figure 2.5. Thus the sparse SDP relaxation (2.18) is efficient for solving (2.60).

[Figure 2.5: The correlative sparsity pattern of the polynomial SDP (2.60) with n = 20 (nz = 218), and its Cholesky factor with a symmetric minimum degree ordering of its rows and columns (nz = 128).]

The correlative sparsity is important in linear SDPs, too. We have seen such a case in 2.2.1. We can rewrite the SDP (2.28) as
\[
\begin{array}{ll}
\text{maximize} & -\sum_{i=1}^{n-1} \big( A^0_{ii} X_{ii} + 2 A^0_{i,i+1} X_{i,i+1} \big) - A^0_{nn} X_{nn} \\[4pt]
\text{subject to} & I - \sum_{i=1}^{n} E_{ii} X_{ii} + \sum_{i=1}^{n-1} E_{i,i+1} X_{i,i+1} \succeq 0, \quad \sum_{1 \leq i \leq j \leq n} E_{ij} X_{ij} \succeq 0,
\end{array} \tag{2.61}
\]
where I denotes the n × n identity matrix. Since the coefficient matrices of all real variables X_{ij} (1 ≤ i ≤ j ≤ n) are nonzero in the last constraint, the correlative sparsity pattern graph G(N, E) forms a complete graph. Applying the d-space conversion method using basis representation and the r-space conversion method using clique trees to the original SDP (2.28), we have reduced it to the SDP (2.29) in 2.2.1. We rewrite the constraints of the SDP (2.29) in an ordinary LMI form:
\[
\text{maximize } b^T y \quad \text{subject to } A^k_0 - \sum_{h=1}^{s} A^k_h y_h \succeq 0 \ (k = 1, 2, \ldots, p). \tag{2.62}
\]
Here p = 2n − 2, s = 3n − 3, each A^k_h is a 2 × 2 matrix (k = 1, 2, ..., p, h = 0, 1, ..., 3n−3), b ∈ R^{3n−3}, y ∈ R^{3n−3}, and each element y_h of y corresponds to some X_{ij} or some z_i.
Comparing the SDP (2.61) with the SDP (2.62), we notice that the number of variables is reduced from n(n+1)/2 to 3n − 3, and the maximum size of the matrix inequalities is reduced from n to 2. Furthermore, the correlative sparsity pattern graph becomes sparse; see Figure 2.6.

[Figure 2.6: The correlative sparsity pattern of the SDP (2.62) induced from (2.29) with n = 10 (nz = 161) and n = 100 (nz = 1871), and its Cholesky factor with a symmetric minimum degree ordering of its rows and columns.]

Now we consider an SDP of the form (2.62) in general. The edge set E of the correlative sparsity pattern graph G(N, E) is written as
\[
E = \{(g, h) \in N \times N : g \neq h,\ A^k_g \neq 0 \text{ and } A^k_h \neq 0 \text{ for some } k\},
\]
where N = {1, 2, ..., s}. It is known that the graph G(N, E) characterizes the sparsity pattern of the Schur complement matrix of the SDP (2.62). More precisely, if R denotes the s × s sparsity pattern of the Schur complement matrix, then R_{gh} = 0 if (g, h) ∉ E•. Furthermore, if the graph G(N, E) is chordal, then there exists a perfect elimination ordering, a simultaneous row and column ordering of the Schur complement matrix that allows a Cholesky factorization with no fill-in. For the SDP induced from (2.29), we have seen the correlative sparsity pattern with a symmetric minimum degree ordering of its rows and columns in Figure 2.6, which coincides with the sparsity pattern of the Schur complement matrix whose symbolic Cholesky factorization is shown in Figure 2.1.

Remark 2.7 As mentioned in Remark 2.5, the application of the r-space conversion method using clique trees to reduce the SDP (2.28) to the SDP (2.29) can be implemented in many different ways. In practice, it should be implemented so as to obtain a better correlative sparsity in the resulting problem. For example, we can reduce the SDP (2.28) to
\[
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{n-1} \big( A^0_{ii} X_{ii} + 2 A^0_{i,i+1} X_{i,i+1} \big) + A^0_{nn} X_{nn} \\[4pt]
\text{subject to} &
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} - \begin{pmatrix} X_{11} & -X_{12} \\ -X_{21} & -z_1 \end{pmatrix} \succeq 0, \\[4pt]
& \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} - \begin{pmatrix} X_{ii} & -X_{i,i+1} \\ -X_{i+1,i} & -z_i \end{pmatrix} \succeq 0 \quad (i = 2, 3, \ldots, n-2), \\[4pt]
& \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} X_{n-1,n-1} & -X_{n-1,n} \\ -X_{n,n-1} & X_{n,n} + \sum_{i=1}^{n-2} z_i \end{pmatrix} \succeq 0, \\[4pt]
& \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} - \begin{pmatrix} -X_{ii} & -X_{i,i+1} \\ -X_{i+1,i} & -X_{i+1,i+1} \end{pmatrix} \succeq 0 \quad (i = 1, 2, \ldots, n-1),
\end{array} \tag{2.63}
\]
which is different from the SDP (2.29). This is obtained by choosing a different clique tree in the r-space conversion method using clique trees for the SDP (2.28). In this case, all auxiliary variables z_i (i = 1, 2, ..., n−2) are contained in a single matrix inequality. This implies that the corresponding correlative sparsity pattern graph G(N, E) involves a clique of size n − 2; see Figure 2.7. Thus the correlative sparsity becomes worse than with the previous conversion. Among the various ways of implementing the d- and r-space conversion methods, determining which one is effective for a better correlative sparsity is a subject which requires further study.

[Figure 2.7: The correlative sparsity pattern of the SDP (2.62) induced from (2.63) with n = 10 (nz = 217) and n = 100 (nz = 11377), where the rows and columns are simultaneously reordered by the Matlab function symamd (a symmetric minimum degree ordering).]
2.2.7 Examples of d- and r-space sparsity in quadratic SDP

We present how to take advantage of the d- and r-space conversion methods introduced in the previous sections for the class of quadratic SDPs, and demonstrate the effectiveness of these methods on examples of quadratic SDPs. In the following, we first consider a quadratic SDP of the form
\[
\text{minimize } \sum_{i=1}^{s} c_i x_i \quad \text{subject to } M(x) \succeq 0, \tag{2.64}
\]
where c_i ∈ [0, 1] (i = 1, 2, ..., s), M : R^s → S^n, and each nonzero element M_{ij} of the mapping M : R^s → S^n is a polynomial in x = (x_1, x_2, ..., x_s) ∈ R^s of degree at most 2. We apply the d- and r-space conversion methods to (2.64), obtain a new quadratic SDP with smaller size matrix inequality constraints, and relax this quadratic SDP to obtain a linear SDP which can be solved by standard SDP solvers. The test problems of quadratic SDP we consider for numerical experiments are three max-cut problems, a Lovász theta problem, a box-constrained quadratic problem from SDPLIB [9], a sensor network localization problem, and discretized partial differential equations (PDEs) with Neumann and Dirichlet boundary conditions. In fact, a more detailed and systematic study of SDP relaxations exploiting d- and r-space sparsity in the case of quadratic optimization problems derived from PDEs, compared to the hierarchy of sparse SDP relaxations (2.18), is presented in Chapter 3.

SDP relaxations of a quadratic SDP

In this subsection, we apply the d- and r-space conversion methods to the quadratic SDP (2.64), and derive four kinds of SDP relaxations:

(a) a dense SDP relaxation without exploiting any sparsity,

(b) a sparse SDP relaxation obtained by applying the d-space conversion method using basis representation given in 2.2.3,

(c) a sparse SDP relaxation obtained by applying the r-space conversion method using clique trees given in 2.2.5,

(d) a sparse SDP relaxation obtained by applying both the d-space conversion method using basis representation and the r-space conversion method using clique trees.

We write each nonzero element M_{ij}(x) as
\[
M_{ij}(x) = Q^{ij} \bullet \begin{pmatrix} 1 & x^T \\ x & x x^T \end{pmatrix} \quad \text{for every } x \in \mathbb{R}^s,
\]
for some Q^{ij} ∈ S^{1+s}. Assume that the rows and columns of each Q^{ij} are indexed from 0 to s. Let us introduce a linearization (or lifting) M̂_{ij} : R^s × S^s → R of the quadratic function M_{ij} : R^s → R:
\[
\hat M_{ij}(x, X) = Q^{ij} \bullet \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \quad \text{for every } x \in \mathbb{R}^s \text{ and } X \in S^s,
\]
which induces a linearization (or lifting) M̂ : R^s × S^s → S^n of M : R^s → S^n whose (i, j)th element is M̂_{ij}. Then we can describe the dense SDP relaxation (a) for (2.64) as
\[
\text{minimize } \sum_{i=1}^{s} c_i x_i \quad \text{subject to } \hat M(x, X) \succeq 0 \ \text{ and } \ \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \succeq 0.
\]
For simplicity, we rewrite the dense SDP relaxation above as
\[
\text{(a)} \quad \text{minimize } \sum_{i=1}^{s} c_i W_{0i} \quad \text{subject to } \hat M(W) \succeq 0,\ W_{00} = 1 \text{ and } W \succeq 0,
\]
where (W_{01}, W_{02}, ..., W_{0s})^T = x ∈ R^s and W = [1, x^T; x, X] ∈ S^{1+s}.

Let G(N′, F′) be the d-space sparsity pattern graph for the SDP (a), with N′ = {0, 1, ..., s} and F′ = the set of distinct row and column index pairs (i, j) of W_{ij} that are necessary to evaluate the objective function Σ_{i=1}^s c_i W_{0i} and/or the LMI M̂(W) ⪰ 0. Let G(N′, E′) be a chordal extension of G(N′, F′), and let C′_1, C′_2, ..., C′_r be the maximal cliques of G(N′, E′). Applying the d-space conversion method using basis representation, we obtain the SDP relaxation
\[
\text{(b)} \quad
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{s} c_i W_{0i} \\
\text{subject to} & \hat M\big((W_{ij} : (i,j) \in J)\big) \succeq 0, \quad W_{00} = 1, \\
 & \sum_{(i,j) \in J(C'_k)} E_{ij} W_{ij} \in S^{C'_k}_+ \ (k = 1, 2, \ldots, r).
\end{array}
\]
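A minimal MATLAB check of the lifting: M̂_{ij} is linear in (x, X) and agrees with M_{ij} at the rank-one point X = xx^T. The random symmetric Q used here is illustrative data only, standing in for one Q^{ij}.

    % Minimal check: Mhat_ij(x, xx') equals M_ij(x).
    s = 3;
    Q = randn(1 + s); Q = (Q + Q') / 2;           % illustrative data matrix Q^{ij}
    x = randn(s, 1);
    Mexact = sum(sum(Q .* [1, x'; x, x * x']));    % M_ij(x)  = Q . [1 x'; x xx']
    X = x * x';                                    % rank-one lifting point
    Mhat = sum(sum(Q .* [1, x'; x, X]));           % Mhat_ij(x, X), linear in (x, X)
    assert(abs(Mexact - Mhat) < 1e-10)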
Here J = ∪_{k=1}^r J(C′_k), (W_{ij} : (i, j) ∈ J) denotes the vector variable of the elements W_{ij} ((i, j) ∈ J), and M̂((W_{ij} : (i, j) ∈ J)) = M̂(W) for every W ∈ S^s(E′, 0).

To apply the r-space conversion method using clique trees to the quadratic SDP (2.64), we assume that M : R^s → S^n(E, 0) for some chordal graph G(N, E), where N = {1, 2, ..., n} and E ⊆ N × N. Then we convert the matrix inequality M(x) ⪰ 0 in (2.64) into an equivalent system of matrix inequalities (2.56). The application of the linearization described above to (2.56) leads to the SDP relaxation
\[
\text{(c)} \quad
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{s} c_i W_{0i} \\
\text{subject to} & \hat M_k(W) - \tilde L_k(z) \succeq 0 \ (k = 1, 2, \ldots, p), \quad W_{00} = 1, \quad W \succeq 0,
\end{array}
\]
where M̂_k : S^{1+s} → S^{C_k} denotes a linearization (or lifting) of M̃_k : R^s → S^{C_k}. We may apply the linearization to (2.64) first to derive the dense SDP relaxation (a), and then apply the r-space conversion method using clique trees to (a); this results in the same sparse SDP relaxation (c) of (2.64). Note that both M and M̂ take values in S^n(E, 0); thus, they provide the same r-space sparsity pattern characterized by the chordal graph G(N, E).

Finally, the sparse SDP relaxation (d) is derived by applying the d-space conversion method using basis representation to the sparse SDP relaxation (c). We note that the d-space sparsity pattern graph for the SDP (c) with respect to the matrix variable W ∈ S^{1+s} is the same as the one for the SDP (a). Hence, the sparse SDP relaxation (d) is obtained in the same way as the SDP (b) is obtained from the SDP (a). Consequently, we have the sparse SDP relaxation
\[
\text{(d)} \quad
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{s} c_i W_{0i} \\
\text{subject to} & \hat M_k\big((W_{ij} : (i,j) \in J)\big) - \tilde L_k(z) \succeq 0 \ (k = 1, 2, \ldots, p), \quad W_{00} = 1, \\
 & \sum_{(\alpha,\beta) \in J(C'_j)} E_{\alpha\beta} W_{\alpha\beta} \in S^{C'_j}_+ \ (j = 1, 2, \ldots, r).
\end{array}
\]
Here J = ∪_{k=1}^r J(C′_k), (W_{ij} : (i, j) ∈ J) denotes the vector variable of the elements W_{ij} ((i, j) ∈ J), and M̂_k((W_{ij} : (i, j) ∈ J)) = M̂_k(W) for every W ∈ S^s(E′, 0).

Quadratic SDPs with d- and r-space sparsity from randomly generated sparse graphs

Quadratic SDP problems were constructed by first generating two graphs G(N_d, E_d) with N_d = {1, 2, ..., 1+s} and G(N_r, E_r) with N_r = {1, 2, ..., n}, using the Matlab program generateProblem.m [44], which was developed for sensor network localization problems. Sparse chordal extensions G(N_d, Ē_d) and G(N_r, Ē_r) were then obtained by the Matlab functions symamd.m and chol.m. Next, we generated data matrices Q^{ij} ∈ S^{1+s} (i = 1, 2, ..., n, j = 1, 2, ..., n) and a data vector c ∈ R^s so that the d- and r-space sparsity pattern graphs of the resulting quadratic SDP coincide with G(N_d, E_d) and G(N_r, E_r), respectively. Some characteristics of the chordal extensions G(N_d, Ē_d) of G(N_d, E_d) and G(N_r, Ē_r) of G(N_r, E_r) used in the experiments are shown in Table 2.2. For the problem with s = 40 and n = 640, the d- and r-space sparsity patterns obtained from the symmetric approximate minimum degree permutation of rows and columns by the Matlab function symamd.m are displayed in Figure 2.8.

    s    n   | d-space sparsity            | r-space sparsity
             | #Ed   NoC   Max   Min       | #Er    NoC   Max   Min
    80   80  | 143   63    3     3         | 216    72    7     3
    320  320 | 649   260   7     3         | 840    301   9     3
    40   160 | 70    30    3     3         | 426    150   7     3
    40   640 | 70    30    3     3         | 1732   616   13    3
[Figure 2.8: The d-space sparsity pattern (nz = 181) of the quadratic SDP with s = 40 and n = 640 on the left, and the r-space sparsity pattern (nz = 3432) on the right.]

Table 2.3 shows numerical results on the quadratic SDPs whose d- and r-sparsity characteristics are given in Table 2.2. We observe that both the d-space conversion method using basis representation in (b) and the r-space conversion method using clique trees in (c) work effectively, and that their combination (d) results in the shortest CPU time among the four methods.

        SeDuMi CPU time in seconds (size of the Schur complement matrix, max. size of matrix variables)
   s    n       (a)                  (b)                 (c)                  (d)
  80   80   296.51 (3321, 81)      1.38 (224, 80)      1.58 (801, 81)       0.73 (252, 19)
 320  320   OOM                   74.19 (970, 320)    80.09 (322, 321)     35.20 (1216, 20)
  40  160     6.70 (861, 160)      4.22 (111, 160)     2.91 (1626, 41)      0.74 (207, 21)
  40  640   158.95 (861, 640)    151.20 (111, 640)   120.86 (6776, 41)      5.71 (772, 21)

Table 2.3: Numerical results on the quadratic SDPs with d- and r-sparsity from randomly generated sparse graphs. OOM indicates an out-of-memory error in Matlab.

Quadratic SDPs arising from applications

For additional numerical experiments, we selected five SDP problems from SDPLIB [9], a quadratic SDP from a sensor network localization problem, and two quadratic SDPs derived from PDEs with Neumann and Dirichlet boundary conditions. The test problems in Table 2.4 are:

mcp500-1, maxG11, maxG32: SDP relaxations of max-cut problems from SDPLIB.

thetaG11: An SDP relaxation of the Lovász theta problem from SDPLIB.

qpG11: An SDP relaxation of the box-constrained quadratic problem from SDPLIB.

d2n01s1000a100FSDP: A full SDP relaxation [6] of the sensor network localization problem with 1000 sensors and 100 anchors distributed in [0,1]^2, radio range 0.1 and noise 10%. The method (iii) in Table 2.4 for this problem is equivalent to the method used in SFSDP [43], a sparse version of the full SDP relaxation.

ginzOrNeum(11): An SDP relaxation of the discretized nonlinear elliptic PDE (4.4) (Case II, Neumann boundary condition) of [61]. We choose an 11 × 11 grid for the domain [0,1]^2 of the PDE.

pdeBifurcation(20): An SDP relaxation of the discretized nonlinear elliptic PDE (4.5) (Dirichlet boundary condition) of [61]. We choose a 20 × 20 grid for the domain [0,1]^2 of the PDE.

The SDP relaxations (i), (ii) and (iii) in Table 2.4 denote

(i) a dense SDP relaxation without exploiting any sparsity,
(ii) a sparse SDP relaxation obtained by applying the d-space conversion method using clique trees given in 2.2.3,
(iii) a sparse SDP relaxation obtained by applying the d-space conversion method using basis representation given in 2.2.3.

Table 2.4 shows that the CPU time spent by (ii) is shorter than that spent by (i) and (iii) for all tested problems except mcp500-1 and ginzOrNeum(11). Notice that (iii) took shorter CPU time than (i) for all problems except maxG32 and thetaG11. We confirm that applying at least one of the d-space conversion methods greatly reduces the CPU time for the test problems.
The d-space sparsity patterns for the test problems are displayed in Figures 2.9 and 2.10.

Problem                SeDuMi CPU time (size.SC.mat., Max.size.mat.var.)
                         (i)                   (ii)                (iii)
mcp500-1                65.5 (500, 500)       94.5 (7222, 44)      15.9 (2878, 44)
maxG11                 220.5 (800, 800)       12.1 (2432, 80)      26.8 (8333, 24)
maxG32                5373.8 (2000, 2000)    971.4 (13600, 210)    OOM
thetaG11               345.9 (2401, 801)      23.9 (4237, 81)     458.5 (9134, 25)
qpG11                 2628.5 (800, 1600)      16.0 (2432, 80)      72.5 (9133, 24)
d2n01s1000a100FSDP    5193.5 (4949, 1002)     16.9 (7260, 45)      19.5 (15691, 17)
ginzOrNeum(11)         216.1 (1453, 485)       2.2 (1483, 17)       2.1 (1574, 4)
pdeBifurcation(20)    1120.4 (2401, 801)       4.3 (2451, 17)       5.3 (2001, 3)

Table 2.4: Numerical results on SDPs from some applications. size.SC.mat. denotes the size of the Schur complement matrix and Max.size.mat.var. the maximum size of the matrix variables.

2.3 Reduction techniques for SDP relaxations for large scale POP

The global minimization of a multivariate polynomial over a semialgebraic set is in general a severely nonconvex, difficult optimization problem. In 2.1.3 a hierarchy of SDP relaxations has been proposed whose optima converge to the optimum of a POP as the relaxation order increases. The practical use of this powerful theoretical result has been limited by the capacity of current SDP solvers, as the size of the SDP relaxations grows rapidly with increasing order. A first approach to this problem has been to exploit structured sparsity in a POP [47]: whenever a POP satisfies a certain sparsity pattern, a convergent sequence of sparse SDP relaxations (2.18) of substantially smaller size can be constructed. Compared to the dense SDP relaxation (2.8), the sparse SDP relaxation (2.18) can be applied to POPs of larger scale. Still, the size of the sparse SDP relaxation remains the major obstacle to solving large scale POPs which contain polynomials of higher degree.

We propose a substitution procedure to transform an arbitrary POP into an equivalent quadratic optimization problem (QOP). It is based on successively replacing quadratic terms in higher degree monomials by new variables, and adding the substitution relations as constraints to the optimization problem. The idea of transforming a POP into an equivalent QOP can be traced back to Shor [92], who exploited it to derive dual lower bounds for the minimum of a polynomial function. As the substitution procedure is not unique, we introduce different heuristics which aim at deriving a QOP with as few additional variables as possible. Moreover, we show that the sparsity of a POP is maintained under the substitution procedure. The main advantage of deriving an equivalent QOP for a POP is that the sparse SDP relaxation of first order can be applied to solve it approximately. The substitution procedure and the considerations for minimizing the number of additional variables while maintaining sparsity are presented in 2.3.1. While a POP and the QOP derived from it are equivalent, the quality of the SDP relaxation for the QOP deteriorates in many cases. We discuss in 2.3.2 how to tighten the SDP relaxation for a QOP in order to achieve good approximations to the global minimum even for SDP relaxations of first or second order.
For that purpose, methods such as choosing appropriate lower and upper bounds for the variables, Branch-and-Cut bounds that shrink the feasible region of the SDP relaxation, and locally convergent optimization methods are proposed. Finally, the power of this technique is demonstrated in 2.3.3, where it is applied to solve various large scale POPs of higher degree.

[Figure 2.9: The d-space sparsity patterns with the symmetric approximate minimum degree permutation of rows and columns provided by the Matlab function symamd.m for maxG32 (left, nz = 10000), thetaG11 (middle, nz = 5601) and qpG11 (right, nz = 4800).]

[Figure 2.10: The d-space sparsity patterns with the symmetric approximate minimum degree permutation of rows and columns provided by the Matlab function symamd.m for d2n01s1000a100FSDP (left, nz = 9138), ginzOrNeum(11) (middle, nz = 2421) and pdeBifurcation(20) (right, nz = 3201).]

2.3.1 Transforming a POP into a QOP

The aim of 2.3 is to propose a technique for reducing the size of SDP relaxations for general POPs, which enables us to attempt large scale polynomial optimization efficiently. This technique, which transforms a POP into an equivalent QOP, reduces the size of the SDP relaxation by decreasing the minimum relaxation order ω_max, whereas the technique due to [102] presented in 2.1.4 aims at reducing the SDP relaxation by replacing matrix inequality constraints of size \binom{n+ω}{ω} with matrix inequality constraints of size \binom{n_i+ω}{ω} (i = 1,...,p). A general QOP is a special case of the POP (2.1) in which the polynomials p, g_i (i = 1,...,m) are of degree at most 2. With respect to the definition of ω_k, the minimal relaxation order ω_max of the sparse SDP relaxation (2.18) for a QOP equals one. As pointed out in [102], the sparse SDP relaxation sSDP_1 and the dense SDP relaxation dSDP_1 of order one are equivalent for any QOP. The equivalence of a QOP and its SDP relaxation has been shown for a few restrictive classes of QOPs. For instance, if p and −g_i (i = 1,...,m) are convex quadratic polynomials in a QOP, the QOP is equivalent to the corresponding SDP relaxation [56]. Also, equivalence of QOPs and their SDP relaxations has been shown for the class of uniformly OD-nonpositive QOPs [41]. As shown in [52], min(sSDP_ω) → min(POP) for ω → ∞, but to the best of our knowledge there is no result on the rate of convergence or on a guaranteed approximation of the global minimum for a fixed relaxation order ω ≥ ω_max in the case of a general POP.

To illustrate the idea of our transformation technique, consider the following simple unconstrained POP, whose optimal value is −∞:

  min 10 x_1^3 − 10^2 x_1^3 x_2 + 10^3 x_1^2 x_2^2 − 10^4 x_1 x_2^3 + 10^5 x_2^4.   (2.65)

It is straightforward to see that POP (2.65) is equivalent to

  min 10 x_1 x_3 − 10^2 x_3 x_4 + 10^3 x_4^2 − 10^4 x_4 x_5 + 10^5 x_5^2
  s.t. x_3 = x_1^2, x_4 = x_1 x_2, x_5 = x_2^2,   (2.66)

where we introduced the three additional variables x_3, x_4 and x_5. Obviously, QOP (2.66) is not the only QOP equivalent to POP (2.65): the QOP
  min 10 x_3 − 10^2 x_2 x_3 + 10^3 x_5 x_6 − 10^4 x_1 x_4 + 10^5 x_2 x_4
  s.t. x_3 = x_1 x_5, x_4 = x_2 x_6, x_5 = x_1^2, x_6 = x_2^2,   (2.67)

is equivalent to (2.65) as well. We notice that the number of additional variables in QOP (2.66) equals three, whereas it equals four in QOP (2.67). Thus, in general there are numerous ways to transform a higher degree POP into a QOP. For the transformation procedures we propose, we require that 1) the number of additional variables be as small as possible, in order to obtain an SDP relaxation of smaller size, 2) the sparsity of the POP be maintained under the transformation, and 3) the quality of the SDP relaxation for the derived QOP be as good as possible. How to deal with 3) is discussed in 2.3.2; 1) and 2) are discussed in the following.

Maintaining sparsity

The transformation proposed above raises the question whether the correlative sparsity of a POP is preserved under the transformation, i.e., whether the resulting QOP is correlatively sparse as well. Let POP⋆ be a correlatively sparse POP of dimension n, G(N, E′) the chordal extension of its csp graph, (C_1,...,C_p) the maximal cliques of G(N, E′), and n_max = max_{i=1,...,p} |C_i|. Let x_{n+1} = x_i x_j be the substitution variable for some i, j ∈ {1,...,n}, and let ˜POP denote the POP derived after substituting x_{n+1} = x_i x_j in POP⋆. Given the chordal extension G(N, E′) of the csp graph of POP⋆, a chordal extension of the csp graph of ˜POP over the vertex set Ñ = N ∪ {n+1} can be obtained as follows: for each clique C_l with {i,j} ⊂ C_l, add the edges {v, n+1} for all v ∈ C_l and obtain the clique C̃_l; for each clique C_k not containing {i,j}, set C̃_k = C_k. In the end we obtain the graph G(Ñ, Ẽ′), which is a chordal extension of the csp graph G(Ñ, Ẽ) of ˜POP. Note that (C̃_1,...,C̃_p) are maximal cliques of G(Ñ, Ẽ′) and |C̃_l| ≤ |C_l| + 1 holds for all C̃_l, i.e., ñ_max ≤ n_max + 1. Moreover, the number p of maximal cliques remains unchanged under the transformation. As pointed out, G(Ñ, Ẽ′) is one possible chordal extension of G(Ñ, Ẽ). It seems reasonable to expect that the heuristics we use for the chordal extension, such as the reverse Cuthill-McKee and the symmetric minimum degree orderings, add fewer edges to G(Ñ, Ẽ) than we did in constructing G(Ñ, Ẽ′). Thus, we can apply the sparse SDP relaxations efficiently to the POPs derived after each iteration of the transformation algorithm. For illustration, consider Figures 2.11 and 2.12, which picture the csp matrices of two POPs and of their QOPs.

[Figure 2.11: CSP matrix of the chordal extension of POP pdeBifurcation(7) (left, nz = 598) and of its QOP (right, nz = 940) derived under strategy BI.]

[Figure 2.12: CSP matrix of the chordal extension of POP Mimura(25) (left, nz = 412) and of its QOP (right, nz = 624) derived under strategy BI.]

We observe that the sparsity pattern of the chordal extension of the csp graph is maintained under the substitution procedure. However, if the number of substitutions required to transform a higher degree POP into a QOP is far greater than the number of variables of the original POP, the transformation procedure may yield a dense QOP.
To illustrate this effect, consider the chordal extension of the csp matrix of the QOP derived from the POP randEQ(7,3,5,8,0), pictured in Figure 2.13. In that example the number n of variables of the original POP equals seven, while the number of additional variables equals 108.

[Figure 2.13: CSP matrix of the chordal extension of POP randEQ(7,3,5,8,0) (left, nz = 43) and of its QOP (right, nz = 2235).]

Minimizing the number of additional variables

Let n denote the number of variables involved in a POP and ñ the number of variables in the corresponding QOP. The first question we face is how to transform a POP into a QOP such that the number k_0 := ñ − n of additional variables is as small as possible. Each additional variable x_{n+k} corresponds to the substitution of a certain quadratic monomial x_i x_j by x_{n+k}. Given an arbitrary POP, finding a substitution procedure that minimizes ñ is a difficult problem. We propose four different heuristics for transforming a POP into a QOP, which aim at reducing the number k_0 of additional variables. At the end of this section we give some motivation why it is more important to find a strategy that optimizes the quality of the SDP relaxation than one that minimizes the number k_0 of additional variables.

Our transformation algorithm iterates substitutions of quadratic monomials x_i x_j occurring in the higher degree monomials of the objective function and the constraints by a new variable x_{n+k}, adding the substitution relation x_{n+k} = x_i x_j as a constraint to the POP. Let POP^0 denote the original POP, and POP^k the POP obtained after the k-th iteration, i.e., after substituting x_{n+k} = x_i x_j and adding it as a constraint to POP^{k−1}. The algorithm terminates as soon as POP^{k_0} is a QOP for some k_0 ∈ N. In each iteration of the transformation algorithm we distinguish two steps. The first is to choose which pair of variables (x_i, x_j) (1 ≤ i, j ≤ n+k) is substituted by the additional variable x_{n+k+1}. The second is to choose to which extent x_i x_j is substituted by x_{n+k+1} in each higher degree monomial.

Step 1: Choosing the substitution variables

Definition 2.7 Let POP^k be a POP of dimension ñ with m̃ constraints (g_1^k,...,g_{m̃}^k). The higher monomial set M_S^k of POP^k is given by

  M_S^k = { α ∈ N^{ñ} | ∃ i ∈ {0,...,m̃} s.t. α ∈ supp(g_i^k) and |α| ≥ 3 },

where g_0 := p and g_i^0 := g_i, and the higher monomial list M^k of POP^k by

  M^k = { (α, w_α) | α ∈ M_S^k and w_α := #{ i | α ∈ supp(g_i^k) } }.

By Definition 2.7, the higher monomial list of a QOP is empty.

Definition 2.8 Given α ∈ N^n and a pair (i,j) with 1 ≤ i, j ≤ n, we define the dividing coefficient k_{i,j}^α ∈ N_0 as the integer that satisfies

  x^α / (x_i x_j)^{k_{i,j}^α} ∈ R[x] and x^α / (x_i x_j)^{k_{i,j}^α + 1} ∉ R[x].

Given POP^k, the k-th iterate of POP^0, and its higher monomial list M^k, determine the symmetric matrix C(POP^k) ∈ R^{(n+k)×(n+k)} given by

  C(POP^k)_{i,j} = C(POP^k)_{j,i} = Σ_{(α,w_α)∈M^k} k_{i,j}^α w_α.

We consider two alternatives to choose a pair (x_i, x_j) (1 ≤ i, j ≤ n+k) to be substituted by x_{n+k+1}:

A. Naive criterion: Choose a pair (x_i, x_j) such that there exists an α ∈ M_S^k which satisfies x^α/(x_i x_j) ∈ R[x].

B. Maximum criterion: Choose a pair (x_i, x_j) such that C(POP^k)_{i,j} ≥ C(POP^k)_{u,v} for all 1 ≤ u, v ≤ n+k.
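The dividing coefficient of Definition 2.8 and the maximum criterion B translate directly into code. The following sketch is our own illustration (0-based indices, a monomial encoded by its exponent tuple, the higher monomial list by a dict {alpha: w_alpha}):

```python
# Sketch of Definition 2.8 and of the matrix C(POP^k) used by criterion B.
import numpy as np

def divide_count(alpha, i, j):
    """Dividing coefficient k_{i,j}^alpha: largest k with (x_i x_j)^k | x^alpha."""
    return min(alpha[i], alpha[j]) if i != j else alpha[i] // 2

def criterion_B(higher_list, nvars):
    """Pair (i, j) maximizing C(POP^k)_{i,j} = sum_alpha k_{i,j}^alpha w_alpha."""
    C = np.zeros((nvars, nvars))
    for alpha, w in higher_list.items():
        for i in range(nvars):
            for j in range(i, nvars):
                C[i, j] += divide_count(alpha, i, j) * w
    i, j = np.unravel_index(np.argmax(C), C.shape)
    return i, j

# higher monomial list of POP (2.65): every monomial occurs once (w = 1)
M = {(3, 0): 1, (3, 1): 1, (2, 2): 1, (1, 3): 1, (0, 4): 1}
print(criterion_B(M, 2))   # (0, 1): substitute x3 = x1 * x2 first
```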
Step 2: Choosing the substitution strategy

Next we have to decide to what extent we substitute x_{n+k+1} = x_i x_j in each monomial of M_S^k. We distinguish full and partial substitution. Let us demonstrate the importance of this question on the following two examples.

Example 2.5 Consider two different substitution strategies for transforming the problem min x_1^4 into a QOP:

  (1) full substitution of x_1^2 by x_2:
      min x_2^2 s.t. x_2 = x_1^2;
  (2) partial substitution of x_1^2 by x_2:
      min x_1^2 x_2 s.t. x_2 = x_1^2,
      followed by min x_1 x_3 s.t. x_2 = x_1^2, x_3 = x_1 x_2.   (2.68)

In both strategies we choose x_1^2 for substitution in the first step. In (1) we substitute x_1^2 fully by x_2, whereas in (2) we substitute x_1^2 partially. With full substitution (1), one additional variable suffices to obtain a QOP; partial substitution requires two additional variables.

Example 2.6 Consider the POP min x_1^6 s.t. x_1^3 x_2 ≥ 0.

  (1) Full substitution of x_1^2 by x_3:
      min x_3^3 s.t. x_1 x_2 x_3 ≥ 0, x_3 = x_1^2
      → min x_3 x_4 s.t. x_1 x_2 x_3 ≥ 0, x_3 = x_1^2, x_4 = x_3^2
      → min x_3 x_4 s.t. x_2 x_5 ≥ 0, x_3 = x_1^2, x_4 = x_3^2, x_5 = x_1 x_3.
  (2) Partial substitution of x_1^2 by x_3:
      min x_1^2 x_3^2 s.t. x_1 x_2 x_3 ≥ 0, x_3 = x_1^2
      → min x_4^2 s.t. x_2 x_4 ≥ 0, x_3 = x_1^2, x_4 = x_1 x_3.   (2.69)

In this example full substitution (1) of x_1^2 requires three, and partial substitution (2) only two additional variables to yield a QOP.

The examples illustrate that it depends on the structure of the higher monomial set whether partial or full substitution requires fewer additional variables and results in an SDP relaxation of smaller size. In general, full and partial substitution are given as follows.

I. Full substitution: Let tf^r_{i,j} : R[x] → R[z], where x ∈ R^r and z ∈ R^{r+1} for some r ∈ N and i, j ∈ {1,...,r}, be the linear operator defined by its action on each monomial x^α,

  tf^r_{i,j}(x^α) = z_1^{α_1} ⋯ z_{i−1}^{α_{i−1}} z_i^{α_i − min(α_i,α_j)} z_{i+1}^{α_{i+1}} ⋯ z_{j−1}^{α_{j−1}} z_j^{α_j − min(α_i,α_j)} z_{j+1}^{α_{j+1}} ⋯ z_r^{α_r} z_{r+1}^{min(α_i,α_j)}, if i ≠ j,

  tf^r_{i,j}(x^α) = z_1^{α_1} ⋯ z_{i−1}^{α_{i−1}} z_i^{mod(α_i,2)} z_{i+1}^{α_{i+1}} ⋯ z_r^{α_r} z_{r+1}^{⌊α_i/2⌋}, if i = j.

Thus, tf^r_{i,j}(g(x)) = Σ_{α∈supp(g)} c_α(g) tf^r_{i,j}(x^α) for any g ∈ R[x]. The operator tf^{n+k}_{i,j} substitutes x_i x_j by x_{n+k+1} in each monomial to the maximal possible extent.

II. Partial substitution: Let tp^r_{i,j} : R[x] → R[z], where x ∈ R^r and z ∈ R^{r+1} for some r ∈ N and i, j ∈ {1,...,r}, be the linear operator defined by its action on each monomial x^α,

  tp^r_{i,j}(x^α) = tf^r_{i,j}(x^α), if i ≠ j,
  tp^r_{i,j}(x^α) = tf^r_{i,j}(x^α), if i = j and α_i odd,
  tp^r_{i,j}(x^α) = tf^r_{i,j}(x^α), if i = j and log_2(α_i) ∈ N_0,
  tp^r_{i,j}(x^α) = z_1^{α_1} ⋯ z_{i−1}^{α_{i−1}} z_i^{g_i} z_{i+1}^{α_{i+1}} ⋯ z_r^{α_r} z_{r+1}^{(α_i − g_i)/2}, else,

where g_i := gcd(2^{⌊log_2(α_i)⌋}, α_i). Thus, tp^r_{i,j}(g(x)) = Σ_{α∈supp(g)} c_α(g) tp^r_{i,j}(x^α) for any g ∈ R[x]. We notice that full and partial substitution differ only in the case where i = j, α_i is even and log_2(α_i) ∉ N_0.

By pairwise combining the choice of A or B in Step 1 with the choice of I or II in Step 2, we obtain four different procedures to transform POP^{k−1} into POP^k, which we denote AI, AII, BI and BII. We do not expect AI or AII to result in a QOP with a small number of substitutions, as A does not take into account the structure of the higher monomial list M^{k−1}; we use AI and AII to evaluate the potential of BI and BII.
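The two operators act on exponent vectors, which the following sketch (our own code, 0-based indices, the exponent of the new variable z_{r+1} appended as the last entry) makes concrete:

```python
# Sketch of the full (tf) and partial (tp) substitution operators on monomials.
from math import gcd, floor, log2

def tf(alpha, i, j):
    """Full substitution: divide out (x_i x_j) to the maximal possible extent."""
    a = list(alpha) + [0]
    if i != j:
        k = min(alpha[i], alpha[j]); a[i] -= k; a[j] -= k; a[-1] = k
    else:
        a[-1] = alpha[i] // 2; a[i] = alpha[i] % 2
    return tuple(a)

def tp(alpha, i, j):
    """Partial substitution: differs from tf only if i == j, alpha_i even and
    not a power of two; then use g_i = gcd(2^floor(log2(alpha_i)), alpha_i)."""
    ai = alpha[i]
    if i != j or ai % 2 == 1 or ai == 0 or 2 ** floor(log2(ai)) == ai:
        return tf(alpha, i, j)
    g = gcd(2 ** floor(log2(ai)), ai)
    a = list(alpha) + [(ai - g) // 2]
    a[i] = g
    return tuple(a)

# x1^6 as in Example 2.6: full substitution of x1^2 gives x3^3, partial x1^2 x3^2
print(tf((6,), 0, 0))   # (0, 3)
print(tp((6,), 0, 0))   # (2, 2)
```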
The numerical performance of these four procedures is demonstrated on some example POPs in Table 2.6, where n denotes the number of variables in the original POP, deg the degree of the highest order polynomial in the POP, and k_0 the number of additional variables required to transform the POP into a QOP under the respective substitution strategy. The POPs pdeBifurcation(n) are derived from discretizing differential equations, which is the topic of Chapter 3; the other POPs are test problems from [102]. As expected, strategy B is superior to A for all but one example class of POPs as far as reducing the number of variables is concerned. The entire algorithm to transform a POP into a QOP is summarized by the scheme in Table 2.5. As mentioned before, the QOP of dimension n+k_0 derived by AI, AII, BI or BII is equivalent to the original POP of dimension n. In fact it is easy to see that if x̃ ∈ R^{n+k_0} is an optimal solution of the QOP, the vector (x̃_1,...,x̃_n) of the first n components of x̃ is an optimizer of the original POP.

INPUT    POP^0 with M_S^0
WHILE    M_S^k ≠ ∅
  1.     Determine the pair (x_i, x_j) for substitution by A or B.
  2.     Apply tf^k_{i,j} or tp^k_{i,j} to each polynomial in POP^k and derive POP^{k+1}.
  3.     Update k → k+1, POP^k → POP^{k+1}, M_S^k → M_S^{k+1}.
OUTPUT   QOP = POP^{k_0}

Table 2.5: Scheme for transforming a POP into a QOP.

POP                       n   deg  k_0(AI)  k_0(AII)  k_0(BI)  k_0(BII)
BroydenBand(20)          20    6     229      211       60       40
BroydenBand(60)          60    6     749      691      180      120
nondquar(32)             32    4      93       93       94       94
nondquar(8)               8    4      21       21       22       22
optControl(10)           60    4      60       60       60       60
randINEQ(8,4,6,8,0)       8    8     253      307      248      238
randEQ(7,3,5,8,0)         7    8     135      146      116      115
pdeBifurcation(5)        25    3      25       25       25       25
pdeBifurcation(10)      100    3     100      100      100      100
randINEQ(3,1,3,16,0)      3   16     145      192      105      117
randUnconst(3,2,3,14,0)   3   14      86      107       63       69

Table 2.6: Number of additional variables required under strategies AI, AII, BI and BII.
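To make the scheme of Table 2.5 concrete, the following compact sketch (written by us for illustration, not the thesis software) implements strategy BI, i.e. the maximum criterion B combined with full substitution I; a polynomial is represented as a dict {exponent tuple: coefficient}, with 0-based indices:

```python
# Sketch of the POP -> QOP loop of Table 2.5 under strategy BI.
from itertools import combinations_with_replacement

def divide_count(alpha, i, j):
    return min(alpha[i], alpha[j]) if i != j else alpha[i] // 2

def tf(alpha, i, j):
    k = divide_count(alpha, i, j)
    a = list(alpha) + [k]                    # exponent of the new variable
    if i != j:
        a[i] -= k; a[j] -= k
    else:
        a[i] -= 2 * k
    return tuple(a)

def pop_to_qop(polys, n):
    """Substitute until every monomial has degree <= 2, i.e. M_S^k is empty."""
    polys, subs = [dict(p) for p in polys], []   # subs[k] = (i, j): x_{n+k} = x_i x_j
    while True:
        high = [a for p in polys for a in p if sum(a) >= 3]
        if not high:
            return polys, subs
        # maximum criterion B over all current variable pairs
        pairs = combinations_with_replacement(range(n + len(subs)), 2)
        i, j = max(pairs, key=lambda ij: sum(divide_count(a, *ij) for a in high))
        subs.append((i, j))
        polys = [{tf(a, i, j): c for a, c in p.items()} for p in polys]

# POP (2.65): 10 x1^3 - 10^2 x1^3 x2 + 10^3 x1^2 x2^2 - 10^4 x1 x2^3 + 10^5 x2^4
p = {(3, 0): 10.0, (3, 1): -1e2, (2, 2): 1e3, (1, 3): -1e4, (0, 4): 1e5}
qop, subs = pop_to_qop([p], 2)
print(subs)                                  # [(0, 1), (1, 1), (0, 0)]
print(max(sum(a) for a in qop[0]) <= 2)      # True: the result is a QOP
```

On POP (2.65) the loop performs the three substitutions x_3 = x_1 x_2, x_4 = x_2^2 and x_5 = x_1^2, in line with the three additional variables of QOP (2.66).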
Computational complexity

Finally, let us consider how the size of the sparse SDP relaxation of order ω = 1 for a QOP depends on the number k_0 of additional variables. Let a sparse POP of dimension n be given by the polynomials (p, g_1,...,g_m) and the maximal cliques (C_1,...,C_p) of the chordal extension. With the construction described under Maintaining sparsity above, the corresponding QOP of dimension ñ = n + k_0 has maximal cliques (C̃_1,...,C̃_p) such that C_i ⊆ C̃_i and ñ_i ≤ n_i + k_0 for all i = 1,...,p, where n_i = |C_i| and ñ_i = |C̃_i|. All partial localizing matrices M_0(g_k y, F̂_k) are scalars in sSDP_1(QOP). The size of the partial moment matrices M_1(y, C̃_i) is

  d(1, ñ_i) = ñ_i + 1 ≤ n_i + k_0 + 1 = O(k_0).   (2.70)

Thus, the size of the linear matrix inequality is bounded by

  Σ_{j=1}^{m+k_0} 1 + Σ_{i=1}^p d(1, ñ_i) ≤ m + k_0 + p (n_max + k_0 + 1) ≤ m + k_0 + n (n_max + k_0 + 1).   (2.71)

The length of the vector variable y in sSDP_1(QOP) is bounded by

  |y| ≤ Σ_{i=1}^p |y(C̃_i)| = Σ_{i=1}^p d(2ñ_i, 2) ≤ ½ p (2n_max + 2k_0 + 2)(2n_max + 2k_0 + 1) ≤ 2p (n_max + k_0 + 1)^2 ≤ 2n (n_max + k_0 + 1)^2 = O(k_0^2).   (2.72)

Thus, the size of the linear matrix inequalities of the sparse SDP relaxation is linear and the length of the moment vector y quadratic in the number k_0 of additional variables. For this reason the computational cost does not grow too fast, even if k_0 is not minimal; the heuristics BI and BII are sufficient to derive a QOP with a small number k_0 of additional variables. Moreover, the bounds (2.71) and (2.72) for the size of the primal and dual variables of the SDP relaxation for the QOP are to be compared with the respective bounds for the SDP relaxation of the POP. If we assume ω_max = ω_j for all j ∈ {1,...,m}, the size of the linear matrix inequality in the SDP relaxation of order ω_max for the original POP can be bounded by

  Σ_{j=1}^m d(n_j, ω_max − ω_j) + Σ_{i=1}^p d(n_i, ω_max) ≤ m + n \binom{n_max + ω_max}{ω_max},   (2.73)

and the length of the moment vector by

  Σ_{i=1}^p d(2n_i, 2ω_max) ≤ n \binom{2n_max + 2ω_max}{2n_max}.   (2.74)

Already for ω_max = 2 the bounds (2.73) and (2.74) are of second and fourth degree in n_max, whereas (2.71) and (2.72) are linear and quadratic in n_max + k_0, respectively. Therefore we can expect a substantial reduction of the SDP relaxation under the transformation procedure. Note that we did not exploit any sparsity in the SDP relaxation or any intersection of the maximal cliques (C_1,...,C_p) and (C̃_1,...,C̃_p) when deriving these bounds; the actual size of SDP relaxations in numerical experiments may therefore be far smaller than the one suggested by these bounds.

2.3.2 Quality of SDP relaxations for QOP

A polynomial optimization problem (POP) and the quadratic optimization problem (QOP) derived from it under one of the transformation strategies AI, AII, BI or BII are equivalent. However, the same statement does not hold for the SDP relaxations of the two problems. In fact, the SDP relaxation for the QOP is weaker than the SDP relaxation for the original POP. Before stating this negative result, we consider an example to illustrate it.

Example 2.7 Let a POP and its equivalent QOP be given by

  POP: min x_1^2 x_2^2 s.t. x_1^2 x_2 ≥ 0  ⇔  QOP: min x̃_3^2 s.t. x̃_1 x̃_3 ≥ 0, x̃_1 x̃_2 = x̃_3.

The dense SDP relaxations of minimal relaxation order, dSDP_2(POP) and dSDP_1(QOP), are given by

  min y_22 s.t. y_21 ≥ 0,
      M_2(y) = [ y_00 y_10 y_01 y_20 y_11 y_02 ;
                 y_10 y_20 y_11 y_30 y_21 y_12 ;
                 y_01 y_11 y_02 y_21 y_12 y_03 ;
                 y_20 y_30 y_21 y_40 y_31 y_22 ;
                 y_11 y_21 y_12 y_31 y_22 y_13 ;
                 y_02 y_12 y_03 y_22 y_13 y_04 ] ⪰ O,

and

  min ỹ_002 s.t. ỹ_101 ≥ 0, ỹ_110 = ỹ_001,
      M_1(ỹ) = [ ỹ_000 ỹ_100 ỹ_010 ỹ_001 ;
                 ỹ_100 ỹ_200 ỹ_110 ỹ_101 ;
                 ỹ_010 ỹ_110 ỹ_020 ỹ_011 ;
                 ỹ_001 ỹ_101 ỹ_011 ỹ_002 ] ⪰ O.

The equivalence of POP and QOP holds with the relation

  (x̃_1, x̃_2, x̃_3, x̃_1^2, x̃_1 x̃_2, x̃_1 x̃_3, x̃_2^2, x̃_2 x̃_3, x̃_3^2) = (x_1, x_2, x_1 x_2, x_1^2, x_1 x_2, x_1^2 x_2, x_2^2, x_1 x_2^2, x_1^2 x_2^2).   (2.75)

Given a feasible solution y ∈ R^{d(2,4)} = R^{15} of dSDP_2(POP), we exploit the relations (2.75) to define a vector ỹ = (ỹ_000, ỹ_100, ỹ_010, ỹ_001, ỹ_200,...,ỹ_002) ∈ R^{d(3,2)} = R^{10} as

  ỹ := (y_00, y_10, y_01, y_11, y_20, y_11, y_21, y_02, y_12, y_22).

Then ỹ_110 = y_11 = ỹ_001 holds by definition of ỹ, and ỹ_101 = y_21 ≥ 0 since y is a feasible solution of dSDP_2(POP). Furthermore, for the moment matrix we have

  M_1(ỹ) = [ y_00 y_10 y_01 y_11 ;
             y_10 y_20 y_11 y_21 ;
             y_01 y_11 y_02 y_12 ;
             y_11 y_21 y_12 y_22 ].

Thus M_1(ỹ) ⪰ O, as M_1(ỹ) is a principal submatrix of M_2(y) ⪰ O. It follows that ỹ is feasible for dSDP_1(QOP) and that min(dSDP_1(QOP)) ≤ min(dSDP_2(POP)) holds.

A generalization of the observation in Example 2.7 is given by the following proposition.

Proposition 2.1 Let a POP of dimension n with ω_max > 1 of the form (2.1) be given by the polynomials (p, g_1,...,g_m), and the corresponding QOP of dimension n+k derived via AI, AII, BI or BII by (p̃, g̃_1,...,g̃_m). Then, for each feasible solution of dSDP_{ω_max}(POP) there exists a feasible solution of dSDP_1(QOP) with the same objective value. Thus, min(dSDP_1(QOP)) ≤ min(dSDP_{ω_max}(POP)).

Proof: Let y ∈ R^{d(n,2ω_max)} be a feasible solution of dSDP_{ω_max}(POP). Each y_α corresponds to a monomial x^α, x ∈ R^n, for all α with |α| ≤ 2ω_max.
Moreover, by the substitution relations, for every monomial x̃^α, x̃ ∈ R^{n+k}, with |α| ≤ 2 there exists a monomial x^{β(α)}, x ∈ R^n, such that

  x̃^α = x^{β(α)}, |β(α)| ≤ 2ω_max.   (2.76)

As β(·) in (2.76) is constructed via the substitution relations,

  β(α^1) = β(α^2)   (2.77)

holds for α^1, α^2 ∈ N^{n+k} with |α^1| = |α^2| ≤ 2 whenever the QOP has a substitution constraint x̃^{α^1} = x̃^{α^2}. Now define ỹ ∈ R^{d(n+k,2)} by ỹ_α := y_{β(α)} for all |α| ≤ 2. Then ỹ is feasible for dSDP_1(QOP): all equality constraints derived from substitutions are satisfied due to (2.77), and the principal submatrices of the moment matrix M_1(ỹ) and of the localizing matrices M_0(ỹ g̃_k) (k = 1,...,m), which are obtained by simultaneously deleting rows/columns linearly dependent on the remaining rows/columns, are principal submatrices of M_{ω_max}(y) and M_{ω_max−ω_k}(y g_k) (k = 1,...,m), respectively. Finally, the objective values for y and ỹ coincide.

This result for the dense SDP relaxation can be extended to the sparse SDP relaxation of minimal relaxation order in an analogous manner, if the maximal cliques (C̃_1,...,C̃_p) of the chordally extended csp graph of the QOP are chosen appropriately with respect to the maximal cliques of the chordally extended csp graph of the POP. Therefore it seems reasonable to expect that in general sSDP_1 for the QOP provides an approximation to the global minimum of the POP which is far weaker than the one provided by sSDP_{ω_max} for the original POP. One possibility to strengthen the SDP relaxation for QOPs is to increase the relaxation order to some ω > 1. But, as in the case of the SDP relaxation for a POP, we cannot guarantee to find the global minimum of a general QOP for any fixed ω ∈ N. Moreover, each of the additional equality constraints results in ½ (d(n, ω−1) + 1) d(n, ω−1) equality constraints in sSDP_ω(QOP) for ω > 1. Therefore it seems more promising to consider additional techniques to improve the quality of sSDP_1 for QOPs.

Local optimization methods

As pointed out before, the minimum of the sparse SDP relaxation converges to the minimum of the QOP for ω → ∞. Moreover, for many POPs an accurate approximation can be obtained by the sparse SDP relaxation of order ω ∈ {ω_max,...,ω_max+3} [102]. However, the quality of the sparse SDP relaxation for the QOP is weaker than the one for the original POP. Therefore, the solution provided by the sparse SDP relaxation for the QOP can be understood as a first approximation to the global minimizer of the original POP, and it may serve as initial point for a locally convergent optimization method applied to the original POP. For instance, sequential quadratic programming (SQP) [8] can be applied to the POP, taking the sparse SDP solution for the corresponding QOP as starting point. In the case where a POP has equality constraints only, the number of constraints coincides with the number of variables and the feasible set is finite, we may succeed in finding the global optimizer of the POP by applying Newton's method for nonlinear systems [77] to the polynomial system given by the feasible set of the POP, again starting from the solution provided by the sparse SDP relaxation for the QOP.
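The refinement step itself is routine. The following sketch uses SciPy's SLSQP implementation in place of the SQP code [8] used in the thesis (an assumption made purely for illustration); the toy POP, its starting point and the function names are ours:

```python
# Sketch: polish an (imagined) SDP first guess by a local SQP-type method.
from scipy.optimize import minimize

# toy POP: min x1^4 + x2^4 s.t. x1 + x2 = 1; x0 mimics an SDP solution
obj = lambda x: x[0]**4 + x[1]**4
cons = [{"type": "eq", "fun": lambda x: x[0] + x[1] - 1.0}]
res = minimize(obj, x0=[0.4, 0.7], method="SLSQP", constraints=cons)
print(res.x)   # -> approximately [0.5, 0.5], the global minimizer of the toy POP
```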
Higher accuracy via Branch-and-Cut bounds

The sparse SDP relaxations (2.18) incorporate lower and upper bounds for each component of the n-dimensional variable,

  lbd_i ≤ x_i ≤ ubd_i ∀ i ∈ {1,...,n},   (2.78)

in order to establish the compactness of the feasible set of a POP. Compactness is a necessary condition to guarantee the convergence of the sequence of sparse SDP relaxations towards the global optimum of the POP. Moreover, the numerical performance for solving the sparse SDP relaxations depends heavily on the bounds (2.78): the tighter these bounds are chosen, the better the solution of the SDP approximates the minimizer of the POP. Prior to solving the sparse SDP relaxation for the QOP derived from a POP, we fix the bounds (2.78) for the components of the POP and determine lower and upper bounds for the additional variables according to the substitution relations. For instance, for x_{n+1} = x_i^2 the bounds are defined as

  lbd_{n+1} = 0 if lbd_i ≤ 0 ≤ ubd_i, and lbd_{n+1} = min(lbd_i^2, ubd_i^2) otherwise;
  ubd_{n+1} = max(lbd_i^2, ubd_i^2).   (2.79)

In 2.3.3 we discuss the sensitivity of the accuracy of the SDP solution to the choice of the lower and upper bounds for some example POPs.

A more sophisticated technique to increase the quality of the SDP relaxation of the QOP is inspired by a Branch-and-Cut algorithm for bilinear matrix inequalities due to Fukuda and Kojima [22]. As nonconvex quadratic constraints can be reduced to bilinear ones, we can adapt this technique for a QOP derived from a higher degree POP. The technique is based on cutting the feasible region of the SDP in such a way that every feasible solution of the QOP remains feasible for the SDP. We distinguish two sets of constraints, which resemble the convex relaxations (5) proposed in [22]. Let (p, g_1,...,g_m) be a QOP with lower and upper bounds lbd_i and ubd_i for all components x_i (i = 1,...,n). The first set of constraints is the following. For each constraint g_i (i = 1,...,m) of the form x_k = x_i x_j with i ≠ j we add the constraints

  x_k ≤ ubd_j x_i + lbd_i x_j − lbd_i ubd_j,
  x_k ≤ lbd_j x_i + ubd_i x_j − ubd_i lbd_j,   (2.80)

to the QOP. For each constraint of the form x_k = x_i^2 we add the constraint

  x_k ≤ (ubd_i + lbd_i) x_i − lbd_i ubd_i.   (2.81)

The second set of constraints shrinks the feasible set of the SDP relaxation even further than the constraints (2.80) and (2.81). For each monomial x_i x_j of degree 2 which occurs in the objective p or in one of the constraints g_i (i = 1,...,m) of the QOP, we add constraints as follows. If the QOP contains a constraint g_i (i = 1,...,m) of the form x_k = x_i x_j, we add the constraints (2.80) for i ≠ j and (2.81) for i = j. If the QOP does not contain a constraint x_k = x_i x_j, we add the quadratic constraints

  x_i x_j ≤ ubd_j x_i + lbd_i x_j − lbd_i ubd_j,
  x_i x_j ≤ lbd_j x_i + ubd_i x_j − ubd_i lbd_j,   (2.82)

for i ≠ j, and the constraint

  x_i^2 ≤ (ubd_i + lbd_i) x_i − lbd_i ubd_i   (2.83)

for i = j. When linearized, both the linear constraints (2.80) and (2.81) and the quadratic constraints (2.82) and (2.83) result in a smaller feasible region of the SDP relaxation which still contains the feasible region of the QOP. The efficiency of these sets of additional constraints is demonstrated in 2.3.3 as well.
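The following small sketch (our own helper functions, not thesis code) generates the propagated bounds and the linear BC-bounds. For a product x_k = x_i x_j with i ≠ j we use plain interval arithmetic, which the text above does not spell out and which is therefore an assumption; (2.79) is reproduced exactly for the square case i = j:

```python
# Sketch: bound propagation (2.79) and linear BC-bounds (2.80)/(2.81).
def propagated_bounds(i, j, lb, ub):
    if i == j:                                   # x_k = x_i^2, cf. (2.79)
        lo = 0.0 if lb[i] <= 0.0 <= ub[i] else min(lb[i]**2, ub[i]**2)
        return lo, max(lb[i]**2, ub[i]**2)
    corners = [lb[i]*lb[j], lb[i]*ub[j], ub[i]*lb[j], ub[i]*ub[j]]
    return min(corners), max(corners)            # interval arithmetic (assumption)

def linear_bc_cuts(k, i, j, lb, ub):
    """Cuts as (coefficient dict, rhs), encoding sum(coef * x) <= rhs."""
    if i != j:                                   # the two cuts (2.80)
        return [({k: 1.0, i: -ub[j], j: -lb[i]}, -lb[i]*ub[j]),
                ({k: 1.0, i: -lb[j], j: -ub[i]}, -ub[i]*lb[j])]
    return [({k: 1.0, i: -(ub[i] + lb[i])}, -lb[i]*ub[i])]   # cut (2.81)

lb, ub = {1: -1.0, 2: 0.0}, {1: 1.0, 2: 2.0}
print(propagated_bounds(1, 2, lb, ub))   # (-2.0, 2.0) for x3 = x1 * x2
print(linear_bc_cuts(3, 1, 2, lb, ub))
```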
Remark 2.8 A general QOP given by (p, g_1,...,g_m) of dimension n can be transformed into a QOP of dimension n+1 with linear objective function by adding the equality constraint h(x, x_{n+1}) := x_{n+1} − p(x) = 0 and choosing x_{n+1} as objective. A QOP with linear objective is a special case of a quadratic SDP (2.64), to which we can apply the SDP relaxations (a)-(d) from 2.2.7. Thus, an arbitrary POP can be attempted by three different SDP relaxations: 1) the sparse SDP relaxations (2.18) exploiting correlative sparsity applied directly to the POP, 2) the sparse SDP relaxations (2.18) applied to an equivalent QOP, and 3) the SDP relaxations (a)-(d) from 2.2.7 exploiting d- and/or r-space sparsity applied to an equivalent QOP.

Remark 2.9 The constraints (2.82) are a particular case of reformulation-linearization techniques (RLT) [89, 90]. For the SDP relaxation (b) from 2.2.7, instead of the constraints (2.80)-(2.83) we impose RLT constraints in the following way: add the constraints

  W_{i,j} − lbd_i W_{1,j} − lbd_j W_{1,i} ≥ −lbd_i lbd_j,
  W_{i,j} − ubd_i W_{1,j} − ubd_j W_{1,i} ≥ −ubd_i ubd_j,
  W_{i,j} − lbd_i W_{1,j} − ubd_j W_{1,i} ≤ −lbd_i ubd_j,
  W_{i,j} − lbd_j W_{1,i} − ubd_i W_{1,j} ≤ −lbd_j ubd_i,   (2.84)

to the SDP relaxation (b) whenever {i,j} is a subset of some clique of the chordal extension of the d-space sparsity pattern graph. The constraints (2.84) strengthen the SDP relaxation and preserve the d-space sparsity structure at the same time. In the following, whenever we apply the SDP relaxation (b) from 2.2.7 to a QOP, we impose the constraints (2.84).

2.3.3 Numerical examples

The substitution procedure and the sparse SDP relaxations are applied to a number of test problems, which encompass medium and large scale POPs of higher degree. The numerical performance of the sparse SDP relaxations of these POPs under the transformation algorithm is evaluated.

Substitution    ñ    size(A_q)      nnz(A_q)
AI             138   [777, 6934]      7106
AII            153   [828, 6922]      7116
BI             115   [753, 5785]      5934
BII            119   [788, 6497]      6653

Table 2.7: Size of sSDP_1 for QOPs obtained from POP randEQ(7,3,5,8,0) with n = 7.

In the following, the Branch-and-Cut bounds (2.80) and (2.81) are denoted as linear BC-bounds, and (2.82) and (2.83) as quadratic BC-bounds. The optional application of sequential quadratic programming starting from the solution of the SDP relaxation is abbreviated as SQP. Given a numerical solution x of an equality and inequality constrained POP, its scaled feasibility error is given by

  ε_sc = min { −|h_i(x)/σ_i(x)|, min{ g_j(x)/σ̂_j(x), 0 } ∀ i, j },

where h_i (i = 1,...,k) denote the equality constraints, g_j (j = 1,...,l) the inequality constraints, and σ_i and σ̂_j the maxima of the monomials of the corresponding polynomials h_i and g_j at x, respectively. Note that an equality and inequality constrained POP is a special case of the POP (2.1) if we define f_i := g_i (i = 1,...,l), g_i := h_i (i = l+1,...,l+k) and g_i := −h_i (i = k+l+1,...,2k+l). The value of the objective function at x is given by f_0(x). Let NC denote the number of constraints of a POP. The entry 'OOM' for the scaled feasibility error indicates that the SDP is too large to be solved by SeDuMi [95] and results in a memory error ('Out of memory'). A two-component entry for lbd or ubd indicates that the first component is used as a bound for the first n/2 variables and the second component for the remaining n/2 variables of the POP. All numerical experiments are conducted on a Linux OS with a 2.4 GHz CPU and 8 GB memory. The total processing time in seconds is denoted as t_C.

Randomly generated POPs

As a first class of test problems, consider randomly generated POPs with inequality or equality constraints. We are interested in the numerical performance of the sparse SDP relaxation for the corresponding QOPs under different substitution strategies and different choices of lower, upper and Branch-and-Cut bounds.
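Before turning to the individual problem classes, the following snippet sketches how the scaled feasibility error defined above can be computed. It reflects our reading of the definition: in particular we take σ_i(x) to be the maximum absolute value of the monomial terms of h_i at x, which is an interpretation, and we guard against division by zero:

```python
# Sketch: scaled feasibility error eps_sc of a numerical solution x.
import numpy as np

def poly_terms(p, x):
    """Values of the individual monomial terms c_alpha * x^alpha at x."""
    return np.array([c * np.prod(np.power(x, a)) for a, c in p.items()])

def eps_sc(eqs, ineqs, x):
    errs = [0.0]                        # eps_sc is nonpositive by definition
    for h in eqs:                       # equality constraints h_i(x) = 0
        t = poly_terms(h, x)
        errs.append(-abs(t.sum()) / max(np.max(np.abs(t)), 1e-300))
    for g in ineqs:                     # inequality constraints g_j(x) >= 0
        t = poly_terms(g, x)
        errs.append(min(t.sum() / max(np.max(np.abs(t)), 1e-300), 0.0))
    return min(errs)

# x2 - x1^2 = 0 and 1 - x1 >= 0 at x = (0.5, 0.25): feasible, so eps_sc = 0
h = {(2, 0): -1.0, (0, 1): 1.0}; g = {(0, 0): 1.0, (1, 0): -1.0}
print(eps_sc([h], [g], np.array([0.5, 0.25])))   # 0.0
```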
We focus on comparing strategies BI and BII, as they yield QOPs with a small number of additional variables. For the random equality constrained POP randEQ(7,3,5,8,0) [102] of degree 8 with 7 variables, the size of the SDP relaxation sSDP_4 is described by the matrix A_p of size [2870, 95628] with 124034 non-zero entries. This size is reduced substantially under each of the four substitution strategies, as can be seen in Table 2.7, in which the matrix A_q in SeDuMi input format [95] and its number of non-zero entries nnz(A_q) describe the size of the sparse SDP relaxation. The reduction of the size of the SDP relaxation reduces the total processing time t_C by two orders of magnitude, as can be seen in Table 2.8. Moreover, as reported in Table 2.8, the performance of AI, AII, BI and BII is similar, with BI slightly better than the others. In this example with few equality constraints it is easy to obtain a feasible solution, but additional techniques such as SQP are required to obtain an optimal solution. We know that min(sSDP_1(QOP)) and min(sSDP_4(POP)) are lower bounds for min(POP), and min(sSDP_1(QOP)) ≤ min(sSDP_4(POP)) holds by Proposition 2.1. As reported in Table 2.8, the bound provided by sSDP_1(QOP) is much weaker than the one provided by sSDP_4(POP). Note that the objective value f_0(x) and min(sSDP_1(QOP)) improve significantly if the lower and upper bounds are chosen tighter; when they are chosen sufficiently tight, an accurate optimal solution can be achieved without applying SQP. The main advantage of the transformation is the reduction of the total processing time by two orders of magnitude.

Substitution  SQP  BC-bounds  (lbd, ubd)    ω  n or ñ   NC   ε_sc    min(sSDP_ω)  f_0(x)   t_C
-             no   none       (−∞, ∞)       4     7      4   6e-11      −0.708    −0.708   333
AI            no   none       (−1, 1)       1   138    135   1e-13    −247.50     −0.508     3
AII           no   none       (−1, 1)       1   153    150   1e-13    −254.92     −0.517     3
BI            no   none       (−1, 1)       1   115    112   1e-13    −299.11     −0.567     2
BII           no   none       (−1, 1)       1   119    116   1e-13    −284.98     −0.455     3
BI            yes  none       (−1, 1)       1   115    112   7e-18    −299.11     −0.708     3
BI            no   none       (−0.5, 0.5)   1   115    112   9e-14      −6.55     −0.706     3
BI            no   none       (−0.3, 0.3)   1   115    112   1e-13      −1.28     −0.708     2

Table 2.8: Results for the SDP relaxation of randEQ(7,3,5,8,0).

The results for the inequality constrained POP randINEQ(8,4,6,8,0) [102] with ω_max = 4 and 8 variables are given in Table 2.9. In the column for (lbd, ubd), the entries (−0.5, 0.5)⋆ and (−0.5, 0.5)⋆⋆ denote ubd_2 = 0.75 ≠ 0.5 and (ubd_2, ubd_5) = (0.75, 0) ≠ (0.5, 0.5), respectively. By imposing linear Branch-and-Cut bounds we obtain a feasible solution, and tightening lbd and ubd improves the objective value of the approximate solution. Although we did not achieve the optimal value attained by sSDP(POP), it seems reasonable to expect that successively tightening the bounds further yields a feasible solution with optimal objective value. As in the previous example, t_C is reduced by two orders of magnitude.

Substitution  SQP  BC-bounds  (lbd, ubd)       ω  n or ñ   NC   ε_sc   f_0(x)   t_C
-             no   none       (−∞, ∞)          4     8      3     0     −1.5   1071
BI            no   none       (−0.75, 0.75)    1   239    234  −1.3     −0.9     14
BI            no   linear     (−0.75, 0.75)    1   239    680     0     −0.6     17
BI            no   linear     (−0.5, 0.5)⋆     1   239    680     0     −0.8     17
BI            no   linear     (−0.5, 0.5)⋆⋆    1   239    680     0     −1.2     16

Table 2.9: Results for the SDP relaxation of randINEQ(8,4,6,8,0).

BroydenBand

Another test problem is the BroydenBand(n) problem [66]. It is an unconstrained POP of degree 6 and dimension n, and its global minimum is 0. Numerical results are given in Table 2.10.
The performance of the sparse SDP relaxation for the QOP with the initial bounds and without applying SQP is poor; the objective value of the approximate solution and the lower bound min(sSDP_1(QOP)) are far from the global optimum. Also, SQP does not succeed in detecting the global optimum if started from an arbitrary point. As reported in Table 2.10, SQP detects a local minimizer with objective value 3 if the initial point is an SDP solution with loose bounds for the QOP. It is interesting to observe that tight lower and upper bounds, and Branch-and-Cut bounds in combination with applying SQP, are crucial to obtain the global minimum by solving the sparse SDP relaxation for the QOP. In fact, under substitution strategy BI, imposing quadratic Branch-and-Cut bounds yields the global minimum, whereas under BII Branch-and-Cut bounds are not necessary to obtain the global minimum. Note that the total processing time is reduced from around 1300 seconds to less than 5 seconds.

Substitution  SQP  BC-bounds  (lbd, ubd)   ω  n or ñ   NC    min(sSDP_ω)  f_0(x)   t_C
-             no   none       (−∞, +∞)     3    20      0      −3e-7       5e-9   1328
BII           yes  none       (−1, 1)      1    60     40      −128        3         4
BII           yes  linear     (−1, 1)      1    60    100      −128        3         4
BII           yes  quadratic  (−1, 1)      1    60   1244      −106        3         4
BI            no   none       (−0.75, 0)   1    80     60      −611       47         3
BI            no   linear     (−0.75, 0)   1    80     60      −611       47         4
BI            no   quadratic  (−0.75, 0)   1    80     60      −132       28         4
BI            yes  none       (−0.75, 0)   1    80     60     −1396        3         4
BI            yes  linear     (−0.75, 0)   1    80    140      −611        3         5
BI            yes  quadratic  (−0.75, 0)   1    80   1284      −611        6e-8      5
BII           no   none       (−0.75, 0)   1    60     40       −26       33         3
BII           no   linear     (−0.75, 0)   1    60    100       −24       24         3
BII           no   quadratic  (−0.75, 0)   1    60   1244        −8        9         4
BII           yes  none       (−0.75, 0)   1    60     40       −26        1e-10     5
BII           yes  linear     (−0.75, 0)   1    60    100       −24        1e-6      4
BII           yes  quadratic  (−0.75, 0)   1    60   1244        −8        2e-7      5

Table 2.10: Results for the SDP relaxation of BroydenBand(20).

POPs derived from partial differential equations

An important class of large scale polynomial optimization problems of higher degree is derived from discretizing systems of partial differential equations (PDEs). How to derive POPs from PDEs and how to interpret their solutions is the topic of Chapter 3 and is discussed in detail there. In this section we show the transformation procedure from POP to QOP to be a very efficient technique for this class of POPs. Many POPs derived from PDEs are of degree 3, but as the number of their constraints is of the same order as the number of their variables, the transformation into QOPs yields SDP relaxations of vastly reduced size. Due to the structure of the higher monomial set of these POPs, there is a unique way to transform them into QOPs. Therefore, we examine the impact of lower, upper and Branch-and-Cut bounds rather than the choice of the substitution strategy.
POP                  n    ñ    ω_p  size(A_p)             nnz(A_p)  ω_q  size(A_q)         nnz(A_q)
pdeBifurcation(6)    36   72   2    [2186, 17605]           23801   1    [422, 4039]         4174
pdeBifurcation(10)  100  200   2    [16592, 139245]        189737   1    [1643, 18646]      19039
pdeBifurcation(14)  196  392   2    [454497, 3822961]     5208475   1    [4126, 45189]      46000
Mimura(50)          100  150   2    [3780, 31258]           39068   1    [690, 5728]         6078
Mimura(50)          100  150   3    [19300, 280007]        354067   2    [7223, 76383]      91755
Mimura(100)         200  300   3    [39100, 565357]        713767   2    [14623, 155183]   186155
Mimura(100)         200  300   2    [7630, 63158]           78818   2    [1390, 11628]      12328
StiffDiff(6,12)     144  216   2    [18569, 163162]        219020   1    [878, 6700]         7402
ginzOrDiri(9)       162  324   2    [74628, 666987]        906558   1    [4567, 49305]      50233
ginzOrNeum(11)      242  484   2    [166092, 1451752]     2504418   1    [8063, 96367]      97776

Table 2.11: Size of the SDP relaxations for POP and QOP, respectively.

Consider the POPs in Table 2.11, where ω_p and ω_q denote the relaxation orders of sSDP_ω for the POP and the QOP, respectively; the reduction of the size of the SDP relaxation is described by the size of the matrix A in SeDuMi input format [95] and its number of non-zero entries nnz(A). The SDP relaxations for the QOPs can be solved in vastly shorter time than those for the original POPs. The computational results of the original SDP relaxation and of the SDP relaxation of the QOPs for different lower, upper and Branch-and-Cut bounds are reported in Table 2.12 for the POP pdeBifurcation(·). In this example the accuracy of the sparse SDP relaxation for the QOP is improved by tightening the upper bounds for the components of the variable x̃ in the QOP. Also, the additional application of SQP improves the accuracy considerably, whereas additional Branch-and-Cut bounds seem to have no impact on the quality of the solution. The total processing time is reduced substantially under the transformation. The original sparse SDP relaxation for pdeBifurcation(14) of dimension 196 cannot be solved in SeDuMi due to a memory error, but the SDP relaxation for the corresponding QOP with tight upper bounds can be solved accurately in about 100 seconds.

POP                 Substitution  SQP  BC-bounds  ubd    ω  n or ñ  ε_sc     f_0(x)   t_C
pdeBifurcation(6)   -             no   none       0.99   2    36    8e-11     −9.0      14
pdeBifurcation(6)   AI            no   none       0.99   1    72    9.6e-2   −22.1       2
pdeBifurcation(6)   AI            no   linear     0.99   1    72    9.6e-2   −22.1       2
pdeBifurcation(6)   AI            no   quadratic  0.99   1    72    9.6e-2   −22.1       2
pdeBifurcation(6)   AI            yes  none       0.99   1    72    7.3e-9    −9.0       5
pdeBifurcation(6)   AI            no   none       0.45   1    72    1.5e-2    −9.5       1
pdeBifurcation(6)   AI            yes  none       0.45   1    72    1.4e-11   −9.0       2
pdeBifurcation(10)  -             no   none       0.99   2   100    3.1e-10  −21.6    2159
pdeBifurcation(10)  AI            no   none       0.99   1   200    4.7e-2   −56.0      20
pdeBifurcation(10)  AI            yes  none       0.99   1   200    2.7e-13  −21.6      66
pdeBifurcation(10)  AI            no   none       0.45   1   200    6.4e-3   −23.2      13
pdeBifurcation(10)  AI            yes  none       0.45   1   200    1e-11    −21.6      22
pdeBifurcation(14)  -             no   none       0.99   2   196    OOM
pdeBifurcation(14)  AI            no   none       0.99   1   392    2.4e-2  −103.1      90
pdeBifurcation(14)  AI            yes  none       0.99   1   392    7.9e-14  −39.9     418
pdeBifurcation(14)  AI            no   none       0.45   1   392    3.6e-3   −43.1      85
pdeBifurcation(14)  AI            yes  none       0.45   1   392    5.2e-11  −39.9     107

Table 2.12: Results for the SDP relaxation for POP pdeBifurcation with lbd = 0.

In the case of the POP Mimura(50), cf. Table 2.13, quadratic Branch-and-Cut bounds are necessary in addition to applying SQP in order to obtain an accurate approximation of the global minimizer. For the POPs in Table 2.14 it is sufficient to apply SQP starting from the solution of the sparse SDP relaxation for the QOP.
For these problems t_C can be reduced by up to two orders of magnitude. Furthermore, the original SDP relaxations for ginzOrDiri(9) and ginzOrDiri(13) are too large to be solved, whereas the SDP relaxations for the corresponding QOPs are tractable.

POP          Substitution  SQP  BC-bounds  ubd       ω  n or ñ  ε_sc      f_0(x)  t_C
Mimura(50)   -             no   none       [11, 14]  2   100    1.8e-1     −899     20
Mimura(50)   -             yes  none       [11, 14]  2   100    4.1e-9     −701     31
Mimura(50)   AI            no   none       [11, 14]  1   150    6.1e-1    −1067      2
Mimura(50)   AI            yes  none       [11, 14]  1   150    5.1e-3     −731    163
Mimura(50)   AI            no   quadratic  [11, 14]  1   150    3.3e-1    −1017      2
Mimura(50)   AI            yes  quadratic  [11, 14]  1   150    1.0e-13    −719     16
Mimura(100)  -             no   none       [11, 14]  3   200    4.5e-2     −733    532
Mimura(100)  -             yes  none       [11, 14]  3   200    2.0e-11    −712    557

Table 2.13: Results for the SDP relaxation for POP Mimura with lbd = [0, 0].

The POP ginzOrNeum(·) in Table 2.15 is another example where the global optimizer can be found in a processing time reduced by a factor of 100 if the lower bounds lbd and upper bounds ubd are chosen sufficiently tight and SQP is applied. In Table 2.13 and Table 2.15 the first components of lbd and ubd correspond to the lower and upper bounds for (x_1,...,x_{n/2}), whereas the second components correspond to the lower and upper bounds for (x_{n/2+1},...,x_n).

POP              Substitution  SQP  ubd  ω  n or ñ  ε_sc    f_0(x)  t_C
ginzOrDiri(5)    -             no   0.6  2    50    6e-6     −25    598
ginzOrDiri(5)    -             yes  0.6  2    50    4e-15    −25    598
ginzOrDiri(5)    AI            no   0.6  1   100    3e-1    −100      7
ginzOrDiri(5)    AI            yes  0.6  1   100    4e-11    −22     10
ginzOrDiri(9)    -             no   0.6  2   162    OOM
ginzOrDiri(9)    AI            no   0.6  1   324    1e-1    −324    144
ginzOrDiri(9)    AI            yes  0.6  1   324    6e-12    −72    185
ginzOrDiri(13)   -             no   0.6  2   338    OOM
ginzOrDiri(13)   AI            yes  0.6  1   676    7e-9    −158   1992
StiffDiff(4,8)   -             yes  5    2    64    2e-11    −32     54
StiffDiff(4,8)   AI            yes  5    2    96    7e-10    −32      4
StiffDiff(6,12)  -             yes  5    1   144    4e-9     −71   1008
StiffDiff(6,12)  AI            yes  5    1   216    8e-10    −71     48

Table 2.14: Results for the SDP relaxations for POP ginzOrDiri with lbd = 0 and StiffDiff with lbd = 0.

POP             Substitution  SQP  lbd       ubd       ω  n or ñ  ε_sc    f_0(x)  t_C
ginzOrNeum(5)   -             no   [0, 0]    [4, 2]    2    50    2.6      −47    448
ginzOrNeum(5)   -             yes  [0, 0]    [4, 2]    2    50    2e-13    −45    449
ginzOrNeum(5)   AI            no   [0, 0]    [4, 2]    1   100    24      −100      9
ginzOrNeum(5)   AI            yes  [0, 0]    [4, 2]    1   100    8e-10    −45     10
ginzOrNeum(5)   -             no   [1, 0.5]  [4, 1.5]  2    50    1e-1     −45    582
ginzOrNeum(5)   -             yes  [1, 0.5]  [4, 1.5]  2    50    2e-13    −45    583
ginzOrNeum(5)   AI            no   [1, 0.5]  [4, 1.5]  1   100    6e-2     −57      6
ginzOrNeum(5)   AI            yes  [1, 0.5]  [4, 1.5]  1   100    4e-10    −45      7
ginzOrNeum(11)  -             no   [1, 0.5]  [4, 1.5]  2   242    OOM
ginzOrNeum(11)  AI            no   [1, 0.5]  [4, 1.5]  1   484    4e-2    −263    740
ginzOrNeum(11)  AI            yes  [1, 0.5]  [4, 1.5]  1   484    5e-11   −207    748

Table 2.15: Results for the SDP relaxation for POP ginzOrNeum.

Chapter 3

SDP Relaxations for Solving Differential Equations

3.1 Numerical analysis of differential equations

Differential equations arise in models of many problems in engineering, physics, chemistry, biology and economics. Only the simplest differential equations admit solutions given by explicit formulas; in most problems involving differential equations, closed-form expressions for the solutions are not available. Therefore, one is interested in finding approximations to their solutions by applying numerical methods.
In general we distinguish ordinary differential equations (ODEs), which are differential equations where the unknown function depends on a single variable, and partial differential equations (PDEs), where the unknown function depends on several independent variables and the equation involves its partial derivatives. Moreover, we distinguish linear and nonlinear differential equations: a differential equation is linear if the unknown function and its derivatives appear to the power one, and nonlinear otherwise.

The beginning of the numerical analysis of ODEs dates back to 1850, when the Adams formulas were proposed, which are based on polynomial interpolation in equally spaced points. The idea is, given an initial value problem with ODE u′ = f(t, u) for t > t_0 and u(t_0) = u_0, to choose a time step Δt > 0 and consider a finite set of time values t_n = t_0 + nΔt, n ≥ 0. We then replace the ODE by an algebraic expression that enables us to calculate a succession of approximate values v_n ≈ u(t_n), n ≥ 0. The simplest such approximate formula dates back to Euler:

  v_{n+1} = v_n + Δt f(t_n, v_n) = v_n + Δt f_n, f_n := f(t_n, v_n).

The Adams formulas are higher order generalizations of Euler's formula that are far more efficient at generating accurate approximate solutions. For instance, the fourth-order Adams-Bashforth formula is

  v_{n+1} = v_n + (Δt/24)(55 f_n − 59 f_{n−1} + 37 f_{n−2} − 9 f_{n−3}).   (3.1)

The formula (3.1) is fourth order in the sense that it normally converges at the rate O((Δt)^4).

The second important class of ODE algorithms are the Runge-Kutta methods, which were developed at the beginning of the twentieth century [50, 86]. The most commonly used member of the family of Runge-Kutta methods is the fourth-order Runge-Kutta method, which advances a numerical solution from time step t_n to t_{n+1} with the aid of four evaluations of the function f:

  a = Δt f(t_n, v_n),
  b = Δt f(t_n + ½Δt, v_n + ½a),
  c = Δt f(t_n + ½Δt, v_n + ½b),
  d = Δt f(t_n + Δt, v_n + c),
  v_{n+1} = v_n + (1/6)(a + 2b + 2c + d).   (3.2)

Another seminal step in the numerical analysis of ODEs is the concept of stability due to Dahlquist [17]. He introduced what might be called the fundamental theorem of numerical analysis:

  consistency + stability = convergence.

This theory is based on precise definitions of these three notions. Consistency is the property that the discrete formula has locally positive order of accuracy and thus models the right ODE. Stability is the property that discretization errors introduced at one time step cannot grow unboundedly at later time steps. Convergence is the property that the numerical solution converges to the correct result as Δt → 0.
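The Runge-Kutta step (3.2) is easily implemented; the following short sketch (our own code, with u′ = −u, u(0) = 1 as a test problem whose exact solution is exp(−t)) also exhibits the O((Δt)^4) convergence numerically:

```python
# Sketch: the classical fourth-order Runge-Kutta step (3.2).
import math

def rk4_step(f, t, v, dt):
    a = dt * f(t, v)
    b = dt * f(t + dt / 2, v + a / 2)
    c = dt * f(t + dt / 2, v + b / 2)
    d = dt * f(t + dt, v + c)
    return v + (a + 2 * b + 2 * c + d) / 6

f = lambda t, u: -u                      # test ODE u' = -u, u(0) = 1
for dt in (0.1, 0.05):
    t, v = 0.0, 1.0
    while t < 1.0 - 1e-12:
        v = rk4_step(f, t, v, dt); t += dt
    print(dt, abs(v - math.exp(-1.0)))   # error drops ~16x when dt is halved
```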
When it comes to the numerical analysis of PDEs, we distinguish three main classes of methods: finite difference methods (FDM), finite element methods (FEM) and finite volume methods (FVM). The origin of the finite difference method dates back to the paper [13] of Courant, Friedrichs and Lewy. A finite difference scheme is applied to formulate PDE problems as polynomial optimization problems in 3.2; we give an introduction to the FDM in 3.1.1. The FEM dates back to the 1960s and is briefly introduced in 3.1.2. As in the case of ODEs, stability is a crucial issue in the numerical analysis of PDEs. The group around von Neumann and Lax discovered that some finite difference methods for PDEs are subject to catastrophic instabilities. The fundamental result linking convergence and stability of a finite difference scheme is the Lax equivalence theorem [57]. Numerical methods for finding approximate solutions to PDEs have successfully offered insights for important and difficult examples of PDE problems: the Schrödinger equation in chemistry, elasticity equations in structural mechanics, the Navier-Stokes equations in fluid dynamics, Maxwell's equations in telecommunications, Einstein's equations in cosmology, nonlinear wave equations in optical fibers, Black-Scholes equations in option pricing, and reaction-diffusion equations in biological systems. Because such a variety of nonlinear PDE problems arises in many disciplines of science and engineering, and requires solving large-scale nonlinear algebraic systems, the numerical analysis of partial differential equations remains a very challenging field.

3.1.1 The finite difference method

All discretization based methods for solving differential equations aim at algebraizing the differential equation. In the finite difference method (FDM) [19, 65, 96] the most important step towards algebraizing the equation is achieved by replacing differentials with finite differences. In a first step the domain of the differential equation needs to be discretized. Note that the FDM requires the geometry of the domain to be simple. We restrict ourselves to intervals [x_min, x_max] in the one-dimensional case, and to rectangular domains [x_min, x_max] × [y_min, y_max] in the two-dimensional case. We choose a discretization N_x or (N_x, N_y), respectively, and define

  h_x = (x_max − x_min)/(N_x − 1), h_y = (y_max − y_min)/(N_y − 1),
  x_i := x_min + (i−1) h_x, y_j := y_min + (j−1) h_y (i = 1,...,N_x; j = 1,...,N_y),
  u_i := u(x_i), u_{i,j} := u(x_i, y_j) (i = 1,...,N_x; j = 1,...,N_y),   (3.3)

where u denotes the unknown function in the differential equation. There are three choices for approximating the first derivative u_x at a grid point x_i:

  u_x(x_i) ≈ (u_{i+1} − u_i)/h_x (forward difference),
  u_x(x_i) ≈ (u_i − u_{i−1})/h_x (backward difference),   (3.4)
  u_x(x_i) ≈ (u_{i+1} − u_{i−1})/(2h_x) (central difference).

An approximation to u_xx is derived by successively forming (u_x)_x:

  u_xx(x_i) ≈ (u_{i+1} − 2u_i + u_{i−1})/h_x^2.   (3.5)

Given these approximations, the question arises which errors are inherent in substituting differentials by finite differences. The following proposition provides a simple accuracy analysis.

Proposition 3.1 Let u be a four times continuously differentiable function on Ω = [x_min, x_max]. Then

  a) |(u_{i+1} − u_{i−1})/(2h_x) − u_x(x_i)| ≤ (1/6) h_x^2 max_{x∈Ω} |u_xxx(x)|,
  b) |(u_{i+1} − u_i)/h_x − u_x(x_i)| ≤ (1/2) h_x max_{x∈Ω} |u_xx(x)|,
  c) |(u_i − u_{i−1})/h_x − u_x(x_i)| ≤ (1/2) h_x max_{x∈Ω} |u_xx(x)|,
  d) |(u_{i+1} − 2u_i + u_{i−1})/h_x^2 − u_xx(x_i)| ≤ (1/12) h_x^2 max_{x∈Ω} |u_xxxx(x)|.

Proof: Without loss of generality set x_{i−1} := −h_x, x_i := 0, x_{i+1} := h_x. Using Taylor's theorem we expand u_{i−1} and u_{i+1} around 0 and obtain

  u_{i−1} = u_i − h_x u_x(x_i) + (1/2!) h_x^2 u_xx(x_i) − (1/3!) h_x^3 u_xxx(ξ_1) for some ξ_1 ∈ [x_{i−1}, x_i],
  u_{i+1} = u_i + h_x u_x(x_i) + (1/2!) h_x^2 u_xx(x_i) + (1/3!) h_x^3 u_xxx(ξ_2) for some ξ_2 ∈ [x_i, x_{i+1}].

Subtracting these two equations yields

  (u_{i+1} − u_{i−1})/(2h_x) = u_x(x_i) + (1/(2·3!)) h_x^2 [u_xxx(ξ_1) + u_xxx(ξ_2)],

which implies a). Now expand u_{i+1} around 0 with second order remainder:

  u_{i+1} = u_i + h_x u_x(x_i) + (1/2!) h_x^2 u_xx(ξ_1) for some ξ_1 ∈ [x_i, x_{i+1}].

This equation implies b):

  |(u_{i+1} − u_i)/h_x − u_x(x_i)| ≤ (1/2) h_x max_{x∈Ω} |u_xx(x)|.   (3.6)

c) is shown analogously to b).
As for d), expand $u_{i+1}$ and $u_{i-1}$ around $x_i = 0$ with fourth order remainder:
$$u_{i-1} = u_i - h_x u_x(x_i) + \tfrac{1}{2!} h_x^2 u_{xx}(x_i) - \tfrac{1}{3!} h_x^3 u_{xxx}(x_i) + \tfrac{1}{4!} h_x^4 u_{xxxx}(\xi_1), \qquad \text{for some } \xi_1 \in [x_{i-1}, x_i],$$
$$u_{i+1} = u_i + h_x u_x(x_i) + \tfrac{1}{2!} h_x^2 u_{xx}(x_i) + \tfrac{1}{3!} h_x^3 u_{xxx}(x_i) + \tfrac{1}{4!} h_x^4 u_{xxxx}(\xi_2), \qquad \text{for some } \xi_2 \in [x_i, x_{i+1}].$$
Addition of these two equations yields
$$\frac{u_{i+1} - 2 u_i + u_{i-1}}{h_x^2} - u_{xx}(x_i) = \frac{h_x^2}{4!} \left( u_{xxxx}(\xi_1) + u_{xxxx}(\xi_2) \right),$$
which implies d).

Proposition 3.1 implies that, if $u_{xxx}(x)$ is bounded between $x_{i-1}$ and $x_{i+1}$, the error of replacing $u_x(x_i)$ by the central difference scheme is of order $h_x^2$, whereas the error of using a forward or a backward difference scheme is only $O(h_x)$.

In addition to discretizing the domain of a differential equation and approximating its differentials by finite difference schemes, one needs to take into account conditions for the unknown function on the boundary $\partial\Omega$ of the domain. The most common types of boundary conditions are Dirichlet and Neumann conditions. Given a boundary point $x \in \partial\Omega$, its boundary condition is called Dirichlet if $u(x)$ is fixed on $\partial\Omega$, and it is called Neumann if the partial derivative $\frac{\partial u(x)}{\partial n}$ orthogonal to $\partial\Omega$ is fixed on $\partial\Omega$. We consider periodic boundary conditions as a third type, which are given if values of $u$ at the lower and upper end of its domain are identified. For example, in the one-dimensional case with $\Omega = [x_{\min}, x_{\max}]$, periodic boundary conditions are given by $u(x_{\min}) = u(x_{\max})$. A differential equation equipped with boundary conditions on the entire $\partial\Omega$ is called a boundary value problem.

In the one-dimensional case we have $N_x$ and in the two-dimensional case we have $N_x N_y$ unknown variables $u_i$ and $u_{i,j}$, respectively. Replacing the differential equation at each interior grid point by its finite difference discretization generates $N_x - 2$ and $(N_x - 2)(N_y - 2)$ equations, respectively. By exploiting the relations between the boundary and the interior variables given by Dirichlet, Neumann or periodic boundary conditions and substituting them into the equations, we can reduce the number of variables to $N_x - 2$ and $(N_x - 2)(N_y - 2)$, respectively. Thus, the number of variables coincides with the number of equations. In the case of a linear differential equation, finding the variables $u_{i,j}$ is therefore equivalent to solving a system of linear equations. In the case of a nonlinear differential equation, the far more challenging problem of solving a system of nonlinear algebraic equations arises.

As mentioned in the introduction of this section, in order to establish the convergence $u_{i,j} \to u(x_i, y_j)$ $(1 \leq i \leq N_x,\ 1 \leq j \leq N_y)$ for $(N_x, N_y) \to \infty$, the notions of consistency and stability are crucial. Let
$$\begin{array}{rcll} D(u(x,y)) &=& f(x,y) & \forall (x,y) \in \Omega,\\ B(u(x,y)) &=& g(x,y) & \forall (x,y) \in \partial\Omega, \end{array} \qquad (3.7)$$
be a differential equation, where $D(\cdot)$ and $B(\cdot)$ are differential operators and $f, g$ functions on $\Omega$. Applying a finite difference discretization to (3.7) yields the system of equations
$$\begin{array}{rcll} D_{i,j}((u_{k,l})_{k,l}) &=& f_{i,j} & \forall (i,j) \in \{2, \ldots, N_x - 1\} \times \{2, \ldots, N_y - 1\},\\ B_{i,j}((u_{k,l})_{k,l}) &=& g_{i,j} & \forall (i,j) \in \{1, N_x\} \times \{1, \ldots, N_y\} \cup \{1, \ldots, N_x\} \times \{1, N_y\}, \end{array} \qquad (3.8)$$
where $f_{i,j} := f(x_i, y_j)$, $g_{i,j} := g(x_i, y_j)$, and $D_{i,j}$ and $B_{i,j}$ are finite difference approximations of the operators $D$ and $B$, respectively.
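Before formalizing consistency and stability, the error orders established in Proposition 3.1 can be verified numerically in a few lines; in the following sketch, $u(x) = \sin(x)$ is a hypothetical test function chosen only for illustration.

```python
import numpy as np

u, du, d2u = np.sin, np.cos, lambda x: -np.sin(x)  # hypothetical smooth test function
x = 1.0
for h in [1e-1, 1e-2, 1e-3]:
    fwd = (u(x + h) - u(x)) / h                     # forward difference, O(h)
    ctr = (u(x + h) - u(x - h)) / (2 * h)           # central difference, O(h^2)
    sec = (u(x + h) - 2 * u(x) + u(x - h)) / h**2   # second difference, O(h^2)
    print(h, abs(fwd - du(x)), abs(ctr - du(x)), abs(sec - d2u(x)))
# The first error column shrinks linearly in h, the other two quadratically,
# in agreement with Proposition 3.1 a)-d).
```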
Definition 3.1 For the finite difference discretization (3.8) of the PDE problem (3.7) we define the local truncation error $r_{i,j}$ as
$$r_{i,j} := D_{i,j}((u(x_k, y_l))_{k,l}) - f_{i,j},$$
where $(u(x_k, y_l))_{k,l}$ is the vector of values of the exact solution $u$ of (3.7) at the grid points $(x_k, y_l)$, which is approximated by the solution $(u_{k,l})_{k,l}$ of (3.8). The finite difference equation (3.8) is consistent with the original equation (3.7) if $r_{i,j} \to 0$ as $(h_x, h_y) \to (0,0)$.

Consistency is a prerequisite for $u_{i,j}$ to converge to $u(x_i, y_j)$ as $(h_x, h_y) \to 0$, but it is not sufficient. We need to introduce the notion of stability of a difference scheme for that purpose:

Definition 3.2 A finite difference scheme $D_{i,j}((u_{k,l})_{k,l}) = f_{i,j}$ for a first order PDE is stable if there are a $J \in \mathbb{N}$ and positive numbers $h_{x0}$ and $h_{y0}$ such that there exists a constant $C$ for which
$$h_x \sum_{l=1}^{N_y} | u_{k,l} |^2 \leq C\, h_x \sum_{j=1}^{J} \sum_{l=1}^{N_y} | u_{j,l} |^2$$
for $k \in \{1, \ldots, N_x\}$, $0 < h_x \leq h_{x0}$, and $0 < h_y \leq h_{y0}$. A finite difference scheme $D_{i,j}((u_{k,l})_{k,l}) = f_{i,j}$ for a PDE which is second order in $x$ is stable if there are a $J \in \mathbb{N}$ and positive numbers $h_{x0}$ and $h_{y0}$ such that there exists a constant $C$ for which
$$h_x \sum_{l=1}^{N_y} | u_{k,l} |^2 \leq (1 + k^2)\, C\, h_x \sum_{j=1}^{J} \sum_{l=1}^{N_y} | u_{j,l} |^2$$
for $k \in \{1, \ldots, N_x\}$, $0 < h_x \leq h_{x0}$, and $0 < h_y \leq h_{y0}$.

For characterizing stability the following notion is useful.

Definition 3.3 An explicit finite difference scheme is any scheme that can be written as $u_{k+1,l} =$ a finite sum of $u_{r,s}$ with $r \leq k$. A nonexplicit scheme is called implicit.

In general, it is the stability of a finite difference scheme which requires some effort to show, whereas consistency is straightforward. The oldest and most famous criterion for stability is the Courant-Friedrichs-Lewy condition:

Theorem 3.1 For an explicit scheme for the hyperbolic PDE defined by
$$u_x + a\, u_y = 0 \qquad (3.9)$$
of the form $u_{k+1,l} = \alpha u_{k,l-1} + \beta u_{k,l} + \gamma u_{k,l+1}$, a necessary condition for stability is the Courant-Friedrichs-Lewy (CFL) condition,
$$\left| a\, \frac{h_x}{h_y} \right| \leq 1.$$
Proof: See [13].

Moreover, Courant, Friedrichs and Lewy derived the general result:

Theorem 3.2 There are no explicit, unconditionally stable, consistent finite difference schemes for hyperbolic systems (3.9) of partial differential equations.

There is no general negative result like Theorem 3.2 for implicit schemes. Thus, from a stability point of view, it is advisable to choose central or backward difference approximations for the first order derivatives. The result which links consistency, stability and convergence is the Lax-Richtmyer Equivalence Theorem:

Theorem 3.3 A consistent finite difference scheme for a linear partial differential equation of first or second order for which the initial value problem is well-posed is convergent if and only if it is stable.

Proof: See [96].

Thus, convergence of a finite difference scheme is usually proven by showing stability. There is no general convergence result for finite difference schemes of nonlinear partial differential equations. However, for certain classes of PDEs convergence of the FDM has been shown:

Theorem 3.4 Let a parabolic PDE problem be given by
$$\begin{array}{rcll} a(x,y)\, u_{xx} + d(x,y)\, u_x - u_y + f(y,u) &=& 0 & \forall (x,y) \in (0,1) \times (0,T),\\ u_x(0,y) = u_x(1,y) &=& 0 & \forall y \in (0,T),\\ u(x,0) &=& u_0(x) & \forall x \in (0,1). \end{array}$$
If $a$, $a_x$, $a_y$, $d$, $d_x$ and $d_y$ are continuous in $[0,1] \times [0,T]$, there exists an $a_0$ such that $a \geq a_0$ in $[0,1] \times [0,T]$, $f$, $f_u$ and $f_{uu}$ are continuous in $[0,T] \times \mathbb{R}$, and there exists $M_0 \in \mathbb{R}$ such that
$\partial f / \partial u \leq M_0$ in $[0,T] \times \mathbb{R}$, some technical conditions hold, and the solution $u$ of the PDE problem is smooth, then $(u_{i,j}(N_x, N_y))_{i,j}$ converges uniformly to $u$ in $[0,1] \times [0,T]$ as $(N_x, N_y) \to \infty$.

Proof: Theorem 2.1 in [97].

Note that Theorem 3.4 can be extended to the case of arbitrary rectangular domains and arbitrary Dirichlet and Neumann conditions at $x_{\min}$ and $x_{\max}$. Proving convergence for classes of elliptic or hyperbolic PDE problems is far more difficult, and no result for a broad class of problems like Theorem 3.4 has been found in those cases.

Some simple accuracy analysis of finite difference approximations was provided in Proposition 3.1, where we have seen that the accuracy of central differences is better than that of forward and backward differences. On the other hand, it is a well known phenomenon that central difference approximations for the first derivatives may cause oscillations in the numerical solution of a boundary value problem [65]. These oscillations do not occur under forward or backward difference schemes. However, we have also discussed that a forward finite difference scheme may not be stable, whereas implicit finite difference schemes are unconditionally stable for many classes of PDE problems. Thus, when choosing a difference scheme we have to consider accuracy, stability and how to avoid oscillations. It depends on the PDE problem which finite difference scheme is "the best" one. Therefore, numerous difference schemes have been proposed, which may use different difference approximations on certain segments of the domain, may use different difference approximations in the different dimensions, may depend on the type of the PDE or may depend on the type of boundary conditions. These issues are analyzed in detail for a variety of finite difference schemes for different classes of PDE problems in [96].

To summarize, when solving a linear or nonlinear ODE or PDE problem with the FDM, the three main tasks are 1) to choose a finite difference scheme whose solutions are accurate approximations of solutions of the PDE problem, 2) to show convergence of the difference scheme, and 3) to solve the (nonlinear) algebraic system of equations (3.8). Solving a system of nonlinear equations is a very hard problem in general; as mentioned in 2.1, solving a system of nonconvex polynomial equations is NP-hard. A standard method to solve the system of algebraic equations resulting from a finite difference approximation of a nonlinear PDE is to solve the system of equations corresponding to the linear part of the PDE, and to take the solution of the linear system as a starting point for gradient type methods, Newton's method or other iterative methods applied to the nonlinear system, or to a system where the nonlinear part is successively increased. The eventual success of such a continuation type method is based on the assumption that the solution of the nonlinear system does not change much if the nonlinear part is increased by a small factor. In this thesis we attempt problem 3) by reformulating (3.8) as a polynomial optimization problem (POP) and solving this POP by the sparse semidefinite programming relaxation techniques introduced in Chapter 2. One of the main advantages of this approach is that we do not require any initial guess and that the nonlinearity of the scheme is taken into account directly. This is presented in detail in 3.2.
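Before moving on, the stability dichotomy of Theorem 3.1 can be observed directly in a small experiment: an explicit upwind scheme for (3.9) stays bounded when $|a\, h_x / h_y| \leq 1$ and blows up otherwise. The grid size, step count and initial profile in the following Python sketch are hypothetical choices for illustration.

```python
import numpy as np

def march(cfl, steps=200, ny=100):
    """Explicit upwind scheme u_{k+1,l} = u_{k,l} - cfl*(u_{k,l} - u_{k,l-1})
    for u_x + a*u_y = 0 with a > 0, where cfl = a*hx/hy; periodic in y."""
    y = np.linspace(0.0, 1.0, ny)
    u = np.exp(-100.0 * (y - 0.5) ** 2)   # smooth illustrative initial profile
    for _ in range(steps):
        u = u - cfl * (u - np.roll(u, 1))
    return np.max(np.abs(u))

print(march(0.9))  # stays bounded: CFL condition |a*hx/hy| <= 1 satisfied
print(march(1.1))  # grows explosively: CFL condition violated
```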
3.1.2 The finite element method and other numerical solvers

The finite element method

In addition to the finite difference method, on which our technique to be proposed in 3.2 is based, we briefly introduce alternative methods for solving a PDE problem numerically. The most important one is the finite element method (FEM). The FEM is a very active field of research and there is exhaustive literature on it; see for example [11, 59, 106] for detailed introductions and more advanced studies. The FEM is based on the idea of approximating a solution $u$ of a PDE problem by a function $\tilde{u}$ which is an element of a finite-dimensional subspace of the function space $u$ belongs to. That is, the FEM can be understood as a method which discretizes the space in which we search for solutions of a PDE problem, whereas in the FDM the PDE itself is discretized. The origins of the FEM date back to works of Rayleigh [83], Ritz [84] and Galerkin [23] at the beginning of the 20th century. The FEM in its modern formulation is due to Courant [16] and Turner, Clough, Martin and Topp [99], among others.

Typically, a FEM approach to solve a PDE consists of the following steps. First of all, one is looking for a solution $u$ of a PDE problem defined on a domain $\Omega$ in a certain function space. The most common function space to this end is the Sobolev space $H_0^s(\Omega) \subset L^s(\Omega)$. Given that function space, one replaces the PDE problem by a weak, variational formulation where the test functions are elements of the same space $H_0^s(\Omega)$. In a second step, known as meshing, the domain $\Omega$ is partitioned into a finite number of subdomains of simple geometry, which are called elements. We denote such a partition by $T$. In the one-dimensional case intervals, in the two-dimensional case triangles, and in the three-dimensional case tetrahedra are a common choice for the elements. These elements define a mesh for $\Omega$ with $n_d$ nodes. Then, in a third step, $H_0^s(\Omega)$ is approximated by the $n_d$-dimensional subspace which is spanned by functions $f_1, \ldots, f_{n_d}$. A common choice for this basis are for instance piecewise linear functions $f_i$ which equal one at node $i$ and are 0 at all other nodes. The larger $n_d$, i.e., the finer we choose the mesh, the better $\mathrm{span}(f_1, \ldots, f_{n_d})$ approximates the space $H_0^s(\Omega)$. When replacing $H_0^s(\Omega)$ by $\mathrm{span}(f_1, \ldots, f_{n_d})$ in the weak formulation of the PDE problem and approximating $u$ by $\tilde{u} = \sum_{i=1}^{n_d} d_i f_i$, one obtains a finite number of equations in the unknowns $d_1, \ldots, d_{n_d}$. Solving this system of equations yields the numerical approximation $\tilde{u}$ of a solution of the original PDE problem. Finally, as for the Finite Difference Method, convergence of a finite element discretization needs to be shown, i.e., one has to show that $\mathrm{span}(f_1, \ldots, f_{n_d})$ converges to a subspace dense in $H_0^s(\Omega)$ if the number $n_d$ of nodes in the mesh and the corresponding number of basis functions go to infinity. One of the biggest advantages of the FEM is its sound mathematical basis. As the PDE is formulated as a variational problem, one has many powerful tools from functional analysis at hand to prove convergence of a finite element discretization.

Let us demonstrate the outlined procedure on the following simple ODE problem. Find $u \in H_0^2(\Omega)$ such that
$$\begin{array}{rcll} u''(x) &=& g(x) & \text{on } x \in \Omega := [0,1],\\ u(0) &=& u(1) = 0. \end{array} \qquad (3.10)$$
Its weak formulation is given by
$$\int_0^1 u''(x)\, \phi(x)\, dx = \int_0^1 g(x)\, \phi(x)\, dx \qquad \forall \phi \in H_0^2(\Omega), \qquad (3.11)$$
and a partition is given by $T^h := \{x_i := i h \mid i \in \{0, \ldots, \frac{1}{h}\}\}$ for any $h > 0$ with $\frac{1}{h} \in \mathbb{N}$. With this partition the nodes of the mesh are given by $x_1, \ldots, x_{\frac{1}{h}-1}$, i.e., $n_d = \frac{1}{h} - 1$. We define the finite dimensional subspace $V_h = \mathrm{span}(f_1, \ldots, f_{n_d})$ of $H_0^2(\Omega)$ via the basis functions $f_i : \Omega \to \mathbb{R}$ with $f_i(x_j) := \delta_{i,j}$ and $f_i$ linear on each interval $(x_j, x_{j+1})$. Then, let $\tilde{u} := \sum_{i=1}^{n_d} d_i f_i$ satisfy the finite-dimensional relaxation of the weak formulation (3.11):
$$\begin{array}{rcll} & \displaystyle\int_0^1 \Big( \sum_{i=1}^{n_d} d_i f_i''(x) \Big)\, \phi(x)\, dx = \int_0^1 g(x)\, \phi(x)\, dx & \forall \phi \in V_h\\[4pt] \Leftrightarrow & \displaystyle\sum_{i=1}^{n_d} d_i \int_0^1 f_i''(x)\, f_j(x)\, dx = \int_0^1 g(x)\, f_j(x)\, dx & \forall j \in \{1, \ldots, n_d\}\\[4pt] \Leftrightarrow & \displaystyle\frac{d_{j-1} - 2 d_j + d_{j+1}}{h} = \int_0^1 g(x)\, f_j(x)\, dx & \forall j \in \{1, \ldots, n_d\}, \end{array} \qquad (3.12)$$
which is a system of linear equations in $d_1, \ldots, d_{n_d}$. Solving it provides a numerical approximation $\tilde{u}$ of a weak solution $u$ of the ODE (3.10). If we moreover assume $g(x) = g$ constant on $[0,1]$, we obtain the system of equations
$$\frac{d_{j-1} - 2 d_j + d_{j+1}}{h^2} = g \qquad \forall j \in \{1, \ldots, n_d\}. \qquad (3.13)$$
With the definition of the basis functions $f_i$ it is clear that in this example $d_i = \tilde{u}(x_i)$ holds for all $i \in \{1, \ldots, n_d\}$. Thus, (3.13) is actually identical to the system of equations we obtain when approximating (3.10) by a finite difference scheme (a short computational sketch of this system is given below). However, this connection with finite difference methods does not hold in general. The Finite Element Method provides the user with a great deal of freedom, such as how to choose the finite dimensional subspace approximating $H_0^s(\Omega)$ or the mesh for $\Omega$, and for most other finite element discretizations there is no equivalent finite difference scheme.

When comparing the FEM to the FDM, we already mentioned its sound basis in functional analysis as one main advantage. Another one is its high flexibility with respect to the domains of PDE problems: complicated geometries can be dealt with easily, whereas the FDM is restricted to relatively simple geometries. On the other hand, the FDM is far easier to implement than the FEM for many PDE problems arising in applications. However, in both methods one of the greatest challenges is to solve a system of nonlinear algebraic equations when they are applied to a system of nonlinear differential equations. In general it is highly dependent on the PDE problem to be solved numerically whether the FDM or the FEM provides a better approximation to the continuous world for a similar discretization.

Other numerical solvers

Beside the finite difference method and the finite element method, another important class of methods to solve differential equations numerically is the finite volume method (FVM). Similar to the FDM, values are calculated at discrete points on a mesh. In the FVM these values are derived from calculating volume integrals over small volumes around the node points of the mesh. The FVM applies the divergence theorem to convert volume integrals into surface integrals, and it exploits that the flux entering a given small volume is equal to that leaving the volume. Each integration over a volume results in an algebraic equation. Thus, as in the FDM and FEM, a system of algebraic equations needs to be solved to obtain a numerical solution. The FVM is popular for solving hyperbolic PDE problems, in particular in computational fluid dynamics. For a detailed introduction, see [58].
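Returning briefly to the one-dimensional example (3.10), the tridiagonal system (3.13) can be assembled and solved in a few lines. In the following Python sketch, the constant load $g = 1$ and the mesh width are hypothetical choices for illustration.

```python
import numpy as np

h, g = 0.05, 1.0                       # illustrative mesh width and constant load
nd = round(1.0 / h) - 1                # number of interior nodes, nd = 1/h - 1
# Assemble the tridiagonal system (3.13): (d_{j-1} - 2 d_j + d_{j+1}) / h^2 = g,
# with the boundary values d_0 = d_{nd+1} = 0.
A = (np.diag(-2.0 * np.ones(nd)) + np.diag(np.ones(nd - 1), 1)
     + np.diag(np.ones(nd - 1), -1)) / h**2
d = np.linalg.solve(A, g * np.ones(nd))
# Compare with the exact solution u(x) = g*x*(x - 1)/2 of u'' = g, u(0) = u(1) = 0.
x = h * np.arange(1, nd + 1)
print(np.max(np.abs(d - g * x * (x - 1) / 2)))   # agrees up to rounding error
```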
The spectral method [27] is a class of techniques involving the use of the Fast Fourier Transform. It is suitable for PDE problems with very smooth solutions and provides highly accurate approximations for these problems. It is based on replacing the unknown function in the PDE by its Fourier series to obtain a system of ODEs in the time-dependent coefficients of the Fourier series. The spectral method and the FEM share the idea of approximating the solution of a PDE by a linear combination of basis functions. The difference is that the basis functions in the spectral method are nonzero over the entire domain, whereas the basis functions of the FEM are nonzero on small subdomains only. For this reason, the spectral method can be understood as a global approximation approach, whereas the FEM constitutes a local approximation approach. There are numerous other methods, such as multigrid methods, domain decomposition methods, level-set methods or meshfree methods. As PDE problems arise from very different settings and applications, it is highly dependent on the particular PDE which numerical method is most suitable for providing an accurate approximation. In all numerical methods a potentially hard system of algebraic equations needs to be solved. In the next section we will attempt this problem for the FDM by sparse semidefinite programming relaxation and polynomial optimization techniques.

Remark 3.1 We have seen that in each discretization-based method for nonlinear PDEs a system of algebraic equations needs to be solved. The classical tool for a system of nonlinear equations is Newton's method, which converges locally quadratically. However, Newton's method requires a starting point close to a solution of the system in order to converge. For difficult nonlinear problems it may be very challenging to find a good initial guess. There are various techniques to find a good initial guess for Newton's method or other locally fast convergent techniques. One is to apply gradient methods or other first order techniques to find a rough approximation of a solution. Another one is to apply some homotopy-like continuation method, where the nonlinear problem is linearized and a solution of the linear problem is taken as initial point for a problem where the weight of the nonlinear part is increased incrementally. Finally, in many problems partial information about the solution, which may be obtained by numerical simulation, is utilized to get a sufficiently close initial guess. We will show in the following that SDP relaxations for polynomial problems are very useful for finding a good initial guess for locally fast convergent methods.

3.2 Differential equations and the SDPR method

3.2.1 Transforming a differential equation into a sparse POP

In the previous section we gave an introduction to the numerical analysis of differential equations. In Chapter 2 we introduced polynomial optimization problems and their semidefinite programming (SDP) relaxations. In this section we will see how to transform a problem involving differential equations into a polynomial optimization problem (POP), in order to apply SDP relaxations to solve these differential equations numerically. For discretizing a differential equation we choose the Finite Difference Method (FDM). The FDM has the advantage of being easy to implement. Moreover, applying the FDM to a differential equation yields a sparse POP, as we will show in this section. In the following we will restrict ourselves to discussing the two-dimensional case. However, the procedures for differential equations with domains of different dimension are derived analogously.
Recall that a general differential equation is given by
$$\begin{array}{rcll} D(u(x,y)) &=& f(x,y) & \forall (x,y) \in \Omega,\\ B(u(x,y)) &=& g(x,y) & \forall (x,y) \in \partial\Omega, \end{array} \qquad (3.14)$$
where $D(\cdot)$ and $B(\cdot)$ are differential operators and $f, g$ functions on $\Omega := [x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}]$. Applying the FDM yields the system of equations
$$\begin{array}{rcll} D_{i,j}((u_{k,l})_{k,l}) &=& f_{i,j} & \forall (i,j) \in \{2, \ldots, N_x - 1\} \times \{2, \ldots, N_y - 1\},\\ B_{i,j}((u_{k,l})_{k,l}) &=& g_{i,j} & \forall (i,j) \in \{1, N_x\} \times \{1, \ldots, N_y\} \cup \{1, \ldots, N_x\} \times \{1, N_y\}, \end{array} \qquad (3.15)$$
where $f_{i,j} := f(x_i, y_j)$, $g_{i,j} := g(x_i, y_j)$, and $D_{i,j}$ and $B_{i,j}$ are finite difference approximations of the operators $D$ and $B$, respectively. We assume the operators $D$ and $B$ are both polynomial in $u(\cdot,\cdot)$ and its derivatives, which implies that (3.15) is a system of polynomial equations. Note that we can reduce the dimension of this system of equations by exploiting the boundary conditions. In their most basic form, Dirichlet, Neumann and periodic boundary conditions at $x_{\min}$ are given by
$$u_{1,j} = g_{1,j}, \qquad u_{1,j} = g_{1,j}\, h_x + u_{2,j} \qquad \text{and} \qquad u_{1,j} = u_{N_x,j}, \qquad (3.16)$$
respectively. If we substitute the $u_{i,j}$ corresponding to the boundary grid points in (3.15) by the terms given in (3.16), the number of variables in each direction is reduced by 2 in the case of Dirichlet and Neumann boundary conditions, or by 1 in the case of periodic boundary conditions. Thus, (3.15) is reduced to a system of $n$ equations in $n$ variables,
$$\hat{D}_{i,j}((u_{k,l})_{k,l}) - \hat{f}_{i,j} = 0 \qquad \forall (i,j) \in \{1, \ldots, \hat{N}_x\} \times \{1, \ldots, \hat{N}_y\}, \qquad (3.17)$$
where $n := \hat{N}_x \hat{N}_y$. For instance, under a Neumann condition in $x$ direction and a periodic condition in $y$ direction, $n$ is given by $n = \hat{N}_x \hat{N}_y = (N_x - 2)(N_y - 1)$. For simplicity of notation we will denote a solution of the original PDE problem (3.14) as $u(\cdot,\cdot)$ and a solution $(u_{k,l})_{k,l}$ of (3.17) as the vector $u \in \mathbb{R}^{\hat{N}_x \hat{N}_y}$, with $u = (u_{1,1}, \ldots, u_{1,\hat{N}_y}, u_{2,1}, \ldots, u_{\hat{N}_x,\hat{N}_y})$. As (3.15) is polynomial in the variable $u$, so is (3.17). We attempt to solve this system of equations by transforming it into a POP of the type
$$\begin{array}{rl} \min & p(u)\\ \text{s.t.} & g_i(u) \geq 0 \quad \forall i = 1, \ldots, m,\\ & h_j(u) = 0 \quad \forall j = 1, \ldots, k. \end{array} \qquad (3.18)$$
Note that (3.18) is a special case of (2.1), as an equality constraint $h(x) = 0$ is equivalent to the pair of inequality constraints $h(x) \geq 0$ and $-h(x) \geq 0$. Given a PDE problem (3.14), we take (3.17) derived from it as the system of equality constraints of an optimization problem. Moreover, we choose lower and upper bounds of type (2.25),
$$lbd_{i,j} \leq u_{i,j} \leq ubd_{i,j} \qquad \forall (i,j) \in \{1, \ldots, \hat{N}_x\} \times \{1, \ldots, \hat{N}_y\}. \qquad (3.19)$$

Choosing an objective function

To derive a POP, it remains to choose an objective function $F$ that is polynomial in $u$ as well. The choice of an appropriate objective function depends on the PDE problem we are aiming to solve. In case there is at most one solution of the PDE problem, we are interested in the feasibility of the POP we construct. Thus any objective is a priori acceptable for that purpose. However, the accuracy of obtained solutions may depend on the particular objective function. In the case the solution of the PDE problem is not unique, the choice of the objective function determines the particular solution to be found. The objective function may correspond to a physical quantity a solution needs to optimize. For instance, in problems in fluid dynamics one is often interested in finding a solution of minimal kinetic energy. For such a problem we obtain the objective function by discretizing the kinetic energy function.
A large class of PDEs which occur in many applications can be written as Euler-Lagrange equations. A typical case is a stable state equation of reaction-diffusion type. In this case, a canonical choice is a discretization of the corresponding energy integral, as in 3.3.1. Another case is optimal control: a finite difference discretization of the state and control constraints yields the feasible set, and a discretization of the optimal value function yields the objective. We discuss numerical examples for optimal control problems in 3.3.6.

Additional polynomial constraints

We mentioned above that it is crucial to add lower and upper bounds (3.19) for each $u_{i,j}$ when constructing a POP from a differential equation. When choosing these bounds, care has to be taken. A choice of $lbd$ and $ubd$ which is too tight may exclude solutions of (3.17) from the feasible set of the POP, while a choice which is too loose may cause inaccurate results. In addition to those constraints we may impose further inequality or equality constraints
$$g_l(u) \geq 0 \qquad \text{and} \qquad h_j(u) = 0, \qquad (3.20)$$
respectively, with $g_l, h_j \in \mathbb{R}[u]$. The constraints (3.20) can be understood as restrictions of the admissible space of functions in which we search for solutions of a differential equation. One possibility to obtain such bounds is to constrain the partial derivatives. We call bounds of this type variation bounds. For the derivative in $x$-direction they are given by
$$\left| \frac{\partial u(x_i, y_j)}{\partial x} \right| \leq M \qquad \forall i \in \{2, \ldots, N_x - 1\},\ \forall j \in \{2, \ldots, N_y - 1\}. \qquad (3.21)$$
Expression (3.21) can be transformed into polynomial constraints easily. Another possibility is to impose bounds in the spirit of [22], like (2.80), (2.81), (2.82) and (2.83) introduced in 2.3. We add
$$\begin{array}{rcl} u_{s,t} &\leq& ubd_{k,l}\, u_{i,j} + lbd_{i,j}\, u_{k,l} - lbd_{i,j}\, ubd_{k,l},\\ u_{s,t} &\leq& lbd_{k,l}\, u_{i,j} + ubd_{i,j}\, u_{k,l} - ubd_{i,j}\, lbd_{k,l} \end{array} \qquad (3.22)$$
for each constraint $u_{s,t} = u_{i,j} u_{k,l}$ in the POP, and for each constraint $u_{s,t} = u_{i,j}^2$ we add
$$u_{s,t} \leq (ubd_{i,j} + lbd_{i,j})\, u_{i,j} - lbd_{i,j}\, ubd_{i,j}. \qquad (3.23)$$
If there is no quadratic constraint of this type for $(i,j,k,l)$ or $(i,j)$, respectively, we may add the quadratic constraints
$$\begin{array}{rcl} u_{i,j} u_{k,l} &\leq& ubd_{k,l}\, u_{i,j} + lbd_{i,j}\, u_{k,l} - lbd_{i,j}\, ubd_{k,l},\\ u_{i,j} u_{k,l} &\leq& lbd_{k,l}\, u_{i,j} + ubd_{i,j}\, u_{k,l} - ubd_{i,j}\, lbd_{k,l} \end{array} \qquad (3.24)$$
for $(i,j) \neq (k,l)$, and the constraint
$$u_{i,j}^2 \leq (ubd_{i,j} + lbd_{i,j})\, u_{i,j} - lbd_{i,j}\, ubd_{i,j} \qquad (3.25)$$
for $(i,j) = (k,l)$. Note that, by the construction of the constraints (3.22) - (3.25), they shrink the feasible set of an SDP relaxation for a POP, but they do not change the feasible set of the POP. That is, they may be added to improve the numerical accuracy of SDP relaxations for solving the POP, but they have no impact on the space of functions in which we are searching for discrete approximations to a solution of a differential equation.

A POP derived from a differential equation

If we take all constraints and the chosen objective function together, we obtain the following POP:
$$\begin{array}{rll} \min & F(u) &\\ \text{s.t.} & \hat{D}_{i,j}(u) = \hat{f}_{i,j} & \forall (i,j) \in \{1, \ldots, \hat{N}_x\} \times \{1, \ldots, \hat{N}_y\},\\ & g_l(u) \geq 0 & \forall l \in \{1, \ldots, s\},\\ & h_k(u) = 0 & \forall k \in \{1, \ldots, m\},\\ & lbd_{i,j} \leq u_{i,j} \leq ubd_{i,j} & \forall (i,j) \in \{1, \ldots, \hat{N}_x\} \times \{1, \ldots, \hat{N}_y\}. \end{array} \qquad (3.26)$$
Every feasible solution $u$ of (3.26) is a solution of the finite difference scheme for the PDE problem (3.14). Let us demonstrate how to derive (3.26) for an example.

Example 3.1 Consider the nonlinear elliptic PDE
$$\begin{array}{rcll} u_{xx}(x,y) + u_{yy}(x,y) + \lambda u(x,y) \left(1 - u(x,y)^2\right) &=& 0 & \forall (x,y) \in [0,1]^2,\\ u(x,y) &=& 0 & \forall (x,y) \in \partial [0,1]^2,\\ 0 \leq u(x,y) \leq 1 & & & \forall (x,y) \in [0,1]^2, \end{array} \qquad (3.27)$$
where the parameter $\lambda$ is set to $\lambda = 22$. We apply the standard finite difference discretization for $\hat{N} = \hat{N}_x = \hat{N}_y$, choose $F(u) = -\sum_{1 \leq i,j \leq \hat{N}} u_{i,j}$ as objective function, and obtain the POP
$$\begin{array}{rll} \min & -\sum_{1 \leq i,j \leq \hat{N}} u_{i,j} &\\ \text{s.t.} & \frac{1}{h_x^2} \left( u_{i+1,j} + u_{i,j+1} + u_{i,j-1} + u_{i-1,j} - 4 u_{i,j} \right) + 22\, u_{i,j} \left( 1 - u_{i,j}^2 \right) = 0 & \forall (i,j) \in \{1, \ldots, \hat{N}\}^2,\\ & 0 \leq u_{i,j} \leq 1 & \forall (i,j) \in \{1, \ldots, \hat{N}\}^2, \end{array} \qquad (3.28)$$
where $u_{0,k} = u_{k,0} = u_{\hat{N}+1,k} = u_{k,\hat{N}+1} = 0$ for all $k \in \{1, \ldots, \hat{N}\}$. The choice of $F$ is motivated by the fact that (3.27) is known to have one strictly positive solution and the trivial solution. The optimal solution of (3.28) is a discrete approximation to the strictly positive solution of (3.27).
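The polynomial equality constraints of (3.28) are easy to evaluate programmatically. The following Python sketch computes their residuals for a given interior grid vector; the grid size and the constant starting grid are chosen for illustration only, and the routine is a check of the polynomial system itself, not part of any SDP solver.

```python
import numpy as np

def residual(U, lam=22.0):
    """Residuals of the equality constraints of (3.28) for an interior grid U of
    shape (N, N); the zero Dirichlet boundary is represented by zero padding."""
    N = U.shape[0]
    hx = 1.0 / (N + 1)                       # spacing for N interior grid points
    P = np.pad(U, 1)                         # u = 0 on the boundary of [0,1]^2
    lap = (P[2:, 1:-1] + P[:-2, 1:-1] + P[1:-1, 2:] + P[1:-1, :-2]
           - 4.0 * P[1:-1, 1:-1]) / hx**2
    return lap + lam * U * (1.0 - U**2)

U = 0.3 * np.ones((9, 9))                    # illustrative interior grid vector
print(np.max(np.abs(residual(U))))           # distance of U from solving the system
```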
Correlative sparsity

In 3.2.2 we will introduce a method to solve (3.26) by sparse SDP relaxations. In order to apply this method efficiently, we need to show that (3.26) satisfies a structured sparsity pattern. Showing correlative sparsity is straightforward:

Proposition 3.2 Let two differential operators $D_1$ and $D_2$ be given by
$$D_1(u) := a(u)\, u_{xx} + c(u)\, u_{yy} + d(u)\, u_x + e(u)\, u_y + \tilde{f}(u)$$
and
$$D_2(u) := a(u)\, u_{xx} + b(u)\, u_{xy} + c(u)\, u_{yy} + d(u)\, u_x + e(u)\, u_y + \tilde{f}(u),$$
where $a(\cdot), \ldots, \tilde{f}(\cdot)$ are polynomial in the function $u(\cdot)$. Let $\hat{N} = \hat{N}_x = \hat{N}_y$ and $n := \hat{N}^2$, and let $F$ be a linear function in $u$. Let $n_z(R)$ denote the number of nonzero entries in the CSP matrix $R$ of the POP (3.26). Then, $n_z(R) \leq 13 n$ if (3.26) is derived from (3.15) with $D := D_1$, and $n_z(R) \leq 25 n$ if (3.26) is derived from (3.15) with $D := D_2$, for any choice of $B$ and $f$. This implies that (3.26) is correlatively sparse in both cases.

Proof: As $F$ is linear, the objective function does not cause any nonzero entries in $R$ by Definition 2.5. Due to the finite difference discretization, at most 12 unknowns $u_{k,l}$ can occur in some equality constraint together with a particular unknown $u_{i,j}$ for $D = D_1$, as pictured in Figure 3.1. Hence the maximum number of nonzero elements in the row of $R$ corresponding to $u_{i,j}$ is 13, which implies $n_z(R) \leq 13 n$. With the same argument, $n_z(R) \leq 25 n$ holds for $D = D_2$; see Figure 3.1. These bounds are tight; they are attained in the case of periodic conditions for $x$ and $y$.

Figure 3.1: $u_{k,l}$ involved in some constraint with $u_{i,j}$ for $D = D_1$ (left) and $D = D_2$ (right).

Let $R'$ denote the $n \times n$ matrix corresponding to the graph $G(N, E')$, which is a chordal extension of the CSP graph $G(N, E)$. For computational efficiency it is also useful to know whether $R'$ is sparse or not. $n_z(R')$ depends on the employed ordering $P$ of $R$, which is used to avoid fill-ins in the symbolic sparse Cholesky factorization $L L^T$ of the ordered matrix $P R P^T$; $R'$ is constructed as $R' = L + L^T$. We examine two different methods of ordering $R$, the symmetric minimum degree (SMD) ordering and the reverse Cuthill-McKee (RCM) ordering. See [24] for details about these orderings.

We conduct some numerical experiments in order to estimate the behavior of $n_z(R')$. Figure 3.3 shows examples of $R'$ after SMD and RCM ordering, and Figure 3.2 shows $\frac{n_z(R')}{n}$ obtained by the SMD and RCM orderings for the $n \times n$ matrix $R$, respectively, for $D = D_1$ and Dirichlet or Neumann condition in $x$ and periodic condition in $y$.

Figure 3.2: $\frac{n_z(R')}{n}$ for SMD (left) and RCM (right) ordering if $D = D_1$.

For $n \in [100, 160000]$ it holds that $\frac{n_z(R')}{n} \leq 300$ for the SMD ordering and $\frac{n_z(R')}{n} \leq 600$ for the RCM ordering, respectively. This behavior of $\frac{n_z(R')}{n}$ may suggest $n_z(R') = O(n)$ for both ordering methods. Hence we expect the sparse SDP relaxations to be efficient for solving (3.26) in numerical experiments. However, since the constants 300 and 600 are large, we cannot always expect a quick solution of the sparse SDP relaxation.

Figure 3.3: $R$ (left), and $R'$ obtained by SMD (center) and RCM (right) orderings for $D = D_1$ with $n = 400$.

Domain-space sparsity

In 2.2 we introduced the concept of domain-space and range-space sparsity of an optimization problem with matrix variables. Moreover, in 2.2.7 we constructed some linear SDP relaxations for quadratic SDP exploiting this sparsity. A QOP is a special case of a quadratic SDP. As all constraints in a QOP are scalar, it does not satisfy a range-space sparsity pattern. Thus, if we transform (3.26) into a QOP, we can apply the relaxations (a) or (b) from 2.2.7 to find approximate solutions of (3.26). In order to apply these relaxations efficiently, the question arises whether QOPs derived from (3.26) satisfy domain-space sparsity. For certain classes of PDE problems, we obtain the following sparsity results.

Example 3.2 Given a rectangular domain $\Omega$, $f : \Omega \to \mathbb{R}$ and a differential operator $B(\cdot)$, define the operators
$$D_1(u) := a u_{xx} + c u_{yy} + d u_x + e u_y + g u + h u^2, \qquad D_2(u) := a u_{xx} + c u_{yy} + d u_x + e u_y + g u + h u^3,$$
with $a, c, d, e, g, h : \Omega \to \mathbb{R}$. Moreover, choose $F$ linear in $u$ when constructing (3.26) for a discretization $n = \hat{N}_x \hat{N}_y$. In the case $D = D_1$, (3.26) is a quadratic SDP; in the case $D = D_2$ we need to apply the method from 2.3 to (3.26) to transform it into a quadratic SDP. In fact, for the example with $D = D_2$ there is a unique way to transform the POP into a QOP, by defining $n$ variables $v_{i,j} := u_{i,j}^2$. Then, it is easy to see that the domain-space sparsity patterns of the quadratic SDP corresponding to (3.26) for $D = D_1$ and $D = D_2$, respectively, are given by Figure 3.4.

Figure 3.4: d-space sparsity pattern of (3.26) for $D = D_1$ (left), and $D = D_2$ before (center) and after (right) reordering of rows and columns.

Note that for the two cases in Example 3.2 the number of nonzero entries in every row but the first and last one of the domain-space sparsity pattern matrix is less than or equal to two and three, respectively. This is considerably smaller than the upper bound 13 for the number of nonzero entries in each row of the correlative sparsity pattern matrix provided by Proposition 3.2. Likewise, the size of an average maximal clique of the chordal extension of the domain-space sparsity pattern graph is far smaller than that of the chordal extension of the correlative sparsity pattern graph. Thus, the primal SDP relaxation (b) from 2.2.7 for a QOP derived from a PDE of one of the two classes in Example 3.2 is far smaller than the sparse SDP relaxation (2.18) of relaxation order $\omega = 1$ for the same QOP, and can be solved for much finer discretizations. However, in the primal SDP relaxation (b) there is no relaxation order we can increase, and there is no general result on how well the primal SDP relaxation approximates a QOP. The sparse SDP relaxations (2.18) provide a sequence of SDPs whose minima converge to the optimum of a QOP; in fact, their approximation accuracy improves monotonically with increasing order $\omega$. Therefore, as we will see in the numerical examples in 3.3, the primal SDP relaxation (b) is only useful for QOPs where the solution of the primal SDP relaxation is a good approximation of an optimal solution of the QOP.
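The fill-reducing effect of the orderings discussed above can be reproduced with standard sparse matrix tools. The following Python sketch builds a simplified stand-in for the 13-point CSP pattern of $D_1$ (an illustrative approximation, not the exact CSP matrix $R$ of (3.26)), scrambles it with a random ordering, and shows how the reverse Cuthill-McKee ordering reduces the bandwidth, and hence the fill-in of the symbolic Cholesky factor.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

N = 20                                     # illustrative grid size, n = N^2 = 400
n = N * N
# Simplified stand-in for the CSP pattern of D1: couple each grid point with
# neighbors up to two steps away along the axes and the diagonal neighbors.
R = sp.eye(n, format="lil")
for off in (1, 2, N, 2 * N, N - 1, N + 1):
    idx = np.arange(n - off)
    R[idx, idx + off] = 1
    R[idx + off, idx] = 1
R = R.tocsr()

bandwidth = lambda M: int(max(abs(i - j) for i, j in zip(*M.nonzero())))
p0 = np.random.default_rng(0).permutation(n)
Rs = R[p0][:, p0]                          # the same pattern under a random ordering
perm = reverse_cuthill_mckee(Rs, symmetric_mode=True)
Rp = Rs[perm][:, perm]
print(bandwidth(Rs), bandwidth(Rp))        # RCM drastically reduces the bandwidth
```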
3.2.2 The SDPR method

In this section we introduce the method of solving the POP (3.26) derived from a PDE problem (3.14) to obtain discrete approximations to solutions of (3.14). In order to derive (3.26) from a PDE problem, we need to choose a discretization $(N_x, N_y)$, the bounds $lbd$ and $ubd$, the objective $F$, if not given by the PDE problem, and possibly additional polynomial constraints $g_l$ and $h_k$. If (3.26) is a POP of degree three or larger, we can either apply the sparse SDP relaxations (2.18) for some relaxation order $\omega$, or we apply one of the heuristics AI, AII, BI, BII from 2.3.1 to transform (3.26) into a QOP. To the QOP we apply either the sparse SDP relaxations (2.18) with relaxation order $\omega = 1$ or the primal SDP relaxation (b) exploiting domain-space sparsity from 2.2.7. Solving the SDP relaxations for the POP or the QOP, we obtain a first approximation $\hat{u}$ to an optimal solution of (3.26). The solution $\hat{u}$ can be used as an initial guess for locally fast convergent methods. One possibility is to apply sequential quadratic programming (SQP) [8] to (3.26); another one is to apply Newton's method for nonlinear systems [77] to (3.17), both starting from $\hat{u}$, in order to obtain a more accurate discrete approximation $u$ to a solution of the PDE problem (3.14). This procedure is called the semidefinite programming relaxation (SDPR) method for solving a PDE problem of type (3.14). It is summarized in the following chart:

Method 3.1 (SDPR method)
I. Given a PDE problem (3.14), choose $(N_x, N_y)$, $lbd$, $ubd$, $F$, $g_l$ and $h_k$ to derive (3.26).
II. If POP (3.26) is of degree three or larger, we may apply AI, AII, BI or BII to transform it into a QOP. Then, apply sSDP$_1$ (2.18) or relaxation (b) from 2.2.7 to this QOP. Denote the first $n$ components of the solution vector of the applied SDP relaxation as $\hat{u}$.
III. If POP (3.26) is of degree three or larger and has not been transformed into a QOP, choose a relaxation order $\omega \geq \omega_{\max}$, apply sSDP$_\omega$ (2.18) to that POP and obtain its solution $\hat{u}$.
IV. Apply Newton's method to (3.17) or SQP to (3.26), both with initial guess $\hat{u}$, and obtain $u$ as an approximation to an optimal solution of (3.26) and as a discrete approximation of a solution of the PDE problem (3.14).

Recall that, when applying the SDP relaxation (b) from 2.2.7 to a QOP, we impose the additional constraints (2.84), as explained in Remark 2.9. As locally fast convergent methods we consider SQP and Newton's method, which we describe briefly in the following. However, we are by no means restricted to these two when choosing an iterative method which converges fast towards a highly accurate discrete approximation of a solution of (3.17) when started from a guess close to that approximation.

Newton's method

The discretized PDE (3.17) is a special case of the problem
$$r(x) = 0,$$
where $r : \mathbb{R}^n \to \mathbb{R}^n$, $r(x) = [r_1(x), \ldots, r_n(x)]^T$, and the $r_i : \mathbb{R}^n \to \mathbb{R}$ are smooth functions for all $i \in \{1, \ldots, n\}$.
The functions $r_i$ may be nonlinear in $x$. The basic form of Newton's method for solving nonlinear equations is given by

NEWTON
Choose $x_0$;
for $k = 0, 1, 2, \ldots$
  Calculate a solution $p_k$ to the Newton equations $J(x_k)\, p_k = -r(x_k)$;
  $x_{k+1} = x_k + p_k$;
end (for)

This algorithm is motivated by the multidimensional Taylor's theorem.

Theorem 3.5 Suppose that $r : \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable in some convex open set $D$ and that $x$ and $x + p$ are vectors in $D$. We then have that
$$r(x + p) = r(x) + \int_0^1 J(x + tp)\, p\; dt.$$
$J(x + tp)$ is the Jacobian of $r$ at $x + tp$; it is defined as
$$J(x) = \left( \frac{\partial r_j}{\partial x_i} \right)_{i,j = 1, \ldots, n} = \begin{pmatrix} \nabla r_1(x)^T \\ \vdots \\ \nabla r_n(x)^T \end{pmatrix}.$$
We define a linear model $M_k(p)$ of $r(x_k + p)$ given in Theorem 3.5, i.e., we approximate the second term on the right-hand side by $J(x)p$, and write
$$M_k(p) = r(x_k) + J(x_k)\, p.$$
The vector $p_k = -J(x_k)^{-1} r(x_k)$ satisfies $M_k(p_k) = 0$. It is equivalent to the solution $p_k$ of the Newton equations in the NEWTON algorithm. As shown in [77], if $x_0$ is close to a nondegenerate root $x^\star$ and $r$ is continuously differentiable, then the sequence $(x_k)_k$ of Newton's method converges superlinearly to $x^\star$. If $r$ is furthermore Lipschitz continuously differentiable, the convergence is quadratic.

Sequential Quadratic Programming

The POP (3.26) is a special case of the nonlinear programming problem
$$\begin{array}{rl} \min & f(x)\\ \text{s.t.} & h(x) = 0,\\ & g(x) \leq 0, \end{array} \qquad (3.29)$$
where $f : \mathbb{R}^n \to \mathbb{R}$, $h : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^n \to \mathbb{R}^p$ are three times continuously differentiable. The basic idea of SQP is to model (3.29) at a given approximate solution $x_k$ by a quadratic program, and to use the solution of this subproblem to construct a better approximation $x_{k+1}$ to the solution of (3.29). This method can be viewed as the natural extension of Newton's method to the constrained optimization setting. It shares with Newton's method the property of rapid convergence when the iterates are close to the solution, and possibly erratic behavior when the iterates are far from the solution. SQP has two key features: First, it is not a feasible point method; its iterates $x_{k+1}$ do not need to be feasible for (3.29). Second, in each iteration of an SQP approach a quadratic program has to be solved, which is not too demanding since highly efficient procedures for quadratic programs, i.e., programs with quadratic objective and linear constraints, exist. Let the Lagrange function of (3.29) be given by $L(x, u, v)$ and a quadratic subproblem by
$$\begin{array}{rl} \min & \nabla f(x_k)^T (x - x_k) + \frac{1}{2} (x - x_k)^T B_k (x - x_k)\\ \text{s.t.} & \nabla h(x_k)^T (x - x_k) + h(x_k) = 0,\\ & \nabla g(x_k)^T (x - x_k) + g(x_k) \leq 0, \end{array} \qquad (3.30)$$
where $B_k$ is an approximation of the Hessian of the Lagrange function at the iterate $(x_k, u_k, v_k)$. Then, an SQP approach in its most basic form is given by the algorithm

SQP
Choose $(x_0, u_0, v_0)$, $B_0$, and a merit function $\phi$;
for $k = 0, 1, 2, \ldots$
  Form and solve (3.30) to obtain its optimal solution $(x^\star, u^\star, v^\star)$;
  Choose a step length $\alpha$ so that $\phi(x_k + \alpha(x^\star - x_k)) < \phi(x_k)$;
  Set $x_{k+1} = x_k + \alpha(x^\star - x_k)$, $u_{k+1} = u_k + \alpha(u^\star - u_k)$, $v_{k+1} = v_k + \alpha(v^\star - v_k)$;
  Stop if $(x_k, u_k, v_k)$ satisfies some convergence criterion;
  Compute $B_{k+1}$ from $(x_k, u_k, v_k)$;
end (for)

A merit function is a function whose reduction implies progress towards the global optimum of problem (3.29). For more details about SQP see [8].
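The NEWTON chart above translates directly into a few lines of code. The following Python sketch runs the basic iteration on a small system; the test system and starting point are hypothetical, chosen only to show the fast convergence from a good initial guess.

```python
import numpy as np

def newton(r, J, x0, tol=1e-12, maxit=50):
    """Basic NEWTON iteration: solve J(x_k) p_k = -r(x_k), set x_{k+1} = x_k + p_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        p = np.linalg.solve(J(x), -r(x))
        x = x + p
        if np.linalg.norm(p) < tol:
            break
    return x

# Hypothetical 2x2 test system: x0^2 + x1^2 = 1 and x0 = x1, root (1,1)/sqrt(2).
r = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
print(newton(r, J, [1.0, 0.5]))
```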
Grid-refining method

In order to guarantee that a discretized PDE problem (3.15) is a good approximation of (3.14), i.e., that its solutions are good discrete approximations of the continuous functions $u(\cdot,\cdot)$, it is necessary to choose fine grid discretizations $(N_x, N_y)$. However, a fine grid discretization results in a large scale POP (3.26). Even when exploiting correlative or domain-space sparsity, transforming it into a QOP and imposing tight lower and upper bounds, the SDP relaxation which needs to be solved is often computationally demanding - in particular in the cases where we need to choose a high relaxation order to obtain an accurate approximation to an optimal solution of (3.26). Thus, for many difficult PDE problems the SDP relaxations resulting from fine grid discretizations are intractable for current SDP solvers. To overcome this problem, we consider a grid-refining method. In our grid-refining method, a solution obtained by applying the SDPR method to (3.14) for a coarse grid discretization in a first step is extended stepwise to finer and finer grids by subsequently interpolating coarse grid solutions and applying the SDPR method or locally convergent methods. This method is described by the following scheme:

Step 1 - Initialize: Apply the SDPR method with $N_x(1)$, $N_y(1)$, $F_1$, $lbd^1$, $ubd^1$, $g(1)$, $h(1)$, $\omega_1$; obtain $u^1$.
Step 2 - Extend: Set $N_x(k) = 2 N_x(k-1) - 1$ or $N_y(k) = 2 N_y(k-1) - 1$; interpolate $u^{k-1}$ to obtain $u^{k-1\star}$.
Step 3a: Apply the SDPR method with $N_x(k)$, $N_y(k)$, $F_k$, $lbd^k$, $ubd^k$, $g(k)$, $h(k)$, $\omega_k$; or
Step 3b: Apply Newton's method or SQP; obtain $u^k$.
Iterate Step 2 and Step 3.

Step 1 - SDPR method: Choose an objective function $F_1(u)$, a discretization grid size $(N_x(1), N_y(1))$, lower bounds $lbd^1$, upper bounds $ubd^1$ and an initial relaxation order $\omega_1$. Apply the SDPR method with these parameters to (3.14) and obtain a solution $u^1$.

Step 2 - Extension: Extend the $(k-1)$th iteration's solution $u^{k-1}$ to a finer grid. Choose either the $x$- or the $y$-direction as the direction of refinement, i.e., choose either $N_x(k) = 2 N_x(k-1) - 1$ and $N_y(k) = N_y(k-1)$, or $N_x(k) = N_x(k-1)$ and $N_y(k) = 2 N_y(k-1) - 1$. In order to extend $u^{k-1}$ to the new grid with the doubled number of grid points, assume without loss of generality that the direction of extension is $x$. The interpolation of the solution $u^{k-1}$ to $u^{k-1\star}$ is given by the scheme
$$\begin{array}{rcll} u^{k-1\star}_{2i-1,j} &=& u^{k-1}_{i,j} & \forall i \in \{1, \ldots, N_x(k-1)\},\ \forall j \in \{1, \ldots, N_y(k)\},\\ u^{k-1\star}_{2i,j} &=& \frac{1}{2} \left( u^{k-1}_{i+1,j} + u^{k-1}_{i,j} \right) & \forall i \in \{1, \ldots, N_x(k-1) - 1\},\ \forall j \in \{1, \ldots, N_y(k)\}. \end{array}$$
The interpolated solution $u^{k-1\star}$ is a first approximation to a solution of the POP (3.26) for the $N_x(k) \times N_y(k)$ grid.
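The extension scheme of Step 2 is a two-line array operation. The following Python sketch assumes the coarse solution is stored as an $N_x(k-1) \times N_y$ array; the random test grid is illustrative.

```python
import numpy as np

def refine_x(u):
    """Step 2 extension in the x-direction: a coarse solution of shape (Nx, Ny)
    becomes (2*Nx - 1, Ny); coarse values are copied, intermediate rows averaged."""
    nx, ny = u.shape
    v = np.empty((2 * nx - 1, ny))
    v[0::2] = u                           # u*_{2i-1,j} = u_{i,j}
    v[1::2] = 0.5 * (u[1:] + u[:-1])      # u*_{2i,j} = (u_{i+1,j} + u_{i,j}) / 2
    return v

u = np.random.default_rng(1).random((5, 5))  # illustrative coarse-grid solution
print(refine_x(u).shape)                     # (9, 5)
```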
Step 3a - Apply the SDPR method: We choose new parameters $F_k$, $lbd^k$, $ubd^k$, $\omega_k$, $g(k)$ and $h(k)$. This step is based on the idea that we may be able to choose $\omega_k < \omega_{k-1}$ if we exploit the information given by the interpolated solution $u^{k-1\star}$. One possibility to do this is to choose the new objective function $F_k = F_M^k$, where $F_M^k$ is defined by
$$F_M^k(u) = \sum_{i,j} \left( u_{i,j} - u^{k-1\star}_{i,j} \right)^2. \qquad (3.31)$$
We may choose this objective function as we are interested in finding a feasible solution of (3.26) with minimal Euclidean distance to the interpolated solution $u^{k-1\star}$. Another possibility to utilize the information given by $u^{k-1\star}$ is to tighten the lower and upper bounds by
$$lbd^k_{i,j} = \max \left\{ lbd^{k-1}_{i,j},\ u^{k-1\star}_{i,j} - \delta \right\} \quad \forall i,j, \qquad ubd^k_{i,j} = \min \left\{ ubd^{k-1}_{i,j},\ u^{k-1\star}_{i,j} + \delta \right\} \quad \forall i,j,$$
for some $\delta > 0$. Apply the SDPR method to obtain $u^k$.

Step 3b - Apply Newton's method or SQP: It may happen that the SDP relaxations of (3.26) for the finer grid become intractable, even if $\omega_k < \omega_{k-1}$. Therefore, we may just apply Newton's method to (3.17) or SQP to (3.26) for the finer discretization $(N_x(k), N_y(k))$, both starting with the interpolated solution $u^{k-1\star}$ as initial guess, to obtain a better approximate solution $u^k$.

Steps 2 and 3 are repeated until an accurate solution for a high resolution grid is obtained. The SDPR method with all its options and the grid-refining method are demonstrated on a variety of numerical examples in 3.3.

3.2.3 Enumeration algorithm

The SDPR method aims at finding a discrete approximation to a solution of a PDE problem. The freedom to choose an objective function for detecting particular solutions of a PDE problem is a feature of the SDPR method which is particularly interesting for a PDE problem with many solutions. Beside finding a particular solution of a PDE problem, another challenging problem is to find all solutions of a PDE problem. The problem of finding discrete approximations for all solutions of a PDE problem (3.14) is the problem of finding all real solutions of the system of polynomial equations (3.17). Classical methods to solve this problem are the Gröbner basis method and the polyhedral homotopy method, which we describe briefly at the end of this section. They compute all complex solutions of (3.17), and it remains to choose the real solutions among them. A recent method which avoids computing all complex solutions and directly computes all real solutions is given by [55] for the case that the solution set of (3.17) is finite. Another method is the extraction algorithm presented in [36] for finding all optimal solutions of (3.26). However, both methods, [55] and [36], do not exploit sparsity in (3.17) and (3.26), respectively, which restricts their applicability to small- and medium-scale systems. In the following we present an algorithm to enumerate all solutions of a system of polynomial equations, first proposed in [63] for the cavity flow problem, but which can be extended to the more general case of (3.17). For this method we need to assume that the number of solutions of (3.17) is finite, i.e., that the feasible set of (3.26) is finite. We also assume that all feasible solutions of (3.26) are distinct with respect to the objective function $F$, i.e., there is no pair of feasible solutions with identical objective value. The SDPR method enables us to approximate the global minimal solution $u^\star =: u^{(1)\star}$ of (3.26). Beside the minimal solution, we are also interested in finding the solution $u^{(2)\star}$ with the second smallest objective value, the solution $u^{(3)\star}$ with the third smallest objective value, or in general the solution $u^{(k)\star}$ with the $k$th smallest objective value. Based on the SDPR method we propose an algorithm that enumerates the solutions of (3.17) with the $k$ smallest objective values. Our algorithm shares the idea of separating the feasible set by additional constraints with branch-and-bound and cutting plane methods, which are used for solving mixed integer linear programs and general concave optimization problems [38]. In contrast to the linear constraints of those methods, we impose quadratic constraints to separate the feasible set.
Algorithm 3.1 Find the approximations to the solutions of (3.17) with the $k$ smallest objective values: Given $u^{(k-1)}$, the approximation to the solution with the $(k-1)$th smallest objective value obtained by applying the SDPR method with relaxation order $\omega$ to POP$_{k-1}$ from the $(k-1)$th iteration of the algorithm.
1. Choose $\epsilon_k > 0$.
2. Choose a vector $b^k \in \{0,1\}^n$.
3. Add the following quadratic constraints to POP$_{k-1}$ and denote the resulting POP with smaller feasible set as POP$_k$:
$$\left( u_j - u^{(k-1)}_j \right)^2 \geq \epsilon_k \qquad \text{for all } j \text{ with } b_j = 1. \qquad (3.32)$$
4. Apply the SDPR method with relaxation order $\omega$ to POP$_k$. Obtain an approximation $u^{(k)}$ of $u^{(k)\star}$.
5. Iterate steps 1-4.

The idea of Algorithm 3.1 is to impose an additional polynomial inequality constraint (3.32) on the POP (3.26) in iteration $k$ that excludes the previous iteration's solution $u^{(k-1)}$ from the feasible set of (3.26). In the case that the feasible set of (3.26) is finite and $u^{(k-1)}$ is sufficiently close to $u^{(k-1)\star}$, the new constraint excludes $u^{(k-1)\star}$ from the feasible set of (3.26) and $u^{(k)\star}$ is the new global minimizer of (3.26).

Of course, there are various alternatives to step 3 in Algorithm 3.1 for excluding $u^{(k-1)\star}$ from the feasible set of the POP. One alternative constraint is
$$\left( u_i - u^{(k-1)\star}_i \right) u_{n+i} - \epsilon_i = 0 \qquad \text{for all } i \text{ with } b_i = 1, \qquad (3.33)$$
where $b \in \{0,1\}^n$, $\epsilon_i > 0$ and $u_{n+i}$ is an additional slack variable bounded by $-1$ and $1$. It is easy to see that (3.33) is violated if $u = u^{(k-1)\star}$. However, it turned out that the numerical performance of (3.33) is inferior to that of (3.32) for problems of type (3.26), as tuning the parameters $\epsilon_i$ and $b$ is far more difficult for (3.33) than for (3.32). A second alternative to exclude $u^{(k-1)\star}$ are $l_p$-norm constraints such as
$$\left\| u - u^{(k-1)\star} \right\|_p = \left( \sum_{i=1}^n \left| u_i - u^{(k-1)\star}_i \right|^p \right)^{\frac{1}{p}} \geq \epsilon \qquad (3.34)$$
for $p \geq 1$. The disadvantage of the constraints (3.34) is that they destroy the correlative sparsity of (3.26), as all $u_i$ $(i = 1, \ldots, n)$ occur in the same constraint. Therefore the advantage of the sparse SDP relaxations is lost and the POP cannot be solved efficiently anymore. These observations justify imposing (3.32) as additional constraints in Algorithm 3.1.
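As a programmatic illustration of step 3, the following sketch generates the constraint functions (3.32) from a previous iteration's solution. Representing constraints as Python callables is purely illustrative and does not reflect the input format of SparsePOP.

```python
import numpy as np

def exclusion_constraints(u_prev, b, eps):
    """Constraints (3.32): (u_j - u_prev_j)^2 - eps >= 0 for all j with b_j = 1,
    returned as callables g with g(u) >= 0 required; a schematic sketch only."""
    return [lambda u, j=j: (u[j] - u_prev[j])**2 - eps
            for j in np.flatnonzero(b)]

u_prev = np.array([0.3, 0.7, 0.1])                 # previous iteration's solution
cons = exclusion_constraints(u_prev, b=np.array([1, 0, 1]), eps=1e-2)
print([g(u_prev) for g in cons])  # both equal -eps: u_prev violates the new POP_k
```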
We obtain the following result for Algorithm 3.1.

Proposition 3.3 Let $(u^{(1)}, \ldots, u^{(k-1)})$ be the output of the first $(k-1)$ iterations of Algorithm 3.1. If this output is a sufficiently close approximation of the vector $(u^{(1)\star}, \ldots, u^{(k-1)\star})$ of the $(k-1)$ solutions with smallest objective values, and if the feasible set of the POP (3.26) is finite and distinct in terms of the objective, i.e., $F(u^{(1)\star}) < F(u^{(2)\star}) < \ldots$, then there exist $b \in \{0,1\}^n$ and $\epsilon \in \mathbb{R}^n$ such that the output $u^{(k)}$ of Algorithm 3.1 (for the $k$th iteration) satisfies
$$u^{(k)}(\omega) \to u^{(k)\star} \quad \text{as } \omega \to \infty. \qquad (3.35)$$

Proof: As each $u^{(j)}$ is in a neighborhood of $u^{(j)\star}$ for all $j \in \{1, \ldots, k-1\}$, we can choose $b \in \{0,1\}^n$ and a vector $\epsilon \in \mathbb{R}^n$ such that
$$\forall j \in \{1, \ldots, k-1\}\ \exists i \text{ with } b_i = 1 \text{ s.t. } \left( u^{(j)}_i - u^{(j)\star}_i \right)^2 < \epsilon_i,$$
and for each $j \in \{1, \ldots, k-1\}$ it holds that
$$\left( u^{(j)}_i - u^{(l)\star}_i \right)^2 \geq \epsilon_i \qquad \forall l \geq k,\ \forall i \text{ with } b_i = 1.$$
Let POP$(k)$ denote (3.26) with the $k$ systems of additional constraints given by step 3 in Algorithm 3.1, where the $k$th constraints are given by (3.32) for the constructed $b$ and $\epsilon$. Then it holds that
$$\mathrm{feas}\left( \mathrm{POP}(k) \right) = \mathrm{feas}\left( (3.26) \right) \setminus \left\{ u^{(1)\star}, \ldots, u^{(k-1)\star} \right\}.$$
Thus, $u^{(k)\star}$ is the global minimizer of POP$(k)$ and the global minimum is $F(u^{(k)\star})$. As the bounds (3.19) guarantee the compactness of the feasible set, it holds by the convergence theorem for the sparse SDP relaxations [52] that
$$u^{(k)}(\omega) \to u^{(k)\star} \quad \text{if } \omega \to \infty. \qquad (3.36)$$

Although we have proven convergence, the capacity of current SDP solvers restricts the choice of the relaxation order $\omega$ to small integers, typically $\omega = \omega_{\max} + 1$ or $\omega = \omega_{\max} + 2$. Moreover, we need to choose the parameters $\epsilon$ and $b$ appropriately to obtain good approximations of the $k$ feasible solutions with the smallest objective values. In the numerical experiments in 3.3.5 we will see that the Gröbner basis method is a useful tool for tuning the two parameters $\epsilon$ and $b$, as it allows us to confirm whether we derive the $k$ solutions of smallest objective value successfully in case $(N_x, N_y)$ is small. In the following we briefly describe the Gröbner basis method and polyhedral homotopy continuation as methods to test the SDPR method and to tune the parameters in Algorithm 3.1.

Gröbner basis method

The Gröbner basis method for finding all complex solutions of a given system of zero dimensional polynomial equations is a useful tool for tuning the parameters of the SDPR method and Algorithm 3.1, and for validating their numerical results. In order to do this, we study (3.17) by the rational univariate representation [85], [78], which is a variation of the Gröbner basis method, for coarse discretizations $(N_x, N_y)$. For a mesh with $N := N_x = N_y$ small (for instance $N = 5$ in the example in 3.3.5), (3.17) is solvable with this method (Groebner(Fgb) in Maple 11, nd_gr_trace and tolex_gsl in Risa/Asir). Applying the Gröbner basis method to solve (3.17) for a problem satisfying the assumptions of Proposition 3.3, and enumerating all solutions by their objective values, allows us to confirm whether the solutions of the SDPR method are indeed the minimal solutions of (3.17) and to determine which relaxation order $\omega$ is sufficient to derive this global minimizer. The result is also used to tune the parameters $\epsilon^k_i$ in Algorithm 3.1. We have no theorem which states that the tuning based on the coarse mesh case is good for the fine mesh case as well. However, we believe this tuning provides a better approximation for the fine mesh case, too. Note that, whereas the Gröbner basis method finds all complex solutions of (3.17), the SDPR method finds the real solution of (3.17) that minimizes $F$.

Polyhedral Homotopy Continuation Method

Another recent approach for solving (3.17) is the polyhedral homotopy continuation method for polynomial systems [39]. Consider the problem of finding all isolated zeros of a system of $n$ polynomials $f(x) = (f_1(x), \ldots, f_n(x)) = 0$ in an $n$-dimensional complex vector variable $x = (x_1, \ldots, x_n) \in \mathbb{C}^n$. The idea of homotopy continuation methods is to define a smooth homotopy system with a continuation parameter $t \in [0,1]$,
$$h(x,t) = (h_1(x,t), \ldots, h_n(x,t)) = 0,$$
using the algebraic structure of the polynomial system. The homotopy system is constructed such that the solutions of the starting polynomial system $h(x,0) = 0$ can be computed easily and the target polynomial system $h(x,1) = 0$ coincides with the system $f(x) = 0$ to be solved. Furthermore, every connected component of the solutions $(x,t) \in \mathbb{C}^n \times [0,1)$ of $h(x,t) = 0$ forms a smooth curve. The number of homotopy curves that are necessary to connect the isolated zeros of the target system to isolated zeros of the starting system determines the computational work involved in tracing homotopy curves.
A recent software package to determine all complex isolated solutions of $f(x) = 0$ is PHoM by Gunji et al. [31]. Thus, we may apply PHoM to find the complex solutions of a discretized PDE problem $p_{i,j}(u) = 0$ for all $(i,j)$. Then, we select the real solutions among all complex solutions of (3.17) and compare them to the solutions obtained by the SDPR method. Beside its property of finding all complex solutions, PHoM has the drawback that the computation time grows exponentially in the dimension $n$ of the system. This is due to the fact that the number of isolated solutions increases exponentially in $n$. Therefore we are restricted to very coarse meshes with $n \leq 10$ when applying PHoM to (3.17). Like the Gröbner basis method, the polyhedral homotopy method can be used for tuning the parameters in Algorithm 3.1, too.

3.2.4 Discrete approximations to solutions of differential equations

In 3.1.1 we discussed that, if a finite difference scheme is convergent and the discretization $(N_x, N_y)$ is chosen sufficiently fine, then a solution of (3.17) is a discrete approximation to a solution of the differential equation (3.14). However, there is no theorem proving the convergence of finite difference schemes for general classes of nonlinear differential equations. In the case of many nonlinear PDEs we cannot guarantee that a solution of (3.17) is indeed a discrete approximation of a solution of (3.14).

Definition 3.4 A solution of (3.17) that is not a discrete approximation of a solution of the PDE problem (3.14) is called a fake solution.

In numerical experiments our main indicator for a solution $u$ of (3.17) not being a fake solution is whether we succeed in extending $u$ from a coarse grid to finer and finer grids via the grid-refining method. Another property of a solution of (3.17) is the notion of stability.

Definition 3.5 Let $J(\cdot)$ denote the Jacobian of (3.17) and $m_e(\cdot)$ its maximal eigenvalue. A solution $u$ of (3.17) is called stable if all eigenvalues of $J(u)$ are non-positive, i.e., if $m_e(u) \leq 0$. If not, it is called unstable.

Distinguishing stable and unstable solutions is of interest for certain classes of nonlinear PDE problems. In 3.3.3 we will discuss reaction-diffusion equations as an example of such a class.
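Checking Definition 3.5 numerically amounts to one eigenvalue computation on the Jacobian of the discretized system. The following sketch uses a forward-difference Jacobian and a one-dimensional test system for illustration; the real parts of the eigenvalues are taken, since the Jacobian need not be symmetric.

```python
import numpy as np

def is_stable(r, u, h=1e-7):
    """Check Definition 3.5: u is stable iff the maximal eigenvalue of the
    Jacobian J(u) of the discretized system r is non-positive. J is
    approximated column by column with forward differences."""
    n = u.size
    J = np.empty((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        J[:, i] = (r(u + e) - r(u)) / h
    me = np.max(np.linalg.eigvals(J).real)
    return me <= 0.0, me

# Illustrative system r(u) = u - u^3 with solution u = 1: J(1) = 1 - 3 = -2 <= 0.
r = lambda u: u - u**3
print(is_stable(r, np.array([1.0])))
```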
where σ_{i,j}(u), σ_k(u) and σ̂_l(u) denote the maxima over all monomials in the corresponding numerator polynomials evaluated at u. Note that ǫsc(u) measures how accurately u approximates a feasible solution of (3.26), but it does not measure how accurately the finite difference scheme approximates the continuous problem (3.14). Another question is how well F(u) approximates the minimum of (3.26). In case we choose the primal relaxations (a) and (b) from 2.2.7 or the dual relaxations (2.18) for some linear objective F, F(u) approximates the minimum of (3.26) very accurately if the feasibility error of u is small. But in case F is nonlinear and we choose the dual relaxations, we have to consider the optimality error ǫobj defined by

ǫ_obj = | min(sSDP_ω) − F(u) | / max{ 1, | F(u) | }

as a measure for the optimality of u. Recall, when applying the SDPR method to solve a differential equation, the most important choices are: the objective function F, the bounds lbd and ubd, the relaxation order ω in the case of the dual relaxations (2.18), Newton's method or SQP as locally fast convergent method, and possibly additional constraints hk and gl. Finally, one needs to decide whether to apply the grid-refining method from 3.2.2 with extension strategy 3a or 3b. To evaluate the results of the SDPR method for approximating the solutions of differential equations, we consider (a) applying the SDPR method to PDE problems where an analytical solution is known, and (b) comparing the performance of the SDPR method with the performance of the MATLAB PDE Toolbox, a general purpose solver based on the finite element method. Finally, we may apply Gröbner basis computation or the polyhedral homotopy method to verify that the SDPR method provides accurate approximations to feasible solutions of (3.26), as mentioned in 3.2.3. Moreover, J(u) denotes the Jacobian of (3.17) at u and me(u) its largest eigenvalue. All numerical experiments are conducted on a Linux OS with a 2.4 GHz CPU and 8 GB memory. The total processing time in seconds is denoted as tC.

3.3.1 A nonlinear elliptic equation with bifurcation

A well known yet interesting nonlinear elliptic PDE problem, which we have already seen in Example 3.27, is given by

u_xx(x,y) + u_yy(x,y) + λ u(x,y) (1 − u(x,y)²) = 0   ∀ (x,y) ∈ [0,1]²,
u(x,y) = 0                                            ∀ (x,y) ∈ ∂[0,1]²,      (3.37)
0 ≤ u(x,y) ≤ 1                                        ∀ (x,y) ∈ [0,1]²,

where λ ≥ 0. In fact, this PDE is known as the Allen-Cahn equation. It was shown in [93] that there exists a unique nontrivial solution for this problem if λ > λ0 = 2π² ≈ 19.7392, and there exists only the trivial zero solution if λ ≤ λ0. Due to the bifurcation at λ0, homotopy-like continuation methods, which start from a solution of a system with weak nonlinearity to attempt the system with strong nonlinearity, cannot be applied to solve (3.37). We fix λ = 22 and apply the SDPR method with ω = 2 and F(u) = −Σ_{i,j} u_{i,j}. In order to study the efficiency of the various options of the SDPR method, we consider different settings: dual SDP relaxations with and without an additional local solver for the POP derived from (3.37), dual and primal SDP relaxations for a QOP equivalent to the POP, the grid-refining method starting from a coarse grid solution, and tight and loose upper bounds. The numerical results are given in Table 3.1 and pictured in Figure 3.5. When applying dual SDP relaxations to the original POP, we observe that the solution provided by the SDPR method is very accurate even if no additional local method is used.
Moreover, the size of the SDP relaxations and thus the computational cost increases rapidly for increasing Nx and Ny. One way to address this problem is to apply the transformation from POP into an equivalent QOP. Both the primal and the dual SDP relaxations of the QOP are substantially smaller than the dual SDP relaxations for the original POP. But ubd needs to be tightened when applying SDP relaxations to the QOP, in order to preserve numerical accuracy. Note, for this problem d-space sparsity is richer than correlative sparsity, as the size of the corresponding SDP relaxations and the resulting tC are smaller. The most efficient means to obtain high resolution approximations to a solution of (3.37) is the grid-refining method. However, the grid-refining method relies on the assumption that the behavior of the discretized system does not change much for increasing Nx and Ny, which is not the case when starting from a very coarse discretization in general. Therefore, SDP relaxations for the equivalent QOP are a promising tool to attempt a PDE problem on a higher resolution grid directly. Finally, we notice that the solution of the sparse SDP relaxation remains a sufficiently good initial guess for SQP, even if its feasibility error ǫsc is not that small.

SDP relaxation           Local solver  POP to QOP  ubd   n     Nx  Ny  ǫsc     tC
Dual                     SQP           no          0.99  16    6   6   -1e-14  3
Dual + Grid-refining 3b  SQP           no          0.99  1521  41  41  -1e-13  426
Dual                     none          no          0.99  81    11  11  -1e-10  418
Dual                     SQP           no          0.99  81    11  11  -4e-15  420
Dual + Grid-refining 3b  SQP           no          0.99  1521  41  41  -1e-13  1016
Dual                     SQP           no          0.99  49    9   9   -9e-15  39
Dual + Grid-refining 3b  SQP           no          0.99  225   17  17  -1e-9   49
Dual                     SQP           no          0.99  196   16  16  -       OOM
Dual                     SQP           yes         0.45  196   16  16  -5e-11  107
Primal                   SQP           yes         0.45  196   16  16  -8e-15  35
Primal                   none          yes         0.6   841   31  31  -5e-4   1133
Primal                   SQP           yes         0.6   841   31  31  -5e-14  3763

Table 3.1: SDPR method results for (3.37), where OOM stands for 'Out of Memory'.

Figure 3.5: Solution for (3.37) in case λ = 22 for (Nx, Ny) = (6, 6) and (Nx, Ny) = (41, 41).

To examine whether the obtained solution is the only strictly positive one of the discretized PDE, we impose the additional constraints

| u_x(x,y) | ≤ M,   | u_y(x,y) | ≤ M   ∀ (x,y) ∈ [0,1]².   (3.38)

We apply the SDPR method with ω = 2 and Nx = Ny = 6 to (3.37) for λ = 22 under the additional constraints (3.38). For M > 0.8 we detect the positive solution obtained before. If we decrease M sufficiently, we obtain the zero solution. Hence, it seems there exists exactly one positive nontrivial solution to the discretization of (3.37), and this solution converges to the strictly positive solution of (3.37) for Nx, Ny → ∞. As another way to confirm the accuracy of the SDPR method, we take advantage of a further property of (3.37). It was shown in [15] that a function u : [0,1]² → R that is a minimizer of the optimization problem

min_{u: [0,1]²→R}  ∫_{[0,1]²} ( u_x² + u_y² − 2λ ( u²/2 − u⁴/4 ) ) dx dy
s.t.  u = 0 on ∂[0,1]²,   0 ≤ u ≤ 1 on [0,1]²,                          (3.39)

is a solution to (3.37). The integral to be minimized in this problem is called the energy integral. By discretizing (3.39) via a finite difference scheme it can be transformed into a POP analogously to a PDE of form (3.14); a sketch of this discretization is given below.
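As an illustration, the following MATLAB sketch assembles the discretized energy integral of (3.39) on a grid with forward differences; the function name and the omission of trapezoidal boundary weights are our own illustrative simplifications.

% Discretized energy integral of (3.39); U is the N x N grid of values of u,
% including the zero boundary values.
function E = energyAC(U, lambda)
    N  = size(U, 1);
    h  = 1/(N-1);
    ux = diff(U, 1, 2) / h;                       % forward differences in x
    uy = diff(U, 1, 1) / h;                       % forward differences in y
    pot = 2*lambda * (U.^2/2 - U.^4/4);           % potential term of (3.39)
    E  = h^2 * ( sum(ux(:).^2) + sum(uy(:).^2) - sum(pot(:)) );
end

Since every grid value u_{i,j} enters E polynomially (with degree at most 4), minimizing E subject to 0 ≤ u_{i,j} ≤ 1 is itself a sparse POP of the form (3.26), with the objective fixed by the discretization rather than chosen freely.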
In contrast to (3.26), which we derive from (3.14), the objective function F is not free to choose but canonically given by the discretization of the objective function in (3.39). We apply the SDPR method with relaxation order ω = 2 to (3.37) and (3.39) on a 6 × 6- and an 11 × 11-grid and obtain an identical solution for both problems. These results are reported in Table 3.2, where ∆u, given by

∆u = max_{i,j} | u_{i,j} − û_{i,j} |,

evaluates the deviation of the SDPR method solutions for both problems; u_{i,j} denotes the SDPR solution to (3.37) and û_{i,j} the SDPR solution to (3.39).

Problem  Nx  Ny  tC   ǫobj   ǫsc     ∆u
(3.37)   6   6   3    2e-14  -1e-14  2e-6
(3.39)   6   6   2    1e-10  -9e-15  2e-6
(3.37)   11  11  418  4e-15  -       9e-7
(3.39)   11  11  98   2e-10  -       9e-7

Table 3.2: SDPR results for (3.37) and (3.39).

The solutions to both problems are highly accurate, and we note that the total computation time to minimize the energy integral is less than the time required to solve the polynomial optimization problem corresponding to (3.37). Finally, we compare the numerical performance of the SDPR method for (3.37) to existing solvers for nonlinear PDE problems. We apply the nonlinear solver from the MATLAB PDE Toolbox to (3.37). The MATLAB solver is FEM based and requires an initial guess to search for a solution of the PDE problem. When starting from the zero solution or a number of random positive functions, this solver detects the trivial solution. Even when choosing u0 with u0(x,y) := 0.43 sin(πx) sin(πy) on [0,1]² as initial guess, the FEM solver detects the trivial solution, not the nontrivial one, although u0 is close to the nontrivial solution (on a 41 × 41-grid: max | u − u0 | = 0.006, (1/41²) Σ_{i,j} | u_{i,j} − u0_{i,j} | = 0.003). Although the FEM solver finds the trivial solution in less than 60 seconds on a mesh of much higher resolution (67356 nodes, 134204 triangles) than those solved by the SDPR method, it needs a very good initial guess to find the more interesting, nontrivial solution. It is the advantage of the SDPR method that no initial guess is required to find an accurate approximation of the strictly positive solution of (3.37).

3.3.2 Illustrative nonlinear PDE problems

A problem in Yokota's textbook

Simple ODE problems can be solved by the SDPR method with ease. To demonstrate this, consider the easily solvable nonlinear boundary value problem

ü(x) + (1/8) u(x) u̇(x) − 4 − (1/4) x³ = 0   ∀ x ∈ [1,3],
u(1) = 17,
u(3) = 43/3,                                  (3.40)
10 ≤ u(x) ≤ 20                                ∀ x ∈ [1,3].

For details about problem (3.40) see [105]. Applying the SDPR method with ω = 2, the objective function F defined by

F(u) = Σ_{i=1}^{Nx} u_i,

and without using a locally fast convergent method yields the highly accurate, stable solution that is documented in Table 3.3 and pictured in Figure 3.6.

Nx   ǫsc   ǫobj    me  tC
200  2e-9  -4e-11  -3  104

Table 3.3: Numerical results for problem (3.40).

Figure 3.6: SDPR method solution u for problem (3.40).

Nonlinear wave equation

As an example of a hyperbolic PDE problem we study time periodic solutions of the nonlinear wave equation

−u_xx + u_yy + u (1 − u) + 0.2 sin(2x) = 0   ∀ (x,y) ∈ [0,π] × [0,2π],
u(0,y) = u(π,y) = 0                          ∀ y ∈ [0,2π],
u(x,0) = u(x,2π)                             ∀ x ∈ [0,π],              (3.41)
−3 ≤ u(x,y) ≤ 3                              ∀ (x,y) ∈ [0,π] × [0,2π].

As far as we have checked on the MathSciNet database, there is no mathematical proof of the existence of periodic solutions of this system. However, the SDPR method finds some periodic solutions.
We observed that the POP corresponding to problem (3.41) has various solutions. Therefore, the choice of the objective determines the solution found by the sparse SDP relaxation. We consider the functions

F1(u) = Σ_{i,j} u_{i,j},   F2(u) = Σ_{i,j} σ_{i,j} u_{i,j},

as objectives for (3.26), where the σ_{i,j} (i = 1, ..., Nx, j = 1, ..., Ny) are random variables uniformly distributed on [−0.5, 0.5]. We apply the SDPR method with ω = 2 and Newton's method as a local solver. The results are listed in Table 3.4 and pictured in Figures 3.7 and 3.8.

SDP relaxation           objective  Nx  Ny  ǫsc     tC
Dual                     F1         5   6   -4e-10  151
Dual + Grid-refining 3b  F1         33  40  -3e-8   427
Dual                     F2         5   5   -3e-11  19
Dual + Grid-refining 3b  F2         33  33  -5e-10  86

Table 3.4: SDPR method results for (3.41).

Figure 3.7: Solutions of (3.41) by SDPR method with objective F1.

Imposing variation bounds: Beside choosing different objective functions in the SDPR method, a second possibility to detect other solutions of a PDE is to impose additional constraints polynomial in the unknown functions. In 3.2.1 we introduced variation bounds (3.21) to restrict the space of functions in which we search for solutions. For (3.41) we impose the variation bounds

| u_y(x,y) | ≤ 0.5   ∀ (x,y) ∈ (0,π) × (0,2π).   (3.42)

If the SDPR method is applied to (3.41) with the additional condition (3.42) and F1 as objective, another solution to the PDE problem is obtained, which is documented in Table 3.5 and pictured in Figure 3.9. Thus several solutions of (3.41) are detected by choosing different objective functions in (3.26) and by imposing additional polynomial inequality constraints.

SDP relaxation           objective  Nx  Ny  ǫsc     tC
Dual                     F1         5   6   -3e-15  437
Dual + Grid-refining 3b  F1         17  21  -3e-14  1710

Table 3.5: SDPR method results for (3.41) under additional constraint (3.42).

A system of elliptic nonlinear PDEs

As a PDE problem of two unknown functions in two variables, consider the following problem, where we distinguish two types of boundary conditions, Case I (Dirichlet condition) and Case II (Neumann condition):

u_xx + u_yy + u (1 − u² − v²) = 0   ∀ (x,y) ∈ [0,1]²,
v_xx + v_yy + v (1 − u² − v²) = 0   ∀ (x,y) ∈ [0,1]²,     (3.43)
0 ≤ u, v ≤ 5.

Figure 3.8: SDPR solutions of (3.41) with objective F2.

Figure 3.9: SDPR solutions of (3.41) under additional constraint (3.42) with objective F1.

Case I:
u(0,y) = 0.5y + 0.3 sin(2πy),   u(1,y) = 0.4 − 0.4y   ∀ y ∈ [0,1],
u(x,0) = 0.4x + 0.2 sin(2πx),   u(x,1) = 0.5 − 0.5x   ∀ x ∈ [0,1],
v(x,0) = v(x,1) = 0                                    ∀ x ∈ [0,1],
v(0,y) = v(1,y) = 0                                    ∀ y ∈ [0,1];

or Case II:
u_x(0,y) = −1,   u_x(1,y) = 1                  ∀ y ∈ [0,1],
u_y(x,0) = 2x,   u_y(x,1) = x + 5 sin(πx/2)    ∀ x ∈ [0,1],
v_x(0,y) = 0,    v_x(1,y) = 0                  ∀ y ∈ [0,1],
v_y(x,0) = −1,   v_y(x,1) = 1                  ∀ x ∈ [0,1].

In both cases, we choose F(u,v) = Σ_{i,j} u_{i,j} for the SDPR method.

Case I. We apply the SDPR method with both primal and dual SDP relaxations exploiting sparsity. Also, as the degree of this PDE problem is three, we can apply the POP to QOP transformation to reduce the size of the SDP relaxations. Finally, we apply the grid-refining method to extend the coarse grid solutions to finer grids (the extension step is sketched below).
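The extension step of the grid-refining method takes a solution on a coarse grid and interpolates it to the next finer grid, where it serves as an initial guess for the locally convergent solver. The following MATLAB sketch shows one way to do this with bilinear interpolation; the toy coarse solution and the grid-doubling rule are our own illustrative choices.

% Extend a coarse-grid solution Uc (boundary included) to a finer grid.
Uc = sin(pi*linspace(0,1,5))' * sin(pi*linspace(0,1,5));  % toy coarse "solution"
Nc = size(Uc, 1);
Nf = 2*Nc - 1;                                % illustrative refinement rule
[xc, yc] = meshgrid(linspace(0, 1, Nc));
[xf, yf] = meshgrid(linspace(0, 1, Nf));
Uf = interp2(xc, yc, Uc, xf, yf, 'linear');   % bilinear interpolation of u
% Uf is then passed as the starting point of Newton's method or SQP applied
% to the fine-grid polynomial system (strategy 3a/3b of the grid refining).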
When applying the dual SDP relaxations to the POP, lbd = 0 and ubd = 5 are given by (3.43). The bound ubd is tightened to ubd = 0.6 for the primal and dual SDP relaxations of the QOP in order to obtain accurate solutions. The numerical results of the SDPR method with ω = 2 and SQP as a local solver are reported in Table 3.6. We observe that exploiting d-space sparsity and applying the primal SDP relaxations is very efficient for this problem. In fact, the d-space sparsity is richer than the correlative sparsity, as a comparison of the total computation time of primal and dual SDP relaxations for the QOP derived from (3.43) for Nx = Ny = 11 reveals. When applying the dual SDP relaxations, the grid-refining method is useful to extend coarse grid solutions to high resolution grids. The approximate solution for u(·,·) is pictured in Figure 3.10; the corresponding v equals zero on the entire domain.

SDP relaxation           POP to QOP  n     Nx  Ny  ǫsc     tC
Dual                     no          18    5   5   -4e-13  2
Dual + Grid-refining 3b  no          7938  65  65  -2e-14  12280
Dual                     no          32    6   6   -4e-13  150
Dual + Grid-refining 3b  no          162   11  11  -2e-13  156
Dual + Grid-refining 3b  no          722   21  21  -4e-16  238
Dual                     yes         162   11  11  -9e-9   185
Primal                   yes         162   11  11  -3e-8   19
Primal                   yes         338   15  15  -3e-8   107

Table 3.6: Results of SDPR method for (3.43) in Case I.

Figure 3.10: SDPR method solution u of (3.43) for Case I and two different discretizations.

We compare the performance of the SDPR method for Case I of (3.43) to the MATLAB PDE Toolbox. Starting from an arbitrary initial guess, the MATLAB solver detects the same solution as the SDPR method in 2 seconds on a mesh with 2667 nodes and 5168 triangles, and in 15 seconds on a mesh with 10501 nodes and 20672 triangles, cf. Figure 3.11. Thus, the FEM solver from MATLAB is much more efficient in finding the solution. However, the discretization of (3.43) under the Dirichlet condition has exactly one real solution. In a more difficult PDE problem with many solutions, a good initial guess is required for the MATLAB solver to find a solution of interest.

Case II. We apply the SDPR method with the same settings as in Case I, and compare the efficiency of primal and dual SDP relaxations and the grid-refining method. For the primal and dual SDP relaxations of the QOP the upper bounds are tightened to ubd_u = 4 and ubd_v = 1.5. The numerical performance of the SDPR method is reported in Table 3.7. The single solution (u,v) of the discretized differential equation is illustrated in Figure 3.12. As in Case I, we observe that the transformation from POP to QOP is efficient in reducing the size of the SDP relaxations while the accuracy of the approximation is preserved. Moreover, d-space sparsity is richer than correlative sparsity, as the primal SDP relaxations for n = 242 can be solved in 58 s whereas solving the dual SDP relaxations for the same QOP requires 748 s.

Figure 3.11: Solution u of (3.43) by MATLAB PDE Toolbox for meshes with 2667 nodes / 5168 triangles (left) and 10501 nodes / 20672 triangles (right).
SDP relaxation           POP to QOP  n     Nx  Ny  ǫsc     tC
Dual                     no          32    6   6   -7e-13  128
Dual + Grid-refining 3b  no          3042  41  41  -3e-15  2160
Dual                     no          50    7   7   -1e-10  713
Dual + Grid-refining 3b  no          242   13  13  -6e-13  728
Dual                     yes         242   13  13  -5e-11  748
Primal                   yes         242   13  13  -6e-13  58

Table 3.7: Results of the SDPR method for (3.43) in Case II.

Nonlinear parabolic equation

Consider a nonlinear parabolic PDE problem of two dependent scalar functions:

(1/50) u_xx − u_y + u²v − 4u = 0   ∀ x ∈ [0,1], y ≥ 0,
(1/50) v_xx − v_y + 3u − u²v = 0   ∀ x ∈ [0,1], y ≥ 0,
u(0,y) = u(1,y) = 1                ∀ y ≥ 0,
v(0,y) = v(1,y) = 3                ∀ y ≥ 0,               (3.44)
u(x,0) = 1 + sin(2πx)              ∀ x ∈ [0,1],
v(x,0) = 3                         ∀ x ∈ [0,1].

In order to bring (3.44) into the form (3.14), we need to cut y at y = T. Since problem (3.44) is parabolic, the solutions ((u,v)(Nx, Ny)) of the discretized problems converge to solutions (u(·,·), v(·,·)) of (3.44) by Theorem 3.4. We apply the grid-refining method with strategy 3b, where F is given by F(u,v) = Σ_{i,j} u_{i,j} and ω = 3. Furthermore, lbd ≡ 0 and ubd ≡ 5 are chosen as bounds for u and v. The grid-refining method yields a highly accurate, stable solution on a 33 × 65-grid; see Table 3.8 and Figure 3.13.

Figure 3.12: SDPR method solutions u (left) and v (right) of (3.43) for Case II.

strategy             Nx  Ny  me     ǫsc
initial SDPR method  5   9   -4.12  -2e-10
Grid-refining 3b     33  65  -2.88  -5e-11

Table 3.8: Results for diffusion problem (3.44).

First order PDEs

An optimization based approach to attempt first order PDEs was proposed by Guermond and Popov [29, 30]. In [29] the following example of a first order PDE with a discontinuous solution is solved on a 40 × 40-grid:

u_x(x,y) = 0   ∀ (x,y) ∈ [0,2] × [0.2, 0.8],
u(0,y) = 1     if y ∈ [0.5, 0.8],              (3.45)
u(0,y) = 0     if y ∈ [0.2, 0.5).

Applying the SDPR method with a forward or central difference approximation for the first derivative in (3.45), we detect the discontinuous solution

u(x,y) = 1 if y ≥ 0.5,  and 0 otherwise,

on a 40 × 40-grid. A more difficult first order PDE problem is given by

u_x(x,y) + u(x,y) − 1 = 0   ∀ (x,y) ∈ [0,1]²,
u(0,y) = u(1,y) = 0          ∀ y ∈ [0,1],       (3.46)
0 ≤ u(x,y) ≤ 1               ∀ (x,y) ∈ [0,1]².

As can be seen easily and was pointed out in [30], problem (3.46) is not well-posed since the outflow boundary condition is over-specified. Problem (3.46) is discussed in detail in [30], and the authors obtained an accurate approximation to the exact solution by L¹ approximation on a 10 × 10-grid. Applying the SDPR method with ω = 1 and objective function F(u) = Σ_{i,j} u_{i,j} on a 10 × 10-grid, we obtain a highly accurate approximation (| ǫsc | < 1e−14) to this solution in less than 10 seconds in case we choose a forward difference approximation for the first derivative. In the case of a central or a backward difference scheme, the dual problem in the resulting SDP relaxation becomes infeasible. Moreover, by applying the SDPR method on a 50 × 50-grid we are able to obtain a highly accurate approximation to the solution of (3.46) in less than 250 seconds, as pictured in Figure 3.14. A sketch of the forward difference discretization of (3.46) is given below.
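For concreteness, the following MATLAB sketch assembles the forward-difference equations of (3.46) row by row; the layout and helper names are our own illustrative choices. Since (3.46) is linear, the constraints here are linear polynomials; the same pattern produces higher-degree polynomial constraints for the nonlinear PDEs above.

% Forward-difference discretization of u_x + u - 1 = 0 on an N x N grid.
% Unknowns u(i,j), i = 1..N (x-direction), j = 1..N (y-direction).
N  = 10;
h  = 1/(N-1);
A  = []; b = [];                      % rows of the linear system A*u(:) = b
id = @(i,j) (j-1)*N + i;              % linear index of grid point (i,j)
for j = 1:N
    for i = 1:N-1                     % forward difference needs point i+1
        row = zeros(1, N^2);
        row(id(i+1,j)) =  1/h;        % (u(i+1,j) - u(i,j))/h ...
        row(id(i,  j)) = -1/h;
        row(id(i,  j)) = row(id(i,j)) + 1;   % ... + u(i,j)
        A = [A; row];  b = [b; 1];    % ... - 1 = 0
    end
end
% Together with 0 <= u <= 1 and the (over-specified) boundary equations
% u(1,j) = u(N,j) = 0, these rows form the constraints of the POP (3.26).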
Figure 3.14: Solution u by SDPR method for (3.46).

3.3.3 Reaction-diffusion equations

An interesting class of PDE problems to analyze by the SDPR method is the class of reaction-diffusion equations. Many reaction-diffusion equations are systems of nonlinear PDEs with multiple solutions. For some of them, special unstable solutions exist that are difficult to detect by standard homotopy-like continuation methods. We demonstrate how the features of the SDPR method can be used to find special solutions of a reaction-diffusion equation, in particular interesting unstable ones.

A reaction-diffusion equation due to Mimura

An exciting and difficult reaction-diffusion problem of two dependent functions is a problem by M. Mimura [64] which arises from the context of planktonic prey and predator models in biology. This problem is given below; we briefly call it Mimura's problem:

(1/20) u''(t) + (1/9) (35 + 16u(t) − u(t)²) u(t) − u(t) v(t) = 0,
v''(t) − (1 + (4/25) v(t)) v(t) + u(t) v(t) = 0,
u̇(0) = u̇(5) = v̇(0) = v̇(5) = 0,                                  (3.47)
0 ≤ u(t) ≤ 14,  0 ≤ v(t) ≤ 14,  ∀ t ∈ [0,5].

In [64] the problem is analyzed, and the existence of continuous solutions is shown in [82]. In order to construct a POP of type (3.26), we consider different objective functions:

F1(u,v) = −u_{⌈N/2⌉},   F2(u,v) = −Σ_{i=1}^{N} u_i,   F3(u,v) = −u_2,
F4(u,v) = −u_{N−1},     F5(u,v) = −u_2 − u_{N−1},     F6(u,v) = Σ_{i=1}^{N} (u_i + v_i).   (3.48)

First, we apply the SDPR method with ω = 3 and N = 5. In order to confirm the numerical results obtained for this very coarse grid, we apply PHoM [31], a C++ implementation of the polyhedral homotopy continuation method for computing all isolated complex solutions of a polynomial system of equations, to the system of discretized PDEs. In this case the dimension n of the polynomial system equals 6, as there are 2 unknown functions with 3 interior grid points each. PHoM finds 182 complex, 64 real and 11 nonnegative real solutions. Varying the upper and lower bounds for u2 and u4 and choosing one of the functions F1, ..., F5 as an objective function, all 11 solutions are detected accurately (| ǫsc | < 1e−7) by the SDPR method, as listed in Table 3.9.

u2     u3     u4     v2      v3      v4      objective  ubd_{u2}  ubd_{u4}
4.623  6.787  0.939  9.748   10.799  5.659   F3         5         1.5
4.607  6.930  0.259  9.737   10.831  5.166   F3         5         0.5
0.259  6.930  4.607  5.166   10.831  9.737   F2         0.5       6
5.683  2.971  5.683  10.388  8.248   10.388  F3         6         6
6.274  0.177  6.274  10.638  6.404   10.638  F3         7         7
0.970  7.812  0.970  5.735   10.94   5.735   F3         2         2
0.297  7.932  0.966  5.230   10.94   5.729   F4         0.5       2
0.962  7.932  0.297  5.729   10.94   5.230   F3         2         0.5
0.304  8.045  0.304  5.234   10.94   5.234   F1         14        14
0.939  6.787  4.623  5.659   10.80   9.748   F4         2         14
5.000  5.000  5.000  10.000  10.000  10.000  F2         14        14

Table 3.9: SDPR method solutions of (3.47) for N = 5.

The confirmation of our SparsePOP results by PHoM encourages us to solve Mimura's problem for a higher discretization. Relaxation order ω = 3 is necessary to obtain an accurate solution in case N = 7 (Table 3.10, row 1). The upper bounds for u2 and u_{N−1} are chosen to be 1. When extending the grid size from 7 to 13, the accuracy of the SDPR solution deteriorates.
Also, if we choose ω = 2 for the initial application of the SDPR method, or if Newton's method is applied with another arbitrary starting point, or if we start for instance with N = 5 or N = 9, it is not possible to get an accurate solution. One possibility to overcome these difficulties is to start the grid-refining method with strategy 3b on a finer grid. We obtain a highly accurate solution 2teeth when we start with N = 7 and F2 as objective function, and a highly accurate stable solution 2,3peak when we start with N = 25 and F5 as objective function. See Table 3.10 and Figure 3.15. It seems reasonable to state that the SDPR method provides an appropriate initial guess for Newton's method, which leads to accurate solutions for sufficiently high discretizations.

SDP relaxation  QOP  Local solver  obj  ubd_{u1}  N    solution  ǫsc     me
Dual            no   Newton        F2   14        7              -5e-9   -2.2
Grid-ref 3b     no   Newton             14        13             -2e+0   2.7
Grid-ref 3b     no   Newton             14        25             -2e-6   -0.9
Grid-ref 3b     no   Newton             14        49             -1e-12  0.1
Grid-ref 3b     no   Newton             14        97             -5e-12  0.2
Grid-ref 3b     no   Newton             14        193            -2e-4   0.3
Grid-ref 3b     no   Newton             14        385  2teeth    -5e-5   0.2
Dual            no   Newton        F5   14        26             -1e-1   2.09
Grid-ref 3b     no   Newton             14        51             -5e-2   -0.18
Grid-ref 3b     no   Newton             14        101            -2e-15  -0.07
Grid-ref 3b     no   Newton             14        401  2,3peak   -6e-16  -0.07

Table 3.10: Results of grid-refining strategy 3b.

Figure 3.15: Unstable solution 2teeth (left) and stable solution 2,3peak (right).

As the most powerful approach we apply the grid-refining method with strategy 3a, ω1 = 3 and ωk = 2 for k ≥ 2. We obtain the highly accurate stable solutions 3peak and 4peak, which are documented in Table 3.11 and pictured in Figure 3.16. As objective function for the POPs to be solved in each iteration we choose the function FM from (3.31). Another way to attempt finer grids directly is to transform the POP derived from (3.47) into a QOP, and to apply both primal and dual SDP relaxations to that QOP. As reported in Table 3.11, the total computation time in the case N = 51 is reduced by two orders of magnitude under this method. A highly accurate solution is obtained when applying SQP as local solver in the SDPR method. Finally, we obtain various stable and unstable solutions to Mimura's problem when choosing different functions as objective F and when tightening or loosening ubd_{u1}. By the SDPR method we obtain the stable solutions 3peak, 4peak, 2,3peak and the unstable solutions 2teeth, peak3unstable, peak4unstable, 2valley.

SDP relaxation  QOP  Local solver  obj  ubd_{u1}  N    solution       ǫsc     me     tC
Dual            no   Newton        F5   0.5       26                  -2e-1   2.09   203
Grid-ref 3a     no   Newton        FM   0.5       51                  -4e-2   -0.05  224
Grid-ref 3a     no   Newton        FM   0.5       101                 -4e-4   -0.02  383
Grid-ref 3a     no   Newton        FM   0.5       201  4peak          -3e-11  -0.02  1082
Dual            no   Newton        F1   0.5       26                  -1e-3   -0.12  270
Grid-ref 3a     no   Newton        FM   0.5       51                  -4e-3   -0.08  348
Grid-ref 3a     no   Newton        FM   0.5       101                 -3e-16  -0.08  511
Grid-ref 3a     no   Newton        FM   0.5       201  3peak          -2e-11  -0.07  1192
Dual            no   SQP           F1   0.5       51                  -5e-13  -0.07  470
Grid-ref 3b     no   SQP           F1   0.5       201  3peak          -1e-12  -0.07  501
Dual            no   SQP           F1   14        51                  -1e-10  0.70   393
Grid-ref 3b     no   SQP           F1   14        201  peak3unstable  -4e-12  0.70   427
Dual            no   SQP           F2   0.5       51                  -1e-2   1.62   806
Grid-ref 3b     no   SQP           F2   0.5       201  peak4unstable  -3e-16  1.43   868
Dual            yes  SQP           F6   0.5       51   2valley        -1e-13  2.10   16
Primal          yes  SQP           F6   0.5       51   2valley        -1e-9   2.10   8

Table 3.11: Results of grid-refining strategy 3a.

Reaction-diffusion equations from collision processes

Another interesting class of reaction-diffusion equations arises from collision processes of particle-like patterns in dissipative systems. Various different input-output relations such as annihilation, repulsion, fusion, and even chaotic dynamics are observed after collision of these patterns.
The reaction-diffusion equations describing the collision processes have special unstable solutions, so-called scattors, which are difficult to detect and attract lots of interest [73, 74, 75, 76]. We show how scattors are detected by the SDPR method.

Gray-Scott model in 1D: As a first example we consider the stationary equation of a Gray-Scott model for the dynamics of two traveling pulses in one dimension from [74]:

D_u u_xx(x) − u(x) v(x)² + f (1 − u(x)) = 0   ∀ x ∈ [0,4],
D_v v_xx(x) + u(x) v(x)² − (f + k) v(x) = 0   ∀ x ∈ [0,4],   (3.49)

where Du > 0, Dv > 0, and f > 0 and k > 0 are two parameters related to the inflow and removal rates of the chemical species. Moreover, we impose Neumann boundary conditions. The existence and the shape of stable solutions and scattors depend heavily on the choice of f and k. We set the parameters to Du = 5e−5, Dv = 2.5e−5, f = 0.0198 and k = 0.0497859. For this setting a scattor with positive eigenvalues (λ1, λ2, λ3) = (0.0639, 0.0638, 0.0023) was reported in [74]. We choose F(u,v) = −Σ_{i=1}^{N} u_i, lbd_u = lbd_v = 0, ubd_u = 1.0, ubd_v = 0.8, lbd_{u1} = lbd_{uN} = 0.4, ubd_{u1} = ubd_{uN} = 0.6 and ubd_{u_{N/2}} = 0.9. Applying the SDPR method with these settings yields the scattor doublePeak as reported in Table 3.12 and pictured in Figure 3.18.

SDP relaxation  QOP  Local solver  ω  N     solution    λ1      λ2      λ3      tC
Dual            yes  SQP           2  20                0.0441  0.0434  -       32
Grid-ref 3b     yes  SQP              640   doublePeak  0.0639  0.0638  0.0023  469
Grid-ref 3b     yes  SQP              1280  doublePeak  0.0638  0.0637  0.0023  1247

Table 3.12: Results of the SDPR method for (3.49).

In (3.49) the shape and number of scattors depend on the choice of the two parameters f and k. It is well known that if we fix f there is a bifurcation point k0; to be precise, there are no scattors for (3.49) if k < k0 and several scattors if k > k0. For a bifurcation analysis by the SDPR method we fix f = 0.0270.

Figure 3.16: Stable solutions 4peak and 3peak.

Figure 3.17: Unstable solutions peak4unstable and peak3unstable.

Moreover, we define the norm of a numerical solution u by

‖u‖ = ( Σ_{i=1}^{N} u_i² )^{1/2}.

We define F(u,v) = −Σ_{i=1}^{N} u_i² as objective function for the SDPR method. Bounds lbd and ubd are chosen as follows:

lbd_{u_i} = 0.5 if i ∈ {1, ..., ⌈N/10⌉} ∪ {⌈9N/10⌉, ..., N}, and 0 else,   (3.50)

ubd_u = 0.8, lbd_v = 0 and ubd_v = 0.8. (The discretized system (3.49) and the stability check of Definition 3.5 are sketched below.)
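To make the stability classification of Definition 3.5 concrete for this model, the following MATLAB sketch evaluates the central-difference residual of (3.49) with Neumann boundary conditions and classifies a given discrete solution by the largest eigenvalue of the Jacobian. The function names are our own illustrative choices, and the Jacobian is formed by simple numerical differencing rather than analytically.

% Central-difference residual of (3.49) with Neumann BC (mirrored end points).
% z = [u; v] is a column vector stacking the N samples of u and v.
function r = grayScottRes(z, Du, Dv, f, k, h)
    N  = numel(z)/2;  u = z(1:N);  v = z(N+1:end);
    ue = [u(2); u; u(N-1)];  ve = [v(2); v; v(N-1)];   % Neumann ghost points
    lap = @(w) (w(1:end-2) - 2*w(2:end-1) + w(3:end)) / h^2;
    r  = [ Du*lap(ue) - u.*v.^2 + f*(1 - u);
           Dv*lap(ve) + u.*v.^2 - (f + k)*v ];
end

% Stability in the sense of Definition 3.5: largest eigenvalue of the Jacobian.
function me = maxEig(z, Du, Dv, f, k, h)
    n = numel(z);  J = zeros(n);  d = 1e-7;
    r0 = grayScottRes(z, Du, Dv, f, k, h);
    for i = 1:n
        zp = z;  zp(i) = zp(i) + d;
        J(:,i) = (grayScottRes(zp, Du, Dv, f, k, h) - r0) / d;
    end
    me = max(real(eig(J)));    % me <= 0: stable; me > 0: unstable (scattor)
end

A solution returned by the SDPR method can thus be classified immediately; the scattors discussed in this subsection are exactly the solutions with me > 0.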
If we apply the SDPR method with ω = 2 and N = 256 to (3.49) for k ≤ 0.05281, we do not obtain any nontrivial solution. If we apply the SDPR method with the same settings to (3.49) with k increasing from 0.05282 to 0.0535, we obtain a scattor (u,v)¹ with ‖u¹‖ increasing from 0.1682 to 0.1843.

Figure 3.18: SDPR method detects double peak scattor for (3.49).

Obtaining a second scattor requires tightening the bounds lbd_u and ubd_u:

lbd_{u_i} = 0.4 if i ∈ {1, ..., ⌈N/10⌉} ∪ {⌈9N/10⌉, ..., N}, and 0 else,
ubd_{u_i} = 0.6 if i ∈ {1, ..., ⌈N/10⌉} ∪ {⌈9N/10⌉, ..., N}, and 0.8 else.   (3.51)

The bounds for v remain unchanged. Applying the SDPR method with the tighter bounds (3.51) for k = 0.0535 yields a second solution (u,v)² with ‖u²‖ = 0.1566 ≠ 0.1843 = ‖u¹‖. However, it is not possible to obtain accurate approximations for the second solution when applying the SDPR method for smaller choices of k. Therefore we apply a continuation technique based on the SDPR method to obtain the second solution for smaller k: Set k̃ = 0.0535, choose some stepsize ∆k and apply the SDPR method to (3.49) for k = k̃ − ∆k with objective function G_{k̃} defined by

G_{k̃}(u,v) = Σ_{i=1}^{N} ( u_i − u_i^{k̃} )²,

where u^{k̃} is the solution of (3.49) for k = k̃. Update k̃ = k̃ − ∆k and iterate. Following this procedure we obtain (u,v)² for k decreasing from 0.0535 to 0.05281 and ‖u²‖ increasing from 0.1566 to 0.1655. Thus, the bifurcation point of (3.49) is k0 ≈ 0.05281 for f = 0.0270. The results of the SDPR method are illustrated in the bifurcation diagram in Figure 3.19.

Figure 3.19: Bifurcation diagram by the SDPR method (left) compared to the one (right) from [100].

Three-component reaction-diffusion equation

Consider another one-dimensional steady state equation from [74]:

D_u u_xx(x) + 2u(x) − u(x)³ − κ3 v(x) − κ4 w(x) + κ1 = 0   ∀ x ∈ [0, 0.5],
(1/τ) ( D_v v_xx(x) + u(x) − v(x) ) = 0                     ∀ x ∈ [0, 0.5],
(1/θ) ( D_w w_xx(x) + u(x) − w(x) ) = 0                     ∀ x ∈ [0, 0.5],   (3.52)
−2 ≤ u(x), v(x), w(x) ≤ 2                                   ∀ x ∈ [0, 0.5],

under Neumann boundary conditions. We set the parameters to Du = 5e−6, Dv = 5e−5, Dw = 1e−2, κ1 = −7, κ3 = 1, κ4 = 8.5, τ = 16.1328 and θ = 1. In [74] the existence of two scattors, twin-horn and fusion, is shown for this setting. We choose F(u,v,w) = −u_{⌈N/2⌉}. fusion has one positive eigenvalue, and twin-horn has three positive eigenvalues (λ1, λ2, λ3) = (0.9069, 0.1297, 0.0138). Applying the SDPR method with SQP as local solver starting from N = 40 and subsequent application of the grid-refining method with strategy 3b, we obtain a highly accurate approximation of the scattor fusion pictured in Figure 3.20.

Figure 3.20: SDPR method detects fusion for (3.52), with u (blue), v (green) and w (red).

Swift-Hohenberg equation

Another type of reaction-diffusion equation arising from modeling pattern formation is given by Swift-Hohenberg equations [80]. Swift-Hohenberg equations are interesting from the point of view of pattern formation because they have many qualitatively different stationary solutions. Differential equations with many solutions are of particular interest for applying the SDPR method.
We examine a stationary Swift-Hohenberg equation from [80]:

−u_xxxx(x) − 2u_xx(x) − (1 − α) u(x) − u(x)³ = 0   ∀ x ∈ [0, L],
u(0) = u(L) = u_xx(0) = u_xx(L) = 0,                (3.53)

where α ∈ R. It is known that the shape and number of solutions depend on the choice of L. For our numerical analysis we set the parameters to L = 9 and α = 0.3. The fourth order derivative is approximated by

u_xxxx(x_i) ≈ (1/h_x⁴) ( u_{i+2} − 4u_{i+1} + 6u_i − 4u_{i−1} + u_{i−2} ).

We consider the following objective functions:

F1(u) = −Σ_{i=1}^{N} u_i,   F2(u) = Σ_{i=1}^{N} u_i,   F3(u) = −u_2,
F4(u) = −u_{⌈N/3⌉},         F5(u) = −u_{⌈2N/3⌉},       F6(u) = −u_{⌈N/4⌉}.   (3.54)

We apply the SDPR method with ω = 3 for N = 40 and varying objective functions, and obtain five different solutions for (3.53) as reported in Table 3.13 and pictured in Figure 3.21.

SDP relaxation  POP to QOP  Local solver  Objective  ǫsc     tC
Dual            no          SQP           F1         -1e-9   29
Dual            no          SQP           F2         -3e-15  32
Dual            no          SQP           F3         -3e-10  31
Dual            no          SQP           F4         -3e-10  69
Dual            no          SQP           F5         -1e-13  58

Table 3.13: Results of the SDPR method with ω = 3, N = 40 and varying objective function.

In a next step we investigate a more systematic approach to enumerate solutions of (3.53). Applying Algorithm 3.1 does not yield an accurate enumeration of solutions for (3.53): the numerical accuracy of the solutions deteriorates from the second iteration on. This may be explained by the fact that the relaxation of the quadratic constraints added in step 3 of Algorithm 3.1 is too weak to provide a good starting point for the local solver. We therefore consider a variant of Algorithm 3.1. Instead of adding the constraint

( u_i − u_i^{(k−1)} )² ≥ ǫ_k  for all i with b_i = 1

to the POP, we consider 2^{Σ_{i=1}^{n} b_i} POPs, in which we add the constraint

u_i ≤ u_i^{(k−1)} − ǫ_{ki}   or   u_i ≥ u_i^{(k−1)} + ǫ_{ki}

for every i with b_i = 1. It is clear that the solution with kth smallest objective value is feasible for exactly one of the 2^{Σ_{i=1}^{n} b_i} POPs. This procedure has the advantage that the added linear constraints remain hard under the SDP relaxation; it has the disadvantage that many SDPs need to be solved in the course of the algorithm. Therefore, we consider this modified enumeration algorithm for the SDPR method with small relaxation order. We apply the modified enumeration algorithm and the SDPR method with N = 50, b_k := b := (0,0,0,0,0,0,0,0,1,1,0,...,0), ǫ_k := ǫ := 0.1, POP to QOP transformation, ω = 2, F6 as objective and 6 iterations of the enumeration algorithm. In its first five iterations we obtain the same five solutions as those obtained by the SDPR method when choosing different objective functions; the output of the sixth iteration is the same as for the fifth one, although it is not a feasible solution for the POP of the sixth iteration. Our numerical results are reported in Table 3.14 and pictured in Figure 3.21.

k  SDP relaxation  POP to QOP  Local solver  ǫsc     F6(u^(k))  tC
0  Dual            yes         SQP           -4e-15  -0.43      30
1  Dual            yes         SQP           -9e-11  -0.17      153
2  Dual            yes         SQP           -5e-10  0.00       293
3  Dual            yes         SQP           -9e-13  0.17       400
4  Dual            yes         SQP           -1e-11  0.43       500
5  Dual            yes         SQP           -1e-11  0.43       593

Table 3.14: Results of the modified enumeration algorithm and SDPR method with ω = 2, N = 50 and F6 as objective.

Figure 3.21: SDPR method with varying objective for N = 40 (left) and modified enumeration algorithm with SDPR method for N = 50 (right).

Note that applying the modified enumeration algorithm requires a longer time to obtain the five solutions of (3.53). However, enumerating the five solutions one by one with respect to the same objective function is a more systematic approach than choosing five arbitrary objective functions. The branching step of the modified enumeration algorithm is sketched below.
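The branching step of the modified enumeration algorithm can be sketched as follows in MATLAB; the cell-array representation of the 2^{Σ b_i} subproblems and the variable names are our own illustrative choices. Each subproblem would then be passed to SparsePOP with its added linear constraints.

% Branching step: given the previous solution uPrev, the 0/1 vector b and
% the margin epsk, generate the sign patterns of all 2^(sum(b)) subproblems.
b     = [zeros(1,8), 1, 1, zeros(1,40)];   % as in the experiment above (N = 50)
epsk  = 0.1;
idx   = find(b);                           % indices i with b_i = 1
m     = numel(idx);
subproblems = cell(2^m, 1);
for s = 0:2^m - 1
    signs = 2*bitget(s, 1:m) - 1;          % each +-1 pattern = one subproblem
    % sign = -1 encodes the constraint u_i <= uPrev(i) - epsk,
    % sign = +1 encodes the constraint u_i >= uPrev(i) + epsk.
    subproblems{s+1} = struct('i', num2cell(idx), 'sign', num2cell(signs));
end
% Each subproblem adds its linear constraints to the POP and is solved by the
% sparse SDP relaxation; the best feasible optimum over all 2^m subproblems is
% the solution with the next smallest objective value.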
3.3.4 Differential algebraic equations

The class of PDE problems (3.14) contains so-called differential algebraic equations (DAE). These are differential equations where the derivatives of several unknown functions do not occur explicitly. We demonstrate for the following example that the SDPR method can be applied to solve DAEs as well. Consider the DAE problem

u̇1(x) = u3(x)                                  ∀ x ∈ [0, T],
0 = u2(x) (1 − u2(x)),
0 = u1(x) u2(x) + u3(x) (1 − u2(x)) − x,        (3.55)
u1(0) = u0.

It is easy to see that two closed-form solutions u¹ and u² are given by

u¹(x) = ( u0 + x²/2, 0, x )ᵀ,   x ∈ [0, T],
u²(x) = ( x, 1, 1 )ᵀ,           x ∈ [0, T].

We choose lbd ≡ 0 and ubd ≡ 10 for each function u1, u2 and u3 and define two objective functions F1 and F2,

F1(u) = Σ_{i=1}^{Nx} u_{1,i},   F2(u) = Σ_{i=1}^{Nx} u_{2,i}.

First we choose u0 = 0 and apply the SDPR method with F2 as an objective, and we obtain a highly accurate approximation for u², which is documented in Table 3.15 and Figure 3.22.

objective  ω  Nx   u0   ǫobj   ǫsc    tC
F2         2  100  0    4e-10  -4e-10  29
F2         2  200  0    3e-9   -3e-9   122
F1         2  200  1    8e-9   -2e-6   98
F1         2  200  2    3e-10  -4e-8   107
F1         2  10   0.5  3e-10  -3e-7   4
F1         2  20   0.5  3e-10  -9e-6   9
F1         2  30   0.5  8e-10  -3e-3   15
F1         2  40   0.5  7e-8   -1e-1   24
F1         3  30   0.5  9e-9   -2e-3   51
F1         4  30   0.5  8e-9   -6e-4   210

Table 3.15: Results of SDPR method for (3.55) with T = 2.

Figure 3.22: Solutions u² (left) and u¹ (right) of DAE problem (3.55).

Next we apply the SDPR method with F1 as objective. For u0 ∈ {1, 2} highly accurate approximations of u¹ are obtained. An interesting phenomenon is observed in case u0 is small. For instance, if we choose u0 = 0.5 and ω = 2, we get a highly accurate solution for Nx = 10. But as we increase Nx stepwise to 40, the accuracy decreases, although the relaxation order remains constant. For numerical details see Table 3.15. This effect can be slightly compensated by increasing ω, as demonstrated for the case Nx = 30. But due to the limited capacity of current SDP solvers it is not possible to increase ω as much as needed to obtain a high accuracy solution. However, we obtain highly accurate approximations to both solutions of (3.55) by the SDPR method even without applying a locally fast convergent method. A sketch of the discretization of (3.55) is given below.
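A minimal MATLAB sketch of how (3.55) turns into a POP: the derivative u̇1 is replaced by a forward difference, while the algebraic equations are imposed at every grid point. The variable layout and the function name are our own illustrative choices.

% Residual of the discretized DAE (3.55) on Nx grid points over [0, T].
% z stacks the samples of u1, u2, u3; every component of r is polynomial in z.
function r = daeRes(z, Nx, T, u0)
    h  = T/(Nx-1);  x = linspace(0, T, Nx)';
    u1 = z(1:Nx);  u2 = z(Nx+1:2*Nx);  u3 = z(2*Nx+1:3*Nx);
    r = [ (u1(2:Nx) - u1(1:Nx-1))/h - u3(1:Nx-1);   % u1' = u3 (forward diff.)
          u2 .* (1 - u2);                            % algebraic equation
          u1 .* u2 + u3 .* (1 - u2) - x;             % algebraic equation
          u1(1) - u0 ];                              % initial condition
end

Each component of r has degree at most two in the unknowns, so together with the bounds 0 ≤ z ≤ 10 the system r(z) = 0 defines a sparse POP of the form (3.26), with the discretized F1 or F2 as objective.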
3.3.5 The steady cavity flow problem

One of the most challenging present PDE problems is the numerical analysis of the Navier-Stokes equations. As a first step to attempt this class of PDE problems by the SDPR method, we consider the steady cavity flow problem, which contains a steady state version of the Navier-Stokes equations. The steady cavity flow problem is a simple model of a flow with closed streamlines and is used for examining and validating numerical solution techniques in fluid dynamics. Although it has been discussed in the literature of numerical analysis of fluid mechanics (see, e.g., [40], [12], [32], [14], [98]), it is still an interesting problem to a number of researchers for a range of Reynolds numbers. The setting of the steady cavity flow problem is the following. Let (v1(x,y,t), v2(x,y,t)) be the velocity of the two dimensional cavity flow of an incompressible fluid on the cavity region ABCD with the coordinates A = (0,1), B = (0,0), C = (1,0), D = (1,1).

It follows from the continuity equation of the incompressible fluid (preservation of mass), ∂v1/∂x + ∂v2/∂y = 0, that there exists a function ψ(x,y,t) such that

∂ψ/∂x = −v2,   ∂ψ/∂y = v1.   (3.56)

Put v = (v1, v2, 0). The vector field φ⃗ = rot v is called the vorticity. Since the last coordinate of v is 0, φ⃗ can be written as (0, 0, φ(x,y,t)). The continuity equation and the Navier-Stokes equation (preservation of momentum) can be written as follows in terms of ψ and φ:

∆ψ = −φ,                                                              (3.57)
∂φ/∂t = (∂ψ/∂y)(∂φ/∂x) − (∂ψ/∂x)(∂φ/∂y) + (1/R) ∆φ,                  (3.58)

where the parameter R is called the Reynolds number. The steady cavity flow problem is (3.57) and (3.58) with the steady condition ∂φ/∂t = 0 and the boundary conditions

v1(0,y) = v1(x,0) = v1(1,y) = 0   on AB, BC, CD,   (3.59)
v2(0,y) = v2(x,0) = v2(1,y) = 0   on AB, BC, CD,   (3.60)
v1(x,1) = s,  v2(x,1) = 0          on AD.           (3.61)

Here s is the velocity of the stream out of the cavity ABCD. We set the boundary velocity at AD to s := 1. Due to its boundary conditions, the PDE problem (3.56), (3.57), (3.58), (3.59)-(3.61) is not of form (3.14). Therefore, we need to follow a specialized strategy to discretize this problem via a finite difference scheme. We divide the square ABCD into an N × N mesh and define h := 1/(N−1). We translate the boundary conditions for v1 and v2 into boundary conditions for ψ and φ. It follows from (3.59), (3.60), (3.61) that the function ψ is constant on the boundaries AB, BC, CD, DA. Since ψ is continuous, we suppose that ψ = 0 on the boundaries. The boundary condition for φ is a little more complicated. We derive it from the discussion in [98], p. 162. Let us consider the case of the boundary AD first. We take a mesh point M on AD. Let P be the mesh point inside the cavity adjacent to M, at distance h, and let P′ be the mirror image of P with respect to AD. We denote the value of ψ at the point P by ψ(P) or ψP. Moreover, we have

−φ(M) = ∆ψ(M) = ψ_yy(M) ≈ ( ψ_P − 2ψ_M + ψ_{P′} ) / h².

We need to determine the value of ψ_{P′} to get an approximate value of φ at M. By the central difference approximation, s = 1 = v1 = ∂ψ/∂y (M) ≈ (ψ_{P′} − ψ_P)/(2h) holds. Then ψ_{P′} ≈ 2h + ψ_P. Therefore, we have

φ_M ≈ −( 2ψ_P + 2h ) / h².   (3.62)

Analogously, we obtain

φ_M ≈ −2ψ_P / h²   (3.63)

when M is a grid point on AB or BC or CD and P is the adjacent internal grid point of M. From this discussion we obtain the following finite difference scheme for the steady cavity flow problem:

g¹_{i,j}(ψ, φ) = 0                ∀ 2 ≤ i, j ≤ N−1,      (3.64)
g²_{i,j}(ψ, φ) = 0                ∀ 2 ≤ i, j ≤ N−1,      (3.65)
ψ_{1,j} = ψ_{N,j} = 0             ∀ j ∈ {1, ..., N},
ψ_{i,1} = ψ_{i,N} = 0             ∀ i ∈ {1, ..., N},
φ_{1,j} = −2ψ_{2,j} / h²          ∀ j ∈ {1, ..., N},
φ_{N,j} = −2ψ_{N−1,j} / h²        ∀ j ∈ {1, ..., N},      (3.66)
φ_{i,1} = −2ψ_{i,2} / h²          ∀ i ∈ {1, ..., N},
φ_{i,N} = −2(ψ_{i,N−1} + h) / h²  ∀ i ∈ {1, ..., N},

where
g¹_{i,j}(ψ,φ) := −4φ_{i,j} + φ_{i+1,j} + φ_{i−1,j} + φ_{i,j+1} + φ_{i,j−1}
                 + (R/4)(ψ_{i+1,j} − ψ_{i−1,j})(φ_{i,j+1} − φ_{i,j−1})
                 − (R/4)(ψ_{i,j+1} − ψ_{i,j−1})(φ_{i+1,j} − φ_{i−1,j}),
g²_{i,j}(ψ,φ) := −4ψ_{i,j} + ψ_{i+1,j} + ψ_{i−1,j} + ψ_{i,j+1} + ψ_{i,j−1} + h² φ_{i,j}.

We call the polynomial system (3.64), (3.65), (3.66) the discrete steady cavity flow problem, denoted as DSCF(R, N). It depends on two parameters, the Reynolds number R and the discretization N of the cavity region ABCD = Ω = [0,1]². Its dimension is given by n = 2(N−2)².

Remark 3.2 We conjecture that the discrete cavity flow problem DSCF(R, N) has finitely many complex solutions. In other words, it defines a zero-dimensional ideal. We checked this conjecture up to N = 5 by Gröbner basis computation.

DSCF(R, N) is a sparse polynomial system to which we apply the SDPR method and Algorithm 3.1. The choice for F in (3.26) is motivated by the fact that one is interested in the solution of the PDE problem which minimizes the kinetic energy, given by

∫∫_{ABCD} ( (∂ψ/∂y)² + (−∂ψ/∂x)² ) dx dy.   (3.67)

Thus, by discretizing (3.67) we obtain the following function as a canonical choice for F:

F(ψ, φ) = (1/4) Σ_{2≤i,j≤N−1} ( ψ_{i+1,j}² + ψ_{i−1,j}² + ψ_{i,j+1}² + ψ_{i,j−1}²
                                − 2ψ_{i+1,j} ψ_{i−1,j} − 2ψ_{i,j+1} ψ_{i,j−1} ).   (3.68)

We denote the optimization problem of minimizing F subject to DSCF(R, N) as the steady cavity flow optimization problem CF(R, N). It is characterized by a simple proposition.

Proposition 3.4
a) CF(0, N) is a convex quadratic program for any N.
b) CF(R, N) is non-convex for any N if R ≠ 0.

Proof: a) In case R = 0 all constraints are linear. Furthermore, the objective function can be written, up to the positive factor in (3.68), as F = Σ_{i,j} (F¹_{i,j} + F²_{i,j}), where

F¹_{i,j}(ψ, φ) = (1/2) ( ψ_{i−1,j}, ψ_{i+1,j} ) [ 2 −2; −2 2 ] ( ψ_{i−1,j}, ψ_{i+1,j} )ᵀ.

It follows that F¹_{i,j} is convex, as the matrix is positive semidefinite with eigenvalues 0 and 4. The convexity of F²_{i,j} follows analogously. Thus, F can be written as a sum of convex functions and is therefore convex as well. The claim follows.

b) In case R ≠ 0, the equality constraint function g¹_{i,j} is an indefinite quadratic. Thus, CF(R, N) is a non-convex quadratic program.

Solving CF(R, N) by the SDPR method

First, we apply the SDPR method with ω = 1 and Newton's method as local solver to CF(100, N). Highly accurate solutions with ǫsc < 1e−10 are obtained for N ∈ {10, 15, 20}. By applying the grid-refining method we succeed in extending the solutions to grids of size 30 × 30 and 40 × 40, as pictured for N = 40 in Figure 3.23 and reported in Table 3.16. Thus, it seems reasonable to conclude that the minimal energy solution of CF(100, N) converges to a continuous solution of the steady cavity flow problem for N → ∞. The discrete steady cavity flow problem has multiple solutions. It is an advantage of the SDPR method to show that the minimal kinetic energy solution u⋆(N) of CF(R, N) converges to an analytic solution for N → ∞.

Figure 3.23: (v1, v2) for solution u of CF(100, 40).

N   ω  ǫsc    tC    F(u⋆)
10  1  4e-11  14    0.0169
15  1  6e-16  255   0.0313
20  1  6e-16  948   0.0409
30  1  4e-11  1759  0.0503
40  1  4e-11  4156  0.0554

Table 3.16: Results for CF(100, N) for increasing N.

Second, we apply the SDPR method to CF(R, N) for a much larger R. For example we examine CF(10000, N) for N ∈ {8, ..., 18}. For all tested discretizations we were able to find accurate solutions by the SDPR method with ω = 1 and SQP as local solver, cf. Table 3.17 and Figure 3.24.
N   ω  ǫsc    F(u^(k))  tC
8   1  2e-7   1.5e-6    7
10  1  3e-10  3.2e-6    21
12  1  1e-7   6.0e-6    49
14  1  5e-9   1.1e-5    99
16  1  4e-12  1.9e-5    199
18  1  2e-8   3.9e-5    501

Table 3.17: Results for CF(10000, N) for increasing N.

Figure 3.24: Solutions of CF(10000, N) by the SDPR method for N = 8 (left, top), N = 10 (center, top), N = 12 (right, top), N = 14 (left, bottom), N = 16 (center, bottom) and N = 18 (right, bottom).

If we compare the pictures in Figure 3.24, it seems the SDPR(1) solution of CF(10000, N) evolves into some stream-like solution for increasing N. However, unlike the solutions of CF(100, N), we have not been able to extend this solution to a grid of higher resolution by the grid-refining method. Therefore, it is possible that the solution pictured in Figure 3.24 is a fake solution, which confirms that the steady cavity flow problem becomes a hard problem for increasing Reynolds number.

Enumerating the solutions of DSCF(R, N)

A further interesting question is to find all solutions of the cavity flow problem, in particular for large Reynolds numbers. Therefore, we examine the efficiency of Algorithm 3.1 for enumerating the solutions of DSCF(R, N) with respect to their discretized kinetic energy. For the parameter b_k ∈ {0,1}ⁿ to be chosen in each iteration of Algorithm 3.1 we restrict ourselves to the case where b_k is given by

b_i^k = 1 if i ∈ {1, ..., b_1^k} ∪ {n/2 + 1, ..., n/2 + b_2^k}, and 0 else.

Thus b_k is defined by the two parameters b_1^k, b_2^k ∈ {1, ..., n/2}. The parameters ǫ_1^k and ǫ_2^k correspond to the constraints imposed by b_1^k and b_2^k, respectively.

CF(4000, 5): In a first setting we choose the discretization N = 5, i.e. the dimension is n = 2·3² = 18. This dimension is small enough to apply the Gröbner basis method to determine all complex solutions of DSCF(R, N). Therefore, we are able to verify whether the solutions provided by Algorithm 3.1 are optimal. The computational results are given in Table 3.18.

k  ω  ǫ_1^k  b_1^k  b_2^k  tC  ǫsc    F(u^(k))  solution
0  1  -      -      -      2   2e-10  4.6e-4    u^(0)
1  1  1e-3   3      0      5   5e-4   6.3e-4    u^(1)
2  1  1e-3   3      0      8   5e-4   1.0e-3    u^(2)

Table 3.18: Results of Algorithm 3.1 for CF(4000, 5).

Comparing the solutions of the SDPR method to all solutions of the polynomial system obtained by the polyhedral homotopy method or the Gröbner basis method, it turns out that the solutions u^(0), u^(1) and u^(2) indeed coincide with the three smallest energy solutions u^(0)⋆, u^(1)⋆ and u^(2)⋆. The velocities (v1, v2) derived from these three solutions via (3.56) are displayed in Figure 3.25.

Figure 3.25: (v1, v2) for solutions u^(0) (left), u^(1) (center) and u^(2) (right) of CF(4000, 5) on the interior of [0,1]².
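Recovering the velocity field from a computed stream function, as done for Figure 3.25, is a direct application of (3.56); a minimal MATLAB sketch by central differences on the interior grid points follows, where the toy stream function and the array layout (rows index y, columns index x) are our own illustrative choices.

% Recover (v1, v2) from a stream function psi via (3.56):
% v1 = dpsi/dy, v2 = -dpsi/dx, by central differences.
[X, Y] = meshgrid(linspace(0, 1, 20));
psi = sin(pi*X) .* sin(pi*Y);          % toy stream function for illustration
N = size(psi, 1);
h = 1/(N-1);
v1 = zeros(N);  v2 = zeros(N);
v1(2:N-1, 2:N-1) =  ( psi(3:N, 2:N-1) - psi(1:N-2, 2:N-1) ) / (2*h);  %  d/dy
v2(2:N-1, 2:N-1) = -( psi(2:N-1, 3:N) - psi(2:N-1, 1:N-2) ) / (2*h);  % -d/dx
quiver(X, Y, v1, v2)                   % vector plot as in Figures 3.23-3.26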
Note that the third smallest energy solution u^(2) shows a vortex in counter-clockwise direction, which may indicate that this solution is a fake solution.

CF(20000, 7): We apply Algorithm 3.1 with ω = 1 to CF(20000, 7) and obtain the results in Table 3.19. The two parameter settings (ǫ_1^1, b_1^1) = (1e−3, 1) and (ǫ_1^1, b_1^1) = (1e−6, 5) are not sufficient to obtain another solution than u^(0), whereas (ǫ_1^1, b_1^1) = (1e−5, 5) yields u^(1), a solution of larger energy. After another iteration with (ǫ_1^2, b_1^2) = (1e−5, 5) we obtain a third solution u^(2) of even larger energy.

k  ω  ǫ_1^k  b_1^k  b_2^k  tC  ǫsc   F(u^(k))  solution
0  1  -      -      -      2   3e-7  3.4e-4    u^(0)
1  1  1e-3   1      0      5   5e-4  3.4e-4    u^(0)
1  1  1e-6   5      0      5   6e-6  3.4e-4    u^(0)
1  1  1e-5   5      0      9   5e-6  5.9e-4    u^(1)
2  1  1e-5   5      0      14  5e-6  5.2e-3    u^(2)

Table 3.19: Results of Algorithm 3.1 for CF(20000, 7).

Figure 3.26: (v1, v2) for solutions u^(0) (left), u^(1) (center) and u^(2) (right) of CF(20000, 7) on [0,1]².

It is interesting to observe in Figure 3.26 that u^(1) and u^(2) are one-vortex solutions, whereas there seems to be no vortex in the smallest energy solution u^(0).

CF(40000, 7): Next, we examine CF(40000, 7), which is a good example to demonstrate that solving DSCF(R, N) and CF(R, N) becomes more difficult for larger Reynolds numbers. As for the previous problem, the dimension of the POP is n = 50, which is too large to be solved by the Gröbner basis method. Our computational results are reported in Table 3.20.

k  ω  ǫ_1^k  b_1^k  b_2^k  tC    ǫsc    F(u^(k))  solution
0  1  -      -      -      3     2e-7   3.4e-4    u^(0)(1)
1  1  5e-6   5      0      7     6e-9   7.3e-4    u^(1)(1)
2  1  5e-6   5      0      11    3e-6   5.9e-4    u^(2)(1)
3  1  8e-6   5      0      16    5e-6   2.3e-4    u^(3)(1)
0  2  -      -      -      5872  8e-10  2.6e-4    u^(0)(2)

Table 3.20: Results of Algorithm 3.1 for CF(40000, 7).

Solution u^(2)(1) is of smaller energy than u^(1)(1), and u^(3)(1) is of even smaller energy than u^(0)(1). Thus, unlike the solutions for CF(20000, 7) reported in Table 3.19, the solutions of CF(40000, 7) are not enumerated in the correct order. This phenomenon can be explained by the fact that the SDP relaxation with ω = 1 is not tight enough to yield a solution that converges to u⋆ under the local optimization procedure. The energy of u^(0)(2) obtained by SDPR(2) is smaller than that of u^(0)(1), but it is not the global minimizer either. In fact, Algorithm 3.1 with ω = 1 generates a better solution u^(3)(1) (with smaller energy) in 3 iterations requiring 16 seconds of computation time, compared to the solution u^(0)(2) obtained by applying the SDPR method with ω = 2 to CF(40000, 7), which requires 5872 seconds. Thus, despite failing to enumerate the smallest energy solutions in the right order with ω = 1, applying the enumeration algorithm with relaxation order ω = 1 is far more efficient than the original sparse SDP relaxation (2.18) with ω = 2 for approximating the global minimizer of POP (3.26). It is a future problem to make this construction systematic.

Alternative finite difference scheme

To derive DSCF(R, N) we discretized the Jacobian (∂ψ/∂y)(∂φ/∂x) − (∂ψ/∂x)(∂φ/∂y) by the standard central difference scheme. Arakawa [3] showed that the standard central difference scheme does not preserve important physical invariants.
Therefore, Arakawa proposed an alternative finite difference discretization for the Jacobian that is shown to preserve those invariants. We use this alternative scheme to derive an alternative discrete steady cavity flow problem ADSCF(R, N) and solve it via the SDPR method. In ADSCF(R, N), the finite difference approximation for (∂ψ/∂y)(∂φ/∂x) − (∂ψ/∂x)(∂φ/∂y) is replaced by

( (∂ψ/∂y)(∂φ/∂x) − (∂ψ/∂x)(∂φ/∂y) )(x_i, y_j) ≈
  −(1/(12h²)) [ (φ_{i,j−1} + φ_{i+1,j−1} − φ_{i,j+1} − φ_{i+1,j+1})(ψ_{i+1,j} + ψ_{i,j})
              − (φ_{i−1,j−1} + φ_{i,j−1} − φ_{i−1,j+1} − φ_{i,j+1})(ψ_{i,j} + ψ_{i−1,j})
              + (φ_{i+1,j} + φ_{i+1,j+1} − φ_{i−1,j} − φ_{i−1,j+1})(ψ_{i,j+1} + ψ_{i,j})
              − (φ_{i+1,j−1} + φ_{i+1,j} − φ_{i−1,j−1} − φ_{i−1,j})(ψ_{i,j} + ψ_{i,j−1})
              + (φ_{i+1,j} − φ_{i,j+1})(ψ_{i+1,j+1} + ψ_{i,j})
              − (φ_{i,j−1} − φ_{i−1,j})(ψ_{i,j} + ψ_{i−1,j−1})
              + (φ_{i,j+1} − φ_{i−1,j})(ψ_{i−1,j+1} + ψ_{i,j})
              − (φ_{i+1,j} − φ_{i,j−1})(ψ_{i,j} + ψ_{i+1,j−1}) ].   (3.69)

Note that ADSCF(R, N) is less sparse than DSCF(R, N), and it is more difficult to derive accurate solutions by the SDPR method with relaxation order ω = 1. However, we succeed in solving ADSCF(R, N) in some instances. For example, in Table 3.21 and Figure 3.27 we compare the minimum kinetic energy solutions obtained for DSCF(5000, N) and ADSCF(5000, N). It is interesting that the vortex in the minimum kinetic energy solution for ADSCF(5000, N) is preserved for increasing N, whereas the vortex in the solution for DSCF(5000, N) seems to deteriorate.

Problem          ǫsc    tC    F(u⋆)
ADSCF(5000,14)   7e-12  1304  1.8e-4
ADSCF(5000,16)   5e-10  2802  3.1e-4
DSCF(5000,14)    1e-11  419   5.6e-4
DSCF(5000,16)    3e-10  768   1.1e-4

Table 3.21: Results for solving ADSCF(5000,N) compared to DSCF(5000,N).

Solutions of CF(R, N) for increasing R

In order to understand why convergence of discrete approximations to the analytic solution is much more difficult to obtain for large R, we examine the behavior of the minimal energy solution of DSCF(R, N) and CF(R, N), respectively, for increasing Reynolds number R. The SDPR method is one possible approach to solve DSCF(R, N). If ω is chosen sufficiently large, the output u of the SDPR method is guaranteed to accurately approximate the minimal energy solution u⋆ of CF(R′, N) and DSCF(R′, N), respectively. In order to show the advantage of the SDPR method we compare our results to solutions of DSCF(R′, N) obtained by the following standard procedure.

Method 3.2 Naive homotopy-like continuation method
1. Choose the parameters R′, N and a step size ∆R.
2. Solve DSCF(0, N), i.e. a linear system, and obtain its unique solution u0.
3. Increase R_{k−1} by ∆R: R_k = R_{k−1} + ∆R.
4. Apply Newton's method to DSCF(R_k, N) starting from u_{k−1}. Obtain a solution u_k as an approximation to a solution of the discrete cavity flow problem.
5. Iterate 3. and 4. until the desired Reynolds number R′ is reached.

Note that the continuation method does not necessarily yield the minimal kinetic energy solution of DSCF(R, N). Let u⋆(R, N) denote the global minimizer of CF(R, N); the minimal energy is given by Emin(R, N) = F(u⋆(R, N)). Obviously, Emin(0, N) = F(u0(N)) holds. In a next step, the solution of DSCF(R, N) obtained by the continuation method starting from u0 is denoted as ũ(R), and its energy as EC(R, N) := F(ũ(R, N)). As illustrated for N = 5 in Figure 3.28, it is possible to find a continuation ũ of u0 for all R.
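A minimal MATLAB sketch of the continuation loop of Method 3.2 follows, assuming a residual function dscfRes(u, R) and its Jacobian dscfJac(u, R) for DSCF(R, N) are available; both helper names, as well as the step size and tolerances, are our own illustrative choices.

% Naive homotopy-like continuation (Method 3.2) in the Reynolds number R.
N = 5;  n = 2*(N-2)^2;                     % dimension of DSCF(R, N)
Rtarget = 10000;  dR = 100;
u = zeros(n, 1);
u = u - dscfJac(u, 0) \ dscfRes(u, 0);     % DSCF(0,N) is linear: one step solves it
for R = dR:dR:Rtarget
    for it = 1:20                          % Newton's method at Reynolds number R
        du = - dscfJac(u, R) \ dscfRes(u, R);
        u  = u + du;
        if norm(du) < 1e-12, break; end
    end
end
% u now approximates the continuation solution u~(Rtarget); its energy F(u)
% gives E_C(Rtarget, N), to be compared with E_min obtained by the SDPR method.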
[Figure 3.27: Solutions for ADSCF(5000, 14) (top left), ADSCF(5000, 16) (top right), DSCF(5000, 14) (bottom left) and DSCF(5000, 16) (bottom right).]

For N = 5 the dimension of DSCF(R, N) is n = 18. This dimension is small enough to solve the polynomial system by the Gröbner basis method and to determine all complex solutions of the system. Therefore, we can verify whether the SDPR method detects the global minimizer of CF(R, N) or not. It is worth pointing out that we are able to find the minimal energy solution of CF(R, N) by applying the SDP relaxation method, whereas this solution cannot be obtained by the continuation method. We observe that applying the SDPR method with ω = 1 is sufficient to detect the global optimizer for R ≤ 10000, and for R ≥ 20000 the global optimizer is obtained by the SDPR method with ω = 2.

For N = 6 and N = 7 the dimension of the polynomial system is too large to be solved by the Gröbner basis method for R > 0. For N = 6 the continuation method and the SDPR method with ω = 1 and ω = 2 yield the same solution for all tested R. In the case of N = 7 the continuation solution ũ(R) is detected by the SDPR method with ω = 1 as well, except for R = 6000, where a solution with slightly smaller energy is detected, as documented in Table 3.23. Summarizing these results, F(u0(N)) ≥ F(ũ(R, N)) for all tested R > 0. It is an advantage of the SDPR method that it shows ũ(R, N) is in general not the optimizer of CF(R, N) for increasing R. In fact, for some settings we obtain far better approximations to the minimal energy solution than ũ(R, N). Furthermore, Emin(R) and EC(R) are both decreasing in R. The behavior of EC, ESDPR and Emin coincides for all chosen discretizations N and motivates the following conjecture.

Conjecture 3.1 Let the discretization N be fixed. Then
a) F(u0(N)) = Emin(0, N) ≥ Emin(R, N) ≥ 0 for all R ≥ 0;
b) Emin(R, N) → 0 for R → ∞.

  R        N_C   N_R   E_C      E_SDPR(1)   E_SDPR(2)
  0          1     1   0.0096   0.0096      0.0096
  100       37    13   0.0030   0.0030      0.0030
  500       37    13   6.2e-4   6.2e-4      6.2e-4
  1000      37    13   5.4e-4   5e-4        5e-4
  2000      37    13   6.2e-4   6.2e-4      6.2e-4
  4000      37    17   6.3e-4   4.6e-4      4.6e-4
  6000      36    16   5.7e-4   4.5e-4      4.5e-4
  8000      36    16   5.2e-4   4.5e-4      4.5e-4
  10000     35    17   4.7e-4   4.5e-4      4.5e-4
  30000     35    17   4.5e-4   4.5e-4      2.5e-4
  100000    34    16   4.5e-4   4.5e-4      8.8e-5

Table 3.22: Numerical results for CF(R, 5), where E_SDPR(ω) denotes the discretized kinetic energy of the solution of the SDPR method with relaxation order ω.

[Figure 3.28: EC(R), ESDPR(1)(R), ESDPR(2)(R) and Emin(R) for N = 5.]

As an application, Conjecture 3.1 can be used as a certificate for the non-optimality of a feasible solution u′ of CF(R, N) in the case F(u′(R, N)) > Emin(0, N). If it is possible to extend u0 to R via the continuation method, ũ(R, N) can serve as a non-optimality certificate in the case F(u′(R, N)) > F(ũ(R, N)).
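The certificate itself amounts to a one-line comparison. A hypothetical sketch, where all three energies are assumed to be precomputed:

```python
def certify_non_optimality(F_u_prime, E_min_0, F_u_tilde=None):
    """Non-optimality test for a feasible solution u' of CF(R, N):
    by Conjecture 3.1, u' cannot be the minimal energy solution if
    F(u') > E_min(0, N); if the continuation solution u~(R, N) is
    available, F(u') > F(u~(R, N)) certifies non-optimality as well."""
    bound = E_min_0 if F_u_tilde is None else min(E_min_0, F_u_tilde)
    return F_u_prime > bound
```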
  R       E_C      E_SDPR(1)
  0       2.0e-2   2.0e-2
  100     7.7e-3   7.7e-3
  4000    4.1e-4   4.1e-4
  6000    3.7e-4   3.6e-4
  10000   3.4e-4   3.4e-4

Table 3.23: Numerical results for CF(R, 7), where E_SDPR(ω) denotes the discretized kinetic energy of the solution of the SDPR method with relaxation order ω.

3.3.6 Optimal control problems

A class of challenging problems involving differential equations that goes beyond the class of PDE problems (3.14) is optimal control, in particular nonlinear optimal control. Solving nonlinear optimal control problems (OCPs) analytically is difficult, even though powerful techniques such as the maximum principle and the Hamilton-Jacobi-Bellman optimality equations exist. Among the numerical methods for solving OCPs, one distinguishes direct and indirect methods [18, 25, 94, 101]. In particular for OCPs with state constraints, however, many numerical methods are difficult to use. A recent approach by Lasserre et al. [54] takes advantage of semidefinite programming (SDP) relaxations to generate a convergent sequence of lower bounds for the optimal value of an OCP. We demonstrate on the following examples that the SDPR method can be applied to solve OCPs numerically as well. An OCP can be discretized via finite difference approximations to obtain a POP satisfying a structured sparsity pattern. The POP we derive from an OCP is essentially of the form (3.26); the main difference is that we do not choose the objective function F, but F is given as the discretization of the objective of the OCP. By applying the SDPR method to an OCP we (a) obtain a lower bound for its optimal value, and (b), unlike the approach in [54], obtain approximations for the optimal value, the optimal control and the trajectory. As in the PDE case, it is a feature of the SDPR method that state and/or control constraints can be incorporated by defining additional polynomial equality and inequality constraints.

Control of production and consumption

The following problem arises in the context of controlling the production and consumption of a factory. Let x(t) be the amount of output produced at time t ≥ 0 and α(t) the control variable denoting the fraction of output reinvested at time t ≥ 0, with 0 ≤ α(t) ≤ 1. The dynamics of the system are given by the ODE problem

\[ \dot{x}(t) = k\,\alpha(t)\,x(t) \quad \forall\, t \in [0,T], \qquad x(0) = x_0, \]

where k > 0 is a constant modeling the growth rate of a reinvestment. The aim is to maximize the functional

\[ P(\alpha(\cdot), x(\cdot)) = \int_0^T (1-\alpha(t))\,x(t)\,dt, \]

i.e., the total consumption of the output, the consumption at a given time t being (1 − α(t)) x(t). Thus, the control problem can be written as

\[
\begin{array}{ll}
\max & \displaystyle\int_0^T (1-\alpha(t))\,x(t)\,dt\\
\text{s.t.} & \dot{x}(t) = k\,\alpha(t)\,x(t) \quad \forall\, t \in [0,T],\\
& x(0) = x_0,\\
& 0 \le \alpha(t) \le 1 \quad \forall\, t \in [0,T].
\end{array}
\qquad (3.70)
\]

The constraining ODE problem can be discretized by a finite difference scheme in the same way as (3.14). In contrast to the previous examples, we are not free to choose the objective function of (3.26); it is given by the objective function P(α(·), x(·)) of the optimal control problem (3.70). We obtain the POP's objective function F by discretizing P(α(·), x(·)) as

\[ F(\alpha, x) = \sum_{i=1}^{N} (1-\alpha_i)\,x_i. \]

It is easy to show with the Pontryagin maximum principle, see for example [60], that the optimal control law α⋆(·) is given by

\[ \alpha^\star(t) = \begin{cases} 1 & \text{if } 0 \le t \le t^\star,\\ 0 & \text{if } t^\star < t \le T, \end{cases} \]

for an appropriate switching time t⋆ with 0 ≤ t⋆ ≤ T. In the case k = 1 the switching time is given by t⋆ = T − 1. We apply the SDPR method to (3.70) with objective function F and ω = 2, and can confirm numerically that t⋆ = T − 1 holds for k = 1.
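Independently of the SDP machinery, the claim t⋆ = T − 1 for k = 1 can be checked by elementary means: under a bang-bang control that reinvests everything until time s and nothing afterwards, x(t) = x0 e^t on [0, s], so the objective equals x0 e^s (T − s), which is maximal at s = T − 1. A tiny brute-force verification of this fact (a toy check, not the SDPR method itself):

```python
import numpy as np

# For k = 1, a bang-bang control switching at time s yields the objective
# x0 * exp(s) * (T - s); a grid search recovers the switching time t* = T - 1.
T, x0 = 3.0, 0.25
s = np.linspace(0.0, T, 100001)
objective = x0 * np.exp(s) * (T - s)
print(s[np.argmax(objective)])   # approx 2.0 = T - 1
```

This agrees with the t⋆ column of Table 3.24 below.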
Our results are reported in Table 3.24 and illustrated in Figure 3.29. The solution of the control problem and in particular the optimal control law α⋆ are approximated accurately. Moreover, we observe that the switching time matches the predicted value.

  T   N_t   ǫ_sc      ǫ_obj    t_C   t⋆
  3   200   -3.9e-6   4.4e-9   102   2.00
  3   300   -1.3e-4   3.9e-9   354   2.00
  4   200   -4.5e-4   1.9e-8   126   3.00
  4   300   -2.7e-4   3.9e-8   367   3.00

Table 3.24: Results of the SDPR method for (3.70) with k = 1, x0 = 0.25.

[Figure 3.29: SDPR method solutions for (3.70), x(t) (blue) and α(t) (green), for Nt = 300 and T = 3 (left) and T = 4 (right), respectively.]

Control of reproductive strategies of social insects

As another example, consider a problem arising from the reproductive strategies of social insects:

\[
\begin{array}{ll}
\max & P(w(\cdot), q(\cdot), \alpha(\cdot)) = q(T)\\
\text{s.t.} & \dot{w}(t) = -\mu\,w(t) + b\,s(t)\,\alpha(t)\,w(t) \quad \forall\, t \in [0,T],\\
& w(0) = w_0,\\
& \dot{q}(t) = -\nu\,q(t) + c\,(1-\alpha(t))\,s(t)\,w(t) \quad \forall\, t \in [0,T],\\
& q(0) = q_0,\\
& 0 \le \alpha(t) \le 1 \quad \forall\, t \in [0,T],
\end{array}
\qquad (3.71)
\]

where w(t) is the number of workers at time t, q(t) the number of queens, α(t) the control variable denoting the fraction of the colony effort devoted to increasing the work force, µ the workers' death rate, ν the queens' death rate, s(t) a known rate at which each worker contributes to the bee economy, and b and c constants. It follows from the Pontryagin maximum principle [60] that the optimal control law α of problem (3.71) is a bang-bang control law for any rate s(t), i.e., α(t) ∈ {0, 1} for all t ∈ [0, T]. For the SDPR method we choose as objective the function F given by

\[ F(w, q, \alpha) = q_{N_t}, \]

which is a discretization of the objective function P(w(·), q(·), α(·)). Table 3.25 shows the numerical results, which are illustrated in Figure 3.30.

  s(t)                  N_t   ǫ_obj   ǫ_sc
  1                     300   2e-7    -2e-6
  (1/2)(sin(t) + 1)     300   1e-4    -4e-5

Table 3.25: Results of the SDPR method for (3.71) with T = 3, µ = 0.8, b = 1, w0 = 10, ν = 0.3, c = 1, q0 = 1 and ω = 2.

[Figure 3.30: SDPR method solutions for (3.71) for s(t) = 1 (left) and s(t) = 0.5(sin(t) + 1) (right).]

In the case s(t) = 1, it is sufficient to choose w(t), q(t) ≤ 20 as upper bounds to obtain accurate results. For the more difficult problem with s(t) = 0.5(sin(t) + 1) it is necessary to tighten the upper bounds to w(t) ≤ 10 and q(t) ≤ 3 in order to obtain fairly accurate results. In both cases, the bang-bang control law is approximated with high precision.
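The bang-bang structure of (3.71) can also be explored by direct simulation: fix a candidate control, integrate the dynamics with forward Euler, and evaluate the discretized objective q_{N_t}. The sketch below does this for the parameter values of Table 3.25; the restriction of α to a single switching time is our simplifying assumption for illustration, since the optimal bang-bang law need not switch exactly once.

```python
import numpy as np

def final_queens(t_switch, T=3.0, Nt=300, mu=0.8, b=1.0, nu=0.3, c=1.0,
                 w0=10.0, q0=1.0, s=lambda t: 1.0):
    """Forward-Euler simulation of (3.71) under a bang-bang control with
    alpha = 1 (grow the work force) before t_switch and alpha = 0 after.
    Returns the discretized objective q_{Nt} ~ q(T)."""
    h, w, q = T / Nt, w0, q0
    for i in range(Nt):
        t = i * h
        a = 1.0 if t < t_switch else 0.0
        w, q = (w + h * (-mu * w + b * s(t) * a * w),
                q + h * (-nu * q + c * (1 - a) * s(t) * w))
    return q

# crude search over the switching time for s(t) = 1
switches = np.linspace(0.0, 3.0, 301)
best = max(switches, key=final_queens)
print(best, final_queens(best))
```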
The double integrator

Consider the optimal control problem given by

\[
\begin{array}{ll}
\min & T\\
\text{s.t.} & \dot{x}_1(t) = x_2(t) \quad \forall\, t \in [0,T],\\
& \dot{x}_2(t) = u(t) \quad \forall\, t \in [0,T],\\
& x_1(0) = x_{1,0},\; x_1(T) = 0,\\
& x_2(0) = x_{2,0},\; x_2(T) = 0,\\
& x_2(t) \ge -1,\; u(t) \in [-1,1].
\end{array}
\qquad (3.72)
\]

Note that we cannot apply the SDPR method directly to (3.72), since the length T of the domain is not specified. Furthermore, since the system of first order differential equations is overdetermined by prescribing both initial and terminal conditions, we replace the constraints x1(T) = x2(T) = 0 by |x1(T)| + |x2(T)| ≤ ǫ for a small ǫ > 0. We apply a standard coordinate transformation to (3.72) and obtain the equivalent problem

\[
\begin{array}{ll}
\min & T\\
\text{s.t.} & \dot{x}_1(t) = T\,x_2(t) \quad \forall\, t \in [0,1],\\
& \dot{x}_2(t) = T\,u(t) \quad \forall\, t \in [0,1],\\
& x_1(0) = x_{1,0},\; x_2(0) = x_{2,0},\\
& |x_1(1)| + |x_2(1)| \le \epsilon,\\
& x_2(t) \ge -1,\; u(t) \in [-1,1].
\end{array}
\qquad (3.73)
\]

Optimal control problem (3.73) is of a form to which the SDPR method can be applied. The lower bounds lbd_u = lbd_x = −1 and the upper bound ubd_u = 1 are given, and we choose ubd_x = 10. Given a starting point x0 ∈ R², the aim is to find the minimal time T⋆ needed to steer x(t) into the origin. For this simple problem it is possible to determine the minimal time T⋆(x0) analytically, cf. [54]. Thus, for each choice of x0 we can calculate the ratio min(s_SDPω(x0)) / T⋆(x0) and evaluate the performance of our approach. We apply the SDPR method with ω = 3 for the discretization N = 50. We choose the same set of x0 as in [54]:

x0,1 ∈ {0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0} and
x0,2 ∈ {−1.0, −0.8, −0.6, −0.4, −0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0}.

In Table 3.26 we report the ratio min(s_SDP3(x0)) / T⋆(x0) for the 11 × 11 different values of x0. Some entries are larger than 1, which is explained by the discretization error of the medium scale discretization N = 50. Compared to the corresponding table in [54], we achieve better lower bounds for T⋆ in most cases. Moreover, the SDPR method approximates the optimal control and trajectory in addition to generating lower bounds for the optimal value. See Figure 3.31 for approximations of u⋆ and x⋆ before and after applying sequential quadratic programming with the SDP solution as initial guess. We observe that the approximation (ũ, x̃) provided by the sparse SDP relaxation is already close to the highly accurate approximation (u, x) of the optimal solution of the discretized OCP obtained by additionally applying SQP.

  x0,1 \ x0,2     -1      -0.8    -0.6    -0.4    -0.2     0       0.2     0.4     0.6     0.8     1.0
  0            0.8783  0.6162  0.5543  0.5665  0.8472  1.0000  0.8420  0.5447  0.8191  0.8858  0.9128
  0.2          0.8158  0.4756  0.9060  0.8756  0.8281  0.7869  0.7362  0.7420  0.9068  0.9339  0.9537
  0.4          0.7228  0.9440  0.9237  0.9258  0.9023  0.8708  0.8539  0.8495  0.9423  0.9692  0.9811
  0.6          1.0139  0.9975  0.9971  0.9886  0.9991  0.9876  0.9382  0.9588  0.9507  0.9875  0.9848
  0.8          1.0079  1.0071  1.0090  0.9972  0.9983  1.0035  1.0005  0.9962  0.9772  1.0025  0.9901
  1.0          1.0141  1.0124  1.0119  1.0050  1.0009  0.9926  1.0086  1.0016  1.0020  1.0026  0.9961
  1.2          1.0162  1.0131  1.0109  1.0064  1.0044  1.0018  1.0086  1.0076  1.0018  1.0043  0.9974
  1.4          1.0185  1.0156  1.0135  1.0114  1.0086  1.0067  1.0042  1.0017  1.0065  0.9991  0.9967
  1.6          1.0189  1.0195  1.0148  1.0136  1.0114  1.0069  1.0070  1.0062  1.0021  1.0009  0.9997
  1.8          1.0196  1.0182  1.0187  1.0150  1.0133  1.0082  1.0085  1.0076  1.0065  1.0027  1.0010
  2.0          1.0234  1.0205  1.0177  1.0168  1.0140  1.0109  1.0095  1.0082  1.0045  1.0028  1.0024

Table 3.26: min(s_SDP3(x0)) / T⋆(x0) for different choices of x0.

[Figure 3.31: Optimal control and trajectories for x0 = (0.8, −1) for the double integrator OCP before (left) and after (right) applying SQP.]

The Brockett integrator

Consider the nonlinear optimal control problem given by

\[
\begin{array}{ll}
\min & T\\
\text{s.t.} & \dot{x}_1(t) = u_1(t) \quad \forall\, t \in [0,T],\\
& \dot{x}_2(t) = u_2(t) \quad \forall\, t \in [0,T],\\
& \dot{x}_3(t) = u_1(t)\,x_2(t) - u_2(t)\,x_1(t) \quad \forall\, t \in [0,T],\\
& x_1(0) = x_{1,0},\; x_2(0) = x_{2,0},\; x_3(0) = x_{3,0},\\
& x_1(T) = 0,\; x_2(T) = 0,\; x_3(T) = 0,\\
& u_1(t)^2 + u_2(t)^2 \le 1.
\end{array}
\qquad (3.74)
\]

Applying the same transformation as for (3.72), we bring (3.74) into a form to which the SDPR method can be applied:
\[
\begin{array}{ll}
\min & T\\
\text{s.t.} & \dot{x}_1(t) = T\,u_1(t) \quad \forall\, t \in [0,1],\\
& \dot{x}_2(t) = T\,u_2(t) \quad \forall\, t \in [0,1],\\
& \dot{x}_3(t) = T\,u_1(t)\,x_2(t) - T\,u_2(t)\,x_1(t) \quad \forall\, t \in [0,1],\\
& x_1(0) = x_{1,0},\; x_2(0) = x_{2,0},\; x_3(0) = x_{3,0},\\
& |x_1(1)| + |x_2(1)| + |x_3(1)| \le \epsilon,\\
& u_1(t)^2 + u_2(t)^2 \le 1,
\end{array}
\qquad (3.75)
\]

for some small ǫ > 0. As in example (3.72), the aim is to find the minimal time T⋆(x0) needed to steer a point x(t) with x(0) = x0 ∈ R³ into the origin. For the lower and upper bounds of u and x we choose lbd_u = −1, ubd_u = 1, lbd_x = −5, ubd_x = 5. For this optimal control problem it is possible to calculate the minimal time T⋆ exactly [54]; thus, we can compare the performance of the SDPR method with the approach in [54]. We apply the SDPR method with ω = 3 to (3.75). The numerical results for N = 50 and x0,1 = 0, (x0,2, x0,3) ∈ {0, 1, 2, 3}² are reported in Table 3.27, and the results for N = 30 and x0,1 = 1, (x0,2, x0,3) ∈ {1, 2, 3}² in Table 3.29; the corresponding exact minimal times T⋆(x0) are listed in Tables 3.28 and 3.30. Again, min(s_SDPω(x0)) is larger than T⋆(x0) for some x0, which is explained by the discretization error due to the medium scale choice N ∈ {30, 50}; this gap closes as N → ∞. In particular for choices of x0 in the lower left corner of Tables 3.27 and 3.29 we obtain better lower bounds than [54]. Again, unlike the method in [54], we also obtain an accurate approximation of the optimal control and trajectory, as pictured in Figure 3.32.

  x0,2 \ x0,3      0        1        2        3
  0           0.0000   1.6049   2.7816   3.9705
  1           1.0081   1.2827   2.0269   2.8498
  2           2.0276   2.0337   2.1959   2.6177
  3           3.0145   3.0125   3.0170   3.1906

Table 3.27: min(s_SDP3(x0)) for x0,1 = 0 and (x0,2, x0,3) ∈ {0, 1, 2, 3}².

  x0,2 \ x0,3      0        1        2        3
  0           0.0000   2.5066   3.5449   4.3416
  1           1.0000   1.7841   2.6831   3.4328
  2           2.0000   2.1735   2.5819   3.0708
  3           3.0000   3.0547   3.2088   3.4392

Table 3.28: T⋆(x0) for x0,1 = 0 and (x0,2, x0,3) ∈ {0, 1, 2, 3}².

  x0,2 \ x0,3      1        2        3
  1           1.8862   2.6077   3.2969
  2           2.4412   2.7737   3.2033
  3           3.3145   3.4516   3.6618

Table 3.29: min(s_SDP3(x0)) for x0,1 = 1 and (x0,2, x0,3) ∈ {1, 2, 3}².

  x0,2 \ x0,3      1        2        3
  1           1.8257   2.5231   3.1895
  2           2.3636   2.6856   3.1008
  3           3.2091   3.3426   3.5456

Table 3.30: T⋆(x0) for x0,1 = 1 and (x0,2, x0,3) ∈ {1, 2, 3}².

[Figure 3.32: Optimal control and trajectories for x0 = (0, 2, 1) for the Brockett integrator OCP before (left) and after (right) applying SQP.]

Chapter 4

Concluding Remarks and Future Research

4.1 Conclusion

Hierarchies of SDP relaxations are a powerful tool to solve general, severely nonconvex POPs. However, solving large scale POPs remains a very challenging task due to the limited capacity of contemporary SDP solvers. In this thesis we discussed two major approaches to tackle large scale POPs by reducing the size of the SDP relaxations. In the first one, presented in 2.2, our focus has been on developing a theoretical framework consisting of the d- and r-space conversion methods to exploit structured sparsity, characterized by a chordal graph structure, via positive semidefinite matrix completion for an optimization problem involving linear and nonlinear matrix inequalities. The two d-space conversion methods are provided for a matrix variable X in the objective and/or constraint functions of the problem, which is required to be positive semidefinite. The methods decompose X into multiple smaller matrix variables.
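As a schematic illustration of the d-space decomposition just summarized, the following sketch computes the maximal cliques of a chordal extension of a given aggregate sparsity pattern; a matrix variable X required to be positive semidefinite on that pattern can then be replaced by the smaller submatrix variables X[C_i, C_i] via positive semidefinite matrix completion. This is a sketch only, assuming networkx (version 2.4 or later) for the chordal-graph utilities:

```python
import networkx as nx

def dspace_cliques(n, edges):
    """Sketch of the d-space decomposition: extend the aggregate sparsity
    pattern E to a chordal graph and return its maximal cliques C_1,...,C_k.
    The constraint X >= 0 on the pattern can then be replaced by the smaller
    conditions X[C_i, C_i] >= 0 via positive semidefinite matrix completion."""
    G = nx.Graph()
    G.add_nodes_from(range(n))
    G.add_edges_from(edges)
    H, _ = nx.complete_to_chordal_graph(G)   # chordal extension of the pattern
    return [sorted(c) for c in nx.chordal_graph_cliques(H)]

# tridiagonal pattern of order 5 -> cliques {0,1}, {1,2}, {2,3}, {3,4}
print(dspace_cliques(5, [(i, i + 1) for i in range(4)]))
```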
The two r-space conversion methods are aimed at a matrix inequality in the constraints of the problem. In these methods, the matrix inequality is converted into multiple smaller matrix inequalities. As mentioned in Remarks 2.3, 2.5 and 2.7, the d-space conversion method using clique trees and the r-space conversion method using clique trees offer plenty of flexibility in implementation. This should be explored further to increase the computational efficiency. In 2.2.7 we constructed linear SDP relaxations for general quadratic SDPs that exploit d- and r-space sparsity. When applying these relaxations to quadratic SDPs arising from different applications, we observed that the computational performance is greatly improved compared to the classical SDP relaxations, which do not apply the d- and r-space conversion methods. Of particular interest for the numerical analysis of differential equations is the linear SDP relaxation exploiting d-space sparsity, as it substantially reduces the size of the SDP relaxations for POPs derived from certain differential equations. It will be an interesting topic to study the efficiency of the d- and r-space conversion methods for further classes of nonlinear SDPs.

In 2.3, we proposed four different heuristics to transform a general POP into a QOP. The advantage of this transformation is that the sparse SDP relaxation of order one can be applied to the QOP. The sparse SDP relaxation of order one is of vastly smaller size than the sparse SDP relaxation of minimal order ωmax for the original POP. By solving the sparse SDP relaxation of the QOP, approximations to the global minimizer of a large scale POP of higher degree can be derived. The reduction of the SDP relaxation and the gain in numerical tractability come at the cost of deteriorating feasibility and optimality errors of the approximate solution obtained by solving the SDP relaxation. In general the SDP relaxation of order one for the QOP is weaker than the SDP relaxation of order ωmax for the original POP. We discussed how to overcome this difficulty by imposing tighter lower and upper bounds on the components of the n-dimensional variable of a POP, by adding linear or quadratic Branch-and-Cut bounds, and by applying locally convergent optimization methods such as sequential quadratic programming to the POP, starting from the solution provided by the SDP relaxation of the QOP. The proposed heuristics have been demonstrated with success on various medium and large scale POPs. We have seen that imposing additional Branch-and-Cut bounds was necessary to obtain accurate approximations to the global optimizer for some problems. However, for most problems it was crucial to choose the lower and upper bounds for the variable x sufficiently tight and to apply SQP to obtain a highly accurate approximation of the global optimizer. The total processing time could be reduced by up to three orders of magnitude for the problems we tested. For these reasons we consider the proposed technique promising for finding first approximate solutions of POPs whose size is too large to be solved by the more precise, original SDP relaxations due to Waki et al.

Our most important application of both approaches to reduce the size of SDP relaxations is the numerical analysis of nonlinear differential equations. We were able to transform nonlinear PDE problems with polynomial structure into POPs.
The transformation is based on the discretization of the PDE problem, the approximation of its partial derivatives by finite differences, and the choice of an appropriate objective function. Due to the finite difference discretization, the POPs derived from PDEs satisfy both correlative and domain-space sparsity patterns under fairly general assumptions. Therefore, we can apply dual standard form SDP relaxations exploiting correlative sparsity and primal standard form SDP relaxations exploiting domain-space sparsity efficiently. For many PDE problems the solution of the SDP relaxation is an appropriate initial guess for locally fast convergent methods such as Newton's method and SQP. However, we have seen that it is often necessary to impose tighter or additional bounds on the POPs to derive highly accurate solutions. Moreover, we demonstrated how to choose an objective function and bounds for the unknown function in order to detect particular solutions of a discretized PDE problem. In other words, one of the features of using the SDPR method instead of several existing methods is that a function space in which to search for solutions may be translated into natural constraints of a sparse POP. In the case where we have partial information about a particular solution we want to find, this information can be exploited by the SDPR method to provide an appropriate initial guess for a local method. The reduction techniques from Chapter 2 are highly efficient for solving POPs derived from PDEs on higher resolution grids and for obtaining accurate discrete approximations to solutions of the continuous PDE problem. Another technique to extend solutions to finer and finer grids is the grid-refining method, which is efficient even when starting from solutions on very coarse grids.

We have shown that the SDPR method is very promising for nonlinear differential equations with several solutions. One feature of the SDPR method is the ability to detect a particular solution. Another challenging problem is to enumerate all solutions of a discretized PDE problem and ultimately to enumerate accurate approximations for all solutions of the underlying continuous PDE problem. We proposed an algorithm based on the SDPR method to approximately enumerate all real solutions of a zero dimensional radical polynomial system with respect to a cost function. If the order of the SDP relaxations tends to infinity, we can guarantee the convergence of the algorithm's output to the smallest kinetic energy solutions of the polynomial system. The algorithm can be applied successfully to enumerate the solutions of the discrete cavity flow problem with the kinetic energy of the flow as cost function. A variant of the enumeration algorithm has been applied to detect all solutions of an interesting reaction-diffusion equation. Since both the enumeration algorithm and its variant are based on the SDPR method, which exploits sparsity, it is possible to attack POPs of much larger scale than by the approaches in [35] and [55].

To conclude, the SDPR method constitutes a general purpose method for solving problems involving differential equations that are polynomial in the unknown functions. We demonstrated the potential of the SDPR method on differential equations arising from a range of areas: elliptic, parabolic and hyperbolic PDEs, reaction-diffusion equations, fluid dynamics, nonlinear optimal control, differential algebraic equations and first order PDEs. The list of differential equations we may analyze by the SDPR method is by no means complete.
But it illustrates that the SDPR method provides a powerful tool to gain new insights in the numerical analysis of differential equations.

4.2 Outlook on future research directions

Efficient software for large scale POPs and their SDP relaxations remains a challenging field with many open problems. The research presented in this thesis motivates the search for answers to a number of questions.

We discussed four heuristics to transform an arbitrary POP into an equivalent QOP. Moreover, we encountered that the correlative sparsity and the domain-space sparsity of a QOP can differ significantly. Therefore, the resulting dual and primal form SDP relaxations may be of vastly different size. The question remains whether (a) there is a way to transform a POP into a QOP that enhances these types of sparsity, and (b) we may find a more general concept of sparsity for a QOP that combines these two types of sparsity. We have also seen that the approximation accuracy of the SDP relaxation for the QOP is weaker than for the original POP. It remains a future problem to strengthen the sparse SDP relaxation of order one for a QOP further. In that respect, it is desirable to find a systematic approach to tighten lower and upper bounds successively without shrinking the optimal set of the POP. Furthermore, the additional quadratic constraints derived under the transformation algorithm allow expressing some moments as linear combinations of other moments. As proposed by Henrion and Lasserre in [35] and Laurent in [56], these linear combinations can be substituted into the moment and localizing matrices of the SDP relaxation to reduce the size of the moment vector y. Exploiting this technique will shrink the size of the sparse SDP relaxations for QOPs further and may enable us to solve POPs of even larger scale.

Compared to the methods [35, 55] for finding all real solutions of a zero-dimensional radical polynomial system, our enumeration algorithm can be applied to problems of much larger scale. However, its numerical stability depends heavily on the choice of the parameters in the algorithm. The variant of the enumeration algorithm for the Swift-Hohenberg equation constitutes a promising first step towards improving the numerical stability, since the additional linear constraints remain unchanged under the SDP relaxation. The idea of this variant may be exploited more systematically in the future.

Although we are able to solve some PDE problems with the minimal relaxation order ω = ωmax, in many cases it is a priori not possible to predict the relaxation order ω which is necessary to attain an accurate solution. As the size of the sparse SDP relaxation increases polynomially in ω, the tractability of the SDP is limited by the capacity of current SDP solvers. It is a further challenging question whether we can characterize a class of differential equation problems that is guaranteed to be approximated accurately for a certain fixed relaxation order. At the moment there are only very few results on error bounds of SDP relaxations for general, nonconvex POPs [72].

Not every solution of a discretized differential equation is a discrete approximation to an actual solution of this differential equation, as we encountered in the analysis of the steady cavity flow problem. It is therefore interesting to close the gap between the discrete and the continuous world. An approach based on the SDPR method for narrowing this gap takes advantage of maximum entropy estimation [10, 53].
In this approach the solution of the SDPR method is used to compute discrete approximations to the moments of a measure corresponding to the differential equation; applying maximum entropy estimation to these discretized moments then yields a smooth approximation of a solution of the differential equation. This is the topic of ongoing joint work with Jean Lasserre and Didier Henrion.

Finally, we applied the SDPR method to a wide variety of problems involving differential equations. However, the classes of problems we may attack by this approach are by no means exhausted. It will be an interesting topic of future research to apply the SDPR method to challenging nonlinear differential equations satisfying a polynomial structure. Also, nonlinear optimal control seems a challenging area to apply this methodology to, as the numerical experiments for the simple optimal control problems presented in this thesis suggest.

Bibliography

[1] J. Agler, J. W. Helton, S. McCullough, L. Rodman, Positive semidefinite matrices with a given sparsity pattern, Linear Algebra Appl. 107 (1988), pp. 101-149.
[2] E.L. Allgower, D.J. Bates, A.J. Sommese, C.W. Wampler, Solution of polynomial systems derived from differential equations, Computing 76 (2006), No. 1, pp. 1-10.
[3] A. Arakawa, Computational design for long-term numerical integration of the equations of fluid motion: two-dimensional incompressible flow, part I, Journal of Computational Physics 135 (1997), pp. 103-114.
[4] J.R.S. Blair, B. Peyton, An introduction to chordal graphs and clique trees, Graph Theory and Sparse Matrix Computation, Springer-Verlag (1993), pp. 1-29.
[5] D. Bertsimas, C. Caramanis, Bounds on linear PDEs via semidefinite optimization, Math. Programming, Series A 108 (2006), pp. 135-158.
[6] P. Biswas, Y. Ye, A distributed method for solving semidefinite programs arising from Ad Hoc Wireless Sensor Network Localization, Multiscale Optimization Methods and Applications, Springer-Verlag, pp. 69-84.
[7] J. Bochnak, M. Coste, M.-F. Roy, Real Algebraic Geometry, Springer-Verlag (1998).
[8] P.T. Boggs, J.W. Tolle, Sequential Quadratic Programming, Acta Numerica 4 (1995), pp. 1-50.
[9] B. Borchers, SDPLIB 1.2, A library of semidefinite programming test problems, Optim. Methods Softw. 11-12 (1999), pp. 683-689.
[10] J. Borwein, A.S. Lewis, On the convergence of moment problems, Trans. Am. Math. Soc. 325 (1991), pp. 249-271.
[11] D. Braess, Finite Elements: Theory, fast solvers, and applications in solid mechanics, Cambridge University Press (2001).
[12] O.R. Burggraf, Analytical and numerical studies of the structure of steady separated flows, J. Fluid Mech. 24 (1966), pp. 113-151.
[13] R. Courant, K. Friedrichs, H. Lewy, Über die partiellen Differenzengleichungen der mathematischen Physik, Math. Ann. 100 (1928), No. 1, pp. 32-74.
[14] M. Cheng, K.C. Hung, Vortex structure of steady flow in a rectangular cavity, Computers & Fluids 35 (2006), Issue 10, pp. 1046-1062.
[15] R. Courant, D. Hilbert, Methoden der Mathematischen Physik, Vol. 1 (1931), Chapter 4, The method of variation.
[16] R. Courant, Variational methods for the solution of problems of equilibrium and vibrations, Bull. Amer. Math. Soc. 49 (1943), pp. 1-23.
[17] G. Dahlquist, Convergence and stability in the numerical integration of ordinary differential equations, Math. Scand. 4 (1956), pp. 33-53.
[18] R. Fletcher, Practical Methods of Optimization, Vol.
1: Unconstrained Optimization, John Wiley, Chichester (1980).
[19] I. Fried, Numerical Solutions of Differential Equations, Academic Press (1979).
[20] K. Fujisawa, S. Kim, M. Kojima, Y. Okamoto, M. Yamashita, User's Manual for SparseCoLO: Conversion Methods for SPARSE COnic-form Linear Optimization Problems, Research Reports on Mathematical and Computing Sciences B-453, Tokyo Institute of Technology.
[21] M. Fukuda, M. Kojima, K. Murota, K. Nakata, Exploiting sparsity in semidefinite programming via matrix completion I: General framework, SIAM J. Optim. 11 (2000), pp. 647-674.
[22] M. Fukuda, M. Kojima, Branch-and-Cut Algorithms for the Bilinear Matrix Inequality Eigenvalue Problem, Computational Optimization and Applications 19 (2001), pp. 79-105.
[23] B.G. Galerkin, Series solution of some problems in elastic equilibrium of rods and plates, Vestn. Inzh. Tech. 19 (1915), pp. 897-908.
[24] A. George, J.W. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall (1981).
[25] P.E. Gill, W. Murray, M.H. Wright, Practical Optimization, Academic Press, London, New York (1981).
[26] M. Goemans, D.P. Williamson, Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming, Journal of the ACM 42 (1995), No. 6, pp. 1115-1145.
[27] D. Gottlieb, S. Orszag, Numerical Analysis of Spectral Methods: Theory and Applications, SIAM, Philadelphia (1977).
[28] R. Grone, C.R. Johnson, E.M. Sá, H. Wolkowitz, Positive definite completions of partial Hermitian matrices, Linear Algebra Appl. 58 (1984), pp. 109-124.
[29] J.L. Guermond, A finite element technique for solving first-order PDEs in L^p, SIAM Journal on Numerical Analysis 42 (2004), No. 2, pp. 714-737.
[30] J.L. Guermond, B. Popov, Linear advection with ill-posed boundary conditions via L^1-minimization, International Journal of Numerical Analysis and Modeling 4 (2007), No. 1, pp. 39-47.
[31] T. Gunji, S. Kim, M. Kojima, A. Takeda, K. Fujisawa, T. Mizutani, PHoM - a Polyhedral Homotopy Continuation Method for Polynomial Systems, Research Reports on Mathematical and Computing Sciences, Dept. of Math. and Comp. Sciences, Tokyo Inst. of Tech., B-386 (2003).
[32] K. Gustafson, K. Halasi, Cavity flow dynamics at higher Reynolds number and higher aspect ratio, Journal of Computational Physics 70 (1987), pp. 271-283.
[33] W. Hao, J.D. Hauenstein, B. Hu, Y. Liu, A.J. Sommese, Y.-T. Zhang, Multiple stable steady states of a reaction-diffusion model on zebrafish dorsal-ventral patterning, Discrete and Continuous Dynamical Systems, Series S, to appear.
[34] J.D. Hauenstein, A.J. Sommese, C.W. Wampler, Regeneration Homotopies for Solving Systems of Polynomials, Mathematics of Computation, to appear.
[35] D. Henrion, J.B. Lasserre, Detecting global optimality and extracting solutions in GloptiPoly, in D. Henrion, A. Garulli, editors, Positive Polynomials in Control, Lecture Notes in Control and Information Sciences, Springer-Verlag, Berlin (2005).
[36] D. Henrion, J.B. Lasserre, Convergent relaxations of polynomial matrix inequalities and static output feedback, IEEE Trans. Automatic Control 51 (2006), pp. 192-202.
[37] C.W.J. Hol, C.W. Scherer, Sum of squares relaxations for polynomial semidefinite programming, Proc. Symp. on Mathematical Theory of Networks and Systems (MTNS), Leuven, Belgium (2004).
[38] R. Horst, P.M. Pardalos, N.V. Thoai, Introduction to Global Optimization, Kluwer Academic Publishers (2000).
[39] B. Huber, B.
Sturmfels, A polyhedral method for solving sparse polynomial systems, Math. of Comp. 64 (1995), pp. 1541-1555.
[40] M. Kawaguti, Numerical solution of the Navier-Stokes equations for the flow in a two-dimensional cavity, J. Phys. Soc. Jpn. 16 (1961), pp. 2307-2315.
[41] S. Kim, M. Kojima, Exact solutions of some nonconvex quadratic optimization problems via SDP and SOCP relaxations, Computational Optimization and Applications 26 (2003), pp. 143-154.
[42] S. Kim, M. Kojima, M. Mevissen, M. Yamashita, Exploiting Sparsity in Linear and Nonlinear Matrix Inequalities via Positive Semidefinite Matrix Completion, Mathematical Programming, to appear.
[43] S. Kim, M. Kojima, H. Waki, Exploiting Sparsity in SDP Relaxation for Sensor Network Localization, SIAM Journal on Optimization 20 (2009), No. 1, pp. 192-215.
[44] S. Kim, M. Kojima, H. Waki, M. Yamashita, SFSDP: a Sparse Version of Full Semidefinite Programming Relaxation for Sensor Network Localization Problems, Research Report B-457, Department of Mathematical and Computing Sciences, Tokyo Institute of Technology (2009).
[45] K. Kobayashi, S. Kim, M. Kojima, Correlative sparsity in primal-dual interior-point methods for LP, SDP and SOCP, Appl. Math. Optim. 58 (2008), pp. 69-88.
[46] M. Kojima, Sums of Squares Relaxations of Polynomial Semidefinite Programs, Research Report B-397, Department of Mathematical and Computing Sciences, Tokyo Institute of Technology (2003).
[47] M. Kojima, S. Kim, H. Waki, Sparsity in sums of squares of polynomials, Mathematical Programming 103 (2005), pp. 45-62.
[48] M. Kojima, M. Muramatsu, An Extension of Sums of Squares Relaxations to Polynomial Optimization Problems over Symmetric Cones, Math. Programming 110 (2007), pp. 315-336.
[49] M. Kojima, M. Muramatsu, A note on sparse SOS and SDP relaxations for polynomial optimization problems over symmetric cones, Comput. Optim. Appl. 42 (2009), pp. 31-41.
[50] W. Kutta, Beitrag zur näherungsweisen Integration totaler Differentialgleichungen, Zeitschrift Math. Physik 46 (1901), pp. 435-453.
[51] J.B. Lasserre, Global optimization with polynomials and the problem of moments, SIAM Journal on Optimization 11 (2001), pp. 796-817.
[52] J.B. Lasserre, Convergent SDP-Relaxations in Polynomial Optimization with Sparsity, SIAM Journal on Optimization 17 (2006), No. 3, pp. 822-843.
[53] J.B. Lasserre, Semidefinite programming for gradient and Hessian computation in maximum entropy estimation, Proc. IEEE Conference on Decision and Control (2007).
[54] J.B. Lasserre, D. Henrion, C. Prieur, E. Trelat, Nonlinear optimal control via occupation measures and LMI-relaxations, SIAM Journal on Control and Optimization 47 (2008), pp. 1649-1666.
[55] J.B. Lasserre, M. Laurent, P. Rostalski, Semidefinite characterization and computation of real radical ideals, Foundations of Computational Mathematics 8 (2008), No. 5, pp. 607-647.
[56] M. Laurent, Sums of squares, moment matrices and optimization over polynomials, in Emerging Applications of Algebraic Geometry, Vol. 149 of IMA Volumes in Mathematics and its Applications, M. Putinar and S. Sullivant (eds.), Springer (2009), pp. 157-270.
[57] P.D. Lax, R.D. Richtmyer, Survey of the stability of linear finite difference equations, Comm. Pure Appl. Math. 9 (1956), pp. 267-293.
[58] R.J. LeVeque, Finite Volume Methods for Hyperbolic Problems, Cambridge University Press (2002).
[59] G.R. Liu, S.S. Quek, The Finite Element Method: A practical course, Elsevier (2003).
[60] J. Macki, A.
Strauss, Introduction to Optimal Control Theory, Springer-Verlag (1982), p. 108.
[61] M. Mevissen, M. Kojima, J. Nie, N. Takayama, Solving partial differential equations via sparse SDP relaxations, Pacific Journal of Optimization 4 (2008), No. 2, pp. 213-241.
[62] M. Mevissen, M. Kojima, SDP Relaxations for Quadratic Optimization Problems Derived from Polynomial Optimization Problems, Asia-Pacific Journal of Operational Research 27 (2010), No. 1, pp. 1-24.
[63] M. Mevissen, K. Yokoyama, N. Takayama, Solutions of Polynomial Systems Derived from the Cavity Flow Problem, Proceedings of the 2009 International Symposium on Symbolic and Algebraic Computation (2009), pp. 255-262.
[64] M. Mimura, Asymptotic Behaviors of a Parabolic System Related to a Planktonic Prey and Predator Model, SIAM Journal on Applied Mathematics 37 (1979), No. 3, pp. 499-512.
[65] A.R. Mitchell, D.F. Griffiths, The Finite Difference Method in Partial Differential Equations, John Wiley and Sons (1980).
[66] J.J. More, B.S. Garbow, K.E. Hillstrom, Testing unconstrained optimization software, ACM Trans. Math. Software 7 (1981), pp. 17-41.
[67] K.G. Murty, S.N. Kabadi, Some NP-complete problems in quadratic and nonlinear programming, Mathematical Programming 39 (1987), pp. 117-129.
[68] K. Nakata, K. Fujisawa, M. Fukuda, M. Kojima, K. Murota, Exploiting sparsity in semidefinite programming via matrix completion II: Implementation and numerical results, Math. Programming 95 (2003), pp. 303-327.
[69] Ju.E. Nesterov, A.S. Nemirovski, Interior Point Polynomial Methods in Convex Programming: Theory and Applications, SIAM, Philadelphia, PA (1994).
[70] Y. Nesterov, Squared functional systems and optimization problems, in J.B.G. Frenk, C. Roos, T. Terlaky, S. Zhang, editors, High Performance Optimization, Kluwer Academic Publishers (2000), pp. 405-440.
[71] J. Nie, Sum of squares method for sensor network localization, Computational Optimization and Applications 43 (2009), No. 2, pp. 151-179.
[72] J. Nie, An Approximation Bound Analysis for Lasserre's Relaxation in Multivariate Polynomial Optimization, preprint (2009).
[73] Y. Nishiura, D. Ueyama, Spatio-temporal chaos for the Gray-Scott model, Physica D 150 (2001), pp. 137-162.
[74] Y. Nishiura, T. Teramoto, K. Ueda, Dynamic transitions through scattors in dissipative systems, Chaos 13 (2003), No. 3, pp. 962-972.
[75] Y. Nishiura, T. Teramoto, K. Ueda, Scattering of traveling spots in dissipative systems, Chaos 15 (2005), 047509.
[76] Y. Nishiura, T. Teramoto, X. Yuan, K. Ueda, Dynamics of traveling pulses in heterogeneous media, Chaos 17 (2007), 037104.
[77] J. Nocedal, S.J. Wright, Numerical Optimization, Springer Series in Operations Research, Springer, New York (2006).
[78] M. Noro, K. Yokoyama, A modular method to compute the rational univariate representation of zero-dimensional ideals, Journal of Symbolic Computation 28 (1999), pp. 243-263.
[79] P.A. Parrilo, Semidefinite programming relaxations for semialgebraic problems, Math. Programming 96 (2003), pp. 293-320.
[80] L.A. Peletier, V. Rottschäfer, Pattern selection of solutions of the Swift-Hohenberg equation, Physica D 194 (2004), pp. 95-126.
[81] M. Putinar, Positive Polynomials on Compact Semi-algebraic Sets, Indiana Univ. Math. Journal 42 (1993), No. 3, pp. 969-984.
[82] J. Rauch, J. Smoller, Qualitative theory of the FitzHugh-Nagumo equations, Advances in Mathematics 27 (1978), pp. 12-44.
[83] L. Rayleigh, On the theory of resonance, Trans. Roy. Soc. A 161 (1870), pp. 77-118.
[84] W.
Ritz, Über eine neue Methode zur Lösung gewisser Variationsprobleme der mathematischen Physik, Journal für die reine und angewandte Mathematik 135 (1908), pp. 1-61.
[85] F. Rouillier, Solving zero-dimensional systems through the rational univariate representation, Applicable Algebra in Engineering, Communication and Computing 9 (1999), pp. 433-461.
[86] C. Runge, Über die numerische Auflösung von Differentialgleichungen, Math. Ann. 46 (1895), pp. 167-178.
[87] K. Schmüdgen, The K-moment problem for compact semi-algebraic sets, Math. Ann. 289 (1991), pp. 203-206.
[88] M. Schweighofer, Optimization of polynomials on compact semialgebraic sets, SIAM J. Optimization 15 (2005), pp. 805-825.
[89] H.D. Sherali, C.H. Tuncbilek, A global optimization algorithm for polynomial programming problems using a reformulation-linearization technique, Journal of Global Optimization 2 (1992), pp. 101-112.
[90] H.D. Sherali, C.H. Tuncbilek, New reformulation-linearization technique based relaxations for univariate and multivariate polynomial programming problems, Operations Research Letters 21 (1997), No. 1, pp. 1-10.
[91] N.Z. Shor, Class of global minimum bounds of polynomial functions, Cybernetics 23 (1987), No. 6, pp. 731-734.
[92] N.Z. Shor, Nondifferentiable Optimization and Polynomial Problems, Kluwer (1998).
[93] J. Smoller, Shock Waves and Reaction-Diffusion Equations, Springer-Verlag (1983), p. 106.
[94] J. Stoer, R. Bulirsch, Introduction to Numerical Analysis, 3rd edition, Springer-Verlag, New York (2002).
[95] J.F. Sturm, SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones, Optimization Methods and Software 11-12 (1999), pp. 625-653.
[96] J.C. Strikwerda, Finite Difference Schemes and Partial Differential Equations, Wadsworth and Brooks (1989).
[97] M. Tabata, A finite difference approach to the number of peaks of solutions for semilinear parabolic problems, J. Math. Soc. Japan 32 (1980), pp. 171-192.
[98] T. Takami, T. Kawamura, Solving Partial Differential Equations with Difference Schemes, Tokyo University Press (1994).
[99] M.J. Turner, R.M. Clough, H.C. Martin, L.J. Topp, Stiffness and deflection analysis of complex structures, J. Aeron. Sci. 23 (1956), pp. 805-823, 854.
[100] T. Teramoto, Personal communication.
[101] O. Von Stryk, R. Bulirsch, Direct and indirect methods for trajectory optimization, Ann. Oper. Res. 37 (1992), pp. 357-373.
[102] H. Waki, S. Kim, M. Kojima, M. Muramatsu, Sums of squares and semidefinite program relaxations for polynomial optimization problems with structured sparsity, SIAM Journal on Optimization 17 (2006), pp. 218-242.
[103] H. Waki, S. Kim, M. Kojima, M. Muramatsu, SparsePOP: a Sparse Semidefinite Programming Relaxation of Polynomial Optimization Problems, Research Reports on Mathematical and Computing Sciences, Dept. of Math. and Comp. Sciences, Tokyo Inst. of Tech., B-414 (2005).
[104] M. Yamashita, K. Fujisawa, M. Kojima, Implementation and evaluation of SDPA 6.0 (SemiDefinite Programming Algorithm 6.0), Optimization Methods and Software 18 (2003), pp. 491-505.
[105] Yokota, http://next1.cc.it-hiroshima.ac.jp/MULTIMEDIA/numeanal2/node24.html.
[106] O.C. Zienkiewicz, R.L. Taylor, J.Z. Zhu, The Finite Element Method: Its Basis and Fundamentals, Elsevier (2005).