"Sparse Semidefinite Programming Relaxations for Large Scale Polynomial optimization and their applications to differential equations"

Sparse Semidefinite Programming Relaxations for Large
Scale Polynomial Optimization and Their Applications
to Differential Equations
Martin Mevissen
Submitted in partial fulfillment of
the requirements for the degree of
DOCTOR OF SCIENCE
Department of Mathematical and Computing Sciences
Tokyo Institute of Technology
August 2010
Acknowledgements
The endeavor of pursuing the PhD in Tokyo and completing this thesis would not have been possible without the support of a number of great people.
My greatest appreciation goes to my advisor Masakazu Kojima for his encouragement, guidance and
continued support. I am grateful for his interest in my studies and the insight I gained by collaborating
with him. I am indebted to him for providing me with an environment to enjoy research to the fullest ever
since I joined his group for writing my Diploma thesis back in 2006.
Moreover, I would like to express my gratitude to Nobuki Takayama, who hosted me twice at Kobe University. I am thankful for his sincere interest and his engagement in our joint work with Kosuke Yokoyama.
My gratitude extends to Yasumasa Nishiura and Takashi Teramoto for inviting me to Hokkaido University in 2008. They offered me a great environment to learn more about reaction-diffusion equations. I am
also thankful to Jean Lasserre and Didier Henrion for hosting me at LAAS in 2009, and I am looking forward
to continuing our joint work in the future. I would like to thank Sunyoung Kim, Makoto Yamashita and
Jiawang Nie for our fruitful collaborations and exciting discussions. In particular, I would like to express
my gratitude to Makoto for his constant and patient technical support. Many thanks go to Hans-Jakob
Lüthi for supporting this venture in Japan from early on and for his encouraging advice. Also, I would like
to thank the German Academic Exchange Service for enabling me to pursue this journey with its Doctoral
Scholarship for three years.
The stay at Tokyo Institute of Technology would have been inconceivable without the people with whom I shared this time, many thoughts and the bond of friendship. In particular, I would like to thank Paul Sheridan,
for his unshakable optimism, Ken Shackleton, for his large-heartedness, Matthias Hietland Heie, for his
open mind, Hiroshi Sugimoto, for his noble heart, Kojiro Akiba, for all our conversations, Mikael Onsjö, for being a great host, and Tomohiko Mizutani, for helping me out with a lot of things when I was a newcomer.
In my life in Japan, I was glad to find many friends whom I can count on and who gave me the chance
to call this place home. I enjoyed greatly everything I shared with them. Thank you so much, Shuji, Yoko,
Jif, Azra, Masa, Mari, Moe, Hiroshi, Shota, Chiaki, Soji, Naomi, Tomoko.
Finally, my deepest gratitude goes to my parents. Their encouragement and love have been with me
all the time. They stood by me every day of my life. I dedicate this thesis to them.
To my parents
Contents

1 Introduction . . . 9
  1.1 Motivation . . . 9
  1.2 Contribution . . . 10
  1.3 Outline of the thesis . . . 11

2 Semidefinite Programming and Polynomial Optimization . . . 13
  2.1 Positive polynomials and polynomial optimization . . . 13
    2.1.1 Decomposition of globally nonnegative polynomials . . . 13
    2.1.2 Decomposition of polynomials positive on closed semialgebraic sets . . . 15
    2.1.3 Dense SDP relaxations for polynomial optimization problems . . . 18
    2.1.4 Sparse SDP relaxations for polynomial optimization problems . . . 25
  2.2 Exploiting sparsity in linear and nonlinear matrix inequalities . . . 28
    2.2.1 An SDP example . . . 29
    2.2.2 Positive semidefinite matrix completion . . . 31
    2.2.3 Exploiting the domain-space sparsity . . . 33
    2.2.4 Duality in positive semidefinite matrix completion . . . 37
    2.2.5 Exploiting the range-space sparsity . . . 40
    2.2.6 Enhancing the correlative sparsity . . . 42
    2.2.7 Examples of d- and r-space sparsity in quadratic SDP . . . 46
  2.3 Reduction techniques for SDP relaxations for large scale POP . . . 49
    2.3.1 Transforming a POP into a QOP . . . 50
    2.3.2 Quality of SDP relaxations for QOP . . . 57
    2.3.3 Numerical examples . . . 60

3 SDP Relaxations for Solving Differential Equations . . . 67
  3.1 Numerical analysis of differential equations . . . 67
    3.1.1 The finite difference method . . . 68
    3.1.2 The finite element method and other numerical solvers . . . 72
  3.2 Differential equations and the SDPR method . . . 74
    3.2.1 Transforming a differential equation into a sparse POP . . . 74
    3.2.2 The SDPR method . . . 80
    3.2.3 Enumeration algorithm . . . 83
    3.2.4 Discrete approximations to solutions of differential equations . . . 86
  3.3 Numerical experiments . . . 86
    3.3.1 A nonlinear elliptic equation with bifurcation . . . 87
    3.3.2 Illustrative nonlinear PDE problems . . . 89
    3.3.3 Reaction-diffusion equations . . . 96
    3.3.4 Differential algebraic equations . . . 103
    3.3.5 The steady cavity flow problem . . . 105
    3.3.6 Optimal control problems . . . 114

4 Concluding Remarks and Future Research . . . 123
  4.1 Conclusion . . . 123
  4.2 Outlook on future research directions . . . 124
Notation

N : natural numbers
N^n : n-dimensional vector space with entries in N
Z : integers
R : real numbers
R^n : n-dimensional vector space with real entries
R^{n×m} : vector space of n × m matrices with real entries
S^n : vector space of symmetric matrices in R^{n×n}
S^n_+ : cone of symmetric, positive semidefinite matrices in R^{n×n}
S^n(E, ?) : partial symmetric matrices with entries specified in E
S^n_+(E, ?) : matrices in S^n(E, ?) that can be completed to positive semidefinite matrices
S^n(E, 0) : symmetric matrices with nonzero entries on the diagonal and in E
S^n_+(E, 0) : positive semidefinite matrices in S^n(E, 0)
R[x] : ring of multivariate polynomials in the n-dimensional variable x with coefficients in R
R[x]_d : set of polynomials of degree less or equal d
R[x, A] : set of polynomials supported on A ⊂ N^n, R[x, A] = {p ∈ R[x] | supp(p) ⊂ A}
ΣR[x]^2 : set of sums of squares of polynomials, ΣR[x]^2 := {p ∈ R[x] | p = Σ_{i=1}^r p_i^2, p_i ∈ R[x] for some r ∈ N}
Λ(d) : set of multivariate indices of degree less or equal d, Λ(d) = {α ∈ N^n : |α| ≤ d}
u(x, A) : monomial vector for A ⊂ N^n, u(x, A) = (x^α | α ∈ A)
G(N, E) : graph with vertex set N and edge set E
⪰ : positive semidefinite matrix
≻ : positive definite matrix
• : inner product on S^n, A • B = Σ_{i=1}^n Σ_{j=1}^n A_{i,j} B_{i,j}
det(·) : determinant of a matrix
rank(·) : rank of a matrix
Tr(·) : trace of a matrix, Tr(A) := Σ_{i=1}^n A_{i,i}
deg(·) : degree of a polynomial
supp(·) : support of a polynomial, supp(p) := {α ∈ N^n | p_α ≠ 0}
I : imaginary unit, i.e., I^2 = −1
K(g_1, ..., g_m), K : basic, closed semialgebraic set generated by g_1, ..., g_m ∈ R[x], K(g_1, ..., g_m) := {x ∈ R^n | g_1(x) ≥ 0, ..., g_m(x) ≥ 0}
M(K) : quadratic module defined by g_1, ..., g_m, M(K) = ΣR[x]^2 + g_1 ΣR[x]^2 + ... + g_m ΣR[x]^2
M_d(K) : approximation of order d for M(K), M_d(K) = {Σ_{i=0}^m σ_i g_i | σ_i ∈ ΣR[x]^2, deg(σ_i g_i) ≤ 2d}
Σ^2⟨g_1, ..., g_m⟩ : multiplicative convex cone generated by ΣR[x]^2 and g_1, ..., g_m, Σ^2⟨g_1, ..., g_m⟩ = M(K) + g_1 g_2 ΣR[x]^2 + ... + g_1 g_2 ··· g_m ΣR[x]^2
O(g_1, ..., g_m) : multiplicative monoid generated by g_1, ..., g_m, O(g_1, ..., g_m) = {∏_{i=1}^r t_i | t_i ∈ {g_1, ..., g_m} for r ∈ N}
I(g_1, ..., g_m) : ideal generated by g_1, ..., g_m, I(g_1, ..., g_m) = g_1 R[x] + ... + g_m R[x]
M_w(y) : moment matrix of order w for the vector y
M_w(y, I) : partial moment matrix, containing only those components y_α of y with α ∈ I
M_w(y g) : localizing matrix of order w for the vector y and g ∈ R[x]
M_w(y g, I) : partial localizing matrix
M_k^S : higher monomial set of a POP
M_k : higher monomial list of a POP
α_{k_{i,j}} : dividing coefficient
k_0 : number of substitutions required by an algorithm to transform a given POP into a QOP
t_C : total computation time of an algorithm
ε_sc : scaled feasibility error of a numerical solution for a POP
ε_obj : optimality error of an SDP relaxation solution for a POP
ω : relaxation order of the dense or sparse SDP relaxation
m_e : maximum eigenvalue of a Jacobian of a system of polynomial equations
N_x : number of grid points in x-direction in a discretized domain
h_x : distance of two grid points in x-direction in a discretized domain
u_{i,j} : approximation of u at grid point (x_i, y_j) in a finite difference scheme
max : maximum
min : minimum
sup : supremum
inf : infimum
lbd : lower bound
ubd : upper bound
SDP : semidefinite program
POP : polynomial optimization problem
QOP : quadratic optimization problem
ODE : ordinary differential equation
PDE : partial differential equation
OCP : optimal control problem
FDM : finite difference method
FEM : finite element method
FVM : finite volume method
SQP : sequential quadratic programming
QSDP : quadratic semidefinite program
Chapter 1
Introduction
1.1 Motivation
A wide variety of problems arising in mathematics, physics, engineering, control and computer science can be formulated as optimization problems in which all functions in the objective and constraints are multivariate polynomials over the field of real numbers, so-called polynomial optimization problems (POP). In general, polynomial optimization problems are severely non-convex and NP-hard to solve. In recent years there has been active research in semidefinite programming (SDP) relaxation methods for POPs. That is, a general, non-convex POP is relaxed to a convex optimization problem, an SDP. Solving the SDP provides either a lower bound for the minimum, or approximations for the minimum and the global minimizers of the POP. The first convexification techniques for general POPs were proposed by Shor [91] and Nesterov [70]. Since the pioneering work of Shor, convexification and in particular SDP relaxation techniques have been used in an ever increasing number of applications and problems. One of the classical examples is the SDP relaxation for a non-convex quadratic programming formulation of the NP-hard max-cut problem [26]. Other NP-hard problems that can be formulated as POPs are {0, 1}-linear programming and testing whether a symmetric matrix is copositive [67]. A breakthrough in this field was Lasserre's seminal paper [51]. Provided the feasible set of a general POP is a compact basic semialgebraic set, a hierarchy of SDP relaxations can be constructed which provides a monotonically increasing sequence of lower bounds for the minimum of the POP. Lasserre showed that this sequence converges to the minimum of the POP under a fairly general condition. Lasserre's relaxation, and also the approach [79] by Parrilo, rely on the representation of nonnegative polynomials as sums of squares of polynomials and the dual theory of moments. Despite being a powerful theoretical tool to approximate the minimum and minimizers of general polynomial optimization problems, the Lasserre relaxation is not practical even for medium scale POPs. In fact, the size of the matrix inequality constraints in the SDP relaxations grows rapidly with increasing order of the hierarchy. Thus, for medium or large scale POPs the SDP relaxations become intractable for current SDP solvers such as SeDuMi [95] or SDPA [104], even for small choices of the relaxation order. However, large scale POPs arise from challenging problems, and efficient methods to solve them are in high demand. One problem which has received much attention recently, for instance, is the sensor network localization problem [6, 71, 43].
A first approach to reduce the size of SDP relaxations for POPs has been the concept of correlative sparsity of a POP [47, 102, 52]. Exploiting structured sparsity makes it possible to tackle POPs of larger dimension by a hierarchy of sparse SDP relaxations. Still, the capacity of current SDP solvers limits the applicability of sparse SDP relaxations for large scale POPs. As one way to take advantage of sparsity more efficiently, we develop a general notion of sparsity in linear and nonlinear matrix inequalities and show how to exploit this sparsity via positive semidefinite matrix completion. We demonstrate how so-called domain-space and range-space sparsity can be used to decrease the size of SDP relaxations for large scale POPs substantially. Another technique for tackling large scale POPs is based on the idea of reducing the size of the sparse SDP relaxations by transforming a general POP into an equivalent quadratic optimization problem (QOP). For an important class of large scale POPs the size of the sparse SDP relaxations for the equivalent QOPs is far smaller than the size of the sparse SDP relaxations for the original POPs.
The second topic of this thesis is to investigate how to apply sparse SDP relaxation techniques efficiently to an important class of challenging problems, the numerical analysis of differential equations. For most problems involving ordinary or partial differential equations it is not possible to find analytic solutions, in particular if the equations are nonlinear in the unknown function. Even the problem of finding approximations to the solutions of ODEs or PDEs by numerical methods is well known to be hard, and it has begun to attract attention from researchers in moment, SDP and numerical algebra techniques. On the one hand, a moment based approach [5] has been proposed to find tight bounds for linear functionals defined on linear PDEs. On the other hand, a homotopy continuation based approach [2, 33, 34] has been proposed to find all solutions of a discretized, possibly nonlinear PDE. We will show how to transform a problem involving differential equations into a POP by using standard finite difference schemes. The dimension of these POPs is determined by the discretization of the domain of the differential equation. Thus, for fine discretizations we obtain a challenging class of large scale POPs. These POPs exhibit both correlative and domain-space sparsity, which enables us to apply sparse SDP relaxation techniques efficiently. The sparse SDP relaxation method is of particular interest for PDE problems with several solutions. We propose different algorithms based on the sparse SDP relaxation method to detect several or even all solutions of a system of nonlinear PDEs. It is a strength of this method that a wide variety of nonlinear PDE problems can be solved: nonlinear elliptic, parabolic and hyperbolic equations, reaction-diffusion equations, steady state Navier-Stokes equations in fluid dynamics, differential algebraic equations and nonlinear optimal control problems.
1.2 Contribution
This thesis is largely based on the content of prior publications of the author. Its contributions can be
summarized as follows.
• We present a general framework to detect, characterize and exploit sparsity in an optimization problem
with linear and nonlinear matrix inequality constraints via positive semidefinite matrix completion.
We distinguish two types of sparsity, domain-space sparsity for the symmetric matrix variable in
objective and constraint functions of the problem, and range-space sparsity. Four conversion methods
are proposed to exploit these two types of sparsity. We demonstrate the efficiency of these conversion
methods on SDP relaxations for sparse, large-scale POP derived from discretizing partial differential
equations and the sensor network localization problem. This result is based on our work [42].
• Based on the observation, dating back to Shor [92], that any POP can be written as an equivalent QOP, we develop four heuristics for transforming a POP into a QOP. We show that sparsity of the POP is maintained under our transformation procedures, and propose different techniques to improve the quality of the sparse SDP relaxations for the QOP, which are weaker than the more expensive sparse SDP relaxations for the equivalent POP. This technique is shown to be very efficient for large-scale POP: we are able to obtain highly accurate approximations to the global optimizers of the POP by solving SDP relaxations of vastly reduced size. This work is presented in detail in [62].
• We are the first to introduce a method based on sparse SDP relaxations to solve systems of ordinary and partial differential equations [61, 63]. Unlike the approach [5], we are able to approximate the actual solutions of an ordinary or partial differential equation. Moreover, our approach is applicable to nonlinear differential equations, whereas the technique [5] is limited to linear PDEs. Furthermore, compared to the numerical algebraic approach [2, 33, 34], we can solve a system of polynomial equations derived from a PDE for a much finer discretization by exploiting sparsity. Also, we do not aim at finding all complex solutions; instead, we detect the real solutions of that system of equations one by one.
• Comparing the sparse SDP relaxation method for solving differential equations to existing PDE solvers, our approach has the following advantages: (a) We can add polynomial inequality constraints on the unknown solutions of the differential equations to the system of equations obtained by the finite difference discretization of the PDE, which can be understood as restricting the space of functions in which we search for solutions. (b) We can detect particular solutions of a PDE by choosing an appropriate objective function for the sparse POP derived from the PDE problem or by adding inequality constraints to that POP. (c) We are able to systematically enumerate all solutions of a discretized PDE problem by iteratively applying the SDP relaxation method. (d) We exploit the fact that the sparse SDP relaxations provide an approximation to the global optimizer of a POP. Thus, even if the accuracy of the solution of the SDP relaxation is not high, it is a good initial guess for locally convergent solvers for many PDE problems. This is particularly interesting for PDE problems with many solutions. These results are based on our work in [61, 63, 62].
• The sparse SDP relaxation method for solving large scale POP derived from differential equations can be applied to solve nonlinear optimal control problems. Unlike the moment method in [54], our method yields approximations to the optimal control, trajectories and value of a control problem, in addition to providing lower bounds for the optimal value of the control problem.
1.3 Outline of the thesis
This thesis consists of two main parts. In the first part, Chapter 2, we introduce approaches that use methods from convex optimization to solve general, nonconvex polynomial optimization problems. We begin in Section 2.1 by introducing the historical background of characterizing positive polynomials, the problem of minimizing multivariate polynomials over basic, closed semialgebraic sets, and the dense Lasserre relaxation, a sequence of semidefinite programs whose minima converge to the minimum of a polynomial optimization problem under fairly general conditions. Finally, we review the method of exploiting correlative sparsity of a POP to construct a sequence of sparse SDP relaxations. In Section 2.2 we present a general framework to exploit domain- and range-space sparsity in problems involving linear or nonlinear matrix inequalities. This technique can be applied to the large scale SDP relaxations for large scale POPs. In Section 2.3 we introduce the approach of reducing the size of dense or sparse SDP relaxations for large scale POPs, which is based on the idea of transforming a general POP into an equivalent QOP.
In the second part, Chapter 3, we show how to use the methods and techniques from Chapter 2 for the numerical analysis of ordinary and partial differential equations. First, in Section 3.1, we give an overview of existing numerical methods for solving partial differential equations, in particular the two most common approaches, the finite difference method and the finite element method. In Section 3.2 we introduce our method to transform a problem involving partial differential equations into a POP via finite difference discretization, and to solve the resulting large scale POP by the SDP relaxation techniques from Chapter 2. In Section 3.3 we apply our SDP relaxation method to a variety of different PDE problems such as nonlinear elliptic, parabolic and hyperbolic equations, differential algebraic equations, reaction-diffusion equations, fluid dynamics and nonlinear optimal control.
Finally, we summarize the thesis in Chapter 4 with some concluding remarks and give an outlook on possible future research directions based on the methods and results presented here.
Chapter 2
Semidefinite Programming and Polynomial Optimization

2.1 Positive polynomials and polynomial optimization
Polynomial optimization and the problem of global nonnegativity of polynomials are active fields of research and remain in the focus of researchers from various areas such as real algebra, semidefinite programming and operator theory. Shor [91] was the first to introduce the idea of applying a convex optimization technique to minimize an unconstrained multivariate polynomial. Nesterov [70] was one of the first to discuss exploiting the duality of moment cones and cones of nonnegative polynomials in a convex optimization framework. He showed that a moment cone can be characterized by linear matrix inequalities, i.e., semidefinite constraints, if the elements of the corresponding cone of nonnegative polynomials can be written as sums of squares. The next milestone in minimizing multivariate polynomials was given by Lasserre [51], who applied recent real algebraic results by Putinar [81] to construct a sequence of semidefinite programming relaxations whose optima converge to the optimum of a polynomial optimization problem. Another approach to apply real algebraic results to the problem of nonnegativity of polynomials was introduced by Parrilo [79].
We attempt to solve the following polynomial optimization problem:

    min p(x)  s.t.  g_i(x) ≥ 0  (i = 1, ..., m),    (2.1)

where p, g_1, ..., g_m ∈ R[x]. Problem (2.1) can also be written as

    min_{x∈K} p(x),    (2.2)

where K is the basic, closed semialgebraic set defined by the polynomials g_1, ..., g_m. Let p⋆ denote the optimal value of problem (2.2) and K⋆ := {x⋆ ∈ K | p(x⋆) ≤ p(x) ∀x ∈ K}. In the case K is compact, K⋆ ≠ ∅ if K ≠ ∅.
2.1.1 Decomposition of globally nonnegative polynomials
The origin of research in characterizing nonnegative and positive polynomials lies in Hilbert's 17th problem: is it possible to express a nonnegative rational function as a sum of squares of rational functions? This question was answered positively by Artin in 1927. Moreover, the question arises whether it is possible to express any nonnegative polynomial as a sum of squares of polynomials. In the case of univariate polynomials the answer to this question is yes, as stated in the following theorem.

Theorem 2.1 Let p ∈ R[x], x ∈ R. Then p(x) ≥ 0 for all x ∈ R if and only if p ∈ ΣR[x]^2.
Proof "⇐": Trivial.
"⇒": Let p(x) ≥ 0 for all x ∈ R. It is obvious that deg(p) = 2k for some k ∈ N. The real roots of p must have even multiplicity, since otherwise p would alter its sign in a neighborhood of a root. Let λ_i, i = 1, ..., r, be its real roots with corresponding multiplicities 2m_i. Its complex roots can be arranged in conjugate pairs a_j + I b_j, a_j − I b_j, j = 1, ..., h. Then

    p(x) = C ∏_{i=1}^r (x − λ_i)^{2m_i} ∏_{j=1}^h ((x − a_j)^2 + b_j^2).

Note that the leading coefficient C needs to be positive. Thus, by expanding the terms in the products, we see that p(x) can be written as a sum of squares of polynomials of the form

    p(x) = Σ_{i=0}^k ( Σ_{j=0}^k v_{ij} x^j )^2.  □

For instance, p(x) = x^4 − 2x^3 + 2x^2 − 2x + 1 = (x − 1)^2 (x^2 + 1) yields in this way the decomposition p(x) = (x^2 − x)^2 + (x − 1)^2.
However, Hilbert himself already noted that not every nonnegative polynomial can be written as a sum of squares. For instance the Motzkin form M,

    M(x, y, z) = x^4 y^2 + x^2 y^4 + z^6 − 3 x^2 y^2 z^2,

is nonnegative but not a sum of squares. In fact, Hilbert gave a complete characterization of the cases where nonnegativity and the existence of a sum of squares decomposition are equivalent.

Definition 2.1 A form is a polynomial in which all monomials have the same total degree m. P_{n,m} denotes the set of nonnegative forms of degree m in n variables, and Σ_{n,m} the set of forms p such that p = Σ_k h_k^2, where the h_k are forms of degree m/2.

There is a correspondence between forms in n variables of degree m and polynomials in n − 1 variables of degree at most m. In fact, a form in n variables of degree m can be dehomogenized to a polynomial in n − 1 variables by fixing any of the n variables to the constant value 1. Conversely, a polynomial in n − 1 variables can be homogenized by multiplying each monomial by powers of a new variable such that the degree of all monomials equals m. Obviously, Σ_{n,m} ⊆ P_{n,m} holds for all n and m. The following theorem is due to Hilbert.

Theorem 2.2 Σ_{n,m} ⊆ P_{n,m} holds with equality only in the following cases:

(i) Bivariate forms: n = 2,

(ii) Quadratic forms: m = 2,

(iii) Ternary quartic forms: n = 3, m = 4.

We interpret the three cases in Theorem 2.2 in terms of polynomials. The first corresponds to the equivalence of nonnegativity and the sum of squares condition in the univariate case as in Theorem 2.1. The second is the case of quadratic polynomials, where the sum of squares decomposition follows from an eigenvalue/eigenvector factorization. The third case corresponds to quartic polynomials in two variables.
Relevance of sum of squares characterizations. Recall that the constraints of our original polynomial optimization problem are nonnegativity constraints for polynomials of the type g_i(x) ≥ 0 (i = 1, ..., m). The question whether a given polynomial is globally nonnegative is decidable, for instance by the Tarski-Seidenberg decision procedure [7]. Nonetheless, regarding complexity, the general problem of testing global nonnegativity of a polynomial function is NP-hard [67] if the degree of the polynomial is at least four. Therefore it is reasonable to replace the nonnegativity constraints by expressions that can be decided more easily. It was shown by Parrilo that deciding whether a polynomial is a sum of squares is equivalent to solving a semidefinite program, as stated in the following theorem.
Theorem 2.3 The existence of a sum of squares decomposition of a polynomial in n variables of degree 2d can be decided by solving a semidefinite programming feasibility problem [79]. If the polynomial is dense, the dimensions of the matrix inequality are equal to \binom{n+d}{d} × \binom{n+d}{d}.

Proof Let p ∈ R[x] with degree 2d. Recall that u(x, Λ(d)) denotes the ordered vector of monomials x_1^{α_1} x_2^{α_2} ··· x_n^{α_n} with Σ_{i=1}^n α_i ≤ d. The length of u(x, Λ(d)) is s := s(d) = \binom{n+d}{d}.
Claim: p ∈ ΣR[x]^2 if and only if there exists V ∈ S^s_+ such that p = u(x, Λ(d))^T V u(x, Λ(d)).
Pf: ⇒: p ∈ ΣR[x]^2, i.e.,

    p = Σ_{i=1}^r q_i^2 = Σ_{i=1}^r (w_i^T u(x, Λ(d)))^2 = u(x, Λ(d))^T ( Σ_{i=1}^r w_i w_i^T ) u(x, Λ(d)).

Thus V = Σ_{i=1}^r w_i w_i^T and V ∈ S^s_+.
⇐: As V ∈ S^s_+ there exists a Cholesky factorization V = W W^T, where W ∈ R^{s×s}; let w_i denote the i-th column of W. We have

    p = u(x, Λ(d))^T V u(x, Λ(d)) = u(x, Λ(d))^T ( Σ_{i=1}^s w_i w_i^T ) u(x, Λ(d)) = Σ_{i=1}^s (w_i^T u(x, Λ(d)))^2,

i.e., p ∈ ΣR[x]^2. Thus the claim follows.
Expanding the quadratic form gives p = Σ_{i,j=1}^s V_{i,j} u(x, Λ(d))_i u(x, Λ(d))_j. Equating the coefficients in this expression with the coefficients of the corresponding monomials in the original form of p generates a set of linear equalities for the variables V_{i,j} (i, j = 1, ..., s). Adding the constraint V ∈ S^s_+ to those linear equality constraints, we obtain conditions for p which are equivalent to claiming p ∈ ΣR[x]^2. Therefore, deciding whether p ∈ ΣR[x]^2 is equivalent to solving a semidefinite programming feasibility problem. □
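To make the construction in this proof concrete, the following minimal sketch (not part of the thesis; the CVXPY modeling package is assumed here purely for illustration, any SDP tool would do) sets up the Gram matrix feasibility problem for the univariate polynomial p(x) = x^4 + 2x^2 + 1 with monomial vector u(x) = (1, x, x^2):

    # Hedged sketch of Theorem 2.3: decide whether p(x) = x^4 + 2x^2 + 1
    # is a sum of squares by searching for a PSD Gram matrix V with
    # p = u^T V u, where u = (1, x, x^2).
    import cvxpy as cp

    V = cp.Variable((3, 3), PSD=True)      # Gram matrix for u = (1, x, x^2)
    constraints = [
        V[0, 0] == 1,                      # coefficient of x^0
        2 * V[0, 1] == 0,                  # coefficient of x^1
        2 * V[0, 2] + V[1, 1] == 2,        # coefficient of x^2
        2 * V[1, 2] == 0,                  # coefficient of x^3
        V[2, 2] == 1,                      # coefficient of x^4
    ]
    prob = cp.Problem(cp.Minimize(0), constraints)  # pure feasibility problem
    prob.solve()
    print(prob.status)  # 'optimal' certifies that an SOS decomposition exists

Here p = (x^2 + 1)^2 is indeed a sum of squares, and a feasible Gram matrix is, e.g., V = diag(1, 2, 1), which encodes the decomposition p = 1^2 + (√2 x)^2 + (x^2)^2.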
2.1.2 Decomposition of polynomials positive on closed semialgebraic sets
Real algebraic geometry deals with the analysis of the real solution set of a system of polynomial equations. The main difference from algebraic geometry in the complex case lies in the fact that R is not algebraically closed. Among the main results of real algebra are the Positivstellensätze, which provide certificates in case a semialgebraic set is empty. Improved versions of the Positivstellensätze can be obtained in the case of compact semialgebraic sets.

General semialgebraic sets

The Positivstellensatz below is due to Stengle; a proof can be found in [7].

Theorem 2.4 (Stengle) Let (f_s)_{s=1,...,t}, (g_j)_{j=1,...,m}, (h_i)_{i=1,...,k} be finite families of polynomials in R[x]. The following properties are equivalent:

(i) {x ∈ R^n | g_j(x) ≥ 0 (j = 1, ..., m), f_s(x) ≠ 0 (s = 1, ..., t), h_i(x) = 0 (i = 1, ..., k)} = ∅.

(ii) There exist g ∈ Σ^2⟨g_1, ..., g_m⟩, f ∈ O(f_1, ..., f_t), the multiplicative monoid generated by f_1, ..., f_t, and h ∈ I(h_1, ..., h_k), the ideal generated by h_1, ..., h_k, such that g + f^2 + h = 0.
To understand the differences between the real and the complex case, and the use of the Positivstellensatz 2.4, consider the following example.
Example 2.1 Consider the very simple quadratic equation

    x^2 + a x + b = 0.

By the fundamental theorem of algebra, the equation always has solutions in C. When the solution is required to be real, the solution set is empty if and only if the discriminant D satisfies

    D := b − a^2/4 > 0.

In this case, taking

    g := ( (1/√D)(x + a/2) )^2,    f := 1,    h := −(1/D)(x^2 + a x + b),

the identity g + f^2 + h = 0 is satisfied.
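As a quick symbolic sanity check of this certificate (an illustrative sketch, not part of the thesis; SymPy is assumed), one can verify that g + f^2 + h vanishes identically for D = b − a^2/4:

    # Verify the Positivstellensatz certificate of Example 2.1 with SymPy.
    import sympy as sp

    x, a, b = sp.symbols('x a b')
    D = b - a**2 / 4                      # discriminant
    g = (x + a / 2)**2 / D                # g = ((1/sqrt(D)) (x + a/2))^2
    f = sp.Integer(1)
    h = -(x**2 + a * x + b) / D
    print(sp.simplify(g + f**2 + h))      # -> 0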
It is worth remarking that the Positivstellensatz represents the most general deductive system in which inferences from the given equations can be made. It guarantees the existence of infeasibility certificates given by the polynomials f, g and h. For complexity reasons these certificates cannot be polynomial time checkable for every possible instance, unless NP = co-NP. Parrilo showed that the problem of finding infeasibility certificates is equivalent to a semidefinite program if the degree of the possible multipliers is restricted [79].
Theorem 2.5 Consider a system of polynomial equalities and inequalities as in Theorem 2.4. Then the search for bounded degree Positivstellensatz infeasibility certificates can be done using semidefinite programming. If the degree bound is chosen large enough, then the SDPs will be feasible, and the certificates are obtained from their solutions.

Proof: Consequence of the Positivstellensatz and Theorem 2.3, cf. [79].
As the feasible set of (2.2) is a closed semialgebraic set, we are interested in characterizations of these sets and of polynomials positive on semialgebraic sets. The Positivstellensatz allows us to deduce conditions for the positivity or nonnegativity of a polynomial over a semialgebraic set. A direct consequence of the Positivstellensatz is the following corollary [7, pp. 92].

Corollary 2.1 Let g_1, ..., g_m ∈ R[x], K = {x ∈ R^n | g_1(x) ≥ 0, ..., g_m(x) ≥ 0} and f ∈ R[x]. Then:

(i) ∀x ∈ K: f(x) ≥ 0 ⇔ ∃ s ∈ N, ∃ g, h ∈ Σ^2⟨g_1, ..., g_m⟩ such that f g = f^{2s} + h.

(ii) ∀x ∈ K: f(x) > 0 ⇔ ∃ g, h ∈ Σ^2⟨g_1, ..., g_m⟩ such that f g = 1 + h.
Proof
(i) Apply the Positivstellensatz to the set {x ∈ R^n | g_1(x) ≥ 0, ..., g_m(x) ≥ 0, −f(x) ≥ 0, f(x) ≠ 0}.
(ii) Apply the Positivstellensatz to the set {x ∈ R^n | g_1(x) ≥ 0, ..., g_m(x) ≥ 0, −f(x) ≥ 0}. □

These conditions for the nonnegativity and positivity of polynomials on semialgebraic sets can be improved under additional assumptions. We present such improved conditions for compact semialgebraic sets in the following section.
Compact semialgebraic sets

It is our aim to characterize polynomials that are positive or nonnegative on compact semialgebraic sets. A first characterization is a theorem due to Schmüdgen [87]:

Theorem 2.6 (Schmüdgen) Let K = {x ∈ R^n | g_1(x) ≥ 0, ..., g_m(x) ≥ 0} be a compact semialgebraic subset of R^n and let p be a positive polynomial on K. Then p ∈ Σ^2⟨g_1, ..., g_m⟩.
It was Putinar [81] who simplified this characterization under an additional assumption.

Definition 2.2 A quadratic module M(K) is called archimedean if N − Σ_{i=1}^n x_i^2 ∈ M(K) for some N ∈ N.
Theorem 2.7 (Putinar) Let p be a polynomial positive on the compact semialgebraic set K and let M(K) be archimedean. Then p ∈ M(K).

Thus, under the additional assumption of M(K) being archimedean, we obtain the stricter characterization p ∈ M(K) ⊆ Σ^2⟨g_1, ..., g_m⟩ instead of p ∈ Σ^2⟨g_1, ..., g_m⟩. The original proof of Theorem 2.7 is due to Putinar [81]; it applies the separation theorem for convex sets and some arguments from functional analysis. A newer proof due to Schweighofer [88] avoids the arguments from functional analysis and requires only results from elementary analysis. A further theorem by Schmüdgen [87] provides equivalent conditions for M(K) being archimedean.
Theorem 2.8 The following are equivalent:

(i) There exist finitely many t_1, ..., t_s ∈ M(K) such that the set {x ∈ R^n | t_1(x) ≥ 0, ..., t_s(x) ≥ 0} (which contains K) is compact and ∏_{i∈I} t_i ∈ M(K) for all I ⊂ {1, ..., s}.

(ii) There exists some p ∈ M(K) such that {x ∈ R^n | p(x) ≥ 0} is compact.

(iii) There exists an N ∈ N such that N − Σ_{i=1}^n x_i^2 ∈ M(K), i.e., M(K) is archimedean.

(iv) For all p ∈ R[x], there is some N ∈ N such that N ± p ∈ M(K).
Thus, for any polynomial p positive on K, p ∈ M(K) holds if one of the conditions in Theorem 2.8 is satisfied. Whether it is decidable that one of these equivalent conditions holds is not known and is a subject of current research. However, for a given polynomial optimization problem with compact feasible set K, it is easy to make the corresponding quadratic module M(K) archimedean: we just need to add a redundant constraint N − Σ_{i=1}^n x_i^2 ≥ 0 for a sufficiently large N.
Example 2.2 Consider the compact semialgebraic set

    K = {x ∈ R^2 | g(x) = 1 − x_1^2 − x_2^2 ≥ 0}.

The quadratic module M(K) is archimedean, as 1 − x_1^2 − x_2^2 = 0^2 + 1^2 · g(x) ∈ M(K). The polynomials f_1(x) := x_1 + 2 and f_2(x) := 2x_1^3 + 3 are positive on K. Thus f_1, f_2 ∈ M(K) by Theorem 2.7. Their decompositions can be derived as

    f_1(x) = x_1 + 2 = (1/2)(x_1 + 1)^2 + (1/2)x_2^2 + 1 + (1/2)(1 − x_1^2 − x_2^2),
    f_2(x) = 2x_1^3 + 3 = (x_1^3 + 1)^2 + (x_1^2 x_2)^2 + (x_1 x_2)^2 + x_2^2 + 1 + (x_1^4 + x_1^2 + 1)(1 − x_1^2 − x_2^2).
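The decompositions can be verified by expanding; the following throwaway check (SymPy assumed, not code from the thesis) confirms both identities:

    # Expand the two decompositions of Example 2.2 and compare with f1, f2.
    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    g = 1 - x1**2 - x2**2
    f1 = sp.Rational(1, 2) * (x1 + 1)**2 + sp.Rational(1, 2) * x2**2 + 1 \
        + sp.Rational(1, 2) * g
    f2 = (x1**3 + 1)**2 + (x1**2 * x2)**2 + (x1 * x2)**2 + x2**2 + 1 \
        + (x1**4 + x1**2 + 1) * g
    print(sp.expand(f1))  # -> x1 + 2
    print(sp.expand(f2))  # -> 2*x1**3 + 3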
The next example demonstrates that, in general, not every polynomial nonnegative on a compact semialgebraic set K is contained in M(K), even if M(K) is archimedean.
Example 2.3 Consider the compact semialgebraic set

    K = {x ∈ R | g_1(x) := x^2 ≥ 0, g_2(x) := −x^2 ≥ 0} = {0}.

It is obvious that M(K) is archimedean. Also, it is easy to see that there are no q, r, s ∈ ΣR[x]^2 such that

    p(x) := x = q(x) + r(x) x^2 + s(x)(−x^2),

although p is nonnegative on K. However, the polynomial p_a ∈ R[x] defined by p_a(x) = x + a for a > 0 can be decomposed as

    p_a(x) = x + a = (1/4a)(x + 2a)^2 − (1/4a) x^2.

Thus p_a ∈ M(K) for all a > 0.

Remark 2.1 Given a compact semialgebraic set K, every positive polynomial on K belongs to the cone M(K) if and only if M(K) is archimedean.
Theorem 2.7 is called Putinar's Positivstellensatz. Obviously, it does not really characterize the polynomials positive on K, since the polynomials in M(K) need only be nonnegative on K. Also, it does not fully describe the polynomials nonnegative on K, since these are not always contained in M(K). However, it is Theorem 2.7 that is exploited by Lasserre in order to tackle the polynomial optimization problem.
2.1.3 Dense SDP relaxations for polynomial optimization problems
The idea of applying convex optimization techniques to solve polynomial optimization problems was first proposed in the pioneering work of Shor [91]. Shor introduced lower bounds for the global minimum of a polynomial function p; these bounds are derived by minimizing a quadratic function subject to quadratic constraints. Nesterov also discussed the minimization of univariate polynomials and mentioned the problem of minimizing multivariate polynomials in [70]. It was Lasserre [51] who first realized the possibility of applying Putinar's Positivstellensatz, Theorem 2.7, to solve a broader class of polynomial optimization problems that goes beyond the case where p − p⋆ can be written as a sum of squares of polynomials.
First, we introduce Lasserre's approach to derive semidefinite relaxations for minimizing a polynomial over a semialgebraic set, as Putinar's theorem is applied directly there. Second, we present the unconstrained case; since semialgebraic sets enter through the backdoor there, in order to be able to apply Putinar's Positivstellensatz, we present it after the constrained case.
Lasserre's relaxation in the constrained case

After studying positivity and nonnegativity of polynomials and the related problem of moments, we tackle the initial polynomial optimization problem (2.2) over a compact semialgebraic set K,

    min_{x∈K} p(x).

One of the major obstacles to finding the optimum p⋆ is the fact that the set K and the function p are far from being convex. The basic idea of Lasserre's approach [51] is to convexify problem (2.2). We outline this procedure of convexification. It has to be emphasized that Lasserre's approach is based on two assumptions: first, we require the semialgebraic set K to be compact, and second, we assume M(K) is archimedean. These assumptions imply that we are able to apply Putinar's Positivstellensatz to polynomials positive on K. At first we note

    p⋆ = sup {a ∈ R | p − a ≥ 0 on K} = sup {a ∈ R | p − a > 0 on K}.    (2.3)

Since we assume that M(K) is archimedean, we can apply Theorem 2.7 to (2.3). Thus

    p⋆ ≤ sup {a ∈ R | p − a ∈ M(K)} ≤ sup {a ∈ R | p − a ≥ 0 on K} = p⋆.

Finally we obtain

    p⋆ = sup {a ∈ R | p − a ∈ M(K)}.    (2.4)

As a second approach, we note that the minimum p⋆ of (2.1) satisfies

    p⋆ = inf { ∫ p dµ | µ ∈ MP(K) },    (2.5)

where MP(K) ⊆ M(K) denotes the set of all Borel measures on K which are also probability measures. '≤' holds since p(x) ≥ p⋆ on K implies ∫ p dµ ≥ p⋆, and '≥' follows as each x feasible in (2.1) corresponds to a µ = δ_x ∈ M(K), where δ_x is the Dirac measure at x.
In order to get rid of the set M(K) in (2.5), we exploit the following theorem by Putinar [81].
Theorem 2.9 For any map L : R[x] → R, the following are equivalent:

(i) L is linear, L(1) = 1 and L(M(K)) ⊆ [0, ∞).

(ii) L is integration with respect to a probability measure µ on K, i.e., ∃ µ ∈ MP(K) : ∀ p ∈ R[x] : L(p) = ∫ p dµ.

Proof Cf. [88], pp. 10.
This theorem does not really characterize MP(K), but rather all real families (y_α)_{α∈N^n} that are sequences of moments of probability measures on K, i.e.,

    y_α = ∫ x^α dµ  ∀ α ∈ N^n,

where x^α = x_1^{α_1} ··· x_n^{α_n}. This statement is true, as every linear map L : R[x] → R is determined uniquely by its values L(x^α) on the basis (x^α)_{α∈N^n} of R[x]. With Theorem 2.9 we obtain

    p⋆ = inf {L(p) | L : R[x] → R is linear, L(1) = 1, L(M(K)) ⊆ [0, ∞)}.    (2.6)
Recall (2.4),

    p⋆ = sup {a ∈ R | p − a ∈ M(K)}.

Thus (2.6) can be understood as a primal approach to the original problem (2.1), and (2.4) as a dual approach. For complexity reasons it is necessary to introduce relaxations of this primal-dual pair of optimization problems in order to solve problem (2.1). Therefore we approximate M(K) by the sets M_ω(K) ⊆ R[x], where

    M_ω(K) := { Σ_{i=0}^m σ_i g_i | σ_i ∈ ΣR[x]^2, deg(σ_i g_i) ≤ 2ω }

for ω ∈ N := {s ∈ N | s ≥ ω_max := max{ω_0, ω_1, ..., ω_m}}, ω_i := ⌈deg g_i / 2⌉ (i = 1, ..., m), ω_0 := ⌈deg p / 2⌉. Replacing M(K) by M_ω(K) motivates the following pair of optimization problems for ω ∈ N:

    (P_ω)  min  L(p)  subject to  L : R[x]_{2ω} → R is linear, L(1) = 1 and L(M_ω(K)) ⊆ [0, ∞),
    (D_ω)  max  a     subject to  a ∈ R and p − a ∈ M_ω(K).    (2.7)

The optimal values of (P_ω) and (D_ω) are denoted by P_ω⋆ and D_ω⋆, respectively. The parameter ω ∈ N is called the relaxation order of (2.7). It determines the size of the relaxations (P_ω) and (D_ω) of (2.2) and therefore also the numerical effort necessary to solve them.
Theorem 2.10 (Lasserre) Assume M(K) is archimedean. (P_ω⋆)_{ω∈N} and (D_ω⋆)_{ω∈N} are increasing sequences that converge to p⋆ and satisfy D_ω⋆ ≤ P_ω⋆ ≤ p⋆ for all ω ∈ N. Moreover, if p − p⋆ ∈ M(K), then D_ω⋆ = P_ω⋆ = p⋆ for a sufficiently large relaxation order ω, i.e., strong duality holds.

Proof Since the feasible set of (2.6) is a subset of the feasible set of (P_ω), P_ω⋆ ≤ p⋆. Moreover, if L is feasible for (P_ω) and a for (D_ω), L(p) ≥ a holds, since p − a ∈ M_ω(K) implies L(p) − a = L(p) − aL(1) = L(p − a) ≥ 0. Thus D_ω⋆ ≤ P_ω⋆. Obviously, a feasible a for (D_ω) is feasible for (D_{ω+1}), and every feasible L of (P_{ω+1}) is feasible for (P_ω). This implies that (P_ω⋆)_{ω∈N} and (D_ω⋆)_{ω∈N} are increasing. Furthermore, as for any ε > 0 there exists a sufficiently large ω ∈ N such that p − p⋆ + ε ∈ M_ω(K) by Theorem 2.7, i.e., p⋆ − ε is feasible for (D_ω), the convergence follows. If p − p⋆ ∈ M(K), then p − p⋆ ∈ M_ω(K) for ω sufficiently large. Thus p⋆ is feasible for (D_ω) and therefore D_ω⋆ = P_ω⋆ = p⋆. □

If M(K) is not archimedean, we are still able to exploit Schmüdgen's Positivstellensatz to characterize p − a in (D_ω). As a next step we follow Lasserre's observation and translate (D_ω) and (P_ω) into a pair of primal-dual semidefinite programs.
Definition 2.3 Let L : R[x] → R be a linear functional, let a sequence y = (y_α)_{α∈N^n} be given by y_α := L(x^α), and let d ∈ N be fixed. The moment matrix M_d(y) of order d is the matrix with rows and columns indexed by u(x, Λ(d)), such that

    M_d(y)_{α,β} = L(x^α x^β) = y_{α+β}  ∀ α, β ∈ N^n with |α|, |β| ≤ d.

The size of M_d(y) is given by |u(x, Λ(d))| = \binom{n+d}{d}; the number of components of y needed for constructing M_d(y) is given by |(y_α)_{|α|≤2d}| = \binom{n+2d}{2d}.

In the case n = d = 2, the moment matrix is given by

    M_2(y) =
    [ y_00  y_10  y_01  y_20  y_11  y_02 ]
    [ y_10  y_20  y_11  y_30  y_21  y_12 ]
    [ y_01  y_11  y_02  y_21  y_12  y_03 ]
    [ y_20  y_30  y_21  y_40  y_31  y_22 ]
    [ y_11  y_21  y_12  y_31  y_22  y_13 ]
    [ y_02  y_12  y_03  y_22  y_13  y_04 ].
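A small helper makes Definition 2.3 concrete; the sketch below (an illustration with made-up function names, not code from the thesis) builds M_d(y) from a dictionary mapping exponent tuples α to y_α. The basis is ordered lexicographically, which differs from the ordering in the display above but affects neither positive semidefiniteness nor rank:

    # Build the moment matrix M_d(y) for n variables from moments y_alpha.
    from itertools import product
    import numpy as np

    def exponents(n, d):
        # All alpha in N^n with |alpha| <= d, in lexicographic order.
        return [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]

    def moment_matrix(y, n, d):
        basis = exponents(n, d)
        M = np.empty((len(basis), len(basis)))
        for i, a in enumerate(basis):
            for j, b in enumerate(basis):
                M[i, j] = y[tuple(s + t for s, t in zip(a, b))]  # y_{a+b}
        return M

    # Moments of the Dirac measure at x = (1, 2): y_alpha = 1^a1 * 2^a2.
    y = {a: 1**a[0] * 2**a[1] for a in exponents(2, 4)}
    M2 = moment_matrix(y, 2, 2)       # a 6 x 6 matrix as in the display
    print(np.linalg.matrix_rank(M2))  # -> 1

For a Dirac measure the moment matrix is the rank-one outer product u(x, Λ(d)) u(x, Λ(d))^T evaluated at the point, which the final line confirms.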
Let g ∈ R[x] with g(x) = Σ_{α∈N^n} g_α x^α. The localizing matrix M_d(g y) of order d associated with g and y is the matrix with rows and columns indexed by u(x, Λ(d)), obtained from the moment matrix by

    M_d(g y)_{α,β} := L(g(x) x^α x^β) = Σ_{γ∈N^n} g_γ y_{γ+α+β}  ∀ α, β ∈ N^n, |α|, |β| ≤ d.

For g(x) = x_1^2 + 2x_2 + 3, n = 2 and d = 1, the localizing matrix is given by

    M_1(g y) =
    [ y_20 + 2y_01 + 3y_00   y_30 + 2y_11 + 3y_10   y_21 + 2y_02 + 3y_01 ]
    [ y_30 + 2y_11 + 3y_10   y_40 + 2y_21 + 3y_20   y_31 + 2y_12 + 3y_11 ]
    [ y_21 + 2y_02 + 3y_01   y_31 + 2y_12 + 3y_11   y_22 + 2y_03 + 3y_02 ].
We will exploit the following key lemma [88].
Lemma 2.1 Suppose L : R[x] → R is a linear map. Then L(M_ω(K)) ⊆ [0, ∞) if and only if the m + 1 matrices satisfy

    M_{ω−ω_i}(y g_i) ⪰ 0  ∀ i ∈ {0, ..., m},

where g_0 := 1, ω_0 = 0 and the sequence y is defined by y_α := L(x^α). Moreover,

    M_ω(K) = { Σ_{i=0}^m ⟨M_{ω−ω_i}(u(x, Λ(2ω−2ω_i)) g_i), G_i⟩ | G_0, ..., G_m ∈ S_+^{s(ω−ω_i)} }.

Proof Cf. [88], p. 19.
Using this lemma, we reformulate (2.7) as

    (dSDP_ω)   min  Σ_{α∈Λ(2ω)} y_α p_α
               s.t. y ∈ R^{|Λ(2ω)|}, y_0 = 1,
                    M_ω(y) ⪰ 0,
                    M_{ω−ω_i}(y g_i) ⪰ 0,  i = 1, ..., m,

    (dSDP⋆_ω)  max  a
               s.t. a ∈ R, G_0 ∈ S_+^{s(ω)}, G_i ∈ S_+^{s(ω−ω_i)} for i = 1, ..., m, and
                    Σ_{i=0}^m ⟨M_{ω−ω_i}(u(x, Λ(2ω−2ω_i)) g_i), G_i⟩ = p − a,    (2.8)

where p(x) = Σ_{α∈Λ(2ω)} p_α x^α. We call dSDP_ω the dense Lasserre relaxation or the dense SDP relaxation of relaxation order ω for the polynomial optimization problem (2.1). By sorting the monomials in the moment and localizing matrix inequality constraints in (2.8), we can express M_ω(u(x, Λ(2ω))) and M_{ω−ω_i}(u(x, Λ(2ω−2ω_i)) g_i) as

    M_ω(u(x, Λ(2ω))) = Σ_{α∈Λ(2ω)} B_α x^α,    M_{ω−ω_i}(u(x, Λ(2ω−2ω_i)) g_i) = Σ_{α∈Λ(2ω)} C_α^i x^α,

for some matrices B_α ∈ S^{s(ω)} and C_α^i ∈ S^{s(ω−ω_i)}. Thus we can rewrite the primal-dual pair of SDPs (2.8) as the equivalent primal-dual pair of SDPs in standard form

    (P_ω^{SDP})  min  Σ_{α∈Λ(2ω)} p_α y_α
                 s.t. y ∈ R^{|Λ(2ω)|}, y_0 = 1, and
                      B_0 + Σ_{α∈Λ(2ω)\{0}} y_α B_α ⪰ 0,
                      C_0^i + Σ_{α∈Λ(2ω)\{0}} y_α C_α^i ⪰ 0,  i = 1, ..., m,

    (D_ω^{SDP})  max  −G_0(1,1) − Σ_{i=1}^m ⟨C_0^i, G_i⟩
                 s.t. G_0 ∈ S_+^{s(ω)}, G_i ∈ S_+^{s(ω−ω_i)} for i ∈ {1, ..., m}, and
                      ⟨B_α, G_0⟩ + Σ_{i=1}^m ⟨C_α^i, G_i⟩ = p_α,  0 ≠ α ∈ Λ(2ω).    (2.9)
In general, SDPs can be solved in polynomial time. Efficient solvers for SDPs in the standard form (2.9) are provided by the software packages SeDuMi [95] and SDPA [104].
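To illustrate the shape of these relaxations on a toy instance (a hedged sketch, not from the thesis; CVXPY stands in for SeDuMi or SDPA), consider min x subject to g(x) = 1 − x^2 ≥ 0, whose minimum is p⋆ = −1. For ω = 1 the moment matrix is M_1(y) = [[y_0, y_1], [y_1, y_2]] with y_0 = 1, and the localizing matrix is the scalar M_0(g y) = 1 − y_2:

    # Order-1 dense relaxation of: min x  s.t.  1 - x^2 >= 0.
    import cvxpy as cp

    M = cp.Variable((2, 2), symmetric=True)   # M_1(y); M[0,1] plays y_1
    constraints = [M >> 0,                    # moment matrix PSD
                   M[0, 0] == 1,              # y_0 = 1
                   1 - M[1, 1] >= 0]          # localizing constraint
    prob = cp.Problem(cp.Minimize(M[0, 1]), constraints)
    prob.solve()
    print(prob.value, M[0, 1].value)  # -> -1.0 and y_1 close to x* = -1

The optimal value matches p⋆, and the first-order moment y_1 already recovers the minimizer, anticipating the extraction of global minimizers discussed below.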
Lasserre's relaxation in the unconstrained case

The procedure to derive a sequence of convergent SDP relaxations in the case of an unconstrained polynomial optimization problem

    min_{x∈R^n} p(x),    (2.10)

where p ∈ R[x] and p⋆ := min_x p(x), is similar to the constrained case discussed in the previous subsection. Let p be of even degree 2l, since otherwise inf p = −∞. Moreover, we will exploit the characterization of sum of squares decompositions by semidefinite matrices and Putinar's Positivstellensatz. In order to apply this theorem, it is necessary to construct an appropriate semialgebraic set.
First, we derive the following relaxation:

    p⋆ = inf { ∫ p dµ | µ ∈ MP(R^n) }
       ≥ inf { L(p) | L : R[x] → R, L(1) = 1, M_l(L(x)) ∈ S_+^{s(l)} }.    (2.11)
We order the expression M_l(L(x)) and introduce symmetric matrices B_α ∈ S^{s(l)} such that M_l(L(x)) = Σ_{α∈Λ(2l)} B_α L(x^α). Finally we identify y_α = L(x^α) for α ∈ Λ(2l)\{0} and y_0 = 1 to obtain the following relaxation of (2.10):

    (P_l)  min  Σ_α p_α y_α
           s.t. Σ_{α≠0} y_α B_α ⪰ −B_0.    (2.12)
As in the constrained case we can apply a dual approach to (2.10):

    p⋆ = sup {a ∈ R | p(x) − a ≥ 0 ∀ x ∈ R^n} ≥ sup {a ∈ R | p(x) − a ∈ ΣR[x]^2}
       = sup {a | p(x) − a = ⟨M_l(x), G⟩, G ∈ S_+^{s(l)}}.    (2.13)

Thus we derive another relaxation of problem (2.10):

    (D_l)  max  −G(1,1)
           s.t. ⟨B_α, G⟩ = p_α,  α ≠ 0,
                G ⪰ 0.    (2.14)

With the duality theory of convex optimization it can easily be shown that the two convex programs (2.12) and (2.14) are dual to each other. In the case (2.14) has an interior feasible solution, strong duality holds, that is, P_l⋆ = D_l⋆.
The idea of the following theorem was first proposed by Shor [91]. The version presented here is due to Lasserre [51].

Theorem 2.11 (Shor) If the nonnegative polynomial p − p⋆ is a sum of squares of polynomials, then (2.10) is equivalent to (2.12). More precisely, p⋆ = min(P_l) and, if x⋆ is a global minimizer of (2.10), then

    y⋆ := (x_1⋆, ..., x_n⋆, (x_1⋆)^2, x_1⋆ x_2⋆, ..., (x_1⋆)^{2m}, ..., (x_n⋆)^{2m})

is a minimizer of (2.12).
Next, we treat the general case, that is, when p − p⋆ is not a sum of squares. As mentioned at the beginning, we have to construct a semialgebraic set in order to be able to apply Putinar's Positivstellensatz. Suppose we know that a global minimizer x⋆ of p(x) has norm at most a for some a > 0, that is, p(x⋆) = p⋆ and ||x⋆||_2 ≤ a. Then, with x → q_a(x) = a^2 − ||x||_2^2, we have p(x) − p⋆ ≥ 0 on K_a := {q_a(x) ≥ 0}. Obviously, M(K_a) is archimedean, as condition (iii) in Theorem 2.8 is satisfied for N = a^2. Now we can use that every polynomial f strictly positive on the semialgebraic set K_a is contained in the quadratic module M(K_a).
For every ω ≥ l, consider the following semidefinite program:

    (P_ω^a)  min  Σ_α p_α y_α
             s.t. M_ω(y) ⪰ 0,
                  M_{ω−1}(q_a y) ⪰ 0.    (2.15)
Writing M_{ω−1}(q_a y) = Σ_α y_α D_α for appropriate matrices D_α (|α| ≤ 2ω), the dual of (P_ω^a) is the semidefinite program

    (D_ω^a)  max  −G(1,1) − a^2 H(1,1)
             s.t. ⟨G, B_α⟩ + ⟨H, D_α⟩ = p_α,  α ≠ 0,
                  G ⪰ 0, H ⪰ 0.    (2.16)

Then the following theorem is due to Lasserre [51].

Theorem 2.12 (Lasserre) Given (P_ω^a) and (D_ω^a) for some a > 0 such that ||x⋆||_2 ≤ a for some global minimizer x⋆. Then:
(a) As ω → ∞, one has inf(P_ω^a) ↑ p⋆. Moreover, for ω sufficiently large, there is no duality gap between (P_ω^a) and its dual (D_ω^a), and (D_ω^a) is solvable.

(b) min(P_ω^a) = p⋆ if and only if p − p⋆ ∈ M_ω(K_a). In this case, the vector

    y⋆ := (x_1⋆, ..., x_n⋆, (x_1⋆)^2, x_1⋆ x_2⋆, ..., (x_1⋆)^{2ω}, ..., (x_n⋆)^{2ω})

is a minimizer of (P_ω^a). In addition, min(P_ω^a) = max(D_ω^a).

Proof
(a) From x⋆ ∈ K_a and with y⋆ defined as above, it follows that M_ω(y⋆) ⪰ 0 and M_{ω−1}(q_a y⋆) ⪰ 0, so that y⋆ is feasible for (P_ω^a) and thus inf(P_ω^a) ≤ p⋆.
Now fix ε > 0 arbitrary. Then p − p⋆ + ε > 0, and therefore, by Theorem 2.7, there is some N_0 such that

    p − p⋆ + ε = Σ_{i=1}^{r_1} q_i(x)^2 + q_a(x) Σ_{j=1}^{r_2} t_j(x)^2

for some polynomials q_i(x), i = 1, ..., r_1, of degree at most N_0, and some polynomials t_j(x), j = 1, ..., r_2, of degree at most N_0 − 1. Let q_i ∈ R^{s(N_0)}, t_j ∈ R^{s(N_0−1)} be the corresponding vectors of coefficients, and let

    G := Σ_{i=1}^{r_1} q_i q_i^T,    H := Σ_{j=1}^{r_2} t_j t_j^T,

so that G, H ⪰ 0. It is immediate to check that (G, H) is feasible for (D_ω^a) with value −G(1,1) − a^2 H(1,1) = p⋆ − ε. From weak duality, convergence follows as

    p⋆ − ε ≤ inf(P_ω^a) ≤ p⋆.

For strong duality and for (b), cf. [51]. □
We needed to add the constraint q_a(x) ≥ 0 in order to show convergence of the SDP relaxation (P_ω^a). For applications it is often sufficient to consider an SDP relaxation which does not take this constraint into account. Thus, we denote the primal-dual pair of SDPs

    (dSDP_ω)   min  Σ_α p_α y_α
               s.t. M_ω(y) ⪰ 0,

    (dSDP⋆_ω)  max  −G(1,1)
               s.t. ⟨G, B_α⟩ = p_α  ∀ α ≠ 0,  G ⪰ 0,

as the dense SDP relaxation of relaxation order ω for the polynomial optimization problem (2.10), which is consistent with the dense SDP relaxation for the constrained case. This sequence of SDPs is not guaranteed to converge to the minimum of (2.10) for ω → ∞. However, it provides a non-decreasing sequence of lower bounds for p⋆:

    min(dSDP_{ω_max}) ≤ min(dSDP_{ω_max+1}) ≤ ... ≤ p⋆.
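As a concrete instance (again an illustrative CVXPY sketch, not from the thesis), the order-2 relaxation for min_x x^4 − 2x^2, whose global minimum is −1 at x = ±1, only requires the Hankel structure and positive semidefiniteness of M_2(y):

    # Unconstrained dense relaxation of order 2 for p(x) = x^4 - 2x^2.
    import cvxpy as cp

    M = cp.Variable((3, 3), symmetric=True)   # M_2(y) over basis (1, x, x^2)
    constraints = [M >> 0,
                   M[0, 0] == 1,              # y_0 = 1
                   M[1, 1] == M[0, 2]]        # Hankel: both entries are y_2
    prob = cp.Problem(cp.Minimize(M[2, 2] - 2 * M[1, 1]), constraints)
    prob.solve()
    print(prob.value)  # -> approximately -1.0

The bound is exact here because p − p⋆ = (x^2 − 1)^2 is a sum of squares, in accordance with Theorem 2.11.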
Global minimizer

Usually one is not only interested in finding the minimum value p⋆ of p on K, but also in obtaining a global minimizer x⋆ ∈ K⋆ with p(x⋆) = p⋆. It will be shown that in Lasserre's procedure not only does min(dSDP_ω) converge to the infimum p⋆, but there is also convergence to the minimizer x⋆ of (2.2) in case it is unique.

Definition 2.4 L_ω solves (P_ω) nearly to optimality (ω ∈ N) if L_ω is a feasible solution of (P_ω) (ω ∈ N) such that lim_{ω→∞} L_ω(p) = lim_{ω→∞} P_ω⋆.

This notion is useful because (P_ω) might not possess an optimal solution, and even if it does, we might not be able to compute it exactly; for an example, cf. [88], Example 22. Obviously, L_ω solves (P_ω) nearly to optimality (ω ∈ N) if and only if lim_{ω→∞} L_ω(p) = p⋆. The following theorem is the basis for the convergence to a minimizer in the case K⋆ is a singleton.
Theorem 2.13 Suppose K ≠ ∅ and L_ω solves (P_ω) nearly to optimality (ω ∈ N). Then

    ∀ d ∈ N ∀ ε > 0 ∃ ω_0 ∈ N ∩ [d, ∞) ∀ ω ≥ ω_0 ∃ µ ∈ M(K⋆) :  max_{α∈Λ(2d)} | L_ω(x^α) − ∫ x^α dµ | < ε.

Proof [88], p. 11.
In the convenient case where K⋆ is a singleton it is possible to guarantee convergence to the minimizer:

Corollary 2.2 Suppose K⋆ = {x⋆} is a singleton and L_ω solves (P_ω) nearly to optimality (ω ∈ N). Then

    lim_{ω→∞} (L_ω(x_1), ..., L_ω(x_n)) = x⋆.

Proof We set d = 1 in Theorem 2.13 and note that M(K⋆) contains only the Dirac measure δ_{x⋆} at the point x⋆. □

It is possible to apply Corollary 2.2 to certify that p⋆ has almost been reached after successively solving the relaxations (P_ω).
Corollary 2.3 Suppose M(K) is archimedean, p has a unique minimizer on the compact semialgebraic set K, and L_ω solves (P_ω) nearly to optimality for all ω ∈ N. Then for all ω ∈ N,

    L_ω(p) ≤ p⋆ ≤ p(L_ω(x_1), ..., L_ω(x_n)),

and the lower and upper bounds converge to p⋆ for ω → ∞.

Proof L_ω(p) ≤ p⋆ follows from Theorem 2.10. The convergence of p(L_ω(x_1), ..., L_ω(x_n)) is a consequence of Corollary 2.2. To see that p(L_ω(x_1), ..., L_ω(x_n)) is an upper bound for p⋆, observe that g_i(L_ω(x_1), ..., L_ω(x_n)) = L_ω(g_i) ≥ 0, whence (L_ω(x_1), ..., L_ω(x_n)) ∈ K for all ω ∈ N.
The case where several optimal solutions exist is more difficult to handle. In fact, as soon as there are two or more global minimizers, it often occurs that symmetry in the problem prevents the nearly optimal solutions of the SDP relaxations from converging to a particular minimizer.
Henrion and Lasserre established a sufficient condition for the dense SDP relaxations to detect all optimal solutions [35]. Consider the dense SDP relaxation dSDP_ω for some order ω ≥ ω_max and let y⋆ be an optimal solution of this semidefinite program. In the case

    rank M_ω(y⋆) = rank M_{ω−ω_max}(y⋆)    (2.17)

holds, the SDP relaxation dSDP_ω is exact, that is, min(dSDP_ω) = p⋆. Moreover, Henrion and Lasserre provided an algorithm for extracting all globally optimal solutions of the POP (2.1) if (2.17) holds; see [35] for details. Note that (2.17) is not necessary: dSDP_ω may already be exact for some relaxation order ω with rank M_ω(y_ω⋆) > rank M_{ω−ω_max}(y_ω⋆). For many POPs it may not be practical to increase ω until (2.17) holds, as the size of the moment and localizing matrix constraints in dSDP_ω, given by \binom{n+ω}{n}, grows rapidly with increasing ω.
2.1.4 Sparse SDP relaxations for polynomial optimization problems
The dense SDP relaxation (2.8) by Lasserre is a powerful theoretical result, since it allows one to approximate the solutions of polynomial optimization problems (2.1) as closely as desired by solving a finite sequence of SDP relaxations. However, since the size of the SDP relaxation grows as \binom{n+ω}{ω}, even for medium scale POPs the SDP relaxations become intractable for present SDP solvers even for small choices of the relaxation order ω. Therefore, it is crucial to reduce the size of the semidefinite programs to be solved in order to be able to tackle large scale POPs. In this section we review the approach [102] of exploiting sparsity in a large scale POP by introducing a sequence of sparse SDP relaxations of much smaller size than the dense SDP relaxations (2.8). A second method to exploit sparsity in a general optimization problem with linear and/or nonlinear matrix inequality constraints is presented in [42].
In many problems of type (2.1), the polynomials p, g_1, ..., g_m involved are sparse. Waki, Kojima, Kim and Muramatsu constructed a sequence of SDP relaxations which exploits the sparsity of such polynomial optimization problems [102]. This method shows a strong numerical performance in comparison to Lasserre's relaxations (2.8). The convergence of the sparse SDP relaxations to the optimum of the original problem (2.1) was shown by Lasserre [52] and by Kojima and Muramatsu [49]. In the following, we give a review of the sparse SDP relaxations for POPs with structured sparsity.
Let the polynomial optimization problem be given as in (2.2),

    min_{x∈K} p(x),

where K is a compact semialgebraic set defined by the m inequality constraints g_1 ≥ 0, ..., g_m ≥ 0. We characterize sparsity of a POP (2.2) with the following definition.
Definition 2.5 Given a POP (2.2), we denote the n × n symbolic matrix R defined by

R_{i,j} = { ⋆, if x_i x_j occurs in some monomial of p,
            ⋆, if x_i and x_j occur in the same g_l (l = 1, . . . , m),
            0, else, }

as the correlative sparsity pattern matrix of the POP. The graph G = (V, N) with vertex set V := {1, . . . , n} and edge set

N := { {i, j} ∈ V² | R_{i,j} = ⋆ }

is called the corresponding correlative sparsity pattern graph. A POP is defined to be correlatively sparse, if R is sparse.
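The construction in Definition 2.5 is mechanical. The following minimal Python sketch (an illustration, not part of any software referenced in this thesis; it assumes each polynomial is given by the variable supports of its monomials and uses the networkx package) builds the correlative sparsity pattern graph:

import itertools
import networkx as nx

def csp_graph(n, obj_monomial_supports, constraint_supports):
    # n: number of variables, indexed 0..n-1 here.
    # obj_monomial_supports: one set of variable indices per monomial of p.
    # constraint_supports: one set of variable indices per constraint g_l.
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for supp in list(obj_monomial_supports) + list(constraint_supports):
        # x_i and x_j are linked if they occur in the same monomial of p
        # or in the same constraint polynomial g_l
        G.add_edges_from(itertools.combinations(sorted(supp), 2))
    return G

# Example 2.4 below, with 0-based indices: g1 involves {x1, x2}, etc.
G = csp_graph(6, [], [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 4}, {2, 5}, {1, 2}])
print(sorted(G.edges()))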
We will construct a sequence of SDP relaxations to this polynomial optimization problem, which exploits
the sparsity pattern characterized by the correlative sparsity pattern matrix R. Under a certain condition
on the sparsity pattern of the problem, the optima of these SDP relaxations converge to the optimum of
the polynomial optimization problem (2.2).
First, let {1, . . . , n} be the union ∪_{k=1}^{q} I_k of subsets I_k ⊂ {1, . . . , n} such that every g_j, j ∈ {1, . . . , m}, is only concerned with the variables {x_i | i ∈ I_k} for some k. We also require that the objective p can be written as p = p_1 + . . . + p_q, where each p_k uses only the variables {x_i | i ∈ I_k}. A possible choice for the sets I_1, . . . , I_q are the maximal cliques of the correlative sparsity pattern graph G. In order to state the sparse SDP relaxations we need some further definitions.
Definition 2.6 Given a subset I of {1, . . . , n}, we define the sets

A^I = {α ∈ N^n : α_i = 0 if i ∉ I},
A^I_ω = {α ∈ N^n : α_i = 0 if i ∉ I and Σ_{i∈I} α_i ≤ ω}.

Then, we define R[x, G] := {f ∈ R[x] : supp(f) ⊆ G}. Also, the restricted moment matrix M_r(y, I) and localizing matrices M_r(g y, I) are defined for I ⊆ {1, . . . , n}, r ∈ N and g ∈ R[x]. They are obtained from M_r(y) and M_r(g y) by retaining only those rows (and columns) α ∈ N^n of M_r(y) and M_r(g y) with supp(α) ⊆ A^I_r. In doing so, M_r(y, I) and M_r(g y, I) can be interpreted as moment and localizing matrices with rows and columns indexed in the canonical basis u(x, A^I_r) of R[x, A^I_r]. Finally, we denote the set of sum of squares polynomials in R[x, G] as Σ R[x, G]². In analogy to Theorem 2.3, Σ R[x, G]² can be written as

Σ R[x, G]² = { u(x, G)ᵀ V u(x, G) : V ⪰ 0 }.
Let m′ be the number of inequality constraints which define the basic closed semialgebraic set K. In our initial setting m′ = m, but later on m′ > m may hold, in case we add further inequality constraints to restrict the set K. We introduce a condition for the index sets I_1, . . . , I_q.

Assumption 1: Let K ⊆ R^n be as in (2.23). The index set J = {1, . . . , m′} is partitioned into q disjoint sets J_k, k = 1, . . . , q, and the collections {I_k} and {J_k} satisfy:

1. For every j ∈ J_k, g_j ∈ R[x, A^{I_k}], that is, for every j ∈ J_k the constraint g_j(x) ≥ 0 is only concerned with the variables x(I_k). Equivalently, viewing g_j as a polynomial in R[x], g_{jα} ≠ 0 ⇒ supp(α) ∈ A^{I_k}.

2. The objective function p ∈ R[x] can be written as

p = Σ_{k=1}^{q} p_k, with p_k ∈ R[x, A^{I_k}], k = 1, . . . , q.

Equivalently, p_α ≠ 0 ⇒ supp(α) ∈ ∪_{k=1}^{q} A^{I_k}.
Example 2.4 For n = 6 and m = 6, let
g1 (x) = x1 x2 − 1,
g2 (x) = x21 + x2 x3 − 1,
g3 (x) = x2 + x23 x4 ,
and
g4 (x) = x3 + x5 ,
g5 (x) = x3 x6 ,
g6 (x) = x2 x3 .
Then we can construct {I_k} and {J_k} for q = 4 with

I_1 = {1, 2, 3}, I_2 = {2, 3, 4}, I_3 = {3, 5}, I_4 = {3, 6},
J_1 = {1, 2, 6}, J_2 = {3}, J_3 = {4}, J_4 = {5}.
Now, we can construct sparse SDP relaxations in analogy to the dense SDP relaxations (2.8). For each j = 1, . . . , m′ write ω_j = ⌈deg g_j / 2⌉. Then, for some ω ∈ N define the following semidefinite program:

(sSDPω)  inf_y  Σ_α p_α y_α
         s.t.  M_ω(y, I_k) ⪰ 0,  k = 1, . . . , q,
               M_{ω−ω_j}(g_j y, I_k) ⪰ 0,  j ∈ J_k; k = 1, . . . , q,      (2.18)
               y_0 = 1.

Program (2.18) is well defined under Assumption 1, and it is easy to see that it is an SDP relaxation of problem (2.2). In fact, it is also easy to see that sSDPω is a weaker relaxation for (2.2) than dSDPω, as the partial moment and localizing matrices in the constraints of (2.18) are minors of the full moment and localizing matrices in the constraints of (2.8), i.e.,

min(sSDPω) ≤ min(dSDPω) ≤ min(POP)  ∀ω ∈ N.
We call (2.18) the sparse SDP relaxations or sparse Lasserre relaxations for polynomial optimization problems. There are symmetric matrices B_α^k and C_α^{jk} such that

M_ω(y, I_k) = Σ_{α∈N^n} y_α B_α^k,  k = 1, . . . , q,
M_{ω−ω_j}(g_j y, I_k) = Σ_{α∈N^n} y_α C_α^{jk},  k = 1, . . . , q, j ∈ J_k,      (2.19)
with B_α^k = 0 and C_α^{jk} = 0 whenever supp(α) ∉ A^{I_k}. Then we can rewrite (2.18) as

inf_y  Σ_α p_α y_α
s.t.   Σ_{0≠α∈N^n} y_α B_α^k ⪰ −B_0^k,  k = 1, . . . , q,      (2.20)
       Σ_{0≠α∈N^n} y_α C_α^{jk} ⪰ −C_0^{jk},  j ∈ J_k; k = 1, . . . , q,
and we derive the dual of this semidefinite program as

(sSDP⋆ω)  sup_{Y_k, Z_{jk}, λ}  λ
          s.t.  Σ_{k: α∈A^{I_k}} [ ⟨Y_k, B_α^k⟩ + Σ_{j∈J_k} ⟨Z_{jk}, C_α^{jk}⟩ ] + λ δ_{α0} = p_α  ∀ α ∈ Γω,      (2.21)
                Y_k, Z_{jk} ⪰ 0,  j ∈ J_k, k = 1, . . . , q,

where Γω := {α ∈ N^n : α ∈ ∪_{k=1}^{q} A^{I_k}, |α| ≤ 2ω}.
The main advantage of the sparse SDP relaxations is the reduction of the size of the matrix inequality constraints. In order to understand the improved efficiency, let us compare the computational complexity of the dense relaxation dSDPω and the sparse relaxation sSDPω. The number of variables in sSDPω is bounded by Σ_{k=1}^{q} \binom{n_k+2ω}{2ω}, where n_k = #I_k. Supposing n_k ≈ n/q for all k, the number of variables is bounded by O(q (n/q)^{2ω}), a strong improvement compared with O(n^{2ω}), the number of variables in dSDPω. Also, in sSDPω there are q LMI constraints of size O((n/q)^ω) and m + q LMI constraints of size O((n/q)^{ω−ωmax}), to be compared with a single LMI constraint of size O(n^ω) and m LMI constraints of size O(n^{ω−ωmax}) in dSDPω.
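To get a feeling for these orders of magnitude, the following small Python computation (a sketch; the equal clique sizes and the counting of overlapping moments once per clique are simplifying assumptions) compares the number of moment variables in the dense and sparse relaxations:

from math import comb

def dense_moment_vars(n, omega):
    # moments y_alpha with |alpha| <= 2*omega in the dense relaxation
    return comb(n + 2 * omega, 2 * omega)

def sparse_moment_vars(clique_sizes, omega):
    # upper bound: moments shared by overlapping cliques are counted repeatedly
    return sum(comb(nk + 2 * omega, 2 * omega) for nk in clique_sizes)

n, q, omega = 20, 5, 2
print(dense_moment_vars(n, omega))              # 10626
print(sparse_moment_vars([n // q] * q, omega))  # 350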
As pointed out, the sparse SDP relaxations are weaker than the dense ones. The question arises whether we still have convergence to the minimum of the POP. This question was answered positively by Lasserre [52]. We need two further conditions to show convergence.
Assumption 2: For all k = 1, . . . , q − 1,

I_{k+1} ∩ ∪_{j=1}^{k} I_j ⊆ I_s for some s ≤ k.      (2.22)
The property of Assumption 2 is called the running intersection property. Note that (2.22) is always satisfied for q = 2. Since property (2.22) depends on the ordering, it can possibly be satisfied after some relabelling of the {I_k}. In the case of Example 2.4 it is easy to check that Assumption 2 is satisfied, but in general it is not obvious. However, Waki et al. [102] presented a general procedure to guarantee that Assumption 2 is satisfied. Given the correlative sparsity pattern graph G = (V, E), we denote by G̃ = (V, Ẽ) its chordal extension. A graph is said to be chordal if every (simple) cycle of the graph with more than three edges has a chord. A graph G̃(V, Ẽ) is a chordal extension of G(V, E) if it is a chordal graph and E ⊆ Ẽ. See [4] for basic properties of chordal graphs. Then, the maximal cliques C1, . . . , Cq of G̃ satisfy the running intersection property, and the number q of maximal cliques in a chordal graph is bounded by n. Furthermore, there are efficient algorithms to determine the maximal cliques of a chordal graph, whereas it is NP-hard to determine the maximal cliques of an arbitrary graph.
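In practice, this step is easy to automate. The following Python sketch (illustrative only; it relies on the chordal-graph routines of the networkx package, and the clique order returned by the library may need to be rearranged along a clique tree before the check succeeds) computes a chordal extension, its maximal cliques, and tests the running intersection property for a given clique ordering:

import networkx as nx

G = nx.cycle_graph(6)                       # a small non-chordal example graph
H, _ = nx.complete_to_chordal_graph(G)      # chordal extension: fill-in edges added
cliques = list(nx.chordal_graph_cliques(H)) # maximal cliques; at most n of them

def satisfies_rip(ordered_cliques):
    # check property (2.22) for the given ordering of the cliques
    for k in range(1, len(ordered_cliques)):
        union_prev = set().union(*ordered_cliques[:k])
        overlap = set(ordered_cliques[k]) & union_prev
        if overlap and not any(overlap <= set(ordered_cliques[s]) for s in range(k)):
            return False
    return True

print(satisfies_rip(cliques))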
Assumption 3: Let K ⊆ R^n be a closed semialgebraic set. Then, there is M > 0 such that ‖x‖_∞ < M for all x ∈ K.

This assumption implies ‖x(I_k)‖² < n_k M², k = 1, . . . , q, where x(I_k) := {x_i | i ∈ I_k}, and therefore we add to K the q redundant quadratic constraints

g_{m+k}(x) := n_k M² − ‖x(I_k)‖² ≥ 0,  k = 1, . . . , q,

and set m′ = m + q, so that K is now defined by

K := {x ∈ R^n | g_j(x) ≥ 0, j = 1, . . . , m′}.      (2.23)

Notice that g_{m+k} ∈ R[x, A^{I_k}_2] for every k = 1, . . . , q. With Assumption 3, K is a compact semialgebraic set. Moreover, Assumption 3 is needed to guarantee that the quadratic module M(K) is archimedean, the condition of Putinar’s Positivstellensatz. Finally, we obtain the following convergence result.
Theorem 2.14 Let p⋆ denote the global minimum of (2.2) and let Assumptions 1-3 hold. Then:

(a) inf(sSDPω) ↑ p⋆ as ω → ∞.

(b) If K has nonempty interior, then strong duality holds and (sSDP⋆ω) is solvable for sufficiently large ω, i.e., inf(sSDPω) = max(sSDP⋆ω).

(c) Let y^ω be a nearly optimal solution of (sSDPω), with e.g.

Σ_α p_α y_α^ω ≤ inf(sSDPω) + 1/ω  ∀ω ≥ ω_0,

and let ŷ^ω := (y_α^ω : |α| = 1). If (2.2) has a unique global minimizer x⋆ ∈ K, then ŷ^ω → x⋆ as ω → ∞.

Proof: Cf. [52].
As in the dense case, it is also possible to extract global minimizers of the POP from the sparse SDP relaxations in certain cases where the minimizer of the POP is not unique. In fact, Lasserre derived the following sparse version of condition (2.17): let y⋆ be an optimal solution of the sparse SDP relaxation sSDPω for some order ω ≥ ωmax. If the rank conditions

rank M_ω(y⋆, I_h) = rank M_{ω−a_h}(y⋆, I_h)  ∀ h ∈ {1, . . . , q},
rank M_ω(y⋆, I_h ∩ I_{h′}) = 1  ∀ h ≠ h′ with I_h ∩ I_{h′} ≠ ∅,      (2.24)

with a_h := max_{j∈J_h} ω_j, hold, then sSDPω is exact and all global minimizers can be extracted. However, (2.24) is a very restrictive sufficient condition for the SDP relaxations to be exact, and it is not practical to apply it to large scale POPs in most cases.
The software SparsePOP [103] is an implementation of the sparse SDP relaxations. The running intersection property is guaranteed by choosing the maximal cliques of the chordal extension of the correlative sparsity pattern graph as the index sets I_1, . . . , I_q in (2.18). Instead of imposing the additional constraints of Assumption 3, SparsePOP imposes linear box constraints on each component of x ∈ R^n,

lbd_i ≤ x_i ≤ ubd_i  ∀ i ∈ {1, . . . , n}.      (2.25)

Moreover, SparsePOP adds small linear perturbation terms to the objective function of the POP, in order to enforce that the perturbed POP has a unique global minimizer.
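The two devices can be mimicked easily; the following Python sketch (assumed details for illustration only — SparsePOP itself is MATLAB software and its internals may differ) encodes the box constraints of (2.25) as additional affine inequalities and applies a random linear perturbation to the objective:

import numpy as np

rng = np.random.default_rng(0)

def box_constraints(lbd, ubd):
    # encode lbd_i <= x_i <= ubd_i as affine polynomials a*x_i + b >= 0,
    # returned as (i, a, b) triples
    cons = []
    for i, (lo, hi) in enumerate(zip(lbd, ubd)):
        cons.append((i, 1.0, -lo))   # x_i - lbd_i >= 0
        cons.append((i, -1.0, hi))   # ubd_i - x_i >= 0
    return cons

def perturb_linear_coefficients(c, epsilon=1e-5):
    # a tiny random linear term added to the objective makes the
    # perturbed POP generically have a unique global minimizer
    c = np.asarray(c, dtype=float)
    return c + epsilon * rng.uniform(-1.0, 1.0, size=c.shape)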
2.2 Exploiting sparsity in linear and nonlinear matrix inequalities
Optimization problems with nonlinear matrix inequalities, including quadratic and polynomial matrix inequalities, are known to be hard, and they frequently arise as large-scale optimization problems. We
present a basic framework for exploiting the sparsity characterized in terms of a chordal graph structure
via positive semidefinite matrix completion [28]. Depending on where the sparsity is observed, two types of
sparsities are studied: the domain-space sparsity (d-space sparsity) for a symmetric matrix X that appears
as a variable in objective and/or constraint functions of a given optimization problem and is required to be
positive semidefinite, and the range-space sparsity (r-space sparsity) for a matrix inequality involved in the
constraint of the problem.
The d-space sparsity is basically equivalent to the sparsity studied by Fukuda et al. [21, 68] for an equality
standard form SDP. One of the two d-space conversion methods proposed in this section corresponds to
an extension of their conversion method, and the other d-space conversion method is an extension of the
method used for the sparse SDP relaxation of polynomial optimization problems in [102, 103] and for the
sparse SDP relaxation of a sensor network localization problem in [43].
The r-space sparsity concerns a matrix inequality

M(y) ⪰ 0,      (2.26)
involved in a general nonlinear optimization problem. Here M denotes a mapping from R^s into S^n. If M is linear, (2.26) is known as a linear matrix inequality (LMI), which appears in the constraint of a dual standard form SDP. If each element of M(y) is a multivariate polynomial function in y ∈ R^s, (2.26) is called a polynomial matrix inequality, and the SDP relaxation [36, 37, 46, 48, 49, 52], which is an extension of the SDP relaxation [51] for POPs, can be applied to (2.26). We assume a chordal graph structured sparsity, similar to the d-space sparsity, on the set of row and column index pairs (i, j) of the mapping M such that M_{ij} is not identically zero, i.e., M_{ij}(y) ≠ 0 for some y ∈ R^s. A representative example satisfying the r-space sparsity is a tridiagonal M. We do not impose any additional assumption on (2.26) to derive an r-space conversion method. When M is polynomial in y ∈ R^s, we can effectively combine it with the sparse SDP relaxation method [46, 49] for polynomial optimization problems over symmetric cones to solve (2.26).
We propose two methods to exploit the r-space sparsity. One may be regarded as a dual of the d-space conversion method by Fukuda et al. [21]. More precisely, it exploits the sparsity of the mapping M in the range space via a dual of the positive semidefinite matrix completion to transform the matrix inequality (2.26) into a system of multiple matrix inequalities with smaller sizes and an auxiliary vector variable z ∈ R^q. The resulting matrix inequality system is of the form

M̃^k(y) − L̃^k(z) ⪰ 0  (k = 1, 2, . . . , p),      (2.27)

and y ∈ R^s is a solution of (2.26) if and only if it satisfies (2.27) for some z. Here M̃^k denotes a mapping from R^s into the space of symmetric matrices of some size and L̃^k a linear mapping from R^q into the space of symmetric matrices of the same size. The sizes of the symmetric matrix valued mappings M̃^k (k = 1, 2, . . . , p) and the dimension q of the auxiliary variable vector z are determined by the r-space sparsity pattern of M. For example, if M is tridiagonal, the sizes of the M̃^k are all 2 × 2 and q = n − 2. The other r-space conversion method corresponds to a dual of the second d-space conversion method mentioned previously. We discuss how the d-space and r-space conversion methods enhance the correlative sparsity for POPs introduced in the previous section. Furthermore, we present numerical results to demonstrate how the size of problems involving large scale matrix inequalities is reduced under the four proposed conversion methods.
2.2.1 An SDP example

A simple SDP example is shown to illustrate the two types of sparsities considered in this section, the d-space sparsity and the r-space sparsity, and to compare them to the correlative sparsity from 2.1.4 that characterizes the sparsity of the Schur complement matrix.
Let A0 be a tridiagonal matrix in S^n such that A0_{ij} = 0 if |i − j| > 1, and define a mapping M from S^n into S^n by

M(X) = [ 1 − X11    X12        0          · · ·         0;
         X21        1 − X22    X23                      ⋮;
         0          X32        ⋱           ⋱            0;
         ⋮                     ⋱     1 − X_{n−1,n−1}    X_{n−1,n};
         0          · · ·      0     X_{n,n−1}          1 − X_{nn} ]

for every X ∈ S^n. Consider an SDP

minimize  A0 • X  subject to  M(X) ⪰ 0,  X ⪰ 0.      (2.28)
Among the elements X_{ij} (i = 1, 2, . . . , n, j = 1, 2, . . . , n) of the matrix variable X ∈ S^n, the elements X_{ij} with |i − j| ≤ 1 are relevant and all other elements X_{ij} with |i − j| > 1 are unnecessary in evaluating the objective function A0 • X and the matrix inequality M(X) ⪰ 0. Hence, we can describe the d-space sparsity pattern as a symbolic tridiagonal matrix with the nonzero symbol ⋆,

[ ⋆ ⋆ 0 · · · 0 0;
  ⋆ ⋆ ⋆ · · · 0 0;
  0 ⋆ ⋆  ⋱   ⋮ ⋮;
  ⋮  ⋱  ⋱  ⋱  ⋆ 0;
  0 0 · · · ⋆ ⋆ ⋆;
  0 0 · · · 0 ⋆ ⋆ ].

On the other hand, since only the elements M_{ij} with |i − j| ≤ 1 of the mapping M are not identically zero, the r-space sparsity pattern is described by a symbolic tridiagonal matrix of exactly the same form.
Applying the d-space conversion method using basis representation described in 2.2.3, and the r-space conversion method using clique trees presented in 2.2.5, we can reduce the SDP (2.28) to

minimize  Σ_{i=1}^{n−1} (A0_{ii} X_{ii} + 2 A0_{i,i+1} X_{i,i+1}) + A0_{nn} X_{nn}
subject to
  [ 1 0; 0 0 ] − [ X11  −X12;  −X21  −z1 ] ⪰ 0,
  [ 1 0; 0 0 ] − [ X_{ii}  −X_{i,i+1};  −X_{i+1,i}  z_{i−1} − z_i ] ⪰ 0  (i = 2, 3, . . . , n − 2),      (2.29)
  [ 1 0; 0 1 ] − [ X_{n−1,n−1}  −X_{n−1,n};  −X_{n,n−1}  X_{n,n} + z_{n−2} ] ⪰ 0,
  [ 0 0; 0 0 ] − [ −X_{ii}  −X_{i,i+1};  −X_{i+1,i}  −X_{i+1,i+1} ] ⪰ 0  (i = 1, 2, . . . , n − 1).

This problem has 3n − 3 real variables X_{ii} (i = 1, 2, . . . , n), X_{i,i+1} (i = 1, 2, . . . , n − 1) and z_i (i = 1, 2, . . . , n − 2), and 2n − 2 linear matrix inequalities of size 2 × 2. Since the original SDP (2.28) involves an n × n matrix variable X and an n × n matrix inequality M(X) ⪰ 0, we can expect to solve the SDP (2.29) much more efficiently than the SDP (2.28) as n becomes larger.
We can formulate both SDPs in terms of a dual standard form for SeDuMi [95]:

maximize  bᵀy  subject to  c − Aᵀy ⪰ 0,

where b ∈ R^l, A ∈ R^{l×m} and c ∈ R^m for some positive integers l and m. Table 2.1 shows numerical results for the SDPs (2.28) and (2.29) solved by SeDuMi. We observe that the SDP (2.29) greatly reduces the size of the coefficient matrix A, the number of nonzeros in A and the maximum SDP block size compared to the original SDP (2.28). In addition, it should be emphasized that the l × l Schur complement matrix is sparse in the SDP (2.29), while it is fully dense in the original SDP (2.28). As shown in Figure 2.1, the Schur complement matrix in the SDP (2.29) allows a very sparse Cholesky factorization. The sparsity of the Schur complement matrix is characterized by the correlative sparsity from 2.1.4. Notice a hidden correlative sparsity in the SDP (2.28): each element X_{ij} of the matrix variable X appears at most once in the elements of M(X). This leads to the correlative sparsity when the SDP (2.28) is decomposed into the SDP (2.29). The sparsity of the Schur complement matrix and the reduction in the size of the matrix variables from 10000 to 2 are the main reasons that SeDuMi can solve the largest SDP in Table 2.1, with a 29997 × 79992 coefficient matrix A, in less than 100 seconds.
SeDuMi CPU time in seconds (sizeA, nnzA, maxBl, nnzSchur)

n      | the SDP (2.28)                              | the SDP (2.29)
10     | 0.2 (55×200, 128, 10, 3025)                 | 0.1 (27×72, 80, 2, 161)
100    | 1091.4 (5050×20000, 10298, 100, 25502500)   | 0.6 (297×792, 890, 2, 1871)
1000   | OOM                                         | 6.3 (2997×7992, 8990, 2, 18971)
10000  | OOM                                         | 99.2 (29997×79992, 89990, 2, 189971)

Table 2.1: Numerical results on the SDPs (2.28) and (2.29). Here sizeA denotes the size of the coefficient matrix A, nnzA the number of nonzero elements in A, maxBl the maximum SDP block size, and nnzSchur the number of nonzeros in the Schur complement matrix. OOM means an out-of-memory error.
Figure 2.1: The sparsity pattern of the Cholesky factor of the Schur complement matrix for the SDP (2.29) with n = 10 (nz = 94) and n = 100 (nz = 1084).
2.2.2 Positive semidefinite matrix completion
A problem of positive semidefinite matrix completion is: given an n × n partial symmetric matrix X with entries specified in a proper subset F of N × N, where N = {1, . . . , n}, find an X̄ ∈ S^n_+ satisfying X̄_{ij} = X_{ij} ((i, j) ∈ F) if it exists. If X̄ is a solution of this problem, we say that X is completed to the positive semidefinite symmetric matrix X̄. For example, the following 3 × 3 partial symmetric matrix

X = [ 3 3 ?;
      3 3 2;
      ? 2 2 ]

is completed to the 3 × 3 positive semidefinite symmetric matrix

X̄ = [ 3 3 2;
      3 3 2;
      2 2 2 ].
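The completion above is easily verified numerically; a quick illustrative check with numpy:

import numpy as np

X_completed = np.array([[3.0, 3.0, 2.0],
                        [3.0, 3.0, 2.0],
                        [2.0, 2.0, 2.0]])
print(np.linalg.eigvalsh(X_completed))  # all eigenvalues are nonnegative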
For a class of problems of positive semidefinite matrix completion, we discuss the existence of a solution
and its characterization in this section. This provides a theoretical basis for both d- and r-space conversion
methods.
Let us use a graph G(N, E) with the node set N and an edge set E ⊆ N × N to describe a class of
n × n partial symmetric matrices. We assume that (i, i) 6∈ E, i.e., the graph G(N, E) has no loop. We also
assume that if (i, j) ∈ E, then (j, i) ∈ E, and (i, j) and (j, i) are interchangeably identified. Define
E• = E ∪ {(i, i) : i ∈ N},
S^n(E, ?) = the set of n × n partial symmetric matrices with entries specified in E•,
S^n_+(E, ?) = {X ∈ S^n(E, ?) : ∃ X̄ ∈ S^n_+ ; X̄_{ij} = X_{ij} if (i, j) ∈ E•}
(the set of n × n partial symmetric matrices with entries specified in E• that can be completed to positive semidefinite symmetric matrices).
For the graph G(N, E) shown in Figure 2.2 as an illustrative example, we have

S^6(E, ?) = { [ X11  ?    ?    ?    ?    X16;
                ?    X22  ?    ?    ?    X26;
                ?    ?    X33  X34  ?    X36;
                ?    ?    X43  X44  X45  ?;
                ?    ?    ?    X54  X55  X56;
                X61  X62  X63  ?    X65  X66 ] : X_{ij} ∈ R ((i, j) ∈ E•) }.      (2.30)

Let

#C = the number of elements in C, for every C ⊆ N,
S^C = {X ∈ S^n : X_{ij} = 0 if (i, j) ∉ C × C} for every C ⊆ N,
S^C_+ = {X ∈ S^C : X ⪰ 0} for every C ⊆ N,
X(C) = the matrix X̃ ∈ S^C such that X̃_{ij} = X_{ij} ((i, j) ∈ C × C), for every X ∈ S^n and every C ⊆ N,
J(C) = {(i, j) ∈ C × C : 1 ≤ i ≤ j ≤ n} for every C ⊆ N.
Note that X ∈ S^C is an n × n matrix although X_{ij} = 0 for every (i, j) ∉ C × C. Thus, X ∈ S^C and X′ ∈ S^{C′} can be added even when C and C′ are distinct subsets of N. When all matrices involved in an equality or a matrix inequality belong to S^C, matrices in S^C are frequently identified with the #C × #C matrix whose elements are indexed by (i, j) ∈ C × C. If N = {1, 2, 3} and C = {1, 3}, then a matrix variable X ∈ S^C ⊂ S^n has the full and compact representations

X = [ X11  0  X13;  0  0  0;  X31  0  X33 ]   and   X = [ X11  X13;  X31  X33 ].

It should be noted that X ∈ S^C ⊂ S^n keeps the elements X_{ij} with (i, j) ∈ C × C in the 2 × 2 compact representation on the right.
Let

E_{ij} = the n × n symmetric matrix with 1 in the (i, j)th and (j, i)th elements and 0 elsewhere

for every (i, j) ∈ N × N. Then the E_{ij} (1 ≤ i ≤ j ≤ n) form a basis of S^n. Obviously, if i, j ∈ C ⊆ N, then E_{ij} ∈ S^C. We also observe the identity

X(C) = Σ_{(i,j)∈J(C)} E_{ij} X_{ij}  for every C ⊆ N.      (2.31)
This identity is utilized in 2.2.3.
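A small numpy sketch of the basis matrices E_{ij} and of the identity (2.31), with 0-based indices for illustration:

import numpy as np

def E(n, i, j):
    # the n x n symmetric matrix with 1 in the (i, j)th and (j, i)th entries
    M = np.zeros((n, n))
    M[i, j] = M[j, i] = 1.0
    return M

n, C = 3, [0, 2]                      # N = {1, 2, 3} and C = {1, 3}, 0-based
X = np.array([[4.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 2.0]])       # a matrix in S^C in its full representation
J_C = [(i, j) for i in C for j in C if i <= j]
X_C = sum(E(n, i, j) * X[i, j] for i, j in J_C)
assert np.allclose(X_C, X)            # the identity (2.31)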
With these notations we can now state the result from matrix completion which forms the basis for our d-space and r-space conversion techniques. Let G(N, E) be a graph and Ck (k = 1, . . . , p) be its maximal cliques. We assume that X ∈ S^n(E, ?). The condition X(Ck) ∈ S^{Ck}_+ (k = 1, 2, . . . , p) is necessary for X ∈ S^n_+(E, ?). For the graph G(N, E) shown in Figure 2.2, the maximal cliques are C1 = {1, 6}, C2 = {2, 6}, C3 = {3, 4}, C4 = {3, 6}, C5 = {4, 5} and C6 = {5, 6}. Hence, the necessary condition for X ∈ S^6(E, ?) to be completed to a positive semidefinite matrix is that its 6 principal submatrices X(Ck) (k = 1, 2, . . . , 6) are positive semidefinite. Although this condition is not sufficient in general, it is a sufficient condition for X ∈ S^n_+(E, ?) when G(N, E) is chordal. As stated in 2.1.4, in this case the number of maximal cliques is bounded by the number of nodes of G(N, E), i.e., p ≤ n. In general we have the following result.
Lemma 2.2 Let Ck (k = 1, 2, . . . , p) be the maximal cliques of a chordal graph G(N, E). Suppose that X ∈ S^n(E, ?). Then X ∈ S^n_+(E, ?) if and only if X(Ck) ∈ S^{Ck}_+ (k = 1, 2, . . . , p).

Proof: Cf. [28].
Since the graph G(N, E) in Figure 2.2 is not a chordal graph, we cannot apply Lemma 2.2 to determine whether X ∈ S^6(E, ?) of the form (2.30) belongs to S^6_+(E, ?). In such a case, we need to introduce a chordal extension of the graph G(N, E) to use the lemma effectively. Figure 2.3 shows two chordal extensions. If we choose the left graph as a chordal extension Ḡ(N, Ē) of G(N, E), the maximal cliques are C1 = {3, 4, 6}, C2 = {4, 5, 6}, C3 = {1, 6} and C4 = {2, 6}; consequently, X ∈ S^6_+(Ē, ?) is characterized by X(Ck) ∈ S^{Ck}_+ (k = 1, 2, 3, 4).
Figure 2.2: A graph G(N, E) with N = {1, 2, 3, 4, 5, 6}.
Figure 2.3: Chordal extensions of the graph G(N, E) given in Figure 2.2. (a) The maximal cliques are C1 = {3, 4, 6}, C2 = {4, 5, 6}, C3 = {1, 6} and C4 = {2, 6}. (b) The maximal cliques are C1 = {3, 4, 5}, C2 = {3, 5, 6}, C3 = {1, 6} and C4 = {2, 6}.
Remark 2.2 To compute the positive definite matrix completion of a matrix, we can recursively apply
Lemma 2.6 of [21]. A numerical example is shown on page 657 of [21].
2.2.3 Exploiting the domain-space sparsity
In this section, we consider a general nonlinear optimization problem involving a matrix variable X ∈ Sn :
minimize f0 (x, X) subject to f (x, X) ∈ Ω and X ∈ Sn+ ,
(2.32)
34
CHAPTER 2. SEMIDEFINITE PROGRAMMING AND POLYNOMIAL OPTIMIZATION
where f0 : R^s × S^n → R, f : R^s × S^n → R^m and Ω ⊂ R^m. Let E denote the set of distinct row and column index pairs (i, j) such that the value of X_{ij} is necessary to evaluate f0(x, X) and/or f(x, X); more precisely, (i, j) ∈ E if and only if there exist x ∈ R^s and X^1, X^2 ∈ S^n agreeing in all entries except the (i, j)th (and (j, i)th) such that f0(x, X^1) ≠ f0(x, X^2) and/or f(x, X^1) ≠ f(x, X^2). Consider the graph G(N, E). We call E the d-space sparsity pattern and G(N, E) the d-space sparsity pattern graph. If Ḡ(N, Ē) is an extension of G(N, E), then we may replace the condition X ∈ S^n_+ by X ∈ S^n_+(Ē, ?). To apply Lemma 2.2, we choose a chordal extension Ḡ(N, Ē) of G(N, E). Let C1, C2, . . . , Cp be its maximal cliques. Then we may regard f0 and f as functions in x ∈ R^s and X(Ck) (k = 1, 2, . . . , p), i.e., there are functions f̃0 and f̃ in the variables x and X(Ck) (k = 1, 2, . . . , p) such that

f0(x, X) = f̃0(x, X(C1), X(C2), . . . , X(Cp)) for every (x, X) ∈ R^s × S^n,      (2.33)
f(x, X) = f̃(x, X(C1), X(C2), . . . , X(Cp)) for every (x, X) ∈ R^s × S^n.
Therefore, the problem (2.32) is equivalent to

minimize  f̃0(x, X(C1), X(C2), . . . , X(Cp))
subject to  f̃(x, X(C1), X(C2), . . . , X(Cp)) ∈ Ω and X(Ck) ∈ S^{Ck}_+ (k = 1, 2, . . . , p).      (2.34)
As an illustrative example, we consider the problem whose d-space sparsity pattern graph G(N, E) is shown in Figure 2.2:

minimize  − Σ_{(i,j)∈E, i<j} X_{ij}
subject to  Σ_{i=1}^{6} (X_{ii} − α_i)² ≤ 6,  X ∈ S^6_+,      (2.35)

where α_i > 0 (i = 1, 2, . . . , 6). As a chordal extension, we choose the graph Ḡ(N, Ē) in (a) of Figure 2.3.
Then, the problem (2.34) becomes

minimize  Σ_{k=1}^{4} f̃_{0k}(X(Ck))
subject to  Σ_{k=1}^{4} f̃_k(X(Ck)) ≤ 6,  X(Ck) ∈ S^{Ck}_+ (k = 1, 2, 3, 4),      (2.36)
where

f̃01(X(C1)) = −X34 − X36,  f̃02(X(C2)) = −X45 − X56,
f̃03(X(C3)) = −X16,  f̃04(X(C4)) = −X26,
f̃1(X(C1)) = (X33 − α3)² + (X44 − α4)² + (X66 − α6)²,      (2.37)
f̃2(X(C2)) = (X55 − α5)²,  f̃3(X(C3)) = (X11 − α1)²,
f̃4(X(C4)) = (X22 − α2)².
The positive semidefinite condition X(Ck) ∈ S^{Ck}_+ (k = 1, 2, . . . , p) in the problem (2.34) is not an ordinary positive semidefinite condition in the sense that overlapping variables X_{ij} ((i, j) ∈ Ck ∩ Cl) exist in two distinct positive semidefinite constraints X(Ck) ∈ S^{Ck}_+ and X(Cl) ∈ S^{Cl}_+ if Ck ∩ Cl ≠ ∅. We describe two methods to transform the condition into an ordinary positive semidefinite condition. The first one was given in the papers [21, 68] where a d-space conversion method was proposed, and the second one was originally used for the sparse SDP relaxation of polynomial optimization problems [102, 103] and also in the paper [43] where a d-space conversion method was applied to an SDP relaxation of a sensor network localization problem. We call the first one the d-space conversion method using clique trees and the second one the d-space conversion method using basis representation.
The d-space conversion method using clique trees
We can replace X(Ck) (k = 1, 2, . . . , p) by p independent matrix variables X^k (k = 1, 2, . . . , p) if we add all equality constraints X^k_{ij} = X^l_{ij} for every (i, j) ∈ Ck ∩ Cl with i ≤ j and every pair of Ck and Cl such that Ck ∩ Cl ≠ ∅. For the chordal graph Ḡ(N, Ē) given in (a) of Figure 2.3, those equalities turn out to be the 8 equalities

X^k_66 − X^l_66 = 0 (1 ≤ k < l ≤ 4),  X^1_44 = X^2_44,  X^1_46 = X^2_46.
These equalities are linearly dependent, and we can choose a maximal number of linearly independent equalities that are equivalent to the original equalities. For example, either the set of 5 equalities

X^1_44 − X^2_44 = 0, X^1_46 − X^2_46 = 0, X^1_66 − X^2_66 = 0, X^1_66 − X^3_66 = 0, X^1_66 − X^4_66 = 0,      (2.38)

or the set of 5 equalities

X^1_44 − X^2_44 = 0, X^1_46 − X^2_46 = 0, X^1_66 − X^2_66 = 0, X^2_66 − X^3_66 = 0, X^3_66 − X^4_66 = 0      (2.39)

is equivalent to the set of 8 equalities above.
In general, we use a clique tree T(K, E) with K = {C1, C2, . . . , Cp} and E ⊆ K × K to consistently choose a maximal set of linearly independent equalities. Here T(K, E) is called a clique tree if it satisfies the clique-intersection property, that is, for each pair of nodes Ck ∈ K and Cl ∈ K, the set Ck ∩ Cl is contained in every node on the (unique) path connecting Ck and Cl. See [4] for basic properties of clique trees. We fix one clique for a root node of the tree T(K, E), say C1. For simplicity, we assume that the nodes C2, . . . , Cp are indexed so that if a sequence of nodes C1, C_{l2}, . . . , C_{lk} forms a path from the root node C1 to a leaf node C_{lk}, then 1 < l2 < · · · < lk, and each edge is directed from the node with the smaller index to the node with the larger index. Thus, the clique tree T(K, E) is directed from the root node C1 to its leaf nodes. Each edge (Ck, Cl) of the clique tree T(K, E) induces a set of equalities

X^k_{ij} − X^l_{ij} = 0 ((i, j) ∈ J(Ck ∩ Cl)),

or equivalently,

E_{ij} • X^k − E_{ij} • X^l = 0 ((i, j) ∈ J(Ck ∩ Cl)),

where J(C) = {(i, j) ∈ C × C : i ≤ j} for every C ⊆ N. We add equalities of the form above for all (Ck, Cl) ∈ E when we replace X(Ck) (k = 1, 2, . . . , p) by p independent matrix variables X^k (k = 1, 2, . . . , p).
We thus obtain a problem

minimize  f̃0(x, X^1, X^2, . . . , X^p)
subject to  f̃(x, X^1, X^2, . . . , X^p) ∈ Ω,
            E_{ij} • X^k − E_{ij} • X^l = 0 ((i, j, k, l) ∈ Λ),      (2.40)
            X^k ∈ S^{Ck}_+ (k = 1, 2, . . . , p),

where

Λ = {(g, h, k, l) : (g, h) ∈ J(Ck ∩ Cl), (Ck, Cl) ∈ E}.      (2.41)
This is equivalent to the problem (2.34). See Section 4 of [68] for more details.
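Enumerating the equalities indexed by Λ in (2.41) is straightforward. A Python sketch (illustrative; the cliques and clique-tree edges are assumed given, with 0-based indices):

def overlap_equalities(cliques, tree_edges):
    # one equality X^k_ij = X^l_ij per (i, j) in J(C_k intersect C_l),
    # for each clique-tree edge (C_k, C_l)
    Lam = []
    for (k, l) in tree_edges:
        common = sorted(cliques[k] & cliques[l])
        for a, i in enumerate(common):
            for j in common[a:]:
                Lam.append((i, j, k, l))
    return Lam

# the left clique tree of Figure 2.4, 0-based: nodes 3, 4, 6 become 2, 3, 5
cliques = [{2, 3, 5}, {3, 4, 5}, {0, 5}, {1, 5}]
print(overlap_equalities(cliques, [(0, 1), (0, 2), (0, 3)]))
# five equalities, matching (2.38)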
Now we illustrate the conversion process above by the simple example (2.35). Figure 2.4 shows two clique trees for the graph given in (a) of Figure 2.3. The left clique tree in Figure 2.4 leads to the 5 equalities in (2.38), while the right clique tree in Figure 2.4 induces the 5 equalities in (2.39). In both cases, the problem (2.40) has the following form:

minimize  Σ_{k=1}^{4} f̂_{0k}(X^k)
subject to  Σ_{k=1}^{4} f̂_k(X^k) ≤ 6,
            the 5 equalities in (2.38) or (2.39),
            X^k ∈ S^{Ck}_+ (k = 1, 2, 3, 4),
Figure 2.4: Two clique trees with K = {C1 = {3, 4, 6}, C2 = {4, 5, 6}, C3 = {1, 6}, C4 = {2, 6}}.
where

f̂01(X^1) = −X^1_34 − X^1_36,  f̂02(X^2) = −X^2_45 − X^2_56,
f̂03(X^3) = −X^3_16,  f̂04(X^4) = −X^4_26,
f̂1(X^1) = (X^1_33 − α3)² + (X^1_44 − α4)² + (X^1_66 − α6)²,
f̂2(X^2) = (X^2_55 − α5)²,  f̂3(X^3) = (X^3_11 − α1)²,  f̂4(X^4) = (X^4_22 − α2)².
Remark 2.3 The d-space conversion method using clique trees can be implemented in many different ways. The fact that the chordal extension Ḡ(N, Ē) of G(N, E) is not unique offers flexibility in constructing an optimization problem of the form (2.40). More precisely, the choice of a chordal extension Ḡ(N, Ē) of G(N, E) determines how “small” and “sparse” an optimization problem of the form (2.40) is, which is an important issue for solving the problem more efficiently. For the size of the problem (2.40), we need to consider the sizes of the matrix variables X^k (k = 1, 2, . . . , p) and the number of equalities in (2.40). Note that the sizes of the matrix variables X^k (k = 1, 2, . . . , p) are determined by the sizes of the maximal cliques Ck (k = 1, 2, . . . , p). This indicates that a chordal extension Ḡ(N, Ē) with smaller maximal cliques Ck (k = 1, 2, . . . , p) may be better theoretically. (In computation, however, this is not necessarily true because of the overhead of processing too many small positive semidefinite matrix variables.) The number of equalities in (2.40), or the cardinality of Λ, is also determined by the chordal extension Ḡ(N, Ē) of G(N, E); choosing a chordal extension Ḡ(N, Ē) with smaller maximal cliques increases the number of equalities. Balancing these two contradicting targets, decreasing the sizes of the matrix variables and decreasing the number of equalities, was studied in the paper [68] by combining some adjacent cliques along the clique tree T(K, E). See Section 4 of [68] for more details. In addition to the choice of a chordal extension Ḡ(N, Ē) of G(N, E), the representation of the functions and the choice of a clique tree add flexibility in the construction of the problem (2.40). That is, the representation of the functions f0 : R^s × S^n → R and f : R^s × S^n → R^m in the vector variable x and the matrix variables X(Ck) (k = 1, 2, . . . , p) as in (2.33); for example, we could move the term (X66 − α6)² from f̃1(x, X(C1)) to any of the f̃k(x, X(Ck)) (k = 2, 3, 4). These choices of the functions f0, f and of a clique tree affect the sparse structure of the resulting problem (2.40), which is also important for efficient computation.
The domain-space conversion method using basis representation
Define

J̄ = ∪_{k=1}^{p} J(Ck),
(X_{ij} : (i, j) ∈ J̄) = the vector variable consisting of X_{ij} ((i, j) ∈ J̄),
f̄0(x, (X_{ij} : (i, j) ∈ J̄)) = f0(x, X) for every (x, X) ∈ R^s × S^n,
f̄(x, (X_{ij} : (i, j) ∈ J̄)) = f(x, X) for every (x, X) ∈ R^s × S^n.
We represent each X(Ck) in terms of a linear combination of the basis E_{ij} ((i, j) ∈ J(Ck)) of the space S^{Ck} as in (2.31) with C = Ck (k = 1, 2, . . . , p). Substituting this basis representation into the problem (2.34), we obtain

minimize  f̄0(x, (X_{ij} : (i, j) ∈ J̄))
subject to  f̄(x, (X_{ij} : (i, j) ∈ J̄)) ∈ Ω,      (2.42)
            Σ_{(i,j)∈J(Ck)} E_{ij} X_{ij} ∈ S^{Ck}_+ (k = 1, 2, . . . , p).
We observe that the illustrative example (2.35) is converted into the problem

minimize  − Σ_{(i,j)∈E, i<j} X_{ij}
subject to  Σ_{i=1}^{6} (X_{ii} − α_i)² ≤ 6,      (2.43)
            Σ_{(i,j)∈J(Ck)} E_{ij} X_{ij} ∈ S^{Ck}_+ (k = 1, 2, 3, 4).
Remark 2.4 Compared to the d-space conversion method using clique trees, the d-space conversion method using basis representation described above provides limited flexibility. To make the size of the problem (2.42) smaller, we need to select a chordal extension Ḡ(N, Ē) of G(N, E) with smaller maximal cliques Ck (k = 1, 2, . . . , p). As a result, the sizes of the semidefinite constraints become smaller. As we mentioned in Remark 2.3, however, too many smaller positive semidefinite matrix variables may yield heavy overhead in computation.
2.2.4 Duality in positive semidefinite matrix completion
In order to present the r-space conversion methods in the next section, we need to derive some results which can be understood as a dual approach to the positive semidefinite matrix completion approach from 2.2.2. Throughout this section, we assume that G(N, E) denotes a chordal graph. In Lemma 2.2, we have described a necessary and sufficient condition for a partial symmetric matrix X ∈ S^n(E, ?) to be completed to a positive semidefinite symmetric matrix. Let

S^n(E, 0) = {A ∈ S^n : A_{ij} = 0 if (i, j) ∉ E•},
S^n_+(E, 0) = {A ∈ S^n(E, 0) : A ⪰ 0}.
In this section, we derive a necessary and sufficient condition for a symmetric matrix A ∈ S^n(E, 0) to be positive semidefinite, i.e., A ∈ S^n_+(E, 0). This condition is used for the range-space conversion methods in 2.2.5. We note that these two issues have a primal-dual relationship:

A ∈ S^n_+(E, 0) if and only if Σ_{(i,j)∈E•} A_{ij} X_{ij} ≥ 0 for every X ∈ S^n_+(E, ?).      (2.44)
Suppose A ∈ S^n(E, 0). Let C1, C2, . . . , Cp be the maximal cliques of G(N, E). Then, we can consistently decompose A ∈ S^n(E, 0) into Ã^k ∈ S^{Ck} (k = 1, 2, . . . , p) such that A = Σ_{k=1}^{p} Ã^k. We know that A is positive semidefinite if and only if A • X ≥ 0 for every X ∈ S^n_+; this relation and Lemma 2.2 are used in the following.

Since A ∈ S^n(E, 0), this condition can be relaxed to the condition (2.44). Therefore, A is positive semidefinite if and only if the following SDP has the optimal value 0:

minimize  Σ_{(i,j)∈E•} [Σ_{k=1}^{p} Ã^k]_{ij} X_{ij}  subject to  X ∈ S^n_+(E, ?).      (2.45)
We can rewrite the objective function as

Σ_{(i,j)∈E•} [Σ_{k=1}^{p} Ã^k]_{ij} X_{ij} = Σ_{k=1}^{p} [ Σ_{(i,j)∈E•} Ã^k_{ij} X_{ij} ] = Σ_{k=1}^{p} Ã^k • X(Ck)  for every X ∈ S^n(E, ?).

Note that the second equality follows from Ã^k ∈ S^{Ck} (k = 1, 2, . . . , p). Applying Lemma 2.2 to the constraint X ∈ S^n_+(E, ?) of the SDP (2.45), we obtain an SDP
minimize  Σ_{k=1}^{p} Ã^k • X(Ck)  subject to  X(Ck) ∈ S^{Ck}_+ (k = 1, 2, . . . , p),      (2.46)

which is equivalent to the SDP (2.45).
The SDP (2.46) involves multiple positive semidefinite matrix variables with overlapping elements. We have described two methods in Section 2.2.3 to convert such multiple matrix variables into independent ones with no overlapping elements. We apply the d-space conversion method using clique trees to the SDP (2.46). Let T(K, E) be a clique tree with K = {C1, C2, . . . , Cp} and E ⊆ K × K. Then, we obtain an SDP

minimize  Σ_{k=1}^{p} Ã^k • X^k
subject to  E_{ij} • X^k − E_{ij} • X^l = 0 ((i, j, k, l) ∈ Λ),      (2.47)
            X^k ∈ S^{Ck}_+ (k = 1, 2, . . . , p),

which is equivalent to the SDP (2.46). Here Λ is given in (2.41).
Theorem 2.15 A ∈ S^n(E, 0) is positive semidefinite if and only if the system of LMIs

Ã^k − L̃^k(z) ⪰ 0 (k = 1, 2, . . . , p)      (2.48)

has a solution z = (z_{ghkl} : (g, h, k, l) ∈ Λ). Here z denotes a vector variable consisting of the z_{ghkl} ((g, h, k, l) ∈ Λ), and

L̃^k(z) = − Σ_{(i,j,h) : (i,j,h,k)∈Λ} E_{ij} z_{ijhk} + Σ_{(i,j,l) : (i,j,k,l)∈Λ} E_{ij} z_{ijkl}
for every z = (z_{ijkl} : (i, j, k, l) ∈ Λ) (k = 1, 2, . . . , p).      (2.49)
Proof: In the previous discussion, we have shown that A ∈ S^n(E, 0) is positive semidefinite if and only if the SDP (2.47) has the optimal value 0. The dual of the SDP (2.47) is

maximize 0 subject to (2.48).      (2.50)

The primal SDP (2.47) attains the objective value 0 at the trivial feasible solution (X^1, X^2, . . . , X^p) = (0, 0, . . . , 0). If the dual SDP (2.50) is feasible, i.e., the system of LMIs (2.48) has a solution, then the primal SDP (2.47) has the optimal value 0 by the weak duality theorem. Thus we have shown the “if” part of the theorem. Now suppose that the primal SDP (2.47) has the optimal value 0. The primal SDP (2.47) has an interior feasible solution; for example, take X^k to be the #Ck × #Ck identity matrix in S^{Ck} (k = 1, 2, . . . , p). By the strong duality theorem (Theorem 4.2.1 of [69]), the optimal value of the dual SDP (2.50) is zero, which implies that (2.50) is feasible.
As a corollary, we obtain the following (Theorem 2.3 of [1]).
Theorem 2.16 A ∈ S^n(E, 0) is positive semidefinite if and only if there exist Y^k ∈ S^{Ck}_+ (k = 1, 2, . . . , p) which decompose A as A = Σ_{k=1}^{p} Y^k.

Proof: Since the “if” part is straightforward, we prove the “only if” part. Assume that A is positive semidefinite. By Theorem 2.15, the LMI (2.48) has a solution z̃. Let Y^k = Ã^k − L̃^k(z̃) (k = 1, 2, . . . , p). Then Y^k ∈ S^{Ck}_+ (k = 1, 2, . . . , p). Since Σ_{k=1}^{p} L̃^k(z̃) = 0 by construction, we obtain the desired result.
Conversely, Theorem 2.15 can be derived from Theorem 2.16. In the paper [1], Theorem 2.16 was proved
by Theorem 7 of Grone et al. [28] (Lemma 2.2 in this thesis).
We conclude this section by applying Theorem 2.15 to the case of the chordal graph Ḡ(N, Ē) given in (a) of Figure 2.3. The maximal cliques are C1 = {3, 4, 6}, C2 = {4, 5, 6}, C3 = {1, 6} and C4 = {2, 6}, so that A ∈ S^6(Ē, 0) is decomposed into the 4 matrices Ã^k ∈ S^{Ck}, which read in the compact representation

Ã^1 = [ A33 A34 A36;  A43 A44 A46;  A63 A64 A66 ] ∈ S^{3,4,6},
Ã^2 = [ 0 A45 0;  A54 A55 A56;  0 A65 0 ] ∈ S^{4,5,6},      (2.51)
Ã^3 = [ A11 A16;  A61 0 ] ∈ S^{1,6},
Ã^4 = [ A22 A26;  A62 0 ] ∈ S^{2,6}.

(Each Ã^k is viewed as a matrix in S^6 with zeros outside Ck × Ck.) We note that this decomposition is not unique. For example, we can move the (6, 6) element A66 from Ã^1 to any other Ã^k. We showed two clique trees with K = {C1, C2, C3, C4} in Figure 2.4. For the left clique tree, we have Λ = {(4, 4, 1, 2), (4, 6, 1, 2), (6, 6, 1, 2), (6, 6, 1, 3), (6, 6, 1, 4)}.
Thus, the system of LMIs (2.48) becomes

[ A33  A34           A36;
  A43  A44 − z4412   A46 − z4612;
  A63  A64 − z4612   A66 − z6612 − z6613 − z6614 ] ⪰ 0,

[ z4412  A45  z4612;  A54  A55  A56;  z4612  A65  z6612 ] ⪰ 0,      (2.52)

[ A11  A16;  A61  z6613 ] ⪰ 0,    [ A22  A26;  A62  z6614 ] ⪰ 0.
For the right clique tree, we have Λ = {(4, 4, 1, 2), (4, 6, 1, 2), (6, 6, 1, 2), (6, 6, 2, 3), (6, 6, 3, 4)} and

[ A33  A34           A36;
  A43  A44 − z4412   A46 − z4612;
  A63  A64 − z4612   A66 − z6612 ] ⪰ 0,

[ z4412  A45  z4612;  A54  A55  A56;  z4612  A65  z6612 − z6623 ] ⪰ 0,      (2.53)

[ A11  A16;  A61  z6623 − z6634 ] ⪰ 0,    [ A22  A26;  A62  z6634 ] ⪰ 0.
2.2.5 Exploiting the range-space sparsity
In this section, we present two range-space conversion methods, the r-space conversion method using clique
trees based on Theorem 2.15 and the r-space conversion method using matrix decomposition based on Theorem 2.16.
The range-space conversion method using clique trees
Let

F = {(i, j) ∈ N × N : M_{ij}(y) ≠ 0 for some y ∈ R^s, i ≠ j}.

We call F the r-space sparsity pattern and G(N, F) the r-space sparsity pattern graph of the mapping M : R^s → S^n. Apparently, M(y) ∈ S^n(F, 0) for every y ∈ R^s, but the graph G(N, F) may not be chordal. Let G(N, E) be a chordal extension of G(N, F). Then

M(y) ∈ S^n(E, 0) for every y ∈ R^s.      (2.54)
Let C1, C2, . . . , Cp be the maximal cliques of G(N, E). To apply Theorem 2.15, we choose mappings M̃^k (k = 1, 2, . . . , p) to decompose the mapping M : R^s → S^n such that

M(y) = Σ_{k=1}^{p} M̃^k(y) for every y ∈ R^s,  M̃^k : R^s → S^{Ck} (k = 1, 2, . . . , p).      (2.55)
Let T(K, E) be a clique tree where K = {C1, C2, . . . , Cp} and E ⊂ K × K. By Theorem 2.15, y is a solution of (2.26) if and only if it is a solution of

M̃^k(y) − L̃^k(z) ⪰ 0 (k = 1, 2, . . . , p)      (2.56)

for some z = (z_{ghkl} : (g, h, k, l) ∈ Λ), where Λ is given in (2.41) and L̃^k in (2.49).
We may regard the r-space conversion method using clique trees described above as a dual of the d-space conversion method using clique trees applied to the SDP

minimize  M(y) • X  subject to  X ⪰ 0,      (2.57)

where X ∈ S^n denotes a variable matrix and y ∈ R^s a fixed vector. We know that M(y) ⪰ 0 if and only if the optimal value of the SDP (2.57) is zero, so that (2.57) serves as a dual of the matrix inequality M(y) ⪰ 0. Each element z_{ijkl} of the vector variable z corresponds to a dual variable of the equality constraint E_{ij} • X^k − E_{ij} • X^l = 0 in the problem (2.40), while each matrix variable X^k ∈ S^{Ck} in the problem (2.40) corresponds to a dual matrix variable of the kth matrix inequality M̃^k(y) − L̃^k(z) ⪰ 0.
Remark 2.5 Regarding the flexibility in implementing the r-space conversion method using clique trees, the comments in Remark 2.3 remain valid if we replace the sizes of the matrix variables X^k by the sizes of the mappings M̃^k : R^s → S^{Ck}, and the number of equalities by the number of elements z_{ijkl} of the vector variable z. The correlative sparsity of (2.56) depends on the choice of the clique tree and the decomposition (2.55).
As an example, we consider the case where M is tridiagonal, i.e., the (i, j)th element M_{ij} of M is zero if |i − j| ≥ 2, to illustrate the range-space conversion of the matrix inequality (2.26) into the system of matrix inequalities (2.56). By letting E = {(i, j) : |i − j| = 1}, we have a simple chordal graph G(N, E) with no cycle satisfying (2.54), its maximal cliques Ck = {k, k + 1} (k = 1, 2, . . . , n − 1), and a clique tree T(K, E) with

K = {C1, C2, . . . , C_{n−1}} and E = {(Ck, C_{k+1}) ∈ K × K : k = 1, 2, . . . , n − 2}.

For every y ∈ R^s, let

M̃^k(y) = [ M_{kk}(y)  M_{k,k+1}(y);  M_{k+1,k}(y)  0 ] ∈ S^{Ck}  if 1 ≤ k ≤ n − 2,
M̃^{n−1}(y) = [ M_{n−1,n−1}(y)  M_{n−1,n}(y);  M_{n,n−1}(y)  M_{nn}(y) ] ∈ S^{C_{n−1}}.

Then, we can decompose M : R^s → S^n(E, 0) into M̃^k : R^s → S^{Ck} (k = 1, 2, . . . , n − 1) as in (2.55) with p = n − 1. We also see that
Λ = {(k + 1, k + 1, k, k + 1) : k = 1, 2, . . . , n − 2},

L̃^k(z) = { E_{2,2} z_{2,2,1,2} ∈ S^{C1}  if k = 1,
            −E_{k,k} z_{k,k,k−1,k} + E_{k+1,k+1} z_{k+1,k+1,k,k+1} ∈ S^{Ck}  if k = 2, 3, . . . , n − 2,
            −E_{n−1,n−1} z_{n−1,n−1,n−2,n−1} ∈ S^{C_{n−1}}  if k = n − 1. }
Thus the resulting system of matrix inequalities (2.56) is

[ M11(y)  M12(y);  M21(y)  −z_{2,2,1,2} ] ⪰ 0,
[ M_{kk}(y) + z_{k,k,k−1,k}  M_{k,k+1}(y);  M_{k+1,k}(y)  −z_{k+1,k+1,k,k+1} ] ⪰ 0  (k = 2, 3, . . . , n − 2),
[ M_{n−1,n−1}(y) + z_{n−1,n−1,n−2,n−1}  M_{n−1,n}(y);  M_{n,n−1}(y)  M_{nn}(y) ] ⪰ 0.
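For the tridiagonal case this conversion can be written down directly. A numpy sketch (assuming M(y) has been evaluated as a dense symmetric tridiagonal array for a fixed y): by Theorem 2.15, M(y) ⪰ 0 holds if and only if some choice of the n − 2 auxiliary scalars z makes all of the following 2 × 2 blocks positive semidefinite.

import numpy as np

def tridiagonal_blocks(M, z):
    # M: n x n symmetric tridiagonal array; z: vector of length n - 2
    n = M.shape[0]
    blocks = []
    for k in range(n - 1):
        top = M[k, k] + (z[k - 1] if k >= 1 else 0.0)
        # only the last block keeps the true corner entry M_nn
        bottom = -z[k] if k < n - 2 else M[n - 1, n - 1]
        blocks.append(np.array([[top, M[k, k + 1]],
                                [M[k + 1, k], bottom]]))
    return blocks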
The range-space conversion method using matrix decomposition
By Theorem 2.16, we obtain that y ∈ R^s is a solution of the matrix inequality (2.26) if and only if there exist Y^k ∈ S^{Ck} (k = 1, 2, . . . , p) such that

Σ_{k=1}^{p} Y^k = M(y) and Y^k ∈ S^{Ck}_+ (k = 1, 2, . . . , p).

Let J = ∪_{k=1}^{p} J(Ck) and Γ(i, j) = {k : i ∈ Ck, j ∈ Ck} ((i, j) ∈ J). Then we can rewrite the condition above as

Σ_{k∈Γ(i,j)} E_{ij} • Y^k − E_{ij} • M(y) = 0 ((i, j) ∈ J) and Y^k ∈ S^{Ck}_+ (k = 1, 2, . . . , p).      (2.58)
We may regard the r-space conversion method using matrix decomposition as a dual of the d-space conversion method using basis representation applied to the SDP (2.57) with a fixed y ∈ R^s. Each variable X_{ij} ((i, j) ∈ J) in the problem (2.42) corresponds to a dual real variable of the (i, j)th equality constraint of the problem (2.58), while each matrix variable Y^k in the problem (2.58) corresponds to a dual matrix variable of the constraint Σ_{(i,j)∈J(Ck)} E_{ij} X_{ij} ∈ S^{Ck}_+.
Remark 2.6 Regarding the flexibility in implementing the r-space conversion method using matrix decomposition, the comments in Remark 2.4 remain valid if we replace the sizes of the semidefinite constraints by the sizes of the matrix variables Y^k (k = 1, 2, . . . , p).
We illustrate the r-space conversion method using matrix decomposition with the same tridiagonal M as in the example above. In this case, we see that

p = n − 1,
Ck = {k, k + 1} (k = 1, 2, . . . , n − 1),
J(Ck) = {(k, k), (k, k + 1), (k + 1, k + 1)} (k = 1, 2, . . . , n − 1),
J = {(k, k) : k = 1, 2, . . . , n} ∪ {(k, k + 1) : k = 1, 2, . . . , n − 1},
Γ(i, j) = { {1}  if i = j = 1,
            {k}  if i = k, j = k + 1 and 1 ≤ k ≤ n − 1,
            {k − 1, k}  if i = j = k and 2 ≤ k ≤ n − 1,
            {n − 1}  if i = j = n. }
Hence, the matrix inequality (2.26) with the tridiagonal M : R^s → S^n is converted into

E11 • Y^1 − E11 • M(y) = 0,
E_{k,k+1} • Y^k − E_{k,k+1} • M(y) = 0 (k = 1, 2, . . . , n − 1),
E_{kk} • Y^{k−1} + E_{kk} • Y^k − E_{kk} • M(y) = 0 (k = 2, . . . , n − 1),
E_{nn} • Y^{n−1} − E_{nn} • M(y) = 0,

Y^k = [ Y^k_{kk}  Y^k_{k,k+1};  Y^k_{k+1,k}  Y^k_{k+1,k+1} ] ∈ S^{Ck}_+ (k = 1, 2, . . . , n − 1).
2.2.6 Enhancing the correlative sparsity
When we are concerned with the SDP relaxation of polynomial SDPs (including ordinary polynomial optimization problems) and linear SDPs, another type of sparsity, called the correlative sparsity, plays an important role in solving the SDPs efficiently. The correlative sparsity was dealt with extensively in the paper [45] and was introduced in 2.1.4. It is known that the sparse SDP relaxation [51, 102] for a correlatively sparse polynomial optimization problem leads to an SDP that can maintain the sparsity for primal-dual interior-point methods; see Section 6 of [45]. In this section, we focus on how the d-space and r-space conversion methods enhance the correlative sparsity. We consider a polynomial SDP of the form

maximize  f0(y)  subject to  Fk(y) ∈ S^{mk}_+ (k = 1, . . . , p).      (2.59)
Here f0 ∈ R[y], and each Fk is a mapping from R^n into S^{mk} whose components are all polynomial in y ∈ R^n. For simplicity, we assume that f0 is a linear function of the form f0(y) = bᵀy for some b ∈ R^n. In this case, with the definition from 2.1.4, the correlative sparsity pattern graph is given by the graph G(N, E) with the node set N = {1, 2, . . . , n} and the edge set

E = {(i, j) ∈ N × N : i ≠ j, both values y_i and y_j are necessary to evaluate the value of Fk(y) for some k}.
When a chordal extension Ḡ(N, Ē) of the correlative sparsity pattern graph G(N, E) is sparse, or all the maximal cliques of Ḡ(N, Ē) are small-sized, we can effectively apply the sparse SDP relaxation [51, 102] to the polynomial SDP (2.59). As a result, we obtain a linear SDP satisfying a correlative sparsity characterized by the same chordal graph structure as Ḡ(N, Ē); more details can be found in Section 6 of [45]. Even when the correlative sparsity pattern graph G(N, E) or its chordal extension Ḡ(N, Ē) is not sparse, the polynomial SDP may have “a hidden correlative sparsity” that can be recognized by applying the d-space and/or r-space conversion methods to the problem to decompose a large matrix variable (and/or inequality) into multiple smaller matrix variables (and/or inequalities). To illustrate this, let us consider a polynomial SDP of the form

minimize  bᵀy  subject to  F(y) ∈ S^n_+,
where F denotes a mapping from R^n into S^n defined by

F(y) = [ 1 − y1^4    y1 y2       0           · · ·          0;
         y1 y2       1 − y2^4    y2 y3                      ⋮;
         0           y2 y3       ⋱            ⋱             0;
         ⋮                       ⋱     1 − y_{n−1}^4    y_{n−1} y_n;
         0           · · ·       0     y_{n−1} y_n      1 − y_n^4 ].
This polynomial SDP is not correlatively sparse at all (i.e., G(N, E) becomes a complete graph) because
all variables y1 , y2 , . . . , yn are involved in the single matrix inequality F (y) ∈ Sn+ . Hence, the sparse SDP
relaxation (2.18) is not effective for this problem. Applying the r-space conversion method using clique trees
to the polynomial SDP under consideration, we have a polynomial SDP

minimize  bᵀy
subject to  [ 1 − y1^4  y1 y2;  y1 y2  z1 ] ⪰ 0,
            [ 1 − y_i^4  y_i y_{i+1};  y_i y_{i+1}  z_i − z_{i−1} ] ⪰ 0  (i = 2, 3, . . . , n − 2),      (2.60)
            [ 1 − y_{n−1}^4  y_{n−1} y_n;  y_{n−1} y_n  1 − y_n^4 − z_{n−2} ] ⪰ 0,
which is equivalent to the original polynomial SDP. The resulting polynomial SDP now satisfies the correlative sparsity as shown in Figure 2.5. Thus the sparse SDP relaxation (2.18) is efficient for solving
(2.60).
The correlative sparsity is important in linear SDPs, too. We have seen such a case in Section 2.2.1. We
can rewrite the SDP (2.28) as

maximize  − Σ_{i=1}^{n−1} (A0_{ii} X_{ii} + 2 A0_{i,i+1} X_{i,i+1}) − A0_{nn} X_{nn}
subject to  I − Σ_{i=1}^{n} E_{ii} X_{ii} + Σ_{i=1}^{n−1} E_{i,i+1} X_{i,i+1} ⪰ 0,      (2.61)
            Σ_{1≤i≤j≤n} E_{ij} X_{ij} ⪰ 0,
where I denotes the n × n identity matrix. Since the coefficient matrices of all real variables X_{ij} (1 ≤ i ≤ j ≤ n) are nonzero in the last constraint, the correlative sparsity pattern graph G(N, E) forms a complete graph. Applying the d-space conversion method using basis representation and the r-space conversion method using clique trees to the original SDP (2.28), we have reduced it to the SDP (2.29) in 2.2.1. We rewrite the constraints of the SDP (2.29) in an ordinary LMI form:

maximize  bᵀy  subject to  A^k_0 − Σ_{h=1}^{s} A^k_h y_h ⪰ 0  (k = 1, 2, . . . , p).      (2.62)
Figure 2.5: The correlative sparsity pattern of the polynomial SDP (2.60) with n = 20 (nz = 218), and its Cholesky factor with a symmetric minimum degree ordering of its rows and columns (nz = 128).
Here p = 2n − 2, s = 3n − 3, each A^k_h is a 2 × 2 matrix (k = 1, 2, . . . , p, h = 0, 1, . . . , 3n − 3), b ∈ R^{3n−3}, y ∈ R^{3n−3}, and each element y_h of y corresponds to some X_{ij} or some z_i. Comparing the SDP (2.61) with the SDP (2.62), we notice that the number of variables is reduced from n(n + 1)/2 to 3n − 3, and the maximum size of the matrix inequalities is reduced from n to 2. Furthermore, the correlative sparsity pattern graph becomes sparse; see Figure 2.6.
Figure 2.6: The correlative sparsity pattern of the SDP (2.62) induced from (2.29) with n = 10 (nz = 161) and n = 100 (nz = 1871), and its Cholesky factor with a symmetric minimum degree ordering of its rows and columns.
Now we consider an SDP of the form (2.62) in general. The edge set E of the correlative sparsity pattern graph G(N, E) is written as

E = {(g, h) ∈ N × N : g ≠ h, A^k_g ≠ 0 and A^k_h ≠ 0 for some k},

where N = {1, 2, . . . , s}. It is known that the graph G(N, E) characterizes the sparsity pattern of the Schur
complement matrix of the SDP (2.62). More precisely, if R denotes the s × s sparsity pattern of the Schur
complement matrix, then Rgh = 0 if (g, h) 6∈ E • . Furthermore, if the graph G(N, E) is chordal, then there
exists a perfect elimination ordering, a simultaneous row and column ordering of the Schur complement
matrix that allows a Cholesky factorization with no fill-in. For the SDP induced from (2.29), we have
seen the correlative sparsity pattern with a symmetric minimum degree ordering of its rows and columns
in Figure 2.6, which coincides with the sparsity pattern of the Schur complement matrix whose symbolic
Cholesky factorization is shown in Figure 2.1.
Remark 2.7 As mentioned in Remark 2.5, the application of the r-space conversion method using clique trees to reduce the SDP (2.28) to the SDP (2.29) can be implemented in many different ways. In practice, it should be implemented so as to obtain a better correlative sparsity in the resulting problem. For example, we can reduce the SDP (2.28) to

minimize  Σ_{i=1}^{n−1} (A0_{ii} X_{ii} + 2 A0_{i,i+1} X_{i,i+1}) + A0_{nn} X_{nn}
subject to
  [ 1 0; 0 0 ] − [ X11  −X12;  −X21  −z1 ] ⪰ 0,
  [ 1 0; 0 0 ] − [ X_{ii}  −X_{i,i+1};  −X_{i+1,i}  −z_i ] ⪰ 0  (i = 2, 3, . . . , n − 2),      (2.63)
  [ 1 0; 0 1 ] − [ X_{n−1,n−1}  −X_{n−1,n};  −X_{n,n−1}  X_{n,n} + Σ_{i=1}^{n−2} z_i ] ⪰ 0,
  [ 0 0; 0 0 ] − [ −X_{ii}  −X_{i,i+1};  −X_{i+1,i}  −X_{i+1,i+1} ] ⪰ 0  (i = 1, 2, . . . , n − 1),
which is different from the SDP (2.29). This is obtained by choosing a different clique tree in the r-space conversion method using clique trees for the SDP (2.28). In this case, all auxiliary variables z_i (i = 1, 2, . . . , n − 2) are contained in a single matrix inequality. This implies that the corresponding correlative sparsity pattern graph G(N, E) involves a clique of size n − 2; see Figure 2.7. Thus the correlative sparsity becomes worse than with the previous conversion. Among the various ways of implementing the d- and r-space conversion methods, determining which one is effective for a better correlative sparsity is a subject which requires further study.
Figure 2.7: The correlative sparsity pattern of the SDP (2.62) induced from (2.63) with n = 10 (nz = 217) and n = 100 (nz = 11377), where the rows and columns are simultaneously reordered by the Matlab function symamd (a symmetric minimum degree ordering).
2.2.7 Examples of d- and r-space sparsity in quadratic SDP
We present how to take advantage of the d- and r-space conversion methods introduced in the previous sections for the class of quadratic SDPs, and demonstrate the effectiveness of these methods on examples of quadratic SDPs. In the following we first consider a quadratic SDP of the form

minimize  Σ_{i=1}^{s} c_i x_i  subject to  M(x) ⪰ 0,      (2.64)

where c_i ∈ [0, 1] (i = 1, 2, . . . , s), M : R^s → S^n, and each non-zero element M_{ij} of the mapping M : R^s → S^n is a polynomial in x = (x1, x2, . . . , xs) ∈ R^s of degree at most 2.
We apply the d- and r-space conversion methods to (2.64), obtain a new quadratic SDP with smaller size matrix inequality constraints, and relax this quadratic SDP to obtain a linear SDP which can be solved by standard SDP solvers. The test problems of quadratic SDPs we consider for numerical experiments are three max-cut problems, a Lovász theta problem, a box-constrained quadratic problem from SDPLIB [9], a sensor network localization problem and discretized partial differential equations (PDEs) with Neumann and Dirichlet boundary conditions. In fact, a more detailed and systematic study of SDP relaxations exploiting d- and r-space sparsity for quadratic optimization problems derived from PDEs, in comparison with the hierarchy of sparse SDP relaxations (2.18), is presented in Chapter 3.
SDP relaxations of a quadratic SDP
In this subsection, we apply the d- and r-space conversion methods to the quadratic SDP (2.64) and derive
four kinds of SDP relaxations:
(a) a dense SDP relaxation without exploiting any sparsity;
(b) a sparse SDP relaxation obtained by applying the d-space conversion method using basis representation given
in 2.2.3;
(c) a sparse SDP relaxation obtained by applying the r-space conversion method using clique trees given in 2.2.5;
(d) a sparse SDP relaxation obtained by applying both the d-space conversion method using basis representation
and the r-space conversion method using clique trees.
We write each non-zero element M_ij(x) as

              ( 1   x^T  )
    M_ij(x) = ( x   xx^T ) • Q_ij   for every x ∈ R^s,

for some Q_ij ∈ S^{1+s}, where the rows and columns of each Q_ij are indexed from 0 to s. Let us
introduce a linearization (or lifting) M̂_ij : R^s × S^s → R of the quadratic function M_ij : R^s → R,

                   ( 1   x^T )
    M̂_ij(x, X) =  ( x   X   ) • Q_ij   for every x ∈ R^s and X ∈ S^s,

which induces a linearization (or lifting) M̂ : R^s × S^s → S^n of M : R^s → S^n whose (i, j)th element is M̂_ij.
Then we can describe the dense SDP relaxation (a) of (2.64) as

                                                             ( 1   x^T )
    minimize  Σ_{i=1}^{s} c_i x_i   subject to  M̂(x, X) ⪰ 0 and ( x   X   ) ⪰ 0.

For simplicity, we rewrite the dense SDP relaxation above as

    (a)  minimize  Σ_{i=1}^{s} c_i W_{0i}   subject to  M̂(W) ⪰ 0,  W_{00} = 1 and W ⪰ 0,
where

                                                ( 1   x^T )
    (W_{01}, W_{02}, ..., W_{0s})^T = x ∈ R^s and W = ( x   X   ) ∈ S^{1+s}.
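As a quick sanity check of this lifting, the following Python/NumPy sketch (our own illustration; the symmetric data matrix Q and the point x are random placeholders) verifies that M̂_ij(x, X) reproduces the quadratic function M_ij(x) when X is fixed to the rank-one matrix xx^T, while a different feasible X changes the value — which is exactly the relaxation step.

    import numpy as np

    rng = np.random.default_rng(0)
    s = 4
    Q = rng.standard_normal((1 + s, 1 + s))
    Q = (Q + Q.T) / 2                        # a data matrix Q_ij in S^{1+s}
    x = rng.standard_normal(s)

    def M_hat(Q, x, X):
        """Linearization M̂_ij(x, X) = Q • [1 x^T; x X]."""
        W = np.block([[np.ones((1, 1)), x[None, :]], [x[:, None], X]])
        return float(np.sum(Q * W))

    exact = M_hat(Q, x, np.outer(x, x))      # equals the quadratic M_ij(x)
    # In the relaxation X is decoupled from x; W ⪰ 0 only enforces X ⪰ xx^T.
    X_relaxed = np.outer(x, x) + np.eye(s)   # a feasible X different from xx^T
    print(exact, M_hat(Q, x, X_relaxed))     # generally different values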
Let G(N′, F′) be the d-space sparsity pattern graph for the SDP (a), with N′ = {0, 1, ..., s} and F′
the set of distinct row and column index pairs (i, j) of W_ij that are necessary to evaluate the objective
function Σ_{i=1}^{s} c_i W_{0i} and/or the LMI M̂(W) ⪰ 0. Let G(N′, E′) be a chordal extension of G(N′, F′), and
C′_1, C′_2, ..., C′_r the maximal cliques of G(N′, E′). Applying the d-space conversion method using basis
representation, we obtain the SDP relaxation
    (b)  minimize    Σ_{i=1}^{s} c_i W_{0i}
         subject to  M̂((W_ij : (i, j) ∈ J)) ⪰ 0,  W_{00} = 1,
                     Σ_{(i,j)∈J(C′_k)} E_ij W_ij ∈ S_+^{C′_k}   (k = 1, 2, ..., r).

Here J = ∪_{k=1}^{r} J(C′_k), (W_ij : (i, j) ∈ J) denotes the vector variable of the elements W_ij ((i, j) ∈ J), and

    M̂((W_ij : (i, j) ∈ J)) = M̂(W)   for every W ∈ S^{1+s}(E′, 0).
To apply the r-space conversion method using clique trees to the quadratic SDP (2.64), we assume that
M : R^s → S^n(E, 0) for some chordal graph G(N, E), where N = {1, 2, ..., n} and E ⊆ N × N. Then, we
convert the matrix inequality M(x) ⪰ 0 in (2.64) into an equivalent system of matrix inequalities (2.56).
The application of the LMI relaxation described above to (2.56) leads to the SDP relaxation
    (c)  minimize    Σ_{i=1}^{s} c_i W_{0i}
         subject to  M̂^k(W) − L̃^k(z) ⪰ 0   (k = 1, 2, ..., p),  W_{00} = 1,  W ⪰ 0,

where M̂^k : S^{1+s} → S^{C_k} denotes a linearization (or lifting) of M̃^k : R^s → S^{C_k}. We may apply the
linearization to (2.64) first to derive the dense SDP relaxation (a), and then apply the r-space conversion
method using clique trees to (a). This results in the same sparse SDP relaxation (c) of (2.64). Note that both
M and M̂ take values in S^n(E, 0). Thus, they provide the same r-space sparsity pattern characterized
by the chordal graph G(N, E).
Finally, the sparse SDP relaxation (d) is derived by applying the d-space conversion method using basis
representation to the sparse SDP relaxation (c). We note that the d-space sparsity pattern graph for
the SDP (c) with respect to the matrix variable W ∈ S^{1+s} is the same as the one for the SDP (a). Hence,
the sparse SDP relaxation (d) is obtained in the same way as the SDP (b) is obtained from the SDP (a).
Consequently, we have the sparse SDP relaxation
    (d)  minimize    Σ_{i=1}^{s} c_i W_{0i}
         subject to  M̂^k((W_ij : (i, j) ∈ J)) − L̃^k(z) ⪰ 0   (k = 1, 2, ..., p),  W_{00} = 1,
                     Σ_{(α,β)∈J(C′_j)} E_αβ W_αβ ∈ S_+^{C′_j}   (j = 1, 2, ..., r).

Here J = ∪_{k=1}^{r} J(C′_k), (W_ij : (i, j) ∈ J) denotes the vector variable of the elements W_ij ((i, j) ∈ J), and

    M̂^k((W_ij : (i, j) ∈ J)) = M̂^k(W)   for every W ∈ S^{1+s}(E′, 0).
Quadratic SDPs with d- and r-sparsity from randomly generated sparse graphs
Quadratic SDP problems were constructed by first generating two graphs G(N_d, E_d) with N_d = {1, 2, ..., 1+s}
and G(N_r, E_r) with N_r = {1, 2, ..., n} using the Matlab program generateProblem.m [44], which was
developed for sensor network localization problems. Sparse chordal extensions G(N_d, Ē_d) and G(N_r, Ē_r)
were then obtained by the Matlab functions symamd.m and chol.m. Next, we generated data matrices
Q_ij ∈ S^{1+s} (i = 1, 2, ..., n, j = 1, 2, ..., n) and a data vector c ∈ R^s so that the d- and r-space sparsity pattern graphs
of the resulting quadratic SDP coincide with G(N_d, E_d) and G(N_r, E_r), respectively. Some characteristics
of the chordal extensions G(N_d, Ē_d) of G(N_d, E_d) and G(N_r, Ē_r) of G(N_r, E_r) used in the experiments are
shown in Table 2.2.

For the problem with s = 40 and n = 640, the d- and r-space sparsity patterns obtained from the symmetric
approximate minimum degree permutation of rows and columns by the Matlab function symamd.m
are displayed in Figure 2.8.
                 Domain space sparsity       Range space sparsity
   s     n      #Ē_d   NoC   Max   Min      #Ē_r   NoC   Max   Min
   80    80      143    63    3     3        216    72    7     3
  320   320      649   260    7     3        840   301    9     3
   40   160       70    30    3     3        426   150    7     3
   40   640       70    30    3     3       1732   616   13     3

Table 2.2: Some characteristics of d- and r-sparsities of the tested quadratic SDPs. #Ē_d (or #Ē_r) denotes
the number of edges of G(N_d, Ē_d) (or G(N_r, Ē_r)), NoC the number of maximal cliques of G(N_d, Ē_d)
(or G(N_r, Ē_r)), Max the maximum size of the maximal cliques, and Min the minimum size of the maximal cliques.
Figure 2.8: The d-space sparsity pattern of the quadratic SDP with s = 40 and n = 640 on the left and the
r-space sparsity pattern on the right.
Table 2.3 shows numerical results on the quadratic SDPs whose d- and r-space sparsity characteristics are given
in Table 2.2. We observe that both the d-space conversion method using basis representation in (b) and the
r-space conversion method using clique trees in (c) work effectively, and that their combination (d) results
in the shortest CPU time among the four methods.
Quadratic SDPs arising from applications
For additional numerical experiments, we selected five SDP problems from SDPLIB [9], a quadratic SDP
from a sensor network localization problem, and two quadratic SDPs derived from PDEs with Neumann and
Dirichlet boundary conditions. The test problems in Table 2.4 are
              SeDuMi CPU time in seconds
              (the size of the Schur complement matrix, the max. size of matrix variables)
   s     n         (a)                  (b)                 (c)                  (d)
   80    80    296.51 (3321, 81)     1.38 (224, 80)      1.58 (801, 81)       0.73 (252, 19)
  320   320    OOM                  74.19 (970, 320)    80.09 (322, 321)     35.20 (1216, 20)
   40   160      6.70 (861, 160)     4.22 (111, 160)     2.91 (1626, 41)      0.74 (207, 21)
   40   640    158.95 (861, 640)   151.20 (111, 640)   120.86 (6776, 41)      5.71 (772, 21)

Table 2.3: Numerical results on the quadratic SDPs with d- and r-sparsity from randomly generated sparse
graphs. OOM indicates an out-of-memory error in Matlab.
mcp500-1, maxG11, maxG32: SDP relaxations of the max cut problem from SDPLIB.
thetaG11: An SDP relaxation of the Lovász theta problem from SDPLIB.
qpG11: An SDP relaxation of the box-constrained quadratic problem from SDPLIB.
d2n01s1000a100FSDP: A full SDP relaxation [6] of the sensor network localization problem with 1000
sensors and 100 anchors distributed in [0, 1]^2, radio range = 0.1, and noise = 10%. The method (iii) in
Table 2.4 for this problem is equivalent to the method used in SFSDP [43], a sparse version of the full
SDP relaxation.
ginzOrNeum(11): An SDP relaxation of the discretized nonlinear elliptic PDE (4.4) (Case II, Neumann
boundary condition) of [61]. We choose an 11 × 11 grid for the domain [0, 1]^2 of the PDE.
pdeBifurcation(20): An SDP relaxation of the discretized nonlinear elliptic PDE (4.5) (Dirichlet boundary
condition) of [61]. We choose a 20 × 20 grid for the domain [0, 1]^2 of the PDE.
The SDP relaxations (i), (ii) and (iii) in Table 2.4 indicate
(i) a dense SDP relaxation without exploiting any sparsity;
(ii) a sparse SDP relaxation obtained by applying the d-space conversion method using clique trees given in
Section 3.1;
(iii) a sparse SDP relaxation obtained by applying the d-space conversion method using basis representation given
in Section 3.2.
Table 2.4 shows that the CPU time spent by (ii) is shorter than that of (i) and (iii) for all tested problems
except mcp500-1 and pdeEllipticNeum11. Notice that solving (iii) took less CPU time than (i) for all problems
except maxG32 and thetaG11. We confirm that applying at least one of the d-space conversion methods
greatly reduces the CPU time for the test problems. The d-space sparsity patterns for the test problems are
displayed in Figures 2.9 and 2.10.
2.3 Reduction techniques for SDP relaxations for large scale POP
The global minimization of a multivariate polynomial over a semialgebraic set is in general a severely nonconvex,
difficult optimization problem.
Problem                SeDuMi CPU time (size.SC.mat., Max.size.mat.var.)
                            (i)                    (ii)                  (iii)
mcp500-1               65.5 (500, 500)        94.5 (7222, 44)       15.9 (2878, 44)
maxG11                220.5 (800, 800)        12.1 (2432, 80)       26.8 (8333, 24)
maxG32               5373.8 (2000, 2000)     971.4 (13600, 210)     OOM
thetaG11              345.9 (2401, 801)       23.9 (4237, 81)      458.5 (9134, 25)
qpG11                2628.5 (800, 1600)       16.0 (2432, 80)       72.5 (9133, 24)
d2n01s1000a100FSDP   5193.5 (4949, 1002)      16.9 (7260, 45)       19.5 (15691, 17)
ginzOrNeum(11)        216.1 (1453, 485)        2.2 (1483, 17)        2.1 (1574, 4)
pdeBifurcation(20)   1120.4 (2401, 801)        4.3 (2451, 17)        5.3 (2001, 3)

Table 2.4: Numerical results on SDPs from some applications. size.SC.mat. denotes the size of the Schur
complement matrix and Max.size.mat.var. the maximum size of matrix variables.
In 2.1.3 a hierarchy of SDP relaxations has been proposed whose optima have been proven to converge to the
optimum of a POP as the relaxation order increases. The practical use of this powerful theoretical result has
been limited by the capacity of current SDP solvers, as the size of the SDP relaxations grows rapidly with
increasing order. A first approach to attack this problem has been to exploit structured sparsity in a POP [47].
Whenever a POP satisfies a certain sparsity pattern, a convergent sequence of sparse SDP relaxations (2.18)
of substantially smaller size can be constructed. Compared to the dense SDP relaxation (2.8), the sparse SDP
relaxation (2.18) can be applied to POPs of larger scale.
Still, the size of the sparse SDP relaxation remains the major obstacle to solving large scale POPs
containing polynomials of higher degree. We propose a substitution procedure to transform an arbitrary
POP into an equivalent quadratic optimization problem (QOP). It is based on successively replacing quadratic
terms in higher degree monomials by new variables, and adding the substitution relations as constraints
to the optimization problem. The idea of transforming a POP into an equivalent QOP can be traced back to
Shor [92], who exploited it to derive dual lower bounds for the minimum of a polynomial function. As the
substitution procedure is not unique, we introduce different heuristics which aim at deriving a QOP with
as few additional variables as possible. Moreover, we show that sparsity of a POP is maintained under the
substitution procedure. The main advantage of deriving an equivalent QOP for a POP is that the sparse
SDP relaxation of first order can be applied to solve it approximately.
The substitution procedure and the considerations to minimize the number of additional variables while
maintaining sparsity are presented in 2.3.1. While a POP and the QOP derived from it are equivalent,
the quality of the SDP relaxation for the QOP deteriorates in many cases. We
discuss in 2.3.2 how to tighten the SDP relaxation for a QOP in order to achieve good approximations to
the global minimum even for SDP relaxations of first or second order. For that purpose, methods such as choosing
appropriate lower and upper bounds for the variables, Branch-and-Cut bounds to shrink the
feasible region of the SDP relaxation, and locally convergent optimization methods are proposed. Finally,
the power of this technique is demonstrated in 2.3.3, where it is applied to solve various large scale POPs of
higher degree.
2.3.1 Transforming a POP into a QOP
The aim of 2.3 is to propose a technique to reduce the size of SDP relaxations for general POPs, which
enables us to attack large scale polynomial optimization efficiently. This technique, which transforms a
POP into an equivalent QOP, reduces the size of the SDP relaxation by decreasing the minimum relaxation
order ω_max, whereas the technique due to [102] presented in 2.1.4 aims at reducing the SDP relaxation
by replacing matrix inequality constraints of size \binom{n+ω}{ω} with matrix inequality constraints of size
\binom{n_i+ω}{ω} (i = 1, ..., p).
Figure 2.9: The d-space sparsity pattern with the symmetric approximate minimum degree permutation
of rows and columns provided by the Matlab function symamd.m for maxG32(left), thetaG11(middle) and
qpG11(right).
Figure 2.10: The d-space sparsity pattern with the symmetric approximate minimum degree permutation
of rows and columns provided by the Matlab function symamd.m for d2n01s1000a100FSDP(left), pdeEllipticNeum11(middle) and pdeEllipticBifur20(right).
A general QOP is a special case of the POP (2.1), where the polynomials p, g_i (i = 1, ..., m)
are of degree at most 2. With respect to the definition of ω_k, the minimal relaxation order ω_max of the
sparse SDP relaxation (2.18) equals one. As pointed out in [102], the sparse SDP relaxation sSDP_1 and
the dense SDP relaxation dSDP_1 of order one are equivalent for any QOP. The equivalence of a QOP and
its SDP relaxation has been shown for a few restrictive classes of QOPs. For instance, if in a QOP p and
−g_i (i = 1, ..., m) are convex quadratic polynomials, the QOP is equivalent to the corresponding SDP
relaxation [56]. Also, equivalence of QOPs and their SDP relaxations was shown for the class of uniformly
OD-nonpositive QOPs [41]. As shown in [52], min(sSDP_ω) → min(POP) for ω → ∞, but to the best of our
knowledge there is no result on a rate of convergence or a guaranteed approximation of the global minimum
for a fixed relaxation order ω ≥ ω_max in the case of a general POP.
To illustrate the idea of our transformation technique, consider the following example of a simple
unconstrained POP, whose optimal value is −∞:

    min 10x_1^3 − 10^2 x_1^3 x_2 + 10^3 x_1^2 x_2^2 − 10^4 x_1 x_2^3 + 10^5 x_2^4        (2.65)

It is straightforward to see that POP (2.65) is equivalent to
    min   10x_1 x_3 − 10^2 x_3 x_4 + 10^3 x_4^2 − 10^4 x_4 x_5 + 10^5 x_5^2
    s.t.  x_3 = x_1^2,  x_4 = x_1 x_2,  x_5 = x_2^2,                                     (2.66)
where we introduced three additional variables x_3, x_4 and x_5. Obviously QOP (2.66) is not the only QOP
equivalent to POP (2.65): the QOP

    min   10x_3 − 10^2 x_2 x_3 + 10^3 x_5 x_6 − 10^4 x_1 x_4 + 10^5 x_2 x_4
    s.t.  x_3 = x_1 x_5,  x_4 = x_2 x_6,  x_5 = x_1^2,  x_6 = x_2^2,                     (2.67)
is equivalent to (2.65) as well. Note that the number of additional variables in QOP (2.66) equals three,
whereas it equals four in QOP (2.67). In general, there are numerous ways to transform a higher degree POP
into a QOP. For the transformation procedures we propose, we require that 1) the number of
additional variables be as small as possible, in order to obtain an SDP relaxation of smaller size, 2)
the sparsity of a POP be maintained under the transformation, and 3) the quality of the SDP relaxation
for the derived QOP be as good as possible. How to deal with 3) is discussed in 2.3.2; 1) and 2) are
discussed in the following.
Maintaining sparsity
The transformation proposed in the previous subsection raises the question whether the correlative sparsity
of a POP is preserved under the transformation, i.e., whether the resulting QOP is correlatively sparse as
well.

Let POP⋆ be a correlatively sparse POP of dimension n, G(N, E′) the chordal extension of its csp
graph, (C_1, ..., C_p) the maximal cliques of G(N, E′), and n_max = max_{i=1,...,p} |C_i|. Let x_{n+1} = x_i x_j be
the substitution variable for some i, j ∈ {1, ..., n}, and let P̃OP denote the POP derived after substituting
x_{n+1} = x_i x_j in POP⋆. Given the chordal extension G(N, E′) of the csp graph of POP⋆, a chordal extension
of the csp graph of P̃OP over the vertex set Ñ = N ∪ {n + 1} can be obtained as follows: for one clique
C_l with {i, j} ⊂ C_l, add the edges {v, n + 1} for all v ∈ C_l and obtain the clique C̃_l; for each clique C_k not
containing {i, j}, set C̃_k = C_k. In the end we obtain the graph G(Ñ, Ẽ′), which is a chordal extension of
the csp graph G(Ñ, Ẽ) of P̃OP. Note that (C̃_1, ..., C̃_p) are maximal cliques of G(Ñ, Ẽ′) and
|C̃_l| ≤ |C_l| + 1 for all l, i.e., ñ_max ≤ n_max + 1. Moreover, the number p of maximal cliques remains unchanged
under the transformation. As pointed out, G(Ñ, Ẽ′) is only one possible chordal extension of G(Ñ, Ẽ). It seems
reasonable to expect that the heuristics we use for the chordal extension, such as the reverse Cuthill-McKee
and the symmetric minimum degree orderings, add fewer edges to G(Ñ, Ẽ) than we did in constructing
G(Ñ, Ẽ′). Thus, we are able to apply the sparse SDP relaxations efficiently to the POPs derived after each
iteration of the transformation algorithm; a small sketch of this clique update follows below. For illustration we
consider Figure 2.11 and Figure 2.12, where the csp matrices of two POPs and their QOPs are pictured.
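A minimal Python sketch of this clique update (our own illustration; cliques are plain Python sets) enlarges one maximal clique containing {i, j} by the new vertex and copies the rest:

    def extend_cliques(cliques, i, j, new_vertex):
        """Chordal-extension update for the substitution x_new = x_i * x_j:
        enlarge one clique containing both i and j by the new vertex."""
        out, enlarged = [], False
        for C in cliques:
            if not enlarged and i in C and j in C:
                out.append(set(C) | {new_vertex})  # C-tilde_l = C_l ∪ {n+1}
                enlarged = True
            else:
                out.append(set(C))                 # C-tilde_k = C_k
        # the pair of a substituted monomial always lies in a common csp clique
        assert enlarged
        return out

    # Example: cliques of a chordal csp graph on variables 0..3, new variable 4.
    print(extend_cliques([{0, 1, 2}, {2, 3}], 1, 2, 4))  # [{0, 1, 2, 4}, {2, 3}]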
Figure 2.11: CSP matrix of the chordal extension of POP pdeBifurcation(7) (left) and its QOP (right)
derived under strategy BI.
Figure 2.12: CSP matrix of the chordal extension of POP Mimura(25) (left) and its QOP (right) derived
under strategy BI.
We observe that the sparsity pattern of the chordal extension of the csp graph is maintained under the
substitution procedure. However, if the number of substitutions required to transform a higher
degree POP into a QOP is far greater than the number of variables of the original POP, the transformation
procedure may yield a dense QOP. To illustrate this effect, consider the chordal
extension of the csp matrix of the QOP derived from the POP randomEQ(7,3,5,8,0), which is pictured in Figure
2.13. In that example, the number n of variables of the original POP equals seven, while the number of additional
variables equals 108.
Figure 2.13: CSP matrix of the chordal extension of POP randomWithEQ(7,3,5,8,0) (left) and its QOP
(right).
Minimizing the number of additional variables
Let n denote the number of variables involved in a POP and ñ the number of variables in the corresponding
QOP. The first question we face is how to transform a POP into a QOP such that the number
k_0 := ñ − n of additional variables is as small as possible. Each additional variable x_{n+k} corresponds to
the substitution of a certain quadratic monomial x_i x_j by x_{n+k}. Given an arbitrary POP, finding a
substitution procedure minimizing ñ is a difficult problem. We propose four different heuristics for
transforming a POP into a QOP, which aim at reducing the number k_0 of additional variables. At the end
of this section we give some motivation why it is more important to find a strategy optimizing the quality
of the SDP relaxation than one minimizing the number k_0 of additional variables.

Our transformation algorithm iterates substitutions of quadratic monomials x_i x_j appearing in the higher
degree monomials of the objective function and constraints by a new variable x_{n+k}, adding the substitution
relation x_{n+k} = x_i x_j as a constraint to the POP. Let POP^0 denote the original POP, and POP^k the
POP obtained after the k-th iteration, i.e. after substituting x_{n+k} = x_i x_j and adding it as a constraint to
POP^{k−1}. The algorithm terminates as soon as POP^{k_0} is a QOP for some k_0 ∈ N. In each iteration of
the transformation algorithm we distinguish two steps. The first is to choose which pair of variables
(x_i, x_j) (1 ≤ i, j ≤ n + k) is substituted by the additional variable x_{n+k+1}. The second is to choose to
which extent x_i x_j is substituted by x_{n+k+1} in each higher degree monomial.
Step 1: Choosing the substitution variables

Definition 2.7 Let POP^k be a POP of dimension ñ with m̃ constraints (g_1^k, ..., g_m̃^k). The higher
monomial set M_S^k of POP^k is given by

    M_S^k = { α ∈ N^ñ | ∃ i ∈ {0, ..., m̃} s.t. α ∈ supp(g_i^k) and |α| ≥ 3 },

where g_0 := p, g_i^0 := g_i, and the higher monomial list M^k of POP^k by

    M^k = { (α, w_α) | α ∈ M_S^k and w_α := #{ i | α ∈ supp(g_i^k) } }.
By Definition 2.7, the higher monomial list of a QOP is empty.
Definition 2.8 Given α ∈ N^n and a pair (i, j) with 1 ≤ i, j ≤ n, we define the dividing coefficient
k_{i,j}^α ∈ N_0 as the integer that satisfies

    x^α / (x_i x_j)^{k_{i,j}^α} ∈ R[x]   and   x^α / (x_i x_j)^{k_{i,j}^α + 1} ∉ R[x].

Given POP^0, the k-th iterate POP^k and its higher monomial list M^k, determine the symmetric matrix
C(POP^k) ∈ R^{(n+k)×(n+k)} given by

    C(POP^k)_{i,j} = C(POP^k)_{j,i} = Σ_{(α,w_α)∈M^k} k_{i,j}^α w_α.
We consider two alternatives to choose a pair (x_i, x_j) (1 ≤ i, j ≤ n + k) to be substituted by x_{n+k+1}
(a sketch of the second criterion follows the list):

A. Naive criterion: Choose a pair (x_i, x_j) such that there exists an α ∈ M_S(POP^k) which satisfies
x^α / (x_i x_j) ∈ R[x].

B. Maximum criterion: Choose a pair (x_i, x_j) such that C(POP^k)_{i,j} ≥ C(POP^k)_{u,v} for all
1 ≤ u, v ≤ n + k.
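The maximum criterion is easy to state in code. The following Python sketch (our own illustration; a monomial x^α is represented by its exponent tuple α, and the higher monomial list by pairs (α, w_α)) evaluates C(POP^k)_{i,j} and returns a maximizing pair:

    from itertools import combinations_with_replacement

    def dividing_coeff(alpha, i, j):
        """k_{i,j}^alpha: the largest k with x^alpha / (x_i x_j)^k still a monomial."""
        if i == j:
            return alpha[i] // 2
        return min(alpha[i], alpha[j])

    def choose_pair_maximum(higher_list, nvars):
        """Criterion B: maximize C_{i,j} = sum over (alpha, w) of k_{i,j}^alpha * w."""
        def score(pair):
            i, j = pair
            return sum(dividing_coeff(a, i, j) * w for a, w in higher_list)
        return max(combinations_with_replacement(range(nvars), 2), key=score)

    # Higher monomial list of POP (2.65); each monomial occurs once, so w = 1.
    M = [((3, 0), 1), ((3, 1), 1), ((2, 2), 1), ((1, 3), 1), ((0, 4), 1)]
    print(choose_pair_maximum(M, 2))  # (0, 1): x_1 x_2 scores highest (tied with x_2^2)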
Step 2: Choosing the substitution strategy

Next we have to decide to what extent we substitute x_{n+k+1} = x_i x_j in each monomial of M_S(POP^k).
We distinguish full and partial substitution. Let us demonstrate the importance of this question on the
following two examples.
Example 2.5 Consider two different substitution strategies for transforming the problem of minimizing x_1^4
into a QOP:

                          min x_1^4
                  (1) ↙              ↘ (2)
        min x_2^2                      min x_1^2 x_2
        s.t. x_2 = x_1^2               s.t. x_2 = x_1^2
                                            ↓                                            (2.68)
                                       min x_1 x_3
                                       s.t. x_2 = x_1^2
                                            x_3 = x_1 x_2

In both substitution strategies, we choose x_1^2 for substitution in the first step. In (1) we fully substitute
x_1^2 by x_2, whereas in (2) we substitute x_1^2 only partially. By choosing full substitution in the first iteration,
(1) needs one additional variable to obtain a QOP, whereas partial substitution (2) requires two additional
variables to yield a QOP.
Example 2.6 Consider two substitution strategies for min x_1^6 s.t. x_1^3 x_2 ≥ 0:

                     min x_1^6  s.t. x_1^3 x_2 ≥ 0
               (1) ↙                          ↘ (2)
    min  x_3^3                                  min  x_1^2 x_3^2
    s.t. x_1 x_2 x_3 ≥ 0, x_3 = x_1^2           s.t. x_1 x_2 x_3 ≥ 0, x_3 = x_1^2
         ↓                                           ↓
    min  x_3 x_4                                min  x_4^2                               (2.69)
    s.t. x_1 x_2 x_3 ≥ 0, x_3 = x_1^2,          s.t. x_2 x_4 ≥ 0, x_3 = x_1^2,
         x_4 = x_3^2                                 x_4 = x_1 x_3
         ↓
    min  x_3 x_4
    s.t. x_2 x_5 ≥ 0, x_3 = x_1^2,
         x_4 = x_3^2, x_5 = x_1 x_3

In this example, full substitution (1) of x_1^2 requires three additional variables, while partial substitution (2)
requires only two to yield a QOP.
The examples illustrate that it depends on the structure of the higher monomial set whether partial or full
substitution requires fewer additional variables and results in a smaller SDP relaxation. In general,
full and partial substitution are defined as follows.
I. Full substitution: Let tf_{i,j}^r : R[x] → R[z], where x ∈ R^r and z ∈ R^{r+1} for r ∈ N and i, j ∈
{1, ..., r}, be the linear operator defined by its action on each monomial x^α:

    tf_{i,j}^r(x^α) = z_1^{α_1} ··· z_i^{α_i − min(α_i,α_j)} ··· z_j^{α_j − min(α_i,α_j)} ··· z_r^{α_r} z_{r+1}^{min(α_i,α_j)},   if i ≠ j,

    tf_{i,j}^r(x^α) = z_1^{α_1} ··· z_i^{mod(α_i,2)} ··· z_r^{α_r} z_{r+1}^{⌊α_i/2⌋},   if i = j.

Thus, tf_{i,j}^r(g(x)) = Σ_{α∈supp(g)} c_α(g) tf_{i,j}^r(x^α) for any g ∈ R[x]. The operator tf_{i,j}^{n+k} substitutes
x_i x_j by x_{n+k+1} in each monomial to the maximal possible extent.
II. Partial substitution: Let tp_{i,j}^r : R[x] → R[z], where x ∈ R^r and z ∈ R^{r+1} for r ∈ N and
i, j ∈ {1, ..., r}, be the linear operator defined by its action on each monomial x^α:

    tp_{i,j}^r(x^α) = tf_{i,j}^r(x^α),   if i ≠ j,
    tp_{i,j}^r(x^α) = tf_{i,j}^r(x^α),   if i = j and α_i odd,
    tp_{i,j}^r(x^α) = tf_{i,j}^r(x^α),   if i = j and log_2(α_i) ∈ N_0,
    tp_{i,j}^r(x^α) = z_1^{α_1} ··· z_i^{g_i} ··· z_r^{α_r} z_{r+1}^{(α_i − g_i)/2},   otherwise,

where g_i := gcd(2^{⌊log_2(α_i)⌋}, α_i). Thus, tp_{i,j}^r(g(x)) = Σ_{α∈supp(g)} c_α(g) tp_{i,j}^r(x^α) for any g ∈ R[x].
We notice that full and partial substitution differ only in the case where i = j, α_i is even and
log_2(α_i) ∉ N_0; a sketch of both operators follows below.
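Both operators act on exponent vectors, which the following Python sketch makes explicit (our own rendering; the image monomial gains one trailing slot for the exponent of the new variable z_{r+1}):

    import math

    def t_full(alpha, i, j):
        """Full substitution tf: replace x_i x_j by the new variable maximally."""
        a = list(alpha) + [0]              # last slot: exponent of z_{r+1}
        if i != j:
            m = min(alpha[i], alpha[j])
            a[i] -= m; a[j] -= m; a[-1] = m
        else:
            a[i] = alpha[i] % 2; a[-1] = alpha[i] // 2
        return tuple(a)

    def t_partial(alpha, i, j):
        """Partial substitution tp: differs from tf only if i == j, alpha_i even
        and alpha_i is not a power of two; then g_i = gcd(2^floor(log2 a_i), a_i)."""
        ai = alpha[i]
        if i != j or ai == 0 or ai % 2 == 1 or math.log2(ai).is_integer():
            return t_full(alpha, i, j)
        a = list(alpha) + [0]
        g = math.gcd(2 ** int(math.log2(ai)), ai)
        a[i] = g; a[-1] = (ai - g) // 2
        return tuple(a)

    print(t_full((4, 0), 0, 0))     # x_1^4 -> z_3^2           (Example 2.5, strategy (1))
    print(t_partial((6, 0), 0, 0))  # x_1^6 -> z_1^2 z_3^2     (alpha_1 = 6: g_1 = 2)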
By pairwise combining the choice of A or B in Step 1 with the choice of I or II in Step 2, we obtain four
different procedures to transform POP^{k−1} into POP^k, which we denote AI, AII, BI and BII. We do not
expect AI or AII to result in a QOP with a small number of substitutions, as A does not take into account
the structure of the higher degree monomial list M_S^{k−1}, but we use AI and AII to evaluate the potential of
BI and BII. The numerical performance of these four procedures is demonstrated on some example POPs in
Table 2.6, where n denotes the number of variables in the original POP, deg the degree of the highest order
polynomial in the POP, and k_0 the number of additional variables required to transform the POP into a
QOP under the respective substitution strategy. The POPs pdeBifurcation(n) are derived from discretizing
differential equations, which is the topic of Chapter 3; the other POPs are test problems from [102]. As
expected, strategy B is superior to A for all but one example class of POPs as far as reducing the number of
variables is concerned.
The entire algorithm to transform a POP into a QOP is summarized by the scheme in Table 2.5; a sketch of
the resulting loop is given below. As mentioned before, the QOP of dimension n + k_0 derived by AI, AII, BI
or BII is equivalent to the original POP of dimension n. In fact it is easy to see that if x̃ ∈ R^{n+k_0} is an
optimal solution of the QOP, the vector (x̃_1, ..., x̃_n) of the first n components of x̃ is an optimizer of the
original POP.
INPUT    POP^0 with M_S^0
WHILE    M_S^k ≠ ∅
  1.     Determine the pair (x_i, x_j) for substitution by A or B.
  2.     Apply tf_{i,j}^k or tp_{i,j}^k to each polynomial in POP^k and derive POP^{k+1}.
  3.     Update k → k + 1, POP^k → POP^{k+1}, M_S^k → M_S^{k+1}.
OUTPUT   QOP = POP^{k_0}

Table 2.5: Scheme for transforming a POP into a QOP.
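Putting the pieces together, a compact Python sketch of the loop in Table 2.5 for strategy BI (our own illustration, reusing the hypothetical helpers choose_pair_maximum and t_full sketched above; a polynomial is a dictionary mapping exponent tuples to coefficients) reads:

    def higher_monomial_list(polys):
        """Pairs (alpha, w_alpha) for monomials of degree >= 3 (Definition 2.7)."""
        w = {}
        for p in polys:
            for a in p:
                if sum(a) >= 3:
                    w[a] = w.get(a, 0) + 1
        return list(w.items())

    def pop_to_qop(polys, n):
        """Scheme of Table 2.5 with criterion B and full substitution (BI)."""
        k = 0
        while True:
            M = higher_monomial_list(polys)
            if not M:
                return polys, k                   # QOP reached, k = k_0
            i, j = choose_pair_maximum(M, n + k)
            new = []
            for p in polys:
                q = {}
                for a, c in p.items():
                    b = t_full(a, i, j)
                    q[b] = q.get(b, 0.0) + c      # merge colliding monomials
                new.append(q)
            dim = n + k + 1
            e_new = tuple(1 if v == dim - 1 else 0 for v in range(dim))
            e_prod = tuple((1 if v in (i, j) else 0) if i != j
                           else (2 if v == i else 0) for v in range(dim))
            new.append({e_new: 1.0, e_prod: -1.0})  # x_{n+k+1} - x_i x_j = 0
            polys = new
            k += 1

    # POP (2.65) as a single objective polynomial in two variables:
    pop = [{(3, 0): 10.0, (3, 1): -1e2, (2, 2): 1e3, (1, 3): -1e4, (0, 4): 1e5}]
    qop, k0 = pop_to_qop(pop, 2)
    print(k0)  # 3 additional variables, as in QOP (2.66) (possibly another order)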
POP                        n    deg   k_0(AI)  k_0(AII)  k_0(BI)  k_0(BII)
BroydenBand(20)            20    6      229      211       60       40
BroydenBand(60)            60    6      749      691      180      120
nondquar(32)               32    4       93       93       94       94
nondquar(8)                 8    4       21       21       22       22
optControl(10)             60    4       60       60       60       60
randINEQ(8,4,6,8,0)         8    8      253      307      248      238
randEQ(7,3,5,8,0)           7    8      135      146      116      115
pdeBifurcation(5)          25    3       25       25       25       25
pdeBifurcation(10)        100    3      100      100      100      100
randINEQ(3,1,3,16,0)        3   16      145      192      105      117
randUnconst(3,2,3,14,0)     3   14       86      107       63       69

Table 2.6: Number of required additional variables for strategies AI, AII, BI and BII.
Computational complexity
Finally, let us consider how the size of the sparse SDP relaxation of order ω = 1 for a QOP depends on
the number k_0 of additional variables. Let a sparse POP of dimension n be given by the polynomials
(p, g_1, ..., g_m) and the maximal cliques (C_1, ..., C_p) of the chordal extension. With the construction in
'Maintaining sparsity' above, the corresponding QOP of dimension ñ = n + k_0 has the maximal cliques
(C̃_1, ..., C̃_p) such that C_i ⊆ C̃_i and ñ_i ≤ n_i + k_0 for all i = 1, ..., p, where n_i = |C_i| and ñ_i = |C̃_i|.
All partial localizing matrices M_0(g_k y, F̂_k) are scalars in sSDP_1(QOP). The size of the partial moment
matrices M_1(y, C̃_i) is

    d(1, ñ_i) = ñ_i + 1 ≤ n_i + k_0 + 1 = O(k_0).                                        (2.70)

Thus, the size of the linear matrix inequality is bounded by

    Σ_{j=1}^{m+k_0} 1 + Σ_{i=1}^{p} d(1, ñ_i) ≤ m + k_0 + p (n_max + k_0 + 1) ≤ m + k_0 + n (n_max + k_0 + 1).   (2.71)

The length of the vector variable y in sSDP_1(QOP) is bounded by

    |y| ≤ Σ_{i=1}^{p} |y(C̃_i)| = Σ_{i=1}^{p} d(2, 2ñ_i) ≤ (1/2) p (2n_max + 2k_0 + 2)(2n_max + 2k_0 + 1)
        ≤ 2p (n_max + k_0 + 1)^2 ≤ 2n (n_max + k_0 + 1)^2 = O(k_0^2).                    (2.72)

Thus, the size of the linear matrix inequalities of the sparse SDP relaxation is linear and the length of the
moment vector y quadratic in the number k_0 of additional variables. For this reason the computational cost
does not grow too fast, even if k_0 is not minimal. Heuristics BI and BII are sufficient to derive
QOPs with a small number k_0 of additional variables.
Moreover, the bounds (2.71) and (2.72) for the size of the primal and dual variables of the SDP relaxation
for the QOP are to be compared to the respective bounds for the SDP relaxation of the POP. If we assume
ω_max = ω_i for all i ∈ {1, ..., m}, the size of the linear matrix inequality in the SDP relaxation of order
ω_max for the original POP can be bounded by

    Σ_{j=1}^{m} d(n_j, ω_max − ω_j) + Σ_{i=1}^{p} d(n_i, ω_max) ≤ m + n \binom{n_max + ω_max}{ω_max},   (2.73)

and the length of the moment vector by

    Σ_{i=1}^{p} d(2n_i, 2ω_max) ≤ n \binom{2n_max + 2ω_max}{2n_max}.                     (2.74)

Already for ω_max = 2 the bounds (2.73) and (2.74) are of second and fourth degree in n_max, whereas
(2.71) and (2.72) are linear and quadratic in n_max + k_0, respectively. Therefore we can expect a substantial
reduction of the SDP relaxation under the transformation procedure. Note that we did not exploit any sparsity
in the SDP relaxation or any intersection of the maximal cliques (C_1, ..., C_p) and (C̃_1, ..., C̃_p) when deriving
these bounds. Thus, the actual size of SDP relaxations in numerical experiments may be far smaller than
the one suggested by these bounds.
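To get a feeling for the magnitudes, consider a hypothetical sparse POP with m = n = p = 100, n_max = 10, ω_max = 2 and k_0 = 20 (illustrative numbers only, not taken from the experiments):

    \[
    \begin{aligned}
    \text{(2.73): } & m + n\tbinom{n_{\max}+\omega_{\max}}{\omega_{\max}}
        = 100 + 100\cdot\tbinom{12}{2} = 100 + 100\cdot 66 = 6700,\\
    \text{(2.71): } & m + k_0 + n(n_{\max}+k_0+1) = 100 + 20 + 100\cdot 31 = 3220,\\
    \text{(2.74): } & n\tbinom{2n_{\max}+2\omega_{\max}}{2n_{\max}}
        = 100\cdot\tbinom{24}{20} = 100\cdot 10626 = 1062600,\\
    \text{(2.72): } & 2n(n_{\max}+k_0+1)^2 = 200\cdot 31^2 = 192200.
    \end{aligned}
    \]

So even with k_0 twice as large as n_max, the bound on the moment vector of sSDP_1(QOP) is roughly five times smaller than the bound for sSDP_2(POP), while the bounds on the matrix inequality sizes are of comparable magnitude.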
2.3.2 Quality of SDP relaxations for QOP
A polynomial optimization problem (POP) and the quadratic optimization problem (QOP) derived from it
under one of the transformation strategies AI, AII, BI or BII are equivalent. However, the same does not
hold for the SDP relaxations of the two problems: the SDP relaxation of the QOP is weaker
than the SDP relaxation of the original POP. Before stating this negative result, we consider an example
to illustrate it.
Example 2.7 Let a POP and its equivalent QOP be given by

    POP:  min x_1^2 x_2^2  s.t. x_1^2 x_2 ≥ 0     ⇔     QOP:  min x̃_3^2  s.t. x̃_1 x̃_3 ≥ 0,  x̃_1 x̃_2 = x̃_3.

The dense SDP relaxations of minimal relaxation order, dSDP_2(POP) and dSDP_1(QOP), are given by

    min y_22   s.t. y_21 ≥ 0,  M_2(y) ⪰ 0,

where

             ( y_00  y_10  y_01  y_20  y_11  y_02 )
             ( y_10  y_20  y_11  y_30  y_21  y_12 )
    M_2(y) = ( y_01  y_11  y_02  y_21  y_12  y_03 )
             ( y_20  y_30  y_21  y_40  y_31  y_22 )
             ( y_11  y_21  y_12  y_31  y_22  y_13 )
             ( y_02  y_12  y_03  y_22  y_13  y_04 ),

and

    min ỹ_002   s.t. ỹ_101 ≥ 0,  ỹ_110 = ỹ_001,  M_1(ỹ) ⪰ 0,

where

             ( ỹ_000  ỹ_100  ỹ_010  ỹ_001 )
    M_1(ỹ) = ( ỹ_100  ỹ_200  ỹ_110  ỹ_101 )
             ( ỹ_010  ỹ_110  ỹ_020  ỹ_011 )
             ( ỹ_001  ỹ_101  ỹ_011  ỹ_002 ).

The equivalence of POP and QOP holds with the relation

    (x̃_1, x̃_2, x̃_3, x̃_1^2, x̃_1 x̃_2, x̃_1 x̃_3, x̃_2^2, x̃_2 x̃_3, x̃_3^2)
      = (x_1, x_2, x_1 x_2, x_1^2, x_1 x_2, x_1^2 x_2, x_2^2, x_1 x_2^2, x_1^2 x_2^2).   (2.75)

Given a feasible solution y ∈ R^{d(2,4)} = R^{15} of dSDP_2(POP), we exploit the relations (2.75) to define a
vector ỹ = (ỹ_000, ỹ_100, ỹ_010, ỹ_001, ỹ_200, ..., ỹ_002) ∈ R^{d(3,2)} = R^{10} as

    ỹ := (y_00, y_10, y_01, y_11, y_20, y_11, y_21, y_02, y_12, y_22).

Then ỹ_110 = y_11 = ỹ_001 holds by definition of ỹ, and ỹ_101 = y_21 ≥ 0 as y is a feasible solution of
dSDP_2(POP). Furthermore, for the moment matrix we have

             ( y_00  y_10  y_01  y_11 )
    M_1(ỹ) = ( y_10  y_20  y_11  y_21 )
             ( y_01  y_11  y_02  y_12 )
             ( y_11  y_21  y_12  y_22 ).

Thus M_1(ỹ) ⪰ 0, as M_1(ỹ) is a principal submatrix of M_2(y) ⪰ 0. It follows that ỹ is feasible for
dSDP_1(QOP) and that min(dSDP_1(QOP)) ≤ min(dSDP_2(POP)) holds.
A generalization of the observation in Example 2.7 is given by the following proposition.

Proposition 2.1 Let a POP of dimension n with ω_max > 1 of the form (2.1) be given by the set of
polynomials (p, g_1, ..., g_m), and the corresponding QOP of dimension n + k derived via AI, AII, BI or BII
by (p̃, g̃_1, ..., g̃_m̃). Then, for each feasible solution of dSDP_{ω_max}(POP), there exists a feasible solution of
dSDP_1(QOP) with the same objective value. Thus, min(dSDP_1(QOP)) ≤ min(dSDP_{ω_max}(POP)).

Proof:
Let y ∈ R^{d(n, 2ω_max)} be a feasible solution of dSDP_{ω_max}(POP). Each y_α corresponds to a monomial x^α for
all α with |α| ≤ 2ω_max, x ∈ R^n. Moreover, with respect to the substitution relations, for every monomial
x̃^α, x̃ ∈ R^{n+k} with |α| ≤ 2, there exists a monomial x^{β(α)}, x ∈ R^n, such that

    x̃^α = x^{β(α)},  |β(α)| ≤ 2ω_max.                                                   (2.76)

As β(·) in (2.76) is constructed via the substitution relations,

    β(α_1) = β(α_2)                                                                      (2.77)

holds for α_1, α_2 ∈ N^{n+k} with |α_1| = |α_2| ≤ 2 whenever the QOP has a substitution constraint x̃^{α_1} = x̃^{α_2}.
Now, define ỹ ∈ R^{d(n+k, 2)} by ỹ_α := y_{β(α)} for all |α| ≤ 2. Then ỹ is feasible for dSDP_1(QOP): all
equality constraints derived from substitutions are satisfied due to (2.77), and the principal submatrices of the
moment matrix M_1(ỹ) and of the localizing matrices M_0(ỹ g̃_k) (k = 1, ..., m̃), which are obtained
by simultaneously deleting rows/columns linearly dependent on the remaining rows/columns, are principal
submatrices of M_{ω_max}(y) and M_{ω_max−ω_k}(y g_k) (k = 1, ..., m), respectively. Finally, the objective values for
y and ỹ coincide.
This result for the dense SDP relaxation can be extended to the sparse SDP relaxation of minimal
relaxation order in an analogous manner, if the maximal cliques (C̃_1, ..., C̃_p) of the chordally extended csp
graph of the QOP are chosen appropriately with respect to the maximal cliques of the chordally extended csp
graph of the POP. Therefore it seems reasonable to expect that in general sSDP_1 for the QOP provides an
approximation to the global minimum of the POP which is far weaker than the one provided by sSDP_{ω_max}
for the original POP. One possibility to strengthen the SDP relaxation for QOPs is to increase the relaxation
order to some ω > 1. But, as in the case of the SDP relaxation for a POP, we cannot guarantee finding the
global minimum of a general QOP for any fixed ω ∈ N. Moreover, each of the additional equality constraints
results in (1/2)(d(n, ω − 1) + 1) d(n, ω − 1) equality constraints in sSDP_ω(QOP) for ω > 1. Therefore it seems
more promising to consider additional techniques to improve the quality of sSDP_1 for QOPs.
Local optimization methods
As pointed out before, the minimum of the sparse SDP relaxation converges to the minimum of the QOP
for ω → ∞. Moreover, an accurate approximation can be obtained by the sparse SDP relaxation of order
ω ∈ {ω_max, ..., ω_max + 3} for many POPs [102]. However, the quality of the sparse SDP relaxation for
the QOP is weaker than that for the original POP. Therefore, the solution provided by the sparse
SDP relaxation for the QOP can be understood as a first approximation to the global minimizer of the
original POP, and it may serve as an initial point for a locally convergent optimization technique applied to
the original POP. For instance, sequential quadratic programming (SQP) [8] can be applied to the POP with
the sparse SDP solution for the corresponding QOP taken as starting point. In the case where a POP has
equality constraints only, the number of constraints coincides with the number of variables and the feasible
set is finite, we may succeed in finding the global optimizer of the POP by applying Newton's method for
nonlinear systems [77] to the polynomial system given by the feasible set of the POP, again starting from
the solution provided by the sparse SDP relaxation for the QOP.
Higher accuracy via Branch-and-Cut bounds
The sparse SDP relaxations (2.18) incorporate lower and upper bounds for each component of the
n-dimensional variable,

    lbd_i ≤ x_i ≤ ubd_i   for all i ∈ {1, ..., n},                                       (2.78)

in order to establish the compactness of the feasible set of a POP. Compactness is a necessary condition
to guarantee the convergence of the sequence of sparse SDP relaxations towards the global optimum of the
POP. Moreover, the numerical performance of solving the sparse SDP relaxations depends heavily on the
bounds (2.78): the tighter these bounds are chosen, the better the solution of the SDP approximates the
minimizer of the POP. Prior to solving the sparse SDP relaxations for the QOP derived from a POP, we fix
the bounds (2.78) for the components of the POP and determine lower and upper bounds for the additional
variables according to the substitution relations. For instance, for x_{n+1} = x_i^2 the bounds are defined as

    lbd_{n+1} = 0                          if lbd_i ≤ 0 ≤ ubd_i,
    lbd_{n+1} = min(lbd_i^2, ubd_i^2)      otherwise,                                    (2.79)
    ubd_{n+1} = max(lbd_i^2, ubd_i^2).
In 2.3.3 we will discuss the sensitivity of the accuracy of the SDP solution to the choice of the lower and
upper bounds for some example POPs; a sketch of the bound propagation (2.79) follows below.
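In code, this bound propagation looks as follows (a Python sketch under our own conventions; the rule for a product x_{n+1} = x_i x_j of distinct variables is the standard interval-arithmetic analogue of (2.79) and is our assumption, as the text spells out only the square case):

    def propagate_bounds(lbd_i, ubd_i, lbd_j=None, ubd_j=None):
        """Bounds for the additional variable of a substitution:
        x_new = x_i^2 when one interval is given (rule (2.79)),
        x_new = x_i * x_j otherwise (interval arithmetic, our assumption)."""
        if lbd_j is None:
            lbd = 0.0 if lbd_i <= 0.0 <= ubd_i else min(lbd_i**2, ubd_i**2)
            return lbd, max(lbd_i**2, ubd_i**2)
        corners = (lbd_i*lbd_j, lbd_i*ubd_j, ubd_i*lbd_j, ubd_i*ubd_j)
        return min(corners), max(corners)

    print(propagate_bounds(-1.0, 1.0))            # (0.0, 1.0) for x_new = x_i^2
    print(propagate_bounds(-1.0, 1.0, 0.0, 2.0))  # (-2.0, 2.0) for x_new = x_i x_j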
A more sophisticated technique to increase the quality of the SDP relaxation of the QOP is inspired by
a Branch-and-Cut algorithm for bilinear matrix inequalities due to Fukuda and Kojima [22]. As nonconvex
quadratic constraints can be reduced to bilinear ones, we are able to adapt this technique to a QOP derived
from a higher degree POP. The technique is based on cutting the feasible region of the SDP such that every
feasible solution of the QOP remains feasible for the SDP. We distinguish two sets of constraints, which
resemble the convex relaxations (5) proposed in [22]. Let (p, g_1, ..., g_m) be a QOP with lower and upper
bounds lbd_i and ubd_i for all components x_i (i = 1, ..., n). The first set of constraints is the following.
For each constraint g_i (i = 1, ..., m) of the form x_k = x_i x_j with i ≠ j, we add the constraints

    x_k ≤ ubd_j x_i + lbd_i x_j − lbd_i ubd_j,
    x_k ≤ lbd_j x_i + ubd_i x_j − ubd_i lbd_j                                            (2.80)

to the QOP. For each constraint of the form x_k = x_i^2 we add the constraint

    x_k ≤ (ubd_i + lbd_i) x_i − lbd_i ubd_i.                                             (2.81)

The second set of constraints shrinks the feasible set of the SDP relaxation even further than the constraints
(2.80) and (2.81). For each monomial x_i x_j of degree 2 which occurs in the objective p or one of the
constraints g_i (i = 1, ..., m) of the QOP, we add constraints as follows. If the QOP contains a constraint
g_i (i = 1, ..., m) of the form x_k = x_i x_j, we add the constraints (2.80) for i ≠ j and (2.81) for i = j. If the
QOP does not contain a constraint x_k = x_i x_j, we add the quadratic constraints

    x_i x_j ≤ ubd_j x_i + lbd_i x_j − lbd_i ubd_j,
    x_i x_j ≤ lbd_j x_i + ubd_i x_j − ubd_i lbd_j                                        (2.82)

for i ≠ j, and the constraint

    x_i^2 ≤ (ubd_i + lbd_i) x_i − lbd_i ubd_i                                            (2.83)

for i = j. When linearized, both the linear constraints (2.80) and (2.81) and the quadratic constraints
(2.82) and (2.83) result in a smaller feasible region of the SDP relaxation which still contains the feasible
region of the QOP. The efficiency of these sets of additional constraints is demonstrated in Section 2.3.3 as
well; a small sketch generating the cuts (2.80) and (2.81) follows below.
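A small Python sketch generating the linear BC-bounds (our own illustration; each cut is returned as a coefficient dictionary and a right-hand side for a constraint Σ_v c_v x_v ≤ rhs):

    def linear_bc_bounds(k, i, j, lbd, ubd):
        """Cuts (2.80) for i != j and (2.81) for i == j; each cut is a pair
        (coeffs, rhs) meaning sum(coeffs[v] * x_v) <= rhs."""
        if i != j:
            return [
                ({k: 1.0, i: -ubd[j], j: -lbd[i]}, -lbd[i] * ubd[j]),   # (2.80a)
                ({k: 1.0, i: -lbd[j], j: -ubd[i]}, -ubd[i] * lbd[j]),   # (2.80b)
            ]
        return [({k: 1.0, i: -(ubd[i] + lbd[i])}, -lbd[i] * ubd[i])]    # (2.81)

    lbd, ubd = {1: -1.0, 2: 0.0}, {1: 1.0, 2: 2.0}
    for cut, rhs in linear_bc_bounds(3, 1, 2, lbd, ubd):
        print(cut, "<=", rhs)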
Remark 2.8 A general QOP given by (p, g_1, ..., g_m) of dimension n can be transformed into a QOP of
dimension n + 1 with a linear objective function by adding the equality constraint h(x, x_{n+1}) := x_{n+1} − p(x) = 0
and choosing x_{n+1} as objective. A QOP with linear objective is a special case of a quadratic SDP (2.64),
to which we can apply the SDP relaxations (a)–(d) from 2.2.7. Thus, an arbitrary POP can be attacked by
three different SDP relaxations: 1) the sparse SDP relaxations (2.18) exploiting correlative sparsity applied
directly to the POP, 2) the sparse SDP relaxations (2.18) applied to an equivalent QOP, and 3) the SDP
relaxations (a)–(d) from 2.2.7 exploiting d- and/or r-space sparsity applied to an equivalent QOP.
Remark 2.9 The constraints (2.82) are a particular case of reformulation-linearization techniques
(RLT) [89, 90]. For the SDP relaxation (b) from 2.2.7, instead of the constraints (2.80)–(2.83) we impose
RLT constraints in the following way: add the constraints

    W_{i,j} − lbd_i W_{1,j} − lbd_j W_{1,i} ≥ −lbd_i lbd_j,
    W_{i,j} − ubd_i W_{1,j} − ubd_j W_{1,i} ≥ −ubd_i ubd_j,
    W_{i,j} − lbd_i W_{1,j} − ubd_j W_{1,i} ≤ −lbd_i ubd_j,                              (2.84)
    W_{i,j} − lbd_j W_{1,i} − ubd_i W_{1,j} ≤ −lbd_j ubd_i

to the SDP relaxation (b), if {i, j} is a subset of some clique of the chordal extension of the d-space sparsity
pattern graph. The constraints (2.84) strengthen the SDP relaxation and preserve the d-space sparsity
structure at the same time. In the following, whenever we apply the SDP relaxation (b) from 2.2.7 to a QOP,
we impose the constraints (2.84).
2.3.3 Numerical examples
The substitution procedure and the sparse SDP relaxations are applied to a number of test problems,
encompassing medium and large scale POPs of higher degree. The numerical performance of the sparse SDP
relaxations of these POPs under the transformation algorithm is evaluated.
Substitution     ñ    size(A_q)      nnz(A_q)
AI              138   [777, 6934]      7106
AII             153   [828, 6922]      7116
BI              115   [753, 5785]      5934
BII             119   [788, 6497]      6653

Table 2.7: Size of sSDP_1 for QOPs from POP randEQ(7,3,5,8,0) with n = 7.
In the following, the Branch-and-Cut bounds (2.80) and (2.81) are denoted as linear BC-bounds, and (2.82)
and (2.83) as quadratic BC-bounds. The optional application of sequential quadratic programming starting
from the solution of the SDP relaxation is abbreviated as SQP. Given a numerical solution x of an equality
and inequality constrained POP, its scaled feasibility error is given by

    ε_sc = min { −|h_i(x)/σ_i(x)|, min{g_j(x)/σ̂_j(x), 0} : i = 1, ..., k, j = 1, ..., l },

where h_i (i = 1, ..., k) denote the equality constraints, g_j (j = 1, ..., l) the inequality constraints, and σ_i
and σ̂_j are the maxima of the monomials of the corresponding polynomials h_i and g_j evaluated at x,
respectively. Note that an equality and inequality constrained POP is a special case of the POP (2.1) if we
define f_i := g_i (i = 1, ..., l), g_i := h_i (i = l + 1, ..., l + k) and g_i := −h_i (i = k + l + 1, ..., 2k + l). The value
of the objective function at x is given by f_0(x). Let NC denote the number of constraints of a POP. 'OOM'
as entry for the scaled feasibility error indicates that the SDP is too large to be solved by SeDuMi [95] and
results in a memory error ('Out of memory'). A two-component entry for lbd or ubd indicates that the
first component is used as a bound for the first n/2 variables and the second component for the remaining
n/2 variables of the POP.
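A Python sketch of this error measure (our own illustration with a hypothetical constraint representation: each constraint is a list of (coefficient, monomial function) terms, and we read the scaling σ as the largest term magnitude at x):

    def scaled_feasibility_error(x, eqs, ineqs):
        """eps_sc of a numerical solution x. Each constraint is a list of
        (coefficient, monomial_function) terms; sigma is the largest term
        magnitude at x (our reading of 'maxima of the monomials')."""
        def value_and_scale(terms):
            vals = [c * m(x) for c, m in terms]
            return sum(vals), max(abs(v) for v in vals)
        errs = []
        for h in eqs:                     # equality constraints h_i(x) = 0
            v, s = value_and_scale(h)
            errs.append(-abs(v / s))
        for g in ineqs:                   # inequality constraints g_j(x) >= 0
            v, s = value_and_scale(g)
            errs.append(min(v / s, 0.0))
        return min(errs)

    # x_1 x_2 - 1 = 0 and x_1 >= 0 at x = (2.0, 0.5): both satisfied.
    h = [(1.0, lambda x: x[0] * x[1]), (-1.0, lambda x: 1.0)]
    g = [(1.0, lambda x: x[0])]
    print(scaled_feasibility_error((2.0, 0.5), [h], [g]))   # -0.0, i.e. feasible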
All numerical experiments are conducted on a Linux OS with a 2.4 GHz CPU and 8 GB memory. The
total processing time in seconds is denoted as t_C.
Randomly generated POPs
As a first class of test problems, consider randomly generated POPs with inequality or equality constraints.
We are interested in the numerical performance of the sparse SDP relaxation for the corresponding QOPs
under different substitution strategies and different choices of lower, upper and Branch-and-Cut bounds. We
focus on comparing strategies BI and BII, as they yield QOPs with a small number of additional variables.
For the random equality constrained POP randEQ(7,3,5,8,0) [102] of degree 8 with 7 variables, the
size of the SDP relaxation sSDP_4 is described by the matrix A_p of size [2870, 95628] with 124034 non-zero
entries. This size is reduced substantially under each of the four substitution strategies, as can be seen
in Table 2.7, where the matrix A_q in SeDuMi input format [95] and its number of nonzero entries
nnz(A_q) describe the size of the sparse SDP relaxation. The reduction of the size of the SDP relaxation
reduces the total processing time t_C by two orders of magnitude, as can be seen in Table 2.8.

Moreover, as reported in Table 2.8, the performance of AI, AII, BI and BII is similar, with BI
slightly better than the others. In this example with few equality constraints it is easy
to obtain a feasible solution, but additional techniques such as SQP are required to obtain an optimal
solution. We know that min(sSDP_1(QOP)) and min(sSDP_4(POP)) are lower bounds for min(POP), and
min(sSDP_1(QOP)) ≤ min(sSDP_4(POP)) holds by Proposition 2.1. As reported in Table 2.8, the bound
provided by sSDP_1(QOP) is much weaker than the one provided by sSDP_4(POP). Note that the objective
value f_0(x) and min(sSDP_1(QOP)) improve significantly if the lower and upper bounds are chosen tighter.
When chosen sufficiently tight, an accurate optimal solution can be achieved without applying SQP. The
main advantage of the transformation is the reduction of the total processing time by two orders of magnitude.
The results for the inequality constrained POP randINEQ(8,4,6,8,0) [102] with ω_max = 4 and 8 variables
are given in Table 2.9.
Substitution  SQP  BC-bounds  (lbd, ubd)    ω   n or ñ   NC    ε_sc    min(sSDP_ω)  f_0(x)   t_C
   -          no   none       (−∞, ∞)       4      7      4    6e-11      -0.708    -0.708   333
   AI         no   none       (-1, 1)       1    138    135    1e-13    -247.50     -0.508     3
   AII        no   none       (-1, 1)       1    153    150    1e-13    -254.92     -0.517     3
   BI         no   none       (-1, 1)       1    115    112    1e-13    -299.11     -0.567     2
   BII        no   none       (-1, 1)       1    119    116    1e-13    -284.98     -0.455     3
   BI         yes  none       (-1, 1)       1    115    112    7e-18    -299.11     -0.708     3
   BI         no   none       (-0.5, 0.5)   1    115    112    9e-14      -6.55     -0.706     3
   BI         no   none       (-0.3, 0.3)   1    115    112    1e-13      -1.28     -0.708     2

Table 2.8: Results for SDP relaxation of randEQ(7,3,5,8,0).
In the column for (lbd, ubd), the entries (−0.5, 0.5)⋆ and (−0.5, 0.5)⋆⋆ denote ubd_2 = 0.75 ≠ 0.5 and
(ubd_2, ubd_5) = (0.75, 0) ≠ (0.5, 0.5), respectively. By imposing linear Branch-and-Cut bounds we obtain a
feasible solution, and tightening lbd and ubd improves the objective value of the approximate solution.
Though we did not achieve the optimal value attained by sSDP(POP), it seems reasonable to expect that
successively tightening the bounds further yields a feasible solution with optimal objective value. As in the
previous example, t_C is reduced by two orders of magnitude.
Substitution  SQP  BC-bounds  (lbd, ubd)       ω   n or ñ   NC    ε_sc   f_0(x)   t_C
   -          no   none       (−∞, ∞)          4      8      3      0     -1.5    1071
   BI         no   none       (−0.75, 0.75)    1    239    234   -1.3     -0.9      14
   BI         no   linear     (−0.75, 0.75)    1    239    680      0     -0.6      17
   BI         no   linear     (−0.5, 0.5)⋆     1    239    680      0     -0.8      17
   BI         no   linear     (−0.5, 0.5)⋆⋆    1    239    680      0     -1.2      16

Table 2.9: Results for SDP relaxation of randINEQ(8,4,6,8,0).
BroydenBand
Another test problem is the BroydenBand(n) problem [66]. It is an unconstrained POP of degree 6 and
dimension n, and its global minimum is 0. Numerical results are given in Table 2.10. The performance
of the sparse SDP relaxation for the QOP with initial bounds and without applying SQP is poor; the
optimal value of the approximate solution and the lower bounds min(sSDP_1(QOP)) are far from the global
optimum. Also, SQP does not succeed in detecting the global optimum if started from an arbitrary point.
As reported in Table 2.10, SQP detects a local minimizer with objective value 3 if the initial point is
an SDP solution with loose bounds for the QOP. It is interesting to observe that tight lower and upper
bounds, and Branch-and-Cut bounds in combination with applying SQP, are crucial to obtaining the global
minimum by solving the sparse SDP relaxation for the QOP. In fact, when we apply substitution strategy
BI, imposing quadratic Branch-and-Cut bounds yields the global minimum, whereas in the case of BII
Branch-and-Cut bounds are not necessary to obtain the global minimum. Note that the total processing
time is reduced from around 1300 seconds to less than 5 seconds.
POPs derived from partial differential equations
An important class of large scale polynomial optimization problems of higher degree is derived from
discretizing systems of partial differential equations (PDEs). How to derive POPs from PDEs and how to
interpret their solutions is the topic of Chapter 3 and discussed in detail there.
Substitution  SQP  BC-bounds  (lbd, ubd)   ω   n or ñ    NC    min(sSDP_ω)  f_0(x)   t_C
   -          no   none       (−∞, +∞)     3     20       0       -3e-7      5e-9    1328
   BII        yes  none       (-1, 1)      1     60      40       -128       3          4
   BII        yes  linear     (-1, 1)      1     60     100       -128       3          4
   BII        yes  quadratic  (-1, 1)      1     60    1244       -106       3          4
   BI         no   none       (-0.75, 0)   1     80      60       -611      47          3
   BI         no   linear     (-0.75, 0)   1     80      60       -611      47          4
   BI         no   quadratic  (-0.75, 0)   1     80      60       -132      28          4
   BI         yes  none       (-0.75, 0)   1     80      60      -1396       3          4
   BI         yes  linear     (-0.75, 0)   1     80     140       -611       3          5
   BI         yes  quadratic  (-0.75, 0)   1     80    1284       -611      6e-8        5
   BII        no   none       (-0.75, 0)   1     60      40        -26      33          3
   BII        no   linear     (-0.75, 0)   1     60     100        -24      24          3
   BII        no   quadratic  (-0.75, 0)   1     60    1244         -8       9          4
   BII        yes  none       (-0.75, 0)   1     60      40        -26      1e-10       5
   BII        yes  linear     (-0.75, 0)   1     60     100        -24      1e-6        4
   BII        yes  quadratic  (-0.75, 0)   1     60    1244         -8      2e-7        5

Table 2.10: Results for SDP relaxation for BroydenBand(20).
In this section we show the transformation procedure from POP to QOP to be a very efficient technique
for this class of POPs. Many POPs derived from PDEs are of degree 3, but as the number of their constraints
is of the same order as the number of variables, transformation into QOPs yields SDP relaxations of vastly
reduced size. Due to the structure of the higher degree monomial set of these POPs, there is a unique way
to transform them into QOPs. Therefore, we examine the impact of lower, upper and Branch-and-Cut
bounds and not the choice of the substitution strategy.
POP                   n     ñ    ω_p   size(A_p)            nnz(A_p)   ω_q   size(A_q)          nnz(A_q)
pdeBifurcation(6)      36    72    2   [2186, 17605]          23801      1   [422, 4039]           4174
pdeBifurcation(10)    100   200    2   [16592, 139245]       189737      1   [1643, 18646]        19039
pdeBifurcation(14)    196   392    2   [454497, 3822961]    5208475      1   [4126, 45189]        46000
Mimura(50)            100   150    2   [3780, 31258]          39068      1   [690, 5728]           6078
Mimura(50)            100   150    3   [19300, 280007]       354067      2   [7223, 76383]        91755
Mimura(100)           200   300    3   [39100, 565357]       713767      2   [14623, 155183]     186155
Mimura(100)           200   300    2   [7630, 63158]          78818      2   [1390, 11628]        12328
StiffDiff(6,12)       144   216    2   [18569, 163162]       219020      1   [878, 6700]           7402
ginzOrDiri(9)         162   324    2   [74628, 666987]       906558      1   [4567, 49305]        50233
ginzOrNeum(11)        242   484    2   [166092, 1451752]    2504418      1   [8063, 96367]        97776

Table 2.11: Size of the SDP relaxation for POP and QOP, respectively.
Consider the POPs in Table 2.11, where ω_p and ω_q denote the relaxation orders of sSDP_ω for the POP and
the QOP, respectively, to see the reduction of the size of the SDP relaxation, described by the size of the
matrix A in SeDuMi input format [95] and its number of nonzero entries nnz(A). The SDP relaxations
for the QOPs can be solved in vastly shorter time than those for the original POPs. The computational
results of the original SDP relaxation and the SDP relaxation of the QOPs for different lower, upper and
Branch-and-Cut bounds are reported in Table 2.12 for the POP pdeBifurcation(·). In this example the
accuracy of the sparse SDP relaxation for the QOP is improved by tightening the upper bounds for the
components of the variable x̃ in the QOP. Also, the additional application of SQP improves the accuracy
considerably, while additional Branch-and-Cut bounds seem to have no impact on the quality of the solution.
The total processing time is reduced substantially under the transformation. The original sparse SDP
relaxation for pdeBifurcation(14) cannot be solved by SeDuMi due to a memory error, but the SDP
relaxation for the corresponding QOP with tight upper bounds can be solved accurately in about 100 seconds.
POP                  Subst.  SQP  BC-bounds  ubd    ω   n or ñ    ε_sc       f_0(x)   t_C
pdeBifurcation(6)      -     no   none       0.99   2     36      8e-11        -9.0     14
pdeBifurcation(6)      AI    no   none       0.99   1     72      9.6e-2      -22.1      2
pdeBifurcation(6)      AI    no   linear     0.99   1     72      9.6e-2      -22.1      2
pdeBifurcation(6)      AI    no   quadratic  0.99   1     72      9.6e-2      -22.1      2
pdeBifurcation(6)      AI    yes  none       0.99   1     72      7.3e-9       -9.0      5
pdeBifurcation(6)      AI    no   none       0.45   1     72      1.5e-2       -9.5      1
pdeBifurcation(6)      AI    yes  none       0.45   1     72      1.4e-11      -9.0      2
pdeBifurcation(10)     -     no   none       0.99   2    100      3.1e-10     -21.6   2159
pdeBifurcation(10)     AI    no   none       0.99   1    200      4.7e-2      -56.0     20
pdeBifurcation(10)     AI    yes  none       0.99   1    200      2.7e-13     -21.6     66
pdeBifurcation(10)     AI    no   none       0.45   1    200      6.4e-3      -23.2     13
pdeBifurcation(10)     AI    yes  none       0.45   1    200      1e-11       -21.6     22
pdeBifurcation(14)     -     no   none       0.99   2    196      OOM            -       -
pdeBifurcation(14)     AI    no   none       0.99   1    392      2.4e-2     -103.1     90
pdeBifurcation(14)     AI    yes  none       0.99   1    392      7.9e-14     -39.9    418
pdeBifurcation(14)     AI    no   none       0.45   1    392      3.6e-3      -43.1     85
pdeBifurcation(14)     AI    yes  none       0.45   1    392      5.2e-11     -39.9    107

Table 2.12: Results for SDP relaxation for POP pdeBifurcation with lbd = 0.
In the case of POP Mimura(50), cf. Table 2.13, quadratic Branch-and-Cut bounds are necessary in
addition to applying SQP in order to obtain an accurate approximation of the global minimizer. For
the POPs in Table 2.14 it is sufficient to apply SQP starting from the solution of the sparse SDP relaxation
for the QOP. For these problems t_C is reduced by up to two orders of magnitude. Furthermore, the original
SDP relaxations for ginzOrDiri(9) and ginzOrDiri(13) are too large to be solved, whereas the SDP relaxations
for the QOPs are tractable.
POP           Subst.  SQP  BC-bounds  ubd       ω   n or ñ    ε_sc      f_0(x)   t_C
Mimura(50)      -     no   none       [11, 14]  2    100      1.8e-1     -899      20
Mimura(50)      -     yes  none       [11, 14]  2    100      4.1e-9     -701      31
Mimura(50)      AI    no   none       [11, 14]  1    150      6.1e-1    -1067       2
Mimura(50)      AI    yes  none       [11, 14]  1    150      5.1e-3     -731     163
Mimura(50)      AI    no   quadratic  [11, 14]  1    150      3.3e-1    -1017       2
Mimura(50)      AI    yes  quadratic  [11, 14]  1    150      1.0e-13    -719      16
Mimura(100)     -     no   none       [11, 14]  3    200      4.5e-2     -733     532
Mimura(100)     -     yes  none       [11, 14]  3    200      2.0e-11    -712     557

Table 2.13: Results for SDP relaxation for POP Mimura with lbd = [0, 0].
The POP ginzOrNeum(·) in Table 2.15 is another example where the global optimizer can be found
in a processing time reduced by a factor of 100 if the lower bounds lbd and upper bounds ubd are chosen
sufficiently tight and SQP is applied. In Table 2.13 and Table 2.15 the first components of lbd and ubd
correspond to the lower and upper bounds for (x_1, ..., x_{n/2}), respectively, whereas the second components
correspond to the lower and upper bounds for (x_{n/2+1}, ..., x_n).
POP              Subst.  SQP  ubd   ω   n or ñ    ε_sc     f_0(x)   t_C
ginzOrDiri(5)      -     no   0.6   2     50      6e-6      -25     598
ginzOrDiri(5)      -     yes  0.6   2     50      4e-15     -25     598
ginzOrDiri(5)      AI    no   0.6   1    100      3e-1     -100       7
ginzOrDiri(5)      AI    yes  0.6   1    100      4e-11     -22      10
ginzOrDiri(9)      -     no   0.6   2    162      OOM         -       -
ginzOrDiri(9)      AI    no   0.6   1    324      1e-1     -324     144
ginzOrDiri(9)      AI    yes  0.6   1    324      6e-12     -72     185
ginzOrDiri(13)     -     no   0.6   2    338      OOM         -       -
ginzOrDiri(13)     AI    yes  0.6   1    676      7e-9     -158    1992
StiffDiff(4,8)     -     yes  5     2     64      2e-11     -32      54
StiffDiff(4,8)     AI    yes  5     2     96      7e-10     -32       4
StiffDiff(6,12)    -     yes  5     1    144      4e-9      -71    1008
StiffDiff(6,12)    AI    yes  5     1    216      8e-10     -71      48

Table 2.14: Results for SDP relaxation for POP ginzOrDiri with lbd = 0 and StiffDiff with lbd = 0.
POP              Subst.  SQP  lbd       ubd       ω   n or ñ    ε_sc     f_0(x)   t_C
ginzOrNeum(5)      -     no   [0, 0]    [4, 2]    2     50       2.6       -47     448
ginzOrNeum(5)      -     yes  [0, 0]    [4, 2]    2     50       2e-13     -45     449
ginzOrNeum(5)      AI    no   [0, 0]    [4, 2]    1    100       24       -100       9
ginzOrNeum(5)      AI    yes  [0, 0]    [4, 2]    1    100       8e-10     -45      10
ginzOrNeum(5)      -     no   [1, 0.5]  [4, 1.5]  2     50       1e-1      -45     582
ginzOrNeum(5)      -     yes  [1, 0.5]  [4, 1.5]  2     50       2e-13     -45     583
ginzOrNeum(5)      AI    no   [1, 0.5]  [4, 1.5]  1    100       6e-2      -57       6
ginzOrNeum(5)      AI    yes  [1, 0.5]  [4, 1.5]  1    100       4e-10     -45       7
ginzOrNeum(11)     -     no   [1, 0.5]  [4, 1.5]  2    242       OOM         -       -
ginzOrNeum(11)     AI    no   [1, 0.5]  [4, 1.5]  1    484       4e-2     -263     740
ginzOrNeum(11)     AI    yes  [1, 0.5]  [4, 1.5]  1    484       5e-11    -207     748

Table 2.15: Results for SDP relaxation for POP ginzOrNeum.
Chapter 3

SDP Relaxations for Solving Differential Equations

3.1 Numerical analysis of differential equations
Differential equations arise in models of many problems in engineering, physics, chemistry, biology and
economics. Only the simplest differential equations admit solutions given by explicit formulas; in most
problems involving differential equations, self-contained formulas for the solutions are not available.
Therefore, one is interested in finding approximations to their solutions by applying numerical methods.
In general we distinguish ordinary differential equations (ODEs), where the unknown function is a function
of a single variable, and partial differential equations (PDEs), where the unknown function is a function of
multiple independent variables and the equation involves its partial derivatives. Moreover, we distinguish
linear and nonlinear differential equations: a differential equation is linear if the unknown function and its
derivatives appear to the power one, and nonlinear otherwise.
The beginning of the numerical analysis of ODEs dates back to 1850, when the Adams formulas were proposed, which are based on polynomial interpolation in equally spaced points. The idea is, given some initial value problem with ODE u′ = f(t, u) for t > t0 and u(t0) = u0, we choose a time step ∆t > 0 and consider a finite set of time values

tn = t0 + n∆t, n ≥ 0.

We then replace the ODE by an algebraic expression that enables us to calculate a succession of approximate values

vn ≈ u(tn), n ≥ 0,

where the simplest such approximation formula dates back to Euler:

vn+1 = vn + ∆t f(tn, vn) = vn + ∆t fn,   fn := f(tn, vn).

The Adams formulas are higher order generalizations of Euler's formula that are far more efficient at generating accurate approximate solutions. For instance, the fourth-order Adams-Bashforth formula is

vn+1 = vn + (∆t/24) (55 fn − 59 fn−1 + 37 fn−2 − 9 fn−3).   (3.1)

The formula (3.1) is fourth order in the sense that it will normally converge at the rate O((∆t)^4). The second important class of ODE algorithms are the Runge-Kutta methods, which were developed at the beginning of the twentieth century [50, 86]. The most commonly used member of the family of Runge-Kutta methods is the fourth-order Runge-Kutta method, which advances a numerical solution from time step tn to tn+1 with the aid of four evaluations of the function f:

a = ∆t f(tn, vn),
b = ∆t f(tn + ½∆t, vn + ½a),
c = ∆t f(tn + ½∆t, vn + ½b),
d = ∆t f(tn + ∆t, vn + c),
vn+1 = vn + (1/6)(a + 2b + 2c + d).   (3.2)
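To make the schemes concrete, the following minimal Python sketch implements Euler's formula and the Runge-Kutta step (3.2); the model problem u′ = −u and all identifiers are illustrative choices, not part of this thesis.

import numpy as np

def euler_step(f, t, v, dt):
    # Euler's formula: v_{n+1} = v_n + dt * f(t_n, v_n)
    return v + dt * f(t, v)

def rk4_step(f, t, v, dt):
    # Classical fourth-order Runge-Kutta step, cf. (3.2)
    a = dt * f(t, v)
    b = dt * f(t + dt / 2, v + a / 2)
    c = dt * f(t + dt / 2, v + b / 2)
    d = dt * f(t + dt, v + c)
    return v + (a + 2 * b + 2 * c + d) / 6

# Model problem u' = -u, u(0) = 1, exact solution exp(-t).
f = lambda t, u: -u
for dt in (0.1, 0.05):
    v_e = v_rk = 1.0
    t = 0.0
    while t < 1.0 - 1e-12:
        v_e, v_rk = euler_step(f, t, v_e, dt), rk4_step(f, t, v_rk, dt)
        t += dt
    print(dt, abs(v_e - np.exp(-1.0)), abs(v_rk - np.exp(-1.0)))

Halving ∆t roughly halves the Euler error (first order) and reduces the RK4 error by a factor of about 16 (fourth order).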
Another seminal step in the numerical analysis of ODEs is the concept of stability due to Dahlquist [17]. He introduced what might be called the fundamental theorem of numerical analysis:

consistency + stability = convergence.

This theory is based on precise definitions of these three notions. Consistency is the property that the discrete formula has locally positive order of accuracy and thus models the right ODE. Stability is the property that discretization errors introduced at one time step cannot grow unboundedly at later time steps. Convergence is the property that the numerical solution converges to the correct result as ∆t → 0.
When it comes to the numerical analysis of PDEs, we distinguish three main classes of methods: Finite Difference Methods (FDM), Finite Element Methods (FEM) and Finite Volume Methods (FVM). The origin of the Finite Difference Method dates back to the paper [13] of Courant, Friedrichs and Lewy. A finite difference scheme is applied to formulate PDE problems as polynomial optimization problems in 3.2; we give an introduction to the FDM in 3.1.1. The FEM dates back to the 1960s and will be briefly introduced in 3.1.2. As in the case of ODEs, stability is a crucial issue in the numerical analysis of PDEs. The group around von Neumann and Lax discovered that some finite difference methods for PDEs were subject to catastrophic instabilities. The fundamental result linking convergence and stability of a finite difference scheme is the Lax equivalence theorem [57].
Numerical methods for finding approximate solutions to PDEs have successfully offered insights for important and difficult examples of PDE problems: the Schrödinger equation in chemistry, elasticity equations in structural mechanics, the Navier-Stokes equations in fluid dynamics, Maxwell's equations in telecommunications, Einstein's equations in cosmology, nonlinear wave equations in optical fibers, Black-Scholes equations in option pricing, and reaction-diffusion equations in biological systems. Because such a variety of nonlinear PDE problems arises in many disciplines of science and engineering, and solving them requires solving large-scale nonlinear algebraic systems, the numerical analysis of partial differential equations remains a very challenging field.
3.1.1 The finite difference method

All discretization based methods for solving differential equations aim at algebraizing the differential equation. In the finite difference method (FDM) [19, 65, 96] the most important step to algebraize the equation is achieved by replacing differentials by finite differences. In a first step the domain of the differential equation needs to be discretized. Note, the FDM requires the geometry of the domain to be simple. We will restrict ourselves to intervals [xmin, xmax] in the one-dimensional case, and to rectangular domains [xmin, xmax] × [ymin, ymax] in the two-dimensional case. We choose a discretization Nx or (Nx, Ny), respectively, and define hx = (xmax − xmin)/(Nx − 1), hy = (ymax − ymin)/(Ny − 1),

xi := xmin + (i − 1) hx,   yj := ymin + (j − 1) hy,   (i = 1, . . . , Nx; j = 1, . . . , Ny),
ui := u(xi),   ui,j := u(xi, yj),   (i = 1, . . . , Nx; j = 1, . . . , Ny),   (3.3)
where u denotes the unknown function in a differential equation. There are three choices to approximate the first derivative ux at some point xi:

ux(xi) ≈ (ui+1 − ui)/hx   (forward difference),
ux(xi) ≈ (ui − ui−1)/hx   (backward difference),   (3.4)
ux(xi) ≈ (ui+1 − ui−1)/(2hx)   (central difference).
An approximation to uxx is derived by successively forming (ux)x:

uxx(xi) ≈ (ui+1 − 2ui + ui−1)/hx².   (3.5)
Choosing these approximations, the question arises what errors are inherent in substituting differentials by finite differences. The following proposition provides a simple accuracy analysis.

Proposition 3.1 Let u be a four times continuously differentiable function on Ω = [xmin, xmax], with 0 ∈ Ω. It holds:

a) |(ui+1 − ui−1)/(2hx) − ux(xi)| ≤ (1/6) hx² max_{x∈Ω} |uxxx(x)|,

b) |(ui+1 − ui)/hx − ux(xi)| ≤ (1/2) hx max_{x∈Ω} |uxx(x)|,

c) |(ui − ui−1)/hx − ux(xi)| ≤ (1/2) hx max_{x∈Ω} |uxx(x)|,

d) |(ui+1 − 2ui + ui−1)/hx² − uxx(xi)| ≤ (1/12) hx² max_{x∈Ω} |uxxxx(x)|.
Proof: Without loss of generality set xi−1 := −hx, xi := 0, xi+1 := hx. Using Taylor's theorem we expand ui−1 and ui+1 around 0 and obtain

ui−1 = ui − hx ux(xi) + (1/2!) hx² uxx(xi) − (1/3!) hx³ uxxx(ξ1),   for some ξ1 ∈ [xi−1, xi],
ui+1 = ui + hx ux(xi) + (1/2!) hx² uxx(xi) + (1/3!) hx³ uxxx(ξ2),   for some ξ2 ∈ [xi, xi+1].

Subtracting these two equations yields

(1/(2hx)) (ui+1 − ui−1) = ux(xi) + (1/(2 · 3!)) hx² [uxxx(ξ1) + uxxx(ξ2)],

and implies a). Now, expand ui+1 around 0 with second order remainder:

ui+1 = ui + hx ux(xi) + (1/2!) hx² uxx(ξ1),   for some ξ1 ∈ [xi, xi+1].

This equation implies b):

|(1/hx)(ui+1 − ui) − ux(xi)| ≤ (1/2) hx max_{x∈Ω} |uxx(x)|.   (3.6)

c) is shown analogously to b). As for d), expand ui+1 and ui−1 around xi = 0 with fourth order remainder:

ui−1 = ui − hx ux(xi) + (1/2!) hx² uxx(xi) − (1/3!) hx³ uxxx(xi) + (1/4!) hx⁴ uxxxx(ξ1),   for some ξ1 ∈ [xi−1, xi],
ui+1 = ui + hx ux(xi) + (1/2!) hx² uxx(xi) + (1/3!) hx³ uxxx(xi) + (1/4!) hx⁴ uxxxx(ξ2),   for some ξ2 ∈ [xi, xi+1].

Addition of these two equations yields

(ui+1 − 2ui + ui−1)/hx² − uxx(xi) = (1/4!) hx² (uxxxx(ξ1) + uxxxx(ξ2)),

which implies d).
Proposition 3.1 implies that, if uxxx(x) is bounded between xi−1 and xi+1, the error of replacing ux(xi) by the central difference scheme is of order hx², whereas the error of using a forward or a backward difference scheme is only O(hx).
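The rates in Proposition 3.1 are easy to observe numerically. The following sketch, with the arbitrary test function u(x) = sin x, compares the three difference quotients of (3.4):

import numpy as np

u, ux = np.sin, np.cos          # test function and its exact derivative
xi = 1.0
for hx in (1e-1, 1e-2, 1e-3):
    fwd = (u(xi + hx) - u(xi)) / hx
    bwd = (u(xi) - u(xi - hx)) / hx
    ctr = (u(xi + hx) - u(xi - hx)) / (2 * hx)
    print(hx, abs(fwd - ux(xi)), abs(bwd - ux(xi)), abs(ctr - ux(xi)))

Reducing hx by a factor of 10 reduces the forward and backward errors by about 10, and the central difference error by about 100, as predicted.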
In addition to discretizing the domain of a differential equation and approximating its differentials by finite difference schemes, one needs to take into account conditions for the unknown function on the boundary ∂Ω of the domain. The most common types of boundary conditions are Dirichlet and Neumann conditions. Given a boundary point x ∈ ∂Ω, its boundary condition is called Dirichlet if u(x) is fixed on ∂Ω, and it is called Neumann if the partial derivative ∂u(x)/∂n orthogonal to ∂Ω is fixed on ∂Ω. We consider periodic boundary conditions as a third type, which is given if values of u at the lower and upper end of its domain are identified. For example, in the one-dimensional case with Ω = [xmin, xmax], periodic boundary conditions are given by u(xmin) = u(xmax). A differential equation equipped with boundary conditions on the entire ∂Ω is called a boundary value problem. In the one-dimensional case we have Nx and in the two-dimensional case we have Nx Ny unknown variables ui and ui,j, respectively. Replacing the differential equation at each interior grid point by its finite difference discretization generates Nx − 2 and (Nx − 2)(Ny − 2) equations, respectively. By exploiting the relations between the boundary and the interior variables given by Dirichlet, Neumann or periodic boundary conditions and substituting them into the equations, we can reduce the number of variables to Nx − 2 and (Nx − 2)(Ny − 2), respectively. Thus, the number of variables coincides with the number of equations. In the case of a linear differential equation, finding the variables ui,j is therefore equivalent to solving a system of linear equations. In the case of a nonlinear differential equation, the far more challenging problem of a system of nonlinear algebraic equations needs to be solved.
As mentioned in the introduction of this section, in order to establish the convergence ui,j → u(xi, yj) (1 ≤ i ≤ Nx, 1 ≤ j ≤ Ny) for (Nx, Ny) → ∞, the notions of consistency and stability are crucial. Let

D(u(x, y)) = f(x, y)   ∀ (x, y) ∈ Ω,
B(u(x, y)) = g(x, y)   ∀ (x, y) ∈ ∂Ω,   (3.7)

be a differential equation, where D(·) and B(·) are differential operators and f, g functions on Ω. Applying a finite difference discretization to (3.7) yields the system of equations

Di,j((uk,l)k,l) = fi,j   ∀ (i, j) ∈ {2, . . . , Nx − 1} × {2, . . . , Ny − 1},
Bi,j((uk,l)k,l) = gi,j   ∀ (i, j) ∈ {1, Nx} × {1, . . . , Ny} ∪ {1, . . . , Nx} × {1, Ny},   (3.8)

where fi,j := f(xi, yj), gi,j := g(xi, yj), and Di,j and Bi,j are finite difference approximations for the operators D and B, respectively.
Definition 3.1 For the finite difference discretization (3.8) of the PDE problem (3.7) we define the local truncation error ri,j as

ri,j := Di,j((u(xk, yl))k,l) − fi,j,

where (u(xk, yl))k,l is the vector of values of the exact solution u of (3.7) at the grid points (xk, yl), which is approximated by the solution (uk,l)k,l of (3.8). The finite difference equation (3.8) is consistent with the original equation (3.7) if ri,j → 0 as (hx, hy) → (0, 0).
Consistency is a prerequisite for ui,j to converge to u(xi , yj ) as (hx , hy ) → 0, but it is not sufficient. We
need to introduce the notion of stability of a difference scheme for that purpose:
Definition 3.2 A finite difference scheme Di,j((uk,l)k,l) = fi,j for a first order PDE is stable if there are a J ∈ N and positive numbers hx0 and hy0 such that there exists a constant C for which

hx Σ_{l=1}^{Ny} |uk,l|² ≤ C hx Σ_{j=1}^{J} Σ_{l=1}^{Ny} |uj,l|²

for k ∈ {1, . . . , Nx}, 0 < hx ≤ hx0, and 0 < hy ≤ hy0.
A finite difference scheme Di,j((uk,l)k,l) = fi,j for a PDE which is second order in x is stable if there are a J ∈ N and positive numbers hx0 and hy0 such that there exists a constant C for which

hx Σ_{l=1}^{Ny} |uk,l|² ≤ (1 + k²) C hx Σ_{j=1}^{J} Σ_{l=1}^{Ny} |uj,l|²

for k ∈ {1, . . . , Nx}, 0 < hx ≤ hx0, and 0 < hy ≤ hy0.
For characterizing stability the following notion is useful.
Definition 3.3 An explicit finite difference scheme is any scheme that can be written as
uk+1,l = a finite sum of ur,s with r ≤ k.
A nonexplicit scheme is called implicit.
In general, it is the stability of a finite difference scheme which requires some effort to show, whereas consistency is straightforward. The oldest and most famous criterion for stability is the Courant-Friedrichs-Lewy condition:

Theorem 3.1 For an explicit scheme for the hyperbolic PDE defined by

ux + a uy = 0   (3.9)

of the form uk+1,l = α uk,l−1 + β uk,l + γ uk,l+1, a necessary condition for stability is the Courant-Friedrichs-Lewy (CFL) condition,

|a hx / hy| ≤ 1.

Proof: [13]
Moreover, Courant, Friedrichs and Lewy derived the general result:

Theorem 3.2 There are no explicit, unconditionally stable, consistent finite difference schemes for hyperbolic systems of partial differential equations (3.9).
There is no general negative result like Theorem 3.2 for implicit schemes. Thus, from a stability point of view, it is advisable to choose central or backward difference approximations for the first order derivatives. The result which links consistency, stability and convergence is the Lax-Richtmyer Equivalence Theorem:

Theorem 3.3 A consistent finite difference scheme for a linear partial differential equation of first or second order for which the initial value problem is well-posed is convergent if and only if it is stable.

Proof: [96]
Thus, convergence of a finite difference scheme is usually proven by showing stability. There is no general convergence result for finite difference schemes of nonlinear partial differential equations. However, for certain classes of PDEs convergence of the FDM has been shown:

Theorem 3.4 Let a parabolic PDE problem be given by

a(x, y) uxx + d(x, y) ux − uy + f(y, u) = 0   ∀ (x, y) ∈ (0, 1) × (0, T),
ux(0, y) = ux(1, y) = 0   ∀ y ∈ (0, T),
u(x, 0) = u0(x)   ∀ x ∈ (0, 1).

If a, ax, ay, d, dx and dy are continuous in [0, 1] × [0, T], there exists an a0 s.t. a ≥ a0 in [0, 1] × [0, T], f, fu and fuu are continuous in [0, T] × R, there exists M0 ∈ R s.t. ∂f/∂u ≤ M0 in [0, T] × R, some technical conditions hold, and the solution u of the PDE problem is smooth, then

(ui,j(Nx, Ny))i,j converges uniformly to u in [0, 1] × [0, T] as (Nx, Ny) → ∞.

Proof: Theorem 2.1 in [97].

Note, Theorem 3.4 can be extended to the case of arbitrary rectangular domains and arbitrary Dirichlet and Neumann conditions at xmin and xmax. To prove convergence for classes of elliptic or hyperbolic PDE problems is far more difficult, and no result for a broad class of problems like Theorem 3.4 has been found in those cases.
Some simple accuracy analysis of finite difference approximations was provided in Proposition 3.1, where we have seen that the accuracy of central differences is better than that of forward and backward differences. On the other hand, it is a well known phenomenon that central difference approximations for the first derivatives may cause oscillations in the numerical solution of a boundary value problem [65]. These oscillations do not occur under forward or backward difference schemes. However, we have also discussed that a forward finite difference scheme may not be stable, whereas implicit finite difference schemes are unconditionally stable for many classes of PDE problems. Thus, when choosing a difference scheme we have to consider accuracy, stability and how to avoid oscillations. It depends on the PDE problem which finite difference scheme is "the best" one. Therefore, numerous difference schemes have been proposed, which may use different difference approximations on certain segments of the domain, may use different difference approximations in the different dimensions, may depend on the type of the PDE or may depend on the type of boundary conditions. A detailed analysis of these issues for a variety of finite difference schemes for different classes of PDE problems is given in [96].
To summarize, when solving a linear or nonlinear ODE or PDE problem with the FDM, the three main tasks are 1) to choose a finite difference scheme whose solutions are accurate approximations of solutions of the PDE problem, 2) to show convergence of the difference scheme, and 3) to solve the (nonlinear) algebraic system of equations (3.8). To solve a system of nonlinear equations is a very hard problem in general. As mentioned in 2.1, solving a system of nonconvex polynomial equations is NP-hard. A standard method to solve the system of algebraic equations resulting from a finite difference approximation of a nonlinear PDE is to solve the system of equations corresponding to the linear part of the PDE, and to take the solution of the linear system as a starting point for gradient type methods, Newton's method or other iterative methods applied to the nonlinear system or to a system where the nonlinear part is successively increased. The eventual success of such a continuation type method is based on the assumption that the solution of the nonlinear system does not change much if the nonlinear part is increased by a small factor. In this thesis we attempt problem 3) by reformulating (3.8) as a polynomial optimization problem (POP) and solving this POP by the sparse semidefinite programming relaxation techniques introduced in Chapter 2. One of the main advantages of this approach is the fact that we do not require any initial guess and that the nonlinearity of the scheme is taken into account directly. This is presented in detail in 3.2.
3.1.2 The finite element method and other numerical solvers

The finite element method
In addition to the finite difference method, which our technique to be proposed in 3.2 is based on, we briefly introduce alternative methods for solving a PDE problem numerically. The most important one is the finite element method (FEM). The FEM is a very active field of research and there is exhaustive literature on it; for example, see [11, 59, 106] for detailed introductions and more advanced studies. The FEM is based on the idea to approximate a solution u of a PDE problem by a function ũ which is an element of a finite-dimensional subspace of the function space u belongs to. That is, the FEM can be understood as a method which discretizes the space in which we search for solutions of a PDE problem, whereas in the FDM the PDE itself is discretized. The origins of the FEM date back to works of Rayleigh [83], Ritz [84] and Galerkin [23] at the beginning of the 20th century. The FEM in its modern formulation is due to Courant [16] and Turner, Clough, Martin and Topp [99], among others. Typically, a FEM approach to solve a PDE consists of the following steps. First of all, one is looking for a solution u of a PDE problem defined on a domain Ω in a certain function space. The most common function space to this end is the Sobolev space H0^s(Ω) ⊂ L^s(Ω). Given that function space, one replaces the PDE problem by a weak, variational formulation where the test functions are elements of the same space H0^s(Ω). In a second step, known as meshing, the domain Ω is partitioned into a finite number of subdomains of simple geometry, which are called elements. We denote such a partition by T. In the one-dimensional case intervals, in the two-dimensional case triangles, and in the three-dimensional case tetrahedra are a common choice for the elements. These elements define a mesh for Ω with nd nodes. Then, in a third step, H0^s(Ω) is approximated by the nd-dimensional subspace which is spanned by functions f1, . . . , fnd. A common choice for this basis are, for instance, piecewise linear functions fi which equal one at the node i and are 0 at all other nodes. The larger nd, i.e., the finer we choose the mesh, the better span(f1, . . . , fnd) approximates the space H0^s(Ω). When replacing H0^s(Ω) by span(f1, . . . , fnd) in the weak formulation of the PDE problem and approximating u by ũ = Σ_{i=1}^{nd} di fi, one obtains a finite number of equations in the unknowns d1, . . . , dnd. Solving this system of equations yields the numerical approximation ũ for a solution of the original PDE problem. Finally, as for the Finite Difference Method, convergence of a finite element discretization needs to be shown, i.e., one has to show that span(f1, . . . , fnd) converges to a subspace dense in H0^s(Ω) if the number nd of nodes in the mesh and the corresponding number of basis functions goes to infinity. One of the biggest advantages of the FEM is its sound mathematical basis. As the PDE is formulated as a variational problem, one has many powerful tools from functional analysis at hand to prove convergence of a finite element discretization. Let us demonstrate the outlined procedure on the following simple ODE problem. Find u ∈ H0^2(Ω) such that
u″(x) = g(x)   on x ∈ Ω := [0, 1],
u(0) = u(1) = 0.   (3.10)
Its weak formulation is given by

∫0^1 u″(x) φ(x) dx = ∫0^1 g(x) φ(x) dx   ∀ φ ∈ H0^2(Ω),   (3.11)
and a partition is given by Th := {xi := i h | i ∈ {0, . . . , 1/h}} for any h > 0 with 1/h ∈ N. With this partition the nodes of the mesh are given by x1, . . . , x_{1/h−1}, i.e., nd = 1/h − 1. We define the finite dimensional subspace Vh = span(f1, . . . , fnd) of H0^2(Ω) via the basis functions fi : Ω → R with fi(xj) := δi,j and fi linear on each interval (xj, xj+1). Then, let ũ := Σ_{i=1}^{nd} di fi satisfy the finite-dimensional relaxation of the weak formulation (3.11):

∫0^1 (Σ_{i=1}^{nd} di f″i(x)) φ(x) dx = ∫0^1 g(x) φ(x) dx   ∀ φ ∈ Vh
⇔ Σ_{i=1}^{nd} di ∫0^1 f″i(x) fj(x) dx = ∫0^1 g(x) fj(x) dx   ∀ j ∈ {1, . . . , nd}
⇔ (dj−1 − 2dj + dj+1)/h = ∫0^1 g(x) fj(x) dx   ∀ j ∈ {1, . . . , nd},   (3.12)
which is a system of linear equations in d1, . . . , dnd. Solving it provides a numerical approximation ũ to a weak solution u of ODE (3.10). If we moreover assume g(x) = g constant on [0, 1], we obtain the system of equations

(dj−1 − 2dj + dj+1)/h² = g   ∀ j ∈ {1, . . . , nd}.   (3.13)

With the definition of the basis functions fi it is clear that in this example di = ũ(xi) holds for all i ∈ {1, . . . , nd}. Thus, (3.13) is actually identical to the system of equations we obtain when approximating (3.10) by a finite difference scheme. However, this connection with finite difference methods does not hold in general. The Finite Element Method provides the user with a great deal of freedom, such as how to choose the finite dimensional subspace to approximate H0^s(Ω) or the mesh for Ω, and for most other finite element discretizations there is no equivalent finite difference scheme.
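For constant g, the system (3.13) is tridiagonal and can be assembled and solved in a few lines. The sketch below (the values g = 1 and h = 0.1 are arbitrary) also verifies the nodal values against the exact solution u(x) = x(x − 1)/2 of (3.10):

import numpy as np

g, h = 1.0, 0.1                     # constant right-hand side, mesh width
nd = round(1 / h) - 1               # number of interior nodes
# Tridiagonal system (3.13): (d_{j-1} - 2 d_j + d_{j+1}) / h^2 = g
A = (np.diag(-2.0 * np.ones(nd))
     + np.diag(np.ones(nd - 1), 1)
     + np.diag(np.ones(nd - 1), -1)) / h**2
d = np.linalg.solve(A, g * np.ones(nd))
x = np.linspace(h, 1 - h, nd)
print(np.max(np.abs(d - x * (x - 1) / 2)))   # error at machine precision

Since the central second difference is exact for quadratic functions, the computed nodal values coincide with the exact solution up to rounding error for this model problem.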
When comparing the FEM to the FDM, we already mentioned its sound basis in functional analysis as one main advantage. Another one is that it is highly flexible with respect to the domains of PDE problems: complicated geometries can be dealt with easily, whereas the FDM is restricted to relatively simple geometries. On the other hand, the FDM is far easier to implement than the FEM for many PDE problems arising in applications. However, in both methods one of the greatest challenges is to solve a system of nonlinear algebraic equations, in the case they are applied to a system of nonlinear differential equations. In general it is highly dependent on the PDE problem to be solved numerically which method, FDM or FEM, provides a better approximation to the continuous world for a similar discretization.
Other numerical solvers

Beside the finite difference method and the finite element method, another important class of methods to solve differential equations numerically is the finite volume method (FVM). Similar to the FDM, values are calculated at discrete points on a mesh. In the FVM these values are derived from calculating volume integrals over small volumes around the node points of the mesh. The FVM applies the divergence theorem to convert volume integrals into surface integrals, and it exploits that a flux entering a given small volume is equal to that leaving the volume. Each integration over a volume results in an algebraic equation. Thus, as in the FDM and FEM, a system of algebraic equations needs to be solved to obtain a numerical solution. The FVM is popular for solving hyperbolic PDE problems, in particular in computational fluid dynamics. For a detailed introduction, see [58].
The spectral method [27] is a class of techniques involving the use of the Fast Fourier Transform. It is suitable for PDE problems with very smooth solutions and provides highly accurate approximations for these problems. It is based on replacing the unknown function in the PDE by its Fourier series to get a system of ODEs in the time-dependent coefficients of the Fourier series. The spectral method and the FEM share the idea of approximating the solution of a PDE by a linear combination of basis functions. The difference is that the basis functions in the spectral method are nonzero over the entire domain, whereas the basis functions of the FEM are nonzero on small subdomains only. For this reason, the spectral method can be understood as a global approximation approach, whereas the FEM constitutes a local approximation approach.
There are numerous other methods, such as multigrid methods, domain decomposition methods, level-set methods or meshfree methods. As PDE problems arise from very different settings and applications, it is highly dependent on the particular PDE which numerical method is most suitable for providing an accurate approximation. In all numerical methods a potentially hard system of algebraic equations needs to be solved. In the next section we will attempt this problem for the FDM by sparse semidefinite programming relaxation and polynomial optimization techniques.
Remark 3.1 We have seen that in each discretization-based method for nonlinear PDEs a system of algebraic equations needs to be solved. The classical tool for a system of nonlinear equations is Newton's method, which converges locally quadratically. However, Newton's method requires a starting point close to a solution of the system in order to converge. For difficult nonlinear problems it may be very challenging to find a good initial guess. There are various techniques to find a good initial guess for Newton's method or other locally fast convergent techniques. One is to apply gradient methods or other first order techniques to find a rough approximation of a solution. Another one is to apply some homotopy-like continuation method, where the nonlinear problem is linearized and a solution of the linear problem is taken as initial point for a problem where the weight of the nonlinear part is increased incrementally. Finally, in many problems partial information about the solution, which may be obtained by numerical simulation, is utilized to get a sufficiently close initial guess. We will show in the following that SDP relaxations for polynomial problems are very useful to find a good initial guess for locally fast convergent methods.
3.2 Differential equations and the SDPR method

3.2.1 Transforming a differential equation into a sparse POP
In the previous section we gave an introduction to the numerical analysis of differential equations. In
Chapter 2 we introduced polynomial optimization problems and their semidefinite programming (SDP)
relaxations. In this section we will see how to transform a problem involving differential equations into
a polynomial optimization problem (POP), in order to apply SDP relaxations to solve these differential
equations numerically. For discretizing a differential equation we choose the Finite Difference Method
(FDM). The FDM has the advantage of being easy to implement. Moreover, applying the FDM to a
differential equation yields a sparse POP, as we will show in this section. In the following we will restrict
ourselves to discussing the two-dimensional case. However, the procedures for differential equations with
domains of different dimension are derived analogously. Recall, a general differential equation is given by
D(u(x, y)) = f (x, y) ∀ (x, y) ∈ Ω,
B(u(x, y)) = g(x, y) ∀ (x, y) ∈ ∂Ω,
(3.14)
where D(·) and B(·) are differential operators and f, g functions on Ω := [xmin, xmax] × [ymin, ymax]. Applying the FDM yields the system of equations

Di,j((uk,l)k,l) = fi,j   ∀ (i, j) ∈ {2, . . . , Nx − 1} × {2, . . . , Ny − 1},
Bi,j((uk,l)k,l) = gi,j   ∀ (i, j) ∈ {1, Nx} × {1, . . . , Ny} ∪ {1, . . . , Nx} × {1, Ny},   (3.15)

where fi,j := f(xi, yj), gi,j := g(xi, yj), and Di,j and Bi,j are finite difference approximations for the operators D and B, respectively. We assume the operators D and B are both polynomial in u(·, ·) and its derivatives, which implies that (3.15) is a system of polynomial equations. Note, we can reduce the dimension of this system of equations by exploiting the boundary conditions. In their most basic form Dirichlet, Neumann and
periodic boundary conditions at xmin are given by

u1,j = g1,j,   u1,j = g1,j hx + u2,j   and   u1,j = uNx,j,   (3.16)

respectively. If we substitute the ui,j corresponding to the boundary grid points in (3.15) by the terms given in (3.16), the number of variables in each direction is reduced by 2 in the case of Dirichlet and Neumann boundary conditions, or by 1 in the case of periodic boundary conditions. Thus, (3.15) is reduced to a system of n equations in n variables,

D̂i,j((uk,l)k,l) − f̂i,j = 0   ∀ (i, j) ∈ {1, . . . , N̂x} × {1, . . . , N̂y},   (3.17)

where n := N̂x N̂y. For instance, under a Neumann condition in x direction and a periodic condition in y direction, n is given by n = N̂x N̂y = (Nx − 2)(Ny − 1). For simplicity of notation we will denote a solution of the original PDE problem (3.14) as u(·, ·) and a solution (uk,l)k,l of (3.17) as the vector u ∈ R^{N̂x N̂y}, with u = (u1,1, . . . , u1,N̂y, u2,1, . . . , uN̂x,N̂y).
As (3.15) is polynomial in the variable u, so is (3.17). We attempt to solve this system of equations by transforming it into a POP of the type

min p(u)   s.t.   gi(u) ≥ 0 ∀ i = 1, . . . , m,   hj(u) = 0 ∀ j = 1, . . . , k.   (3.18)

Note, (3.18) is a special case of (2.1), as an equality constraint h(x) = 0 is equivalent to the pair of inequality constraints h(x) ≥ 0 and −h(x) ≥ 0. Given a PDE problem (3.14), we take (3.17) derived from it as the system of equality constraints for an optimization problem. Moreover, we choose lower and upper bounds of type (2.25),

lbdi,j ≤ ui,j ≤ ubdi,j   ∀ (i, j) ∈ {1, . . . , N̂x} × {1, . . . , N̂y}.   (3.19)
Choosing an objective function

To derive a POP, it remains to choose an objective function F that is polynomial in u as well. The choice of an appropriate objective function depends on the PDE problem we are aiming to solve. In case there is at most one solution of the PDE problem, we are interested in the feasibility of the POP we construct. Thus any objective is a priori acceptable for that purpose. However, the accuracy of obtained solutions may depend on the particular objective function. In the case where the solution of the PDE problem is not unique, the choice of the objective function determines the particular solution to be found. The objective function may correspond to a physical quantity a solution needs to optimize. For instance, in problems in fluid dynamics one is often interested in finding a solution of minimal kinetic energy. For such a problem we obtain the objective function by discretizing the kinetic energy function. A large class of PDEs which occur in many applications can be written as Euler-Lagrange equations. A typical case is a stable state equation of reaction-diffusion type. In this case, a canonical choice is a discretization of the corresponding energy integral as in 3.3.1. Another case is optimal control: a finite difference discretization of state and control constraints yields the feasible set, and a discretization of the optimal value function yields the objective. We discuss numerical examples for optimal control problems in 3.3.6.
Additional polynomial constraints

We mentioned above that it is crucial to add lower and upper bounds (3.19) for each ui,j when constructing a POP from a differential equation. When choosing these bounds, care has to be taken. A choice of lbd and ubd which is too tight may exclude solutions of (3.17) from the feasible set of the POP, while a choice which is too loose may cause inaccurate results. In addition to those constraints we may impose further inequality or equality constraints

gl(u) ≥ 0 and hj(u) = 0,   (3.20)

respectively, with gl, hj ∈ R[u]. Constraints (3.20) can be understood as restrictions on the admissible space of functions in which we search for solutions of a differential equation. One possibility to obtain such bounds is to constrain the partial derivatives. We call bounds of this type variation bounds. For the derivative in x-direction they are given by

|∂u(xi, yj)/∂x| ≤ M   ∀ i ∈ {2, . . . , Nx − 1}, ∀ j ∈ {2, . . . , Ny − 1}.   (3.21)

Expression (3.21) can be transformed into polynomial constraints easily. Another possibility is to impose bounds in the spirit of [22] like (2.80), (2.81), (2.82) and (2.83) introduced in 2.3. Add

us,t ≤ ubdk,l ui,j + lbdi,j uk,l − lbdi,j ubdk,l,
us,t ≤ lbdk,l ui,j + ubdi,j uk,l − ubdi,j lbdk,l   (3.22)

for each constraint us,t = ui,j uk,l in the POP, and for each constraint us,t = u²i,j we add

us,t ≤ (ubdi,j + lbdi,j) ui,j − lbdi,j ubdi,j.   (3.23)

If there is no quadratic constraint of this type for (i, j, k, l) or (i, j), respectively, we may add the quadratic constraints

ui,j uk,l ≤ ubdk,l ui,j + lbdi,j uk,l − lbdi,j ubdk,l,
ui,j uk,l ≤ lbdk,l ui,j + ubdi,j uk,l − ubdi,j lbdk,l   (3.24)

for (i, j) ≠ (k, l), and the constraint

u²i,j ≤ (ubdi,j + lbdi,j) ui,j − lbdi,j ubdi,j   (3.25)

for (i, j) = (k, l). These inequalities are valid on the feasible set because, for instance, lbdi,j ≤ ui,j ≤ ubdi,j implies (ui,j − lbdi,j)(ubdi,j − ui,j) ≥ 0, which is exactly (3.25). Note, by the construction of the constraints (3.22) - (3.25), they shrink the feasible set of an SDP relaxation for a POP, but they do not change the feasible set of the POP. That is, they may be added to improve the numerical accuracy of SDP relaxations for solving the POP, but they have no impact on the space of functions in which we are searching for discrete approximations to a solution of a differential equation.
A POP derived from a differential equation

If we take together all constraints and the chosen objective function, we obtain the following POP:

min   F(u)
s.t.  Di,j(u) = fi,j   ∀ (i, j) ∈ {1, . . . , N̂x} × {1, . . . , N̂y},
      gl(u) ≥ 0   ∀ l ∈ {1, . . . , s},
      hk(u) = 0   ∀ k ∈ {1, . . . , m},
      lbdi,j ≤ ui,j ≤ ubdi,j   ∀ (i, j) ∈ {1, . . . , N̂x} × {1, . . . , N̂y}.   (3.26)

Every feasible solution u of (3.26) is a solution of the finite difference scheme for the PDE problem (3.14). Let us demonstrate how to derive (3.26) for an example.
Example 3.1 Consider the nonlinear elliptic PDE

uxx(x, y) + uyy(x, y) + λ u(x, y) (1 − u(x, y)²) = 0   ∀ (x, y) ∈ [0, 1]²,
u(x, y) = 0   ∀ (x, y) ∈ ∂[0, 1]²,
0 ≤ u(x, y) ≤ 1   ∀ (x, y) ∈ [0, 1]²,   (3.27)

where the parameter λ is set to λ = 22. We apply the standard finite difference discretization for N̂ = N̂x = N̂y, choose F(u) = −Σ_{1≤i,j≤N̂} ui,j as objective function, and obtain the POP

min   −Σ_{1≤i,j≤N̂} ui,j
s.t.  (1/hx²) (ui+1,j + ui,j+1 + ui,j−1 + ui−1,j − 4ui,j) + 22 ui,j (1 − u²i,j) = 0   ∀ (i, j) ∈ {1, . . . , N̂}²,
      0 ≤ ui,j ≤ 1   ∀ (i, j) ∈ {1, . . . , N̂}²,   (3.28)

where u0,k = uk,0 = uN̂+1,k = uk,N̂+1 = 0 for all k ∈ {1, . . . , N̂}. The choice of F is motivated by the fact that (3.27) is known to have one strictly positive solution and the trivial solution. The optimal solution of (3.28) is a discrete approximation to the strictly positive solution of (3.27).
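A minimal sketch of how the equality constraints of (3.28) can be assembled is the following; the function residual and the vectorized indexing are illustrative and not part of the SDPR software:

import numpy as np

def residual(u, N, lam=22.0):
    # Residual of the discretized PDE (3.27) on an N x N interior grid
    # with homogeneous Dirichlet values on the boundary.
    hx = 1.0 / (N + 1)                    # grid spacing including boundary
    U = np.zeros((N + 2, N + 2))          # embed u, boundary values = 0
    U[1:-1, 1:-1] = u.reshape(N, N)
    lap = (U[2:, 1:-1] + U[:-2, 1:-1] + U[1:-1, 2:] + U[1:-1, :-2]
           - 4 * U[1:-1, 1:-1]) / hx**2
    inner = U[1:-1, 1:-1]
    return (lap + lam * inner * (1 - inner**2)).ravel()

print(np.linalg.norm(residual(np.zeros(9), 3)))   # trivial solution: residual 0

Any u with 0 ≤ u ≤ 1 and residual(u, N) = 0 is feasible for (3.28); the objective F(u) = −Σ ui,j then selects the strictly positive solution.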
Correlative sparsity

In 3.2.2 we will introduce a method to solve (3.26) by sparse SDP relaxations. In order to apply this method efficiently, we need to show that (3.26) satisfies a structured sparsity pattern. To show correlative sparsity is straightforward:
Proposition 3.2 Let two differential operators D1 and D2 be given by

D1(u) := a(u) uxx + c(u) uyy + d(u) ux + e(u) uy + f̃(u)

and

D2(u) := a(u) uxx + b(u) uxy + c(u) uyy + d(u) ux + e(u) uy + f̃(u),

where a(·), . . . , f̃(·) are polynomial in the function u(·). Let N̂ = N̂x = N̂y and n := N̂². Let F be a linear function in u. Let nz(R) denote the number of nonzero entries in the CSP matrix R of the POP (3.26). Then

nz(R) ≤ 13n

if (3.26) is derived from (3.15) with D := D1, and

nz(R) ≤ 25n

if (3.26) is derived from (3.15) with D := D2, for any choice of B and f. This implies that (3.26) is correlatively sparse in both cases.
Proof: As F is linear, the objective function does not cause any nonzero entries in R by Definition 2.5. Due to the finite difference discretization, at most 12 unknowns uk,l can occur in some equality constraint with a particular unknown ui,j for D = D1, as pictured in Figure 3.1. Hence the maximum number of nonzero elements in the row of R corresponding to ui,j is 13, which implies nz(R) ≤ 13n. The same argument yields nz(R) ≤ 25n for D = D2; see Figure 3.1. These bounds are tight; they are attained in the case of periodic conditions for x and y.

Let R′ denote the n × n matrix corresponding to the graph G(N, E′), which is a chordal extension of the CSP graph G(N, E). For computational efficiency it is also useful to know whether R′ is sparse or not. nz(R′) depends on the employed ordering method P for R, which is used to avoid fill-ins in the symbolic sparse Cholesky factorization LL^T of the ordered matrix PRP^T; R′ is constructed as R′ = L + L^T. We examine two different methods of ordering R, the symmetric minimum degree (SMD) ordering and the reverse Cuthill-McKee (RCM) ordering. See [24] for details about these orderings.
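The effect of such an ordering on the Cholesky fill-in can be observed with standard tools. The following hedged sketch uses SciPy's reverse_cuthill_mckee on a scrambled banded matrix, a simple stand-in for an actual CSP matrix R:

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

n = 60
A = sum(np.eye(n, k=k) for k in (-2, -1, 0, 1, 2))   # banded test matrix
p0 = np.random.default_rng(0).permutation(n)
R = A[np.ix_(p0, p0)]                                 # scrambled ordering
perm = reverse_cuthill_mckee(csr_matrix(R), symmetric_mode=True)
Rp = R[np.ix_(perm, perm)]                            # RCM ordering
for M in (R, Rp):
    L = np.linalg.cholesky(M + n * np.eye(n))         # shift ensures positive definiteness
    print(np.count_nonzero(L))                        # fill-in before vs. after RCM

The RCM ordering recovers a small bandwidth, so the Cholesky factor L, and hence R′ = L + L^T, has far fewer nonzero entries than under the scrambled ordering.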
Figure 3.1: uk,l involved in some constraint with ui,j for D = D1 (left) and D = D2 (right).
Figure 3.2: nz(R′)/n for SMD (left) and RCM (right) ordering if D = D1.
We conduct some numerical experiments in order to estimate the behavior of nz(R′). Figure 3.3 shows examples of R′ after SMD and RCM ordering, and Figure 3.2 shows nz(R′)/n obtained by the SMD and RCM orderings for the n × n matrix R, respectively, for D = D1 and Dirichlet or Neumann condition in x and periodic condition in y. For n ∈ [100, 160000] it holds nz(R′)/n ≤ 300 for SMD ordering and nz(R′)/n ≤ 600 for RCM ordering, respectively. The behavior of nz(R′)/n may suggest nz(R′) = O(n) for both ordering methods. Hence we expect the sparse SDP relaxations to be efficient for solving (3.26) in numerical experiments. However, since the constants 300 and 600 are large, we cannot always expect a quick solution of the sparse SDP relaxation.
Domain-space sparsity

In 2.2 we introduced the concepts of domain-space and range-space sparsity of an optimization problem with matrix variables. Moreover, in 2.2.7 we constructed some linear SDP relaxations for quadratic SDPs exploiting this sparsity. A QOP is a special case of a quadratic SDP. As all constraints in a QOP are scalar, it does not satisfy a range-space sparsity pattern. Thus, if we transform (3.26) into a QOP, we can apply the relaxations (a) or (b) from 2.2.7 to find approximate solutions for (3.26). In order to apply these relaxations efficiently, the question arises whether QOPs derived from (3.26) satisfy domain-space sparsity. For certain classes of PDE problems, we obtain the following sparsity results.
Figure 3.3: R (left), R′ obtained by SMD (center) and RCM (right) orderings for D = D1 with n = 400.
Example 3.2 Given a rectangular domain Ω, f : Ω → R and a differential operator B(·), define the operators

D1(u) := a uxx + c uyy + d ux + e uy + g u + h u²,
D2(u) := a uxx + c uyy + d ux + e uy + g u + h u³,

with a, c, d, e, g, h : Ω → R. Moreover, choose F linear in u when constructing (3.26) for a discretization n = N̂x N̂y. In the case D = D1, (3.26) is a quadratic SDP; in the case D = D2 we need to apply the method from 2.3 to (3.26) to transform it into a quadratic SDP. In fact, for the example with D = D2 there is a unique way to transform the POP into a QOP, by defining n variables vi,j := u²i,j. Then it is easy to see that the domain-space sparsity patterns of the quadratic SDP corresponding to (3.26) for D = D1 and D = D2, respectively, are given by Figure 3.4.
Figure 3.4: d-space sparsity pattern of (3.26) for D = D1 (left), and D = D2 before (center) and after (right) reordering of rows and columns.
Note, for the two cases in Example 3.2 the number of nonzero entries in every row but the first and last one of the domain-space sparsity pattern matrix is less than or equal to two and three, respectively. This is considerably smaller than the upper bound 13 for the number of nonzero entries in each row of the correlative sparsity pattern matrix provided by Proposition 3.2. Likewise, the size of an average maximal clique of the chordal extension of the domain-space sparsity pattern graph is far smaller than that of the chordal extension of the correlative sparsity pattern graph. Thus, the primal SDP relaxation (b) from 2.2.7 for a QOP derived from a PDE of one of the two classes in Example 3.2 is far smaller than the sparse SDP relaxation (2.18) of relaxation order ω = 1 for the same QOP, and can be solved for much finer discretizations. However, in the primal SDP relaxation (b) there is no relaxation order we can increase, and there is no general result on how well the primal SDP relaxation approximates a QOP. The sparse SDP relaxations (2.18) provide a sequence of SDPs whose minima converge to the optimum of a QOP; in fact, their approximation accuracy improves monotonically with increasing order ω. Therefore, as we will see in numerical examples in 3.3, the primal SDP relaxation (b) is only useful for QOPs where the solution of the primal SDP relaxation is a good approximation of an optimal solution of the QOP.
3.2.2 The SDPR method
In this section we introduce the method to solve the POP (3.26) derived from a PDE problem (3.14), in order to obtain discrete approximations to solutions of (3.14). To derive (3.26) from a PDE problem, we need to choose a discretization (Nx, Ny), the bounds lbd and ubd, the objective F, if not given by the PDE problem, and possibly additional polynomial constraints gl and hk. If (3.26) is a POP of degree three or larger, we can either apply the sparse SDP relaxations (2.18) for some relaxation order ω, or we apply one of the heuristics AI, AII, BI, BII from 2.3.1 to transform (3.26) into a QOP. To the QOP we apply either the sparse SDP relaxations (2.18) with relaxation order ω = 1 or the primal SDP relaxation (b) exploiting domain-space sparsity from 2.2.7. Solving the SDP relaxations for the POP or the QOP, we obtain a first approximation û to an optimal solution of (3.26). The solution û can be used as an initial guess for locally fast convergent methods. One possibility is to apply sequential quadratic programming (SQP) [8] to (3.26); another one is to apply Newton's method for nonlinear systems [77] to (3.17), both starting from û, in order to obtain a more accurate discrete approximation u to a solution of the PDE problem (3.14). This procedure is called the semidefinite programming relaxation (SDPR) method for solving a PDE problem of type (3.14); it is summarized in the following chart:
Method 3.1 (SDPR method)

I. Given a PDE problem (3.14), choose (Nx, Ny), lbd, ubd, F, gl and hk to derive (3.26).

II. If POP (3.26) is of degree three or larger, we may apply AI, AII, BI or BII to transform it into a QOP. Then, apply sSDP1 (2.18) or relaxation (b) from 2.2.7 to this QOP. Denote the first n components of the solution vector of the applied SDP relaxation as û.

III. If POP (3.26) is of degree three or larger and has not been transformed into a QOP, choose a relaxation order ω ≥ ωmax, apply sSDPω (2.18) to that POP and obtain its solution û.

IV. Apply Newton's method to (3.17) or SQP to (3.26), both with initial guess û, and obtain u as an approximation to an optimal solution of (3.26) and as a discrete approximation to a solution of the PDE problem (3.14).
Recall, when applying the SDP relaxation (b) from 2.2.7 to a QOP, we impose the additional constraints (2.84) as explained in Remark 2.9. As locally fast convergent methods we consider SQP and Newton's method, which we describe briefly in the following. However, we are by no means restricted to these two when choosing an iterative method which converges fast towards a highly accurate discrete approximation of a solution to (3.17) when started from a guess close to that approximation.
Newton's method

The discretized PDE (3.17) is a special case of the problem

r(x) = 0,

where r : R^n → R^n, r(x) = [r1(x), . . . , rn(x)]^T and the ri : R^n → R are smooth functions for all i ∈ {1, . . . , n}. The functions ri may be nonlinear in x. The basic form of Newton's method for solving nonlinear equations is given by

NEWTON
Choose x0;
for k = 0, 1, 2, . . .
    Calculate a solution pk to the Newton equations J(xk) pk = −r(xk);
    xk+1 = xk + pk;
end (for)
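A dense, self-contained Python version of the NEWTON algorithm might look as follows. The forward-difference Jacobian is a simplification for illustration; for the polynomial system (3.17) the Jacobian is of course available analytically.

import numpy as np

def newton(r, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(max_iter):
        rx = r(x)
        if np.linalg.norm(rx) < tol:
            break
        J = np.empty((n, n))
        eps = 1e-7
        for i in range(n):                   # forward-difference Jacobian
            e = np.zeros(n)
            e[i] = eps
            J[:, i] = (r(x + e) - rx) / eps
        x = x + np.linalg.solve(J, -rx)      # Newton equations J(x_k) p_k = -r(x_k)
    return x

# Illustrative system: r(x) = (x1^2 + x2^2 - 1, x1 - x2)
r = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
print(newton(r, [1.0, 0.5]))                 # converges to (1/sqrt(2), 1/sqrt(2))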
This algorithm is motivated by the multidimensional Taylor's theorem.

Theorem 3.5 Suppose that r : R^n → R^n is continuously differentiable in some convex open set D and that x and x + p are vectors in D. We then have that

r(x + p) = r(x) + ∫0^1 J(x + tp) p dt.

J(x + tp) is the Jacobian of r at x + tp; it is defined as

J(x) = (∂ri/∂xj)_{i,j=1,...,n}, the matrix whose ith row is ∇ri(x)^T.

We define a linear model Mk(p) of r(xk + p) given in Theorem 3.5, i.e., we approximate the second term on the right-hand side by J(xk) p, and write

Mk(p) = r(xk) + J(xk) p.

The vector pk = −J(xk)^{−1} r(xk) satisfies Mk(pk) = 0. It is equivalent to the solution pk of the Newton equations in the NEWTON algorithm. As shown in [77], if x0 is close to a nondegenerate root x⋆ and r is continuously differentiable, then the sequence (xk)k of Newton's method converges superlinearly to x⋆. If r is furthermore Lipschitz continuously differentiable, the convergence is quadratic.
Sequential Quadratic Programming

The POP (3.26) is a special case of the nonlinear programming problem

min f(x)   s.t.   h(x) = 0,   g(x) ≤ 0,   (3.29)

where f : R^n → R, h : R^n → R^m and g : R^n → R^p are three times continuously differentiable. The basic idea of SQP is to model (3.29) at a given approximate solution xk by a quadratic program, and to use the solution of this subproblem to construct a better approximation xk+1 to the solution of (3.29). This method can be viewed as the natural extension of Newton's method to the constrained optimization setting. It shares with Newton's method the property of rapid convergence when the iterates are close to the solution, and possibly erratic behavior when the iterates are far from the solution. SQP has two key features: First, it is not a feasible point method; its iterates xk+1 do not need to be feasible for (3.29). Second, in each iteration of an SQP approach a quadratic program is to be solved, which is not too demanding since highly efficient procedures for quadratic programs, i.e., programs with quadratic objective and linear constraints, exist. Let the Lagrange function of (3.29) be given by L(x, u, v) and a quadratic subproblem by

min   ∇f(xk)^T (x − xk) + ½ (x − xk)^T Bk (x − xk)
s.t.  ∇h(xk)^T (x − xk) + h(xk) = 0,
      ∇g(xk)^T (x − xk) + g(xk) ≤ 0,   (3.30)

where Bk is an approximation of the Hessian of the Lagrangian function at the iterate (xk, uk, vk). Then, an SQP approach in its most basic form is given by the algorithm

SQP
Choose (x0, u0, v0), B0, and a merit function φ
for k = 0, 1, 2, . . .
    Form and solve (3.30) to obtain its optimal solution (x⋆, u⋆, v⋆).
    Choose a step length α so that φ(xk + α(x⋆ − xk)) < φ(xk).
    Set xk+1 = xk + α(x⋆ − xk), uk+1 = uk + α(u⋆ − uk), vk+1 = vk + α(v⋆ − vk).
    Stop if (xk+1, uk+1, vk+1) satisfies some convergence criterion.
    Compute Bk+1 from (xk+1, uk+1, vk+1).
end (for)

A merit function is a function whose reduction implies progress towards the global optimum of problem (3.29). For more details about SQP see [8].
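In practice one need not implement SQP by hand. For instance, SciPy's SLSQP solver accepts exactly the data of (3.26): an objective, equality constraints and box bounds. The tiny problem and the starting point û below are illustrative only:

import numpy as np
from scipy.optimize import minimize

f = lambda u: -(u[0] + u[1])                   # objective F(u)
h = lambda u: u[0]**2 + u[1]**2 - 1.0          # equality constraint h(u) = 0
u_hat = np.array([0.6, 0.6])                   # e.g. taken from an SDP relaxation
res = minimize(f, u_hat, method="SLSQP",
               constraints=[{"type": "eq", "fun": h}],
               bounds=[(0.0, 1.0), (0.0, 1.0)])
print(res.x)                                   # approx. (0.7071, 0.7071)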
Grid-refining method

In order to guarantee that a discretized PDE problem (3.15) is a good approximation of (3.14), i.e., that its solutions are good discrete approximations of continuous functions u(·, ·), it is necessary to choose fine grid discretizations (Nx, Ny). However, a fine grid discretization results in a large scale POP (3.26). Even when exploiting correlative or domain-space sparsity, transforming it into a QOP and imposing tight lower and upper bounds, the SDP relaxation which needs to be solved is often computationally demanding - in particular in the cases where we need to choose a high relaxation order to obtain an accurate approximation to an optimal solution of (3.26). Thus, for many difficult PDE problems the SDP relaxations resulting from fine grid discretizations are intractable for current SDP solvers. To overcome this problem, we consider a grid-refining method. In our grid-refining method, a solution obtained by applying the SDPR method to (3.14) for a coarse grid discretization in a first step is extended stepwise to finer and finer grids by subsequently interpolating coarse grid solutions and applying the SDPR method or locally convergent methods. This method is described by the following scheme:

Step 1 - Initialize: Apply the SDPR method with Nx(1), Ny(1), F1, lbd1, ubd1, g(1), h(1), ω1; obtain u1.
Step 2 - Extend: Set Nx(k) = 2 Nx(k − 1) − 1 or Ny(k) = 2 Ny(k − 1) − 1; interpolate uk−1 to obtain uk−1⋆.
Step 3a: Apply the SDPR method with Nx(k), Ny(k), Fk, lbdk, ubdk, g(k), h(k), ωk; obtain uk.
Step 3b: Apply Newton's method or SQP; obtain uk.
Iterate: Repeat Step 2 and Step 3.
Step 1 - SDPR method: Choose an objective function F1(u), a discretization grid size (Nx(1), Ny(1)), lower bounds lbd1, upper bounds ubd1 and an initial relaxation order ω1. Apply SDPR with these parameters to (3.14) and obtain a solution u1.

Step 2 - Extension: Extend the (k−1)th iteration's solution uk−1 to a finer grid. Choose either x- or y-direction as the direction of refinement, i.e., choose either Nx(k) = 2 Nx(k − 1) − 1 and Ny(k) = Ny(k − 1), or Nx(k) = Nx(k − 1) and Ny(k) = 2 Ny(k − 1) − 1. In order to extend uk−1 to the new grid with the doubled number of grid points, assume without loss of generality the direction of extension is x. The interpolation of the solution uk−1 to uk−1⋆ is given by the scheme

u^{k−1⋆}_{2i−1,j} = u^{k−1}_{i,j}   ∀ i ∈ {1, . . . , Nx(k − 1)}, ∀ j ∈ {1, . . . , Ny(k)},
u^{k−1⋆}_{2i,j} = ½ (u^{k−1}_{i+1,j} + u^{k−1}_{i,j})   ∀ i ∈ {1, . . . , Nx(k − 1) − 1}, ∀ j ∈ {1, . . . , Ny(k)}.

The interpolated solution uk−1⋆ is a first approximation to a solution of POP (3.26) for the Nx(k) × Ny(k) grid.
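In Python, the extension step in x-direction amounts to two vectorized assignments. The sketch below (the function name refine_x is ours) implements the interpolation scheme above:

import numpy as np

def refine_x(u_coarse):
    nx, ny = u_coarse.shape
    u_fine = np.empty((2 * nx - 1, ny))
    u_fine[0::2, :] = u_coarse                                     # copy coarse values
    u_fine[1::2, :] = 0.5 * (u_coarse[:-1, :] + u_coarse[1:, :])   # average neighbors
    return u_fine

u1 = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])   # 3 x 2 coarse grid solution
print(refine_x(u1))                                    # 5 x 2 interpolated solution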
Step 3a - Apply SDPR method: We choose new parameters Fk, lbdk, ubdk, ωk, g(k) and h(k). This step is based on the idea that we may be able to choose ωk < ωk−1 if we exploit the information given by the interpolated solution uk−1⋆. One possibility to do this is to choose the new objective function Fk = F_M^k, where F_M^k is defined by

F_M^k(u) = Σ_{i,j} (u_{i,j} − u^{k−1⋆}_{i,j})².   (3.31)

We may choose this objective function as we are interested in finding a feasible solution of (3.26) with minimal Euclidean distance to the interpolated solution uk−1⋆. Another possibility to utilize the information given by uk−1⋆ is to tighten the lower and upper bounds by

lbd^k_{i,j} = max{lbd^{k−1}_{i,j}, u^{k−1⋆}_{i,j} − δ}   ∀ i, j,
ubd^k_{i,j} = min{ubd^{k−1}_{i,j}, u^{k−1⋆}_{i,j} + δ}   ∀ i, j,

for some δ > 0. Apply the SDPR method to obtain uk.
Step 3b - Apply Newton's method or SQP: It may occur that the SDP relaxations of (3.26) for the finer grid become intractable, even if ωk < ωk−1. Therefore, we may just apply Newton's method to (3.17) or SQP to (3.26) for the finer discretization (Nx(k), Ny(k)), both starting with the interpolated solution uk−1⋆ as initial guess, to obtain a better approximate solution u.
The steps 2 and 3 are repeated until an accurate solution for a high resolution grid is obtained. The SDPR method with all its options and the grid-refining method are demonstrated on a variety of numerical examples in 3.3.
3.2.3 Enumeration algorithm
The SDPR method aims at finding a discrete approximation to a solution of a PDE problem. The freedom to choose an objective function for detecting particular solutions of a PDE problem is a feature of the SDPR method which is particularly interesting for a PDE problem with many solutions. Beside finding a particular solution of a PDE problem, another challenging problem is to find all solutions of a PDE problem. The problem of finding discrete approximations for all solutions of a PDE problem (3.14) is the problem of finding all real solutions of the system of polynomial equations (3.17). Classical methods to solve this problem are the Gröbner basis method and the polyhedral homotopy method, which we describe briefly at the end of this section. They compute all complex solutions of (3.17), and it remains to choose the real solutions among them. A recent method which avoids computing all complex solutions and directly computes all real solutions is given by [55] for the case where the solution set of (3.17) is finite. Another method is the extraction algorithm presented in [36] for finding all optimal solutions of (3.26). However, both methods [55] and [36] do not exploit sparsity in (3.17) and (3.26), respectively, which restricts their applicability to small- and medium-scale systems.
In the following we present an algorithm to enumerate all solutions of a system of polynomial equations, first proposed in [63] for the cavity flow problem, but which can be extended to the more general case of (3.17). For this method we need to assume that the number of solutions of (3.17) is finite, i.e., the feasible set of (3.26) is finite. We also assume that all feasible solutions of (3.26) are distinct with respect to the objective function F, i.e., there is no pair of feasible solutions with identical objective value. The SDPR method enables us to approximate the global minimal solution u⋆ =: u(1)⋆ of (3.26). Beside the minimal solution, we are also interested in finding the solution u(2)⋆ with the second smallest objective value, the solution u(3)⋆ with the third smallest objective value or, in general, the solution u(k)⋆ with the kth smallest objective value. Based on the SDPR method we propose an algorithm that enumerates the solutions of (3.17) with the k smallest objective values. Our algorithm shares the idea of separating the feasible set by additional constraints with Branch-and-Bound and cutting plane methods that are used for solving mixed integer linear programs and general concave optimization problems [38]. In contrast to the linear constraints of those methods we impose quadratic constraints to separate the feasible set.
Algorithm 3.1 Find the approximations to the solutions of (3.17) with the k smallest objective values: Given u(k−1), the approximation to the solution with the (k − 1)th smallest objective value obtained by applying the SDPR method with relaxation order ω to POPk−1 from the (k − 1)th iteration of the algorithm.

1. Choose ǫk > 0.

2. Choose a vector b^k ∈ {0, 1}^n.

3. Add the following quadratic constraints to POPk−1 and denote the resulting POP with smaller feasible set as POPk:

(uj − u^{(k−1)}_j)² ≥ ǫk   for all j with b^k_j = 1.   (3.32)

4. Apply the SDPR method with relaxation order ω to POPk. Obtain an approximation u(k) for u(k)⋆.

5. Iterate steps 1-4.
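Step 3 can be implemented by generating one callable cut per selected index. The following sketch (the names exclusion_constraints, u_prev, b and eps are ours) builds the constraints (3.32) in a form that a POP or NLP solver can consume as g(u) ≥ 0:

import numpy as np

def exclusion_constraints(u_prev, b, eps):
    # One quadratic cut (u_j - u_prev_j)^2 - eps >= 0 per index with b_j = 1
    return [lambda u, j=j: (u[j] - u_prev[j])**2 - eps
            for j in range(len(u_prev)) if b[j] == 1]

u_prev = np.array([0.2, 0.7, 0.7])
cuts = exclusion_constraints(u_prev, b=[1, 0, 1], eps=1e-2)
print([g(u_prev) for g in cuts])   # all equal -eps < 0: u_prev is cut off

Because each cut involves only one variable, the correlative sparsity of (3.26) is preserved, in contrast to the lp-norm constraints discussed below.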
The idea of Algorithm 3.1 is to impose an additional polynomial inequality constraint (3.32) on the POP (3.26) in iteration k that excludes the previous iteration's solution u^(k−1) from the feasible set of (3.26). In the case that the feasible set of (3.26) is finite and u^(k−1) is sufficiently close to u^(k−1)⋆, the new constraint excludes u^(k−1)⋆ from the feasible set of (3.26), and u^(k)⋆ is the new global minimizer of (3.26). Of course, there are various alternatives to step 3 in Algorithm 3.1 for excluding u^(k−1)⋆ from the feasible set of the POP. One alternative constraint is

(u_i − u_i^(k−1)⋆) u_{n+i} − ǫ_i = 0 for all i with b_i = 1,    (3.33)

where b ∈ {0, 1}^n, ǫ_i > 0 and u_{n+i} is an additional slack variable bounded by −1 and 1. It is easy to see that (3.33) is violated if u = u^(k−1)⋆. However, it turned out that the numerical performance of (3.33) is inferior to that of (3.32) for problems of type (3.26), as tuning the parameters ǫ_i and b is far more difficult for (3.33) than for (3.32). A second alternative to exclude u^(k−1)⋆ are l_p-norm constraints such as
‖u − u^(k−1)⋆‖_p = ( Σ_{i=1}^n | u_i − u_i^(k−1)⋆ |^p )^{1/p} ≥ ǫ,    (3.34)
for p ≥ 1. The disadvantage of the constraints (3.34) is that they destroy the correlative sparsity of (3.26), as all u_i (i = 1, . . . , n) occur in the same constraint. Therefore the advantage of the sparse SDP relaxations is lost and the POP can no longer be solved efficiently. These observations justify imposing (3.32) as additional constraints in Algorithm 3.1. We obtain the following results for Algorithm 3.1.
Proposition 3.3 Let (u^(1), . . . , u^(k−1)) be the output of the first (k − 1) iterations of Algorithm 3.1. If this output is a sufficiently close approximation of the vector (u^(1)⋆, . . . , u^(k−1)⋆) of the (k − 1) solutions with smallest objective value, and if the feasible set of POP (3.26) is finite and distinct in terms of the objective, i.e. F(u^(1)⋆) < F(u^(2)⋆) < . . ., then there exist b ∈ {0, 1}^n and ǫ ∈ R^n such that the output u^(k) of Algorithm 3.1 (for the kth iteration) satisfies

u^(k)(ω) → u^(k)⋆ when ω → ∞.    (3.35)
Proof: As each u^(j) is in a neighborhood of u^(j)⋆ for all j ∈ {1, . . . , k − 1}, we can choose b ∈ {0, 1}^n and a vector ǫ ∈ R^n such that

∀ j ∈ {1, . . . , k − 1} ∃ i with b_i = 1 s.t. (u_i^(j) − u_i^(j)⋆)^2 < ǫ_i,

and for each j ∈ {1, . . . , k − 1} it holds that

(u_i^(j) − u_i^(l)⋆)^2 ≥ ǫ_i ∀ l ≥ k, ∀ i with b_i = 1.

Let POP(k) denote (3.26) with the k systems of additional constraints given by step 3 in Algorithm 3.1, where the kth constraints are given by (3.32) for the constructed b and ǫ. Then it holds that

feas(POP(k)) = feas(3.26) \ { u^(1)⋆, . . . , u^(k−1)⋆ }.

Thus, u^(k)⋆ is the global minimizer of POP(k) and the global minimum is F(u^(k)⋆). As the bounds (3.19) guarantee the compactness of the feasible set, it holds with the convergence theorem for the sparse SDP relaxations [52] that, if ω → ∞,

u^(k)(ω) → u^(k)⋆.    (3.36)
Although we have proven convergence, the capacity of current SDP solvers restricts the choice of the relaxation order ω to small integers, typically ω = ωmax + 1 or ω = ωmax + 2. Moreover, we need to choose the parameters ǫ and b appropriately in order to obtain good approximations of the k feasible solutions with the smallest objective values. In the numerical experiments in 3.3.5, we see that the Gröbner basis method is a useful tool for tuning the two parameters ǫ and b, as it allows us to confirm whether we derive the k solutions of smallest objective value successfully in case (Nx, Ny) is small. In the following we briefly describe the Gröbner basis method and polyhedral homotopy continuation as methods to test the SDPR method and to tune the parameters in Algorithm 3.1.
Gröbner basis method
The Gröbner basis method for finding all complex solutions of a given zero-dimensional system of polynomial equations is a useful tool for tuning the parameters of the SDPR method and Algorithm 3.1, and for validating their numerical results. In order to do this, we study (3.17) by the rational univariate representation [85], [78], which is a variation of the Gröbner basis method, for coarse discretizations (Nx, Ny). For a mesh with N := Nx = Ny small (for instance N = 5 in the example in 3.3.5), (3.17) is solvable with this method (Groebner(Fgb) in Maple 11, nd_gr_trace and tolex_gsl in Risa/Asir). Applying the Gröbner basis method to solve (3.17) for a problem satisfying the assumptions of Proposition 3.3, and enumerating all solutions by their objective value, allows us to confirm whether the solutions of the SDPR method are indeed the minimal solutions of (3.17), and to determine which relaxation order ω is sufficient to derive this global minimizer. The result is also used to tune the parameters ǫ_i^k in Algorithm 3.1. We have no theorem which states that the tuning based on the coarse mesh case is good for the fine mesh case. However, we believe this tuning provides a better approximation for the fine mesh case, too. Note that whereas the Gröbner basis method finds all complex solutions of (3.17), the SDPR method finds the real solution of (3.17) that minimizes F.
Polyhedral Homotopy Continuation Method
Another recent approach for solving (3.17) is the polyhedral homotopy continuation method for polynomial systems [39]. Consider the problem of finding all isolated zeros of a system of n polynomials

f(x) = (f_1(x), . . . , f_n(x)) = 0

in an n-dimensional complex vector variable x = (x_1, . . . , x_n) ∈ C^n. The idea of homotopy continuation methods is to define a smooth homotopy system with a continuation parameter t ∈ [0, 1],

h(x, t) = (h_1(x, t), . . . , h_n(x, t)) = 0,
using the algebraic structure of the polynomial system. The homotopy system is constructed such that the solutions of the starting polynomial system h(x, 0) = 0 can be computed easily and the target polynomial system h(x, 1) = 0 coincides with the system f(x) = 0 to be solved. Furthermore, every connected component of the solutions (x, t) ∈ C^n × [0, 1) of h(x, t) = 0 forms a smooth curve. The number of homotopy curves that are necessary to connect the isolated zeros of the target system to isolated zeros of the starting system determines the computational work involved in tracing homotopy curves.
A recent software package to determine all complex isolated solutions of f(x) = 0 is PHoM by Gunji et al. [31]. Thus, we may apply PHoM to find the complex solutions of a discretized PDE problem p_{i,j}(u) = 0 for all (i, j). Then, we select the real among all complex solutions of (3.17) and compare those solutions to the solutions obtained by the SDPR method. Besides its property of finding all complex solutions, PHoM has the drawback that the computation time grows exponentially in the dimension n of the system. This is due to the fact that the number of isolated solutions increases exponentially in n. Therefore we are restricted to very coarse meshes with n ≤ 10 when applying PHoM to (3.17). Like the Gröbner basis method, the polyhedral homotopy method can be used for tuning the parameters in Algorithm 3.1, too.
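To illustrate the continuation idea, the following toy sketch traces a root along a plain linear homotopy; it is our illustration and not the polyhedral homotopy that PHoM implements.

```python
# Toy sketch: trace a root of h(x, t) = (1 - t) g(x) + t f(x) from a known
# root of g (t = 0) to a root of f (t = 1), correcting with Newton's method.
def trace_root(f, df, g, dg, x0, steps=100, newton_iters=8):
    x = complex(x0)
    for s in range(1, steps + 1):
        t = s / steps
        for _ in range(newton_iters):
            h = (1 - t) * g(x) + t * f(x)
            dh = (1 - t) * dg(x) + t * df(x)
            x = x - h / dh
    return x

# Example: follow a root of x^2 - 1 to a root of x^2 - 2 (result ~ 1.41421).
root = trace_root(lambda x: x**2 - 2, lambda x: 2 * x,
                  lambda x: x**2 - 1, lambda x: 2 * x, x0=1.0)
```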
3.2.4 Discrete approximations to solutions of differential equations
In 3.1.1 we discussed that if a finite difference scheme is convergent and the discretization (Nx, Ny) is chosen sufficiently fine, then a solution of (3.17) is a discrete approximation to a solution of the differential equation (3.14). However, there is no theorem proving the convergence of finite difference schemes for general classes of nonlinear differential equations. For many nonlinear PDEs we cannot guarantee that a solution of (3.17) is indeed a discrete approximation of a solution of (3.14).

Definition 3.4 A solution of (3.17) that is not a discrete approximation of a solution of the PDE problem (3.14) is called a fake solution.
In numerical experiments, our main indicator that a solution u of (3.17) is not a fake solution is whether we succeed in extending u from a coarse grid to finer and finer grids via the grid-refining method.
Another property of a solution to (3.17) is the notion of stability.

Definition 3.5 Let J(·) denote the Jacobian of (3.17) and me(·) its maximal eigenvalue. A solution u to (3.17) is called stable if all eigenvalues of J(u) are non-positive, i.e., if me(u) ≤ 0. If not, it is called unstable.
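For a computed solution, Definition 3.5 can be checked numerically; a minimal sketch, assuming the Jacobian of (3.17) is available as a dense array:

```python
import numpy as np

# Sketch: classify a solution u of (3.17) in the sense of Definition 3.5,
# given a function jacobian(u) returning J(u) as a (possibly nonsymmetric)
# matrix; real parts of the eigenvalues are compared against zero.
def is_stable(jacobian, u):
    J = jacobian(u)
    m_e = float(np.max(np.real(np.linalg.eigvals(J))))  # m_e(u)
    return m_e <= 0.0, m_e
```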
Distinguishing stable and unstable solutions is of interest for certain classes of nonlinear PDE problems. In
3.3.3 we will discuss Reaction-Diffusion equations as an example of such a class.
3.3 Numerical experiments
In 3.2.2 we introduced the SDPR method for computing discrete approximations to solutions of differential equations. In this section we demonstrate the broad scope of this method by applying it to problems involving nonlinear differential equations arising from a wide range of fields. It is indeed possible to find highly accurate approximations to solutions of many nonlinear differential equations by techniques based on sparse SDP relaxations. For our numerical experiments we apply the software SparsePOP [103] as an implementation of the sparse SDP relaxations (2.18). For applying domain-space sparsity conversion methods we use SparseCoLO [20], and as an SQP-based solver we utilize the MATLAB Optimization Toolbox routine fmincon. As SDP solver for SparsePOP and the primal SDP relaxation from (2.2.7) we apply SeDuMi [95]. For an eventual transformation from POP into a QOP we use one of the heuristics AI, AII, BI or BII from 2.3. When applying the SDPR method to obtain an approximate solution for (3.26), the most important measure of the accuracy of its solution u is the scaled feasibility error ǫ_sc defined by

ǫ_sc = min { − | (D_{i,j}(u) − f_{i,j}) / σ_{i,j}(u) |, − | h_k(u) / σ_k(u) |, min { g_l(u) / σ̂_l(u), 0 } : ∀ i, j, k, l },
where σ_{i,j}(u), σ_k(u) and σ̂_l(u) are the maxima over all monomials in the corresponding numerator polynomials. Note that ǫ_sc(u) measures how accurately u approximates a feasible solution of (3.26), but it does not measure how accurately the finite difference scheme approximates the continuous problem (3.14). Another question is how well F(u) approximates the minimum of (3.26). If we choose the primal relaxations (a) and (b) from 2.2.7 or the dual relaxations (2.18) for some linear objective F, then F(u) approximates the minimum of (3.26) very accurately if the feasibility error of u is small. But in case F is nonlinear and we choose the dual relaxations, we have to consider the optimality error ǫ_obj defined by

ǫ_obj = | min(sSDP_ω) − F(u) | / max {1, | F(u) |}

as a measure for the optimality of u. Recall that when applying the SDPR method to solve a differential equation, the most important choices are: the objective function F; the bounds lbd and ubd; the relaxation order ω in the case of the dual relaxations (2.18); Newton's method or SQP as locally fast convergent method; and possibly additional constraints h_k and g_l. Finally, one needs to decide whether to apply the grid-refining method from 3.2.2 with extension strategy 3a or 3b.
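For concreteness, both error measures can be evaluated as follows; this is a sketch assuming the residuals and scaling factors have already been collected into nonempty arrays, and the names are ours, not SparsePOP's.

```python
import numpy as np

# Sketch: scaled feasibility error eps_sc and optimality error eps_obj.
# eq_res / eq_scale:  residuals D_{i,j}(u) - f_{i,j} and h_k(u) with their
#                     scalings sigma; ineq_res / ineq_scale: the g_l(u) terms.
def eps_sc(eq_res, eq_scale, ineq_res, ineq_scale):
    eq_part = -np.abs(np.asarray(eq_res) / np.asarray(eq_scale))
    ineq_part = np.minimum(np.asarray(ineq_res) / np.asarray(ineq_scale), 0.0)
    return float(np.concatenate([eq_part, ineq_part]).min())

def eps_obj(sdp_min, F_u):
    return abs(sdp_min - F_u) / max(1.0, abs(F_u))
```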
To evaluate the results of the SDPR method for approximating the solutions of differential equations, we (a) apply the SDPR method to PDE problems where an analytical solution is known, and (b) compare the performance of the SDPR method with that of the Matlab PDE Toolbox, a general-purpose solver based on the finite element method. Finally, we may apply Gröbner basis computation or the polyhedral homotopy method to verify that the SDPR method provides accurate approximations to feasible solutions of (3.26), as mentioned in 3.2.3. Moreover, J(u) denotes the Jacobian of (3.17) at u and me(u) its largest eigenvalue.
All numerical experiments are conducted on a Linux OS with a 2.4 GHz CPU and 8 GB of memory. The total processing time in seconds is denoted as tC.
3.3.1 A nonlinear elliptic equation with bifurcation
A well known yet interesting nonlinear elliptic PDE problem, which we have already seen in Example 3.27, is given by

u_xx(x, y) + u_yy(x, y) + λ u(x, y) (1 − u(x, y)^2) = 0 ∀ (x, y) ∈ [0, 1]^2,
u(x, y) = 0 ∀ (x, y) ∈ ∂[0, 1]^2,    (3.37)
0 ≤ u(x, y) ≤ 1 ∀ (x, y) ∈ [0, 1]^2,

where λ ≥ 0. In fact, this PDE is known as the Allen-Cahn equation. It was shown in [93] that there exists a unique nontrivial solution for this problem if λ > λ0 = 2π^2 ≈ 19.7392, and only the trivial zero solution if λ ≤ λ0. Due to the bifurcation at λ0, homotopy-like continuation methods, which start from a solution of a system with weak nonlinearity to attempt the system with strong nonlinearity, cannot be applied to solve (3.37). We fix λ = 22 and apply the SDPR method with ω = 2 and F(u) = −Σ_{i,j} u_{i,j}. In
order to study the efficiency of the various options of the SDPR method, we consider different settings: dual SDP relaxations with and without an additional local solver for the POP derived from (3.37), dual and primal SDP relaxations for a QOP equivalent to the POP, the grid-refining method starting from a coarse grid solution, and tight and loose upper bounds. The numerical results are given in Table 3.1 and pictured in Figure 3.5. When applying dual SDP relaxations to the original POP, we observe that the solution provided by the SDPR method is very accurate even in the case that no additional local method is used. Moreover, the size of the SDP relaxations, and thus the computational cost, increases rapidly for increasing Nx and Ny. One way to address this problem is to apply the transformation from the POP into an equivalent QOP. Both the primal and the dual SDP relaxations for the QOP are substantially smaller than the dual SDP relaxations for the original POP. But ubd needs to be tightened when applying SDP relaxations to the QOP in order to preserve numerical accuracy. Note that for this problem d-space sparsity is richer than correlative sparsity, as the size of the corresponding SDP relaxations and the resulting tC are smaller. The most efficient means to obtain high resolution approximations to a solution of (3.37) is the grid-refining method. However, the grid-refining method relies on the assumption that the behavior of the discretized system does not change much for increasing Nx and Ny, which is not the case when starting from a very coarse discretization in general. Therefore, SDP relaxations for the equivalent QOP are a promising tool for attempting a PDE problem on a higher resolution grid directly. Finally, we notice that the solution of the sparse SDP relaxation remains a sufficiently good initial guess for SQP, even if its feasibility error ǫ_sc is not that small.
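For reference, the polynomial system behind these experiments can be generated as in the following sketch of the standard five-point discretization of (3.37); it constructs the residuals only, not the SDP relaxation itself.

```python
import numpy as np

# Sketch: residuals of the discretized Allen-Cahn problem (3.37) on an
# Nx x Ny grid with zero boundary values; u is the vector of grid values.
def allen_cahn_residuals(u, Nx, Ny, lam=22.0):
    U = np.asarray(u).reshape(Nx, Ny)
    hx, hy = 1.0 / (Nx - 1), 1.0 / (Ny - 1)
    res = []
    for i in range(1, Nx - 1):
        for j in range(1, Ny - 1):
            uxx = (U[i + 1, j] - 2 * U[i, j] + U[i - 1, j]) / hx**2
            uyy = (U[i, j + 1] - 2 * U[i, j] + U[i, j - 1]) / hy**2
            res.append(uxx + uyy + lam * U[i, j] * (1 - U[i, j] ** 2))
    return np.array(res)
```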
| SDP relaxation | Local solver | POP to QOP | ubd | n | Nx | Ny | ǫsc | tC |
|---|---|---|---|---|---|---|---|---|
| Dual | SQP | no | 0.99 | 16 | 6 | 6 | -1e-14 | 3 |
| Dual + Grid-refining 3b | SQP | no | 0.99 | 1521 | 41 | 41 | -1e-13 | 426 |
| Dual | none | no | 0.99 | 81 | 11 | 11 | -1e-10 | 418 |
| Dual | SQP | no | 0.99 | 81 | 11 | 11 | -4e-15 | 420 |
| Dual + Grid-refining 3b | SQP | no | 0.99 | 1521 | 41 | 41 | -1e-13 | 1016 |
| Dual | SQP | no | 0.99 | 49 | 9 | 9 | -9e-15 | 39 |
| Dual + Grid-refining 3b | SQP | no | 0.99 | 225 | 17 | 17 | -1e-9 | 49 |
| Dual | SQP | no | 0.99 | 196 | 16 | 16 | - | OOM |
| Dual | SQP | yes | 0.45 | 196 | 16 | 16 | -5e-11 | 107 |
| Primal | SQP | yes | 0.45 | 196 | 16 | 16 | -8e-15 | 35 |
| Primal | none | yes | 0.6 | 841 | 31 | 31 | -5e-4 | 1133 |
| Primal | SQP | yes | 0.6 | 841 | 31 | 31 | -5e-14 | 3763 |

Table 3.1: SDPR method results for (3.37), where OOM stands for 'Out of Memory'.
Figure 3.5: Solution for (3.37) in case λ = 22 for (Nx, Ny) = (6, 6) and (Nx, Ny) = (41, 41).
To examine whether the obtained solution is the only strictly positive one of the discretized PDE, we impose the additional constraints

| u_x(x, y) | ≤ M,  | u_y(x, y) | ≤ M  ∀ (x, y) ∈ [0, 1]^2.    (3.38)

We apply the SDPR method with ω = 2 and Nx = Ny = 6 to (3.37) for λ = 22 under the additional constraints (3.38). For M > 0.8 we detect the positive solution obtained before. If we decrease M sufficiently, we obtain the zero solution. Hence, it seems there exists exactly one positive non-trivial solution to the discretization of (3.37), and this solution converges to the strictly positive solution of (3.37) for Nx, Ny → ∞.
As another way to confirm the accuracy of the SDPR method, we take advantage of a further property of (3.37). It was shown in [15] that a function u : [0, 1]^2 → R that is a minimizer of the optimization problem

min_{u : [0,1]^2 → R}  ∫_{[0,1]^2} ( u_x^2 + u_y^2 − 2λ ( u^2/2 − u^4/4 ) ) dx dy
s.t.  u = 0 on ∂[0, 1]^2,    (3.39)
      0 ≤ u ≤ 1 on [0, 1]^2,
is a solution to (3.37). The integral to be minimized in this problem is called the energy integral. By discretizing (3.39) via a finite difference scheme, it can be transformed into a POP analogously to a PDE of form (3.14). In contrast to (3.26), which we derive from (3.14), the objective function F is not free to choose but is canonically given by the discretization of the objective function in (3.39). We apply the SDPR method with relaxation order ω = 2 to (3.37) and (3.39) on a 6 × 6 and an 11 × 11 grid and obtain an identical solution for both problems. These results are reported in Table 3.2, where ∆u, given by

∆u = max_{i,j} | u_{i,j} − û_{i,j} |,

evaluates the deviation of the SDPR method solutions for both problems; u_{i,j} denotes the SDPR solution to (3.37) and û_{i,j} the SDPR solution to (3.39).
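As a quick consistency check (our addition, and relying on the energy density as reconstructed in (3.39)), the Euler-Lagrange equation of the energy integral indeed recovers (3.37):

```latex
% With L(u, u_x, u_y) = u_x^2 + u_y^2 - 2\lambda (u^2/2 - u^4/4):
\[
\frac{\partial L}{\partial u}
 - \frac{d}{dx}\frac{\partial L}{\partial u_x}
 - \frac{d}{dy}\frac{\partial L}{\partial u_y}
 \;=\; -2\lambda\,(u - u^3) - 2u_{xx} - 2u_{yy} \;=\; 0,
\]
% dividing by -2 gives u_xx + u_yy + \lambda u (1 - u^2) = 0, which is (3.37).
```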
| Problem | Nx | Ny | tC | ǫobj | ǫsc | ∆u |
|---|---|---|---|---|---|---|
| (3.37) | 6 | 6 | 3 | 2e-14 | -1e-14 | 2e-6 |
| (3.39) | 6 | 6 | 2 | 1e-10 | -9e-15 | 2e-6 |
| (3.37) | 11 | 11 | 418 | 4e-15 | - | 9e-7 |
| (3.39) | 11 | 11 | 98 | 2e-10 | - | 9e-7 |

Table 3.2: SDPR results for (3.37) and (3.39).
The solutions to both problems are highly accurate, and we note that the total computation time to minimize the energy integral is less than the time required to solve the polynomial optimization problem corresponding to (3.37).
Finally, we compare the numerical performance of the SDPR method for (3.37) to existing solvers for nonlinear PDE problems. We apply the nonlinear solver from the Matlab PDE Toolbox to (3.37). The Matlab solver is FEM based and requires an initial guess to search for a solution of the PDE problem. When starting from the zero solution or a number of random positive functions, this solver detects the trivial solution. Even when choosing u0 with u0(x, y) := 0.43 sin(πx) sin(πy) on [0, 1]^2 as initial guess, the FEM solver detects the trivial solution, not the non-trivial one, although u0 is close to the non-trivial solution (on the 41 × 41 grid: max | u − u0 | = 0.006, (1/41^2) Σ_{i,j} | u_{i,j} − u0_{i,j} | = 0.003). Although the FEM solver finds the trivial solution in less than 60 seconds on a mesh of much higher resolution (67356 nodes, 134204 triangles) than those solved by the SDPR method, it needs a very good initial guess to find the more interesting, non-trivial solution. It is an advantage of the SDPR method that no initial guess is required to find an accurate approximation of the strictly positive solution of (3.37).
3.3.2 Illustrative nonlinear PDE problems
A problem in Yokota's text book

Simple ODE problems can be solved by the SDPR method with ease. To demonstrate this, consider the easily solvable nonlinear boundary value problem

ü(x) + (1/8) u(x) u̇(x) − 4 − (1/4) x^3 = 0 ∀ x ∈ [1, 3],
u(1) = 17,
u(3) = 43/3,    (3.40)
10 ≤ u(x) ≤ 20 ∀ x ∈ [1, 3].

For details about problem (3.40) see [105]. Applying the SDPR method with ω = 2, the objective function F defined by

F(u) = Σ_{i=1}^{Nx} u_i,

and without using a locally fast convergent method yields the highly accurate, stable solution that is documented in Table 3.3 and pictured in Figure 3.6.
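The underlying discretization can be sketched as follows (central differences on a uniform grid over [1, 3]; grid handling and names are ours):

```python
import numpy as np

# Sketch: polynomial residuals of the discretized boundary value problem
# (3.40); the boundary values u(1) = 17 and u(3) = 43/3 are substituted in,
# so u holds the Nx - 2 interior unknowns.
def yokota_residuals(u, Nx=200):
    x = np.linspace(1.0, 3.0, Nx)
    h = x[1] - x[0]
    U = np.concatenate(([17.0], np.asarray(u), [43.0 / 3.0]))
    res = []
    for i in range(1, Nx - 1):
        udd = (U[i + 1] - 2 * U[i] + U[i - 1]) / h**2   # u''(x_i)
        ud = (U[i + 1] - U[i - 1]) / (2 * h)            # u'(x_i)
        res.append(udd + U[i] * ud / 8.0 - 4.0 - x[i] ** 3 / 4.0)
    return np.array(res)
```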
| Nx | ǫsc | ǫobj | me | tC |
|---|---|---|---|---|
| 200 | 2e-9 | -4e-11 | -3 | 104 |

Table 3.3: Numerical results for problem (3.40).
Figure 3.6: SDPR method solution u for problem (3.40).
Nonlinear wave equation

As an example of a hyperbolic PDE problem, we study time-periodic solutions of the nonlinear wave equation

−u_xx + u_yy + u (1 − u) + 0.2 sin(2x) = 0 ∀ (x, y) ∈ [0, π] × [0, 2π],
u(0, y) = u(π, y) = 0 ∀ y ∈ [0, 2π],
u(x, 0) = u(x, 2π) ∀ x ∈ [0, π],    (3.41)
−3 ≤ u(x, y) ≤ 3 ∀ (x, y) ∈ [0, π] × [0, 2π].

As far as we have checked in the MathSciNet database, there is no mathematical proof of the existence of a periodic solution of this system. However, the SDPR method finds some periodic solutions. We observed that the POP corresponding to problem (3.41) has various solutions. Therefore, the choice of the objective determines the solution found by the sparse SDP relaxation. We consider the functions

F1(u) = Σ_{i,j} u_{i,j},    F2(u) = Σ_{i,j} σ_{i,j} u_{i,j},

as objectives for (3.26), where the σ_{i,j} (i = 1, . . . , Nx, j = 1, . . . , Ny) are random variables that are uniformly distributed on [−0.5, 0.5]. We apply the SDPR method with ω = 2 and Newton's method as a local solver. The results are listed in Table 3.4 and pictured in Figures 3.7 and 3.8.
Imposing variation bounds: Besides choosing different objective functions in the SDPR method, a second possibility to detect other solutions of a PDE is to impose additional constraints polynomial in the unknown functions. In 3.2.1 we introduced variation bounds (3.21) to restrict the space of functions in which we are searching for solutions. For (3.41) we impose the variation bounds

| u_y(x, y) | ≤ 0.5 ∀ (x, y) ∈ (0, π) × (0, 2π).    (3.42)
| SDP relaxation | objective | Nx | Ny | ǫsc | tC |
|---|---|---|---|---|---|
| Dual | F1 | 5 | 6 | -4e-10 | 151 |
| Dual + Grid-refining 3b | F1 | 33 | 40 | -3e-8 | 427 |
| Dual | F2 | 5 | 5 | -3e-11 | 19 |
| Dual + Grid-refining 3b | F2 | 33 | 33 | -5e-10 | 86 |

Table 3.4: SDPR method results for (3.41).
Figure 3.7: Solutions of (3.41) by SDPR method with objective F1.
If the SDPR method is applied to (3.41) with the additional condition (3.42) and F1 as objective, another solution to the PDE problem is obtained, which is documented in Table 3.5 and pictured in Figure 3.9. Thus several solutions of (3.41) are detected by choosing different objective functions in (3.26) and by imposing additional polynomial inequality constraints.
| SDP relaxation | objective | Nx | Ny | ǫsc | tC |
|---|---|---|---|---|---|
| Dual | F1 | 5 | 6 | -3e-15 | 437 |
| Dual + Grid-refining 3b | F1 | 17 | 21 | -3e-14 | 1710 |

Table 3.5: SDPR method results for (3.41) under additional constraint (3.42).
A system of elliptic nonlinear PDEs

As a PDE problem of two unknown functions in two variables, consider the following problem, where we distinguish two types of boundary conditions, Case I (Dirichlet condition) and Case II (Neumann condition):

u_xx + u_yy + u (1 − u^2 − v^2) = 0,
v_xx + v_yy + v (1 − u^2 − v^2) = 0,   ∀ (x, y) ∈ [0, 1]^2,    (3.43)
0 ≤ u, v ≤ 5.
Figure 3.8: SDPR solutions of (3.41) with objective F2.
Figure 3.9: SDPR solutions of (3.41) under additional constraint (3.42) with objective F1.
Case I:
u(0, y) = 0.5y + 0.3 sin(2πy),  u(1, y) = 0.4 − 0.4y  ∀ y ∈ [0, 1],
u(x, 0) = 0.4x + 0.2 sin(2πx),  u(x, 1) = 0.5 − 0.5x  ∀ x ∈ [0, 1],
v(x, 0) = v(x, 1) = v(0, y) = v(1, y) = 0  ∀ x ∈ [0, 1], ∀ y ∈ [0, 1],

or

Case II:
u_x(0, y) = −1,  u_x(1, y) = 1  ∀ y ∈ [0, 1],
u_y(x, 0) = 2x,  u_y(x, 1) = x + 5 sin(πx/2)  ∀ x ∈ [0, 1],
v_x(0, y) = 0,  v_x(1, y) = 0  ∀ y ∈ [0, 1],
v_y(x, 0) = −1,  v_y(x, 1) = 1  ∀ x ∈ [0, 1].

In both cases, we choose F(u, v) = Σ_{i,j} u_{i,j} for the SDPR method.
Case I. We apply the SDPR method with both primal and dual SDP relaxations exploiting sparsity. Also, as the degree of this PDE problem is three, we can apply the POP to QOP transformation to reduce the size of the SDP relaxations. Finally, we apply the grid-refining method to extend the coarse grid solutions to finer grids. When applying the dual SDP relaxations to the POP, lbd = 0 and ubd = 5 are given by (3.43). The bound ubd is tightened to ubd = 0.6 for the primal and dual SDP relaxations of the QOP in order to obtain accurate solutions. The numerical results of the SDPR method with ω = 2 and SQP as
a local solver are reported in Table 3.6. We observe that exploiting d-space sparsity and applying the primal SDP relaxations is very efficient for this problem. In fact, the d-space sparsity is richer than the correlative sparsity, as a comparison of the total computation times of the primal and dual SDP relaxations for the QOP derived from (3.43) for Nx = Ny = 11 reveals. When applying the dual SDP relaxations, the grid-refining method is useful to extend coarse grid solutions to high resolution grids. The approximate solution for u(·, ·) is pictured in Figure 3.10; the corresponding v equals zero on the entire domain.
| SDP relaxation | Transform POP to QOP | n | Nx | Ny | ǫsc | tC |
|---|---|---|---|---|---|---|
| Dual | no | 18 | 5 | 5 | -4e-13 | 2 |
| Dual + Grid-refining 3b | no | 7938 | 65 | 65 | -2e-14 | 12280 |
| Dual | no | 32 | 6 | 6 | -4e-13 | 150 |
| Dual + Grid-refining 3b | no | 162 | 11 | 11 | -2e-13 | 156 |
| Dual + Grid-refining 3b | no | 722 | 21 | 21 | -4e-16 | 238 |
| Dual | yes | 162 | 11 | 11 | -9e-9 | 185 |
| Primal | yes | 162 | 11 | 11 | -3e-8 | 19 |
| Primal | yes | 338 | 15 | 15 | -3e-8 | 107 |

Table 3.6: Results of SDPR method for (3.43) in Case I.
Figure 3.10: SDPR method solution u of (3.43) for Case I and two different discretizations.
We compare the performance of the SDPR method for Case I of (3.43) to the Matlab PDE Toolbox. Starting from an arbitrary initial guess, the Matlab solver detects the same solution as the SDPR method in 2 seconds on a mesh with 2667 nodes and 5168 triangles, and in 15 seconds on a mesh with 10501 nodes and 20672 triangles, cf. Figure 3.11. Thus, the FEM solver from Matlab is much more efficient in finding the solution. However, the discretization of (3.43) under the Dirichlet condition has exactly one real solution. In a more difficult PDE problem with many solutions, a good initial guess is required for the Matlab solver to find a solution of interest.
Case II. We apply the SDPR method with the same settings as in Case I, and compare the efficiency of the primal and dual SDP relaxations and the grid-refining method. For the primal and dual SDP relaxations of the QOP the upper bounds are tightened to ubd_u = 4 and ubd_v = 1.5. The numerical performance of the SDPR method is reported in Table 3.7. The single solution (u, v) of the discretized differential equation is illustrated in Figure 3.12. As in Case I, we observe that the transformation from POP to QOP is efficient in reducing the size of the SDP relaxations while the accuracy of the approximation is preserved. Moreover, d-space sparsity is richer than correlative sparsity, as the primal SDP relaxations for n = 242 can be solved in 58 s whereas solving the dual SDP relaxations for the same QOP requires 748 s.
Figure 3.11: Solution u of (3.43) by Matlab PDE Toolbox for a mesh with 2667 nodes/5168 triangles (left) and 10501 nodes/20672 triangles (right).
| SDP relaxation | Transform POP to QOP | n | Nx | Ny | ǫsc | tC |
|---|---|---|---|---|---|---|
| Dual | no | 32 | 6 | 6 | -7e-13 | 128 |
| Dual + Grid-refining 3b | no | 3042 | 41 | 41 | -3e-15 | 2160 |
| Dual | no | 50 | 7 | 7 | -1e-10 | 713 |
| Dual + Grid-refining 3b | no | 242 | 13 | 13 | -6e-13 | 728 |
| Dual | yes | 242 | 13 | 13 | -5e-11 | 748 |
| Primal | yes | 242 | 13 | 13 | -6e-13 | 58 |

Table 3.7: Results of the SDPR method for (3.43) in Case II.
Nonlinear parabolic equation

Consider a nonlinear parabolic PDE problem of two dependent scalar functions:

(1/50) u_xx − u_y + 1 + u^2 v − 4u = 0 ∀ x ∈ [0, 1], y ≥ 0,
(1/50) v_xx − v_y + 3u − u^2 v = 0 ∀ x ∈ [0, 1], y ≥ 0,
u(0, y) = u(1, y) = 1 ∀ y ≥ 0,
v(0, y) = v(1, y) = 3 ∀ y ≥ 0,    (3.44)
u(x, 0) = 1 + sin(2πx) ∀ x ∈ [0, 1],
v(x, 0) = 3 ∀ x ∈ [0, 1].

In order to bring (3.44) into form (3.14), we need to cut y at y = T. Since problem (3.44) is parabolic, the solutions ((u, v)(Nx, Ny)) of the discretized problems converge to solutions (u(·, ·), v(·, ·)) of (3.44) by Theorem 3.4. We apply the grid-refining method with strategy 3b, where F is given by F(u, v) = Σ_{i,j} u_{i,j} and ω = 3. Furthermore, lbd ≡ 0 and ubd ≡ 5 are chosen as bounds for u and v. The grid-refining method yields a highly accurate, stable solution on a 33 × 65 grid; see Table 3.8 and Figure 3.13.
Figure 3.12: SDPR method solutions u (left) and v (right) of (3.43) for Case II.
| strategy | Nx | Ny | me | ǫsc |
|---|---|---|---|---|
| initial SDPR method | 5 | 9 | -4.12 | -2e-10 |
| Grid-refining 3b | 33 | 65 | -2.88 | -5e-11 |

Table 3.8: Results for diffusion problem (3.44).
First order PDEs

An optimization-based approach to attempt first order PDEs was proposed by Guermond and Popov [29, 30]. In [29] the following example of a first order PDE with a discontinuous solution is solved on a 40 × 40 grid:

u_x(x, y) = 0 ∀ (x, y) ∈ [0, 2] × [0.2, 0.8],
u(0, y) = 1 if y ∈ [0.5, 0.8],    (3.45)
u(0, y) = 0 if y ∈ [0.2, 0.5[.

Applying the SDPR method with a forward or central difference approximation for the first derivative in (3.45), we detect the discontinuous solution

u(x, y) = 1 if y ≥ 0.5, and u(x, y) = 0 otherwise,

on a 40 × 40 grid.
A more difficult first order PDE problem is given by

u_x(x, y) + u(x, y) − 1 = 0 ∀ (x, y) ∈ [0, 1]^2,
u(0, y) = u(1, y) = 0 ∀ y ∈ [0, 1],    (3.46)
0 ≤ u(x, y) ≤ 1 ∀ (x, y) ∈ [0, 1]^2.

As can be seen easily, and as was pointed out in [30], problem (3.46) is not well-posed, since the outflow boundary condition is over-specified. Problem (3.46) is discussed in detail in [30], and the authors obtained an accurate approximation to the exact solution by L^1 approximation on a 10 × 10 grid. Applying the SDPR method with ω = 1 and objective function F(u) = Σ_{i,j} u_{i,j} on a 10 × 10 grid, we obtain a highly accurate
Figure 3.13: Solutions u (left) and v (right) for diffusion problem (3.44).
approximation (| ǫ_sc | < 1e−14) to this solution in less than 10 seconds if we choose a forward difference approximation for the first derivative. If we choose a central or a backward difference scheme, the dual problem in the resulting SDP relaxation becomes infeasible. Moreover, by applying the SDPR method on a 50 × 50 grid we are able to obtain a highly accurate approximation to the solution of (3.46) in less than 250 seconds, as pictured in Figure 3.14.
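The role of the forward difference scheme can be seen directly: it propagates information from the inflow boundary, as in this sketch (our illustration; in the actual computation the whole system is handled through the SDP relaxation):

```python
import numpy as np

# Sketch: the forward-difference equations (u_{i+1} - u_i)/h + u_i - 1 = 0
# for (3.46) determine u recursively from the inflow value u_0 = 0 and
# reproduce u(x) ~ 1 - exp(-x); only the over-specified outflow condition
# u_N = 0 is left to the optimization-based treatment.
N = 10
h = 1.0 / (N - 1)
u = np.zeros(N)
for i in range(N - 1):
    u[i + 1] = (1.0 - h) * u[i] + h
x = np.linspace(0.0, 1.0, N)
print(np.max(np.abs(u - (1.0 - np.exp(-x)))))  # small discretization error
```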
Figure 3.14: Solution u by SDPR method for (3.46).
3.3.3 Reaction-diffusion equations

An interesting class of PDE problems to analyze by the SDPR method is the class of reaction-diffusion equations. Many reaction-diffusion equations are systems of nonlinear PDEs with multiple solutions. For some of them, special unstable solutions exist that are difficult to detect by standard homotopy-like continuation methods. We demonstrate how the features of the SDPR method can be used to find special solutions of a reaction-diffusion equation, in particular interesting unstable ones.
A reaction-diffusion equation due to Mimura

An exciting and difficult reaction-diffusion problem of two dependent functions is a problem by M. Mimura [64], which arises from the context of planktonic prey and predator models in biology. This problem is given below, and we briefly call it Mimura's problem:

(1/20) u''(t) + (1/9) (35 + 16u(t) − u(t)^2) u(t) − u(t) v(t) = 0,
4 v''(t) − (1 + (2/5) v(t)) v(t) + u(t) v(t) = 0,
u̇(0) = u̇(5) = v̇(0) = v̇(5) = 0,    (3.47)
0 ≤ u(t) ≤ 14,  0 ≤ v(t) ≤ 14,  ∀ t ∈ [0, 5].
In [64] the problem is analyzed, and the existence of continuous solutions is shown in [82]. In order to construct a POP of the type (3.26), we consider different objective functions:

F1(u, v) = −u_{⌈N/2⌉},  F2(u, v) = −Σ_{i=1}^N u_i,  F3(u, v) = −u_2,
F4(u, v) = −u_{N−1},  F5(u, v) = −u_2 − u_{N−1},  F6(u, v) = Σ_{i=1}^N (u_i + v_i).    (3.48)
First, we apply the SDPR method with ω = 3 and N = 5. In order to confirm the numerical results obtained for this very coarse grid, we apply PHoM [31], a C++ implementation of the polyhedral homotopy continuation method for computing all isolated complex solutions of a polynomial system of equations, to the system of discretized PDEs. In that case the dimension n of the polynomial system equals 6, as there are 2 unknown functions with 3 interior grid points each. PHoM finds 182 complex, 64 real and 11 nonnegative real solutions. Varying the upper and lower bounds for u_2 and u_4 and choosing one of the functions F1, . . . , F5 as objective function, all 11 solutions are detected accurately (| ǫ_sc | < 1e−7) by the SDPR method, as listed in Table 3.9.
| u2 | u3 | u4 | v2 | v3 | v4 | objective | ubd_{u2} | ubd_{u4} |
|---|---|---|---|---|---|---|---|---|
| 4.623 | 6.787 | 0.939 | 9.748 | 10.799 | 5.659 | F3 | 5 | 1.5 |
| 4.607 | 6.930 | 0.259 | 9.737 | 10.831 | 5.166 | F3 | 5 | 0.5 |
| 0.259 | 6.930 | 4.607 | 5.166 | 10.831 | 9.737 | F2 | 0.5 | 6 |
| 5.683 | 2.971 | 5.683 | 10.388 | 8.248 | 10.388 | F3 | 6 | 6 |
| 6.274 | 0.177 | 6.274 | 10.638 | 6.404 | 10.638 | F3 | 7 | 7 |
| 0.970 | 7.812 | 0.970 | 5.735 | 10.94 | 5.735 | F3 | 2 | 2 |
| 0.297 | 7.932 | 0.966 | 5.230 | 10.94 | 5.729 | F4 | 0.5 | 2 |
| 0.962 | 7.932 | 0.297 | 5.729 | 10.94 | 5.230 | F3 | 2 | 0.5 |
| 0.304 | 8.045 | 0.304 | 5.234 | 10.94 | 5.234 | F1 | 14 | 14 |
| 0.939 | 6.787 | 4.623 | 5.659 | 10.80 | 9.748 | F4 | 2 | 14 |
| 5.000 | 5.000 | 5.000 | 10.000 | 10.000 | 10.000 | F2 | 14 | 14 |

Table 3.9: SDPR method solutions of (3.47) for N = 5.
The confirmation of our SparsePOP results by PHoM encourages us to solve Mimura's problem for a finer discretization. Relaxation order ω = 3 is necessary to obtain an accurate solution in case N = 7 (Table 3.10, row 1). The upper bounds for u_2 and u_{N−1} are chosen to be 1. When extending the grid size from 7 to 13, the accuracy of the SDPR solution deteriorates. Also, if we choose ω = 2 for the initial application of the SDPR method, or if Newton's method is applied with another arbitrary starting point, or if we start for instance with N = 5 or N = 9, it is not possible to get an accurate solution. One possibility to overcome these difficulties is to start the grid-refining method with strategy 3b on a finer grid. We obtain a highly accurate unstable solution 2teeth when we start with N = 7 and F2 as objective function, and a highly accurate stable solution 2,3peak when we start with N = 25 and F5 as objective function. See Table 3.10 and Figure 3.15. It seems reasonable to state that the SDPR method provides an appropriate initial guess for Newton's method, which leads to accurate solutions for sufficiently fine discretizations.
| SDP relaxation | QOP | Local solver | obj | ubd_{u1} | N | solution | ǫsc | me |
|---|---|---|---|---|---|---|---|---|
| Dual | no | Newton | F2 | 14 | 7 | | -5e-9 | -2.2 |
| Grid-ref 3b | no | Newton | | 14 | 13 | | -2e+0 | 2.7 |
| Grid-ref 3b | no | Newton | | 14 | 25 | | -2e-6 | -0.9 |
| Grid-ref 3b | no | Newton | | 14 | 49 | | -1e-12 | 0.1 |
| Grid-ref 3b | no | Newton | | 14 | 97 | | -5e-12 | 0.2 |
| Grid-ref 3b | no | Newton | | 14 | 193 | | -2e-4 | 0.3 |
| Grid-ref 3b | no | Newton | | 14 | 385 | 2teeth | -5e-5 | 0.2 |
| Dual | no | Newton | F5 | 14 | 26 | | -1e-1 | 2.09 |
| Grid-ref 3b | no | Newton | | 14 | 51 | | -5e-2 | -0.18 |
| Grid-ref 3b | no | Newton | | 14 | 101 | | -2e-15 | -0.07 |
| Grid-ref 3b | no | Newton | | 14 | 401 | 2,3peak | -6e-16 | -0.07 |

Table 3.10: Results of grid-refining strategy 3b.
Figure 3.15: Unstable solution 2teeth (left) and stable solution 2,3peak (right).
As the most powerful approach, we apply the grid-refining method with strategy 3b, ω1 = 3 and ωk = 2 for k ≥ 2. We obtain the highly accurate stable solutions 3peak and 4peak, which are documented in Table 3.11 and pictured in Figure 3.16. As objective function for the POPs to be solved in each iteration we choose the function FM from (3.31).
Another way to attempt finer grids directly is to transform the POP derived from (3.47) into a QOP, and to apply both primal and dual SDP relaxations to that QOP. As reported in Table 3.11, the total computation time in the case N = 51 is reduced by two orders of magnitude under this method. A highly accurate solution is obtained when applying SQP as local solver in the SDPR method. Finally, we obtain various stable and unstable solutions to Mimura's problem when choosing different functions as objective F and when tightening or loosening ubd_{u1}. By the SDPR method we obtain the stable solutions 3peak, 4peak, 2,3peak and the unstable solutions 2teeth, peak3unstable, peak4unstable, 2valley.
Reaction-diffusion equations from collision processes
Another interesting class of reaction-diffusion equations arises from collision processes of particle-like patterns in dissipative systems. Various different input-output relations such as annihilation, repulsion, fusion,
and even chaotic dynamics are observed after collision of these patterns. The reaction-diffusion equations
| SDP relaxation | QOP | Local solver | obj | ubd_{u1} | N | solution | ǫsc | me | tC |
|---|---|---|---|---|---|---|---|---|---|
| Dual | no | Newton | F5 | 0.5 | 26 | | -2e-1 | 2.09 | 203 |
| Grid-ref 3a | no | Newton | FM | 0.5 | 51 | | -4e-2 | -0.05 | 224 |
| Grid-ref 3a | no | Newton | FM | 0.5 | 101 | | -4e-4 | -0.02 | 383 |
| Grid-ref 3a | no | Newton | FM | 0.5 | 201 | 4peak | -3e-11 | -0.02 | 1082 |
| Dual | no | Newton | F1 | 0.5 | 26 | | -1e-3 | -0.12 | 270 |
| Grid-ref 3a | no | Newton | FM | 0.5 | 51 | | -4e-3 | -0.08 | 348 |
| Grid-ref 3a | no | Newton | FM | 0.5 | 101 | | -3e-16 | -0.08 | 511 |
| Grid-ref 3a | no | Newton | FM | 0.5 | 201 | 3peak | -2e-11 | -0.07 | 1192 |
| Dual | no | SQP | F1 | 0.5 | 51 | | -5e-13 | -0.07 | 470 |
| Grid-ref 3b | no | SQP | F1 | 0.5 | 201 | 3peak | -1e-12 | -0.07 | 501 |
| Dual | no | SQP | F1 | 14 | 51 | | -1e-10 | 0.70 | 393 |
| Grid-ref 3b | no | SQP | F1 | 14 | 201 | peak3unstable | -4e-12 | 0.70 | 427 |
| Dual | no | SQP | F2 | 0.5 | 51 | | -1e-2 | 1.62 | 806 |
| Grid-ref 3b | no | SQP | F2 | 0.5 | 201 | peak4unstable | -3e-16 | 1.43 | 868 |
| Dual | yes | SQP | F6 | 0.5 | 51 | 2valley | -1e-13 | 2.10 | 16 |
| Primal | yes | SQP | F6 | 0.5 | 51 | 2valley | -1e-9 | 2.10 | 8 |

Table 3.11: Results of grid-refining strategy 3a.
describing the collision processes have special unstable solutions, so-called scattors, which are difficult to
detect and attract lots of interest [73, 74, 75, 76]. We show how scattors are detected by the SDPR method.
Gray-Scott model in 1D: As a first example, we consider the stationary equation of a Gray-Scott model for the dynamics of two traveling pulses in one dimension from [74]:

D_u u_xx(x) − u(x) v(x)^2 + f (1 − u(x)) = 0 ∀ x ∈ [0, 4],
D_v v_xx(x) + u(x) v(x)^2 − (f + k) v(x) = 0 ∀ x ∈ [0, 4],    (3.49)

where D_u > 0, D_v > 0, and f > 0 and k > 0 are two parameters related to the inflow and removal rates of the chemical species. Moreover, we impose Neumann boundary conditions. The existence and the shape of stable solutions and scattors depend heavily on the choice of f and k. We set the parameters to D_u = 5e−5, D_v = 2.5e−5, f = 0.0198 and k = 0.0497859. For this setting a scattor with positive eigenvalues (λ1, λ2, λ3) = (0.0639, 0.0638, 0.0023) was reported in [74]. We choose F(u, v) = −Σ_{i=1}^N u_i, lbd_u = lbd_v = 0, ubd_u = 1.0, ubd_v = 0.8, lbd_{u_1} = lbd_{u_N} = 0.4, ubd_{u_1} = ubd_{u_N} = 0.6 and ubd_{u_{⌈N/2⌉}} = 0.9. Applying the SDPR method with these settings yields the scattor doublePeak, as reported in Table 3.12 and pictured in Figure 3.18.
| SDP relaxation | QOP | Local solver | ω | N | solution | λ1 | λ2 | λ3 | tC |
|---|---|---|---|---|---|---|---|---|---|
| Dual | yes | SQP | 2 | 20 | | 0.0441 | 0.0434 | | 32 |
| Grid-ref 3b | yes | SQP | | 640 | doublePeak | 0.0639 | 0.0638 | 0.0023 | 469 |
| Grid-ref 3b | yes | SQP | | 1280 | doublePeak | 0.0638 | 0.0637 | 0.0023 | 1247 |

Table 3.12: Results of the SDPR method for (3.49).
In (3.49) the shape and number of scattors depend on the choice of the two parameters f and k. It is well known that if we fix f there is a bifurcation point k0; to be precise, there are no scattors for (3.49) if k < k0 and several scattors if k > k0. For a bifurcation analysis by the SDPR method we fix f = 0.0270.
Figure 3.16: Stable solutions 4peak and 3peak.
Figure 3.17: Unstable solutions peak4unstable and peak3unstable.
Moreover, we define the norm of a numerical solution u by

‖u‖ = ( Σ_{i=1}^N u_i^2 )^{1/2}.

We define F(u, v) = −Σ_{i=1}^N u_i^2 as objective function for the SDPR method. The bounds lbd and ubd are chosen as follows:

lbd_{u_i} = 0.5 if i ∈ {1, . . . , ⌈N/10⌉} ∪ {⌈9N/10⌉, . . . , N}, and lbd_{u_i} = 0 else,    (3.50)

with ubd_u = 0.8, lbd_v = 0 and ubd_v = 0.8. If we apply the SDPR method with ω = 2 and N = 256 to (3.49) for k ≤ 0.05281, we do not obtain any nontrivial solution. If we apply the SDPR method with the same settings to (3.49) with k increasing from 0.05282 to 0.0535, we obtain a scattor (u, v)^1 with ‖u^1‖ increasing from
Figure 3.18: SDPR method detects double peak scattor for (3.49).
0.1682 to 0.1843. Obtaining a second scattor requires tightening the bounds lbd_u and ubd_u:

lbd_{u_i} = 0.4 if i ∈ {1, . . . , ⌈N/10⌉} ∪ {⌈9N/10⌉, . . . , N}, and lbd_{u_i} = 0 else,
ubd_{u_i} = 0.6 if i ∈ {1, . . . , ⌈N/10⌉} ∪ {⌈9N/10⌉, . . . , N}, and ubd_{u_i} = 0.8 else.    (3.51)

The bounds for v remain unchanged. Applying the SDPR method with the tighter bounds (3.51) for k = 0.0535 yields a second solution (u, v)^2 with ‖u^2‖ = 0.1566 ≠ 0.1843 = ‖u^1‖. However, it is not possible to obtain accurate approximations of the second solution when applying the SDPR method for smaller choices of k. Therefore we apply a continuation technique based on the SDPR method to obtain the second solution for smaller k: Set k̃ = 0.0535, choose some stepsize ∆k, and apply the SDPR method to (3.49) for k = k̃ − ∆k with objective function G_k̃ defined by

G_k̃(u, v) = Σ_{i=1}^N ( u_i − u_i^k̃ )^2,

where u^k̃ is the solution of (3.49) for k = k̃. Update k̃ = k̃ − ∆k and iterate. Following this procedure we obtain (u, v)^2 for k decreasing from 0.0535 to 0.05281 and ‖u^2‖ increasing from 0.1566 to 0.1655. Thus, the bifurcation point of (3.49) is k0 ≈ 0.05281 for f = 0.0270. The results of the SDPR method are illustrated in the bifurcation diagram in Figure 3.19.
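The continuation procedure can be summarized as follows; this is a sketch in which solve_sdpr is a hypothetical stand-in for one run of the SDPR method on the discretization of (3.49) with a given objective and parameter k.

```python
import numpy as np

# Sketch of the SDPR-based continuation in k used for the bifurcation diagram.
def continue_branch(u_start, k_start, k_stop, dk, solve_sdpr):
    u_tilde, k = np.asarray(u_start), k_start
    branch = [(k, float(np.linalg.norm(u_tilde)))]
    while k - dk >= k_stop:
        k -= dk
        # G_{k~}(u) = sum_i (u_i - u_i^{k~})^2 keeps the new run close to
        # the solution found for the previous value of k.
        objective = lambda u, ref=u_tilde: float(np.sum((u - ref) ** 2))
        u_tilde = np.asarray(solve_sdpr(objective, k))
        branch.append((k, float(np.linalg.norm(u_tilde))))
    return branch  # pairs (k, ||u||) for the bifurcation diagram
```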
Three-component reaction-diffusion equation

Consider another one-dimensional steady state equation from [74]:

D_u u_xx(x) + 2u(x) − u(x)^3 − κ3 v(x) − κ4 w(x) + κ1 = 0 ∀ x ∈ [0, 0.5],
(1/τ) (D_v v_xx(x) + u(x) − v(x)) = 0 ∀ x ∈ [0, 0.5],
(1/θ) (D_w w_xx(x) + u(x) − w(x)) = 0 ∀ x ∈ [0, 0.5],    (3.52)
−2 ≤ u(x), v(x), w(x) ≤ 2 ∀ x ∈ [0, 0.5],

under Neumann boundary conditions. We set the parameters to D_u = 5e−6, D_v = 5e−5, D_w = 1e−2, κ1 = −7, κ3 = 1, κ4 = 8.5, τ = 16.1328 and θ = 1. In [74] the existence of two scattors, twin-horn and fusion, is shown for this setting. We choose F(u, v, w) = −u_{⌈N/2⌉}. fusion has one positive eigenvalue and
Figure 3.19: Bifurcation diagram by the SDPR method (left) compared to the one (right) from [100].
twin-horn has three positive eigenvalues (λ1, λ2, λ3) = (0.9069, 0.1297, 0.0138). Applying the SDPR method with SQP as local solver starting from N = 40, and subsequently applying the grid-refining method with strategy 3b, we obtain a highly accurate approximation of the scattor fusion, pictured in Figure 3.20.
Figure 3.20: SDPR method detects fusion for (3.52), with u (blue), v (green) and w (red).
Swift-Hohenberg equation

Another type of reaction-diffusion equation arising from the modeling of pattern formation is given by Swift-Hohenberg equations [80]. Swift-Hohenberg equations are interesting from the point of view of pattern formation because they have many qualitatively different stationary solutions. Differential equations with many solutions are of particular interest for applying the SDPR method. We examine a stationary Swift-Hohenberg equation from [80]:

−u_xxxx(x) − 2u_xx(x) − (1 − α)u(x) − u(x)^3 = 0 ∀ x ∈ [0, L],
u(0) = u(L) = u_xx(0) = u_xx(L) = 0,    (3.53)
where α ∈ R. It is known that the shape and number of solutions depend on the choice of L. For our numerical analysis we set the parameters to L = 9 and α = 0.3. The fourth-order derivative is approximated by

u_xxxx(x_i) ≈ (1 / h_x^4) (u_{i+2} − 4u_{i+1} + 6u_i − 4u_{i−1} + u_{i−2}).
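Combining this stencil with the usual three-point stencil for u_xx gives the residuals of the discretized (3.53); a sketch, assuming a uniform grid and treating only the interior points (helper names are ours):

```python
import numpy as np

# Sketch: residuals of the discretized stationary Swift-Hohenberg equation
# (3.53) at the interior points, using the five-point stencil above for
# u_xxxx and the three-point stencil for u_xx.
def swift_hohenberg_residuals(u, L=9.0, alpha=0.3):
    N = len(u)
    h = L / (N - 1)
    res = []
    for i in range(2, N - 2):
        u4 = (u[i+2] - 4*u[i+1] + 6*u[i] - 4*u[i-1] + u[i-2]) / h**4
        u2 = (u[i+1] - 2*u[i] + u[i-1]) / h**2
        res.append(-u4 - 2*u2 - (1 - alpha) * u[i] - u[i] ** 3)
    return np.array(res)
```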
We consider the following functions as objective functions:

F1(u) = −Σ_{i=1}^N u_i,  F2(u) = Σ_{i=1}^N u_i,  F3(u) = −u_2,  F4(u) = −u_{⌈N/3⌉},
F5(u) = −u_{⌈2N/3⌉},  F6(u) = −u_{⌈N/4⌉}.    (3.54)
4
We apply the SDPR method with ω = 3 for N = 40 and varying objective functions, and obtain five
different solutions for (3.53) as reported in Table 3.13 and pictured in Figure 3.21.
| SDP relaxation | POP to QOP | Local Solver | Objective | ǫsc | tC |
|---|---|---|---|---|---|
| Dual | no | SQP | F1 | -1e-9 | 29 |
| Dual | no | SQP | F2 | -3e-15 | 32 |
| Dual | no | SQP | F3 | -3e-10 | 31 |
| Dual | no | SQP | F4 | -3e-10 | 69 |
| Dual | no | SQP | F5 | -1e-13 | 58 |

Table 3.13: Results of the SDPR method with ω = 3, N = 40 and varying objective function.
In a next step we investigate a more systematic approach to enumerate the solutions of (3.53). Applying Algorithm 3.1 does not yield an accurate enumeration of solutions for (3.53): the numerical accuracy of the solutions deteriorates from the second iteration on. This may be explained by the fact that the relaxation of the quadratic constraints added in step 3 of Algorithm 3.1 is too weak to provide a good starting point for the local solver. We therefore consider a variant of Algorithm 3.1. Instead of adding the constraints

(u_i − u_i^(k−1))^2 ≥ ǫ_k for all i with b_i = 1

to the POP, we consider 2^{Σ_{i=1}^n b_i} POPs, where we add the constraints

u_i ≤ u_i^(k−1) − ǫ_i^k  or  u_i ≥ u_i^(k−1) + ǫ_i^k

for every i with b_i = 1. It is clear that the solution with the kth smallest objective value is feasible for exactly one of the 2^{Σ_{i=1}^n b_i} POPs. This procedure has the advantage that the added linear constraints remain hard under the SDP relaxation; it has the disadvantage that many SDPs need to be solved in the course of the algorithm. Therefore, we consider this modified enumeration algorithm for the SDPR method with small relaxation order. We apply the modified enumeration algorithm and the SDPR method with N = 50, b^k := b := (0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, . . . , 0), ǫ^k := ǫ := 0.1, POP to QOP transformation, ω = 2, F6 as objective, and 6 iterations of the enumeration algorithm. In the first five iterations we obtain the same five solutions as those obtained by the SDPR method when choosing different objective functions; the output of the sixth iteration is the same as for the fifth one, although it is not a feasible solution for the POP from the sixth iteration. Our numerical results are reported in Table 3.14 and pictured in Figure 3.21. Note that applying the modified enumeration algorithm requires a longer time to obtain the five solutions of (3.53). However, enumerating the five solutions one by one with respect to the same objective function is a more systematic approach than choosing five arbitrary objective functions.
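The 2^{Σ b_i} subproblems can be generated mechanically, e.g. as in this sketch, where the constraints are returned as abstract triples rather than in SparsePOP's format:

```python
from itertools import product

# Sketch: for the index set I = {i : b_i = 1}, enumerate the 2^|I| sets of
# linear exclusion constraints u_i <= u_prev[i] - eps or u_i >= u_prev[i] + eps
# used by the modified enumeration algorithm.
def branch_constraint_sets(u_prev, I, eps):
    for signs in product((-1, +1), repeat=len(I)):
        yield [
            (i, "<=", u_prev[i] - eps) if s < 0 else (i, ">=", u_prev[i] + eps)
            for i, s in zip(I, signs)
        ]
```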
3.3.4 Differential algebraic equations

The class of PDE problems (3.14) contains so-called differential algebraic equations (DAE). These are differential equations in which the derivatives of several unknown functions do not occur explicitly. We
| k | SDP relaxation | POP to QOP | Local Solver | ǫsc | F6(u^(k)) | tC |
|---|---|---|---|---|---|---|
| 0 | Dual | yes | SQP | -4e-15 | -0.43 | 30 |
| 1 | Dual | yes | SQP | -9e-11 | -0.17 | 153 |
| 2 | Dual | yes | SQP | -5e-10 | 0.00 | 293 |
| 3 | Dual | yes | SQP | -9e-13 | 0.17 | 400 |
| 4 | Dual | yes | SQP | -1e-11 | 0.43 | 500 |
| 5 | Dual | yes | SQP | -1e-11 | 0.43 | 593 |

Table 3.14: Results of the modified enumeration algorithm and SDPR method with ω = 2, N = 50 and F6 as objective.
Figure 3.21: SDPR method with varying objective for N = 40 (left) and modified enumeration algorithm with SDPR method for N = 50 (right).
demonstrate with the following example that the SDPR method can be applied to solve DAEs as well. Consider the DAE problem

u̇1(x) = u3(x),
0 = u2(x) (1 − u2(x)),
0 = u1(x) u2(x) + u3(x) (1 − u2(x)) − x,   ∀ x ∈ [0, T]    (3.55)
u1(0) = u0.

It is easy to see that two closed-form solutions u^1 and u^2 are given by

u^1(x) = ( u0 + x^2/2, 0, x )^T,  x ∈ [0, T],
u^2(x) = ( x, 1, 1 )^T,  x ∈ [0, T].
We choose lbd ≡ 0 and ubd ≡ 10 for each function u1, u2 and u3 and define two objective functions F1 and F2,

F1(u) = Σ_{i=1}^{Nx} u1_i,  F2(u) = Σ_{i=1}^{Nx} u2_i.

First we choose u0 = 0 and apply the SDPR method with F2 as objective, and we obtain a highly accurate approximation of u^2, which is documented in Table 3.15 and Figure 3.22.
| objective | ω | Nx | u0 | ǫobj | ǫsc | tC |
|---|---|---|---|---|---|---|
| F2 | 2 | 100 | 0 | 4e-10 | -4e-10 | 29 |
| F2 | 2 | 200 | 0 | 3e-9 | -3e-9 | 122 |
| F1 | 2 | 200 | 1 | 8e-9 | -2e-6 | 98 |
| F1 | 2 | 200 | 2 | 3e-10 | -4e-8 | 107 |
| F1 | 2 | 10 | 0.5 | 3e-10 | -3e-7 | 4 |
| F1 | 2 | 20 | 0.5 | 3e-10 | -9e-6 | 9 |
| F1 | 2 | 30 | 0.5 | 8e-10 | -3e-3 | 15 |
| F1 | 2 | 40 | 0.5 | 7e-8 | -1e-1 | 24 |
| F1 | 3 | 30 | 0.5 | 9e-9 | -2e-3 | 51 |
| F1 | 4 | 30 | 0.5 | 8e-9 | -6e-4 | 210 |

Table 3.15: Results of SDPR method for (3.55) with T = 2.
Figure 3.22: Solutions u^2 (left) and u^1 (right) of DAE problem (3.55).
Next we apply the SDPR method with F1 as objective. For u0 ∈ {1, 2}, highly accurate approximations of u^1 are obtained. An interesting phenomenon is observed in case u0 is small. For instance, if we choose u0 = 0.5 and ω = 2, we get a highly accurate solution for Nx = 10. But as we increase Nx stepwise to 40, the accuracy decreases, although the relaxation order remains constant. For numerical details see Table 3.15. This effect can be partly compensated by increasing ω, as demonstrated for the case Nx = 30. But due to the limited capacity of current SDP solvers it is not possible to increase ω as much as needed to obtain a high accuracy solution. However, we obtain highly accurate approximations to both solutions of (3.55) by the SDPR method, even without applying a locally fast convergent method.
3.3.5 The steady cavity flow problem
One of the most challenging present PDE problems is the numerical analysis of the Navier-Stokes equations.
As a first step to attempt this class of PDE problems by the SDPR method we consider the steady cavity
flow problem, which contains a steady state version of the Navier-Stokes equations. The steady cavity
flow problem is a simple model of a flow with closed streamlines and is used for examining and validating
numerical solution techniques in fluid dynamics. Although it has been discussed in the literature of numerical
analysis of fluid mechanics (see, e.g., [40], [12], [32], [14], [98]), it is still an interesting problem to a number of
researchers for a range of Reynolds numbers. The setting of the steady cavity flow problem is the following.
Let (v1 (x, y, t), v2 (x, y, t)) be the velocity of the two dimensional cavity flow of an incompressible fluid
on the cavity region ABCD with the coordinates A = (0, 1), B = (0, 0), C = (1, 0), D = (1, 1).
[Diagram: the square cavity region ABCD, with the lid AD on top moving with velocity s.]
It follows from the continuity equation of the incompressible fluid (preservation of the mass),

∂v1/∂x + ∂v2/∂y = 0,    (3.56)

that there exists a function ψ(x, y, t) such that

∂ψ/∂x = −v2,  ∂ψ/∂y = v1.

Put v = (v1, v2, 0). φ⃗ = rot v is called the vorticity. Since the last coordinate of v is 0, φ⃗ can be written as (0, 0, φ(x, y, t)). The continuity equation and the Navier-Stokes equation (preservation of the momentum) can be written as follows in terms of ψ and φ:

∆ψ = −φ,    (3.57)
∂φ/∂t = (∂ψ/∂y)(∂φ/∂x) − (∂ψ/∂x)(∂φ/∂y) + (1/R) ∆φ,    (3.58)
where the parameter R is called the Reynolds number.
The steady cavity flow problem is (3.57) and (3.58) with the steady condition ∂φ/∂t = 0 and the boundary conditions

v1(0, y) = v1(x, 0) = v1(1, y) = 0 on AB, BC, CD,    (3.59)
v2(0, y) = v2(x, 0) = v2(1, y) = 0 on AB, BC, CD,    (3.60)
v1(x, 1) = s, v2(x, 1) = 0 on AD.    (3.61)
Here s is the velocity of the stream out of the cavity ABCD. We set the boundary velocity at AD to s := 1. Due to its boundary conditions, the PDE problem (3.56), (3.57), (3.58), (3.59)-(3.61) is not of form (3.14). Therefore, we need to follow a specialized strategy to discretize this problem via a finite difference scheme. We divide the square ABCD into an N × N mesh and define h := 1/(N − 1). We translate the boundary conditions for v1 and v2 into boundary conditions for ψ and φ. It follows from (3.59), (3.60), (3.61) that the function ψ is constant on the boundaries AB, BC, CD, DA. Since ψ is continuous, we suppose that ψ = 0 on the boundaries. The boundary condition for φ is a little more complicated. We derive it from the discussion in [98], p. 162. Let us consider the case of the boundary AD first. We take a mesh point M on AD. Let P be the mesh point inside the cavity adjacent to M, and P′ the mirror image of P with respect to AD. We suppose that the size of the mesh is h.
[Diagram: boundary point M on AD, its interior neighbor P at distance h, and the mirror point P′ of P with respect to AD.]
We denote the value of ψ at the point P by ψ(P) or ψ_P. Moreover, we have

−φ(M) = ∆ψ(M) = ψ_yy(M) ≈ (ψ_P − 2ψ_M + ψ_{P′}) / h^2.

We need to determine the value of ψ_{P′} to get an approximate value of φ at M. By using the central difference approximation, s = 1 = v1 = ∂ψ/∂y(M) ≈ (ψ_{P′} − ψ_P) / (2h) holds. Then, ψ_{P′} ≈ 2h + ψ_P. Therefore, we have

φ_M ≈ −(2ψ_P + 2h) / h^2.    (3.62)

Analogously, we obtain

φ_M ≈ −2ψ_P / h^2    (3.63)

when M is a grid point on AB, BC or CD and P is the adjacent internal grid point of M. From this discussion, we obtain the following finite difference scheme for the steady cavity flow problem:

g1_{i,j}(ψ, φ) = 0,  g2_{i,j}(ψ, φ) = 0  ∀ 2 ≤ i, j ≤ N − 1,    (3.64)

ψ_{1,j} = ψ_{N,j} = 0 ∀ j ∈ {1, . . . , N},  ψ_{i,1} = ψ_{i,N} = 0 ∀ i ∈ {1, . . . , N},    (3.65)

φ_{1,j} = −2 ψ_{2,j} / h^2,  φ_{N,j} = −2 ψ_{N−1,j} / h^2 ∀ j ∈ {1, . . . , N},
φ_{i,1} = −2 ψ_{i,2} / h^2,  φ_{i,N} = −2 (ψ_{i,N−1} + h) / h^2 ∀ i ∈ {1, . . . , N},    (3.66)

where

g1_{i,j}(ψ, φ) := −4φ_{i,j} + φ_{i+1,j} + φ_{i−1,j} + φ_{i,j+1} + φ_{i,j−1}
  + (R/4) (ψ_{i+1,j} − ψ_{i−1,j}) (φ_{i,j+1} − φ_{i,j−1})
  − (R/4) (ψ_{i,j+1} − ψ_{i,j−1}) (φ_{i+1,j} − φ_{i−1,j}),
g2_{i,j}(ψ, φ) := −4ψ_{i,j} + ψ_{i+1,j} + ψ_{i−1,j} + ψ_{i,j+1} + ψ_{i,j−1} + h^2 φ_{i,j}.

We call the polynomial system (3.64), (3.65), (3.66) the discrete steady cavity flow problem, denoted DSCF(R, N). It depends on two parameters, the Reynolds number R and the discretization N of the cavity region ABCD = Ω = [0, 1]^2. Its dimension is given by n = 2(N − 2)^2.
Remark 3.2 We conjecture that the discrete cavity flow problem DSCF(R, N) has finitely many complex solutions. In other words, it defines a zero-dimensional ideal. We checked this conjecture up to N = 5 by Gröbner basis computation.
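For completeness, the residuals of DSCF(R, N) and the discretized kinetic energy below can be generated as in this sketch; the array layout and names are ours.

```python
import numpy as np

# Sketch: residuals g1, g2 of DSCF(R, N) from (3.64)-(3.66) and the kinetic
# energy objective (3.68). Psi and Phi are N x N arrays; the 1-based indices
# of the text are mapped to 0-based indices here.
def dscf_residuals(Psi, Phi, R, N):
    h = 1.0 / (N - 1)
    Psi[0, :] = Psi[-1, :] = Psi[:, 0] = Psi[:, -1] = 0.0   # (3.65)
    Phi[0, :] = -2.0 * Psi[1, :] / h**2                     # (3.66)
    Phi[-1, :] = -2.0 * Psi[-2, :] / h**2
    Phi[:, 0] = -2.0 * Psi[:, 1] / h**2
    Phi[:, -1] = -2.0 * (Psi[:, -2] + h) / h**2
    g1, g2 = [], []
    for i in range(1, N - 1):
        for j in range(1, N - 1):
            lap_phi = (-4*Phi[i,j] + Phi[i+1,j] + Phi[i-1,j]
                       + Phi[i,j+1] + Phi[i,j-1])
            g1.append(lap_phi
                      + R/4 * (Psi[i+1,j] - Psi[i-1,j]) * (Phi[i,j+1] - Phi[i,j-1])
                      - R/4 * (Psi[i,j+1] - Psi[i,j-1]) * (Phi[i+1,j] - Phi[i-1,j]))
            g2.append(-4*Psi[i,j] + Psi[i+1,j] + Psi[i-1,j]
                      + Psi[i,j+1] + Psi[i,j-1] + h**2 * Phi[i,j])
    return np.array(g1), np.array(g2)

def kinetic_energy(Psi, N):
    # discretized kinetic energy, equivalent to (3.68)
    F = 0.0
    for i in range(1, N - 1):
        for j in range(1, N - 1):
            F += (Psi[i+1,j] - Psi[i-1,j])**2 + (Psi[i,j+1] - Psi[i,j-1])**2
    return F / 4.0
```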
DSCF(R, N) is a sparse polynomial system to which we apply the SDPR method and Algorithm 3.1. The choice of F in (3.26) is motivated by the fact that one is interested in the solution of the PDE problem which minimizes the kinetic energy given by

∫∫_{ABCD} ( (∂ψ/∂y)^2 + (−∂ψ/∂x)^2 ) dx dy.    (3.67)
Thus, by discretizing (3.67) we obtain the following function as a canonical choice for F:

F(ψ, φ) = (1/4) Σ_{2 ≤ i,j ≤ N−1} ( ψ_{i+1,j}^2 + ψ_{i−1,j}^2 + ψ_{i,j+1}^2 + ψ_{i,j−1}^2 − 2 ψ_{i+1,j} ψ_{i−1,j} − 2 ψ_{i,j+1} ψ_{i,j−1} ).    (3.68)

We denote the optimization problem of minimizing F subject to DSCF(R, N) as the steady cavity flow optimization problem CF(R, N). It is characterized by a simple proposition.
Proposition 3.4
a) CF(0, N) is a convex quadratic program for any N.
b) CF(R, N) is non-convex for any N, if R ≠ 0.
Proof:
a) In case R = 0, all constraints are linear. Furthermore, the objective function can be written, up to a positive factor, as F = Σ_{i,j} (F1_{i,j} + F2_{i,j}), where

F1_{i,j}(ψ, φ) = ( ψ_{i+1,j}, ψ_{i−1,j} ) ( 2 −2 ; −2 2 ) ( ψ_{i+1,j}, ψ_{i−1,j} )^T.

It follows that F1_{i,j} is convex, as ( 2 −2 ; −2 2 ) is positive semidefinite with eigenvalues 0 and 4. The convexity of F2_{i,j} follows analogously. Thus, F can be written as a sum of convex functions and is therefore convex as well. The proposition follows.

b) In case R ≠ 0, the equality constraint function g1_{i,j} is indefinite quadratic. Thus, CF(R, N) is a non-convex quadratic program.
Solving CF(R, N) by the SDPR method

First, we apply the SDPR method with ω = 1 and Newton's method as local solver to CF(100, N). Highly accurate solutions with ǫ_sc < 1e−10 are obtained for N ∈ {10, 15, 20}. By applying the grid-refining method we succeed in extending the solutions to grids of size 30 × 30 and 40 × 40, as pictured for N = 40 in Figure 3.23 and reported in Table 3.16. Thus, it seems reasonable to conclude that the minimal energy solution of CF(100, N) converges to a continuous solution of the steady cavity flow problem for N → ∞. The discrete steady cavity flow problem has multiple solutions. It is an advantage of the SDPR method that it indicates that the minimal kinetic energy solution u⋆(N) of CF(R, N) converges to an analytic solution for N → ∞.
Figure 3.23: (v1, v2) for solution u of CF(100, 40).
N     ω    ǫ_sc     t_C (s)    F(u⋆)
10    1    4e-11    14         0.0169
15    1    6e-16    255        0.0313
20    1    6e-16    948        0.0409
30    1    4e-11    1759       0.0503
40    1    4e-11    4156       0.0554

Table 3.16: Results for CF(100, N) for increasing N.
Second, we apply the SDPR method to CF(R, N) for a much larger R; for example, we examine CF(10000, N) for N ∈ {8, ..., 18}. For all tested discretizations we were able to find accurate solutions by the SDPR method with ω = 1 and SQP as local solver, cf. Table 3.17 and Figure 3.24.
N     ω    ǫ_sc     F(u^(k))    t_C (s)
8     1    2e-7     1.5e-6      7
10    1    3e-10    3.2e-6      21
12    1    1e-7     6.0e-6      49
14    1    5e-9     1.1e-5      99
16    1    4e-12    1.9e-5      199
18    1    2e-8     3.9e-5      501

Table 3.17: Results for CF(10000, N) for increasing N.
Figure 3.24: Solutions of CF(10000, N) obtained by the SDPR method for N = 8 (left, top), N = 10 (center, top), N = 12 (right, top), N = 14 (left, bottom), N = 16 (center, bottom) and N = 18 (right, bottom).
Comparing the pictures in Figure 3.24, it seems the SDPR(1) solution of CF(10000, N) evolves into some stream-like solution for increasing N. However, unlike the solutions of CF(100, N), we have not been able to extend this solution to a grid of higher resolution by the grid-refinement method. Therefore, it is possible that the solution pictured in Figure 3.24 is a fake solution, which is consistent with the steady cavity flow problem becoming harder for increasing Reynolds number.
Enumerating the solutions of DSCF(R, N)

A further interesting question is to find all solutions of the cavity flow problem, in particular for large Reynolds numbers. Therefore, we examine the efficiency of Algorithm 3.1 for enumerating the solutions of DSCF(R, N) with respect to their discretized kinetic energy. For the parameter $b^k \in \{0, 1\}^n$ to be chosen in each iteration of Algorithm 3.1 we restrict ourselves to the case where $b^k$ is given by
$$
b^k_i = \begin{cases} 1, & \text{if } i \in \{1, \dots, b^k_1\} \cup \{\tfrac{n}{2} + 1, \dots, \tfrac{n}{2} + b^k_2\}, \\ 0, & \text{else.} \end{cases}
$$
Figure 3.25: (v_1, v_2) for solutions u^(0) (left), u^(1) (center) and u^(2) (right) of CF(4000, 5) on the interior of [0, 1]².
Thus $b^k$ is defined by the two parameters $b^k_1, b^k_2 \in \{1, \dots, \tfrac{n}{2}\}$. The parameters $\epsilon^k_1$ and $\epsilon^k_2$ correspond to the constraints imposed by $b^k_1$ and $b^k_2$, respectively.
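To illustrate the role of these parameters, the following schematic sketch shows one plausible reading of the enumeration loop; the wrapper `solve_sdpr` is hypothetical, and the assumed form of the exclusion constraints is spelled out in the docstring rather than taken verbatim from Algorithm 3.1.

```python
import numpy as np

def enumerate_solutions(solve_sdpr, n, schedule):
    """Schematic sketch of the enumeration loop around Algorithm 3.1.

    `solve_sdpr(cuts)` is a hypothetical wrapper that minimizes F subject to
    DSCF(R, N) plus the accumulated exclusion cuts, via the sparse SDP
    relaxation and a local solver, returning a solution u in R^n.  Each cut
    (u_prev, sel, eps) is read as the polynomial constraints
    (u_i - u_prev_i)^2 >= eps_i for i in sel -- one plausible form of the
    constraints imposed by b^k, not a verbatim transcription of the thesis.
    """
    cuts = []
    solutions = [solve_sdpr(cuts)]              # u^(0): minimal energy solution
    for b1, b2, eps1, eps2 in schedule:         # parameters (b^k_1, b^k_2, ...)
        sel = list(range(b1)) + list(range(n // 2, n // 2 + b2))
        eps = [eps1] * b1 + [eps2] * b2
        cuts.append((solutions[-1], sel, eps))  # exclude the latest solution
        solutions.append(solve_sdpr(cuts))      # next smallest energy solution
    return solutions
```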
CF(4000,5): In a first setting we choose the discretization N = 5, i.e., the dimension is n = 2 · 3² = 18. This dimension is small enough to apply the Gröbner basis method and determine all complex solutions of DSCF(R, N). Therefore, we are able to verify whether the solutions provided by Algorithm 3.1 are optimal. The computational results are given in Table 3.18.
k    ω    ǫ^k_1    b^k_1    b^k_2    t_C (s)    ǫ_sc     F(u^(k))    solution
0    1    -        -        -        2          2e-10    4.6e-4      u^(0)
1    1    1e-3     3        0        5          5e-4     6.3e-4      u^(1)
2    1    1e-3     3        0        8          5e-4     1.0e-3      u^(2)

Table 3.18: Results of Algorithm 3.1 for CF(4000, 5).
Comparing the solutions of the SDPR method to all solutions of the polynomial system obtained by the polyhedral homotopy method or the Gröbner basis method, it turns out that the solutions u^(0), u^(1) and u^(2) indeed coincide with the three smallest energy solutions u^(0)⋆, u^(1)⋆ and u^(2)⋆. The velocities (v_1, v_2) derived from these three solutions via (3.56) are displayed in Figure 3.25. Note that the third smallest energy solution u^(2) shows a vortex in counter-clockwise direction, which may indicate that this solution is a fake solution.
CF(20000,7): We apply Algorithm 3.1 with ω = 1 to CF(20000, 7) and obtain the results in Table 3.19. The two parameter settings (ǫ¹_1, b¹_1) = (1e−3, 1) and (ǫ¹_1, b¹_1) = (1e−6, 5) are not sufficient to obtain any solution other than u^(0), whereas (ǫ¹_1, b¹_1) = (1e−5, 5) yields u^(1), a solution of larger energy. After another iteration with (ǫ²_1, b²_1) = (1e−5, 5) we obtain a third solution u^(2) of even larger energy.
k    ω    ǫ^k_1    b^k_1    b^k_2    t_C (s)    ǫ_sc    F(u^(k))    solution
0    1    -        -        -        2          3e-7    3.4e-4      u^(0)
1    1    1e-3     1        0        5          5e-4    3.4e-4      u^(0)
1    1    1e-6     5        0        5          6e-6    3.4e-4      u^(0)
1    1    1e-5     5        0        9          5e-6    5.9e-4      u^(1)
2    1    1e-5     5        0        14         5e-6    5.2e-3      u^(2)

Table 3.19: Results of Algorithm 3.1 for CF(20000, 7).
Figure 3.26: (v_1, v_2) for solutions u^(0) (left), u^(1) (center) and u^(2) (right) of CF(20000, 7) on [0, 1]².
It is interesting to observe in Figure 3.26 that u(1) and u(2) are one-vortex solutions, whereas there seems
to be no vortex in the smallest energy solution u(0) .
CF(40000,7): Next, we examine CF(40000, 7), a good example demonstrating that solving DSCF(R, N) and CF(R, N) becomes more difficult for larger Reynolds numbers. As for the previous problem, the dimension of the POP is n = 50, which is too large to be solved by the Gröbner basis method. Our computational results are reported in Table 3.20.
k    ω    ǫ^k_1    b^k_1    b^k_2    t_C (s)    ǫ_sc     F(u^(k))    solution
0    1    -        -        -        3          2e-7     3.4e-4      u^(0)(1)
1    1    5e-6     5        0        7          6e-9     7.3e-4      u^(1)(1)
2    1    5e-6     5        0        11         3e-6     5.9e-4      u^(2)(1)
3    1    8e-6     5        0        16         5e-6     2.3e-4      u^(3)(1)
0    2    -        -        -        5872       8e-10    2.6e-4      u^(0)(2)

Table 3.20: Results of Algorithm 3.1 for CF(40000, 7).
Solution u^(2)(1) is of smaller energy than u^(1)(1), and u^(3)(1) is of even smaller energy than u^(0)(1). Thus, unlike the solutions for CF(20000, 7) reported in Table 3.19, the solutions of CF(40000, 7) are not enumerated in the correct order. This phenomenon can be explained by the fact that the SDP relaxation with ω = 1 is not tight enough to yield a solution that converges to u⋆ under the local optimization procedure. The energy of u^(0)(2) obtained by SDPR(2) is smaller than that of u^(0)(1), but it is not the global minimizer either. In fact, Algorithm 3.1 with ω = 1 generates a better solution u^(3)(1) (with smaller energy) in 3 iterations requiring 16 seconds of computation time, whereas the solution u^(0)(2) obtained by applying the SDPR method with ω = 2 to CF(40000, 7) requires 5872 seconds. Thus, despite failing to enumerate the smallest energy solutions in the right order with ω = 1, applying the enumeration algorithm with relaxation order ω = 1 is far more efficient than the original sparse SDP relaxation (2.18) with ω = 2 for approximating the global minimizer of POP (3.26). It remains a future problem to make this construction systematic.
Alternative finite difference scheme

To derive DSCF(R, N) we discretize the Jacobian $\frac{\partial\psi}{\partial y}\frac{\partial\phi}{\partial x} - \frac{\partial\psi}{\partial x}\frac{\partial\phi}{\partial y}$ by the standard central difference scheme. Arakawa [3] showed that the standard central difference scheme does not preserve important physical invariants, and therefore proposed an alternative finite difference discretization of the Jacobian that is shown to preserve those invariants. We use this alternative scheme to derive an alternative discrete steady cavity flow problem ADSCF(R, N) and solve it via the SDPR method. In ADSCF(R, N), the finite difference approximation of $\frac{\partial\psi}{\partial y}\frac{\partial\phi}{\partial x} - \frac{\partial\psi}{\partial x}\frac{\partial\phi}{\partial y}$ is replaced by
$$
\begin{aligned}
\left(\frac{\partial\psi}{\partial y}\frac{\partial\phi}{\partial x} - \frac{\partial\psi}{\partial x}\frac{\partial\phi}{\partial y}\right)\!(x_i, y_j) \approx -\frac{1}{12h^2}\big[ & (\phi_{i,j-1} + \phi_{i+1,j-1} - \phi_{i,j+1} - \phi_{i+1,j+1})(\psi_{i+1,j} + \psi_{i,j}) \\
& - (\phi_{i-1,j-1} + \phi_{i,j-1} - \phi_{i-1,j+1} - \phi_{i,j+1})(\psi_{i,j} + \psi_{i-1,j}) \\
& + (\phi_{i+1,j} + \phi_{i+1,j+1} - \phi_{i-1,j} - \phi_{i-1,j+1})(\psi_{i,j+1} + \psi_{i,j}) \\
& - (\phi_{i+1,j-1} + \phi_{i+1,j} - \phi_{i-1,j-1} - \phi_{i-1,j})(\psi_{i,j} + \psi_{i,j-1}) \\
& + (\phi_{i+1,j} - \phi_{i,j+1})(\psi_{i+1,j+1} + \psi_{i,j}) - (\phi_{i,j-1} - \phi_{i-1,j})(\psi_{i,j} + \psi_{i-1,j-1}) \\
& + (\phi_{i,j+1} - \phi_{i-1,j})(\psi_{i-1,j+1} + \psi_{i,j}) - (\phi_{i+1,j} - \phi_{i,j-1})(\psi_{i,j} + \psi_{i+1,j-1}) \big]. \qquad (3.69)
\end{aligned}
$$
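The following Python function is a direct, unvectorized transcription of (3.69) at an interior grid point; only the index convention (the grid indices i, j used directly as array indices) is an assumption.

```python
def arakawa_jacobian(psi, phi, i, j, h):
    """Direct transcription of Arakawa's approximation (3.69) of
    (dpsi/dy)(dphi/dx) - (dpsi/dx)(dphi/dy) at an interior grid point (i, j).

    psi, phi are 2-d NumPy arrays of grid values and h is the mesh width;
    the grid indices i, j are used directly as array indices.
    """
    S, P = psi, phi
    return -1.0 / (12 * h ** 2) * (
        (P[i, j-1] + P[i+1, j-1] - P[i, j+1] - P[i+1, j+1]) * (S[i+1, j] + S[i, j])
        - (P[i-1, j-1] + P[i, j-1] - P[i-1, j+1] - P[i, j+1]) * (S[i, j] + S[i-1, j])
        + (P[i+1, j] + P[i+1, j+1] - P[i-1, j] - P[i-1, j+1]) * (S[i, j+1] + S[i, j])
        - (P[i+1, j-1] + P[i+1, j] - P[i-1, j-1] - P[i-1, j]) * (S[i, j] + S[i, j-1])
        + (P[i+1, j] - P[i, j+1]) * (S[i+1, j+1] + S[i, j])
        - (P[i, j-1] - P[i-1, j]) * (S[i, j] + S[i-1, j-1])
        + (P[i, j+1] - P[i-1, j]) * (S[i-1, j+1] + S[i, j])
        - (P[i+1, j] - P[i, j-1]) * (S[i, j] + S[i+1, j-1]))
```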
Note that ADSCF(R, N) is less sparse than DSCF(R, N), and it is more difficult to derive accurate solutions by the SDPR method with relaxation order ω = 1. However, we succeed in solving ADSCF(R, N) in some instances. For example, in Table 3.21 and Figure 3.27 we compare the minimum kinetic energy solutions obtained for DSCF(5000, N) and ADSCF(5000, N). It is interesting that the vortex in the minimum kinetic energy solution for ADSCF(5000, N) is preserved for increasing N, whereas the vortex in the solution for DSCF(5000, N) seems to deteriorate.
Problem            ǫ_sc     t_C (s)    F(u⋆)
ADSCF(5000,14)     7e-12    1304       1.8e-4
ADSCF(5000,16)     5e-10    2802       3.1e-4
DSCF(5000,14)      1e-11    419        5.6e-4
DSCF(5000,16)      3e-10    768        1.1e-4

Table 3.21: Results for solving ADSCF(5000, N) compared to DSCF(5000, N).
Solutions of CF(R, N) for increasing R

In order to understand why convergence of discrete approximations to the analytic solution is much more difficult to obtain for large R, we examine the behavior of the minimal energy solution of DSCF(R, N) and CF(R, N), respectively, for increasing Reynolds number R. The SDPR method is one possible approach to solve DSCF(R, N). If ω is chosen sufficiently large, the output u of the SDPR method is guaranteed to accurately approximate the minimal energy solution u⋆ of CF(R′, N) and DSCF(R′, N), respectively. In order to show the advantage of the SDPR method, we compare our results to solutions of DSCF(R′, N) obtained by the following standard procedure:

Method 3.2 Naive homotopy-like continuation method
1. Choose the parameters R′, N and a step size ∆R; set R0 = 0.
2. Solve DSCF(0, N), i.e., a linear system, and obtain its unique solution u0.
3. Increase Rk−1 by ∆R: Rk = Rk−1 + ∆R.
4. Apply Newton's method to DSCF(Rk, N) starting from uk−1. Obtain the solution uk as an approximation to a solution of the discrete cavity flow problem.
5. Iterate 3. and 4. until the desired Reynolds number R′ is reached.

Note that the continuation method does not necessarily yield the minimal kinetic energy solution of DSCF(R, N).
Let u⋆(R, N) denote the global minimizer of CF(R, N); the minimal energy is then given by Emin(R, N) = F(u⋆(R, N)). Obviously, Emin(0, N) = F(u0(N)) holds. The solution of DSCF(R, N) obtained by the continuation method starting from u0 is denoted by ũ(R, N), and its energy by EC(R, N) := F(ũ(R, N)). As illustrated for N = 5 in Figure 3.28, it is possible to find a continuation ũ of u0 for all R.
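A minimal sketch of Method 3.2 follows, with scipy's `fsolve` standing in for Newton's method as the local solver; the residual callback, which stacks the equations of DSCF(R, N) into a flat vector, is assumed to be available.

```python
import numpy as np
from scipy.optimize import fsolve

def continuation(residual, u0, R_target, dR):
    """Sketch of Method 3.2, the naive homotopy-like continuation.

    `residual(u, R)` is assumed to return the stacked residuals of
    DSCF(R, N) as a flat array, and u0 is the unique solution of the
    linear system DSCF(0, N).  Each step is warm-started from the
    solution at the previous Reynolds number.
    """
    u, R = np.asarray(u0, dtype=float), 0.0
    while R < R_target:
        R = min(R + dR, R_target)
        u = fsolve(lambda v: residual(v, R), u)  # Newton-type step from u_{k-1}
    return u
```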
Figure 3.27: Solutions for ADSCF(5000,14) (top left), ADSCF(5000,16) (top right), DSCF(5000,14) (bottom left) and DSCF(5000,16) (bottom right).
For N = 5 the dimension of DSCF(R, N) is n = 18. This dimension is small enough to solve the polynomial system by the Gröbner basis method and to determine all complex solutions of the system. Therefore, we can verify whether the SDPR method detects the global minimizer of CF(R, N) or not. It is worth pointing out that we are able to find the minimal energy solution of CF(R, N) by applying the SDP relaxation method, whereas this solution cannot be obtained by the continuation method. We observe that applying the SDPR method with ω = 1 is sufficient to detect the global optimizer for R ≤ 10000, while for R ≥ 20000 the global optimizer is obtained by the SDPR method with ω = 2.
In the case of N = 6 and N = 7 the dimension of the polynomial system is too large to be solved by the Gröbner basis method for R > 0. For N = 6 the continuation method and the SDPR method with ω = 1 and ω = 2 yield the same solution for all tested R. In the case of N = 7 the continuation solution ũ(R) is detected by the SDPR method with ω = 1 as well, except for R = 6000, where a solution with slightly smaller energy is detected, as documented in Table 3.23.

R        E_C       E_SDPR(1)
0        2.0e-2    2.0e-2
100      7.7e-3    7.7e-3
4000     4.1e-4    4.1e-4
6000     3.7e-4    3.6e-4
10000    3.4e-4    3.4e-4

Table 3.23: Numerical results for CF(R, 7), where E_SDPR(ω) denotes the discretized kinetic energy of the solution of the SDPR method with relaxation order ω.
Summarizing these results, F(u0(N)) ≥ F(ũ(R, N)) for any of the tested R > 0. It is an advantage of the SDPR method that it allows us to show that ũ(R, N) is in general not the optimizer of CF(R, N) for increasing R; in fact, for some settings we obtain far better approximations to the minimal energy solution than ũ(R, N). Furthermore, Emin(R) and EC(R) are both decreasing in R. The behavior of EC, ESDPR and Emin coincides for all chosen discretizations N and motivates the following conjecture.

Conjecture 3.1 Let the discretization N be fixed.
a) F(u0(N)) = Emin(0, N) ≥ Emin(R, N) ≥ 0 for all R ≥ 0.
b) Emin(R, N) → 0 for R → ∞.
R         N_C    N_R    E_C       E_SDPR(1)    E_SDPR(2)
0         1      1      0.0096    0.0096       0.0096
100       37     13     0.0030    0.0030       0.0030
500       37     13     6.2e-4    6.2e-4       6.2e-4
1000      37     13     5.4e-4    5e-4         5e-4
2000      37     13     6.2e-4    6.2e-4       6.2e-4
4000      37     17     6.3e-4    4.6e-4       4.6e-4
6000      36     16     5.7e-4    4.5e-4       4.5e-4
8000      36     16     5.2e-4    4.5e-4       4.5e-4
10000     35     17     4.7e-4    4.5e-4       4.5e-4
30000     35     17     4.5e-4    4.5e-4       2.5e-4
100000    34     16     4.5e-4    4.5e-4       8.8e-5

Table 3.22: Numerical results for CF(R, 5), where E_SDPR(ω) denotes the discretized kinetic energy of the solution of the SDPR method with relaxation order ω.
Figure 3.28: E_C(R), E_SDPR(1)(R), E_SDPR(2)(R) and E_min(R) for N = 5.
As an application, Conjecture 3.1 can be used as a certificate for the non-optimality of a feasible solution u′ of CF(R, N) in the case F(u′(R, N)) > Emin(0, N). If it is possible to extend u0 to R via the continuation method, ũ(R, N) can serve as a non-optimality certificate in the case F(u′(R, N)) > F(ũ(R, N)).
3.3.6 Optimal control problems

A class of challenging problems involving differential equations that goes beyond the class of PDE problems (3.14) is optimal control, in particular nonlinear optimal control. Solving nonlinear optimal control problems (OCP) analytically is challenging, even though powerful techniques such as the maximum principle and the Hamilton-Jacobi-Bellman optimality equations exist. Among the numerical methods for solving an OCP, one distinguishes direct and indirect methods [18, 25, 94, 101]; however, in particular for OCPs with state constraints, many numerical methods are difficult to use. A recent approach by Lasserre et al. [54] takes advantage of semidefinite programming (SDP) relaxations to generate a convergent sequence of lower bounds for the optimal value of an OCP. We demonstrate on the following examples that the SDPR method can be applied to solve OCPs numerically as well. An OCP can be discretized via finite difference approximations to obtain a POP satisfying a structured sparsity pattern. The POP we derive from an OCP is essentially of the form (3.26); the main difference is that we do not choose the objective function F, but that F is given as the discretization of the objective of the OCP. By applying the SDPR method to an OCP we obtain (a) a lower bound for its optimal value, and (b), unlike the approach in [54], approximations of the optimal value, the optimal control and the trajectory. As in the PDE case, it is a feature of the SDPR method that state and/or control constraints can be incorporated by defining additional polynomial equality and inequality constraints.
Control of production and consumption

The following problem arises in the context of controlling production and consumption of a factory. Let x(t) be the amount of output produced at time t ≥ 0 and α(t) the control variable denoting the fraction of output reinvested at time t ≥ 0, with 0 ≤ α(t) ≤ 1. The dynamics of the system are given by the ODE problem
$$
\dot{x}(t) = k\,\alpha(t)\,x(t) \quad \forall\, t \in [0, T], \qquad x(0) = x_0,
$$
where k > 0 is a constant modeling the growth rate of a reinvestment. The aim is to maximize the functional
$$
P(\alpha(\cdot), x(\cdot)) = \int_0^T (1 - \alpha(t))\,x(t)\,dt,
$$
i.e., the total consumption of the output, the consumption at a given time t being (1 − α(t))x(t). Thus, the control problem can be written as
$$
\begin{aligned}
\max \quad & \int_0^T (1 - \alpha(t))\,x(t)\,dt \\
\text{s.t.} \quad & \dot{x}(t) = k\,\alpha(t)\,x(t) \quad \forall\, t \in [0, T], \qquad (3.70)\\
& x(0) = x_0, \\
& 0 \le \alpha(t) \le 1 \quad \forall\, t \in [0, T].
\end{aligned}
$$
The constraining ODE problem can be discretized by a finite difference scheme in the same way as (3.14). In contrast to the previous examples, we are not free to choose the objective function of (3.26); it is given by the objective function P(α(·), x(·)) of the optimal control problem (3.70). We obtain the POP's objective function F by discretizing P(α(·), x(·)) as
$$
F(\alpha, x) = \sum_{i=1}^{N} (1 - \alpha_i)\,x_i.
$$
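For illustration, the following sketch discretizes (3.70) by an explicit Euler scheme and applies SQP (scipy's SLSQP) as a purely local stage; the states are eliminated through the recursion x_{i+1} = x_i + h k α_i x_i, and the mesh width h is included as a quadrature weight, a choice made here and not fixed by the text. The SDP relaxation step itself requires dedicated software such as SparsePOP and is not reproduced, so this sketch may only reach a local optimum.

```python
import numpy as np
from scipy.optimize import minimize

def solve_production_local(T=3.0, N=60, k=1.0, x0=0.25):
    """Local-stage sketch for the discretized problem (3.70).

    The dynamics are discretized by an explicit Euler step, so the states
    can be eliminated and only the controls a in [0, 1]^N remain.  SLSQP
    plays the role of the local optimization stage only.
    """
    h = T / N

    def neg_consumption(a):
        x, total = x0, 0.0
        for ai in a:
            total += h * (1.0 - ai) * x      # consumption on [t_i, t_{i+1}]
            x += h * k * ai * x              # Euler step of x' = k a x
        return -total                        # minimize the negative consumption

    res = minimize(neg_consumption, 0.5 * np.ones(N), method="SLSQP",
                   bounds=[(0.0, 1.0)] * N)
    return res.x                             # approximate control a_0, ..., a_{N-1}
```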
It is easy to show with the Pontryagin Maximum Principle, see for example [60], that the optimal control law α⋆(·) is given by
$$
\alpha^\star(t) = \begin{cases} 1 & \text{if } 0 \le t \le t^\star, \\ 0 & \text{if } t^\star < t \le T, \end{cases}
$$
for an appropriate switching time t⋆ with 0 ≤ t⋆ ≤ T. In the case k = 1 the switching time is given by t⋆ = T − 1. We apply the SDPR method to (3.70) with objective function F and ω = 2, and can confirm numerically that t⋆ = T − 1 holds for k = 1. Our results are reported in Table 3.24 and illustrated in Figure 3.29. Thus, the solution of the control problem, and in particular the optimal control law α⋆, is approximated accurately. Moreover, we observe that the switching time matches the predicted value.
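For k = 1 the value t⋆ = T − 1 can be verified by a short calculation (a sketch, assuming x0 > 0 and T > 1): under the bang-bang law, x(t) = x0 e^t on [0, t⋆] and x remains constant on (t⋆, T], hence
$$
P(t^\star) = \int_{t^\star}^{T} x_0\, e^{t^\star}\, dt = x_0\, e^{t^\star} (T - t^\star), \qquad \frac{dP}{dt^\star} = x_0\, e^{t^\star} (T - t^\star - 1) = 0 \;\Longleftrightarrow\; t^\star = T - 1.
$$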
T    N_t    ǫ_sc       ǫ_obj     t_C (s)    t⋆
3    200    -3.9e-6    4.4e-9    102        2.00
3    300    -1.3e-4    3.9e-9    354        2.00
4    200    -4.5e-4    1.9e-8    126        3.00
4    300    -2.7e-4    3.9e-8    367        3.00

Table 3.24: Results of the SDPR method for (3.70) with k = 1, x0 = 0.25.
Figure 3.29: SDPR method solutions for (3.70), x(t) (blue) and α(t) (green), for N_t = 300 and T = 3 (left) and T = 4 (right), respectively.
Control of reproductive strategies of social insects

As another example, consider a problem arising from reproductive strategies of social insects:
$$
\begin{aligned}
\max \quad & P(w(\cdot), q(\cdot), \alpha(\cdot)) = q(T) \\
\text{s.t.} \quad & \dot{w}(t) = -\mu\,w(t) + b\,s(t)\,\alpha(t)\,w(t) \quad \forall\, t \in [0, T], \\
& w(0) = w_0, \qquad (3.71)\\
& \dot{q}(t) = -\nu\,q(t) + c\,(1 - \alpha(t))\,s(t)\,w(t) \quad \forall\, t \in [0, T], \\
& q(0) = q_0, \\
& 0 \le \alpha(t) \le 1 \quad \forall\, t \in [0, T],
\end{aligned}
$$
where w(t) is the number of workers at time t, q(t) the number of queens, α(t) the control variable denoting the fraction of the colony effort devoted to increasing the work force, µ the workers' death rate, ν the queens' death rate, s(t) a known rate at which each worker contributes to the bee economy, and b and c constants. It follows from the Pontryagin Maximum Principle [60] that the optimal control law α of problem (3.71) is a bang-bang control law for any rate s(t), i.e., α(t) ∈ {0, 1} for all t ∈ [0, T].
For the SDPR method we choose as objective the function F given by
$$
F(w, q, \alpha) = q_{N_t},
$$
which is a discretization of the objective function P(w(·), q(·), α(·)). Table 3.25 shows the numerical results, which are illustrated in Figure 3.30.
s(t)                N_t    ǫ_obj    ǫ_sc
1                   300    2e-7     -2e-6
(1/2)(sin t + 1)    300    1e-4     -4e-5

Table 3.25: Results of the SDPR method for (3.71) with T = 3, µ = 0.8, b = 1, w0 = 10, ν = 0.3, c = 1, q0 = 1 and ω = 2.
Figure 3.30: SDPR method solutions for (3.71) for s(t) = 1 (left) and s(t) = 0.5(sin(t) + 1) (right).
In the case s(t) = 1, it is sufficient to choose w(t), q(t) ≤ 20 as upper bounds to obtain accurate results. For the more difficult problem with s(t) = 0.5(sin(t) + 1) it is necessary to tighten the upper bounds to w(t) ≤ 10 and q(t) ≤ 3 in order to obtain fairly accurate results. In both cases, the bang-bang control law is approximated with high precision.
The double integrator

Consider the optimal control problem given by
$$
\begin{aligned}
\min \quad & T \\
\text{s.t.} \quad & \dot{x}_1(t) = x_2(t) \quad \forall\, t \in [0, T], \\
& \dot{x}_2(t) = u(t) \quad \forall\, t \in [0, T], \\
& x_1(0) = x_{1,0}, \quad x_1(T) = 0, \qquad (3.72)\\
& x_2(0) = x_{2,0}, \quad x_2(T) = 0, \\
& x_2(t) \ge -1, \quad u(t) \in [-1, 1].
\end{aligned}
$$
Note that we cannot apply the SDPR method directly to (3.72), since the length T of the time domain is not specified. Furthermore, as a first-order system with both initial and terminal conditions is overspecified, we replace the constraints x1(T) = x2(T) = 0 by |x1(T)| + |x2(T)| ≤ ǫ for a small ǫ > 0. We apply a standard coordinate transformation to (3.72), rescaling time by t ↦ t/T so that T enters the dynamics as an additional variable, and obtain the equivalent problem
$$
\begin{aligned}
\min \quad & T \\
\text{s.t.} \quad & \dot{x}_1(t) = T\,x_2(t) \quad \forall\, t \in [0, 1], \\
& \dot{x}_2(t) = T\,u(t) \quad \forall\, t \in [0, 1], \\
& x_1(0) = x_{1,0}, \quad x_2(0) = x_{2,0}, \qquad (3.73)\\
& |x_1(1)| + |x_2(1)| \le \epsilon, \\
& x_2(t) \ge -1, \quad u(t) \in [-1, 1].
\end{aligned}
$$
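The following sketch shows why the rescaled problem fits the POP format (3.26): with an explicit Euler discretization on [0, 1] (assumed here; the finite difference scheme actually used may differ), every residual is polynomial in the unknowns (T, x, u), since T multiplies the dynamics. Terminal condition and bounds are omitted for brevity.

```python
import numpy as np

def double_integrator_residuals(z, N, x10, x20):
    """Equality-constraint residuals of an explicit Euler discretization of
    (3.73) on the rescaled interval [0, 1] with step h = 1/N.

    z packs the unknowns as (T, x1_0..x1_N, x2_0..x2_N, u_0..u_{N-1}).
    Since the free final time T multiplies the right-hand sides, every
    residual is a polynomial in the unknowns.
    """
    T = z[0]
    x1 = z[1:N + 2]
    x2 = z[N + 2:2 * N + 3]
    u = z[2 * N + 3:]
    h = 1.0 / N
    r = [x1[0] - x10, x2[0] - x20]                   # initial conditions
    for i in range(N):
        r.append(x1[i + 1] - x1[i] - h * T * x2[i])  # x1' = T x2
        r.append(x2[i + 1] - x2[i] - h * T * u[i])   # x2' = T u
    return np.array(r)
```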
Optimal control problem (3.73) is of a form to which we can apply the SDPR method. Lower bounds lbd_u = lbd_x = −1 and upper bound ubd_u = 1 are given, and we choose ubd_x = 10. Given some starting point x0 ∈ R², the aim is to find the minimal time T⋆ needed to steer x(t) into the origin. For this simple problem it is possible to determine the minimal time T⋆(x0) analytically, cf. [54]. Thus, for each choice of x0 we can calculate the ratio min(s_SDPω(x0))/T⋆(x0) and evaluate the performance of our approach. We apply the SDPR method with ω = 3 for the discretization N = 50. We choose the same set of x0 as in [54], x0,1 ∈ {0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0} and x0,2 ∈ {−1.0, −0.8, −0.6, −0.4, −0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0}. In Table 3.26 we report the ratio min(s_SDP3(x0))/T⋆(x0) for the 11 × 11 different values of x0. Some entries are larger than 1, which can be explained by the discretization error of the medium scale discretization N = 50. Compared to the corresponding table in [54], we achieve better lower bounds for T⋆ in most cases. Moreover, the SDPR method approximates the optimal control and trajectory in addition to generating lower bounds for the optimal value; see Figure 3.31 for approximations of u⋆ and x⋆ before and after applying sequential quadratic programming with the SDP solution as initial guess. We observe that the approximation (ũ, x̃) provided by the sparse SDP relaxation is already close to the highly accurate approximation (u, x) of the optimal solution of the discretized OCP obtained by additionally applying SQP.
x0,1\x0,2    -1        -0.8      -0.6      -0.4      -0.2      0         0.2       0.4       0.6       0.8       1.0
0            0.8783    0.6162    0.5543    0.5665    0.8472    1.0000    0.8420    0.5447    0.8191    0.8858    0.9128
0.2          0.8158    0.4756    0.9060    0.8756    0.8281    0.7869    0.7362    0.7420    0.9068    0.9339    0.9537
0.4          0.7228    0.9440    0.9237    0.9258    0.9023    0.8708    0.8539    0.8495    0.9423    0.9692    0.9811
0.6          1.0139    0.9975    0.9971    0.9886    0.9991    0.9876    0.9382    0.9588    0.9507    0.9875    0.9848
0.8          1.0079    1.0071    1.0090    0.9972    0.9983    1.0035    1.0005    0.9962    0.9772    1.0025    0.9901
1.0          1.0141    1.0124    1.0119    1.0050    1.0009    0.9926    1.0086    1.0016    1.0020    1.0026    0.9961
1.2          1.0162    1.0131    1.0109    1.0064    1.0044    1.0018    1.0086    1.0076    1.0018    1.0043    0.9974
1.4          1.0185    1.0156    1.0135    1.0114    1.0086    1.0067    1.0042    1.0017    1.0065    0.9991    0.9967
1.6          1.0189    1.0195    1.0148    1.0136    1.0114    1.0069    1.0070    1.0062    1.0021    1.0009    0.9997
1.8          1.0196    1.0182    1.0187    1.0150    1.0133    1.0082    1.0085    1.0076    1.0065    1.0027    1.0010
2.0          1.0234    1.0205    1.0177    1.0168    1.0140    1.0109    1.0095    1.0082    1.0045    1.0028    1.0024

Table 3.26: min(s_SDP3(x0))/T⋆(x0) for different choices of x0.
Figure 3.31: Optimal control and trajectories for x0 = (0.8, −1) for the Double Integrator OCP before (left) and after (right) applying SQP.
The Brockett Integrator

Consider the nonlinear optimal control problem given by
$$
\begin{aligned}
\min \quad & T \\
\text{s.t.} \quad & \dot{x}_1(t) = u_1(t) \quad \forall\, t \in [0, T], \\
& \dot{x}_2(t) = u_2(t) \quad \forall\, t \in [0, T], \\
& \dot{x}_3(t) = u_1(t)\,x_2(t) - u_2(t)\,x_1(t) \quad \forall\, t \in [0, T], \qquad (3.74)\\
& x_1(0) = x_{1,0}, \quad x_2(0) = x_{2,0}, \quad x_3(0) = x_{3,0}, \\
& x_1(T) = 0, \quad x_2(T) = 0, \quad x_3(T) = 0, \\
& u_1(t)^2 + u_2(t)^2 \le 1.
\end{aligned}
$$
Applying the same transformation as for (3.72), we bring (3.74) into a form to which we can apply the SDPR method:
$$
\begin{aligned}
\min \quad & T \\
\text{s.t.} \quad & \dot{x}_1(t) = T\,u_1(t) \quad \forall\, t \in [0, 1], \\
& \dot{x}_2(t) = T\,u_2(t) \quad \forall\, t \in [0, 1], \\
& \dot{x}_3(t) = T\,u_1(t)\,x_2(t) - T\,u_2(t)\,x_1(t) \quad \forall\, t \in [0, 1], \qquad (3.75)\\
& x_1(0) = x_{1,0}, \quad x_2(0) = x_{2,0}, \quad x_3(0) = x_{3,0}, \\
& |x_1(1)| + |x_2(1)| + |x_3(1)| \le \epsilon, \\
& u_1(t)^2 + u_2(t)^2 \le 1,
\end{aligned}
$$
for some small ǫ > 0. As in example (3.72), the aim is to find the minimal time T⋆(x0) needed to steer a point x(t) with x(0) = x0 ∈ R³ into the origin. For the lower and upper bounds of u and x, we choose lbd_u = −1, ubd_u = 1, lbd_x = −5, ubd_x = 5. For this optimal control problem it is possible to calculate the minimal time T⋆ exactly [54]; thus, we can compare the performance of the SDPR method to the approach in [54].
x0,2\x0,3    0         1         2         3
0            0.0000    1.0081    2.0276    3.0145
1            1.6049    1.2827    2.0337    3.0125
2            2.7816    2.0269    2.1959    3.0170
3            3.9705    2.8498    2.6177    3.1906

Table 3.27: min(s_SDP3(x0)) for x0,1 = 0 and (x0,2, x0,3) ∈ {0, 1, 2, 3}².
x0,2\x0,3    0         1         2         3
0            0.0000    1.0000    2.0000    3.0000
1            2.5066    1.7841    2.1735    3.0547
2            3.5449    2.6831    2.5819    3.2088
3            4.3416    3.4328    3.0708    3.4392

Table 3.28: T⋆(x0) for x0,1 = 0 and (x0,2, x0,3) ∈ {0, 1, 2, 3}².
We apply the SDPR method with ω = 3 to (3.75). The numerical results for N = 50 and x0,1 = 0, (x0,2, x0,3) ∈ {0, 1, 2, 3}² are reported in Table 3.27, and the results for N = 30 and x0,1 = 1, (x0,2, x0,3) ∈ {1, 2, 3}² in Table 3.29; the corresponding exact minimal times T⋆(x0) are given in Tables 3.28 and 3.30. Again, min(s_SDPω(x0)) is larger than T⋆(x0) for some x0, which is explained by the discretization error due to the medium scale choice N ∈ {30, 50}; this gap closes for N → ∞. In particular for choices of x0 in the lower left corners of Tables 3.27 and 3.29, we obtain better lower bounds than [54]. Again, unlike the method in [54], we also obtain an accurate approximation of the optimal control and trajectory, as pictured in Figure 3.32.
x0,2\x0,3    1         2         3
1            1.8862    2.4412    3.3145
2            2.6077    2.7737    3.4516
3            3.2969    3.2033    3.6618

Table 3.29: min(s_SDP3(x0)) for x0,1 = 1 and (x0,2, x0,3) ∈ {1, 2, 3}².
x0,2\x0,3    1         2         3
1            1.8257    2.3636    3.2091
2            2.5231    2.6856    3.3426
3            3.1895    3.1008    3.5456

Table 3.30: T⋆(x0) for x0,1 = 1 and (x0,2, x0,3) ∈ {1, 2, 3}².
Figure 3.32: Optimal control and trajectories for x0 = (0, 2, 1) for the Brockett Integrator OCP before (left) and after (right) applying SQP.
Chapter 4

Concluding Remarks and Future Research

4.1 Conclusion
Hierarchies of SDP relaxations are a powerful tool to solve general, severely nonconvex POPs. However, solving large scale POPs remains a very challenging task due to the limited capacity of contemporary SDP solvers. In this thesis we discussed two major approaches to tackle large scale POPs by reducing the size of the SDP relaxations.

In the first one, presented in 2.2, our focus has been on developing a theoretical framework consisting of the d- and r-space conversion methods to exploit structured sparsity, characterized by a chordal graph structure, via positive semidefinite matrix completion for an optimization problem involving linear and nonlinear matrix inequalities. The two d-space conversion methods are provided for a matrix variable X in the objective and/or constraint functions of the problem, which is required to be positive semidefinite; they decompose X into multiple smaller matrix variables. The two r-space conversion methods are aimed at a matrix inequality in the constraints of the problem; in these methods, the matrix inequality is converted into multiple smaller matrix inequalities. As mentioned in Remarks 2.3, 2.5 and 2.7, the d-space and r-space conversion methods using clique trees allow plenty of flexibility in implementation, which should be explored further to increase computational efficiency. In 2.2.7 we constructed linear SDP relaxations for general quadratic SDPs that exploit d- and r-space sparsity. When applying these relaxations to quadratic SDPs arising from different applications, we observed that the computational performance is greatly improved compared to the classical SDP relaxations, which do not apply the d- and r-space conversion methods. Of particular interest for the numerical analysis of differential equations is the linear SDP relaxation exploiting d-space sparsity, as it substantially reduces the size of SDP relaxations for POPs derived from certain differential equations. It will be an interesting topic to study the efficiency of the d- and r-space conversion methods for further classes of nonlinear SDPs.
In 2.3, we proposed four different heuristics to transform a general POP into a QOP. The advantage of this transformation is that the sparse SDP relaxation of order one can be applied to the QOP; this relaxation is of vastly smaller size than the sparse SDP relaxation of minimal order ωmax for the original POP. By solving the sparse SDP relaxation of the QOP, approximations to the global minimizer of a large scale POP of higher degree can be derived. The reduction of the SDP relaxation and the gain in numerical tractability come at the cost of larger feasibility and optimality errors of the approximate solution obtained by solving the SDP relaxation: in general, the SDP relaxation of order one for the QOP is weaker than the SDP relaxation of order ωmax for the original POP. We discussed how to overcome this difficulty by imposing tighter lower and upper bounds on the components of the n-dimensional variable of a POP, by adding linear or quadratic Branch-and-Cut bounds, and by applying locally convergent optimization methods such as sequential quadratic programming to the POP starting from the solution provided by the SDP relaxation for the QOP. The proposed heuristics have been demonstrated with success on various medium and large-scale POPs. We have seen that imposing additional Branch-and-Cut bounds was necessary to yield accurate approximations to the global optimizer for some problems. However, for most problems it was crucial to choose the lower and upper bounds for the variable x sufficiently tight and to apply SQP to obtain a highly accurate approximation of the global optimizer. The total processing time could be reduced by up to three orders of magnitude for the problems we tested. For these reasons we think the proposed technique is promising for finding first approximate solutions of POPs whose size is too large for the more precise, original SDP relaxations due to Waki et al.
Our most important application for both approaches to reduce the size of SDP relaxations is the numerical analysis of nonlinear differential equations. We were able to transform nonlinear PDE problems with polynomial structure into POPs. The description is based on the discretization of a PDE problem, the approximation of its partial derivatives by finite differences, and the choice of an appropriate objective function. Due to the finite difference discretization, the POPs derived from PDEs satisfy both correlative and domain-space sparsity patterns under fairly general assumptions. Therefore, we can efficiently apply dual standard form SDP relaxations exploiting correlative sparsity and primal standard form SDP relaxations exploiting domain-space sparsity. For many PDE problems the solution of the SDP relaxation is an appropriate initial guess for locally fast convergent methods like Newton's method and SQP; however, we have seen that it is often necessary to impose tighter or additional bounds on the POPs to derive highly accurate solutions. Moreover, we demonstrated how to choose an objective function and bounds for the unknown function in order to detect particular solutions of a discretized PDE problem. In other words, one of the features of using the SDPR method instead of several existing methods is that a function space in which to search for solutions may be translated into natural constraints for a sparse POP. In case we have partial information about a particular solution we want to find, this information can be exploited by the SDPR method to provide an appropriate initial guess for a local method. The reduction techniques from Chapter 2 are highly efficient for solving POPs derived from PDEs on higher resolution grids and for obtaining accurate discrete approximations of solutions of the continuous PDE problem. Another technique to extend solutions to finer and finer grids is the grid-refining method, which is efficient even when starting from solutions on very coarse grids.
We have shown that the SDPR method is very promising for nonlinear differential equations with several solutions. One feature of the SDPR method is the ability to detect a particular solution. Another challenging problem is to enumerate all solutions of a discretized PDE problem and, ultimately, to enumerate accurate approximations of all solutions of the underlying continuous PDE problem. We proposed an algorithm based on the SDPR method to approximately enumerate all real solutions of a zero-dimensional radical polynomial system with respect to a cost function. If the order of the SDP relaxations tends to infinity, we can guarantee the convergence of the algorithm's output to the smallest cost solutions of the polynomial system. The algorithm can be applied successfully to enumerate the solutions of the discrete cavity flow problem with the kinetic energy of the flow as cost function. A variant of the enumeration algorithm has been applied to detect all solutions of an interesting reaction-diffusion equation. Since both the enumeration algorithm and its variant are based on the SDPR method that exploits sparsity, it is possible to tackle POPs of much larger scale than with the approaches in [35] and [55].

To conclude, the SDPR method constitutes a general purpose method for solving problems involving differential equations that are polynomial in the unknown functions. We demonstrated the potential of the SDPR method on differential equations arising from a range of areas: elliptic, parabolic and hyperbolic PDEs, reaction-diffusion equations, fluid dynamics, nonlinear optimal control, differential algebraic equations and first order PDEs. This list of differential equations that may be analyzed by the SDPR method is by no means complete, but it illustrates that the SDPR method provides a powerful tool for gaining new insights in the numerical analysis of differential equations.
4.2 Outlook on future research directions

Efficient software for large scale POPs and their SDP relaxations remains a challenging field with many open problems. The research presented in this thesis motivates the search for answers to a number of questions.
We discussed four heuristics to transform an arbitrary POP into an equivalent QOP. Moreover, we encountered that the correlative sparsity and the domain-space sparsity of a QOP can differ significantly; therefore, the resulting dual and primal form SDP relaxations may be of vastly different size. The question remains whether (a) there is a way to transform a POP into a QOP that enhances these types of sparsity, and (b) we may find a more general concept of sparsity for a QOP that combines these two types of sparsity. We have also seen that the approximation accuracy of the SDP relaxation for the QOP is weaker than for the original POP. It remains a future problem to strengthen the sparse SDP relaxation of order one for a QOP further. In that respect, it is desirable to find a systematic approach to tighten lower and upper bounds successively without shrinking the optimal set of the POP. Furthermore, the additional quadratic constraints derived under the transformation algorithm make it possible to express some moments as linear combinations of other moments. As proposed by Henrion and Lasserre in [35] and Laurent in [56], these linear combinations can be substituted in the moment and localizing matrices of the SDP relaxation to reduce the size of the moment vector y. Exploiting this technique will shrink the size of the sparse SDP relaxations for QOPs further and may enable us to solve POPs of even larger scale.
Compared to the methods [35, 55] for finding all real solutions of a zero-dimensional radical polynomial system, our enumeration algorithm can be applied to problems of much larger scale. However, its numerical stability depends heavily on the choice of the parameters in the algorithm. The variant of the enumeration algorithm for the Swift-Hohenberg equation constitutes a promising first step towards improving the numerical stability, since the additional linear constraints remain unchanged under the SDP relaxation. The idea of this variant may be exploited more systematically in the future.
Although we are able to solve some PDE problems with minimal relaxation order ω = ωmax, in many cases it is a priori not possible to predict the relaxation order ω necessary to attain an accurate solution. As the size of the sparse SDP relaxation increases polynomially in ω, the tractability of the SDP is limited by the capacity of current SDP solvers. It is a further challenging question whether we can characterize a class of differential equation problems that is guaranteed to be approximated accurately for a certain fixed relaxation order. At the moment there are only very few results on error bounds of SDP relaxations for general, nonconvex POPs [72].
Not every solution of a discretized differential equation is a discrete approximation of an actual solution of this differential equation, as we encountered in the analysis of the steady cavity flow problem. It is therefore interesting to close the gap between the discrete and the continuous world. An approach based on the SDPR method for narrowing this gap takes advantage of maximum entropy estimation [10, 53]. In this approach the solution of the SDPR method is used to compute discrete approximations to moments of a measure corresponding to the differential equation; applying maximum entropy estimation to these discretized moments then yields a smooth approximation of a solution of the differential equation. This is the topic of ongoing joint work with Jean Lasserre and Didier Henrion.
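A one-dimensional toy sketch of the maximum entropy step (an illustration of the general idea in [10, 53], not of the ongoing work itself): given approximate moments, the estimate exp(Σ_j λ_j x^j) is obtained by minimizing the convex dual of the entropy maximization problem.

```python
import numpy as np
from scipy.optimize import minimize

def maxent_density(moments, grid=np.linspace(0.0, 1.0, 400)):
    """Toy univariate maximum entropy estimation from moments.

    Given approximate moments m_j = int_0^1 x^j f(x) dx, j = 0, ..., d,
    fit a density f(x) = exp(sum_j lam_j x^j) by minimizing the convex
    dual  D(lam) = int_0^1 exp(sum_j lam_j x^j) dx - lam . m,  whose
    stationarity conditions recover the moment constraints.
    """
    m = np.asarray(moments)
    d = m.size - 1
    V = np.vander(grid, d + 1, increasing=True)      # columns 1, x, ..., x^d
    h = grid[1] - grid[0]
    w = np.full(grid.size, h)
    w[0] = w[-1] = h / 2                             # trapezoidal weights

    dual = lambda lam: w @ np.exp(V @ lam) - lam @ m
    lam = minimize(dual, np.zeros(d + 1), method="BFGS").x
    return np.exp(V @ lam)                           # density values on `grid`
```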
Finally, we applied the SDPR method to a wide variety of problems involving differential equations. However, the classes of problems we may attack by this approach are by no means exhausted. It will be an interesting topic of future research to apply the SDPR method to challenging nonlinear differential equations satisfying a polynomial structure. Also, nonlinear optimal control seems a challenging area to apply this methodology to, as the numerical experiments for the simple optimal control problems presented in this thesis suggest.
Bibliography
[1] J. Agler, J. W. Helton, S. McCullough, L. Rodman, Positive semidefinite matrices with a given sparsity
pattern, Linear Algebra Appl. (1988), Vol. 107, pp. 101-149.
[2] E.L. Allgower, D.J. Bates, A.J. Sommese, C.W. Wampler, Solution of polynomial systems derived from
differential equations, Computing, 76 (2006), No. 1, pp. 1-10.
[3] A. Arakawa, Computational design for long-term numerical integration of the equation of fluid motion:
two dimensional incompressible flow, part I., Journal of Computational Physics 135 (1997), pp. 103-114.
[4] J.R.S. Blair, B. Peyton, An introduction to chordal graphs and clique trees, Graph Theory and Sparse
Matrix Computation, Springer Verlag (1993), pp. 1-29.
[5] D. Bertsimas, C. Caramanis, Bounds on linear PDEs via semidefinite optimization, Math. Programming, Series A 108 (2006), pp. 135-158.
[6] P. Biswas, Y. Ye, A distributed method for solving semidefinite programs arising from Ad Hoc Wireless Sensor Network Localization, Multiscale Optimization Methods and Applications, Springer-Verlag, pp. 69-84.
[7] J. Bochnak, M. Coste, M.-F. Roy, Real Algebraic Geometry, Springer-Verlag (1998).
[8] P.T. Boggs, J.W. Tolle, Sequential Quadratic Programming, Acta Numerica 4 (1995), pp. 1-50.
[9] B. Borchers, SDPLIB 1.2, A library of semidefinite programming test problems, Optim. Methods Softw.
(1999), 11-12, pp. 683-689.
[10] J. Borwein, A.S. Lewis, On the convergence of moment problems, Trans. Am. Math. Soc., 325 (1991),
pp. 249-271.
[11] D. Braess, Finite Elements, Theory, fast solvers, and applications in solid mechanics, Cambridge
University Press (2001).
[12] O.R. Burggraf, Analytical and numerical studies of the structure of steady separated flows, J. Fluid
Mech 24 (1966), pp. 113-151.
[13] R. Courant, K. Friedrichs, H. Lewy, Über die partiellen Differenzengleichungen der mathematischen
Physik, Math. Ann. 100 (1928), No. 1, pp. 32-74.
[14] M. Cheng, K.C. Hung, Vortex structure of steady flow in a rectangular cavity, Computers & Fluids,
Volume 35, Issue 10 (2006), pp. 1046-1062.
[15] R. Courant, D. Hilbert, Methoden der Mathematischen Physik, Vol 1 (1931), Chapter 4, The method
of variation.
[16] R. Courant, Variational methods for the solution of problems of equilibrium and vibrations, Bull. Amer. Math. Soc. 49 (1943), pp. 1-23.
[17] G. Dahlquist, Convergence and stability in the numerical integration of ordinary differential equations,
Math. Scand. 4 (1956), pp. 33-53.
[18] R. Fletcher, Practical Methods of Optimization. Vol. 1 Unconstrained Optimization, John Wiley, Chichester (1980).
[19] I. Fried, Numerical Solutions of Differential Equations, Academic Press (1979).
[20] K. Fujisawa, S. Kim, M. Kojima, Y. Okamoto, M. Yamashita, User’s Manual for SparseCoLO: Conversion Methods for SPARSE COnic-form Linear Optimization Problems, Research reports on Mathematical and Computing Sciences B-453, Tokyo Institute of Technology.
[21] M. Fukuda, M. Kojima, K. Murota, K. Nakata, Exploiting sparsity in semidefinite programming via
matrix completion I: General framework, SIAM J. Optim., 11 (2000), pp. 647-674.
[22] M. Fukuda, M. Kojima, Branch-and-Cut Algorithms for the Bilinear Matrix Inequality Eigenvalue
Problem, Computational Optimization and Applications, 19 (2001), pp. 79-105.
[23] B.G. Galerkin, Series solution of some problems in elastic equilibrium of rods and plates, Vestn. Inzh.
Tech. 19 (1915), pp. 897-908.
[24] A. George, J.W. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall
(1981).
[25] P.E. Gill, W. Murray, M.H. Wright, Practical Optimization, Academic Press, London, New York (1981).
[26] M. Goemans, D.P. Williamson, Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming, Journal of the ACM, 42 (1995), No. 6, pp. 1115-1145.
[27] D. Gottlieb, S. Orszag, Numerical Analysis of Spectral Methods: Theory and Applications, SIAM,
Philadelphia (1977).
[28] R. Grone, C. R. Johnson, E. M. Sá, H. Wolkowitz, Positive definite completions of a partial hermitian
matrices, Linear Algebra Appl. 58 (1984), pp. 109-124.
[29] J.L. Guermond, A finite element technique for solving first-order PDEs in LP , SIAM Journal Numerical
Analysis 42 (2004), No. 2, pp. 714-737.
[30] J.L. Guermond, B. Popov, Linear advection with ill-posed boundary conditions via L1 -minimization,
International Journal of Numerical Analysis and Modeling 4 (2007), No. 1, pp. 39-47.
[31] T. Gunji, S. Kim, M. Kojima, A. Takeda, K. Fujisawa, and T. Mizutani, PHoM - a Polyhedral Homotopy
Continuation Method for Polynomial Systems, Research Reports on Mathematical and Computing
Sciences, Dept. of Math. and Comp. Sciences, Tokyo Inst. of Tech., B-386 (2003)
[32] K. Gustafson, K. Halasi, Cavity flow dynamics at higher Reynolds number and higher aspect ratio,
Journal of Computational Physics 70 (1987), pp. 271-283.
[33] W. Hao, J.D. Hauenstein, B. Hu, Y. Liu, A.J. Sommese, Y.-T. Zhang, Multiple stable steady states of
a reaction-diffusion model on zebrafish dorsal-ventral patterning, Discrete and Continuous Dynamical
Systems, Series S, To appear.
[34] J.D. Hauenstein, A.J. Sommese, C.W. Wampler, Regeneration Homotopies for Solving Systems of
Polynomials, Mathematics of Computation, To appear.
[35] D. Henrion, J.B. Lasserre, Detecting global optimality and extracting solutions in GloptiPoly, Chapter in D. Henrion, A. Garulli, editors, Positive polynomials in control. Lecture Notes in Control and
Information science, Springer Verlag (2005), Berlin.
[36] D. Henrion, J. B. Lasserre, Convergent relaxations of polynomial matrix inequalities and static output
feedback, IEEE Trans. Automatic Control (2006), 51, pp. 192-202.
[37] C. W. J. Hol, C. W. Scherer, Sum of squares relaxations for polynomial semidefinite programming,
Proc. Symp. on Mathematical Theory of Networks and Systems (MTNS), Leuven, Belgium, 2004.
[38] R. Horst, P.M. Pardalos, N.V. Thoai, Introduction to Global Optimization, Kluwer Academic Publishers
(2000).
[39] B. Huber, B. Sturmfels, A polyhedral method for solving sparse polynomial systems, Math. of Comp.
64 (1995), pp. 1541-1555.
[40] M. Kawaguti, Numerical solution of the Navier-Stokes equations for the flow in a two dimensional
cavity, J. Phys. Soc. Jpn. 16 (1961), pp. 2307-2315.
[41] S. Kim, M. Kojima, Exact solutions of some nonconvex quadratic optimization problems via SDP and
SOCP relaxations, Computational Optimization and Applications, 26 (2003), pp. 143-154.
[42] S. Kim, M. Kojima, M. Mevissen, M. Yamashita, Exploiting Sparsity in Linear and Nonlinear Matrix
Inequalities via Positive Semidefinite Matrix Completion, Mathematical Programming, To Appear.
[43] S. Kim, M. Kojima, H. Waki, Exploiting Sparsity in SDP Relaxation for Sensor Network Localization,
SIAM Journal of Optimization 20 (2009), No. 1, pp. 192-215.
[44] S. Kim, M. Kojima, H. Waki, M. Yamashita, SFSDP: a Sparse Version of Full Semidefinite Programming Relaxation for Sensor Network Localization Problems, Research Report B-457, Department of
Mathematical and Computing Sciences, Tokyo Institute of Technology (2009).
[45] K. Kobayashi, S. Kim, M. Kojima, Correlative sparsity in primal-dual interior-point methods for LP,
SDP and SOCP, Appl. Math. Optim. (2008), 58, pp. 69-88.
[46] M. Kojima, Sums of Squares Relaxations of Polynomial Semidefinite Programs, Research Report B-397,
Department of Mathematical and Computing Sciences, Tokyo Institute of Technology (2003).
[47] M. Kojima, S. Kim, H. Waki, Sparsity in sums of squares of polynomials, Mathematical Programming,
103 (2005), pp. 45-62.
[48] M. Kojima, M. Muramatsu, An Extension of Sums of Squares Relaxations to Polynomial Optimization
Problems over Symmetric Cones, Math. Programming (2007), 110, pp. 315-336.
[49] M. Kojima, M. Muramatsu, A note on sparse SOS and SDP relaxations for polynomial optimization
problems over symmetric cones, Comput. Optim. Appl. (2009), 42, pp. 31-41.
[50] W. Kutta, Beitrag zur näherungsweisen Integration totaler Differentialgleichungen, Zeitschrift Math.
Physik 46 (1901), pp. 435-453.
[51] J.B. Lasserre, Global optimization with polynomials and the problem of moments, SIAM Journal on
Optimization, 11 (2001), pp. 796-817.
[52] J.B. Lasserre, Convergent SDP-Relaxations in Polynomial Optimization with Sparsity, SIAM Journal
on Optimization, 17 (2006), No. 3, pp. 822-843.
[53] J.B. Lasserre, Semidefinite programming for gradient and Hessian computation in maximum entropy
estimation, Proc. IEEE Conf. Dec Control, 2007.
[54] J.B. Lasserre, D. Henrion, C. Prieur, E. Trelat, Nonlinear optimal control via occupation measures and
LMI-relaxations, SIAM Journal on Control and Optimization, 47 (2008), pp. 1649-1666.
[55] J.B. Lasserre, M. Laurent, P. Rostalski, Semidefinite characterization and computation of real radical
ideals, Foundations of Computational Mathematics, Vol. 8 (2008), No. 5, pp. 607-647.
[56] M. Laurent, Sums of squares, moment matrices and optimization over polynomials, Emerging Applications of Algebraic Geometry, Vol. 149 of IMA Volumes in Mathematics and its Applications (2009),
M. Putinar and S. Sullivant (eds.), Springer, pp. 157-270.
[57] P.D. Lax, R.D. Richtmyer, Survey of the stability of linear finite difference equations, Comm. Pure
Appl. Math. 9 (1956), pp. 267-293.
[58] R.J. LeVeque, Finite Volume Methods for Hyperbolic Problems, Cambridge University Press (2002).
[59] G.R. Liu, S.S. Quek, The Finite Element Method, A practical course, Elsevier (2003).
[60] J. Macki, A. Strauss, Introduction to Optimal Control Theory, Springer-Verlag (1982), pp. 108.
[61] M. Mevissen, M. Kojima, J. Nie and N. Takayama, Solving partial differential equations via sparse SDP
relaxations, Pacific Journal of Optimization, 4 (2008), No. 2, pp. 213-241.
[62] M. Mevissen, M. Kojima, SDP Relaxations for Quadratic Optimization Problems Derived from Polynomial Optimization Problems, Asia-Pacific Journal for Operations Research 27 (2010), No. 1, pp.
1-24.
[63] M. Mevissen, K. Yokoyama and N. Takayama, Solutions of Polynomial Systems Derived from the Cavity
Flow Problem, Proceedings of the 2009 International Symposium on Symbolic Computation, 2009, pp.
255 - 262.
[64] M. Mimura, Asymptotic Behaviors of a Parabolic System Related to a Planktonic Prey and Predator
Model, SIAM Journal on Applied Mathematics, 37 (1979), no. 3, pp. 499-512.
[65] A.R. Mitchell, D.F. Griffiths, The Finite Difference Method in Partial Differential Equations, John
Wiley and Sons (1980).
[66] J.J. More, B.S. Garbow and K.E. Hillstrom, Testing unconstrained optimization software, ACM Trans.
Math. Software, 7 (1981), pp. 17-41.
[67] K.G. Murty, S.N. Kabadi, Some NP-complete problems in quadratic and nonlinear programming, Mathematical Programming, 39 (1987), pp. 117-129.
[68] K. Nakata, K. Fujisawa, M. Fukuda, M. Kojima, K. Murota, Exploiting sparsity in semidefinite programming via matrix completion II: Implementation and numerical results, Math. Programming, 95
(2003), pp. 303-327.
[69] Ju. E. Nesterov, A. S. Nemirovski, Interior Point Polynomial Methods in Convex Programming: Theory
and Applications, SIAM, Philadelphia, PA, 1994.
[70] Y. Nesterov, Squared functional systems and optimization problems, in J.B.G. Frenk, C. Roos, T. Terlaky, and S. Zhang, editors, High Performance Optimization, pp. 405-440. Kluwer Academic Publishers
(2000).
[71] J. Nie, Sum of squares method for sensor network localization, Computational Optimization and Applications 43 (2009), No. 2, pp. 151-179.
[72] J. Nie, An Approximation Bound Analysis for Lasserre’s Relaxation in Multivariate Polynomial Optimization, preprint (2009).
[73] Y. Nishiura, D. Ueyama, Spatio-temporal chaos for the Gray-Scott model, Physica D, 150 (2001), pp.
137 - 162.
[74] Y. Nishiura, T. Teramoto, K. Ueda, Dynamic transitions through scattors in dissipative systems, Chaos,
13 (2003), No. 3, pp. 962 - 972.
[75] Y. Nishiura, T. Teramoto, K. Ueda, Scattering of traveling spots in dissipative systems, Chaos, 15
(2005), 047509.
[76] Y. Nishiura, T. Teramoto, X. Yuan, K. Ueda, Dynamics of traveling pulses in heterogeneous media,
Chaos, 17 (2007), 037104.
[77] J. Nocedal, S.J. Wright, Numerical Optimization, Series in Operations Research, Springer, New York
2006.
[78] M. Noro, K. Yokoyama, A modular method to compute the rational univariate representation of zerodimensional ideals, Journal of Symbolic Computation 28 (1999), pp. 243–263.
[79] P.A. Parrilo, Semidefinite programming relaxations for semialgebraic problems, Math. Programming,
96 (2003), pp. 293 - 320.
[80] L.A. Peletier, V. Rottschäfer, Pattern selection of solutions of the Swift-Hohenberg equation, Physica
D, 194 (2004), pp. 95 - 126.
[81] M. Putinar, Positive Polynomials on Compact Semi-algebraic Sets, Indiana Univ. Math. Journal 42
(1993), No. 3, pp. 969-984
[82] J. Rauch, J. Smoller, Qualitative theory of the FitzHugh-Nagumo equations, Advances in Mathematics,
27 (1978), pp. 12-44.
[83] L. Rayleigh, On the theory of resonance, Trans. Roy. Soc. A 161 (1870), pp. 77 - 118.
[84] W. Ritz, Über eine neue Methode zur Lösung gewisser Variationsprobleme der mathematischen Physik,
Journal für die reine und angewandte Mathematik 135 (1908), pp. 1-61.
[85] F. Rouillier, Solving zero-dimensional systems through the rational univariate representation, Applicable Algebra in Engineering, Communication and Computing 9 (1999), pp. 433–461.
[86] C. Runge, Über die numerische Auflösung von Differentialgleichungen, Math. Ann. 46 (1895), pp. 167
-178.
[87] K. Schmüdgen, The K-moment problem for compact semi-algebraic sets, Math. Ann. 289 (1991), pp.
203-206.
[88] M. Schweighofer, Optimization of polynomials on compact semialgebraic sets, SIAM J. Optimization
15 (2005), pp. 805-825.
[89] H.D. Sherali, C.H. Tuncbilek, A global optimization algorithm for polynomial programming problems
using a reformulation-linearization technique, Journal of Global Optimization, 2 (1992), pp. 101-112.
[90] H.D. Sherali, C.H. Tuncbilek, New reformulation-linearization technique based relaxations for univariate and multivariate polynomial programming problems, Operations Research Letters, 21 (1997), 1, pp.
1-10.
[91] N.Z. Shor, Class of global minimum bounds of polynomial functions, Cybernetics, 23 (1987), 6, pp.
731-734.
[92] N.Z. Shor, Nondifferentiable Optimization and Polynomial Problems, Kluwer (1998).
[93] J. Smoller, Shock Waves and Reaction-Diffusion Equations, Springer-Verlag (1983), pp. 106.
[94] J. Stoer, R. Bulirsch, Introduction to Numerical Analysis, 3rd edition, Springer-Verlag, New York
(2002).
[95] J.F. Sturm, SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones, Optimization
Methods and Software, 11 and 12 (1999), pp. 625-653.
[96] J.C. Strikwerda, Finite Difference Schemes and Partial Differential Equations, Wadsworth and Brooks
(1989).
[97] M. Tabata, A finite difference approach to the number of peaks of solutions for semilinear parabolic
problems, J. Math. Soc. Japan, 32 (1980), pp. 171-192.
[98] T. Takami, T. Kawamura, Solving Partial Differential Equations with Difference Schemes, Tokyo University Press (1994).
[99] M.J. Turner, R.M. Clough, H.C. Martin, L.J. Topp, Stiffness and deflection analysis of complex structures, J. Aeron. Sci. 23 (1956), pp. 805-823, 854.
[100] T. Teramoto, Personal communication.
[101] O. Von Stryk, R. Bulirsch, Direct and indirect methods for trajectory optimization, Ann. Oper. Res.
37 (1992), pp. 357-373.
[102] H. Waki, S. Kim, M. Kojima, M. Muramatsu, Sums of squares and semidefinite program relaxations
for polynomial optimization problems with structured sparsity, SIAM Journal of Optimization 17 (2006)
218-242.
[103] H. Waki, S. Kim, M. Kojima, M. Muramatsu, SparsePOP: a Sparse Semidefinite Programming Relaxation of Polynomial Optimization Problems, Research Reports on Mathematical and Computing
Sciences, Dept. of Math. and Comp. Sciences, Tokyo Inst. of Tech., B-414 (2005).
[104] M. Yamashita, K. Fujisawa, M. Kojima, Implementation and evaluation of SDPA 6.0 (SemiDefinite
Programming Algorithm 6.0), Optimization Methods and Software 18 (2003), pp. 491-505.
[105] Yokota, http://next1.cc.it-hiroshima.ac.jp/MULTIMEDIA/numeanal2/node24.html.
[106] O.C. Zienkiewicz, R.L. Taylor, J.Z. Zhu, The Finite Element Method, Its Basis and Fundamentals,
Elsevier (2005).