Some of the Polynomial Optimisation in Power Systems at IBM Research – Ireland

Jakub Mareček
IBM Research – Ireland

Based on joint work with Bissan Ghaddar, Martin Mevissen, and Martin Takáč

Alan Turing Institute, January 29th 2016, Edinburgh
Overview
The busy agenda:
• The non-convexity of steady-state problems in power systems
• Polynomial optimisation and the use of SDP therein
• Combining first- and second-order methods therein
Based on joint work with a number of co-authors:
• Ghaddar, JM, Mevissen, arXiv:1404.3626
• JM, Takáč: arXiv:1506.08568
• Liu, JM, Takáč: arXiv:1510.02171
although there is much beyond these, e.g., arXiv:1412.8054,
arXiv:1510.06797, arXiv:1601.06739.
Motivation
• The cost of energy production is approximately 10% of world GDP
(IEA, 2014 report using 2012 data)
• Savings from a 5% increase in efficiency amount to $26B/year
in the US alone
• Power systems exhibit non-linear, uncertain dynamics
• But even for deterministic steady-state problems, which are solved
every couple of minutes, there are no good solvers.
• Yet in data science (e.g., compressed sensing), large-scale non-convex problems are tackled routinely. Can we learn something?
A FERC Study: Where do Leading Solvers Fail?

Termination                                     | Conopt | Ipopt | Knitro | Minos | Snopt
Exceeded time limit with an infeasible solution |   7.9% | 22.2% |  38.4% |  0.0% | 23.0%
Exceeded time limit with a feasible solution    |   6.5% |  0.9% |   6.7% |  0.0% | 14.4%
Early termination with an infeasible solution   |   0.0% |  0.4% |   0.6% | 11.4% |  0.0%
Early termination with a feasible solution      |  31.2% |  0.6% |   1.8% |  2.3% |  0.2%
Normal termination with an infeasible solution  |  10.5% |  3.4% |   0.0% | 55.9% |  1.6%
Percentage of feasible                          |  81.6% | 74.1% |  61.0% | 32.7% | 75.4%
Percentage of best seen                         |  43.8% | 72.5% |  52.5% | 30.3% | 60.8%
Source: Anya Castillo and Richard O’Neill, Staff Technical Conference, FERC, Washington DC, June 24–26, 2013. 118–3120 bus
instances, 100 runs per instance and solver, 20 min runs, Xeon E7458 2.4GHz, 64 GB RAM
Is this a Failure of Modelling?
Real-valued:
• Polar Power-Voltage: $p_k, q_k, v_k, \theta_k$
• Rectangular Power-Voltage (R-PQV): $p_k, q_k, \Re v_k, \Im v_k$
• Polar Current Injection: $p_k, q_k, v_k, \theta_k, i_k$
• Rectangular Current Injection (R-CI): $p_k, q_k, \Re v_k, \Im v_k, i_k$
• Polar Voltage-Current: $p_k, q_k, v_k, \theta_k, i_{lm}$
• Rectangular Voltage-Current (R-IV): $p_k, q_k, \Re v_k, \Im v_k, i_{lm}$
Further variants: dropping $p_k, q_k$; mixing polar and rectangular.
Complex-valued:
• With complex conjugate (PQV, CI, IV)
• Without (McCoy’s trick)
Rectangular Power-Voltage Formulation
Variables:
• $v_n$ is the complex voltage at bus $n$
• $p_n$ is the active power at bus $n$ (the real part of the complex power)
• $q_n$ is the reactive power at bus $n$ (the imaginary part of the complex power)

Key constraints (power balance at each bus $n$, with demands $p_n^d, q_n^d$):

$$\Re v_n \sum_{m \in N} \left( g_{nm} \Re v_m - b_{nm} \Im v_m \right) + \Im v_n \sum_{m \in N} \left( g_{nm} \Im v_m + b_{nm} \Re v_m \right) - p_n + p_n^d = 0 \quad (1)$$

$$\Im v_n \sum_{m \in N} \left( g_{nm} \Re v_m - b_{nm} \Im v_m \right) - \Re v_n \sum_{m \in N} \left( g_{nm} \Im v_m + b_{nm} \Re v_m \right) - q_n + q_n^d = 0 \quad (2)$$
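For concreteness, a minimal sketch (illustrative only, with made-up 3-bus data; not the authors' code) of evaluating the left-hand sides of (1) and (2) with NumPy, where the injections are chosen so that both mismatches vanish:

```python
import numpy as np

# Made-up 3-bus data: G, B are the real and imaginary parts of the bus
# admittance matrix; e, f are Re(v_n) and Im(v_n); pd, qd are demands.
G = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  2.0, -1.0],
              [-1.0, -1.0,  2.0]])
B = np.array([[-6.0,  3.0,  3.0],
              [ 3.0, -6.0,  3.0],
              [ 3.0,  3.0, -6.0]])
e = np.array([1.00, 0.98, 0.99])    # Re(v_n)
f = np.array([0.00, -0.02, -0.05])  # Im(v_n)
pd = np.array([0.0, 0.3, 0.2])      # active demands p_n^d
qd = np.array([0.0, 0.1, 0.1])      # reactive demands q_n^d

# Real and imaginary parts of the injected currents, I = (G + jB)(e + jf)
Ire = G @ e - B @ f
Iim = G @ f + B @ e

# Net injections that balance (1) and (2) at these voltages
p = e * Ire + f * Iim + pd
q = f * Ire - e * Iim + qd

# The left-hand sides of (1) and (2) are now zero by construction
print(e * Ire + f * Iim - p + pd)   # ~ [0, 0, 0]
print(f * Ire - e * Iim - q + qd)   # ~ [0, 0, 0]
```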
The SDP Relaxation of Lavaei and Low I
Lavaei and Low have reformulated the R-PQV ACOPF as:

$$\min_{V, P^G, Q^G} \ \sum_{l=1}^{m} f_l(P_l) \quad (3)$$
$$P^{\min} \le P^G \le P^{\max} \quad (4)$$
$$Q^{\min} \le Q^G \le Q^{\max} \quad (5)$$
$$V_k^{\min} \le |V_k| \le V_k^{\max} \quad (6)$$
$$\Re\{ V_l (V_l - V_m)^* y_{lm}^* \} \le P_{lm}^{\max} \quad (7)$$
$$\operatorname{trace}\{ W Y^* e_k e_k^* \} = P_k + Q_k \imath \quad (8)$$
$$W \succeq 0, \quad W = V V^* \quad (9)$$
$$\operatorname{rank}(W) = 1 \quad (10)$$
IBM Research - Ireland
The SDP Relaxation of Lavaei and Low II
Drop the rank constraint to obtain an SDP:

$$\min_{X, P^G, Q^G} \ \sum_{l=1}^{m} c_{l2} \left( \operatorname{trace}\{Y_k W\} + P_{Dk} \right)^2 + c_{l1} \left( \operatorname{trace}\{Y_k W\} + P_{Dk} \right) \quad (11)$$
$$P^{\min} \le \operatorname{trace}\{Y_k W\} \le P^{\max} \quad (12)$$
$$Q^{\min} \le \operatorname{trace}\{\bar{Y}_k W\} \le Q^{\max} \quad (13)$$
$$(V_k^{\min})^2 \le \operatorname{trace}\{M_k W\} \le (V_k^{\max})^2 \quad (14)$$
$$\operatorname{trace}\{Y_{lm} W\}^2 + \operatorname{trace}\{\bar{Y}_{lm} W\}^2 \le (S_{lm}^{\max})^2 \quad (15)$$
$$\operatorname{trace}\{Y_{lm} W\} \le P_{lm}^{\max} \quad (16)$$
$$\operatorname{trace}\{M_{lm} W\} \le (V_{lm}^{\max})^2 \quad (17)$$
$$W = X X^T \quad (18)$$

There is no duality gap under some spectral conditions.
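When the relaxation is exact, the optimal $W$ has rank one and the voltages can be read off its leading eigenvector. A minimal sketch (illustrative only; the data are made up) of that recovery step:

```python
import numpy as np

def recover_voltages(W, tol=1e-6):
    """If W is (numerically) rank one, return V with W ~ V V^H, else None."""
    vals, vecs = np.linalg.eigh(W)          # Hermitian eigendecomposition, ascending
    leading, rest = vals[-1], vals[:-1]
    if leading <= 0 or np.max(np.abs(rest)) > tol * leading:
        return None                          # relaxation not exact: rank(W) > 1
    return np.sqrt(leading) * vecs[:, -1]

# Toy check with a made-up rank-one matrix
V_true = np.array([1.0 + 0.0j, 0.98 - 0.02j, 0.97 - 0.05j])
W = np.outer(V_true, V_true.conj())
V = recover_voltages(W)
print(np.allclose(np.outer(V, V.conj()), W))  # True: V matches up to a global phase
```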
How to See the SDP?
For a polynomial optimisation problem with multivariate polynomials $f, g_i$:

$$\min \ f(x) \quad \text{s.t.} \quad g_i(x) \ge 0, \quad i = 1, \ldots, m \qquad [\mathrm{PP}]$$
one can construct:
• The Lasserre hierarchy (2001) and its Parrilo (2003) dual
• The sparse hierarchy of Waki et al (2006), Lasserre (2006)
• The DSOS/SDSOS hierarchy of Ahmadi and Majumdar (2014)
• The DIGS hierarchy of Ghaddar et al (2015)
• The SOCP hierarchy of Kuang et al (2015)
• The variant hierarchy of Ma et al (2016)
The first levels of the hierarchies of Lasserre, Waki, Ghaddar, and Ma are
equivalent to the dual of the SDP on the previous slide.
The Lasserre Hierarchy
A Conic Reformulation

Given $S \subseteq \mathbb{R}^n$, define $\mathcal{P}_d(S)$ to be the cone of polynomials of degree at most $d$ that are non-negative over $S$. Then

$$\max \ \varphi \ \ \text{s.t.} \ f(x) - \varphi \ge 0 \ \ \forall x \in S_G \quad = \quad \max \ \varphi \ \ \text{s.t.} \ f(x) - \varphi \in \mathcal{P}_d(S_G),$$

for $G = \{g_i(x) : i = 1, \ldots, m\}$, $S_G = \{x \in \mathbb{R}^n : g(x) \ge 0 \ \forall g \in G\}$.

The hierarchy of Lasserre approximates $\mathcal{P}_d(S_G)$ by sums of squares:

$$K_G^r = \Sigma_r + \sum_{i=1}^{m} g_i(x) \, \Sigma_{r - \deg(g_i)},$$

where $\Sigma_d := \{ \sum_{i=1}^{N} p_i(x)^2 : p_i(x) \in \mathbb{R}_{\lfloor d/2 \rfloor}[x] \}$, with $N = \binom{n+d}{d}$, for $r \ge d$.
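As a toy illustration (not from the slides): for $f(x) = 2 + x$ on $S_G = \{x : 1 - x^2 \ge 0\} = [-1, 1]$, the optimal bound $\varphi = 1$ is certified already for $r = 2$, since

$$f(x) - 1 \;=\; 1 + x \;=\; \underbrace{\tfrac{1}{2}(1 + x)^2}_{\in\, \Sigma_2} \;+\; \underbrace{\tfrac{1}{2}}_{\in\, \Sigma_0} \,(1 - x^2) \;\in\; K_G^2,$$

and $\varphi = 1$ is attained at $x = -1$, so the certificate is tight.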
The Lasserre Hierarchy II
The Corresponding Optimisation Problem [PP-H$_r$]

$$\max_{\varphi, \sigma_i(x)} \ \varphi \quad \text{s.t.} \quad f(x) - \varphi = \sigma_0(x) + \sum_{i=1}^{m} \sigma_i(x) \, g_i(x),$$
$$\sigma_0(x) \in \Sigma_r, \qquad \sigma_i(x) \in \Sigma_{r - \deg(g_i)}.$$

Notice that this is a semidefinite programming (SDP) problem, i.e.
$$\min_{X \in \mathbb{S}^n} \ \langle C, X \rangle_{\mathbb{S}^n} \quad \text{s.t.} \quad \langle A_i, X \rangle_{\mathbb{S}^n} = b_i, \ i = 1, \ldots, m, \quad X \succeq 0.$$
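A minimal sketch (illustrative only, not the authors' code) of [PP-H$_r$] as an SDP, assuming CVXPY is available: certify a lower bound for the unconstrained toy problem $f(x) = x^4 - 3x^2 + 1$ by finding a Gram matrix $Q \succeq 0$ with $[1, x, x^2]\, Q\, [1, x, x^2]^T = f(x) - \varphi$.

```python
import cvxpy as cp

# f(x) = x^4 - 3x^2 + 1; coefficients of 1, x, x^2, x^3, x^4
c = [1.0, 0.0, -3.0, 0.0, 1.0]

phi = cp.Variable()                 # the lower bound to be maximised
Q = cp.Variable((3, 3), PSD=True)   # Gram matrix in the basis [1, x, x^2]

# Match coefficients of f(x) - phi with those of [1, x, x^2] Q [1, x, x^2]^T
constraints = [
    Q[0, 0] == c[0] - phi,           # constant term
    2 * Q[0, 1] == c[1],             # x
    2 * Q[0, 2] + Q[1, 1] == c[2],   # x^2
    2 * Q[1, 2] == c[3],             # x^3
    Q[2, 2] == c[4],                 # x^4
]

prob = cp.Problem(cp.Maximize(phi), constraints)
prob.solve()
print("SOS lower bound:", phi.value)  # ~ -1.25, the true minimum of f
```

Non-negative univariate polynomials are sums of squares, so the bound is already exact here; constraints enter through the additional multipliers $\sigma_i(x) g_i(x)$ in the same coefficient-matching fashion.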
The Convergence
Summarising the results of Lasserre and others:
Proposition (arXiv:1404.3626)
Under mild assumptions, for the dense hierarchy [PP-H$_r$]$^*$ for [PP] and the respective duals the following holds:
(a) $\inf$ [PP-H$_r$] $\nearrow$ $\min$([PP]) as $r \to \infty$,
(b) $\sup$ [PP-H$_r$]$^*$ $\nearrow$ $\min$([PP]) as $r \to \infty$,
(c) If the interior of the feasible set of [PP] is nonempty, there is no duality gap between [PP-H$_r$] and [PP-H$_r$]$^*$.
(d) If [PP] has a unique global minimizer $x^*$, then as $r$ tends to infinity the components of the optimal solution of [PP-H$_r$] corresponding to the linear terms converge to $x^*$.
See arXiv:1404.3626 for a discussion of assumptions, which are satisfied in the following applications.
Solving the SDPs
Take 1:
• Consider the tree-width decompositions, r -space sparsity,
• and use standard SDP solvers
Take 2:
• Do not consider the tree-width decompositions, r -space sparsity,
• but use a first-order method on
• a reformulation with coordinate-wise constraints.
Take 3:
• Combine first-order methods on a convexification with Newton
method on the non-convex problem.
Take 4:
• Do consider the tree-width decompositions and r -space sparsity,
• but use a first-order method, allowing for
• warm-starting as you go higher in a new hierarchy.
Correlative Sparsity [Waki et al. 2006]
The $n \times n$ Correlative Sparsity Pattern Matrix

$$R_{ij} = \begin{cases} \star & \text{for } i = j \\ \star & \text{for } x_i, x_j \text{ in the same monomial of } f \\ \star & \text{for } x_i, x_j \text{ in the same constraint } g_k \\ 0 & \text{otherwise.} \end{cases}$$

The chordal extension of its adjacency graph $G$ has maximal cliques $\{I_k\}_{k=1}^{p}$, $I_k \subset \{1, \ldots, n\}$. The sparse approximation of $\mathcal{P}_d(S)$ is

$$K_G^r(I) = \sum_{k=1}^{p} \left( \Sigma_r(I_k) + \sum_{j \in J_k} g_j \, \Sigma_{r - \deg(g_j)}(I_k) \right),$$

where $\Sigma_d(I_k)$ is the set of all sum-of-squares polynomials of degree up to $d$ supported on $I_k$ and $(J_1, \ldots, J_p)$ is a partitioning of the set of polynomials $\{g_j\}_j$ defining $S$ such that for every $j$ in $J_k$, the corresponding $g_j$ is supported on $I_k$. The support $I \subset \{1, \ldots, n\}$ of a polynomial contains the indices $i$ of the variables $x_i$ appearing in its monomials.
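A minimal sketch (illustrative only, with made-up supports) of obtaining the cliques $I_k$: build the correlative sparsity graph, complete it to a chordal graph, and list its maximal cliques; networkx is assumed to be available.

```python
import networkx as nx

n = 6  # variables x_0, ..., x_5
# Made-up supports: sets of variables appearing together in a monomial of f
# or in a constraint g_k (a chain-like coupling, as in power networks).
objective_monomials = [{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}]
constraint_supports = [{0, 1, 2}, {3, 4, 5}]

G = nx.Graph()
G.add_nodes_from(range(n))                   # diagonal entries R_ii
for support in objective_monomials + constraint_supports:
    for i in support:
        for j in support:
            if i < j:
                G.add_edge(i, j)             # off-diagonal stars of R

H, _ = nx.complete_to_chordal_graph(G)       # a chordal extension
cliques = list(nx.chordal_graph_cliques(H))  # its maximal cliques
print("maximal cliques I_k:", [sorted(c) for c in cliques])
```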
Correlative Sparsity II
Another Hierarchy of Relaxations ([PP-SH$_r$]$^*$) [Waki et al. 2006]

$$\max_{\varphi, \sigma_k(x), \sigma_{j,k}(x)} \ \varphi \quad \text{s.t.} \quad f(x) - \varphi = \sum_{k=1}^{p} \left( \sigma_k(x) + \sum_{j \in J_k} g_j(x) \, \sigma_{j,k}(x) \right),$$
$$\sigma_k \in \Sigma_r(I_k), \qquad \sigma_{j,k} \in \Sigma_{r - \deg(g_j)}(I_k).$$

Reduction in size: the $k$-th matrix inequality has size $\binom{|I_k| + \omega}{|I_k|}$, which is vastly smaller if $|I_k| \ll n$.
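For intuition, a quick computation (illustrative numbers only) of the dense versus clique-wise SOS block sizes, $\binom{n+\omega}{\omega}$ versus $\binom{|I_k|+\omega}{\omega}$:

```python
from math import comb

omega = 2        # degree of the monomial basis of the SOS blocks
n = 100          # number of variables in the dense relaxation
clique = 6       # size of a typical clique |I_k|

print("dense block size: ", comb(n + omega, omega))        # 5151 x 5151
print("sparse block size:", comb(clique + omega, omega))   # 28 x 28
```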
The Convergence
Summarising the results of Waki, Kojima and others:
Proposition (arXiv:1404.3626)
Under mild assumptions, for the sparse hierarchy [PP-SH$_r$]$^*$ for [PP] and the respective duals the following holds:
(a) $\inf$ [PP-SH$_r$] $\nearrow$ $\min$([PP]) as $r \to \infty$,
(b) $\sup$ [PP-SH$_r$]$^*$ $\nearrow$ $\min$([PP]) as $r \to \infty$,
(c) If the interior of the feasible set of [PP] is nonempty, there is no duality gap between [PP-SH$_r$] and [PP-SH$_r$]$^*$.
(d) If [PP] has a unique global minimizer $x^*$, then as $r$ tends to infinity the components of the optimal solution of [PP-SH$_r$] corresponding to the linear terms converge to $x^*$.
Sparsity in SDPs [Kim et al. 2011]
$$\min \ b^T y \quad \text{s.t.} \quad F(y) \succeq 0, \ y \in \mathbb{R}^n,$$

where

$$F(y) = \begin{pmatrix}
1 - y_1^4 & 0 & 0 & \cdots & 0 & y_1 y_2 \\
0 & 1 - y_2^4 & 0 & \cdots & 0 & y_2 y_3 \\
0 & 0 & 1 - y_3^4 & \cdots & 0 & y_3 y_4 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 - y_{n-1}^4 & y_{n-1} y_n \\
y_1 y_2 & y_2 y_3 & y_3 y_4 & \cdots & y_{n-1} y_n & 1 - y_n^4
\end{pmatrix}.$$

No correlative sparsity, but r-space sparsity!
Sparsity in SDPs II
Applying d-space and r-space conversion yields an equivalent problem:

$$\min \ b^T y \quad \text{s.t.}$$
$$\begin{pmatrix} 1 - y_1^4 & y_1 y_2 \\ y_1 y_2 & z_1 \end{pmatrix} \succeq 0,$$
$$\begin{pmatrix} 1 - y_i^4 & y_i y_{i+1} \\ y_i y_{i+1} & -z_{i-1} + z_i \end{pmatrix} \succeq 0 \quad (i = 2, 3, \ldots, n-2),$$
$$\begin{pmatrix} 1 - y_{n-1}^4 & y_{n-1} y_n \\ y_{n-1} y_n & 1 - y_n^4 - z_{n-2} \end{pmatrix} \succeq 0.$$

The equivalent problem satisfies the correlative sparsity pattern!
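A minimal sketch (illustrative only) of why the conversion is lossless: for any values of the auxiliary variables $z$, embedding the $2 \times 2$ blocks back onto the index pairs $\{i, n\}$ reproduces $F(y)$ exactly, so positive semidefiniteness of the blocks implies $F(y) \succeq 0$; the converse direction follows from the chordal structure of the sparsity pattern.

```python
import numpy as np

np.random.seed(1)
n = 6
y = np.random.uniform(-0.9, 0.9, n)
z = np.random.randn(n - 2)               # any auxiliary values will do

# The original arrow-shaped matrix F(y)
F = np.zeros((n, n))
for i in range(n - 1):
    F[i, i] = 1 - y[i] ** 4
    F[i, n - 1] = F[n - 1, i] = y[i] * y[i + 1]
F[n - 1, n - 1] = 1 - y[n - 1] ** 4

# Sum of the converted 2x2 blocks, each embedded on the index pair {i, n-1}
S = np.zeros((n, n))
for i in range(n - 1):
    if i == 0:
        corner = z[0]
    elif i < n - 2:
        corner = -z[i - 1] + z[i]
    else:
        corner = 1 - y[n - 1] ** 4 - z[n - 3]
    block = np.array([[1 - y[i] ** 4, y[i] * y[i + 1]],
                      [y[i] * y[i + 1], corner]])
    idx = np.array([i, n - 1])
    S[np.ix_(idx, idx)] += block

print("embedding reproduces F(y):", np.allclose(S, F))   # True
```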
What can Off-The-Shelf Solvers Do
Following much Matlab pre-processing, SeDuMi performs well:
Instance   | L1 Bound   | L1 Dim         | L1 Time | L2 Bound | L2 Dim       | L2 Time
case9mod   | 2753.23    | 588×168        | 0.6     | 3087.89  | 1792×14847   | 17.5
case14     | 8081.52    | 888×94         | 0.4     | 8081.54  | 7508×66740   | 1420.5
case14mod  | 7792.72    | 888×94         | 0.9     | 7991.07  | 7508×66740   | 904.2
case30mod  | 576.89     | 4706×684       | 3.8     | 578.56   | 36258×49164  | 13740.0
case39     | 41862.08   | 7282×758       | 2.2     | 41864.18 | 26076×215772 | 4359.1
case57     | 41737.79   | 13366×356      | 3.2     | *        | *            | *
case118    | 129654.62  | 56620×816      | 6.1     | *        | *            | *
case300    | 719711.63  | 362076×1938    | 13.6    | *        | *            | *
case2383wp | 1.814×10^6 | 22778705×47975 | 3731.5  | *        | *            | *
case2736sp | 1.307×10^6 | 30019740×57408 | 3502.2  | *        | *            | *
case2746wp | 1.584×10^6 | 30239940×57978 | 3903.2  | *        | *            | *
See arXiv:1404.3626 for the details.
What Off-The-Shelf Solvers Cannot Do
• Run on a standard desktop
• “Good feasible solutions quick”
• Overall run-times comparable to MATPOWER, a popular heuristic
• Parallel and/or distributed computation for the SDP relaxations
• Exact tree-width decomposition done fast – after all, it’s NP-Hard
• Exploitation of the structure of the higher-level relaxations
• Warm start: a solution to [PP-H1 ] is a very good solution to [PP-H2 ],
for an appropriate mapping of variables
Solving the SDPs
Take 1:
• Consider the tree-width decompositions, r -space sparsity,
• and use standard SDP solvers
Take 2:
• Do not consider the tree-width decompositions, r -space sparsity,
• but use a first-order method on
• a reformulation with coordinate-wise constraints.
Take 3:
• Combine first-order methods on a convexification with Newton
method on the non-convex problem.
Take 4:
• Do consider the tree-width decompositions and r -space sparsity,
• but use a first-order method, allowing for
• warm-starting as you go higher in a new hierarchy.
A Lifted Reformulation
$$\min \ F(W) := \sum_{k \in G} f_k(W) \qquad [\mathrm{R}r\mathrm{BC}]$$
$$\text{s.t.} \quad t_k = \operatorname{tr}(Y_k W) \quad (19)$$
$$P_k^{\min} - P_k^d \le t_k \le P_k^{\max} - P_k^d \quad (20)$$
$$g_k = \operatorname{tr}(\bar{Y}_k W) \quad (21)$$
$$Q_k^{\min} - Q_k^d \le g_k \le Q_k^{\max} - Q_k^d \quad (22)$$
$$h_k = \operatorname{tr}(M_k W) \quad (23)$$
$$(V_k^{\min})^2 \le h_k \le (V_k^{\max})^2 \quad (24)$$
$$u_{lm} = \operatorname{tr}(Y_{lm} W) \quad (25)$$
$$v_{lm} = \operatorname{tr}(\bar{Y}_{lm} W) \quad (26)$$
$$z_{lm} = (u_{lm})^2 + (v_{lm})^2 \quad (27)$$
$$z_{lm} \le (S_{lm}^{\max})^2 \quad (28)$$
$$W \succeq 0, \quad \operatorname{rank}(W) \le r. \quad (29)$$
Some Observations
O1: constraints (19)–(26) and (28) are box constraints,
O2: using elementary linear algebra,
$$\{W \in \mathbb{S}^n : W \succeq 0, \ \operatorname{rank}(W) \le r\} = \{R R^T : R \in \mathbb{R}^{n \times r}\},$$
O3: if $\operatorname{rank}(W^*) > 1$ for the optimum $W^*$ of the Lavaei-Low relaxation, there are no known methods for extracting the global optimum of [R1BC] from $W^*$, except for going higher in the hierarchies,
O4: zero duality gap at any SDP relaxation in the hierarchy of Lasserre does not guarantee that the solution of the SDP relaxation is exact for [R1BC].
Solving the SDP Relaxation
• Use the low-rank method of Burer and Monteiro, i.e. for increasing rank $r$, work with the augmented Lagrangian in the variable $R R^T \in \mathbb{R}^{n \times n}$, $R \in \mathbb{R}^{n \times r}$, rather than in $W \in \mathbb{R}^{n \times n}$:

$$
\begin{aligned}
L(R, t, h, g, u, v, z, \lambda^t, \lambda^g, \lambda^h, \lambda^u, \lambda^v, \lambda^z) := \sum_{k \in G} f_k(R R^T) & \\
- \sum_k \lambda_k^t \bigl( t_k - \operatorname{trace}(Y_k R R^T) \bigr) + \frac{\mu}{2} \sum_k \bigl( t_k - \operatorname{trace}(Y_k R R^T) \bigr)^2 & \\
- \sum_k \lambda_k^g \bigl( g_k - \operatorname{trace}(\bar{Y}_k R R^T) \bigr) + \frac{\mu}{2} \sum_k \bigl( g_k - \operatorname{trace}(\bar{Y}_k R R^T) \bigr)^2 & \\
- \sum_k \lambda_k^h \bigl( h_k - \operatorname{trace}(M_k R R^T) \bigr) + \frac{\mu}{2} \sum_k \bigl( h_k - \operatorname{trace}(M_k R R^T) \bigr)^2 & \\
- \sum_{(l,m)} \lambda_{(l,m)}^u \bigl( u_{(l,m)} - \operatorname{trace}(Y_{(l,m)} R R^T) \bigr) + \frac{\mu}{2} \sum_{(l,m)} \bigl( u_{(l,m)} - \operatorname{trace}(Y_{(l,m)} R R^T) \bigr)^2 & \\
- \sum_{(l,m)} \lambda_{(l,m)}^v \bigl( v_{(l,m)} - \operatorname{trace}(\bar{Y}_{(l,m)} R R^T) \bigr) + \frac{\mu}{2} \sum_{(l,m)} \bigl( v_{(l,m)} - \operatorname{trace}(\bar{Y}_{(l,m)} R R^T) \bigr)^2 & \\
- \sum_{(l,m)} \lambda_{(l,m)}^z \bigl( z_{(l,m)} - u_{(l,m)}^2 - v_{(l,m)}^2 \bigr) + \frac{\mu}{2} \sum_{(l,m)} \bigl( z_{(l,m)} - u_{(l,m)}^2 - v_{(l,m)}^2 \bigr)^2 + \nu R. &
\end{aligned}
$$

• Combined with a parallel coordinate descent with a closed-form step.
Solving the SDP Relaxation II
 1: for r = 1, 2, . . . do
 2:    choose $R \in \mathbb{R}^{n \times r}$
 3:    compute the corresponding values of $t, h, g, u, v, z$
 4:    project $t, h, g, u, v, z$ onto the box constraints
 5:    for k = 0, 1, 2, . . . do
 6:       in parallel, minimize $L$ in $t, g, h, u, v, z$, coordinate-wise
 7:       in parallel, minimize $L$ in $R$, coordinate-wise
 8:       update the Lagrange multipliers $\lambda^t, \lambda^g, \lambda^h, \lambda^u, \lambda^v, \lambda^z$
 9:       update $\mu$
10:       terminate, if criteria are met
11:    end for
12: end for
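To make the structure concrete, here is a minimal sketch (a toy example with made-up data, not the solver from arXiv:1506.08568): a Burer–Monteiro factorisation $W = R R^T$ with an augmented Lagrangian and plain first-order updates, for the simple SDP $\min \ \langle C, W \rangle$ s.t. $\operatorname{diag}(W) = 1$, $W \succeq 0$.

```python
import numpy as np

np.random.seed(0)
n, r = 8, 3                                   # matrix size and factorisation rank
C = np.random.randn(n, n); C = (C + C.T) / 2  # random symmetric cost matrix
R = np.random.randn(n, r) / np.sqrt(r)        # low-rank factor, W = R R^T
lam = np.zeros(n)                             # multipliers for diag(R R^T) = 1
mu = 10.0                                     # penalty parameter

def residual(R):
    return np.sum(R * R, axis=1) - 1.0        # diag(R R^T) - 1

for outer in range(100):                      # multiplier updates
    for inner in range(200):                  # first-order steps on R
        res = residual(R)
        # gradient of <C, R R^T> - lam^T res + (mu/2)||res||^2 with respect to R
        grad = 2.0 * C @ R + 2.0 * (mu * res - lam)[:, None] * R
        R -= 1e-2 * grad
    lam -= mu * residual(R)                   # first-order multiplier update

W = R @ R.T
print("objective <C, W>:          ", np.sum(C * W))
print("max |diag(W) - 1| residual:", np.max(np.abs(np.diag(W) - 1.0)))
```

The algorithm on this slide replaces the full gradient step with closed-form coordinate-wise minimisation and adds the box-constrained variables $t, g, h, u, v, z$, but the overall pattern (factorise, penalise, update the multipliers) is the same.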
What can this Do
Let’s compare the run-times with:
• MATPOWER, a popular heuristic
• sdp_pf, the pre-processor of Molzahn et al., with SeDuMi
• OPF Solver, the pre-processor of Lavaei et al. with SDPT3
Name    | MATPOWER Obj. | T [s] | sdp_pf Obj. | T [s] | OPF Solver Obj. | T [s] | Our Approach Obj. | T [s]
case6ww | 3.144e+03     | 0.114 | 3.144e+03   | 0.74  | 3.144e+03       | 2.939 | 3.144e+03         | 0.260
case14  | 8.082e+03     | 0.201 | 8.082e+03   | 0.84  | –               | –     | 8.082e+03         | 0.031
case30  | 5.769e+02     | 0.788 | 5.769e+02   | 2.70  | 5.765e+02       | 6.928 | 5.769e+02         | 0.074
case39  | 4.189e+04     | 0.399 | 4.189e+04   | 3.26  | 4.202e+04       | 7.004 | 4.186e+04         | 0.885
case57  | 4.174e+04     | 0.674 | 4.174e+04   | 2.69  | –               | –     | 4.174e+04         | 0.857
case118 | 1.297e+05     | 1.665 | 1.297e+05   | 6.57  | –               | –     | 1.297e+05         | 1.967
case300 | 7.197e+05     | 2.410 | –           | 17.68 | –               | –     | 7.197e+05         | 90.103
Solving the SDPs
Take 1:
• Consider the tree-width decompositions, r -space sparsity,
• and use standard SDP solvers
Take 2:
• Do not consider the tree-width decompositions, r -space sparsity,
• but use a first-order method on
• a reformulation with coordinate-wise constraints.
Take 3:
• Combine first-order methods on a convexification with Newton
method on the non-convex problem.
Take 4:
• Do consider the tree-width decompositions and r -space sparsity,
• but use a first-order method, allowing for
• warm-starting as you go higher in a new hierarchy.
More Observations
• The quadratic convergence of the Newton method is hard to beat.
• Nevertheless, without a convexification, the Newton method can converge to particularly poor local optima of a non-convex problem.
• Smale's point-estimation theory offers a test, based solely on derivatives at the current iterate, for inclusion in a domain of monotonicity of a zero of a polynomial.
• Thereby, we can combine first-order methods for the (hierarchy of) convex Lagrangians with the Newton method for (a hierarchy of) non-convex Lagrangians, while preserving the convergence guarantees associated with the convexification.
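A minimal sketch (illustrative only, univariate case; not the authors' code) of the point-estimation test: Smale's $\alpha$-criterion certifies, from derivatives at the current iterate alone, that Newton's method started there converges quadratically to a nearby zero.

```python
import numpy as np

ALPHA0 = (13 - 3 * np.sqrt(17)) / 4      # Smale's constant, approx. 0.1577

def alpha_test(coeffs, x):
    """Smale's alpha-test for a univariate polynomial with the given coefficients
    (highest degree first, as in np.poly1d)."""
    p = np.poly1d(coeffs)
    d = p.deriv()
    if abs(d(x)) < 1e-14:
        return False, np.inf             # derivative too small: test inconclusive
    beta = abs(p(x) / d(x))              # length of the Newton step
    gamma, fk, fact = 0.0, d, 1.0
    for k in range(2, p.order + 1):
        fk = fk.deriv()                  # k-th derivative of p
        fact *= k                        # k!
        gamma = max(gamma, abs(fk(x) / (fact * d(x))) ** (1.0 / (k - 1)))
    alpha = beta * gamma
    return alpha < ALPHA0, alpha

# Example: f(x) = x^3 - 2x - 5 has a real root near 2.0945
ok, alpha = alpha_test([1, 0, -2, -5], 2.1)
print("certified approximate zero:", ok, "  alpha =", alpha)
```

In the combined scheme of the slides, a test of this kind decides when it is safe to switch from the first-order steps on the convexification to Newton steps on the non-convex problem.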
An Illustration
[Figure: case2383wp. Infeasibility versus iteration (×10^4), comparing Newton only, CD (coordinate descent) only, and runs with 200, 500, and 1000 CD iterations.]
Conclusions
Conclusions:
• Key problems in power systems can benefit from methods developed
in “data science”
Challenges:
• SDP relaxations solved faster than MATPOWER
• Rates of convergence in SDP hierarchies
• Dynamics, alternative theories of power, ...
We would love to hear from you!
• Jakub Marecek: [email protected]