IBM Research - Ireland Some of polynomial optimisation in power systems at IBM Research – Ireland Jakub Mareček IBM Research – Ireland Based on joint work with Bissan Ghaddar, Martin Mevissen, and Martin Takáč Alan Turing Institute, January 29th 2016, Edinburgh IBM Research - Ireland Overview The busy agenda: • The non-convexity of steady-state problems in power systems • Polynomial optimisation and the use of SDP therein • Combining first- and second-order methods therein Based on joint work with a number of co-authors: • Ghaddar, JM, Mevissen, arXiv:1404.3626 • JM, Takáč: arXiv:1506.08568 • Liu, JM, Takáč: arXiv:1510.02171 although there is much beyond these, e.g., arXiv:1412.8054, arXiv:1510.06797, arXiv:1601.06739. IBM Research - Ireland Motivation • The cost of energy production is approximately 10% of world GDP (IEA, 2014 report using 2012 data) • Savings from a 5% increase in efficiency amount to $26B/year in the US alone • Power systems exhibit non-linear, uncertain dynamics • But even for deterministic steady-state problems, which are solved every couple of minutes, there are no good solvers. • But then: in data science (e.g. compressed sensing), large-scale non-convex problems are tackled regularly. Can we learn something? IBM Research - Ireland A FERC Study: Where do Leading Solvers Fail? Termination Exceeded time limit with an infeasible solution Exceeded time limit with a feasible solution Early termination with an infeasible solution Early termination with a feasible solution Normal termination with an infeasible solution Percentage of feasible Percentage of best seen Conopt 7.9% lpopt 22.2% Knitro 38.4% Minos 0.0% Snopt 23.0% 6.5% 0.9% 6.7% 0.0% 14.4% 0.0% 0.4% 0.6% 11.4% 0.0% 31.2% 0.6% 1.8% 2.3% 0.2% 10.5% 3.4% 0.0% 55.9% 1.6% 81.6% 43.8% 74.1% 72.5% 61.0% 52.5% 32.7% 30.3% 75.4% 60.8% Source: Anya Castillo and Richard O’Neill, Staff Technical Conference, FERC, Washington DC, June 24–26, 2013. 118–3120 bus instances, 100 runs per instance and solver, 20 min runs, Xeon E7458 2.4GHz, 64 GB RAM IBM Research - Ireland Is this a Failure of Modelling? Real-valued: • Polar Power-Voltage pk , qk , vk , θk • Rectangular Power-Voltage (R-PQV) pk , qk , <vk , =vk • Polar Current Injection pk , qk , vk , θk , ik • Rectangular Current Injection (R-CI) • Polar Voltage-Current • Rectangular Voltage-Current (R-IV) pk , qk , <vk , =vk , ik pk , qk , vk , θk , ilm pk , qk , <vk , =vk , ilm Further variants: dropping pk , qk , mixing polar and rectangular. Complex-valued: • With complex conjugate (PQV, CI, IV) • Without (McCoy’s trick) IBM Research - Ireland Rectangular Power-Voltage Formulation Variables: • vn is complex voltage at bus n • pn is active power at bus n (the real part) • qn is reactive power at bus n (imag. part) Key constraints: X X <vn gnm <vm − bnm =vm + =vn gnm =vm + bnm <vm − pn + pnd = 0 m∈N m∈N X X (1) =vn m∈N gnm <vm − bnm =vm + <vn gnm =vm + bnm <vm − qn + qnd =0 m∈N (2) IBM Research - Ireland The SDP Relaxation of Lavaei and Low I Lavaei and Low have reformulated the R-PQV ACOPF as: min V ,P G ,Q G , m X fl (Pl ) (3) l=1 P min ≤ P G ≤ P max (4) Q min ≤ Q G ≤ Q max (5) Vkmin (6) ≤ |Vk | ≤ Vkmax ∗ max <{Vl (Vl − Vm )∗ ylm } ≤ Plm trace{WY ∗ ek ek∗ } = Pk + Qk ı ∗ W 0, W = VV rank(W ) = 1 (7) (8) (9) (10) IBM Research - Ireland The SDP Relaxation of Lavaei and Low II Drop the rank constraint to obtain a SDP: min X ,P G ,Q G m X cl2 (trace{Yk W } + PDk )2 + cl1 trace{Yk W } + PDk (11) l=1 P min ≤ trace{Yk W } ≤ P max (12) Q min ≤ trace{Ȳk W } ≤ Q max (13) (Vkmin )2 (14) ≤ trace{Mk W } ≤ (Vkmax )2 max 2 trace{Ylm W }2 + trace{Ȳlm W }2 ≤ (Slm ) max trace{Ylm W } ≤ Plm max 2 trace{Mlm W } ≤ (Vlm ) T W = XX There is no duality gap under some spectral conditions. (15) (16) (17) (18) IBM Research - Ireland How to See the SDP? For a polynomial optimisation problem with multi-variate polynomials f , gi : min f (x) s.t. gi (x) ≥ 0 i = {1, . . . , m} [PP] one can construct: • The Lasserre hierarchy (2001) and its Parrilo (2003) dual • The sparse hierarchy of Waki et al (2006), Lasserre (2006) • The DSOS/SDSOS hierarchy of Ahmadi and Majumdar (2014) • The DIGS hierarchy of Ghaddar et al (2015) • The SOCP hierarchy of Kuang et al (2015) • The variant hierarchy of Ma et al (2016) The first levels of the hierarchies of Lasserre, Waki, Ghaddar, and Ma are equivalent to the dual of the SDP of previous slide. IBM Research - Ireland The Lasserre Hierarchy A Conic Reformulation Given S ⊆ Rn , define Pd (S) to be the cone of polynomials of degree at most d that are non-negative over S max ϕ s.t. f (x) − ϕ ≥ 0 ∀ x ∈ SG , s.t. f (x) − ϕ ∈ Pd (SG ). = max ϕ for G = {gi (x) : i = 1, . . . , m}, SG = {x ∈ Rn : g (x) ≥ 0, ∀g ∈ G}. The hierarchy of Lasserre approximates Pd (SG ) by sums-of-squares KGr = Σr + m X gi (x)Σr −deg(gi ) , i=1 where Σd := { PN i=1 pi (x) 2 : pi (x) ∈ Rb d c [x]}, with N = 2 n+d d for r ≥ d. IBM Research - Ireland The Lasserre Hierarchy II The Corresponding Optimisation Problem [PP-Hr ] max ϕs.t. f (x) − ϕ = σ0 (x) + ϕ,σi (x) m X σi (x)gi (x) i=1 σ0 (x) ∈ Σr , σi (x) ∈ Σr −deg(gi ) . Notice that this is a semidefinite programming (SDP) problem, i.e. minX ∈Sn hC , X iSn subject to hAi , X iSn = bi , i = 1, . . . , m, X 0 IBM Research - Ireland The Convergence Summarising the results of Lasserre and others: Proposition (arXiv:1404.3626) Under mild assumptions, for the dense hierarchy [PP-Hr ]∗ for [PP] and the respective duals the following holds: (a) inf [PP-Hr ] % min ([PP]) as r → ∞, (b) sup [PP-Hr ]∗ % min ([PP]) as r → ∞, (c) If the interior of the feasible set of [PP] is nonempty, there is no duality gap between [PP-Hr ] and [PP-Hr ]∗ . (d) If [PP] has a unique global minimizer, x ∗ , then as r tends to infinity the components of the optimal solution of [PP-Hr ] corresponding to the linear terms converge to x ∗ . See arXiv:1404.3626 for a discussion of assumptions, which are satisfied in the following applications. IBM Research - Ireland Solving the SDPs Take 1: • Consider the tree-width decompositions, r -space sparsity, • and use standard SDP solvers Take 2: • Do not consider the tree-width decompositions, r -space sparsity, • but use a first-order method on • a reformulation with coordinate-wise constraints. Take 3: • Combine first-order methods on a convexification with Newton method on the non-convex problem. Take 4: • Do consider the tree-width decompositions and r -space sparsity, • but use a first order method, allowing for • warm-starting as you go higher in a new hierarchy. IBM Research - Ireland Correlative Sparsity [Waki et al. 2006] The n × n Correlative Sparsity Pattern Matrix ? for i = j ? for x , x in the same monomial of f i j Rij = ? for xi , xj in the same constraint gk 0 otherwise, The chordal extension of its adjacency graph G have maximal cliques {Ik }pk=1 , Ik ⊂ {1, . . . , n}. The sparse approximation of Pd (S) is p X X Σr (Ik ) + KGr (I ) = gj Σr −deg(gj ) (Ik ) , k=1 j∈Jk where Σd (Ik ) is the set of all sum-of-squares polynomials of degree up to d supported on Ik and (J1 , . . . , Jp ) is a partitioning of the set of polynomials {gj }j defining S such that for every j in Jk , the corresponding gj is supported on Ik . The support I ⊂ {1, . . . , n} of a polynomial contains i of terms xi in one of the monomials. IBM Research - Ireland Correlative Sparsity II Another Hierarchy of Relaxations ([PP-SHr ]∗ ) [Waki et al. 2006] max ϕ ϕ,σk (x),σr ,k (x) s.t. f (x) − ϕ = p X σk (x) + k=1 X gj (x)σj,k (x) j∈Jk σk ∈ Σr ((Ik )), σj,k ∈ Σr −deg(gj ) (Ik ). Reduction in size: Size of k matrix inequalities | Ck | +ω | Ck | vastly smaller if | Ck | n. IBM Research - Ireland The Convergence Summarising the results of Waki, Kojima and others: Proposition (arXiv:1404.3626) Under mild assumptions, for the sparse hierarchy [PP-SHr ]∗ for [PP] and the respective duals the following holds: (a) inf [PP-SHr ] % min ([PP]) as r → ∞, (b) sup [PP-SHr ]∗ % min ([PP]) as r → ∞, (c) If the interior of the feasible set of [PP] is nonempty, there is no duality gap between [PP-SHr ] and [PP-SHr ]∗ . (d) If [PP] has a unique global minimizer, x ∗ , then as r tends to infinity the components of the optimal solution of [PP-SHr ] corresponding to the linear terms converge to x ∗ . IBM Research - Ireland Sparsity in SDPs [Kim et al. 2011] min b T y s.t. F (y ) < 0, y ∈ Rn , where 1 − y14 0 0 ... 0 y1 y2 4 0 1 − y 0 . . . 0 y 2 y3 2 . .. 0 y3 y4 0 0 F (y ) = . .. ... ... ... ... ... 4 0 0 0 1 − yn−1 yn−1 yn y1 y2 y2 y3 y3 y4 . . . yn−1 yn 1 − yn4 No correlative sparsity, but r-space sparsity! . IBM Research - Ireland Sparsity in SDPs II Applying d-space and r-space conversion yields equivalent problem: min s.t. T b y 1 − y14 y1 y2 0, y1 y2 4 z1 yi yi+1 1 − yi 0 (i = 2, 3, . . . , n − 2), yi yi+14 −zi−1 + zi 1 − yn−1 yn−1 yn 0, yn−1 yn 1 − yn4 − zn−2 Equivalent problem satisfies correlative sparsity pattern! IBM Research - Ireland What can Off-The-Shelf Solvers Do Following much Matlab pre-processing, SeDuMi performs well: Instance case9mod case14 case14mod case30mod case39 case57 case118 case300 case2383wp case2736sp case2746wp L1 L2 Bound Dim Time Bound Dim Time 2753.23 588×168 0.6 3087.89 1792×14847 17.5 8081.52 888×94 0.4 8081.54 7508×66740 1420.5 7792.72 888×94 0.9 7991.07 7508×66740 904.2 576.89 4706×684 3.8 578.56 36258×49164 13740.0 41862.08 7282×758 2.2 41864.18 26076×215772 4359.1 41737.79 13366×356 3.2 * * 129654.62 56620×816 6.1 * * 719711.63 362076×1938 13.6 * * 1.814×106 22778705×47975 3731.5 * * 1.307×106 30019740×57408 3502.2 * * 1.584×106 30239940×57978 3903.2 * * See arXiv:1404.3626 for the details. IBM Research - Ireland What can Off-The-Shelf Solvers cannot Do • Run on a standard desktop • “Good feasible solutions quick” • Overall run-times comparable to MATPOWER, a popular heuristic • Parallel and/or distributed computation for the SDP relaxations • Exact tree-width decomposition done fast – after all, it’s NP-Hard • Exploitation of the structure of the higher-level relaxations • Warm start: a solution to [PP-H1 ] is a very good solution to [PP-H2 ], for an appropriate mapping of variables IBM Research - Ireland Solving the SDPs Take 1: • Consider the tree-width decompositions, r -space sparsity, • and use standard SDP solvers Take 2: • Do not consider the tree-width decompositions, r -space sparsity, • but use a first-order method on • a reformulation with coordinate-wise constraints. Take 3: • Combine first-order methods on a convexification with Newton method on the non-convex problem. Take 4: • Do consider the tree-width decompositions and r -space sparsity, • but use a first order method, allowing for • warm-starting as you go higher in a new hierarchy. IBM Research - Ireland A Lifted Reformulation min F (W ) := X fk (W ) [Rr BC] k∈G s.t. tk = tr(Yk W ) Pkmin − Pkd ≤tk ≤ Pkmax − (19) Pkd (20) gk = tr(Ȳk W ) (21) Qkmin − gk ≤gk ≤ Qkmax − gk (22) hk = tr(Mk W ) (Vkmin )2 ≤hk ≤ (23) (Vkmax )2 (24) ulm = tr(Ylm W ) (25) vlm = tr(Ȳlm W ) 2 (26) 2 zlm = (ulm ) + (vlm ) (27) max 2 (Slm ) (28) zlm ≤ W 0, rank(W ) ≤ r . (29) IBM Research - Ireland Some Observations O1 : constraints (19–26) and (28) are box constraints, O2 : using elementary liner algebra: {W ∈ S n s.t. W 0, = {RR T rank(W ) ≤ r } s.t. R ∈ Rn×r }, O3 : if rank(W ∗ ) > 1 for the optimum W ∗ of the Lavaei-Low relaxation, there are no known methods for extracting the global optimum of [R1BC] from W , except for going higher in the hierarchies. O4 : zero duality gap at any SDP relaxation in the hierarchy of Lasserre does not guarantee the solution of the SDP relaxation is exact for [R1BC]. IBM Research - Ireland Solving the SDP Relaxation • Use the low-rank method of Burer and Monteiro, i.e. for increasing rank r , the augmented Lagrangian in variable RR T ∈ Rn×n rather than W ∈ Rn×n : t g h u v z L(R, t, h, g , u, v , z, λ , λ , λ , λ , λ , λ ) := X T fk (RR ) k∈G − X t X T T 2 λk (tk − trace(Yk RR )) + µ (tk − trace(Yk RR )) 2 − X g X T T 2 λk (gk − trace(Ȳk RR )) + µ (gk − trace(Ȳk RR )) 2 − X h X T T 2 λk (hk − trace(Mk RR )) + µ (hk − trace(Mk RR )) 2 − X k k k (l,m) − X (l,m) − X (l,m) k k k X u T T 2 λ(l,m) (u(l,m) − trace(Y(l,m) RR )) + µ (u(l,m) − trace(Y(l,m) RR )) 2 (l,m) X v T T 2 λ(l,m) (v(l,m) − trace(Ȳ(l,m) RR )) + µ (v(l,m) − trace(Ȳ(l,m) RR )) 2 (l,m) X z 2 2 2 2 2 λ(l,m) (z(l,m) − u(l,m) − v(l,m) ) + µ (z(l,m) − u(l,m) − v(l,m) ) + νR. 2 (l,m) • Combined with a parallel coordinate descent with a closed-form step IBM Research - Ireland Solving the SDP Relaxation II 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: for r = 1, 2, . . . do choose R ∈ Rm×r compute corresponding values of t, h, g , u, v , z project t, h, g , u, v , z onto the box constraints for k = 0, 1, 2, . . . do in parallel, minimize L in t, g , h, u, v , z, coordinate-wise, in parallel, minimize L in R, coordinate-wise update Lagrange multipliers λt , λg , λh , λu , λv , λz update µ terminate, if criteria are met end for end for IBM Research - Ireland What can this Do Let’s compare the run-times with: • MATPOWER, a popular heuristic • sdp pf, the pre-processor of Molzahn et al. with SeDuMi • OPF Solver, the pre-processor of Lavaei et al. with SDPT3 Name case6ww case14 case30 case39 case57 case118 case300 MATPOWER Obj. T [s] 3.144+03 0.114 8.082e+03 0.201 5.769e+02 0.788 4.189e+04 0.399 4.174e+04 0.674 1.297e+05 1.665 7.197e+05 2.410 sdp pf OPF Solver Our Approach Obj. T [s] Obj. T [s] Obj. T [s] 3.144e+03 0.74 3.144e+03 2.939 3.144e+03 0.260 8.082e+03 0.84 – – 8.082e+03 0.031 5.769e+02 2.70 5.765e+02 6.928 5.769e+02 0.074 4.189e+04 3.26 4.202e+04 7.004 4.186e+04 0.885 4.174e+04 2.69 – – 4.174e+04 0.857 1.297e+05 6.57 – – 1.297e+05 1.967 – 17.68 – – 7.197e+05 90.103 IBM Research - Ireland Solving the SDPs Take 1: • Consider the tree-width decompositions, r -space sparsity, • and use standard SDP solvers Take 2: • Do not consider the tree-width decompositions, r -space sparsity, • but use a first-order method on • a reformulation with coordinate-wise constraints. Take 3: • Combine first-order methods on a convexification with Newton method on the non-convex problem. Take 4: • Do consider the tree-width decompositions and r -space sparsity, • but use a first order method, allowing for • warm-starting as you go higher in a new hierarchy. IBM Research - Ireland More Observations • The quadratic convergence of Newton method is hard to beat. • Nevertheless, without applying a convexification, Newton method can converge to particularly poor local optima of a non-convex problem. • Point-estimation theory of Smale offers a test, based solely on derivative-at-the-current-iterate, for inclusion in a domain of monotonicity of a zero of a polynomial. • Thereby, we can combine the first-order methods for the (hierarchy of) convex Lagrangians with Newton method for (a hierarchy of) non-convex Lagrangians, while preserving convergence guarantees associated with the convexification. IBM Research - Ireland An Illustration case2383wp 10 10 Newton only 200 CD it 500 CD it 1000 CD it CD only infeasibility 0 10 −10 10 −20 10 0 2 4 6 8 iteration 10 12 14 4 x 10 IBM Research - Ireland Conclusions Conclusions: • Key problems in power systems can benefit from methods developed in “data science” Challenges: • SDP relaxations solved faster than Matpower • Rates of convergence in SDP hierarchies • Dynamics, alternative theories of power, ... We would love to hear from you! • Jakub Marecek: [email protected]