Mechanizing Optimization and Statistics Ashish Agarwal Yale University IBM Programming Languages Day Watson Research Center July 29, 2010 Ashish Agarwal () 1 / 34 Acknowledgments Optimization: Ignacio Grossmann (Carnegie Mellon) Nick Sawaya and Vikas Goel (Exxon Mobil) Indexing: Bob Harper (Carnegie Mellon) Statistics: Sooraj Bhat and Alex Gray (GeorgiaTech) Rich Vuduc (GeorgiaTech, linear algebra and autotuning) Ashish Agarwal () 2 / 34 Linear programs (LP) Dantzig (1982) x2 4 3 x1 ≥ 1.0 x2 ≥ 1.0 x1 + x2 ≤ 5.0 2 1 0 x1 0 1 2 3 Ashish Agarwal () 4 3 / 34 Linear programs (LP) Dantzig (1982) x2 4 max x1 − x2 x1 ≥ 1.0 x2 ≥ 1.0 x1 + x2 ≤ 5.0 3 2 1 0 x1 0 1 2 3 Ashish Agarwal () 4 3 / 34 Linear programs (LP) Dantzig (1982) x2 4 max x1 − x2 x1 ≥ 1.0 x2 ≥ 1.0 x1 + x2 ≤ 5.0 3 2 1 0 x1 0 1 2 3 4 Cannot represent multiple polyhedra. Ashish Agarwal () 3 / 34 Declaring discrete choice – with disjunction Disjunctive programs (DP), Balas (1974), Jeroslow and Lowe (1984), Raman and Grossmann (1994) x2 7 6 5 4 3 2 1 0 R2 R1 x1 0 1 2 3 4 5 6 7 8 Ashish Agarwal () x1 ≥ 1 x2 ≥ 1 x1 + x2 ≤ 5 {z } | R1 5 ≤ x1 ≤ 8 4 ≤ x2 ≤ 7 | {z } R2 4 / 34 Declaring discrete choice – with disjunction Disjunctive programs (DP), Balas (1974), Jeroslow and Lowe (1984), Raman and Grossmann (1994) x2 7 6 5 4 3 2 1 0 R2 R1 x1 0 1 2 3 4 5 6 7 8 x1 ≥ 1 5 ≤ x1 ≤ 8 ∨ x2 ≥ 1 4 ≤ x2 ≤ 7 x1 + x2 ≤ 5 | {z } {z } | R2 R1 Language of DP extends LP with disjunction Ashish Agarwal () 4 / 34 Declaring discrete choice – with disjunction Disjunctive programs (DP), Balas (1974), Jeroslow and Lowe (1984), Raman and Grossmann (1994) x2 7 6 5 4 3 2 1 0 R2 R1 x1 0 1 2 3 4 5 6 7 8 x1 ≥ 1 5 ≤ x1 ≤ 8 ∨ x2 ≥ 1 4 ≤ x2 ≤ 7 x1 + x2 ≤ 5 | {z } {z } | R2 R1 Language of DP extends LP with disjunction Few algorithms for solving DPs directly. Ashish Agarwal () 4 / 34 Declaring discrete choice – with integers Mixed-integer linear programs (MILP) Basic idea: multiply terms by y ∈ {0, 1} 0≤y ≤1 x ≤ 3.0y + 2.0(1 − y ) if y = 1, then x ≤ 3.0 if y = 0, then x ≤ 2.0 Ashish Agarwal () 5 / 34 Declaring discrete choice – with integers Mixed-integer linear programs (MILP) Basic idea: multiply terms by y ∈ {0, 1} 0≤y ≤1 x ≤ 3.0y + 2.0(1 − y ) if y = 1, then x ≤ 3.0 if y = 0, then x ≤ 2.0 Language of MILP extends LP with the integer type Ashish Agarwal () 5 / 34 Declaring discrete choice – with integers Mixed-integer linear programs (MILP) Basic idea: multiply terms by y ∈ {0, 1} 0≤y ≤1 x ≤ 3.0y + 2.0(1 − y ) if y = 1, then x ≤ 3.0 if y = 0, then x ≤ 2.0 Language of MILP extends LP with the integer type Goal: Ashish Agarwal () Express as DP Convert to MILP −→ (intuitive) (accepted by solvers) 5 / 34 Multiple Transformation Techniques Available Choice affects computational efficiency x2 best big-M reformulation 7 6 5 4 3 2 1 0 x2 x1 0 1 2 3 4 5 6 7 8 Ashish Agarwal () convex-hull reformulation 7 6 5 4 3 2 1 0 x1 0 1 2 3 4 5 6 7 8 6 / 34 System overview disjunctive constraint [A1 x ≤ b 1 ] ∨ [A2 x ≤ b 2 ] x2 7 6 5 4 3 2 1 0 Boolean propositions R2 R1 x1 0 1 2 3 4 5 6 7 8 tor st con bi g -M ica convex-hull i nd r x2 x2 7 6 5 4 3 2 1 0 x1 0 1 2 3 4 5 6 7 8 7 6 5 4 3 2 1 0 y = 0 ⇒ A1 x ≤ b1 y = 1 ⇒ A2 x ≤ b2 Pure integer inequalities x1 0 1 2 3 4 5 6 7 8 Ashish Agarwal () 7 / 34 Previous definition Convex-hull reformulation of [A1 x ≤ b 1 ] ∨ [A2 x ≤ b 2 ] is A1 x̄ 1 ≤ b 1 y1 A2 x̄ 2 ≤ b 2 y2 Ashish Agarwal () y1 + y2 = 1 x = x̄ 1 + x̄ 2 8 / 34 Previous definition Convex-hull reformulation of [A1 x ≤ b 1 ] ∨ [A2 x ≤ b 2 ] is A1 x̄ 1 ≤ b 1 y1 A2 x̄ 2 ≤ b 2 y2 y1 + y2 = 1 x = x̄ 1 + x̄ 2 Insufficient for automation Real programs not declared in canonical matrix form How are variables introduced? Disjuncts should be bounded, how is this checked? How are variable bounds tracked? Disaggregated variables should have same bounds as those they replace (except range must include zero) Ashish Agarwal () 8 / 34 A syntactic foundation for mathematical programs Agarwal, Bhat, Gray, Grossmann (2010) ρ ::= [rL , rU ] | [rL , ∞) | (−∞, rU ] | real | hrL , rU i | hrL , ∞) | (−∞, rU i | int | {true} | {false} | bool e ::= x | r | true | false | not e | e1 or e2 | e1 and e2 | −e | e1 + e2 | e1 − e2 | e1 ∗ e2 c ::= T | F | isTrue e | e1 = e2 | e1 ≤ e2 | c1 ∨ c2 | c1 ∧ c2 | ∃x : ρ c p ::= maxx1 :ρ1 ,...,xm :ρm {e | c} Υ ::= • | Υ, x : ρ Ashish Agarwal () 9 / 34 cvx Υ ⊢ c 7−→ c ′ Convex-hull transformation Agarwal, Bhat, Gray, Grossmann (2010) n n Υ⊢ prop Υ ⊢ cj 7−→ cj′ o ctxt Υ 7−→ Υ′ o n j ′′ j ′′′ ~ y ⊛ x /~x cj ֒→ cj j∈{A,B} o ′′ ⊢ ⊸x j ,...,x j cj m 1 j∈{A,B} j∈{A,B} A B A B ∃~x : ρ ~ ∃~x : ρ ~ ∃y : h0, 1i ∃y : h0,1i cvx cA ∨ cB 7−→ ~x = ~x A + ~x B ∧ y A + y B = 1 ∧ (cA′′′ ∧ cB′′′ ) Υ′ cj′ Compile disjuncts and context Add bounding constraints Substitute disaggregated variables in each disjunct Multiply constant terms by respective y Ashish Agarwal () 10 / 34 Example: single disjunctive constraint Input Output var x:<10.0, 100.0> var w:<2.0, 50.0> var x:<10.0, 100.0> var w:<2.0, 50.0> min x + w subject_to (x <= w) disj (x >= w + 4.0) min x + w subject_to exists y1:[0, 1] exists y2:[0, 1] exists x1:<0.0, 100.0> exists x2:<0.0, 100.0> exists w1:<0.0, 50.0> exists w2:<0.0, 50.0> w = w1 + w2, x = x1 + x2, y1 + y2 = 1, 10.0 * y1 <= x1, x1 <= 100.0 * y1, 2.0 * y1 <= w1, w1 <= 50.0 * y1, x1 <= w1, 10.0 * y2 <= x2, x2 <= 100.0 * y2, 2.0 * y2 <= w2, w2 <= 50.0 * y2, x2 >= w2 + 4.0 * y2 Ashish Agarwal () 11 / 34 Example: single disjunctive constraint Input Output var x:<10.0, 100.0> var w:<2.0, 50.0> var x:<10.0, 100.0> var w:<2.0, 50.0> min x + w subject_to (x <= w) disj (x >= w + 4.0) min x + w subject_to exists y1:[0, 1] exists y2:[0, 1] exists x1:<0.0, 100.0> exists x2:<0.0, 100.0> exists w1:<0.0, 50.0> exists w2:<0.0, 50.0> w = w1 + w2, x = x1 + x2, y1 + y2 = 1, Output generated in MPS and AMPL formats Implemented as a DSL embedded in OCaml Ashish Agarwal () 10.0 * y1 <= x1, x1 <= 100.0 * y1, 2.0 * y1 <= w1, w1 <= 50.0 * y1, x1 <= w1, 10.0 * y2 <= x2, x2 <= 100.0 * y2, 2.0 * y2 <= w2, w2 <= 50.0 * y2, x2 >= w2 + 4.0 * y2 11 / 34 Switched flow process on off m̄α m̄β α hi low β M max M min F out Ashish Agarwal () 12 / 34 Switched flow process, example constraint Pump α has three kinds of mode transition dynamics: ∀i ∈ N\{n}, isTrue Z α(off, on, i ) isTrue Z α (on, off, i ) R e (i ) ≥ 2.0 ∨ ĉ α (i ) = 0.0 ĉ α (i ) = 50.0 r̂ α (i ) = −R e (i ) r̂ α (i ) = −R e (i ) isTrue YY α (i ) ∨ ĉ α (i ) = 0.0 r̂ α (i ) = 0.0 Booleans and disjunction enable the natural modeling of such logical relations between constraints. Ashish Agarwal () 13 / 34 Switched flow process, example constraint Pump α has three kinds of mode transition dynamics: ∀i ∈ N\{n}, isTrue Z α(off, on, i ) isTrue Z α (on, off, i ) R e (i ) ≥ 2.0 ∨ ĉ α (i ) = 0.0 ĉ α (i ) = 50.0 r̂ α (i ) = −R e (i ) r̂ α (i ) = −R e (i ) isTrue YY α (i ) ∨ ĉ α (i ) = 0.0 r̂ α (i ) = 0.0 ...which interact with each other: ∀i ∈ N\{n}, ∀a ∈ {α, β}, isTrue YY a (i ) ⇔ _ Z a (q, q, i ) q∈Qa Booleans and disjunction enable the natural modeling of such logical relations between constraints. Ashish Agarwal () 13 / 34 Switched flow process, comparison to ILOG Concert Method flow-Concert flow-IC flow-BM flow-CH Ashish Agarwal () #vars (#binary) 1061 (874) 477 (291) 477 (291) 1194 (631) #constr. (#IC) 1080 (718) 1001 (438) 1198 2747 time (sec) 36.85 11.60 3.37 1.09 14 / 34 Switched flow process, comparison to ILOG Concert Method flow-Concert flow-IC flow-BM flow-CH #vars (#binary) 1061 (874) 477 (291) 477 (291) 1194 (631) #constr. (#IC) 1080 (718) 1001 (438) 1198 2747 time (sec) 36.85 11.60 3.37 1.09 All 3 of our methods improve on state-of-the-art. Ashish Agarwal () 14 / 34 Strip packing Ashish Agarwal () 15 / 34 Strip packing, formulation The most natural formulation uses disjunction. min length s.t. length ≥ xi + Li ∀i ∈ N [xi + Li ≤ xj ] ∨ [xj + Lj ≤ xi ] no ∨ [yi − Hi ≥ yj ] overlapping ∨ [yj − Hj ≥ yi ] ∀i , j ∈ N, i < j stay 0 ≤ xi ≤ Lmax − Li ∀i ∈ N in bounds Hi ≤ yi ≤ W ∀i ∈ N (xi , yi ) is the position of the top-left corner of rectangle i . Ashish Agarwal () 16 / 34 Strip packing, comparison to expert Method pack12-IC pack12-BM pack12-CH pack12-BM-expert pack12-CH-expert #vars (#binary) 289 (264) 289 (264) 1345 (264) 289 (264) 1345 (264) #constr. (#IC) 342 (264) 342 2718 342 1662 time (sec) 1.83 1.22 168.38 1.82 149.57 pack21-IC pack21-BM pack21-CH pack21-BM-expert pack21-CH-expert 883 (840) 883 (840) 4243 (840) 883 (840) 4243 (840) 1071 (840) 1071 8631 1071 5271 24.44 55.01 991.68 29.56 > 3600.00 Ashish Agarwal () 17 / 34 Strip packing, comparison to expert Method pack12-IC pack12-BM pack12-CH pack12-BM-expert pack12-CH-expert #vars (#binary) 289 (264) 289 (264) 1345 (264) 289 (264) 1345 (264) #constr. (#IC) 342 (264) 342 2718 342 1662 time (sec) 1.83 1.22 168.38 1.82 149.57 pack21-IC pack21-BM pack21-CH pack21-BM-expert pack21-CH-expert 883 (840) 883 (840) 4243 (840) 883 (840) 4243 (840) 1071 (840) 1071 8631 1071 5271 24.44 55.01 991.68 29.56 > 3600.00 Our mechanizations perform just as well as expert encodings. Ashish Agarwal () 17 / 34 Indexing is Ubiquitous We solved a problem with 150,000 equations and 25,000 variables. How were so many equations and variables declared? Ashish Agarwal () 18 / 34 Indexing is Ubiquitous We solved a problem with 150,000 equations and 25,000 variables. How were so many equations and variables declared? Sometimes, use matrix notation. Often, use indexing. Ashish Agarwal () 18 / 34 Indexing is Ubiquitous We solved a problem with 150,000 equations and 25,000 variables. How were so many equations and variables declared? Sometimes, use matrix notation. Often, use indexing. Indexed operators: n X xi i =1 Families of equations: ∀i ∈ [1 . . . n] xi +1 = xi + yi Indexing is a meta-programming feature Index variables i distinct from mathematical variables x Ashish Agarwal () 18 / 34 Complex Index Sets Arise in Real Problems Job shop scheduling: ∀j ∈ J ∀s ∈ Sj ∀j ′ ∈ Prej,s tj ′ ,s ≤ tj,s Mappings from sets to set of all sets: S Dependent types: Sj depends on value of j Ashish Agarwal () 19 / 34 Indexing Language: Syntax Agarwal (2006) Index Expressions ε ::= i | k | (ε1 , . . . , εm ) | ε.k | −ε | ε1 + ε2 | ε1 − ε2 | ε1 ∗ ε2 | case ε of {kj ⇒ εj }m j=1 Index Sets (Types) σ ::= [εL ..εU ] | i1 : σ1 × · · · × im : σm | case ε of {kj ⇒ σj }m j=1 | λi σ | σ ε | σ :: κ Kinds κ ::= IndexSet | i : σ ⇒ κ Ashish Agarwal () 20 / 34 Example Index Sets set JOBS = {’a’,’b’,’c’} set STAGES = fn i . case i of ’a’ => {’s1’,’s2’} | ’b’ => {’s1’,’s3’,’s4’} | ’c’ => {’s3’,’s4’} set JOBS_STAGES = i:JOBS * STAGES[i] Explicitly: {(’a’,’s1’), (’a’,’s2’), (’b’,’s1’), (’b’,’s3’), (’b’,’s4’), (’c’,’s3’), (’c’,’s4’)} Ashish Agarwal () 21 / 34 Memory Reduction Load this program: ∀i ∈ [1 . . . n] xi +1 = xi + yi Other software expand this to: x2 = x1 + y1 x3 = x2 + y2 x4 = x3 + y3 x5 = x4 + y4 .. .. .. . . . Ashish Agarwal () 22 / 34 Memory Reduction Load this program: ∀i ∈ [1 . . . n] xi +1 = xi + yi Other software expand this to: x2 = x1 + y1 x3 = x2 + y2 x4 = x3 + y3 x5 = x4 + y4 .. .. .. . . . We retain indexing structure: Memory requirements reduced from O(n) to O(1). Ashish Agarwal () 22 / 34 Computational Improvements Input to our software: _ w ≥ xi + 4.0 i :[1..10] Our software’s output: 10.0 ∗ yi ≤ wi′ wi′ ≤ 90.0 ∗ yi ^ 5.0 ∗ yi ≤ x ′ i ,d xi′,d ≤ 75.0 ∗ yi ^ i :[1..10] d:[1..10] wi′ ≥ xi′,i + 4.0 ∗ yi Ashish Agarwal () 23 / 34 Computational Improvements Input to our software: _ w ≥ xi + 4.0 i :[1..10] Our software’s output: 10.0 ∗ yi ≤ wi′ wi′ ≤ 90.0 ∗ yi ^ 5.0 ∗ yi ≤ x ′ i ,d xi′,d ≤ 75.0 ∗ yi ^ i :[1..10] d:[1..10] wi′ ≥ xi′,i + 4.0 ∗ yi Reformulation time reduced from O(n) to O(1). Ashish Agarwal () 23 / 34 Indexing Language: Type System and Semantics Agarwal (2006) Usual Way: Syntax → Type System → Semantics Existence is prior to meaning. Alternative Way: Syntax → Semantics → Type System Meaning is prior to existence. Ashish Agarwal () 24 / 34 Indexing Language: Type System and Semantics Agarwal (2006) Usual Way: Syntax → Type System → Semantics Existence is prior to meaning. Alternative Way: Syntax → Semantics → Type System Meaning is prior to existence. Admits more programs. Possible only because all types are finitary. Ashish Agarwal () 24 / 34 What Are Random Variables? Wasserman (2004) says: A random variable is a mapping X :Ω→R that assigns a real number X (ω) to each outcome ω. Ashish Agarwal () 25 / 34 What Are Random Variables? Wasserman (2004) says: A random variable is a mapping X :Ω→R that assigns a real number X (ω) to each outcome ω. However: Treated as real: P(X ≥ 5) Not random: We write X ∼ Bernoulli(p) to mean that X is exactly distributed as f (x) = p x (1 − p)1−x for x ∈ {0, 1} Ashish Agarwal () 25 / 34 What Are Random Variables? Not variables: Cannot substitute occurrences of X for anything. e.g. In P(X ≥ 5), certainly cannot replace X with its distribution. Dependence matters. e.g. Two random variables X and Y , both distributed as Bernoulli(0.5), each 0 or 1 with probability 0.5. What is P(X + Y = 2)? Perhaps 0.25? But not if Y = 1 − X . Ashish Agarwal () 26 / 34 What Are Random Variables? Not variables: Cannot substitute occurrences of X for anything. e.g. In P(X ≥ 5), certainly cannot replace X with its distribution. Dependence matters. e.g. Two random variables X and Y , both distributed as Bernoulli(0.5), each 0 or 1 with probability 0.5. What is P(X + Y = 2)? Perhaps 0.25? But not if Y = 1 − X . Random variables are neither random nor variable. Ashish Agarwal () 26 / 34 Previous Work Giry (1981), Jones and Plotkin (1989) Probability distributions are a monad. Kozen (1981) Formalized semantics. Ramsey and Pfeffer (2002) Efficient expectations, but discrete distributions only. Park, Pfenning, and Thrun (2004) Continuous distributions also, but support only sampling. Erwig and Kollmansberger (2006) Provide Haskell library, but discrete distributions only, computational efficiency not optimized. Ashish Agarwal () 27 / 34 Previous Work Giry (1981), Jones and Plotkin (1989) Probability distributions are a monad. Kozen (1981) Formalized semantics. Ramsey and Pfeffer (2002) Efficient expectations, but discrete distributions only. Park, Pfenning, and Thrun (2004) Continuous distributions also, but support only sampling. Erwig and Kollmansberger (2006) Provide Haskell library, but discrete distributions only, computational efficiency not optimized. Our goal: Unify these results in a single system. Ashish Agarwal () 27 / 34 Syntax: Probability Language Bhat, Agarwal, Gray, Vuduc (2010) T ::= Bool | Int | Real | T1 × T2 | Prob T E ::= x | true | false | r | E1 + E2 | E1 × E2 | (E1 , E2 ) | fst E | snd E | if E1 then E2 else E3 | E1 = E2 | E1 ≤ E2 | uniform | return E | let x ∼ E1 in E2 Ashish Agarwal () 28 / 34 Language: Type System Example typing rule: Γ ⊢ E1 : Prob T1 Γ, x : T1 ⊢ E2 : Prob T2 Γ ⊢ let x ∼ E1 in E2 : Prob T2 Ashish Agarwal () 29 / 34 Language: Type System Example typing rule: Γ ⊢ E1 : Prob T1 Γ, x : T1 ⊢ E2 : Prob T2 Γ ⊢ let x ∼ E1 in E2 : Prob T2 Pass: Fail: var U ∼ uniform in return (U ≤ 0.7) Ashish Agarwal () var U ∼ uniform in (U ≤ 0.7) 29 / 34 0.00 0.02 0.04 0.06 0.08 0.10 Gaussian Model 150 160 170 180 Height (cm) Ashish Agarwal () 30 / 34 0.00 0.02 0.04 0.06 0.08 0.10 Mixture of Gaussians Model 150 160 170 180 Height (cm) Ashish Agarwal () 31 / 34 Trying alternative statistical models Formulation: Xi ∼ Normal(θ, 1) θ̂ = arg max f (x | θ) θ Ashish Agarwal () 32 / 34 Trying alternative statistical models Formulation: Xi ∼ Normal(θ, 1) θ̂ = arg max f (x | θ) θ Solution: n 1X xi θ̂ = n i =1 Ashish Agarwal () 32 / 34 Trying alternative statistical models Formulation: Formulation: Zi ∼ Bernoulli(0.5) Xi ∼ Normal(θ, 1) Xi ∼ Normal((1 − Zi )θ0 + Zi θ1 , 1) θ̂ = arg max f (x | θ) θ̂ = arg max f (x | θ) θ θ Solution: n 1X xi θ̂ = n i =1 Ashish Agarwal () 32 / 34 Trying alternative statistical models Formulation: Formulation: Zi ∼ Bernoulli(0.5) Xi ∼ Normal(θ, 1) Xi ∼ Normal((1 − Zi )θ0 + Zi θ1 , 1) θ̂ = arg max f (x | θ) θ̂ = arg max f (x | θ) θ θ Solution: Solution: n 1X xi θ̂ = n i =1 Ashish Agarwal () (θ̂0 , θ̂1 ) := rand (); while (...) for i = 1 to n do γi := φ(xi ; θ̂1 ,1)/(φ(xi ; θ̂0 ,1)+φ(xi ; θ̂1 ,1)); Pn Pn θ̂0 := (1 -γi )*xi / i =1 (1 -γi ); Pin=1 Pn θ̂1 := i =1 γi *xi / i =1 γi ; return (θ̂0 , θ̂1 ); 32 / 34 Interactive algorithm assistant Bhat, Agarwal, Gray, Vuduc (2010) Features enter problems apply schemas undo/redo combinators Status can solve several textbook examples of MLE, incl. via EM autotuning + more sophisticated code generation is planned Ashish Agarwal () 33 / 34 Conclusions Automated bigM and convex-hull methods Beginnings of a formalization of probability and statistics Library of transformations Formalization of indexing provides: advances on previous MP languages: e.g. GAMS, AMPL, OPL fundamental improvements in time and space performance possible Ashish Agarwal () 34 / 34 Conclusions Automated bigM and convex-hull methods Beginnings of a formalization of probability and statistics Library of transformations Formalization of indexing provides: advances on previous MP languages: e.g. GAMS, AMPL, OPL fundamental improvements in time and space performance possible challenges remain: e.g. conversion of _^ e i :σ i ′ :σ′ to indexed CNF f ^ :(i :σ→σ′ ) _ {f (i) /i ′ } e i :σ not supported in current theory. Ashish Agarwal () 34 / 34

- Similar pages