Horizontal and Vertical Decomposition in Interior Point Methods for Linear Programs

Masakazu Kojima (Department of Information Sciences, Tokyo Institute of Technology, Oh-Okayama, Meguro-ku, Tokyo 152, Japan)
Nimrod Megiddo (IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, CA 95120, and School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel)
Shinji Mizuno (The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan)
Susumu Shindoh (Department of Mathematics and Physics, The National Defense Academy, Hashirimizu 1-10-20, Yokosuka, Kanagawa 239, Japan)

Research supported in part by ONR Contract N00014-94-C-0007.

Abstract. Corresponding to the linear program

    Maximize $c^T x$ subject to $Ax = a$, $Bx = b$, $x \geq 0$,

we introduce two functions in the penalty parameter $t > 0$ and the Lagrange relaxation parameter vector $w$,

    $\tilde f^p(t,w) = \max\{\, c^T x - w^T(Ax - a) + t \sum_{j=1}^n \ln x_j : Bx = b,\ x > 0 \,\}$   (for horizontal decomposition),

    $\tilde f^d(t,w) = \min\{\, a^T w + b^T y - t \sum_{j=1}^n \ln z_j : B^T y - z = c - A^T w,\ z > 0 \,\}$   (for vertical decomposition).

For each $t > 0$, $\tilde f^p(t,\cdot)$ and $\tilde f^d(t,\cdot)$ are strictly convex $C^\infty$ functions with a common minimizer $\hat w(t)$, which converges, as $t \to 0$, to an optimal Lagrange multiplier vector associated with the constraint $Ax = a$, and they enjoy the strong self-concordance property given by Nesterov and Nemirovsky. Based on these facts, we present conceptual algorithms that use Newton's method to trace the trajectory $\{\hat w(t) : t > 0\}$, and analyze their computational complexity.

1. Introduction.

This paper presents a theoretical framework for incorporating horizontal and vertical decomposition techniques into interior-point methods (see, for example, the survey paper [13]) for linear programs. Let $a \in R^m$, $b \in R^k$, $c \in R^n$, $A \in R^{m \times n}$ and $B \in R^{k \times n}$. We consider the following equality-form linear program P and its dual D:

    P: Maximize $c^T x$ subject to $Ax = a$, $Bx = b$, $x \geq 0$.
    D: Minimize $a^T w + b^T y$ subject to $A^T w + B^T y - z = c$, $z \geq 0$.

Let

    $R_{++} = \{t \in R : t > 0\}$ (the set of positive numbers),
    $R^n_{++} = \{x \in R^n : x > 0\}$ (the positive orthant),
    $P_{++} = \{x \in R^n_{++} : Ax = a,\ Bx = b\}$ (the interior of the feasible region of P),
    $D_{++} = \{(w,y,z) \in R^m \times R^k \times R^n_{++} : A^T w + B^T y - z = c\}$ (the interior of the feasible region of D),
    $Q_{++} = \{x \in R^n_{++} : Bx = b\}$,
    $D_{++}(w) = \{(y,z) \in R^k \times R^n_{++} : B^T y - z = c - A^T w\}$.

Then we obviously see that

    $P_{++} = \{x \in R^n : Ax = a\} \cap Q_{++}$,
    $D_{++} = \bigcup_{w \in R^m} \{(w,y,z) \in R^{m+k+n} : (y,z) \in D_{++}(w)\}$.

Throughout the paper we impose the following assumptions on the constraint set of P:

(A) $P_{++}$ is nonempty and bounded.
(B) $Q_{++}$ is nonempty and bounded.
(C) The $(m+k) \times n$ matrix $\begin{pmatrix} A \\ B \end{pmatrix}$ has full row rank.

For every $(t,x,w,y,z) \in R_{++} \times R^n_{++} \times R^m \times R^k \times R^n_{++}$, define

    $f^p(t,w,x) = c^T x - w^T(Ax - a) + t \sum_{j=1}^n \ln x_j$,
    $f^d(t,w,y,z) = a^T w + b^T y - t \sum_{j=1}^n \ln z_j$.

Here $t \in R_{++}$ denotes the barrier parameter and $w \in R^m$ the Lagrange multiplier vector associated with the constraint $Ax = a$.
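The two penalized objectives can be evaluated directly once the data $(A, B, a, b, c)$ and interior points $x > 0$, $z > 0$ are given. The following is a minimal NumPy sketch, not taken from the paper; the function names are ours and dense data is assumed.

    import numpy as np

    def f_p(t, w, x, A, a, c):
        # f^p(t,w,x) = c'x - w'(Ax - a) + t * sum_j ln x_j,  requires x > 0
        return c @ x - w @ (A @ x - a) + t * np.sum(np.log(x))

    def f_d(t, w, y, z, a, b):
        # f^d(t,w,y,z) = a'w + b'y - t * sum_j ln z_j,  requires z > 0
        return a @ w + b @ y - t * np.sum(np.log(z))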
We can regard the horizontal (or vertical) decomposition technique as a numerical tracing of the trajectory which consists of the solutions $\hat w(t)$ ($t \in R_{++}$) of the following parametric min-max (or min-min) problem:

    $\min_{w \in R^m}\ \max\ \{ f^p(t,w,x) : x \in Q_{++} \}$,   (1)
    $\min_{w \in R^m}\ \min\ \{ f^d(t,w,y,z) : (y,z) \in D_{++}(w) \}$.   (2)

Under the assumptions (A), (B) and (C), the inner maximization problem in (1) (or the inner minimization problem in (2)) has a unique solution for every $(t,w) \in R_{++} \times R^m$, so that we can consistently define the optimal value functions and the optimizers of the inner problems for every $(t,w) \in R_{++} \times R^m$; let

    $\tilde f^p(t,w) = \max\{f^p(t,w,x) : x \in Q_{++}\}$,
    $\tilde x(t,w) = \arg\max\{f^p(t,w,x) : x \in Q_{++}\}$,
    $\tilde f^d(t,w) = \min\{f^d(t,w,y,z) : (y,z) \in D_{++}(w)\}$,
    $(\tilde y(t,w), \tilde z(t,w)) = \arg\min\{f^d(t,w,y,z) : (y,z) \in D_{++}(w)\}$.

We are mainly concerned with theoretical aspects of the horizontal and vertical decomposition techniques. In particular we show the following features.

(a) For every $t \in R_{++}$, the function $\tilde f^p(t,\cdot) : R^m \to R$ is a strictly convex $C^\infty$ function with the unique minimizer $\hat w(t)$ over $R^m$.
(b) For every $(t,w) \in R_{++} \times R^m$, we can use the primal-dual interior-point method ([5, 7, 8, 9, 11], etc.) to compute $(\tilde x(t,w), \tilde y(t,w), \tilde z(t,w))$, $\tilde f^p(t,w)$, the gradient vector $\tilde f^p_w(t,w)$ with respect to $w$ and the Hessian matrix $\tilde f^p_{ww}(t,w)$ with respect to $w$.
(c) The set $\{(\tilde x(t,\hat w(t)), \hat w(t), \tilde y(t,\hat w(t)), \tilde z(t,\hat w(t))) : t \in R_{++}\}$ forms the central trajectory [9] converging to the analytic center of the optimal solution set of the primal-dual pair of linear programs P and D.
(d) $\{\tilde f^p(t,\cdot) : t \in R_{++}\}$ is a strongly self-concordant family [10], with the parameter functions $\gamma(t) = t$, $\mu(t) = \sqrt n / t$ and $\nu(t) = \sqrt n / (2t)$ (the remaining parameter functions of [10] can be taken identically equal to 1).
(e) $\tilde f^d(t,w) - \tilde f^p(t,w) = nt(1 - \ln t)$ for every $(t,w) \in R_{++} \times R^m$. Note that the right-hand side is independent of $w$. Hence (a), (b), (c) and (d) above remain valid even if we replace $\tilde f^p$ by $\tilde f^d$.

In view of the features (a), (b), (c) and (e) above, we obtain approximate optimal solutions of P and D if we numerically trace the trajectory $\{\hat w(t) : t > 0\}$ in the $w$-space until the barrier parameter $t \in R_{++}$ gets sufficiently small. It should be emphasized that $\hat w(t)$ is the common minimizer of the strictly convex $C^\infty$ functions $\tilde f^p(t,\cdot)$ and $\tilde f^d(t,\cdot)$ over the entire space $R^m$. Thus we can utilize various unconstrained minimization techniques such as Newton's, quasi-Newton and conjugate gradient methods to approximate $\hat w(t)$. The feature (d) is a major theoretical contribution of this paper; it makes it possible to effectively utilize Newton's method for tracing the trajectory $\{\hat w(t) : t > 0\}$. The notion and the theory of self-concordance were given by Nesterov and Nemirovsky [10] for a wide class of polynomial-time interior-point methods for convex programming.

After listing in Section 2 the symbols and notation used throughout the paper, we show in Section 3 that the inner optimization problems in (1) and (2) share a common necessary and sufficient optimality condition, and derive the feature (e) from that condition. Section 4 describes several derivatives of the function $\tilde f^p : R_{++} \times R^m \to R$, including the gradient $\tilde f^p_w(t,w)$ and the positive definite Hessian matrix $\tilde f^p_{ww}(t,w)$ with respect to $w \in R^m$. Details of the calculation of the derivatives are given in the Appendix. Section 5 establishes that the family $\{\tilde f^p(t,\cdot) : t \in R_{++}\}$ is strongly self-concordant. In Section 6 we present conceptual decomposition algorithms with the use of Newton's method, and investigate their polynomial-time computational complexity.
Although we will state a horizontal decomposition for the general linear program P and a vertical decomposition mainly for its dual D, it is interesting in practice to apply them to the special case where the matrix $B$ has a block diagonal structure:

    $B = \begin{pmatrix} B_1 & O & \cdots & O \\ O & B_2 & \cdots & O \\ & & \ddots & \\ O & O & \cdots & B_\ell \end{pmatrix}$.   (3)

Our horizontal and vertical decomposition techniques applied to such special cases roughly correspond to the Dantzig-Wolfe and the Benders decomposition methods, respectively. See for example the book [6]. It should be noted that when the matrix $B$ has a block diagonal structure as in (3), we can decompose the inner maximization problem in (1) (or the inner minimization problem in (2)) into $\ell$ smaller subproblems of the same type. Many studies ([1, 2, 12], etc.) have been done on how to exploit special structures such as generalized upper bounds, block diagonal structures and staircase structures in interior-point methods. The conceptual algorithms given in Section 6 have a close relation with a compact-inverse implementation (see Section 9 of [2]) of the primal-dual interior-point method applied to P and D. This is shown in Section 7, where we discuss some issues toward implementation of the horizontal and vertical decomposition techniques.

2. Notation.

    $R_{++} = \{t \in R : t > 0\}$,  $R^n_{++} = \{x \in R^n : x > 0\}$,
    $P_{++} = \{x \in R^n_{++} : Ax = a,\ Bx = b\}$,
    $D_{++} = \{(w,y,z) \in R^m \times R^k \times R^n_{++} : A^T w + B^T y - z = c\}$,
    $Q_{++} = \{x \in R^n_{++} : Bx = b\}$,
    $D_{++}(w) = \{(y,z) \in R^k \times R^n_{++} : B^T y - z = c - A^T w\}$,
    $f^p(t,w,x) = c^T x - w^T(Ax - a) + t \sum_{j=1}^n \ln x_j$,
    $f^d(t,w,y,z) = a^T w + b^T y - t \sum_{j=1}^n \ln z_j$,
    $\tilde f^p(t,w) = \max\{f^p(t,w,x) : x \in Q_{++}\}$,
    $\tilde x(t,w) = \arg\max\{f^p(t,w,x) : x \in Q_{++}\}$,
    $\tilde f^d(t,w) = \min\{f^d(t,w,y,z) : (y,z) \in D_{++}(w)\}$,
    $(\tilde y(t,w), \tilde z(t,w)) = \arg\min\{f^d(t,w,y,z) : (y,z) \in D_{++}(w)\}$,
    $\tilde X = \tilde X(t,w) = \mathrm{diag}\, \tilde x(t,w)$,
    $\tilde Z = \tilde Z(t,w) = \mathrm{diag}\, \tilde z(t,w)$,
    $\tilde\Phi = \tilde X^{1/2} \tilde Z^{-1/2}$,
    $\hat w(t) = \arg\min\{\tilde f^p(t,w) : w \in R^m\}$,
    $\alpha = 2 - 3^{1/2} = 0.2679\ldots$,  $\beta = 1/4 = 0.25$.

3. Duality between $\tilde f^p$ and $\tilde f^d$.

Given $t > 0$ and $w \in R^m$, the term $a^T w$ involved in the objective functions $f^p(t,w,\cdot)$ of the inner problem in (1) and $f^d(t,w,\cdot,\cdot)$ of the inner problem in (2) is constant. Hence we may eliminate it from the objective functions when we are concerned with the inner problems in (1) and (2). Thus the pair of inner problems is equivalent to the pair:

    P(t,w): Maximize $(c - A^T w)^T x + t \sum_{j=1}^n \ln x_j$ subject to $x \in Q_{++}$.
    D(t,w): Minimize $b^T y - t \sum_{j=1}^n \ln z_j$ subject to $(y,z) \in D_{++}(w)$.

This pair has exactly the same structure as the one often studied with the primal-dual interior-point method ([5, 7, 8, 9, 11], etc.). Therefore we can apply the primal-dual interior-point method to the pair of problems above. By using the well-known argument (see for example the paper [9]) we obtain the theorem below, which states a common necessary and sufficient optimality condition for the inner maximization problem in (1) (i.e., P(t,w)) and the inner minimization problem in (2) (i.e., D(t,w)).

Theorem 3.1. Let $(t,w) \in R_{++} \times R^m$. Then $x \in R^n$ is a maximizer of $f^p(t,w,\cdot)$ over $Q_{++}$ and $(y,z) \in R^{k+n}$ is a minimizer of $f^d(t,w,\cdot,\cdot)$ over $D_{++}(w)$ if and only if

    $Bx - b = 0$, $x > 0$,
    $A^T w + B^T y - z - c = 0$, $z > 0$,
    $Xz - te = 0$.   (4)

Here $X$ denotes the diagonal matrix of the coordinates of $x \in R^n$ and $e = (1, \ldots, 1)^T \in R^n$.

Let $(t,w) \in R_{++} \times R^m$. Suppose that $(x,y,z)$ satisfies (4). Then $(w,y,z)$ is a feasible solution of the original dual linear program D. Hence $a^T w + b^T y$ gives an upper bound for the maximal objective value of the original primal linear program P to be solved. On the other hand, $x$ is not a feasible solution of P unless the additional equality $Ax - a = 0$ holds. It will be shown in Section 4 that this equality holds when and only when $w$ is a minimizer of the outer problem in (1), i.e., a minimizer of $\tilde f^p(t,\cdot)$ over $R^m$.
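System (4) is exactly the central-path condition for the pair P(t,w), D(t,w), so the exact optimizers postulated later in assumption (D) can in practice be approximated by any primal-dual interior-point routine for this pair. The sketch below applies damped Newton steps to (4); it is an illustrative stand-in for the methods cited above, with our own function name, elimination order and step-length rule, and dense linear algebra.

    import numpy as np

    def inner_point(t, w, A, B, b, c, x, y, z, iters=50, tol=1e-10):
        # Approximate x~(t,w) and (y~(t,w), z~(t,w)) by Newton steps on (4):
        #   Bx = b,  B'y - z = c - A'w,  Xz = te,  keeping x > 0, z > 0 throughout.
        n = x.size
        e = np.ones(n)
        cw = c - A.T @ w                      # objective vector of the inner pair
        for _ in range(iters):
            rp = b - B @ x                    # primal residual
            rd = cw - (B.T @ y - z)           # dual residual
            rc = t * e - x * z                # centering residual
            if max(np.linalg.norm(rp), np.linalg.norm(rd), np.linalg.norm(rc)) < tol:
                break
            d = x / z                         # diagonal of X Z^{-1}
            rhs = rd + rc / x                 # rd + X^{-1} rc
            M = (B * d) @ B.T                 # B X Z^{-1} B', a k-by-k matrix
            dy = np.linalg.solve(M, (B * d) @ rhs - rp)
            dx = d * (rhs - B.T @ dy)
            dz = (rc - z * dx) / x
            step = 1.0                        # fraction-to-boundary step length
            for v, dv in ((x, dx), (z, dz)):
                neg = dv < 0
                if neg.any():
                    step = min(step, 0.9995 * np.min(-v[neg] / dv[neg]))
            x, y, z = x + step * dx, y + step * dy, z + step * dz
        return x, y, z

The only matrix that has to be factored is $B X Z^{-1} B^T$; when $B$ is block diagonal as in (3), it splits into $\ell$ independent blocks, which is where the decomposition pays off.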
Recall that $\tilde x(t,w)$ and $(\tilde y(t,w), \tilde z(t,w))$ denote the maximizer of the inner problem in (1) and the minimizer of the inner problem in (2), respectively. For every fixed $w \in R^m$, the set $\{(\tilde x(t,w), \tilde y(t,w), \tilde z(t,w)) : t \in R_{++}\}$ forms the central trajectory of the primal-dual pair of linear programs:

    P(w): Maximize $(c - A^T w)^T x$ subject to $Bx = b$, $x \geq 0$.
    D(w): Minimize $b^T y$ subject to $B^T y - z = c - A^T w$, $z \geq 0$.

Theorem 3.1 implies that the relations

    $B \tilde x(t,w) - b = 0$, $\tilde x(t,w) > 0$,
    $A^T w + B^T \tilde y(t,w) - \tilde z(t,w) - c = 0$, $\tilde z(t,w) > 0$,
    $\tilde X(t,w) \tilde z(t,w) - te = 0$   (5)

hold for every $(t,w) \in R_{++} \times R^m$. It follows that

    $\tilde f^d(t,w) - \tilde f^p(t,w) = b^T \tilde y(t,w) - c^T \tilde x(t,w) + w^T A \tilde x(t,w) - t \sum_{j=1}^n \ln \tilde x_j(t,w) \tilde z_j(t,w)$
    $= (A^T w + B^T \tilde y(t,w) - c)^T \tilde x(t,w) - t \sum_{j=1}^n \ln t$
    $= \tilde z(t,w)^T \tilde x(t,w) - nt \ln t = nt(1 - \ln t)$.

Therefore we obtain:

Theorem 3.2. $\tilde f^d(t,w) - \tilde f^p(t,w) = nt(1 - \ln t)$ for every $(t,w) \in R_{++} \times R^m$.

4. Derivatives.

In this section, we present various derivatives of $\tilde f^p : R_{++} \times R^m \to R$. Details of the calculation of the derivatives are omitted here but are given in the Appendix.

Theorem 4.1.
(i) $\tilde f^p_w(t,w) = a - A \tilde x(t,w)$ for every $(t,w) \in R_{++} \times R^m$.
(ii) $\tilde f^p_{ww}(t,w) = A \tilde\Phi (I - \tilde\Phi B^T (B \tilde\Phi^2 B^T)^{-1} B \tilde\Phi) \tilde\Phi A^T = \frac{1}{t} A \tilde X (I - \tilde X B^T (B \tilde X^2 B^T)^{-1} B \tilde X) \tilde X A^T$ for every $(t,w) \in R_{++} \times R^m$.
Here $\tilde\Phi = \tilde X^{1/2} \tilde Z^{-1/2}$, $\tilde X = \tilde X(t,w) = \mathrm{diag}\, \tilde x(t,w)$ and $\tilde Z = \tilde Z(t,w) = \mathrm{diag}\, \tilde z(t,w)$.

As a corollary we obtain:

Theorem 4.2. Let $t \in R_{++}$ be fixed arbitrarily.
(i) The Hessian matrix $\tilde f^p_{ww}(t,w)$ is positive definite for every $w \in R^m$.
(ii) The function $\tilde f^p(t,\cdot) : R^m \to R$ is a strictly convex $C^\infty$ function with a unique minimizer over $R^m$.

Proof: (i) Obviously the Hessian matrix $\tilde f^p_{ww}(t,w) = A \tilde\Phi (I - \tilde\Phi B^T (B \tilde\Phi^2 B^T)^{-1} B \tilde\Phi) \tilde\Phi A^T$ is positive semi-definite. Hence it suffices to show that the matrix is nonsingular. Assume on the contrary that

    $A \tilde\Phi (I - \tilde\Phi B^T (B \tilde\Phi^2 B^T)^{-1} B \tilde\Phi) \tilde\Phi A^T u = 0$

for some nonzero $u \in R^m$. Since $I - \tilde\Phi B^T (B \tilde\Phi^2 B^T)^{-1} B \tilde\Phi$ is a projection matrix, we have

    $(I - \tilde\Phi B^T (B \tilde\Phi^2 B^T)^{-1} B \tilde\Phi) \tilde\Phi A^T u = 0$,

which implies that

    $A^T u - B^T (B \tilde\Phi^2 B^T)^{-1} B \tilde\Phi^2 A^T u = 0$.

This contradicts the assumption (C).
(ii) The strict convexity of $\tilde f^p(t,\cdot)$ follows directly from (i). We have observed that the mappings $\tilde x$, $\tilde y$, $\tilde z$ on $R_{++} \times R^m$ are defined implicitly through the equalities in (4) (see (5)). Since the Jacobian matrix of the left-hand side of those equalities with respect to $(x,y,z)$ is nonsingular at every $(t,w,x,y,z) \in R_{++} \times R^m \times R^n_{++} \times R^k \times R^n_{++}$, and the left-hand side is $C^\infty$ differentiable with respect to $(t,w,x,y,z)$, we see by the implicit function theorem (see for example [3]) that the mappings $\tilde x$, $\tilde y$, $\tilde z$ are $C^\infty$ differentiable with respect to $(t,w)$. It is well known [9] that under the assumptions (A) and (C) there exists a unique $(w,x,y,z)$ for which

    $Ax - a = 0$, $Bx - b = 0$, $x > 0$,
    $A^T w + B^T y - z - c = 0$, $z > 0$,
    $Xz - te = 0$   (6)

hold. We know by Theorems 3.1 and 4.1 that (6) is a necessary and sufficient condition for $w \in R^m$ to be a minimizer of $\tilde f^p(t,\cdot)$ over $R^m$. This ensures the existence and the uniqueness of the minimizer of $\tilde f^p(t,\cdot)$.
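Theorem 4.1 makes the data for Newton's method in the $w$-space available as soon as the inner maximizer $\tilde x(t,w)$ is known. The following dense NumPy sketch (our own naming, not from the paper) uses the $\tilde X$ form of the Hessian, which needs only $\tilde x$ because $\tilde X \tilde z = te$ along the inner central path:

    import numpy as np

    def grad_hess_w(t, x_til, A, B, a):
        # Theorem 4.1:
        #   grad = a - A x~
        #   hess = (1/t) A X (I - X B'(B X^2 B')^{-1} B X) X A',  X = diag x~
        grad = a - A @ x_til
        AX = A * x_til                        # A X
        BX = B * x_til                        # B X
        G = BX @ BX.T                         # B X^2 B'
        W = np.linalg.solve(G, BX @ AX.T)     # (B X^2 B')^{-1} B X^2 A'
        hess = (AX @ AX.T - (BX @ AX.T).T @ W) / t
        return grad, hess

The matrix $I - \tilde X B^T (B \tilde X^2 B^T)^{-1} B \tilde X$ is the orthogonal projection onto the null space of $B \tilde X$, which is why the Hessian is positive semidefinite; its positive definiteness under assumption (C) is exactly Theorem 4.2 (i).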
For every $t \in R_{++}$, define $\hat w(t)$ to be the minimizer of $\tilde f^p(t,\cdot)$ over $R^m$, whose existence and uniqueness have been shown above; i.e., $\hat w(t) = \arg\min\{\tilde f^p(t,w) : w \in R^m\}$. In view of Theorem 3.2, we see that $\hat w(t) = \arg\min\{\tilde f^d(t,w) : w \in R^m\}$. Furthermore, for every $t \in R_{++}$, $(x,w,y,z) = (\tilde x(t,\hat w(t)), \hat w(t), \tilde y(t,\hat w(t)), \tilde z(t,\hat w(t)))$ satisfies (6). Thus the set $\{(\tilde x(t,\hat w(t)), \hat w(t), \tilde y(t,\hat w(t)), \tilde z(t,\hat w(t))) : t \in R_{++}\}$ forms the central trajectory converging to the analytic center of the optimal solution set of the primal-dual pair of linear programs P and D.

The theorem below will be utilized in Section 5, where we discuss the self-concordance property of the family $\{\tilde f^p(t,\cdot) : t \in R_{++}\}$ of functions.

Theorem 4.3. Let $w \in R^m$ and $h \in R^m$ be fixed arbitrarily. For every $(t,s) \in R_{++} \times R$, let

    $X(t,s) = \tilde X(t, w + sh)$,
    $u(t,s) = (I - X(t,s) B^T (B X(t,s)^2 B^T)^{-1} B X(t,s))\, X(t,s) A^T h$,
    $v(t,s) = (I - X(t,s) B^T (B X(t,s)^2 B^T)^{-1} B X(t,s))\, e$.

Then the following (i)-(v) hold for every $s \in R$.
(i) $\dfrac{d \tilde f^p(t,w+sh)}{ds} = (a - A \tilde x(t,w+sh))^T h$.
(ii) $\dfrac{d^2 \tilde f^p(t,w+sh)}{ds^2} = \dfrac{1}{t} \sum_{j=1}^n u_j(t,s)^2$.
(iii) $\dfrac{d^3 \tilde f^p(t,w+sh)}{ds^3} = -\dfrac{2}{t^2} \sum_{j=1}^n u_j(t,s)^3$.
(iv) $\dfrac{d^2 \tilde f^p(t,w+sh)}{dt\,ds} = -\dfrac{u(t,s)^T v(t,s)}{t}$.
(v) $\dfrac{d^3 \tilde f^p(t,w+sh)}{dt\,ds^2} = \dfrac{u(t,s)^T (2\,\mathrm{diag}\, v(t,s) - I)\, u(t,s)}{t^2}$.

5. Self-Concordance.

In this section, we apply the notion and the theory of self-concordance to the family $\{\tilde f^p(t,\cdot) : t \in R_{++}\}$ of functions according to the paper [10] by Nesterov and Nemirovsky. To avoid complicated notation and discussion, however, we employ a simplified version of self-concordance. Our definition of "global self-concordance" and the related properties shown below are less general than the original strong self-concordance given in the paper [10], but they can be adapted directly to the family $\{\tilde f^p(t,\cdot) : t \in R_{++}\}$. Readers who are familiar with strong self-concordance can easily see that if a family $\{F(t,\cdot) : t \in R_{++}\}$ of functions on $R^m$ is globally self-concordant, then $F = \{R^m, F(t,\cdot), R^m\}_{t \in R_{++}}$ is strongly self-concordant.

Definition 5.1. A function $F$ on $R^m$ is globally self-concordant with the parameter value $\gamma > 0$ if it satisfies:
(a) $F$ is a strictly convex $C^3$ function on $R^m$ with a positive definite Hessian matrix $F_{ww}(w)$ at every $w \in R^m$.
(b) There is a unique global minimizer of $F$ over $R^m$.
(c) For every $w \in R^m$ and $h \in R^m$,

    $\left| \dfrac{d^3 F(w+sh)}{ds^3}\Big|_{s=0} \right| \leq 2 \gamma^{-1/2} \left( \dfrac{d^2 F(w+sh)}{ds^2}\Big|_{s=0} \right)^{3/2}$.

Definition 5.2. A family $\{F(t,\cdot) : t \in R_{++}\}$ of functions on $R^m$ is globally self-concordant with the positive parameter functions $\gamma$, $\mu$ and $\nu$ if it satisfies:
(d) $\gamma, \mu, \nu : R_{++} \to R_{++}$ are continuous functions.
(e) For every $t \in R_{++}$, $F(t,\cdot)$ is a globally self-concordant function on $R^m$ with the parameter value $\gamma(t)$.
(f) $F(t,w)$, $F_w(t,w)$ and $F_{ww}(t,w)$ are differentiable in $t \in R_{++}$, and the resultant derivatives in $t \in R_{++}$ are continuous in $(t,w) \in R_{++} \times R^m$.
(g) For every $(t,w) \in R_{++} \times R^m$ and $h \in R^m$,

    $\left| \dfrac{d^2 F(t,w+sh)}{dt\,ds}\Big|_{s=0} \right| \leq \mu(t) \gamma(t)^{1/2} \left( \dfrac{d^2 F(t,w+sh)}{ds^2}\Big|_{s=0} \right)^{1/2}$.

(h) For every $(t,w) \in R_{++} \times R^m$ and $h \in R^m$,

    $\left| \dfrac{d^3 F(t,w+sh)}{dt\,ds^2}\Big|_{s=0} \right| \leq 2 \nu(t) \left( \dfrac{d^2 F(t,w+sh)}{ds^2}\Big|_{s=0} \right)$.

Let $\{F(t,\cdot) : t \in R_{++}\}$ be a globally self-concordant family of functions on $R^m$ with the positive parameter functions $\gamma, \mu, \nu : R_{++} \to R_{++}$. For every $t \in R_{++}$, define Newton's decrement of the function $F(t,\cdot)$ at $w \in R^m$ by

    $\lambda(t,w) = \inf\{ \lambda : |F_w(t,w)^T h| \leq \lambda\, \gamma(t)^{1/2} (h^T F_{ww}(t,w) h)^{1/2}$ for every $h \in R^m \}$.

Or alternatively, $\lambda(t,w)$ is defined by
    $\lambda(t,w)^2 = \dfrac{2 \left( F(t,w) - \inf\{\ell(t,u) : u \in R^m\} \right)}{\gamma(t)} = \dfrac{1}{\gamma(t)}\, F_w(t,w)^T F_{ww}(t,w)^{-1} F_w(t,w)$,

where $\ell(t,\cdot)$ is the quadratic approximation of the function $F(t,\cdot)$ at $w \in R^m$,

    $\ell(t,u) = F(t,w) + F_w(t,w)^T (u - w) + \tfrac{1}{2} (u - w)^T F_{ww}(t,w) (u - w)$ for every $(t,u) \in R_{++} \times R^m$.

Newton's decrement is a continuous function from $R_{++} \times R^m$ into the set of nonnegative numbers such that, for each $t \in R_{++}$, $\lambda(t,\cdot)$ takes the minimal value 0 over $R^m$ at $w \in R^m$ if and only if $w$ is the unique global minimizer of $F(t,\cdot)$ over $R^m$. Thus, for every $\epsilon \in R_{++}$, the set

    $N(\epsilon) = \{(t,w) \in R_{++} \times R^m : \lambda(t,w) \leq \epsilon\}$

forms a closed neighborhood of the trajectory

    $\{(t,w) \in R_{++} \times R^m : \lambda(t,w) = 0\} = \{(t,w) \in R_{++} \times R^m : w = \arg\min\{F(t,w') : w' \in R^m\}\}$,

which we want to trace numerically. Let $\beta \in R_{++}$. Define the metric $\rho_\beta$ on $R_{++}$ by

    $\rho_\beta(t'', t') = \tfrac{1}{2} \max\{ |\ln(\gamma(\tau'')/\gamma(\tau'))| : \tau', \tau'' \in [t'', t'] \} + \beta^{-1} \int_{t''}^{t'} \mu(s)\, ds + \int_{t''}^{t'} \nu(s)\, ds$

for every $t', t'' \in R_{++}$ with $t'' \leq t'$. Let $\alpha = 2 - 3^{1/2} = 0.2679\ldots$. Define

    $\theta(\lambda) = \begin{cases} (1+\lambda)^{-1} & \text{if } \lambda > \alpha \\ 1 & \text{if } \lambda \leq \alpha \end{cases}$,   $\omega(\lambda) = 1 - (1 - 3\lambda)^{1/3}$.

Theorem 5.3. Assume that $\{F(t,\cdot) : t \in R_{++}\}$ is a globally self-concordant family of functions on $R^m$ with the positive parameter functions $\gamma, \mu, \nu : R_{++} \to R_{++}$. Let $t_0 \in R_{++}$, $w_0 \in R^m$ and $\lambda_0 = \lambda(t_0, w_0)$. Let $w^* \in R^m$ be the Newton iterate at $w_0$ with the step length $\theta(\lambda_0)$,

    $w^* = w_0 - \theta(\lambda_0)\, F_{ww}(t_0, w_0)^{-1} F_w(t_0, w_0)$.

(i) If $\lambda_0 > \alpha$ then

    $F(t_0, w^*) - F(t_0, w_0) \leq -\gamma(t_0) (\alpha - \ln(1+\alpha)) \leq -0.03\, \gamma(t_0)$.

(ii) If $\lambda_0 \leq \alpha$ then

    $\lambda(t_0, w^*) \leq \dfrac{\lambda_0^2}{(1-\lambda_0)^2}$  (which equals $\lambda_0/2$ if $\lambda_0 = \alpha$ and is less than $\lambda_0/2$ if $\lambda_0 < \alpha$),

    $F(t_0, w_0) - \min\{F(t_0, w) : w \in R^m\} \leq \tfrac{1}{2} \gamma(t_0)\, \omega(\lambda_0)^2\, \dfrac{2 + \omega(\lambda_0)}{1 - \omega(\lambda_0)} \leq 4 \lambda_0\, \gamma(t_0)$.

(iii) If $\lambda_0 < \beta < \alpha$ and $\rho_\beta(t'', t_0) \leq \beta^{-1}(\beta - \lambda_0)$, then $\lambda(t'', w_0) \leq \beta$.

Proof: See Section 1 of the paper [10].

We have already seen that:
(a)' For every $t \in R_{++}$, the function $\tilde f^p(t,\cdot) : R^m \to R$ is a strictly convex $C^\infty$ function with a positive definite Hessian matrix $\tilde f^p_{ww}(t,w)$ at every $w \in R^m$.
(b)' For every $t \in R_{++}$, there is a unique global minimizer of $\tilde f^p(t,\cdot)$ over $R^m$.
(f)' $\tilde f^p(\cdot,\cdot)$ is $C^\infty$ differentiable on $R_{++} \times R^m$.
Hence the following theorem is sufficient to establish that the family $\{\tilde f^p(t,\cdot) : t \in R_{++}\}$ of functions on $R^m$ is globally self-concordant.

Theorem 5.4.
(d)' Define $\gamma(t) = t$, $\mu(t) = \dfrac{\sqrt n}{t}$ and $\nu(t) = \dfrac{\sqrt n}{2t}$ for every $t \in R_{++}$. Then the following relations hold for every $(t,w) \in R_{++} \times R^m$ and $h \in R^m$.
(c)' $\left| \dfrac{d^3 \tilde f^p(t,w+sh)}{ds^3}\Big|_{s=0} \right| \leq 2 \gamma(t)^{-1/2} \left( \dfrac{d^2 \tilde f^p(t,w+sh)}{ds^2}\Big|_{s=0} \right)^{3/2}$.
(g)' $\left| \dfrac{d^2 \tilde f^p(t,w+sh)}{dt\,ds}\Big|_{s=0} \right| \leq \mu(t) \gamma(t)^{1/2} \left( \dfrac{d^2 \tilde f^p(t,w+sh)}{ds^2}\Big|_{s=0} \right)^{1/2}$.
(h)' $\left| \dfrac{d^3 \tilde f^p(t,w+sh)}{dt\,ds^2}\Big|_{s=0} \right| \leq 2 \nu(t) \left( \dfrac{d^2 \tilde f^p(t,w+sh)}{ds^2}\Big|_{s=0} \right)$.

Proof:
(c)' By Theorem 4.3, we have

    $\left( \dfrac{d^3 \tilde f^p(t,w+sh)}{ds^3} \right)^2 = \left( \dfrac{2}{t^2} \sum_{j=1}^n u_j(t,s)^3 \right)^2 = \dfrac{4}{t^4} \left( \sum_{j=1}^n u_j(t,s)^3 \right)^2 \leq \dfrac{4}{t^4} \left( \sum_{j=1}^n u_j(t,s)^2 \right)^3 = \dfrac{4}{t} \left( \dfrac{d^2 \tilde f^p(t,w+sh)}{ds^2} \right)^3$.

Thus (c)' follows.
(g)' It follows from the definition of $v(t,s)$ that $\|v(t,s)\| \leq \sqrt n$. Hence we see by Theorem 4.3 that

    $\left| \dfrac{d^2 \tilde f^p(t,w+sh)}{dt\,ds} \right| = \dfrac{|u(t,s)^T v(t,s)|}{t} \leq \dfrac{\sqrt n\, \|u(t,s)\|}{t} = \mu(t) \gamma(t)^{1/2} \left( \dfrac{d^2 \tilde f^p(t,w+sh)}{ds^2} \right)^{1/2}$.

Thus we have shown (g)'.
(h)' It is easily verified that

    $-\sqrt n \leq 2 v_i(t,s) - 1 \leq \sqrt n$ for $i = 1, 2, \ldots, n$.

Hence we see by Theorem 4.3 that

    $\left| \dfrac{d^3 \tilde f^p(t,w+sh)}{dt\,ds^2} \right| \leq \dfrac{\sqrt n\, u(t,s)^T u(t,s)}{t^2} = \dfrac{\sqrt n}{t}\, \dfrac{d^2 \tilde f^p(t,w+sh)}{ds^2} = 2 \nu(t)\, \dfrac{d^2 \tilde f^p(t,w+sh)}{ds^2}$.   (7)

Thus we have shown (h)'.

Now we are ready to apply Theorem 5.3 to the family $\{\tilde f^p(t,\cdot) : t \in R_{++}\}$.
Newton's decrement turns out to be

    $\lambda(t,w) = \inf\{ \lambda : \left| \dfrac{d \tilde f^p(t,w+sh)}{ds}\Big|_{s=0} \right| \leq \lambda\, \gamma(t)^{1/2} \left( \dfrac{d^2 \tilde f^p(t,w+sh)}{ds^2}\Big|_{s=0} \right)^{1/2}$ for every $h \in R^m \}$
    $= \inf\{ \lambda : |(a - A\tilde x(t,w))^T h| \leq \lambda \left( h^T A \tilde X (I - \tilde X B^T (B \tilde X^2 B^T)^{-1} B \tilde X) \tilde X A^T h \right)^{1/2}$ for every $h \in R^m \}$
    $= \inf\{ \lambda : |(A\tilde x(t,w) - a)^T h| \leq \lambda\, t^{1/2} \left( h^T A \tilde\Phi (I - \tilde\Phi B^T (B \tilde\Phi^2 B^T)^{-1} B \tilde\Phi) \tilde\Phi A^T h \right)^{1/2}$ for every $h \in R^m \}$.

Here $\tilde X = \tilde X(t,w)$ and $\tilde\Phi = \tilde X(t,w)^{1/2} \tilde Z(t,w)^{-1/2}$. Let $0 < \beta$ and $0 < t'' < t'$. Then

    $\rho_\beta(t'', t') = \tfrac{1}{2} \max\{ |\ln(\gamma(\tau'')/\gamma(\tau'))| : \tau'', \tau' \in [t'', t'] \} + \beta^{-1} \int_{t''}^{t'} \mu(s)\, ds + \int_{t''}^{t'} \nu(s)\, ds$
    $= \tfrac{1}{2} \max\{ |\ln(\tau''/\tau')| : \tau'', \tau' \in [t'', t'] \} + \beta^{-1} \int_{t''}^{t'} \dfrac{\sqrt n}{s}\, ds + \int_{t''}^{t'} \dfrac{\sqrt n}{2s}\, ds$
    $= \tfrac{1}{2} \ln(t'/t'') + \dfrac{\sqrt n}{\beta} (\ln t' - \ln t'') + \dfrac{\sqrt n}{2} (\ln t' - \ln t'')$.

Thus we obtain

    $\rho_\beta(t'', t') = \left( \tfrac{1}{2} + \dfrac{\sqrt n}{\beta} + \dfrac{\sqrt n}{2} \right) \ln(t'/t'')$.   (8)

Theorem 5.5. Let $\beta = 1/4 < \alpha = 2 - 3^{1/2} = 0.2679\ldots$. Let $(t_0, w_0) \in R_{++} \times R^m$ and $\lambda_0 = \lambda(t_0, w_0)$. Let $w^* \in R^m$ be the Newton iterate at $w_0$ with the step length $\theta(\lambda_0)$,

    $w^* = w_0 - \theta(\lambda_0)\, \tilde f^p_{ww}(t_0, w_0)^{-1} \tilde f^p_w(t_0, w_0)$.

(i) If $\lambda_0 > \alpha$ then

    $\tilde f^p(t_0, w^*) - \tilde f^p(t_0, w_0) \leq -t_0 (\alpha - \ln(1+\alpha)) \leq -0.03\, t_0$.

(ii) If $\lambda_0 \leq \alpha$ then

    $\lambda(t_0, w^*) \leq \dfrac{\lambda_0^2}{(1-\lambda_0)^2}$  (which equals $\lambda_0/2$ if $\lambda_0 = \alpha$ and is less than $\lambda_0/2$ if $\lambda_0 < \alpha$),

    $\tilde f^p(t_0, w_0) - \min\{\tilde f^p(t_0, w) : w \in R^m\} \leq \tfrac{1}{2}\, t_0\, \omega(\lambda_0)^2\, \dfrac{2 + \omega(\lambda_0)}{1 - \omega(\lambda_0)} \leq 4 \lambda_0 t_0$.

(iii) If $\lambda_0 \leq \beta/2$ and $(1 - 1/(11\sqrt n))\, t_0 \leq t'' \leq t_0$, then $\lambda(t'', w_0) \leq \beta$.

Proof: The assertions (i) and (ii) follow directly from Theorems 5.3 and 5.4, so it suffices to show (iii). Assume that $\lambda_0 \leq \beta/2$ and $(1 - 1/(11\sqrt n))\, t_0 \leq t'' \leq t_0$. Then we see by (8) that

    $\rho_\beta(t'', t_0) = \left( \tfrac{1}{2} + \dfrac{\sqrt n}{\beta} + \dfrac{\sqrt n}{2} \right) \ln(t_0/t'') \leq 5\sqrt n\, \ln \dfrac{1}{1 - 1/(11\sqrt n)} \leq 5\sqrt n \cdot \dfrac{1}{10\sqrt n} = \dfrac{1}{2} \leq \beta^{-1}(\beta - \lambda_0)$.

Thus the desired result follows from (iii) of Theorem 5.3.

6. Conceptual Algorithms.

In addition to the assumptions (A), (B) and (C) stated in the Introduction, we assume:

(D) For any $(t,w) \in R_{++} \times R^m$, we can compute the exact maximizer $\tilde x(t,w)$ of $f^p(t,w,\cdot)$ over $Q_{++}$ and the exact minimizer $(\tilde y(t,w), \tilde z(t,w))$ of $f^d(t,w,\cdot,\cdot)$ over $D_{++}(w)$.

We describe two algorithms based on Theorem 5.5. For each fixed $t \in R_{++}$, Algorithm 6.1 approximates the minimizer $\hat w(t)$ of $\tilde f^p(t,\cdot)$ over $R^m$, while Algorithm 6.2 numerically traces the trajectory $\{(t, \hat w(t)) : t \in R_{++}\}$ from a given initial point $(t_0, w_0)$ in the neighborhood $N(\beta/2) = \{(t,w) : \lambda(t,w) \leq \beta/2\}$ of the trajectory $\{(t, \hat w(t)) : t \in R_{++}\}$, where $\beta = 1/4$.

Let $(t,w) \in R_{++} \times R^m$.

Algorithm 6.1.
Step 0: Let $q = 0$ and $w_q = w$.
Step 1: Compute $(x_q, y_q, z_q) = (\tilde x(t,w_q), \tilde y(t,w_q), \tilde z(t,w_q))$. Let $\lambda_q = \lambda(t, w_q)$.
Step 2: Compute $w_{q+1} = w_q - \theta(\lambda_q)\, \tilde f^p_{ww}(t, w_q)^{-1} \tilde f^p_w(t, w_q)$.
Step 3: Replace $q$ by $q+1$, and go to Step 1.

Suppose $(t_0, w_0) \in N(\beta/2)$, where $\beta = 1/4$.

Algorithm 6.2.
Step 0: Let $\delta = 1/(11\sqrt n)$ and $r = 0$.
Step 1: Compute $(x_r, y_r, z_r) = (\tilde x(t_r, w_r), \tilde y(t_r, w_r), \tilde z(t_r, w_r))$. Let $\lambda_r = \lambda(t_r, w_r)$.
Step 2: Let $t_{r+1} = (1 - \delta)\, t_r$. Compute $w_{r+1} = w_r - \tilde f^p_{ww}(t_{r+1}, w_r)^{-1} \tilde f^p_w(t_{r+1}, w_r)$.
Step 3: Replace $r$ by $r+1$, and go to Step 1.

In view of Theorem 5.5, we see that:
(i) The sequence $\{(t, w_q)\}$ generated by Algorithm 6.1 moves into the neighborhood $N(\beta/2) = \{(t,w) : \lambda(t,w) \leq \beta/2\}$ of the trajectory $\{(t, \hat w(t)) : t \in R_{++}\}$ in a finite number of iterations, and eventually converges to $\hat w(t)$. Hence Algorithm 6.1 provides us with an initial point for Algorithm 6.2.
(ii) The sequence $\{(t_r, w_r)\}$ generated by Algorithm 6.2 runs in the neighborhood $N(\beta/2) = \{(t,w) : \lambda(t,w) \leq \beta/2\}$ and satisfies

    $t_{r+1} = \left( 1 - \dfrac{1}{11\sqrt n} \right) t_r$ for every $r = 0, 1, 2, \ldots$,   (9)

hence $\lim_{r \to \infty} t_r = 0$. The sequence $\{(x_r, w_r, y_r, z_r)\}$ lies in a neighborhood of the central trajectory of the primal-dual pair of linear programs P and D, and if $(x^*, w^*, y^*, z^*)$ is a limiting point of the sequence then $x^*$ and $(w^*, y^*, z^*)$ are optimal solutions of P and D, respectively.
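With assumption (D) replaced by an approximate inner solver, Algorithm 6.2 becomes a short driver loop. The sketch below reuses the illustrative routines inner_point and grad_hess_w from the earlier fragments (both hypothetical names, not from the paper); it performs Step 2 of Algorithm 6.2 with a full Newton step and evaluates the Newton decrement $\lambda(t,w)^2 = \tfrac{1}{t}\, \tilde f^p_w{}^T \tilde f^p_{ww}{}^{-1} \tilde f^p_w$ as a diagnostic.

    import numpy as np

    def algorithm_6_2(t0, w0, A, B, a, b, c, x0, y0, z0, t_min=1e-8):
        # Conceptual sketch of Algorithm 6.2: shrink t by (1 - 1/(11 sqrt(n)))
        # per iteration and take one Newton step in w at the new value of t.
        n = c.size
        delta = 1.0 / (11.0 * np.sqrt(n))
        t, w, x, y, z = t0, w0, x0, y0, z0
        while t > t_min:
            t = (1.0 - delta) * t                              # Step 2: t_{r+1} = (1 - delta) t_r
            x, y, z = inner_point(t, w, A, B, b, c, x, y, z)   # approximates x~(t,w), y~, z~
            g, H = grad_hess_w(t, x, A, B, a)                  # Theorem 4.1 at (t, w)
            dw = np.linalg.solve(H, g)
            lam = np.sqrt(max(g @ dw, 0.0) / t)                # Newton decrement; should stay <= 1/4
            w = w - dw                                         # full Newton step (theta(lam) = 1)
        return t, w, x, y, z

By (9), about $11\sqrt n\, \ln(t_0/\epsilon)$ outer iterations bring $t$ below any tolerance $\epsilon > 0$, and the iterates $(x, w, y, z)$ then approximate optimal solutions of P and D.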
We now assume that all elements of the data $A$, $B$, $a$, $b$ and $c$ are integers in order to analyze the computational complexity of Algorithms 6.1 and 6.2 in detail. Let $L$ denote the input size of P.

Lemma 6.3. Let $t > 0$.
(i) $\tilde f^p(t, 0) \leq n 2^{2L} + tnL$.
(ii) $\min_{w \in R^m} \tilde f^p(t, w) \geq -n 2^{2L} - tnL$.

Proof: By (A) and (B) assumed in the Introduction and by the definition of $L$, we know that

    $x_i \leq 2^L \leq \exp L$ for every $i = 1, 2, \ldots, n$ and every $x \in Q_{++}$,
    $\bar x_i \geq 2^{-L} \geq \exp(-L)$ for every $i = 1, 2, \ldots, n$ and some $\bar x \in P_{++}$.

It follows that

    $f^p(t, 0, x) = c^T x + t \sum_{j=1}^n \ln x_j \leq n (2^L)(2^L) + tnL = n 2^{2L} + tnL$ for every $x \in Q_{++}$,
    $f^p(t, w, \bar x) = c^T \bar x + t \sum_{j=1}^n \ln \bar x_j \geq -n (2^L)(2^L) + tn(-L) = -n 2^{2L} - tnL$ for every $w \in R^m$.

Hence

    $\tilde f^p(t, 0) = \max\{f^p(t, 0, x) : x \in Q_{++}\} \leq n 2^{2L} + tnL$,
    $\min_{w \in R^m} \tilde f^p(t, w) = \min_{w \in R^m} \max\{f^p(t, w, x) : x \in Q_{++}\} \geq \min_{w \in R^m} f^p(t, w, \bar x) \geq -n 2^{2L} - tnL$.

Theorem 6.4.
(i) Let $t = 2^{2L}$ and $w_0 = 0$. Then Algorithm 6.1 generates in $O(nL)$ iterations a $(t, w_q) \in N(\beta/2)$.
(ii) Let $\epsilon > 0$. Then Algorithm 6.2 generates in $O(\sqrt n\, \ln(t_0/\epsilon))$ iterations a $(t_r, w_r) \in N(\beta/2)$ such that $0 < t_r < \epsilon$.

Proof: Let $\bar q = \lceil 4nL/0.03 \rceil + 1$. Assume on the contrary that $(t, w_q) \notin N(\alpha)$ for $q = 0, 1, 2, \ldots, \bar q$. By Lemma 6.3 and Theorem 5.5 we then see that

    $-n 2^{2L} - tnL \leq \tilde f^p(t, w_{\bar q}) \leq \tilde f^p(t, w_0) - 0.03\, t\, \bar q \leq n 2^{2L} + tnL - 4nLt = n 2^{2L} - 2nLt - tnL = n 2^{2L} - 2nL\, 2^{2L} - tnL < -n 2^{2L} - tnL$.

This is a contradiction. Hence $(t, w_q) \in N(\alpha)$ for some $q \leq \bar q$. By Theorem 5.5, we see that $(t, w_{q+2}) \in N(\beta/2)$. Thus we have shown (i). The assertion (ii) follows directly from the inequality (9).

7. Toward Practical Implementation.

Among the assumptions imposed in our discussion so far, the most impractical one seems to be the assumption:

(D) For any $(t,w) \in R_{++} \times R^m$, we can compute the exact maximizer $\tilde x(t,w)$ of $f^p(t,w,\cdot)$ over $Q_{++}$ and the exact minimizer $(\tilde y(t,w), \tilde z(t,w))$ of $f^d(t,w,\cdot,\cdot)$ over $D_{++}(w)$.

For every $(t,w) \in R_{++} \times R^m$, the Newton direction $(dx, dy, dz)$ toward a point on the central trajectory of the primal-dual pair P(w) and D(w) gives a search direction in the $(x,y,z)$-space, which has been utilized in many primal-dual feasible- and infeasible-interior-point methods ([5, 7, 8, 9, 11], etc.); $(dx, dy, dz)$ is given by

    $B\, dx = -(Bx - b)$,
    $B^T dy - dz = -(A^T w + B^T y - z - c)$,
    $Z\, dx + X\, dz = -(Xz - te)$.   (10)

In our framework, we have paid little attention to such search directions but assumed that the point $(\tilde x(t,w), \tilde y(t,w), \tilde z(t,w))$, into which all the flow induced by the search directions runs, is available. To develop practical computational methods, we need to weaken the assumption (D); at the least we need to replace "exact" by "approximate" in (D). In Algorithms 6.1 and 6.2, the exact optimizers $x = \tilde x(t,w)$ and $(y,z) = (\tilde y(t,w), \tilde z(t,w))$ are necessary to compute the gradient $\tilde f^p_w(t,w) = a - Ax$ and the Hessian matrix $\tilde f^p_{ww}(t,w) = A \Phi (I - \Phi B^T (B \Phi^2 B^T)^{-1} B \Phi) \Phi A^T$ with which we perform one Newton iteration to generate a new point in the $w$-space,

    $w + \theta(\lambda(t,w))\, dw$, where
    $H\, dw = Ax - a$,  $H = A \Phi (I - \Phi B^T G^{-1} B \Phi) \Phi A^T$,  $G = B \Phi^2 B^T$,  $\Phi = (\mathrm{diag}\, x)^{1/2} (\mathrm{diag}\, z)^{-1/2}$.   (11)

It should be noted, however, that even when the exact optimizers are unavailable, the search direction $dw$ above is well-defined for every pair of $x > 0$ and $z > 0$. Thus, for every $(t,x,y,z) \in R_{++} \times R^n_{++} \times R^k \times R^n_{++}$, the correspondence $w \to dw$ defines a search direction in the $w$-space.
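In particular, the direction defined by (11) can be formed from any interior pair $x > 0$, $z > 0$, whether or not they are the exact optimizers. A dense NumPy sketch (our own naming; a practical code would factor $G = B \Phi^2 B^T$ block by block when $B$ has the structure (3), rather than forming $H$ explicitly):

    import numpy as np

    def dw_direction(x, z, A, B, a):
        # Search direction dw in the w-space from (11):
        #   H dw = A x - a,  H = A Phi (I - Phi B' G^{-1} B Phi) Phi A',
        #   G = B Phi^2 B',  Phi = (diag x)^{1/2} (diag z)^{-1/2}
        phi = np.sqrt(x / z)                 # diagonal of Phi
        AP = A * phi                         # A Phi
        BP = B * phi                         # B Phi
        G = BP @ BP.T                        # B Phi^2 B'
        W = np.linalg.solve(G, BP @ AP.T)    # G^{-1} B Phi^2 A'
        H = AP @ AP.T - (BP @ AP.T).T @ W    # A Phi (I - Phi B' G^{-1} B Phi) Phi A'
        return np.linalg.solve(H, A @ x - a)

At the exact optimizers this direction coincides with the Newton step of Algorithms 6.1 and 6.2, since there $\Phi = \tilde X / \sqrt t$, $H = \tilde f^p_{ww}(t,w)$ and $Ax - a = -\tilde f^p_w(t,w)$.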
Another critical question to be addressed is whether solving the system (11) to compute a search direction $dw$ in the $w$-space is essentially easier than one iteration of the primal-dual interior-point method applied to the original pair P and D without using the decomposition techniques. To compute $dw$ by solving (11), we need
(i) a construction of the matrix $H = A \Phi (I - \Phi B^T G^{-1} B \Phi) \Phi A^T$, which requires an (implicit) inversion of the matrix $G = B \Phi^2 B^T$, and
(ii) a solution of a system of equations $H\, dw = s$ for some $s \in R^m$.
On the other hand, given a current iterate $(x,w,y,z) \in R^n_{++} \times R^m \times R^k \times R^n_{++}$, the standard primal-dual interior-point method solves the following system of equations in the search direction $(dx, dw, dy, dz)$:

    $A\, dx = -(Ax - a)$,  $B\, dx = -(Bx - b)$,
    $A^T dw + B^T dy - dz = -(A^T w + B^T y - z - c)$,
    $X\, dz + Z\, dx = -(Xz - te)$,   (12)

where $t \geq 0$. We can easily verify that the search direction $(dx, dw, dy, dz)$ can be represented as

    $H\, dw = Ax - a - A \Phi^2 B^T G^{-1} (Bx - b) - A \Phi (I - \Phi B^T G^{-1} B \Phi)(\Phi r + s)$,
    $G\, dy = Bx - b - B \Phi (\Phi A^T dw + \Phi r + s)$,
    $dz = A^T dw + B^T dy + r$,
    $\Phi^{-1} dx = -\Phi\, dz - s$,   (13)

where $r = A^T w + B^T y - z - c$ and $s = X^{-1/2} Z^{-1/2} (Xz - te)$. This representation (13) of the solution $(dx, dw, dy, dz)$ of (12) is called a compact-inverse implementation in the literature [2]. Hence, if we utilize the compact-inverse implementation (13), the majority of the computation of $(dx, dw, dy, dz)$ is also spent on (i) and (ii). Therefore almost the same amount of computational work is required to solve (11) for a search direction $dw$ as is required for one iteration of the compact-inverse implementation of the primal-dual interior-point method applied to the original pair P and D without using the decomposition techniques. This observation is somewhat ironic in view of the nice theoretical results obtained above with the use of Newton's method, but it sheds new light on the compact-inverse implementation. In particular, we observe that when $x = \tilde x(t,w)$, $y = \tilde y(t,w)$ and $z = \tilde z(t,w)$, (13) turns out to be

    $H\, dw = Ax - a$,
    $G\, dy = -B \Phi^2 A^T dw$,
    $dz = A^T dw + B^T dy$,
    $\Phi^{-1} dx = -\Phi\, dz$,

hence $dw$ determined by the compact-inverse implementation (13) is exactly the same as the search direction $dw$ in the $w$-space determined by (11).

We conclude the paper with a general idea of computational methods consisting of the following three steps:

Step xyz: Given $(t,w) \in R_{++} \times R^m$ and $(x,y,z) \in R^n_{++} \times R^k \times R^n_{++}$, compute a new iterate $(x', y', z')$ by $x' = x + \sigma_p\, dx$ and $(y', z') = (y,z) + \sigma_d\, (dy, dz)$, where $(dx, dy, dz)$ is given by (10) and $\sigma_p, \sigma_d \geq 0$ are step lengths.
Step w: Given $t > 0$, $(x,y,z) \in R^n_{++} \times R^k \times R^n_{++}$ and $w \in R^m$, choose a search direction $dw$ in the $w$-space and generate a new iterate $w'$ by $w' = w + \sigma\, dw$, where $\sigma > 0$ is a step length.
Step t: Decrease $t$.

Algorithms 6.1 and 6.2 may be regarded as ideal cases of such methods in which we always perform infinitely many iterations of Step xyz, to compute the exact optimizers $\tilde x(t,w)$ of $f^p(t,w,\cdot)$ and $(\tilde y(t,w), \tilde z(t,w))$ of $f^d(t,w,\cdot,\cdot)$, before we perform Step w with the search direction $dw$ determined by (11).
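For reference, the compact-inverse representation (13) transcribes almost literally into code. The following dense sketch (our own function name, not from the paper or from [2]) returns the search direction $(dx, dw, dy, dz)$ of (12); at $(x,y,z) = (\tilde x(t,w), \tilde y(t,w), \tilde z(t,w))$ it reduces to (11), as observed above.

    import numpy as np

    def compact_inverse_step(t, x, w, y, z, A, B, a, b, c):
        # Search direction (dx, dw, dy, dz) of (12) via the representation (13).
        r = A.T @ w + B.T @ y - z - c                 # dual residual
        s = (x * z - t) / np.sqrt(x * z)              # X^{-1/2} Z^{-1/2} (Xz - te)
        phi = np.sqrt(x / z)                          # diagonal of Phi
        AP, BP = A * phi, B * phi                     # A Phi, B Phi
        G = BP @ BP.T                                 # B Phi^2 B'
        Ginv_BP = np.linalg.solve(G, BP)              # G^{-1} B Phi

        def proj(v):
            # apply I - Phi B' G^{-1} B Phi
            return v - BP.T @ (Ginv_BP @ v)

        H = AP @ proj(AP.T)                           # A Phi (I - Phi B' G^{-1} B Phi) Phi A'
        rhs = (A @ x - a
               - AP @ (phi * (B.T @ np.linalg.solve(G, B @ x - b)))
               - AP @ proj(phi * r + s))
        dw = np.linalg.solve(H, rhs)
        dy = np.linalg.solve(G, B @ x - b - BP @ (phi * (A.T @ dw) + phi * r + s))
        dz = A.T @ dw + B.T @ dy + r
        dx = -phi * (phi * dz + s)                    # Phi^{-1} dx = -Phi dz - s
        return dx, dw, dy, dz

Step lengths that keep $x$ and $z$ strictly positive, as in Step xyz and Step w above, would then be applied to the returned directions.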
Acknowledgment. We would like to thank an anonymous referee of the paper [4], which had been submitted to Operations Research Letters. Although that paper was not accepted for publication, the referee raised some interesting and significant points. In particular, the referee suggested that we examine whether the real-valued function

    $f(w) = \max\{\, w^T(Bx - b) + \sum_{j=1}^n \ln x_j : e^T x = 1,\ x > 0 \,\}$

on $R^m$ is self-concordant. This question motivated the study of the subject of the current paper.

References

[1] J. R. Birge and L. Qi, "Computing block-angular Karmarkar projections with applications to stochastic programming," Management Science 34 (1988) 1472-1479.
[2] I. C. Choi and D. Goldfarb, "Exploiting special structure in a primal-dual path-following algorithm," Mathematical Programming 58 (1993) 33-52.
[3] M. W. Hirsch and S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra (Academic Press, New York, 1974).
[4] M. Kojima, N. Megiddo and S. Mizuno, "A Lagrangian relaxation method for approximating the analytic center of a polytope," Technical Report, IBM Almaden Research Center, San Jose, CA 95120-6099, USA, 1992.
[5] M. Kojima, S. Mizuno and A. Yoshise, "A primal-dual interior point algorithm for linear programming," in: N. Megiddo, ed., Progress in Mathematical Programming, Interior-Point and Related Methods (Springer-Verlag, New York, 1989) 29-47.
[6] L. S. Lasdon, Optimization Theory for Large Systems (Macmillan, New York, 1970).
[7] I. J. Lustig, "Feasibility issues in a primal-dual interior-point method for linear programming," Mathematical Programming 49 (1990/91) 145-162.
[8] R. Marsten, R. Subramanian, M. Saltzman, I. J. Lustig and D. Shanno, "Interior point methods for linear programming: Just call Newton, Lagrange, and Fiacco and McCormick!," Interfaces 20 (1990) 105-116.
[9] N. Megiddo, "Pathways to the optimal set in linear programming," in: N. Megiddo, ed., Progress in Mathematical Programming, Interior-Point and Related Methods (Springer-Verlag, New York, 1989) 131-158.
[10] Ju. E. Nesterov and A. S. Nemirovsky, "Self-concordant functions and polynomial-time methods in convex programming," Report, Central Economical and Mathematical Institute, USSR Academy of Sciences (Moscow, USSR, 1989).
[11] K. Tanabe, "Centered Newton method for mathematical programming," in: M. Iri and K. Yajima, eds., System Modeling and Optimization (Springer-Verlag, New York, 1988) 197-206.
[12] M. J. Todd, "Exploiting special structure in Karmarkar's linear programming algorithm," Mathematical Programming 41 (1988) 97-113.
[13] M. J. Todd, "Recent developments and new directions in linear programming," in: M. Iri and K. Tanabe, eds., Recent Developments and Applications (Kluwer Academic Publishers, London, 1989) 109-157.

Appendix. Calculation of Derivatives of $\tilde f^p$.

Differentiating the identities in (5) with respect to $w \in R^m$, we first observe

    $B \tilde x_w(t,w) = O$,
    $A^T + B^T \tilde y_w(t,w) - \tilde z_w(t,w) = O$,
    $\tilde Z(t,w) \tilde x_w(t,w) + \tilde X(t,w) \tilde z_w(t,w) = O$   (14)

for every $(t,w) \in R_{++} \times R^m$. Differentiating the identities in (5) with respect to $t \in R_{++}$, we also see

    $B \tilde x_t(t,w) = 0$,
    $B^T \tilde y_t(t,w) - \tilde z_t(t,w) = 0$,
    $\tilde Z(t,w) \tilde x_t(t,w) + \tilde X(t,w) \tilde z_t(t,w) = e$   (15)

for every $(t,w) \in R_{++} \times R^m$. By solving (14) for $\tilde x_w(t,w)$, $\tilde y_w(t,w)$, $\tilde z_w(t,w)$ and (15) for $\tilde x_t(t,w)$, $\tilde y_t(t,w)$, $\tilde z_t(t,w)$, we obtain the following lemma.

Lemma A.
(i) For every $(t,w) \in R_{++} \times R^m$,

    $\tilde x_w(t,w) = -\dfrac{1}{t} \tilde X (I - \tilde X B^T (B \tilde X^2 B^T)^{-1} B \tilde X) \tilde X A^T$,
    $\tilde y_w(t,w) = -(B \tilde X^2 B^T)^{-1} B \tilde X^2 A^T$,
    $\tilde z_w(t,w) = \tilde X^{-1} (I - \tilde X B^T (B \tilde X^2 B^T)^{-1} B \tilde X) \tilde X A^T$.

(ii) For every $(t,w) \in R_{++} \times R^m$,

    $\tilde x_t(t,w) = \dfrac{1}{t} \tilde X (I - \tilde X B^T (B \tilde X^2 B^T)^{-1} B \tilde X) e$,
    $\tilde y_t(t,w) = (B \tilde X^2 B^T)^{-1} B \tilde X e$,
    $\tilde z_t(t,w) = B^T (B \tilde X^2 B^T)^{-1} B \tilde X e$.

Here $\tilde X = \tilde X(t,w)$ and $\tilde Z = \tilde Z(t,w)$.
Proof: (i) It follows from the second and third identities of (14) that

    $\tilde x_w(t,w) = -\tilde X \tilde Z^{-1} \tilde z_w(t,w) = -\tilde X \tilde Z^{-1} (A^T + B^T \tilde y_w(t,w))$.

Hence, by the first identity of (14),

    $O = B \tilde x_w(t,w) = -B \tilde X \tilde Z^{-1} (A^T + B^T \tilde y_w(t,w))$.

Therefore we obtain

    $\tilde y_w(t,w) = -(B \tilde X \tilde Z^{-1} B^T)^{-1} B \tilde X \tilde Z^{-1} A^T = -(B \tilde X^2 B^T)^{-1} B \tilde X^2 A^T$ (since $\tilde X \tilde Z = tI$),
    $\tilde z_w(t,w) = A^T + B^T \tilde y_w(t,w) = A^T - B^T (B \tilde X^2 B^T)^{-1} B \tilde X^2 A^T = \tilde X^{-1} (I - \tilde X B^T (B \tilde X^2 B^T)^{-1} B \tilde X) \tilde X A^T$,
    $\tilde x_w(t,w) = -\tilde X \tilde Z^{-1} \tilde z_w(t,w) = -\dfrac{1}{t} \tilde X (I - \tilde X B^T (B \tilde X^2 B^T)^{-1} B \tilde X) \tilde X A^T$.

(ii) In view of the second and third identities of (15), we see that

    $\tilde x_t(t,w) = \tilde Z^{-1} e - \tilde X \tilde Z^{-1} \tilde z_t(t,w) = \tilde Z^{-1} e - \tilde X \tilde Z^{-1} B^T \tilde y_t(t,w)$.

Hence, by the first identity of (15),

    $0 = B \tilde x_t(t,w) = B \tilde Z^{-1} e - B \tilde X \tilde Z^{-1} B^T \tilde y_t(t,w)$.

Therefore we obtain

    $\tilde y_t(t,w) = (B \tilde X \tilde Z^{-1} B^T)^{-1} B \tilde Z^{-1} e = (B \tilde X^2 B^T)^{-1} B \tilde X e$,
    $\tilde z_t(t,w) = B^T (B \tilde X^2 B^T)^{-1} B \tilde X e$,
    $\tilde x_t(t,w) = \tilde Z^{-1} e - \tilde X \tilde Z^{-1} \tilde z_t(t,w) = \dfrac{1}{t} \tilde X (I - \tilde X B^T (B \tilde X^2 B^T)^{-1} B \tilde X) e$.

Proof of Theorem 4.1: By the definition,

    $\tilde f^p_w(t,w) = \tilde x_w(t,w)^T f^p_x(t,w,\tilde x(t,w)) - (A \tilde x(t,w) - a)$
    $= \tilde x_w(t,w)^T (B^T \tilde y(t,w)) - (A \tilde x(t,w) - a)$
    $= (B \tilde x_w(t,w))^T \tilde y(t,w) - (A \tilde x(t,w) - a)$ (since $B \tilde x_w(t,w) = O$ by (14))
    $= a - A \tilde x(t,w)$.

Thus we have shown (i). By the definition and (i), we see that $\tilde f^p_{ww}(t,w) = -A \tilde x_w(t,w)$. Hence we obtain by (i) of Lemma A and $\tilde X = \sqrt t\, \tilde\Phi$ that

    $\tilde f^p_{ww}(t,w) = \dfrac{1}{t} A \tilde X (I - \tilde X B^T (B \tilde X^2 B^T)^{-1} B \tilde X) \tilde X A^T = A \tilde\Phi (I - \tilde\Phi B^T (B \tilde\Phi^2 B^T)^{-1} B \tilde\Phi) \tilde\Phi A^T$.

This completes the proof of Theorem 4.1.

Proof of Theorem 4.3: For simplicity of notation, we use

    $x(t,s)$ to denote $\tilde x(t, w + sh)$,
    $X$ to denote $X(t,s) = \tilde X(t, w + sh)$,
    $G$ to denote $B X^2 B^T$,
    $P$ to denote $I - X B^T G^{-1} B X$,

respectively.

(i) By (i) of Theorem 4.1,

    $\dfrac{d \tilde f^p(t,w+sh)}{ds} = \tilde f^p_w(t, w+sh)^T h = (a - A x(t,s))^T h$.

Thus we have shown (i).

(ii) By (ii) of Theorem 4.1,

    $\dfrac{d^2 \tilde f^p(t,w+sh)}{ds^2} = h^T \tilde f^p_{ww}(t, w+sh)\, h = \dfrac{1}{t} h^T A X P X A^T h = \dfrac{1}{t} u(t,s)^T u(t,s)$.

Thus we have shown (ii).

(iii) We first observe that

    $\dfrac{dx(t,s)}{ds} = \dfrac{d \tilde x(t,w+sh)}{ds} = \tilde x_w(t,w+sh)\, h = -\dfrac{1}{t} X P X A^T h = -\dfrac{1}{t} X u(t,s)$.

On the other hand, we see by (ii) that

    $\dfrac{d^3 \tilde f^p(t,w+sh)}{ds^3} = \dfrac{2}{t} u(t,s)^T \dfrac{du(t,s)}{ds}$.

To evaluate $\dfrac{du(t,s)}{ds}$, let

    $q(t,s) = X A^T h$ and $r(t,s) = G^{-1} B X^2 A^T h$.

Then

    $G\, r(t,s) = B X^2 A^T h$,  $u(t,s) = P X A^T h = q(t,s) - X B^T r(t,s)$.   (16)

It follows that

    $\dfrac{dq(t,s)}{ds} = X_s A^T h$,
    $2 B X X_s B^T r(t,s) + B X^2 B^T \dfrac{dr(t,s)}{ds} = 2 B X X_s A^T h$.

Here $X_s = \mathrm{diag}\, \dfrac{dx(t,s)}{ds}$. Hence

    $G \dfrac{dr(t,s)}{ds} = 2 B X X_s A^T h - 2 B X X_s B^T r(t,s) = 2 B X X_s A^T h - 2 B X X_s B^T G^{-1} B X^2 A^T h = 2 B X_s P X A^T h$,
    $\dfrac{dr(t,s)}{ds} = 2 G^{-1} B X_s P X A^T h$,
    $\dfrac{du(t,s)}{ds} = \dfrac{dq(t,s)}{ds} - X_s B^T r(t,s) - X B^T \dfrac{dr(t,s)}{ds}$
    $= X_s A^T h - X_s B^T G^{-1} B X^2 A^T h - 2 X B^T G^{-1} B X_s P X A^T h$
    $= (I - 2 X B^T G^{-1} B X) X^{-1} X_s P X A^T h$.

Therefore

    $\dfrac{d^3 \tilde f^p(t,w+sh)}{ds^3} = \dfrac{2}{t} u(t,s)^T \dfrac{du(t,s)}{ds} = \dfrac{2}{t} h^T A X P (I - 2 X B^T G^{-1} B X) X^{-1} X_s P X A^T h = \dfrac{2}{t} h^T A X P X^{-1} X_s P X A^T h$
    $= -\dfrac{2}{t} u(t,s)^T \left( \dfrac{1}{t}\, \mathrm{diag}\, u(t,s) \right) u(t,s) = -\dfrac{2}{t^2} \sum_{j=1}^n u_j(t,s)^3$.

Thus we have shown (iii).

(iv) By (i), we see that

    $\dfrac{d^2 \tilde f^p(t,w+sh)}{dt\,ds} = \dfrac{d\, ((a - A x(t,s))^T h)}{dt} = -h^T A \tilde x_t(t,w+sh) = -\dfrac{1}{t} h^T A X P e = -\dfrac{1}{t} u(t,s)^T v(t,s)$.

(v) By (ii), we see that

    $\dfrac{d^3 \tilde f^p(t,w+sh)}{dt\,ds^2} = \dfrac{d}{dt} \left( \dfrac{u(t,s)^T u(t,s)}{t} \right) = \dfrac{2}{t} u(t,s)^T \dfrac{du(t,s)}{dt} - \dfrac{u(t,s)^T u(t,s)}{t^2}$.

Using $q(t,s)$ and $r(t,s)$, we represent $u(t,s)$ as in (16) to evaluate $\dfrac{du(t,s)}{dt}$. By the definition of $q(t,s)$ and $r(t,s)$, we see that

    $\dfrac{dq(t,s)}{dt} = X_t A^T h$,
    $2 B X X_t B^T r(t,s) + B X^2 B^T \dfrac{dr(t,s)}{dt} = 2 B X X_t A^T h$.

Here $X_t = \mathrm{diag}\, \dfrac{dx(t,s)}{dt}$.
Hence

    $G \dfrac{dr(t,s)}{dt} = 2 B X X_t A^T h - 2 B X X_t B^T r(t,s) = 2 B X X_t A^T h - 2 B X X_t B^T G^{-1} B X^2 A^T h = 2 B X_t P X A^T h$,
    $\dfrac{dr(t,s)}{dt} = 2 G^{-1} B X_t P X A^T h$,
    $\dfrac{du(t,s)}{dt} = \dfrac{dq(t,s)}{dt} - X_t B^T r(t,s) - X B^T \dfrac{dr(t,s)}{dt}$
    $= X_t A^T h - X_t B^T G^{-1} B X^2 A^T h - 2 X B^T G^{-1} B X_t P X A^T h$
    $= (I - 2 X B^T G^{-1} B X) X^{-1} X_t P X A^T h$.

Thus we obtain

    $u(t,s)^T \dfrac{du(t,s)}{dt} = h^T A X P (I - 2 X B^T G^{-1} B X) X^{-1} X_t P X A^T h = h^T A X P X^{-1} X_t P X A^T h$.

By the definition of $u(t,s)$, $x(t,s)$, $X_t$ and Lemma A, we know that

    $u(t,s) = P X A^T h$,
    $\dfrac{dx(t,s)}{dt} = \tilde x_t(t, w + sh) = \dfrac{1}{t} X v(t,s)$,
    $X^{-1} X_t = X^{-1}\, \mathrm{diag}\, \dfrac{dx(t,s)}{dt} = \dfrac{1}{t}\, \mathrm{diag}\, v(t,s)$.

Therefore

    $\dfrac{d^3 \tilde f^p(t,w+sh)}{dt\,ds^2} = \dfrac{2}{t} u(t,s)^T \dfrac{du(t,s)}{dt} - \dfrac{u(t,s)^T u(t,s)}{t^2}$
    $= \dfrac{2\, u(t,s)^T (\mathrm{diag}\, v(t,s))\, u(t,s)}{t^2} - \dfrac{u(t,s)^T u(t,s)}{t^2}$
    $= \dfrac{u(t,s)^T (2\, \mathrm{diag}\, v(t,s) - I)\, u(t,s)}{t^2}$.

This completes the proof of Theorem 4.3.