
Horizontal and Vertical Decomposition
in Interior Point Methods for Linear Programs
Masakazu Kojima
Nimrod Megiddo
Shinji Mizuno
Susumu Shindoh
Abstract. Corresponding to the linear program

  Maximize $c^T x$ subject to $Ax = a$, $Bx = b$, $x \ge 0$,

we introduce two functions in the penalty parameter $t > 0$ and the Lagrange relaxation parameter vector $w$,

  $\tilde{f}^p(t,w) = \max\{\, c^T x - w^T(Ax - a) + t \sum_{j=1}^n \ln x_j : Bx = b, \ x > 0 \,\}$  (for horizontal decomposition),

  $\tilde{f}^d(t,w) = \min\{\, a^T w + b^T y - t \sum_{j=1}^n \ln z_j : B^T y - z = c - A^T w, \ z > 0 \,\}$  (for vertical decomposition).

For each $t > 0$, $\tilde{f}^p(t,\cdot)$ and $\tilde{f}^d(t,\cdot)$ are strictly convex $C^\infty$ functions with a common
minimizer $\hat{w}(t)$, which converges to an optimal Lagrange multiplier vector associated with the constraint $Ax = a$ as $t \to 0$, and they enjoy the strong self-concordance property given by Nesterov and Nemirovsky. Based on these facts, we present conceptual
algorithms that use Newton's method for tracing the trajectory $\{\hat{w}(t) : t > 0\}$,
and analyze their computational complexity.
1. Introduction.
This paper presents a theoretical framework for incorporating horizontal and vertical decomposition techniques into interior-point methods for linear programs (see for example the survey paper [13]).
Research supported in part by ONR Contract N00014-94-C-0007.
Masakazu Kojima: Department of Information Sciences, Tokyo Institute of Technology, Oh-Okayama, Meguro-ku, Tokyo 152, Japan.
Nimrod Megiddo: IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, CA 95120, and School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel.
Shinji Mizuno: The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan.
Susumu Shindoh: Department of Mathematics and Physics, The National Defense Academy, Hashirimizu 1-10-20, Yokosuka, Kanagawa 239, Japan.
Let $a \in R^m$, $b \in R^k$, $c \in R^n$, $A \in R^{m \times n}$ and $B \in R^{k \times n}$. We consider
the following equality form linear program P and its dual D:

  P:  Maximize $c^T x$  subject to $Ax = a$, $Bx = b$, $x \ge 0$.

  D:  Minimize $a^T w + b^T y$  subject to $A^T w + B^T y - z = c$, $z \ge 0$.
Let

  $R_{++} = \{t \in R : t > 0\}$  (the set of positive numbers),
  $R^n_{++} = \{x \in R^n : x > 0\}$  (the positive orthant),
  $P_{++} = \{x \in R^n_{++} : Ax = a, \ Bx = b\}$  (the interior of the feasible region of P),
  $D_{++} = \{(w,y,z) \in R^m \times R^k \times R^n_{++} : A^T w + B^T y - z = c\}$  (the interior of the feasible region of D),
  $Q_{++} = \{x \in R^n_{++} : Bx = b\}$,
  $D_{++}(w) = \{(y,z) \in R^k \times R^n_{++} : B^T y - z = c - A^T w\}$.

Then we obviously see that

  $P_{++} = \{x \in R^n : Ax = a\} \cap Q_{++}$,
  $D_{++} = \bigcup_{w \in R^m} \{(w,y,z) \in R^{m+k+n} : (y,z) \in D_{++}(w)\}$.
Throughout the paper we impose the following assumptions on the constraint set of P:

(A) $P_{++}$ is nonempty and bounded.

(B) $Q_{++}$ is nonempty and bounded.

(C) The $(m+k) \times n$ matrix $\begin{pmatrix} A \\ B \end{pmatrix}$ has full row rank.
For every $(t, x, w, y, z) \in R_{++} \times R^n_{++} \times R^m \times R^k \times R^n_{++}$, define

  $f^p(t, w, x) = c^T x - w^T(Ax - a) + t \sum_{j=1}^n \ln x_j$,

  $f^d(t, w, y, z) = a^T w + b^T y - t \sum_{j=1}^n \ln z_j$.

Here $t \in R_{++}$ denotes the barrier parameter and $w \in R^m$ the Lagrange multiplier vector
associated with the constraint $Ax = a$.
We can regard the horizontal (or vertical) decomposition technique as a numerical tracing
of the trajectory which consists of the solution $\hat{w}(t)$ ($t \in R_{++}$) of the following parametric
min-max (or min-min) problem:

  minimize$_{w \in R^m}$  maximize $\{f^p(t, w, x) : x \in Q_{++}\}$,   (1)

  minimize$_{w \in R^m}$  minimize $\{f^d(t, w, y, z) : (y,z) \in D_{++}(w)\}$.   (2)
Under the assumptions (A), (B) and (C), the inner maximization problem in (1) (or the
inner minimization problem in (2)) has a unique solution for every $(t, w) \in R_{++} \times R^m$, so
that we can consistently define the optimal value functions and the optimizers of the inner
problems: for every $(t, w) \in R_{++} \times R^m$, let

  $\tilde{f}^p(t, w) = \max\{f^p(t, w, x) : x \in Q_{++}\}$,
  $\tilde{x}(t, w) = \arg\max\{f^p(t, w, x) : x \in Q_{++}\}$,
  $\tilde{f}^d(t, w) = \min\{f^d(t, w, y, z) : (y,z) \in D_{++}(w)\}$,
  $(\tilde{y}(t, w), \tilde{z}(t, w)) = \arg\min\{f^d(t, w, y, z) : (y,z) \in D_{++}(w)\}$.
We are mainly concerned with theoretical aspects of the horizontal and vertical decomposition techniques. In particular we show the following features.

(a) For every $t \in R_{++}$, the function $\tilde{f}^p(t, \cdot) : R^m \to R$ is a strictly convex $C^\infty$ function
with the unique minimizer $\hat{w}(t)$ over $R^m$.

(b) For every $(t, w) \in R_{++} \times R^m$, we can use the primal-dual interior-point method
([5, 7, 8, 9, 11], etc.) to compute $(\tilde{x}(t,w), \tilde{y}(t,w), \tilde{z}(t,w))$, $\tilde{f}^p(t,w)$, the gradient
vector $\tilde{f}^p_w(t,w)$ with respect to $w$ and the Hessian matrix $\tilde{f}^p_{ww}(t,w)$ with respect to $w$.

(c) The set $\{(\tilde{x}(t,\hat{w}(t)), \hat{w}(t), \tilde{y}(t,\hat{w}(t)), \tilde{z}(t,\hat{w}(t))) : t \in R_{++}\}$ forms the central trajectory [9] converging to the analytic center of the optimal solution set of the primal-dual
pair of linear programs P and D.

(d) $\{\tilde{f}^p(t, \cdot) : t \in R_{++}\}$ is a strongly self-concordant family [10] (with the parameter
functions $\alpha(t) = t$, $\xi(t) = \sqrt{n}/t$, $\sigma(t) = \sqrt{n}/(2t)$, and the remaining parameter functions identically equal to $1$).

(e) $\tilde{f}^d(t,w) - \tilde{f}^p(t,w) = nt(1 - \ln t)$ for every $(t,w) \in R_{++} \times R^m$. Note that the right
hand side is independent of $w$. Hence (a), (b), (c) and (d) above remain valid even if
we replace $\tilde{f}^p$ by $\tilde{f}^d$.
In view of the features (a), (b), (c) and (e) above, we obtain approximate optimal
solutions of P and D if we numerically trace the trajectory $\{\hat{w}(t) : t > 0\}$ in the $w$-space
until the barrier parameter $t \in R_{++}$ gets sufficiently small. It should be emphasized that
$\hat{w}(t)$ is the common minimizer of the strictly convex $C^\infty$ functions $\tilde{f}^p(t,\cdot)$ and $\tilde{f}^d(t,\cdot)$ over the
entire space $R^m$. Thus we can utilize various unconstrained minimization techniques such
as Newton's method, quasi-Newton methods and conjugate gradient methods to approximate $\hat{w}(t)$.

The feature (d) is a major theoretical contribution of this paper, which makes it possible
to effectively utilize Newton's method for tracing the trajectory $\{\hat{w}(t) : t > 0\}$. The notion
and the theory of self-concordance were given by Nesterov and Nemirovsky [10] for a wide
class of polynomial-time interior-point methods for convex programming.
After listing in Section 2 the symbols and notation used throughout the paper, we show
in Section 3 that the inner optimization problems in (1) and (2) share a common necessary and sufficient optimality condition, and derive the feature (e) from that condition.
Section 4 describes several derivatives of the function $\tilde{f}^p : R_{++} \times R^m \to R$, including the
gradient $\tilde{f}^p_w(t,w)$ and the positive definite Hessian matrix $\tilde{f}^p_{ww}(t,w)$ with respect to $w \in R^m$. Details
of the calculation of derivatives are given in the Appendix. Section 5 establishes that the family
$\{\tilde{f}^p(t,\cdot) : t \in R_{++}\}$ is strongly self-concordant. In Section 6 we present conceptual decomposition algorithms with the use of Newton's method, and investigate their polynomial-time
computational complexity.
Although we will state a horizontal decomposition for the general linear program P and
a vertical decomposition mainly for its dual D, it is interesting in practice to apply them
to a special case where the matrix $B$ has a block diagonal structure:

  $B = \begin{pmatrix} B_1 & O & \cdots & O \\ O & B_2 & \cdots & O \\ & & \ddots & \\ O & O & \cdots & B_\ell \end{pmatrix}$.   (3)

Our horizontal and vertical decomposition techniques applied to such special cases correspond
roughly to the Dantzig-Wolfe and the Benders decomposition methods, respectively.
See for example the book [6]. It should be noted that when the matrix $B$ has
a block diagonal structure as in (3), we can decompose the inner maximization problem in
(1) (or the inner minimization problem in (2)) into $\ell$ smaller subproblems of the same type, as sketched below.
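As an illustration of this decoupling, the following minimal Python/NumPy sketch (ours, not part of the original paper) assembles a block diagonal $B$ from blocks $B_1, \ldots, B_\ell$ and checks that, for any $x > 0$, the matrix $BX^2B^T$ coupling the inner problem's Newton equations inherits the same block diagonal pattern, so each block can be treated as an independent subproblem. The block sizes and data are arbitrary placeholders.

```python
import numpy as np

def block_diag(blocks):
    """Assemble a block diagonal matrix B from a list of blocks B_1, ..., B_l."""
    rows = sum(Bi.shape[0] for Bi in blocks)
    cols = sum(Bi.shape[1] for Bi in blocks)
    B = np.zeros((rows, cols))
    r = c = 0
    for Bi in blocks:
        B[r:r + Bi.shape[0], c:c + Bi.shape[1]] = Bi
        r += Bi.shape[0]
        c += Bi.shape[1]
    return B

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((2, 4)), rng.standard_normal((3, 5))]  # B_1, B_2 (placeholders)
B = block_diag(blocks)

x = rng.random(B.shape[1]) + 0.1          # any x > 0
G = B @ np.diag(x**2) @ B.T               # G = B X^2 B^T, the coupling matrix

# Off-diagonal coupling between the two blocks vanishes, so the inner
# Newton systems split into one independent system per block.
print(np.allclose(G[:2, 2:], 0.0), np.allclose(G[2:, :2], 0.0))  # True True
```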
Many studies ([1, 2, 12], etc.) have been done on how to exploit special structures such
as generalized upper bounds, block diagonal structures and staircase structures in interior-point methods. The conceptual algorithms given in Section 6 have a close relation to
a compact-inverse implementation (see Section 9 of [2]) of the primal-dual interior-point
method applied to P and D. This is shown in Section 7, where we discuss some issues
toward implementation of the horizontal and vertical decomposition techniques.
2. Notation.

  $R_{++} = \{t \in R : t > 0\}$,   $R^n_{++} = \{x \in R^n : x > 0\}$
  $P_{++} = \{x \in R^n_{++} : Ax = a, \ Bx = b\}$
  $D_{++} = \{(w,y,z) \in R^m \times R^k \times R^n_{++} : A^T w + B^T y - z = c\}$
  $Q_{++} = \{x \in R^n_{++} : Bx = b\}$
  $D_{++}(w) = \{(y,z) \in R^k \times R^n_{++} : B^T y - z = c - A^T w\}$
  $f^p(t,w,x) = c^T x - w^T(Ax - a) + t \sum_{j=1}^n \ln x_j$
  $f^d(t,w,y,z) = a^T w + b^T y - t \sum_{j=1}^n \ln z_j$
  $\tilde{f}^p(t,w) = \max\{f^p(t,w,x) : x \in Q_{++}\}$
  $\tilde{x}(t,w) = \arg\max\{f^p(t,w,x) : x \in Q_{++}\}$
  $\tilde{f}^d(t,w) = \min\{f^d(t,w,y,z) : (y,z) \in D_{++}(w)\}$
  $(\tilde{y}(t,w), \tilde{z}(t,w)) = \arg\min\{f^d(t,w,y,z) : (y,z) \in D_{++}(w)\}$
  $\tilde{X} = \tilde{X}(t,w) = \mathrm{diag}\,\tilde{x}(t,w)$,   $\tilde{Z} = \tilde{Z}(t,w) = \mathrm{diag}\,\tilde{z}(t,w)$
  $\tilde{\Delta} = \tilde{X}^{1/2}\tilde{Z}^{-1/2}$
  $\hat{w}(t) = \arg\min\{\tilde{f}^p(t,w) : w \in R^m\}$
  $\beta = 2 - 3^{1/2} = 0.2679\ldots$,   $\alpha = 1/4 = 0.25$.
3. Duality between $\tilde{f}^p$ and $\tilde{f}^d$.

Given $t > 0$ and $w \in R^m$, the term $a^T w$ involved in the objective functions $f^p(t,w,\cdot)$ of the
inner problem in (1) and $f^d(t,w,\cdot)$ of the inner problem in (2) is constant. Hence we may
eliminate it from the objective functions when we are concerned with the inner problems in
(1) and (2). Thus the pair of the inner problems is equivalent to the pair:

  P$(t,w)$:  Maximize $(c - A^T w)^T x + t \sum_{j=1}^n \ln x_j$  subject to $x \in Q_{++}$.
  D$(t,w)$:  Minimize $b^T y - t \sum_{j=1}^n \ln z_j$  subject to $(y,z) \in D_{++}(w)$.

This pair has exactly the same structure as the one often studied with the primal-dual
interior-point method ([5, 7, 8, 9, 11], etc.). Therefore we can apply the primal-dual interior-point method to the pair of problems above. By using the well-known argument (see for
example the paper [9]), we obtain the theorem below, which states a common necessary and
sufficient optimality condition for the inner maximization problem in (1) (i.e., P$(t,w)$)
and the inner minimization problem in (2) (i.e., D$(t,w)$).
Theorem 3.1. Let $(t,w) \in R_{++} \times R^m$. Then $x \in R^n$ is a maximizer of $f^p(t,w,\cdot)$ over
$Q_{++}$ and $(y,z) \in R^{k+n}$ is a minimizer of $f^d(t,w,\cdot)$ over $D_{++}(w)$ if and only if

  $Bx - b = 0$, $x > 0$,
  $A^T w + B^T y - z - c = 0$, $z > 0$,   (4)
  $Xz - te = 0$.

Here $X$ denotes the diagonal matrix of the coordinates of $x \in R^n$ and $e = (1, \ldots, 1)^T \in R^n$.

Let $(t,w) \in R_{++} \times R^m$. Suppose that $(x,y,z)$ satisfies (4). Then $(w,y,z)$ is a feasible
solution of the original dual linear program D. Hence $a^T w + b^T y$ gives an upper bound
on the maximal objective value of the original primal linear program P to be solved. On
the other hand, $x$ is not a feasible solution of P unless the additional equality $Ax - a = 0$
holds. It will be shown in Section 4 that the equality holds when and only when $w$ is a
minimizer of the outer problem in (1), i.e., a minimizer of $\tilde{f}^p(t,\cdot)$ over $R^m$.
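Computing the inner optimizers amounts to solving the system (4) for a given $(t,w)$. The following Python/NumPy sketch (ours, not the authors' implementation) applies a damped Newton method directly to (4); the problem instance, the starting point and the step-length rule are placeholder assumptions, and convergence is only to be expected for well-behaved data.

```python
import numpy as np

def solve_inner(A, B, a, b, c, w, t, x, y, z, iters=50, tol=1e-10):
    """Damped Newton method on system (4):
       Bx - b = 0,  A^T w + B^T y - z - c = 0,  Xz - t e = 0,  x > 0, z > 0."""
    n, k = len(c), len(b)
    e = np.ones(n)
    for _ in range(iters):
        r1 = B @ x - b
        r2 = A.T @ w + B.T @ y - z - c
        r3 = x * z - t * e
        if max(np.linalg.norm(r1), np.linalg.norm(r2), np.linalg.norm(r3)) < tol:
            break
        # Newton system for (dx, dy, dz); cf. the system (10) in Section 7.
        K = np.zeros((2 * n + k, 2 * n + k))
        K[:k, :n] = B
        K[k:k + n, n:n + k] = B.T
        K[k:k + n, n + k:] = -np.eye(n)
        K[k + n:, :n] = np.diag(z)
        K[k + n:, n + k:] = np.diag(x)
        d = np.linalg.solve(K, -np.concatenate([r1, r2, r3]))
        dx, dy, dz = d[:n], d[n:n + k], d[n + k:]
        # Fraction-to-the-boundary damping keeps x > 0 and z > 0.
        step = 1.0
        for v, dv in ((x, dx), (z, dz)):
            neg = dv < 0
            if neg.any():
                step = min(step, 0.95 * np.min(-v[neg] / dv[neg]))
        x, y, z = x + step * dx, y + step * dy, z + step * dz
    return x, y, z

# Tiny placeholder instance: Q_{++} = {x > 0 : sum(x) = 1} is nonempty and bounded.
A = np.array([[1.0, 2.0, 0.0, 0.0]]); a = np.array([0.5])
B = np.ones((1, 4));                  b = np.array([1.0])
c = np.array([1.0, 0.5, -0.2, 0.3])
w = np.zeros(1); t = 0.1
x0 = np.full(4, 0.25); y0 = np.zeros(1); z0 = np.ones(4)
x_t, y_t, z_t = solve_inner(A, B, a, b, c, w, t, x0, y0, z0)
print(np.allclose(x_t * z_t, t), np.allclose(B @ x_t, b))
```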
Recall that $\tilde{x}(t,w)$ and $(\tilde{y}(t,w), \tilde{z}(t,w))$ denote the maximizer of the inner problem in
(1) and the minimizer of the inner problem in (2), respectively. For every fixed $w \in R^m$, the
set $\{(\tilde{x}(t,w), \tilde{y}(t,w), \tilde{z}(t,w)) : t \in R_{++}\}$ forms the central trajectory of the primal-dual
pair of linear programs:

  P$(w)$:  Maximize $(c - A^T w)^T x$  subject to $Bx = b$, $x \ge 0$.
  D$(w)$:  Minimize $b^T y$  subject to $B^T y - z = c - A^T w$, $z \ge 0$.

Theorem 3.1 implies that the relations

  $B\tilde{x}(t,w) - b = 0$, $\tilde{x}(t,w) > 0$,
  $A^T w + B^T\tilde{y}(t,w) - \tilde{z}(t,w) - c = 0$, $\tilde{z}(t,w) > 0$,   (5)
  $\tilde{X}(t,w)\tilde{z}(t,w) - te = 0$
hold for every $(t,w) \in R_{++} \times R^m$. It follows that

  $\tilde{f}^d(t,w) - \tilde{f}^p(t,w)$
  $= b^T\tilde{y}(t,w) - c^T\tilde{x}(t,w) + w^T A\tilde{x}(t,w) - t\sum_{j=1}^n \ln \tilde{x}_j(t,w)\tilde{z}_j(t,w)$
  $= \left(A^T w + B^T\tilde{y}(t,w) - c\right)^T\tilde{x}(t,w) - t\sum_{j=1}^n \ln t$
  $= \tilde{x}(t,w)^T\tilde{z}(t,w) - nt\ln t = nt(1 - \ln t)$.

Therefore we obtain:

Theorem 3.2. $\tilde{f}^d(t,w) - \tilde{f}^p(t,w) = nt(1 - \ln t)$ for every $(t,w) \in R_{++} \times R^m$.
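As a quick sanity check of Theorem 3.2, the short sketch below (ours; it assumes the solve_inner routine and the placeholder data from the sketch after Theorem 3.1) evaluates both inner optimal values and compares their difference with $nt(1 - \ln t)$.

```python
import numpy as np

# Assumes solve_inner, A, B, a, b, c, w, t, x0, y0, z0 from the previous sketch.
x_t, y_t, z_t = solve_inner(A, B, a, b, c, w, t, x0, y0, z0)

f_p = c @ x_t - w @ (A @ x_t - a) + t * np.sum(np.log(x_t))   # \tilde f^p(t, w)
f_d = a @ w + b @ y_t - t * np.sum(np.log(z_t))               # \tilde f^d(t, w)

n = len(c)
print(f_d - f_p, n * t * (1 - np.log(t)))   # the two numbers should agree
```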
4. Derivatives.

In this section, we present various derivatives of $\tilde{f}^p : R_{++} \times R^m \to R$. Details of the calculation
of derivatives are omitted here but are given in the Appendix.

Theorem 4.1.

(i) $\tilde{f}^p_w(t,w) = a - A\tilde{x}(t,w)$ for every $(t,w) \in R_{++} \times R^m$.

(ii) $\tilde{f}^p_{ww}(t,w) = A\tilde{\Delta}\left(I - \tilde{\Delta}B^T(B\tilde{\Delta}^2B^T)^{-1}B\tilde{\Delta}\right)\tilde{\Delta}A^T
 = \dfrac{1}{t}A\tilde{X}\left(I - \tilde{X}B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}\right)\tilde{X}A^T$
for every $(t,w) \in R_{++} \times R^m$.

Here $\tilde{\Delta} = \tilde{X}^{1/2}\tilde{Z}^{-1/2}$, $\tilde{X} = \tilde{X}(t,w) = \mathrm{diag}\,\tilde{x}(t,w)$ and $\tilde{Z} = \tilde{Z}(t,w) = \mathrm{diag}\,\tilde{z}(t,w)$.
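The formulas in Theorem 4.1 translate directly into code. The sketch below (ours; it again assumes solve_inner and the placeholder data introduced earlier) assembles $\tilde{f}^p_w(t,w)$ and $\tilde{f}^p_{ww}(t,w)$ and checks that the Hessian is positive definite, as Theorem 4.2 asserts.

```python
import numpy as np

def grad_hess(A, B, a, x_t, t):
    """Gradient and Hessian of \\tilde f^p(t, .) at w, given the inner maximizer x_t."""
    X = np.diag(x_t)
    G = B @ X @ X @ B.T                                          # B X^2 B^T
    P = np.eye(len(x_t)) - X @ B.T @ np.linalg.solve(G, B @ X)   # I - X B^T G^{-1} B X
    g = a - A @ x_t                                              # Theorem 4.1 (i)
    H = (A @ X @ P @ X @ A.T) / t                                # Theorem 4.1 (ii)
    return g, H

# Assumes solve_inner and the placeholder instance from Section 3.
x_t, y_t, z_t = solve_inner(A, B, a, b, c, w, t, x0, y0, z0)
g, H = grad_hess(A, B, a, x_t, t)
print(np.all(np.linalg.eigvalsh(0.5 * (H + H.T)) > 0))   # positive definite under (C)
```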
As a corollary we obtain:

Theorem 4.2. Let $t \in R_{++}$ be fixed arbitrarily.

(i) The Hessian matrix $\tilde{f}^p_{ww}(t,w)$ is positive definite for every $w \in R^m$.

(ii) The function $\tilde{f}^p(t,\cdot) : R^m \to R$ is a strictly convex $C^\infty$ function with a unique
minimizer over $R^m$.

Proof: (i) Obviously the Hessian matrix

  $\tilde{f}^p_{ww}(t,w) = A\tilde{\Delta}\left(I - \tilde{\Delta}B^T(B\tilde{\Delta}^2B^T)^{-1}B\tilde{\Delta}\right)\tilde{\Delta}A^T$

is positive semi-definite. Hence it suffices to show that the matrix is nonsingular. Assume
on the contrary that

  $A\tilde{\Delta}\left(I - \tilde{\Delta}B^T(B\tilde{\Delta}^2B^T)^{-1}B\tilde{\Delta}\right)\tilde{\Delta}A^T u = 0$

for some nonzero $u \in R^m$. Since $I - \tilde{\Delta}B^T(B\tilde{\Delta}^2B^T)^{-1}B\tilde{\Delta}$ is a projection matrix, we
have

  $\left(I - \tilde{\Delta}B^T(B\tilde{\Delta}^2B^T)^{-1}B\tilde{\Delta}\right)\tilde{\Delta}A^T u = 0$,

which implies that

  $A^T u - B^T(B\tilde{\Delta}^2B^T)^{-1}B\tilde{\Delta}^2A^T u = 0$.

This contradicts the assumption (C).

(ii) The strict convexity of $\tilde{f}^p(t,\cdot)$ follows directly from (i). We have observed that the
mappings $\tilde{x}$, $\tilde{y}$, $\tilde{z}$ on $R_{++} \times R^m$ are defined implicitly through the equalities in (4) (see
(5)). Since the Jacobian matrix of the left hand side of the equalities with respect to
$(x,y,z)$ is nonsingular at every $(t,w,x,y,z) \in R_{++} \times R^m \times R^n_{++} \times R^k \times R^n_{++}$ and the
left hand side is $C^\infty$ differentiable with respect to $(t,w,x,y,z)$, we see by the implicit
function theorem (for example [3]) that the mappings $\tilde{x}$, $\tilde{y}$, $\tilde{z}$ are $C^\infty$ differentiable with
respect to $(t,w)$. It is well known [9] that under the assumptions (A) and (C), there exists
a unique $(w,x,y,z)$ for which

  $Ax - a = 0$, $Bx - b = 0$, $x > 0$,
  $A^T w + B^T y - z - c = 0$, $z > 0$,   (6)
  $Xz - te = 0$

hold. We know by Theorems 3.1 and 4.1 that (6) is a necessary and sufficient condition
for $w \in R^m$ to be a minimizer of $\tilde{f}^p(t,\cdot)$ over $R^m$. This ensures the existence of a
unique minimizer of $\tilde{f}^p(t,\cdot)$.
For every $t \in R_{++}$, define $\hat{w}(t)$ to be the minimizer of $\tilde{f}^p(t,\cdot)$ over $R^m$ whose existence
and uniqueness have been shown above, i.e., $\hat{w}(t) = \arg\min\{\tilde{f}^p(t,w) : w \in R^m\}$. In view
of Theorem 3.2, we see that $\hat{w}(t) = \arg\min\{\tilde{f}^d(t,w) : w \in R^m\}$. Furthermore, for every
$t \in R_{++}$, $(x,w,y,z) = (\tilde{x}(t,\hat{w}(t)), \hat{w}(t), \tilde{y}(t,\hat{w}(t)), \tilde{z}(t,\hat{w}(t)))$ satisfies (6). Thus the set
$\{(\tilde{x}(t,\hat{w}(t)), \hat{w}(t), \tilde{y}(t,\hat{w}(t)), \tilde{z}(t,\hat{w}(t))) : t \in R_{++}\}$ forms the central trajectory converging
to the analytic center of the optimal solution set of the primal-dual pair of linear programs
P and D.

The theorem below will be utilized in Section 5, where we discuss the self-concordance
property of the family $\{\tilde{f}^p(t,\cdot) : t \in R_{++}\}$ of functions.
Theorem 4.3. Let $w \in R^m$ and $h \in R^m$ be fixed arbitrarily. For every $(t,s) \in R_{++} \times R$,
let

  $X(t,s) = \tilde{X}(t, w + sh)$,
  $u(t,s) = \left(I - X(t,s)B^T(BX(t,s)^2B^T)^{-1}BX(t,s)\right)X(t,s)A^T h$,
  $v(t,s) = \left(I - X(t,s)B^T(BX(t,s)^2B^T)^{-1}BX(t,s)\right)e$.

Then the following (i) - (v) hold for every $s \in R$.

(i) $\dfrac{d\tilde{f}^p(t,w+sh)}{ds} = (a - A\tilde{x}(t,w+sh))^T h$.

(ii) $\dfrac{d^2\tilde{f}^p(t,w+sh)}{ds^2} = \dfrac{1}{t}\sum_{j=1}^n u_j(t,s)^2$.

(iii) $\dfrac{d^3\tilde{f}^p(t,w+sh)}{ds^3} = -\dfrac{2}{t^2}\sum_{j=1}^n u_j(t,s)^3$.

(iv) $\dfrac{d^2\tilde{f}^p(t,w+sh)}{dt\,ds} = -\dfrac{u(t,s)^T v(t,s)}{t}$.

(v) $\dfrac{d^3\tilde{f}^p(t,w+sh)}{dt\,ds^2} = \dfrac{u(t,s)^T\left(2\,\mathrm{diag}\,v(t,s) - I\right)u(t,s)}{t^2}$.
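Formulas (i) and (ii) are easy to validate numerically. The sketch below (ours; it assumes solve_inner and the placeholder instance from Section 3, where $m = 1$, so $h$ has a single component) compares central finite differences of $s \mapsto \tilde{f}^p(t, w + sh)$ with the closed-form expressions; each evaluation of $\tilde{f}^p$ solves the inner problem (4), so agreement is only to a few digits.

```python
import numpy as np

def f_tilde_p(A, B, a, b, c, w, t, x0, y0, z0):
    """Evaluate \\tilde f^p(t, w) by solving the inner problem (4)."""
    x_t, _, _ = solve_inner(A, B, a, b, c, w, t, x0, y0, z0)
    return c @ x_t - w @ (A @ x_t - a) + t * np.sum(np.log(x_t)), x_t

h = np.array([1.0])            # an arbitrary direction in the w-space (m = 1 here)
eps = 1e-3
f0, x_t = f_tilde_p(A, B, a, b, c, w, t, x0, y0, z0)
fp, _ = f_tilde_p(A, B, a, b, c, w + eps * h, t, x0, y0, z0)
fm, _ = f_tilde_p(A, B, a, b, c, w - eps * h, t, x0, y0, z0)

X = np.diag(x_t)
P = np.eye(len(c)) - X @ B.T @ np.linalg.solve(B @ X @ X @ B.T, B @ X)
u = P @ X @ A.T @ h

print((fp - fm) / (2 * eps), (a - A @ x_t) @ h)        # Theorem 4.3 (i)
print((fp - 2 * f0 + fm) / eps**2, (u @ u) / t)        # Theorem 4.3 (ii)
```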
5. Self-Concordance.

In this section, we apply the notion and the theory of self-concordance to the family $\{\tilde{f}^p(t,\cdot) :
t \in R_{++}\}$ of functions, following the paper [10] by Nesterov and Nemirovsky. To avoid
complicated notation and discussion, however, we employ a simplified version of self-concordance. Our definition of "global self-concordance" and the related properties shown
below are less general than those of the original strong self-concordance given in the paper
[10], but can be adapted directly to the family $\{\tilde{f}^p(t,\cdot) : t \in R_{++}\}$. Readers who are
familiar with the strong self-concordance can easily see that if a family $\{F(t,\cdot) : t \in R_{++}\}$
of functions on $R^m$ is globally self-concordant then $\{R^m, F(t,\cdot), R^m\}_{t \in R_{++}}$ is strongly
self-concordant.
Definition 5.1. A function $F$ on $R^m$ is globally self-concordant with the parameter value
$\alpha > 0$ if it satisfies:

(a) $F$ is a strictly convex $C^3$ function on $R^m$ with a positive definite Hessian matrix
$F_{ww}(w)$ at every $w \in R^m$.

(b) There is a unique global minimizer of $F$ over $R^m$.

(c) For every $w \in R^m$ and $h \in R^m$,

  $\left|\dfrac{d^3F(w+sh)}{ds^3}\Big|_{s=0}\right| \le 2\alpha^{-1/2}\left(\dfrac{d^2F(w+sh)}{ds^2}\Big|_{s=0}\right)^{3/2}$.
Definition 5.2. A family $\{F(t,\cdot) : t \in R_{++}\}$ of functions on $R^m$ is globally self-concordant
with the positive parameter functions $\alpha$, $\xi$ and $\sigma$ if it satisfies:

(d) $\alpha, \xi, \sigma : R_{++} \to R_{++}$ are continuous functions.

(e) For every $t \in R_{++}$, $F(t,\cdot)$ is a globally self-concordant function on $R^m$ with the
parameter value $\alpha(t)$.

(f) $F(t,w)$, $F_w(t,w)$ and $F_{ww}(t,w)$ are differentiable in $t \in R_{++}$ and the resulting derivatives in $t \in R_{++}$ are continuous in $(t,w) \in R_{++} \times R^m$.

(g) For every $(t,w) \in R_{++} \times R^m$ and $h \in R^m$,

  $\left|\dfrac{d^2F(t,w+sh)}{dt\,ds}\Big|_{s=0}\right| \le \xi(t)\alpha(t)^{1/2}\left(\dfrac{d^2F(t,w+sh)}{ds^2}\Big|_{s=0}\right)^{1/2}$.

(h) For every $(t,w) \in R_{++} \times R^m$ and $h \in R^m$,

  $\left|\dfrac{d^3F(t,w+sh)}{dt\,ds^2}\Big|_{s=0}\right| \le 2\sigma(t)\left(\dfrac{d^2F(t,w+sh)}{ds^2}\Big|_{s=0}\right)$.
Let $\{F(t,\cdot) : t \in R_{++}\}$ be a globally self-concordant family of functions on $R^m$ with the
positive parameter functions $\alpha, \xi, \sigma : R_{++} \to R_{++}$. For every $t \in R_{++}$, define Newton's
decrement of the function $F(t,\cdot)$ at $w \in R^m$ by

  $\lambda(t,w) = \inf\{\delta : |F_w(t,w)^T h| \le \delta\,\alpha(t)^{1/2}\left(h^T F_{ww}(t,w)h\right)^{1/2}$ for every $h \in R^m\}$.

Alternatively, $\lambda(t,w)$ is defined by

  $\lambda(t,w)^2 = 2\left(\dfrac{F(t,w) - \inf\{\ell(t,u) : u \in R^m\}}{\alpha(t)}\right)
   = \dfrac{1}{\alpha(t)}\,F_w(t,w)^T F_{ww}(t,w)^{-1}F_w(t,w)$,

where $\ell(t,\cdot)$ is the quadratic approximation of the function $F(t,\cdot)$ at $w \in R^m$:

  $\ell(t,u) = F(t,w) + F_w(t,w)^T(u-w) + \frac{1}{2}(u-w)^T F_{ww}(t,w)(u-w)$
  for every $(t,u) \in R_{++} \times R^m$.

Newton's decrement is a continuous function from $R_{++} \times R^m$ into the set of nonnegative
numbers such that, for each $t \in R_{++}$, $\lambda(t,\cdot)$ takes the minimal value $0$ over $R^m$ at $w \in R^m$
if and only if $w$ is the unique global minimizer of $F(t,\cdot)$ over $R^m$. Thus, for every $\delta \in R_{++}$,
the set

  $N(\delta) = \{(t,w) \in R_{++} \times R^m : \lambda(t,w) \le \delta\}$

forms a closed neighborhood of the trajectory

  $\{(t,w) \in R_{++} \times R^m : \lambda(t,w) = 0\}
   = \{(t,w) \in R_{++} \times R^m : w = \arg\min\{F(t,w') : w' \in R^m\}\}$,
which we want to trace numerically. Let $\delta \in R_{++}$. Define the metric $\rho_\delta$ on $R_{++}$ by

  $\rho_\delta(t'', t') = \frac{1}{2}\max\{|\ln(\alpha(\tau'')/\alpha(\tau'))| : \tau'', \tau' \in [t'', t']\}
   + \delta^{-1}\int_{t''}^{t'}\xi(s)\,ds + \int_{t''}^{t'}\sigma(s)\,ds$
  for every $t'', t' \in R_{++}$.

Let $\beta = 2 - 3^{1/2} = 0.2679\ldots$. Define

  $\theta(\lambda) = \begin{cases} (1+\lambda)^{-1} & \text{if } \lambda > \beta, \\ 1 & \text{if } \lambda \le \beta, \end{cases}$
  $\qquad \omega(\lambda) = 1 - (1 - 3\lambda)^{1/3}$.
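In these terms, one Newton update of the outer variable $w$ is easy to write down. The sketch below (ours, not part of the paper) takes a gradient $g$, a Hessian $H$ and the parameter value $\alpha(t)$ as inputs, computes the decrement $\lambda$ and applies the step length $\theta(\lambda)$; the routine is a generic sketch under the definitions above rather than a specific implementation.

```python
import numpy as np

BETA = 2 - 3 ** 0.5   # beta = 0.2679...

def newton_decrement(g, H, alpha_t):
    """lambda(t, w)^2 = F_w^T F_ww^{-1} F_w / alpha(t)."""
    return float(np.sqrt(g @ np.linalg.solve(H, g) / alpha_t))

def theta(lam):
    """Step length: full step if lam <= beta, damped step 1/(1 + lam) otherwise."""
    return 1.0 if lam <= BETA else 1.0 / (1.0 + lam)

def damped_newton_step(w, g, H, alpha_t):
    """One Newton update w <- w - theta(lambda) H^{-1} g."""
    lam = newton_decrement(g, H, alpha_t)
    return w - theta(lam) * np.linalg.solve(H, g), lam
```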
Theorem 5.3. Assume that $\{F(t,\cdot) : t \in R_{++}\}$ is a globally self-concordant family of
functions on $R^m$ with the positive parameter functions $\alpha, \xi, \sigma : R_{++} \to R_{++}$. Let $t_0 \in R_{++}$,
$w_0 \in R^m$ and $\lambda_0 = \lambda(t_0, w_0)$. Let $w^* \in R^m$ be the Newton iterate at $w_0$ with the step length
$\theta(\lambda_0)$:

  $w^* = w_0 - \theta(\lambda_0)\,F_{ww}(t_0, w_0)^{-1}F_w(t_0, w_0)$.

(i) If $\lambda_0 > \beta$ then

  $F(t_0, w^*) - F(t_0, w_0) \le -\alpha(t_0)\left(\beta - \ln(1+\beta)\right) \le -0.03\,\alpha(t_0)$.
(ii) If $\lambda_0 \le \beta$ then

  $\lambda(t_0, w^*) \le \left(\dfrac{\lambda_0}{1 - \lambda_0}\right)^2
   \begin{cases} = \beta/2 & \text{if } \lambda_0 = \beta, \\ < \lambda_0/2 & \text{if } \lambda_0 < \beta, \end{cases}$

  $F(t_0, w_0) - \min\{F(t_0, w) : w \in R^m\} \le \frac{1}{2}\alpha(t_0)\,\omega(\lambda_0)^2\,\dfrac{2 + \omega(\lambda_0)}{1 - \omega(\lambda_0)} \le 4\lambda_0\,\alpha(t_0)$.

(iii) If $\lambda_0 < \delta < \beta$ and $\rho_\delta(t'', t_0) \le \delta^{-1}(\delta - \lambda_0)$ then $\lambda(t'', w_0) \le \delta$.
Proof: See Section 1 of the paper [10].
We have already seen that:

(a)' For every $t \in R_{++}$, the function $\tilde{f}^p(t,\cdot) : R^m \to R$ is a strictly convex $C^\infty$ function
with a positive definite Hessian matrix $\tilde{f}^p_{ww}(t,w)$ at every $w \in R^m$.

(b)' For every $t \in R_{++}$, there is a unique global minimizer of $\tilde{f}^p(t,\cdot)$ over $R^m$.

(f)' $\tilde{f}^p(\cdot,\cdot)$ is $C^\infty$ differentiable on $R_{++} \times R^m$.

Hence the following theorem is sufficient to establish that the family $\{\tilde{f}^p(t,\cdot) : t \in R_{++}\}$ of
functions on $R^m$ is globally self-concordant.
Theorem 5.4.

(d)' Define $\alpha(t) = t$, $\xi(t) = \dfrac{\sqrt{n}}{t}$ and $\sigma(t) = \dfrac{\sqrt{n}}{2t}$ for every $t \in R_{++}$.
Then the following relations hold for every $(t,w) \in R_{++} \times R^m$ and every $h \in R^m$.

(c)' $\left|\dfrac{d^3\tilde{f}^p(t,w+sh)}{ds^3}\Big|_{s=0}\right| \le 2\alpha(t)^{-1/2}\left(\dfrac{d^2\tilde{f}^p(t,w+sh)}{ds^2}\Big|_{s=0}\right)^{3/2}$.

(g)' $\left|\dfrac{d^2\tilde{f}^p(t,w+sh)}{dt\,ds}\Big|_{s=0}\right| \le \xi(t)\alpha(t)^{1/2}\left(\dfrac{d^2\tilde{f}^p(t,w+sh)}{ds^2}\Big|_{s=0}\right)^{1/2}$.

(h)' $\left|\dfrac{d^3\tilde{f}^p(t,w+sh)}{dt\,ds^2}\Big|_{s=0}\right| \le 2\sigma(t)\left(\dfrac{d^2\tilde{f}^p(t,w+sh)}{ds^2}\Big|_{s=0}\right)$.
Proof: (c)' By Theorem 4.3, we have that

  $\left(\dfrac{d^3\tilde{f}^p(t,w+sh)}{ds^3}\right)^2 = \left(\dfrac{2}{t^2}\sum_{j=1}^n u_j(t,s)^3\right)^2
   = \dfrac{4}{t^4}\left(\sum_{j=1}^n u_j(t,s)^3\right)^2
   \le \dfrac{4}{t^4}\left(\sum_{j=1}^n u_j(t,s)^2\right)^3
   = \dfrac{4}{t}\left(\dfrac{d^2\tilde{f}^p(t,w+sh)}{ds^2}\right)^3$.

Thus (c)' follows.
(g)' It follows from the definition of $v(t,s)$ that $\|v(t,s)\| \le \sqrt{n}$. Hence we see by
Theorem 4.3 that

  $\left|\dfrac{d^2\tilde{f}^p(t,w+sh)}{dt\,ds}\right| = \dfrac{|u(t,s)^T v(t,s)|}{t}
   \le \dfrac{\sqrt{n}\,\|u(t,s)\|}{t}
   = \xi(t)\alpha(t)^{1/2}\left(\dfrac{d^2\tilde{f}^p(t,w+sh)}{ds^2}\right)^{1/2}$.

Thus we have shown (g)'.
(h)' It is easily verified that

  $-\sqrt{n} \le 2v_i(t,s) - 1 \le \sqrt{n}$  for $i = 1, 2, \ldots, n$.

Hence we see by Theorem 4.3 that

  $\left|\dfrac{d^3\tilde{f}^p(t,w+sh)}{dt\,ds^2}\right| \le \dfrac{\sqrt{n}\,u(t,s)^T u(t,s)}{t^2}
   = \dfrac{\sqrt{n}}{t}\,\dfrac{d^2\tilde{f}^p(t,w+sh)}{ds^2}
   = 2\sigma(t)\,\dfrac{d^2\tilde{f}^p(t,w+sh)}{ds^2}$.   (7)

Thus we have shown (h)'.
Now we are ready to apply Theorem 5.4 to the family $\{\tilde{f}^p(t,\cdot) : t \in R_{++}\}$. Newton's
decrement turns out to be

  $\lambda(t,w) = \inf\{\delta : \left|\dfrac{d\tilde{f}^p(t,w+sh)}{ds}\Big|_{s=0}\right| \le \delta\,\alpha(t)^{1/2}\left(\dfrac{d^2\tilde{f}^p(t,w+sh)}{ds^2}\Big|_{s=0}\right)^{1/2}$ for every $h \in R^m\}$

  $= \inf\{\delta : |(a - A\tilde{x}(t,w))^T h| \le \delta\left(h^T A\tilde{X}\left(I - \tilde{X}B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}\right)\tilde{X}A^T h\right)^{1/2}$ for every $h \in R^m\}$

  $= \inf\{\delta : |(A\tilde{x}(t,w) - a)^T h| \le \delta\,t^{1/2}\left(h^T A\tilde{\Delta}\left(I - \tilde{\Delta}B^T(B\tilde{\Delta}^2B^T)^{-1}B\tilde{\Delta}\right)\tilde{\Delta}A^T h\right)^{1/2}$ for every $h \in R^m\}$.
Here $\tilde{X} = \tilde{X}(t,w)$ and $\tilde{\Delta} = \tilde{X}(t,w)^{1/2}\tilde{Z}(t,w)^{-1/2}$. Let $0 < \delta$ and $0 < t'' < t'$. Then

  $\rho_\delta(t'', t') = \frac{1}{2}\max\{|\ln(\alpha(\tau'')/\alpha(\tau'))| : \tau'', \tau' \in [t'', t']\}
   + \delta^{-1}\int_{t''}^{t'}\xi(s)\,ds + \int_{t''}^{t'}\sigma(s)\,ds$
  $= \frac{1}{2}\max\{|\ln(\tau''/\tau')| : \tau'', \tau' \in [t'', t']\}
   + \delta^{-1}\int_{t''}^{t'}\frac{\sqrt{n}}{s}\,ds + \int_{t''}^{t'}\frac{\sqrt{n}}{2s}\,ds$
  $= \frac{1}{2}\ln(t'/t'') + \frac{\sqrt{n}}{\delta}(\ln t' - \ln t'') + \frac{\sqrt{n}}{2}(\ln t' - \ln t'')$
  $= \left(\frac{1}{2} + \frac{\sqrt{n}}{\delta} + \frac{\sqrt{n}}{2}\right)\ln(t'/t'')$.

Thus we obtain

  $\rho_\delta(t'', t') = \left(\frac{1}{2} + \frac{\sqrt{n}}{\delta} + \frac{\sqrt{n}}{2}\right)\ln(t'/t'')$.   (8)
Theorem 5.5. Let $\alpha = 1/4 < \beta = 2 - 3^{1/2} = 0.2679\ldots$. Let $(t_0, w_0) \in R_{++} \times R^m$ and
$\lambda_0 = \lambda(t_0, w_0)$. Let $w^* \in R^m$ be the Newton iterate at $w_0$ with the step length $\theta(\lambda_0)$:

  $w^* = w_0 - \theta(\lambda_0)\,\tilde{f}^p_{ww}(t_0, w_0)^{-1}\tilde{f}^p_w(t_0, w_0)$.

(i) If $\lambda_0 > \beta$ then

  $\tilde{f}^p(t_0, w^*) - \tilde{f}^p(t_0, w_0) \le -t_0\left(\beta - \ln(1+\beta)\right) \le -0.03\,t_0$.
(ii) If $\lambda_0 \le \beta$ then

  $\lambda(t_0, w^*) \le \left(\dfrac{\lambda_0}{1 - \lambda_0}\right)^2
   \begin{cases} = \beta/2 & \text{if } \lambda_0 = \beta, \\ < \lambda_0/2 & \text{if } \lambda_0 < \beta, \end{cases}$

  $\tilde{f}^p(t_0, w_0) - \min\{\tilde{f}^p(t_0, w) : w \in R^m\} \le \frac{1}{2}t_0\,\omega(\lambda_0)^2\,\dfrac{2 + \omega(\lambda_0)}{1 - \omega(\lambda_0)} \le 4\lambda_0 t_0$.

(iii) If $\lambda_0 \le \alpha/2$ and $(1 - 1/(11\sqrt{n}))\,t_0 \le t'' \le t_0$ then $\lambda(t'', w_0) \le \alpha$.
Proof: The assertions (i) and (ii) follow directly from Theorems 5.3 and 5.4, so that it
suffices to show (iii). Assume that $\lambda_0 \le \alpha/2$ and $(1 - 1/(11\sqrt{n}))\,t_0 \le t'' \le t_0$. Then we
see by (8) that

  $\rho_\alpha(t'', t_0) = \left(\frac{1}{2} + \frac{\sqrt{n}}{\alpha} + \frac{\sqrt{n}}{2}\right)\ln(t_0/t'')
   \le 5\sqrt{n}\,\ln\dfrac{1}{1 - 1/(11\sqrt{n})}
   \le 5\sqrt{n}\cdot\dfrac{1}{10\sqrt{n}} = \dfrac{1}{2}
   \le \alpha^{-1}(\alpha - \lambda_0)$.

Thus the desired result follows from (iii) of Theorem 5.3.
6. Conceptual Algorithms.

In addition to the assumptions (A), (B) and (C) stated in the Introduction, we assume:

(D) For any $(t,w) \in R_{++} \times R^m$, we can compute the exact maximizer $\tilde{x}(t,w)$ of $f^p(t,w,\cdot)$
over $Q_{++}$ and the exact minimizer $(\tilde{y}(t,w), \tilde{z}(t,w))$ of $f^d(t,w,\cdot)$ over $D_{++}(w)$.

We describe two algorithms based on Theorem 5.5. For each fixed $t \in R_{++}$, Algorithm
6.1 approximates the minimizer $\hat{w}(t)$ of $\tilde{f}^p(t,\cdot)$ over $R^m$, while Algorithm 6.2 numerically
traces the trajectory $\{(t, \hat{w}(t)) : t \in R_{++}\}$ from a given initial point $(t_0, w_0)$ in the neighborhood $N(\alpha/2) = \{(t,w) : \lambda(t,w) \le \alpha/2\}$ of the trajectory $\{(t, \hat{w}(t)) : t \in R_{++}\}$, where
$\alpha = 1/4$.
Let $(t,w) \in R_{++} \times R^m$.

Algorithm 6.1.

Step 0: Let $q = 0$ and $w_q = w$.
Step 1: Compute $(x_q, y_q, z_q) = (\tilde{x}(t,w_q), \tilde{y}(t,w_q), \tilde{z}(t,w_q))$. Let $\lambda_q = \lambda(t, w_q)$.
Step 2: Compute $w_{q+1} = w_q - \theta(\lambda_q)\,\tilde{f}^p_{ww}(t, w_q)^{-1}\tilde{f}^p_w(t, w_q)$.
Step 3: Replace $q$ by $q + 1$, and go to Step 1.

Suppose $(t_0, w_0) \in N(\alpha/2)$, where $\alpha = 1/4$.

Algorithm 6.2.

Step 0: Let $\eta = 1/(11\sqrt{n})$ and $r = 0$.
Step 1: Compute $(x_r, y_r, z_r) = (\tilde{x}(t_r,w_r), \tilde{y}(t_r,w_r), \tilde{z}(t_r,w_r))$. Let $\lambda_r = \lambda(t_r, w_r)$.
Step 2: Let $t_{r+1} = (1 - \eta)t_r$. Compute $w_{r+1} = w_r - \tilde{f}^p_{ww}(t_{r+1}, w_r)^{-1}\tilde{f}^p_w(t_{r+1}, w_r)$.
Step 3: Replace $r$ by $r + 1$, and go to Step 1.
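A compact, slightly streamlined rendering of both algorithms in Python (ours; it assumes the solve_inner oracle and the grad_hess helper from the earlier sketches as stand-ins for assumption (D), terminates Algorithm 6.1 once it enters $N(\alpha/2)$, and makes no claim to match the authors' implementation) could look as follows.

```python
import numpy as np

BETA = 2 - 3 ** 0.5
ALPHA = 0.25

def decrement(g, H, t):
    return float(np.sqrt(g @ np.linalg.solve(H, g) / t))

def algorithm_6_1(A, B, a, b, c, t, w, x, y, z, max_iter=50):
    """Damped Newton iterations in the w-space for fixed t (Algorithm 6.1)."""
    for _ in range(max_iter):
        x, y, z = solve_inner(A, B, a, b, c, w, t, x, y, z)   # assumption (D)
        g, H = grad_hess(A, B, a, x, t)
        lam = decrement(g, H, t)
        if lam <= ALPHA / 2:                                  # entered N(alpha/2)
            break
        step = 1.0 if lam <= BETA else 1.0 / (1.0 + lam)      # theta(lambda)
        w = w - step * np.linalg.solve(H, g)
    return t, w, x, y, z

def algorithm_6_2(A, B, a, b, c, t, w, x, y, z, t_final=1e-6):
    """Path-following in (t, w) with the ratio 1 - 1/(11 sqrt(n)) (Algorithm 6.2)."""
    eta = 1.0 / (11.0 * np.sqrt(len(c)))
    while t > t_final:
        t = (1.0 - eta) * t                                   # Step 2: shrink t
        x, y, z = solve_inner(A, B, a, b, c, w, t, x, y, z)   # assumption (D) at (t_{r+1}, w_r)
        g, H = grad_hess(A, B, a, x, t)
        w = w - np.linalg.solve(H, g)                         # full Newton step in w
    return t, w, x, y, z
```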
In view of Theorem 5.5, we see that:

(i) The sequence $\{(t, w_q)\}$ generated by Algorithm 6.1 moves into the neighborhood
$N(\alpha/2) = \{(t,w) : \lambda(t,w) \le \alpha/2\}$
of the trajectory $\{(t, \hat{w}(t)) : t \in R_{++}\}$ in a finite number of iterations, and eventually
converges to $\hat{w}(t)$. Hence Algorithm 6.1 provides us with an initial point for Algorithm
6.2.

(ii) The sequence $\{(t_r, w_r)\}$ generated by Algorithm 6.2 runs in the neighborhood
$N(\alpha/2) = \{(t,w) : \lambda(t,w) \le \alpha/2\}$
and satisfies

  $t_{r+1} = \left(1 - \dfrac{1}{11\sqrt{n}}\right)t_r$  for every $r = 0, 1, 2, \ldots$;   (9)

hence $\lim_{r\to\infty} t_r = 0$. The sequence $\{(x_r, w_r, y_r, z_r)\}$ lies in a neighborhood of
the central trajectory of the primal-dual pair of linear programs P and D, and if
$(x^*, w^*, y^*, z^*)$ is a limiting point of the sequence then $x^*$ and $(w^*, y^*, z^*)$ are optimal
solutions of P and D, respectively.
We now assume that all elements of the data $A$, $B$, $a$, $b$ and $c$ are integers, in order to analyze
the computational complexity of Algorithms 6.1 and 6.2 in detail. Let $L$ denote the input size
of P.

Lemma 6.3. Let $t > 0$.

(i) $\tilde{f}^p(t, 0) \le n2^{2L} + tnL$.

(ii) $\min_{w \in R^m}\tilde{f}^p(t, w) \ge -n2^{2L} - tnL$.
Proof: By (A) and (B) assumed in the Introduction and by the definition of $L$, we know
that

  $x_i \le 2^L \le \exp L$  for every $i = 1, 2, \ldots, n$ and every $x \in Q_{++}$,
  $x^*_i \ge 2^{-L} \ge \exp(-L)$  for every $i = 1, 2, \ldots, n$ and some $x^* \in P_{++}$.

It follows that

  $f^p(t, 0, x) = c^T x + t\sum_{j=1}^n \ln x_j
   \le n(2^L)(2^L) + tnL = n2^{2L} + tnL$  for every $x \in Q_{++}$,

  $f^p(t, w, x^*) = c^T x^* + t\sum_{j=1}^n \ln x^*_j$  (the term $w^T(Ax^* - a)$ vanishes since $Ax^* = a$)
   $\ge -n(2^L)(2^L) + tn(-L) = -n2^{2L} - tnL$  for every $w \in R^m$.

Hence

  $\tilde{f}^p(t, 0) = \max\{f^p(t, 0, x) : x \in Q_{++}\} \le n2^{2L} + tnL$,

  $\min_{w \in R^m}\tilde{f}^p(t, w) = \min_{w \in R^m}\max\{f^p(t, w, x) : x \in Q_{++}\}
   \ge \min_{w \in R^m} f^p(t, w, x^*) \ge -n2^{2L} - tnL$.
Theorem 6.4.

(i) Let $t = 2^{2L}$ and $w_0 = 0$. Then Algorithm 6.1 generates in $O(nL)$ iterations a $(t, w_q) \in
N(\alpha/2)$.

(ii) Let $\epsilon > 0$. Then Algorithm 6.2 generates in $O(\sqrt{n}\,\ln(t_0/\epsilon))$ iterations a $(t_r, w_r) \in
N(\alpha/2)$ such that $0 < t_r < \epsilon$.

Proof: Let $\bar{q} = \lceil 4nL/0.03 \rceil + 1$. Assume on the contrary that

  $(t, w_q) \notin N(\beta)$  for $q = 0, 1, 2, \ldots, \bar{q}$.

By Lemma 6.3 and Theorem 5.5 we then see that

  $-n2^{2L} - tnL \le \tilde{f}^p(t, w_{\bar{q}})
   \le \tilde{f}^p(t, w_0) - 0.03\,t\,\bar{q}
   \le n2^{2L} + tnL - 4nLt
   \le n2^{2L} - 2nL\,2^{2L} - tnL < -n2^{2L} - tnL$.

This is a contradiction. Hence $(t, w_q) \in N(\beta)$ for some $q \le \bar{q}$. By Theorem 5.5, we
see that $(t, w_{q+2}) \in N(\alpha/2)$. Thus we have shown (i). The assertion (ii) follows directly
from the inequality (9).
7. Toward Practical Implementation.

Among the assumptions imposed in our discussion so far, the most impractical one seems
to be the assumption:

(D) For any $(t,w) \in R_{++} \times R^m$, we can compute the exact maximizer $\tilde{x}(t,w)$ of $f^p(t,w,\cdot)$
over $Q_{++}$ and the exact minimizer $(\tilde{y}(t,w), \tilde{z}(t,w))$ of $f^d(t,w,\cdot)$ over $D_{++}(w)$.

For every $(t,w) \in R_{++} \times R^m$, the Newton direction $(dx, dy, dz)$ toward a point on the
central trajectory of the primal-dual pair of P$(w)$ and D$(w)$ gives a search direction in the
$(x,y,z)$-space, which has been utilized in many primal-dual feasible- and infeasible-interior-point
methods ([5, 7, 8, 9, 11], etc.); $(dx, dy, dz)$ is given by

  $B\,dx = -(Bx - b)$,
  $B^T dy - dz = -(A^T w + B^T y - z - c)$,   (10)
  $Z\,dx + X\,dz = -(Xz - te)$.

In our framework, we have paid little attention to such search directions but assumed that
the point $(\tilde{x}(t,w), \tilde{y}(t,w), \tilde{z}(t,w))$ into which all the flow induced by the search directions
runs is available.
To develop practical computational methods, we need to weaken the assumption (D); at
least we need to replace "exact" by "approximate" in (D). In Algorithms 6.1 and 6.2, the
exact optimizers $x = \tilde{x}(t,w)$ and $(y,z) = (\tilde{y}(t,w), \tilde{z}(t,w))$ are necessary to compute the
gradient $\tilde{f}^p_w(t,w) = a - Ax$ and the Hessian matrix

  $\tilde{f}^p_{ww}(t,w) = A\Delta\left(I - \Delta B^T(B\Delta^2B^T)^{-1}B\Delta\right)\Delta A^T$,

with which we perform one Newton iteration to generate a new point in the $w$-space,

  $w + \theta(\lambda(t,w))\,dw$,

where

  $H\,dw = Ax - a$,
  $H = A\Delta\left(I - \Delta B^T G^{-1}B\Delta\right)\Delta A^T$,
  $G = B\Delta^2B^T$,   (11)
  $\Delta = (\mathrm{diag}\,x)^{1/2}(\mathrm{diag}\,z)^{-1/2}$.

It should be noted, however, that even when the exact optimizers are unavailable, the
search direction $dw$ above is well-defined for every pair of $x > 0$ and $z > 0$. Thus, for
every $(t, x, y, z) \in R_{++} \times R^n_{++} \times R^k \times R^n_{++}$, the correspondence $w \to dw$ defines a search
direction in the $w$-space.
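The following standalone Python/NumPy sketch (ours) assembles the direction $dw$ from (11) for an arbitrary pair $x > 0$, $z > 0$; the small problem data are placeholders.

```python
import numpy as np

def dw_direction(A, B, a, x, z):
    """Search direction dw in the w-space from (11), for any x > 0 and z > 0."""
    D = np.diag(np.sqrt(x / z))                    # Delta = (diag x)^{1/2} (diag z)^{-1/2}
    G = B @ D @ D @ B.T                            # G = B Delta^2 B^T
    M = D @ B.T @ np.linalg.solve(G, B @ D)        # Delta B^T G^{-1} B Delta
    H = A @ D @ (np.eye(len(x)) - M) @ D @ A.T     # H = A Delta (I - ...) Delta A^T
    return np.linalg.solve(H, A @ x - a)

A = np.array([[1.0, 2.0, 0.0, 0.0]]); a = np.array([0.5])
B = np.ones((1, 4))
x = np.array([0.3, 0.2, 0.1, 0.4]); z = np.array([1.0, 0.5, 2.0, 1.5])
print(dw_direction(A, B, a, x, z))
```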
Another critical question to be addressed is whether solving the system (11) to compute
a search direction $dw$ in the $w$-space is essentially easier than one iteration of the primal-dual
interior-point method applied to the original pair of P and D without using the
decomposition techniques. To compute $dw$ by solving (11), we need

(i) a construction of the matrix $H = A\Delta\left(I - \Delta B^T G^{-1}B\Delta\right)\Delta A^T$, which requires an
(implicit) inversion of the matrix $G = B\Delta^2B^T$,

(ii) a solution of a system of equations $H\,dw = s$ for some $s \in R^m$.

On the other hand, given a current iterate $(x, w, y, z) \in R^n_{++} \times R^m \times R^k \times R^n_{++}$, the
standard primal-dual interior-point method solves the following system of equations in the search
direction $(dx, dw, dy, dz)$:

  $A\,dx = -(Ax - a)$,   $B\,dx = -(Bx - b)$,
  $A^T dw + B^T dy - dz = -(A^T w + B^T y - z - c)$,   (12)
  $X\,dz + Z\,dx = -(Xz - te)$,

where $t \ge 0$. We can easily verify that the search direction $(dx, dw, dy, dz)$ can be represented as
  $H\,dw = Ax - a - A\Delta^2B^TG^{-1}(Bx - b) - A\Delta\left(I - \Delta B^TG^{-1}B\Delta\right)(\Delta r + s)$,
  $G\,dy = Bx - b - B\Delta^2(A^T dw + r) - B\Delta s$,
  $dz = A^T dw + B^T dy + r$,   (13)
  $dx = -\Delta^2 dz - \Delta s$,

where

  $r = A^T w + B^T y - z - c$  and  $s = X^{-1/2}Z^{-1/2}(Xz - te)$.

This representation (13) of the solution $(dx, dw, dy, dz)$ of (12) is called a compact-inverse
implementation in the literature [2]. Hence if we utilize the compact-inverse implementation
(13), the majority of the computation of $(dx, dw, dy, dz)$ is also spent on (i) and (ii).
Therefore we require almost the same amount of computational work to solve (11)
for a search direction $dw$ as for one iteration of the compact-inverse implementation
of the primal-dual interior-point method applied to the original pair of P and D without
using the decomposition techniques. This observation is ironic in view of the nice theoretical results
obtained above with the use of Newton's method, but it sheds new light on the compact-inverse
implementation. In particular, we observe that when $x = \tilde{x}(t,w)$, $y = \tilde{y}(t,w)$ and
$z = \tilde{z}(t,w)$, (13) turns out to be

  $H\,dw = Ax - a$,
  $G\,dy = -B\Delta^2A^T dw$,
  $dz = A^T dw + B^T dy$,
  $\Delta^{-1}dx = -\Delta\,dz$;

hence $dw$ determined by the compact-inverse implementation (13) is exactly the same as
the search direction $dw$ in the $w$-space determined by (11).
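For completeness, here is a standalone Python/NumPy sketch (ours) of the compact-inverse formulas (13); the data and the current iterate are placeholders, and the code simply verifies that the resulting $(dx, dw, dy, dz)$ satisfies the Newton system (12).

```python
import numpy as np

def compact_inverse_step(A, B, a, b, c, x, w, y, z, t):
    """Compute (dx, dw, dy, dz) of system (12) via the compact-inverse formulas (13)."""
    D = np.diag(np.sqrt(x / z))
    G = B @ D @ D @ B.T
    P = np.eye(len(x)) - D @ B.T @ np.linalg.solve(G, B @ D)
    H = A @ D @ P @ D @ A.T
    r = A.T @ w + B.T @ y - z - c
    s = (x * z - t) / np.sqrt(x * z)               # X^{-1/2} Z^{-1/2} (Xz - te)
    rhs = (A @ x - a
           - A @ D @ D @ B.T @ np.linalg.solve(G, B @ x - b)
           - A @ D @ P @ (D @ r + s))
    dw = np.linalg.solve(H, rhs)
    dy = np.linalg.solve(G, B @ x - b - B @ D @ D @ (A.T @ dw + r) - B @ D @ s)
    dz = A.T @ dw + B.T @ dy + r
    dx = -D @ D @ dz - D @ s
    return dx, dw, dy, dz

# Placeholder data and iterate.
A = np.array([[1.0, 2.0, 0.0, 0.0]]); a = np.array([0.5])
B = np.ones((1, 4));                  b = np.array([1.0])
c = np.array([1.0, 0.5, -0.2, 0.3])
x = np.array([0.3, 0.2, 0.1, 0.4]); w = np.array([0.2])
y = np.array([-0.1]);               z = np.array([1.0, 0.5, 2.0, 1.5])
t = 0.05
dx, dw, dy, dz = compact_inverse_step(A, B, a, b, c, x, w, y, z, t)
print(np.allclose(A @ dx, -(A @ x - a)),
      np.allclose(B @ dx, -(B @ x - b)),
      np.allclose(A.T @ dw + B.T @ dy - dz, -(A.T @ w + B.T @ y - z - c)),
      np.allclose(x * dz + z * dx, -(x * z - t)))
```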
We conclude the paper with a general idea for computational methods consisting of the
following three steps:

Step xyz: Given $(t,w) \in R_{++} \times R^m$ and $(x,y,z) \in R^n_{++} \times R^k \times R^n_{++}$, compute a new iterate
$(x', y', z')$ by $x' = x + \mu_p\,dx$ and $(y', z') = (y,z) + \mu_d\,(dy, dz)$, where $(dx, dy, dz)$
is given by (10) and $\mu_p, \mu_d \ge 0$ are step lengths.

Step w: Given $t > 0$, $(x,y,z) \in R^n_{++} \times R^k \times R^n_{++}$ and $w \in R^m$, choose a search direction
$dw$ in the $w$-space and generate a new iterate $w'$ by $w' = w + \mu\,dw$, where $\mu > 0$ is
a step length.

Step t: Decrease $t$.
Algorithms 6.1 and 6.2 may be regarded as ideal cases of such methods in which we always
perform infinitely many iterations of Step xyz to compute the exact optimizers $\tilde{x}(t,w)$ of
$f^p(t,w,\cdot)$ and $(\tilde{y}(t,w), \tilde{z}(t,w))$ of $f^d(t,w,\cdot)$ before we perform Step w with the search
direction $dw$ determined by (11).
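A rough skeleton of such a method in Python (ours; the number of inner xyz-steps, the use of a full step in Step w, and the $t$-reduction factor are arbitrary assumptions, and solve_inner and dw_direction refer to the earlier sketches) might look as follows.

```python
import numpy as np

def practical_loop(A, B, a, b, c, t, w, x, y, z, t_factor=0.8, t_final=1e-6):
    """Alternate a few (x, y, z) Newton steps (10), one w-step along (11), then decrease t."""
    while t > t_final:
        # Step xyz: a few damped Newton steps on the system (4)/(10), reusing
        # solve_inner from Section 3 with a small iteration cap instead of solving exactly.
        x, y, z = solve_inner(A, B, a, b, c, w, t, x, y, z, iters=3)
        # Step w: a full step along the direction dw of (11) (sketch after (11)).
        w = w + dw_direction(A, B, a, x, z)
        # Step t: decrease the barrier parameter.
        t *= t_factor
    return t, w, x, y, z
```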
Acknowledgment. We would like to thank an anonymous referee of the paper [4], which
had been submitted to Operations Research Letters. Although that paper was not accepted
for publication, the referee raised some interesting and significant points. In particular, the
referee suggested that we examine whether the real valued function

  $f(w) = \max\{\, w^T(Bx - b) + \sum_{j=1}^n \ln x_j : e^T x = 1, \ x > 0 \,\}$

on $R^m$ is self-concordant. This question motivated the study of the subject of the current
paper.
References

[1] J. R. Birge and L. Qi, "Computing block-angular Karmarkar projections with applications to stochastic programming," Management Science 34 (1988) 1472-1479.

[2] I. C. Choi and D. Goldfarb, "Exploiting special structure in a primal-dual path-following algorithm," Mathematical Programming 58 (1993) 33-52.

[3] M. W. Hirsch and S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra (Academic Press, New York, 1974).

[4] M. Kojima, N. Megiddo and S. Mizuno, "A Lagrangian Relaxation Method for Approximating the Analytic Center of a Polytope," Technical Report, IBM Almaden Research Center, San Jose, CA 95120-6099, USA, 1992.

[5] M. Kojima, S. Mizuno and A. Yoshise, "A primal-dual interior point algorithm for linear programming," in: N. Megiddo, ed., Progress in Mathematical Programming, Interior-Point and Related Methods (Springer-Verlag, New York, 1989) 29-47.

[6] L. S. Lasdon, Optimization Theory for Large Systems (Macmillan, New York, 1970).

[7] I. J. Lustig, "Feasibility issues in a primal-dual interior-point method for linear programming," Mathematical Programming 49 (1990/91) 145-162.

[8] R. Marsten, R. Subramanian, M. Saltzman, I. J. Lustig and D. Shanno, "Interior point methods for linear programming: Just call Newton, Lagrange, and Fiacco and McCormick!," Interfaces 20 (1990) 105-116.

[9] N. Megiddo, "Pathways to the optimal set in linear programming," in: N. Megiddo, ed., Progress in Mathematical Programming, Interior-Point and Related Methods (Springer-Verlag, New York, 1989) 131-158.

[10] Ju. E. Nesterov and A. S. Nemirovsky, "Self-concordant functions and polynomial-time methods in convex programming," Report, Central Economical and Mathematical Institute, USSR Acad. Sci. (Moscow, USSR, 1989).

[11] K. Tanabe, "Centered Newton method for mathematical programming," in: M. Iri and K. Yajima, eds., System Modeling and Optimization (Springer-Verlag, New York, 1988) 197-206.

[12] M. J. Todd, "Exploiting special structure in Karmarkar's linear programming algorithm," Mathematical Programming 41 (1988) 97-113.

[13] M. J. Todd, "Recent developments and new directions in linear programming," in: M. Iri and K. Tanabe, eds., Recent Developments and Applications (Kluwer Academic Publishers, London, 1989) 109-157.
Appendix. Calculation of Derivatives of $\tilde{f}^p$.

Differentiating the identities in (5) by $w \in R^m$, we first observe that

  $B\tilde{x}_w(t,w) = O$,   $A^T + B^T\tilde{y}_w(t,w) - \tilde{z}_w(t,w) = O$,
  $\tilde{Z}(t,w)\tilde{x}_w(t,w) + \tilde{X}(t,w)\tilde{z}_w(t,w) = O$   (14)

for every $(t,w) \in R_{++} \times R^m$. Differentiating the identities in (5) by $t \in R_{++}$, we also see that

  $B\tilde{x}_t(t,w) = O$,   $B^T\tilde{y}_t(t,w) - \tilde{z}_t(t,w) = O$,
  $\tilde{Z}(t,w)\tilde{x}_t(t,w) + \tilde{X}(t,w)\tilde{z}_t(t,w) = e$   (15)

for every $(t,w) \in R_{++} \times R^m$. By solving (14) for $\tilde{x}_w(t,w)$, $\tilde{y}_w(t,w)$, $\tilde{z}_w(t,w)$ and (15) for
$\tilde{x}_t(t,w)$, $\tilde{y}_t(t,w)$, $\tilde{z}_t(t,w)$, we obtain the following lemma.

Lemma A.

(i) For every $(t,w) \in R_{++} \times R^m$,

  $\tilde{x}_w(t,w) = -\dfrac{1}{t}\tilde{X}\left(I - \tilde{X}B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}\right)\tilde{X}A^T$,
  $\tilde{y}_w(t,w) = -(B\tilde{X}^2B^T)^{-1}B\tilde{X}^2A^T$,
  $\tilde{z}_w(t,w) = \tilde{X}^{-1}\left(I - \tilde{X}B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}\right)\tilde{X}A^T$.

(ii) For every $(t,w) \in R_{++} \times R^m$,

  $\tilde{x}_t(t,w) = \dfrac{1}{t}\tilde{X}\left(I - \tilde{X}B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}\right)e$,
  $\tilde{y}_t(t,w) = (B\tilde{X}^2B^T)^{-1}B\tilde{X}e$,
  $\tilde{z}_t(t,w) = B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}e$.

Here $\tilde{X} = \tilde{X}(t,w)$ and $\tilde{Z} = \tilde{Z}(t,w)$.

Proof: (i) It follows from the second and third identities of (14) that

  $\tilde{x}_w(t,w) = -\tilde{X}\tilde{Z}^{-1}\tilde{z}_w(t,w) = -\tilde{X}\tilde{Z}^{-1}\left(A^T + B^T\tilde{y}_w(t,w)\right)$.

Hence, by the first identity of (14),

  $O = B\tilde{x}_w(t,w) = -B\tilde{X}\tilde{Z}^{-1}\left(A^T + B^T\tilde{y}_w(t,w)\right)$.

Therefore we obtain that

  $\tilde{y}_w(t,w) = -(B\tilde{X}\tilde{Z}^{-1}B^T)^{-1}B\tilde{X}\tilde{Z}^{-1}A^T
   = -(B\tilde{X}^2B^T)^{-1}B\tilde{X}^2A^T$  (since $\tilde{X}\tilde{Z} = tI$),

  $\tilde{z}_w(t,w) = A^T + B^T\tilde{y}_w(t,w)
   = A^T - B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}^2A^T
   = \tilde{X}^{-1}\left(I - \tilde{X}B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}\right)\tilde{X}A^T$,

  $\tilde{x}_w(t,w) = -\tilde{X}\tilde{Z}^{-1}\tilde{z}_w(t,w)
   = -\dfrac{1}{t}\tilde{X}\left(I - \tilde{X}B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}\right)\tilde{X}A^T$.

(ii) In view of the second and third identities of (15), we see that

  $\tilde{x}_t(t,w) = \tilde{Z}^{-1}e - \tilde{X}\tilde{Z}^{-1}\tilde{z}_t(t,w) = \tilde{Z}^{-1}e - \tilde{X}\tilde{Z}^{-1}B^T\tilde{y}_t(t,w)$.

Hence, by the first identity of (15),

  $O = B\tilde{x}_t(t,w) = B\tilde{Z}^{-1}e - B\tilde{X}\tilde{Z}^{-1}B^T\tilde{y}_t(t,w)$.

Therefore we obtain

  $\tilde{y}_t(t,w) = (B\tilde{X}\tilde{Z}^{-1}B^T)^{-1}B\tilde{Z}^{-1}e = (B\tilde{X}^2B^T)^{-1}B\tilde{X}e$,
  $\tilde{z}_t(t,w) = B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}e$,
  $\tilde{x}_t(t,w) = \tilde{Z}^{-1}e - \tilde{X}\tilde{Z}^{-1}\tilde{z}_t(t,w)
   = \dfrac{1}{t}\tilde{X}\left(I - \tilde{X}B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}\right)e$.
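Lemma A can also be validated numerically. The sketch below (ours; it assumes solve_inner and the placeholder data from Section 3, where $m = 1$) compares a finite-difference approximation of $\tilde{x}_w(t,w)$ applied to a direction with the closed-form expression of Lemma A (i).

```python
import numpy as np

def x_tilde(wv):
    xs, _, _ = solve_inner(A, B, a, b, c, wv, t, x0, y0, z0)
    return xs

eps = 1e-4
dw = np.array([1.0])                       # m = 1 in the placeholder instance
fd = (x_tilde(w + eps * dw) - x_tilde(w - eps * dw)) / (2 * eps)   # approx x~_w(t,w) dw

X = np.diag(x_tilde(w))
P = np.eye(len(c)) - X @ B.T @ np.linalg.solve(B @ X @ X @ B.T, B @ X)
closed_form = -(X @ P @ X @ A.T @ dw) / t                          # Lemma A (i) times dw

print(np.allclose(fd, closed_form, atol=1e-4))
```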
Proof of Theorem 4.1:

By the definition,

  $\tilde{f}^p_w(t,w) = \tilde{x}_w(t,w)^T f^p_x(t,w,\tilde{x}(t,w)) - (A\tilde{x}(t,w) - a)
   = \tilde{x}_w(t,w)^T\left(B^T\tilde{y}(t,w)\right) - (A\tilde{x}(t,w) - a)
   = \left(B\tilde{x}_w(t,w)\right)^T\tilde{y}(t,w) - (A\tilde{x}(t,w) - a)$
  (since $B\tilde{x}_w(t,w) = O$ by (14))
  $= a - A\tilde{x}(t,w)$.

Thus we have shown (i). By the definition and (i), we see that

  $\tilde{f}^p_{ww}(t,w) = -A\tilde{x}_w(t,w)$.

Hence we obtain by (i) of Lemma A and $\tilde{X} = \sqrt{t}\,\tilde{\Delta}$ that

  $\tilde{f}^p_{ww}(t,w) = \dfrac{1}{t}A\tilde{X}\left(I - \tilde{X}B^T(B\tilde{X}^2B^T)^{-1}B\tilde{X}\right)\tilde{X}A^T
   = A\tilde{\Delta}\left(I - \tilde{\Delta}B^T(B\tilde{\Delta}^2B^T)^{-1}B\tilde{\Delta}\right)\tilde{\Delta}A^T$.

This completes the proof of Theorem 4.1.
Proof of Theorem 4.3:

For simplicity of notation, we use

  $x(t,s)$ to denote $\tilde{x}(t, w+sh)$,
  $X$ to denote $X(t,s) = \tilde{X}(t, w+sh)$,
  $G$ to denote $BX^2B^T$,
  $P$ to denote $I - XB^TG^{-1}BX$,

respectively.

(i) By (i) of Theorem 4.1,

  $\dfrac{d\tilde{f}^p(t,w+sh)}{ds} = \tilde{f}^p_w(t,w+sh)^T h = (a - Ax(t,s))^T h$.

Thus we have shown (i).

(ii) By (ii) of Theorem 4.1,

  $\dfrac{d^2\tilde{f}^p(t,w+sh)}{ds^2} = h^T\tilde{f}^p_{ww}(t,w+sh)h
   = \dfrac{1}{t}h^T AXPXA^T h = \dfrac{1}{t}u(t,s)^T u(t,s)$.

Thus we have shown (ii).

(iii) We first observe that

  $\dfrac{dx(t,s)}{ds} = \dfrac{d\tilde{x}(t,w+sh)}{ds} = \tilde{x}_w(t,w+sh)h
   = -\dfrac{1}{t}XPXA^T h = -\dfrac{1}{t}Xu(t,s)$.

On the other hand, we see by (ii) that

  $\dfrac{d^3\tilde{f}^p(t,w+sh)}{ds^3} = \dfrac{2}{t}u(t,s)^T\dfrac{du(t,s)}{ds}$.

To evaluate $\dfrac{du(t,s)}{ds}$, let

  $q(t,s) = XA^T h$  and  $r(t,s) = G^{-1}BX^2A^T h$.

Then

  $G\,r(t,s) = BX^2A^T h$,
  $u(t,s) = PXA^T h = q(t,s) - XB^T r(t,s)$.   (16)

It follows that

  $\dfrac{dq(t,s)}{ds} = X_s A^T h$,
  $2BXX_sB^T r(t,s) + BX^2B^T\dfrac{dr(t,s)}{ds} = 2BXX_s A^T h$.

Here $X_s = \mathrm{diag}\,\dfrac{dx(t,s)}{ds}$. Hence

  $G\dfrac{dr(t,s)}{ds} = 2BXX_s A^T h - 2BXX_sB^T r(t,s)
   = 2BXX_s A^T h - 2BXX_sB^T G^{-1}BX^2A^T h
   = 2BX_sPXA^T h$,

  $\dfrac{dr(t,s)}{ds} = 2G^{-1}BX_sPXA^T h$,

  $\dfrac{du(t,s)}{ds} = \dfrac{dq(t,s)}{ds} - X_sB^T r(t,s) - XB^T\dfrac{dr(t,s)}{ds}
   = X_s A^T h - X_sB^TG^{-1}BX^2A^T h - 2XB^TG^{-1}BX_sPXA^T h
   = \left(I - 2XB^TG^{-1}BX\right)X^{-1}X_sPXA^T h$.

Therefore

  $\dfrac{d^3\tilde{f}^p(t,w+sh)}{ds^3} = \dfrac{2}{t}u(t,s)^T\dfrac{du(t,s)}{ds}
   = \dfrac{2}{t}h^T AXP\left(I - 2XB^TG^{-1}BX\right)X^{-1}X_sPXA^T h
   = \dfrac{2}{t}h^T AXPX^{-1}X_sPXA^T h
   = -\dfrac{2}{t}u(t,s)^T\left(\dfrac{1}{t}\,\mathrm{diag}\,u(t,s)\right)u(t,s)
   = -\dfrac{2}{t^2}\sum_{j=1}^n u_j(t,s)^3$.

Thus we have shown (iii).

(iv) By (i), we see that

  $\dfrac{d^2\tilde{f}^p(t,w+sh)}{dt\,ds} = \dfrac{d\left((a - Ax(t,s))^T h\right)}{dt}
   = -h^T A\tilde{x}_t(t,w+sh)
   = -\dfrac{1}{t}h^T AXPe
   = -\dfrac{1}{t}u(t,s)^T v(t,s)$.

(v) By (ii), we see that

  $\dfrac{d^3\tilde{f}^p(t,w+sh)}{dt\,ds^2} = \dfrac{d}{dt}\left(\dfrac{u(t,s)^T u(t,s)}{t}\right)
   = \dfrac{2}{t}u(t,s)^T\dfrac{du(t,s)}{dt} - \dfrac{u(t,s)^T u(t,s)}{t^2}$.

Using $q(t,s)$ and $r(t,s)$, we represent $u(t,s)$ as in (16) to evaluate $\dfrac{du(t,s)}{dt}$. By the definition
of $q(t,s)$ and $r(t,s)$, we see that

  $\dfrac{dq(t,s)}{dt} = X_t A^T h$,
  $2BXX_tB^T r(t,s) + BX^2B^T\dfrac{dr(t,s)}{dt} = 2BXX_t A^T h$.

Here $X_t = \mathrm{diag}\,\dfrac{dx(t,s)}{dt}$. Hence

  $G\dfrac{dr(t,s)}{dt} = 2BXX_t A^T h - 2BXX_tB^T r(t,s)
   = 2BXX_t A^T h - 2BXX_tB^T G^{-1}BX^2A^T h
   = 2BX_tPXA^T h$,

  $\dfrac{dr(t,s)}{dt} = 2G^{-1}BX_tPXA^T h$,

  $\dfrac{du(t,s)}{dt} = \dfrac{dq(t,s)}{dt} - X_tB^T r(t,s) - XB^T\dfrac{dr(t,s)}{dt}
   = X_t A^T h - X_tB^TG^{-1}BX^2A^T h - 2XB^TG^{-1}BX_tPXA^T h
   = \left(I - 2XB^TG^{-1}BX\right)X^{-1}X_tPXA^T h$.

Thus we obtain

  $u(t,s)^T\dfrac{du(t,s)}{dt} = h^T AXP\left(I - 2XB^TG^{-1}BX\right)X^{-1}X_tPXA^T h
   = h^T AXPX^{-1}X_tPXA^T h$.

By the definition of $u(t,s)$, $x(t,s)$, $X_t$ and Lemma A, we know that

  $u(t,s) = PXA^T h$,
  $\dfrac{dx(t,s)}{dt} = \tilde{x}_t(t,w+sh) = \dfrac{1}{t}Xv(t,s)$,
  $X^{-1}X_t = X^{-1}\,\mathrm{diag}\,\dfrac{dx(t,s)}{dt} = \dfrac{1}{t}\,\mathrm{diag}\,v(t,s)$.

Therefore

  $\dfrac{d^3\tilde{f}^p(t,w+sh)}{dt\,ds^2} = \dfrac{2}{t}u(t,s)^T\dfrac{du(t,s)}{dt} - \dfrac{u(t,s)^T u(t,s)}{t^2}
   = \dfrac{2u(t,s)^T\left(\mathrm{diag}\,v(t,s)\right)u(t,s)}{t^2} - \dfrac{u(t,s)^T u(t,s)}{t^2}
   = \dfrac{u(t,s)^T\left(2\,\mathrm{diag}\,v(t,s) - I\right)u(t,s)}{t^2}$.

This completes the proof of Theorem 4.3.