Analysis of Cycle Stealing with Switching Cost Takayuki Osogami Mor Harchol-Balter Alan Scheller-Wolf October 2002 CMU-CS-02-192 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Abstract We consider the scenario of two independent processors, each serving its own workload, where one of the processors (known as the “donor”) can help the other processor (known as the “beneficiary”) with its jobs, during times when the donor processor is idle. That is the beneficiary processor “steals idle cycles” from the donor processor. We assume that both donor jobs and beneficiary jobs may have generally-distributed service requirements. We assume that there is a switching cost required for the donor processor to start working on the beneficiary jobs, as well as a switching cost required for the donor processor to return to working on its own jobs. We also allow for threshold constraints, whereby the donor processor only initiates helping the beneficiary if both the donor is idle and the number of jobs at the beneficiary exceeds a certain threshold. We analyze the mean response time for the donor and beneficiary processors. Our analysis is approximate, but can be made as close to exact as desired. Analysis is validated via simulation. Results of the analysis illuminate several interesting principles with respect to the general benefits of cycle stealing and the design of cycle stealing policies. Carnegie Mellon University, CSD. Email: [email protected] Carnegie Mellon University, Computer Science Department. Email: [email protected]. This work was supported by NSF Career Grant CCR-0133077, by NSF ITR Grant 99-167 ANI-0081396 and by Spinnaker Networks via Pittsburgh Digital Greenhouse Grant 01-1. Carnegie Mellon University, GSIA. Email: [email protected] Keywords: cycle stealing, switching cost, queueing theory, task assignment, server farm, distributed system, unfairness, starvation, load sharing, supercomputing, Matrix-Analytic. 1 Introduction Motivation Since the invention of networks of workstations, systems designers have touted the benefits of allowing a user to take advantage of machines other than her own, at times when those machines are idle. This notion is often referred to as cycle stealing. Cycle stealing allows a user, Betty, with multiple jobs to offload one of her jobs to Dan’s machine if Dan’s machine is idle, giving Betty two machines to process her jobs. When Dan’s workload resumes, Betty must return to using only her own machine. We refer to Betty as the beneficiary, to her machine as the beneficiary machine/server, and to her jobs as beneficiary jobs. Likewise, we refer to Dan as the donor, to his machine as the donor machine/server, and to his jobs as donor jobs. Although cycle stealing provides obvious benefits to the beneficiary, these benefits come at some cost to the donor. For example, the beneficiary’s job may have to be checkpointed and the donor’s working set memory reloaded before the donor can resume, delaying the resumption of processing of donor jobs. In our model we refer to these additional costs associated with cycle stealing as switching costs. A primary goal of this paper is to understand what is the benefit of cycle stealing for the beneficiary and what is the penalty to the donor, as a function of switching costs. A second goal is to derive parameter settings for cycle stealing. In particular, given non-zero switching costs, cycle stealing may only pay if the beneficiary’s queue is “sufficiently” long. We seek to understand the optimal threshold on the beneficiary queue. More broadly, we seek general insights into which problem parameters have the most significant impact on the effectiveness of cycle stealing. The analytical model We assume two independent M/GI/1/FCFS queues, the beneficiary queue and the donor queue. Jobs arrive at rate (respectively, ) at the beneficiary (respectively, donor) queue and have service re- quirement represented by the random variable from distribution drawn from distribution (respectively, drawn ). The load made up by beneficiary (respectively, donor) jobs is denoted by and . If the donor server is idle, and if the number of jobs at the beneficiary queue is at least , the donor transitions into the switching state, for a random amount of time, . After time, the donor server is available to work on the bene ficiary queue and the beneficiary queue becomes an M/ /2 queue. If a donor job arrives during , (respectively, ) where or during the time the donor is helping the beneficiary, the donor transitions into a switching back state, for a random amount of time, !#" . After the completion of the switch back, the donor server resumes working on its own jobs. We assume the first four moments of the service times are finite, and queues 1 are stable. No work is accomplished by the donor server while switching. A few details are in order. First, in the above model, the donor processor continues to cooperate with the beneficiary even if there is no beneficiary work left for it to do — the donor processor only switches back when a donor job arrives. (We also analyzed the case where the donor processor switches back when it is not needed, see Appendix A. We found that this has almost no effect on performance, even under high switching costs). Second, we assume that if the donor processor is working on a beneficiary job and a donor job arrives, that beneficiary job is returned to the beneficiary queue and will be resumed by the beneficiary processor as soon as that processor is available. The work done on the job is not lost, i.e., the job is checkpointed. Finally, our model can be generalized to the case where there is a threshold, , on the number of donor jobs as well — i.e., the donor processor only returns to the donor queue when the number of jobs at the donor queue is at least . Throughout this paper, aside from the stability section (Section 4), we assume for simplicity, and focus on the effect of . In Appendix B, we extend the analysis to general . Our performance metric throughout is mean response time for each class of jobs. Difficulty of analysis and Previous work Consider the simplest instance of our problem — where the service requirements of all jobs are each drawn from exponential distributions and the switching costs and threshold are zero. Even for this simplest instance the continuous-time Markov chain, while easy to describe, is mathematically intractable. This is due to the fact that the stochastic process having state (number beneficiary jobs, number donor jobs ) grows infinitely in two dimensions and contains no structure that can be exploited to obtain an exact solution. While solution by truncation of the Markov chain is possible, the errors that are introduced by ignoring portions of the state space (infinite in two dimensions) can be quite significant, especially at higher traffic intensities. Thus truncation is neither sufficiently accurate nor robust for our purposes. To our knowledge, there has been no previous analytical work on the problem of cycle stealing with setup costs. Below we describe prior work on the “coupled processor model”. In this model two processors each serve their own class of job, and if either is idle it may help the other, increasing the rate of this other processor. This help incurs no switching cost and has an effect if even a single job is present. These models inherently assume a preemptive resume discipline: when a processor returns to It is easy to generalize our analysis to the case where the beneficiary requires a “switching time” before it can resume a job that the donor started. We have chosen to ignore this generalization for purpose of clarity. It is also trivial to extend our results to the case where all work on the job in progress is lost if a donor job arrives, provided that we assume that the job is restarted with a new service time – which we feel is unrealistic. It is also possible to extend our results to the case where the donor must complete the beneficiary job in progress before it switches back. This is the topic of current research, see Section 7. For and , truncation leads to error, even with states, and takes minutes to compute. As nears , the error increases indefinitely. Under jobs sizes more variable than exponential, the error increases. 2 its own queue, none of its work is lost. All works we mention below consider Poisson arrivals. Early work on the coupled processor model was by Fayolle and Iasnogorodski [5] and Konheim, Meilijson and Melkman [6]. Both papers assume exponential service times, deriving expressions for the stationary distribution of the number of jobs of each type. Fayolle and Iasnogorodski use complex algebra, eventually solving either a Dirichlet problem or a homogeneous Riemann-Hilbert problem for a circle, depending on the accelerated rates of the servers. Konheim et. al. assume that the accelerated rate is twice the original rate, which yields simpler expressions (still in the form of complex integrals). No indication is given in either work as to how the evaluation of these expressions. In addition, because the processors work in concert, rather than on different jobs, these systems will gain no multi-server benefit when serving highly variable jobs; short jobs may get stuck waiting behind long jobs in a single queue. The above work was extended by Cohen and Boxma [4] to the case of general service times. They consider stationary workload, which they formulate as a Wiener-Hopf boundary problem. This leads to complex expressions involving either integrals or infinite sums; if the queues are symmetric simpler expressions for mean total workload are found. They again have the two processors working in concert, without a switching cost, providing complex analytical expressions, rather than numerical values. In more recent work, Borst, Boxma and van Uitert [3] apply a transform method to the expressions in [4], yielding asymptotic relations between the workloads and the service requirement distributions. This leads to the insight that if a processor has a load less than one, it is “insulated” from the heavy-tail of the other, as long periods without cooperation will not lead to large backlogs. This is not the case if the load is greater than one, as the queue now must rely on help to be stable. Borst, Boxma and Jelenkovic [2] consider a very similar question under generalized processor sharing. Using probabilistic bounds, they show that different service rates can either insulate the performance of different classes from the others or not, again depending on whether the minimal rate is larger than the load. Both of these papers are concerned with the asymptotic behavior of workload, whereas our work isolates mean performance. Our work is thus complementary to these results. Our approach This paper presents the first analysis of cycle stealing under general service requirements with switching costs and a switching threshold. Recall that the difficulty in analyzing cycle stealing is that the corresponding stochastic process defines a Markov chain which grows infinitely in two dimensions (2D), making it intractable. The key idea in our approach is to find a way to transform this 2D chain into some 1D infinite Markov chain which can be analyzed. The questions in applying such a transformation are (i) what should the 1D Markov chain track, and (ii) how can all the relevant information from the 2D chain be captured in the 1D chain. Our 1D chain tracks the number of beneficiary jobs. For the donor 3 jobs, our state space contains only binary knowledge: zero jobs or jobs. Nevertheless we are able to capture detailed information on the number of donor jobs by using special transitions in our Markov chain, where these transitions represent the lengths of an assortment of busy periods. The difficulty lies in specifying the right busy periods, some of which transcend the definition of the analytical model. Once the 1D Markov chain is specified, the hard work is finished, since this chain can be solved efficiently using known numerical (Matrix analytic) techniques. While a closed-form solution may be preferable, our chain is compact enough, and Matrix analytic methods powerful enough, that only a couple seconds are required to generate any of the results plots in this paper. Furthermore, our method very easily generalizes to more complex problem formulations e.g., multiple donors (Section 7). Our analysis is approximate, but can be made as close to exact as desired. The only approximation lies in representing the length of the busy periods by a Coxian distribution fit to a finite number of the busy period moments. In this paper, we use a 2-stage Coxian to capture the first 3 moments of the busy periods, and verify that this is sufficient via simulation. However, our method naturally extends to matching more moments. Our analysis assumes a Poisson arrival process for both classes of jobs. The service requirements of the each class are assumed to be drawn i.i.d. from general distributions (which we approximate by a Coxians). The arrival process can easily be generalized to a MAP – Markovian Arrival Process [1]. Summary of results Our analysis yields many interesting results concerning cycle stealing, detailed in Section 6. While cycle stealing obviously benefits the beneficiaries and hurts the donors, we find that when , cycle stealing is profitable overall even under significant switching costs as it may ensure stability of the beneficiary queue. For , we define load-regions under which cycle stealing pays. We find that in general the switching cost is only prohibitive when it is large compared with switching cost, cycle stealing always pays. Two counterintuitive results are that when . Under zero , the performance of the beneficiaries is sur- prisingly insensitive to the switching cost, and also insensitive to the variability of the donor job size distribution. Even when the variability of the donor job sizes is very high, and donor help is very bursty, the beneficiaries don’t seem to suffer. We also study the effect of , and present several rules of thumb for choosing the optimal setting of as a function of the server loads, the switching cost, and the mean job sizes. 4 Outline In Section 2 we present our analysis for the case of zero switching cost, generalizing to non-zero cost in Section 3. In Section 4 we provide stability conditions for the donor and beneficiary hosts. In Section 5 we validate our analysis by considering limiting analytical cases and via simulation. In Section 6 we present results for mean response times of donor and beneficiary jobs under various loads, job size distributions, switching costs, and thresholds. Section 7 describes extensions to the model. 2 Analysis without switching cost , i.e. the donor server switches to help beneficiary jobs when there are zero In this section we analyze the simpler case of zero switching cost. Figure 1 shows our Markov chain for the case where donor jobs and there are at least three beneficiary jobs. Figure 1(a) shows a simplified form of our chain where job sizes and busy periods are assumed to be exponentially-distributed. Figure 1(b) and (c) show the case of generally-distributed (Coxian) job sizes and busy periods. The actual chain that we evaluate in the paper is a superposition of the chains in Figure 1(b) and (c), shown in Appendix C. The first two components of each state denote the number of beneficiary jobs and the number of donor jobs respectively. The states of the Markov chain have been grouped into three rows, labeled (i) cooperating row: indicates that the donor processor is cooperating with the beneficiary; (ii) independent row, with donor job: indicates that the donor and beneficiary processors are each working indepen- dently on their own queues and there is at least 1 donor job present; (iii) independent row, with donor jobs: indicates that the donor and beneficiary processors are each working independently on their own queues and there are zero donor jobs present. Observe that while the states track the precise number of beneficiary jobs, they keep only a binary record (zero or ) of the donor jobs. The key idea is that to determine beneficiary performance we can avoid tracking the number of donor jobs because we only need to know when the donor queue is idle. Thus we use transitions (labeled ) to represent the length of a busy period of donor jobs. The logic behind the Markov chain in Figure 1(a) is as follows: If we are in the row where the processors are working independently and the number of donor jobs is zero, the left-right transitions allow us to track the number of beneficiary jobs. If a donor job arrives we transition to the row where processors are working independently and the number of donor jobs is at least one. The time spent in this row is the length of a donor busy period. If at the end of the busy period the number of beneficiary jobs is below , then we return to the row where processors are working independently and the number of donor jobs is zero. If however at the end of the busy period the number of beneficiary jobs is at least , then we move to the cooperating row, where we stay until there is a donor arrival. 5 λ Cooperating Servers µ 0B,0D,C λ B B λ B λD B B λD λ λ 4B,0D,C B λD BD λ BD λ BD B 3B,1D B λD B 2µ B µ 4B,1D µ B B B B 2B,0D 1B,0D µ B λD B BD B 0B,0D 3B,0D,C 2B,1D µ BD 2µ λ 1B,1D µ λ B λD λ 0B,1D Working Independently, # Donors = 0 2B,0D,C λD λD Working Independently, # Donors = 1+ 2µ 1B,0D,C λ B µ B B (a) λ µ 0B,0D,C λ B D 0B,1D β2 µ β1 λ µ β3 1B,1D B D λ β2 λ β3 λ β2 B λ D 2µ β2 λ µ β1 β3 µ B λ B µ λ β1 4B,0D,C B β3 B 2B,1D λ D β3 β2 B B β1 B 3B,1D 4B,1D µ B B B D B 2B,0D 1B,0D µ B B D λ B B λ B 3B,0D,C λ λ µ λ B 2µ B µ β1 D 0B,0D B D λ B 2B,0D,C λ B B λ B 2µ 1B,0D,C λ λ λ λ B µ B B (b) b2 0B,0D,C 2b2 1B,0D,C b1 b3 b3 b1 2b3 b2 b3 b2 1B,1D b1 b3 2b3 b3 b1 2b3 b1 4B,0D,C 2b1 b2 b2 2B,1D b1 b3 2b2 3B,0D,C 2b1 b2 b2 0B,1D 2b2 2B,0D,C 2b1 b3 b3 b2 3B,1D b3 b1 4B,1D b1 b3 b3 λB b2 0B,0D b3 1B,0D b2 b1 b3 b3 λD 2B,0D b2 b1 b1 b3 XB ΒD (c) Figure 1: Markov chain for cycle stealing without switching cost. (a) Where is exponential and is drawn using a single (exponential) transition. (b) Where is exponential, and is represented by a 2-stage Coxian. (c) Where is represented by a 2-stage Coxian and is drawn using a single (exponential) transition. 6 Figure 1(b) and (c) show the same Markov chain as 1(a), where now and are represented by 2-stage Coxian distributions. Observe that when the donor and beneficiary are cooperating, the state space now must maintain a state for two partially-completed beneficiary jobs. If a donor job arrives while we’re in the cooperating row, the partially-completed job that the donor was working on will be moved to the head of the beneficiary queue (we currently assume zero cost for the transfer, but this is easy to generalize to non-zero cost). In order to specify the Markov chain, we need to compute the first three moments of find a 2-stage Coxian which matches these. The Laplace transform of The moments of and then is: are obtained from the transform by taking derivatives. In [9] we derive necessary and sufficient condition for matching the first three moments of a distribution using a 2-stage Coxian. 2.1 Computing response times The mean response time for donor jobs is trivial to compute – it is simply the response time of the M/ /1 queue, since the beneficiary jobs are preemptive and switching cost is zero. The mean number of beneficiary jobs is easy to compute from the chain in Figure 1 using the MatrixAnalytic method, described in [8] and [7]. This is a simple, compact and efficient method for solving QBD (quasi-birth-death) Markov chains which are infinite in only one dimension, where the chain repeats itself after some point, as does Figure 1. We then apply Little’s Law to get their mean response time. Every plot in this paper which uses Matrix-Analytic analysis (to solve multiple instances with different parameter values) was produced within a couple of seconds using the Matlab 6 environment. 3 Analysis with switching cost In this section, we analyze cycle stealing with switching costs in both directions. We assume that (i) the donor only makes the switch if the donor queue is idle and the number of jobs at the beneficiary queue is at least , and (ii) the donor stays at the beneficiary queue until there is a donor arrival. Let denote the time required for the donor to switch to working on the beneficiary queue, and #" the time to switch back to the donor queue. Figure 2 shows the Markov chain for cycle stealing with . For simplicity, we have drawn and as being exponentiallyswitching cost where distributed — these are easy to replace with Coxian distributions. The first two rows of the Markov chain in Figure 2 represent the case where the donor and beneficiary servers are working independently, where row 2 indicates zero donor jobs and row 1 indicates at least 7 λ Working Independently #Donors = 1+ 0B,1D λD B B D+ba 0B,1D,ba λD Switching to help λ 0B,0D,sw λD B D+ba Ksw 0B,0D,C Region 2 λ 1B,1D,ba λD λ 1B,0D,sw B λD Ksw λ B λ 2B,1D,ba λ B µ B 3B,1D,ba λD λ 2B,0D,sw B λD 2B,0D,C B λ 3B,0D,sw λ Figure 2: Markov chain for cycle stealing with switching cost, Ksw B 2µ 3B,0D,C B λD B D+ba B µ B λD Ksw 4B,1D,ba B B D+ba B µ B µ B λD B B 2µ 1B,0D,C B λ B µ B µ B D+ba µ B B µ BD BD B λ B µ BD B 2B,0D µ B λ 4B,1D µ B B 1B,0D µ Region 1 Switching Back λD B 3B,1D µ B BD λ 0B,0D λ B 2B,1D µ B BD λ Working Independently #Donors = 0 λ B 1B,1D µ λD Cooperating Servers λ B 4B,0D,sw B λD λ Ksw B 2µ 4B,0D,C B and exponential, . one donor job. A transition from row 2 to row 1 starts a donor job busy period, the length of which is beneficiary jobs in the system, the Markov chain simply transitions back to row 2. However if there are beneficiary jobs, the represented by . When this busy period completes, if there are Markov chain transitions to row 4 – the row for switching to help. Observe that the Markov chain can also go from row 2, the zero donor jobs row, directly to row 4, the switching to help row, as soon as there are beneficiary jobs. After time, if no donor job arrived, the Markov chain transitions to row 5, the cooperating row. As soon as a donor job arrives, either during or during the cooperating phase, the Markov chain immediately transitions into row 3 – the switching back row. At the point of entering row 3, there is a single donor job, which just arrived. The time to return to the case of zero donor jobs is thus a busy period started by the sum of and #" . We denote this busy period by #" . In our Markov chain, we have drawn the busy period transitions with a single bold transition, although in evaluating the chain we will replace this bold transition with a 2-stage Coxian as usual. Computing the first three moments of a busy period of donor jobs, started by a job of size #" , the switch back time, is straightforward by taking derivatives of the Laplace transform: !#" For some busy periods such as , there does not exist a 2-stage Coxian which matches the first 3 moments, see [9]. In such cases we divide into two busy periods: , the busy period started by a donor job, and , the busy period started by switching back. We then may use two 2-stage Coxians to represent each busy period, since . !#" #" $%" & (' )#" 8 Calculation of response times The mean response time for beneficiary jobs is easy to compute, since the chain keeps track of the number of beneficiary jobs and can be analyzed via Matrix Analytic methods. The donor jobs see an M/GI/1 queue where, at times, the first job in a busy period must wait for a time to switch back, #" . Thus the response time for donor jobs is the response time under an M/GI/1 queue with setup time : with probability #" with probability We consider only regions 1 and 2, as Region 1 Region 1 or 2 Region 2 Region 1 or 2 is defined by what the first job to start a busy period sees. The expected waiting time for an M/GI/1 queue with only donor jobs and setup time We thus have: ! #" Response time for donor " is known [10]: " ! 4 Stability In this section we derive stability conditions on and thresholds and . for cycle stealing with switching costs and Theorem 1 The stability condition (necessary and sufficient) for donor jobs is Proof: We first prove sufficiency. Assume In this case . Let . denote the length of a busy period at the period at the donor queue is started by a ., theA busy mean length of a busy period is #" % $ donor queue. We first consider the case switching cost #" and donor jobs. As . In this case clearly has a proper distribution and thus the queue is positive regenerative, hence stable. Next we consider the case is smaller than in the case because there will be donor busy periods in which the donor hasn’t left the donor queue, implying (i) there is no switching back cost, and (ii) the busy period is started by only one donor job. 9 Necessity follows immediately from the fact that the donor queue is unstable for all Before we derive the stability condition on . , we prove a lemma allowing us to assume , then it is stable at , %$ Lemma 1.1 If the beneficiary queue is stable at . . denote the number of beneficiary jobs at time given . Consider a new in which no service is given by either server to a beneficiary job if there are process jobs stochastically dominates . Along any sample present at the beneficiary queue. Note that . path, will be equal to + . Therefore, if is proper, so too is Proof: Let We now prove the stability condition on for exponential ! . Theorem 2 For exponential , the stability condition for the beneficiaries with donor threshold is: #" . Let , Proof: By Lemma 1.1, we are allowed to assume "!"&#$ (*% ' )+ denote the time average fraction of time that the donor helps the beneficiary. Then the strong law of large numbers can be used to show that a necessary and sufficient condition for stability of the beneficiary jobs is If , , . Thus we assume and derive , , using renewal reward theory. Let’s consider a where - , . By renewal-reward theory: renewal to occur every time the donor becomes idle. Recall Fraction of time donor helps beneficiary denotes the help (reward) during the renewal cycle, and - denotes the length of the renewal cycle. Observe that there may be any number of donor arrivals ranging from we switch back only after the arrival. Let denote the sum of the service times of donor jobs. Then, as . #" to during and / For non-exponential 02143 , conditions can be derived in terms of expectations and probabilities. 10 , #" Ksw where . Donor arrival Donor goes idle BS+ba Donor helping Donor arrival Donor arrival Donor arrival Donor arrival Donor goes idle Nth arrivals D - is the interarrival time for donor jobs. To compute we condition on the number of donor arrivals during . If there are arrivals, then the expected time spent helping is the time until there are more donor arrivals, . . Let denote the probability that there are donor arrivals during . As is exponentiallydistributed, Therefore, - "! #$ ! # $ %( ' )+ Thus, the stability condition for the beneficiary jobs is !#" %$ < "%$ ON "%$ BED '&)( +* -, / ..80 130 192%42:45 57;=<?C >A@ .0 G 5 .80 19H IJ5 76 %" $ >F@ 6 RQ8S '&)( LK=M ' &)( =P ? < &)( >F@ Note that the right hand side is an increasing function of ; in terms of stability, the larger , the . , UT $ , " ; as better the performance. In particular, when Figure 3 shows the stability condition for beneficiaries as a function of . In case (1), we set #" and . The region below each line satisfies the stability condition. As increases as high as 100, the effect of switching overhead becomes negligible, and the stability . In case (2), we set and #" . " condition approaches The switching cost is large, and it is observed that there is little benefit at moderate or high in terms of stability. However, there is still large benefit at low . In case (3), we set and #" . The stability region is much larger; when , the stability region is 11 2 2 N th=1 D N th=10 D NDth=100 1.8 1.8 B ρ ρ ρ B 1.6 1.4 1.4 1.4 1.2 1.2 1.2 1 0 0.2 0.4 ρD 0.6 0.8 1 0 1 (1) N th=1 D N th=10 D NDth=100 1.8 1.6 B 1.6 2 N th=1 D N th=10 D NDth=100 (2) 0.2 0.4 ρD 0.6 0.8 1 0 1 0.2 (3) 0.4 ρD 0.6 0.8 1 Figure 3: Stability condition on beneficiaries for cycle stealing with switching costs and thresholds. almost the same as that of in case (1). This is intuitive: when and expected amount of donor work when the donor switches back is the same as that when . , the and 5 Validation of analysis Since our analysis involves approximation of busy periods by 2-stage Coxians, it is of paramount importance to validate the analysis. Analytical validation against limiting cases is presented in Section 5.1, and simulation validation is reported in Section 5.2. 5.1 Validation against known limiting cases We evaluate the performance of cycle stealing under 4 limiting situations: T and " , where " is the stability condition for T , , T T , derived in Section 4. For these limiting cases, the response times of the beneficiaries and donors are easy to evaluate analytically since they converging in performance to either an M/GI/1, an M/GI/1 with setup cost, or an M/M/2. Figure 4 verifies that our analysis of cycle stealing has the correct limiting behavior in all these cases. 5.2 Validation against simulation The accuracy of our analysis was also validated against simulation: a subset of out validation experiments is shown in Figure 5. Job sizes are assumed to be exponential. We show three cases: In case (a), . In case (b), and . In case (c), and . In all cases, the results of analysis are in very close agreement with simulation. The only mild discrepancy is the performance of the beneficiaries in case (b), under high load. Under case (b), donor jobs are large, making their busy periods more variable, especially at high loads. As our analysis is very dependent on these busy periods, matching only the first three moments may introduce error in 12 Performance of Beneficiaries 6 1.8 1.7 1.6 1.5 1.4 1.3 1.2 0 0.05 ρ 1.2 1.15 1.1 1.05 1 0.95 0 0.1 response time for beneficiaries 1.9 11 M/M/1 CS 0.05 ρ d 2 M/M/1 CS response time for beneficiaries 1.25 M/M/2 CS response time for beneficiaries response time for beneficiaries 2 10 9 8 7 6 0.9 0.1 0.95 ρ B x 10 CS 1.5 1 0.5 0 1 1 1.1 D 1.2 ρ 1.3 1.4 B Performance of Donors 4 2.3 2.2 2.1 2 1.9 0 0.1 T D 7.76 7.74 7.72 7.7 7.68 0 0.05 ρ 7.78 0.05 ρ T (2) B 8.85 M/G/1 CS 3 2 1 (3) 0.95 ρD 8.75 8.7 8.65 8.6 8.55 8.5 1 1 T M/G/1 w/setup CS 8.8 4 0 0.9 0.1 x 10 response time for donors 2.4 5 M/G/1 CS 7.8 response time for donors response time for donors 2.5 (1) 7.82 M/G/1 CS response time for donors 2.6 1.1 T (4) 1.2 ρ B 1.3 1.4 " Figure 4: Validation of our cycle stealing analysis against four limiting cases. For lack of space we , #" , is exponentiallyshow graphs only for the specific parameters: distributed with mean 1 and is a Coxian with mean 1 and . this case. We hypothesize that the accuracy of our analysis would improve if we matched more moments of the busy periods using Coxians with more stages. 6 Results of Analysis This section discusses our results, organized as take-home messages. We begin with a summary: 6.1 Take-home messages (Section 6.2) Cycle stealing is always a win when When (and , but doesn’t pay when . ), cycle stealing can provide an infinite benefit to beneficiaries over dedicated servers, with comparatively little pain to the donors. Recall that the stability region for the donors under cycle stealing is much greater than under dedicated servers. Factors such as increased switching costs, increased When , there is so little gain to the beneficiaries that cycle stealing with non-zero switch-. However since the gain is still infinite these factors are less important in the domain , and increased result in lower gains for the beneficiary. ing overhead doesn’t pay. We therefore focus the rest of the results section on the remaining case: 13 . Performance of beneficiaries 8 Analysis Simulation 7 6 5 4 3 2 0.2 0.4 0.6 ρD 0.8 7 6 5 4 3 2 0.2 1 80 Analysis Simulation response time for beneficiaries 8 response time for beneficiaries response time for beneficiaries 9 0.4 0.6 ρD 0.8 Analysis Simulation 70 60 50 40 30 20 0.2 1 0.4 0.6 ρD 0.8 1 Performance of Donors 120 10 8 6 4 2 0.2 12 Analysis Simulation 100 response time for donors Analysis Simulation response time for donors response time for donors 12 80 60 40 20 0.4 (a) 0.6 ρD 0.8 8 6 4 0.4 (b) 0.6 ρD 0.8 1 0 0.2 0.4 0.6 ρD 0.8 1 (c) Figure 5: Validation of analysis against simulation. #" . We fix , and vary . and are exponentially-distributed. . Analysis Simulation 2 0 0.2 1 10 (Section 6.3) For , cycle stealing has regions of high gain/low pain and also regions where the reverse is true. These regions depend on job sizes, switching costs, thresholds and loads. We organize performance into these gain/pain regions. We find that when the switching costs are low, cycle stealing leads to high gain and low pain. However high switching costs can reverse this effect. We find that the performance of the donors is sensitive to the switching cost, while surprisingly, the performance of the beneficiaries is far less sensitive to the switching cost. We also see that higher implies higher gains. (Section 6.4) For , variability of donor job sizes has very little effect. We find that the variability of the donor jobs has little effect on beneficiary performance. This is very surprising; we expected the beneficiary to gain far less from the bursty help of a donor with irregular (highly variable) job sizes. For the above intuition is in fact correct. This echoes findings in [3] and [2] for a different model. (Section 6.5) Setting the best threshold is non-trivial. Finally we tackle the practical problem of how to set . Higher implies less gain for the beneficiaries and less pain for the donors. However, the performance of the beneficiaries is relatively insensitive to increasing , while the donors benefit quite a bit from raising . The absolute effect of depends greatly on the particular loads 14 and . The effect of on Performance of beneficiary jobs 30 20 10 0.5 1 ρ (a) 20 10 0 0 B 30 0.5 ρ 20 10 0 0 1 2.2 2.05 2 1.95 1.9 1 ρB (a) 20 10 0 0 0.5 ρB 2.8 2.6 2.4 2.2 1 (d) E[K]=1 CS with Nth=1 B CS with Nth=10 6.5 5.4 3 1.8 0 Dedicated 30 1 Performance of donor jobs 2 1.85 40 (c) E[K]=0 Response time for donors Response time for donors 2.1 CS with Nth =1 B CS with Nth =10 B B 3.2 2.15 0.5 0.5 ρ (b) E[K]=1 Response time for donors 30 B 1.8 0 40 50 B Response time for donors 0 0 40 Response time for beneficiaries 40 50 Response time for beneficiaries 50 Response time for beneficiaries Response time for beneficiaries 50 5.2 5 4.8 Dedicated 6 5.5 5 4.6 0.5 ρ 1 0 0.5 ρ B 1 0 (b) E[K]=1 0.5 ρ 1 B B (c) E[K]=0 (d) E[K]=1 Figure 6: The response time for beneficiaries and donors as a function of . Graphs show the case of (i) , (ii) cycle stealing with , and (iii) dedicated servers. In all figures cycle stealing with are exponential with mean 1. Switching cost is exponential with mean 0 or 1, as labeled. and mean performance (of beneficiaries and donors together) is irregular and delicate. In this section we derive five rules of thumb for setting as a function of the server loads, the switching costs, and the donor job size. Due to limited space we typically only show a couple options for each parameter. Many more graphs are available in Appendix D. 6.2 Benefits of cycle stealing for wide range of Figure 6 shows the response time for beneficiary jobs (top row) and donor jobs (bottom row) as a function of , where is held fixed at (columns 1 and 2) and at (columns 3 and 4). Each graph shows , (ii) cycle stealing with , and (iii) dedicated response time under (i) cycle stealing with servers. The first and third columns have no switching cost, and the second and fourth columns have exponential switching cost with mean 1. When , cycle stealing can provide an infinite benefit to beneficiary jobs over dedicated servers. We find that factors like increased switching costs, increased load at donor, and increased all result 15 in lower gains for the beneficiary. However since the gain is infinite these factors are less important when . We also see that the response time of donor jobs is bounded by the response time for an M/GI/1 queue with setup cost , where is the switch back cost. Increasing results in less penalty to the reaches its maximum. When assuming non-zero , the beneficiaries benefit so little that cycle stealing doesn’t pay, switching cost. We therefore focus the rest of the results section on the case: . donor, converging to the same point for all values when 6.3 Benefit of cycle stealing for Gain of Beneficiaries and Loss of Donors — 0.8 mid gain low loss 1 0.8 0.8 ρ high gain low loss 0.4 1 low gain low loss GOOD BAD 0.8 mid gain mid loss 0.6 ρD 0.6 D 0.6 1 0.6 ρD low gain low loss ρD 1 0.4 0.4 0.2 0.2 GOOD 0.4 mid gain high loss 0.2 0 0.6 ρB 0.8 0 1 (a) 0.6 ρB 0.8 0 1 high gain high loss 0.6 ρB 0.8 0.2 0 1 0.6 ρB 0.8 1 (b) E[K]=1 Gain of Beneficiaries and Loss of Donors — 1 0.8 0.8 0.6 1 low gain low loss GOOD 0.6 mid gain mid loss 0.4 0.4 0.4 0.2 0.2 0.2 BAD 0.8 0.6 ρD D 0.8 0.6 high gain low loss mid gain low loss ρ 1 ρD low gain low loss ρD 1 GOOD 0.4 high gain mid loss 0.2 high gain high loss 0 0.6 ρB 0.8 1 (a) 0 0.6 ρB 0.8 0 1 0.6 ρB 0.8 1 0 0.6 ρB 0.8 1 (b) E[K]=1 Figure 7: Columns 1 and 3 show the gain of beneficiary jobs and loss of donor jobs relative to dedicated servers. High gain/loss refers to a difference. Low gain/loss refers to a difference. Columns 2 and 4 show the mean performance (over all jobs) relative to dedicated servers. In all figures: and are exponentially-distributed with mean 1. Figures 7 and 8 evaluate cycle stealing in the region , as a function of . In Figure 7 and both have mean 1, while Figure 8 assumes has mean 1 and has mean 10. Both figures assume switching cost is 0 or 1, and is 1 or 3. In our discussion below, we first concentrate on the effect of switching cost, and then look at the effect of . A few definitions are necessary: The term “gain” refers to the improvement in mean response time experienced by beneficiary jobs, as compared with dedicated servers. This gain is expressed as a per16 Gain of Beneficiaries and Loss of Donors — 1 low gain low loss 1 0.8 mid gain low loss 0.8 0.4 high gain low loss 0.2 0 0.6 ρB 0.8 GOOD 0.4 0.4 0.2 0.2 0 1 (a) 0.8 0.6 ρD ρ D 0.6 0.6 ρB 0.8 0 1 BAD 0.6 ρD 0.8 0.6 1 low gain low loss ρD 1 mid gain low loss high gain low loss 0.6 GOOD 0.4 ρB 0.2 0.8 0 1 0.6 ρB 0.8 1 ρB 0.8 1 (b) E[K]= 1 Gain of Beneficiaries and Loss of Donors — 1 low gain low loss 1 0.8 mid gain low loss 0.8 0.6 ρD ρ D 0.6 0.4 high gain low loss 0.2 0 0.6 ρB 0.8 1 (a) GOOD 0.4 0.4 0.2 0.2 0 0.6 ρB 0.8 0 1 BAD 0.8 0.6 ρD 0.8 0.6 1 low gain low loss ρD 1 mid gain low loss high gain low loss 0.6 GOOD 0.4 ρB 0.2 0.8 0 1 0.6 (b) E[K]=1 Figure 8: Columns 1 and 3 show the gain of beneficiary jobs and loss of donor jobs relative to dedicated servers. High gain/loss refers to a difference. Low gain/loss refers to a difference. Columns 2 and 4 show the mean performance (over all jobs) relative to dedicated servers. In all figures: and are exponentially-distributed, where has mean 1 and has mean 10. centage of the mean response time under dedicated servers. The beneficiary jobs may experience low gain ( ), mid gain (between and ), or high gain ( ). The term “loss” refers to the relative pain (increase) in response time for donor jobs, as compared with dedicated servers. Loss is also expressed as a percentage of the response time under dedicated servers. Odd-numbered columns show various regions of gain and loss. Even-numbered columns use different terms: “good” and “bad.” The “good” and “bad” terms refer to mean response time overall (over all jobs including beneficiaries and donors). “Good” indicates that mean response time has dropped as a consequence of cycle stealing. “Bad” indicates that mean response time has increased as a consequence of cycle stealing. We start by looking at Figure 7. The first and second columns show the case of zero switching cost. We see that all regions are low loss regions (in fact zero loss), and higher yields higher gain for the beneficiaries. The third and fourth columns show switching cost of 1. The non-zero switching cost creates slightly reduced gain for the beneficiaries (Recall from Section 6.2 that switching cost has a much more pronounced effect on the beneficiaries when , as the beneficiary is dependent on the donor). More importantly switching cost creates loss for the donor jobs. When 17 is very low, the loss appears high, but this is primarily due to the fact that “loss” is defined as a percentage relative to the response time under dedicated servers, and the response time under dedicated servers is obviously low for small . The mean gain over all jobs is no longer always positive as shown in the fourth column of Figure 7. Although not shown, we have also investigated higher switching costs, and these lead to the same trend of slightly less gain for beneficiaries and more loss for donors. Figure 8 differs from Figure 7 only in which now has mean 10. The effect of this change is dramatic: now a switching cost of 1 has almost no effect on either beneficiaries or donors. This makes sense since the setup cost experienced by the donor job is now relatively small compared to its size. We can conclude that cycle stealing is most effective when the switching cost is small relative to the size of the donor jobs. Consider now columns 2 and 4 of both figures, which depict the effect on overall mean response time. When the switching cost is zero, cycle stealing always overwhelms the dedicated policy. When switching cost is non-zero, cycle stealing is a good idea provided either is low compared to is high, or the switching cost . These trends continue in our experiments with higher switching costs. Finally, we discuss the effect of in both figures. The performance of the beneficiaries is relatively insensitive to increasing from to in both figures; however the donors benefit quite a bit from raising . We further investigate selection of in Section 6.5. 6.4 Effect of donor job size variability In this section, we consider the effect of variability in the donor job sizes. It seems intuitive that when donor job sizes are made more variable, two things should happen: 1. The donor pain percentage should go down. This is because the donor response times will be higher overall and so the relative pain will appear to be less, and 2. The beneficiary gain should go down. This is because high variability in the donor job sizes, implies high variability in the length of the donor busy periods, which implies that the donor’s visits to the beneficiary queue will become more irregular. Sporadic help should be inferior to regular help, for the beneficiary. Figure 9 below shows that the first hypothesis above is in fact true, while the second hypothesis is surprisingly false, at least for ( . Comparing row 1 ( ) with row 2 has high variability: ) we see that there is no discernible difference in donor performance. To study this effect more closely, we next increase the variability in the donor job sizes to . has low variability: Figure 10 shows the performance of just the beneficiary jobs under the case of zero switching cost, when the donor job size variability is 1, 8, or 50, and is fixed at 0.5. 18 is allowed to vary from 0 to the Gain of Beneficiaries and Loss of Donors — 1 0.8 1 low gain low loss GOOD 0.6 mid gain mid loss 0.4 0.4 0.4 0.2 0.2 0.2 BAD 0.8 0.6 ρD D ρ 0.8 0.6 high gain low loss mid gain low loss 1 0.8 0.6 ρD low gain low loss ρD 1 with GOOD 0.4 high gain mid loss 0.2 high gain high loss 0 0.6 ρB 0.8 0 1 (a) 0.6 ρB 0.8 0 1 0.6 ρB 0.8 Gain of Beneficiaries and Loss of Donors — 1 1 0 1 (b) 0.6 ρB 0.8 1 with 1 1 low gain low loss mid gain low loss 0.8 ρ 0.4 high gain low loss 0.2 0 0.6 ρB 0.8 GOOD 0.4 0.4 0.2 0.2 0 1 (a) 0.6 ρB 0.8 0 1 BAD 0.8 mid gain low loss 0.6 ρD 0.6 ρD 0.6 D 0.6 0.8 ρD 0.8 GOOD 0.4 high gain mid loss mid gain mid loss 0.2 high gain high loss 0.6 ρB 0.8 1 0 (b) 0.6 ρB 0.8 1 Figure 9: Beneficiary gain and donor pain shown for donor job sizes with low variability (top) and high variability (bottom). Under higher donor variability, percentage of donor pain is lessened, but . percentage of beneficiary gain stays the same. All graphs assume: and have mean 1, stability condition. As is observed in Figure 9, the effect of variability of is very small when . When on the performance of , the effect of variability in donor sizes is non-negligible, but is still not as great as one might expect. 6.5 Optimal setting for beneficiary threshold, This section investigates the optimal setting for . We start with some obvious rules: (i) increasing leads to lower gain for the beneficiaries and lower pain for the donors; (ii) if the switching cost is zero, the optimal is 1, since there is never any pain for the donors. Figure 11 shows optimal values of for minimizing mean response time over all jobs (donors and beneficiaries) as a function of and under various switching costs and donor job sizes. The numbers on the contour curves show the best at each load. For clarity we only show lines up to . The following additional rules of thumb are implied by the figure: (iii) the optimal is an increasing function of and a decreasing function of ; (iv) increasing the switching cost increases the optimal ; (v) increasing the donor job size leads to lower optimal . 19 Effect of variability of donor jobs on performance of beneficiary jobs response time for beneficiaries 100 Exp 2 C =8 2 C =50 80 60 40 20 0 0 0.5 1 ρB 1.5 Figure 10: The response time for beneficiary jobs under different donor job size variability. is fixed at 0.5. varies from 0 to the stability condition of 1.5. The mean donor job size and beneficiary job size is 1, and switching cost is zero. Selecting the optimal 8 0.9 6 0.7 , 3 8 0.9 6 4 2 0.7 2 2 0.6 0.5 4 6 0.8 0.7 D 4 0.5 4 3 0.8 ρ D 0.6 ρ D 0.6 ρ 14 12 0 1 8 0.7 4 4 10 0.8 65 0.9 14 12 8 6 8 0.8 6 0.9 , 0.6 D 14 1120 0.5 ρ 2 0.5 6 0.4 0.4 0.3 0.3 2 0.2 4 4 0.2 0.1 0.1 0.6 0.4 0.3 0.3 0.2 0.2 2 2 0.5 0.4 0.7 ρ B E[K]=1 0.8 0.9 0.5 2 0.1 0.6 0.7 ρ 0.8 0.9 0.1 0.5 B 0.6 0.7 ρ 0.8 B E[K]=2 E[K]=1 0.9 0.5 0.6 0.7 ρ 0.8 0.9 B E[K]=2 Figure 11: Graphs showing the optimal value of with respect to mean response time over all jobs. 7 Extensions and Current Work This paper solves the problem of cycle stealing with switching costs and thresholds, presenting many insights into the characteristics and performance of cycle stealing. Our analytical approach easily generalizes to more complex cycle stealing models. For example, in this paper we have assumed that the donor immediately switches back when a donor job arrives. As we’ve stated earlier, we can also solve the more general case where the donor only switches back when donor jobs are waiting. Furthermore, we don’t even need to require that the donor switches back immediately at this point; we can allow the donor to first complete the beneficiary job in progress. Completing the beneficiary job obviates the need for checkpointing the job, however it sometimes reduces system performance, particularly when beneficiary jobs have higher variability than donor jobs. We can also handle the problem of one beneficiary and two 20 or more donors, where if donors are helping, the beneficiary sees an . queue. This extension also allows the different donors to have different switching thresholds and switching costs. References [1] S. Asmussen. Phase-type distributions and related point processes: Fitting and recent advances. In S. R. Chakravarthy and A. S. Alfa, editors, Matrix-Analytic Methods in Stochastic Models, pages 137–149. Marcel Dekker, 1997. [2] S. Borst, O. Boxma, and P. Jelenkovic. Reduced-load equivalence and induced burstiness in gps queues with long-tailed traffic flows. Under submission. [3] S. Borst, O. Boxma, and M. van Uitert. The asymptotic workload behavior of two coupled queues. Under submission. [4] J. Cohen and O. Boxma. Boundary Value Problems in Queueing System Analysis. North-Holland Publ. Cy., 1983. [5] G. Fayolle and R. Iasnogorodski. Two couple processors: the reduction to a riemann-hilbert problem. Zeitschrift fur Wahrscheinlichkeitstheorie und vervandte Gebiete, 47:325–351, 1979. [6] A. Konheim, I. Meilijson, and A. Melkman. Processor-sharing of two parallel lines. Journal of Applied Probability, 18:952–956, 1981. [7] G. Latouche and V. Ramaswami. Introduction to Matrix Analytic Methods in Stochastic Modeling. ASA-SIAM, Philadelphia, 1999. [8] M. F. Neuts. Matrix-Geometric Solutions in Stochastic Models. Johns Hopkins University Press, 1981. [9] T. Osogami and M. Harchol-Balter. Matching three moments of distributions and busy periods by -stage coxians. Technical Report CMU-CS-02-178, School of Computer Science, Carnegie Mellon University, September 2002. [10] H. Takagi. Queueing Analysis – A Foundation of Performance Evaluation, volume 1: Vacation and Priority Systems, Part 1. North Holland, 1991. 21 A Analysis of a variant: Donor switches back when unneeded λ 0B,1D B BD λ B B λ 0B,1D,ba λD λ B B B λD B D+ba λ B µ BD BD 2B,0D,ba µ B B D+ba B B 1B,0D,ba µ BD Kba λ 0B,0D,ba B 2B,0D µ Kba λ 4B,1D µ B 1B,0D µ Kba λD B λD B 3B,1D µ BD B 0B,0D Region 1 λ B 2B,1D µ λD λ Region 2 λ B 1B,1D µ λD λ B 1B,1D,ba λ B µ B B D+ba λB 2B,1D,ba µ B µ λ B λD Ksw 2B,0D,C Region 2 λ 3B,0D,sw B λD λ B 2µ B 3B,0D,C 4B,1D,ba B B D+ba B µ B µ B λD 2B,0D,sw 2µ 3B,1D,ba λD B λ B Ksw B D+ba λD B µ 4B,0D,sw B λD λ Ksw B 2µ B 4B,0D,C Figure 12: Markov chain of a variant. The donor server switches back when there is at most one beneficiary job regardless of the number of donor jobs. In this section, we consider a variant of our algorithm: the donor server switches back when there is beneficiary job, even if there are no donor jobs at the time. One might imagine that the performance of donor jobs would improve under this policy, especially when the arrival rate of donor jobs is high compared to that of beneficiary jobs, and the cost of switching back is relatively high. Figure 12 shows the Markov chain for the analysis of the variant in the case of . It is assumed that if the beneficiary server becomes idle while the donor server is still working on a beneficiary job, then the job in progress at the donor server is passed over to the beneficiary server without any cost. It is also assumed that ! , #" , and are exponentially distributed, all of which can be generalized by using Coxian distributions. The response time of the donor job is analyzed in the same way as before, except that Region 2 in the chain has been expanded. Having analyzed the above Markov chain, we find that counter to our intuition this variant does not significantly affect the response time for the donor or the beneficiary, under all conditions studied. 22 . B Incorporating a Donor threshold Throughout the paper we have assumed that the donor server switches back immediately when a new donor job arrives at an empty donor queue. However, this is not necessarily the best policy with respect to overall mean performance; given high switching costs, it might be better for the donor server to only switch back if there are at least donor jobs waiting. Furthermore, in Section 4 we found that the stability region for the beneficiary is an increasing function of , thereby giving further justification for raising . In this section, we show how to analyze a generalization of our algorithm, which assumes general . " and Figure 13 shows a Markov chain that corresponds to cycle stealing where λ Region A λ B 0B,1D 1B,1D µ λD λ B 2B,1D µ B λD BD λ λ B B λD BD λ B B 3B,1D µ . 4B,1D µ B BD B BD BD B Region B 0B,0D 1B,0D µ Region C1 B 2D+ba λ 0B,2D,ba B B 2D+ba µ 1B,2D,ba 0B,1D,sw Region C2 λ 0B,0D,sw 1B,1D,sw B Ksw λ 1B,0D,sw B µ Region D λD 2B,1D,sw Ksw λ 2B,0D,sw Ksw 1B,0D,C 2µ B µ 1B,1D,C B λD λ 3B,1D,sw B λ 4B,1D,sw B 3B,0D,sw λD Ksw 2B,0D,C 2µ B λD Ksw Figure 13: Markov chain for cycle stealing with and . 23 B Ksw 3B,0D,C 2µ B λD 3B,1D,C B " and Ksw Ksw 4B,0D,C B λD λ B 2µ B λ 4B,0D,sw B λD λ 2B,1D,C B µ B B B 2D+ba B µ λ B µ Ksw B 2µ λD B 2D+ba λD λ 4B,2D,ba B λD λ B B B µ B B µ B λD λ 3B,2D,ba λD λ B µ B λD µ B λ B λD B Ksw 0B,0D,C Region C3 2B,2D,ba B µ λ B µ λ B λD λD B λD λ B µ λ B 2D+ba µ B B λD λ 0B,1D,C λ B λD λD 2B,0D µ B 2µ 4B,1D,C B and exponentially-distributed B.1 Response time for donor jobs We first analyze the response time for donor jobs when " . Then, the results are extended to general . The states of the Markov chain are divided into four regions , further subdivided into three regions , and , , , and , where Region is , as shown in Figure 13. The time in queue for a donor arrival is analyzed by conditioning on the region it arrives into. That is, the expected time in queue is where + ! # " $ %&"('*) ) is the expected time in queue given that the arrival sees Region . When a donor job arrives in Region A, it sees a busy M/GI/1 queue, where the job sizes the arrival rate is , and . Recall that the expected time in queue for an M/GI/1 is " which is also written as -/. 1 0 243 5- . 6 0 2=3 87:9<; >7:9<; -5. 60 is the expected time in queue in an M/GI/1 given that the arrival finds the system 243 -5. 60 busy. Observing that , and the probability that the system is busy is 87:9<; , we have where + @? -5. 1 0 " When a donor job arrives in Region B, the waiting time is zero, since the donor server is staying at an empty donor queue. Thus, A CB To analyze the performance of a donor job, given that it arrives in Region C, first note that if we aggregate Regions with rate of size and into , and the duration of staying at Region and a setup time ED , the duration of staying at Region ED is exponentially distributed #" . Furthermore, Region Therefore, the fraction of time that the system spends in Region is the length of a busy period started by a job alternately; Regions A, B, and D are visited on the way from Region Region and Region to Region ED are visited ED (see Figure 14). given that the system is either in or in Region ED is the same as the fraction of time that the M/GI/1 queue with setup time 24 A B C1 C3 C2 D Figure 14: Transitions among regions of Markov chain in figure 13 is busy. The expected time in queue given that an arrival sees Region ( ED , respectively) equals the expected time in queue for an M/GI/1 queue with setup time, given that the arrival finds the system busy (idle, respectively). Therefore, the expected time in queue given that an arrival sees Region C is just the expected time in queue in a corresponding M/GI/1 queue with setup time : ! " " " When a donor job arrives in Region D, it waits for one more donor job to arrive, after which the donor server switches back. Thus, the expected time in queue of an arrival into Region D is A #" In summary, the time in queue for donor jobs is A -5. 6 0 ! + E ! ! The analysis is easily extended to general . The Markov chain describing the states of such a system can still be divided into four regions A, B, C, and D, but Region D is further divided into regions , ..., (Region C is still subdivided into three regions, as before). In Region A, the donor server is working on donor jobs. In Region B, the donor server is idle at the donor queue. In Region the donor server is in the process of switching back to the donor queue. In Region is in the process of switching to the beneficiary queue and there are the donor server is working on beneficiary jobs and there are 25 , , the donor server donor jobs. In Region , donor jobs. In Region , the donor server is either switching to the beneficiary queue or working on beneficiary jobs and there are donor jobs, for -* . A similar argument as above leads to the time in queue for donor jobs: -5. 6 0 ! + where the setup time is now #" ! . 26 E ! ! A C Superposition of the chains Figure 15 shows a superposition of the chains in Figure 1(b) and (c). b2 0B,0D,C b1 b3 2b2 3B,0D,C b3 2b1 2b3 b1 2b2 2B,0D,C b3 2b1 2b3 b1 1B,0D,C 2b3 b2 b2 β2 b3 0B,1D β2 β1 b2 b2 1B,1D b1 b3 b3 b3 β1 β2 b1 β1 β1 β1 b2 β3 0B,0D β2 b3 β3 b3 b3 β3 b2 b1 b3 β1 β3 β2 β3 β3 β3 β1 β1 β1 b2 b3 b3 b2 b1 b1 b3 β2 β1 β2 β1 β1 b2 3B,1D 1B,0D b3 β2 β3 β2 4B,1D b1 b3 b3 λB b1 β 2 β3 b2 b3 b1 β3 b2 b1 b3 β2 b2 b1 b3 2B,1D β3 β2 2b2 4B,0D,C b3 2b1 b1 b2 λD β3 b2 b1 b3 β2 2B,0D β1 Figure 15: Markov chain for cycle stealing without switching cost, where sented by 2-stage Coxians. 27 and β3 XB ΒD are both repre- D More graphs In this section we include a few more result graphs omitted from the body of the paper for lack of space. Figure 16 shows the gain/loss performance regions from Section 6.3 for the case where the beneficiaries now have mean size 10 and the donors have mean size 1. The performance of the beneficiaries is virtually identical to the case where and have mean 1. The donor performance is improved. The reason for the improvement in donor performance is that the donor visits the beneficiary less frequently, since the number of jobs queued at the beneficiary isn’t as high, for large jobs. Figure 17 shows additional optimal computations, for the case of large beneficiary jobs and the case of highly variable donor jobs. The case of large beneficiary jobs, results in lower optimal . Increasing the variability of donor jobs doesn’t seem to have any effect. Gain of Beneficiaries and Loss of Donors — 1 low gain low loss mid gain low loss 0.8 0.6 ρD ρ D 0.6 high gain low loss 0.2 0.6 ρB 0.8 GOOD 0.4 0.4 0.2 0.2 0 1 (a) 0.6 ρB 0.8 0 1 BAD 0.8 mid gain mid loss 0.6 ρD 0.8 0.4 0 1 low gain low loss 0.8 0.6 1 ρD 1 high gain mid loss high gain high loss 0.6 ρB 0.8 GOOD 0.4 0.2 0 1 0.6 ρB 0.8 1 ρB 0.8 1 (b) E[K]= 1 Gain of Beneficiaries and Loss of Donors — 1 low gain low loss mid gain low loss 0.8 0.6 ρD ρ D 0.6 high gain low loss 0.2 0.6 ρB 0.8 1 (a) GOOD 0.4 0.2 0.2 0.6 ρB 0.8 mid gain low loss 0 1 BAD 0.6 mid gain mid loss 0.4 0 0.8 ρD 0.8 0.4 0 1 low gain low loss 0.8 0.6 1 ρD 1 GOOD 0.4 high gain mid loss 0.6 ρB 0.8 0.2 1 0 0.6 (b) E[K]=1 Figure 16: Columns 1 and 3 show the gain of beneficiary jobs and loss of donor jobs relative to dedicated difference. Low gain/loss refers to a difference. servers. High gain/loss refers to a Columns 2 and 4 show the mean performance (over all jobs) relative to dedicated servers. In all figures, and are exponentially-distributed, where has mean 10 and has mean 1. 28 Selecting the optimal 142 110 8 0.9 4 142 1 10 8 6 0.7 0.5 0.5 8 6 14120 1 8 0.8 0.7 4 0.6 4 1412 10 8 0.5 6 0.4 0.4 0.3 0.3 0.3 0.2 0.4 4 0.3 2 0.2 0.2 4 2 2 0.1 0.1 2 0.5 0.6 0.7 ρ B E[K]=1 0.8 0.9 0.5 0.6 0.9 4 0.4 0.2 : mean 1 and 0.6 D ρ 0.7 4 , 8 6 0.8 ρ D 0.6 0.5 1142 10 0.9 6 0.8 0.6 D 4 6 0.7 ρ 6 6 , D 14 1012 8 0.9 12 10 0.8 8 ρ 0.7 ρ 0.8 0.1 2 0.9 0.5 B 0.6 0.7 ρ 0.8 B E[K]=2 E[K]=1 2 0.1 0.9 0.5 0.6 0.7 ρ 0.8 0.9 B E[K]=2 Figure 17: Graphs showing the optimal value of with respect to mean response time over all jobs. 29