Replication in Computer Systems

Homework assignment

The goal of this homework assignment is i) to write a code in R that simulates the dynamics of redundant jobs in a multiserver platform and ii) to use the code to answer some practical questions. We consider a system of \(K\) parallel servers (or queues) adopting the first-come first-served discipline. When a new job arrives, it is replicated to \(d\) queues selected uniformly at random without replacement and independently of all else. We assume that the service times of all replicas of all jobs are i.i.d. random variables having the distribution of the random variable \(S\) (specified later). We assume that the jobs’ interarrival times are independent and have an exponential distribution with rate \(\lambda\), i.e., jobs join the system following a Poisson process with rate \(\lambda\).

Depending on the assumptions on \(S\), we are interested in understanding which one of the following two scenarios is the most convenient:

Scenario 1: the remaining copies of each job are canceled as soon as any copy starts service at some server;

Scenario 2: the remaining copies of each job are canceled as soon as any copy completes service at some server.

Step 1: Simulation of redundant jobs

Modify the code that simulates the dynamics of a G/G/1 queue to simulate \(K\) G/G/1 queues as described above.

System parameters

N: number of arriving jobs to simulate (excluding replicas);
d: number of replicas per job;
K: number of queues;
lambda: job arrival rate;
mu: in the case where \(S\) follows an exponential distribution, \(\mu=1/\mathbb{E}[S]\) is the job service rate.
xm,alpha: in the case where \(S\) follows a Pareto distribution, xm resp. alpha is the scale resp. shape parameter. Since common values of alpha found in the empirical studies of computer systems range in the interval \([1,2]\), we assume alpha=1.5.

Using the inversion method, note that

xm=1;
alpha=1.5;
xm/(runif(5)^(1/alpha));

## [1] 1.047215 1.969547 3.129194 1.748818 1.406158

gives a sample associated to 5 i.i.d. Pareto distributed random variables.

State variables

t: simulation time;
Arrival: size-\(N\) vector of jobs’ arrival times;
Service: size-\(N\times d\) matrix of jobs’ service times;
Queue: size-\(N\times d\) matrix indicating where the \(i\)-th copy of the \(n\)-th job is dispatched;
Remaining: size-\(N\times d\) matrix indicating the remaining time of the \(i\)-th copy of the \(n\)-th job;
Completion: size-\(N\) vector indicating the job’s completion time;
CurrJob: size-\(K\) vector indicating the current job in processing at each server, i.e., CurrJob[i] takes value in \(\{1,\ldots,N\}\);
CurrReplica: size-\(K\) vector indicating the current replica in processing at each server, i.e., CurrReplica[i] takes value in \(\{1,\ldots,d\}\);
NextJob: this variable is incremented by one each time a new job arrives (not necessary but simplifies the code).

Do not hesitate to introduce other variables if needed.

Code structure (for both scenarios 1 and 2)

while (TRUE) {
    dtA = ... # time of the next arrival
    dtC = ... # time of the next completion
    if(is.na(dtA) & is.na(dtC)) {break;}
    dt = min(dtA,dtC)

    if((NextArrival <=N) & (Arrival[NextArrival] == t)) 
    {
      # update Remaining and, possibly, and other state variables
    }
    
    for(k in 1:K)
    {
      if(!is.na(CurrJob[k]))
      {
        # update Remaining and, possibly, CurrJob to NA and other state variables
      }
    }
    
    for(k in 1:K)
    {
      if(is.na(CurrJob[k]))
      {
        # assign a job and a copy to server k
        # update state variables
      }
    }

}

Note for the evaluation: The more efficient the code, the better. Efficiency refers to the ability of simulating an increased number of jobs given a time constraint.

Step 2: Numerical investigation

We call response time the average time that jobs spend in the network. Answer the following items distinguishing between the cases where \(S\) follows an exponential or Pareto distribution.

Question 1

Assume that d=1 and that \(S\) has an exponential distribution. Give a formula for the response time.

Question 2

Assume that N=1e4 or higher, d=4, K=10 and \(\mathbb{E}[S]=1\). Then,

Plot the response time as a function of lambda, where lambda varies between 0 and \(0.95 K /\mathbb{E}[S]\), and verify that it corresponds to the formula given in previous question when d=1 and \(S\) has an exponential distribution.
Is N large enough to make the previous plot representative of the mean time that jobs spend in the network?

Question 3

Assume that N=1e4 or higher, K=10, lambda=0.7*K and \(\mathbb{E}[S]=1\). Use your code to investigate under which values of the replication parameter d:

the system is stable, i.e., the response time does not diverge to infinity as \(N\to\infty\).
Scenario 1 is better than Scenario 2 (and viceversa).

Explain your numerical investigations providing plots and intuititions that support your conclusions.

Replication in Computer Systems - DM

Jonatha Anselmi

February 4, 2020

Introduction