Web Application - Jackson Queueing Network Évaluation de Performances 2024, TD #2, INFO4, Polytech Grenoble

TD: “Performance Modelling, Evaluation and Optimization of a Web Application”, Évaluation de Performance 2024, Polytech Grenoble.

System Description

A web application is hosted on a computer system composed of a front-end web-server (F) and \(M\) identical parallel back-end application servers, say \(B_1,\ldots,B_M\). The incoming requests, or jobs, join and leave the system only through F, and they can access the application servers only after visiting F. After completion at F, a job may visit one application server and in this case it is chosen uniformly at random. All servers operate under the First-Come First-Served scheduling discipline.

After a measurement study, it has been observed that:

the probability that a job leaves the system after service at F is \(p_{F,0}=0.2\)
15 jobs/second join the system in average
the mean service time at F is 10 milliseconds
the mean service time at any backend server Bi is 50 milliseconds

The objective is to identify a model and to evaluate and optimize the performance of the system dscribed above. In particular, we are interested in the minimal number of back-end servers \(M\) that keep the mean delay below \(D_{\max}=500\) milliseconds.

Jackson Queueing Network Modelling

We model the dynamics of jobs in the system by means of a Jackson queueing network with \(M+1\) queues. Towards this purpose, we assume that

jobs arrive through a Poisson process with rate \(\lambda=0.015\) jobs/ms
the sequence of service times at F are exponentially distributed independent randon variables with rate \(\mu_F=0.1\) job/ms
the sequence of service times at \(B_i\) are exponentially distributed independent randon variables with rate \(\mu=0.02\) job/ms, for all \(i=1,\ldots,M\)

Consistently with the system description, for the routing probabilities we assume that \(p_{F,B_i}=0.8/M\) and \(p_{B_i,F}=1\), for all \(i=1,\ldots,M\). For the external arrival probabilities, \(p_{0,F}=1\), and for the exit probabilities, \(p_{F,0}=0.2\).

This modelling identifies a Jackson queueing network.

Steady-state probabilities and Stability

Letting \(n=(n_F,n_{B_1},\ldots,n_{B_M})\) denote the number of jobs in all servers, we are interested in the steady-state probability that servers contains \(n\) jobs.

By applying Jackson’s theorem we have \[ \pi(n) = \left(\frac{\lambda_{F}}{\mu_F}\right)^{n_{M+1}} \left(1-\frac{\lambda_{F}}{\mu_F}\right) \,\prod_{i=1}^{M} \left(\frac{\lambda_{B_i}}{\mu}\right)^{n_i} \left(1-\frac{\lambda_{B_i}}{\mu}\right) \] where \(\lambda_{F}\) and \(\lambda_{B_i}\) are given by the unique solution of the traffic equations: \(\lambda_{F} = \lambda + \sum_{i=1}^M \lambda_{B_i}\) and \(\lambda_{B_i} = \lambda_F \,\,p_{F,B_i}\). This gives \[\lambda_F = \frac{\lambda}{p_{F,0}} = 0.075\] and \[\lambda_{B_i} = \frac{1-p_{F,0}}{p_{F,0}} \, \frac{\lambda}{M} = \frac{0.06}{M}.\]

From this we conclude that stability is obtained if and only if \(0.06<M\mu\), that is if \(M>3\).

Performance Evaluation and Optimization

We are interested in the mean response time (or delay), say \(R\), that is the overall time that a job spends in the system. Since for the average number of visits per job on servers \(F\) resp. \(B_i\) is \(v_F=\frac{\lambda_F}{\lambda}\) resp. \(v_{B_i}=\frac{\lambda_{B_i}}{\lambda}\), we obtain \[ R = v_F R_F + \sum_{i=1}^M v_{B_i} R_{B_i} \] where \(R_{F}\) and \(R_{B_i}\) are response time of servers \(F\) and \(B_i\), respectively. By applying Little’s law, \(R_{F} = Q_{F}\lambda_{F}\) and \(R_{B_i} = Q_{B_i}\lambda_{B_i}\) where \(Q_{F}\) and \(Q_{B_i}\) are the mean number of jobs in servers \(F\) and \(B_i\). Using \(\pi(n)\), we have \[ Q_F = \frac{\lambda_F/\mu_F}{1-\lambda_F/\mu_F},\qquad Q_{B_i} = \frac{\lambda_{B_i}/\mu}{1-\lambda_{B_i}/\mu} \] and after some algebra \[\begin{align} R & = \frac{R_F}{p_{F,0}} + \sum_{i=1}^M \frac{1-p_{F,0}}{p_{F,0}} \, \frac{1}{M} R_{B_i}\\ % & = \frac{1}{p_{F,0}}\frac{1}{\mu_F-0.075} + \frac{1-p_{F,0}}{p_{F,0}} \, \frac{1}{\mu-\frac{0.06}{M}}\\ % & = 200 + \frac{4}{0.02-\frac{0.06}{M}}. \end{align}\] Note that we may have found \(R\) by using Little’s law differently, that is, using that \(R\) satisfies \(\lambda R= Q_{F} + \sum_{i} Q_{B_i}\).

Therefore, to ensure that \(R\le D_{\max}\), we need to find the smallest integer \(M\) such that \(200 + \frac{4}{0.02-\frac{0.06}{M}}\le D_{\max}\). This gives \[\begin{align} M \ge \frac{3}{1 - \frac{1}{\frac{D}{200}-1 } } \end{align}\] and for \(D=500\) we obtain \(M=9\).

Finally, what if \(D_{\max}=300\)ms?

Simulation of the underlying Markov chain

In order to see the transient dynamics of jobs inside the network, we want to simulate the Markov chain underlying the Jackson queueing network above.

Simulation is especially useful when the underlying Markov chain is not “tractable”. This is not the case here but it would be if, for instance, the distribution of service times would be general.

knitr::opts_chunk$set(echo = TRUE)

set.seed(5);  # set seed for reproducibility
N=1e6;        # number of events to simulate;

M=9;          # number of back-end servers

# time units in milliseconds
lambda=0.015; # arrival rate
mu_F=0.1;     # front-end server (F) service rate 
mu=0.02;      # back-end servers (Bi) service rate

pF0=0.2;      # Probability a job leaves the network after visiting F

n = rep(M+1, x=0); # state: n(1) is the number of jobs in F and for i>1, n(i+1) is the number of jobs in Bi

outrate=lambda+mu_F+M*mu;

Q = rep(M+1, x=0); # average queue lengths

ResponseTime=rep(N, x=0);

for (i in 1:N) {

  event=runif(1)*outrate;
  
  if (event<lambda){
    # an arrival occurred, necessarily to F
    n[1]=n[1]+1;
  }

  if (lambda<event & event <lambda+mu_F){
    # a potential departure from F occurred
    if (n[1]>0){
        n[1]=n[1]-1;
      
        if (runif(1)>pF0){
          # the job joins a random backend server
          j=sample(1:M,1);
          n[j+1]=n[j+1]+1;
        } # otherwise it leaves the network and nothing has to be done
    }
  }

  tmp=lambda+mu_F;
  for (j in 1:M){
    if (tmp<event & event<tmp+mu) {
      # a potential departure from backend server Bj occurred
      if (n[j+1]>0){
        n[j+1]=n[j+1]-1;
        n[1]=n[1]+1;
      }
    }
    tmp=tmp+mu;
  }

  # update average queue lenghts
  Q=Q+n;
  ResponseTime[i] = sum(Q)/(i*lambda); # Little's law

}

# average queue lengths
Q=Q/N;
ResponseTime_theory=0;
if (mu-lambda*(1-pF0)/(pF0*M)>0) {
  ResponseTime_theory = (1/pF0)/(mu_F-lambda/pF0) + ((1-pF0)/pF0)/(mu-lambda*(1-pF0)/(pF0*M));
} else {
  print("The system is not stable: Response Time -> +oo")  
}

# Print the Response Times (simulation and theory)
events=1:N;
plot(events, ResponseTime, type = "l", lty = 1)
abline(h = ResponseTime_theory, col="red")

print(tail(ResponseTime,1))

## [1] 498.5621

print(ResponseTime_theory)

## [1] 500

Closing the queueing network

Now, we consider the same setting as above but we “close” the network. That is, upon leaving the (open) network above, a job enters an auxiliary multiserver queue \(-/M/K/K\) queue, and after receiving processing, it rejoins the frontend server. Therefore, no job arrivals/departures occur in/from the network. To parameterize the network, it remains to specify the service rate of the multiserver queue (let’s denote it by \(\lambda\) for each server) and the overall number of jobs that circulate in the network. We assume that the latter is \(N\) so that jobs never actually wait in the additional queue. In fact, this queue is meant to model the think time of a job (interpreted as a customer), which is not affected by the interaction with the other jobs.

Objective

Let us assume that the think time (i.e., \(1/\lambda\)) is 5 seconds (the time to read the answer of the webapp!) and let’s keep \(M=9\), which in the open network ensured a maximum mean delay of \(500\)ms when the external arrival rate was 15 job/sec. Then, we are interested in the following question:

How many jobs induce a throughput of 15 jobs/sec? We define the throughput as the rate at which jobs leave the multiserver queue.

We may answer this question by numerically solving the global balance equations associated to the underlying Markov chain, or by running efficient solution algorithms such Mean Value Analysis (MVA) or the Convolution Algorithm. However, the goal here is to use simulation.

Simulation of the closed network

Let’s adapt the code for the open network above. But this time let’s use a slightly different simulation concept. Instead of using a uniformization constant as above, we adapt outrate as a function of the current state.

knitr::opts_chunk$set(echo = TRUE)

set.seed(4);  # Set seed for reproducibility

# K=50;         # number of jobs circulating in the network

compute_throughput <- function(K) {
    
    N=1e5;        # number of events to simulate;
    M=9;          # number of back-end servers
    
    # time units in milliseconds
    lambda=1/(5*1000);   # service rate of each server of the auxiliary new queue (1/seconds)
                  # or, equivalently, the inverse of the mean think time
    mu_F=0.1;     # front-end server (F) service rate 
    mu=0.02;      # back-end servers (Bi) service rate
    pF0=0.2;      # Probability that a job leaves the network after visiting F
    n = rep(M+2, x=0); # state: n(1) is the number of jobs in F and for i>1, n(i+1) is the number of jobs in Bi
    n[M+2]=K;     # set an initial condition
    
    Q = n; # average queue lengths
    
    ResponseTime=rep(N, x=0);

    sim_time=0;    
    for (i in 1:N) {

      outrate=lambda*n[M+2];
      if (n[1]>0) {outrate=outrate+mu_F;}
      for (j in 1:M) { if (n[1+j]>0) {outrate=outrate+mu;}}

      event=runif(1)*outrate;
      
      # a potential departure from the multiserver queue occurred 
      if (event<lambda*n[M+2]) {
        n[1]=n[1]+1;
        n[M+2]=n[M+2]-1;
      }
    
      tmp=lambda*n[M+2];
      if (n[1]>0) {
          tmp=tmp+mu_F;
          if (lambda*n[M+2]<event & event <lambda*n[M+2]+mu_F) {
                n[1]=n[1]-1;
                if (runif(1)>pF0) {
                  # the job joins a random backend server
                  j=sample(1:M,1);
                  n[j+1]=n[j+1]+1;
                } else {
                    n[M+2]=n[M+2]+1;
                } # otherwise it leaves the web app and think
          }
      }
    
      n_app_servers=n[-1];
      n_app_servers=n_app_servers[-length(n_app_servers)];
      busy_app_servers=(1:M)[n_app_servers>0];
      for (j in busy_app_servers){
        if (tmp<event & event<tmp+mu) {
            # a departure from backend server Bj occurred
            n[j+1]=n[j+1]-1;
            n[1]=n[1]+1;
        }
        tmp=tmp+mu;
      }
      
      # update time average queue lenghts
      time_in_state_n=rexp(1,outrate);
      Q=Q+n*time_in_state_n;
      sim_time=sim_time+time_in_state_n;

    }

    # average queue lengths
    Q=Q/sim_time;
    txt=paste("Avg queue lengths (K=",K,"): ", sep="");
    cat(paste(c(txt, round(Q,digits=2)),collapse=" "));
    cat('\n');
    cat(paste("Throughput (K=",K,"): ", round(Q[M+2]*lambda*1000, digits=3),"jobs/sec", sep=""),'\n');
    # return the throughput
    return( Q[M+2]*lambda );
}

K_values=(1:20)*10;
throughput=c();
for (K in K_values) {
    throughput=c(throughput,compute_throughput(K));
}

## Avg queue lengths (K=10):  0.87 0.03 0.03 0.02 0.03 0.03 0.02 0.03 0.03 0.03 8.9
## Throughput (K=10): 1.78jobs/sec 
## Avg queue lengths (K=20):  0.88 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 18.58
## Throughput (K=20): 3.715jobs/sec 
## Avg queue lengths (K=30):  0.92 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 28.1
## Throughput (K=30): 5.62jobs/sec 
## Avg queue lengths (K=40):  1.01 0.16 0.17 0.18 0.17 0.17 0.17 0.17 0.17 0.18 37.44
## Throughput (K=40): 7.489jobs/sec 
## Avg queue lengths (K=50):  1.19 0.23 0.24 0.24 0.24 0.24 0.23 0.24 0.24 0.24 46.67
## Throughput (K=50): 9.334jobs/sec 
## Avg queue lengths (K=60):  1.49 0.33 0.32 0.32 0.33 0.3 0.32 0.31 0.32 0.33 55.63
## Throughput (K=60): 11.127jobs/sec 
## Avg queue lengths (K=70):  1.91 0.38 0.38 0.38 0.38 0.38 0.37 0.37 0.37 0.39 64.67
## Throughput (K=70): 12.933jobs/sec 
## Avg queue lengths (K=80):  2.76 0.47 0.44 0.46 0.49 0.49 0.45 0.47 0.49 0.49 72.99
## Throughput (K=80): 14.598jobs/sec 
## Avg queue lengths (K=90):  3.79 0.55 0.57 0.58 0.54 0.58 0.59 0.58 0.56 0.6 81.08
## Throughput (K=90): 16.216jobs/sec 
## Avg queue lengths (K=100):  5.79 0.63 0.61 0.63 0.6 0.68 0.59 0.64 0.6 0.63 88.6
## Throughput (K=100): 17.719jobs/sec 
## Avg queue lengths (K=110):  10.57 0.75 0.76 0.76 0.74 0.77 0.8 0.72 0.74 0.72 92.67
## Throughput (K=110): 18.533jobs/sec 
## Avg queue lengths (K=120):  16.08 0.78 0.76 0.77 0.7 0.78 0.76 0.8 0.73 0.84 96.99
## Throughput (K=120): 19.399jobs/sec 
## Avg queue lengths (K=130):  24.31 0.79 0.76 0.77 0.75 0.82 0.8 0.79 0.8 0.79 98.62
## Throughput (K=130): 19.724jobs/sec 
## Avg queue lengths (K=140):  32.99 0.79 0.79 0.76 0.82 0.77 0.77 0.81 0.77 0.74 100
## Throughput (K=140): 20.001jobs/sec 
## Avg queue lengths (K=150):  39.16 0.78 0.76 0.75 0.8 0.87 0.85 0.75 0.77 0.82 103.69
## Throughput (K=150): 20.738jobs/sec 
## Avg queue lengths (K=160):  53.15 0.77 0.86 0.79 0.73 0.76 0.83 0.77 0.85 0.8 99.69
## Throughput (K=160): 19.939jobs/sec 
## Avg queue lengths (K=170):  62.08 0.82 0.79 0.82 0.82 0.73 0.79 0.76 0.76 0.79 100.84
## Throughput (K=170): 20.169jobs/sec 
## Avg queue lengths (K=180):  69.5 0.84 0.79 0.75 0.74 0.82 0.91 0.79 0.79 0.83 103.23
## Throughput (K=180): 20.646jobs/sec 
## Avg queue lengths (K=190):  81.79 0.82 0.78 0.79 0.8 0.79 0.84 0.81 0.82 0.79 100.98
## Throughput (K=190): 20.196jobs/sec 
## Avg queue lengths (K=200):  91.56 0.82 0.79 0.79 0.84 0.8 0.81 0.84 0.86 0.78 101.12
## Throughput (K=200): 20.223jobs/sec

plot(K_values, throughput*1000,
        type = "b", pch = 19, 
        xlab="Number of jobs (K)", 
        ylab="Throughput (jobs/sec)", 
        lwd=3,      # Line width
        lty = 1)
abline(h = 15, col="red")

Web Application - Jackson Queueing Network
Évaluation de Performances 2024, TD #2, INFO4, Polytech Grenoble

Jonatha Anselmi

March 26, 2024

System Description

Jackson Queueing Network Modelling

Steady-state probabilities and Stability

Performance Evaluation and Optimization

Simulation of the underlying Markov chain

Closing the queueing network

Objective

Simulation of the closed network

Web Application - Jackson Queueing NetworkÉvaluation de Performances 2024, TD #2, INFO4, Polytech Grenoble

Jonatha Anselmi

March 26, 2024

System Description

Jackson Queueing Network Modelling

Steady-state probabilities and Stability

Performance Evaluation and Optimization

Simulation of the underlying Markov chain

Closing the queueing network

Objective

Simulation of the closed network

Web Application - Jackson Queueing Network
Évaluation de Performances 2024, TD #2, INFO4, Polytech Grenoble