TD: “Performance Modelling, Evaluation and Optimization of a Web Application”, Évaluation de Performance 2024, Polytech Grenoble.
A web application is hosted on a computer system composed of a front-end web-server (F) and \(M\) identical parallel back-end application servers, say \(B_1,\ldots,B_M\). The incoming requests, or jobs, join and leave the system only through F, and they can access the application servers only after visiting F. After completion at F, a job may visit one application server and in this case it is chosen uniformly at random. All servers operate under the First-Come First-Served scheduling discipline.
After a measurement study, it has been observed that:
The objective is to identify a model and to evaluate and optimize the performance of the system dscribed above. In particular, we are interested in the minimal number of back-end servers \(M\) that keep the mean delay below \(D_{\max}=500\) milliseconds.
Consistently with the system description, for the routing probabilities we assume that \(p_{F,B_i}=0.8/M\) and \(p_{B_i,F}=1\), for all \(i=1,\ldots,M\). For the external arrival probabilities, \(p_{0,F}=1\), and for the exit probabilities, \(p_{F,0}=0.2\).
This modelling identifies a Jackson queueing network.
Letting \(n=(n_F,n_{B_1},\ldots,n_{B_M})\) denote the number of jobs in all servers, we are interested in the steady-state probability that servers contains \(n\) jobs.
By applying Jackson’s theorem we have \[ \pi(n) = \left(\frac{\lambda_{F}}{\mu_F}\right)^{n_{M+1}} \left(1-\frac{\lambda_{F}}{\mu_F}\right) \,\prod_{i=1}^{M} \left(\frac{\lambda_{B_i}}{\mu}\right)^{n_i} \left(1-\frac{\lambda_{B_i}}{\mu}\right) \] where \(\lambda_{F}\) and \(\lambda_{B_i}\) are given by the unique solution of the traffic equations: \(\lambda_{F} = \lambda + \sum_{i=1}^M \lambda_{B_i}\) and \(\lambda_{B_i} = \lambda_F \,\,p_{F,B_i}\). This gives \[\lambda_F = \frac{\lambda}{p_{F,0}} = 0.075\] and \[\lambda_{B_i} = \frac{1-p_{F,0}}{p_{F,0}} \, \frac{\lambda}{M} = \frac{0.06}{M}.\]
From this we conclude that stability is obtained if and only if \(0.06<M\mu\), that is if \(M>3\).
We are interested in the mean response time (or delay), say \(R\), that is the overall time that a job spends in the system. Since for the average number of visits per job on servers \(F\) resp. \(B_i\) is \(v_F=\frac{\lambda_F}{\lambda}\) resp. \(v_{B_i}=\frac{\lambda_{B_i}}{\lambda}\), we obtain \[ R = v_F R_F + \sum_{i=1}^M v_{B_i} R_{B_i} \] where \(R_{F}\) and \(R_{B_i}\) are response time of servers \(F\) and \(B_i\), respectively. By applying Little’s law, \(R_{F} = Q_{F}\lambda_{F}\) and \(R_{B_i} = Q_{B_i}\lambda_{B_i}\) where \(Q_{F}\) and \(Q_{B_i}\) are the mean number of jobs in servers \(F\) and \(B_i\). Using \(\pi(n)\), we have \[ Q_F = \frac{\lambda_F/\mu_F}{1-\lambda_F/\mu_F},\qquad Q_{B_i} = \frac{\lambda_{B_i}/\mu}{1-\lambda_{B_i}/\mu} \] and after some algebra \[\begin{align} R & = \frac{R_F}{p_{F,0}} + \sum_{i=1}^M \frac{1-p_{F,0}}{p_{F,0}} \, \frac{1}{M} R_{B_i}\\ % & = \frac{1}{p_{F,0}}\frac{1}{\mu_F-0.075} + \frac{1-p_{F,0}}{p_{F,0}} \, \frac{1}{\mu-\frac{0.06}{M}}\\ % & = 200 + \frac{4}{0.02-\frac{0.06}{M}}. \end{align}\] Note that we may have found \(R\) by using Little’s law differently, that is, using that \(R\) satisfies \(\lambda R= Q_{F} + \sum_{i} Q_{B_i}\).
Therefore, to ensure that \(R\le D_{\max}\), we need to find the smallest integer \(M\) such that \(200 + \frac{4}{0.02-\frac{0.06}{M}}\le D_{\max}\). This gives \[\begin{align} M \ge \frac{3}{1 - \frac{1}{\frac{D}{200}-1 } } \end{align}\] and for \(D=500\) we obtain \(M=9\).
Finally, what if \(D_{\max}=300\)ms?
In order to see the transient dynamics of jobs inside the network, we want to simulate the Markov chain underlying the Jackson queueing network above.
Simulation is especially useful when the underlying Markov chain is not “tractable”. This is not the case here but it would be if, for instance, the distribution of service times would be general.
knitr::opts_chunk$set(echo = TRUE)
set.seed(5); # set seed for reproducibility
N=1e6; # number of events to simulate;
M=9; # number of back-end servers
# time units in milliseconds
lambda=0.015; # arrival rate
mu_F=0.1; # front-end server (F) service rate
mu=0.02; # back-end servers (Bi) service rate
pF0=0.2; # Probability a job leaves the network after visiting F
n = rep(M+1, x=0); # state: n(1) is the number of jobs in F and for i>1, n(i+1) is the number of jobs in Bi
outrate=lambda+mu_F+M*mu;
Q = rep(M+1, x=0); # average queue lengths
ResponseTime=rep(N, x=0);
for (i in 1:N) {
event=runif(1)*outrate;
if (event<lambda){
# an arrival occurred, necessarily to F
n[1]=n[1]+1;
}
if (lambda<event & event <lambda+mu_F){
# a potential departure from F occurred
if (n[1]>0){
n[1]=n[1]-1;
if (runif(1)>pF0){
# the job joins a random backend server
j=sample(1:M,1);
n[j+1]=n[j+1]+1;
} # otherwise it leaves the network and nothing has to be done
}
}
tmp=lambda+mu_F;
for (j in 1:M){
if (tmp<event & event<tmp+mu) {
# a potential departure from backend server Bj occurred
if (n[j+1]>0){
n[j+1]=n[j+1]-1;
n[1]=n[1]+1;
}
}
tmp=tmp+mu;
}
# update average queue lenghts
Q=Q+n;
ResponseTime[i] = sum(Q)/(i*lambda); # Little's law
}
# average queue lengths
Q=Q/N;
ResponseTime_theory=0;
if (mu-lambda*(1-pF0)/(pF0*M)>0) {
ResponseTime_theory = (1/pF0)/(mu_F-lambda/pF0) + ((1-pF0)/pF0)/(mu-lambda*(1-pF0)/(pF0*M));
} else {
print("The system is not stable: Response Time -> +oo")
}
# Print the Response Times (simulation and theory)
events=1:N;
plot(events, ResponseTime, type = "l", lty = 1)
abline(h = ResponseTime_theory, col="red")
print(tail(ResponseTime,1))
## [1] 498.5621
print(ResponseTime_theory)
## [1] 500
Now, we consider the same setting as above but we “close” the network. That is, upon leaving the (open) network above, a job enters an auxiliary multiserver queue \(-/M/K/K\) queue, and after receiving processing, it rejoins the frontend server. Therefore, no job arrivals/departures occur in/from the network. To parameterize the network, it remains to specify the service rate of the multiserver queue (let’s denote it by \(\lambda\) for each server) and the overall number of jobs that circulate in the network. We assume that the latter is \(N\) so that jobs never actually wait in the additional queue. In fact, this queue is meant to model the think time of a job (interpreted as a customer), which is not affected by the interaction with the other jobs.
Let us assume that the think time (i.e., \(1/\lambda\)) is 5 seconds (the time to read the answer of the webapp!) and let’s keep \(M=9\), which in the open network ensured a maximum mean delay of \(500\)ms when the external arrival rate was 15 job/sec. Then, we are interested in the following question:
How many jobs induce a throughput of 15 jobs/sec? We define the throughput as the rate at which jobs leave the multiserver queue.
We may answer this question by numerically solving the global balance equations associated to the underlying Markov chain, or by running efficient solution algorithms such Mean Value Analysis (MVA) or the Convolution Algorithm. However, the goal here is to use simulation.
Let’s adapt the code for the open network above. But this time let’s
use a slightly different simulation concept. Instead of using a
uniformization constant as above, we adapt outrate as a
function of the current state.
knitr::opts_chunk$set(echo = TRUE)
set.seed(4); # Set seed for reproducibility
# K=50; # number of jobs circulating in the network
compute_throughput <- function(K) {
N=1e5; # number of events to simulate;
M=9; # number of back-end servers
# time units in milliseconds
lambda=1/(5*1000); # service rate of each server of the auxiliary new queue (1/seconds)
# or, equivalently, the inverse of the mean think time
mu_F=0.1; # front-end server (F) service rate
mu=0.02; # back-end servers (Bi) service rate
pF0=0.2; # Probability that a job leaves the network after visiting F
n = rep(M+2, x=0); # state: n(1) is the number of jobs in F and for i>1, n(i+1) is the number of jobs in Bi
n[M+2]=K; # set an initial condition
Q = n; # average queue lengths
ResponseTime=rep(N, x=0);
sim_time=0;
for (i in 1:N) {
outrate=lambda*n[M+2];
if (n[1]>0) {outrate=outrate+mu_F;}
for (j in 1:M) { if (n[1+j]>0) {outrate=outrate+mu;}}
event=runif(1)*outrate;
# a potential departure from the multiserver queue occurred
if (event<lambda*n[M+2]) {
n[1]=n[1]+1;
n[M+2]=n[M+2]-1;
}
tmp=lambda*n[M+2];
if (n[1]>0) {
tmp=tmp+mu_F;
if (lambda*n[M+2]<event & event <lambda*n[M+2]+mu_F) {
n[1]=n[1]-1;
if (runif(1)>pF0) {
# the job joins a random backend server
j=sample(1:M,1);
n[j+1]=n[j+1]+1;
} else {
n[M+2]=n[M+2]+1;
} # otherwise it leaves the web app and think
}
}
n_app_servers=n[-1];
n_app_servers=n_app_servers[-length(n_app_servers)];
busy_app_servers=(1:M)[n_app_servers>0];
for (j in busy_app_servers){
if (tmp<event & event<tmp+mu) {
# a departure from backend server Bj occurred
n[j+1]=n[j+1]-1;
n[1]=n[1]+1;
}
tmp=tmp+mu;
}
# update time average queue lenghts
time_in_state_n=rexp(1,outrate);
Q=Q+n*time_in_state_n;
sim_time=sim_time+time_in_state_n;
}
# average queue lengths
Q=Q/sim_time;
txt=paste("Avg queue lengths (K=",K,"): ", sep="");
cat(paste(c(txt, round(Q,digits=2)),collapse=" "));
cat('\n');
cat(paste("Throughput (K=",K,"): ", round(Q[M+2]*lambda*1000, digits=3),"jobs/sec", sep=""),'\n');
# return the throughput
return( Q[M+2]*lambda );
}
K_values=(1:20)*10;
throughput=c();
for (K in K_values) {
throughput=c(throughput,compute_throughput(K));
}
## Avg queue lengths (K=10): 0.87 0.03 0.03 0.02 0.03 0.03 0.02 0.03 0.03 0.03 8.9
## Throughput (K=10): 1.78jobs/sec
## Avg queue lengths (K=20): 0.88 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 18.58
## Throughput (K=20): 3.715jobs/sec
## Avg queue lengths (K=30): 0.92 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 28.1
## Throughput (K=30): 5.62jobs/sec
## Avg queue lengths (K=40): 1.01 0.16 0.17 0.18 0.17 0.17 0.17 0.17 0.17 0.18 37.44
## Throughput (K=40): 7.489jobs/sec
## Avg queue lengths (K=50): 1.19 0.23 0.24 0.24 0.24 0.24 0.23 0.24 0.24 0.24 46.67
## Throughput (K=50): 9.334jobs/sec
## Avg queue lengths (K=60): 1.49 0.33 0.32 0.32 0.33 0.3 0.32 0.31 0.32 0.33 55.63
## Throughput (K=60): 11.127jobs/sec
## Avg queue lengths (K=70): 1.91 0.38 0.38 0.38 0.38 0.38 0.37 0.37 0.37 0.39 64.67
## Throughput (K=70): 12.933jobs/sec
## Avg queue lengths (K=80): 2.76 0.47 0.44 0.46 0.49 0.49 0.45 0.47 0.49 0.49 72.99
## Throughput (K=80): 14.598jobs/sec
## Avg queue lengths (K=90): 3.79 0.55 0.57 0.58 0.54 0.58 0.59 0.58 0.56 0.6 81.08
## Throughput (K=90): 16.216jobs/sec
## Avg queue lengths (K=100): 5.79 0.63 0.61 0.63 0.6 0.68 0.59 0.64 0.6 0.63 88.6
## Throughput (K=100): 17.719jobs/sec
## Avg queue lengths (K=110): 10.57 0.75 0.76 0.76 0.74 0.77 0.8 0.72 0.74 0.72 92.67
## Throughput (K=110): 18.533jobs/sec
## Avg queue lengths (K=120): 16.08 0.78 0.76 0.77 0.7 0.78 0.76 0.8 0.73 0.84 96.99
## Throughput (K=120): 19.399jobs/sec
## Avg queue lengths (K=130): 24.31 0.79 0.76 0.77 0.75 0.82 0.8 0.79 0.8 0.79 98.62
## Throughput (K=130): 19.724jobs/sec
## Avg queue lengths (K=140): 32.99 0.79 0.79 0.76 0.82 0.77 0.77 0.81 0.77 0.74 100
## Throughput (K=140): 20.001jobs/sec
## Avg queue lengths (K=150): 39.16 0.78 0.76 0.75 0.8 0.87 0.85 0.75 0.77 0.82 103.69
## Throughput (K=150): 20.738jobs/sec
## Avg queue lengths (K=160): 53.15 0.77 0.86 0.79 0.73 0.76 0.83 0.77 0.85 0.8 99.69
## Throughput (K=160): 19.939jobs/sec
## Avg queue lengths (K=170): 62.08 0.82 0.79 0.82 0.82 0.73 0.79 0.76 0.76 0.79 100.84
## Throughput (K=170): 20.169jobs/sec
## Avg queue lengths (K=180): 69.5 0.84 0.79 0.75 0.74 0.82 0.91 0.79 0.79 0.83 103.23
## Throughput (K=180): 20.646jobs/sec
## Avg queue lengths (K=190): 81.79 0.82 0.78 0.79 0.8 0.79 0.84 0.81 0.82 0.79 100.98
## Throughput (K=190): 20.196jobs/sec
## Avg queue lengths (K=200): 91.56 0.82 0.79 0.79 0.84 0.8 0.81 0.84 0.86 0.78 101.12
## Throughput (K=200): 20.223jobs/sec
plot(K_values, throughput*1000,
type = "b", pch = 19,
xlab="Number of jobs (K)",
ylab="Throughput (jobs/sec)",
lwd=3, # Line width
lty = 1)
abline(h = 15, col="red")