STAT 417 Project 1

Two Mechanics

Two mechanics are changing oil filters for the arriving customers. The service time has an Exponential distribution with mean 12 minutes for the first mechanic, and mean 3 minutes for the second mechanic. When you arrive to have your oil filter changed, your probability of being served by the faster mechanic is 0.8.

Use simulation to generate 10000 service times and estimate the mean service time for yourself.
Summarize your data with R code hist(x, probability = TRUE) and mean(x), where x is an array representing your simulated data. Add to the histogram two vertical lines indicating the simulated mean and the theoretical mean. A histogram can be made based on frequencies. Here is a video about histograms: https://www.youtube.com/watch?v=uHRqkGXX55I. A density histogram can also be made. To get a density histogram, the height of each bar is divided by the total number of values, and then by the length of the bar. Such a histogram has a vertical scale comparable with the probability density function of the distribution from which data are randomly drawn.
Add the curve of the probability density function (which is a mixture of two exponential pdfs) of the service time to the density histogram you got in question 2. For a mixture of two exponential distributions, refer to formula 15 of the paper http://www.columbia.edu/~ww2040/OnApproximationsIII84.pdf.

Generating Values

To generate the service times for each mechanic, we create a function generateTimes() that can help us. It arguments include the mean service times of each mechanic and the number of iterations to generate. (The code for this function is available in the Code Section of this document.)

Now we generate our values using the function:

# variables
n <- 10000
lambda1 <- 1/3
lambda2 <- 1/12
p1 <- 0.8
p2 <- 1 - p1

# empty vector to store values
data <- generateTimes(lambda1, lambda2, p1, p2, n)

Here are the first 50 values:

data$times[1:50]

##  [1]  9.3099007  3.9548983  1.1756092  0.1266302  3.9223161  1.2185691
##  [7]  5.9870434  0.6725234  0.9868688  1.6361711  5.6808767  0.7579711
## [13]  1.9509329  1.0005531  8.6062261 39.6117727  5.6569027  6.0276861
## [19]  3.4497401 16.6665626  2.7891469  7.0319120 18.2042430  9.3339309
## [25]  5.4183372 13.1739603 16.4868703 12.0234483  3.0840899  6.4987790
## [31]  0.2573348  9.1799765  0.5118473  8.3393904  3.2078271  1.7563506
## [37]  5.4739334  0.8774403  0.4496911 22.4949649  8.0814485 15.6242183
## [43]  8.0577878  1.7217692  5.2004053  7.5836018  0.7085888  2.9943535
## [49]  3.7589695  2.2594585

Visualising Distribution

We use the ggplot2 package for all our visualizations.

This section is further divided into visualizations based on frequencies and density of our data.

Frequencies

# by frequencies
ggplot(data, aes(x=times)) +
  geom_histogram(binwidth=3) +
  labs(title="Frequency Histogram of Service Times",
       x="Times",
       y="Frequencey")

Density

We add a density curve and the theoretical and actual mean lines to our previously generated frequency plot. (Notice the difference on the y-axes compared to the previous plot.)

The theoretical mean is calculated using the following formula, since our distribution is a combination of two separate distributions: \[\mu = 0.8(\frac{1}{\lambda_1})+0.2(\frac{1}{\lambda_2})\]

# theoretical mean
tmean <- p1*(1/lambda1) + p2*(1/lambda2)

# actual mean
amean <- mean(data$times)

# by density
ggplot(data, aes(x=times)) +
  geom_histogram(aes(y= ..density..), binwidth=3, fill="grey") +
  geom_density(col="blue") +
  labs(title="Density Histogram of Service Times",
       x="Times",
       y="Density",
       color="Legend") +
  geom_vline(aes(xintercept = amean), col="red") +
  geom_vline(aes(xintercept = tmean), col="orange")

Summary & Discussion

We use the runif() function to generate a random uniform number which we then use to simulate our service times for each mechanic. We record those values and generate our data set.

We then visualize our values to see the underlying distribution, which comes out to be similar to an exponential distribution, as we expected.

We visualize using both frequencies and density, and calculate the theoretical and actual mean to test if our theory holds in practice. On our visualization, we can only see the orange line for the theoretical mean as it overshadows the actual mean, implying that both are very close to each other. We can also check using R:

amean # actual mean

## [1] 4.826489

tmean # theoretical mean

## [1] 4.8

We can conclude that our simulation was successful in modelling the service times for any customer.

Code

This section provides all the code used for this problem.

library(ggplot2)

# function for generating values
generateTimes <- function(mean1, mean2, p1, p2, n) {
  
  times <- c()
  mechanic <- c()
  
  # for loop used to generate n iterations
  for(i in 1:n) {
    
    # uniform random number generator
    uniformNum <- runif(1)
    
    # depending on uniformNum, we generate random exponential value based on mean
    times[i] <- ifelse(uniformNum <= p1, rexp(1, mean1), rexp(1, mean2))
    mechanic[i] <- ifelse(uniformNum <= p1, 1, 2)
  }
  
  data <- data.frame(mechanic, times)
  return(data)
}

# variables
n <- 10000
lambda1 <- 1/3
lambda2 <- 1/12
p1 <- 0.8
p2 <- 1 - p1

# creating data set
data <- generateTimes(lambda1, lambda2, p1, p2, n)

data$times[1:50]

# plot by frequencies
ggplot(data, aes(x=times)) +
  geom_histogram(binwidth=3) +
  labs(title="Frequency Histogram of Service Times",
       x="Times",
       y="Frequencey")

# theoretical mean
tmean <- p1*(1/lambda1) + p2*(1/lambda2)

# actual mean
amean <- mean(data$times)

# plot by frequency and density
ggplot(data, aes(x=times)) +
  geom_histogram(aes(y= ..density..), binwidth=3, fill="grey") +
  geom_density(col="blue") +
  labs(title="Density Histogram of Service Times",
       x="Times",
       y="Density",
       color="Legend") +
  geom_vline(aes(xintercept = amean), col="red") +
  geom_vline(aes(xintercept = tmean), col="orange")