Finding Probabilities in Exponential Probability Distributions

In this R Markdown document, I use the CDF (cumulative distribution function) of the memoryless exponential probability distribution for a Hair Salon use case example. Hypothetically, the Hair Salon would like to get an idea of how often they can expect customers throughout the day. We’ve done calculations and found that, on average, customers arrive 30 minutes apart from one another.

Our task is to give insight into the likelihood of a customer arriving before or after that 30-minute average. This could help the Hair Salon with staffing coordination, such as scheduling employee hours, or even finding the best time windows to step out for a quick lunch break! Knowing when to clean up the Hair Salon in between customers is also useful for operations management.

Task

Find the probability that a customer will enter the Hair Salon within various time intervals throughout a day.

Identify assumptions of the exponential probability distribution

Exponential Probability Distribution Assumptions

Data must:

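1.) Be Independent (one waiting time between arrivals does not influence the next)

2.) Be Identically Distributed as Exponential (every waiting time follows the same exponential distribution, with a constant rate)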

Check both assumptions

For this self-guided exercise, I’ll assume that the amounts of time between customer arrivals at the hair salon are known to follow an exponential distribution. The data will be generated from an exponential distribution, so both assumptions hold by construction. Finally, I’ll assume customers come in every 30 minutes on average.
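The generation step itself isn't shown here, so below is a minimal sketch of what it could look like, assuming 500 simulated gaps. The sample size, the seed, and the names `gaps`, `lag1_cor`, and `ks` are my own illustrative choices, not part of the original analysis. It draws inter-arrival times with `rexp()` and runs two rough checks: a lag-1 correlation near 0 as a quick independence check, and a Kolmogorov-Smirnov test against Exponential(rate = 1/30) for the distribution.

```r
# Hypothetical sketch: simulate inter-arrival gaps and roughly check both assumptions
set.seed(42)                                  # arbitrary seed, for reproducibility
gaps <- rexp(n = 500, rate = 1/30)            # 500 hypothetical gaps, in minutes

# The sample mean should be close to 1 / rate = 30 minutes
mean(gaps)

# Rough independence check: correlation between each gap and the next one
lag1_cor <- cor(gaps[-1], gaps[-length(gaps)])
lag1_cor                                      # should be near 0

# Distribution check: Kolmogorov-Smirnov test against Exponential(rate = 1/30)
ks <- ks.test(gaps, "pexp", rate = 1/30)
ks$p.value                                    # a small p-value would cast doubt on the fit
```

With real salon data, `gaps` would be the observed minutes between arrivals instead of simulated values.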

Question 1

What are the chances that a customer will arrive less than 10 minutes after the previous customer’s arrival, i.e., more than 20 minutes sooner than average?

# *Call help document for pexp() function.
# ?pexp()

The ‘rate’ argument in the pexp() function is Lambda (λ), the average number of events per unit of time; here, that is 1/30 customers per minute. Lambda is the reciprocal of the mean waiting time: λ = 1 / mean, so a mean of 30 minutes gives λ = 1/30. The first argument, ‘q’, is the quantile: the waiting time (in minutes) you want the probability for. pexp() returns the CDF value P(X ≤ q) = 1 - e^(-λq).
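As a quick sanity check of that formula, the sketch below compares pexp() with the closed-form CDF written out by hand, and uses qexp(), the inverse CDF from the same family of base R functions, to recover the original quantile. The variable names are illustrative only.

```r
rate <- 1/30                      # lambda: customers per minute
mean_wait <- 1 / rate             # mean waiting time: 30 minutes

# pexp() evaluates the CDF P(X <= q) = 1 - exp(-rate * q)
p_builtin <- pexp(q = 10, rate = rate)
p_formula <- 1 - exp(-rate * 10)
all.equal(p_builtin, p_formula)   # TRUE

# qexp() goes the other way: from a probability back to a waiting time
qexp(p_builtin, rate = rate)      # approximately 10 minutes
```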

Use pexp() function to find probability

Less_Than_10M <- pexp(q = 10, rate = 1/30 )

Less_Than_10M
## [1] 0.2834687

The exponential CDF shows a 28.3% likelihood of a customer arriving less than 10 minutes after the previous one! I’ll create a plot to visualize this relationship!

# ?curve()
# ?Shade()
# install.packages("DescTools")
library(DescTools)  # provides Shade()

# Plot the CDF and shade the region from 0 to 10 minutes
curve(pexp(x, rate = 1/30),
      from = 0, to = 60, col = 'cyan3', lwd = 4,
      xlab = "Minutes to Customer Arrival",
      ylab = "Cumulative Probability")
Shade(pexp(x, rate = 1/30),
      xlim = c(0, 60), breaks = c(0, 10),
      col = c("lightgray"), density = c(50))
# The height of the CDF at 10 minutes is the probability
points(10, pexp(10, rate = 1/30), pch = 19)
text(23, .23, "10 Minutes = pexp(28.3%)", cex = .8)
title(main = "Less_Than_10M")
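In the CDF plot above, the probability is the height of the curve at 10 minutes, not the shaded area. An alternative view, sketched below with base R only (no DescTools), shades the area under the density dexp() instead; there, the shaded area itself equals the probability. The layout choices and the name `area` are my own, not from the original.

```r
rate <- 1/30

# Plot the density f(x) = rate * exp(-rate * x)
curve(dexp(x, rate = rate), from = 0, to = 60, col = "cyan3", lwd = 4,
      xlab = "Minutes to Customer Arrival", ylab = "Density")

# Shade the area under the density between 0 and 10 minutes
xs <- seq(0, 10, length.out = 101)
polygon(c(0, xs, 10), c(0, dexp(xs, rate = rate), 0),
        col = "lightgray", border = NA)

# Redraw the curve on top of the shading
curve(dexp(x, rate = rate), from = 0, to = 60, col = "cyan3", lwd = 4, add = TRUE)
title(main = "Less_Than_10M (area under the density)")

# The shaded area equals the CDF value at 10 minutes
area <- integrate(dexp, lower = 0, upper = 10, rate = rate)$value
all.equal(area, pexp(10, rate = rate))   # TRUE
```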

# plot.new()

Question 2

What is the probability of a customer arriving less than an hour after the previous customer’s arrival?

Less_Than_1HR <- pexp(q = 60, rate = 1/30 )

Less_Than_1HR
## [1] 0.8646647
# Plot shaded curve
curve(pexp(x, rate = 1/30),
      from = 0, to = 100, col = 'cyan3', lwd = 4,
      xlab = "Minutes to Customer Arrival",
      ylab = "Cumulative Probability")
Shade(pexp(x, rate = 1/30),
      xlim = c(0, 100), breaks = c(0, 60),
      col = c("lightgray"), density = c(50))
points(60, pexp(60, rate = 1/30), pch = 19)
text(72, .75, "60 Minutes = pexp(86.5%)", cex = .8)
title(main = "Less_Than_1HR")

# plot.new()

The CDF shows us that there is an 86.5% probability of a customer coming into the salon within an hour of the previous arrival.
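One way to double-check that 86.5% is a quick Monte Carlo simulation, sketched below under the same assumption of exponential gaps with a 30-minute mean. The seed, the 100,000 draws, and the variable names are arbitrary illustrative choices; the simulated share should land close to, but not exactly on, the pexp() value.

```r
set.seed(1)                                   # arbitrary seed
sim_gaps <- rexp(100000, rate = 1/30)         # hypothetical simulated gaps (minutes)

# Share of simulated gaps under 60 minutes vs. the exact CDF value
sim_p   <- mean(sim_gaps < 60)
exact_p <- pexp(60, rate = 1/30)              # 0.8646647
c(simulated = sim_p, exact = exact_p)
```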

# This type of probability distribution
# could be helpful for estimating more than
# just customer arrivals. It answers questions such as
# how long until the next in-bound phone call
# arrives, or, in a user experience analysis of
# a website, how long visitors are likely to
# stay on a particular page. Figuring out
# how much time employees are likely
# to spend with a customer can be extremely
# helpful for HR departments working on people
# analytics. When staffing or doing inventory
# preparation for an event, the waiting time
# until the next guest arrives is a crucial
# metric this distribution can estimate as well.
# As you can see, the list goes on for the
# Exponential Probability Distribution's
# real-world applications, which is why it
# draws my interest.

# Taking this exercise a bit further,
# I'll try to figure out some more probabilities
# within this same business case example.

Question 3

What are the chances that a customer will show up more than 5 minutes after the previous arrival?

More_Than_5M <- 1 - pexp(q = 5, rate = 1/30 )

More_Than_5M
## [1] 0.8464817
# I subtract the CDF from 1 to get the
# remaining probability, which represents
# waiting times greater than 5 minutes instead
# of less than. However, I could simply have
# set the lower.tail argument to FALSE and
# gotten the same output as well.
pexp(q = 5, rate = 1/30, lower.tail = FALSE)
## [1] 0.8464817
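The two approaches agree here, but lower.tail = FALSE has a practical edge for very long waits. In the sketch below, the 3000-minute wait is a hypothetical, deliberately extreme value. The CDF is so close to 1 that it rounds to exactly 1 in double precision, so 1 - pexp() returns 0, while lower.tail = FALSE still returns the tiny but correct upper-tail probability, e^(-100).

```r
q_far <- 3000                                               # hypothetical extreme wait (minutes)

one_minus  <- 1 - pexp(q_far, rate = 1/30)                  # CDF rounds to 1, so this is 0
upper_tail <- pexp(q_far, rate = 1/30, lower.tail = FALSE)  # about 3.72e-44

c(one_minus = one_minus, upper_tail = upper_tail, exact = exp(-100))
```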
# Plot shaded curve
curve(pexp(x, rate = 1/30),
      from = 0, to = 80, col = 'cyan3', lwd = 4,
      xlab = "Minutes to Customer Arrival",
      ylab = "Cumulative Probability")
Shade(pexp(x, rate = 1/30),
      xlim = c(0, 80), breaks = c(5, 80),
      col = c("lightgray"), density = c(50))
# The CDF height at 5 minutes is 15.4%; the upper tail is 1 minus that
points(5, pexp(5, rate = 1/30), pch = 19)
text(22, .25, "After 5 Minutes = 1 - pexp(84.6%)", cex = .8)
title(main = "More_Than_5M")

Wow, that’s over 84%! The majority of the probability lies beyond 5 minutes! Let’s see how much probability lies between 5 and 50 minutes after a customer arrival.

Question 4

What is the probability that the next customer arrives between 5 and 50 minutes after the previous customer’s arrival?

From_5M_To_50M <- pexp(50, rate = 1/30) - pexp(5, rate = 1/30)

From_5M_To_50M
## [1] 0.6576061
# Plot shaded curve
curve(pexp(x, rate = 1/30),
      from = 0, to = 100, col = 'cyan3', lwd = 4,
      xlab = "Minutes to Customer Arrival",
      ylab = "Cumulative Probability")
Shade(pexp(x, rate = 1/30),
      xlim = c(0, 100), breaks = c(5, 50),
      col = c("lightgray"), density = c(50))
# The probability is the rise in the CDF between the two points
points(5, pexp(5, rate = 1/30), pch = 19)
points(50, pexp(50, rate = 1/30), pch = 19)
text(26, .5, "5 to 50 Minutes = pexp(65.8%)", cex = .8)
title(main = "From_5M_To_50M")

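This is very intriguing to me because I thought we’d see a higher probability, considering that less than 1 hour showed an 86.5% chance and more than 5 minutes gave an 84.6% chance. However, as you can see, pexp() gives us a 65.8% chance of getting a customer in a window between 5 minutes and 50 minutes. The reason is that this window cuts probability off at both ends: it excludes the first 5 minutes (15.4%) and everything beyond 50 minutes (18.9%), so 1 - 0.154 - 0.189 ≈ 0.658. This shows us something interesting about the nature of the Exponential Probability Distribution: a noticeable share of its probability sits in both the very short waits and the long tail.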
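The window calculation can be wrapped in a small helper and cross-checked against the complement reasoning above. `window_prob()` is a hypothetical convenience function for this sketch, not part of base R or DescTools.

```r
# Hypothetical helper: P(a < X <= b) for an exponential waiting time
window_prob <- function(a, b, rate) {
  stopifnot(a <= b)
  pexp(b, rate = rate) - pexp(a, rate = rate)
}

rate <- 1/30
p_window <- window_prob(5, 50, rate)          # 0.6576061

# Same value from the upper tails: P(X > 5) - P(X > 50)
p_tails <- pexp(5, rate, lower.tail = FALSE) - pexp(50, rate, lower.tail = FALSE)

# Or as 1 minus the two excluded pieces: P(X <= 5) and P(X > 50)
p_excluded <- 1 - pexp(5, rate) - pexp(50, rate, lower.tail = FALSE)

c(p_window, p_tails, p_excluded)
```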

I’ll do a bit more digging!

Question 5

What is the probability that the next customer arrives between 30 and 90 minutes after the previous arrival?

From_30M_To_90M <- pexp(90, rate = 1/30) - pexp(30, rate = 1/30)

From_30M_To_90M
## [1] 0.3180924
# Plot shaded curve
curve(pexp(x, rate = 1/30),
      from = 0, to = 120, col = 'cyan3', lwd = 4,
      xlab = "Minutes to Customer Arrival",
      ylab = "Cumulative Probability")
Shade(pexp(x, rate = 1/30),
      xlim = c(0, 120), breaks = c(30, 90),
      col = c("lightgray"), density = c(50))
# The probability is the rise in the CDF between the two points
points(30, pexp(30, rate = 1/30), pch = 19)
points(90, pexp(90, rate = 1/30), pch = 19)
text(45, .8, "30 to 90 Minutes = pexp(31.8%)", cex = .8)
title(main = "From_30M_To_90M")

Aha! So there is only a 31.8% chance that the gap between two arrivals falls between 30 and 90 minutes. That’s a fairly significant drop-off in probability, especially considering how wide this 60-minute window is. Be careful with the interpretation, though: this is not the chance of seeing a customer within the next hour once 30 minutes have already passed. Because the exponential distribution is memoryless, that conditional probability is still 86.5%, the same as in Question 2. The drop seen here comes from the density decaying steadily from time zero, so windows that start later hold less probability. I’d like to investigate the probability relationship of the 30-minute average a bit further.
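The memoryless property can be checked directly. The sketch below divides the unconditional 30-to-90-minute probability by the chance that no customer has arrived by 30 minutes, which gives the conditional probability of an arrival within the next hour. It matches the plain one-hour probability from Question 2. Variable names are illustrative.

```r
rate <- 1/30

# Unconditional: the gap falls between 30 and 90 minutes
p_unconditional <- pexp(90, rate) - pexp(30, rate)         # 0.3180924

# Conditional: an arrival within the next hour, given 30 minutes already passed
p_conditional <- p_unconditional / pexp(30, rate, lower.tail = FALSE)

# Memorylessness: same as the chance of an arrival within an hour from the start
p_fresh_hour <- pexp(60, rate)                             # 0.8646647
all.equal(p_conditional, p_fresh_hour)                     # TRUE
```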

Question 6

What is the probability of customer arrival anytime after the 30-minute average?

pexp(30, rate = 1/30, lower.tail = FALSE)
## [1] 0.3678794
pexp(40, rate = 1/30, lower.tail = FALSE)
## [1] 0.2635971
pexp(50, rate = 1/30, lower.tail = FALSE)
## [1] 0.1888756

There you have it! There’s only about a 36.8% chance that a customer arrives anytime after that 30-minute average. In fact, for any exponential distribution, the probability of waiting longer than the mean is e^(-1) ≈ 36.8%, whatever the mean is. Notice also how the probability shrinks the further we get beyond the 30-minute average: each additional 10 minutes multiplies it by the same factor, e^(-1/3) ≈ 0.72, so it drops by a constant proportion (about 28%) rather than by a constant amount. That means Hair Salon managers should design business operations around that 30-minute mean, since long gaps between customers become progressively less likely.
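Both claims can be checked numerically. The sketch below computes the ratio between tail probabilities 10 minutes apart, and then evaluates P(X > mean) for a few arbitrary example rates to show it is always e^(-1).

```r
rate <- 1/30
waits <- c(30, 40, 50, 60)
tail_p <- pexp(waits, rate, lower.tail = FALSE)   # P(X > wait)

# Ratio of each tail probability to the previous one (10 minutes apart)
ratios <- tail_p[-1] / tail_p[-length(tail_p)]
ratios                                            # each about 0.7165, i.e. exp(-1/3)

# P(X > mean) for a few example rates: always exp(-1), about 0.3679
p_beyond_mean <- sapply(c(1/5, 1/30, 1/120),
                        function(r) pexp(1 / r, r, lower.tail = FALSE))
p_beyond_mean
```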

Conclusion

The gap between arrivals falls between 30 and 90 minutes only 31.8% of the time. Yet, within the first hour, you have an 86.5% chance of seeing a customer arrival. This shows the nature of the exponential probability distribution: probability is concentrated at short waiting times, and the chance of waiting longer than t minutes, e^(-t/30), shrinks by the same proportion for every additional minute, i.e., exponentially.

In real life, phenomena such as the sun going down, influencing most people to head home toward the end of the day, could lower the arrival rate. Additionally, if it’s near closing, there may be fewer arrivals as there isn’t enough time left to complete entire hair appointments. Note, though, that these effects are not what produces the exponential shape in this example: the model assumes a constant rate of 1/30 customers per minute all day, so time-of-day effects like these would violate the identically-distributed assumption and would call for letting the rate vary over the day.

This has been my short example of using pexp() to find probabilities in the Exponential Distribution!

Thanks for viewing!