Practice Problems on Distribution Functions and Probability Calculations

1. Suppose the random variable \(X\) has density \(1.5t^{(0.5)}\) on \((0, 1)\), \(0\) elsewhere.

Find \(F_X(0.5)\)

Mathematical Solution:

Our density function is given by

\[ \begin{aligned} f_X(t)&= P(X=t)\\ &=\begin{cases} 1.5t^{0.5} & 0\leq t\leq 1 \\ 0 & otherwise \end{cases} \end{aligned} \]

Therefore, the cumulative function is given by

\[ \begin{aligned} F_X(t) &=P(X\leq t)\\ &=\int_{0}^{t} f_X(t) \space dt\\ &=\int_{0}^{t} 1.5t^{0.5} \space dt\\ &=t^{1.5}\Big|_0^t\\ &=t^{1.5}. \end{aligned} \]

Therefore, \(F_X(0.5)=(0.5)^{1.5}\approx 0.3536\).

Computational Solution:

Fx <- function(t) t^1.5

Fx(0.5)

## [1] 0.3535534

# ggplot

t<-0:1
df<-data.frame(t)

ggplot(df,aes(t))+ stat_function(fun=Fx,size=1.1,color="mediumpurple1")+
  geom_vline(xintercept=0.5,linetype="dashed")+
  geom_hline(yintercept=Fx(0.5),linetype="dashed")+labs(title = "CDF",x="t",y="F_X(t)")+
  theme(plot.title = element_text(hjust=0.5))

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Base R
# curve(Fx,0,1)

3. Consider the network buffer example, Section \(6.7.4.2\). Find a number \(u\) such that there is a \(90%\) chance that the transmission is made before \(u\) milliseconds pass.

Mathematical Solution:

We will not explore the mathematical solution in this case, but we’ll at least start the problem.

Let U be our random variable and prob = P(U\(\leq\) u) be the probability that less than or equal to \(u\) milliseconds have passed before a transmission is made. That is, we want to find \(u\) such that \(u=F^{-1}_U(u)\).

Computational Solution:

Method 1:

The easiest way is to use qgamma, since is built-in function in R that returns the quantile function for the gamma distribution.

qgamma(0.9,5,0.01)

## [1] 799.359

Method 2:

What if we were given a distribution for which is no built-in function in R? In that case we can use a data frame to store the milliseconds passed (1 to 1000) and corresponding probabilities (prob).

out <- data.frame(U = seq(1,1000,by=1), prob = NA)

for(ms in 1:nrow(out)){
out[ms,"prob"] <- pgamma(ms,5,0.01)
}

ggplot(out,aes(x=U,y=prob))+geom_line(size=1.1,color="steelblue1")+
  geom_hline(yintercept=0.90,linetype="dashed")+
  geom_vline(xintercept=800,linetype="dashed")+
  labs(title = "CDF",x="milliseconds",y="P(U<=ms)")+
  theme(plot.title = element_text(hjust=0.5))

# Print the value of ms and call it u when prob(ms)=0.9
for(i in 1:nrow(out)) {
  if (out[i,"prob"]<0.9){
    next
  }
  else{
    u <- out[i,"U"]
    break
  }
}

u

## [1] 800

8. The \(\textit{mode}\) of a density \(f_X(t)\) is the value of \(t\) for which the density is maximal, i.e., the location of the peak. (Some densities are multimodal, complicating the definition, which we will not pursue here.) Find the mode of a gamma distribution with parameters \(r\) and \(\lambda\).

Mathematical Solution:

A random variable \(X\) that is gamma-distributed with parameters shape = \(r\) and rate = \(\lambda\), denoted as \(X \sim \Gamma(r,\lambda)\), has a probability density function (pdf) given by

\[ f_X(t; r, \lambda)= \begin{cases} \frac{t^{r-1}e^{-\lambda t}\lambda^r}{\Gamma(r)} & t,r,\lambda>0 \\ 0 & otherwise \end{cases} \]

where \(\Gamma(r)=(r-1)!\) is known as the \(\textit{gamma function}\).

We wish to find the mode, which is the value of \(t\) for which the density is maximal. To do this, we most find the location of peak. From calculus, we know that finding the time \(t\) at which a function has a maximum or a minimum is equivalent to finding the time \(t\) for which the derivative is zero. The derivative of \(f_X(t; r, \lambda)\) is given by

[

\[\begin{aligned} \frac{d}{dt}(f_X(t; r, \lambda)) &= \frac{d}{dt} \left( \frac{t^{r-1}e^{-\lambda t}\lambda^r}{\Gamma(r)}\right) \\ &= \frac{\lambda ^ r}{\Gamma(r)}[t^{r-1}*(-\lambda e^{-\lambda t}) + e^{-\lambda t}*((r-1)t^{r-2})]\\ &=\frac{\lambda ^ r}{\Gamma(r)} e^{-\lambda t} [(r-1)t^{r-2}-\lambda t^{r-1}]\\ &=\frac{\lambda ^ r}{\Gamma(r)} t^{r-2} e^{-\lambda t} [r-1-\lambda t]. \end{aligned}\]

]

Therefore, we want to find \(t\) values for which \(\frac{\lambda ^ r}{\Gamma(r)} t^{r-2} e^{-\lambda t} [r-1-\lambda t]=0\). Since \(\frac{\lambda ^ r}{\Gamma(r)}\) is a constant and \(\lambda ^ r\) is positive we can multiply both side of this equation by \(\frac{\Gamma(r)}{\lambda ^ r}\) which yields that \(t^{r-2} e^{-\lambda t} [r-1-\lambda t] = 0\), which is zero when \(t=0\) or \(t=\frac{r-1}{\lambda}\).

Now, there are two cases to consider:

If \(t=0\), we know this cannot be the mode as the pdf is zero here by definition.

If \(t=\frac{r-1}{\lambda}\), there are two subcases to consider:

If \(0<r<1\), then \(t<0\), which cannot be the mode as the pdf is zero here by definition.
If \(r\geq 1\), this give us the model.

Therefore, the mode is given by \(t=\frac{r-1}{\lambda}\) whenever \(r\geq 1\).

10. Consider the network buffer example, Section \(6.7.4.2\). Suppose interarrival times, instead of having an exponential distribution with mean \(100\), are uniformly distributed between \(60\) and \(140\) milliseconds. Write simulation code to find the new value of \((6.51)\).

df <- replicate(1000,sum(runif(5,60,140)))

Now, we can apply the ecdf R function in order to calculate the Empirical Cumulative Distribution Function (ECDF) values of our example data:

ecdf(df)

## Empirical CDF 
## Call: ecdf(df)
##  x[1:1000] = 356.63, 364.19, 374.22,  ..., 644.66, 652.18

# Base R:
plot(ecdf(df))

Therefore, the new value of 6.51 is the following:

1 - ecdf(df)(552)

## [1] 0.14

12. Say \(f_X(t) = 4t^3\) for \(0 < t < 1\), \(0\) elsewhere. Write a function with call form

\(r43( n )\)

to generate \(n\) random numbers from this distribution. Check it by finding \(EX\) analytically and comparing to

\(\text{print} ( \text{mean} ( r43 (10000)))\)

Mathematical Solution:

\[ \begin{aligned} E(X) &=\int_{0}^{1} x f_X(x) \space dx\\ &=\int_{0}^{1} t * 4t^3 \space dx\\ &=\frac{4}{5}t^{5}\Big|_0^1\\ &=\frac{4}{5}(1)^{5}-0\\ &=0.8. \end{aligned} \]

Computational Solution:

In order to find this function, we must find the quantile function first. Thus, we will begin by stating the functions for the PDF and CDF first. Notice that the PDF was given and the CDF is equivalent to

\[ \begin{aligned} F_X(t) &=\int_{0}^{t} f_X(t) \space dt\\ &=P(X\leq t)\\ &=\int_{0}^{t} 4t^3 \space du\\ &=t^{4}\Big|_0^t\\ &=t^{4}. \end{aligned} \]

df <- function(t) 4*t^3

pf <- function(t){
  y <- t^4
  y[t<=0] <- 0
  y[t>=1] <- 1
  y
}

# Base R:
#curve(df,0,1)
#curve(pf,0,1)


# ggplot:

t<-0:1
out3<-data.frame(t)

g1<-ggplot(out3,aes(t))+ stat_function(fun=df,size=1.1,color="palevioletred1")+
  labs(title = "PDF",x="t",y="f_X(t)")+
  theme(plot.title = element_text(hjust=0.5))

g2<-ggplot(out3,aes(t))+ stat_function(fun=pf,size=1.1,color="orange")+
  labs(title = "CDF",x="t",y="F_X(t)")+
  theme(plot.title = element_text(hjust=0.5))

plot_grid(g1,g2)

To find the quantile function, we find the inverse of the CDF function:

qf <- function(p){
  if(any(p<0) | any(p>1)) stop("P must be between 0 and 1")
  p^(1/4)
} 

r43 <- function(n){
  qf(runif(n))
}

hist(r43(1000),freq = FALSE)
curve(df(x),add=TRUE)

print(mean(r43(1000)))

## [1] 0.8039571

Therefore, we have that the value of E(X) and print(mean(r43(1000))) are approximately the same.

14. Use R to graph the cdf of the random variable \(D\) in Section \(6.4.2.\)

\(X \sim Uniform(-1,2)\)

\[ \begin{aligned} D &= \begin{cases} 1 & 1< x\leq 2 \\ x & 0\leq x\leq 1\\ 0 & -1\leq x < 0 \end{cases} \end{aligned} \]

Our CDF is defined by \(F_D(d) = P(D\leq d)\).

# ggplot
t = seq(-1, 2, 0.001)
df<-data.frame(t)

fD1 <- function(t){
    fD1 <- NULL
    fD1[t>1 & t<=2] <- 1
    fD1
}

fD2 <- function(t){
    fD2 <- NULL
    fD2[t>=0 & t<=1] <- t[t>=0 & t<=1]
    fD2
}

fD3 <- function(t){
    fD3 <- NULL
    fD3[t>=-1 & t<0] <- 0
    fD3
}

ggplot(df,aes(t))+
  stat_function(fun=fD1,size=1.1,color="darksalmon")+
  stat_function(fun=fD2,size=1.1,color="darkturquoise")+
  stat_function(fun=fD3,size=1.1,color="darkseagreen4")+
  
  geom_point(aes(x=2,y=1), shape=21, fill="darksalmon", color="darksalmon", size=3)+
  geom_point(aes(x=1,y=1), shape=21, fill="darkturquoise", color="darkturquoise",  size=3)+
  geom_point(aes(x=0,y=0), shape=21, fill="darkturquoise", color="darkturquoise",  size=3)+
  geom_point(aes(x=-1,y=0), shape=21, fill="darkseagreen4", color="darkseagreen4",  size=3)+
  labs(title = "Random Variable D",x="X",y="D")+
  theme(plot.title = element_text(hjust=0.5))

## Warning: Removed 67 rows containing missing values (`geom_function()`).

## Warning: Removed 68 rows containing missing values (`geom_function()`).

## Warning: Removed 67 rows containing missing values (`geom_function()`).

There are three cases to consider:

Case 1: If \(d=0\), then \(P(D\leq 0) = P(X\leq 0) = \frac{1}{3}\).

punif(0,-1,2)

## [1] 0.3333333

Case 2: If \(0 < d\leq 1\), then \(P(D\leq d) = P(X\leq d) = \frac{d-(-1)}{2-(-1)} = \frac{1}{3}d + \frac{1}{3}.\)

#punif(t,-1,2)

Case 3: If d=1, then \(P(D\leq 1) = 1\).

punif(1,-1,2)

## [1] 0.6666667

\[ \begin{aligned} F_D(d)&= \begin{cases} \frac{1}{3}d + \frac{1}{3} & 0\leq d < 1 \\ 1 & d \geq 1\\ 0 & d<0 \end{cases} \end{aligned} \]

# Base R
plot.new()
plot.window(xlim = c(-1,2), ylim = c(0,1))
box()
axis(1)
axis(2)
lines(c(-1,0),c(0,0), lwd=3)
lines(c(1,2),c(1,1),lwd=3)
lines(c(0,1),c(1/3,2/3),lwd=3)
points(c(0,0,1,1),c(0,1/3,2/3,1), pch = c(1,16,1,16))

16. At the end of Section \(6.5.1\), it is stated that “a density can have values larger than \(1\) at some points, even though it must integrate to \(1\).” Give a specific example of this in one of the parametric families in this chapter, showing the family, the parameter value(s) and the point t at which the density is to be evaluated.

Let’s consider an example of a density for which all values are greater than 1. In general, the probability density of the uniform distribution is given by

\[ \begin{aligned} f_X(t)&=\begin{cases} \frac{1}{max-min} & min\leq t\leq max \\ 0 & otherwise \end{cases} \end{aligned} \]

Then, the uniform distribution on the interval \(\left[0,\frac{1}{2}\right]\) has a probability density function of

\[ \begin{aligned} f_X(t)&=\begin{cases} 2 & 0\leq t\leq\frac{1}{2} \\ 0 & otherwise \end{cases} \end{aligned} \]

Parametric family: Uniform Distribution

Parameter values: \(min, max\)

Point t at which the density is to be evaluated: \(t\in\left[0,\frac{1}{2}\right]\)