Tuesday, February 07, by the start of class time (2:00pm).
Use of R Markdown is optional for this assignment. Problems 1–4 were given as in-class exercises on 01/27.
Generate n = 50 random values from an exponential distribution with rate parameter = 10. Generate a plot of the exponential PDF with the data points overlaid (see PDF for desired output).
If you want your data to be identical to mine, add
set.seed(42) before you generate the random values. This
will initialize R’s random number generator so that it produces the same
‘random’ values on different computers.
library(ggplot2)
set.seed(42)
x <- rexp(50, rate = 10)
#x <- rexp(50, rate = 10) #generates 0 - 50 numbers at a rate of 10
#cat(x)
x <- data.frame(x)
print(ggplot(x, aes(x = x, y = 0.09, xend = x, yend = 0)) #takes in data frame, specifies parameters for x and y values (necessary for geom functions)
+ stat_function(fun = dexp, # plot function of exponential distribution
geom = "area",
color = "blue",
fill = "blue",
alpha = 0.5) +
xlim(c(0,5)) +
geom_point() +
geom_segment()
)
## Warning: The following aesthetics were dropped during statistical transformation: xend
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
#fig <- ggplot(data.frame(x, aes(x))) + stat_function(fun = dnorm, geom = "polygon")
#fig <- ggplot(x) + stat_function() + geom_point() + geom_segment()
#print(fig)
#the graph is incorrect, and Im not sure how to use geom_point() and geom_segment() without breaking the code and/or adding a Y value
The data for problem 1 was generated from an exponential distribution with \(\lambda = 10\). Suppose we only had the data, but didn’t know what distribution they came from or the true parameters used to generate the data.
Part a: What is the maximum likelihood estimate (MLE) for the rate parameter of an exponential distribution, given the data from problem 1?
# Figure out the value of lambda, that makes the value as large as possible
mle <- 1/mean(x) # wikipedia formula
## Warning in mean.default(x): argument is not numeric or logical: returning NA
Part b: Assuming instead the data is generated from a normal distribution, what are the maximum likelihood estimates for \(\mu\) and \(\sigma\)?
# this is preventing my code from rendering to html
like_func <- function(x) {
-sum(dnorm(x, mean(x), sd(x), log = TRUE)) #mu = mean of x values, sigma is the standard dev
}
print(like_func(x))
Write a function that takes two arguments: lambda, and a vector of observations, and returns the log-likelihood of the observations according to an exponential distribution with the given value of lambda.
my_data <- x
expo <- function(lambda, my_data) {
sum(dexp(my_data, rate = lambda, log = TRUE))
}
# take the SUM of the value not the product
# we want the log liklihood
#lambda is some value, dexp is the vector of values
#Results <- integrate(integrand, 0, Inf)
Plot the log-likelihood function for the data you generated in
problem (1) That is, generate a plot with \(\lambda\) on the x-axis, and
exponential_log_likelihood(\(\lambda\)) on the y-axis.
Make sure the x and y limits of your plot show the peak of the log-likelihood function.
Hint #1: ggplot( ) + stat_function( )
Hint #2: Your function
exponential_log_likelihood() function needs to be
vectorized—that is, given a vector of lambda values, it should
return a vector of log-likelihood values.
exponential_log_likelihood <- functions{
v <- vector(size(0)) #if I were to do this properly, I would fill this empty vector with lambda values, then use this function to calculate and return the log-likelihood values of the previous values.
return(v)
}
The p.d.f. for an exponential distribution is defined as:
\[p(x) = \lambda e^{-\lambda x}\]
for \(x \in \left[0, \infty \right)\).
Using numerical integration, what is the probability that \(x \in \left(2, 5\right)\) when \(\lambda = 1\)?
Repeat the previous problem, except this time use the cumulative distribution function (c.d.f.) to determine your answer instead of numerical integration.
Using numerical integration, compute the variance of an exponential distribution with rate parameter \(\lambda = 2\). (You can check your answer against Wikipedia.)
#Variance == 1/lambda^2
#x <- rexp(50)
#Variance <- functions{
# (x - ( 1 / lambda))^2
#}
#Variance(x)
The skewness of a probability distribution is defined as:
\[ \gamma = \mathrm{E}\left[\left(\frac{x - \mu}{\sigma} \right)^3\right]\]
where \(\mu\) is the mean and \(\sigma\) is the standard deviation of the distribution. Using this definition, compute the skewness of an exponential distribution with rate parameter \(\lambda = 2\). (As before, you can check your answer against Wikipedia.)