knitr::opts_chunk$set(echo = TRUE)
library("tidyverse") #to load ggplot() and read_csv() functions
data_obs <- rexp(1000, rate = 1/3.2)
author field abovesave as this file as a separate *.rmd file, and replace “template” with your full nameSource: https://cmdlinetips.com/2019/03/introduction-to-maximum-likelihood-estimation-in-r/
# generate data from Poisson distribution with the parameter lambda = 5
N <- 100 #sample size
data_pois <- rpois(n = N, lambda = 5)
# convert to data frame
df_Poisson <- data.frame(data_pois)
# plot data
df_Poisson %>%
ggplot(aes(x=data_pois))+
geom_histogram(bins = N) +
labs(title = "Poisson Distribution",
subtitle = "lambda = 5",
caption = "Math 32",
x = "data",
y = "count") +
theme_bw()
# likelihood of a single data point, guessing lambda = 1
dpois(data_pois[1], lambda = 1)
## [1] 0.06131324
# likelihood of a single data point, over a sequence of guesses for lambda
lambda_values <- seq(1, 20)
likelihood <- dpois(data_pois[1], lambda = lambda_values)
# arranged into a data frame
lh_single <- data.frame(lambda_values = lambda_values, likelihood = likelihood )
lh_single %>%
ggplot(aes(x = lambda_values,y = likelihood))+
geom_point(size = 4, color = "blue")+
labs(title = "Likelihood of a single data point over multiple lambda values",
subtitle = "(we will look for a maximum)",
caption = "Math 32",
x = "lambda",
y = "likelihood") +
theme_bw()
# log-likelihood of a single data point, guessing lambda = 1
dpois(data_pois[1], lambda = 1, log = TRUE)
## [1] -2.791759
# log-likelihood of a single data point, over a sequence of guesses for lambda
lambda_values <- seq(1, 20)
log_likelihood <- dpois(data_pois[1], lambda = lambda_values, log = TRUE)
# arranged into a data frame
lh_single <- data.frame(lambda_values = lambda_values, log_likelihood = log_likelihood)
lh_single %>%
ggplot(aes(x = lambda_values,y = log_likelihood))+
geom_point(size = 4, color = "blue")+
labs(title = "Log-likelihood of a single data point over multiple lambda values",
subtitle = "(we will look for a maximum)",
caption = "Math 32",
x = "lambda",
y = "log-likelihood") +
theme_bw()
LL_poisson <- function(lambda, y){
# log-likelihood by summing
LL <- sum(dpois(y, lambda, log=TRUE))
return(LL)
}
lambda_values <- seq(1, 20, by = 0.01)
# compute log-likelihood for all data, for all lambda values
LL <- sapply(lambda_values, function(x){ LL_poisson(x, data_pois) })
# save the lambdas and log-likelihoods in a data frame
df <- data.frame(LL, lambda_values)
df %>%
ggplot(aes(x = lambda_values,y = LL)) +
geom_point(color = "blue") +
geom_vline(xintercept = lambda_values[which.max(LL)], color = "red", size = 2) +
labs(title = "Maximum Likelihood Estimation",
subtitle = "using the log-likelihood (original data had lambda = 5)",
caption = "Math 32",
x = "lambda",
y = "log-likelihood") +
theme_bw()
# the MLE value itself
lambda_values[which.max(LL)]
## [1] 5.01
Here we generate \(n\) independent uniformly distributed random numbers (\(x_1, x_2, \dots, x_n\)) on an interval from \([0,a]\) where you specify the value of \(a\).
a <-32
N <- 100
data_unif <- runif(N, 0, a)
likelihood_unif <- function(a, data){
temp <- dunif(data, 0, a)
return(prod(temp, na.rm = TRUE))
}
Describe the graph and calculations in your own words.
# compute likelihood for all data, for many guesses at parameter a
a_values <- seq(1, 50, by = 0.01)
LH <- sapply(a_values, function(x){ likelihood_unif(x, data_unif) })
df <- data.frame(a_values, LH)
# graph the likelihood curve
df %>%
ggplot(aes(x = a_values, y = LH)) +
geom_vline(xintercept = a_values[which.max(LH)], color = "red", size = 2) +
geom_point(color = "blue") +
labs(title = "Maximum Likelihood Estimation",
subtitle = "using the likelihood (original data had a = 32)",
caption = "Math 32",
x = "a",
y = "likelihood") +
theme_bw()
The graph chooses data points between values 1 and 50 with 0.01 step-size. The variable LH calculates and stores likelihood, and the variable df is the data frame used to graph the likelihood using LH and a-samples. The graph likelihood is a constant 0 except for values approaching 32 from the right, which increases to a random value then drops back to 0 on the left side.
Use a line of R code to empirically determine the maximum likelihood choice for \(a\) from your likelihood.
a_values[which.max(LH)]
## [1] 31.85
max(LH)
## [1] 4.887157e-151
The data file obsExpData.csv contains a data set of 1000 independent points sampled from an exponential distribution with unknown parameter \(\lambda > 0\).
NOTE: If this next code block does not work for you, set eval = FALSE and re-run all of the R code blocks. However, you will need to be able to debug this error for next week’s homework (“put everything in your Math 32 folder”).
data_obs <- read_csv("obsExpData.csv")
Remember that the probability density function of a normal distribution is given as follows: \[\Huge f(x) = \lambda e^{-\lambda x}\]
Describe the following function.
LL_exp <- function(mu = 1, sample_data){
lambda <- 1/mu
n <- sum(!is.na(sample_data)) #sample size
this_sum <- sum(sample_data, na.rm = TRUE)
loglikelihood <- n*log(lambda) - this_sum * lambda
loglikelihood #return
}
This function finds the log-likelihood of an exponential function with a mean of 1 and some sample data. Lambda is assigned to 1 over the mean, and n is calculated as the sample size. this_sum is calculated as the sum of the sample data. The log-likelihood is found by multiplying the log(lambda) and sample size, subtracting the product of this_sum with lambda, then is returned.
Describe the graph and calculations in your own words.
# values to guess toward MLE
mu_values <- seq(1, 5, by = 0.01)
# compute log-likelihood for all data, for all lambda values
LL_values <- sapply(mu_values, function(x){ LL_exp(x, data_obs) })
# save the lambdas and log-likelihoods in a data frame
df <- data.frame(mu_values, LL_values)
df %>%
ggplot(aes(x = mu_values, y = LL_values)) +
geom_vline(xintercept = mu_values[which.max(LL_values)],
color = "red", size = 2) +
geom_point(color = "blue") +
labs(title = "Maximum Likelihood Estimation",
subtitle = "using the log-likelihood",
caption = "Math 32",
x = "mu",
y = "log-likelihood") +
theme_bw()
The graph of likelihood increases logarithmically until reaching a max value a_values[which.max(LL_values)] before decreasing. The x-axis mu is given by the data set from values 1 to 5 with 0.01 step size. The y-axis is calculated as the log-likelihood stored in LL_values, with df as the data frame that graphs the curve.
Use a line of R code to empirically determine the maximum likelihood choice for \(a\) from your likelihood.
a_values[which.max(LL_values)]
## [1] 3.16
max(LL_values)
## [1] -2150.615
Reflection: write at least two sentences below about this assignment. This assignment significantly increased my understanding of maximum likelihood estimation. Calculating log-likelihoods using R-Studio visually definitely helps to further learn how the function and idea works.
Knit button (at top of script editor) and knit to HTMLCatCourses page