Instructions
Examples
- Maximum Likelihood Estimate with the Poisson Distribution
Exercises
- 1. MLE over an Uniform Distribution
  - 1A
  - 1B
- 2: MLE over an Exponential Distribution
  - 2A
  - 2B
  - 2C
Wrap-Up
- Reflection
- Submission

knitr::opts_chunk$set(echo = TRUE)
library("tidyverse") #to load ggplot() and read_csv() functions
data_obs <- rexp(1000, rate = 1/3.2)

Instructions

fill in your full name in the author field above
save as this file as a separate *.rmd file, and replace “template” with your full name
read through the examples below
finish the exercises

Examples

Maximum Likelihood Estimate with the Poisson Distribution

Synthetic Data

Source: https://cmdlinetips.com/2019/03/introduction-to-maximum-likelihood-estimation-in-r/

# generate data from Poisson distribution with the parameter lambda = 5
N <- 100 #sample size
data_pois <- rpois(n = N, lambda = 5)

# convert to data frame
df_Poisson <- data.frame(data_pois)

# plot data
df_Poisson %>% 
  ggplot(aes(x=data_pois))+ 
  geom_histogram(bins = N) +
  labs(title = "Poisson Distribution",
       subtitle = "lambda = 5", 
       caption = "Math 32",
       x = "data",
       y = "count") +
  theme_bw()

Likelihood

# likelihood of a single data point, guessing lambda = 1
dpois(data_pois[1], lambda = 1)

## [1] 0.06131324

# likelihood of a single data point, over a sequence of guesses for lambda
lambda_values <- seq(1, 20)
likelihood <- dpois(data_pois[1], lambda = lambda_values)

# arranged into a data frame
lh_single <- data.frame(lambda_values = lambda_values, likelihood = likelihood )

lh_single %>% 
  ggplot(aes(x = lambda_values,y = likelihood))+
  geom_point(size = 4, color = "blue")+
  labs(title = "Likelihood of a single data point over multiple lambda values",
       subtitle = "(we will look for a maximum)",
       caption = "Math 32",
       x = "lambda",
       y = "likelihood") +
  theme_bw()

Log-Likelihood

# log-likelihood of a single data point, guessing lambda = 1
dpois(data_pois[1], lambda = 1, log = TRUE)

## [1] -2.791759

# log-likelihood of a single data point, over a sequence of guesses for lambda
lambda_values <- seq(1, 20)
log_likelihood <- dpois(data_pois[1], lambda = lambda_values, log = TRUE)

# arranged into a data frame
lh_single <- data.frame(lambda_values = lambda_values, log_likelihood = log_likelihood)

lh_single %>% 
  ggplot(aes(x = lambda_values,y = log_likelihood))+
  geom_point(size = 4, color = "blue")+
  labs(title = "Log-likelihood of a single data point over multiple lambda values",
       subtitle = "(we will look for a maximum)",
       caption = "Math 32",
       x = "lambda",
       y = "log-likelihood") +
  theme_bw()

MLE over all values

LL_poisson <- function(lambda, y){
  # log-likelihood by summing 
  LL <- sum(dpois(y, lambda, log=TRUE))
  return(LL)
}
lambda_values <- seq(1, 20, by = 0.01)

# compute log-likelihood for all data, for all lambda values
LL <- sapply(lambda_values, function(x){ LL_poisson(x, data_pois) })

# save the lambdas and log-likelihoods in a data frame
df <- data.frame(LL, lambda_values)

df %>% 
  ggplot(aes(x = lambda_values,y = LL)) +
  geom_point(color = "blue") +
  geom_vline(xintercept = lambda_values[which.max(LL)], color = "red", size = 2) +
  labs(title = "Maximum Likelihood Estimation",
       subtitle = "using the log-likelihood (original data had lambda = 5)",
       caption = "Math 32",
       x = "lambda",
       y = "log-likelihood") +
  theme_bw()

# the MLE value itself
lambda_values[which.max(LL)]

## [1] 5.01

Exercises

1. MLE over an Uniform Distribution

Here we generate \(n\) independent uniformly distributed random numbers (\(x_1, x_2, \dots, x_n\)) on an interval from \([0,a]\) where you specify the value of \(a\).

a <-32
N <- 100
data_unif <- runif(N, 0, a)
likelihood_unif <- function(a, data){
  temp <- dunif(data, 0, a)
  return(prod(temp, na.rm = TRUE))
}

1A

Describe the graph and calculations in your own words.

# compute likelihood for all data, for many guesses at parameter a
a_values <- seq(1, 50, by = 0.01)
LH <- sapply(a_values, function(x){ likelihood_unif(x, data_unif) })
df <- data.frame(a_values, LH)

# graph the likelihood curve
df %>% 
  ggplot(aes(x = a_values, y = LH)) +
  geom_vline(xintercept = a_values[which.max(LH)], color = "red", size = 2) +
  geom_point(color = "blue") +
  labs(title = "Maximum Likelihood Estimation",
       subtitle = "using the likelihood (original data had a = 32)",
       caption = "Math 32",
       x = "a",
       y = "likelihood") +
  theme_bw()

The graph chooses data points between values 1 and 50 with 0.01 step-size. The variable LH calculates and stores likelihood, and the variable df is the data frame used to graph the likelihood using LH and a-samples. The graph likelihood is a constant 0 except for values approaching 32 from the right, which increases to a random value then drops back to 0 on the left side.

1B

Use a line of R code to empirically determine the maximum likelihood choice for \(a\) from your likelihood.

a_values[which.max(LH)]

## [1] 31.85

max(LH)

## [1] 4.887157e-151

2: MLE over an Exponential Distribution

The data file obsExpData.csv contains a data set of 1000 independent points sampled from an exponential distribution with unknown parameter \(\lambda > 0\).

NOTE: If this next code block does not work for you, set eval = FALSE and re-run all of the R code blocks. However, you will need to be able to debug this error for next week’s homework (“put everything in your Math 32 folder”).

data_obs <- read_csv("obsExpData.csv")

Remember that the probability density function of a normal distribution is given as follows: \[\Huge f(x) = \lambda e^{-\lambda x}\]

2A

Describe the following function.

LL_exp <- function(mu = 1, sample_data){
  lambda <- 1/mu
  n <- sum(!is.na(sample_data)) #sample size
  this_sum <- sum(sample_data, na.rm = TRUE)
  loglikelihood <- n*log(lambda) - this_sum * lambda
  loglikelihood #return
}

This function finds the log-likelihood of an exponential function with a mean of 1 and some sample data. Lambda is assigned to 1 over the mean, and n is calculated as the sample size. this_sum is calculated as the sum of the sample data. The log-likelihood is found by multiplying the log(lambda) and sample size, subtracting the product of this_sum with lambda, then is returned.

2B

Describe the graph and calculations in your own words.

# values to guess toward MLE
mu_values <- seq(1, 5, by = 0.01)

# compute log-likelihood for all data, for all lambda values
LL_values <- sapply(mu_values, function(x){ LL_exp(x, data_obs) })

# save the lambdas and log-likelihoods in a data frame
df <- data.frame(mu_values, LL_values)

df %>% 
  ggplot(aes(x = mu_values, y = LL_values)) +
  geom_vline(xintercept = mu_values[which.max(LL_values)],
             color = "red", size = 2) +
  geom_point(color = "blue") +
  labs(title = "Maximum Likelihood Estimation",
       subtitle = "using the log-likelihood",
       caption = "Math 32",
       x = "mu",
       y = "log-likelihood") +
  theme_bw()

The graph of likelihood increases logarithmically until reaching a max value a_values[which.max(LL_values)] before decreasing. The x-axis mu is given by the data set from values 1 to 5 with 0.01 step size. The y-axis is calculated as the log-likelihood stored in LL_values, with df as the data frame that graphs the curve.

2C

Use a line of R code to empirically determine the maximum likelihood choice for \(a\) from your likelihood.

a_values[which.max(LL_values)]

## [1] 3.16

max(LL_values)

## [1] -2150.615

Wrap-Up

Reflection

Reflection: write at least two sentences below about this assignment. This assignment significantly increased my understanding of maximum likelihood estimation. Calculating log-likelihoods using R-Studio visually definitely helps to further learn how the function and idea works.

Submission

Use the Knit button (at top of script editor) and knit to HTML
upload today’s HTML back to our CatCourses page

MLE Computer Homework

Jason Petersen

03 December, 2021