X Wang - Assignment 3 - ALTERNATIVE DEMO ONLY

PSCI 7108: Advanced Data Analysis III

1: Intuitive Maximum Likelihood

Describe the following intuitively using words:

a: Why the maximum of the likelihood is a useful point estimate.

The maximum of the likelihood is a useful point estimate because it is the parameter value under which the observed data are most likely. Put differently, since the likelihood is a relative measure of plausibility, the maximum of the likelihood identifies the most plausible parameter estimate for the given data under the assumed distribution.

b: Why we maximize the likelihood the way we do (set the derivative to 0, solve for \( \theta \), check the sign of the second derivative).
The derivative of a function is the rate of change of that function, so at maxima and minima the first derivative is 0 because the slope at those exact points is 0. We solve for \( \theta \) at that point to find which parameter value produces the maximum or minimum. Since we are interested in finding a maximum, the sign of the second derivative should be negative, indicating that the function is concave down at that point, and therefore the point of interest is a maximum.
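For example (a small illustration of the recipe, not part of the assigned answer): for \( f(\theta)=-(\theta-2)^{2} \), setting \( f'(\theta)=-2(\theta-2)=0 \) gives \( \theta=2 \), and \( f''(\theta)=-2<0 \) confirms that \( \theta=2 \) is a maximum.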

c: What curvature near the maximum tells us.
Curvature tells us the amount of certainty in the maximum likelihood estimate (MLE). A flatter function near the maximum tells us that there is less certainty about the MLE, which is why standard errors will be larger. Conversely, when the function is more sharply peaked (more curved) near the maximum, there is more certainty about the MLE and standard errors will be smaller.
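Purely as an illustration (this borrows the exponential results derived in Questions 2 and 3 below; the helper name curvature.se is my own), a minimal R sketch of the curvature-standard error link:

# Curvature-based standard error for the exponential MLE (see Questions 2d-2e).
# A sharper (more curved) peak means larger Fisher information and a smaller SE.
curvature.se <- function(n, xbar) {
    lambda.hat <- 1/xbar  # MLE, derived in Question 2d
    info <- n/lambda.hat^2  # Fisher information, derived in Question 2e
    sqrt(1/info)
}
curvature.se(n = 10, xbar = 2)  # flatter peak: SE is about 0.158
curvature.se(n = 410, xbar = 2)  # sharper peak: SE is about 0.025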

2: Finding the MLE

Assume we have some data \( x_{1}, ... , x_{n} \) which are iid from an exponential distribution. The probability density function is \( f(x \mid \lambda) = \lambda\exp(-\lambda x) \) for \( x \geq 0 \). Write out the following in \( \LaTeX \):

a: The likelihood function

The (sample) likelihood function is the joint density:
\( \mathcal{L}(\lambda|{\boldsymbol{x}}) = f(x_{1},..., x_{n}|\lambda) = \prod_{i=1}^{n}f(x_{i}|\lambda) \)
\( =\prod_{i=1}^{n}\lambda\exp(-\lambda x_{i}) \)
\( =\lambda^{n}\exp(-\lambda\sum_{i=1}^{n}x_{i}) \)
\( =\lambda^{n}\exp(-\lambda n\bar{x}) \), where \( \bar{x} \) is the sample mean
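As a quick numerical sanity check of this closed form (my own addition; the values x.demo and lambda.demo are arbitrary), it can be compared against the product of R's built-in exponential densities:

# Closed-form likelihood vs. a direct product of exponential densities
x.demo <- c(0.5, 1.2, 0.3)
lambda.demo <- 0.8
prod(dexp(x.demo, rate = lambda.demo))  # product of densities
lambda.demo^length(x.demo) * exp(-lambda.demo * sum(x.demo))  # closed form
# Both lines print the same value (about 0.1034), confirming the algebra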

b: The log-likelihood
\( \ln\mathcal{L}(\lambda|\boldsymbol{x})=\ln(\lambda^{n}\exp(-\lambda n\bar{x})) \)
\( =\ln(\lambda^{n})+(-\lambda n\bar{x}) \)
\( =n\ln\lambda-\lambda n\bar{x}=n(\ln\lambda-\lambda\bar{x}) \)
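An aside on why the log matters in practice (not part of the derivation): for large \( n \) the raw likelihood underflows to zero in double precision, while the log-likelihood stays finite. A minimal sketch with simulated data (the seed and sample size are arbitrary):

# With many observations the raw likelihood underflows to 0, but the log does not
set.seed(1)
big <- rexp(5000, rate = 0.5)  # simulated exponential data
prod(dexp(big, rate = 0.5))  # prints 0: numerical underflow
sum(dexp(big, rate = 0.5, log = TRUE))  # finite (large negative) log-likelihood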

c: The score function
\( S(\lambda|\boldsymbol{x})=\frac{\partial}{\partial\lambda}\ln\mathcal{L}(\lambda|\boldsymbol{x})=n\frac{\partial}{\partial\lambda}(\ln\lambda-\lambda\bar{x}) \)
\( =n({\large{\frac{1}{\lambda}}}-\bar{x}) \)

d: The expression (formula) for the MLE
\( n({\large{\frac{1}{\lambda}}}-\bar{x})=0 \)
\( {\large{\frac{n}{\lambda}}}=n\bar{x} \)
\( \hat{\lambda}_{ML}={\large{\frac{1}{\bar{x}}}}={\large{\frac{n}{\sum_{i=1}^{n}x_{i}}}} \)
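The closed form can also be checked numerically (my own addition) by maximizing the log-likelihood from Question 2b with R's optimize():

# Numerical check of the closed-form MLE against a one-dimensional optimizer
loglik <- function(lambda, data) length(data) * (log(lambda) - lambda * mean(data))
x.demo <- c(0.5, 1.2, 0.3)
optimize(loglik, interval = c(0.001, 10), data = x.demo, maximum = TRUE)$maximum
1/mean(x.demo)  # closed form; both print 1.5 (up to optimizer tolerance)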

e: The expression for the second derivative and the Fisher information. Is the point found a maximum? How do we know?
The expression for the second derivative (the Hessian, here a scalar) is:
\( {\large{\frac{\partial^{2}}{\partial\lambda^{2}}}}\ln\mathcal{L}(\lambda|\boldsymbol{x})={\large{-\frac{n}{\lambda^{2}}}} \)

The expression for the Fisher information is:
\( I(\lambda|\boldsymbol{x})={\large{-\frac{\partial^{2}}{\partial\lambda^{2}}}}\ln\mathcal{L}(\lambda|\boldsymbol{x})={\large{\frac{n}{\lambda^{2}}}} \)

The point found is a maximum because both \( n \) and \( \lambda^{2} \) are positive, so the second derivative \( -n/\lambda^{2} \) is negative for every \( \lambda > 0 \). This means that the critical point \( \hat{\lambda}_{ML} \) found in question 2d is a maximum.
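The sign claim can also be verified numerically (my own addition), using a central finite difference for the second derivative at the MLE:

# Finite-difference second derivative of the log-likelihood at the MLE
loglik <- function(lambda, data) length(data) * (log(lambda) - lambda * mean(data))
x.demo <- c(0.5, 1.2, 0.3)
l.hat <- 1/mean(x.demo)
h <- 1e-04
(loglik(l.hat + h, x.demo) - 2 * loglik(l.hat, x.demo) + loglik(l.hat - h, x.demo))/h^2
-length(x.demo)/l.hat^2  # analytic value; both are about -1.33, i.e. negative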

3: An Application

With the same exponential setup as in question 2, assume we have the following data \( x \): 5, 0, 1, 1, 0, 3, 2, 3, 4, 1. Based on these data and the derivations above, report the following:

a: The MLE


From question 2d, the MLE is: \( \hat{\lambda}_{ML}={\large{\frac{1}{\bar{x}}}} \).

# MLE formula:
x <- c(5, 0, 1, 1, 0, 3, 2, 3, 4, 1)
MLE <- 1/mean(x)
MLE
## [1] 0.5

\( \hat{\lambda}_{ML} \) = 0.5.

b: The standard error

The standard error is: \( se(\hat{\lambda}) = \sqrt{var(\hat{\lambda})}=\sqrt{I^{-1}(\hat{\lambda})} \). From question 2e, \( I(\hat{\lambda})={\large{\frac{n}{\hat{\lambda}^{2}}}} \).

I.MLE <- length(x)/MLE^2  # Fisher information evaluated at the MLE
SE <- sqrt(solve(I.MLE))  # solve() inverts; for a scalar this is just 1/I.MLE
SE
##        [,1]
## [1,] 0.1581

\( se(\hat{\lambda}) \) = 0.1581.

c: The z-statistic, and its associated p-value, for the null hypothesis that \( \lambda \) = 0.75. Would you reject or fail to reject the null hypothesis?

The z-statistic is: \( z = {\large\frac{\hat{\lambda}-\lambda_0}{se(\hat{\lambda})}} \).

# z-statistic
lambda.0 <- 0.75
z <- (MLE - lambda.0)/SE
z
##        [,1]
## [1,] -1.581
# Two-sided p-value
p <- 2 * pnorm(-abs(z))
p
##        [,1]
## [1,] 0.1138

\( z \) = -1.581.
Two-sided p-value = 0.1138.
Since the p-value exceeds the 5% significance level, I fail to reject the null hypothesis.

d: A 95% confidence interval.

Wald confidence interval = \( \hat{\lambda}\pm1.96\,se(\hat{\lambda}) \).

CI.part <- 1.96 * SE
CI.high <- MLE + CI.part
CI.low <- MLE - CI.part
CI.high
##        [,1]
## [1,] 0.8099
CI.low
##        [,1]
## [1,] 0.1901

Wald confidence interval = 0.5 \( \pm \) 0.3099 = 0.1901 to 0.8099.

e: Create a function for the likelihood. Does the number found seem to correspond to the maximum of the likelihood produced by the function? Also, show that the maximum is the same even if the log of the likelihood is used.

The likelihood function from question 2a: \( \mathcal{L}(\lambda)=\lambda^{n}\exp (-\lambda\sum_{i=1}^{n}x_{i}) \).
The log-likelihood function from question 2b: \( \ln\mathcal{L}(\lambda)=n(\ln\lambda-\lambda\bar{x}) \).

# Load libraries (install once with install.packages("ggplot2") if needed)
library("ggplot2")

# Likelihood function
LikelihoodExp <- function(lambda, data) {
    return(lambda^(length(data)) * exp(-lambda * sum(data)))
}
lambda <- seq(0.01, 2, by = 0.01)  # Many possible values of lambda (start above 0 so the log is finite)
likelihood.x <- LikelihoodExp(lambda, x)  # Value of the likelihood given lambda
lambda[which.max(likelihood.x)]  # MLE for likelihood function, checks out
## [1] 0.5

# Plot the two functions
likelihood.plot <- qplot(lambda, likelihood.x, geom = "point", main = "Likelihood Function", 
    xlab = expression(lambda), ylab = "Value of the Likelihood")
likelihood.plot + geom_vline(xintercept = lambda[which.max(likelihood.x)], colour = "purple")  # Likelihood function

[Plot: Likelihood Function]

log.plot <- qplot(lambda, log(likelihood.x), geom = "point", main = "Log-Likelihood Function", 
    xlab = expression(lambda), ylab = "Value of the Log-Likelihood")
log.plot + geom_vline(xintercept = lambda[which.max(likelihood.x)], colour = "purple")  # Log-likelihood function

[Plot: Log-Likelihood Function]

The MLE found in question 3a, 0.5, corresponds to the maximum of the likelihood produced by the function. It also corresponds to the maximum of the log of the likelihood.

f: Add 400 more data points to x but make sure the MLE stays the same as before. Then reproduce the graphs from Question 3e. What changed and why?

# Add 400 more data points equal to 2 (the original mean of x) so the MLE stays the same
y <- rep(2, 400)  # Vector of length 400 populated by 2's
x2 <- c(x, y)

# Replot graphs from 3e
likelihood.x2 <- LikelihoodExp(lambda, x2)
likelihood.plot2 <- qplot(lambda, likelihood.x2, geom = "point", main = "Likelihood Function with 400 More Data Points", 
    xlab = expression(lambda), ylab = "Value of the Likelihood")
likelihood.plot2 + geom_vline(xintercept = lambda[which.max(likelihood.x2)], 
    colour = "purple")

[Plot: Likelihood Function with 400 More Data Points]

# Note: the raw likelihood underflows to 0 for most lambda values here, so its
# log is -Inf there and only the region near the peak is plotted
log.plot2 <- qplot(lambda, log(likelihood.x2), geom = "point", main = "Log-Likelihood Function with 400 More Data Points", 
    xlab = expression(lambda), ylab = "Value of the Log-Likelihood")
log.plot2 + geom_vline(xintercept = lambda[which.max(likelihood.x2)], colour = "purple")

[Plot: Log-Likelihood Function with 400 More Data Points]


# Alternative approach: compute the log-likelihood directly, avoiding underflow
LogExp <- function(lambda, data) {
    return(length(data) * (log(lambda) - lambda * mean(data)))
}
log.x2alt <- LogExp(lambda, x2)
log.plot2alt <- qplot(lambda, log.x2alt, geom = "point", main = "Log-Likelihood Function w/ 400 More Data Points", 
    xlab = expression(lambda), ylab = "Value of the Log-Likelihood")
log.plot2alt + geom_vline(xintercept = 0.5, colour = "purple")

[Plot: Log-Likelihood Function w/ 400 More Data Points, computed directly]

# Round trip through exp() and log(); exp() underflows far from the peak
log.plot2alt2 <- qplot(lambda, log(exp(log.x2alt)), geom = "point", main = "Log-Likelihood Function w/ 400 More Data Points", 
    xlab = expression(lambda), ylab = "Value of the Log-Likelihood")
log.plot2alt2 + geom_vline(xintercept = 0.5, colour = "purple")

[Plot: Log-Likelihood Function w/ 400 More Data Points, via exp() then log()]

For both the likelihood and the log-likelihood, the MLE remained the same because the sample mean remained the same. However, with the larger dataset both functions show a much tighter peak. This makes sense: \( n \), the number of data points, is larger, so there is more certainty about the MLE. In both cases the vertical scale also changes dramatically, since the raw likelihood shrinks toward machine zero while the log-likelihood becomes much more negative.
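To quantify the tighter peak (my own addition), the curvature-based standard error from Question 3b can be recomputed for the enlarged sample; the MLE is unchanged but its standard error shrinks by a factor of \( \sqrt{41} \):

# SE = lambda.hat/sqrt(n) for the exponential MLE (equivalent to sqrt(1/I))
MLE/sqrt(length(x))  # n = 10:  about 0.158, as in Question 3b
MLE/sqrt(length(x2))  # n = 410: about 0.025 -- same MLE, much tighter peak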