Week 2 Homework

Question 2.1

1)

To find the likelihood function ℒ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛), we need to multiply the probability functions of each observation in the sample. Given the exponential probability function, 𝒫(𝑦; 𝜇) = (1/𝜇)*exp(-𝑦/𝜇), we can write the likelihood function as:

ℒ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) = 𝒫(𝑦_1; 𝜇) * 𝒫(𝑦_2; 𝜇) * … * 𝒫(𝑦_𝑛; 𝜇)

Now, substitute the exponential probability function into the likelihood function:

ℒ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) = (1/𝜇) * exp(-𝑦_1/𝜇) * (1/𝜇) * exp(-𝑦_2/𝜇) * … * (1/𝜇) * exp(-𝑦_𝑛/𝜇)

Simplify the expression:

ℒ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) = (1/𝜇^n) * exp(-(𝑦_1 + 𝑦_2 + … + 𝑦_𝑛)/𝜇)

Let 𝑆 = 𝑦_1 + 𝑦_2 + … + 𝑦_𝑛 be the sum of all the observations in the sample, and let 𝑦̅ = 𝑆/𝑛 be the sample mean. Then, the likelihood function becomes:

ℒ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) = (1/𝜇^n) * exp(-𝑛*𝑦̅/𝜇)

Now, let’s find the log-likelihood function ℓ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) by taking the natural logarithm of the likelihood function:

ℓ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) = ln(ℒ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛)) = ln((1/𝜇^n) * exp(-𝑛*𝑦̅/𝜇))

Use the properties of logarithms to simplify the expression:

ℓ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) = ln(1/𝜇^n) + ln(exp(-𝑛𝑦̅/𝜇)) = -𝑛 ln(𝜇) - (𝑛*𝑦̅/𝜇)

So, the log-likelihood function is:

ℓ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) = -𝑛 * ln(𝜇) - (𝑛*𝑦̅/𝜇)

2)

The score function 𝑈(𝜇) is the first derivative of the log-likelihood function ℓ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) with respect to the unknown parameter 𝜇. Given the log-likelihood function we derived earlier:

ℓ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) = -𝑛 * ln(𝜇) - (𝑛*𝑦̅/𝜇)

Let’s differentiate this with respect to 𝜇:

𝑈(𝜇) = dℓ(𝜇)/d𝜇

First, differentiate the term -𝑛 * ln(𝜇):

d(-𝑛 * ln(𝜇))/d𝜇 = -𝑛 * (1/𝜇)

Next, differentiate the term -(𝑛*𝑦̅/𝜇):

d(-(𝑛𝑦̅/𝜇))/d𝜇 = 𝑛𝑦̅/𝜇^2

Now, add these two terms together to find the score function:

𝑈(𝜇) = -𝑛 * (1/𝜇) + 𝑛*𝑦̅/𝜇^2

So, the score function is:

𝑈(𝜇) = -𝑛 * (1/𝜇) + 𝑛*𝑦̅/𝜇^2

3)

To find the maximum likelihood estimate (MLE) of 𝜇, denoted as 𝜇̂, we need to set the score function 𝑈(𝜇) to zero and solve for 𝜇. From the score function we derived earlier:

𝑈(𝜇) = -𝑛 * (1/𝜇) + 𝑛*𝑦̅/𝜇^2

Set 𝑈(𝜇) to zero:

0 = -𝑛 * (1/𝜇) + 𝑛*𝑦̅/𝜇^2

Now, we’ll solve for 𝜇:

𝑛 * (1/𝜇) = 𝑛*𝑦̅/𝜇^2

Cancel out the 𝑛 term:

1/𝜇 = 𝑦̅/𝜇^2

Now, cross-multiply to eliminate the fractions:

𝜇^2 = 𝑦̅*𝜇

Since 𝜇 > 0 (as given in the problem context), we can safely divide both sides by 𝜇:

𝜇 = 𝑦̅

So, the maximum likelihood estimate (MLE) for 𝜇, denoted as 𝜇̂, is equal to the sample mean:

𝜇̂ = 𝑦̅

4)

The observed information for 𝜇, denoted as 𝐼(𝜇), is the negative of the second derivative of the log-likelihood function ℓ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) with respect to the unknown parameter 𝜇. Given the log-likelihood function we derived earlier:

ℓ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) = -𝑛 * ln(𝜇) - (𝑛*𝑦̅/𝜇)

Let’s differentiate it twice with respect to 𝜇:

First, we already found the first derivative (the score function 𝑈(𝜇)):

𝑈(𝜇) = -𝑛 * (1/𝜇) + 𝑛*𝑦̅/𝜇^2

Now, differentiate 𝑈(𝜇) with respect to 𝜇:

d𝑈(𝜇)/d𝜇 = 𝑛/𝜇^2 - 2𝑛*𝑦̅/𝜇^3

The observed information 𝐼(𝜇) is the negative of the second derivative of the log-likelihood function:

𝐼(𝜇) = -d²ℓ(𝜇)/d𝜇² = -d𝑈(𝜇)/d𝜇

Now, plug in the second derivative:

𝐼(𝜇) = - (𝑛/𝜇^2 - 2𝑛*𝑦̅/𝜇^3)

Simplify the expression:

𝐼(𝜇) = -𝑛/𝜇^2 + 2𝑛*𝑦̅/𝜇^3

So, the observed information for 𝜇 is:

𝐼(𝜇) = -𝑛/𝜇^2 + 2𝑛*𝑦̅/𝜇^3

5)

The expected information for 𝜇, also known as the Fisher information, is the expected value of the observed information 𝐼(𝜇) with respect to the distribution of the data. In this case, we’re working with the exponential distribution.

Since the observed information 𝐼(𝜇) is:

𝐼(𝜇) = -𝑛/𝜇^2 + 2𝑛*𝑦̅/𝜇^3

Let’s compute the expected information for 𝜇, denoted as 𝐼_E(𝜇):

𝐼_E(𝜇) = E[𝐼(𝜇)]

Now, recall that for the exponential distribution, the mean and variance are both equal to 𝜇:

E[𝑦] = 𝜇 Var(𝑦) = 𝜇^2

So, the expected value of 𝑦̅ (sample mean) is 𝜇:

E[𝑦̅] = 𝜇

Now, substitute the expected value of 𝑦̅ into the observed information 𝐼(𝜇):

𝐼_E(𝜇) = -𝑛/𝜇^2 + 2𝑛*E[𝑦̅]/𝜇^3

Plug in E[𝑦̅] = 𝜇:

𝐼_E(𝜇) = -𝑛/𝜇^2 + 2𝑛*𝜇/𝜇^3

Simplify the expression:

𝐼_E(𝜇) = -𝑛/𝜇^2 + 2𝑛/𝜇^2

Now, combine the terms:

𝐼_E(𝜇) = 𝑛/𝜇^2

So, the expected information for 𝜇 is:

𝐼_E(𝜇) = 𝑛/𝜇^2

6)

To find the estimated standard error for 𝜇̂, denoted as 𝑠𝑒(𝜇̂), we need to find the square root of the inverse of the expected information for 𝜇. We have previously found the expected information for 𝜇:

𝐼_E(𝜇) = 𝑛/𝜇^2

Now, let’s find the inverse of 𝐼_E(𝜇):

𝐼_E(𝜇)^(-1) = 𝜇^2/𝑛

The estimated variance of 𝜇̂ is given by the inverse of the expected information for 𝜇:

Var(𝜇̂) = 𝜇^2/𝑛

Now, we’ll take the square root of the variance to find the estimated standard error:

𝑠𝑒(𝜇̂) = √(Var(𝜇̂)) = √(𝜇^2/𝑛)

Recall that we have found the maximum likelihood estimate (MLE) for 𝜇:

𝜇̂ = 𝑦̅

Substitute 𝜇̂ for 𝜇 in the standard error formula:

𝑠𝑒(𝜇̂) = √(𝜇̂^2/𝑛)

So, the estimated standard error for 𝜇̂ is:

𝑠𝑒(𝜇̂) = 𝜇̂/√𝑛

7)

The Wald test statistic is used to test a hypothesis about a parameter in a statistical model. In this case, we are testing the null hypothesis 𝐻_0: 𝜇 = 1.

The Wald test statistic is given by the following formula:

𝑊 = (𝜃̂ - 𝜃_0) / 𝑠𝑒(𝜃̂)

where 𝜃̂ is the maximum likelihood estimate (MLE) of the parameter (in this case, 𝜇̂), 𝜃_0 is the hypothesized value of the parameter under the null hypothesis (in this case, 1), and 𝑠𝑒(𝜃̂) is the estimated standard error of the MLE.

We have already found the MLE for 𝜇:

𝜇̂ = 𝑦̅

And the estimated standard error for 𝜇̂:

𝑠𝑒(𝜇̂) = 𝜇̂/√𝑛

Now, we’ll substitute these values into the Wald test statistic formula:

𝑊 = (𝜇̂ - 1) / (𝜇̂/√𝑛)

Now, multiply both numerator and denominator by √𝑛 to eliminate the fraction in the denominator:

𝑊 = (√𝑛 * (𝜇̂ - 1)) / 𝜇̂

Now, simplify the expression:

𝑊 = (√𝑛 * 𝜇̂ - √𝑛) / 𝜇̂

So, the Wald test statistic for testing 𝐻_0: 𝜇 = 1 is:

𝑊 = (𝜇̂ - 1) / (√𝜇̂^2/𝑛)

8)

The likelihood ratio test statistic is used to test a hypothesis about a parameter in a statistical model. In this case, we are testing the null hypothesis 𝐻_0: 𝜇 = 1.

The likelihood ratio test statistic is given by the following formula:

𝐿 = 2 * (ℓ(𝜃̂) - ℓ(𝜃_0))

where 𝜃̂ is the maximum likelihood estimate (MLE) of the parameter (in this case, 𝜇̂), 𝜃_0 is the hypothesized value of the parameter under the null hypothesis (in this case, 1), and ℓ(𝜃) is the log-likelihood function for the parameter 𝜃.

We have already found the MLE for 𝜇:

𝜇̂ = 𝑦̅

And the log-likelihood function:

ℓ(𝜇; 𝑦_1, ⋯ , 𝑦_𝑛) = -𝑛 * ln(𝜇) - (𝑛 * 𝑦̅/𝜇)

Now, let’s evaluate the log-likelihood function at 𝜇̂:

ℓ(𝜇̂) = -𝑛 * ln(𝜇̂) - (𝑛 * 𝑦̅/𝜇̂)

Since 𝜇̂ = 𝑦̅:

ℓ(𝜇̂) = -𝑛 * ln(𝑦̅) - 𝑛

Next, evaluate the log-likelihood function at 𝜇_0 = 1:

ℓ(1) = -𝑛 * ln(1) - (𝑛 * 𝑦̅/1)

Since ln(1) = 0:

ℓ(1) = -𝑛 * 𝑦̅

Now, let’s substitute these values into the likelihood ratio test statistic formula:

𝐿 = 2 * (ℓ(𝜇̂) - ℓ(1)) = 2 * (-𝑛 * ln(𝑦̅) - 𝑛 - (-𝑛 * 𝑦̅))

Simplify the expression:

𝐿 = 2 * (𝑛 * 𝑦̅ - 𝑛 * ln(𝑦̅) - 𝑛)

Factor out 𝑛:

𝐿 = 2𝑛(𝑦̅ - ln(𝑦̅) - 1)

Since 𝜇̂ = 𝑦̅:

𝐿 = 2𝑛(𝜇̂ - log(𝜇̂) - 1)

So, the likelihood ratio test statistic for testing 𝐻_0: 𝜇 = 1 is:

𝐿 = 2𝑛(𝜇̂ - log(𝜇̂) - 1)

9)

# Load required libraries
library(ggplot2)

# Define the sample size
n <- 10

# Generate a sequence of mu_hat values
mu_hat <- seq(0.5, 2.0, by = 0.1)

# Calculate W^2 and LRT
W_sq <- ((mu_hat - 1) / (sqrt(mu_hat^2 / n)))^2
LRT <- 2 * n * (mu_hat - log(mu_hat) - 1)

# Combine the data into a data frame
data <- data.frame(mu_hat = mu_hat, W_sq = W_sq, LRT = LRT)

# Create a ggplot object
p <- ggplot(data) +
  geom_line(aes(x = mu_hat, y = W_sq, color = "W^2")) +
  geom_line(aes(x = mu_hat, y = LRT, color = "LRT")) +
  labs(x = expression(hat(mu)), y = "Test Statistic Value", title = "W^2 and LRT vs. Mu_hat") +
  scale_color_manual(values = c("W^2" = "red", "LRT" = "blue"), name = "Test Statistic") +
  theme_minimal()

# Print the plot
print(p)

The resulting plot shows the relationship between 𝑊^2 and LRT as 𝜇̂ varies. Both test statistics increase as the difference between the null hypothesis value (𝜇 = 1) and 𝜇̂ increases, indicating a stronger evidence against the null hypothesis.

10)

# Define the sample size
n <- 100

# Generate a sequence of mu_hat values
mu_hat <- seq(0.5, 2.0, by = 0.1)

# Calculate W^2 and LRT
W_sq <- ((mu_hat - 1) / (sqrt(mu_hat^2 / n)))^2
LRT <- 2 * n * (mu_hat - log(mu_hat) - 1)

# Combine the data into a data frame
data <- data.frame(mu_hat = mu_hat, W_sq = W_sq, LRT = LRT)

# Create a ggplot object
p <- ggplot(data) +
  geom_line(aes(x = mu_hat, y = W_sq, color = "W^2")) +
  geom_line(aes(x = mu_hat, y = LRT, color = "LRT")) +
  labs(x = expression(hat(mu)), y = "Test Statistic Value", title = "W^2 and LRT vs. Mu_hat") +
  scale_color_manual(values = c("W^2" = "red", "LRT" = "blue"), name = "Test Statistic") +
  theme_minimal()

# Print the plot
print(p)

The resulting plot shows the relationship between 𝑊^2 and LRT as 𝜇̂ varies for 𝑛 = 100. As with the previous plot, both test statistics increase as the difference between the null hypothesis value (𝜇 = 1) and 𝜇̂ increases, indicating stronger evidence against the null hypothesis. However, for 𝑛 = 100, the test statistics increase more sharply compared to 𝑛 = 10, as a larger sample size provides more evidence against the null hypothesis when the true value of 𝜇 is different from the hypothesized value.

Question 2.2

1)

To calculate the variance-covariance matrix of 𝛽̂0, 𝛽̂1, and 𝛽̂2, we need to find the inverse of the information matrix 𝐼(𝜷̂).

# Define the information matrix
information_matrix <- matrix(c(4823, 12334, 20871,
                               12334, 31798, 53423,
                               20871, 53423, 90348), nrow = 3, ncol = 3, byrow = TRUE)

# Calculate the variance-covariance matrix
var_cov_matrix <- solve(information_matrix)

# Print the variance-covariance matrix
print(var_cov_matrix)

##             [,1]         [,2]         [,3]
## [1,]  0.70524128  0.023890843 -0.177042229
## [2,]  0.02389084  0.005597557 -0.008828796
## [3,] -0.17704223 -0.008828796  0.046129512

The formula used to calculate the variance-covariance matrix is:

Var-Cov(𝜷̂) = 𝐼(𝜷̂)^(-1)

where 𝐼(𝜷̂) is the information matrix and 𝐼(𝜷̂)^(-1) is its inverse.

2)

To find the estimated standard error of 𝛽̂1, we need to take the square root of the corresponding diagonal element in the variance-covariance matrix.

# Calculate the standard error of beta_hat_1
se_beta_hat_1 <- sqrt(var_cov_matrix[2, 2])

# Print the standard error
print(se_beta_hat_1)

## [1] 0.07481683

3)

To find the 95% confidence interval of 𝛽̂1, we will use the estimated standard error we calculated previously, as well as the t-distribution critical value for a 95% confidence level.

The formula for the confidence interval is:

CI(𝛽̂1) = 𝛽̂1 ± t_α/2 * SE(𝛽̂1)

The degrees of freedom for the t-distribution in a GLM are usually calculated as n - p, where n is the number of observations and p is the number of parameters. However, we don’t have the number of observations in this case, so we will use the normal distribution critical value (z-score) as an approximation, which is 1.96 for a 95% confidence interval.

# Define the MLE of beta_1
beta_hat_1 <- 2.0

# Define the critical value (z-score) for a 95% confidence interval
z_critical <- 1.96

# Calculate the confidence interval for beta_hat_1
lower_bound_1 <- beta_hat_1 - z_critical * se_beta_hat_1
upper_bound_1 <- beta_hat_1 + z_critical * se_beta_hat_1

# Print the confidence interval
cat("95% confidence interval for beta_hat_1: (", lower_bound_1, ", ", upper_bound_1, ")\n")

## 95% confidence interval for beta_hat_1: ( 1.853359 ,  2.146641 )

4)

To find the estimated standard error of 𝛽̂1 − 𝛽̂2, we need to consider the covariance between 𝛽̂1 and 𝛽̂2 as well. The formula for the standard error of 𝛽̂1 − 𝛽̂2 is:

SE(𝛽̂1 − 𝛽̂2) = sqrt(Var(𝛽̂1) + Var(𝛽̂2) - 2 * Cov(𝛽̂1, 𝛽̂2))

where Var(𝛽̂1) and Var(𝛽̂2) are the variances of 𝛽̂1 and 𝛽̂2, and Cov(𝛽̂1, 𝛽̂2) is the covariance between 𝛽̂1 and 𝛽̂2. These values can be obtained from the variance-covariance matrix calculated earlier.

# Extract variances and covariance from the variance-covariance matrix
var_beta_hat_1 <- var_cov_matrix[2, 2]
var_beta_hat_2 <- var_cov_matrix[3, 3]
cov_beta_hat_1_2 <- var_cov_matrix[2, 3]

# Calculate the standard error of beta_hat_1 - beta_hat_2
se_beta_hat_1_2 <- sqrt(var_beta_hat_1 + var_beta_hat_2 - 2 * cov_beta_hat_1_2)

# Print the standard error
print(se_beta_hat_1_2)

## [1] 0.2634097

5)

To find the 95% confidence interval of 𝛽̂1 − 𝛽̂2, we will use the estimated standard error we calculated previously, as well as the normal distribution critical value (z-score) for a 95% confidence level, which is 1.96.

The formula for the confidence interval is:

CI(𝛽̂1 - 𝛽̂2) = (𝛽̂1 - 𝛽̂2) ± z_α/2 * SE(𝛽̂1 - 𝛽̂2)

where (𝛽̂1 - 𝛽̂2) is the difference between the MLEs of 𝛽1 and 𝛽2, z_α/2 is the critical value from the normal distribution for a 95% confidence level (α = 0.05), and SE(𝛽̂1 - 𝛽̂2) is the estimated standard error of 𝛽̂1 - 𝛽̂2.

# Define the MLEs of beta_1 and beta_2
beta_hat_1 <- 2.0
beta_hat_2 <- 1.0

# Define the critical value (z-score) for a 95% confidence interval
z_critical <- 1.96

# Calculate the difference between beta_hat_1 and beta_hat_2
beta_hat_diff <- beta_hat_1 - beta_hat_2

# Calculate the confidence interval for beta_hat_diff
lower_bound_2 <- beta_hat_diff - z_critical * se_beta_hat_1_2
upper_bound_2 <- beta_hat_diff + z_critical * se_beta_hat_1_2

# Print the confidence interval
cat("95% confidence interval for beta_hat_1 - beta_hat_2: (", lower_bound_2, ", ", upper_bound_2, ")\n")

## 95% confidence interval for beta_hat_1 - beta_hat_2: ( 0.483717 ,  1.516283 )

6)

Let’s summarize the 95% confidence intervals obtained in the previous two questions:

For 𝛽̂1, the 95% CI was calculated to be (1.853359, 2.146641).
For 𝛽̂1 - 𝛽̂2, the 95% CI was calculated to be (0.483717, 1.516283).

Now, we’ll comment on the two null hypotheses:

𝐻0: 𝛽̂1 = 1.5 To test this hypothesis at a significance level of 0.05, we need to check if the hypothesized value (1.5) is within the 95% confidence interval for 𝛽̂1. The value 1.5 is not within the interval (1.853359, 2.146641), we reject the null hypothesis.
𝐻0: 𝛽̂1 - 𝛽̂2 = 0 To test this hypothesis at a significance level of 0.05, we need to check if the hypothesized value (0) is within the 95% confidence interval for 𝛽̂1 - 𝛽̂2. The value 0 is not within the interval (0.483717, 1.516283), we reject the null hypothesis.

Question 2.3

1)

We are given that 𝒫(𝑦; 𝑝) = exp(𝑦 ∗ 𝑓1(𝑝) - 𝑓2(𝑝)). Let’s rewrite the geometric distribution’s probability mass function in terms of exponential functions:

𝒫(𝑦; 𝑝) = 𝑝(1 − 𝑝)^(𝑦-1)

Taking the natural logarithm of 𝑝 and (1 - 𝑝)^(𝑦-1):

log(𝑝) = log(exp(log(𝑝)))

(1 - 𝑝)^(𝑦-1) = exp((𝑦-1) * log(1-𝑝))

Now, let’s rewrite the geometric distribution in terms of exponentials:

𝒫(𝑦; 𝑝) = exp(log(𝑝) + (𝑦-1) * log(1-𝑝))

Now, we can see that 𝑓1(𝑝) = log(1-𝑝) and 𝑓2(𝑝) = -log(𝑝). So we have:

𝜃 = 𝑓1(𝑝) = log(1-𝑝) 𝜅(𝜃) = 𝑓2(𝑝) = -log(𝑝)

To rewrite 𝜅(𝜃) as a function of 𝜃, we need to express 𝑝 in terms of 𝜃:

𝜃 = log(1-𝑝) ⟹ exp(𝜃) = 1-𝑝 ⟹ 𝑝 = 1 - exp(𝜃)

Now, substitute 𝑝 in 𝜅(𝜃):

𝜅(𝜃) = -log(1 - exp(𝜃))

Since 𝜙 = 1, we can find 𝑎(𝑦,𝜙):

𝑎(𝑦,𝜙) = 𝑦 - 1

So, we have:

𝜃 = log(1-𝑝) 𝜙 = 1 𝜅(𝜃) = -log(1 - exp(𝜃)) 𝑎(𝑦,𝜙) = 𝑦 - 1

2)

We know that for the geometric distribution, 𝜇 = 𝐸[𝑦] = 1/𝑝. We also know that 𝜃 = log(1-𝑝). Our goal is to find the canonical link function, 𝑔(𝜇), such that 𝜃 = 𝑔(𝜇).

First, let’s express 𝑝 in terms of 𝜇:

1/𝑝 = 𝜇 ⟹ 𝑝 = 1/𝜇

Now, substitute this expression for 𝑝 in the equation for 𝜃:

𝜃 = log(1 - (1/𝜇))

Thus, the canonical link function for the geometric distribution is:

𝑔(𝜇) = log(1 - (1/𝜇))

3)

We know that for the geometric distribution, 𝑣𝑎𝑟[𝑦] = (1−𝑝)/𝑝^2. We also know that 𝜇 = 1/𝑝 and 𝜙 = 1. Our goal is to find the variance function, 𝑉(𝜇), such that 𝑣𝑎𝑟[𝑦] = 𝜙 ∗ 𝑉(𝜇).

First, let’s rewrite 𝑣𝑎𝑟[𝑦] in terms of 𝜇:

𝑣𝑎𝑟[𝑦] = (1−(1/𝜇))/(1/𝜇)^2

Now, since 𝜙 = 1, the variance function 𝑉(𝜇) is equal to 𝑣𝑎𝑟[𝑦]:

𝑉(𝜇) = (1−(1/𝜇))/(1/𝜇)^2

Thus, the variance function for the geometric distribution is:

𝑉(𝜇) = (1−(1/𝜇))/(1/𝜇)^2

4)

𝜅(𝜃) = -log(1 - exp(𝜃)) 𝑑𝜅(𝜃)/𝑑𝜃 = exp(𝜃)/(1 - exp(𝜃))

exp(𝜃) = 1-𝑝

𝑑𝜅(𝜃)/𝑑𝜃 = (1-𝑝)/𝑝

𝜇 = (1-𝑝)/𝑝

𝑝 = 1/(1+𝜇)

𝜇 = 1/𝑝

5)

𝑑𝜅(𝜃)/𝑑𝜃 = (1-𝑝)/𝑝

𝑑²𝜅(𝜃)/𝑑𝜃² = 𝑑((1-(1-exp(𝜃)))/((1-exp(𝜃))))/𝑑𝜃 = 𝑑(exp(𝜃)/(1-exp(𝜃)))/𝑑𝜃

Let u = exp(𝜃) and v = (1 - exp(𝜃)):

𝑑²𝜅(𝜃)/𝑑𝜃² = (v(𝑑u/𝑑𝜃) - u(𝑑v/𝑑𝜃))/v²

Since 𝑑u/𝑑𝜃 = exp(𝜃) and 𝑑v/𝑑𝜃 = -exp(𝜃):

𝑑²𝜅(𝜃)/𝑑𝜃² = ((1-exp(𝜃))(exp(𝜃)) - exp(𝜃)(-exp(𝜃)))/(1-exp(𝜃))^2 = (exp(2𝜃))/(1-exp(𝜃))^2

Now, let’s find 𝑣𝑎𝑟[𝑦] = 𝜙((𝑑²𝜅(𝜃)/𝑑𝜃²)):

Since 𝜙 = 1:

𝑣𝑎𝑟[𝑦] = (exp(2𝜃))/(1-exp(𝜃))^2

We know that 𝜃 = log(1-𝑝), so exp(𝜃) = 1-𝑝:

𝑣𝑎𝑟[𝑦] = (1-𝑝)^2/(𝑝2) = (1-𝑝)/𝑝^2

6)

To find the MLE of 𝑝 for a single observation 𝑦1, we need to maximize the likelihood function.

L(𝑝; 𝑦1) = 𝑝(1-𝑝)^(𝑦1-1)

ℓ(𝑝) = log(𝑝) + (𝑦1 - 1)log(1-𝑝)

Differentiating ℓ(𝑝) with respect to 𝑝:

dℓ(𝑝)/d𝑝 = 1/𝑝 - (𝑦1 - 1)/(1-𝑝)

Setting the derivative equal to zero:

1/𝑝 - (𝑦1 - 1)/(1-𝑝) = 0

Solving for 𝑝:

𝑝 = 1/𝑦1

The MLE of 𝑝 is 1/𝑦1.

Question 2.4

1)

# Load the data 
data(nambeware)
nambeware_data <- nambeware

# Create a scatter plot
plot(nambeware_data$Diam, nambeware_data$Price, xlab="Diam", ylab="Price", main="Price vs. Diam")

# Add a regression line
model <- lm(Price ~ Diam, data=nambeware_data)
abline(model, col="red")

Trend: The price tends to increase as the diameter increases, there’s a positive correlation between the two variables.
Dispersion: With lower diameters the price appears to be closely correlated with diameter, but as diameter increases we see an increase in the variance of dispersion of the points along our line.

2)

# Divide the data into age groups
groups <- cut(nambeware_data$Diam, breaks=c(-Inf, 8.25, 10.25, 12.25, 14.25, Inf), right = FALSE)

# Calculate group means, variances, and sample sizes
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

group_summary <- nambeware_data %>% 
  mutate(group = groups) %>% 
  group_by(group) %>% 
  summarise(
    mean_blocks = mean(Diam),
    var_blocks = var(Diam),
    sample_size = n()
  )

# Log-transform the group means and variances
group_summary$log_mean <- log(group_summary$mean_blocks)
group_summary$log_var <- log(group_summary$var_blocks)

# Plot log(group_variance) vs log(group_mean)
plot(group_summary$log_mean, group_summary$log_var, 
     xlab = "Log(Group Mean)", ylab = "Log(Group Variance)")

# Fit a linear model
lm_model <- lm(group_summary$log_var ~ group_summary$log_mean)

# Add the regression line to the plot
abline(lm_model, col="red")

# Convert the group_summary data.frame to a table
group_summary_table <- group_summary %>% 
  select(-log_mean, -log_var) %>% 
  rename(
    `Age Group` = group,
    `Sample Mean` = mean_blocks,
    `Sample Variance` = var_blocks,
    `Sample Size` = sample_size
  )

# Transpose the group_summary_table
group_summary_table_transposed <- as.data.frame(t(group_summary_table[,-1]))

# Set the column names as the Age Group values
colnames(group_summary_table_transposed) <- group_summary_table$`Age Group`

# Add row names for Sample Mean, Sample Variance, and Sample Size
rownames(group_summary_table_transposed) <- c("Sample Mean", "Sample Variance", "Sample Size")

# Print the transposed table
print(group_summary_table_transposed)

##                 [-Inf,8.25) [8.25,10.2) [10.2,12.2) [12.2,14.2) [14.2, Inf)
## Sample Mean        6.420000   9.2818182  11.1666667  13.1583333    17.15556
## Sample Variance    1.137429   0.3056364   0.2842424   0.5499242    11.09028
## Sample Size       15.000000  11.0000000  12.0000000  12.0000000     9.00000

3)

The table and plot of log(Group Variance) vs log(Group Mean) show a positive relationship between group mean and variance. A suitable Exponential Dispersion Model (EDM) for this is the Gamma distribution, with a variance function:

V(μ) = μ^2

This suggests that the variance increases quadratically with the mean, fitting the observed relationship in the data.

Week 2 Homework

Thomas DeWaters

2023-03-16

Question 2.1

1)

2)

3)

4)

5)

6)

7)

8)

9)

10)

Question 2.2

1)

2)

3)

4)

5)

6)

Question 2.3

1)

2)

3)

4)

5)

6)

Question 2.4

1)

2)

3)