Problem #2
loglikelihood <- function(lam, total, n){
total * log(lam) + (-n * lam)
}
lamvals <- seq(.1,40, by = .1)
logvals <- loglikelihood(lamvals,100,15)
plot(lamvals,logvals,type = "l", xlab = expression(lambda), ylab = "log likelihood", main = "Log Likelihood graph for Problem #2")
Problem #3
swiss <- datasets::swiss
attach(swiss)
allmodel <- lm(Infant.Mortality ~ Fertility + Agriculture + Examination + Education + Catholic)
summary(allmodel)
##
## Call:
## lm(formula = Infant.Mortality ~ Fertility + Agriculture + Examination +
## Education + Catholic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2512 -1.2860 0.1821 1.6914 6.0937
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.667e+00 5.435e+00 1.595 0.11850
## Fertility 1.510e-01 5.351e-02 2.822 0.00734 **
## Agriculture -1.175e-02 2.812e-02 -0.418 0.67827
## Examination 3.695e-02 9.607e-02 0.385 0.70250
## Education 6.099e-02 8.484e-02 0.719 0.47631
## Catholic 6.711e-05 1.454e-02 0.005 0.99634
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.683 on 41 degrees of freedom
## Multiple R-squared: 0.2439, Adjusted R-squared: 0.1517
## F-statistic: 2.645 on 5 and 41 DF, p-value: 0.03665
no.agrmodel <- lm(Infant.Mortality ~ Fertility + Examination + Education + Catholic)
summary(no.agrmodel)
##
## Call:
## lm(formula = Infant.Mortality ~ Fertility + Examination + Education +
## Catholic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0910 -1.4349 0.0376 1.6891 6.4731
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.108483 3.915065 1.816 0.07656 .
## Fertility 0.160357 0.048082 3.335 0.00179 **
## Examination 0.048467 0.091119 0.532 0.59760
## Education 0.078839 0.072578 1.086 0.28356
## Catholic -0.001908 0.013611 -0.140 0.88920
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.656 on 42 degrees of freedom
## Multiple R-squared: 0.2407, Adjusted R-squared: 0.1684
## F-statistic: 3.328 on 4 and 42 DF, p-value: 0.01866
First a Likelihood Ratio Test
logall <- logLik(allmodel)
logagr <- logLik(no.agrmodel)
logall
## 'log Lik.' -109.8608 (df=7)
logagr
## 'log Lik.' -109.9607 (df=6)
LRT <- 2*((-109.8608) - (-109.9607))
pval <- pchisq(LRT, df = 1, lower.tail = FALSE)
pval
## [1] 0.6548823
This pvalue is pretty large which means that I should reject the null hypothesis and we should use the the nested model instead of the more complex one. Essentially the larger model adds no benefits when compared to the smaller model that doesn’t contain the variable of ‘Agriculture’.
Next, a Wald Test
library(lmtest)
## Warning: package 'lmtest' was built under R version 3.6.3
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.6.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
waldtest(no.agrmodel, allmodel)
## Wald test
##
## Model 1: Infant.Mortality ~ Fertility + Examination + Education + Catholic
## Model 2: Infant.Mortality ~ Fertility + Agriculture + Examination + Education +
## Catholic
## Res.Df Df F Pr(>F)
## 1 42
## 2 41 1 0.1746 0.6783
This large p value indicates there no reason to drop the Agriculture variable.
ANOVA/F-Test
p.anova <- c(anova(allmodel, no.agrmodel)[6])
p.anova
## $`Pr(>F)`
## [1] NA 0.6782676
This large p value tells me the removal of the agriculture variable does not make a better model.
AIC
AIC(allmodel)
## [1] 233.7217
AIC(no.agrmodel)
## [1] 231.9214
Typically, a difference of 10 between two AICs indicates a clear better model fit, but we do not have this here.
BIC
BIC(allmodel)
## [1] 246.6727
BIC(no.agrmodel)
## [1] 243.0222
Same 10 point difference applies here and again we see no major difference so, dropping agriculture is not justified.