Question 2
#This function builds the log likelihood function for a poisson, which is found by multiplying independent poissons together, taking the log, and then taking the derivative
loglike <- function(lambda, sumx = 100, n = 15) + {sumx*log(lambda) + -n*lambda}
#This will test different values of lamba into the function to plot. I went up to 30 to make it a prettier plot
lambdavals <- seq(.1, 30, by = .1)
#This applies the loglikelihood function to these different values of lambda to find the corresponding loglikelihood
llvals <- loglike(lambdavals)
#This plots the different lambda values with their corresponding loglikelihood values and adds nice labels
plot(lambdavals, llvals, type = "l", xlab = expression(lambda), ylab = "log likelihood")
#This puts a point at the value of lambda which maximizes loglikelihood, which is 6.66666...etc. I found this through setting the loglikelihood function from step one equal to zero.
points(x = 20/3, y = 89.71199, col = "red")
Question 3
data(swiss)
#This is to find the names of each of the variables
summary(swiss)
## Fertility Agriculture Examination Education
## Min. :35.00 Min. : 1.20 Min. : 3.00 Min. : 1.00
## 1st Qu.:64.70 1st Qu.:35.90 1st Qu.:12.00 1st Qu.: 6.00
## Median :70.40 Median :54.10 Median :16.00 Median : 8.00
## Mean :70.14 Mean :50.66 Mean :16.49 Mean :10.98
## 3rd Qu.:78.45 3rd Qu.:67.65 3rd Qu.:22.00 3rd Qu.:12.00
## Max. :92.50 Max. :89.70 Max. :37.00 Max. :53.00
## Catholic Infant.Mortality
## Min. : 2.150 Min. :10.80
## 1st Qu.: 5.195 1st Qu.:18.15
## Median : 15.140 Median :20.00
## Mean : 41.144 Mean :19.94
## 3rd Qu.: 93.125 3rd Qu.:21.70
## Max. :100.000 Max. :26.60
#The model with all of the variables
mod0 <- lm(Infant.Mortality~Fertility + Examination + Education + Catholic + Agriculture, data = swiss)
#The model with all of the variables except agriculture
mod1 <- lm(Infant.Mortality~Fertility + Examination + Education + Catholic, data = swiss)
Question 3 part a
#This likelihood ratio test finds that the p-value is .655, which means that the more complex model is not a significant improvement over the simpler model. This points to dropping the agriculture variable.
teststat <- 2*(logLik(mod0) - logLik(mod1))
pchisq(teststat[1], lower.tail = FALSE, df = 1)
## [1] 0.6549756
Question 3 part b
#By analyzing these summaries, we can find that the p-value of Agriculture from the more complex model is .67827, so it is not adding much to the model. Additionally, the p-value for the more complex model is .03665 and is .01866 for the simpler model. This is an improvement, which points to dropping the agriculture variable.
summary(mod0)
##
## Call:
## lm(formula = Infant.Mortality ~ Fertility + Examination + Education +
## Catholic + Agriculture, data = swiss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2512 -1.2860 0.1821 1.6914 6.0937
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.667e+00 5.435e+00 1.595 0.11850
## Fertility 1.510e-01 5.351e-02 2.822 0.00734 **
## Examination 3.695e-02 9.607e-02 0.385 0.70250
## Education 6.099e-02 8.484e-02 0.719 0.47631
## Catholic 6.711e-05 1.454e-02 0.005 0.99634
## Agriculture -1.175e-02 2.812e-02 -0.418 0.67827
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.683 on 41 degrees of freedom
## Multiple R-squared: 0.2439, Adjusted R-squared: 0.1517
## F-statistic: 2.645 on 5 and 41 DF, p-value: 0.03665
summary(mod1)
##
## Call:
## lm(formula = Infant.Mortality ~ Fertility + Examination + Education +
## Catholic, data = swiss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0910 -1.4349 0.0376 1.6891 6.4731
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.108483 3.915065 1.816 0.07656 .
## Fertility 0.160357 0.048082 3.335 0.00179 **
## Examination 0.048467 0.091119 0.532 0.59760
## Education 0.078839 0.072578 1.086 0.28356
## Catholic -0.001908 0.013611 -0.140 0.88920
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.656 on 42 degrees of freedom
## Multiple R-squared: 0.2407, Adjusted R-squared: 0.1684
## F-statistic: 3.328 on 4 and 42 DF, p-value: 0.01866
Question 3 part c
#This F-test finds teh p-value to be .6783, which means that the more complex model is NOT a significant improvement over the simpler model, which points to dropping the extra variable
anova(mod1, mod0)
## Analysis of Variance Table
##
## Model 1: Infant.Mortality ~ Fertility + Examination + Education + Catholic
## Model 2: Infant.Mortality ~ Fertility + Examination + Education + Catholic +
## Agriculture
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 42 296.32
## 2 41 295.07 1 1.2563 0.1746 0.6783
Question 3 part d
#This test compares the AIC of the two different models. The AIC of model 1, which drops Agriculture, is an improvement over the more complex model by about 2 points, which points to dropping this variable
AIC(mod0)
## [1] 233.7217
AIC(mod1)
## [1] 231.9214
Question 3 part e
#This test compares the BIC of the two different models. The BIC of model 1, which drops Agriculture, is an improvement over the more complex model (mod0) by about 3.5 points, which points to dropping this variable
BIC(mod0)
## [1] 246.6727
BIC(mod1)
## [1] 243.0222
Overall, we should DROP the agriculture vairable because all of our tests find that it doesnโt bring significant value to our model. It is better to use the simpler model.