Likelihood Homework

Question 2

#This function builds the log likelihood function for a poisson, which is found by multiplying independent poissons together, taking the log, and then taking the derivative
loglike <- function(lambda, sumx = 100, n = 15) + {sumx*log(lambda) + -n*lambda}
#This will test different values of lamba into the function to plot. I went up to 30 to make it a prettier plot
lambdavals <- seq(.1, 30, by = .1)
#This applies the loglikelihood function to these different values of lambda to find the corresponding loglikelihood
llvals <- loglike(lambdavals)
#This plots the different lambda values with their corresponding loglikelihood values and adds nice labels
plot(lambdavals, llvals, type = "l", xlab = expression(lambda), ylab = "log likelihood")
#This puts a point at the value of lambda which maximizes loglikelihood, which is 6.66666...etc. I found this through setting the loglikelihood function from step one equal to zero. 
points(x = 20/3, y = 89.71199, col = "red")

Question 3

data(swiss)
#This is to find the names of each of the variables
summary(swiss)

##    Fertility      Agriculture     Examination      Education    
##  Min.   :35.00   Min.   : 1.20   Min.   : 3.00   Min.   : 1.00  
##  1st Qu.:64.70   1st Qu.:35.90   1st Qu.:12.00   1st Qu.: 6.00  
##  Median :70.40   Median :54.10   Median :16.00   Median : 8.00  
##  Mean   :70.14   Mean   :50.66   Mean   :16.49   Mean   :10.98  
##  3rd Qu.:78.45   3rd Qu.:67.65   3rd Qu.:22.00   3rd Qu.:12.00  
##  Max.   :92.50   Max.   :89.70   Max.   :37.00   Max.   :53.00  
##     Catholic       Infant.Mortality
##  Min.   :  2.150   Min.   :10.80   
##  1st Qu.:  5.195   1st Qu.:18.15   
##  Median : 15.140   Median :20.00   
##  Mean   : 41.144   Mean   :19.94   
##  3rd Qu.: 93.125   3rd Qu.:21.70   
##  Max.   :100.000   Max.   :26.60

#The model with all of the variables
mod0 <- lm(Infant.Mortality~Fertility + Examination + Education + Catholic + Agriculture, data = swiss)
#The model with all of the variables except agriculture
mod1 <- lm(Infant.Mortality~Fertility + Examination + Education + Catholic, data = swiss)

Question 3 part a

#This likelihood ratio test finds that the p-value is .655, which means that the more complex model is not a significant improvement over the simpler model. This points to dropping the agriculture variable.
teststat <- 2*(logLik(mod0) - logLik(mod1))
pchisq(teststat[1], lower.tail = FALSE, df = 1)

## [1] 0.6549756

Question 3 part b

#By analyzing these summaries, we can find that the p-value of Agriculture from the more complex model is .67827, so it is not adding much to the model. Additionally, the p-value for the more complex model is .03665 and is .01866 for the simpler model. This is an improvement, which points to dropping the agriculture variable.
summary(mod0)

## 
## Call:
## lm(formula = Infant.Mortality ~ Fertility + Examination + Education + 
##     Catholic + Agriculture, data = swiss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2512 -1.2860  0.1821  1.6914  6.0937 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  8.667e+00  5.435e+00   1.595  0.11850   
## Fertility    1.510e-01  5.351e-02   2.822  0.00734 **
## Examination  3.695e-02  9.607e-02   0.385  0.70250   
## Education    6.099e-02  8.484e-02   0.719  0.47631   
## Catholic     6.711e-05  1.454e-02   0.005  0.99634   
## Agriculture -1.175e-02  2.812e-02  -0.418  0.67827   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.683 on 41 degrees of freedom
## Multiple R-squared:  0.2439, Adjusted R-squared:  0.1517 
## F-statistic: 2.645 on 5 and 41 DF,  p-value: 0.03665

summary(mod1)

## 
## Call:
## lm(formula = Infant.Mortality ~ Fertility + Examination + Education + 
##     Catholic, data = swiss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.0910 -1.4349  0.0376  1.6891  6.4731 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  7.108483   3.915065   1.816  0.07656 . 
## Fertility    0.160357   0.048082   3.335  0.00179 **
## Examination  0.048467   0.091119   0.532  0.59760   
## Education    0.078839   0.072578   1.086  0.28356   
## Catholic    -0.001908   0.013611  -0.140  0.88920   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.656 on 42 degrees of freedom
## Multiple R-squared:  0.2407, Adjusted R-squared:  0.1684 
## F-statistic: 3.328 on 4 and 42 DF,  p-value: 0.01866

Question 3 part c

#This F-test finds teh p-value to be .6783, which means that the more complex model is NOT a significant improvement over the simpler model, which points to dropping the extra variable
anova(mod1, mod0)

## Analysis of Variance Table
## 
## Model 1: Infant.Mortality ~ Fertility + Examination + Education + Catholic
## Model 2: Infant.Mortality ~ Fertility + Examination + Education + Catholic + 
##     Agriculture
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     42 296.32                           
## 2     41 295.07  1    1.2563 0.1746 0.6783

Question 3 part d

#This test compares the AIC of the two different models. The AIC of model 1, which drops Agriculture, is an improvement over the more complex model by about 2 points, which points to dropping this variable
AIC(mod0)

## [1] 233.7217

AIC(mod1)

## [1] 231.9214

Question 3 part e

#This test compares the BIC of the two different models. The BIC of model 1, which drops Agriculture, is an improvement over the more complex model (mod0) by about 3.5 points, which points to dropping this variable
BIC(mod0)

## [1] 246.6727

BIC(mod1)

## [1] 243.0222

Overall, we should DROP the agriculture vairable because all of our tests find that it doesn’t bring significant value to our model. It is better to use the simpler model.

Likelihood Homework

StatsWithSarah

2/16/2021