521 Assignment 2

Question 2

A)

## Question 2 
 library(faraway)
library(ResourceSelection)
# str(bliss)
# str(gala) 

model1 = glm(Species~Area + Elevation + Nearest + Scruz + Adjacent, family = "poisson",data = gala)
summary(model1)

## 
## Call:
## glm(formula = Species ~ Area + Elevation + Nearest + Scruz + 
##     Adjacent, family = "poisson", data = gala)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -8.2752  -4.4966  -0.9443   1.9168  10.1849  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.155e+00  5.175e-02  60.963  < 2e-16 ***
## Area        -5.799e-04  2.627e-05 -22.074  < 2e-16 ***
## Elevation    3.541e-03  8.741e-05  40.507  < 2e-16 ***
## Nearest      8.826e-03  1.821e-03   4.846 1.26e-06 ***
## Scruz       -5.709e-03  6.256e-04  -9.126  < 2e-16 ***
## Adjacent    -6.630e-04  2.933e-05 -22.608  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 3510.73  on 29  degrees of freedom
## Residual deviance:  716.85  on 24  degrees of freedom
## AIC: 889.68
## 
## Number of Fisher Scoring iterations: 5

B)

Adjusted dependent variable formula:

\(n=log(\mu)\)

\(n'=1/\mu\)

\(V(\mu)=\mu\)

\(w=\mu\)

\[ z_{o} = log(\mu) + (y-\mu)/\mu \]

C)

Area, elevation and adjacent are all extremely close with nearest and scruz still being very close. Only intercept appears to be of the mark a bit but still relatively close overall.

### b
##n = sum(gala$Species)
y = gala$Species
mu = y
eta = log(mu) 
z = eta + (y-mu)/(mu)
w = mu
lmod = lm(z~ Area + Elevation + Nearest + Scruz + Adjacent, weights = w,gala)
coef(lmod)

##   (Intercept)          Area     Elevation       Nearest         Scruz 
##  3.5191545412 -0.0005298484  0.0031643557  0.0025188990 -0.0037899780 
##      Adjacent 
## -0.0006623523

coef(model1)

##   (Intercept)          Area     Elevation       Nearest         Scruz 
##  3.1548078779 -0.0005799429  0.0035405940  0.0088255719 -0.0057094223 
##      Adjacent 
## -0.0006630311

# deviance(model1)
# deviance(lmod)

D)

This is getting a lot closer to the deviance of glm model at 828.0096 where the glm is at 716.8458. Still a great improvement from the null deviance of 3510.73.

eta = lmod$fitted.values
mu = exp(eta)
z = eta + (y-mu)/(mu)
w = mu
lmod = lm(z~ Area + Elevation + Nearest + Scruz + Adjacent, weights = w,gala)

d = 2*sum(y*log(y/mu)-(y-mu))
d

## [1] 828.0096

deviance(model1)

## [1] 716.8458

coef(lmod)

##   (Intercept)          Area     Elevation       Nearest         Scruz 
##  3.2102594447 -0.0005651969  0.0034606226  0.0077171134 -0.0052400871 
##      Adjacent 
## -0.0006604828

coef(model1)

##   (Intercept)          Area     Elevation       Nearest         Scruz 
##  3.1548078779 -0.0005799429  0.0035405940  0.0088255719 -0.0057094223 
##      Adjacent 
## -0.0006630311

E)

The coefficents and deviance are nearly almost exactly the same now with the deviance showing the greatest difference being 3 higher than the glm model.

eta = lmod$fitted.values
mu = exp(eta)
z = eta + (y-mu)/(mu)
w = mu
lmod = lm(z~ Area + Elevation + Nearest + Scruz + Adjacent, weights = w,gala)

d = 2*sum(y*log(y/mu)-(y-mu))
d

## [1] 719.4158

deviance(model1)

## [1] 716.8458

coef(lmod)

##   (Intercept)          Area     Elevation       Nearest         Scruz 
##  3.1562582546 -0.0005793855  0.0035379237  0.0087861184 -0.0056868875 
##      Adjacent 
## -0.0006630167

coef(model1)

##   (Intercept)          Area     Elevation       Nearest         Scruz 
##  3.1548078779 -0.0005799429  0.0035405940  0.0088255719 -0.0057094223 
##      Adjacent 
## -0.0006630311

F)

There is no difference after 5 iterations the deviance and coefficents are pretty much exactly the same.

for(i in 1:2){
eta = lmod$fitted.values
 mu = exp(eta)
 z = eta + (y-mu)/(mu)
 w = mu
 lmod = lm(z~ Area + Elevation + Nearest + Scruz + Adjacent, weights = w,gala)
 
 d = 2*sum(y*log(y/mu)-(y-mu))
 d
 cat(i,coef(lmod),d,"\n")
}

## 1 3.154809 -0.0005799422 0.003540591 0.008825509 -0.00570938 -0.0006630313 716.8488 
## 2 3.154808 -0.0005799429 0.003540594 0.008825572 -0.005709422 -0.0006630311 716.8458

deviance(model1)

## [1] 716.8458

coef(model1)

##   (Intercept)          Area     Elevation       Nearest         Scruz 
##  3.1548078779 -0.0005799429  0.0035405940  0.0088255719 -0.0057094223 
##      Adjacent 
## -0.0006630311

G)

Standard errors when computed are all smaller than the ones produced from the imod model but comparing to the glm() model it is the same.

summary(lmod)

## 
## Call:
## lm(formula = z ~ Area + Elevation + Nearest + Scruz + Adjacent, 
##     data = gala, weights = w)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.4483 -3.6198 -0.9279  2.0367 13.0984 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.1548079  0.2915894  10.819 1.03e-10 ***
## Area        -0.0005799  0.0001480  -3.918 0.000649 ***
## Elevation    0.0035406  0.0004925   7.189 1.98e-07 ***
## Nearest      0.0088256  0.0102621   0.860 0.398291    
## Scruz       -0.0057094  0.0035251  -1.620 0.118379    
## Adjacent    -0.0006630  0.0001652  -4.012 0.000511 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.635 on 24 degrees of freedom
## Multiple R-squared:  0.7611, Adjusted R-squared:  0.7114 
## F-statistic:  15.3 on 5 and 24 DF,  p-value: 8.607e-07

xm = model.matrix(lmod)
wm = diag(w)
sqrt(diag(solve(t(xm) %*% wm %*% xm)))

##  (Intercept)         Area    Elevation      Nearest        Scruz 
## 5.174951e-02 2.627298e-05 8.740703e-05 1.821260e-03 6.256200e-04 
##     Adjacent 
## 2.932754e-05

Question 3

A)

Shown in the contour plot we can see blotch appears to increase as site and variety increase as a general trend. This is shown in more detail in the individual line plots of blotch against variety and site respectively. Generally speaking we can see as variety and site increase the blotch increases as well and looks to be a constantly increasing relationship as it goes up with a few expceptions. Some of the lower numbered sites and varieties change is a lot less significant as the values increase.

Boxplots further show that generally the medians and variance increase as site and variety increase as well. Site 1,2,6 and variety 1,2,3,4,6 alll have very low medians and variance whereas site 7,8,9 and variety 7,8,9,10 all have greater medians and variance.

B)

Null deviance: 40.8029 on 89 degrees of freedom

Residual deviance: 6.1264 on 72 degrees of freedom

Appears to be a good fit to the model as we have reduced our deviance reports the goodness of fit of our model of generalized linear model. Our model has lost a great portion of it’s deviance for 17 degrees of freedom.

Using the Hosmer/Lemeshow goodness of fit test we can see out model fits well (p-value = 0.9996) showing no significant difference between the model and the observed data.

Also using the chisq() to check goodness of fit (compares size of residual deviance to its degrees of freedom), we can see a very high p-value (1) indicating no evidence of a lack of fit. This does not mean a better model doesn’t exist or that this is the best model.

model3 = glm(formula = blotch~site+variety, family = "binomial",data = leafblotch)

## Warning: non-integer #successes in a binomial glm!

summary(model3)

## 
## Call:
## glm(formula = blotch ~ site + variety, family = "binomial", data = leafblotch)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.64431  -0.13546  -0.02061   0.09628   0.81005  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -8.0546     4.7723  -1.688   0.0915 .
## site2         1.6391     4.8440   0.338   0.7351  
## site3         3.3265     4.5282   0.735   0.4626  
## site4         3.5822     4.5122   0.794   0.4273  
## site5         3.5838     4.5121   0.794   0.4270  
## site6         3.8932     4.4980   0.866   0.3867  
## site7         4.7299     4.4798   1.056   0.2910  
## site8         5.5226     4.4792   1.233   0.2176  
## site9         6.7945     4.4996   1.510   0.1310  
## variety2      0.1501     2.4288   0.062   0.9507  
## variety3      0.6895     2.2566   0.306   0.7600  
## variety4      1.0481     2.1796   0.481   0.6306  
## variety5      1.6147     2.0998   0.769   0.4419  
## variety6      2.3711     2.0440   1.160   0.2460  
## variety7      2.5712     2.0354   1.263   0.2065  
## variety8      3.3419     2.0189   1.655   0.0979 .
## variety9      3.4999     2.0182   1.734   0.0829 .
## variety10     4.2529     2.0279   2.097   0.0360 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 40.8029  on 89  degrees of freedom
## Residual deviance:  6.1264  on 72  degrees of freedom
## AIC: 70.44
## 
## Number of Fisher Scoring iterations: 8

hoslem.test(leafblotch$blotch, fitted(model3))

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  leafblotch$blotch, fitted(model3)
## X-squared = 0.66998, df = 8, p-value = 0.9996

1-pchisq(deviance(model3), df.residual(model3))

## [1] 1

C)

Dispersion parameter = 0.08878094

Dispersion parameter for quasibinomial family taken to be 0.08878094. This is dispersion parameter is derived from the fitted model by taking the sum of the squared Pearson residuals and dividing it by the residual degrees of freedom.

model4 = glm(formula = blotch~site+variety, family = "quasibinomial",data = leafblotch) 
# Manual vs Automatic
y = leafblotch$blotch
mu = fitted(model4)
muh = mu*(1-mu)

sum(((y - mu)^2)/muh) / (90-18)

## [1] 0.08878094

sum( residuals(model4, "pearson")^2 ) / model4$df.residual

## [1] 0.08878094

summary(model4)$dispersion

## [1] 0.08878094

D)

We see the variance of the residuals is increasing with error (lower values have lower deviance and higher values have higher deviance) indicating that our variance function is probably no reasonable.

plot(predict(model4),residuals(model4,type="deviance"),
xlab="Linear Predictor", ylab="Deviance Residuals")

##plot(residuals(model4)~log(fitted(model4)), ylab = "Deviance Residuals")

E)

Is an imrovement from the last plot as less of a pattern (and only more deviance still for much higher values) but there is still a funnel pattern in the residuals indicating our variance functions is still likely unreasonable.

\(n=log(\mu/1-\mu)\)

\(n'=1/\mu(1-\mu)\)

\(V(\mu)=\mu(1-\mu)\)

\(w=\mu(1-\mu)\)

y = leafblotch$blotch
mu = fitted(model4)
w = 1/(mu*(1-mu))

imod = glm(blotch~site+variety, family = quasi(link = logit,variance = "mu(1-mu)"),data = leafblotch, weights = w)

plot(predict(imod),residuals(imod,type="deviance"),
xlab="Linear Predictor", ylab="Deviance Residuals")

F)

Both equally important at predicting with very low p-values of 2.2e-16 , I wouldn’t drop either in this case.

drop1(model4,test="F")

## Single term deletions
## 
## Model:
## blotch ~ site + variety
##         Df Deviance F value    Pr(>F)    
## <none>       6.1264                      
## site     8  28.6200  33.044 < 2.2e-16 ***
## variety  9  22.2273  21.025 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

G)

Through the graphs in a) the combination of site and variety that show the greatest indication of an interation would be the higher values such as site 7,8,9 and variety 7,8,9,10.

An interaction is when the effect of one variable on an outcome depends on the state of a second variable. It can be seen in the line plots when one is high the other generally is higher aswell.In the case you have a high numbered site in combination with a high numbered variety there is a greater effect on blotch.

521 Assignment 2

Mitchell Rawlings

Question 2

A)

B)

C)

D)

E)

F)

G)

Question 3

A)

B)

C)

D)

E)

F)

G)