CDA – Homework 3: due Tuesday February 19, 2019 @ 3.30pm

1. 3.2

[All Students] p.84, exercise 3.2

a

For each decade, the average percentage of times the starting pitcher pitched a complete game decreases by .0662.

b

x=12
predict = 0.693 - 0.0662*x

The linear model gives us the prediction of -0.1014 for 2024. The logistic prediction of .034 is much more plausible (we should not have a negative prediction).

2. 3.5 part b.

[All Students] Using R and/or SAS, p.86, exercise 3.5 skip part (a) and do part (b). Attach your R and/or SAS output. “Fit the logistic regression model” means to clearly write down the predicted logistic regression model in terms of 𝑃(𝑌 = 1). Also, add the following part (c): showing all work/calculations clearly obtain and interpret 𝑃(𝑌 = 1) for satellites of weight 𝟐. 𝟎𝟑𝟓𝟔 kg.

crab = read.table("http://users.stat.ufl.edu/~aa/cat/data/Crabs.dat", header = T)
 fit <- glm(y ~ weight, family = binomial(link = logit), data = crab)
summary(fit)

## 
## Call:
## glm(formula = y ~ weight, family = binomial(link = logit), data = crab)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1108  -1.0749   0.5426   0.9122   1.6285  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.6947     0.8802  -4.198 2.70e-05 ***
## weight        1.8151     0.3767   4.819 1.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 195.74  on 171  degrees of freedom
## AIC: 199.74
## 
## Number of Fisher Scoring iterations: 4

p = predict.glm(fit, data.frame(weight=5.2))

p1 = exp(p)/(1+exp(p))

The probability of a crab having at least one satellite is 0.9968084 given a crab with a weight of 5.2kg.

(c):

Showing all work/calculations clearly obtain and interpret 𝑃(𝑌 = 1) for satellites of weight of

p = predict.glm(fit, data.frame(weight = 2.0356))

exp(p)/(1+exp(p))

##         1 
## 0.5000455

Using our logistic model, we predict a 50% of having at least one satellite for a crab with weight 2.0356 kg.

3. 3.6

[All Students] p.86, exercise 3.6. For parts (c) and (d), be sure to also report 𝐻0 and 𝐻𝑎, and report your detailed and complete interpretations.

a

The odds a person is a democrat = 3.1870 - 0.5901*x (where x is the political idealogy scale). Since the slope is negative, the more conservative a person reports, the less likely that they will be a Democrat.

b

l = -0.5901 - 1.96*0.1564
u = -0.5901 + 1.96*0.1564

L=exp(l)/(1 + exp(l))
U=exp(u)/(1 + exp(u))

c(L,U)

## [1] 0.2897406 0.4295822

L = exp(-0.91587)/(1+exp(-0.91587))
U = exp(-0.29832)/(1+exp(-0.29832))
c(L,U)

## [1] 0.2858002 0.4259682

We are 95% confident that the true effect of political ideology (slope) is between 0.2858002 and 0.4259682.

c

\[H_0: \beta_1 = 0\] \[H_0: \beta_1 \neq 0\]

With a z of -3.772 and a p-value of 0.000162 < \(\alpha = 05\), we reject the null hypothesis and conclude that the effect of political ideology does not equal 0.

d

y = c(5, 18, 19,25,7,7,2); n = c(6,21,20,36,17,18,3)
x = c(1,2,3,4,5,6,7)
fit = glm(y/n ~ x , family = binomial (link=logit), weights = n)

library(car)

## Loading required package: carData

anov = Anova(fit); anov

## Analysis of Deviance Table (Type II tests)
## 
## Response: y/n
##   LR Chisq Df Pr(>Chisq)    
## x   17.009  1   3.72e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With a \(\chi^2\) of 17.0089141 and p-value of 3.720473810^{-5} < \(\alpha = 05\), we reject the null hypothesis and conclude that the effect of political ideology does not equal 0.

e

There are 4 rounds of the Fisher scoring iterations. This means that the Fisher Scoring algorithm started with an initial guess for the parameter \(\beta_1\), and then revises the weights for the least squares regression by minimizing the variance, while also Maximizing \(\beta_1\).

4. 3.11

[All Students] Using R and/or SAS, p.88, exercise 3.11. Attach your R and/or SAS output. Be ̂̂ thorough and detailed in your interpretation of 𝛽 and 𝑒𝛽.

df = data.frame(imperfections = c(8,7,6,6,3,4,7,2,3,4,9,9,8,14,8,13,11,5,7,6), treatments = c(rep(0,10),rep(1,10)))

fit4 = glm(imperfections ~ treatments, family = poisson(link = log), data = df)
    sum = summary(fit4)
    
sl = fit4$coefficients[2] 
int = fit4$coefficients[1]

\(\widehat{Defects}\) = exp(1.6094379 + 0.5877867 * Treatment) or more simply:

\(\hat{y} =\) = exp(1.6094379 + 0.5877867 * x)

5. 3.12

[All Students] Using R and/or SAS, p.88, exercise 3.12. Attach your R and/or SAS output.

(a)

For the testing in part (a), perform both the Wald and likelihood-ratio tests, remembering to give both test statistics and p-value as well as your clear and detailed conclusion.

From the Wald test we get a z of 3.332 and a p-value of 0.000861 < \(\alpha = 0.01\), so we reject the null and conclude \(\beta \neq 0\).

an = Anova(fit4)

With a \(\chi^2\) of 11.5893734 and p-value of 6.632975710^{-4} < \(\alpha = 05\), so we reject the null and conclude \(\beta \neq 0\) for the Likelihood ratio test.

(b)

For part (b), obtain the 95% confidence intervals for 𝜇𝐵⁄𝜇𝐴 using both the Wald and (profile) likelihood methods, showing all work (or R code).

C = .5878 + c(-1,1)* 1.96* .1764 
CI = exp(C)
low = CI[1]
up = CI[2]

We are 95% confident that the interval 1.2738655 to 2.5435074 captures the true change in defects per treatment.

Here is the profile likelihood method.

exp(confint(fit4))[2,]

## Waiting for profiling to be done...

##    2.5 %   97.5 % 
## 1.280063 2.560228

(c)

Also, add part (c): using the profile likelihood CI from part (b), redo the likelihood- ratio test from part (a) by observing/highlighting whether or not a key value is in the profile likelihood CI. What is that key value?

Since 1 is not in the confidence interval we would reject the null hypothesis in the same fashion that we did in part (a).

6.

[STAT-410 students only] For positive x and parameters 𝛼, 𝛽 > 0, the probability density function (pdf) for the gamma distribution can be written 𝒇(𝒚) = 𝟏 𝒚𝜶−𝟏𝒆−𝒚/𝜷. Note that in this 𝚪(𝜶)𝜷𝜶expression, Γ(𝛼) is the gamma function, it is not a parameter; the parameters are 𝛼 and 𝛽.

(a)

Taking 𝛼 as known/fixed and 𝛽 as unknown here, decide whether this version of the gamma distribution is a member of the exponential family and justify your claims - i.e., by identifying each of the respective pieces on slide 5 of the posted Chapter 3 Course Notes: 𝑎(𝜃), 𝑏(𝑦), 𝑔(𝑦), 𝑄(𝜃). If it is a member of the exponential family, clearly identify the natural parameter.

\(a(\beta) = {1 \over {\beta^\alpha}}\) \(b(y) = {y^{\alpha - 1}\over{\Gamma(\alpha)}}\) \(g(y) = -y\) and \(Q(\beta) = {1 \over\beta}\)

\(\psi = {1 \over\beta}\)

(b)

Taking 𝛽 as known/fixed and 𝛼 as unknown here, decide whether this version of the gamma distribution is a member of the exponential family and justify your claims - i.e. by identifying each of the respective pieces on slide 5 of the posted Chapter 3 Course Notes: 𝑎(𝜃), 𝑏(𝑦), 𝑔(𝑦), 𝑄(𝜃). If it is a member of the exponential family, clearly identify the natural parameter.

\(a(\alpha) = {1 \over {\Gamma(\alpha)}}\) \(b(y) = {1 \over {ye^{y/b}}}\) \(g(y) = log(y/\beta)\) and \(Q(\beta) = \alpha\)

\(\psi = \alpha\)

410 HW3

Alex Blohm

2/12/2019