CDA β Homework 3: due Tuesday February 19, 2019 @ 3.30pm
[All Students] p.84, exercise 3.2
For each decade, the average percentage of times the starting pitcher pitched a complete game decreases by .0662.
x=12
predict = 0.693 - 0.0662*x
The linear model gives us the prediction of -0.1014 for 2024. The logistic prediction of .034 is much more plausible (we should not have a negative prediction).
[All Students] Using R and/or SAS, p.86, exercise 3.5 skip part (a) and do part (b). Attach your R and/or SAS output. βFit the logistic regression modelβ means to clearly write down the predicted logistic regression model in terms of π(π = 1). Also, add the following part (c): showing all work/calculations clearly obtain and interpret π(π = 1) for satellites of weight π. ππππ kg.
crab = read.table("http://users.stat.ufl.edu/~aa/cat/data/Crabs.dat", header = T)
fit <- glm(y ~ weight, family = binomial(link = logit), data = crab)
summary(fit)
##
## Call:
## glm(formula = y ~ weight, family = binomial(link = logit), data = crab)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1108 -1.0749 0.5426 0.9122 1.6285
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.6947 0.8802 -4.198 2.70e-05 ***
## weight 1.8151 0.3767 4.819 1.45e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 225.76 on 172 degrees of freedom
## Residual deviance: 195.74 on 171 degrees of freedom
## AIC: 199.74
##
## Number of Fisher Scoring iterations: 4
p = predict.glm(fit, data.frame(weight=5.2))
p1 = exp(p)/(1+exp(p))
The probability of a crab having at least one satellite is 0.9968084 given a crab with a weight of 5.2kg.
Showing all work/calculations clearly obtain and interpret π(π = 1) for satellites of weight of
p = predict.glm(fit, data.frame(weight = 2.0356))
exp(p)/(1+exp(p))
## 1
## 0.5000455
Using our logistic model, we predict a 50% of having at least one satellite for a crab with weight 2.0356 kg.
[All Students] p.86, exercise 3.6. For parts (c) and (d), be sure to also report π»0 and π»π, and report your detailed and complete interpretations.
The odds a person is a democrat = 3.1870 - 0.5901*x (where x is the political idealogy scale). Since the slope is negative, the more conservative a person reports, the less likely that they will be a Democrat.
l = -0.5901 - 1.96*0.1564
u = -0.5901 + 1.96*0.1564
L=exp(l)/(1 + exp(l))
U=exp(u)/(1 + exp(u))
c(L,U)
## [1] 0.2897406 0.4295822
L = exp(-0.91587)/(1+exp(-0.91587))
U = exp(-0.29832)/(1+exp(-0.29832))
c(L,U)
## [1] 0.2858002 0.4259682
We are 95% confident that the true effect of political ideology (slope) is between 0.2858002 and 0.4259682.
\[H_0: \beta_1 = 0\] \[H_0: \beta_1 \neq 0\]
With a z of -3.772 and a p-value of 0.000162 < \(\alpha = 05\), we reject the null hypothesis and conclude that the effect of political ideology does not equal 0.
y = c(5, 18, 19,25,7,7,2); n = c(6,21,20,36,17,18,3)
x = c(1,2,3,4,5,6,7)
fit = glm(y/n ~ x , family = binomial (link=logit), weights = n)
library(car)
## Loading required package: carData
anov = Anova(fit); anov
## Analysis of Deviance Table (Type II tests)
##
## Response: y/n
## LR Chisq Df Pr(>Chisq)
## x 17.009 1 3.72e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a \(\chi^2\) of 17.0089141 and p-value of 3.720473810^{-5} < \(\alpha = 05\), we reject the null hypothesis and conclude that the effect of political ideology does not equal 0.
There are 4 rounds of the Fisher scoring iterations. This means that the Fisher Scoring algorithm started with an initial guess for the parameter \(\beta_1\), and then revises the weights for the least squares regression by minimizing the variance, while also Maximizing \(\beta_1\).
[All Students] Using R and/or SAS, p.88, exercise 3.11. Attach your R and/or SAS output. Be ΜΜ thorough and detailed in your interpretation of π½ and ππ½.
df = data.frame(imperfections = c(8,7,6,6,3,4,7,2,3,4,9,9,8,14,8,13,11,5,7,6), treatments = c(rep(0,10),rep(1,10)))
fit4 = glm(imperfections ~ treatments, family = poisson(link = log), data = df)
sum = summary(fit4)
sl = fit4$coefficients[2]
int = fit4$coefficients[1]
\(\widehat{Defects}\) = exp(1.6094379 + 0.5877867 * Treatment) or more simply:
\(\hat{y} =\) = exp(1.6094379 + 0.5877867 * x)
[All Students] Using R and/or SAS, p.88, exercise 3.12. Attach your R and/or SAS output.
For the testing in part (a), perform both the Wald and likelihood-ratio tests, remembering to give both test statistics and p-value as well as your clear and detailed conclusion.
From the Wald test we get a z of 3.332 and a p-value of 0.000861 < \(\alpha = 0.01\), so we reject the null and conclude \(\beta \neq 0\).
an = Anova(fit4)
With a \(\chi^2\) of 11.5893734 and p-value of 6.632975710^{-4} < \(\alpha = 05\), so we reject the null and conclude \(\beta \neq 0\) for the Likelihood ratio test.
For part (b), obtain the 95% confidence intervals for ππ΅βππ΄ using both the Wald and (profile) likelihood methods, showing all work (or R code).
C = .5878 + c(-1,1)* 1.96* .1764
CI = exp(C)
low = CI[1]
up = CI[2]
We are 95% confident that the interval 1.2738655 to 2.5435074 captures the true change in defects per treatment.
Here is the profile likelihood method.
exp(confint(fit4))[2,]
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## 1.280063 2.560228
Also, add part (c): using the profile likelihood CI from part (b), redo the likelihood- ratio test from part (a) by observing/highlighting whether or not a key value is in the profile likelihood CI. What is that key value?
Since 1 is not in the confidence interval we would reject the null hypothesis in the same fashion that we did in part (a).
[STAT-410 students only] For positive x and parameters πΌ, π½ > 0, the probability density function (pdf) for the gamma distribution can be written π(π) = π ππΆβππβπ/π·. Note that in this πͺ(πΆ)π·πΆexpression, Ξ(πΌ) is the gamma function, it is not a parameter; the parameters are πΌ and π½.
Taking πΌ as known/fixed and π½ as unknown here, decide whether this version of the gamma distribution is a member of the exponential family and justify your claims - i.e., by identifying each of the respective pieces on slide 5 of the posted Chapter 3 Course Notes: π(π), π(π¦), π(π¦), π(π). If it is a member of the exponential family, clearly identify the natural parameter.
\(a(\beta) = {1 \over {\beta^\alpha}}\) \(b(y) = {y^{\alpha - 1}\over{\Gamma(\alpha)}}\) \(g(y) = -y\) and \(Q(\beta) = {1 \over\beta}\)
\(\psi = {1 \over\beta}\)
Taking π½ as known/fixed and πΌ as unknown here, decide whether this version of the gamma distribution is a member of the exponential family and justify your claims - i.e.Β by identifying each of the respective pieces on slide 5 of the posted Chapter 3 Course Notes: π(π), π(π¦), π(π¦), π(π). If it is a member of the exponential family, clearly identify the natural parameter.
\(a(\alpha) = {1 \over {\Gamma(\alpha)}}\) \(b(y) = {1 \over {ye^{y/b}}}\) \(g(y) = log(y/\beta)\) and \(Q(\beta) = \alpha\)
\(\psi = \alpha\)