12.16

a)

The rule for examining standard deviations in ANOVA states that if the largest standard deviation is less than twice the smallest standard deviation, we can use methods based on the assumption of equal standard deviations, and our results will still be approximately correct. In this case, the ratio of the largest to the smallest standard deviation is \(4.8/2.6 = 1.846 < 2\); therefore, the equal-standard-deviation assumption is reasonable and we can perform ANOVA.
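The rule of thumb can be checked directly in R. This is a minimal sketch using the three sample standard deviations from this exercise; the variable names are illustrative only.

```r
# Sample standard deviations for the three groups in this exercise
s <- c(2.7, 2.6, 4.8)

# Ratio of the largest to the smallest standard deviation
sd_ratio <- max(s) / min(s)
sd_ratio

# TRUE means pooling is reasonable under the rule of thumb
sd_ratio < 2
```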

b)

\(V_1 = S_1^2 = 2.7^2 = 7.29\)

\(V_2 = S_2^2 = 2.6^2 = 6.76\)

\(V_3 = S_3^2 = 4.8^2 = 23.04\)

c)

Pooled Variance

\(s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + (n_3-1)s_3^2}{(n_1-1)+ (n_2-1)+ (n_3-1)} = \frac{27\times7.29 + 32\times6.76 + 101\times23.04}{(27+ 32+ 101)} = \frac{2740.19}{160} = 17.1262\)

d)

Pooled Standard Deviation \(s_p = \sqrt{s_p^2} = \sqrt{17.1262} = 4.138\)

sqrt(17.1262)
## [1] 4.138381

e)

The pooled standard deviation is closest to the standard deviation of the third group, which has the largest sample size, because pooling weights each group's variance by its degrees of freedom (\(n_i - 1\)). Since the third group has by far the largest sample, it carries the most weight and pulls the pooled value strongly toward its standard deviation.
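The pooling in parts c) through e) can be sketched as a small helper. The function name `pooled_var` is hypothetical, and the sample sizes 28, 33, and 102 are inferred from the degrees of freedom 27, 32, and 101 used in the pooled-variance formula.

```r
# Pooled variance: group variances weighted by degrees of freedom (n_i - 1)
pooled_var <- function(s, n) sum((n - 1) * s^2) / sum(n - 1)

s <- c(2.7, 2.6, 4.8)   # group standard deviations
n <- c(28, 33, 102)     # sample sizes implied by df = 27, 32, 101

sp2 <- pooled_var(s, n)
sp  <- sqrt(sp2)
c(sp2 = sp2, sp = sp)

# Share of the total weight carried by each group's variance
w <- (n - 1) / sum(n - 1)
round(w, 3)
```

The weight vector makes the answer to part e) concrete: the group with the largest share dominates the pooled value.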

12.23

a)

placebo_xbar = 11.80 
lowA_xbar = 15.25
highA_xbar = 18.55
lowB_xbar = 16.15 
highB_xbar = 17.10

barplot(c(placebo_xbar, lowA_xbar, highA_xbar, lowB_xbar, highB_xbar), names.arg = c("placebo", "Low A", "High A", "Low B", "High B"), 
        col = c("#eb8060", "#b9e38d", "#b9e38d", "#a1e9f0", "#a1e9f0"))

There does appear to be a difference in activity between the groups. The placebo has the lowest activity level and High A has the highest. Low A and Low B show about the same activity level, whereas High A and High B both show an increase compared to their low doses.

b)

The rule for examining standard deviations in ANOVA states that if the largest standard deviation is less than twice the smallest standard deviation, we can use methods based on the assumption of equal standard deviations, and our results will still be approximately correct. In this case, the ratio of the largest to the smallest standard deviation is \(\frac{\sqrt{17.20}}{\sqrt{7.75}} = 1.490 < 2\); therefore, the equal-standard-deviation assumption is reasonable and we can perform ANOVA.

\(s_p^2 = \frac{4\times17.20 + 4\times13.10 + 4\times10.25 + 4\times7.75 + 4\times12.50}{4\times5} = \frac{243.2}{20} = 12.16\)

\(s_p = \sqrt{s_p^2} = \sqrt{12.16} = 3.487\)

c)

DF_numerator = I-1 = 5 - 1 = 4

DF_denominator = N-I = 25 - 5 = 20

d)

The P-value is between 0.05 and 0.10, so at \(\alpha = 0.05\) there is not significant evidence that the group means differ. We therefore fail to reject the null hypothesis that the means are all equal.
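Bracketing a P-value with an F table can be reproduced with `qf()` and `pf()`. The sketch below uses the degrees of freedom from part c); the F statistic is not restated in this write-up, so `f_stat` is a hypothetical value chosen only to illustrate the lookup.

```r
df_num <- 5 - 1    # I - 1
df_den <- 25 - 5   # N - I

# Critical values of F(4, 20) for upper-tail areas 0.10 and 0.05
f_crit <- qf(c(0.90, 0.95), df_num, df_den)

# Hypothetical F statistic (assumption, not taken from the exercise)
f_stat <- 2.5
p_value <- pf(f_stat, df_num, df_den, lower.tail = FALSE)

c(crit_0.10 = f_crit[1], crit_0.05 = f_crit[2], p_value = p_value)
```

An F statistic that falls between the two critical values has a P-value between 0.05 and 0.10.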

12.35

a)

\(\psi_1 = \mu_2 - \frac{1}{2}\mu_1 - \frac{1}{2}\mu_4\)

b)

\(\psi_2 = -\mu_3 + \frac{1}{3}\mu_1 + \frac{1}{3}\mu_2 + \frac{1}{3}\mu_4\)
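A quick check that a set of coefficients defines a valid contrast is that they sum to zero. The sketch below assumes the coefficient vectors used in the software section of 12.36, in the order \(\mu_1, \mu_2, \mu_3, \mu_4\); the names `a1` and `a2` are illustrative.

```r
# Coefficients in the order mu_1, mu_2, mu_3, mu_4
a1 <- c(-1/2, 1, 0, -1/2)   # mu_2 versus the average of mu_1 and mu_4
a2 <- c(1/3, 1/3, -1, 1/3)  # average of mu_1, mu_2, mu_4 versus mu_3

# A valid contrast has coefficients that sum to zero
# (all.equal guards against floating-point rounding)
isTRUE(all.equal(sum(a1), 0))
isTRUE(all.equal(sum(a2), 0))
```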

12.36

a)

b)

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# eyes = EX12_36EYES

eyes = read.csv("~/Library/Mobile Documents/com~apple~CloudDocs/STAT500/Homework/HW5/EX12-36EYES.csv")
  
blue = filter(eyes, Group == "Blue")
brown = filter(eyes, Group == "Brown")
gaze_down = filter(eyes, Group == "Down")
green = filter(eyes, Group == "Green")

dim(blue)
## [1] 67  3
n_blue = 67
n_brown = 37
n_gaze_down = 41
n_green = 77


var_blue = var(blue$Score)
var_brown = var(brown$Score)
var_gaze_down = var(gaze_down$Score)
var_green = var(green$Score)


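# Note: the usual pooled variance weights each group variance by (n_i - 1);
# the numerator here weights by n_i, which makes the result slightly larger.
Sp2_eyes = (n_blue*var_blue + n_brown*var_brown + n_gaze_down*var_gaze_down + n_green*var_green) /( (n_blue - 1)+(n_brown-1) + (n_gaze_down - 1)+(n_green - 1))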

Sp_eyes = sqrt(Sp2_eyes)
Sp_eyes
## [1] 1.692214
xbar_blue = mean(blue$Score)
xbar_brown = mean(brown$Score)
xbar_gaze_down = mean(gaze_down$Score)
xbar_green = mean(green$Score)

No Software

\(c_1 = \Sigma a_i \bar{x}_i = (1)(3.72) - (1/2)(3.19) - (1/2)(3.86) = 0.195\)

\(c_2 = \Sigma a_i \bar{x}_i = (-1)(3.11) + (1/3)(3.19) + (1/3)(3.72) + (1/3)(3.86) = 0.480\)

c1_eyes = 0.195
c2_eyes = 0.480

c)

\(SE_{c1} = S_p \sqrt{\Sigma \frac{a_i^2}{n_i}}= 1.69 \sqrt{(-0.5)^2/67 + 1^2/37 + (-0.5)^2/77} = 0.3116\)

\(SE_{c2} = S_p \sqrt{\Sigma \frac{a_i^2}{n_i}}= 1.69 \sqrt{(-1)^2/41 + (1/3)^2/67 + (1/3)^2/37 + (1/3)^2/77} = 0.295\)

SEc1_eyes = 0.3116
SEc2_eyes = 0.295

d)

t1_eyes = c1_eyes / SEc1_eyes
t1_eyes
## [1] 0.6258023
t2_eyes = c2_eyes / SEc2_eyes
t2_eyes
## [1] 1.627119
t_ratio_eyes = t1_eyes/t2_eyes

One-sided P-values from the t table:

P1 > 0.25

0.05 < P2 < 0.10

Neither contrast shows a significant difference between the groups, but contrast 2 comes close to significance.

e)

Using \(t^* \approx 1.971\) (95% confidence, 218 degrees of freedom):

CI_1 = \(c_1 \pm t^* SE_{c1} = 0.195 \pm 1.971 \times 0.3116 = (-0.419, 0.809)\)

CI_2 = \(c_2 \pm t^* SE_{c2} = 0.480 \pm 1.971 \times 0.295 = (-0.101, 1.061)\)

Both intervals contain 0.
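The hand calculation for a contrast (estimate, standard error, t statistic, and confidence interval) can be wrapped in one helper. This is a sketch under stated assumptions: `contrast_summary` is a hypothetical name, the group means are the rounded values used in the hand calculation, the pooled standard deviation is rounded to 1.69, and a 95% level is assumed.

```r
# Contrast estimate, SE, t statistic, and confidence interval
# from group means, sample sizes, and a pooled standard deviation
contrast_summary <- function(a, xbar, n, sp, df, level = 0.95) {
  est    <- sum(a * xbar)
  se     <- sp * sqrt(sum(a^2 / n))
  t_star <- qt(1 - (1 - level) / 2, df)   # critical value, not the t statistic
  c(estimate = est, se = se, t = est / se,
    lower = est - t_star * se, upper = est + t_star * se)
}

# Group order: Blue, Brown, Down, Green (rounded means; assumed assignment)
xbar <- c(3.19, 3.72, 3.11, 3.86)
n    <- c(67, 37, 41, 77)
sp   <- 1.69          # pooled standard deviation, rounded
df   <- sum(n) - 4    # N - I = 218

round(contrast_summary(c(-1/2, 1, 0, -1/2), xbar, n, sp, df), 3)
round(contrast_summary(c(1/3, 1/3, -1, 1/3), xbar, n, sp, df), 3)
```

The interval uses the critical value \(t^*\) from `qt()`. Substituting the observed t statistic for \(t^*\) would always give an interval with one endpoint at 0.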

Software

library(lsmeans)
## Loading required package: emmeans
## The 'lsmeans' package is now basically a front end for 'emmeans'.
## Users are encouraged to switch the rest of the way.
## See help('transition') for more information, including how to
## convert old 'lsmeans' objects and scripts to work with 'emmeans'.
model_eyes = lm(Score ~ Group,
           data = eyes)

leastsquare_eyes = lsmeans(model_eyes, "Group")

Contrasts_eyes = list(C1          = c(-.5,  1, 0, -.5),
                 C2          = c(1/3, 1/3,  -1, 1/3))

contrast(leastsquare_eyes, Contrasts_eyes, adjust="sidak")
##  contrast estimate    SE  df t.ratio p.value
##  C1          0.197 0.309 218   0.638  0.7733
##  C2          0.485 0.293 218   1.657  0.1880
## 
## P value adjustment: sidak method for 2 tests
# 95% critical value for 218 residual degrees of freedom
t_star = 1.971
c1_eyes_s_lo = 0.197 - t_star*0.309
c1_eyes_s_hi = 0.197 + t_star*0.309

CI_1 = (-0.412, 0.806)

Contains 0.

c2_eyes_s_lo = 0.485 - t_star*0.293
c2_eyes_s_hi = 0.485 + t_star*0.293

CI_2 = (-0.093, 1.063)

Contains 0.

With software, both confidence intervals contain 0, confirming that there is not enough evidence to confirm a true difference between the groups in either contrast.

12.72

dan = read.csv("EX12-72DANDRUFF.csv")
pyr1 = filter(dan, Treatment == "PyrI")
pyr2 = filter(dan, Treatment == "PyrII")
keto = filter(dan, Treatment == "Keto")
plac = filter(dan, Treatment == "Placebo")
xbar_pyr1 = mean(pyr1$Flaking)
xbar_pyr2 = mean(pyr2$Flaking)
xbar_keto = mean(keto$Flaking)
xbar_plac = mean(plac$Flaking)

print(c(xbar_pyr1, xbar_pyr2, xbar_keto, xbar_plac))
## [1] 17.39286 17.20183 16.02830 29.39286
sd_pyr1 = sd(pyr1$Flaking)
sd_pyr2 = sd(pyr2$Flaking)
sd_keto = sd(keto$Flaking)
sd_plac = sd(plac$Flaking)

print(c(sd_pyr1, sd_pyr2, sd_keto, sd_plac))
## [1] 1.1418110 1.3524999 0.9305149 1.5948827
std <- function(x) sd(x)/sqrt(length(x))

se_pyr1 = std(pyr1$Flaking)
se_pyr2 = std(pyr2$Flaking)
se_keto = std(keto$Flaking)
se_plac = std(plac$Flaking)

print(c(se_pyr1, se_pyr2, se_keto, se_plac))
## [1] 0.1078910 0.1295460 0.0903796 0.3014045
dim(plac)
## [1] 28  3
means_dan = c(xbar_pyr1, xbar_pyr2, xbar_keto, xbar_plac)
sd_dan = c(sd_pyr1, sd_pyr2, sd_keto, sd_plac)
se_dan = c(se_pyr1, se_pyr2, se_keto, se_plac)
n_dan = c(112, 109, 106, 28)

dan_table = data.frame(means_dan, sd_dan, se_dan, n_dan)

dan_table
##   means_dan    sd_dan    se_dan n_dan
## 1  17.39286 1.1418110 0.1078910   112
## 2  17.20183 1.3524999 0.1295460   109
## 3  16.02830 0.9305149 0.0903796   106
## 4  29.39286 1.5948827 0.3014045    28

graph of means

barplot(dan_table$means_dan, names.arg = c("PyrI", "PyrII", "Keto", "Placebo"), 
        col = rainbow(4))

b)

ANOVA will have 3 degrees of freedom in the numerator and 351 in the denominator, and the visualization of the data suggests that the treatment groups will almost certainly demonstrate a significant difference from the placebo.

\(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\)

\(H_a\): not all of the \(\mu_i\) are equal

head(dan)
##   OBS Treatment Flaking
## 1   1      PyrI      17
## 2   2      PyrI      16
## 3   3      PyrI      18
## 4   4      PyrI      17
## 5   5      PyrI      18
## 6   6      PyrI      16
res.aov_dan <- aov(Flaking ~ Treatment, data = dan)

summary(res.aov_dan)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## Treatment     3   4151  1383.8   967.8 <2e-16 ***
## Residuals   351    502     1.4                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

F = 967.82, DF numerator = 3, DF denominator = 351, p-value < 2e-16

We can reject the null hypothesis and conclude that the mean of at least one treatment group is different from the rest.
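As a cross-check, the F statistic can be rebuilt from the summary table alone (group means, standard deviations, and sample sizes). This is a minimal sketch; the variable names are illustrative, and small differences from the `aov()` output are expected because the inputs are rounded.

```r
# Summary statistics from the table above (PyrI, PyrII, Keto, Placebo)
xbar <- c(17.39286, 17.20183, 16.02830, 29.39286)
s    <- c(1.1418110, 1.3524999, 0.9305149, 1.5948827)
n    <- c(112, 109, 106, 28)

k <- length(n)   # number of groups
N <- sum(n)      # total sample size

grand_mean <- sum(n * xbar) / N
msg <- sum(n * (xbar - grand_mean)^2) / (k - 1)   # mean square between groups
mse <- sum((n - 1) * s^2) / (N - k)               # mean square error (pooled variance)

f_stat  <- msg / mse
p_value <- pf(f_stat, k - 1, N - k, lower.tail = FALSE)
c(df_num = k - 1, df_den = N - k, F = f_stat)
```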

12.75

a)

PyrI: μ1 PyrII: μ2 Keto: μ3 Placebo: μ4

\(\psi_1 = 1/3\mu_1 + 1/3\mu_2 + 1/3 \mu_3 - \mu_4\)

\(\psi_2 = 1/2\mu_1 + 1/2\mu_2 - \mu_3\)

\(\psi_3 = \mu_1 - \mu_2\)

b)

library(lsmeans)

# View(dan)


model_dan = lm(Flaking ~ Treatment,
           data = dan)

leastsquare_dan = lsmeans(model_dan, "Treatment")

Contrasts_dan = list(C1          = c(1/3, 1/3, 1/3, -1),
                 C2          = c(.5, .5, -1, 0), 
                 C3 = c(1, -1, 0, 0))

contrast(leastsquare_dan, Contrasts_dan, adjust="sidak")
##  contrast estimate    SE  df t.ratio p.value
##  C1           3.74 0.147 351  25.358  <.0001
##  C2           5.32 0.170 351  31.278  <.0001
##  C3         -13.36 0.254 351 -52.601  <.0001
## 
## P value adjustment: sidak method for 3 tests

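\(H_0: \psi_i = 0\)

\(H_a: \psi_i \neq 0\)

If done correctly, the test shows that all three contrasts are significantly different from 0. I am doubtful this test was performed correctly, as there does not appear to be a significant difference between the non-placebo groups, so I don’t believe contrast 3 is representative of what is observed. A likely cause is level ordering: lsmeans orders the treatment levels alphabetically (Keto, Placebo, PyrI, PyrII), while the coefficient vectors assume the order PyrI, PyrII, Keto, Placebo. Consistent with this, the C3 estimate of -13.36 equals the Keto mean minus the Placebo mean (16.03 - 29.39) rather than PyrI minus PyrII (17.39 - 17.20 = 0.19).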
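One way to guard against a mismatch between coefficient order and factor-level order is to key both the means and the coefficients by treatment name. The sketch below uses the group means from the 12.72 summary table; the object names `psi` and `lev` are hypothetical, and it computes only the contrast estimates, not their standard errors.

```r
# Group means from the 12.72 summary table, keyed by treatment name
xbar <- c(PyrI = 17.39286, PyrII = 17.20183, Keto = 16.02830, Placebo = 29.39286)

# Contrast coefficients keyed by the same names (psi_1, psi_2, psi_3)
psi <- list(
  C1 = c(PyrI = 1/3, PyrII = 1/3, Keto = 1/3, Placebo = -1),
  C2 = c(PyrI = 1/2, PyrII = 1/2, Keto = -1,  Placebo = 0),
  C3 = c(PyrI = 1,   PyrII = -1,  Keto = 0,   Placebo = 0)
)

# Match coefficients to means by name, not by position
estimates <- sapply(psi, function(a) sum(a * xbar[names(a)]))
round(estimates, 3)

# Reorder a coefficient vector to an alphabetical level order
lev <- sort(names(xbar))
psi$C3[lev]
```

Coefficient vectors reordered this way could then be passed to `contrast()` in place of the positional vectors, assuming the fitted model's levels are in alphabetical order.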

12.79

friends = read.csv("EX12-79FRIENDS.csv")

friends_fit = lm(Score ~ Friends,
           data = friends)

summary(friends_fit)
## 
## Call:
## lm(formula = Score ~ Friends, data = friends)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3602 -0.7756  0.2193  0.7988  2.6398 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.4320879  0.2060299  21.512   <2e-16 ***
## Friends     -0.0001023  0.0003694  -0.277    0.782    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.15 on 132 degrees of freedom
## Multiple R-squared:  0.000581,   Adjusted R-squared:  -0.00699 
## F-statistic: 0.07674 on 1 and 132 DF,  p-value: 0.7822
resid(friends_fit)
##           1           2           3           4           5           6 
## -0.62164958 -0.82164958 -1.22164958 -2.02164958  0.37835042 -1.42164958 
##           7           8           9          10          11          12 
## -0.22164958 -0.82164958 -1.22164958 -1.42164958  0.37835042 -1.02164958 
##          13          14          15          16          17          18 
##  0.37835042 -1.42164958  0.17835042 -1.62164958  1.57835042 -1.62164958 
##          19          20          21          22          23          24 
##  0.77835042 -1.22164958 -0.22164958 -2.22164958  0.57835042  0.37835042 
##          25          26          27          28          29          30 
##  0.59881777  0.79881777  1.19881777 -1.80118223 -0.60118223  0.39881777 
##          31          32          33          34          35          36 
##  1.19881777  0.39881777  1.99881777  0.39881777 -0.00118223  1.59881777 
##          37          38          39          40          41          42 
## -0.60118223  0.39881777  0.19881777  1.59881777  0.59881777 -1.40118223 
##          43          44          45          46          47          48 
## -0.00118223  0.99881777  0.99881777  0.19881777  1.19881777  1.39881777 
##          49          50          51          52          53          54 
## -0.20118223  0.39881777  0.59881777  0.79881777 -0.20118223  0.59881777 
##          55          56          57          58          59          60 
##  1.39881777  1.19881777 -0.60118223  0.21928512 -0.38071488  0.41928512 
##          61          62          63          64          65          66 
## -1.38071488 -2.38071488  1.41928512  1.21928512  0.01928512  0.01928512 
##          67          68          69          70          71          72 
##  1.21928512  0.21928512  1.21928512 -1.38071488  1.21928512 -0.78071488 
##          73          74          75          76          77          78 
##  2.41928512 -1.18071488  0.41928512  0.21928512  1.01928512  0.41928512 
##          79          80          81          82          83          84 
##  0.41928512  1.01928512 -0.78071488  0.41928512 -0.58071488 -1.16024753 
##          85          86          87          88          89          90 
## -0.76024753  1.43975247 -3.16024753 -0.56024753  1.03975247 -0.76024753 
##          91          92          93          94          95          96 
## -0.96024753  0.63975247  0.83975247 -0.76024753 -1.76024753  2.63975247 
##          97          98          99         100         101         102 
##  0.03975247  0.43975247  0.83975247  1.03975247 -0.76024753 -3.36024753 
##         103         104         105         106         107         108 
##  0.63975247  0.63975247  1.63975247 -0.16024753  1.43975247 -1.16024753 
##         109         110         111         112         113         114 
##  1.03975247  2.03975247  0.03975247 -1.36024753  1.63975247 -0.13978018 
##         115         116         117         118         119         120 
##  0.26021982 -1.33978018 -1.73978018  0.86021982  0.86021982 -2.73978018 
##         121         122         123         124         125         126 
##  0.66021982  0.06021982  0.66021982 -0.73978018 -0.13978018  0.66021982 
##         127         128         129         130         131         132 
## -0.93978018 -0.73978018  0.66021982 -1.13978018 -1.93978018  0.46021982 
##         133         134 
## -0.73978018 -0.13978018
plot(friends$Friends, resid(friends_fit)); abline(0, 0)

The p-value for the slope in this simple linear model is 0.782, which is not significant and suggests there is not enough evidence of a linear relationship between these features. Both R-squared values (multiple and adjusted) are extremely low as well.

This suggests that a linear model may not be the best way to explain the relationship between these discrete features; ANOVA may be better.
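The difference between the two approaches is whether the predictor enters as a number (one slope) or as a factor (one mean per level). The sketch below uses simulated data, not the exercise's file; the five predictor values and the effect size are hypothetical and only illustrate the mechanics.

```r
set.seed(1)

# Hypothetical data: five conditions of a discrete predictor, 20 scores each
sim <- data.frame(Friends = rep(c(100, 300, 500, 700, 900), each = 20))
# One condition is shifted up, so the pattern is not a straight line
sim$Score <- 4 + 0.8 * (sim$Friends == 300) + rnorm(nrow(sim))

# Linear trend: Friends as a numeric predictor (1 model df)
fit_lm <- lm(Score ~ Friends, data = sim)

# One-way ANOVA: Friends as a factor (one mean per condition, 4 model df)
fit_aov <- aov(Score ~ factor(Friends), data = sim)

summary(fit_aov)
```

ANOVA can detect differences among the group means that do not follow a straight line, which a single slope would miss.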

14.48

gpa = read.csv("EX14-048GPAHI.csv", stringsAsFactors = T)

# gpa_fit = glm(as.factor(GPA) ~ SATM + SATCR, data = gpa, family = binomial(link = "logit") )
gpa_fit = glm(GPA ~ SATM + SATCR, data = gpa)
summary(gpa_fit)
## 
## Call:
## glm(formula = GPA ~ SATM + SATCR, data = gpa)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4158  -0.5924   0.2029   0.5599   1.1655  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 0.4665253  0.5494504   0.849  0.39722   
## SATM        0.0030290  0.0010407   2.911  0.00417 **
## SATCR       0.0008482  0.0008888   0.954  0.34149   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.6005995)
## 
##     Null deviance: 99.675  on 149  degrees of freedom
## Residual deviance: 88.288  on 147  degrees of freedom
## AIC: 354.18
## 
## Number of Fisher Scoring iterations: 2

a)

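SATM shows a significant relationship to GPA (p = 0.004), but SATCR does not (p = 0.341). Note that glm() was called without a family argument, so this fit is an ordinary Gaussian linear model of GPA rather than a logistic regression.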

b)

# install.packages("broom")
broom::tidy(gpa_fit, conf.int = TRUE) %>% select(term, estimate, conf.low, conf.high)
## # A tibble: 3 × 4
##   term        estimate  conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>
## 1 (Intercept) 0.467    -0.610      1.54   
## 2 SATM        0.00303   0.000989   0.00507
## 3 SATCR       0.000848 -0.000894   0.00259

SATM estimate: 0.00303 with CI (0.000989, 0.00507), which does not contain 0. SATCR estimate: 0.000848 with CI (-0.000894, 0.00259), which contains 0.
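The reported intervals appear to agree with a normal-approximation (Wald) interval, estimate \(\pm z^* \times SE\). The sketch below recomputes them from the estimates and standard errors printed in the summary output, assuming a 95% level; small differences in the last digit come from rounding of the inputs.

```r
# Estimates and standard errors copied from the summary(gpa_fit) output
est <- c(SATM = 0.0030290, SATCR = 0.0008482)
se  <- c(SATM = 0.0010407, SATCR = 0.0008888)

z_star <- qnorm(0.975)   # 95% normal critical value

ci <- cbind(lower = est - z_star * se, upper = est + z_star * se)
signif(ci, 3)

# TRUE where the interval excludes 0
ci[, "lower"] > 0 | ci[, "upper"] < 0
```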

c)

SATM shows a significant linear relationship to GPA, but SATCR does not. The confidence interval for the SATM coefficient does not contain 0, which supports the significant relationship, while the interval for SATCR does contain 0.
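For comparison, a logistic regression needs a binary response and `family = binomial`. The sketch below uses simulated stand-in data whose column names mirror the `gpa` data frame (`HIGPA`, `SATM`, `SATCR`); the simulated values and coefficients are hypothetical and do not reproduce the exercise's results.

```r
set.seed(42)

# Simulated stand-in data (assumption: not the exercise's file)
n <- 200
sim <- data.frame(
  SATM  = round(rnorm(n, mean = 620, sd = 80)),
  SATCR = round(rnorm(n, mean = 580, sd = 90))
)
# Binary outcome whose log-odds depend on SATM only
sim$HIGPA <- rbinom(n, size = 1, prob = plogis(-6 + 0.01 * sim$SATM))

# Logistic regression: binary response with a logit link
fit_logit <- glm(HIGPA ~ SATM + SATCR, data = sim,
                 family = binomial(link = "logit"))

# Coefficients are on the log-odds scale; exponentiate for odds ratios
exp(coef(fit_logit))
```

With `family = binomial`, the summary reports z tests rather than t tests, and each coefficient describes the change in log-odds of a high GPA per one-point increase in the score.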

14.50

a)

gpa
##      GPA HSM HSS HSE SATM SATCR SATW sex HIGPA
## 1   3.84  10  10  10  630   570  590   2     1
## 2   3.97  10  10  10  750   700  630   1     1
## 3   3.49   8  10   9  570   510  490   2     1
## 4   1.95   6   4   8  640   600  610   1     0
## 5   2.59   8  10   9  510   490  490   2     0
## 6   3.00   7  10  10  660   680  630   1     1
## 7   1.78   9   9   9  630   490  510   1     0
## 8   2.41   6   6   7  670   620  610   1     0
## 9   2.83   9   8  10  550   570  540   2     0
## 10  0.60   7   7   9  640   720  630   1     0
## 11  3.98  10  10  10  630   560  540   2     1
## 12  1.52   8   8   7  650   550  460   1     0
## 13  2.82  10  10   9  610   540  490   2     0
## 14  3.09  10  10  10  670   330  540   1     1
## 15  3.79   9  10  10  550   580  560   2     1
## 16  3.20   9  10   9  580   490  420   2     1
## 17  3.62  10  10  10  720   620  570   1     1
## 18  1.74  10   9   6  780   540  620   1     0
## 19  2.02   7   5   6  660   520  600   1     0
## 20  2.29  10  10  10  690   750  680   1     0
## 21  3.13  10  10   9  660   620  590   2     1
## 22  2.96   8   9   6  660   670  750   1     0
## 23  4.00  10  10  10  640   620  620   2     1
## 24  2.89   8   7   8  620   610  570   1     0
## 25  3.95  10  10  10  770   780  760   1     1
## 26  1.71  10  10   8  800   750  650   1     0
## 27  3.23   8   8   8  650   590  480   1     1
## 28  2.07   8   8   8  570   560  560   1     0
## 29  1.29   2   6   8  480   490  650   2     0
## 30  4.00  10  10  10  630   630  620   2     1
## 31  2.94   9  10  10  490   510  490   1     0
## 32  3.89  10  10  10  680   730  740   1     1
## 33  3.34   7   7   6  700   420  460   1     1
## 34  3.52   9  10   8  740   740  620   1     1
## 35  3.75  10  10  10  650   690  670   1     1
## 36  3.55   6   7   5  630   420  400   1     1
## 37  3.46   8   9   9  590   600  560   2     1
## 38  2.02   7   9   8  690   690  640   1     0
## 39  1.56   8   5   6  650   600  510   1     0
## 40  3.97  10  10  10  730   660  660   1     1
## 41  1.36   8  10  10  590   560  510   1     0
## 42  4.00  10  10  10  630   580  490   1     1
## 43  3.31  10  10   9  600   570  510   1     1
## 44  2.28   8   9   8  590   400  480   1     0
## 45  3.65   8   9   9  680   650  650   1     1
## 46  2.34  10  10   9  660   530  610   1     0
## 47  2.06   8   9   9  650   550  570   2     0
## 48  3.25   9   7   8  640   540  540   1     1
## 49  3.45  10  10  10  600   570  530   1     1
## 50  2.31   9   7   7  580   530  510   1     0
## 51  4.00  10  10   9  630   570  630   1     1
## 52  2.50  10   9   7  620   550  430   1     0
## 53  3.08   8   8   8  620   560  490   2     1
## 54  3.38  10   9  10  670   690  550   1     1
## 55  2.69   8   6   6  550   510  460   1     0
## 56  3.64   9   9   9  600   590  590   2     1
## 57  3.26  10  10  10  510   450  470   2     1
## 58  1.49   8   9   9  500   540  530   2     0
## 59  2.93   8   4   4  550   480  470   1     0
## 60  2.92   7   9   8  740   600  640   1     0
## 61  3.99  10  10  10  750   610  640   1     1
## 62  3.27   8  10   9  480   450  560   2     1
## 63  3.05   9   9  10  700   560  550   1     1
## 64  3.36   8  10   9  520   490  470   2     1
## 65  0.03   5   7   8  460   450  500   1     0
## 66  2.57   7   9   8  520   550  570   2     0
## 67  3.33   7   8   7  610   450  480   2     1
## 68  3.06   8   9  10  630   560  650   2     1
## 69  2.39   6   8   9  620   530  480   1     0
## 70  2.21   8  10  10  500   510  590   2     0
## 71  2.99  10  10  10  580   480  530   2     0
## 72  4.00  10  10  10  760   650  630   1     1
## 73  1.20   8   7   7  520   480  560   1     0
## 74  3.28   9  10   9  610   540  460   1     1
## 75  3.87  10  10  10  690   580  570   1     1
## 76  2.52   8   8   8  510   480  460   2     0
## 77  3.32   9   9   9  580   490  480   1     1
## 78  1.02   9   9   7  560   560  560   2     0
## 79  2.91   6   9  10  580   700  640   2     0
## 80  2.14  10  10  10  700   650  640   1     0
## 81  2.50  10  10  10  520   480  440   1     0
## 82  3.36  10   9  10  640   580  630   1     1
## 83  3.51   7   9   8  650   640  640   1     1
## 84  2.36   6   5   8  540   520  520   2     0
## 85  1.87   6   7   8  700   580  560   1     0
## 86  3.45  10  10  10  770   760  730   1     1
## 87  2.96   8   7   9  500   540  610   2     0
## 88  3.24   6   7   8  660   640  610   2     1
## 89  3.32   9   9  10  730   640  670   1     1
## 90  3.71  10  10  10  710   760  660   2     1
## 91  3.18   9  10  10  620   620  550   2     1
## 92  3.59  10   9  10  690   580  560   1     1
## 93  2.93   8   9   9  490   530  550   2     0
## 94  3.93   9  10  10  690   740  670   2     1
## 95  1.41   8   8   9  690   410  460   1     0
## 96  1.90   6   7   7  540   720  650   1     0
## 97  3.45  10  10   9  640   670  600   2     1
## 98  3.06   9  10   9  590   450  460   1     1
## 99  1.85   8   8   8  570   520  520   1     0
## 100 3.13   9  10  10  550   530  520   1     1
## 101 1.81   7   7   7  550   510  490   1     0
## 102 2.38   9   6   8  640   580  640   2     0
## 103 2.45   9  10   9  720   670  700   1     0
## 104 3.19   6   7   8  540   510  490   2     1
## 105 2.23   7   7   8  690   620  570   1     0
## 106 1.83   5   7  10  560   550  560   2     0
## 107 3.38   7   8   9  630   640  530   2     1
## 108 3.43   9   8   9  670   600  590   1     1
## 109 2.74   9   7   7  780   610  680   1     0
## 110 4.00  10  10  10  710   600  630   2     1
## 111 2.93   8   8  10  610   480  440   1     0
## 112 1.68   7   8   8  650   530  450   1     0
## 113 3.71   9  10   9  620   500  520   2     1
## 114 1.72   7   8   9  530   610  540   2     0
## 115 1.63  10   9   9  540   600  560   2     0
## 116 0.85  10   9   9  560   520  470   1     0
## 117 2.94   8  10  10  630   700  580   1     0
## 118 3.37  10   9   9  560   460  480   1     1
## 119 3.15   8   8   7  690   670  670   1     1
## 120 2.96  10  10  10  550   560  490   2     0
## 121 3.48   9   9   9  710   660  610   2     1
## 122 2.05  10  10  10  670   550  620   2     0
## 123 1.66  10   9  10  580   480  470   2     0
## 124 3.12  10   9   9  530   480  480   2     1
## 125 2.78   9   9   8  520   490  520   2     0
## 126 3.33  10   9  10  650   580  480   2     1
## 127 2.57   6   7   7  560   550  540   1     0
## 128 3.10   8   9   9  710   600  730   1     1
## 129 2.30   8   9  10  630   540  500   1     0
## 130 2.74  10  10   8  570   450  590   1     0
## 131 2.19  10  10  10  700   530  560   1     0
## 132 3.36  10  10  10  690   580  570   2     1
## 133 3.03  10  10  10  630   540  580   2     1
## 134 3.49   8   9   8  600   660  620   2     1
## 135 3.88  10  10  10  740   640  690   1     1
## 136 2.71   9   8   9  510   430  500   2     0
## 137 2.55  10  10  10  720   800  670   1     0
## 138 2.82   8   8   9  610   550  590   2     0
## 139 3.65  10   9  10  620   530  520   1     1
## 140 3.75  10  10  10  770   730  770   1     1
## 141 2.59   6   8   9  590   650  630   1     0
## 142 1.99   9   9   9  540   500  480   2     0
## 143 2.57   6   7   6  580   520  560   1     0
## 144 3.99   9   9  10  680   650  590   1     1
## 145 2.31  10   9  10  590   660  480   1     0
## 146 2.75   9  10   9  580   590  520   1     0
## 147 1.72   8   7   7  520   400  430   2     0
## 148 3.73   9  10  10  630   630  620   2     1
## 149 3.62  10  10   8  640   560  540   1     1
## 150 3.23  10  10  10  640   510  350   1     1
gpa_fit2 = glm(GPA ~ sex, data = gpa)
summary(gpa_fit2)
## 
## Call:
## glm(formula = GPA ~ sex, data = gpa)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7535  -0.5010   0.1470   0.5717   1.2165  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.6345     0.2017  13.060   <2e-16 ***
## sex           0.1490     0.1366   1.091    0.277    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.6681081)
## 
##     Null deviance: 99.675  on 149  degrees of freedom
## Residual deviance: 98.880  on 148  degrees of freedom
## AIC: 369.17
## 
## Number of Fisher Scoring iterations: 2
broom::tidy(gpa_fit2, conf.int = TRUE) %>% select(term, estimate, conf.low, conf.high)
## # A tibble: 2 × 4
##   term        estimate conf.low conf.high
##   <chr>          <dbl>    <dbl>     <dbl>
## 1 (Intercept)    2.63     2.24      3.03 
## 2 sex            0.149   -0.119     0.417

The p-value (0.277 > 0.05) suggests that there is no significant linear relationship between sex and GPA, and the confidence interval for the estimate contains 0.

b)

gpa_fit3 = glm(GPA ~ sex + SATM + SATCR, data = gpa)
summary(gpa_fit3)
## 
## Call:
## glm(formula = GPA ~ sex + SATM + SATCR, data = gpa)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2196  -0.4862   0.1300   0.5944   1.3042  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.8293247  0.6723888  -1.233  0.21941    
## sex          0.4388964  0.1386813   3.165  0.00189 ** 
## SATM         0.0044183  0.0011014   4.011  9.6e-05 ***
## SATCR        0.0005309  0.0008686   0.611  0.54197    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.565892)
## 
##     Null deviance: 99.675  on 149  degrees of freedom
## Residual deviance: 82.620  on 146  degrees of freedom
## AIC: 346.22
## 
## Number of Fisher Scoring iterations: 2
broom::tidy(gpa_fit3, conf.int = TRUE) %>% select(term, estimate, conf.low, conf.high)
## # A tibble: 4 × 4
##   term         estimate conf.low conf.high
##   <chr>           <dbl>    <dbl>     <dbl>
## 1 (Intercept) -0.829    -2.15      0.489  
## 2 sex          0.439     0.167     0.711  
## 3 SATM         0.00442   0.00226   0.00658
## 4 SATCR        0.000531 -0.00117   0.00223

The p-values suggest that sex and SATM each have a significant linear relationship with GPA after adjusting for the other predictors, while SATCR does not.

c)

Gender seems to have no role in GPA unless SATM scores are also taken into consideration. When sex is the only variable in the model, we fail to reject the null hypothesis that its coefficient is 0. When SATM and SATCR scores are added, there is sufficient evidence to reject the null, which suggests that sex may play a role in predicting GPA once SAT scores are accounted for. The confidence intervals for the significant variables (sex and SATM) are the only ones that do not contain 0.