The rule for examining standard deviations in ANOVA states that if the largest standard deviation is less than twice the smallest standard deviation, we can use methods based on the assumption of equal standard deviations, and our results will still be approximately correct. In this case, the ratio of the largest to smallest standard deviation is \(4.8/2.6 = 1.846 < 2\); therefore, the equal-standard-deviation condition is met and we can perform ANOVA.
\(V_1 = s_1^2 = 2.7^2 = 7.29\)
\(V_2 = s_2^2 = 2.6^2 = 6.76\)
\(V_3 = s_3^2 = 4.8^2 = 23.04\)
Pooled Variance
\(s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + (n_3-1)s_3^2}{(n_1-1)+ (n_2-1)+ (n_3-1)} = \frac{27\times7.29 + 32\times6.76 + 101\times23.04}{(27+ 32+ 101)} = \frac{2740.19}{160} = 17.1262\)
Pooled Standard Deviation \(s_p = \sqrt{s_p^2} = \sqrt{17.1262} =\)
sqrt(17.1262)
## [1] 4.138381
The pooled standard deviation is closest to the standard deviation of the third group because pooling weights each group's variance by its degrees of freedom, \(n_i - 1\). The third group has by far the largest sample size (\(n_3 = 102\)), so it carries the most weight and pulls the pooled value strongly toward its standard deviation of 4.8.
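The weighting can be checked directly from the summary statistics. The following is a minimal sketch, assuming group sizes of 28, 33 and 102 (implied by the degrees of freedom 27, 32 and 101 used above); `pooled_sd` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: pooled standard deviation from group SDs and group sizes.
# Each variance is weighted by its degrees of freedom, n_i - 1.
pooled_sd <- function(s, n) {
  sqrt(sum((n - 1) * s^2) / sum(n - 1))
}

s <- c(2.7, 2.6, 4.8)   # group standard deviations from the text
n <- c(28, 33, 102)     # group sizes implied by the degrees of freedom

sd_ratio <- max(s) / min(s)   # rule-of-thumb check, should be below 2
sp <- pooled_sd(s, n)

sd_ratio   # about 1.846
sp         # about 4.138
```

Because the third group contributes 101 of the 160 degrees of freedom, its variance dominates the weighted average.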
placebo_xbar = 11.80
lowA_xbar = 15.25
highA_xbar = 18.55
lowB_xbar = 16.15
highB_xbar = 17.10
barplot(c(placebo_xbar, lowA_xbar, highA_xbar, lowB_xbar, highB_xbar), names.arg = c("placebo", "Low A", "High A", "Low B", "High B"),
col = c("#eb8060", "#b9e38d", "#b9e38d", "#a1e9f0", "#a1e9f0"))
There does appear to be a difference in activity between the groups. The placebo has the lowest activity level and High A has the highest. Low A and Low B show about the same activity level, while High A and High B both show an increase compared to their low doses.
The rule for examining standard deviations in ANOVA states that if the largest standard deviation is less than twice the smallest standard deviation, we can use methods based on the assumption of equal standard deviations, and our results will still be approximately correct. In this case, the ratio of the largest to smallest standard deviation is \(\frac{\sqrt{17.20}}{\sqrt{7.75}} = 1.490 < 2\); therefore, the equal-standard-deviation condition is met and we can perform ANOVA.
\(s_p^2 = \frac{\sum_i (n_i-1)s_i^2}{\sum_i (n_i-1)} = \frac{4\times17.20 + 4\times13.10 + 4\times10.25 + 4\times7.75 + 4\times12.50}{4\times5} = \frac{243.2}{20} = 12.16\)
\(s_p = \sqrt{s_p^2} = \sqrt{12.16} = 3.487\)
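With equal group sizes, the degrees-of-freedom weights cancel and the pooled variance is just the average of the group variances. A short sketch of that check, assuming five observations per group (from N = 25 and I = 5); the variable names are illustrative.

```r
# Group variances read from the text; each group is assumed to have n = 5
v <- c(17.20, 13.10, 10.25, 7.75, 12.50)
n <- rep(5, 5)

sp2 <- sum((n - 1) * v) / sum(n - 1)   # df-weighted pooled variance
sp  <- sqrt(sp2)

sp2                               # 12.16
isTRUE(all.equal(sp2, mean(v)))   # equal n: same as the plain average
sp                                # about 3.487
```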
DF_numerator = I-1 = 5 - 1 = 4
DF_denominator = N-I = 25 - 5 = 20
The P-value is between 0.05 and 0.10. At \(\alpha = 0.05\) there is not sufficient evidence that the group means differ, so we fail to reject the null hypothesis that the means are all equal.
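Bracketing a P-value from a table amounts to comparing the observed F statistic with critical values of the F(4, 20) distribution. The sketch below computes those cutoffs with base R; `f_pvalue` is a hypothetical helper for obtaining an exact tail probability once an observed F is supplied (no observed value is assumed here).

```r
df_num <- 5 - 1     # I - 1
df_den <- 25 - 5    # N - I

# Upper-tail critical values of the F(4, 20) distribution
crit_10 <- qf(0.90, df_num, df_den)   # cutoff for P = 0.10
crit_05 <- qf(0.95, df_num, df_den)   # cutoff for P = 0.05
c(crit_10, crit_05)

# Hypothetical helper: exact upper-tail P-value for an observed F statistic
f_pvalue <- function(f) pf(f, df_num, df_den, lower.tail = FALSE)
```

An observed F between the two cutoffs corresponds to 0.05 < P < 0.10.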
Blue: μ1 Brown: μ2 Down: μ3 Green: μ4
\(\psi_1 = \mu_2 - \frac{1}{2}\mu_1 - \frac{1}{2}\mu_4\)
\(\psi_2 = -\mu_3 + \frac{1}{3}\mu_1 + \frac{1}{3}\mu_2 + \frac{1}{3}\mu_4\)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# eyes = EX12_36EYES
eyes = read.csv("~/Library/Mobile Documents/com~apple~CloudDocs/STAT500/Homework/HW5/EX12-36EYES.csv")
blue = filter(eyes, Group == "Blue")
brown = filter(eyes, Group == "Brown")
gaze_down = filter(eyes, Group == "Down")
green = filter(eyes, Group == "Green")
dim(blue)
## [1] 67 3
n_blue = 67
n_brown = 37
n_gaze_down = 41
n_green = 77
var_blue = var(blue$Score)
var_brown = var(brown$Score)
var_gaze_down = var(gaze_down$Score)
var_green = var(green$Score)
# Weight each group variance by its degrees of freedom (n_i - 1)
Sp2_eyes = ((n_blue - 1)*var_blue + (n_brown - 1)*var_brown + (n_gaze_down - 1)*var_gaze_down + (n_green - 1)*var_green) / ((n_blue - 1) + (n_brown - 1) + (n_gaze_down - 1) + (n_green - 1))
Sp_eyes = sqrt(Sp2_eyes)
Sp_eyes
## [1] 1.692214
xbar_blue = mean(blue$Score)
xbar_brown = mean(brown$Score)
xbar_gaze_down = mean(gaze_down$Score)
xbar_green = mean(green$Score)
\(c_1 = \Sigma a_i \bar{x}_i = (1)(3.72) - (1/2)(3.19) - (1/2)(3.86) = 0.195\)
\(c_2 = \Sigma a_i \bar{x}_i = (-1)(3.11) + (1/3)(3.19) + (1/3)(3.72) + (1/3)(3.86) = 0.480\)
c1_eyes = 0.195
c2_eyes = 0.480
\(SE_{c1} = s_p \sqrt{\Sigma \frac{a_i^2}{n_i}}= 1.69 \sqrt{\frac{(-0.5)^2}{67} + \frac{1^2}{37} + \frac{(-0.5)^2}{77}} = 0.3116\)
\(SE_{c2} = s_p \sqrt{\Sigma \frac{a_i^2}{n_i}}= 1.69 \sqrt{\frac{(-1)^2}{41} + \frac{(1/3)^2}{67} + \frac{(1/3)^2}{37} + \frac{(1/3)^2}{77}} = 0.295\)
SEc1_eyes = 0.3116
SEc2_eyes = 0.295
t1_eyes = c1_eyes / SEc1_eyes
t1_eyes
## [1] 0.6258023
t2_eyes = c2_eyes / SEc2_eyes
t2_eyes
## [1] 1.627119
t_ratio_eyes = t1_eyes/t2_eyes
P1 > 0.25
0.05 < P2 < 0.10
Neither contrast shows a significant difference between the groups at \(\alpha = 0.05\), although contrast 2 comes closer to significance.
For the confidence intervals, \(t^* = 1.971\) is the critical value for 95% confidence with 218 degrees of freedom (not the observed t statistic).
CI_1 = \(c_1 \pm t^* SE_{c1} = 0.195 \pm 1.971 \times 0.3116 = (-0.419, 0.809)\)
CI_2 = \(c_2 \pm t^* SE_{c2} = 0.480 \pm 1.971 \times 0.295 = (-0.101, 1.061)\)
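The hand calculation for a contrast can be wrapped in a small function so that the estimate, standard error, t statistic and interval all come from the same inputs. This is a sketch only: `contrast_summary` is a hypothetical helper, the means are the rounded values quoted above, \(s_p = 1.69\) is taken from the text, and a 95% level is assumed for the interval.

```r
# Hypothetical helper: estimate, standard error, t statistic and confidence
# interval for a contrast, computed from group summary statistics.
contrast_summary <- function(a, xbar, n, sp, df, level = 0.95) {
  stopifnot(abs(sum(a)) < 1e-8)            # contrast coefficients must sum to 0
  est    <- sum(a * xbar)
  se     <- sp * sqrt(sum(a^2 / n))
  t_star <- qt(1 - (1 - level) / 2, df)    # critical value, not the t statistic
  c(estimate = est, se = se, t = est / se,
    lower = est - t_star * se, upper = est + t_star * se)
}

# Groups in the order Blue, Brown, Down, Green (rounded values from the text)
xbar <- c(3.19, 3.72, 3.11, 3.86)
n    <- c(67, 37, 41, 77)
sp   <- 1.69
df   <- sum(n) - length(n)                 # 218

res1 <- contrast_summary(c(-1/2, 1, 0, -1/2), xbar, n, sp, df)
res2 <- contrast_summary(c(1/3, 1/3, -1, 1/3), xbar, n, sp, df)
round(rbind(C1 = res1, C2 = res2), 3)
```

Note that the interval uses the critical value from `qt()`; plugging the observed t ratio in as the multiplier would force one endpoint to sit at zero by construction.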
library(lsmeans)
## Loading required package: emmeans
## The 'lsmeans' package is now basically a front end for 'emmeans'.
## Users are encouraged to switch the rest of the way.
## See help('transition') for more information, including how to
## convert old 'lsmeans' objects and scripts to work with 'emmeans'.
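# Use bare column names with data = eyes so lsmeans can find "Group"
model_eyes = lm(Score ~ Group,
                data = eyes)
leastsquare_eyes = lsmeans(model_eyes, "Group")
# Coefficients follow the alphabetical factor levels: Blue, Brown, Down, Green
Contrasts_eyes = list(C1 = c(-.5, 1, 0, -.5),
                      C2 = c(1/3, 1/3, -1, 1/3))
contrast(leastsquare_eyes, Contrasts_eyes, adjust="sidak")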
## contrast estimate SE df t.ratio p.value
## C1 0.197 0.309 218 0.638 0.7733
## C2 0.485 0.293 218 1.657 0.1880
##
## P value adjustment: sidak method for 2 tests
t_star = qt(0.975, df = 218) # 95% critical value, about 1.971
c1_eyes_s_lo = 0.197 - t_star*0.309
c1_eyes_s_hi = 0.197 + t_star*0.309
CI_1 = (-0.41, 0.81)
Contains 0.
c2_eyes_s_lo = 0.485 - t_star*0.293
c2_eyes_s_hi = 0.485 + t_star*0.293
CI_2 = (-0.09, 1.06)
Contains 0.
With software, both 95% confidence intervals contain 0, confirming that there is not enough evidence of a true difference between the groups in either contrast.
dan = read.csv("EX12-72DANDRUFF.csv")
pyr1 = filter(dan, Treatment == "PyrI")
pyr2 = filter(dan, Treatment == "PyrII")
keto = filter(dan, Treatment == "Keto")
plac = filter(dan, Treatment == "Placebo")
xbar_pyr1 = mean(pyr1$Flaking)
xbar_pyr2 = mean(pyr2$Flaking)
xbar_keto = mean(keto$Flaking)
xbar_plac = mean(plac$Flaking)
print(c(xbar_pyr1, xbar_pyr2, xbar_keto, xbar_plac))
## [1] 17.39286 17.20183 16.02830 29.39286
sd_pyr1 = sd(pyr1$Flaking)
sd_pyr2 = sd(pyr2$Flaking)
sd_keto = sd(keto$Flaking)
sd_plac = sd(plac$Flaking)
print(c(sd_pyr1, sd_pyr2, sd_keto, sd_plac))
## [1] 1.1418110 1.3524999 0.9305149 1.5948827
std <- function(x) sd(x)/sqrt(length(x))
se_pyr1 = std(pyr1$Flaking)
se_pyr2 = std(pyr2$Flaking)
se_keto = std(keto$Flaking)
se_plac = std(plac$Flaking)
print(c(se_pyr1, se_pyr2, se_keto, se_plac))
## [1] 0.1078910 0.1295460 0.0903796 0.3014045
dim(plac)
## [1] 28 3
means_dan = c(xbar_pyr1, xbar_pyr2, xbar_keto, xbar_plac)
sd_dan = c(sd_pyr1, sd_pyr2, sd_keto, sd_plac)
se_dan = c(se_pyr1, se_pyr2, se_keto, se_plac)
n_dan = c(112, 109, 106, 28)
dan_table = data.frame(means_dan, sd_dan, se_dan, n_dan)
dan_table
## means_dan sd_dan se_dan n_dan
## 1 17.39286 1.1418110 0.1078910 112
## 2 17.20183 1.3524999 0.1295460 109
## 3 16.02830 0.9305149 0.0903796 106
## 4 29.39286 1.5948827 0.3014045 28
graph of means
barplot(dan_table$means_dan, names.arg = c("PyrI", "PyrII", "Keto", "Placebo"),
col = rainbow(4))
ANOVA will have 3 degrees of freedom in the numerator and 351 in the denominator, and the visualization of the data suggests that the treatment groups will almost certainly show a significant difference from the placebo.
\(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\)
\(H_a\): not all of the \(\mu_i\) are equal
head(dan)
## OBS Treatment Flaking
## 1 1 PyrI 17
## 2 2 PyrI 16
## 3 3 PyrI 18
## 4 4 PyrI 17
## 5 5 PyrI 18
## 6 6 PyrI 16
res.aov_dan <- aov(Flaking ~ Treatment, data = dan)
summary(res.aov_dan)
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment 3 4151 1383.8 967.8 <2e-16 ***
## Residuals 351 502 1.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
F = 967.82, DF numerator = 3, DF denominator = 351, p-value < 2e-16 (far below 0.01)
We can reject the null hypothesis and conclude that the mean of at least one treatment group is different from the rest.
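The ANOVA table can be cross-checked from the group means, standard deviations and sample sizes alone. The following is a sketch using the summary statistics printed earlier; variable names are illustrative, and small differences from the `aov` output are expected because the inputs are rounded.

```r
# Summary statistics reported above, in the order PyrI, PyrII, Keto, Placebo
xbar <- c(17.39286, 17.20183, 16.02830, 29.39286)
s    <- c(1.1418110, 1.3524999, 0.9305149, 1.5948827)
n    <- c(112, 109, 106, 28)

grand_mean <- sum(n * xbar) / sum(n)
ssg <- sum(n * (xbar - grand_mean)^2)   # between-group sum of squares
sse <- sum((n - 1) * s^2)               # within-group sum of squares
dfg <- length(n) - 1                    # 3
dfe <- sum(n) - length(n)               # 351

f_stat <- (ssg / dfg) / (sse / dfe)
p_val  <- pf(f_stat, dfg, dfe, lower.tail = FALSE)
c(SSG = ssg, SSE = sse, F = f_stat)
```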
PyrI: μ1 PyrII: μ2 Keto: μ3 Placebo: μ4
\(\psi_1 = 1/3\mu_1 + 1/3\mu_2 + 1/3 \mu_3 - \mu_4\)
\(\psi_2 = 1/2\mu_1 + 1/2\mu_2 - \mu_3\)
\(\psi_3 = \mu_1 - \mu_2\)
##b)
library(lsmeans)
# View(dan)
model_dan = lm(Flaking ~ Treatment,
data = dan)
leastsquare_dan = lsmeans(model_dan, "Treatment")
Contrasts_dan = list(C1 = c(1/3, 1/3, 1/3, -1),
C2 = c(.5, .5, -1, 0),
C3 = c(1, -1, 0, 0))
contrast(leastsquare_dan, Contrasts_dan, adjust="sidak")
## contrast estimate SE df t.ratio p.value
## C1 3.74 0.147 351 25.358 <.0001
## C2 5.32 0.170 351 31.278 <.0001
## C3 -13.36 0.254 351 -52.601 <.0001
##
## P value adjustment: sidak method for 3 tests
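\(H_0: \psi_i = 0\) for each contrast
\(H_a: \psi_i \neq 0\)
Taken at face value, the output says that all three contrasts are significantly different from 0. However, I doubt this test was set up correctly. The PyrI and PyrII means (17.39 and 17.20) are nearly identical, yet contrast 3 is estimated as -13.36. That value equals the Keto mean minus the Placebo mean (16.03 - 29.39), which suggests the coefficient vectors were matched to the factor levels in alphabetical order (Keto, Placebo, PyrI, PyrII) rather than the intended order (PyrI, PyrII, Keto, Placebo). If so, the estimates above do not correspond to \(\psi_1\), \(\psi_2\) and \(\psi_3\) as defined, and contrast 3 in particular is not representative of what is observed.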
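R orders the levels of a factor alphabetically by default (here Keto, Placebo, PyrI, PyrII), so contrast coefficients passed as unnamed vectors must follow that order. The sketch below evaluates the three intended contrasts from the summary statistics only, with the coefficients written in alphabetical level order. It is an illustration under stated assumptions: the residual mean square is taken as 502/351 from the rounded ANOVA table, and the object names are hypothetical.

```r
# Summary statistics reported above, ordered alphabetically by treatment,
# which is how R orders the levels of a factor by default
xbar <- c(Keto = 16.02830, Placebo = 29.39286, PyrI = 17.39286, PyrII = 17.20183)
n    <- c(Keto = 106, Placebo = 28, PyrI = 112, PyrII = 109)
mse  <- 502 / 351   # residual mean square from the ANOVA table (rounded sums)

# Contrast coefficients rewritten for the alphabetical level order
contrasts_dan <- list(
  C1 = c(Keto = 1/3, Placebo = -1, PyrI = 1/3, PyrII = 1/3),  # treatments vs placebo
  C2 = c(Keto = -1,  Placebo = 0,  PyrI = 1/2, PyrII = 1/2),  # Pyr average vs Keto
  C3 = c(Keto = 0,   Placebo = 0,  PyrI = 1,   PyrII = -1)    # PyrI vs PyrII
)

est <- sapply(contrasts_dan, function(a) sum(a * xbar))
se  <- sapply(contrasts_dan, function(a) sqrt(mse * sum(a^2 / n)))
round(cbind(estimate = est, se = se, t = est / se), 3)
```

With this ordering, the PyrI versus PyrII contrast is small (about 0.19), which matches what the table of means shows.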
friends = read.csv("EX12-79FRIENDS.csv")
friends_fit = lm(Score ~ Friends,
data = friends)
summary(friends_fit)
##
## Call:
## lm(formula = Score ~ Friends, data = friends)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3602 -0.7756 0.2193 0.7988 2.6398
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.4320879 0.2060299 21.512 <2e-16 ***
## Friends -0.0001023 0.0003694 -0.277 0.782
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.15 on 132 degrees of freedom
## Multiple R-squared: 0.000581, Adjusted R-squared: -0.00699
## F-statistic: 0.07674 on 1 and 132 DF, p-value: 0.7822
resid(friends_fit)
## 1 2 3 4 5 6
## -0.62164958 -0.82164958 -1.22164958 -2.02164958 0.37835042 -1.42164958
## 7 8 9 10 11 12
## -0.22164958 -0.82164958 -1.22164958 -1.42164958 0.37835042 -1.02164958
## 13 14 15 16 17 18
## 0.37835042 -1.42164958 0.17835042 -1.62164958 1.57835042 -1.62164958
## 19 20 21 22 23 24
## 0.77835042 -1.22164958 -0.22164958 -2.22164958 0.57835042 0.37835042
## 25 26 27 28 29 30
## 0.59881777 0.79881777 1.19881777 -1.80118223 -0.60118223 0.39881777
## 31 32 33 34 35 36
## 1.19881777 0.39881777 1.99881777 0.39881777 -0.00118223 1.59881777
## 37 38 39 40 41 42
## -0.60118223 0.39881777 0.19881777 1.59881777 0.59881777 -1.40118223
## 43 44 45 46 47 48
## -0.00118223 0.99881777 0.99881777 0.19881777 1.19881777 1.39881777
## 49 50 51 52 53 54
## -0.20118223 0.39881777 0.59881777 0.79881777 -0.20118223 0.59881777
## 55 56 57 58 59 60
## 1.39881777 1.19881777 -0.60118223 0.21928512 -0.38071488 0.41928512
## 61 62 63 64 65 66
## -1.38071488 -2.38071488 1.41928512 1.21928512 0.01928512 0.01928512
## 67 68 69 70 71 72
## 1.21928512 0.21928512 1.21928512 -1.38071488 1.21928512 -0.78071488
## 73 74 75 76 77 78
## 2.41928512 -1.18071488 0.41928512 0.21928512 1.01928512 0.41928512
## 79 80 81 82 83 84
## 0.41928512 1.01928512 -0.78071488 0.41928512 -0.58071488 -1.16024753
## 85 86 87 88 89 90
## -0.76024753 1.43975247 -3.16024753 -0.56024753 1.03975247 -0.76024753
## 91 92 93 94 95 96
## -0.96024753 0.63975247 0.83975247 -0.76024753 -1.76024753 2.63975247
## 97 98 99 100 101 102
## 0.03975247 0.43975247 0.83975247 1.03975247 -0.76024753 -3.36024753
## 103 104 105 106 107 108
## 0.63975247 0.63975247 1.63975247 -0.16024753 1.43975247 -1.16024753
## 109 110 111 112 113 114
## 1.03975247 2.03975247 0.03975247 -1.36024753 1.63975247 -0.13978018
## 115 116 117 118 119 120
## 0.26021982 -1.33978018 -1.73978018 0.86021982 0.86021982 -2.73978018
## 121 122 123 124 125 126
## 0.66021982 0.06021982 0.66021982 -0.73978018 -0.13978018 0.66021982
## 127 128 129 130 131 132
## -0.93978018 -0.73978018 0.66021982 -1.13978018 -1.93978018 0.46021982
## 133 134
## -0.73978018 -0.13978018
plot(friends$Friends, resid(friends_fit)); abline(0, 0)
The p-value for the slope in this simple linear model is 0.782, which is not significant: there is not enough evidence of a linear relationship between these features. Both R-squared values (multiple 0.0006, adjusted -0.007) are also extremely low.
This suggests that a linear model may not be the best way to explain the relationship between these discrete features; ANOVA may be better.
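One way to compare the two approaches is to fit the predictor once as a number (a single slope) and once as a factor (one mean per group), then test whether the group means explain more than the line. The sketch below uses synthetic stand-in data, not the EX12-79FRIENDS values; the levels, means and noise level are all made up for illustration.

```r
# Synthetic stand-in data (NOT the EX12-79FRIENDS values): a curved pattern
# across five discrete predictor levels, which a straight line cannot capture.
set.seed(1)
friends_levels <- c(102, 302, 502, 702, 902)   # hypothetical levels
toy <- data.frame(
  Friends = rep(friends_levels, each = 20),
  Score   = rep(c(3.8, 5.0, 4.6, 4.4, 4.3), each = 20) + rnorm(100, sd = 0.5)
)

fit_line  <- lm(Score ~ Friends, data = toy)           # one slope
fit_anova <- lm(Score ~ factor(Friends), data = toy)   # one mean per group

# F test of whether separate group means add anything beyond the straight line
cmp <- anova(fit_line, fit_anova)
cmp
```

A small P-value in this comparison indicates that the relationship is not well described by a straight line, which is the situation where one-way ANOVA is the better choice.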
gpa = read.csv("EX14-048GPAHI.csv", stringsAsFactors = T)
# gpa_fit = glm(as.factor(GPA) ~ SATM + SATCR, data = gpa, family = binomial(link = "logit") )
gpa_fit = glm(GPA ~ SATM + SATCR, data = gpa)
summary(gpa_fit)
##
## Call:
## glm(formula = GPA ~ SATM + SATCR, data = gpa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.4158 -0.5924 0.2029 0.5599 1.1655
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.4665253 0.5494504 0.849 0.39722
## SATM 0.0030290 0.0010407 2.911 0.00417 **
## SATCR 0.0008482 0.0008888 0.954 0.34149
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.6005995)
##
## Null deviance: 99.675 on 149 degrees of freedom
## Residual deviance: 88.288 on 147 degrees of freedom
## AIC: 354.18
##
## Number of Fisher Scoring iterations: 2
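Because glm() was called without a family argument, this is a gaussian (linear) model rather than a logistic one. SATM shows a significant linear relationship with GPA (p = 0.004), but SATCR does not (p = 0.341).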
# install.packages("broom")
broom::tidy(gpa_fit, conf.int = TRUE) %>% select(term, estimate, conf.low, conf.high)
## # A tibble: 3 × 4
## term estimate conf.low conf.high
## <chr> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.467 -0.610 1.54
## 2 SATM 0.00303 0.000989 0.00507
## 3 SATCR 0.000848 -0.000894 0.00259
SATM estimate: 0.0030 with CI (0.0009893085, 0.005068713), which does not contain 0. SATCR estimate: 0.00085 with CI (-0.0008938451, 0.002590314), which contains 0.
SATM shows a significant linear relationship with GPA, but SATCR does not. The confidence interval for the SATM coefficient does not contain 0, which supports that conclusion; the interval for SATCR contains 0.
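For reference, glm() only fits a logistic regression when `family = binomial` is supplied together with a binary response (the data set includes a 0/1 HIGPA column that could serve this purpose). The sketch below contrasts the two calls on synthetic stand-in data, not the EX14-048GPAHI values; the coefficients used to simulate the response are arbitrary.

```r
# Synthetic stand-in data (NOT the EX14-048GPAHI values)
set.seed(42)
toy <- data.frame(SATM = round(rnorm(200, mean = 620, sd = 70)))
toy$HIGPA <- rbinom(200, size = 1, prob = plogis(-6 + 0.01 * toy$SATM))

# family = binomial makes this a logistic regression; omitting `family`
# falls back to gaussian, i.e. ordinary least-squares regression
logit_fit  <- glm(HIGPA ~ SATM, data = toy, family = binomial(link = "logit"))
linear_fit <- glm(HIGPA ~ SATM, data = toy)

family(logit_fit)$family       # "binomial"
family(linear_fit)$family      # "gaussian"
exp(coef(logit_fit)["SATM"])   # odds ratio per one-point increase in SATM
```

In the logistic fit the coefficients are on the log-odds scale, so exponentiating them gives odds ratios; in the gaussian fit they are ordinary slopes in GPA units.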
gpa
## GPA HSM HSS HSE SATM SATCR SATW sex HIGPA
## 1 3.84 10 10 10 630 570 590 2 1
## 2 3.97 10 10 10 750 700 630 1 1
## 3 3.49 8 10 9 570 510 490 2 1
## 4 1.95 6 4 8 640 600 610 1 0
## 5 2.59 8 10 9 510 490 490 2 0
## 6 3.00 7 10 10 660 680 630 1 1
## 7 1.78 9 9 9 630 490 510 1 0
## 8 2.41 6 6 7 670 620 610 1 0
## 9 2.83 9 8 10 550 570 540 2 0
## 10 0.60 7 7 9 640 720 630 1 0
## 11 3.98 10 10 10 630 560 540 2 1
## 12 1.52 8 8 7 650 550 460 1 0
## 13 2.82 10 10 9 610 540 490 2 0
## 14 3.09 10 10 10 670 330 540 1 1
## 15 3.79 9 10 10 550 580 560 2 1
## 16 3.20 9 10 9 580 490 420 2 1
## 17 3.62 10 10 10 720 620 570 1 1
## 18 1.74 10 9 6 780 540 620 1 0
## 19 2.02 7 5 6 660 520 600 1 0
## 20 2.29 10 10 10 690 750 680 1 0
## 21 3.13 10 10 9 660 620 590 2 1
## 22 2.96 8 9 6 660 670 750 1 0
## 23 4.00 10 10 10 640 620 620 2 1
## 24 2.89 8 7 8 620 610 570 1 0
## 25 3.95 10 10 10 770 780 760 1 1
## 26 1.71 10 10 8 800 750 650 1 0
## 27 3.23 8 8 8 650 590 480 1 1
## 28 2.07 8 8 8 570 560 560 1 0
## 29 1.29 2 6 8 480 490 650 2 0
## 30 4.00 10 10 10 630 630 620 2 1
## 31 2.94 9 10 10 490 510 490 1 0
## 32 3.89 10 10 10 680 730 740 1 1
## 33 3.34 7 7 6 700 420 460 1 1
## 34 3.52 9 10 8 740 740 620 1 1
## 35 3.75 10 10 10 650 690 670 1 1
## 36 3.55 6 7 5 630 420 400 1 1
## 37 3.46 8 9 9 590 600 560 2 1
## 38 2.02 7 9 8 690 690 640 1 0
## 39 1.56 8 5 6 650 600 510 1 0
## 40 3.97 10 10 10 730 660 660 1 1
## 41 1.36 8 10 10 590 560 510 1 0
## 42 4.00 10 10 10 630 580 490 1 1
## 43 3.31 10 10 9 600 570 510 1 1
## 44 2.28 8 9 8 590 400 480 1 0
## 45 3.65 8 9 9 680 650 650 1 1
## 46 2.34 10 10 9 660 530 610 1 0
## 47 2.06 8 9 9 650 550 570 2 0
## 48 3.25 9 7 8 640 540 540 1 1
## 49 3.45 10 10 10 600 570 530 1 1
## 50 2.31 9 7 7 580 530 510 1 0
## 51 4.00 10 10 9 630 570 630 1 1
## 52 2.50 10 9 7 620 550 430 1 0
## 53 3.08 8 8 8 620 560 490 2 1
## 54 3.38 10 9 10 670 690 550 1 1
## 55 2.69 8 6 6 550 510 460 1 0
## 56 3.64 9 9 9 600 590 590 2 1
## 57 3.26 10 10 10 510 450 470 2 1
## 58 1.49 8 9 9 500 540 530 2 0
## 59 2.93 8 4 4 550 480 470 1 0
## 60 2.92 7 9 8 740 600 640 1 0
## 61 3.99 10 10 10 750 610 640 1 1
## 62 3.27 8 10 9 480 450 560 2 1
## 63 3.05 9 9 10 700 560 550 1 1
## 64 3.36 8 10 9 520 490 470 2 1
## 65 0.03 5 7 8 460 450 500 1 0
## 66 2.57 7 9 8 520 550 570 2 0
## 67 3.33 7 8 7 610 450 480 2 1
## 68 3.06 8 9 10 630 560 650 2 1
## 69 2.39 6 8 9 620 530 480 1 0
## 70 2.21 8 10 10 500 510 590 2 0
## 71 2.99 10 10 10 580 480 530 2 0
## 72 4.00 10 10 10 760 650 630 1 1
## 73 1.20 8 7 7 520 480 560 1 0
## 74 3.28 9 10 9 610 540 460 1 1
## 75 3.87 10 10 10 690 580 570 1 1
## 76 2.52 8 8 8 510 480 460 2 0
## 77 3.32 9 9 9 580 490 480 1 1
## 78 1.02 9 9 7 560 560 560 2 0
## 79 2.91 6 9 10 580 700 640 2 0
## 80 2.14 10 10 10 700 650 640 1 0
## 81 2.50 10 10 10 520 480 440 1 0
## 82 3.36 10 9 10 640 580 630 1 1
## 83 3.51 7 9 8 650 640 640 1 1
## 84 2.36 6 5 8 540 520 520 2 0
## 85 1.87 6 7 8 700 580 560 1 0
## 86 3.45 10 10 10 770 760 730 1 1
## 87 2.96 8 7 9 500 540 610 2 0
## 88 3.24 6 7 8 660 640 610 2 1
## 89 3.32 9 9 10 730 640 670 1 1
## 90 3.71 10 10 10 710 760 660 2 1
## 91 3.18 9 10 10 620 620 550 2 1
## 92 3.59 10 9 10 690 580 560 1 1
## 93 2.93 8 9 9 490 530 550 2 0
## 94 3.93 9 10 10 690 740 670 2 1
## 95 1.41 8 8 9 690 410 460 1 0
## 96 1.90 6 7 7 540 720 650 1 0
## 97 3.45 10 10 9 640 670 600 2 1
## 98 3.06 9 10 9 590 450 460 1 1
## 99 1.85 8 8 8 570 520 520 1 0
## 100 3.13 9 10 10 550 530 520 1 1
## 101 1.81 7 7 7 550 510 490 1 0
## 102 2.38 9 6 8 640 580 640 2 0
## 103 2.45 9 10 9 720 670 700 1 0
## 104 3.19 6 7 8 540 510 490 2 1
## 105 2.23 7 7 8 690 620 570 1 0
## 106 1.83 5 7 10 560 550 560 2 0
## 107 3.38 7 8 9 630 640 530 2 1
## 108 3.43 9 8 9 670 600 590 1 1
## 109 2.74 9 7 7 780 610 680 1 0
## 110 4.00 10 10 10 710 600 630 2 1
## 111 2.93 8 8 10 610 480 440 1 0
## 112 1.68 7 8 8 650 530 450 1 0
## 113 3.71 9 10 9 620 500 520 2 1
## 114 1.72 7 8 9 530 610 540 2 0
## 115 1.63 10 9 9 540 600 560 2 0
## 116 0.85 10 9 9 560 520 470 1 0
## 117 2.94 8 10 10 630 700 580 1 0
## 118 3.37 10 9 9 560 460 480 1 1
## 119 3.15 8 8 7 690 670 670 1 1
## 120 2.96 10 10 10 550 560 490 2 0
## 121 3.48 9 9 9 710 660 610 2 1
## 122 2.05 10 10 10 670 550 620 2 0
## 123 1.66 10 9 10 580 480 470 2 0
## 124 3.12 10 9 9 530 480 480 2 1
## 125 2.78 9 9 8 520 490 520 2 0
## 126 3.33 10 9 10 650 580 480 2 1
## 127 2.57 6 7 7 560 550 540 1 0
## 128 3.10 8 9 9 710 600 730 1 1
## 129 2.30 8 9 10 630 540 500 1 0
## 130 2.74 10 10 8 570 450 590 1 0
## 131 2.19 10 10 10 700 530 560 1 0
## 132 3.36 10 10 10 690 580 570 2 1
## 133 3.03 10 10 10 630 540 580 2 1
## 134 3.49 8 9 8 600 660 620 2 1
## 135 3.88 10 10 10 740 640 690 1 1
## 136 2.71 9 8 9 510 430 500 2 0
## 137 2.55 10 10 10 720 800 670 1 0
## 138 2.82 8 8 9 610 550 590 2 0
## 139 3.65 10 9 10 620 530 520 1 1
## 140 3.75 10 10 10 770 730 770 1 1
## 141 2.59 6 8 9 590 650 630 1 0
## 142 1.99 9 9 9 540 500 480 2 0
## 143 2.57 6 7 6 580 520 560 1 0
## 144 3.99 9 9 10 680 650 590 1 1
## 145 2.31 10 9 10 590 660 480 1 0
## 146 2.75 9 10 9 580 590 520 1 0
## 147 1.72 8 7 7 520 400 430 2 0
## 148 3.73 9 10 10 630 630 620 2 1
## 149 3.62 10 10 8 640 560 540 1 1
## 150 3.23 10 10 10 640 510 350 1 1
gpa_fit2 = glm(GPA ~ sex, data = gpa)
summary(gpa_fit2)
##
## Call:
## glm(formula = GPA ~ sex, data = gpa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7535 -0.5010 0.1470 0.5717 1.2165
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.6345 0.2017 13.060 <2e-16 ***
## sex 0.1490 0.1366 1.091 0.277
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.6681081)
##
## Null deviance: 99.675 on 149 degrees of freedom
## Residual deviance: 98.880 on 148 degrees of freedom
## AIC: 369.17
##
## Number of Fisher Scoring iterations: 2
broom::tidy(gpa_fit2, conf.int = TRUE) %>% select(term, estimate, conf.low, conf.high)
## # A tibble: 2 × 4
## term estimate conf.low conf.high
## <chr> <dbl> <dbl> <dbl>
## 1 (Intercept) 2.63 2.24 3.03
## 2 sex 0.149 -0.119 0.417
The p-value (0.277 > 0.05) suggests that there is no significant linear relationship between sex and GPA when sex is the only predictor. The confidence interval for the estimate contains 0.
gpa_fit3 = glm(GPA ~ sex + SATM + SATCR, data = gpa)
summary(gpa_fit3)
##
## Call:
## glm(formula = GPA ~ sex + SATM + SATCR, data = gpa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2196 -0.4862 0.1300 0.5944 1.3042
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.8293247 0.6723888 -1.233 0.21941
## sex 0.4388964 0.1386813 3.165 0.00189 **
## SATM 0.0044183 0.0011014 4.011 9.6e-05 ***
## SATCR 0.0005309 0.0008686 0.611 0.54197
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.565892)
##
## Null deviance: 99.675 on 149 degrees of freedom
## Residual deviance: 82.620 on 146 degrees of freedom
## AIC: 346.22
##
## Number of Fisher Scoring iterations: 2
broom::tidy(gpa_fit3, conf.int = TRUE) %>% select(term, estimate, conf.low, conf.high)
## # A tibble: 4 × 4
## term estimate conf.low conf.high
## <chr> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.829 -2.15 0.489
## 2 sex 0.439 0.167 0.711
## 3 SATM 0.00442 0.00226 0.00658
## 4 SATCR 0.000531 -0.00117 0.00223
The p-values suggest that sex and SATM scores each have a significant linear relationship with GPA in this model, while SATCR does not.
Sex seems not to play a role in GPA unless SATM scores are also taken into consideration. When sex is the only variable in the model, we fail to reject the null hypothesis that its coefficient is 0. After adding SATM scores, there is sufficient evidence to reject the null, which suggests that sex may help predict GPA once SATM is accounted for. The confidence intervals for the significant variables (sex and SATM) are the only ones that do not contain 0.