All good things must come to an end. Here is the last homework set on our good old Admissions dataset. It has only 400 rows and 4 variables, but wow, how much fun we had with it!
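# Load the admissions data with the course helper, as before
Admit <- api_rfit("gradAdmit")
# Treat rank as a categorical variable
Admit$rank_cat <- factor(Admit$rank)
# Split into admitted and rejected subgroups
Accept <- subset(Admit, admit == '1')
Reject <- subset(Admit, admit == '0')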
Import the data using the api_rfit() function as before, then convert rank to a factor. There is no need to scale gre and gpa here.
Although admit, being 0 or 1, should be considered categorical, none of our calculations below will be affected either way.
anova_test <- aov(gpa ~ rank_cat, Admit)
summary(anova_test)
## Df Sum Sq Mean Sq F value Pr(>F)
## rank_cat 3 0.9 0.310 2.16 0.092 .
## Residuals 396 56.9 0.144
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Use \(\alpha = 0.10\). Let us look at the average GPA among the four ranks to see whether there is a relationship. Perform a one-way ANOVA test on the entire dataset and interpret the results.
Hint: Make sure you use rank_cat (categorical) as your \(x\)-variable, not rank (numerical). You can certainly try both ways and see the difference. In this case, using the categorical \(x\) is the correct way to run the ANOVA test.
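The difference mentioned in the hint can be seen in the degrees of freedom. The sketch below is only an illustration on simulated data (the data frame `sim` and its values are assumptions, not the real Admissions dataset): with a numeric rank, aov() fits a single slope, while with a factor it fits one mean per rank level.

```r
# Simulated stand-in for the Admissions data (NOT the real dataset):
# 400 rows, a rank from 1 to 4, and a GPA drawn independently of rank.
set.seed(1)
sim <- data.frame(
  rank = rep(1:4, length.out = 400),
  gpa  = rnorm(400, mean = 3.4, sd = 0.38)
)
sim$rank_cat <- factor(sim$rank)

# Numeric rank: a single slope term, so the rank row has 1 degree of freedom.
fit_num <- aov(gpa ~ rank, data = sim)
# Categorical rank: one mean per level, so 4 - 1 = 3 degrees of freedom.
fit_cat <- aov(gpa ~ rank_cat, data = sim)

df_num <- summary(fit_num)[[1]][["Df"]][1]
df_cat <- summary(fit_cat)[[1]][["Df"]][1]
print(c(numeric = df_num, categorical = df_cat))
```

Only the 3-degree-of-freedom version compares the four group means, which is what a one-way ANOVA is meant to do.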
The p-value of the test is 0.092, which is below \(\alpha = 0.10\), so we reject the null hypothesis that the mean GPA is the same across all four ranks.
NOTE: I have not found a good way to display ANOVA summary results yet. Using the standard summary() function on the aov() result object, with the code block turned on to show the results, is fine here.
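As a possible complement to printing the summary, the F statistic and p-value can be pulled out of the summary object and compared with \(\alpha\) directly. This is a minimal sketch on simulated data (`sim`, `alpha`, and `decision` are illustrative names, and the numbers are not those of the real dataset).

```r
# Simulated stand-in data (NOT the real Admissions dataset).
set.seed(42)
sim <- data.frame(
  rank_cat = factor(rep(1:4, length.out = 400)),
  gpa      = rnorm(400, mean = 3.4, sd = 0.38)
)
alpha <- 0.10

fit <- aov(gpa ~ rank_cat, data = sim)

# summary() on an aov object returns a list whose first element is the
# ANOVA table; its first row belongs to the rank_cat term.
anova_table <- summary(fit)[[1]]
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# Decision rule: reject H0 (equal group means) when p < alpha.
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
cat(sprintf("F = %.2f, p = %.3f, at alpha = %.2f: %s\n",
            f_value, p_value, alpha, decision))
```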
Accept_anova_test <- aov(gpa ~ rank_cat, Accept)
summary(Accept_anova_test)
## Df Sum Sq Mean Sq F value Pr(>F)
## rank_cat 3 0.3 0.101 0.73 0.54
## Residuals 123 17.0 0.138
Reject_anova_test <- aov(gpa ~ rank_cat, Reject)
summary(Reject_anova_test)
## Df Sum Sq Mean Sq F value Pr(>F)
## rank_cat 3 0.5 0.180 1.27 0.28
## Residuals 269 38.1 0.142
This time we perform the same ANOVA on the admitted and rejected subgroups separately. Interpret the results.
The p-values for the accepted (0.54) and rejected (0.28) subgroups are both larger than the significance level of 0.10, so in each subgroup we fail to reject the null hypothesis that the mean GPA is the same across ranks.
table <- table(Admit$admit, Admit$rank_cat)
xkabledply(table, title="chi squared")
admit \ rank_cat | 1 | 2 | 3 | 4 |
---|---|---|---|---|
0 | 28 | 97 | 93 | 55 |
1 | 33 | 54 | 28 | 12 |
chi_sq = chisq.test(table)
chi_sq
##
## Pearson's Chi-squared test
##
## data: table
## X-squared = 25, df = 3, p-value = 1e-05
Give an example where we should use the chi-square test with this dataset. Perform the test here and interpret the results.
We would use it to test whether admit and rank_cat are independent. The counts show a higher share of rejections in ranks 3 and 4. The p-value (1e-05) is less than our significance level of 0.10, so we reject the null hypothesis that admission is independent of rank.
Hint: Remember that for the chi-square test, you first need to create the contingency table as we did in the class example, then call the chisq.test() function on that contingency table.
Also notice that using rank or rank_cat here gives the same result, since the test is simply about counting frequencies. Treating rank as numerical makes no material difference.
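The point about rank versus rank_cat can be checked with the counts from the contingency table above. The sketch below rebuilds one row per applicant from those counts (the names `counts`, `long`, and `rows` are illustrative assumptions) and runs chisq.test() both ways.

```r
# Counts copied from the contingency table shown above
# (rows: admit 0/1, columns: rank 1 to 4).
counts <- matrix(
  c(28, 97, 93, 55,
    33, 54, 28, 12),
  nrow = 2, byrow = TRUE,
  dimnames = list(admit = c("0", "1"), rank = c("1", "2", "3", "4"))
)

# Expand the counts back into one row per applicant so that rank can be
# tabulated both as a number and as a factor.
long <- as.data.frame(as.table(counts), stringsAsFactors = FALSE)
rows <- long[rep(seq_len(nrow(long)), long$Freq), c("admit", "rank")]
rows$rank <- as.integer(rows$rank)
rows$rank_cat <- factor(rows$rank)

# table() only counts distinct values, so both versions give the same
# 2 x 4 table and therefore the same test.
test_num <- chisq.test(table(rows$admit, rows$rank))
test_cat <- chisq.test(table(rows$admit, rows$rank_cat))

print(c(numeric = unname(test_num$statistic),
        categorical = unname(test_cat$statistic)))
```

Both calls work from the same 2 x 4 table of counts, which is why the choice of rank type does not matter here.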
correlation <- cor.test(Admit$gpa, Admit$gre, method = 'pearson')
correlation
##
## Pearson's product-moment correlation
##
## data: Admit$gpa and Admit$gre
## t = 8, df = 398, p-value = 2e-15
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.297 0.465
## sample estimates:
## cor
## 0.384
summary(correlation)
## Length Class Mode
## statistic 1 -none- numeric
## parameter 1 -none- numeric
## p.value 1 -none- numeric
## estimate 1 -none- numeric
## null.value 1 -none- numeric
## alternative 1 -none- character
## method 1 -none- character
## data.name 1 -none- character
## conf.int 2 -none- numeric
Give an example where we should use the correlation test (Pearson’s) with this dataset. Perform the test here and interpret the results.
We can use the correlation test for gre and gpa. The estimated correlation is 0.384, a moderate positive relationship. The p-value (2e-15) is less than the significance level of 0.10, so we reject the null hypothesis and conclude that the true correlation is not equal to 0.
NOTE: There is also no good way to display the cor.test() result object. Using the standard summary() function on the cor.test() result object, with the code block turned on to show the results, is fine here.
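One possible alternative to summary() is to read the components listed in the summary output above (estimate, p.value, conf.int) straight from the result object. This is a sketch on simulated data; the vectors `gre` and `gpa` below are made-up stand-ins, not the real Admissions columns.

```r
# Simulated stand-in scores (NOT the real Admissions dataset), built so
# that gpa has some positive dependence on gre.
set.seed(2)
gre <- rnorm(400, mean = 590, sd = 115)
gpa <- 3.4 + 0.001 * (gre - 590) + rnorm(400, sd = 0.35)

res <- cor.test(gpa, gre, method = "pearson")

# The result is a list; these are the components reported by summary().
r  <- unname(res$estimate)   # sample correlation
p  <- res$p.value            # p-value for H0: true correlation is 0
ci <- res$conf.int           # 95 percent confidence interval (2 values)

cat(sprintf("r = %.3f, 95%% CI [%.3f, %.3f], p = %.3g\n",
            r, ci[1], ci[2], p))
```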