HW - Anova, Chi-square, and correlation tests

All good things must end. Here’s one last homework set on our good ol’ Admissions dataset. It has only 400 rows and 4 variables, but wow, how much fun we had with it!

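library(ezids)  # assumed: course package that provides api_rfit() and xkabledply()
Admit <- api_rfit("gradAdmit")
Admit$rank_cat <- factor(Admit$rank)  # rank as a categorical variable
Accept <- subset(Admit, admit == 1)   # admitted applicants
Reject <- subset(Admit, admit == 0)   # rejected applicants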

Q1. Import

Import the data using the api_rfit() function as before, then convert rank to a factor. No need to scale gre and gpa here.
Although admit (0 or 1) should strictly be treated as categorical, none of our calculations below is affected either way.

Q2. ANOVA test (entire dataset)

anova_test <- aov(gpa ~ rank_cat, Admit)
summary(anova_test)

##              Df Sum Sq Mean Sq F value Pr(>F)  
## rank_cat      3    0.9   0.310    2.16  0.092 .
## Residuals   396   56.9   0.144                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Use \(\alpha = 0.10\). Let us compare the average GPA across the four ranks to see whether it differs by rank. Perform this (one-way) ANOVA test with the entire dataset. Interpret the results.
Hint: Make sure you use rank_cat as your \(x\)-variable (categorical), not rank (numerical). You can certainly try both ways and see the difference. In this case, using the categorical \(x\) is the correct way to run an ANOVA test.
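To make the hint concrete, here is a minimal sketch of the numeric-versus-categorical difference. It uses simulated data (the `toy` data frame and its values are hypothetical stand-ins, not the Admissions dataset), so only the degrees of freedom are meaningful, not the p-values.

```r
# Simulated stand-in for the admissions data (hypothetical values, not the real dataset)
set.seed(42)
toy <- data.frame(
  rank = rep(1:4, each = 25),                        # four ranks, 25 rows each
  gpa  = round(rnorm(100, mean = 3.4, sd = 0.4), 2)  # made-up GPAs
)
toy$rank_cat <- factor(toy$rank)

fit_num <- aov(gpa ~ rank, data = toy)      # rank as a number: one linear trend
fit_cat <- aov(gpa ~ rank_cat, data = toy)  # rank as a factor: four group means

# Degrees of freedom used by the rank term in each model
df_num <- summary(fit_num)[[1]]$Df[1]
df_cat <- summary(fit_cat)[[1]]$Df[1]
print(c(numeric = df_num, categorical = df_cat))
```

With a numeric rank the model spends a single degree of freedom on a straight-line trend; with a factor it spends three (number of groups minus one) on comparing the four group means, which is what a one-way ANOVA is meant to do.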

What is the p-value of the test?

The p-value of the test is 0.092.

What is the conclusion you make from this (assuming \(\alpha = 0.10\))?

Since 0.092 < 0.10, we reject the null hypothesis that the mean GPA is the same across all four ranks.
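As a cross-check on the decision, the p-value can be recomputed from the F statistic and degrees of freedom printed in the summary table. This is only a sketch: the inputs are the rounded values from the output above, so the result may differ slightly from what R computes on the full data.

```r
alpha  <- 0.10
f_stat <- 2.16   # F value from the summary table (rounded)
df1    <- 3      # degrees of freedom for rank_cat
df2    <- 396    # residual degrees of freedom

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

reject_null <- p_value < alpha
print(round(p_value, 3))
print(reject_null)
```

The result is borderline: the same p-value would not lead to rejection at the stricter \(\alpha = 0.05\).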

NOTE: I have not found a good way to display ANOVA summary results yet. Calling the standard summary() function on the aov() result object and leaving the code block visible to show the output is fine here.
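If you need individual numbers rather than the printed table, one possible approach is to pull them out of the summary object, whose first element behaves like a data frame. The sketch below runs on simulated data (`toy` is hypothetical), and the variable names are illustrative only.

```r
# Simulated data for illustration only (not the Admissions dataset)
set.seed(1)
toy <- data.frame(
  rank_cat = factor(rep(1:4, each = 20)),
  gpa      = rnorm(80, mean = 3.4, sd = 0.4)
)

fit <- aov(gpa ~ rank_cat, data = toy)

# The first element of the summary is the ANOVA table as a data frame
anova_table <- summary(fit)[[1]]

# Row 1 is the rank_cat term, row 2 is the residuals
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]
print(c(F = f_value, p = p_value))
```

Values extracted this way can be dropped into inline text or passed to a table helper instead of being read off the console output.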

Q3. ANOVA test (admitted and rejected subgroups)

Accept_anova_test <- aov(gpa ~ rank_cat, Accept)
summary(Accept_anova_test)

##              Df Sum Sq Mean Sq F value Pr(>F)
## rank_cat      3    0.3   0.101    0.73   0.54
## Residuals   123   17.0   0.138

Reject_anova_test <- aov(gpa ~ rank_cat, Reject)
summary(Reject_anova_test)

##              Df Sum Sq Mean Sq F value Pr(>F)
## rank_cat      3    0.5   0.180    1.27   0.28
## Residuals   269   38.1   0.142

This time we will perform the same ANOVA, but on the admitted and rejected subgroups separately. Interpret the results.
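The p-values for the admitted (0.54) and rejected (0.28) subgroups are both larger than the significance level of 0.10, so in each subgroup we fail to reject the null hypothesis: there is no evidence that mean GPA differs by rank within either group.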
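The two subgroup tests can also be run in one pass by splitting the data on admit. The following is a sketch on simulated data (`toy` and its values are hypothetical), shown only to illustrate the pattern.

```r
# Simulated data for illustration only (not the Admissions dataset)
set.seed(7)
toy <- data.frame(
  admit    = rep(c(0, 1), times = c(60, 40)),  # 60 rejected, 40 admitted
  rank_cat = factor(rep(1:4, times = 25)),     # ranks cycle 1, 2, 3, 4
  gpa      = rnorm(100, mean = 3.4, sd = 0.4)
)

# Fit the same one-way ANOVA within each admit group and keep the p-value
p_by_group <- sapply(split(toy, toy$admit), function(d) {
  fit <- aov(gpa ~ rank_cat, data = d)
  summary(fit)[[1]][["Pr(>F)"]][1]
})
print(round(p_by_group, 3))
```

The result is a named vector with one p-value per group ("0" and "1"), which makes it easy to compare each against the chosen \(\alpha\).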

Q4. Chi-square test

table <- table(Admit$admit, Admit$rank_cat)
xkabledply(table, title="chi squared")

chi squared
	1	2	3	4
0	28	97	93	55
1	33	54	28	12

chi_sq = chisq.test(table)
chi_sq

## 
##  Pearson's Chi-squared test
## 
## data:  table
## X-squared = 25, df = 3, p-value = 1e-05

Give an example where we should use the chi-square test with this dataset. Perform the test here and interpret the results.
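The chi-square test suits two categorical variables, here admit and rank_cat: it tests whether admission is independent of rank. Ranks 3 and 4 show a much higher share of rejections than ranks 1 and 2. The p-value (1e-05) is far below our significance level, so we reject the null hypothesis of independence and conclude that admission is associated with rank.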

Hint: Remember that for the chi-square test, you need to first create the contingency table like we did in the class example, then use the chisq.test() function with that contingency table.
Also notice that using rank or rank_cat here gives the same result, since the test is simply about counting frequencies. Treating rank as numerical makes no material difference.
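The test can be reproduced without the raw data by typing in the counts from the contingency table above. The sketch below does that and also prints the expected counts and the admission rate per rank, which help with the interpretation; the object names (`counts`, `admit_rate`) are illustrative.

```r
# Counts copied from the contingency table above (rows: admit, columns: rank)
counts <- matrix(
  c(28, 97, 93, 55,
    33, 54, 28, 12),
  nrow = 2, byrow = TRUE,
  dimnames = list(admit = c("0", "1"), rank = c("1", "2", "3", "4"))
)

res <- chisq.test(counts)

# Expected counts if admission were independent of rank
print(round(res$expected, 1))

# Share of applicants admitted within each rank (column proportions)
admit_rate <- prop.table(counts, margin = 2)["1", ]
print(round(admit_rate, 3))

print(res$p.value)
```

The admission rate falls steadily from rank 1 to rank 4, which is the pattern the small p-value is picking up.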

Q5. Correlation test

correlation <- cor.test(Admit$gpa, Admit$gre, method = 'pearson')
correlation

## 
##  Pearson's product-moment correlation
## 
## data:  Admit$gpa and Admit$gre
## t = 8, df = 398, p-value = 2e-15
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.297 0.465
## sample estimates:
##   cor 
## 0.384

summary(correlation)

##             Length Class  Mode     
## statistic   1      -none- numeric  
## parameter   1      -none- numeric  
## p.value     1      -none- numeric  
## estimate    1      -none- numeric  
## null.value  1      -none- numeric  
## alternative 1      -none- character
## method      1      -none- character
## data.name   1      -none- character
## conf.int    2      -none- numeric

Give an example where we should use the correlation test (Pearson’s) with this dataset. Perform the test here and interpret the results.
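We can use the correlation test for gre and gpa, two numerical variables. The estimated correlation is 0.384 (95 percent confidence interval 0.297 to 0.465), a moderate positive relationship. The p-value (2e-15) is far below the significance level, so we reject the null hypothesis and conclude that the true correlation is not equal to 0.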
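To see where the printed numbers come from, the t statistic and confidence interval can be rebuilt from the reported correlation and sample size. This is a sketch using the rounded estimate 0.384 and n = 400 (df = 398 in the output), with the standard t formula and Fisher z-transformation for Pearson's r; small rounding differences from R's own output are expected.

```r
r  <- 0.384   # sample correlation from the output above (rounded)
n  <- 400     # number of rows
df <- n - 2   # degrees of freedom, matching df = 398

# t statistic for testing H0: true correlation = 0
t_stat  <- r * sqrt(df) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(t_stat, 2))
print(p_value)
print(round(ci, 3))
```

The values should land close to the reported t = 8 and the interval (0.297, 0.465).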

NOTE: There is also no good way to display a cor.test() result object. Printing the object in a visible code block is fine here; calling summary() on it only lists the length and type of each component.
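One workaround is to pick the components out of the result object by name and collect them in a small vector. The sketch below uses simulated gre and gpa values (hypothetical, not the Admissions data); the component names are the ones listed in the summary() output above.

```r
# Simulated data for illustration only (not the Admissions dataset)
set.seed(3)
gre <- rnorm(60, mean = 580, sd = 100)
gpa <- 3.4 + 0.001 * (gre - 580) + rnorm(60, mean = 0, sd = 0.3)

ct <- cor.test(gpa, gre, method = "pearson")

# Collect the pieces usually reported for a correlation test
result <- c(
  r       = unname(ct$estimate),
  t       = unname(ct$statistic),
  df      = unname(ct$parameter),
  p       = ct$p.value,
  ci_low  = ct$conf.int[1],
  ci_high = ct$conf.int[2]
)
print(round(result, 4))
```

A named vector like this is easier to format or reference in the write-up than the raw printed object.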

Assignment Anova, Chi-square, and correlation tests

GWU DATS 1001 Data Science for ALL - Edwin Lo

2024-10-08