To continue to make this project feel cohesive, the previous part can be found here: https://rpubs.com/bekkahmoore/748646
I’m going to keep using the same data, but now I’ll look at it using ANOVA (Analysis of Variance). Since ANOVA is meant for comparing more than 2 group means, I am kind of limited in what I can do with this data set. But I’ll make a new subset with Oklahoma and its two closest neighboring states, Texas and Kansas, and take a look at how our mean compares with theirs.
Neighbors <-data[which(data$State %in% c("Oklahoma","Texas","Kansas")),]
I’m going to hypothesize that the mean percentage married at the younger age (23-25) is equal across the 3 states.
$$ H_0: \mu_{OK} = \mu_{TX} = \mu_{KS} \\
H_A: \text{at least one of the three means differs} $$
anova_test <- aov(data$Married.ages.23.25..pct.~data$State, data=Neighbors)
summary(anova_test)
## Df Sum Sq Mean Sq F value Pr(>F)
## data$State 48 3.016 0.06283 11.54 <2e-16 ***
## Residuals 771 4.200 0.00545
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
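The Pr(>F) value above is the p-value of the F-statistic. Since it is extremely small, we can reject the null hypothesis that the means are equal. One caveat: data$State has 48 degrees of freedom rather than 2, which means the model was fit on every state in data (49 groups), not just the 3 in Neighbors. Writing data$... in the formula bypasses the data = Neighbors argument. So strictly speaking, what we rejected is that the means are equal across all of the states. Let’s get a couple of visuals to check the conditions for applying ANOVA.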
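The Df of 48 for data$State in the output above suggests the subset was not actually used. Here is a minimal sketch of how the call could be restricted to the 3 states by using bare column names. The data frame below is a simulated stand-in (the two extra states and all the values are made up for illustration, not taken from the real data set), so only the degrees of freedom are meaningful.

```r
# Simulated stand-in for the real data; the values are NOT the author's.
set.seed(1)
states <- c("Oklahoma", "Texas", "Kansas", "Ohio", "Maine")
data <- data.frame(
  State = rep(states, each = 10),
  Married.ages.23.25..pct. = runif(50, min = 0.1, max = 0.4)
)

# Keep only the three states of interest (30 rows).
Neighbors <- data[data$State %in% c("Oklahoma", "Texas", "Kansas"), ]

# Bare column names are looked up in `data = Neighbors`,
# so only the subset rows enter the model.
fit_subset <- aov(Married.ages.23.25..pct. ~ State, data = Neighbors)

# `data$...` terms are pulled from the full data frame instead,
# so the `data = Neighbors` argument has no effect on them.
fit_full <- aov(data$Married.ages.23.25..pct. ~ data$State, data = Neighbors)

summary(fit_subset)  # State: 3 groups, so 2 degrees of freedom
summary(fit_full)    # data$State: 5 groups, so 4 degrees of freedom
```

With the real data, the first form should report 2 degrees of freedom for State, and its F value and p-value would be the ones that answer the 3-state hypothesis.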
plot(anova_test, 1)
plot(anova_test,2)
## Warning: not plotting observations with leverage one:
## 110, 686, 820
Since the red line in the first plot (residuals vs. fitted) sits mostly in the middle, with no obvious trend, I’ll assume the groups have roughly equal variances. The QQ plot also suggests that the residuals are fairly normal, since a majority of the points sit on the line. Because the conditions for applying ANOVA look reasonable, I think it’s safe to say we can reject the null hypothesis that the means are equal.
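The plots are a visual check, and they could be backed up with formal tests if desired. The sketch below uses bartlett.test() for equal variances and shapiro.test() on the residuals for normality, both from base R's stats package. The data is simulated for illustration (not the real data set), and Bartlett's test is itself sensitive to non-normal data, so treat this as one option rather than the required procedure.

```r
# Simulated stand-in data; the values are NOT the author's.
set.seed(3)
Neighbors <- data.frame(
  State = rep(c("Oklahoma", "Texas", "Kansas"), each = 20),
  Married.ages.23.25..pct. = rnorm(60, mean = 0.25, sd = 0.05)
)

fit <- aov(Married.ages.23.25..pct. ~ State, data = Neighbors)

# Equal variances: the null hypothesis is that all groups share one variance.
bt <- bartlett.test(Married.ages.23.25..pct. ~ State, data = Neighbors)

# Normality: the null hypothesis is that the residuals are normally distributed.
sw <- shapiro.test(residuals(fit))

bt$p.value
sw$p.value
```

In both tests a large p-value means there is no evidence against the assumption, which is the outcome you would hope for before trusting the ANOVA table.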
I’ll now look at the means of these 3 states again, but also at whether being a Religious Private institution affects them. Because of this, we will have 3 separate hypothesis tests running that will answer the following questions:
\[ H_0: \text{The mean percentage married young is equal in OK, TX, and KS}\\ A_0: \text{The mean percentage married young is not equal in OK, TX, and KS}\\ H_1: \text{The mean percentage married young is equal at Private Religious and other universities}\\ A_1: \text{The mean percentage married young is not equal at Private Religious and other universities}\\ H_2: \text{There is no interaction between state and Private Religious status}\\ A_2: \text{There is an interaction between state and Private Religious status}\\ \]
I don’t know that I did the correct notation for that, but let’s run Two Way ANOVA and take a look:
anova_test2 <- aov(data$Married.ages.23.25..pct.~data$State*data$Religious.private., data=Neighbors)
summary(anova_test2)
## Df Sum Sq Mean Sq F value Pr(>F)
## data$State 48 3.016 0.06283 12.788 < 2e-16 ***
## data$Religious.private. 1 0.315 0.31482 64.078 4.72e-15 ***
## data$State:data$Religious.private. 41 0.303 0.00740 1.505 0.0238 *
## Residuals 729 3.582 0.00491
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
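The p-values for all 3 of the hypotheses tested are small (<0.05), so we can reject each null hypothesis: the means differ by state, they differ by Private Religious status, and there is an interaction between the two. Note that data$State again has 48 degrees of freedom, so this model was also fit on every state in data rather than only OK, TX, and KS. I think it’s important to note that the p-value for the interaction between state and Private Religious status is 0.0238. While it’s still in the rejection region, it is noticeably larger than for state or religious private by themselves. What do we make of this? A significant interaction means the gap between Private Religious and other universities is not the same in every state, although the evidence for that is weaker than for the two main effects. I thought that was neat enough to mention!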
Let’s make sure the conditions for ANOVA were met here as well with the same visuals from earlier.
plot(anova_test2, 1)
plot(anova_test2,2)
## Warning: not plotting observations with leverage one:
## 24, 89, 110, 155, 156, 174, 309, 314, 345, 384, 389, 438, 686, 751, 820
Just like before, the plots align with the conditions we needed to meet to run ANOVA. So I will confidently reject these null hypotheses too.
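Rejecting equal means only says that at least one group differs, not which ones. One common follow-up is Tukey's HSD, available in base R as TukeyHSD(). The sketch below runs it on simulated data (not the real data set) with State stored as a factor, which is an assumption about how the column would be prepared.

```r
# Simulated stand-in data; the values are NOT the author's.
set.seed(4)
Neighbors <- data.frame(
  State = factor(rep(c("Oklahoma", "Texas", "Kansas"), each = 20)),
  Married.ages.23.25..pct. = rnorm(60, mean = 0.25, sd = 0.05)
)

fit <- aov(Married.ages.23.25..pct. ~ State, data = Neighbors)

# Pairwise comparisons between states, with p-values adjusted
# for the fact that several comparisons are made at once.
tukey <- TukeyHSD(fit, "State", conf.level = 0.95)
tukey
```

Each row of the result is one pair of states, with the difference in means, a confidence interval, and an adjusted p-value.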
Since we were able to reject that the means were equal between the states in my data set, I am curious to see what those means actually are.
My data set is in Excel, so I just went and calculated the means by filtering by state: OK = 0.26327, TX = 0.18569, KS = 0.66003.
These means are definitely not equal, but I am kind of surprised that Kansas is the highest! Let me see what it would be including the Private Religious university filter:
OK = 0.22973, TX = 0.20013, KS = 0.69344.
So by these results, we can see that Kansas is the place to be if you want to be married between the ages of 23-25! I would have figured that the Private Religious means would have been higher across the board, since this whole project is centered around the Ring by Spring myth that surrounds religious universities, but Oklahoma’s religious mean was lower. I would have liked to calculate those numbers here in R as well, but I couldn’t quite figure out how to filter the data set correctly. Anyway, pretty neat!
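For the filtering step, one possible approach in R is aggregate(), which computes a mean per group without filtering by hand. The sketch below uses simulated data (not the real data set), and the TRUE/FALSE coding of Religious.private. is an assumption, since the real column may be coded differently.

```r
# Simulated stand-in data; the values are NOT the author's.
# Religious.private. is assumed to be TRUE/FALSE here.
set.seed(2)
Neighbors <- data.frame(
  State = rep(c("Oklahoma", "Texas", "Kansas"), each = 8),
  Religious.private. = rep(c(TRUE, FALSE), times = 12),
  Married.ages.23.25..pct. = runif(24, min = 0.1, max = 0.4)
)

# Mean percentage married at 23-25, one row per state.
state_means <- aggregate(Married.ages.23.25..pct. ~ State,
                         data = Neighbors, FUN = mean)

# Same mean, one row per state and Private Religious combination.
cell_means <- aggregate(Married.ages.23.25..pct. ~ State + Religious.private.,
                        data = Neighbors, FUN = mean)

state_means
cell_means
```

Because the second call groups by both columns, it works however Religious.private. is coded, and the Private Religious rows can be read straight off the result.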