This is a continuation of this project https://rpubs.com/bekkahmoore/755806. This will be the final part!
This test will be very similar to the ANOVA test I ran in part 4 of the project, but this time I will use the Kruskal-Wallis test to compare the medians of the three neighboring states. I’m going to add that code here again to define “Neighbors”:
Neighbors <- data[which(data$State %in% c("Oklahoma", "Texas", "Kansas")), ]
I’m going to hypothesize that the median percentage of older married couples (ages 32-34) is equal across OK, KS, and TX.
\[ H_0: \text{The medians of older married couples in OK, TX, and KS are equal} \]
\[ H_A: \text{The medians of older married couples in OK, TX, and KS are NOT equal} \]
Let’s run the test.
model = kruskal.test(data$Married.ages.32.34..pct.~data$State, data = Neighbors)
model
##
## Kruskal-Wallis rank sum test
##
## data: data$Married.ages.32.34..pct. by data$State
## Kruskal-Wallis chi-squared = 252.02, df = 48, p-value < 2.2e-16
ggplot(data = Neighbors, aes(x = data$State, y = data$Married.ages.32.34..pct.)) +
  geom_boxplot()
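That p-value is teeny tiny, so we can reject the null hypothesis that the medians are equal between the states. One caveat: df = 48 means the test compared 49 groups, so it ran on every state in data rather than on the three in Neighbors. The data$ prefixes in the formula pull columns from the full data set and ignore data = Neighbors, so this result does not yet answer the three-state question. The Kruskal test gave a similar conclusion to the ANOVA from part 4. The two are not the same kind of test, though: ANOVA is parametric (it compares means and assumes roughly normal residuals), while Kruskal-Wallis is its rank-based, non-parametric counterpart.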
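Here is a minimal sketch of how the test could be limited to the three neighboring states. The data frame below is made up so the example runs on its own (the real `data` has many more rows and columns), and it assumes `State` holds full state names as in the `Neighbors` definition. The key change is using bare column names in the formula so that `data = Neighbors` is actually honored.

```r
# Made-up stand-in for the project's `data`; the values are illustrative only.
set.seed(1)
data <- data.frame(
  State = rep(c("Oklahoma", "Texas", "Kansas", "Ohio", "Utah"), each = 8),
  Married.ages.32.34..pct. = runif(40, min = 0.3, max = 0.9)
)

# Same subset as in the post.
Neighbors <- data[which(data$State %in% c("Oklahoma", "Texas", "Kansas")), ]

# Bare column names are looked up inside `Neighbors`,
# so only the three neighboring states enter the test.
model <- kruskal.test(Married.ages.32.34..pct. ~ State, data = Neighbors)
model

# Degrees of freedom = number of groups minus one, so 2 for three states.
model$parameter
```

A quick sanity check on the real data would be to confirm that the reported df is 2; anything larger means more than three states made it into the test.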
I’ll now look at the correlation between the older married couples and younger married couples using Spearman’s rank correlation. I did this back in the Regression project, but we’ll take a look again here:
\[ H_0: \text{The Spearman correlation between the group aged (32-34) and the group aged (23-25) is zero} \]
\[ H_A: \text{The Spearman correlation between the group aged (32-34) and the group aged (23-25) is NOT zero} \]
Let’s run it.
cor.test(data$Married.ages.32.34..pct., data$Married.ages.23.25..pct., method = "spearman")
##
## Spearman's rank correlation rho
##
## data: data$Married.ages.32.34..pct. and data$Married.ages.23.25..pct.
## S = 39160198, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.5738571
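The estimated rho is about 0.57, which points to a moderate positive relationship: places with a higher percentage married at ages 32-34 tend to have a higher percentage married at ages 23-25 as well. The p-value is below 2.2e-16, so we can reject the null hypothesis that the Spearman correlation is zero.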
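For reading results like these, it can help to pull the estimate and p-value out of the object that `cor.test()` returns. This is only a sketch: the two columns below are randomly generated stand-ins that reuse the project's column names, and the 0.05 cutoff is an assumed significance level, not one stated in the project.

```r
# Made-up stand-in data; the values are illustrative only.
set.seed(2)
older <- runif(30, min = 0.3, max = 0.9)
toy <- data.frame(
  Married.ages.32.34..pct. = older,
  Married.ages.23.25..pct. = older / 2 + rnorm(30, sd = 0.1)
)

result <- cor.test(toy$Married.ages.32.34..pct.,
                   toy$Married.ages.23.25..pct.,
                   method = "spearman")

rho <- unname(result$estimate)  # Spearman's rho, always between -1 and 1
p   <- result$p.value           # p-value for H0: rho = 0

# Assumed decision rule: reject H0 when p falls below 0.05.
reject_null <- p < 0.05
cat("rho =", round(rho, 3), " p =", signif(p, 3), " reject H0:", reject_null, "\n")
```

The sign of rho gives the direction of the relationship and its size gives the strength, while the p-value only says whether a rho of zero is plausible.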
# color is set outside aes() so the points are actually drawn in red
ggplot(data = Neighbors, aes(data$State, data$Married.ages.32.34..pct.)) +
  geom_point(color = "red")
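As much as I tried to limit the visuals to the Neighbors data (OK, TX, and KS only), the plots still show every state. The likely culprit is the data$ prefix inside aes(), which pulls columns from the full data set instead of from Neighbors. The plots were still pretty neat, though viewing all of the states’ data at once feels a little overwhelming.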
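One way the boxplot could be restricted to the three states is sketched below. The data frame is made up so the example runs on its own, and it assumes `State` holds full state names. Inside `aes()` the columns are referenced by bare name, so ggplot2 looks them up in whatever was passed as `data`.

```r
library(ggplot2)

# Made-up stand-in for the project's `data`; the values are illustrative only.
set.seed(3)
data <- data.frame(
  State = rep(c("Oklahoma", "Texas", "Kansas", "Ohio", "Utah"), each = 8),
  Married.ages.32.34..pct. = runif(40, min = 0.3, max = 0.9)
)
Neighbors <- data[data$State %in% c("Oklahoma", "Texas", "Kansas"), ]

# Bare column names inside aes() are resolved against `Neighbors`,
# so only Oklahoma, Texas, and Kansas appear on the x-axis.
p <- ggplot(data = Neighbors, aes(x = State, y = Married.ages.32.34..pct.)) +
  geom_boxplot()
p
```

If the real plot comes out empty with this approach, it is worth checking `nrow(Neighbors)` first, since a mismatch between the state names in the filter and the values stored in `State` would leave the subset with no rows.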
This project has been very fun, and very informative. I have learned a lot about R while investigating the myth behind “Ring by Spring” at private religious universities.