summary(marriage_data1)
##      Name              State           Married.ages.32.34..pct.
##  Length:820         Length:820         Min.   :0.1735          
##  Class :character   Class :character   1st Qu.:0.5394          
##  Mode  :character   Mode  :character   Median :0.5993          
##                                        Mean   :0.5878          
##                                        3rd Qu.:0.6544          
##                                        Max.   :0.8474          
##  Married.ages.23.25..pct.    HBCU.           Religious.private.
##  Min.   :0.009272         Length:820         Length:820        
##  1st Qu.:0.055751         Class :character   Class :character  
##  Median :0.111568         Mode  :character   Mode  :character  
##  Mean   :0.131132                                              
##  3rd Qu.:0.181810                                              
##  Max.   :0.566885                                              
##  Nonsectarian.private.
##  Length:820           
##  Class :character     
##  Mode  :character     
##                       
##                       
## 

For my quantitative variable, I will focus on the share of students at each university in my data set who are married at ages 23-25 (fresh out of college). The values are stored as proportions, so 0.13 means 13 percent. Below are the mean, standard deviation, and five-number summary for this variable, respectively.

mean(marriage_data1$Married.ages.23.25..pct.)
## [1] 0.1311321
sd(marriage_data1$Married.ages.23.25..pct.)
## [1] 0.09386227
fivenum(marriage_data1$Married.ages.23.25..pct.)
## [1] 0.00927220 0.05564775 0.11156760 0.18184130 0.56688480
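
The quartiles printed by summary() (0.055751 and 0.181810) differ slightly from the second and fourth values printed by fivenum() (0.05564775 and 0.18184130). That is expected, because the two functions use different quartile definitions. Here is a minimal sketch on a made-up vector (the values 1 to 8 are purely illustrative and do not come from the data set):

```r
# Toy vector, purely illustrative (not from marriage_data1)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)

# fivenum() returns Tukey's hinges: medians of the lower and upper halves
hinges <- fivenum(x)

# quantile() with its default type = 7 interpolates between order statistics;
# summary() relies on this same default for its quartiles
quartiles <- quantile(x, probs = c(0.25, 0.75))

print(hinges)     # min, lower hinge, median, upper hinge, max
print(quartiles)  # 25% and 75% quantiles
```

On a data set with 820 rows the two definitions land very close together, which matches the small gap seen above.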

To give a visual of the data, I will create a histogram, box plot, and normal Q-Q plot for this variable:

hist(marriage_data1$Married.ages.23.25..pct.)

boxplot(marriage_data1$Married.ages.23.25..pct.)

qqnorm(marriage_data1$Married.ages.23.25..pct.)

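My data is definitely skewed to the right: most colleges sit at the low end of the histogram and a long tail stretches toward the higher values, which is why the mean (0.131) is above the median (0.112). By the usual 1.5 × IQR rule the upper fence is about 0.37, so the maximum of 0.567 does count as an outlier, even though it looks like part of the long tail rather than a mistake in the data. I think the shape comes from how the variable is defined: it is the share of each college's students in that age range who are married, not a count. The actual number of married students might be bigger at larger universities, but as you can tell, the majority of the colleges have between 0 and 0.15 (0 to 15 percent) of the people in this age range married.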
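
A quick numerical check of outliers and skew direction can be done with only the statistics already printed above. This sketch applies the 1.5 × IQR rule to the fivenum() hinges, which is the same rule boxplot() uses by default for its whiskers. The variable names are made up for the illustration.

```r
# Five-number summary and mean as reported above by fivenum() and mean()
five <- c(min = 0.00927220, q1 = 0.05564775, median = 0.11156760,
          q3 = 0.18184130, max = 0.56688480)
reported_mean <- 0.1311321

# 1.5 * IQR fences built from the hinges
iqr <- five[["q3"]] - five[["q1"]]
lower_fence <- five[["q1"]] - 1.5 * iqr
upper_fence <- five[["q3"]] + 1.5 * iqr

# TRUE if the largest value lies beyond the upper fence
max_is_outlier <- five[["max"]] > upper_fence
# TRUE suggests a longer right tail (mean pulled above the median)
mean_above_median <- reported_mean > five[["median"]]

print(c(lower_fence = lower_fence, upper_fence = upper_fence))
print(c(max_is_outlier = max_is_outlier, mean_above_median = mean_above_median))
```

The lower fence comes out negative, so no proportion can fall below it; only the upper tail can produce flagged points here.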

The categorical variable I want to focus on is whether the private college is religiously affiliated or not. I will make a frequency and relative frequency table for this.

table(marriage_data1$Religious.private.)
## 
##  No Yes 
## 505 315
table(marriage_data1$Religious.private.)/length(marriage_data1$Religious.private.)
## 
##        No       Yes 
## 0.6158537 0.3841463
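
As a side note, base R also has prop.table() for relative frequencies, which avoids dividing by the length by hand. A minimal sketch, with the counts typed in from the frequency table above so it runs without the data file:

```r
# Counts copied from the frequency table above
counts <- as.table(c(No = 505, Yes = 315))

# prop.table() divides each cell by the table total
rel_freq <- prop.table(counts)

print(round(rel_freq, 4))
```

With the data frame loaded, prop.table(table(marriage_data1$Religious.private.)) should give the same relative frequencies, as long as the column has no missing values.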

I’ll now make a two-way table to look at the association between religious affiliation and state. I am curious to see whether more of the religious colleges in the data come from the South, or the “Bible Belt” of America.

table(marriage_data1$State, marriage_data1$Religious.private.)
##     
##      No Yes
##   AL 12   4
##   AR  7   1
##   CA 42  15
##   CO 10   1
##   CT 10   2
##   DC  2   3
##   DE  1   0
##   FL 16   4
##   GA 16   8
##   HI  1   1
##   IA  6  10
##   ID  2   1
##   IL 16  19
##   IN  6  14
##   KS  7   6
##   KY  6   6
##   LA  8   5
##   MA 31   7
##   MD  4   1
##   ME  3   1
##   MI 18   9
##   MN  1  14
##   MO 12  10
##   MS  4   1
##   MT  5   1
##   NC 16  18
##   ND  4   0
##   NE  0   3
##   NH  8   1
##   NJ 14   8
##   NM  3   0
##   NV  2   0
##   NY 63  11
##   OH 12  22
##   OK  9   2
##   OR  9   4
##   PA 33  33
##   RI  5   2
##   SC 10   3
##   SD  0   1
##   TN  8  12
##   TX 23  21
##   UT  3   1
##   VA 16   6
##   VT  4   2
##   WA  8   6
##   WI  4  12
##   WV  4   3
##   WY  1   0

I was very surprised to see that 33 of PA’s colleges were religious. CA’s number surprised me too. The Southern states (TX, OK, LA, TN, GA, AL, etc.) are not overwhelmingly religiously affiliated. Actually, the only Southern states with more religiously affiliated colleges surveyed than not were Tennessee (12 vs. 8) and North Carolina (18 vs. 16). I’m not going to disregard the idea that living in the South plays a large role in young marriage yet, but that two-way table works against that notion.
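
Raw counts are hard to compare across states of different sizes, so row proportions can help. This is a minimal sketch with the counts typed in from the two-way table above for the Southern states named in the text plus NC; the choice of states is only for illustration.

```r
# Counts copied from the two-way table above (No, Yes per state)
counts <- matrix(
  c(23, 21,   # TX
     9,  2,   # OK
     8,  5,   # LA
     8, 12,   # TN
    16,  8,   # GA
    12,  4,   # AL
    16, 18),  # NC
  ncol = 2, byrow = TRUE,
  dimnames = list(State = c("TX", "OK", "LA", "TN", "GA", "AL", "NC"),
                  Religious = c("No", "Yes"))
)

# margin = 1 makes each row sum to 1, so the "Yes" column is the religious share
row_share <- prop.table(counts, margin = 1)
print(round(row_share, 3))

# States where religious colleges outnumber non-religious ones
majority_religious <- rownames(counts)[counts[, "Yes"] > counts[, "No"]]
print(majority_religious)
```

The same idea applies to the full table: wrapping the table() call in prop.table(..., margin = 1) gives the religious share for every state.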

#Hypothesis Testing#

The first test I will conduct deals with the mean for the older age group. My null hypothesis is that the average proportion of college graduates who are married at ages 32-34 is 0.50 (50 percent), based on the data I’ve seen. My alternative hypothesis is that the average is higher than 0.50. I first need to figure out what the standard deviation is for the 32-34 age group.

sd(marriage_data1$Married.ages.32.34..pct.)
## [1] 0.1067439
set.seed(100)
x <- round(rnorm(30, mean = .50, sd = 1 ),0)
t.test(x, mu=.50)
## 
##  One Sample t-test
## 
## data:  x
## t = 0.50162, df = 29, p-value = 0.6197
## alternative hypothesis: true mean is not equal to 0.5
## 95 percent confidence interval:
##  0.2948524 0.8384810
## sample estimates:
## mean of x 
## 0.5666667

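My 95% confidence interval (0.295 to 0.838) includes 0.50 and the p-value is 0.6197, so I fail to reject the null hypothesis. Keep in mind that x here is a simulated sample of 30 rounded normal draws with mean 0.50 and standard deviation 1, not the actual 32-34 column, and that the test is two-sided, so this result does not directly address my alternative hypothesis about the real data.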
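
For comparison, the one-sided test against 0.50 can be worked out directly from the statistics this report prints for the 32-34 column (n = 820, mean 0.5878171, standard deviation 0.1067439). This is a sketch under the assumption that those printed values are accurate; the variable names are made up for the illustration.

```r
# Summary statistics reported in this report for Married.ages.32.34..pct.
n    <- 820        # number of colleges
xbar <- 0.5878171  # sample mean
s    <- 0.1067439  # sample standard deviation
mu0  <- 0.50       # hypothesized mean under the null

# One-sample t statistic: distance from mu0 in standard-error units
se     <- s / sqrt(n)
t_stat <- (xbar - mu0) / se

# One-sided p-value for the alternative "mean > mu0", with n - 1 degrees of freedom
p_val <- pt(t_stat, df = n - 1, lower.tail = FALSE)

print(c(t = t_stat, p_value = p_val))
```

With the data frame loaded, the equivalent call would be t.test(marriage_data1$Married.ages.32.34..pct., mu = 0.50, alternative = "greater").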

Now I will conduct a hypothesis test to compare the two means from my age groups. My null hypothesis is that the average marriage rate is the same in the younger and older age groups. The alternative hypothesis is that the two means differ, and the sign of the difference will show whether the 32-34 year old group actually has the higher mean.

t.test(marriage_data1$Married.ages.23.25..pct., marriage_data1$Married.ages.32.34..pct.)
## 
##  Welch Two Sample t-test
## 
## data:  marriage_data1$Married.ages.23.25..pct. and marriage_data1$Married.ages.32.34..pct.
## t = -92.003, df = 1611.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.4664212 -0.4469488
## sample estimates:
## mean of x mean of y 
## 0.1311321 0.5878171

From that test, we can see that the mean of the 32-34 age group (0.588) is substantially larger than that of the younger group (0.131), and with a p-value below 2.2e-16 the difference is statistically significant. I am surprised by this statistic, actually. I guess the older group has had more time to develop the relationship and make sure their partner is the right one to marry, rather than being pushed by “ring by spring” pressure into a proposal that never ends in a marriage.
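
To see where the t value and degrees of freedom in the Welch output come from, they can be recomputed from the means and standard deviations printed earlier. This is a sketch that assumes both columns have all 820 values present; the variable names are made up for the illustration.

```r
# Statistics reported above for the two columns
n  <- 820
m1 <- 0.1311321; s1 <- 0.09386227  # ages 23-25
m2 <- 0.5878171; s2 <- 0.1067439   # ages 32-34

# Squared standard error of each mean
v1 <- s1^2 / n
v2 <- s2^2 / n

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (m1 - m2) / sqrt(v1 + v2)
df     <- (v1 + v2)^2 / (v1^2 / (n - 1) + v2^2 / (n - 1))

print(c(t = t_stat, df = df))
```

Because both columns describe the same 820 colleges, a paired test, t.test(x, y, paired = TRUE), might be worth considering as well; the Welch test above treats the two groups as independent samples.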

#Highlights#

This report was really fun for me to do. Some of the findings I found most interesting:

There are not substantially more religiously affiliated colleges in the South. Places like PA and CA that have larger populations had a lot of them. I thought geographic location would make a difference, since the South is kind of known for being more “religious”.

The older group has a much higher percentage of marriage than the younger. I explained this in detail just above, but I still think it’s cool. It made me shift my perspective. A question I wish my data answered is what the divorce rate of the younger age group is, and how much the pressure of “Ring by Spring” impacts lives in that sense. Maybe something to look into at a later time.