Data Preparation

Data Source

https://github.com/fivethirtyeight/data/tree/master/marriage

# load data
library(psych)
library(ggplot2)
library(plyr)
divorce_df <- read.table("https://raw.githubusercontent.com/fivethirtyeight/data/master/marriage/divorce.csv", 
                 header = TRUE, sep = ",")
# The analysis splits divorce rates into two age groups:
# 1) people aged 35 to 44
# 2) people aged 45 to 54

# Create one data frame per age group, keeping only the year and the divorce rates by education level

Edu_divorce_rates_35_to_44 <- divorce_df[c('year','all_3544','HS_3544','SC_3544','BAp_3544','BAo_3544','GD_3544')]
Edu_divorce_rates_45_to_54 <- divorce_df[c('year','all_4554','HS_4554','SC_4554','BAp_4554','BAo_4554','GD_4554')]

# Rename columns to more descriptive names (rename() comes from plyr)

Edu_divorce_rates_35_to_44 <- rename(Edu_divorce_rates_35_to_44,
  c("all_3544" = "ALL", "HS_3544" = "High_School", "SC_3544" = "Some_college",
    "BAp_3544" = "Bachelors_degree_more", "BAo_3544" = "Bachelors_degree_only",
    "GD_3544" = "Graduate_degree"))

Edu_divorce_rates_45_to_54 <- rename(Edu_divorce_rates_45_to_54,
  c("all_4554" = "ALL", "HS_4554" = "High_School", "SC_4554" = "Some_college",
    "BAp_4554" = "Bachelors_degree_more", "BAo_4554" = "Bachelors_degree_only",
    "GD_4554" = "Graduate_degree"))

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

Are divorce rates higher for couples with a high school diploma than for couples with a college degree?

Is education level correlated with divorce rates among married couples?
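As a first descriptive look at the first question, the year-by-year gap between two rate columns can be computed directly. This is a minimal sketch, not part of the original analysis: `rate_gap` and its argument names are hypothetical, and it assumes the data frame has a `year` column plus the two rate columns named in the call.

```r
# Hypothetical helper (not in the original analysis): difference between
# two rate columns for each year, returned alongside the year.
rate_gap <- function(df, col_a, col_b, year_col = "year") {
  stopifnot(all(c(col_a, col_b, year_col) %in% names(df)))
  data.frame(
    year = df[[year_col]],
    gap  = df[[col_a]] - df[[col_b]]  # positive when col_a has the higher rate
  )
}

# Example call on the data frame loaded above (requires divorce_df):
# gaps <- rate_gap(divorce_df, "HS_3544", "BAp_3544")
# mean(gaps$gap > 0, na.rm = TRUE)  # share of years with a higher high-school rate
```

A consistently positive gap would be in line with the first research question, but on its own it is only descriptive and does not establish a cause.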

Cases

What are the cases, and how many are there?

Each case is one year of divorce-rate data. There are 17 observations in the data set.

Data collection

Describe the method of data collection.

The data come from the American Community Survey (years 2001-2012), via IPUMS USA.

Type of study

What type of study is this (observational/experiment)?

Observational study

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

The data were collected by the American Community Survey, obtained via IPUMS USA, and are available online here: https://github.com/fivethirtyeight/data/tree/master/marriage. For this project, the data were loaded with read.table().

Ben Casselman, FiveThirtyEight (2014), GitHub repository, https://github.com/fivethirtyeight/data/tree/master/marriage

Response

What is the response variable, and what type is it (numerical/categorical)?

The divorce rate at each education level. It is a numerical variable.

Explanatory

What is the explanatory variable, and what type is it (numerical/categorical)?

Education level. It is a categorical variable.
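In the data as loaded, education level is spread across columns (one column per level) rather than stored as a single categorical variable. A minimal sketch of stacking those columns into a long table, so that education level becomes a factor and the rate a single numeric column, might look like the following. The helper `to_long` and the output column names are hypothetical, and the sketch assumes all rate columns are numeric.

```r
# Hypothetical helper (not in the original analysis): stack wide rate
# columns into long format with one row per (year, education level).
to_long <- function(df, rate_cols, year_col = "year") {
  stopifnot(all(c(year_col, rate_cols) %in% names(df)))
  data.frame(
    year      = rep(df[[year_col]], times = length(rate_cols)),
    education = factor(rep(rate_cols, each = nrow(df)), levels = rate_cols),
    rate      = unlist(df[rate_cols], use.names = FALSE)  # columns concatenated in order
  )
}

# Example call on the data frame loaded above (requires divorce_df):
# long_3544 <- to_long(divorce_df,
#                      c("HS_3544", "SC_3544", "BAp_3544", "BAo_3544", "GD_3544"))
```

In this layout `education` is the categorical explanatory variable and `rate` is the numerical response, which matches the description above.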

Relevant summary statistics

Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

summary(divorce_df$HS_3544)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.03489 0.17254 0.17545 0.16015 0.18838 0.19240
summary(divorce_df$BAo_3544)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.02751 0.10711 0.11086 0.10182 0.11186 0.11853
describe(divorce_df$all_3544)
##    vars  n mean   sd median trimmed  mad  min  max range  skew kurtosis
## X1    1 17 0.14 0.04   0.16    0.15 0.01 0.03 0.17  0.13 -1.83      1.8
##      se
## X1 0.01
# Plot divorce rate by year for the high school group (ages 35 to 44)
plot(Edu_divorce_rates_35_to_44[c('year','High_School')])
lines(Edu_divorce_rates_35_to_44[c('year','High_School')])

# Plot divorce rate by year for the bachelor's degree or more group (ages 35 to 44)
plot(Edu_divorce_rates_35_to_44[c('year','Bachelors_degree_more')])
lines(Edu_divorce_rates_35_to_44[c('year','Bachelors_degree_more')])

ggplot(Edu_divorce_rates_35_to_44, aes(x=High_School)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(Edu_divorce_rates_35_to_44, aes(x=Bachelors_degree_more)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
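The prompt for this section asks for means, SDs, and sample sizes for each group. A minimal sketch that collects those three numbers for several rate columns at once is shown below. The helper `group_stats` is hypothetical and not part of the original analysis; it assumes the listed columns are numeric and counts only non-missing values.

```r
# Hypothetical helper (not in the original analysis): per-column sample
# size, mean, and standard deviation, ignoring missing values.
group_stats <- function(df, cols) {
  stopifnot(all(cols %in% names(df)))
  data.frame(
    group = cols,
    n     = vapply(cols, function(col) sum(!is.na(df[[col]])), integer(1),
                   USE.NAMES = FALSE),
    mean  = vapply(cols, function(col) mean(df[[col]], na.rm = TRUE), numeric(1),
                   USE.NAMES = FALSE),
    sd    = vapply(cols, function(col) sd(df[[col]], na.rm = TRUE), numeric(1),
                   USE.NAMES = FALSE)
  )
}

# Example call on the data frame loaded above (requires divorce_df):
# group_stats(divorce_df, c("HS_3544", "BAp_3544"))
```

This puts the groups named in the research question side by side in one table instead of in separate summary() calls.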