Overview

Data Source: https://data.world/siyeh/state-marriage-rate
This dataset includes data about the number of marriages per 1,000 total population from 1990 to 2016. The data is broken apart into census regions, census division, and state.

Cleaning the Data

Importing the data This data is being loaded into a dataframe from a csv stored on GitHub.

url_path <- "https://raw.githubusercontent.com/devinteran/Data607-Project2/master/marriage_data%20-%20Sheet1.csv"
marriage_data_raw <- read_csv(url_path)
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   state = col_character(),
##   census_division = col_character(),
##   census_region = col_character()
## )
## See spec(...) for full column specifications.
marriage_data <- marriage_data_raw

Let’s view the first 10 columns of the data:

kable(head(marriage_data[,1:10])) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
state census_division census_region 2016 2015 2014 2013 2012 2011 2010
Alabama East South Central South 7.1478 7.3515 7.8068 7.8178 8.2 8.4 8.2
Alaska Pacific West 7.1034 7.4076 7.5088 7.2939 7.2 7.8 8.0
Arizona Mountain West 5.9305 5.9225 5.7804 5.4011 5.6 5.7 5.9
Arkansas West South Central South 9.8610 10.0403 10.1120 9.7511 10.9 10.4 10.8
California Pacific West 6.4636 6.1850 6.4415 6.4605 6.0 5.8 5.8
Colorado Mountain West 7.4254 6.7918 7.0616 6.4527 6.8 7.0 6.9

We need to pivot the data into a longer format so that each row contains one observation for a single year.

marriage_data_long <- marriage_data %>% pivot_longer(col=starts_with(c("20","19")),names_to="Year",values_to='count')

The resulting data look clean. We’re ready to begin our analysis:

kable(head(marriage_data_long)) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
state census_division census_region Year count
Alabama East South Central South 2016 7.1478
Alabama East South Central South 2015 7.3515
Alabama East South Central South 2014 7.8068
Alabama East South Central South 2013 7.8178
Alabama East South Central South 2012 8.2000
Alabama East South Central South 2011 8.4000

Data Exploration

First, what census region and census divisions do we have in our data?

region_plus_division <- arrange(unique(select(drop_na(marriage_data_long),census_region,census_division)),census_region,census_division)
kable(region_plus_division) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
census_region census_division
Midwest East North Central
Midwest West North Central
Northeast Middle Atlantic
Northeast New England
South East South Central
South South Atlantic
South West South Central
West Mountain
West Pacific
region_division_data <- select(drop_na(marriage_data_long),census_region,census_division,Year,count)
region <-  group_by(marriage_data_long,census_region,Year)
region_marriages <- summarize(drop_na(region),Marriages = sum(count))

How have has the number of marriages trended regionally over the past 35 years?

#Separate out for easier y-axis limits
mw_ne <- filter(region_marriages,census_region %in% c('Midwest','Northeast'))
s_w <- filter(region_marriages,census_region %in% c('South','West'))

g1 <- ggplot(mw_ne,aes(fill=census_region,x = Year,y = Marriages)) +
  geom_bar(position="dodge",stat = "identity") +
  facet_wrap(~census_region) + 
  theme(axis.text.x = element_text(angle = 90)) +
  ylab("Count Marriages") 
  scale_fill_manual(values= c("#999999", "#E69F00"))
## <ggproto object: Class ScaleDiscrete, Scale, gg>
##     aesthetics: fill
##     axis_order: function
##     break_info: function
##     break_positions: function
##     breaks: waiver
##     call: call
##     clone: function
##     dimension: function
##     drop: TRUE
##     expand: waiver
##     get_breaks: function
##     get_breaks_minor: function
##     get_labels: function
##     get_limits: function
##     guide: legend
##     is_discrete: function
##     is_empty: function
##     labels: waiver
##     limits: NULL
##     make_sec_title: function
##     make_title: function
##     map: function
##     map_df: function
##     n.breaks.cache: NULL
##     na.translate: TRUE
##     na.value: NA
##     name: waiver
##     palette: function
##     palette.cache: NULL
##     position: left
##     range: <ggproto object: Class RangeDiscrete, Range, gg>
##         range: NULL
##         reset: function
##         train: function
##         super:  <ggproto object: Class RangeDiscrete, Range, gg>
##     reset: function
##     scale_name: manual
##     train: function
##     train_df: function
##     transform: function
##     transform_df: function
##     super:  <ggproto object: Class ScaleDiscrete, Scale, gg>
g2 <- ggplot(s_w,aes(fill=census_region,x = Year,y = Marriages)) +
  geom_bar(position="dodge",stat = "identity") +
  facet_wrap(~census_region) + 
  theme(axis.text.x = element_text(angle = 90)) +
  ylab("Count Marriages") 

grid.arrange(g1,g2,top="Marriages Per 1,000 Population")

#Marriages in the West
Here we’ve separated out states into four separate graphs in alphabetical order. This was only done in order to see the data more easily.
* Hawaii,Nevada,Utah,Washington,Wyoming have had the largest decrease in marriages
* The number of marraiges in Oregon and New Mexico have been very consistent over the years
* Hawaii had a strange increase in marriages, peaking in 2004, then dropping pretty consistently in years since

west_marriage_a_c <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Alaska","Arizona","California","Colorado")),census_region,state,Year,count)
west_marriage_d_m <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Hawaii","Idaho","Montana")),census_region,state,Year,count)
west_marriage_n_o <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Nevada","New Mexico","Oregon")),census_region,state,Year,count)
west_marriage_p_z <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Utah","Washington","Wyoming")),census_region,state,Year,count)
#States A-C
a <- ggplot(west_marriage_a_c, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
  geom_line(linetype = "dashed")+
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))
#States D-M
b <- ggplot(west_marriage_d_m, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
  geom_line(linetype = "dashed")+
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))
#States N-O
c <- ggplot(west_marriage_n_o, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
  geom_line(linetype = "dashed")+
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))
#States P-Z
d <- ggplot(west_marriage_p_z, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
  geom_line(linetype = "dashed")+
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))

grid.arrange(a,b,c,d,nrow=2,top = "Marriages Per 1000 Per State - West Region")

States that appear to have dropped

drop_marriages <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Hawaii","Nevada","Utah","Washington","Wyoming")),census_region,state,Year,count)

ggplot(drop_marriages, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
  geom_line(linetype = "dashed")+
  geom_point() +
  theme(axis.text.x = element_text(angle = 90)) +
  ggtitle("Marriages Per 1000") + 
  theme(plot.title = element_text(hjust = 0.5))

Conclusion

Overall, the marriage rate in the United States has dropped most significantly in Nevada in the West from 1990 to 2016. If we have additional information it would be interesting to investigate if this trend has any correlation with Las Vegas since many people go to Las Vegas on a whim to get married. Maybe certain laws were put into place that prevented travelers from getting married there. Perhaps, this trend has nothing to do with Las Vegas at all. City data would help us determine that.

It would also be interesting to include additional analysis to see which states had the highest increase in marriage rates from 1990 to 2016.