Overview

Data Source: https://data.world/siyeh/state-marriage-rate
This dataset includes data about the number of marriages per 1,000 total population from 1990 to 2016. The data is broken apart into census regions, census division, and state.

Cleaning the Data

Importing the data This data is being loaded into a dataframe from a csv stored on GitHub.

url_path <- "https://raw.githubusercontent.com/devinteran/Data607-Project2/master/marriage_data%20-%20Sheet1.csv"
marriage_data_raw <- read_csv(url_path)

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   state = col_character(),
##   census_division = col_character(),
##   census_region = col_character()
## )

## See spec(...) for full column specifications.

marriage_data <- marriage_data_raw

Let’s view the first 10 columns of the data:

kable(head(marriage_data[,1:10])) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

state	census_division	census_region	2016	2015	2014	2013	2012	2011	2010
Alabama	East South Central	South	7.1478	7.3515	7.8068	7.8178	8.2	8.4	8.2
Alaska	Pacific	West	7.1034	7.4076	7.5088	7.2939	7.2	7.8	8.0
Arizona	Mountain	West	5.9305	5.9225	5.7804	5.4011	5.6	5.7	5.9
Arkansas	West South Central	South	9.8610	10.0403	10.1120	9.7511	10.9	10.4	10.8
California	Pacific	West	6.4636	6.1850	6.4415	6.4605	6.0	5.8	5.8
Colorado	Mountain	West	7.4254	6.7918	7.0616	6.4527	6.8	7.0	6.9

We need to pivot the data into a longer format so that each row contains one observation for a single year.

marriage_data_long <- marriage_data %>% pivot_longer(col=starts_with(c("20","19")),names_to="Year",values_to='count')

The resulting data look clean. We’re ready to begin our analysis:

kable(head(marriage_data_long)) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

state	census_division	census_region	Year	count
Alabama	East South Central	South	2016	7.1478
Alabama	East South Central	South	2015	7.3515
Alabama	East South Central	South	2014	7.8068
Alabama	East South Central	South	2013	7.8178
Alabama	East South Central	South	2012	8.2000
Alabama	East South Central	South	2011	8.4000

Data Exploration

First, what census region and census divisions do we have in our data?

region_plus_division <- arrange(unique(select(drop_na(marriage_data_long),census_region,census_division)),census_region,census_division)
kable(region_plus_division) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

census_region	census_division
Midwest	East North Central
Midwest	West North Central
Northeast	Middle Atlantic
Northeast	New England
South	East South Central
South	South Atlantic
South	West South Central
West	Mountain
West	Pacific

region_division_data <- select(drop_na(marriage_data_long),census_region,census_division,Year,count)
region <-  group_by(marriage_data_long,census_region,Year)
region_marriages <- summarize(drop_na(region),Marriages = sum(count))

How have has the number of marriages trended regionally over the past 35 years?

There has been a slight decrease in marriages across all regions
The largest decrease in marriages is happening the West, let’s dive further into this region

#Separate out for easier y-axis limits
mw_ne <- filter(region_marriages,census_region %in% c('Midwest','Northeast'))
s_w <- filter(region_marriages,census_region %in% c('South','West'))

g1 <- ggplot(mw_ne,aes(fill=census_region,x = Year,y = Marriages)) +
  geom_bar(position="dodge",stat = "identity") +
  facet_wrap(~census_region) + 
  theme(axis.text.x = element_text(angle = 90)) +
  ylab("Count Marriages") 
  scale_fill_manual(values= c("#999999", "#E69F00"))

## <ggproto object: Class ScaleDiscrete, Scale, gg>
##     aesthetics: fill
##     axis_order: function
##     break_info: function
##     break_positions: function
##     breaks: waiver
##     call: call
##     clone: function
##     dimension: function
##     drop: TRUE
##     expand: waiver
##     get_breaks: function
##     get_breaks_minor: function
##     get_labels: function
##     get_limits: function
##     guide: legend
##     is_discrete: function
##     is_empty: function
##     labels: waiver
##     limits: NULL
##     make_sec_title: function
##     make_title: function
##     map: function
##     map_df: function
##     n.breaks.cache: NULL
##     na.translate: TRUE
##     na.value: NA
##     name: waiver
##     palette: function
##     palette.cache: NULL
##     position: left
##     range: <ggproto object: Class RangeDiscrete, Range, gg>
##         range: NULL
##         reset: function
##         train: function
##         super:  <ggproto object: Class RangeDiscrete, Range, gg>
##     reset: function
##     scale_name: manual
##     train: function
##     train_df: function
##     transform: function
##     transform_df: function
##     super:  <ggproto object: Class ScaleDiscrete, Scale, gg>

g2 <- ggplot(s_w,aes(fill=census_region,x = Year,y = Marriages)) +
  geom_bar(position="dodge",stat = "identity") +
  facet_wrap(~census_region) + 
  theme(axis.text.x = element_text(angle = 90)) +
  ylab("Count Marriages") 

grid.arrange(g1,g2,top="Marriages Per 1,000 Population")

#Marriages in the West
Here we’ve separated out states into four separate graphs in alphabetical order. This was only done in order to see the data more easily.
* Hawaii,Nevada,Utah,Washington,Wyoming have had the largest decrease in marriages
* The number of marraiges in Oregon and New Mexico have been very consistent over the years
* Hawaii had a strange increase in marriages, peaking in 2004, then dropping pretty consistently in years since

west_marriage_a_c <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Alaska","Arizona","California","Colorado")),census_region,state,Year,count)
west_marriage_d_m <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Hawaii","Idaho","Montana")),census_region,state,Year,count)
west_marriage_n_o <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Nevada","New Mexico","Oregon")),census_region,state,Year,count)
west_marriage_p_z <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Utah","Washington","Wyoming")),census_region,state,Year,count)
#States A-C
a <- ggplot(west_marriage_a_c, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
  geom_line(linetype = "dashed")+
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))
#States D-M
b <- ggplot(west_marriage_d_m, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
  geom_line(linetype = "dashed")+
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))
#States N-O
c <- ggplot(west_marriage_n_o, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
  geom_line(linetype = "dashed")+
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))
#States P-Z
d <- ggplot(west_marriage_p_z, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
  geom_line(linetype = "dashed")+
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))

grid.arrange(a,b,c,d,nrow=2,top = "Marriages Per 1000 Per State - West Region")

States that appear to have dropped

Here it’s clear the state having the biggest overaly impact and decrease in marriages is Nevada
If we had more granular data including city data, we could inspect if this difference has anything to do with Las Vegasß

drop_marriages <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Hawaii","Nevada","Utah","Washington","Wyoming")),census_region,state,Year,count)

ggplot(drop_marriages, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
  geom_line(linetype = "dashed")+
  geom_point() +
  theme(axis.text.x = element_text(angle = 90)) +
  ggtitle("Marriages Per 1000") + 
  theme(plot.title = element_text(hjust = 0.5))

Conclusion

Overall, the marriage rate in the United States has dropped most significantly in Nevada in the West from 1990 to 2016. If we have additional information it would be interesting to investigate if this trend has any correlation with Las Vegas since many people go to Las Vegas on a whim to get married. Maybe certain laws were put into place that prevented travelers from getting married there. Perhaps, this trend has nothing to do with Las Vegas at all. City data would help us determine that.

It would also be interesting to include additional analysis to see which states had the highest increase in marriage rates from 1990 to 2016.

Project 2 - Marriage Data Analysis