Data Source: https://data.world/siyeh/state-marriage-rate
This dataset includes data about the number of marriages per 1,000 total population from 1990 to 2016. The data is broken apart into census regions, census division, and state.
Importing the data This data is being loaded into a dataframe from a csv stored on GitHub.
url_path <- "https://raw.githubusercontent.com/devinteran/Data607-Project2/master/marriage_data%20-%20Sheet1.csv"
marriage_data_raw <- read_csv(url_path)
## Parsed with column specification:
## cols(
## .default = col_double(),
## state = col_character(),
## census_division = col_character(),
## census_region = col_character()
## )
## See spec(...) for full column specifications.
marriage_data <- marriage_data_raw
Let’s view the first 10 columns of the data:
kable(head(marriage_data[,1:10])) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| state | census_division | census_region | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 |
|---|---|---|---|---|---|---|---|---|---|
| Alabama | East South Central | South | 7.1478 | 7.3515 | 7.8068 | 7.8178 | 8.2 | 8.4 | 8.2 |
| Alaska | Pacific | West | 7.1034 | 7.4076 | 7.5088 | 7.2939 | 7.2 | 7.8 | 8.0 |
| Arizona | Mountain | West | 5.9305 | 5.9225 | 5.7804 | 5.4011 | 5.6 | 5.7 | 5.9 |
| Arkansas | West South Central | South | 9.8610 | 10.0403 | 10.1120 | 9.7511 | 10.9 | 10.4 | 10.8 |
| California | Pacific | West | 6.4636 | 6.1850 | 6.4415 | 6.4605 | 6.0 | 5.8 | 5.8 |
| Colorado | Mountain | West | 7.4254 | 6.7918 | 7.0616 | 6.4527 | 6.8 | 7.0 | 6.9 |
We need to pivot the data into a longer format so that each row contains one observation for a single year.
marriage_data_long <- marriage_data %>% pivot_longer(col=starts_with(c("20","19")),names_to="Year",values_to='count')
The resulting data look clean. We’re ready to begin our analysis:
kable(head(marriage_data_long)) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| state | census_division | census_region | Year | count |
|---|---|---|---|---|
| Alabama | East South Central | South | 2016 | 7.1478 |
| Alabama | East South Central | South | 2015 | 7.3515 |
| Alabama | East South Central | South | 2014 | 7.8068 |
| Alabama | East South Central | South | 2013 | 7.8178 |
| Alabama | East South Central | South | 2012 | 8.2000 |
| Alabama | East South Central | South | 2011 | 8.4000 |
First, what census region and census divisions do we have in our data?
region_plus_division <- arrange(unique(select(drop_na(marriage_data_long),census_region,census_division)),census_region,census_division)
kable(region_plus_division) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| census_region | census_division |
|---|---|
| Midwest | East North Central |
| Midwest | West North Central |
| Northeast | Middle Atlantic |
| Northeast | New England |
| South | East South Central |
| South | South Atlantic |
| South | West South Central |
| West | Mountain |
| West | Pacific |
region_division_data <- select(drop_na(marriage_data_long),census_region,census_division,Year,count)
region <- group_by(marriage_data_long,census_region,Year)
region_marriages <- summarize(drop_na(region),Marriages = sum(count))
#Separate out for easier y-axis limits
mw_ne <- filter(region_marriages,census_region %in% c('Midwest','Northeast'))
s_w <- filter(region_marriages,census_region %in% c('South','West'))
g1 <- ggplot(mw_ne,aes(fill=census_region,x = Year,y = Marriages)) +
geom_bar(position="dodge",stat = "identity") +
facet_wrap(~census_region) +
theme(axis.text.x = element_text(angle = 90)) +
ylab("Count Marriages")
scale_fill_manual(values= c("#999999", "#E69F00"))
## <ggproto object: Class ScaleDiscrete, Scale, gg>
## aesthetics: fill
## axis_order: function
## break_info: function
## break_positions: function
## breaks: waiver
## call: call
## clone: function
## dimension: function
## drop: TRUE
## expand: waiver
## get_breaks: function
## get_breaks_minor: function
## get_labels: function
## get_limits: function
## guide: legend
## is_discrete: function
## is_empty: function
## labels: waiver
## limits: NULL
## make_sec_title: function
## make_title: function
## map: function
## map_df: function
## n.breaks.cache: NULL
## na.translate: TRUE
## na.value: NA
## name: waiver
## palette: function
## palette.cache: NULL
## position: left
## range: <ggproto object: Class RangeDiscrete, Range, gg>
## range: NULL
## reset: function
## train: function
## super: <ggproto object: Class RangeDiscrete, Range, gg>
## reset: function
## scale_name: manual
## train: function
## train_df: function
## transform: function
## transform_df: function
## super: <ggproto object: Class ScaleDiscrete, Scale, gg>
g2 <- ggplot(s_w,aes(fill=census_region,x = Year,y = Marriages)) +
geom_bar(position="dodge",stat = "identity") +
facet_wrap(~census_region) +
theme(axis.text.x = element_text(angle = 90)) +
ylab("Count Marriages")
grid.arrange(g1,g2,top="Marriages Per 1,000 Population")
#Marriages in the West
Here we’ve separated out states into four separate graphs in alphabetical order. This was only done in order to see the data more easily.
* Hawaii,Nevada,Utah,Washington,Wyoming have had the largest decrease in marriages
* The number of marraiges in Oregon and New Mexico have been very consistent over the years
* Hawaii had a strange increase in marriages, peaking in 2004, then dropping pretty consistently in years since
west_marriage_a_c <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Alaska","Arizona","California","Colorado")),census_region,state,Year,count)
west_marriage_d_m <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Hawaii","Idaho","Montana")),census_region,state,Year,count)
west_marriage_n_o <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Nevada","New Mexico","Oregon")),census_region,state,Year,count)
west_marriage_p_z <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Utah","Washington","Wyoming")),census_region,state,Year,count)
#States A-C
a <- ggplot(west_marriage_a_c, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
geom_line(linetype = "dashed")+
geom_point() +
theme(axis.text.x = element_text(angle = 90))
#States D-M
b <- ggplot(west_marriage_d_m, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
geom_line(linetype = "dashed")+
geom_point() +
theme(axis.text.x = element_text(angle = 90))
#States N-O
c <- ggplot(west_marriage_n_o, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
geom_line(linetype = "dashed")+
geom_point() +
theme(axis.text.x = element_text(angle = 90))
#States P-Z
d <- ggplot(west_marriage_p_z, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
geom_line(linetype = "dashed")+
geom_point() +
theme(axis.text.x = element_text(angle = 90))
grid.arrange(a,b,c,d,nrow=2,top = "Marriages Per 1000 Per State - West Region")
drop_marriages <- select(filter(drop_na(marriage_data_long),census_region == "West",Year > 1998,state %in% c("Hawaii","Nevada","Utah","Washington","Wyoming")),census_region,state,Year,count)
ggplot(drop_marriages, aes(fill=state,x=Year,y=count,group=state,colour=state)) +
geom_line(linetype = "dashed")+
geom_point() +
theme(axis.text.x = element_text(angle = 90)) +
ggtitle("Marriages Per 1000") +
theme(plot.title = element_text(hjust = 0.5))
Overall, the marriage rate in the United States has dropped most significantly in Nevada in the West from 1990 to 2016. If we have additional information it would be interesting to investigate if this trend has any correlation with Las Vegas since many people go to Las Vegas on a whim to get married. Maybe certain laws were put into place that prevented travelers from getting married there. Perhaps, this trend has nothing to do with Las Vegas at all. City data would help us determine that.
It would also be interesting to include additional analysis to see which states had the highest increase in marriage rates from 1990 to 2016.