I have chosen to ivestigate marriage data by State between 1990 and 2016.

We will investigate overall trends and look at some indivual behaviors on the state level



1) Loading required libraries and downloading data

We load all required libraries:

library(httr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)
library(date)
library(readxl)

We download the "xlsx’ file from the world data site (https://data.world/siyeh/state-marriage-rate):

GET("https://query.data.world/s/6clilulytst42jgx3frithsindjkzt", write_disk(tf <- tempfile(fileext = ".xlsx")))
marriage <- read_excel(tf)

In the table below we can see that the data set shows the marriage rate (per 1000) for each state in the years 1990, 1995, 1999 through 2016:

kable(head(marriage),digits=1)
State 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1995 1990
Alabama 7.1 7.4 7.8 7.8 8.2 8.4 8.2 8.3 8.6 8.9 9.2 9.2 9.4 9.6 9.9 9.4 10.1 10.8 9.8 10.6
Alaska 7.1 7.4 7.5 7.3 7.2 7.8 8.0 7.8 8.4 8.5 8.2 8.2 8.5 8.1 8.3 8.1 8.9 8.6 9.0 10.2
Arizona 5.9 5.9 5.8 5.4 5.6 5.7 5.9 5.6 6.0 6.4 6.5 6.6 6.7 6.5 6.7 7.6 7.5 8.2 8.8 10.0
Arkansas 9.9 10.0 10.1 9.8 10.9 10.4 10.8 10.7 10.6 12.0 12.4 12.9 13.4 13.4 14.3 14.3 15.4 14.8 14.4 15.3
California 6.5 6.2 6.4 6.5 6.0 5.8 5.8 5.8 6.7 6.2 6.3 6.4 6.4 6.1 6.2 6.5 5.8 6.4 6.3 7.9
Colorado 7.4 6.8 7.1 6.5 6.8 7.0 6.9 6.9 7.4 7.1 7.2 7.6 7.4 7.8 8.0 8.2 8.3 8.2 9.0 9.8

We have a look at the state with the highest and lowest marriage rate:

#use arrange to find max
maxstate <- arrange(marriage, desc(marriage$"2016"))[1,]
kable(maxstate,digits=1)
State 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1995 1990
Nevada 28.4 31 31.9 32.3 35.1 36.9 38.3 40.3 42.3 48.6 52.1 57.4 62.1 63.9 67.4 69.6 72.2 82.3 85.2 99

Nevada has the highest marriage rate for any given year but has seen a steep decrease. The high amount of weddings is most likely due to the very popular wedding tourism of out of state pairs getting married in Nevada.

#use arrange to find min
minstate <- arrange(marriage, marriage$"2016")[1,]
kable(minstate,digits=1)
State 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1995 1990
Illinois 5.4 5.9 6.2 5.4 5.8 5.6 5.7 5.7 5.9 6.1 6.2 5.9 6.2 6.5 6.6 7.2 6.9 7 6.9 8.8

Illinois has the lowest marriage rate in 2016.

#use summarise to find look globally
avg_mar <- round(marriage %>%
  summarise_if(is.numeric, mean, na.rm = TRUE),1)
#avg_mar <- round(avg_mar,1)
kable(avg_mar)
2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1995 1990
7.4 7.5 7.7 7.5 7.5 7.6 7.6 7.5 7.8 8.2 8.4 8.7 9 9 9.3 9.5 9.8 9.9 10.3 11.6
long_avg_mar <- tidyr::gather(avg_mar, "year", "AvgMarRate")
colors <- c("Average Marriage Rate" = "black")

ggplot(long_avg_mar, aes(x = year)) +
    geom_line(aes(y = AvgMarRate, color = "Average Marriage Rate",group = 1), size = 1.2) +
    labs(x = "year",
         y = "marriage rate",
         color = "Legend") +
    scale_color_manual(values = colors) +
    ggtitle("Average US-wide Marriage Rate") +
    theme(plot.title = element_text(hjust = 0.5))

In the above graph we show the unweighted average wedding rate, which would mean that we assume that all states have the same population, which is clearly not true. However, we still consider an average over all state’s mortality rate to be an intersting indicator. We can clearly see a strong reduction of marriage rate from 11.6 to roughly 7.4



#we make the data long with gather:
marry <- tidyr::gather(marriage, "Year", "Amount", -State)
# kable(head(marry))
#we then make the data wide again by focusing on the first and last year of our data series (1990 and 2016)
marry_minmax <- marry %>% 
    dplyr::filter(Year == min(marry$Year)| Year == max(marry$Year))
marry_minmax <- tidyr::spread(marry_minmax,Year,Amount)

#we then make the data wide again by focusing on the first and last year of our data series (1990 and 2016)

We now have a table with the first an last year’s marriage rate for every state:

kable(head(marry_minmax))
State 1990 2016
Alabama 10.6 7.147821
Alaska 10.2 7.103441
Arizona 10.0 5.930541
Arkansas 15.3 9.860962
California 7.9 6.463590
Colorado 9.8 7.425443

Looking at the states with the least and most change below, again Nevada stands out with a reduction by more than 70 percent:

#add percentage change col
marry_minmax$percchange <- (marry_minmax$'1990' - marry_minmax$'2016') / marry_minmax$'1990'
#add abs change col
marry_minmax$abschange <- (marry_minmax$'1990' - marry_minmax$'2016')
kable(head(marry_minmax))
State 1990 2016 percchange abschange
Alabama 10.6 7.147821 0.3256772 3.452179
Alaska 10.2 7.103441 0.3035842 3.096559
Arizona 10.0 5.930541 0.4069459 4.069459
Arkansas 15.3 9.860962 0.3554927 5.439038
California 7.9 6.463590 0.1818241 1.436410
Colorado 9.8 7.425443 0.2423018 2.374557

states with the least change:

kable(head(arrange(marry_minmax, desc(percchange))))
State 1990 2016 percchange abschange
Nevada 99.0 28.392297 0.7132091 70.607703
South Carolina 15.9 6.632979 0.5828315 9.267021
Kentucky 13.5 7.379354 0.4533812 6.120646
Idaho 13.9 8.077759 0.4188662 5.822241
Arizona 10.0 5.930541 0.4069459 4.069459
Illinois 8.8 5.357481 0.3911954 3.442519

states with the most change:

kable(head(arrange(marry_minmax, percchange)))
State 1990 2016 percchange abschange
District of Columbia 8.2 8.149214 0.0061934 0.0507861
Hawaii 16.4 15.555557 0.0514904 0.8444429
Montana 8.6 7.840617 0.0883004 0.7593830
North Carolina 7.8 6.967624 0.1067149 0.8323764
West Virginia 7.2 6.354097 0.1174865 0.8459028
New York 8.6 7.451752 0.1335172 1.1482478

From the below plot we can see that most states have reduced marriage rates by around 20 to 40 percent:

ggplot(marry_minmax, aes(x=reorder(State, -percchange), y=percchange)) + 
  geom_bar(stat = "identity", width=0.9) +
  coord_flip() +
  ggtitle("Percentage reduction in Birthrate by State 2016 vs 1990") +
  xlab("State") + 
  ylab("percentage reduction in birthrate")+
  scale_fill_brewer(palette="Oranges")