I have chosen to ivestigate marriage data by State between 1990 and 2016.
We will investigate overall trends and look at some indivual behaviors on the state level
We load all required libraries:
library(httr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)
library(date)
library(readxl)
We download the "xlsx’ file from the world data site (https://data.world/siyeh/state-marriage-rate):
GET("https://query.data.world/s/6clilulytst42jgx3frithsindjkzt", write_disk(tf <- tempfile(fileext = ".xlsx")))
marriage <- read_excel(tf)
In the table below we can see that the data set shows the marriage rate (per 1000) for each state in the years 1990, 1995, 1999 through 2016:
kable(head(marriage),digits=1)
| State | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004 | 2003 | 2002 | 2001 | 2000 | 1999 | 1995 | 1990 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alabama | 7.1 | 7.4 | 7.8 | 7.8 | 8.2 | 8.4 | 8.2 | 8.3 | 8.6 | 8.9 | 9.2 | 9.2 | 9.4 | 9.6 | 9.9 | 9.4 | 10.1 | 10.8 | 9.8 | 10.6 |
| Alaska | 7.1 | 7.4 | 7.5 | 7.3 | 7.2 | 7.8 | 8.0 | 7.8 | 8.4 | 8.5 | 8.2 | 8.2 | 8.5 | 8.1 | 8.3 | 8.1 | 8.9 | 8.6 | 9.0 | 10.2 |
| Arizona | 5.9 | 5.9 | 5.8 | 5.4 | 5.6 | 5.7 | 5.9 | 5.6 | 6.0 | 6.4 | 6.5 | 6.6 | 6.7 | 6.5 | 6.7 | 7.6 | 7.5 | 8.2 | 8.8 | 10.0 |
| Arkansas | 9.9 | 10.0 | 10.1 | 9.8 | 10.9 | 10.4 | 10.8 | 10.7 | 10.6 | 12.0 | 12.4 | 12.9 | 13.4 | 13.4 | 14.3 | 14.3 | 15.4 | 14.8 | 14.4 | 15.3 |
| California | 6.5 | 6.2 | 6.4 | 6.5 | 6.0 | 5.8 | 5.8 | 5.8 | 6.7 | 6.2 | 6.3 | 6.4 | 6.4 | 6.1 | 6.2 | 6.5 | 5.8 | 6.4 | 6.3 | 7.9 |
| Colorado | 7.4 | 6.8 | 7.1 | 6.5 | 6.8 | 7.0 | 6.9 | 6.9 | 7.4 | 7.1 | 7.2 | 7.6 | 7.4 | 7.8 | 8.0 | 8.2 | 8.3 | 8.2 | 9.0 | 9.8 |
We have a look at the state with the highest and lowest marriage rate:
#use arrange to find max
maxstate <- arrange(marriage, desc(marriage$"2016"))[1,]
kable(maxstate,digits=1)
| State | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004 | 2003 | 2002 | 2001 | 2000 | 1999 | 1995 | 1990 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nevada | 28.4 | 31 | 31.9 | 32.3 | 35.1 | 36.9 | 38.3 | 40.3 | 42.3 | 48.6 | 52.1 | 57.4 | 62.1 | 63.9 | 67.4 | 69.6 | 72.2 | 82.3 | 85.2 | 99 |
Nevada has the highest marriage rate for any given year but has seen a steep decrease. The high amount of weddings is most likely due to the very popular wedding tourism of out of state pairs getting married in Nevada.
#use arrange to find min
minstate <- arrange(marriage, marriage$"2016")[1,]
kable(minstate,digits=1)
| State | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004 | 2003 | 2002 | 2001 | 2000 | 1999 | 1995 | 1990 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Illinois | 5.4 | 5.9 | 6.2 | 5.4 | 5.8 | 5.6 | 5.7 | 5.7 | 5.9 | 6.1 | 6.2 | 5.9 | 6.2 | 6.5 | 6.6 | 7.2 | 6.9 | 7 | 6.9 | 8.8 |
Illinois has the lowest marriage rate in 2016.
#use summarise to find look globally
avg_mar <- round(marriage %>%
summarise_if(is.numeric, mean, na.rm = TRUE),1)
#avg_mar <- round(avg_mar,1)
kable(avg_mar)
| 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004 | 2003 | 2002 | 2001 | 2000 | 1999 | 1995 | 1990 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7.4 | 7.5 | 7.7 | 7.5 | 7.5 | 7.6 | 7.6 | 7.5 | 7.8 | 8.2 | 8.4 | 8.7 | 9 | 9 | 9.3 | 9.5 | 9.8 | 9.9 | 10.3 | 11.6 |
long_avg_mar <- tidyr::gather(avg_mar, "year", "AvgMarRate")
colors <- c("Average Marriage Rate" = "black")
ggplot(long_avg_mar, aes(x = year)) +
geom_line(aes(y = AvgMarRate, color = "Average Marriage Rate",group = 1), size = 1.2) +
labs(x = "year",
y = "marriage rate",
color = "Legend") +
scale_color_manual(values = colors) +
ggtitle("Average US-wide Marriage Rate") +
theme(plot.title = element_text(hjust = 0.5))
In the above graph we show the unweighted average wedding rate, which would mean that we assume that all states have the same population, which is clearly not true. However, we still consider an average over all state’s mortality rate to be an intersting indicator. We can clearly see a strong reduction of marriage rate from 11.6 to roughly 7.4
#we make the data long with gather:
marry <- tidyr::gather(marriage, "Year", "Amount", -State)
# kable(head(marry))
#we then make the data wide again by focusing on the first and last year of our data series (1990 and 2016)
marry_minmax <- marry %>%
dplyr::filter(Year == min(marry$Year)| Year == max(marry$Year))
marry_minmax <- tidyr::spread(marry_minmax,Year,Amount)
#we then make the data wide again by focusing on the first and last year of our data series (1990 and 2016)
We now have a table with the first an last year’s marriage rate for every state:
kable(head(marry_minmax))
| State | 1990 | 2016 |
|---|---|---|
| Alabama | 10.6 | 7.147821 |
| Alaska | 10.2 | 7.103441 |
| Arizona | 10.0 | 5.930541 |
| Arkansas | 15.3 | 9.860962 |
| California | 7.9 | 6.463590 |
| Colorado | 9.8 | 7.425443 |
Looking at the states with the least and most change below, again Nevada stands out with a reduction by more than 70 percent:
#add percentage change col
marry_minmax$percchange <- (marry_minmax$'1990' - marry_minmax$'2016') / marry_minmax$'1990'
#add abs change col
marry_minmax$abschange <- (marry_minmax$'1990' - marry_minmax$'2016')
kable(head(marry_minmax))
| State | 1990 | 2016 | percchange | abschange |
|---|---|---|---|---|
| Alabama | 10.6 | 7.147821 | 0.3256772 | 3.452179 |
| Alaska | 10.2 | 7.103441 | 0.3035842 | 3.096559 |
| Arizona | 10.0 | 5.930541 | 0.4069459 | 4.069459 |
| Arkansas | 15.3 | 9.860962 | 0.3554927 | 5.439038 |
| California | 7.9 | 6.463590 | 0.1818241 | 1.436410 |
| Colorado | 9.8 | 7.425443 | 0.2423018 | 2.374557 |
states with the least change:
kable(head(arrange(marry_minmax, desc(percchange))))
| State | 1990 | 2016 | percchange | abschange |
|---|---|---|---|---|
| Nevada | 99.0 | 28.392297 | 0.7132091 | 70.607703 |
| South Carolina | 15.9 | 6.632979 | 0.5828315 | 9.267021 |
| Kentucky | 13.5 | 7.379354 | 0.4533812 | 6.120646 |
| Idaho | 13.9 | 8.077759 | 0.4188662 | 5.822241 |
| Arizona | 10.0 | 5.930541 | 0.4069459 | 4.069459 |
| Illinois | 8.8 | 5.357481 | 0.3911954 | 3.442519 |
states with the most change:
kable(head(arrange(marry_minmax, percchange)))
| State | 1990 | 2016 | percchange | abschange |
|---|---|---|---|---|
| District of Columbia | 8.2 | 8.149214 | 0.0061934 | 0.0507861 |
| Hawaii | 16.4 | 15.555557 | 0.0514904 | 0.8444429 |
| Montana | 8.6 | 7.840617 | 0.0883004 | 0.7593830 |
| North Carolina | 7.8 | 6.967624 | 0.1067149 | 0.8323764 |
| West Virginia | 7.2 | 6.354097 | 0.1174865 | 0.8459028 |
| New York | 8.6 | 7.451752 | 0.1335172 | 1.1482478 |
From the below plot we can see that most states have reduced marriage rates by around 20 to 40 percent:
ggplot(marry_minmax, aes(x=reorder(State, -percchange), y=percchange)) +
geom_bar(stat = "identity", width=0.9) +
coord_flip() +
ggtitle("Percentage reduction in Birthrate by State 2016 vs 1990") +
xlab("State") +
ylab("percentage reduction in birthrate")+
scale_fill_brewer(palette="Oranges")