Jennifer Ganeles
5/5/19

Marriage Rate and Population by U.S. County: Are people less likely to commit in areas with more options for partners?

This week’s homework uses spatial mapping to explore how population size relates to marriage rates across the United States. Do higher concentrations of people affect the decision to commit to marriage? Using Social Explorer, I retrieved my data from the American Community Survey (5-Year Estimates). For this analysis, “Marriage Rate” refers to the share of people who have ever made the decision to get married, including both currently married and formerly married individuals (e.g., separated, widowed, or divorced).

Using Social Explorer to retrieve county-level information:

Marriage rate was calculated by dividing the number of people in each county who have ever been married (currently married, separated, widowed, or divorced) by the total population aged 15 and over in that county.

library(readr)
library(dplyr)
library(magrittr)
acsdata<-read_csv("/Users/jenniferganeles/Downloads/R12142286_SL050.csv")%>%
    rename(Total_Pop=SE_A00001_001,
         STATEFP=Geo_STATE,
         COUNTYFP=Geo_COUNTY,
         GEOID=Geo_FIPS,
         PopOver15=SE_A11001_001,
         Never_Married=SE_A11001_002,
         Married=SE_A11001_003,
         Separated=SE_A11001_004,
         Widowed=SE_A11001_005,
         Divorced=SE_A11001_006)%>%
  mutate(NM_Rate=Never_Married/PopOver15*100,
         Has_Been_Married=Married+Separated+Widowed+Divorced,
         Marriage_Rate=Has_Been_Married/PopOver15*100)%>%
      select(STATEFP, COUNTYFP, GEOID, Never_Married, Has_Been_Married, PopOver15, NM_Rate, Marriage_Rate)
   
head(acsdata)
summary(acsdata)
   STATEFP            COUNTYFP            GEOID           Never_Married    
 Length:3220        Length:3220        Length:3220        Min.   :     13  
 Class :character   Class :character   Class :character   1st Qu.:   2354  
 Mode  :character   Mode  :character   Mode  :character   Median :   5802  
                                                          Mean   :  27087  
                                                          3rd Qu.:  15678  
                                                          Max.   :3429945  
 Has_Been_Married    PopOver15          NM_Rate      Marriage_Rate  
 Min.   :     54   Min.   :     67   Min.   :10.14   Min.   :32.12  
 1st Qu.:   6737   1st Qu.:   9080   1st Qu.:22.84   1st Qu.:68.36  
 Median :  15218   Median :  21318   Median :26.43   Median :73.57  
 Mean   :  54551   Mean   :  81638   Mean   :27.89   Mean   :72.11  
 3rd Qu.:  38217   3rd Qu.:  54564   3rd Qu.:31.64   3rd Qu.:77.16  
 Max.   :4797906   Max.   :8227851   Max.   :67.88   Max.   :89.86  

Using the Tigris Package to Retrieve County-Level Shapefiles:

library(sf)
library(tigris)
options(tigris_class="sf")
t_county<-counties(cb=TRUE)
names(t_county)
 [1] "STATEFP"  "COUNTYFP" "COUNTYNS" "AFFGEOID" "GEOID"    "NAME"     "LSAD"    
 [8] "ALAND"    "AWATER"   "geometry"

Merging the Data:

comb_data<-t_county%>%
  left_join(acsdata, by="GEOID")

Excluding Non-Contiguous States and Territories:

# Drop Alaska (02), Hawaii (15), American Samoa (60), Guam (66),
# the Northern Mariana Islands (69), Puerto Rico (72), and the U.S. Virgin Islands (78)
t_comb_data_sub<-comb_data%>%
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))

Map #1: Population by U.S. County:

library(tmap)
library(tmaptools)
library(RColorBrewer)
us_states<-t_comb_data_sub%>%
  aggregate_map(by="STATEFP.x")
USPopulationRate<-tm_shape(t_comb_data_sub, projection=2163)+
  tm_fill("PopOver15", title="Population", breaks=c(74,11214,25848,66608,10105722), style="fixed", palette="PuBu", border.col="grey", border.alpha=.4)+
  tm_shape(us_states)+
  tm_borders(lwd=.36, col="black", alpha=1)+
  tm_layout(panel.labels="Population By U.S. County", legend.outside=TRUE, legend.outside.position="bottom")
USPopulationRate

Because county populations span several orders of magnitude, I set the legend breaks manually at the minimum, 1st quartile, median, 3rd quartile, and maximum values.

Map #2: Marriage Rate by U.S. County:

MarriageRate<-tm_shape(t_comb_data_sub, projection=2163)+
  tm_fill("Marriage_Rate", title="Marriage Rate", breaks=c(32.12, 68.36, 73.57, 77.16, 89.86), style="fixed", palette="Reds", border.col="grey", border.alpha=.4)+
  tm_shape(us_states)+
  tm_borders(lwd=.36, col="black", alpha=1)+
  tm_layout(panel.labels="Marriage Rate by U.S. County", legend.outside=TRUE, legend.outside.position="bottom")
MarriageRate

Once again, I created legend breaks using the minimum, 1st quartile, median, 3rd quartile, and maximum values.

library(grid)
grid.newpage()
pushViewport(viewport(layout=grid.layout(1,2)))
print(USPopulationRate, vp=viewport(layout.pos.col=1))
print(MarriageRate, vp=viewport(layout.pos.col=2))

As the maps above show, there appears to be an inverse relationship between population size and the decision to get married, though the relationship is not perfect. Unsurprisingly, population is largest along the coasts and around major cities. Marriage rates, on the other hand, appear to be lower in more populous states, such as California and New York. This may suggest that people are less likely to commit when more options for partners are available. However, my analysis is limited because I have not controlled for other factors such as migration and age.

Non-Spatial County-Level Information:

library(ggplot2)
library(ggthemes)
# fill is set as a constant (outside aes()), so no legend is drawn
ggplot(data=acsdata, aes(Marriage_Rate))+
  geom_histogram(fill="red", bins=30)+
  theme_tufte()+
  labs(x="Marriage Rate", y="Number of U.S. Counties")

As the histogram above shows, county marriage rates range from about 32% to about 90%, with the largest number of counties clustered around 73%.

ggplot(data=acsdata, aes(x=PopOver15,y=Marriage_Rate))+
  geom_point()+stat_smooth(method="lm")+ theme_tufte()+labs(x="Population", y="Marriage Rate")

The fitted regression line in the scatterplot above suggests that as county population increases, the marriage rate tends to decrease.

Spatial vs. Non-Spatial Information:

Spatial mapping describes the absolute and relative location of geographic features, whereas non-spatial data can only describe the characteristics of a geographical feature using numbers, characters, or logical statements. Non-spatial data is oftentimes presented in a way that is difficult for the layman to comprehend or conceptualize. When research is centered around patterns connected to location (e.g. the suitability of a place for a specific activity or event), spatial mapping provides a more understandable visualization of complex spatial issues. However, spatial mapping cannot always provide the level of specificity needed for certain analyses. Non-spatial statistical analysis is therefore needed as well. In my case, spatial mapping provided understandable and intriguing information regarding location, population size, and the tendency to get married, but I used non-spatial data, such as summary statistics, in order to help create the maps in a way that presented the most information.

Example of cb=TRUE vs. cb=FALSE:

Setting cb=TRUE downloads the Census Bureau’s generalized cartographic boundary file, whereas cb=FALSE (the default) downloads the more detailed TIGER/Line shapefile (see both versions below). The most obvious difference between the two maps is the presence or absence of negative space. The TIGER/Line county boundaries extend over water, so with cb=FALSE features such as lakes and coastal waters are covered by county polygons and effectively disappear. The cartographic boundary file is clipped to the shoreline, so with cb=TRUE land and water are clearly distinguished, although the boundary lines themselves are simplified.

t_county2<-counties(cb=FALSE)
names(t_county2)
 [1] "STATEFP"  "COUNTYFP" "COUNTYNS" "GEOID"    "NAME"     "NAMELSAD" "LSAD"    
 [8] "CLASSFP"  "MTFCC"    "CSAFP"    "CBSAFP"   "METDIVFP" "FUNCSTAT" "ALAND"   
[15] "AWATER"   "INTPTLAT" "INTPTLON" "geometry"
comb_data2<-t_county2%>%
  left_join(acsdata, by="GEOID")
t_comb_data_sub2<-comb_data2%>%
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
us_states2<-t_comb_data_sub2%>%
  aggregate_map(by="STATEFP.x")
#cb=TRUE
tm_shape(t_comb_data_sub, projection=2163)+
  tm_fill("Marriage_Rate", title="Marriage Rate", breaks=c(32.12, 68.36, 73.57, 77.16, 89.86), style="fixed", palette="Reds", border.col="grey", border.alpha=.4)+
  tm_shape(us_states)+
  tm_borders(lwd=.36, col="black", alpha=1)+
  tm_layout(panel.labels="cb=TRUE")

#cb=FALSE
tm_shape(t_comb_data_sub2, projection=2163)+
  tm_fill("Marriage_Rate", title="Marriage Rate", breaks=c(32.12, 68.36, 73.57, 77.16, 89.86), style="fixed", palette="Reds", border.col="grey", border.alpha=.4)+
  tm_shape(us_states2)+
  tm_borders(lwd=.36, col="black", alpha=1)+
  tm_layout(panel.labels="cb=FALSE")

---
title: "Soc 712: Homework 10"
output: html_notebook
---
*Jennifer Ganeles*
<br/>*5/5/19*

# Marriage Rate and Population by U.S. County: Are people less likely to commit in areas with more options for partners?

This week's homework uses spatial mapping to explore how population size relates to marriage rates across the United States. Do higher concentrations of people affect the decision to commit to marriage? Using Social Explorer, I retrieved my data from the American Community Survey (5-Year Estimates). For this analysis, "Marriage Rate" refers to the share of people who have ever made the decision to get married, including both currently married and formerly married individuals (e.g., separated, widowed, or divorced).

### Using Social Explorer to retrieve county-level information:
Marriage rate was calculated by dividing *the number of people in each county who have ever been married* (currently married, separated, widowed, or divorced) by *the total population aged 15 and over in that county*.
```{r message=FALSE, warning=FALSE}
library(readr)
library(dplyr)
library(magrittr)

acsdata<-read_csv("/Users/jenniferganeles/Downloads/R12142286_SL050.csv")%>%
    rename(Total_Pop=SE_A00001_001,
         STATEFP=Geo_STATE,
         COUNTYFP=Geo_COUNTY,
         GEOID=Geo_FIPS,
         PopOver15=SE_A11001_001,
         Never_Married=SE_A11001_002,
         Married=SE_A11001_003,
         Separated=SE_A11001_004,
         Widowed=SE_A11001_005,
         Divorced=SE_A11001_006)%>%
  mutate(NM_Rate=Never_Married/PopOver15*100,
         Has_Been_Married=Married+Separated+Widowed+Divorced,
         Marriage_Rate=Has_Been_Married/PopOver15*100)%>%
      select(STATEFP, COUNTYFP, GEOID, Never_Married, Has_Been_Married, PopOver15, NM_Rate, Marriage_Rate)
   
head(acsdata)
summary(acsdata)
```
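
As a quick sanity check on the definitions above, the never-married and ever-married rates should sum to 100 for every county, because the five marital-status categories together make up the population aged 15 and over. The sketch below illustrates this on a small made-up table. The counts and the names `toy` and `toy_rates` are invented for illustration and are not ACS values.

```r
library(dplyr)

# Hypothetical counts for three made-up counties (not ACS data)
toy <- data.frame(
  GEOID         = c("00001", "00002", "00003"),
  PopOver15     = c(1000, 5000, 20000),
  Never_Married = c(250, 1500, 8000),
  Married       = c(550, 2500, 8500),
  Separated     = c(20, 100, 500),
  Widowed       = c(80, 300, 1000),
  Divorced      = c(100, 600, 2000)
)

# Same rate definitions as in the chunk above, plus a column that adds them up
toy_rates <- toy %>%
  mutate(NM_Rate          = Never_Married / PopOver15 * 100,
         Has_Been_Married = Married + Separated + Widowed + Divorced,
         Marriage_Rate    = Has_Been_Married / PopOver15 * 100,
         Rate_Total       = NM_Rate + Marriage_Rate)

toy_rates %>% select(GEOID, NM_Rate, Marriage_Rate, Rate_Total)
```

If `Rate_Total` departs from 100 for any row of the real data, that would point to a mismatch between the category counts and the denominator.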




### Using the Tigris Package to Retrieve County-Level Shapefiles:
```{r message=FALSE}
library(sf)
library(tigris)
options(tigris_class="sf")
t_county<-counties(cb=TRUE)
names(t_county)
```
### Merging the Data:
```{r}
comb_data<-t_county%>%
  left_join(acsdata, by="GEOID")
```
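
Because `left_join()` keeps every row of the shapefile, any county without a matching `GEOID` in the ACS table receives `NA` values without any warning. A minimal sketch of how one might check for unmatched keys with `anti_join()` follows. It uses two small made-up tables (`shapes` and `acs` are hypothetical stand-ins) rather than the real data.

```r
library(dplyr)

# Hypothetical stand-ins for the shapefile attributes and the ACS table
shapes <- data.frame(GEOID = c("01001", "01003", "02013"),
                     NAME  = c("A", "B", "C"),
                     stringsAsFactors = FALSE)
acs    <- data.frame(GEOID = c("01001", "01003"),
                     Marriage_Rate = c(74.2, 71.9),
                     stringsAsFactors = FALSE)

# Rows of `shapes` that have no counterpart in `acs`
unmatched <- anti_join(shapes, acs, by = "GEOID")
unmatched

# After the left join, those rows carry NA in the ACS columns
joined <- left_join(shapes, acs, by = "GEOID")
sum(is.na(joined$Marriage_Rate))
```

Joining on a character key also preserves the leading zeros in FIPS codes, which a numeric key would drop.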

### Excluding Non-Contiguous States and Territories:
```{r}
# Drop Alaska (02), Hawaii (15), American Samoa (60), Guam (66),
# the Northern Mariana Islands (69), Puerto Rico (72), and the U.S. Virgin Islands (78)
t_comb_data_sub<-comb_data%>%
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
```


### Map #1: Population by U.S. County:

```{r warning=FALSE}
library(tmap)
library(tmaptools)
library(RColorBrewer)
us_states<-t_comb_data_sub%>%
  aggregate_map(by="STATEFP.x")
USPopulationRate<-tm_shape(t_comb_data_sub, projection=2163)+
  tm_fill("PopOver15", title="Population", breaks=c(74,11214,25848,66608,10105722), style="fixed", palette="PuBu", border.col="grey", border.alpha=.4)+
  tm_shape(us_states)+
  tm_borders(lwd=.36, col="black", alpha=1)+
  tm_layout(panel.labels="Population By U.S. County", legend.outside=TRUE, legend.outside.position="bottom")
USPopulationRate
```
Because county populations span several orders of magnitude, I set the legend breaks manually at the minimum, 1st quartile, median, 3rd quartile, and maximum values.
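
The five break values can also be derived programmatically rather than typed by hand, which keeps them in sync with whichever variable is being mapped. The sketch below uses base R's `quantile()` on a made-up population vector. Here `pop` is a hypothetical stand-in for a column such as `PopOver15`, and its values are not ACS data.

```r
# Hypothetical county populations (not ACS values)
pop <- c(70, 900, 2500, 8000, 21000, 54000, 120000, 900000, 8000000)

# Minimum, quartiles, and maximum in a single call
pop_breaks <- quantile(pop, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
pop_breaks
```

A vector like `pop_breaks` could then be passed to the `breaks` argument of `tm_fill()` together with `style="fixed"`, in place of the hard-coded numbers.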

### Map #2: Marriage Rate by U.S. County:

```{r}
MarriageRate<-tm_shape(t_comb_data_sub, projection=2163)+
  tm_fill("Marriage_Rate", title="Marriage Rate", breaks=c(32.12, 68.36, 73.57, 77.16, 89.86), style="fixed", palette="Reds", border.col="grey", border.alpha=.4)+
  tm_shape(us_states)+
  tm_borders(lwd=.36, col="black", alpha=1)+
  tm_layout(panel.labels="Marriage Rate by U.S. County", legend.outside=TRUE, legend.outside.position="bottom")
MarriageRate
```
Once again, I created legend breaks using the minimum, 1st quartile, median, 3rd quartile, and maximum values.

```{r message=FALSE}
library(grid)
grid.newpage()
pushViewport(viewport(layout=grid.layout(1,2)))
print(USPopulationRate, vp=viewport(layout.pos.col=1))
print(MarriageRate, vp=viewport(layout.pos.col=2))
```

As the maps above show, there appears to be an inverse relationship between population size and the decision to get married, though the relationship is not perfect. Unsurprisingly, population is largest along the coasts and around major cities. Marriage rates, on the other hand, appear to be lower in more populous states, such as California and New York. This may suggest that people are less likely to commit when more options for partners are available. However, my analysis is limited because I have not controlled for other factors such as migration and age.

### Non-Spatial County-Level Information:
```{r}
library(ggplot2)
library(ggthemes)
# fill is set as a constant (outside aes()), so no legend is drawn
ggplot(data=acsdata, aes(Marriage_Rate))+
  geom_histogram(fill="red", bins=30)+
  theme_tufte()+
  labs(x="Marriage Rate", y="Number of U.S. Counties")
```
As the histogram above shows, county marriage rates range from about 32% to about 90%, with the largest number of counties clustered around 73%.

```{r}
ggplot(data=acsdata, aes(x=PopOver15,y=Marriage_Rate))+
  geom_point()+stat_smooth(method="lm")+ theme_tufte()+labs(x="Population", y="Marriage Rate")
```
The fitted regression line in the scatterplot above suggests that as county population increases, the marriage rate tends to decrease.
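
County populations are heavily right-skewed (the median of `PopOver15` is about 21,000, while the maximum exceeds 8 million), so a handful of very large counties can dominate a straight-line fit on the raw scale. One option, sketched below on simulated data, is to regress the marriage rate on the base-10 logarithm of population. The simulated values, the seed, and the names `sim`, `pop`, `rate`, and `fit` are all assumptions made for illustration, and none of this is the ACS data.

```r
set.seed(712)

# Simulated counties: log10(population) spread evenly between 2 and 7
n   <- 200
lp  <- runif(n, min = 2, max = 7)
sim <- data.frame(pop  = 10^lp,
                  rate = 90 - 4 * lp + rnorm(n, sd = 2))

# Linear model with population on the log scale
fit <- lm(rate ~ log10(pop), data = sim)
coef(fit)
```

In the scatterplot itself, the same idea corresponds to adding `scale_x_log10()` to the `ggplot()` call, so that `stat_smooth(method="lm")` is fitted on the transformed axis.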

### Spatial vs. Non-Spatial Information:
Spatial mapping describes the absolute and relative location of geographic features, whereas non-spatial data can only describe the characteristics of a geographical feature using numbers, characters, or logical statements. Non-spatial data is oftentimes presented in a way that is difficult for the layman to comprehend or conceptualize. When research is centered around patterns connected to location (e.g. the suitability of a place for a specific activity or event), spatial mapping provides a more understandable visualization of complex spatial issues.  However, spatial mapping cannot always provide the level of specificity needed for certain analyses. Non-spatial statistical analysis is therefore needed as well. In my case, spatial mapping provided understandable and intriguing information regarding location, population size, and the tendency to get married, but I used non-spatial data, such as summary statistics, in order to help create the maps in a way that presented the most information. 


### Example of cb=TRUE vs. cb=FALSE:
Setting cb=TRUE downloads the Census Bureau's generalized cartographic boundary file, whereas cb=FALSE (the default) downloads the more detailed TIGER/Line shapefile (see both versions below). The most obvious difference between the two maps is the presence or absence of negative space. The TIGER/Line county boundaries extend over water, so with cb=FALSE features such as lakes and coastal waters are covered by county polygons and effectively disappear. The cartographic boundary file is clipped to the shoreline, so with cb=TRUE land and water are clearly distinguished, although the boundary lines themselves are simplified.
```{r message=FALSE}
t_county2<-counties(cb=FALSE)
names(t_county2)
```

```{r}
comb_data2<-t_county2%>%
  left_join(acsdata, by="GEOID")
```
```{r}
t_comb_data_sub2<-comb_data2%>%
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
```

```{r}
us_states2<-t_comb_data_sub2%>%
  aggregate_map(by="STATEFP.x")
``` 

```{r}
#cb=TRUE
tm_shape(t_comb_data_sub, projection=2163)+
  tm_fill("Marriage_Rate", title="Marriage Rate", breaks=c(32.12, 68.36, 73.57, 77.16, 89.86), style="fixed", palette="Reds", border.col="grey", border.alpha=.4)+
  tm_shape(us_states)+
  tm_borders(lwd=.36, col="black", alpha=1)+
  tm_layout(panel.labels="cb=TRUE")
#cb=FALSE
tm_shape(t_comb_data_sub2, projection=2163)+
  tm_fill("Marriage_Rate", title="Marriage Rate", breaks=c(32.12, 68.36, 73.57, 77.16, 89.86), style="fixed", palette="Reds", border.col="grey", border.alpha=.4)+
  tm_shape(us_states2)+
  tm_borders(lwd=.36, col="black", alpha=1)+
  tm_layout(panel.labels="cb=FALSE")
```

