The data I selected for this week’s homework come from Social Explorer: the United States Health Data (2016). The dataset includes county-level information on Quality of Life, Fair/Poor Health, Low Birthweight, Health Care Providers, Health Insurance, and more, but I will focus on teen births. I want to determine which U.S. counties have a higher prevalence of teen births.

Reading in Data & Maps

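library(tigris)
library(tmap)
library(dplyr)
library(readr)

# Return tigris shapefiles as sf objects
options(tigris_class = "sf")

# Generalized cartographic boundary file for all U.S. counties
countymap <- counties(cb = TRUE)

# Read the Social Explorer table and give the columns of interest readable names
ushealth <- read.csv("/Users/rachel_ramphal/Documents/Data Sets/USHealthData.csv") %>%
  rename(TeenBirths = SE_T010_001,
         BirthPerPop = SE_NV008_001,
         GEOID = Geo_FIPS)

# Convert the map's GEOID to integer so it matches the type of the CSV key
# (this assumes read.csv() reads Geo_FIPS as a number)
countymap$GEOID <- as.integer(countymap$GEOID)

mergedmap <- left_join(countymap, ushealth, by = "GEOID")

# Store state FIPS codes as integers so they can be filtered numerically
mergedmap$STATEFP <- as.integer(mergedmap$STATEFP)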
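
One thing to watch in this join is the key type. County FIPS codes are five-character strings with leading zeros (for example "06037"), and the code above converts the map's GEOID to integer so that it matches a numeric Geo_FIPS. The sketch below shows one possible alternative: padding the numeric code back to five characters, then using anti_join() to list any counties that fail to match. The rows, the rate column, and the object names are made up for illustration and are not taken from the Social Explorer file.

```r
library(dplyr)

# Hypothetical stand-ins for the county map attributes and the health table
map_tbl <- data.frame(GEOID = c("01001", "06037", "48047"),
                      NAME  = c("Autauga", "Los Angeles", "Brooks"),
                      stringsAsFactors = FALSE)
health_tbl <- data.frame(Geo_FIPS = c(1001L, 6037L),  # leading zeros lost on read
                         rate     = c(10, 20))        # made-up values

# Pad the integer code back to a five-character string so the key types match
health_tbl <- health_tbl %>%
  mutate(GEOID = sprintf("%05d", Geo_FIPS))

joined    <- left_join(map_tbl, health_tbl, by = "GEOID")
unmatched <- anti_join(map_tbl, health_tbl, by = "GEOID")

joined     # the Brooks row has NA for rate: it has no row in health_tbl
unmatched  # map rows with no matching health data
```

Checking the unmatched rows before mapping is a quick way to tell genuinely missing data apart from a key mismatch.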

Spatial Approach

Creating Map for Teen Births in the United States

This map will show the number of teen births in each county.

library(tigris)
library(tmap)
library(tmaptools)
library(dplyr)

# Keep the contiguous U.S.: drop Alaska (2), Hawaii (15), and the island territories
usmap <- mergedmap %>%
  filter(!STATEFP %in% c(2, 15, 60, 66, 69, 72, 78, 79))
  
us_states <- mergedmap %>%
    aggregate_map(by = "STATEFP")
  
tm_shape(usmap, projection = 2163) +
  tm_polygons("TeenBirths", palette = "Blues") +
  tm_shape(us_states) +
  tm_borders(lwd = .5, col = "black", alpha = 1) +
  tm_layout(title = "Teen Births in the U.S. (2016)",
            title.position = c("center", "top"),
            legend.position = c("left", "bottom"),
            legend.text.size = .6,
            frame = FALSE)

Looking at this map, it seems that a county in California has the highest number of teen births in the country. Let’s take a closer look to determine which county this is.

Taking a Closer Look at California

This map will show the teen births per county in the state of California.

california <- mergedmap %>%
    filter(STATEFP == 6)

tm_shape(california, projection = 2163) +
  tm_polygons("TeenBirths", palette = "RdYlGn", border.col = "black") +
  tm_text("NAME", size = .3) +
  tm_layout(title = "Teen Births in California (2016)",
            title.position = c("left", "top"),
            legend.position = c("left", "top"),
            legend.text.size = .5,
            legend.outside = TRUE,
            frame = FALSE)

From this map we can determine that Los Angeles County has the highest number of teen births in the state, and possibly in the country.

While these maps make it seem like Los Angeles County has a teen birth problem larger than anywhere else, they do not take the population of each county into account. Some counties have far fewer residents than others, so raw counts mostly reflect how many people live there. If we instead look at teen births relative to the population of each county, the map may look very different.
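
To make the difference concrete, here is a small sketch with two made-up counties. The names, counts, and populations are hypothetical and are not taken from the dataset, and the sketch assumes the rate is simply births divided by population, scaled to 100,000.

```r
# Hypothetical counties: one large, one small
counties_tbl <- data.frame(
  county      = c("Big County", "Small County"),
  teen_births = c(5000, 50),
  population  = c(10000000, 20000),
  stringsAsFactors = FALSE
)

# Rate per 100,000 population = births / population * 100,000
counties_tbl$rate_per_100k <-
  counties_tbl$teen_births / counties_tbl$population * 1e5

counties_tbl
# Big County has 100 times as many births but a rate of 50 per 100,000;
# Small County has a rate of 250 per 100,000.
```

A map of teen_births would highlight Big County, while a map of rate_per_100k would highlight Small County.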

Teen Birth Rates Per 100,000 Population in the United States

# The packages are already loaded above; this is the same contiguous-U.S. subset as usmap
usmappop <- mergedmap %>%
  filter(!STATEFP %in% c(2, 15, 60, 66, 69, 72, 78, 79))
  
tm_shape(usmappop, projection = 2163) +
  tm_polygons("BirthPerPop", palette = "PRGn") +
  tm_shape(us_states) +
  tm_borders(lwd = .5, col = "black", alpha = 1) +
  tm_layout(title = "Teen Birth Rate in the U.S. (2016)",
            title.position = c("center", "top"),
            legend.position = c("left", "bottom"),
            legend.text.size = .6,
            frame = FALSE)

This map tells a very different story from the first U.S. map. Once the population of each county is taken into account, we can see that counties in Texas, New Mexico, Montana, North and South Dakota, and other states have higher teen birth rates than Los Angeles County. Texas also appears to have more high-rate counties than any other state. This is not what the first map suggested. While Los Angeles County may have the highest count of teen births in the country, it does not have the highest rate, most likely because its population is much larger than that of the counties that now stand out.

Looking at California Again – Teen Birth Rates Per 100,000 Population

This map will show the teen birth rate of each county in California.

californiapop <- mergedmap %>%
    filter(STATEFP == 6)

tm_shape(californiapop, projection = 2163) +
  tm_polygons("BirthPerPop", palette = "RdPu", border.col = "white") +
  tm_text("NAME", size = .3) +
  tm_layout(title = "Teen Birth Rate in California (2016)",
            title.position = c("left", "top"),
            legend.position = c("left", "top"),
            legend.text.size = .7,
            legend.outside = TRUE,
            frame = FALSE)

Looking at this map, we can see that Los Angeles County does not even have the highest teen birth rate in California. Fresno, Tulare, and Kern, among others, all have higher teen birth rates than Los Angeles County.
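
The same comparison can be checked without reading colors off the map by sorting the county table. This is a minimal sketch on made-up rows (the county names and rates are hypothetical). With the real data, the same calls could presumably be applied to californiapop after removing the geometry column with sf::st_drop_geometry().

```r
library(dplyr)

# Hypothetical county rates standing in for the California subset
ca_rates <- data.frame(
  NAME        = c("County A", "County B", "County C", "County D"),
  BirthPerPop = c(22, 41, NA, 35),
  stringsAsFactors = FALSE
)

top_counties <- ca_rates %>%
  filter(!is.na(BirthPerPop)) %>%   # drop counties with missing rates
  arrange(desc(BirthPerPop)) %>%    # highest rate first
  head(3)                           # keep the three highest

top_counties
```

A sorted table like this complements the map: the map shows where the high rates cluster, and the table gives the exact ranking.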

A Closer Look at Texas

This map will show the teen birth rate of each county in Texas.

texaspop <- mergedmap %>%
    filter(STATEFP == 48)

tm_shape(texaspop, projection = 2163) +
  tm_polygons("BirthPerPop", palette = "Greens", border.col = "black") +
  tm_text("NAME", size = .4) +
  tm_layout(title = "Teen Birth Rate in Texas (2016)",
            title.position = c("left", "top"),
            legend.position = c("left", "top"),
            legend.text.size = .7,
            legend.outside = TRUE,
            frame = FALSE)

This map shows that Brooks County has the highest teen birth rate in Texas, and it is among the counties with the highest teen birth rates in the country.

Non-Spatial Approach

library(ggplot2)

# geom_col() stacks the county values within each state, so each bar is the
# sum of that state's county-level rates, not a single statewide rate
tbirthratebar <- mergedmap %>%
  ggplot() +
  geom_col(aes(x = STATEFP, y = BirthPerPop), fill = "cornflowerblue") +
  labs(title = "Teen Birth Rates (Per 100,000 Population) by State",
       x = "State FIPS Code",
       y = "Sum of County Teen Birth Rates")

tbirthratebar
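
Because geom_col() stacks every county's value within a state, each bar in the chart above is the sum of that state's county rates, so states with many counties get taller bars. One possible alternative, sketched below with made-up numbers, is to average the county rates within each state before plotting. Only the column names STATEFP and BirthPerPop follow the code above; the values and object names are hypothetical. An unweighted mean is still only an approximation of a true statewide rate, which would need county populations.

```r
library(dplyr)

# Hypothetical county-level rates for two states
county_rates <- data.frame(
  STATEFP     = c(6, 6, 48, 48, 48, 48, 48),
  BirthPerPop = c(30, 50, 25, 25, 30, 20, NA)
)

state_rates <- county_rates %>%
  group_by(STATEFP) %>%
  summarise(SumRate  = sum(BirthPerPop, na.rm = TRUE),   # what stacked bars show
            MeanRate = mean(BirthPerPop, na.rm = TRUE),  # average county rate
            .groups = "drop")

state_rates
# State 48 has the larger sum (100 vs. 80) but the smaller mean (25 vs. 40)
```

The resulting state_rates table could then be passed to ggplot() with geom_col(aes(x = factor(STATEFP), y = MeanRate)) to draw one bar per state.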

Spatial vs. Non-Spatial Approach

The spatial approach is much more readable than the non-spatial approach for this type of data. In the bar chart we can see that one bar is much taller than the others: state 48, which is Texas. Note that the chart stacks county values, so a bar's height also grows with the number of counties in a state. With the spatial approach it was easier to determine that Texas has many of the highest county teen birth rates. The spatial approach also told a clearer story because the data are organized by county and state, and an actual map of the United States is a more natural way to present them. It also showed the missing data much more easily. However, this does not mean that a spatial approach is always better than a non-spatial one; it depends on the research question and the type of data being used.

Slide 33: cb=TRUE vs cb=FALSE

options(tigris_class = "sf")

# cb = FALSE requests the detailed TIGER/Line county file
cb_map <- counties(cb = FALSE)

usteenbirth <- read.csv("/Users/rachel_ramphal/Documents/Data Sets/USHealthData.csv") %>%
  rename(TeenBirths = SE_T010_001,
         BirthPerPop = SE_NV008_001,
         GEOID = Geo_FIPS)

cb_map$GEOID <- as.integer(cb_map$GEOID)

newmerge <- left_join(cb_map, usteenbirth, by = "GEOID")
newmerge$STATEFP <- as.integer(newmerge$STATEFP)

newtexaspop <- newmerge %>%
  filter(STATEFP == 48)

tm_shape(newtexaspop, projection = 2163) +
  tm_polygons("BirthPerPop", palette = "Reds") +
  tm_text("NAME", size = .4) +
  tm_layout(title = "Teen Birth Rate in Texas (2016)",
            title.position = c("left", "top"),
            legend.position = c("left", "top"),
            legend.text.size = .7,
            legend.outside = TRUE,
            frame = FALSE)

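When I used cb = FALSE, the data took much longer to load when I ran the code to produce a new map of Texas. This is expected: cb = FALSE downloads the detailed TIGER/Line county file, whereas cb = TRUE downloads the smaller, generalized cartographic boundary file.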