CSC 110 - Lab 8

Intro

In this lab, you will practice creating static and interactive data-driven maps in R. This lab requires quite a few packages that you may not have used before in R. Be sure that any new packages are installed before loading them. Please allow enough time for knitting your file correctly. Exercises are modeled after those from assigned chapters in your text, readings, and in-class activities.

Submission: Submit your knitted HTML document (one file per pair of students) to my Dropbox folder here. In case the link doesn’t work, the url is: https://www.dropbox.com/request/2JudOzAG7ZWgbNUrMJkT. Your assignment must be submitted as an HTML file generated in RStudio. You may not need R code to answer every question. If you answer without using R code, delete the code chunk. If the question requires R code, make sure you display R code. If the question requires a figure, make sure you display a figure. Make sure that both students’ names are printed at the top of the document. Please proofread your document before submitting. This lab is due on Wednesday, November 20, 2019 at 12:15pm.

### Datasets

- Peru Geospatial Data: Data made available via the getData() function
- Peru Clinics and Hospitals: Data available from the Humanitarian Data Exchange (HDX). There are four total data files.
- US Cities Data: The data were derived from here: https://simplemaps.com/data/us-cities
- NC Income Data: The data were derived from here: https://datausa.io/profile/geo/north-carolina

### Load packages

You will need to load the following packages:

- sf: sf stands for simple features; used for working with shapefiles and geospatial data
- raster: For using the function getData() to obtain map files
- sp: Allows for dealing with geometric and shapefile data
- spData: Contains shapefile datasets of locations around the world
- spDataLarge: Similar to spData, but with larger datasets
- maps: Contains geographic datasets
- tmap & tmaptools: Used for plotting static maps
- mapview: Can be used to make static maps interactive
- ggmap: For incorporating mapping with ggplot
- leaflet: For creating interactive maps
- tidyverse: For using pipes and wrangling functions
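Before loading anything, it can help to check which of the packages listed above are already installed. The following is a minimal sketch rather than part of the assigned lab code; the names `lab_packages`, `is_installed`, and `missing_packages` are made up for illustration.

```r
# Packages named in the list above
lab_packages <- c("sf", "raster", "sp", "spData", "spDataLarge", "maps",
                  "tmap", "tmaptools", "mapview", "ggmap", "leaflet",
                  "tidyverse")

# requireNamespace() returns TRUE when a package's namespace can be loaded,
# without attaching the package to the search path
is_installed <- vapply(lab_packages, requireNamespace, logical(1),
                       quietly = TRUE)

# Names of any packages that still need to be installed
missing_packages <- lab_packages[!is_installed]
missing_packages
```

Most missing packages can likely be installed with `install.packages(missing_packages)`. spDataLarge may need to be installed from its authors' repository rather than CRAN, so check its documentation if that install fails. Each package is then loaded with `library()`, for example `library(tmap)`.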

Part 1: Static (non-interactive) Maps

Try this first! Run the code below.

ma1 <- tm_shape(nz) + tm_fill(col = "red", alpha = 0.3)
ma2 <- tm_shape(nz) + tm_borders(col = "blue")
ma3 <- tm_shape(nz) + tm_fill(col="red", alpha = 0.3) +
        tm_borders(col="blue", lty = 2)
ma4 <- tm_shape(nz) + tm_fill(col = "Land_area", alpha = 0.3, title=expression("Area (km"^2*")")) +
        tm_borders(col = "blue", lty = 2) +
        tm_layout(title="New Zealand") +
        tm_legend(position = c("right", "bottom"))
tmap_arrange(ma1, ma2, ma3, ma4)

#### The ‘megaplot’ above consists of four maps of New Zealand, each with a different combination of aesthetics and enhancements. In the following exercises, you will use elements of this example to assemble your own megaplot.

### Exercise 1: Load in geospatial data for the country of Peru. First, visit this link to find the correct ISO-2 country code for Peru.

# Hint: Load the Peru data with level = 2 to get second-level administrative boundaries (provinces)
peru <- getData('GADM', country = 'PE', level = 2)

# Create a base map of Peru, to which we will add additional layers.
peru_base <- tm_shape(peru) + tm_fill() + tm_borders()

# Check out your map
peru_base

Exercise 2: Change the fill color of our base Peru map to ‘darkred’ and set the transparency of the fill to 0.5

# Make changes to the map and save as peru_base
peru_base <- tm_shape(peru) + tm_fill(col= 'darkred', alpha=0.5) + tm_borders()
peru_base

# Print the map to check your changes

Exercise 3: Load in data on health clinics and hospitals in Peru. Add points to

# Hint: Make sure that all four files are saved to the same folder, even though you will only read in one.
peru_health <- st_read("healthsites.shp")

## Reading layer `healthsites' from data source `/Users/savagetav/Desktop/healthsites.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1286 features and 14 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -81.27151 ymin: -18.15393 xmax: -69.09038 ymax: -0.9190018
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

# Plot the points representing where health clinics and hospitals are located throughout Peru
peru_health1 <- tm_shape(peru_health) + tm_dots(size = 0.5)
peru_health1

### Exercise 4: Add the dots from Exercise 3 onto the base Peru map. Assign this map to p2. Are there any areas where clinics and hospitals are clustered?

# Start from the base map so the dots are drawn on top of the province polygons
p2 <- peru_base + tm_shape(peru_health) + tm_dots(size = 0.5)
p2

ANSWER: Based on the map, clinics and hospitals appear to be clustered along the western edge of Peru.
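A visual impression of clustering can be checked by counting how many points fall inside each polygon. The sketch below illustrates the idea on toy data only: the two square "provinces" and the four sites are hypothetical, and the names `provinces`, `sites`, and `n_sites` are invented for this example.

```r
library(sf)

# Two hypothetical square "provinces" side by side (toy data, not Peru)
west <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
east <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
provinces <- st_sf(name = c("West", "East"), geometry = st_sfc(west, east))

# Four hypothetical health sites: three in the west, one in the east
sites <- st_sf(
  id = 1:4,
  geometry = st_sfc(
    st_point(c(0.2, 0.2)), st_point(c(0.4, 0.7)),
    st_point(c(0.8, 0.5)), st_point(c(1.5, 0.5))
  )
)

# st_intersects() lists, for each polygon, the points it contains;
# lengths() turns that list into a count per polygon
provinces$n_sites <- lengths(st_intersects(provinces, sites))
provinces[, c("name", "n_sites")]
```

With the lab objects, an analogous call might be `st_intersects(st_as_sf(peru), peru_health)`, assuming both layers share the same coordinate reference system; that has not been verified here.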

Exercise 5: Create a color palette to differentiate clinics from hospitals on the map. Assign this map to `p3`

# Color the dots by the "type" column, drawn on top of the base map
p3 <- peru_base + tm_shape(peru_health) + tm_dots(col = "type")
p3
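The exercise asks for a color palette, and one way to supply explicit colors is the `palette` argument of `tm_dots()`. This is a hedged sketch on toy data, assuming the tmap version 3 syntax used elsewhere in this lab; the coordinates, the two category labels, and the names `toy_sites`, `type_colors`, and `toy_map` are all hypothetical, and the actual values in the `type` column of the health sites file may differ.

```r
library(sf)
library(tmap)

# Hypothetical health sites (toy coordinates, not the HDX data)
toy_sites <- st_sf(
  type = c("clinic", "hospital", "clinic", "hospital"),
  geometry = st_sfc(
    st_point(c(-77.0, -12.0)), st_point(c(-71.5, -16.4)),
    st_point(c(-79.0, -8.1)), st_point(c(-72.0, -13.5)),
    crs = 4326
  )
)

# One color per category; tmap is expected to assign them in the sorted
# order of the category values (clinic first, then hospital)
type_colors <- c("steelblue", "darkorange")

toy_map <- tm_shape(toy_sites) +
  tm_dots(col = "type", palette = type_colors, size = 0.5)
toy_map
```

The same `palette` argument could be added to the `tm_dots()` call that builds `p3`, with one color per distinct value of `type`.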

### Exercise 6: Create a polished version of the previous map.

# Add a title to the previous map, capitalize the legend title, and save the plot to p4.
p4 <- peru_base + tm_shape(peru_health) + tm_dots(col = "type", title = "Type") +
        tm_layout(title = "Peru's Hospitals and Clinics")
p4

### Exercise 7: Arrange your base map, p2, p3, and p4 into a megaplot.

# Rebuild each map with the base layer first so the dots sit on top of the polygons
peru_base <- tm_shape(peru) + tm_fill(col = 'darkred', alpha = 0.5) + tm_borders()
p2 <- peru_base + tm_shape(peru_health) + tm_dots(size = 0.5)
p3 <- peru_base + tm_shape(peru_health) + tm_dots(col = "type")
p4 <- peru_base + tm_shape(peru_health) + tm_dots(col = "type", title = "Type") +
        tm_layout(title = "Peru's Hospitals and Clinics")

# Arrange the four maps into one megaplot
tmap_arrange(peru_base, p2, p3, p4)

#### Now let’s move on to interactive maps.

## Part 2: Interactive Maps

### Exercise 8: Load the us_cities dataset, which is located at https://csc110.drchesmith.com/uscities.csv.

us_cities <- read_csv("https://csc110.drchesmith.com/uscities.csv")

## Parsed with column specification:
## cols(
##   city = col_character(),
##   city_ascii = col_character(),
##   state_id = col_character(),
##   state_name = col_character(),
##   county_fips = col_double(),
##   county_name = col_character(),
##   county_fips_all = col_character(),
##   county_name_all = col_character(),
##   lat = col_double(),
##   lng = col_double(),
##   population = col_double(),
##   density = col_double(),
##   source = col_character(),
##   military = col_logical(),
##   incorporated = col_logical(),
##   timezone = col_character(),
##   ranking = col_double(),
##   zips = col_character(),
##   id = col_double()
## )

Exercise 9: Map the 50 most populated cities using circle markers with a radius based on `population/1000000`. Which state appears to have the most cities in the top 50?

# Keep the 50 most populated cities and draw one circle marker per city
us_cities %>%
  top_n(50, population) %>%
  leaflet() %>%
  addTiles() %>%
  addCircleMarkers(
    radius = ~population/1000000,
    stroke = FALSE, fillOpacity = 0.5)

## Assuming "lng" and "lat" are longitude and latitude, respectively

ANSWER: New York appears to have the most cities out of the top 50.
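Reading the answer off the map can be cross-checked by counting cities per state after the same top-n filter. The sketch below uses a small hypothetical tibble in place of `us_cities` (the city names, state names, and populations are invented) and keeps the top 5 rather than the top 50 so the result is easy to follow.

```r
library(dplyr)

# Hypothetical stand-in for us_cities with only the columns used here
toy_cities <- tibble(
  city = c("A", "B", "C", "D", "E", "F"),
  state_name = c("State X", "State X", "State Y", "State X", "State Z", "State Y"),
  population = c(900, 800, 700, 600, 500, 100)
)

# Keep the 5 largest cities (50 in the lab), then count cities per state,
# with the most frequent state listed first
state_counts <- toy_cities %>%
  top_n(5, population) %>%
  count(state_name, sort = TRUE)
state_counts
```

Applied to the lab data, the equivalent would be `us_cities %>% top_n(50, population) %>% count(state_name, sort = TRUE)`, whose first row names the state with the most cities in the top 50.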

Exercise 10: Filter the US cities dataset to only the state of North Carolina. Load in the `nc_income` dataset. Join the two datasets. Calculate the city in each county with the highest household income (only for Race = `Total`). Using the resulting data frame, create an interactive map that plots the 10 cities with the overall highest household income.

# Keep only the North Carolina cities
nc <- us_cities %>%
  filter(state_id == "NC")
nc_income <- read_csv("https://csc110.drchesmith.com/nc_income.csv")

## Parsed with column specification:
## cols(
##   `ID Race` = col_double(),
##   Race = col_character(),
##   `ID Year` = col_double(),
##   Year = col_double(),
##   `Household Income by Race` = col_double(),
##   `Household Income by Race Moe` = col_double(),
##   Geography = col_character(),
##   `ID Geography` = col_character(),
##   `Slug Geography` = col_character()
## )

# Join the North Carolina cities to the income data by county name
nc_joined <- left_join(nc, nc_income, by = c("county_name" = "Geography"))

# Keep only rows where Race is "Total", average the household income within
# each county, and keep the ten counties with the highest average
top_count <- nc_joined %>%
  filter(Race == "Total") %>%
  group_by(county_name) %>%
  summarise(averageH = mean(`Household Income by Race`)) %>%
  arrange(desc(averageH)) %>%
  head(10)

# Average the latitude and longitude of the cities in each county to get one
# representative point per county
counties_lat_long <- nc_joined %>%
  select(county_name, lat, lng) %>%
  group_by(county_name) %>%
  summarise_all(mean)

# Attach the coordinates to the top ten counties
left_join(top_count, counties_lat_long) %>%
  arrange(desc(averageH))

## Joining, by = "county_name"

## # A tibble: 10 x 4
##    county_name averageH   lat   lng
##    <chr>          <dbl> <dbl> <dbl>
##  1 Union          77691  35.0 -80.6
##  2 Wake           77318  35.8 -78.6
##  3 Orange         69940  36.0 -79.1
##  4 Mecklenburg    65588  35.3 -80.8
##  5 Moore          64184  35.2 -79.4
##  6 Cabarrus       61490  35.4 -80.6
##  7 Durham         59891  36.1 -78.9
##  8 Johnston       58111  35.5 -78.3
##  9 Chatham        57770  35.7 -79.3
## 10 Brunswick      56181  34.0 -78.2

# Load the nc_income data
# Join the two datasets
# Wrangle the data and create an interactive map
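The exercise ends with an interactive map, which the ten-row table printed above could feed directly. The following is a minimal sketch under stated assumptions: the printed values (rounded as displayed) are retyped into a data frame called `top_ten`, a name chosen here for illustration, and each marker represents a county average because that is what the table holds. The radius scaling of `averageH / 10000` is an arbitrary choice for readability.

```r
library(leaflet)

# Values retyped from the printed tibble above (coordinates rounded as shown)
top_ten <- data.frame(
  county_name = c("Union", "Wake", "Orange", "Mecklenburg", "Moore",
                  "Cabarrus", "Durham", "Johnston", "Chatham", "Brunswick"),
  averageH = c(77691, 77318, 69940, 65588, 64184,
               61490, 59891, 58111, 57770, 56181),
  lat = c(35.0, 35.8, 36.0, 35.3, 35.2, 35.4, 36.1, 35.5, 35.7, 34.0),
  lng = c(-80.6, -78.6, -79.1, -80.8, -79.4, -80.6, -78.9, -78.3, -79.3, -78.2),
  stringsAsFactors = FALSE
)

# One circle marker per county, sized by average household income and
# labeled with the county name on hover
nc_map <- leaflet(top_ten) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    radius = ~averageH / 10000,
    label = ~county_name,
    stroke = FALSE, fillOpacity = 0.5)
nc_map
```

In the lab itself, the result of the final `left_join()` could be assigned to a variable and piped into `leaflet()` instead of retyping the values.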

Amanda Monahan and Tavis Braithwaite

November 20, 2019