In this lab, you will practice creating static and interactive data-driven maps in R. This lab requires quite a few packages that you may not have used before in R. Be sure that any new packages are installed before loading them. Please allow enough time for knitting your file correctly. Exercises are modeled after those from assigned chapters in your text, readings, and in-class activities.
Submission: Submit your knitted HTML document (one file per pair of students) to my Dropbox folder here. In case the link doesn’t work, the URL is: https://www.dropbox.com/request/2JudOzAG7ZWgbNUrMJkT. Your assignment must be submitted as an HTML file generated in RStudio. You may not need R code to answer every question. If you answer without using R code, delete the code chunk. If the question requires R code, make sure you display the R code. If the question requires a figure, make sure you display the figure. Make sure that both students’ names are printed at the top of the document. Please proofread your document before submitting. This lab is due on Wednesday, November 20, 2019 at 12:15pm.

### Datasets

- Peru Geospatial Data: Data made available via the getData() function
- Peru Clinics and Hospitals: Data available from the Humanitarian Data Exchange (HDX). There are four total data files.
- US Cities Data: The data were derived from here: https://simplemaps.com/data/us-cities
- NC Income Data: The data were derived from here: https://datausa.io/profile/geo/north-carolina

### Load packages

You will need to load the following packages:

- sf: sf stands for simple features; used for working with shapefiles and geospatial data
- raster: For using the function getData() to obtain map files
- sp: Allows for dealing with geometric and shapefile data
- spData: Contains shapefile datasets of locations around the world
- spDataLarge: Similar to spData, but with larger datasets
- maps: Contains geographic datasets
- tmap & tmaptools: Used for plotting static maps
- mapview: Can be used to make static maps interactive
- ggmap: For incorporating mapping with ggplot
- leaflet: For creating interactive maps
- tidyverse: For using pipes and wrangling functions
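Since several of these packages may be new to you, it can help to check which ones are missing before you try to load them. The sketch below is one possible way to do that; the helper name `missing_packages` is made up for this illustration and is not part of any package. It only reports what is missing and does not install anything.

```r
# Packages named in the list above
lab_pkgs <- c("sf", "raster", "sp", "spData", "spDataLarge", "maps",
              "tmap", "tmaptools", "mapview", "ggmap", "leaflet", "tidyverse")

# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(lab_pkgs)
if (length(to_install) > 0) {
  message("Install these before knitting: ", paste(to_install, collapse = ", "))
}
```

Anything reported here can usually be installed with install.packages(). spDataLarge may not be available from the default repository, so check its documentation for the install command.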
ma1 <- tm_shape(nz) + tm_fill(col = "red", alpha = 0.3)
ma2 <- tm_shape(nz) + tm_borders(col = "blue")
ma3 <- tm_shape(nz) + tm_fill(col = "red", alpha = 0.3) +
  tm_borders(col = "blue", lty = 2)
ma4 <- tm_shape(nz) + tm_fill(col = "Land_area", alpha = 0.3, title = expression("Area (km"^2*")")) +
  tm_borders(col = "blue", lty = 2) +
  tm_layout(title = "New Zealand") +
  tm_legend(position = c("right", "bottom"))
tmap_arrange(ma1, ma2, ma3, ma4)
#### The ‘megaplot’ above consists of four maps of New Zealand, each with a different combination of aesthetics and enhancements. In the following exercises, you will use elements of this example to assemble your own megaplot.

### Exercise 1: Load in geospatial data for the country Peru. First, visit this link to find the correct ISO-2 country code for Peru.
# Hint: Load the Peru data with level = 2 to get second-level administrative boundaries (provinces)
peru <- getData('GADM', country='PE', level = 2)
# Create a base map of Peru, to which we will add additional layers.
peru_base <- tm_shape(peru) + tm_fill() + tm_borders()
# Check out your map
peru_base
# Make changes to the map and save as peru_base
peru_base <- tm_shape(peru) + tm_fill(col = "darkred", alpha = 0.5) + tm_borders()
# Print the map to check your changes
peru_base
# Hint: Make sure that all four files are saved to the same folder, even though you will only read in one.
peru_health <- st_read("healthsites.shp")
## Reading layer `healthsites' from data source `/Users/savagetav/Desktop/healthsites.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1286 features and 14 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -81.27151 ymin: -18.15393 xmax: -69.09038 ymax: -0.9190018
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
# Plot the points representing where health clinics and hospitals are located throughout Peru
peru_health1 <- tm_shape(peru_health) + tm_dots(size = 0.5)
peru_health1
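The st_read() output above reports a POINT geometry type and EPSG 4326, meaning each health site is stored as a longitude/latitude pair on the WGS84 datum. As a rough illustration of what that structure looks like, the sketch below builds a tiny point layer by hand with st_as_sf(). The two sites and their coordinates are hypothetical placeholders, not rows from healthsites.shp.

```r
library(sf)

# Hypothetical longitude/latitude pairs inside Peru's bounding box
# (placeholders only, not real health sites)
toy_sites <- data.frame(
  name = c("site_a", "site_b"),
  lng  = c(-77.0, -71.5),
  lat  = c(-12.0, -16.4)
)

# Turn the plain data frame into a simple feature collection of points
toy_sf <- st_as_sf(toy_sites, coords = c("lng", "lat"), crs = 4326)

st_geometry_type(toy_sf)  # geometry type of each feature
st_crs(toy_sf)$epsg       # EPSG code of the coordinate reference system
```

An object built this way can be passed to tm_shape() and tm_dots() in the same way as peru_health.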
### Exercise 4: Add the dots from Exercise 3 onto the base Peru map. Assign this map to p2. Are there any areas where clinics and hospitals are clustered?
p2 <- peru_base + tm_shape(peru_health) + tm_dots(size = 0.5)
p2
ANSWER: Based on the map, clinics and hospitals appear to be clustered along the western edge of Peru.
p3 <- peru_base + tm_shape(peru_health) + tm_dots(col = "type")
p3
### Exercise 6: Create a polished version of the previous map.
# Add a title to the previous map, capitalize the legend title, and save the plot to p4.
p4 <- peru_base + tm_shape(peru_health) + tm_dots(col = "type", title = "Type") + tm_layout(title = "Peru's Hospitals and Clinics")
p4
### Exercise 7: Arrange your base map, p2, p3, and p4 into a megaplot.
peru_base <- tm_shape(peru) + tm_fill(col = "darkred", alpha = 0.5) + tm_borders()
p2 <- peru_base + tm_shape(peru_health) + tm_dots()
p3 <- peru_base + tm_shape(peru_health) + tm_dots(col = "type")
p4 <- peru_base + tm_shape(peru_health) + tm_dots(col = "type", title = "Type") + tm_layout(title = "Peru's Hospitals and Clinics")
tmap_arrange(peru_base, p2, p3, p4)
#### Now let’s move on to interactive maps.

## Part 2: Interactive Maps

### Exercise 8: Load the us_cities dataset, which is located at https://csc110.drchesmith.com/uscities.csv.
us_cities <- read_csv("https://csc110.drchesmith.com/uscities.csv")
## Parsed with column specification:
## cols(
## city = col_character(),
## city_ascii = col_character(),
## state_id = col_character(),
## state_name = col_character(),
## county_fips = col_double(),
## county_name = col_character(),
## county_fips_all = col_character(),
## county_name_all = col_character(),
## lat = col_double(),
## lng = col_double(),
## population = col_double(),
## density = col_double(),
## source = col_character(),
## military = col_logical(),
## incorporated = col_logical(),
## timezone = col_character(),
## ranking = col_double(),
## zips = col_character(),
## id = col_double()
## )
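### Exercise 9: Create an interactive map of the 50 most populous cities in us_cities, drawing each city as a circle marker with radius population/1000000. Which state appears to have the most cities in the top 50?
us_cities %>%
  top_n(50, population) %>%
  group_by(city) %>%
  leaflet() %>%
  addTiles() %>%
  addCircleMarkers(
    radius = ~population/1000000,
    stroke = FALSE, fillOpacity = 0.5)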
## Assuming "lng" and "lat" are longitude and latitude, respectively
ANSWER: New York appears to have the most cities out of the top 50.
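Reading counts off an interactive map can be hard when markers overlap, so a quick tally is one way to double-check a visual answer like the one above. The sketch below is a minimal illustration, assuming a data frame with state_name and population columns like us_cities. The toy rows and the helper name `top_state_counts` are hypothetical and stand in for the real data.

```r
library(dplyr)

# Hypothetical toy rows standing in for us_cities (not the real data)
toy_cities <- tibble(
  city       = c("A", "B", "C", "D", "E"),
  state_name = c("State X", "State X", "State Y", "State Z", "State X"),
  population = c(500, 400, 300, 200, 100)
)

# Hypothetical helper: keep the k most populous cities, then count them by state
top_state_counts <- function(cities, k) {
  cities %>%
    top_n(k, population) %>%
    count(state_name, sort = TRUE)
}

top_state_counts(toy_cities, 3)
```

Calling the same helper on the real us_cities data with k = 50 would list the states in descending order of how many top-50 cities they contain.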
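### Exercise 10: Filter the us_cities dataset to only North Carolina cities and counties, then load the nc_income dataset. Join the two datasets. Calculate the city in each county with the highest household income (only for Race = Total). Using the resulting data frame, create an interactive map that plots the 10 cities with the overall highest household income.
# Filter the US cities dataset to only North Carolina cities and counties
nc <- us_cities %>%
  filter(state_id == "NC")
# Load the nc_income data
nc_income <- read_csv("https://csc110.drchesmith.com/nc_income.csv")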
## Parsed with column specification:
## cols(
## `ID Race` = col_double(),
## Race = col_character(),
## `ID Year` = col_double(),
## Year = col_double(),
## `Household Income by Race` = col_double(),
## `Household Income by Race Moe` = col_double(),
## Geography = col_character(),
## `ID Geography` = col_character(),
## `Slug Geography` = col_character()
## )
nc_joined <- left_join(nc, nc_income, by = c("county_name" = "Geography"))
# Keep only the Race == "Total" rows, then average household income within each county
top_count <- nc_joined %>%
  filter(Race == "Total") %>%
  group_by(county_name) %>%
  summarise(averageH = mean(`Household Income by Race`)) %>%
  # Sort by average income, descending, and keep the top ten counties
  arrange(desc(averageH)) %>%
  head(10)
# Average the latitude and longitude of the cities in each county
counties_lat_long <- nc_joined %>%
  select(county_name, lat, lng) %>%
  group_by(county_name) %>%
  summarise_all(mean)
# Join the county coordinates onto the top-ten table
left_join(top_count, counties_lat_long) %>%
  arrange(desc(averageH))
## Joining, by = "county_name"
## # A tibble: 10 x 4
## county_name averageH lat lng
## <chr> <dbl> <dbl> <dbl>
## 1 Union 77691 35.0 -80.6
## 2 Wake 77318 35.8 -78.6
## 3 Orange 69940 36.0 -79.1
## 4 Mecklenburg 65588 35.3 -80.8
## 5 Moore 64184 35.2 -79.4
## 6 Cabarrus 61490 35.4 -80.6
## 7 Durham 59891 36.1 -78.9
## 8 Johnston 58111 35.5 -78.3
## 9 Chatham 57770 35.7 -79.3
## 10 Brunswick 56181 34.0 -78.2
# Load the nc_income data
# Join the two datasets
# Wrangle the data and create an interactive map
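The table printed above stops short of the interactive map that the exercise asks for. One possible way to finish is sketched below, using the same leaflet calls as the US cities map. It plots the county-level results as they appear in the output, not individual cities. The values are copied from the printed table (rounded as shown there), and the radius scaling and the label text are arbitrary choices made for this illustration.

```r
library(leaflet)

# Top-ten table copied from the printed output above (values rounded as printed)
top_counties <- data.frame(
  county_name = c("Union", "Wake", "Orange", "Mecklenburg", "Moore",
                  "Cabarrus", "Durham", "Johnston", "Chatham", "Brunswick"),
  averageH = c(77691, 77318, 69940, 65588, 64184,
               61490, 59891, 58111, 57770, 56181),
  lat = c(35.0, 35.8, 36.0, 35.3, 35.2, 35.4, 36.1, 35.5, 35.7, 34.0),
  lng = c(-80.6, -78.6, -79.1, -80.8, -79.4, -80.6, -78.9, -78.3, -79.3, -78.2)
)

# One circle per county, sized by average household income
nc_map <- leaflet(top_counties) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    radius = ~averageH / 10000,  # arbitrary scaling so the circles stay readable
    stroke = FALSE, fillOpacity = 0.5,
    label = ~paste0(county_name, ": ", averageH)
  )
nc_map
```

In the lab itself, the joined data frame could be piped straight into leaflet() in place of the hand-typed table.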