Data Wrangling 9: Creating Maps

Linne Piedras-Sarabia

4/13/2024

Introduction

In this handout, we will learn to make static choropleth maps, sometimes called heat maps, using the ggplot2 package. A choropleth map is a map that shows a geographic landscape with units such as countries, states, or watersheds, where each region is colored according to a particular value such as population or density.

To make our maps we will need the packages, ggplot2, dplyr, us_maps, and maps. Observe that these packages, as well as many others, are called in the setup R chunk above. The maps package contains several simple data files that allow us to create maps. (Note: before working through this handout, you should be familiar with the basic ggplot and dplyr functions that we learned in classes 4 and 5).

Data: In addition to data files from the maps package, we will use the 2024_StatePopulation data, which includes the state name (region), estimated 2024 population (population), percentage of the US population (percentUS_pop), number of electric vehicles registered in the state in 2021 and 20221 (evreg_2021 and evreg_2022), and percentages of the population that are Black, Asian, Hispanic, White and Female, (e.g. per_asian), population density - number of people per square mile (pop_density), average student loan debt (avgstudentloan), total state student loan debt (totaldebt), and several other variables. We will load this dataframe in the chunk below, even though we won’t make use of it yet. Take a look at the column names and remember that you can come back here to check them if you need to.

# Load the continental US map data; name the data 'MainStates' and take a look.
MainStates <- map_data("state")

kable(head(MainStates)) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
        full_width = FALSE, position = "left")
long lat group order region subregion
-87.46201 30.38968 1 1 alabama NA
-87.48493 30.37249 1 2 alabama NA
-87.52503 30.37249 1 3 alabama NA
-87.53076 30.33239 1 4 alabama NA
-87.57087 30.32665 1 5 alabama NA
-87.58806 30.32665 1 6 alabama NA
# Read the state population data
StatePop <- read.csv("/home/share/Stat324/Data/StatePopulations2024.csv", as.is = TRUE)
datatable(StatePop)

The Maps Function

If you just want a quick U.S. map, you can use the R map function from the map package. In the chunk below we make a static US map. It’s very plain, so next we let R choose how to color each state. There are not a great many other customizations we can use with this map function.

# Use the map package to make a continental US map, with each state colored.
maps::map("state")

# Now color each state, letting R choose the colors
maps::map("state", fill = TRUE, col = palette())

We can use the same function to make a map of the world:

maps::map("world")

# Make a map of the world with the default color palette:
maps::map("world", fill = TRUE, col = palette())

Creating ggplot maps

Another easy way to get a plot of the US is to use the plot_usmap function. This will create a map of all 50 US states (including Alaska and Hawaii although they will not be in a geographically correct location). By default, this function does not color the states. We can use fill to change this. These plots are ggplot compatible, so we can customize them as we would any other ggplot.

# Create a base map showing all 50 US states
plot_usmap(regions = "states")

# Plot all 50 US states using red fill.
plot_usmap(regions = "states", fill = "red")

Of course, a better way to make a map is to start it as a ggplot. To make a ggplot map, first create a ggplot(), then add a geom_polygon with arguments of the data (the states in map_data, given the name MainStates), and aes (latitude and longitude values). If you include a color command, you can choose the color of the state boundaries.
Note: To simplify our typing, we have stored the state map data as MainStates.

# Plot the 48 continental states with ggplot2, using black borders and medium
# purple fill:
ggplot() + geom_polygon(data = MainStates, aes(x = long, y = lat, group = group),
    color = "white")

In this map, all of the states were colored the same way by default, and R chose to use the color black. As before, if we include the fill command, we can specify a color for all the states

The chunk below changes the colors in both maps:

# Plot the 48 continental states with ggplot2, using black borders and medium
# purple fill:
ggplot() + geom_polygon(data = MainStates, aes(x = long, y = lat, group = group),
    color = "white", fill = "mediumpurple")

Of course we’d probably rather color by state to more easily distinguish between the states. This is difficult to do with the plot_usmap function because the states do not have numerical values associated with their names. However, it is easy with a ggplot map because one of the variables in our MainStates data is the region (state name):

ggplot() + geom_polygon(data = map_data("state"), aes(x = long, y = lat, group = group,
    fill = region), color = "white")

Oops. We don’t want to see the “legend” which tells us how each state is colored, because each state has a different color. The legend is way too much information in this case. Let’s redo this map and tell R to not display a legend. (How are we doing that?)

ggplot() + geom_polygon(data = MainStates, aes(x = long, y = lat, group = group,
    fill = region), color = "white") + theme(legend.position = "none")

QUESTION: What command is added in CHUNK 9, and what effect does this addition have on the map? Do you think we should include one of these “projection” commands in future US maps? What about in world maps?

We added coord_map(“polyconic”). This command helps extend/distort the size and scale of the map while keeping its shape. I think we should include one of these projection commands, because it will allow us to customize. For a world map, it would depend because a world map is technically on a spherical surface, and I’d be curious to see how different projections tackle that.

ggplot() + geom_polygon(data = MainStates, aes(x = long, y = lat, group = group,
    fill = region), color = "white") + coord_map("polyconic") + theme(legend.position = "none")

There are several other projections you might want to try. The easiest way to experiment with the projections is to use the mplot function from the Console window. Move to the Console window, type mplot(MainStates) at the prompt. Choose option 3 (map). You should color by region and then you can run through the projections to see which ones work/are helpful for a US map.

Exercise 1: Move to the Console window, and type mplot(MainStates) at the prompt. Choose option 3 (map). You should color by region and choose none for the legend. Then run through the projections to see which ones work/are helpful for a US map. Do you suggest bicentric, elliptic, guyou, hex, or homing? What about lagrange, globular or mollweide?**

I would use globular or mollweide for sure. I would probably also use homing, but I’m not sure.

Note: We don’t need to include a projection command with a map made using plot_usmap because they all include the albers equal area projection by default. Take a look at the maps made in CHUNKS 4 AND 5 to see the difference in no projection (ggplot) and the albers equal area projection (us_map plot).


In the maps package documentation, we see that in addition to state, the maps package includes (among others) county, france, italy, nz, usa, us.cities, world, world.cities and world2 files.

Exercise 2: Make the necessary changes in the R EXERCISE 2 CHUNK below in order to make a map of all U.S.COUNTIES. First load the appropriate map data and save it with name USCounties. Then make a base map that uses only one color, but shows the county boundaries. Then make a second map that is colored by the counties.

Hints:

  1. We can find the name of the data in map_data by looking at the maps package information in the Files/Plots/Packages/Help/ window.
  2. Find the variable name for the counties with the names(USCounties) command.
  3. Replace the question marks in the EXERCISE 2 chunk with the correct words. Then “uncomment” (remove the hashtag from) each code line and knit the chunk.
USCounties <- map_data("county")
names("USCounties")
## NULL
# A map with only one color
ggplot() + geom_polygon(data = USCounties, aes(x = long, y = lat, group = group),
    color = "black", linewidth = 0.2, fill = "mistyrose") + coord_map("tetra")

# The map colored by county
ggplot() + geom_polygon(data = USCounties, aes(x = long, y = lat, group = group,
    fill = subregion), color = "black", linewidth = 0.1) + theme(legend.position = "none") +
    coord_map("tetra")


Exercise 3: Use the world file to create a ggplot base map of the world with white borders and “slateblue3” fill. Include the + coord_map("globular") command in order to improve the map projection.

AllWorld <- map_data("world")
ggplot() + geom_polygon(data = AllWorld, aes(x = long, y = lat, group = group), color = "white",
    fill = "slateblue3") + coord_map("globular")


Exercise 4: Adjust your code from the previous chunk so that your world map is colored by country (region). Make sure to tell R you don’t want a legend.

AllWorld <- map_data("world")
ggplot() + geom_polygon(data = AllWorld, aes(x = long, y = lat, group = group, fill = region),
    color = "white") + coord_map("globular") + theme(legend.position = "none")

Now let’s return to the US. Observe that the MainStates file has a column titled group. In all of our ggplot maps, within the aes function, we have included the command group = group. This is critical because it tells R the order in which it should “connect the dots”. Look at what happens when we omit this command:

ggplot() + geom_polygon(data = MainStates, aes(x = long, y = lat), color = "black",
    fill = "plum")

Chaos.


Adding Labels

One nice feature of the plot_usmap function is that we can easily add the 2 letter abbreviation for each state name because there is a column of these abbreviations in the data. This only requires including the command labels = TRUE.

# Plot all 50 US states with state abbreviations included:
plot_usmap(regions = "states", fill = "darkgreen", color = "gold", labels = TRUE,
    label_color = "white")

This is about the most we can do with the plot_usmap function, even though some state names overlap and we can’t see “HI”. So, let’s add the state names to a ggplot() map of the MainStates. We can’t add the 2-letter abbreviations because the MainStates data doesn’t contain the abbreviations. So we’ll add the full state names instead.

To do this, first, we will grab the state names and the centers of their latitudes and longitudes from the R dataset state. Within this data, there are variables state.center (x and y coordinates, respectively), and state.name. We’ll store this information as x,y, and state in a new dataframe called StateNameLocs.

Then we’ll create a ggplot map (StateNamesUSMap) using this new dataframe. We’ll add white state boundaries, color each state distinctly and make sure there is no legend. Finally, we’ll ask to see the map and show a label of the state name on each state.

StateNameLocs <- data.frame(x = state.center$x, y = state.center$y, state = state.name)

StateNamesUSMap <- ggplot(StateNameLocs, aes(x = x, y = y)) + geom_polygon(data = MainStates,
    color = "white", aes(x = long, y = lat, fill = region, group = group)) + theme(legend.position = "none") +
    coord_map("globular")

StateNamesUSMap + geom_label(aes(label = state), size = 2)

Observe that the state data includes Alaska and Hawaii so we get those state names on this map. We could filter the data to remove those states if we wish. Recall how we tell R to not include these two states: !(state %in% c("Alaska", "Hawaii").

StateNameLocs2 <- filter(StateNameLocs, !(state %in% c("Alaska", "Hawaii")))

StateNamesUSMap2 <- ggplot(StateNameLocs2, aes(x = x, y = y)) + geom_polygon(data = map_data("state"),
    color = "white", aes(x = long, y = lat, fill = region, group = group)) + theme(legend.position = "none") +
    coord_map("globular") + scale_fill_viridis_d(option = "plasma")

StateNamesUSMap2 + geom_label(aes(label = state), size = 2)

Exercise 5: What is the result of the size= 2 command? What happens if we change the 2 to 3? If we change it to 1? What if we omit the command? (Try these.)
This command sets the size of the labels to 2.

Observe that we used the viridis color package with option “plasma” to color the map in chunk 13. Other options you could try include “inferno”, “magma”, “mako”, “rocket”, “turbo” and “cividis”. (Do you like “plasma” or “rocket” better for this map?) If you just use scale_fill_viridis_d() with no option, R will use the viridis default coloring.

Regional Maps

It’s easy to make a map of only some states and include those state names (as labels or simply text). Let’s adjust our code to make a US map of just the western states Washington, Oregon, Idaho, California, and Nevada. We will need to filter our state locations and then restrict the regions in our geom_polygon, but we know how to do that. (Be careful - the states dataset capitalizes all the state names, but map_data("state") (and therefore MainStates) does not.)

WestStateNames <- filter(StateNameLocs, state %in% c("Washington", "Oregon", "Idaho",
    "California", "Nevada"))

WestStateUSMap <- ggplot(WestStateNames, aes(x = x, y = y)) + geom_polygon(data = map_data("state",
    region = c("washington", "oregon", "idaho", "california", "nevada")), color = "white",
    aes(x = long, y = lat, fill = region, group = group)) + theme(legend.position = "none") +
    coord_map("globular")

WestStateUSMap + geom_text(aes(label = state), size = 3)

QUESTION: How are the state names in this map different from those in the map of CHUNK 13? How was the geom_ command that applied the state names different in CHUNK 14 from CHUNK 13?
Chunk 14 uses geom_text rather than geom_label. geom_text() displays text, while geom_label() has a white background box.


Customizing Maps

Now that we have a good base map of the mainland U.S. states, we can color each state by one of the variables in our data. As a first example, let’s use each state’s 2024 estimated population. We will need to use the dplyr package to join the MainStates and StatePop data (remember we loaded the StatePop file in CHUNK 1). Both dataframes have a variable titled region (which contains the state names, not capitalized), so we’ll use region as our “by variable” in an inner join. As always, after the join we’ll look at the joined dataframe to make sure we got what we wanted.

# Use the dplyr package to join the MainStates and StatePopulation files
MergedStates <- inner_join(MainStates, StatePop, by = "region")
datatable(MergedStates, options = list(autoWidth = FALSE))
datatable(MainStates)
datatable(StatePop)

Exercise 6: What columns were added to the StatePop file when it was joined with the MainStates file?
We added long, lat, group, order and subregion.


Next we create a base map of the mainland United States, using the 2024 state populations as a fill variable. We’ll assign our base map the name MyUSGraph so that we can use the map in subsequent chunks. Remember that we’ll have to tell R to display any named map - that won’t happen automatically.

# Create and display a 'Chloropleth map' of the United States (name it
# MyUSGraph):
MyUSGraph <- ggplot()

MyUSGraph <- MyUSGraph + geom_polygon(data = MergedStates, aes(x = long, y = lat,
    group = group, fill = population), color = "purple", linewidth = 0.2) + coord_map("stereographic") +
    scale_fill_viridis_b(option = "mako", direction = 1, name = "Population") + theme(legend.position = "right")

MyUSGraph

QUESTION: What color did we use to outline the states? How did we control the thickness of the outlines? Where are the commands that control the outline color and thickness placed? How did we specify the fill color scheme? What happens if you change direction = 1 to direction = -1 ? (The only two options for the direction are +1 and -1.)

The outline color is purple. We controlled the thickness with the linewidth=0.2 command. We specified the color scheme with the command scale_fill_virdis. The commands that control the outline color and thickness were placed within the geom_polygin() command. If we change direction, it will reverse the direction of the coor gradient.

Observation: In the knitted file the legend on this map is difficult to read because the populations in our data are in millions (10^6) of people. The scale on our legend (in the knitted file) is written in R’s version of scientific notation.

We could still color the map this way, but include a more useful legend if we fill by population/100000 instead of just population. This is done in CHUNK 17 below. We do not need to mutate the data - we just divide the population variable by 100000 in our fill command.

Note also that in the previous map, exactly four colors were used. That is, the states’ populations were put into one of four color “bins”. It might be better to have a more continuous scale moving from smallest population to largest. This adjustment is made in the chunk below by using scale_fill_viridis_c() instead of scale_fill_viridis_b(). The colors are also made to go from lightest to darkest as the population increases (instead of the reverse, darkest to lightest, which we saw in MyUSGraph above).

MyUSGraph <- ggplot()

MyUSGraph <- MyUSGraph + geom_polygon(data = MergedStates, aes(x = long, y = lat,
    group = group, fill = population/1e+05), color = "black", linewidth = 0.3) +
    labs(title = "US Population by State") + coord_map("globular") + theme(legend.position = "right",
    legend.box = "horizontal") + scale_fill_viridis_c(option = "rocket", direction = -1)

MyUSGraph

In CHUNK 18, we make even more adjustments - in particular, we address the possibility of missing values (na.values), assign our own label to the legend, choose our own colors, and specify the cut-points on the color scale. Before running the code below, predict the effect of the commands given. Then confirm your guesses.

MyUSGraph2 <- MyUSGraph + scale_fill_continuous(name = "Pop/100,000", low = "gold",
    high = "darkblue", limits = c(5, 400), breaks = c(50, 100, 150, 200), na.value = "grey50") +
    labs(title = "Continental US State Populations") + coord_map("tetra") + theme(legend.position = "bottom",
    legend.box = "horizontal")
MyUSGraph2

QUESTION: Why are the states of California and Texas colored grey in this map? How could you make a minor change in the commands to make sure California is colored correctly?

The population values fall outside the limits for color mapping in the original map.

You might notice that when we chose our own fill colors, we cannot specify a “middle” color - only the low and high colors. Specifying the colors is fine, but using the virdis color scheme is probably a better idea because these palettes were developed to make plots that are pretty, better represent your data, easier to read by those with colorblindness, and print well in gray scale.

MyUSGraph3 <- MyUSGraph + scale_fill_continuous(name = "Population/100,000") + coord_map("tetra") +
    theme(legend.position = "right") + scale_fill_viridis_c(option = "turbo", direction = -1,
    name = "Pop per 100,000")
MyUSGraph3

QUESTION: Have you noticed that sometimes I’ve placed the legend below the map, and sometimes on the right? Where do you prefer to place the legend? What command is controlling this?

Yes, I noticed the commands. I prefer putting the legend on the right if the space if available. If not, somewhere on the bottom. The command controlling this placement is in theme(legend.position=“right”).

Exercise 7a: Use the map from CHUNK EXERCISE 7a to answer the following questions:

  1. What part of the code colors the states by the number of 2022 electric vehicles?
    fill = evreg_2022

  2. What continuous color scheme is used? What direction (light to dark or dark to light?), and what part of the code determines this?
    magma with the color gradient = -1, making the direction dark to light.

  3. Are there any limits specified on the color scale? If so, what are the limits?
    No.


Now let’s look closely at the map and see what it tells us.

The map is not very informative because the number of electric vehicles in California is so much greater than in any other state. There are several things we might to do adjust for this. One thing we could do is set color limits so that California’s data is not included in the coloring. To do this, we’ll need to know the range of the values:

range(MergedStates$evreg_2022)
## [1]    640 903620

Now we know that the number of electric vehicles in California was 903,620, and all other states had at least 604 electric vehicles. I’ll guess that the next largest number of electric vehicles is no more than 1700,000, so we’ll use this value as an upper limit.

Exercise 7b: Recreate the previous map, but add the command limits=c(640,170000), just before the name = "Electric Vehicles" command at the end of the code. Uncomment the commands before running the chunk.

EV2022USMAP2 <- ggplot()

EV2022USMAP2 <- EV2022USMAP2 + geom_polygon(data = MergedStates, aes(x = long, y = lat,
    group = group, fill = evreg_2022), color = "navyblue", linewidth = 0.3) + labs(title = "Number of Electric Vehicles by State 2022") +
    coord_map("tetra") + theme(legend.position = "right") + scale_fill_viridis_c(option = "magma",
    direction = -1, limits = c(640, 170000), name = "Electric Vehicles")

EV2022USMAP2

This is definitely an improvement, and perhaps is all we need. But our fill range is still quite broad. Perhaps a log transformation of these values would make a better map. CHUNK 21 colors by the natural log of the registered electric vehicles in each state. Although we know the minimum and maximum values of the number of ev’s, we don’t know the min and max of the ln of these values. I just guessed when I wrote the code below. (What were my guesses?)

EV2022USMAP3 <- ggplot()

EV2022USMAP3 <- EV2022USMAP3 + geom_polygon(data = MergedStates, aes(x = long, y = lat,
    group = group, fill = log(evreg_2022)), color = "navyblue", linewidth = 0.3) +
    labs(title = "2022 Ln(Electric Vehicles by State)") + coord_map("tetra") + theme(legend.position = "right") +
    scale_fill_viridis_c(option = "magma", direction = -1, limits = c(1, 9), name = "LN Vehicles")
EV2022USMAP3

Why are many states colored grey and none colored yellow-orange?

Apparently my guesses about the limits were not very good. R is coloring all states with ln(evreg_2022) values larger than 9 grey. Because no states are colored yellow to orange, there are no states with ln(evreg_2021) values near 1.

We should have asked R for the minimum and maximum of these ln values instead of just guessing the range. Let’s create the map again after finding the min and max values. (We could just ask for the range, but asking for the quartiles and median will give us more information about possible outliers - values we might want to drop from the coloring.)

range(log(MergedStates$evreg_2022))
## [1]  6.461468 13.714164
summary(log(MergedStates$evreg_2022))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.461   8.897  10.409  10.210  11.111  13.714

Ah - it looks like we need a lower limit of about 6.4 and an upper limit about 13.72. Copy the code from CHUNK 21 and change the limits to create a better map.

EV2022USMAP3 <- ggplot()

EV2022USMAP3 <- EV2022USMAP3 + geom_polygon(data = MergedStates, aes(x = long, y = lat,
    group = group, fill = log(evreg_2022)), color = "navyblue", linewidth = 0.3) +
    labs(title = "2022 Ln(Electric Vehicles by State)") + coord_map("tetra") + theme(legend.position = "right") +
    scale_fill_viridis_c(option = "magma", direction = -1, limits = c(6.4, 13.72),
        name = "LN Vehicles")
EV2022USMAP3

Note: The viridis palette usually does a pretty good job of figuring out what the color limits should be, so we may not need to set those ourselves. However, knowing the actual minimum, maximum, and center (mean and/or median) is always helpful. In addition, viridis has a hard time setting correct limits for transformed variables like we are using here.

QUESTIONS: Which states (aside from California) have the largest number of registered electric vehicles? Are these the states with the largest populations? Are the states with the smallest numbers of registered ev’s the smallest states in terms of population? Is there any state whose number of registered ev’s is particularly surprising?

Texas and Florida have the highest largest number of registered EVs. Yes, this fits with their population. I was surprised by how many EVs where in Arizona. ***

Exercise 8a: Create a electric vehicle US map that colors the states by the percentage of electric vehicles per person (fill = 100*evreg_2022/population). Adjust the limits to range from 0.05 to 1.5 (this will exclude California from the coloring). Compare this map to the previous map.

EV_Percentage_US_Map <- ggplot() + geom_polygon(data = MergedStates, aes(x = long,
    y = lat, group = group, fill = 100 * evreg_2022/population), color = "navyblue",
    linewidth = 0.3) + labs(title = "Percentage of Electric Vehicles per Person 2022") +
    coord_map("tetra") + theme(legend.position = "bottom") + scale_fill_viridis_c(option = "magma",
    direction = -1, limits = c(0.05, 1.5), name = "Electric Vehicles Percentage")

EV_Percentage_US_Map

QUESTIONS: Are the states with the greatest numbers of electric vehicles per population the same as the states with the largest numbers of electric vehicles? Are the states with the smallest numbers of electric vehicles per population the same as the states with the smallest numbers of electric vehicles?

Texas has a smaller population than I imagined. No, these states don’t quite match-up. As for the smaller states, They don’t match either.

Exercise 8b: The data also contains a variable / column of the number of 2021 registered electric vehicles (evreg_2021). Make a map of the percentage increase in electric vehicles per state, which would be 100*(evreg_2022 - evreg_2021)/evreg_2021). Do you need to give R some limits? If so, provide them.

Percentage_Increase_Map <- ggplot() + geom_polygon(data = MergedStates, aes(x = long,
    y = lat, group = group, fill = 100 * (evreg_2022 - evreg_2021)/evreg_2021), color = "navyblue",
    linewidth = 0.3) + labs(title = "Percentage Increase in Electric Vehicles 2021-2022") +
    coord_map("tetra") + theme(legend.position = "right") + scale_fill_viridis_c(option = "magma",
    direction = -1, name = "Percentage Increase")

Percentage_Increase_Map

QUESTIONS: Which states saw the greatest percentage increase in the number of electric vehicles from 2021 to 2022? Which states had the smallest percentage increase? Does any of this surprise you?

Oklahoma had the highest rate, which surprised me since I imagine Oklahoma is smaller. The smallest increases were in Washington and Oregon, which was interesting.


Exercise 9a: The StatePop file contains variables of the percentage female (pct_female), percentage Black (pct_black), percentage Asian (pct_asian), percentage white (pct_white) and percentage Hispanic (pct_hisp) for each state. Because they are all percentages, the numbers range from a minimum of 0 to a maximum of 100, so we should be able to make maps of these data without needing to re-scale. Of course we may want to adjust the limits. Determine the minimum and maximum values for each of these variables in the chunk below.

summary(MergedStates$pct_female)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   48.60   50.10   50.50   50.48   51.00   51.40
summary(MergedStates$pct_black)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.60    6.50   13.20   14.41   20.00   38.00
summary(MergedStates$pct_asian)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   3.400   4.444   5.500  15.900
summary(MergedStates$pct_white)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   35.20   53.50   63.80   65.07   77.50   92.50
summary(MergedStates$pct_hisp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.90    5.60   10.20   13.62   17.70   50.10

Exercise 9b: Make a US map colored by the percentage of the females, and a map for the percentage of each of the 4 races (5 different maps). Determine appropriate limits so that all states are colored in each map, and give your maps and legends appropriate titles.

# females
perct_femalemap <- ggplot()

perct_femalemap <- perct_femalemap + geom_polygon(data = MergedStates, aes(x = long,
    y = lat, group = group, fill = pct_female), color = "navyblue", linewidth = 0.3) +
    labs(title = "US Percentage of Females") + coord_map("tetra") + theme(legend.position = "right") +
    scale_fill_viridis_c(option = "magma", direction = 1, limits = c(48.6, 51.4),
        name = "% of Females")
perct_femalemap

# blacks
pct_blackmap <- ggplot()

pct_blackmap <- pct_blackmap + geom_polygon(data = MergedStates, aes(x = long, y = lat,
    group = group, fill = pct_black), color = "navyblue", linewidth = 0.3) + labs(title = "US Percentage of Blacks") +
    coord_map("tetra") + theme(legend.position = "right") + scale_fill_viridis_c(option = "magma",
    direction = -1, limits = c(0.6, 38), name = "% of Blacks")
pct_blackmap

# asians
pct_asianmap <- ggplot()

pct_asianmap <- pct_asianmap + geom_polygon(data = MergedStates, aes(x = long, y = lat,
    group = group, fill = pct_asian), color = "navyblue", linewidth = 0.3) + labs(title = "US Percentage of Asians") +
    coord_map("tetra") + theme(legend.position = "right") + scale_fill_viridis_c(option = "magma",
    direction = -1, limits = c(0.9, 15.9), name = "% of Asians")
pct_asianmap

# white
pct_whitesmap <- ggplot()

pct_whitesmap <- pct_whitesmap + geom_polygon(data = MergedStates, aes(x = long,
    y = lat, group = group, fill = pct_white), color = "navyblue", linewidth = 0.3) +
    labs(title = "US Percentage of Whites") + coord_map("tetra") + theme(legend.position = "right") +
    scale_fill_viridis_c(option = "magma", direction = -1, limits = c(35.2, 92.5),
        name = "% of Whites")
pct_whitesmap

# hispanic
pct_hispmap <- ggplot()

pct_hispmap <- pct_hispmap + geom_polygon(data = MergedStates, aes(x = long, y = lat,
    group = group, fill = pct_hisp), color = "navyblue", linewidth = 0.3) + labs(title = "US Percentage of Hispanics") +
    coord_map("tetra") + theme(legend.position = "right") + scale_fill_viridis_c(option = "magma",
    direction = -1, limits = c(1.9, 50.1), name = "% of Hispanics")
pct_hispmap

Do any of these maps surprise you? Could you have predicted them?
Being from Texas, I did predict there was more diversity in the state due to our proximity to the border and the state’s history. However, I was surprised that there’s a higher range of Whites up north for some reason.

The ggplot2 website and the Data Visualization Cheat Sheet provide many additional detailed of how to customize your graphs


Adding Points to a Map

The maps package also includes a us.cities file. The following code adds a point for each major city in the United States to the map. Notice that the size of the point is determined by the population of that city (at the time the maps package was most recently updated). Be sure to locate that command in the code.

AdjustedUSGraph2 <- MyUSGraph2 + scale_fill_viridis_c(option = "mako", direction = -1,
    name = "State Pop") + theme(legend.position = "bottom", legend.box = "horizontal") +
    geom_point(data = us.cities, aes(x = long, y = lat, size = pop/1e+06), color = "pink") +
    scale_size(name = "City Pop")
AdjustedUSGraph2

Oops - the us.cities file includes some cities in Alaska and in Hawaii (outside the continental US). We need to filter so that we do not include any Alaskan or Hawaiian cities on our map. In this case, the easiest way to filter is to notice that all cities in the continental US have longitude values > -130, but cities in Alaska and Hawaii do not. (We see this by looking at the previous map.)

# Make and store 'Demograph' of the US states colored by population/1,000,000.
DemoGraph <- ggplot()

DemoGraph <- DemoGraph + geom_polygon(data = MergedStates, aes(x = long, y = lat,
    group = group, fill = population/1e+06), color = "white", linewidth = 0.2) +
    labs(title = "US State Populations") + scale_fill_viridis_c(option = "cividis",
    direction = 1, name = "Pop/million", limits = c(0, 40)) + coord_map("tetra")
DemoGraph

Now filter the cities before we put them on our map.

# Remove cities outside the Continental US.
MainCities <- filter(us.cities, long >= -130)

# Add the filtered major US cities to DemoGraph:
DemoGraph2 <- DemoGraph + geom_point(data = MainCities, aes(x = long, y = lat, size = pop/1e+06),
    color = "deeppink", alpha = 0.5) + scale_size(name = "City Populations")
DemoGraph2


The last thing we’ll consider today is zooming. In the chunk below we zoom in on a particular region of Demograph2, the graph we just made. Observe that we do this by restricting the horizontal and vertical ranges in the DemoGraph2 map using the xlim and ylim commands.

# Zoom into a particular region of the Demograph2 plot.  Call this Demograph3.
DemoGraph3 <- DemoGraph2 + coord_map("tetra", xlim = c(-80, -67), ylim = c(38, 48))
DemoGraph3

Exercise 10: Create Demograph4 that zooms in on the state of Florida.

# Zoom in Florida.
Demograph4 <- DemoGraph2 + coord_map("polyconic", xlim = c(-88, -79), ylim = c(24,
    31))

Demograph4


Additional Resources


Homework 9

  1. Complete Exercises 1-10 above.

  2. Complete Exercises 11-15 below.


Exercise 11: (a) Another variable in the 2024 State Population data is avgstudentloan. This is the average student loan debt for each state. Create a US map that colors each state by the average student loan debt divided by 1000, so that the debt is in thousands of dollars. Give your map and legend appropriate titles.

map_avg_student_loan <- ggplot(MergedStates, aes(x = long, y = lat, group = group,
    fill = avgstudentloan/1000)) + geom_polygon(color = "red") + scale_fill_continuous(name = "Average Student Loan Debt (in thousands of dollars)",
    label = scales::comma_format(), limits = c(0, NA)) + labs(title = "Average Student Loan Debt by State") +
    theme(legend.position = "bottom") + scale_fill_viridis_b(option = "mako") + coord_map()  # Use default map projection

map_avg_student_loan

Which state has the highest average debt? What is the average in that state? (Hint: Use the summary or range function.)
Maryland has the highest average debt with $42,861.

Which state has the smallest average student loan debt? What is the average for that state?
North Dakota has the smallest average debt with $28,602.

Exercise 12a Another variable in these data is the total student loan debt in the state. These values are recorded in billions of dollars. Create a map that colors the states by their total student loan debt (totaldebt). The values recorded are in billions of dollars. Give your map and legend appropriate titles and make sure the units are clear on your map.

map_total_student_loan_debt <- ggplot(MergedStates, aes(x = long, y = lat, group = group,
    fill = totaldebt)) + geom_polygon(color = "red") + scale_fill_continuous(name = "Total Student Loan Debt (in billions of dollars)",
    label = scales::comma_format(), limits = c(0, NA)) + labs(title = "Total Student Loan Debt by State") +
    theme(legend.position = "bottom") + coord_map()

map_total_student_loan_debt

QUESTIONS: Based on the map, which states have the largest totals of student loan debt? Which states have the smallest totals? Do the states with the largest and smallest totals correspond to the states with the largest and smallest average student loan debts? Explain.

The states with the largest based on the map are Texas and California. The smallest seems to be New Mexico and Oklahoma. This seems to correspond with the map.

Exercise 12b Let’s compare the total student loan debt per state with the state populations. Create a map that colors each state by 100000*(total debt/ population). This would be the total debt per 1000 population. (This uses debt in 100 million dollars rather than billions to make the scale manageable.) You will probably need to tell R what limits to use in this case - I’d suggest from about 2.7 to 6.2. Be sure to give your map and legend appropriate titles and make sure the units are clear on your map.

MergedStates$total_debt_per_1000_population <- 1e+05 * (MergedStates$totaldebt/MergedStates$population)

map_total_debt_per_1000_population <- ggplot(MergedStates, aes(x = long, y = lat,
    group = group, fill = total_debt_per_1000_population)) + geom_polygon(color = "black") +
    scale_fill_viridis_c(name = "Total Debt per 1000 Population ($100 million)",
        label = scales::comma_format(), direction = -1) + labs(title = "Total Debt per 1000 Population by State") +
    theme(legend.position = "bottom") + coord_map()

map_total_debt_per_1000_population

QUESTIONS: Discuss what this graph shows and compare it to the previous to graphs.

This graph shows that Georgia has the highest Total Debt per 1000 population, which was not seen in our past maps. ***

Exercise 13 Follow these steps to make a restricted state map.

  • Filter the MainState file to so that it includes only about 8-10 mainland U.S. states. The states should have something in common that interests you.
  • Join your newly restricted file with the StatePop file. Be sure to look at a few rows of your joined data to make sure the merge was successful.
  • Then create a list of label locations for your restricted state list (see CHUNKs 10-12 where we did this before).
  • Finally, create a map that displays only your selected states, colored by the “density” column in the MergedStates data. Add the data labels and display an appropriately named legend.

Reminder: To restrict MainStates to a subset of states use something like this command: …ClarkState <- filter(MainStates,region %in% c("california", "oregon", "idaho", "nevada" . . .))

# Step One, filter
selected_states <- filter(MainStates, region %in% c("texas", "arkansas", "louisiana",
    "oklahoma", "kansas", "georgia", "mississippi", "alabama"))
# Join Data
MergedSelected <- inner_join(selected_states, StatePop, by = "region")
# Checking if it was joined successfully datatable(MergedSelected, options =
# list(autoWidth = FALSE)) List of labels Calculate centroids of the selected
# states
centroids <- aggregate(cbind(long, lat) ~ region, data = selected_states, FUN = mean)

# Create label locations dataframe
label_locations <- data.frame(x = centroids$long, y = centroids$lat, state = centroids$region)
# map
newmap <- ggplot() + geom_polygon(data = MergedSelected, aes(x = long, y = lat, group = group,
    fill = density), color = "white") + geom_label(data = label_locations, aes(x = x,
    y = y, label = state), size = 3, fill = "white", color = "black", label.padding = unit(0.3,
    "lines"), label.r = unit(0.2, "lines"), show.legend = FALSE) + scale_fill_viridis_c(option = "magma",
    direction = -1, name = "Density") + labs(title = "Density of Selected States") +
    theme(legend.position = "right")

newmap

Exercise 14 Create a map of your restricted list of states and color each state by the drug overdose rate (overdose_rt). You will want to use the merged file you created for Exercise 12. Determine appropriate limits for your scale and be sure to use appropriate titles on your map and legend.

summary(MergedStates$overdose_rt)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.30   19.40   28.10   29.64   35.00   81.40
OverdoseMap <- ggplot()

OverdoseMap <- OverdoseMap + geom_polygon(data = MergedSelected, aes(x = long, y = lat,
    group = group, fill = overdose_rt), color = "navyblue", linewidth = 0.3) + geom_label(data = label_locations,
    aes(x = x, y = y, label = state), size = 3, fill = "white", color = "black",
    label.padding = unit(0.3, "lines"), label.r = unit(0.2, "lines"), show.legend = FALSE) +
    labs(title = "Drug Overdose Rate by Selected States") + coord_map("tetra") +
    theme(legend.position = "right") + scale_fill_viridis_c(option = "magma", direction = -1,
    limits = c(10.3, 81.4), name = "Drug Overdose Rate")
OverdoseMap

Exercise 15 Add points to represent the well-known cities (using MainCities) in each of your chosen states. You should show only cities that lie in your subset of states. Eliminate other cities before you use the MainCities data. (Note: The “state” variable in us.cities and therefore in MainCities is named country.etc.)

Hints: Use filter and %in%.

Example ``MainCities <- filter(us.cities, long >=-130 & long < -100)
        ``Clarkcities <- filter(MainCities, country.etc %in% c("CA", "OR",  ...))`` ...
summary(MergedSelected$long)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -106.65  -97.09  -91.99  -92.80  -89.75  -80.84
MainCities <- filter(us.cities, long >= -106.7 & long < -81)
Selectedcities <- filter(MainCities, country.etc %in% c("KS", "AR", "GA", "AL", "LA",
    "MS", "OK", "TX"))
# datatable(Selectedcities)

us_map <- ggplot() + geom_polygon(data = selected_states, aes(x = long, y = lat,
    group = group), color = "black", fill = "lightblue") + geom_point(data = Selectedcities,
    aes(x = long, y = lat), color = "blue", size = 2) + labs(title = "Selected States and Cities") +
    theme_minimal()

# Display the map
us_map


As always, change the name and date at the top of the file. Knit your file to an HTML document. Open the HTML file in a browser, and save the file WITH YOUR NAME in it. Then post the file in Moodle no later than midnight Saturday, April 13.