Introduction
In this handout, we will learn to make static choropleth
maps, sometimes called heat maps, using the
ggplot2 package. A choropleth map is a map that shows a
geographic landscape with units such as countries, states, or
watersheds, where each region is colored according to a particular value
such as population or density.
To make our maps we will need the packages,
ggplot2, dplyr, us_maps, and
maps. Observe that these packages, as well as many others,
are called in the setup R chunk above. The maps package
contains several simple data files that allow us to create maps.
(Note: before working through this handout, you should be
familiar with the basic ggplot and dplyr
functions that we learned in classes 4 and 5).
Data: In addition to data files from the
maps package, we will use the 2024_StatePopulation
data, which includes the state name
(region), estimated 2024 population
(population), percentage of the US
population (percentUS_pop), number of
electric vehicles registered in the state in 2021 and 20221
(evreg_2021 and evreg_2022), and
percentages of the population that are Black, Asian, Hispanic,
White and Female, (e.g. per_asian), population
density - number of people per square mile
(pop_density), average student loan debt
(avgstudentloan), total state student loan debt
(totaldebt), and several other variables. We will load this
dataframe in the chunk below, even though we won’t make use of it yet.
Take a look at the column names and remember that you can come back here
to check them if you need to.
# Load the continental US map data; name the data 'MainStates' and take a look.
MainStates <- map_data("state")
kable(head(MainStates)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE, position = "left")| long | lat | group | order | region | subregion |
|---|---|---|---|---|---|
| -87.46201 | 30.38968 | 1 | 1 | alabama | NA |
| -87.48493 | 30.37249 | 1 | 2 | alabama | NA |
| -87.52503 | 30.37249 | 1 | 3 | alabama | NA |
| -87.53076 | 30.33239 | 1 | 4 | alabama | NA |
| -87.57087 | 30.32665 | 1 | 5 | alabama | NA |
| -87.58806 | 30.32665 | 1 | 6 | alabama | NA |
# Read the state population data
StatePop <- read.csv("/home/share/Stat324/Data/StatePopulations2024.csv", as.is = TRUE)
datatable(StatePop)The Maps Function
If you just want a quick U.S. map, you can use the R map
function from the map package. In the chunk below we make a
static US map. It’s very plain, so next we let R choose how to color
each state. There are not a great many other customizations we can use
with this map function.
# Use the map package to make a continental US map, with each state colored.
maps::map("state")# Now color each state, letting R choose the colors
maps::map("state", fill = TRUE, col = palette())We can use the same function to make a map of the world:
maps::map("world")# Make a map of the world with the default color palette:
maps::map("world", fill = TRUE, col = palette())Creating ggplot maps
Another easy way to get a plot of the US is to use the
plot_usmap function. This will create a map of all 50 US
states (including Alaska and Hawaii although they will not be in a
geographically correct location). By default, this function does not
color the states. We can use fill to change this. These
plots are ggplot compatible, so we can customize them as we would any
other ggplot.
# Create a base map showing all 50 US states
plot_usmap(regions = "states")# Plot all 50 US states using red fill.
plot_usmap(regions = "states", fill = "red")Of course, a better way to make a map is to start it as a ggplot. To
make a ggplot map, first create a ggplot(), then add a
geom_polygon with arguments of the data (the
states in map_data, given the name
MainStates), and aes (latitude and longitude
values). If you include a color command, you can choose the
color of the state boundaries.
Note: To simplify our typing, we have stored the state map data as
MainStates.
# Plot the 48 continental states with ggplot2, using black borders and medium
# purple fill:
ggplot() + geom_polygon(data = MainStates, aes(x = long, y = lat, group = group),
color = "white")In this map, all of the states were colored the same way by default,
and R chose to use the color black. As before, if we include the
fill command, we can specify a color for all the states
The chunk below changes the colors in both maps:
# Plot the 48 continental states with ggplot2, using black borders and medium
# purple fill:
ggplot() + geom_polygon(data = MainStates, aes(x = long, y = lat, group = group),
color = "white", fill = "mediumpurple")Of course we’d probably rather color by state to more easily
distinguish between the states. This is difficult to do with the
plot_usmap function because the states do not have
numerical values associated with their names. However, it is easy with a
ggplot map because one of the variables in our
MainStates data is the region (state
name):
ggplot() + geom_polygon(data = map_data("state"), aes(x = long, y = lat, group = group,
fill = region), color = "white")Oops. We don’t want to see the “legend” which tells us how each state is colored, because each state has a different color. The legend is way too much information in this case. Let’s redo this map and tell R to not display a legend. (How are we doing that?)
ggplot() + geom_polygon(data = MainStates, aes(x = long, y = lat, group = group,
fill = region), color = "white") + theme(legend.position = "none")QUESTION: What command is added in CHUNK 9, and what effect does this addition have on the map? Do you think we should include one of these “projection” commands in future US maps? What about in world maps?
We added coord_map(“polyconic”). This command helps extend/distort the size and scale of the map while keeping its shape. I think we should include one of these projection commands, because it will allow us to customize. For a world map, it would depend because a world map is technically on a spherical surface, and I’d be curious to see how different projections tackle that.
ggplot() + geom_polygon(data = MainStates, aes(x = long, y = lat, group = group,
fill = region), color = "white") + coord_map("polyconic") + theme(legend.position = "none")There are several other projections you might want to try. The
easiest way to experiment with the projections is to use the
mplot function from the Console window. Move to the Console
window, type mplot(MainStates) at the prompt. Choose option
3 (map). You should color by region and then you can
run through the projections to see which ones work/are helpful for a US
map.
Exercise 1: Move to the Console window, and type
mplot(MainStates) at the prompt. Choose option 3 (map). You
should color by region and choose none for the legend.
Then run through the projections to see which ones work/are helpful for
a US map. Do you suggest bicentric, elliptic, guyou, hex, or homing?
What about lagrange, globular or mollweide?**
I would use globular or mollweide for sure. I would probably also use homing, but I’m not sure.
Note: We don’t need to include a projection command with a
map made using plot_usmap because they all include the
albers equal area projection by default. Take a look at the
maps made in CHUNKS 4 AND 5 to see the difference in no projection
(ggplot) and the albers equal area projection (us_map plot).
In the maps
package documentation, we see that in addition to
state, the maps package includes (among
others) county, france, italy,
nz, usa, us.cities,
world, world.cities and world2
files.
Exercise 2: Make the necessary changes in the R
EXERCISE 2 CHUNK below in order to make a map of all
U.S.COUNTIES. First load the appropriate map data and
save it with name USCounties. Then make a base map that
uses only one color, but shows the county boundaries. Then make a second
map that is colored by the counties.
Hints:
- We can find the name of the data in
map_databy looking at themapspackage information in the Files/Plots/Packages/Help/ window.
- Find the variable name for the counties with the
names(USCounties)command.
- Replace the question marks in the EXERCISE 2 chunk with the correct words. Then “uncomment” (remove the hashtag from) each code line and knit the chunk.
USCounties <- map_data("county")
names("USCounties")## NULL
# A map with only one color
ggplot() + geom_polygon(data = USCounties, aes(x = long, y = lat, group = group),
color = "black", linewidth = 0.2, fill = "mistyrose") + coord_map("tetra")# The map colored by county
ggplot() + geom_polygon(data = USCounties, aes(x = long, y = lat, group = group,
fill = subregion), color = "black", linewidth = 0.1) + theme(legend.position = "none") +
coord_map("tetra")Exercise 3: Use the world file to
create a ggplot base map of the world with white borders and
“slateblue3” fill. Include the + coord_map("globular")
command in order to improve the map projection.
AllWorld <- map_data("world")
ggplot() + geom_polygon(data = AllWorld, aes(x = long, y = lat, group = group), color = "white",
fill = "slateblue3") + coord_map("globular")Exercise 4: Adjust your code from the previous chunk
so that your world map is colored by country (region). Make
sure to tell R you don’t want a legend.
AllWorld <- map_data("world")
ggplot() + geom_polygon(data = AllWorld, aes(x = long, y = lat, group = group, fill = region),
color = "white") + coord_map("globular") + theme(legend.position = "none")Now let’s return to the US. Observe that the MainStates
file has a column titled group. In all of our ggplot maps,
within the aes function, we have included the command
group = group. This is critical because it tells R the
order in which it should “connect the dots”. Look at what happens when
we omit this command:
ggplot() + geom_polygon(data = MainStates, aes(x = long, y = lat), color = "black",
fill = "plum")Chaos.
Adding Labels
One nice feature of the plot_usmap function is that we
can easily add the 2 letter abbreviation for each state name because
there is a column of these abbreviations in the data. This only requires
including the command labels = TRUE.
# Plot all 50 US states with state abbreviations included:
plot_usmap(regions = "states", fill = "darkgreen", color = "gold", labels = TRUE,
label_color = "white")This is about the most we can do with the plot_usmap
function, even though some state names overlap and we can’t see “HI”.
So, let’s add the state names to a ggplot() map of the
MainStates. We can’t add the 2-letter abbreviations because
the MainStates data doesn’t contain the abbreviations. So
we’ll add the full state names instead.
To do this, first, we will grab the state names and the centers of
their latitudes and longitudes from the R dataset state.
Within this data, there are variables state.center (x and y
coordinates, respectively), and state.name. We’ll store
this information as x,y, and
state in a new dataframe called
StateNameLocs.
Then we’ll create a ggplot map (StateNamesUSMap) using
this new dataframe. We’ll add white state boundaries, color each state
distinctly and make sure there is no legend. Finally, we’ll ask to see
the map and show a label of the state name on each state.
StateNameLocs <- data.frame(x = state.center$x, y = state.center$y, state = state.name)
StateNamesUSMap <- ggplot(StateNameLocs, aes(x = x, y = y)) + geom_polygon(data = MainStates,
color = "white", aes(x = long, y = lat, fill = region, group = group)) + theme(legend.position = "none") +
coord_map("globular")
StateNamesUSMap + geom_label(aes(label = state), size = 2)Observe that the state data includes Alaska and Hawaii
so we get those state names on this map. We could filter the data to
remove those states if we wish. Recall how we tell R to not
include these two states:
!(state %in% c("Alaska", "Hawaii").
StateNameLocs2 <- filter(StateNameLocs, !(state %in% c("Alaska", "Hawaii")))
StateNamesUSMap2 <- ggplot(StateNameLocs2, aes(x = x, y = y)) + geom_polygon(data = map_data("state"),
color = "white", aes(x = long, y = lat, fill = region, group = group)) + theme(legend.position = "none") +
coord_map("globular") + scale_fill_viridis_d(option = "plasma")
StateNamesUSMap2 + geom_label(aes(label = state), size = 2)Exercise 5: What is the result of the
size= 2 command? What happens if we change the 2 to 3? If
we change it to 1? What if we omit the command? (Try these.)
This command sets the size of the
labels to 2.
Observe that we used the viridis color package with
option “plasma” to color the map in chunk 13. Other options you could
try include “inferno”, “magma”, “mako”, “rocket”, “turbo” and “cividis”.
(Do you like “plasma” or “rocket” better for this map?) If you just use
scale_fill_viridis_d() with no option, R will use the
viridis default coloring.
Regional Maps
It’s easy to make a map of only some states and include those state
names (as labels or simply text). Let’s adjust our code to make a US map
of just the western states Washington, Oregon, Idaho, California, and
Nevada. We will need to filter our state locations and then restrict the
regions in our geom_polygon, but we know how to do that.
(Be careful - the states dataset capitalizes all the state
names, but map_data("state") (and therefore
MainStates) does not.)
WestStateNames <- filter(StateNameLocs, state %in% c("Washington", "Oregon", "Idaho",
"California", "Nevada"))
WestStateUSMap <- ggplot(WestStateNames, aes(x = x, y = y)) + geom_polygon(data = map_data("state",
region = c("washington", "oregon", "idaho", "california", "nevada")), color = "white",
aes(x = long, y = lat, fill = region, group = group)) + theme(legend.position = "none") +
coord_map("globular")
WestStateUSMap + geom_text(aes(label = state), size = 3)QUESTION: How are the state names in this map
different from those in the map of CHUNK 13? How was the
geom_ command that applied the state names different in
CHUNK 14 from CHUNK 13?
Chunk 14 uses geom_text rather than
geom_label. geom_text() displays text, while geom_label() has a white
background box.
Customizing Maps
Now that we have a good base map of the mainland U.S. states, we can
color each state by one of the variables in our data. As a first
example, let’s use each state’s 2024 estimated population. We will need
to use the dplyr package to join the
MainStates and StatePop data (remember
we loaded the StatePop file in CHUNK 1). Both dataframes have a variable
titled region (which contains the state names, not
capitalized), so we’ll use region as our “by variable” in an inner
join. As always, after the join we’ll look at the joined dataframe
to make sure we got what we wanted.
# Use the dplyr package to join the MainStates and StatePopulation files
MergedStates <- inner_join(MainStates, StatePop, by = "region")
datatable(MergedStates, options = list(autoWidth = FALSE))datatable(MainStates)datatable(StatePop)Exercise 6: What columns were added to the
StatePop file when it was joined with the
MainStates file?
We added long, lat, group, order and
subregion.
Next we create a base map of the mainland United States, using the
2024 state populations as a fill variable. We’ll assign our
base map the name MyUSGraph so that we can use the map in
subsequent chunks. Remember that we’ll have to tell R to display any
named map - that won’t happen automatically.
# Create and display a 'Chloropleth map' of the United States (name it
# MyUSGraph):
MyUSGraph <- ggplot()
MyUSGraph <- MyUSGraph + geom_polygon(data = MergedStates, aes(x = long, y = lat,
group = group, fill = population), color = "purple", linewidth = 0.2) + coord_map("stereographic") +
scale_fill_viridis_b(option = "mako", direction = 1, name = "Population") + theme(legend.position = "right")
MyUSGraphQUESTION: What color did we use to outline the
states? How did we control the thickness of the outlines? Where are the
commands that control the outline color and thickness placed? How did we
specify the fill color scheme? What happens if you change
direction = 1 to direction = -1 ? (The only
two options for the direction are +1 and -1.)
The outline color is purple. We controlled the thickness with the linewidth=0.2 command. We specified the color scheme with the command scale_fill_virdis. The commands that control the outline color and thickness were placed within the geom_polygin() command. If we change direction, it will reverse the direction of the coor gradient.
Observation: In the knitted file the legend on this map is
difficult to read because the populations in our data are in millions
(10^6) of people. The scale on our legend (in the knitted file) is
written in R’s version of scientific notation.
We could still color the map this way, but include a more useful
legend if we fill by population/100000 instead of just
population. This is done in CHUNK 17 below. We do not need
to mutate the data - we just divide the population variable by
100000 in our fill command.
Note also that in the previous map, exactly four colors were used.
That is, the states’ populations were put into one of four color “bins”.
It might be better to have a more continuous scale moving from smallest
population to largest. This adjustment is made in the chunk below by
using scale_fill_viridis_c() instead of
scale_fill_viridis_b(). The colors are also made to go from
lightest to darkest as the population increases (instead of the reverse,
darkest to lightest, which we saw in MyUSGraph above).
MyUSGraph <- ggplot()
MyUSGraph <- MyUSGraph + geom_polygon(data = MergedStates, aes(x = long, y = lat,
group = group, fill = population/1e+05), color = "black", linewidth = 0.3) +
labs(title = "US Population by State") + coord_map("globular") + theme(legend.position = "right",
legend.box = "horizontal") + scale_fill_viridis_c(option = "rocket", direction = -1)
MyUSGraphIn CHUNK 18, we make even more adjustments - in particular, we address the possibility of missing values (na.values), assign our own label to the legend, choose our own colors, and specify the cut-points on the color scale. Before running the code below, predict the effect of the commands given. Then confirm your guesses.
MyUSGraph2 <- MyUSGraph + scale_fill_continuous(name = "Pop/100,000", low = "gold",
high = "darkblue", limits = c(5, 400), breaks = c(50, 100, 150, 200), na.value = "grey50") +
labs(title = "Continental US State Populations") + coord_map("tetra") + theme(legend.position = "bottom",
legend.box = "horizontal")
MyUSGraph2QUESTION: Why are the states of California and Texas colored grey in this map? How could you make a minor change in the commands to make sure California is colored correctly?
The population values fall outside the limits for color mapping in the original map.
You might notice that when we chose our own fill colors, we cannot
specify a “middle” color - only the low and high colors. Specifying the
colors is fine, but using the virdis color scheme is
probably a better idea because these palettes were developed to make
plots that are pretty, better represent your data, easier to read by
those with colorblindness, and print well in gray scale.
MyUSGraph3 <- MyUSGraph + scale_fill_continuous(name = "Population/100,000") + coord_map("tetra") +
theme(legend.position = "right") + scale_fill_viridis_c(option = "turbo", direction = -1,
name = "Pop per 100,000")
MyUSGraph3QUESTION: Have you noticed that sometimes I’ve placed the legend below the map, and sometimes on the right? Where do you prefer to place the legend? What command is controlling this?
Yes, I noticed the commands. I prefer putting the legend on the right if the space if available. If not, somewhere on the bottom. The command controlling this placement is in theme(legend.position=“right”).
Exercise 7a: Use the map from CHUNK EXERCISE 7a to answer the following questions:
What part of the code colors the states by the number of 2022 electric vehicles?
fill = evreg_2022What continuous color scheme is used? What direction (light to dark or dark to light?), and what part of the code determines this?
magma with the color gradient = -1, making the direction dark to light.Are there any limits specified on the color scale? If so, what are the limits?
No.
Now let’s look closely at the map and see what it tells us.
The map is not very informative because the number of electric vehicles in California is so much greater than in any other state. There are several things we might to do adjust for this. One thing we could do is set color limits so that California’s data is not included in the coloring. To do this, we’ll need to know the range of the values:
range(MergedStates$evreg_2022)## [1] 640 903620
Now we know that the number of electric vehicles in California was 903,620, and all other states had at least 604 electric vehicles. I’ll guess that the next largest number of electric vehicles is no more than 1700,000, so we’ll use this value as an upper limit.
Exercise 7b: Recreate the previous map, but add the
command limits=c(640,170000), just before the
name = "Electric Vehicles" command at the end of the code.
Uncomment the commands before running the chunk.
EV2022USMAP2 <- ggplot()
EV2022USMAP2 <- EV2022USMAP2 + geom_polygon(data = MergedStates, aes(x = long, y = lat,
group = group, fill = evreg_2022), color = "navyblue", linewidth = 0.3) + labs(title = "Number of Electric Vehicles by State 2022") +
coord_map("tetra") + theme(legend.position = "right") + scale_fill_viridis_c(option = "magma",
direction = -1, limits = c(640, 170000), name = "Electric Vehicles")
EV2022USMAP2This is definitely an improvement, and perhaps is all we need. But
our fill range is still quite broad. Perhaps a log
transformation of these values would make a better map. CHUNK 21 colors
by the natural log of the registered electric vehicles in each state.
Although we know the minimum and maximum values of the number
of ev’s, we don’t know the min and max of the ln of these values. I just
guessed when I wrote the code below. (What were my guesses?)
EV2022USMAP3 <- ggplot()
EV2022USMAP3 <- EV2022USMAP3 + geom_polygon(data = MergedStates, aes(x = long, y = lat,
group = group, fill = log(evreg_2022)), color = "navyblue", linewidth = 0.3) +
labs(title = "2022 Ln(Electric Vehicles by State)") + coord_map("tetra") + theme(legend.position = "right") +
scale_fill_viridis_c(option = "magma", direction = -1, limits = c(1, 9), name = "LN Vehicles")
EV2022USMAP3Why are many states colored grey and none colored yellow-orange?
Apparently my guesses about the limits were not very good. R is
coloring all states with ln(evreg_2022) values larger than
9 grey. Because no states are colored yellow to orange, there are no
states with ln(evreg_2021) values near 1.
We should have asked R for the minimum and maximum of these ln values instead of just guessing the range. Let’s create the map again after finding the min and max values. (We could just ask for the range, but asking for the quartiles and median will give us more information about possible outliers - values we might want to drop from the coloring.)
range(log(MergedStates$evreg_2022))## [1] 6.461468 13.714164
summary(log(MergedStates$evreg_2022))## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.461 8.897 10.409 10.210 11.111 13.714
Ah - it looks like we need a lower limit of about 6.4 and an upper limit about 13.72. Copy the code from CHUNK 21 and change the limits to create a better map.
EV2022USMAP3 <- ggplot()
EV2022USMAP3 <- EV2022USMAP3 + geom_polygon(data = MergedStates, aes(x = long, y = lat,
group = group, fill = log(evreg_2022)), color = "navyblue", linewidth = 0.3) +
labs(title = "2022 Ln(Electric Vehicles by State)") + coord_map("tetra") + theme(legend.position = "right") +
scale_fill_viridis_c(option = "magma", direction = -1, limits = c(6.4, 13.72),
name = "LN Vehicles")
EV2022USMAP3Note: The viridis palette usually does a pretty good job of
figuring out what the color limits should be, so we may not need to set
those ourselves. However, knowing the actual minimum, maximum, and
center (mean and/or median) is always helpful. In addition,
viridis has a hard time setting correct limits for
transformed variables like we are using here.
QUESTIONS: Which states (aside from California) have the largest number of registered electric vehicles? Are these the states with the largest populations? Are the states with the smallest numbers of registered ev’s the smallest states in terms of population? Is there any state whose number of registered ev’s is particularly surprising?
Texas and Florida have the highest largest number of registered EVs. Yes, this fits with their population. I was surprised by how many EVs where in Arizona. ***
Exercise 8a: Create a electric vehicle US map that
colors the states by the percentage of electric vehicles per person
(fill = 100*evreg_2022/population). Adjust the limits to
range from 0.05 to 1.5 (this will exclude California from the coloring).
Compare this map to the previous map.
EV_Percentage_US_Map <- ggplot() + geom_polygon(data = MergedStates, aes(x = long,
y = lat, group = group, fill = 100 * evreg_2022/population), color = "navyblue",
linewidth = 0.3) + labs(title = "Percentage of Electric Vehicles per Person 2022") +
coord_map("tetra") + theme(legend.position = "bottom") + scale_fill_viridis_c(option = "magma",
direction = -1, limits = c(0.05, 1.5), name = "Electric Vehicles Percentage")
EV_Percentage_US_MapQUESTIONS: Are the states with the greatest numbers of electric vehicles per population the same as the states with the largest numbers of electric vehicles? Are the states with the smallest numbers of electric vehicles per population the same as the states with the smallest numbers of electric vehicles?
Texas has a smaller population than I imagined. No, these states don’t quite match-up. As for the smaller states, They don’t match either.
Exercise 8b: The data also contains a variable /
column of the number of 2021 registered electric vehicles
(evreg_2021). Make a map of the percentage increase
in electric vehicles per state, which would be
100*(evreg_2022 - evreg_2021)/evreg_2021). Do you need to
give R some limits? If so, provide them.
Percentage_Increase_Map <- ggplot() + geom_polygon(data = MergedStates, aes(x = long,
y = lat, group = group, fill = 100 * (evreg_2022 - evreg_2021)/evreg_2021), color = "navyblue",
linewidth = 0.3) + labs(title = "Percentage Increase in Electric Vehicles 2021-2022") +
coord_map("tetra") + theme(legend.position = "right") + scale_fill_viridis_c(option = "magma",
direction = -1, name = "Percentage Increase")
Percentage_Increase_MapQUESTIONS: Which states saw the greatest percentage increase in the number of electric vehicles from 2021 to 2022? Which states had the smallest percentage increase? Does any of this surprise you?
Oklahoma had the highest rate, which surprised me since I imagine Oklahoma is smaller. The smallest increases were in Washington and Oregon, which was interesting.
Exercise 9a: The StatePop file contains
variables of the percentage female (pct_female), percentage
Black (pct_black), percentage Asian
(pct_asian), percentage white (pct_white) and
percentage Hispanic (pct_hisp) for each state. Because they
are all percentages, the numbers range from a minimum of 0 to a maximum
of 100, so we should be able to make maps of these data without needing
to re-scale. Of course we may want to adjust the limits. Determine the
minimum and maximum values for each of these variables in the chunk
below.
summary(MergedStates$pct_female)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 48.60 50.10 50.50 50.48 51.00 51.40
summary(MergedStates$pct_black)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.60 6.50 13.20 14.41 20.00 38.00
summary(MergedStates$pct_asian)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 3.400 4.444 5.500 15.900
summary(MergedStates$pct_white)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 35.20 53.50 63.80 65.07 77.50 92.50
summary(MergedStates$pct_hisp)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.90 5.60 10.20 13.62 17.70 50.10
Exercise 9b: Make a US map colored by the percentage of the females, and a map for the percentage of each of the 4 races (5 different maps). Determine appropriate limits so that all states are colored in each map, and give your maps and legends appropriate titles.
# females
perct_femalemap <- ggplot()
perct_femalemap <- perct_femalemap + geom_polygon(data = MergedStates, aes(x = long,
y = lat, group = group, fill = pct_female), color = "navyblue", linewidth = 0.3) +
labs(title = "US Percentage of Females") + coord_map("tetra") + theme(legend.position = "right") +
scale_fill_viridis_c(option = "magma", direction = 1, limits = c(48.6, 51.4),
name = "% of Females")
perct_femalemap# blacks
pct_blackmap <- ggplot()
pct_blackmap <- pct_blackmap + geom_polygon(data = MergedStates, aes(x = long, y = lat,
group = group, fill = pct_black), color = "navyblue", linewidth = 0.3) + labs(title = "US Percentage of Blacks") +
coord_map("tetra") + theme(legend.position = "right") + scale_fill_viridis_c(option = "magma",
direction = -1, limits = c(0.6, 38), name = "% of Blacks")
pct_blackmap# asians
pct_asianmap <- ggplot()
pct_asianmap <- pct_asianmap + geom_polygon(data = MergedStates, aes(x = long, y = lat,
group = group, fill = pct_asian), color = "navyblue", linewidth = 0.3) + labs(title = "US Percentage of Asians") +
coord_map("tetra") + theme(legend.position = "right") + scale_fill_viridis_c(option = "magma",
direction = -1, limits = c(0.9, 15.9), name = "% of Asians")
pct_asianmap# white
pct_whitesmap <- ggplot()
pct_whitesmap <- pct_whitesmap + geom_polygon(data = MergedStates, aes(x = long,
y = lat, group = group, fill = pct_white), color = "navyblue", linewidth = 0.3) +
labs(title = "US Percentage of Whites") + coord_map("tetra") + theme(legend.position = "right") +
scale_fill_viridis_c(option = "magma", direction = -1, limits = c(35.2, 92.5),
name = "% of Whites")
pct_whitesmap# hispanic
pct_hispmap <- ggplot()
pct_hispmap <- pct_hispmap + geom_polygon(data = MergedStates, aes(x = long, y = lat,
group = group, fill = pct_hisp), color = "navyblue", linewidth = 0.3) + labs(title = "US Percentage of Hispanics") +
coord_map("tetra") + theme(legend.position = "right") + scale_fill_viridis_c(option = "magma",
direction = -1, limits = c(1.9, 50.1), name = "% of Hispanics")
pct_hispmapDo any of these maps surprise you? Could you have predicted
them?
Being from Texas, I did predict there
was more diversity in the state due to our proximity to the border and
the state’s history. However, I was surprised that there’s a higher
range of Whites up north for some reason.
The ggplot2 website and the Data Visualization Cheat Sheet provide many additional detailed of how to customize your graphs
Adding Points to a Map
The maps package also includes a us.cities
file. The following code adds a point for each major city in the United
States to the map. Notice that the size of the point is determined by
the population of that city (at the time the maps package
was most recently updated). Be sure to locate that command in the
code.
AdjustedUSGraph2 <- MyUSGraph2 + scale_fill_viridis_c(option = "mako", direction = -1,
name = "State Pop") + theme(legend.position = "bottom", legend.box = "horizontal") +
geom_point(data = us.cities, aes(x = long, y = lat, size = pop/1e+06), color = "pink") +
scale_size(name = "City Pop")
AdjustedUSGraph2Oops - the us.cities file includes some cities in Alaska
and in Hawaii (outside the continental US). We need to
filter so that we do not include any Alaskan or Hawaiian
cities on our map. In this case, the easiest way to filter is to notice
that all cities in the continental US have longitude values > -130,
but cities in Alaska and Hawaii do not. (We see this by looking at the
previous map.)
# Make and store 'Demograph' of the US states colored by population/1,000,000.
DemoGraph <- ggplot()
DemoGraph <- DemoGraph + geom_polygon(data = MergedStates, aes(x = long, y = lat,
group = group, fill = population/1e+06), color = "white", linewidth = 0.2) +
labs(title = "US State Populations") + scale_fill_viridis_c(option = "cividis",
direction = 1, name = "Pop/million", limits = c(0, 40)) + coord_map("tetra")
DemoGraphNow filter the cities before we put them on our map.
# Remove cities outside the Continental US.
MainCities <- filter(us.cities, long >= -130)
# Add the filtered major US cities to DemoGraph:
DemoGraph2 <- DemoGraph + geom_point(data = MainCities, aes(x = long, y = lat, size = pop/1e+06),
color = "deeppink", alpha = 0.5) + scale_size(name = "City Populations")
DemoGraph2The last thing we’ll consider today is zooming. In
the chunk below we zoom in on a particular region of
Demograph2, the graph we just made. Observe that we do this by
restricting the horizontal and vertical ranges in the
DemoGraph2 map using the xlim and
ylim commands.
# Zoom into a particular region of the Demograph2 plot. Call this Demograph3.
DemoGraph3 <- DemoGraph2 + coord_map("tetra", xlim = c(-80, -67), ylim = c(38, 48))
DemoGraph3Exercise 10: Create Demograph4 that zooms in on the state of Florida.
# Zoom in Florida.
Demograph4 <- DemoGraph2 + coord_map("polyconic", xlim = c(-88, -79), ylim = c(24,
31))
Demograph4Additional Resources
http://docs.ggplot2.org/current/: A well-documented list of ggplot2 components with descriptions
http://www.statmethods.net/advgraphs/ggplot2.html: Quick-R introduction to graphics
http://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf: Formal documentation of the ggplot2 package
http://stackoverflow.com/tags/ggplot2: Stackoverflow, an online community to share information.
http://www.cookbook-r.com/Graphs/: R Graphics Cookbook, a text by Winston Chang
http://ggplot2.org/book/ : Sample chapters of the text, ggplot2: Elegant Graphics for Data Analysis
Homework 9
Complete Exercises 1-10 above.
Complete Exercises 11-15 below.
Exercise 11: (a) Another variable in the 2024 State
Population data is avgstudentloan. This is the average
student loan debt for each state. Create a US map that colors each state
by the average student loan debt divided by 1000, so that the debt is in
thousands of dollars. Give your map and legend appropriate titles.
map_avg_student_loan <- ggplot(MergedStates, aes(x = long, y = lat, group = group,
fill = avgstudentloan/1000)) + geom_polygon(color = "red") + scale_fill_continuous(name = "Average Student Loan Debt (in thousands of dollars)",
label = scales::comma_format(), limits = c(0, NA)) + labs(title = "Average Student Loan Debt by State") +
theme(legend.position = "bottom") + scale_fill_viridis_b(option = "mako") + coord_map() # Use default map projection
map_avg_student_loan
Which state has the highest average debt? What is the average in that
state? (Hint: Use the
summary or
range function.)
Maryland has the highest average debt
with $42,861.
Which state has the smallest average student loan debt? What is the
average for that state?
North Dakota has the smallest average
debt with $28,602.
Exercise 12a Another variable in these data is the
total student loan debt in the state. These values are recorded in
billions of dollars. Create a map that colors the states by their total
student loan debt (totaldebt). The values recorded are in
billions of dollars. Give your map and legend appropriate titles and
make sure the units are clear on your map.
map_total_student_loan_debt <- ggplot(MergedStates, aes(x = long, y = lat, group = group,
fill = totaldebt)) + geom_polygon(color = "red") + scale_fill_continuous(name = "Total Student Loan Debt (in billions of dollars)",
label = scales::comma_format(), limits = c(0, NA)) + labs(title = "Total Student Loan Debt by State") +
theme(legend.position = "bottom") + coord_map()
map_total_student_loan_debtQUESTIONS: Based on the map, which states have the largest totals of student loan debt? Which states have the smallest totals? Do the states with the largest and smallest totals correspond to the states with the largest and smallest average student loan debts? Explain.
The states with the largest based on the map are Texas and California. The smallest seems to be New Mexico and Oklahoma. This seems to correspond with the map.
Exercise 12b Let’s compare the total student loan debt per state with the state populations. Create a map that colors each state by 100000*(total debt/ population). This would be the total debt per 1000 population. (This uses debt in 100 million dollars rather than billions to make the scale manageable.) You will probably need to tell R what limits to use in this case - I’d suggest from about 2.7 to 6.2. Be sure to give your map and legend appropriate titles and make sure the units are clear on your map.
MergedStates$total_debt_per_1000_population <- 1e+05 * (MergedStates$totaldebt/MergedStates$population)
map_total_debt_per_1000_population <- ggplot(MergedStates, aes(x = long, y = lat,
group = group, fill = total_debt_per_1000_population)) + geom_polygon(color = "black") +
scale_fill_viridis_c(name = "Total Debt per 1000 Population ($100 million)",
label = scales::comma_format(), direction = -1) + labs(title = "Total Debt per 1000 Population by State") +
theme(legend.position = "bottom") + coord_map()
map_total_debt_per_1000_populationQUESTIONS: Discuss what this graph shows and compare it to the previous to graphs.
This graph shows that Georgia has the highest Total Debt per 1000 population, which was not seen in our past maps. ***
Exercise 13 Follow these steps to make a restricted state map.
- Filter the
MainStatefile to so that it includes only about 8-10 mainland U.S. states. The states should have something in common that interests you. - Join your newly restricted file with the StatePop file. Be sure to
look at a few rows of your joined data to make sure the merge was
successful.
- Then create a list of label locations for your restricted state list
(see CHUNKs 10-12 where we did this before).
- Finally, create a map that displays only your selected states,
colored by the “density” column in the
MergedStatesdata. Add the data labels and display an appropriately named legend.
Reminder: To restrict MainStates to a subset of
states use something like this command:
…ClarkState <- filter(MainStates,region %in% c("california", "oregon", "idaho", "nevada" . . .))…
# Step One, filter
selected_states <- filter(MainStates, region %in% c("texas", "arkansas", "louisiana",
"oklahoma", "kansas", "georgia", "mississippi", "alabama"))
# Join Data
MergedSelected <- inner_join(selected_states, StatePop, by = "region")
# Checking if it was joined successfully datatable(MergedSelected, options =
# list(autoWidth = FALSE)) List of labels Calculate centroids of the selected
# states
centroids <- aggregate(cbind(long, lat) ~ region, data = selected_states, FUN = mean)
# Create label locations dataframe
label_locations <- data.frame(x = centroids$long, y = centroids$lat, state = centroids$region)
# map
newmap <- ggplot() + geom_polygon(data = MergedSelected, aes(x = long, y = lat, group = group,
fill = density), color = "white") + geom_label(data = label_locations, aes(x = x,
y = y, label = state), size = 3, fill = "white", color = "black", label.padding = unit(0.3,
"lines"), label.r = unit(0.2, "lines"), show.legend = FALSE) + scale_fill_viridis_c(option = "magma",
direction = -1, name = "Density") + labs(title = "Density of Selected States") +
theme(legend.position = "right")
newmapExercise 14 Create a map of your restricted list of
states and color each state by the drug overdose rate
(overdose_rt). You will want to use the merged file you
created for Exercise 12. Determine appropriate limits for your scale and
be sure to use appropriate titles on your map and legend.
summary(MergedStates$overdose_rt)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.30 19.40 28.10 29.64 35.00 81.40
OverdoseMap <- ggplot()
OverdoseMap <- OverdoseMap + geom_polygon(data = MergedSelected, aes(x = long, y = lat,
group = group, fill = overdose_rt), color = "navyblue", linewidth = 0.3) + geom_label(data = label_locations,
aes(x = x, y = y, label = state), size = 3, fill = "white", color = "black",
label.padding = unit(0.3, "lines"), label.r = unit(0.2, "lines"), show.legend = FALSE) +
labs(title = "Drug Overdose Rate by Selected States") + coord_map("tetra") +
theme(legend.position = "right") + scale_fill_viridis_c(option = "magma", direction = -1,
limits = c(10.3, 81.4), name = "Drug Overdose Rate")
OverdoseMapExercise 15 Add points to represent the well-known
cities (using MainCities) in each of your chosen states.
You should show only cities that lie in your subset of states. Eliminate
other cities before you use the MainCities data. (Note: The
“state” variable in us.cities and therefore in
MainCities is named country.etc.)
Hints: Use filter and %in%.
Example ``MainCities <- filter(us.cities, long >=-130 & long < -100)
``Clarkcities <- filter(MainCities, country.etc %in% c("CA", "OR", ...))`` ...
summary(MergedSelected$long)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -106.65 -97.09 -91.99 -92.80 -89.75 -80.84
MainCities <- filter(us.cities, long >= -106.7 & long < -81)
Selectedcities <- filter(MainCities, country.etc %in% c("KS", "AR", "GA", "AL", "LA",
"MS", "OK", "TX"))
# datatable(Selectedcities)
us_map <- ggplot() + geom_polygon(data = selected_states, aes(x = long, y = lat,
group = group), color = "black", fill = "lightblue") + geom_point(data = Selectedcities,
aes(x = long, y = lat), color = "blue", size = 2) + labs(title = "Selected States and Cities") +
theme_minimal()
# Display the map
us_mapAs always, change the name and date at the top of the file. Knit your file to an HTML document. Open the HTML file in a browser, and save the file WITH YOUR NAME in it. Then post the file in Moodle no later than midnight Saturday, April 13.