Interactive graphics allow you to manipulate plotted data to gain further insight. As an example, an interactive graphic would allow you to zoom in on a subset of your data without the need to create a new plot. In this course, you will learn how to create and customize interactive graphics in plotly using the R programming language. Along the way, you will review data visualization best practices and be introduced to new plot types such as scatterplot matrices and binned scatterplots.
In the final chapter, you will use your plotly toolkit to explore the results of the 2018 United States midterm elections, learning how to create maps in plotly along the way.
## Parsed with column specification:
## cols(
## state = col_character(),
## state.abbr = col_character(),
## turnout2018 = col_double(),
## turnout2014 = col_double(),
## ballots = col_integer(),
## vep = col_integer(),
## vap = col_integer()
## )
In the United States, midterm elections typically see lower voter turnout than presidential elections. However, with so much buzz surrounding the 2018 midterm elections, turnout was expected to be higher than for previous midterm elections. Was this the case?
Your task is to create a scatterplot comparing the voter turnout (i.e. the proportion of eligible voters that cast votes) in each state between the 2014 and 2018 midterm elections.
Note that plotly has already been loaded for you.
Create a scatterplot displaying turnout in 2014 on the x-axis and turnout in 2018 on the y-axis. Title the x-axis “2014 voter turnout” and the y-axis “2018 voter turnout”.
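One way this exercise might be approached is sketched below. The small `turnout` data frame is a made-up stand-in for the course data (only the column names follow the column specification printed above), so the values themselves carry no meaning.

```r
library(plotly)

# Made-up stand-in for the course's `turnout` data; only the column names
# follow the column specification shown in these notes.
turnout <- data.frame(
  state = c("State A", "State B", "State C", "State D"),
  turnout2014 = c(0.30, 0.36, 0.42, 0.55),
  turnout2018 = c(0.41, 0.45, 0.50, 0.58)
)

# Scatterplot: 2014 turnout on the x-axis, 2018 turnout on the y-axis
p <- turnout %>%
  plot_ly(x = ~turnout2014, y = ~turnout2018) %>%
  add_markers() %>%
  layout(
    xaxis = list(title = "2014 voter turnout"),
    yaxis = list(title = "2018 voter turnout")
  )

p
```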
While it was not difficult to determine that higher proportions of eligible voters turned out in nearly every state from your last plot, it probably took you a little time to see this. Hover info certainly makes this task easier, but interactivity alone isn’t enough to make this task “easy.”
So how can you help readers digest your chart more quickly? By adding the reference line y=x. Observations that fall above this reference line correspond to states with higher voter turnout in 2018 than in 2014. In plotly, you can add a line by connecting two points using add_lines(x = c(x1, x2), y = c(y1, y2)).
Your task is to add the y=x reference line to your previous plot.
Note that your plot from the last exercise is stored in the object p.
Use add_lines() to add a reference line passing through the points (0.25, 0.25) and (0.6, 0.6). Hide the legend that’s added to the chart by default.
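A minimal sketch of the reference line follows. To keep it self-contained, the plot `p` is rebuilt from a made-up `turnout` data frame; in the exercise, `p` already exists.

```r
library(plotly)

# Made-up stand-in for the course's `turnout` data
turnout <- data.frame(
  state = c("State A", "State B", "State C", "State D"),
  turnout2014 = c(0.30, 0.36, 0.42, 0.55),
  turnout2018 = c(0.41, 0.45, 0.50, 0.58)
)

# The scatterplot from the previous exercise
p <- turnout %>%
  plot_ly(x = ~turnout2014, y = ~turnout2018) %>%
  add_markers() %>%
  layout(
    xaxis = list(title = "2014 voter turnout"),
    yaxis = list(title = "2018 voter turnout")
  )

# Add the y = x reference line through (0.25, 0.25) and (0.6, 0.6),
# then hide the legend that the extra trace would otherwise trigger
p_ref <- p %>%
  add_lines(x = c(0.25, 0.6), y = c(0.25, 0.6)) %>%
  layout(showlegend = FALSE)

p_ref
```

Points above the line are states where turnout rose between the two midterms.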
The hover information on the previous scatterplot allows you to determine which state had the highest turnout, but it takes considerable time to compare the turnout between states. In this exercise, your task is to create a scatterplot displaying state on the y-axis and voter turnout on the x-axis. Scatterplots displaying one categorical and one quantitative variable are often called dotplots, and allow for quicker comparisons between groups.
The turnout dataset contains information on the proportion of eligible voters (turnout) that voted in the 2018 midterm election in each state.
In the sample code, turnout %>% top_n(15, wt = turnout) extracts the 15 states with the highest turnout rates.
Note that plotly, dplyr, and forcats have already been loaded for you.
For the top 15 states, create a dotplot (i.e. scatterplot) displaying turnout2018 on the x-axis and state on the y-axis, where state has been reordered by turnout2018. Title the y-axis “State” and set its type to category. Title the x-axis “Eligible voter turnout”.
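The dotplot could be sketched as follows. The data are made up, and because the toy data frame has only five rows, the sketch keeps the top 3 rather than the top 15. It also assumes the turnout column is named `turnout2018`, as in the instructions above.

```r
library(plotly)
library(dplyr)
library(forcats)

# Made-up stand-in for the course's `turnout` data
turnout <- data.frame(
  state = c("State A", "State B", "State C", "State D", "State E"),
  turnout2018 = c(0.41, 0.45, 0.50, 0.58, 0.62)
)

# Keep the states with the highest turnout (the exercise uses 15)
top_states <- turnout %>%
  top_n(3, wt = turnout2018)

# Dotplot: reorder state by turnout so the dots form a monotone sequence
dotplot <- top_states %>%
  plot_ly(x = ~turnout2018, y = ~fct_reorder(state, turnout2018)) %>%
  add_markers() %>%
  layout(
    xaxis = list(title = "Eligible voter turnout"),
    yaxis = list(title = "State", type = "category")
  )

dotplot
```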
Control of the Senate was up for grabs in the 2018 midterm elections, and along with it President Trump’s ability to shape the judicial branch of government. Both parties fought hard to control this chamber of Congress, so how did this translate to fundraising? A first step at understanding this issue is to visualize the distribution of funds received by Senate candidates.
Your task is to create a histogram displaying the distribution of funds received by Senate candidates during the 2018 election cycle. When you’re done, try to identify the race with the highest level of fundraising.
plotly has already been loaded for you.
Filter to extract only the Senate races (designated by “S”). Create a histogram of receipts. Add the title “Fundraising for 2018 Senate races” and title the x-axis “Total contributions received”.
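A sketch of the histogram is shown below. These notes do not name the data frame or the column that holds the office code, so `fundraising` and `office` are hypothetical names, and the rows are made up; only `receipts` and the “S” code come from the instructions.

```r
library(plotly)
library(dplyr)

# Hypothetical data frame and column names; values are made up
fundraising <- data.frame(
  office = c("S", "S", "S", "H", "H"),
  receipts = c(500000, 2500000, 18000000, 300000, 900000)
)

# Keep only the Senate races
senate <- fundraising %>%
  dplyr::filter(office == "S")

# Histogram of total receipts
hist_plot <- senate %>%
  plot_ly(x = ~receipts) %>%
  add_histogram() %>%
  layout(
    title = "Fundraising for 2018 Senate races",
    xaxis = list(title = "Total contributions received")
  )

hist_plot
```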
As you saw, most Senate campaigns raised under $1M and the vast majority raised under $20M, so which races raised these astronomical amounts? Histograms bin observations, obscuring easy identification of individual candidates, so a different chart is needed to explore this question.
Your task is to create a dotplot of the 15 Senate campaigns that raised the most money during the 2018 election cycle. You will also need to customize the hover info to facilitate easy identification of the candidates.
Focus first on creating the plot, but be sure to review how the hover info was customized!
Note that plotly has already been loaded for you.
Use top_n() to extract the cases corresponding to the 15 Senate campaigns that raised the most money. For the top 15 campaigns, create a dotplot (i.e. scatterplot) displaying receipts on the x-axis and state on the y-axis, where state has been reordered by receipts. Change the colors so that blue represents Democrats (DEM) and red represents Republicans (REP).
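The following sketch shows one possible shape for this plot, including customized hover text. The data frame name `fundraising`, the columns `office` and `name`, and all values are assumptions made for illustration; the toy data keep the top 3 campaigns instead of 15.

```r
library(plotly)
library(dplyr)
library(forcats)

# Hypothetical data frame and column names; values are made up
fundraising <- data.frame(
  office = c("S", "S", "S", "S", "H"),
  state = c("AA", "BB", "CC", "DD", "EE"),
  name = c("Candidate 1", "Candidate 2", "Candidate 3", "Candidate 4", "Candidate 5"),
  party = c("DEM", "REP", "DEM", "REP", "DEM"),
  receipts = c(30e6, 25e6, 12e6, 4e6, 1e6)
)

# Top Senate campaigns by receipts (the exercise uses 15)
top_campaigns <- fundraising %>%
  dplyr::filter(office == "S") %>%
  top_n(3, wt = receipts)

# Dotplot colored by party; hoverinfo = "text" shows only the custom label.
# Factor levels sort alphabetically, so DEM gets "blue" and REP gets "red".
campaign_plot <- top_campaigns %>%
  plot_ly(
    x = ~receipts,
    y = ~fct_reorder(state, receipts),
    color = ~party,
    colors = c("blue", "red"),
    hoverinfo = "text",
    text = ~paste0("Candidate: ", name, "<br>Party: ", party, "<br>Receipts: ", receipts)
  ) %>%
  add_markers()

campaign_plot
```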
You already saw that voter turnout increased in nearly every state in the 2018 midterm elections compared to the 2014 midterms. In this exercise, your task is to map the change in voter turnout between these two midterm elections.
The turnout data frame, dplyr, and plotly have already been loaded for you.
Use mutate() to add a change column to turnout, calculated as the difference between the turnout in 2018 (turnout2018) and 2014 (turnout2014). Use plot_geo() and add_trace() to create a choropleth map of the change in voter turnout by state, mapping change to z and state.abbr to locations. Restrict the scope of the map to the ‘usa’ using layout().
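As a rough sketch, the steps above might look like this. The turnout values are made up, and the trace type is set explicitly to "choropleth" here as a precaution rather than relying on plotly to infer it.

```r
library(plotly)
library(dplyr)

# Made-up turnout values attached to real state abbreviations
turnout <- data.frame(
  state.abbr = c("AZ", "FL", "ME"),
  turnout2014 = c(0.30, 0.40, 0.55),
  turnout2018 = c(0.45, 0.52, 0.60)
)

# Change in turnout between the two midterms
turnout_change <- turnout %>%
  mutate(change = turnout2018 - turnout2014)

# Choropleth: states are matched by their two-letter abbreviation
change_map <- turnout_change %>%
  plot_geo(locationmode = "USA-states") %>%
  add_trace(type = "choropleth", z = ~change, locations = ~state.abbr) %>%
  layout(geo = list(scope = "usa"))

change_map
```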
There were 33 Senate seats on the ballot in the 2018 midterms (plus two special elections that we’ll ignore in this exercise). Your task is to create a choropleth map using the winning candidate’s political party to color in the state.
This task requires you to map a factor to the fill color. However, the z aesthetic expects a numeric variable. An easy workaround is to convert party to a numeric variable via as.numeric(party) and then manually specify the desired colors in add_trace(). Additionally, the colorbar is no longer very useful, and can be removed by adding the layer hide_colorbar().
The senate_winners data frame and plotly have already been loaded for you.
Create a choropleth map where the color of the state represents the winning party. In add_trace(), manually specify the colors “dodgerblue”, “mediumseagreen”, and “tomato” (in that order). Complete the hover info text with the appropriate column names.
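The factor-to-numeric workaround can be sketched as below. The three-row `senate_winners` data frame is a toy stand-in with placeholder candidate names; it assumes `party` is a factor whose levels sort as DEM, IND, REP, so that the three colors line up with the codes 1, 2, 3.

```r
library(plotly)

# Toy stand-in for `senate_winners`; candidate names are placeholders
senate_winners <- data.frame(
  name = c("Candidate 1", "Candidate 2", "Candidate 3"),
  state = c("AZ", "FL", "ME"),
  party = factor(c("DEM", "REP", "IND"), levels = c("DEM", "IND", "REP")),
  pct.vote = c(49, 50, 54)
)

# z needs numbers, so the factor is converted to its integer codes;
# the colorbar for those codes is meaningless and is hidden
party_map <- senate_winners %>%
  plot_geo(locationmode = "USA-states") %>%
  add_trace(
    type = "choropleth",
    z = ~as.numeric(party),
    locations = ~state,
    colors = c("dodgerblue", "mediumseagreen", "tomato"),
    hoverinfo = "text",
    text = ~paste0("Candidate: ", name, "<br>Party: ", party, "<br>% vote: ", pct.vote)
  ) %>%
  layout(geo = list(scope = "usa")) %>%
  hide_colorbar()

party_map
```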
## Parsed with column specification:
## cols(
## Row_ID = col_integer(),
## name = col_character(),
## id = col_character(),
## state = col_character(),
## party = col_character(),
## incumbent = col_character(),
## votes = col_integer(),
## pct.vote = col_integer()
## )
## Observations: 33
## Variables: 8
## $ Row_ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1...
## $ name <fct> SINEMA, KYRSTEN, FEINSTEIN, DIANNE, MURPHY, CHRISTOP...
## $ id <fct> S8AZ00197, S0CA00199, S2CT00132, S8DE00079, S8FL0027...
## $ state <fct> AZ, CA, CT, DE, FL, HI, IN, MA, MD, ME, MI, MN, MO, ...
## $ party <fct> DEM, DEM, DEM, DEM, REP, DEM, REP, DEM, DEM, IND, DE...
## $ incumbent <fct> OPEN, INCUMBENT, INCUMBENT, INCUMBENT, CHALLENGER, I...
## $ votes <int> 938976, 4777661, 818614, 217358, 4097689, 276133, 11...
## $ pct.vote <int> 49, 54, 59, 61, 50, 71, 51, 60, 64, 54, 52, 60, 51, ...
The maps created using plot_geo() are still plotly objects, so you can add additional layers as before. In this exercise, you will add points to a United States map representing the locations where President Trump held rallies for the 2018 midterm election. The dataset rallies2018 contains the date, city, state, latitude, longitude, and number of people who spoke.
Note that plotly has already been loaded for you.
Use add_markers() to add points representing the rallies to the U.S. map. Be sure to map long to the x-axis, lat to the y-axis, and no.speakers to the size of the points. Add the title “2018 Trump Rallies”. Restrict the scope of the map to the ‘usa’.
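A sketch of the marker layer follows. The rows of `rallies2018` below are made up (placeholder cities and arbitrary coordinates inside the United States); only the column names follow the column specification in these notes. The hover text is an optional extra.

```r
library(plotly)

# Made-up stand-in for `rallies2018`
rallies2018 <- data.frame(
  city = c("City A", "City B", "City C"),
  state = c("AA", "BB", "CC"),
  no.speakers = c(2, 5, 9),
  lat = c(35.0, 40.0, 45.0),
  long = c(-100.0, -90.0, -110.0)
)

# On a geo map, x is longitude and y is latitude
rally_map <- rallies2018 %>%
  plot_geo(locationmode = "USA-states") %>%
  add_markers(
    x = ~long, y = ~lat, size = ~no.speakers,
    hoverinfo = "text",
    text = ~paste(city, state, sep = ", ")
  ) %>%
  layout(title = "2018 Trump Rallies", geo = list(scope = "usa"))

rally_map
```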
## Parsed with column specification:
## cols(
## Row_No = col_integer(),
## date = col_character(),
## Yeatr = col_integer(),
## city = col_character(),
## state = col_character(),
## no.speakers = col_integer(),
## lat = col_double(),
## long = col_double()
## )
## Warning: `line.width` does not currently support multiple values.
In the previous exercise you saw the default settings for the geo layout in plotly, but it is quite easy to customize by specifying additional arguments in the list passed to geo in layout().
In this exercise you will explore a few useful options outlined below:
To change the color of the landmass, add the argument showland = TRUE and set a landcolor. To make lakes distinct from landmasses, add the argument showlakes = TRUE and set a lakecolor. To display states/provinces, set showsubunits = TRUE, and then set subunitcolor. To display countries, set showcountries = TRUE, and then set countrycolor. Note that you must use the toRGB() function in order to pass R colors to the geo layout.
plotly has already been loaded for you.
Customize the appearance of your map from the previous exercise by defining the list g and passing it to the geo layout: Set the landmass color with “gray90”. Set the lake color with “white”. Set the state (subunit) color with “white”.
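The geo options could be collected in a list along these lines. The rally data are again a made-up stand-in, and the list name `g` follows the instructions above.

```r
library(plotly)

# Made-up stand-in for `rallies2018`
rallies2018 <- data.frame(
  city = c("City A", "City B", "City C"),
  no.speakers = c(2, 5, 9),
  lat = c(35.0, 40.0, 45.0),
  long = c(-100.0, -90.0, -110.0)
)

# Geo layout options; R color names must pass through toRGB()
g <- list(
  scope = "usa",
  showland = TRUE, landcolor = toRGB("gray90"),
  showlakes = TRUE, lakecolor = toRGB("white"),
  showsubunits = TRUE, subunitcolor = toRGB("white")
)

styled_map <- rallies2018 %>%
  plot_geo(locationmode = "USA-states") %>%
  add_markers(x = ~long, y = ~lat, size = ~no.speakers) %>%
  layout(title = "2018 Trump Rallies", geo = g)

styled_map
```

Keeping the options in a named list makes it easy to reuse the same styling across several maps.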
In the last lesson you created a choropleth map for the Senate results using plot_geo() with a few workarounds. In this exercise, your task is to recreate that map from polygons. That is, create a U.S. map from polygons and fill in states based on the winner of the Senate race.
The senate_map data frame and plotly have already been loaded for you. senate_map contains the information you have seen previously, along with the boundary information needed to draw state polygons.
Create a state-level choropleth map where party is mapped to color and region is mapped to split. Specify that boundary lines should have width = 0.4 and that the legend should not be shown. Set the polygon colors to “dodgerblue”, “mediumseagreen”, and “tomato” in the plot_ly() layer. To draw the boundaries for states with NAs for party (i.e. a state without a Senate race), change the color of the lines with toRGB(“gray60”).
To simplify your code, define the layout options to remove the axis titles, grid, zero lines, and tick marks as the list map_axes, and then pass this list to xaxis and yaxis. Complete the code to do this.
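A sketch of the polygon approach is given below. Real state boundaries are replaced by three unit squares so the example stays self-contained; the column names `long`, `lat`, and `group` are assumptions about how `senate_map` stores its boundary information.

```r
library(plotly)
library(dplyr)

# Helper that builds one unit square standing in for a state polygon
square <- function(x0, region, party, group) {
  data.frame(
    long = x0 + c(0, 1, 1, 0),
    lat = c(0, 0, 1, 1),
    group = group,
    region = region,
    party = party
  )
}

# Made-up stand-in for `senate_map`
senate_map <- bind_rows(
  square(0, "state a", "DEM", 1),
  square(1, "state b", "IND", 2),
  square(2, "state c", "REP", 3)
) %>%
  mutate(party = factor(party, levels = c("DEM", "IND", "REP")))

# Shared axis settings: no title, grid, zero line, or tick labels
map_axes <- list(
  title = "", showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE
)

# One polygon per group, filled by party, one trace per region
polygon_map <- senate_map %>%
  group_by(group) %>%
  plot_ly(
    x = ~long, y = ~lat, color = ~party, split = ~region,
    colors = c("dodgerblue", "mediumseagreen", "tomato")
  ) %>%
  add_polygons(
    line = list(width = 0.4, color = toRGB("gray60")),
    showlegend = FALSE
  ) %>%
  layout(xaxis = map_axes, yaxis = map_axes)

polygon_map
```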
NOT ALL POLYGONS ARE READ IN
The 2018 Senate race in Florida was extremely contentious, and was not resolved on election night. The race was too close to call, and the recount process was as controversial as the race, with accusations of poorly designed ballots reminiscent of the infamous butterfly ballot in the 2000 presidential election, and a slew of legal challenges.
In this exercise, your task is to create a county-level choropleth map of the percentage of the two-party vote that the Republican candidate, Rick Scott (the ultimate winner of the race), received according to the first set of results (pre-recount).
The results are in fl_results and the county boundaries are in fl_boundaries. plotly has already been loaded for you.
Join the fl_boundaries and fl_results data frames. fl_boundaries and fl_results have different column names for the counties so you will need to map subregion to CountyName.
Create a county-level choropleth map where counties are colored by the percentage of voters who voted for Rick Scott. Specify that the boundary lines should have a width of 0.4 and that the legend should not be shown.
Define the axis layout settings in map_axes to remove the titles, grid lines, zero lines, and tick marks, and pass this list to the xaxis and yaxis layouts.
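The join and the county-level map might be sketched as follows. Both data frames are toy stand-ins: the boundaries are unit squares, and apart from the 0.353 shown for alachua in the structure output in these notes, the Pctvote values are made up.

```r
library(plotly)
library(dplyr)

# Toy county boundaries: two unit squares instead of real polygons
fl_boundaries <- data.frame(
  long = c(0, 1, 1, 0, 1, 2, 2, 1),
  lat = c(0, 0, 1, 1, 0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  subregion = rep(c("alachua", "baker"), each = 4)
)

# Toy results: one row per county (the baker value is made up)
fl_results <- data.frame(
  CountyName = c("alachua", "baker"),
  Pctvote = c(0.353, 0.800)
)

# The county column has a different name in each data frame
senate_vote <- left_join(
  fl_boundaries, fl_results,
  by = c("subregion" = "CountyName")
)

# Shared axis settings: no title, grid, zero line, or tick labels
map_axes <- list(
  title = "", showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE
)

# County polygons colored by the Republican share of the vote
county_map <- senate_vote %>%
  group_by(group) %>%
  plot_ly(x = ~long, y = ~lat, color = ~Pctvote, split = ~subregion) %>%
  add_polygons(line = list(width = 0.4), showlegend = FALSE) %>%
  layout(xaxis = map_axes, yaxis = map_axes)

county_map
```

Using left_join() keeps every boundary row, so a county with no matching result would still be drawn, with a missing Pctvote.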
## Warning: Column `subregion`/`CountyName` joining factors with different
## levels, coercing to character vector
## 'data.frame': 2443 obs. of 14 variables:
## $ long : num -82.7 -82.6 -82.6 -82.6 -82.6 ...
## $ lat : num 29.8 29.8 29.8 29.9 29.9 ...
## $ group : int 290 290 290 290 290 290 290 290 290 290 ...
## $ order : int 12216 12217 12218 12219 12220 12221 12222 12223 12224 12225 ...
## $ region : Factor w/ 2 levels "florida","florida h": 1 1 1 1 1 1 1 1 1 1 ...
## $ subregion : Factor w/ 67 levels "alachua","baker",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ PartyCode : Factor w/ 1 level "REP": 1 1 1 1 1 1 1 1 1 1 ...
## $ CountyCode : Factor w/ 67 levels "ALA","BAK","BAY",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Precincts : int 63 63 63 63 63 63 63 63 63 63 ...
## $ PrecinctsReporting: int 63 63 63 63 63 63 63 63 63 63 ...
## $ CanNameLast : Factor w/ 1 level "Scott": 1 1 1 1 1 1 1 1 1 1 ...
## $ CanNameFirst : Factor w/ 1 level "Rick": 1 1 1 1 1 1 1 1 1 1 ...
## $ CanVotes : int 40590 40590 40590 40590 40590 40590 40590 40590 40590 40590 ...
## $ Pctvote : num 0.353 0.353 0.353 0.353 0.353 ...
## Warning: line.color doesn't (yet) support data arrays
## Warning: Only one fillcolor per trace allowed