These are my notes for GRD 610A: Data Visualization II, Winter 2022, at the College for Creative Studies. They follow my work through the book Data Visualization by Kieran Healy (Princeton University Press, 2019).
Objects in R are created and referred to by their names. Some names are not allowed because they are reserved words, such as TRUE, if, and NA. Names of built-in functions such as mean() are not reserved, but reusing them for your own objects invites confusion, so avoid them. Names also cannot start with a number or contain spaces. There are several naming conventions.
Snake Case
my_data
this_is_snake_case
Camel Case
myData
thisIsCamelCase
Pascal Case
MyData
ThisIsPascalCase
Pick one naming convention and stick with it. Be consistent; don’t switch between conventions. I recommend snake case.
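As a rough check of the naming rules above, base R's make.names() shows how R would repair a name that is not allowed. The candidate names below are made up for illustration.

```r
# Hypothetical candidate names, chosen only to illustrate the rules
candidates <- c("my_data", "2cool", "my data", "TRUE", "if", "mean")

# make.names() returns a syntactically valid version of each name;
# a name is valid as written only if it comes back unchanged
repaired <- make.names(candidates)
is_valid <- repaired == candidates

data.frame(candidates, repaired, is_valid)
```

Here mean comes back unchanged because it is an ordinary function name rather than a reserved word. R will let you assign to it, which is exactly why it is worth avoiding.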
# This is a comment (it starts with #)
my_data <- c(1, 2, 3, 4) # Assign using <- ; use ALT + - or OPTION + -
My_Data
## Error in eval(expr, envir, enclos): object 'My_Data' not found
# Cannot be found because we called it my_data (lowercase)
# Now we can see it
my_data
## [1] 1 2 3 4
Think of functions like a recipe. The arguments of the function are the ingredients and what happens within the function is the sequence of cooking steps.
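To make the recipe idea concrete, here is a minimal sketch of a hand-written function. The name my_mean is hypothetical; it only re-creates what the built-in mean() already does for a plain numeric vector.

```r
# A hypothetical function: x is the ingredient, the body is the cooking steps
my_mean <- function(x) {
  total <- sum(x)      # step 1: add up the values
  count <- length(x)   # step 2: count the values
  total / count        # step 3: the last expression is what the function returns
}

my_mean(c(1, 2, 3, 1, 3, 5, 25))
## [1] 5.714286
```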
c(1, 2, 3, 1, 3, 5, 25) # c() is the combine function; it puts values together into a vector
## [1] 1 2 3 1 3 5 25
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
your_numbers <- c(5, 31, 71, 1, 3, 21, 6)
my_numbers
## [1] 1 2 3 1 3 5 25
mean(x = my_numbers)
## [1] 5.714286
mean(my_numbers) # you don't have to specify the argument names, but order matters if you do not specify
## [1] 5.714286
mean(x = your_numbers)
## [1] 19.71429
my_summary <- summary(my_numbers)
my_summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.500 3.000 5.714 4.000 25.000
table(my_numbers)
## my_numbers
## 1 2 3 5 25
## 2 1 2 1 1
sd(my_numbers)
## [1] 8.616153
my_numbers * 5
## [1] 5 10 15 5 15 25 125
my_numbers + 1
## [1] 2 3 4 2 4 6 26
my_numbers + my_numbers # How is this different than the last line?
## [1] 2 4 6 2 6 10 50
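One way to answer the question in the comment above: R works element by element, and it recycles the shorter input to match the longer one. A small sketch (the second example uses made-up numbers):

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)

# Adding a single number: the 1 is recycled to match every element
plus_one <- my_numbers + 1

# Adding two vectors of equal length: elements are paired up by position
doubled <- my_numbers + my_numbers
identical(doubled, my_numbers * 2)
## [1] TRUE

# A shorter vector is recycled as well
c(1, 2, 3, 4) + c(10, 20)
## [1] 11 22 13 24
```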
# If you're not sure what an object is, ask for its class or type
class(my_numbers)
## [1] "numeric"
class(my_summary)
## [1] "summaryDefault" "table"
class(summary)
## [1] "function"
my_new_vector <- c(my_numbers, "Apple") # What happens if we combine a word with numbers?
my_new_vector
## [1] "1" "2" "3" "1" "3" "5" "25" "Apple"
class(my_new_vector)
## [1] "character"
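The conversion to character seen above is an instance of type coercion. The sketch below uses made-up values to show the general pattern and what happens when converting back.

```r
# A vector holds one type, so c() converts everything to the most flexible type present
class(c(TRUE, 2L))        # logical + integer
## [1] "integer"
class(c(2L, 3.5))         # integer + double
## [1] "numeric"
class(c(3.5, "Apple"))    # number + text
## [1] "character"

# Converting back turns anything that is not a number into NA (normally with a warning)
mixed <- c("1", "2", "Apple")
back <- suppressWarnings(as.numeric(mixed))
back
## [1]  1  2 NA
```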
# Let's look at a new dataset
titanic
## fate sex n percent
## 1 perished male 1364 62.0
## 2 perished female 126 5.7
## 3 survived male 367 16.7
## 4 survived female 344 15.6
class(titanic)
## [1] "data.frame"
# Titanic is a data frame, which is like a table
# The $ operator can be used to access a column of a data frame by name
titanic$percent
## [1] 62.0 5.7 16.7 15.6
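The $ operator is one of several ways to pull a column out of a data frame. The sketch below rebuilds the table by hand (titanic_demo is a hypothetical stand-in typed in from the printout above) so that it runs without any extra packages.

```r
# A small stand-in for the titanic table, typed in from the printout above
titanic_demo <- data.frame(
  fate    = c("perished", "perished", "survived", "survived"),
  sex     = c("male", "female", "male", "female"),
  n       = c(1364, 126, 367, 344),
  percent = c(62.0, 5.7, 16.7, 15.6)
)

titanic_demo$percent         # column by name, returns a vector
titanic_demo[["percent"]]    # same result; the name can be stored in a variable
titanic_demo[, "percent"]    # row, column indexing; also a vector for a data.frame
titanic_demo["percent"]      # single brackets keep it as a one-column data frame
```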
# Tibbles are slightly different than data frames. They are also data tables, but they provide more information.
titanic_tb <- as_tibble(titanic)
titanic_tb # How does this compare to titanic above?
## # A tibble: 4 x 4
## fate sex n percent
## <fct> <fct> <dbl> <dbl>
## 1 perished male 1364 62
## 2 perished female 126 5.7
## 3 survived male 367 16.7
## 4 survived female 344 15.6
# To see inside an object, ask for its structure
str(my_numbers)
## num [1:7] 1 2 3 1 3 5 25
str(my_summary)
## 'summaryDefault' Named num [1:6] 1 1.5 3 5.71 4 ...
## - attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
Programming in R can be challenging and it takes time to get used to. Be patient and take a break if you get stuck. Make sure parentheses are opened and closed. Complete your commands (look out for the + in the console, which means R is waiting for the rest of a command). Take your time and look out for typos.
In this section, we will get data from a URL and make a quick figure.
# Data source
url <- "https://cdn.rawgit.com/kjhealy/viz-organdata/master/organdonation.csv"
# Read the CSV from the URL
organs <- read_csv(file = url)
## Rows: 238 Columns: 21
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): country, world, opt, consent.law, consent.practice, consistent, ccode
## dbl (14): year, donors, pop, pop.dens, gdp, gdp.lag, health, health.lag, pub...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Take a quick look at the data
glimpse(organs)
## Rows: 238
## Columns: 21
## $ country <chr> "Australia", "Australia", "Australia", "Australia", "~
## $ year <dbl> NA, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1~
## $ donors <dbl> NA, 12.09, 12.35, 12.51, 10.25, 10.18, 10.59, 10.26, ~
## $ pop <dbl> 17065, 17284, 17495, 17667, 17855, 18072, 18311, 1851~
## $ pop.dens <dbl> 0.2204433, 0.2232723, 0.2259980, 0.2282198, 0.2306484~
## $ gdp <dbl> 16774, 17171, 17914, 18883, 19849, 21079, 21923, 2296~
## $ gdp.lag <dbl> 16591, 16774, 17171, 17914, 18883, 19849, 21079, 2192~
## $ health <dbl> 1300, 1379, 1455, 1540, 1626, 1737, 1846, 1948, 2077,~
## $ health.lag <dbl> 1224, 1300, 1379, 1455, 1540, 1626, 1737, 1846, 1948,~
## $ pubhealth <dbl> 4.8, 5.4, 5.4, 5.4, 5.4, 5.5, 5.6, 5.7, 5.9, 6.1, 6.2~
## $ roads <dbl> 136.59537, 122.25179, 112.83224, 110.54508, 107.98096~
## $ cerebvas <dbl> 682, 647, 630, 611, 631, 592, 576, 525, 516, 493, 474~
## $ assault <dbl> 21, 19, 17, 18, 17, 16, 17, 17, 16, 15, 16, 15, 14, N~
## $ external <dbl> 444, 425, 406, 376, 387, 371, 395, 385, 410, 409, 393~
## $ txp.pop <dbl> 0.9375916, 0.9257116, 0.9145470, 0.9056433, 0.8961075~
## $ world <chr> "Liberal", "Liberal", "Liberal", "Liberal", "Liberal"~
## $ opt <chr> "In", "In", "In", "In", "In", "In", "In", "In", "In",~
## $ consent.law <chr> "Informed", "Informed", "Informed", "Informed", "Info~
## $ consent.practice <chr> "Informed", "Informed", "Informed", "Informed", "Info~
## $ consistent <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes~
## $ ccode <chr> "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz",~
# View(organs) # Run in RStudio
# Another way to view data
gapminder
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ... with 1,694 more rows
# Make a plot object
p <- ggplot(data = gapminder,
mapping = aes(x = gdpPercap,
y = lifeExp))
# Create a scatterplot
p + geom_point()
ggplot2 is an R package that lets us map data to visual elements. With it we can control how the data appears in the plot and how each element of the plot is displayed. Aesthetic mappings make the connection between the data and how it is displayed on the plot (location, size, color, shape, etc.). Geoms define the type of plot (scatterplot, line plot, box plot, bar chart, etc.). The pieces of code are joined with the plus sign (+) to make the plot. More pieces can be added to define the scales, legend, labels, axes, and style or theme of the plot. Each piece is added with a function whose arguments specify the look of the plot, so the plot is built up piece by piece.
In tidy data:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
From Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10).
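As a rough illustration of these rules, the sketch below reshapes a small untidy table into a tidy one. The data are invented, and using pivot_longer() from the tidyr package is my assumption about a convenient tool; it is not something these notes have covered.

```r
library(tidyr)

# Hypothetical untidy table: one row per country, one column per year
wide <- data.frame(
  country = c("A", "B", "C"),
  `1999`  = c(10, 20, 30),
  `2000`  = c(11, 21, 31),
  check.names = FALSE
)

# Tidy version: one row per country-year, with year and cases as their own columns
long <- pivot_longer(
  wide,
  cols = -country,      # reshape every column except country
  names_to = "year",    # old column names go here
  values_to = "cases"   # old cell values go here
)
long
```

In the tidy version each variable (country, year, cases) has its own column, which is the shape ggplot2 expects when mapping variables to aesthetics.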
Build a plot layer by layer, starting with telling ggplot what data to use and how to map or link it to parts of the plot, like the x and y axes. Then add on the type of geom.
p <- ggplot(data = gapminder,
mapping = aes(x = gdpPercap,
y = lifeExp))
p + geom_point()
Trying different geom_ functions.
p <- ggplot(data = gapminder,
mapping = aes(x = gdpPercap,
y = lifeExp))
p + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
p + geom_point() + # add the points back into the plot
geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
p + geom_point() +
geom_smooth(method = "lm") # use a linear model
## `geom_smooth()` using formula 'y ~ x'
p + geom_point() +
geom_smooth(method = "gam") + # generalized additive model
scale_x_log10() # transform x-axis to log-10 scale
## `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
p + geom_point() +
geom_smooth(method = "gam") +
scale_x_log10(labels = scales::dollar) # format x-axis in dollars
## `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
Using the aesthetics mapping, different parts of the data can be encoded in different ways.
p <- ggplot(data = gapminder,
mapping = aes(x = gdpPercap,
y = lifeExp,
color = "purple")) # ggplot adds the value "purple" to all rows
p + geom_point() +
geom_smooth(method = "loess") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
# To actually turn all of the points purple, we need to set the color property of the geom_ function
p <- ggplot(data = gapminder,
mapping = aes(x = gdpPercap,
y = lifeExp))
p + geom_point(color = "purple") + # set point color to purple
geom_smooth(method = "loess") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
p + geom_point(alpha = 0.3) + # make points more transparent
geom_smooth(color = "orange", # make line orange
se = FALSE, # remove standard error band
size = 8, # increase thickness of the line
method = "lm") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
p + geom_point(alpha = 0.3) + # make points more transparent
geom_smooth(method = "gam") +
scale_x_log10(labels = scales::dollar) +
# Add title and labels
labs(x = "GDP per Capita",
y = "Life Expectancy in Years",
title = "Economic Growth and Life Expectancy",
subtitle = "Data points are country-years",
caption = "Source: Gapminder")
## `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
# Map data by continent
p <- ggplot(data = gapminder,
mapping = aes(x = gdpPercap,
y = lifeExp,
color = continent))
p + geom_point() +
geom_smooth(method = "loess") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder,
mapping = aes(x = gdpPercap,
y = lifeExp,
color = continent,
fill = continent)) # now the error bands will also have the color
p + geom_point() +
geom_smooth(method = "loess") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder,
mapping = aes(x = gdpPercap,
y = lifeExp))
p + geom_point(mapping = aes(color = continent)) + # points will be colored by continent
geom_smooth(method = "loess") + # the smoothed line will be for all data
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
p + geom_point(mapping = aes(color = log(pop))) + # points will be colored by the log of population
scale_x_log10()
Use here() to save plots in the current directory. This function can also be used to reference folders within the current directory. For this class, use .svg to save in vector format and embed in Adobe Illustrator. The function to save a plot is ggsave() which will automatically save the last plot and can also be provided a ggplot object to save.
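A minimal sketch of saving a plot this way. The file name is hypothetical, the mpg data set (bundled with ggplot2) stands in for whatever plot you want to keep, and writing .svg files assumes the svglite package is installed alongside ggplot2 and here.

```r
library(ggplot2)
library(here)

# mpg ships with ggplot2; it stands in for whatever plot you want to save
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# here() builds a path starting at the project folder
out_file <- here("displ_vs_hwy.svg")  # hypothetical file name

# Passing plot = p saves that object rather than the last plot displayed;
# width and height are in inches by default
ggsave(filename = out_file, plot = p, width = 6, height = 4)
```

To save into a subfolder, pass the folder name first, for example here("figures", "displ_vs_hwy.svg"), assuming a figures folder already exists in the project.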
Pick at least two of the questions presented under the Where to Go Next section and answer them.
“Code almost never works properly the first time you write it.” (p. 73)
p <- ggplot(data = gapminder,
mapping = aes(x = year,
y = gdpPercap))
p + geom_line() # Something is wrong, we didn't tell it how to group
p + geom_line(aes(group = country)) # Now there is a line per country
Facet = small multiple (i.e. a separate graph for each value of the variable)
p <- ggplot(data = gapminder,
mapping = aes(x = year,
y = gdpPercap))
p + geom_line(aes(group = country)) +
facet_wrap(~continent) # make a separate plot for each continent
# Make it look a little nicer
p + geom_line(color = "gray70",
aes(group = country)) +
geom_smooth(size = 1.1, method = "loess", se = FALSE) +
scale_y_log10(labels = scales::dollar) +
facet_wrap(~continent, ncol = 5) +
labs(x = "Year",
y = "GDP per capita",
title = "GDP per capita on Five Continents")
## `geom_smooth()` using formula 'y ~ x'
# New dataset 2016 General Social Survey with more categorical data
glimpse(gss_sm)
## Rows: 2,867
## Columns: 32
## $ year <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016~
## $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,~
## $ ballot <labelled> 1, 2, 3, 1, 3, 2, 1, 3, 1, 3, 2, 1, 2, 3, 2, 3, 3, 2,~
## $ age <dbl> 47, 61, 72, 43, 55, 53, 50, 23, 45, 71, 33, 86, 32, 60, 76~
## $ childs <dbl> 3, 0, 2, 4, 2, 2, 2, 3, 3, 4, 5, 4, 3, 5, 7, 2, 6, 5, 0, 2~
## $ sibs <labelled> 2, 3, 3, 3, 2, 2, 2, 6, 5, 1, 4, 4, 3, 6, 0, 1, 3, 8,~
## $ degree <fct> Bachelor, High School, Bachelor, High School, Graduate, Ju~
## $ race <fct> White, White, White, White, White, White, White, Other, Bl~
## $ sex <fct> Male, Male, Male, Female, Female, Female, Male, Female, Ma~
## $ region <fct> New England, New England, New England, New England, New En~
## $ income16 <fct> $170000 or over, $50000 to 59999, $75000 to $89999, $17000~
## $ relig <fct> None, None, Catholic, Catholic, None, None, None, Catholic~
## $ marital <fct> Married, Never Married, Married, Married, Married, Married~
## $ padeg <fct> Graduate, Lt High School, High School, NA, Bachelor, NA, H~
## $ madeg <fct> High School, High School, Lt High School, High School, Hig~
## $ partyid <fct> "Independent", "Ind,near Dem", "Not Str Republican", "Not ~
## $ polviews <fct> Moderate, Liberal, Conservative, Moderate, Slightly Libera~
## $ happy <fct> Pretty Happy, Pretty Happy, Very Happy, Pretty Happy, Very~
## $ partners <fct> NA, "1 Partner", "1 Partner", NA, "1 Partner", "1 Partner"~
## $ grass <fct> NA, Legal, Not Legal, NA, Legal, Legal, NA, Not Legal, NA,~
## $ zodiac <fct> Aquarius, Scorpio, Pisces, Cancer, Scorpio, Scorpio, Capri~
## $ pres12 <labelled> 3, 1, 2, 2, 1, 1, NA, NA, NA, 2, NA, NA, 1, 1, 2, 1, ~
## $ wtssall <dbl> 0.9569935, 0.4784968, 0.9569935, 1.9139870, 1.4354903, 0.9~
## $ income_rc <fct> Gt $170000, Gt $50000, Gt $75000, Gt $170000, Gt $170000, ~
## $ agegrp <fct> Age 45-55, Age 55-65, Age 65+, Age 35-45, Age 45-55, Age 4~
## $ ageq <fct> Age 34-49, Age 49-62, Age 62+, Age 34-49, Age 49-62, Age 4~
## $ siblings <fct> 2, 3, 3, 3, 2, 2, 2, 6+, 5, 1, 4, 4, 3, 6+, 0, 1, 3, 6+, 2~
## $ kids <fct> 3, 0, 2, 4+, 2, 2, 2, 3, 3, 4+, 4+, 4+, 3, 4+, 4+, 2, 4+, ~
## $ religion <fct> None, None, Catholic, Catholic, None, None, None, Catholic~
## $ bigregion <fct> Northeast, Northeast, Northeast, Northeast, Northeast, Nor~
## $ partners_rc <fct> NA, 1, 1, NA, 1, 1, NA, 1, NA, 3, 1, NA, 1, NA, 0, 1, 0, N~
## $ obama <dbl> 0, 1, 0, 0, 1, 1, NA, NA, NA, 0, NA, NA, 1, 1, 0, 1, 0, 1,~
# Practice using facet_grid() to facet between multiple variables
p <- ggplot(data = gss_sm,
mapping = aes(x = age,
y = childs))
p + geom_point(alpha = 0.2) +
geom_smooth() +
facet_grid(sex ~ race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
Each geom_ function has an associated stat_ function that is used to plot the data. Sometimes this involves transforming the data in some way.
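One way to see what a stat does is to ask ggplot2 for the data a layer actually draws. The sketch below uses layer_data() on the mpg data set bundled with ggplot2 (my choice, so the example runs without the course data).

```r
library(ggplot2)

# mpg ships with ggplot2; class is a categorical column
p <- ggplot(data = mpg, mapping = aes(x = class)) +
  geom_bar()

# layer_data() returns the data after the stat has transformed it
bar_data <- layer_data(p)
bar_data[, c("x", "count", "prop")]

# The counts match what table() gives for the raw column
as.vector(table(mpg$class))
```

The count and prop columns do not exist in mpg; the stat behind geom_bar() computes them before anything is drawn.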
p <- ggplot(data = gss_sm,
mapping = aes(x = bigregion))
p + geom_bar() # makes a bar graph that counts the number of observations per region; count is computed for us
p + geom_bar(mapping = aes(y = ..prop..)) # the prop statistic can show us proportions
# But this is not right, each shows 100%
# So, we need to fix the automatic grouping that is occurring by region
p + geom_bar(mapping = aes(y = ..prop.., group = 1)) # using group = 1 is basically a placeholder that says all the data is in the same group
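The arithmetic behind the grouping fix can be sketched in base R. The region values below are invented; the point is only the difference between dividing by the overall total and dividing within each category.

```r
# Hypothetical answers standing in for a categorical column such as bigregion
region <- c("South", "West", "South", "Midwest", "South", "West")

counts <- table(region)

# With every row in one group, each proportion is count / total
props_one_group <- counts / sum(counts)
props_one_group

# If each category is its own group, every proportion is count / count = 1
props_by_category <- counts / counts
props_by_category
```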
# Look at a different variable
table(gss_sm$religion)
##
## Protestant Catholic Jewish None Other
## 1371 649 51 619 159
p <- ggplot(data = gss_sm,
mapping = aes(x = religion, color = religion))
p + geom_bar() # only the outline has a color - we need to use fill
p <- ggplot(data = gss_sm,
mapping = aes(x = religion, fill = religion))
p + geom_bar()
# Remove the legend
p + geom_bar() +
guides(fill = "none") # guides(fill = FALSE) still works but is deprecated
# How can we look at two variables together
p <- ggplot(data = gss_sm,
mapping = aes(x = bigregion,
fill = religion))
p + geom_bar() # Stacked bar chart of counts
p + geom_bar(position = "fill") # Stacked bar chart of proportions
p + geom_bar(position = "dodge") # Bar chart of counts side by side
p + geom_bar(position = "dodge",
mapping = aes(y = ..prop..)) # Bar chart of proportions side by side
# Not quite right - all are 100%
p + geom_bar(position = "dodge",
mapping = aes(y = ..prop..,
group = religion)) # Bar chart of proportions side by side
# The proportions sum to 1 by religion across the regions
p <- ggplot(data = gss_sm,
mapping = aes(x = religion))
p + geom_bar(position = "dodge",
mapping = aes(y = ..prop..,
group = bigregion)) +
facet_wrap(~bigregion, ncol = 2)
# Now the proportions sum to 1 by region across religions
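The two ways of grouping above correspond to the two margins of a cross-tabulation. A base R sketch with invented respondents:

```r
# Hypothetical respondents: a region and a religion for each person
region   <- c("South", "South", "South", "West", "West", "West")
religion <- c("Protestant", "Catholic", "Protestant", "None", "Catholic", "None")

tab <- table(region, religion)

# margin = 1: proportions sum to 1 within each region (each row)
by_region <- prop.table(tab, margin = 1)
rowSums(by_region)

# margin = 2: proportions sum to 1 within each religion (each column)
by_religion <- prop.table(tab, margin = 2)
colSums(by_religion)
```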
Histograms create bins of numerical data and display the distribution of the data within those bins.
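The binning step can be sketched without drawing anything. The numbers and break points below are made up; hist() with plot = FALSE is base R, not the ggplot2 stat, but the counting idea is the same.

```r
# Hypothetical measurements
x <- c(1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.6, 5.0, 6.7)

# Cut the range 0 to 8 into four bins of width 2
breaks <- seq(0, 8, by = 2)

# hist() with plot = FALSE returns the bin counts without drawing anything
h <- hist(x, breaks = breaks, plot = FALSE)
h$counts
## [1] 2 5 2 1
```

Changing the breaks changes the counts, which is why the number of bins can change the apparent shape of a histogram.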
# A new dataset
glimpse(midwest)
## Rows: 437
## Columns: 28
## $ PID <int> 561, 562, 563, 564, 565, 566, 567, 568, 569, 570,~
## $ county <chr> "ADAMS", "ALEXANDER", "BOND", "BOONE", "BROWN", "~
## $ state <chr> "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "~
## $ area <dbl> 0.052, 0.014, 0.022, 0.017, 0.018, 0.050, 0.017, ~
## $ poptotal <int> 66090, 10626, 14991, 30806, 5836, 35688, 5322, 16~
## $ popdensity <dbl> 1270.9615, 759.0000, 681.4091, 1812.1176, 324.222~
## $ popwhite <int> 63917, 7054, 14477, 29344, 5264, 35157, 5298, 165~
## $ popblack <int> 1702, 3496, 429, 127, 547, 50, 1, 111, 16, 16559,~
## $ popamerindian <int> 98, 19, 35, 46, 14, 65, 8, 30, 8, 331, 51, 26, 17~
## $ popasian <int> 249, 48, 16, 150, 5, 195, 15, 61, 23, 8033, 89, 3~
## $ popother <int> 124, 9, 34, 1139, 6, 221, 0, 84, 6, 1596, 20, 7, ~
## $ percwhite <dbl> 96.71206, 66.38434, 96.57128, 95.25417, 90.19877,~
## $ percblack <dbl> 2.57527614, 32.90043290, 2.86171703, 0.41225735, ~
## $ percamerindan <dbl> 0.14828264, 0.17880670, 0.23347342, 0.14932156, 0~
## $ percasian <dbl> 0.37675897, 0.45172219, 0.10673071, 0.48691813, 0~
## $ percother <dbl> 0.18762294, 0.08469791, 0.22680275, 3.69733169, 0~
## $ popadults <int> 43298, 6724, 9669, 19272, 3979, 23444, 3583, 1132~
## $ perchsd <dbl> 75.10740, 59.72635, 69.33499, 75.47219, 68.86152,~
## $ percollege <dbl> 19.63139, 11.24331, 17.03382, 17.27895, 14.47600,~
## $ percprof <dbl> 4.355859, 2.870315, 4.488572, 4.197800, 3.367680,~
## $ poppovertyknown <int> 63628, 10529, 14235, 30337, 4815, 35107, 5241, 16~
## $ percpovertyknown <dbl> 96.27478, 99.08714, 94.95697, 98.47757, 82.50514,~
## $ percbelowpoverty <dbl> 13.151443, 32.244278, 12.068844, 7.209019, 13.520~
## $ percchildbelowpovert <dbl> 18.011717, 45.826514, 14.036061, 11.179536, 13.02~
## $ percadultpoverty <dbl> 11.009776, 27.385647, 10.852090, 5.536013, 11.143~
## $ percelderlypoverty <dbl> 12.443812, 25.228976, 12.697410, 6.217047, 19.200~
## $ inmetro <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0~
## $ category <chr> "AAR", "LHR", "AAR", "ALU", "AAR", "AAR", "LAR", ~
p <- ggplot(data = midwest,
mapping = aes(x = area))
p + geom_histogram() # count is computed automatically by the default stat function
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
p + geom_histogram(bins = 10) # set 10 bins
# Look at only two states
oh_wi <- c("OH", "WI")
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi),
mapping = aes(x = percollege,
fill = state))
p + geom_histogram(alpha = 0.4, bins = 20) # Overlapping histograms
# Density estimate of the underlying distribution - density plot
p <- ggplot(data = midwest,
mapping = aes(x = area))
p + geom_density()
# Density by state
p <- ggplot(data = midwest,
mapping = aes(x = area,
fill = state,
color = state))
p + geom_density(alpha = 0.3)
# Compare to geom_line(stat = "density")
p + geom_line(stat = "density")
# Scaled density
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi),
mapping = aes(x = percollege,
fill = state,
color = state))
p + geom_density(alpha = 0.3,
mapping = aes(y = ..scaled..))
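A rough sketch of what the plain and scaled densities mean, using base R's density() on simulated data (the sample is hypothetical, and ggplot2's own computation may differ in its defaults).

```r
# Hypothetical sample
set.seed(1)
x <- rnorm(200, mean = 20, sd = 5)

# density() returns a grid of x values and the estimated density at each one
d <- density(x)

# The area under a density curve is approximately 1
area <- sum(d$y) * (d$x[2] - d$x[1])

# The scaled density divides by the maximum, so the tallest point is exactly 1
scaled <- d$y / max(d$y)
max(scaled)
## [1] 1
```

Scaling makes curves for groups of different sizes easier to compare by peak height, at the cost of the area no longer being 1.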
Avoiding transformations - sometimes the data is already aggregated or summarized and we do not need a transformation.
titanic # this data is already summarized
## fate sex n percent
## 1 perished male 1364 62.0
## 2 perished female 126 5.7
## 3 survived male 367 16.7
## 4 survived female 344 15.6
p <- ggplot(data = titanic,
mapping = aes(x = fate,
y = percent,
fill = sex))
p + geom_bar(position = "dodge",
stat = "identity") + # plot values as provided, do not summarize/count/etc.
theme(legend.position = "top") # this puts the legend at the top of the graph
oecd_sum # another dataset that is already summarized
## # A tibble: 57 x 5
## # Groups: year [57]
## year other usa diff hi_lo
## <int> <dbl> <dbl> <dbl> <chr>
## 1 1960 68.6 69.9 1.30 Below
## 2 1961 69.2 70.4 1.20 Below
## 3 1962 68.9 70.2 1.30 Below
## 4 1963 69.1 70 0.900 Below
## 5 1964 69.5 70.3 0.800 Below
## 6 1965 69.6 70.3 0.700 Below
## 7 1966 69.9 70.3 0.400 Below
## 8 1967 70.1 70.7 0.600 Below
## 9 1968 70.1 70.4 0.300 Below
## 10 1969 70.1 70.6 0.5 Below
## # ... with 47 more rows
p <- ggplot(data = oecd_sum,
mapping = aes(x = year,
y = diff,
fill = hi_lo))
p + geom_col() + # this is the same as geom_bar with stat = "identity"
guides(fill = "none") + # no legend; guides(fill = FALSE) is deprecated
labs(x = NULL, # no x-axis label
y = "Difference in Years",
title = "The US Life Expectancy Gap",
subtitle = "Difference between US and OECD average life expectancies, 1960-2015",
caption = "Data: OECD. After a chart by Christopher Ingraham, Washington Post, December 27th 2017")
## Warning: Removed 1 rows containing missing values (position_stack).
Pick at least two of the questions presented under the Where to Go Next section and answer them.