2026-4-15

Today’s Class

  • Warm-up: data visualization
  • Data Analysis by Hand
  • Elements of Visualization
  • Group Activity
  • Grammar of Graphics

Wednesday’s Class

  • Grammar of Graphics
  • Intro to visualization in R
  • Re-creating Visualizations using R

Office Hours

  • Office Hours: Fridays, 1:30pm-3:00pm (Tyler)
  • Tuesdays, 10:30am-12:00pm (Yao)

Miscellaneous

  • Problem set and file naming
  • GitHub/PSet Submissions
  • PSet 3 and week 3 course site materials will be released shortly

Groups!

##             group 1           group 2         group 3         group 4
## 1       Qin, Celine Mahoney, Brigette     Pham, Canon     Ong, Alyssa
## 2    Knowles, Genny      Barga, Jolie    Mendoza, Ava Bell, Mary Rose
## 3 Wolfenstein, Luci     Pacheco, Alex Batson, Anthony     Smith, Reid
##           group 5        group 6
## 1   Moore, Allana Devir, Lindsey
## 2                  Leahy, Olivia
## 3 Kang, Christine   Myoung, Sein

Warm-up

  • Produce a visualization with the information on the handouts!
  • You don’t need to use all the information or produce a finished product

Data Analysis by Hand

Snow’s Cholera Map

  • In 1854, London suffered a cholera outbreak
  • Unknown origin/cause
  • Snow’s data analysis (and visualization) pointed to a water pump (Broad St Pump)
  • Further analysis showed different rates of infection for those living in areas serviced by different water providers
  • Interactive version here

Minard’s Map of Napoleon’s March on Moscow

  • In 1869, Minard illustrated Napoleon’s March on Moscow (1812-1813)
  • Combines geographical information, time, and population of the army

DuBois on Racial Geography

  • Goal: illustrate the US color line at the Paris Exposition Universelle (1900)
  • We might call this public sociology

DuBois on Taxable Property

  • Highlighted both progress and inequality for Black Southern US Citizens
  • Displayed Black Southerners as a group with commonalities to the future-oriented “thinking world,” in spite of position within segregated society

Elements of Visualization

Color

  • How can color be used well (or poorly) to communicate social science?

Color: Example 1

Color: Example 2

Color

  • Contrasts can be more (or less) visually appealing
  • Color can point readers’ attention toward or away from certain areas
  • Consider: existing associations with color

Scale

  • How can scale be used well (or poorly) to communicate social science?

Scale: Example 1

Scale: Example 2

Scale

  • Accurately reflect data
  • Make comparisons easy, while holding viewer’s attention

Labels

  • How can labels be used well (or poorly) to communicate social science?

Labels: Example 1

Labels: Example 2

Labels

  • Readability
  • Conciseness
  • Placement

Comparisons

  • How can comparisons be used well (or poorly) to communicate social science?

Comparisons: Example 1

Comparisons: Example 2

Comparisons: Example 3

Comparisons

  • Key comparisons should be easy for the viewer
  • Details of comparison

Activity

  • Re-visit your visualization with the information on the handouts!
  • What do you want to communicate?
  • How will you use color, scale, labels, and comparisons to make your point?

How would we describe our graphics?

  • Finish your plots from Monday! And think about:
  • What data would someone need to completely re-create your plot?
  • What aesthetic instructions would someone need to completely re-create your plot?

Today’s Class

  • Grammar of Graphics
  • Intro to visualization in R
  • Re-creating Visualizations using R

Bonus Data Visualization

Grammar of Graphics

  • The “grammar of graphics” is the gg in ggplot2
  • It uses object-oriented design, a system for how objects relate to one another
  • This means it’s a little different from the code we have written so far!
  • Think: Data and Aesthetics

Grammar of Graphics

  • Data is just our data (but we may need to manipulate it to be legible to ggplot2!)
  • Aesthetics include colors, scales, labels, etc.

Getting started with ggplot2

  • We can try ggplot2 with the gapminder data
  • What if we wanted more info on the gapminder data?
# load the package that contains the gapminder data
library(gapminder)

# open the data in a spreadsheet-style viewer
View(gapminder)
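
One way to answer the question above, as a sketch: besides View(), base R has several functions for inspecting a data frame. The calls below are standard base R; which of them is most useful here is a judgment call.

```r
library(gapminder)

# in an interactive session, ?gapminder opens the help page
# that documents each variable

# compact overview: variable names, types, and first values
str(gapminder)

# first six rows
head(gapminder)

# quick summaries of every column
summary(gapminder)
```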

Getting started with ggplot2

  • Let’s try it out
  • We can start with a blank canvas using ggplot
library(ggplot2)

# plot blank canvas
ggplot()

Getting started with ggplot2

  • Now let’s add to it
  • We can input data and aesthetics
  • Why don’t we see any data?
library(gapminder)

# add data and aesthetics
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))

Getting started with ggplot2

  • ggplot2 works in layers!
  • To see our data, we need to add a geom_ layer

Adding Geom Layers

  • The + sign adds layers to our ggplot
  • We could try geom_point
  • What about another geom?
# create a plot
ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))+
  geom_point()
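
One possible answer to "what about another geom?", as a sketch: keep the same data and aesthetics and swap or stack geoms. geom_smooth() with method = "lm" (a straight-line fit) is only one choice, picked here for illustration.

```r
library(ggplot2)
library(gapminder)

# same data and aesthetics, but draw a fitted line instead of points
ggplot(data = gapminder,
       mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_smooth(method = "lm")

# layers stack: points first, then the fitted line on top
ggplot(data = gapminder,
       mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "lm")
```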

Adding Geom Layers

  • There are multiple ways to do this
  • When/why would we want to assign an object (p) to our plot?
# create the plot
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))

# plot it
p + geom_point()
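
One reason to store the plot in an object, sketched below: the base plot can be reused and extended without retyping the data and aesthetics. The object names p_points and p_log are made up for this example, and scale_x_log10() is a ggplot2 function used here only to show a later addition.

```r
library(ggplot2)
library(gapminder)

# build the base plot once (data and aesthetics only, no layers yet)
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))

# reuse the base plot: add a point layer
p_points <- p + geom_point()

# keep building on the stored plot: put the x axis on a log scale
p_log <- p_points + scale_x_log10()

# print both versions
p_points
p_log
```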

Adding Color

  • We can add color within the geom_point function
  • First: manually set
# create the plot
ggplot(data = gapminder,
       mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(col = "purple")

Adding Color

  • We can add color within the geom_point function
  • First: manually set
  • Second: within aesthetics
# create the plot
ggplot(data = gapminder,
       mapping = aes(x = gdpPercap, y = lifeExp,
                     col = continent)) +
  geom_point()

Adding Color, Size, Alpha

  • Let’s try it!
  • Set the color to vary by year
  • Manually set alpha to 0.5
  • Set the size to vary by pop
  • Try changing other aesthetics! Can you make a different type of plot?
# create the plot
ggplot(data = gapminder,
       mapping = aes(x = gdpPercap, y = lifeExp,
                     col = continent)) +
  geom_point()
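
A possible solution to the first three prompts, as a sketch rather than the only answer: color and size vary with the data, so they go inside aes(), while alpha is a fixed value, so it is set inside geom_point().

```r
library(ggplot2)
library(gapminder)

# color and size are mapped to variables (inside aes);
# alpha is set manually to one value (outside aes)
ggplot(data = gapminder,
       mapping = aes(x = gdpPercap, y = lifeExp,
                     col = year, size = pop)) +
  geom_point(alpha = 0.5)
```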

Reshaping Data

Let’s Create a Dataframe

# create a dataframe (note that : returns a sequence)
df <- data.frame(year = 2000:2020, pop = 40:60)

# look at the first 6 rows
head(df)
##   year pop
## 1 2000  40
## 2 2001  41
## 3 2002  42
## 4 2003  43
## 5 2004  44
## 6 2005  45

Say we want to change a variable

library(dplyr)

# function to convert a population in millions to a count of people
millions_to_people <- function(m){
  p <- m * 1000000
  return(p)
}

# run function to create new variable
df %>%
  mutate(pop_p = millions_to_people(pop)) %>%
  head()
##   year pop   pop_p
## 1 2000  40 4.0e+07
## 2 2001  41 4.1e+07
## 3 2002  42 4.2e+07
## 4 2003  43 4.3e+07
## 5 2004  44 4.4e+07
## 6 2005  45 4.5e+07

Try it!

  • On your own, try creating a dataframe and dividing pop by 100
# create a dataframe
df <- data.frame(year = 2000:2020, pop = 40:60)


# create your function here
 <- function(x){
   
 }

# modify your dataframe here
df
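
One way the template above could be filled in, as a sketch. The function name divide_by_100 and the column name pop_100 are made up for this example; any valid names would work.

```r
library(dplyr)

# create a dataframe
df <- data.frame(year = 2000:2020, pop = 40:60)

# hypothetical helper: divide its input by 100
divide_by_100 <- function(x){
  return(x / 100)
}

# add the new variable and look at the first 6 rows
# (the result is printed, not saved back into df)
df %>%
  mutate(pop_100 = divide_by_100(pop)) %>%
  head()
```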

The logic behind pipes

Instead of this:

fourth_function(third_function(second_function(first_function(x))))

We do this:

x %>%
  first_function() %>% 
  second_function() %>% 
  third_function() %>%
  fourth_function()
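
A concrete version of the same idea, as a sketch: the vector x and the choice of sqrt(), mean(), and round() are arbitrary and only there to show that the nested and piped forms give the same result.

```r
library(dplyr)  # makes the %>% pipe available

x <- c(1, 4, 9, 16)

# nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# piped calls: read from top to bottom
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(1)

# both approaches return the same value
identical(nested, piped)
```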

So we changed our data ... right?

  • How would we permanently change df?
# run function to create new variable
df %>%
  mutate(pop_p = millions_to_people(pop)) %>%
  head()
##   year pop   pop_p
## 1 2000  40 4.0e+07
## 2 2001  41 4.1e+07
## 3 2002  42 4.2e+07
## 4 2003  43 4.3e+07
## 5 2004  44 4.4e+07
## 6 2005  45 4.5e+07

Introducing … the Magrittr Pipe

  • We can permanently change an object with magrittr’s assignment pipe (%<>%)
library(magrittr)

# we use the magrittr pipe
df %<>%
  mutate(pop_p = millions_to_people(pop))

# check that it worked
head(df)

Try it out!

  • Try using the magrittr pipe!
  • Is there a way to do the same thing with the <- and %>% operators?
library(magrittr)

# we use the magrittr pipe
df %<>%
  mutate(pop_p = millions_to_people(pop))
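
One answer to the second question, as a sketch: x %<>% f() does the same thing as x <- x %>% f(). The example below uses two copies of the dataframe and a made-up column name (pop_100) to compare the two approaches side by side.

```r
library(dplyr)
library(magrittr)

# two identical copies of the example dataframe
df_a <- data.frame(year = 2000:2020, pop = 40:60)
df_b <- df_a

# option 1: the assignment pipe updates df_a in place
df_a %<>%
  mutate(pop_100 = pop / 100)

# option 2: the regular pipe plus an explicit assignment
df_b <- df_b %>%
  mutate(pop_100 = pop / 100)

# both approaches give the same result
identical(df_a, df_b)
```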

We can filter to observations of interest

  • We can filter to observations of interest
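
A minimal sketch of filtering with dplyr's filter(), using the gapminder data from earlier. The year and continent chosen here, and the object name gap_asia_2007, are arbitrary examples.

```r
library(dplyr)
library(gapminder)

# keep only the rows for a single year
gapminder %>%
  filter(year == 2007) %>%
  head()

# combine conditions: rows must match both
gap_asia_2007 <- gapminder %>%
  filter(continent == "Asia", year == 2007)

head(gap_asia_2007)
```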

Reshaping Data for Plots

California Demographic Data

library(readr)
library(RCurl)

# download the raw CSV text from GitHub
# (getURL() returns the file contents as one string)
url <- getURL("https://raw.githubusercontent.com/tylermcdaniel/soc10/refs/heads/deployment/Data/ca_dem_table.csv")

# parse the downloaded text (data originally from Wikipedia) into a dataframe
ca_dem_table <- read_csv(url)

Reshaping California Demographic Data for Plots

  • What if we want to divide 2000_num by the total for that year (33871648) and then remove the “total” row?
ca_dem_table %<>%
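
The general pattern can be sketched on a toy table. Everything about toy_table is an assumption made for illustration: the group column, the 2000_prop name, and the counts are invented, and only the 2000_num column name comes from the slide. The real ca_dem_table may be laid out differently.

```r
library(dplyr)
library(magrittr)

# toy stand-in for ca_dem_table (made-up structure and counts)
toy_table <- data.frame(
  group = c("A", "B", "total"),
  `2000_num` = c(25, 75, 100),
  check.names = FALSE  # keep the column name that starts with a number
)

# total for the toy data (for the real table, the slide gives 33871648)
toy_total <- 100

# divide by the total, drop the "total" row, and save the result
toy_table %<>%
  mutate(`2000_prop` = `2000_num` / toy_total) %>%
  filter(group != "total")

toy_table
```

Column names that start with a number need backticks, which is why 2000_num is written as `2000_num` inside mutate().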

Problem set

  • In Problem Set 3, you can create a ggplot with the California demographic data
  • Try reshaping the data with magrittr

ggplot Review

  • Start with data and aesthetics
  • Add layers to your plot
  • Consider which elements you want to manually set vs. vary with aesthetics

Miscellaneous