
R for Economics and Social Science Research

Norberto E. Milla, Jr.

1 / 31

Day 2: Data Visualization with R

2 / 31

Introduction to Visualization

  • "A picture is worth a thousand words."

  • a good visualization makes it easier to identify patterns and trends

    • provides a quick and effective way to communicate information

    • displays complex relationships among data

    • highlights interesting and compelling "stories" from the data

3 / 31

The ggplot2 package

  • ggplot2 is a system for declaratively creating graphics, inspired by Leland Wilkinson's book The Grammar of Graphics (2005)

  • the key idea behind ggplot2 is that it lets you build up a complex plot layer by layer

  • each layer adds an extra level of information to the plot

  • sophisticated plots can be built this way, tailored to the problem at hand
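
As a minimal sketch of the layer-by-layer idea: the snippet below uses the built-in mtcars data set (an assumption made only so the example runs without extra packages; the slides themselves use Salaries) and checks how many layers the plot object holds after each geom is added.

```r
library(ggplot2)

# mtcars ships with base R; it stands in for any data frame here
p0 <- ggplot(mtcars, aes(x = wt, y = mpg))          # data + mapping only, no layer yet
p1 <- p0 + geom_point()                             # first layer: points
p2 <- p1 + geom_smooth(method = "lm", se = FALSE)   # second layer: fitted line

# each geom appends one element to the plot's list of layers
length(p0$layers)
length(p1$layers)
length(p2$layers)
```

Printing p0, p1, and p2 in turn shows the plot growing from an empty canvas to points with a fitted line.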

4 / 31

The ggplot2 package

  • using the ggplot() function we build a graph in layers

  • basic elements:

    • data: the information you want to visualise

    • mapping: description of how the variables are mapped to aesthetic attributes

    • layer: geometries and statistical summaries

    • scale: color, shape, size, legend

    • coordinate system: axes and gridlines

    • facet: specifies how to break up and display subsets of data

    • theme: controls the finer points of display, like the font size and background color
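
One way to see how these elements fit together is a single call in which each line supplies one element. This is only an illustrative sketch: the mtcars data set, the "Dark2" palette, and the axis limits are assumptions chosen for the example, not part of the workshop material.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                          # data
            mapping = aes(x = wt, y = mpg,
                          color = factor(cyl))) +   # mapping
  geom_point(size = 2) +                            # layer (geometry)
  scale_color_brewer(palette = "Dark2",
                     name = "Cylinders") +          # scale
  coord_cartesian(xlim = c(1, 6)) +                 # coordinate system
  facet_wrap(~am) +                                 # facet
  theme_classic(base_size = 12)                     # theme

p
```

Removing any line after geom_point() still gives a valid plot, because ggplot2 falls back to a default scale, coordinate system, facet, and theme.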

5 / 31

Basic plots: scatter plot

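library(ggplot2)   # ggplot() and the geoms
library(carData)   # Salaries data set

# data and mapping only: draws the axes but no points yet
ggplot(data = Salaries,
       mapping = aes(x = yrs.service, y = salary))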

6 / 31

Basic plots: scatter plot

ggplot(data = Salaries,
       mapping = aes(x = yrs.service, y = salary)) +
  geom_point()

7 / 31

Basic plots: scatter plot

ggplot(data = Salaries,
       mapping = aes(x = yrs.service,
                     y = salary)) +
  geom_point(color = "blue",
             size = 3,
             alpha = 0.5)   # alpha lies in [0, 1]; lower is more transparent

8 / 31

Basic plots: scatter plot

ggplot(data = Salaries,
       mapping = aes(x = yrs.service,
                     y = salary)) +
  geom_point(color = "blue",
             size = 3,
             alpha = 0.5) +
  labs(x = "Years of Service",
       y = "Salary")

9 / 31

Basic plots: scatter plot

ggplot(data = Salaries,
       mapping = aes(x = yrs.service,
                     y = salary)) +
  geom_point(color = "blue",
             size = 3,
             alpha = 0.5) +
  labs(x = "Years of Service",
       y = "Salary") +
  geom_smooth(method = "lm",   # fitted linear regression line
              col = "red")

10 / 31

Basic plots: scatter plot

ggplot(data = Salaries,
       mapping = aes(x = yrs.service,
                     y = salary)) +
  geom_point(color = "blue",
             size = 3,
             alpha = 0.5) +
  labs(x = "Years of Service",
       y = "Salary") +
  geom_smooth(method = "lm",
              col = "red") +
  theme_classic()

11 / 31

Basic plots: scatter plot

ggplot(data = Salaries,
       mapping = aes(x = yrs.service,
                     y = salary,
                     color = sex)) +   # one color and one fitted line per sex
  geom_point(size = 3,
             alpha = 0.5) +
  labs(x = "Years of Service",
       y = "Salary") +
  geom_smooth(method = "lm") +
  theme_classic()

12 / 31

Basic plots: scatter plot

ggplot(data = Salaries,
       mapping = aes(x = yrs.service,
                     y = salary)) +
  geom_point(color = "lightblue",
             size = 3,
             alpha = 0.5) +
  labs(x = "Years of Service",
       y = "Salary") +
  geom_smooth(method = "lm",
              col = "red") +
  facet_wrap(~sex) +   # one panel per sex
  theme_classic()

13 / 31

Basic plots: bar plot

library(dplyr)   # %>%, select(), count(), mutate()

Salaries %>%
  select(rank) %>%
  ggplot(aes(x = rank)) +
  geom_bar(fill = "lightblue") +   # geom_bar() counts the rows in each rank
  labs(x = "Rank",
       y = "No. of faculty") +
  theme_classic()

14 / 31

Basic plots: bar plot

Salaries %>%
  select(rank) %>%
  ggplot(aes(x = rank)) +
  geom_bar(fill = "lightblue") +
  labs(x = "Rank",
       y = "No. of faculty") +
  coord_flip() +
  theme_classic()

15 / 31

Basic plots: bar plot

Salaries %>%
  select(rank) %>%
  count(rank) %>%
  mutate(p = round(n / sum(n) * 100, 1)) %>%   # percent of faculty in each rank
  ggplot(aes(x = reorder(rank, p),             # order bars by percent
             y = p)) +
  geom_col(fill = "lightblue",
           col = "black") +
  geom_text(aes(label = p),
            hjust = -0.5,
            size = 3.5) +
  labs(x = " ",
       y = "Percent (%)") +
  coord_flip() +
  theme_classic()

16 / 31

Basic plots: bar plot

library(tidyr)    # drop_na()
library(ggpubr)   # theme_classic2()

Salaries %>%
  drop_na(salary) %>%
  group_by(rank) %>%
  summarize(n = length(salary),
            Mean = mean(salary),
            SD = sd(salary)) %>%
  mutate(SE = SD / sqrt(n)) %>%   # standard error of the mean
  as.data.frame() %>%
  ggplot(aes(x = rank, y = Mean)) +
  geom_col(fill = "lightblue",
           col = "black",
           width = 0.5) +
  geom_errorbar(aes(ymin = Mean - SE,
                    ymax = Mean + SE),
                linewidth = 0.7,   # ggplot2 >= 3.4.0 uses linewidth for line thickness
                width = 0.15) +
  theme_classic2()

17 / 31

Basic plots: bar plot

Salaries %>%
  select(discipline, rank, sex) %>%
  group_by(sex) %>%
  count(rank) %>%
  as.data.frame() %>%
  ggplot(aes(x = rank,
             y = n,
             fill = reorder(sex, -n))) +
  geom_bar(stat = "identity",
           position = "stack",
           width = 0.5) +
  coord_flip() +
  geom_text(aes(label = n),
            position = position_stack(vjust = 0.5),   # center each label in its segment
            size = 3) +
  labs(x = "Rank",
       y = "No. of faculty",
       fill = "Sex") +
  theme_classic()

18 / 31

Basic plots: bar plot

Salaries %>%
  select(discipline, rank, sex) %>%
  group_by(sex) %>%
  count(rank) %>%
  as.data.frame() %>%
  ggplot(aes(x = rank,
             y = n,
             fill = sex)) +
  geom_bar(stat = "identity",
           position = "dodge",
           col = "black") +
  geom_text(aes(label = n),
            position = position_dodge(0.9),
            vjust = -1,
            size = 3) +
  labs(x = "Rank",
       y = "No. of faculty",
       fill = "Sex") +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 300)) +
  theme_classic()

19 / 31

Basic plots: histogram

Salaries %>%
  ggplot(aes(x = salary)) +
  geom_histogram(bins = 30,   # the default number of bins, stated explicitly
                 fill = "lightblue",
                 color = "black") +
  scale_y_continuous(expand = c(0, 0)) +
  facet_wrap(~sex)

20 / 31

Basic plots: density plot

Salaries %>%
  ggplot(aes(x = salary)) +
  geom_density(fill = "lightblue") +
  scale_y_continuous(expand = c(0, 0)) +
  facet_wrap(~rank)

21 / 31

Basic plots: density plot

Salaries %>%
  ggplot(aes(x = salary,
             fill = rank,
             color = rank)) +
  scale_y_continuous(expand = c(0, 0)) +
  geom_density()

22 / 31

Basic plots: box plot

Salaries %>%
  ggplot(aes(x = salary,
             fill = rank,
             color = rank)) +
  geom_boxplot() +
  scale_y_continuous(expand = c(0, 0)) +
  theme(axis.text.x = element_blank()) +
  coord_flip()

23 / 31

LUNCH BREAK

24 / 31

Interactive plots

library(plotly)   # ggplotly()

g1 <- ggplot(data = Salaries,
             mapping = aes(x = yrs.service,
                           y = salary,
                           color = sex)) +
  geom_point(size = 3,
             alpha = 0.5) +
  labs(x = "Years of Service",
       y = "Salary") +
  geom_smooth(method = "lm") +
  theme_classic()

ggplotly(g1)   # convert the static ggplot into an interactive plot
25 / 31

Interactive plots

[Interactive plot: Salary vs. Years of Service, colored by sex (Female, Male)]
26 / 31

Interactive plots

g1 <- ggplot(data = Salaries,
             aes(x = yrs.service,
                 y = salary,
                 color = sex,
                 group = sex,           # keep one fitted line per sex
                 Label1 = sex,          # Label1-3 are used only for the tooltip
                 Label2 = rank,
                 Label3 = discipline)) +
  geom_point(size = 3,
             alpha = 0.5) +
  labs(x = "Years of Service",
       y = "Salary") +
  geom_smooth(method = "lm") +
  theme_classic()

ggplotly(g1, tooltip = c("Label1", "Label2", "Label3"))
27 / 31

Interactive plots

[Interactive plot: Salary vs. Years of Service, colored by sex (Female, Male)]
28 / 31

Animated plots

library(gapminder)   # gapminder data set

g <- ggplot(data = gapminder,
            aes(x = gdpPercap,
                y = lifeExp,
                size = pop,
                color = continent)) +
  geom_point(aes(frame = year,        # one animation frame per year
                 ids = country)) +    # track each country across frames
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent)

ggplotly(g)
29 / 31

Animated plots

[Animated plot: lifeExp vs. gdpPercap (log scale), one panel per continent (Africa, Americas, Asia, Europe, Oceania), with legends for pop and continent, a year slider starting at 1952, and a Play button]
30 / 31

Interactive dashboard: flexdashboard

31 / 31
