Data Visualization with R

What is R?


Why R?


Demo: Visualize census data

load("./data/data_workshop.Rda")  # load data
str(census)  # check data structure
## 'data.frame':    1000 obs. of  7 variables:
##  $ gender  : Factor w/ 2 levels "F","M": 1 1 1 1 2 1 1 1 2 2 ...
##  $ race    : Factor w/ 9 levels "Black   ","E Asian ",..: 2 7 1 1 7 7 4 2 6 4 ...
##  $ program : Factor w/ 3 levels "Academic","Applied",..: 1 1 2 1 1 1 1 2 1 1 ...
##  $ progress: Factor w/ 4 levels "Having Difficulty",..: 2 2 1 3 2 3 2 3 2 2 ...
##  $ mark    : num  90 85 41 67 58 67 88 52 76 90 ...
##  $ mark9   : num  94 78 41 53 67 69 90 50 85 90 ...
##  $ absence : num  1 2 8 1 3 0 1 6 4 3 ...

Bar graph: count one categorical variable, 'race'

library(ggplot2)  # load ggplot2 library
qplot(race, data = census)


Bar graph: set fill color to dark blue

qplot(race, data = census, fill = I("darkblue"))


Bar graph: map fill color to 'progress'

qplot(race, data = census, fill = progress)


Bar graph: too colorful? use grey scale

qplot(race, data = census, fill = progress) + scale_fill_grey()


Bar graph: don't like “stack”? use “dodge”

qplot(race, data = census, fill = progress, position = "dodge")
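
Newer ggplot2 releases deprecate the position argument of qplot() (and, since 3.4.0, qplot() itself), so the call above may warn or fail depending on the installed version. A minimal sketch of the same dodged bar graph in full ggplot() syntax, assuming a small made-up data frame 'demo' that stands in for 'census':

```r
library(ggplot2)

# Made-up stand-in for the workshop's 'census' data (illustrative values only)
demo <- data.frame(
  race     = factor(c("A", "A", "A", "B", "B", "B", "B", "C")),
  progress = factor(c("Good", "Good", "Excellent", "Good",
                      "Excellent", "Excellent", "Good", "Good"))
)

# geom_bar() counts the rows in each race/progress group;
# position = "dodge" places the bars side by side instead of stacking them
p <- ggplot(demo, aes(x = race, fill = progress)) +
  geom_bar(position = "dodge")
print(p)
```

Swapping "dodge" for "stack" or "fill" should give the other two layouts shown in these slides.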


Bar graph: care about percentage? use “fill”

qplot(race, data = census, fill = progress, position = "fill", ylab = "percentage")
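
With position = "fill" every bar is rescaled to a total height of 1, so the axis runs from 0 to 1 (proportions) even when it is labelled "percentage". A minimal sketch of one way to print real percent labels, assuming a made-up data frame 'demo' in place of 'census' and the scales package that is installed alongside ggplot2:

```r
library(ggplot2)

# Made-up stand-in for the workshop's 'census' data (illustrative values only)
demo <- data.frame(
  race     = factor(c("A", "A", "A", "B", "B", "B", "B", "C")),
  progress = factor(c("Good", "Good", "Excellent", "Good",
                      "Excellent", "Excellent", "Good", "Good"))
)

# position = "fill" stacks the bars and rescales each one to height 1;
# scales::percent turns the 0-1 axis breaks into percent labels
p <- ggplot(demo, aes(x = race, fill = progress)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  ylab("percentage")
print(p)
```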


Bar graph: flip coordinates

qplot(race, data = census, fill = progress, position = "fill", ylab = "percentage") + 
    coord_flip()


Histogram: count one continuous variable, 'mark'

qplot(mark, data = census)
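
When no bin settings are given, the histogram layer falls back to 30 bins and prints a message suggesting a better value. A minimal sketch of setting the bin width explicitly, assuming a made-up data frame 'demo' in place of 'census' and a bin width of 5 marks chosen only for illustration:

```r
library(ggplot2)

# Made-up marks standing in for census$mark (illustrative values only)
set.seed(7)
demo <- data.frame(mark = round(runif(200, min = 30, max = 100)))

# binwidth = 5 groups the marks into bins that are 5 points wide
p <- ggplot(demo, aes(x = mark)) +
  geom_histogram(binwidth = 5)
print(p)
```

A narrower bin width shows more detail and more noise; a wider one smooths the shape.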


Histogram: split by 'progress'

qplot(mark, data = census, facets = progress ~ .)


Histogram: split by 'progress' and 'gender'

qplot(mark, data = census, facets = progress ~ gender)


Plot continuous on categorical

qplot(progress, mark, data = census)  # points overlap heavily, hard to read


Jitter plot

qplot(progress, mark, data = census, geom = "jitter")


Jitter plot: deal with overplotting

qplot(progress, mark, data = census, geom = "jitter", alpha = I(1/3))


Box plot

qplot(progress, mark, data = census, geom = "boxplot")


Combine box plot with jitter plot

qplot(progress, mark, data = census, geom = c("boxplot", "jitter"), alpha = I(1/5))
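
In the qplot() call the alpha setting is passed to both layers, so the boxes fade along with the points. A minimal sketch of the layered ggplot() form, where each geom takes its own settings; it assumes a made-up data frame 'demo' whose level names and values are invented for illustration:

```r
library(ggplot2)

# Made-up stand-in for 'census' (level names and marks are illustrative only)
set.seed(1)
demo <- data.frame(
  progress = factor(rep(c("Low", "Mid", "High"), each = 50),
                    levels = c("Low", "Mid", "High")),
  mark     = c(rnorm(50, 50, 10), rnorm(50, 70, 10), rnorm(50, 85, 8))
)

p <- ggplot(demo, aes(x = progress, y = mark)) +
  # hide the boxplot's outlier dots: the jitter layer already draws every point
  geom_boxplot(outlier.shape = NA) +
  # transparency and horizontal spread apply to the points only
  geom_jitter(alpha = 1/5, width = 0.2)
print(p)
```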


Density curve: another angle on the same data

qplot(mark, data = census, fill = progress, geom = "density")


Density curve: make transparent

qplot(mark, data = census, fill = progress, geom = "density", alpha = I(1/2))


Density curve: stack together

qplot(mark, data = census, fill = progress, geom = "density", position = "stack")


Density curve: fill y axis

qplot(mark, data = census, fill = progress, geom = "density", position = "fill")


Scatterplot: two continuous variables

qplot(mark9, mark, data = census)


Scatterplot: change shape of points

qplot(mark9, mark, data = census, shape = I(1))


Scatterplot: colorize points by 'program'

qplot(mark9, mark, data = census, shape = I(1), colour = program)


Scatterplot: define size of points by 'absence'

qplot(mark9, mark, data = census, shape = I(1), colour = program, size = absence)


Scatterplot: add a linear regression line

qplot(mark9, mark, data = census, geom = c("point", "smooth"), method = "lm")
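
method = "lm" asks the smooth layer for a least-squares line; the shaded band around it is a confidence interval (95% by default). A minimal sketch in ggplot() syntax, followed by lm() to read off the fitted intercept and slope. The data frame 'demo' and the relationship built into it are made up for illustration:

```r
library(ggplot2)

# Made-up stand-in for 'census': mark is roughly 5 + 0.9 * mark9 plus noise
set.seed(42)
demo <- data.frame(mark9 = runif(100, min = 40, max = 100))
demo$mark <- 5 + 0.9 * demo$mark9 + rnorm(100, sd = 5)

# Points plus a fitted straight line; method goes to the smooth layer only
p <- ggplot(demo, aes(x = mark9, y = mark)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm", formula = y ~ x)
print(p)

# The same linear model fitted directly, to inspect the coefficients
fit <- lm(mark ~ mark9, data = demo)
print(coef(fit))
```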


Scatterplot: split by 'program'

qplot(mark9, mark, data = census, geom = c("point", "smooth"), method = "lm", 
    facets = . ~ program)


To Harness the Computational Power of R

# Filter data: drop students with a 0 in 'mark' or 'mark9'
census.sub <- subset(census, mark > 0 & mark9 > 0)
qplot(mark9, mark, data = census.sub, geom = c("point", "smooth"), method = "lm")
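
subset() keeps only the rows where the condition is TRUE, so a student is dropped as soon as either mark is 0. A minimal sketch on a five-row made-up data frame 'demo' (values invented for illustration), with the equivalent logical-index form:

```r
# Made-up marks: rows 2, 3 and 5 contain a 0 in at least one column
demo <- data.frame(
  mark  = c(90, 0, 67, 58, 0),
  mark9 = c(94, 78, 0, 67, 0)
)

# Keep rows where both marks are above 0
demo.sub <- subset(demo, mark > 0 & mark9 > 0)

# Same filter written with logical indexing
demo.sub2 <- demo[demo$mark > 0 & demo$mark9 > 0, ]

print(nrow(demo))      # 5 rows before filtering
print(nrow(demo.sub))  # 2 rows after filtering
```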


Summarize data, and heatmap

library(plyr)  # load library plyr
hm.df <- ddply(census, .(race, program), summarize, absence = mean(absence))
ggplot(hm.df, aes(race, program, fill = absence)) +
    geom_tile() +
    scale_fill_gradient2(high = "red", low = "white")  # plot a heatmap
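
ddply() splits the data by race and program, computes the mean absence in each group, and returns one row per combination. The plyr package is retired, so here is a minimal sketch of the same summary with base R's aggregate(), assuming a made-up data frame 'demo' in place of 'census':

```r
# Made-up stand-in for 'census' (illustrative values only)
demo <- data.frame(
  race    = c("A", "A", "B", "B", "B"),
  program = c("Academic", "Academic", "Academic", "Applied", "Applied"),
  absence = c(1, 3, 2, 4, 8)
)

# Mean absence for every race/program combination present in the data
hm.demo <- aggregate(absence ~ race + program, data = demo, FUN = mean)
print(hm.demo)
```

The result has the same long shape as hm.df, so it could be passed to ggplot() with geom_tile() in the same way.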


Maps

library(maps)  # load maps library
crime.map <- read.csv("./data/crime.map.csv")  # read data
ggplot(crime.map, aes(x = long, y = lat, group = group, fill = Murder)) +
    geom_polygon(colour = "black")


Export visualizations

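ggsave(filename = "plot.pdf")  # saves the last plot displayed
ggsave(filename = "plot.jpeg", dpi = 72)
# 'htmap' must be a stored plot object, e.g. htmap <- ggplot(hm.df, ...) + geom_tile()
ggsave(filename = "plot.svg", plot = htmap, width = 10, height = 5)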
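
ggsave() writes the last plot displayed unless a plot object is passed through the plot argument, and it picks the file type from the extension. A minimal sketch of storing a plot in a variable and saving it; the data frame 'demo' and the temporary output path are assumptions made for illustration:

```r
library(ggplot2)

# Made-up summary table standing in for hm.df (illustrative values only)
demo <- data.frame(
  race    = c("A", "A", "B"),
  program = c("Academic", "Applied", "Academic"),
  absence = c(2, 5, 3)
)

# Store the plot in an object instead of only printing it
htmap <- ggplot(demo, aes(race, program, fill = absence)) +
  geom_tile()

# Write to a temporary file; width and height are in inches by default
out <- file.path(tempdir(), "heatmap.pdf")
ggsave(filename = out, plot = htmap, width = 10, height = 5)
print(file.exists(out))
```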

Shiny: interactive web applications


knitr: Reproducible Report Writing



Additional resources


Questions?