Interactive Data Visualization


Chester Ismay (cismay@reed.edu)

Paideia 2k16

Slides available at http://rpubs.com/cismay/paideia_2k16_idv
Supplementary HTML file at http://rpubs.com/cismay/paideia_2k16_idv_sup

The Iris flower data set

  • Introduced by Ronald Fisher in 1936

  • The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor).

  • Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, Fisher developed a model to distinguish the species from each other.

Source: Wikipedia
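
The structure described above can be checked directly in R, since iris ships with the base datasets package. The following is a minimal sketch that was not part of the original slides; the object names species_counts and species_means are chosen here purely for illustration.

```r
# iris ships with base R (datasets package), so no extra packages are needed
str(iris)  # 150 rows: four numeric measurements plus the Species factor

# Number of samples per species
species_counts <- table(iris$Species)
species_counts

# Mean of each measurement (in cm) within each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
species_means
```

Comparing the per-species means gives a first hint of which measurements separate the species before any plotting is done.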

Scatterplots

Traditional (boring) plot

with(iris, plot(x = Petal.Width, y = Sepal.Length))

Prettier (not quite as boring) plot

library(ggplot2)
qplot(Petal.Width, Sepal.Length, data = iris)

Interactive plot using plotly

library(plotly)
ggiris <- qplot(Petal.Width, Sepal.Length, data = iris)
ggplotly(ggiris)

Prettier interactive plot using plotly

ggiris_colored <- qplot(Petal.Width, Sepal.Length, data = iris, 
  color = Species)
ggplotly(ggiris_colored)

Another interactive plot

# plotly 4.0 and later expect formulas here: x = ~Petal.Width, y = ~Sepal.Length, color = ~Species
iris %>% plot_ly(x = Petal.Width, y = Sepal.Length,
  type = "scatter", color = Species, mode = "markers")

Scatterplots (Part Deux)

Reed College majors vs. total faculty FTE by department

  • Based on an analysis done by Rich Majerus in 2014 using the googleVis package

  • Data does not include 143 interdisciplinary majors and 9 undecided majors.

  • Majors like Bio/Chem are split between the two departments

  • General Lit/Lit majors are included with English

  • Dance majors and faculty are included with Theatre

major_data %>% ggplot(aes(x = Majors, y = FTE)) +
  geom_point() +
  ggtitle("Reed College Majors and FTE by Department")

Left-click and drag to select an area of the chart to zoom in on. Right-click to zoom back out.

Alaska Airlines departure delays in the PNW

  • The pnwflights14 package contains information about all flights that departed from SEA in Seattle and PDX in Portland in 2014: 162,049 flights in total.

  • We can use this data and the dplyr package to look at daily maximum departure delays throughout the year for Alaska Airlines.
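
The slides do not show how the alaskan data frame plotted below was built. The following is a minimal sketch of one way to do it with dplyr, under several assumptions: that the flights table in pnwflights14 has year, month, day, carrier, and dep_delay columns, and that Alaska Airlines is coded as "AS". The helper name daily_max_delay and the toy data frame are hypothetical and stand in for the real package data.

```r
library(dplyr)

# Toy stand-in for pnwflights14::flights (values invented for illustration)
toy_flights <- data.frame(
  year      = 2014,
  month     = c(1, 1, 1, 2, 2),
  day       = c(1, 1, 2, 1, 1),
  carrier   = c("AS", "AS", "AS", "UA", "AS"),
  dep_delay = c(5, 30, NA, 90, 12)
)

# Hypothetical helper: daily maximum departure delay for one carrier
daily_max_delay <- function(flights_df, carrier_code = "AS") {
  flights_df %>%
    # keep one carrier and drop flights with no recorded delay
    filter(carrier == carrier_code, !is.na(dep_delay)) %>%
    # build a Date column from the year/month/day pieces
    mutate(date2014 = as.Date(paste(year, month, day, sep = "-"))) %>%
    group_by(date2014) %>%
    summarize(max_dep_delay = max(dep_delay)) %>%
    ungroup()
}

alaskan_toy <- daily_max_delay(toy_flights)
alaskan_toy
```

With the real package loaded, the same helper could be called as daily_max_delay(flights), assuming the column names above match.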

Time series/line graphs

alaskan %>% ggplot(aes(x = date2014, y = max_dep_delay)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %y") +
  xlab("Date") +
  ylab("Maximum Departure Delay")

ggplotly()

Plotting the time series using dygraphs

(Converting to time series format using xts)

library(xts)
library(dygraphs)
alaskan_ts <- xts(alaskan$max_dep_delay, alaskan$date2014)
colnames(alaskan_ts) <- "Max Departure Delay"
dygraph(alaskan_ts) %>% dyRangeSelector()

Canadian and US population and geography

  • Canada has an extremely large land area (it is the 2nd largest country in the world), but ranks only 37th in population

  • The US ranks 4th in land area and 3rd in population

  • We can use data in the maps package to better visualize why these rankings exist

Maps

data(canada.cities, package = "maps")
canada_plot <- ggplot(canada.cities, aes(x = long, y = lat)) +
  coord_equal() +
  geom_point(aes(size=pop, text = paste0(name, ", ",
    "Pop: ", prettyNum(pop, big.mark = ",", scientific = FALSE))), 
    colour = "red", alpha = 1/2) +
  borders(regions="canada")
canada_plot

ggplotly(canada_plot)

data(us.cities, package = "maps")
us_plot <- ggplot(us.cities, aes(x = long, y = lat)) +
  coord_equal() +
  geom_point(aes(size=pop, text = paste0(name, ", ",
    "Pop: ", prettyNum(pop, big.mark = ",", scientific = FALSE))), 
    colour = "red", alpha = 1/2) +
  borders(regions="usa", xlim = c(-200, -60), ylim = c(20, 80))
us_plot

ggplotly(us_plot)

3D objects

Maunga Whau (Mt Eden), one of Auckland’s volcanoes in New Zealand

plot_ly(z = volcano, type = "surface")

Interactive Data Tables

library(DT)
datatable(iris, options = list(pageLength = 5))

Another data table example

RA Duty Scheduling

Other resources

Plotting maps in R with ggplot2

HTML Widgets for R

Leaflet package for R

Gapminder (its Trendalyzer software was acquired by Google in 2007)

Hans Rosling’s TED talk - “The Best Stats You’ve Ever Seen”

What can I help you with?

  • Data analysis
  • Data wrangling/cleaning
  • Data visualization
  • Data tidying/manipulating
  • Reproducible research

When am I available?

  • Email me at cismay@reed.edu or chester.ismay@reed.edu to schedule a time to meet if office hours don’t work
  • Tentative Spring 2016 office hours (ETC 223)
    • Mondays (10 AM to 11 AM)
    • Tuesdays (2 PM to 3 PM)
    • Wednesdays (1:30 PM to 2:30 PM)
  • Sometimes available for virtual office hours via Google Hangouts (email me for details)

Thanks!


cismay@reed.edu


sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.2 (El Capitan)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] DT_0.1                  googleVis_0.5.10        maps_3.0.2              xts_0.9-7              
##  [5] zoo_1.7-12              readr_0.2.2             knitr_1.12              dplyr_0.4.3            
##  [9] dygraphs_0.6            pnwflights14_0.1.0.9000 plotly_2.3.0            ggplot2_2.0.0          
## [13] revealjs_0.5           
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.3        RColorBrewer_1.1-2 formatR_1.2.1      plyr_1.8.3         base64enc_0.1-3   
##  [6] viridis_0.3.2      tools_3.2.3        digest_0.6.9       jsonlite_0.9.19    evaluate_0.8      
## [11] gtable_0.1.2       lattice_0.20-33    DBI_0.3.1          yaml_2.1.13        parallel_3.2.3    
## [16] gridExtra_2.0.0    httr_1.0.0         stringr_1.0.0      htmlwidgets_0.5    grid_3.2.3        
## [21] R6_2.1.1           rmarkdown_0.9.5    RJSONIO_1.3-0      magrittr_1.5       scales_0.3.0      
## [26] htmltools_0.3      assertthat_0.1     colorspace_1.2-6   labeling_0.3       stringi_1.0-1     
## [31] lazyeval_0.1.10    munsell_0.4.2