Welcome to Data Visualization

Fall 2018

Administration

  • Syllabus (found on Blackboard)
  • Email: must be from your Horizon account and include “451” in the subject line
    • e.g. STAT 451 homework question
  • Blackboard

Why Data Visualization?

  • Data: there is more of it than ever, and data sets grow larger every day.
  • Currently, the world produces about 2.5 quintillion bytes of data each day (that's 2,500,000,000,000,000,000 bytes!)
  • How do we make sense of it?
    • Summarize it
    • Make a picture
    • Tell a story

Software

Many software packages can visualize data effectively:

  • R and RStudio
  • Shiny (from RStudio)
  • GGobi
  • Tableau
  • Plotly
  • Python with Matplotlib, Seaborn, and Bokeh
  • D3

R

  • In this class we will focus on R. If time permits we may explore other software.
  • To begin the course, we will concentrate on basics of R to get us ready for the semester.
  • Download R.
  • R is free and available for Windows, Mac, and Linux operating systems. Download the version of R compatible with your operating system. If you are running Windows or macOS, choose one of the precompiled binary distributions (i.e., ready-to-run applications) linked at the top of the R Project's webpage.

R Continued

  • Once R is installed, download and install RStudio. RStudio is an “Integrated Development Environment”, or IDE: a front-end for R that makes it much easier to work with. RStudio is also free, and available for Windows, Mac, and Linux platforms.
  • Install the tidyverse, a collection of add-on packages for R. These packages (and others) provide useful functionality that we will take advantage of throughout the semester. You can learn more about the tidyverse's family of packages at its website.

Install

To install the tidyverse, make sure you have an Internet connection and then launch RStudio. Type (don't copy and paste) the following lines of code at R's command prompt, located in the window named “Console”, and hit return. In the code below, the <- arrow is made up of two keystrokes: first <, then the short dash or minus symbol, -. The quotation marks must be plain straight quotes, which is what you get when you type them in the Console.

my_packages <- c("tidyverse", "broom", "coefplot", "cowplot",
                 "gapminder", "GGally", "ggrepel", "ggridges",
                 "gridExtra", "here", "interplot", "margins",
                 "maps", "mapproj", "mapdata", "MASS",
                 "quantreg", "rlang", "scales", "survey",
                 "srvyr", "viridis", "viridisLite", "devtools")

install.packages(my_packages, repos = "http://cran.rstudio.com")
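
Once the installation finishes, it can be useful to confirm that the packages actually landed on your machine. The following is a minimal sketch of one way to check, using only functions that ship with R. The names wanted and still_missing are illustrative and not part of the course code, and the short package list is just a sample of the full list above.

```r
# A few packages from the course list, used here only for illustration
wanted <- c("tidyverse", "gapminder", "ggrepel")

# installed.packages() returns a matrix whose row names are package names
have <- rownames(installed.packages())

# setdiff() keeps the entries of `wanted` that do not appear in `have`
still_missing <- setdiff(wanted, have)

if (length(still_missing) == 0) {
  message("All packages are installed.")
} else {
  message("Not yet installed: ", paste(still_missing, collapse = ", "))
}
```

If any packages are reported as not yet installed, rerunning install.packages() on just those names is usually enough.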