Compiled on Fri Jan 27 13:01:56 2017.

Source file ⇒ 2017-lec4.Rmd

Announcements

  1. Upgrade to DC version 0.6.1 by reinstalling DC:
devtools::install_github("DataComputing/DataComputing")
  2. Once you have upgraded to DC version 0.6.1, you can create a new R Markdown file from the Data Computing simple HTML template:

Go to the File/New File/R Markdown menu. Select “From Template” and, in the resulting dialog box, choose “Data Computing simple (HTML)” and click OK. The template will open in the editor pane.

Today:

  1. DC chap 5 Introduction to Graphics
  2. DC chap 6 Frames, Glyphs, and other Components of Graphics

DC Chapter 5 Introduction to Graphics

Chapter 5 gives a brief introduction to many of the different kinds of graphs you can make in R. We will concentrate on three: a scatter plot, a bar plot, and a box and whisker plot. We will use three helper functions: gf_point(), gf_counts(), and gf_boxplot(). They will help us learn how to use ggplot2 in the next chapter.

We will be using two new packages, statisticalModeling and mosaicData. Here is how to install them:

install.packages("statisticalModeling")
install.packages("mosaicData")

We need a few preliminary concepts.

Loading data into R with data()

The function data() is used to load a particular dataset from a package. For example, the package mosaicData contains a dataset called CPS85 which is data from the 1985 Current Population Survey.

##   wage educ race sex hispanic south married exper union age   sector
## 1  9.0   10    W   M       NH    NS Married    27   Not  43    const
## 2  5.5   12    W   M       NH    NS Married    20   Not  38    sales
## 3  3.8   12    W   F       NH    NS  Single     4   Not  22    sales
## 4 10.5   12    W   F       NH    NS Married    29   Not  47 clerical
## 5 15.0   12    W   M       NH    NS Married    40 Union  58    const
## 6  9.0   16    W   F       NH    NS Married    27   Not  49 clerical

mosaicData contains many datasets, and if you attach the whole package (i.e. library(mosaicData)) then all of those datasets become available in your session at once (R reads each one into memory the first time you use it). If you only need one dataset, it can be cleaner to load just that one using the data() function.

data("CPS85",package="mosaicData")

This loads just the dataset CPS85.
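
To check what data() actually created, here is a minimal sketch that loads the dataset into a fresh environment instead of the global workspace. The envir argument is a standard argument of data(), but it is not used in the notes, and the name scratch is purely illustrative. The sketch assumes mosaicData is installed.

```r
# Load one dataset into a scratch environment so we can see exactly
# what data() created. `scratch` is an illustrative name, not from the notes.
scratch <- new.env()
data("CPS85", package = "mosaicData", envir = scratch)

ls(scratch)            # names of the objects that data() created
dim(scratch$CPS85)     # number of rows and columns
names(scratch$CPS85)   # the variables shown in the printout above
```

In everyday use you would drop the envir argument, as in the call above, so that CPS85 appears directly in your workspace.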

A formula in R

A formula in R is an expression built around the ~ sign. It lets you describe a relationship among variables. The formula y ~ x reads as “y as a function of x”. For instance, wage ~ age makes age the independent variable and wage the dependent variable.

For example

library(statisticalModeling)
data(CPS85, package = "mosaicData")
gf_point(educ ~ age, data = CPS85)
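
A formula is an ordinary R object that can be stored and inspected before it is handed to a plotting function. The following base R sketch takes the wage ~ age formula from the text apart; the variable name f is illustrative.

```r
# Creating a formula does not evaluate wage or age; it only records the names.
f <- wage ~ age

class(f)      # the object's class
all.vars(f)   # variable names, left-hand side first
f[[2]]        # left-hand side: the dependent variable
f[[3]]        # right-hand side: the independent variable
```

Because the formula only stores names, the data table that supplies the variables has to be given separately, which is what the data = argument of gf_point() does.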

In class exercise

Ch 5 DC exercises

DC chap 6 Frames, Glyphs, and other Components of Graphics

We will be describing the components of graphs made with the ggplot2 package.

Glyphs and Data

In its original sense, in archeology, a glyph is a carved symbol.

Data Glyphs (Geoms)

A data glyph is also a geometrical object:

A data glyph visually describes your data.

  • Some are very simple (e.g., dots)

  • Some combine your data and summarize it (e.g., a histogram)

  • Some are complicated representations of your data (e.g., a confidence interval for the expected conditional mean)

See: http://docs.ggplot2.org/current/
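
The three kinds of data glyph listed above can be sketched with one ggplot2 geom each. This is only an illustration under some assumptions: it uses the CPS85 data table from earlier in the notes, the object names p_dots, p_hist and p_band are made up, and the straight-line fit in the third plot is just one possible choice. It calls ggplot() with the data table as its first argument, so no pipe is needed.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

# Simple glyph: one dot per row of the data table
p_dots <- ggplot(CPS85, aes(x = age, y = wage)) + geom_point()

# Summarizing glyph: bars that count how many rows fall in each wage bin
p_hist <- ggplot(CPS85, aes(x = wage)) + geom_histogram(bins = 30)

# More complicated glyph: a fitted line with a confidence band for the
# mean wage at each age (a linear fit, chosen here only for illustration)
p_band <- ggplot(CPS85, aes(x = age, y = wage)) +
  geom_smooth(method = "lm", formula = y ~ x)

p_dots
p_hist
p_band
```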

The components of a graph made with ggplot2:

We need vocabulary to describe the components of a graph made with ggplot2.

We will illustrate using the data table mosaicData::CPS85.

data(CPS85, package="mosaicData")
head(CPS85)
##   wage educ race sex hispanic south married exper union age   sector
## 1  9.0   10    W   M       NH    NS Married    27   Not  43    const
## 2  5.5   12    W   M       NH    NS Married    20   Not  38    sales
## 3  3.8   12    W   F       NH    NS  Single     4   Not  22    sales
## 4 10.5   12    W   F       NH    NS Married    29   Not  47 clerical
## 5 15.0   12    W   M       NH    NS Married    40 Union  58    const
## 6  9.0   16    W   F       NH    NS Married    27   Not  49 clerical

Frame = a rectangular space for drawing glyphs.

ggplot()

Aesthetics = properties of the frame or glyphs that relate to variables in the data table, for example the color, shape, or position of points.

Scales = the mappings between data table variables and aesthetics.

CPS85 %>% ggplot(aes(x=age,y=wage))
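
ggplot2 picks default scales, so the call above never names one. As a hedged sketch of the difference between an aesthetic and a scale, the example below sets two scales explicitly. The colour values and the choice of a log scale for wage are assumptions made for illustration, and geom_point() (introduced in the next definition) is added only so there is something to look at.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

# aes() says WHICH variable goes with which aesthetic;
# the scale_*() functions say HOW the values are translated.
p_scales <- ggplot(CPS85, aes(x = age, y = wage, colour = sex)) +
  geom_point() +
  scale_y_log10() +                                            # wage on a log scale
  scale_colour_manual(values = c(F = "purple", M = "orange"))  # illustrative colours
p_scales
```

Removing the two scale_ lines leaves the aesthetics unchanged; only the translation from data values to positions and colours goes back to the defaults.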

Glyphs = geometrical objects inside the frame.

CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point()

Graphical attributes = properties of glyphs that don’t relate to variables in the data table, for example transparency (alpha) or a fixed color.

CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point(alpha=.2, colour="red")
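
The same property can be an aesthetic in one plot and a graphical attribute in another; what matters is whether it is set inside aes(). A minimal sketch of the contrast, with illustrative object names p_mapped and p_fixed:

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

# Colour as an AESTHETIC: set inside aes() and tied to the variable sex,
# so ggplot2 chooses the colours and draws a legend.
p_mapped <- ggplot(CPS85, aes(x = age, y = wage)) +
  geom_point(aes(colour = sex), alpha = 0.2)

# Colour as a GRAPHICAL ATTRIBUTE: set outside aes() to one fixed value
# for every point, so there is nothing for a legend to explain.
p_fixed <- ggplot(CPS85, aes(x = age, y = wage)) +
  geom_point(colour = "red", alpha = 0.2)

p_mapped
p_fixed
```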

Facets = multiple side-by-side graphs used to display the levels of a categorical variable.

CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point() + facet_grid(married ~ .)
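
In facet_grid(), the side of the ~ on which the variable appears decides the layout. A short sketch comparing the two orientations (the names p_base, p_rows and p_cols are illustrative):

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

p_base <- ggplot(CPS85, aes(x = age, y = wage)) + geom_point()

# Variable on the left of ~ : one ROW per level, panels stacked vertically
p_rows <- p_base + facet_grid(married ~ .)

# Variable on the right of ~ : one COLUMN per level, panels side by side
p_cols <- p_base + facet_grid(. ~ married)

p_rows
p_cols
```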

Guides = indicators that show the viewer what the scale (mapping) is.

CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point(aes(shape=sex)) + facet_grid(married ~ .)

Examples of guides are:

  • Axis ticks and numbers
  • Legends
  • Labels on faceted graphics
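
ggplot2 draws these guides automatically, but their text and placement can be adjusted. The sketch below changes the axis titles, the legend title, and the legend position for the plot above; the label wording and the object name p_guides are assumptions made for illustration.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

p_guides <- ggplot(CPS85, aes(x = age, y = wage)) +
  geom_point(aes(shape = sex)) +
  facet_grid(married ~ .) +
  labs(x = "Age", y = "Wage", shape = "Sex") +  # axis titles and legend title
  theme(legend.position = "bottom")             # move the legend below the panels
p_guides
```

The facet labels (Married, Single) come from the levels of married and need no extra code.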

Layers = more than one glyph graphed together in the same frame.

CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point(colour = "pink", size = 4) + geom_point(colour = "black", size = 1.5)
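
Layers do not have to use the same kind of glyph. As a sketch, the example below puts a box-and-whisker layer and a point layer in one frame, using the sector and wage variables of CPS85. The jitter width, the transparency, and the name p_layers are illustrative choices.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

p_layers <- ggplot(CPS85, aes(x = sector, y = wage)) +
  # Layer 1: a box-and-whisker summary per sector; the boxplot's own outlier
  # dots are hidden because layer 2 already draws every observation
  geom_boxplot(outlier.shape = NA) +
  # Layer 2: the individual wages, spread sideways so they overlap less
  geom_jitter(width = 0.2, height = 0, alpha = 0.3)

length(p_layers$layers)   # number of layers in the plot
p_layers
```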

Example:

Here is part of the data table mosaic::NHANES and a graphical representation.

##   sbp dbp    sex  smoker
## 1 129  75   male   never
## 2 105  62 female   never
## 3 122  72   male   never
## 4 128  83 female  former
## 5 123  90   male  former
## 6 122  77   male current

Identify each of the components:

Frame: rectangular region

Glyph: points

Facets: sex

Aesthetics: The frame’s aesthetics are x and y. The points’ aesthetic is color (mapped to smoker).

Scales: x=sbp, y=dbp, color=smoker

Graphical attributes: size, alpha, shape

Guides: tick marks on axes, labels on faceted graphs, legend

Layers: none

Here is the ggplot2 command:

p <- ggplot(df, aes(x = sbp, y = dbp)) + 
  xlab("Systolic BP") + ylab("Diastolic BP")
p + geom_point(size=5, aes(color=smoker), alpha=.8, shape=17) + facet_grid(. ~ sex)
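
The command above uses a data frame df that these notes never define. To try it out, here is a minimal sketch that assumes df holds just the six rows printed above, typed in by hand. The graphic in the notes was presumably drawn from the full data table, so this version will show far fewer points.

```r
library(ggplot2)

# ASSUMPTION: df is only the six rows shown in the printout above.
df <- data.frame(
  sbp    = c(129, 105, 122, 128, 123, 122),
  dbp    = c(75, 62, 72, 83, 90, 77),
  sex    = c("male", "female", "male", "female", "male", "male"),
  smoker = c("never", "never", "never", "former", "former", "current")
)

# Frame, with readable axis labels (guides)
p <- ggplot(df, aes(x = sbp, y = dbp)) +
  xlab("Systolic BP") + ylab("Diastolic BP")

# Glyph layer: color is an aesthetic (mapped to smoker); size, alpha and
# shape are fixed graphical attributes; one facet column per sex
p + geom_point(size = 5, aes(color = smoker), alpha = .8, shape = 17) +
  facet_grid(. ~ sex)
```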

In class exercise: (see problem 6.3 in DC)

Consider the graphic:

Identify each of the components:

Frame: a rectangular region

Glyphs: confidence regions and stars (two different glyphs)

Facets: none

Aesthetics: x, y, color of the center point of the confidence interval, y position of the bottom of the confidence interval, y position of the top of the confidence interval, number of stars

Scales: x=protein, y=center, color of the center point of the confidence interval=polarity, y position of the bottom of the confidence interval=low, y position of the top of the confidence interval=high, number of stars=signif

Graphical attributes: hard to tell but probably alpha and size

Guides: tick marks on y, protein names next to CI

Layers: different glyphs

i-clicker

Look at the graph from the New York Times on predictions for 36 Senate seats from different polling organizations.

Answer: c. Notice that text is a guide, not an aesthetic. (Perhaps you are thinking of text as a glyph, but it really is a guide. If text were a glyph, then font would be an example of a graphical attribute of the glyph.)

To do

Finish the in-class exercises and hw1 for Tuesday. Next time: chapter 7.