Compiled on Fri Jan 27 13:01:56 2017.
Source file ⇒ 2017-lec4.Rmd
devtools::install_github("DataComputing/DataComputing")
Go to the File/New File/R Markdown menu. Select “From Template” and, in the resulting dialog box, choose “Data Computing simple (HTML)” and click OK. The template will open in the editor pane.
Chapter 5 gives a brief introduction to many of the different kinds of graphs you can make in R. We will concentrate on three: a scatter plot, a bar plot, and a box and whisker plot. We will use three helper functions: gf_point(), gf_counts(), and gf_boxplot(). They will help us learn how to use ggplot2 in the next chapter.
We will be using two new packages: statisticalModeling and mosaicData. Here is how to install them:
install.packages("statisticalModeling")
install.packages("mosaicData")
We need a few preliminary concepts.
The function data() loads a particular dataset from a package. For example, the package mosaicData contains a dataset called CPS85, which holds data from the 1985 Current Population Survey.
## wage educ race sex hispanic south married exper union age sector
## 1 9.0 10 W M NH NS Married 27 Not 43 const
## 2 5.5 12 W M NH NS Married 20 Not 38 sales
## 3 3.8 12 W F NH NS Single 4 Not 22 sales
## 4 10.5 12 W F NH NS Married 29 Not 47 clerical
## 5 15.0 12 W M NH NS Married 40 Union 58 const
## 6 9.0 16 W F NH NS Married 27 Not 49 clerical
mosaicData contains many datasets, and if you load it (i.e. library(mosaicData)) then all of those datasets become available in your session. It is often cleaner to load just the particular dataset you need using the data() function.
data("CPS85",package="mosaicData")
This loads just the dataset CPS85.
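One way to confirm exactly what data() created is to load into a scratch environment and list its contents. This is only a sketch: it assumes mosaicData is installed, and the name scratch is chosen here for illustration.

```r
# Sketch: load one dataset into a scratch environment and inspect it.
# Assumes the mosaicData package is already installed.
scratch <- new.env()
data("CPS85", package = "mosaicData", envir = scratch)

ls(scratch)            # names of the objects that data() created
names(scratch$CPS85)   # the variables in the data table
```

Loading into a separate environment is optional; without the envir argument, data() places CPS85 in your workspace, as in the command above.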
A formula in R is an expression built around the ~ sign. It lets you describe a relationship among variables: y ~ x reads as “y as a function of x”. For instance, wage ~ age makes age the independent variable and wage the dependent variable.
For example:
library(statisticalModeling)
data(CPS85, package = "mosaicData")
gf_point(educ ~ age, data = CPS85)
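A formula is an ordinary R object, so you can store it and take it apart. The short base-R sketch below (no packages needed; the name f is just for illustration) shows which side of the ~ is which.

```r
# A formula is stored unevaluated, so the variables
# do not need to exist when the formula is created.
f <- wage ~ age

class(f)      # "formula"
all.vars(f)   # variable names in order of appearance: "wage" "age"
f[[2]]        # left-hand side (the dependent variable): wage
f[[3]]        # right-hand side (the independent variable): age
```

Functions such as gf_point() look the variable names up in the data table you pass with data =.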
We will be describing the components of graphs made with the ggplot2 package.
In its original sense, in archeology, a glyph is a carved symbol.
A data glyph is also a geometrical object:
A data glyph visually describes your data.
Some are very simple (e.g. dots).
Some combine your data and summarize it (e.g. a histogram).
Some are complicated representations of your data (e.g. a confidence interval for the expected conditional mean).
We need vocabulary to describe the components of a graph made with ggplot2.
We will illustrate using the data table mosaicData::CPS85.
data(CPS85, package="mosaicData")
head(CPS85)
## wage educ race sex hispanic south married exper union age sector
## 1 9.0 10 W M NH NS Married 27 Not 43 const
## 2 5.5 12 W M NH NS Married 20 Not 38 sales
## 3 3.8 12 W F NH NS Single 4 Not 22 sales
## 4 10.5 12 W F NH NS Married 29 Not 47 clerical
## 5 15.0 12 W M NH NS Married 40 Union 58 const
## 6 9.0 16 W F NH NS Married 27 Not 49 clerical
Frame= A rectangular space for drawing glyphs.
ggplot()
Aesthetics= properties of the frame or glyphs that relate to variables in the data table, for example the color, shape, or position of points.
Scales= the mappings between data table variables and aesthetics.
CPS85 %>% ggplot(aes(x=age,y=wage))
Glyph= geometrical objects inside the frame.
CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point()
Graphical Attributes= properties of glyphs that don’t relate to variables in the data table. For example, transparency (alpha) or color.
CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point(alpha=.2, colour="red")
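The same property, color, can be either an aesthetic or a graphical attribute depending on where it is written. The sketch below contrasts the two. It assumes ggplot2 and mosaicData are installed, and the names p_mapped and p_fixed are made up for illustration.

```r
library(ggplot2)
data("CPS85", package = "mosaicData")

# Aesthetic: colour is inside aes(), so it is mapped from the variable sex,
# and ggplot2 adds a legend for it.
p_mapped <- ggplot(CPS85, aes(x = age, y = wage)) +
  geom_point(aes(colour = sex))

# Graphical attribute: colour is outside aes(), so every point is red
# and there is nothing for a legend to explain.
p_fixed <- ggplot(CPS85, aes(x = age, y = wage)) +
  geom_point(colour = "red", alpha = 0.2)

p_mapped
p_fixed
```

A quick check when reading someone else's plot code: anything inside aes() refers to a column of the data table, anything outside is a constant.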
Facets= Multiple side by side graphs used to display levels of a categorical variable.
CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point() + facet_grid(married ~ .)
Guides= indicators that show the viewer what the scale (mapping) is.
CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point(aes(shape=sex)) + facet_grid(married ~ .)
Examples of guides are:
Labels on faceted graphics
Layers= data from more than one glyph are graphed together.
CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point(colour = "pink", size = 4) + geom_point(colour = "black", size = 1.5)
Here is part of the data table mosaic::NHANES and a graphical representation.
## sbp dbp sex smoker
## 1 129 75 male never
## 2 105 62 female never
## 3 122 72 male never
## 4 128 83 female former
## 5 123 90 male former
## 6 122 77 male current
Identify each of the components:
Frame: rectangular region
Glyph: points
Facets: sex
Aesthetics: The frame’s aesthetics are x and y. The points’ aesthetic is color (mapped to smoker).
Scales: x=sbp, y=dbp, color=smoker
Graphical attributes: size, alpha, shape
Guides: tick marks on axes, labels on faceted graphs, legend
Layers: none
Here is the ggplot2 command:
p <- ggplot(df, aes(x = sbp, y = dbp)) +
  xlab("Systolic BP") + ylab("Diastolic BP")
p + geom_point(size=5, aes(color=smoker), alpha=.8, shape=17) + facet_grid(. ~ sex)
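The command above assumes a data frame named df whose construction is not shown. To try it out, here is a minimal sketch that builds df from only the six rows printed earlier; the real data table has more rows, so the resulting plot will be much sparser than the original graphic.

```r
library(ggplot2)

# Only the six rows shown in the printed excerpt, typed in by hand.
df <- data.frame(
  sbp    = c(129, 105, 122, 128, 123, 122),
  dbp    = c(75, 62, 72, 83, 90, 77),
  sex    = c("male", "female", "male", "female", "male", "male"),
  smoker = c("never", "never", "never", "former", "former", "current")
)

# Frame and axis labels (guides), then the glyph layer and the facets.
p <- ggplot(df, aes(x = sbp, y = dbp)) +
  xlab("Systolic BP") + ylab("Diastolic BP")
p + geom_point(size = 5, aes(color = smoker), alpha = 0.8, shape = 17) +
  facet_grid(. ~ sex)
```

Here size, alpha, and shape sit outside aes(), which is why they are listed as graphical attributes, while color sits inside aes() and is therefore an aesthetic.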
Consider the graphic:
Identify each of the components:
Frame: a rectangular region
Glyphs: confidence regions and stars (two different glyphs)
Facets: none
Aesthetics: x, y, color of center point of confidence interval, y position of bottom of confidence interval, y position of top of confidence interval, number of stars
Scales: x=protein, y=center, color of center point of confidence interval=polarity, y position of bottom of confidence interval=low, y position of top of confidence interval=high, number of stars=signif
Graphical attributes: hard to tell but probably alpha and size
Guides: tick marks on y, protein names next to CI
Layers: different glyphs
Look at the graph from the New York Times on predictions for 36 Senate seats by different polling organizations.
Answer: c. Notice that text is a guide, not an aesthetic. (Perhaps you are thinking of text as a glyph, but it really is a guide. If text were a glyph, then font would be an example of a graphical attribute of the glyph.)
#### To do
Finish the in-class exercises and hw1 for Tuesday. Next time: chapter 7.