Compiled on Fri Jan 27 13:01:56 2017.
Source file ⇒ 2017-lec4.Rmd
devtools::install_github("DataComputing/DataComputing")
Go to the File/New File/R Markdown menu. Select “From Template” and, in the resulting dialog box, choose “Data Computing simple (HTML)” and click OK. The template will open in the editor pane.
Chapter 5 gives a brief introduction to many of the different kinds of graphs you can make in R. We will concentrate on three: a scatter plot, a bar plot, and a box and whisker plot. We will use three helper functions: gf_point(), gf_counts(), and gf_boxplot(). They will help us learn how to use ggplot2 in the next chapter.
We will be using two new packages, statisticalModeling and mosaicData. Here is how to install them:
install.packages("statisticalModeling")
install.packages("mosaicData")
We need a few preliminary concepts.
The function data() is used to load a particular dataset from a package. For example, the package mosaicData contains a dataset called CPS85, which is data from the 1985 Current Population Survey.
## wage educ race sex hispanic south married exper union age sector
## 1 9.0 10 W M NH NS Married 27 Not 43 const
## 2 5.5 12 W M NH NS Married 20 Not 38 sales
## 3 3.8 12 W F NH NS Single 4 Not 22 sales
## 4 10.5 12 W F NH NS Married 29 Not 47 clerical
## 5 15.0 12 W M NH NS Married 40 Union 58 const
## 6 9.0 16 W F NH NS Married 27 Not 49 clerical
mosaicData contains many datasets, and if you load it (i.e. library(mosaicData)) you make all of those datasets available in your session. It might be better to just load the particular dataset you need using the data() function.
data("CPS85",package="mosaicData")
This loads just the dataset CPS85.
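As a minimal sketch of what "loading just one dataset" means, the snippet below loads a single table into a scratch environment and lists what arrived. It is an illustration under stated assumptions: the built-in datasets package and its mtcars table stand in for mosaicData and CPS85 so that the code runs without installing anything, and the name scratch is hypothetical.

```r
# Assumption: 'datasets' / 'mtcars' stand in for 'mosaicData' / 'CPS85'.
# 'scratch' is a hypothetical name for an empty environment to load into.
scratch <- new.env()

# Load one dataset by name from one package, into the scratch environment
data("mtcars", package = "datasets", envir = scratch)

# Only the requested dataset was loaded
ls(scratch)

# It is an ordinary data frame, so head() works as usual
head(scratch$mtcars)
```

With mosaicData installed, the same pattern should apply with "CPS85" and package = "mosaicData".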
A formula in R is an expression built around the ~ sign. It enables you to describe a relationship among variables: y ~ x reads as "y as a function of x". For instance, wage ~ age makes age the independent variable and wage the dependent variable.
For example:
library(statisticalModeling)
data(CPS85, package = "mosaicData")
gf_point(educ ~ age, data = CPS85)
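A formula is an ordinary R object that can be stored and inspected before any data is involved. The sketch below uses only base R; the small table toy is a hypothetical stand-in built from three of the CPS85 rows printed above.

```r
# Store the formula from the text in a variable
f <- wage ~ age

class(f)      # the object is of class "formula"
all.vars(f)   # variable names, dependent variable first
f[[2]]        # left-hand side: the dependent variable
f[[3]]        # right-hand side: the independent variable

# The formula only names variables. They are looked up once a
# function receives the formula together with a data table.
# 'toy' is a hypothetical table using three of the CPS85 rows shown above.
toy <- data.frame(age = c(22, 38, 43), wage = c(3.8, 5.5, 9.0))
model.frame(f, data = toy)
```

This is why gf_point() needs both pieces: the formula says which variables play which role, and data = says where to find them.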
We will be describing the components of graphs made with the ggplot2 package.
In its original sense, in archeology, a glyph is a carved symbol.
A data glyph is also a geometrical object:
A data glyph visually describes your data.
Some are very simple (e.g. dots).
Some combine your data and summarize it (e.g. a histogram).
Some are complicated representations of your data (e.g. a confidence interval for an expected conditional mean).
We need vocabulary to describe the components of a graph made with ggplot2.
We will illustrate using the data table mosaicData::CPS85:
data(CPS85, package="mosaicData")
head(CPS85)
## wage educ race sex hispanic south married exper union age sector
## 1 9.0 10 W M NH NS Married 27 Not 43 const
## 2 5.5 12 W M NH NS Married 20 Not 38 sales
## 3 3.8 12 W F NH NS Single 4 Not 22 sales
## 4 10.5 12 W F NH NS Married 29 Not 47 clerical
## 5 15.0 12 W M NH NS Married 40 Union 58 const
## 6 9.0 16 W F NH NS Married 27 Not 49 clerical
Frame= A rectangular space for drawing glyphs.
ggplot()
Aesthetics= properties of the frame or glyphs that relate to variables in the data table, for example color, shape, or position of points.
Scales= control the mapping between data table variables and aesthetics.
CPS85 %>% ggplot(aes(x=age,y=wage))
Glyph= geometrical objects inside the frame.
CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point()
Graphical Attributes= properties of glyphs that don’t relate to variables in the data table. For example, transparency (alpha) or color.
CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point(alpha=.2, colour="red")
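The difference between an aesthetic and a graphical attribute shows up in where the argument is written: inside aes() it is mapped from a variable, outside aes() it is a fixed value. The sketch below is one way to check this, assuming ggplot2 is installed. The table toy is a hypothetical stand-in made from the first four CPS85 rows shown above, and layer_data() is used only to peek at the values ggplot2 computes for each point.

```r
library(ggplot2)

# Hypothetical toy table: the first four CPS85 rows printed above
toy <- data.frame(
  age  = c(43, 38, 22, 47),
  wage = c(9.0, 5.5, 3.8, 10.5),
  sex  = c("M", "M", "F", "F")
)

# Aesthetic: colour is inside aes(), so it depends on the variable sex
p_mapped <- ggplot(toy, aes(x = age, y = wage)) +
  geom_point(aes(colour = sex))

# Graphical attribute: colour and alpha are outside aes(),
# so every point gets the same fixed value
p_set <- ggplot(toy, aes(x = age, y = wage)) +
  geom_point(colour = "red", alpha = 0.2)

# One colour per level of sex when mapped
unique(layer_data(p_mapped)$colour)

# A single constant colour when set
unique(layer_data(p_set)$colour)
```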
Facets= Multiple side by side graphs used to display levels of a categorical variable.
CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point() + facet_grid(married ~ .)
Guides= indicate to the viewer what the scale (mapping) is.
CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point(aes(shape=sex)) + facet_grid(married ~ .)
Examples of guides are:
*Labels on faceted graphics
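To make the link between a scale and its guide concrete, here is a small sketch, assuming ggplot2 is installed. The scale maps the levels of sex to two colours, and the legend is the guide that shows this mapping to the viewer. The table toy, the legend title "Sex", and the two colour names are hypothetical choices made for illustration.

```r
library(ggplot2)

# Hypothetical toy table: the first four CPS85 rows printed above
toy <- data.frame(
  age  = c(43, 38, 22, 47),
  wage = c(9.0, 5.5, 3.8, 10.5),
  sex  = c("M", "M", "F", "F")
)

p_scale <- ggplot(toy, aes(x = age, y = wage, colour = sex)) +
  geom_point() +
  # The scale: which colour each level of sex is drawn in.
  # 'name' sets the title of the legend, which is the guide for this scale.
  scale_colour_manual(name = "Sex", values = c(F = "orange", M = "purple"))

# Drawing the plot shows the legend next to the frame
p_scale

# The colours actually used are exactly those chosen in the scale
unique(layer_data(p_scale)$colour)
```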
Layers= data from more than one glyph are graphed together.
CPS85 %>% ggplot(aes(x=age,y=wage)) + geom_point(colour = "pink", size = 4) + geom_point(colour = "black", size = 1.5)
Here is part of the data table mosaic::NHANES and a graphical representation.
## sbp dbp sex smoker
## 1 129 75 male never
## 2 105 62 female never
## 3 122 72 male never
## 4 128 83 female former
## 5 123 90 male former
## 6 122 77 male current
Identify each of the components:
Frame: rectangular region
Glyph: points
Facets: sex
Aesthetics: The frame's aesthetics are x and y. The points' aesthetic is color (mapped to smoker).
Scales: x=sbp, y=dbp, color=smoker
Graphical attributes: size, alpha, shape
Guides: tick marks on axes, labels on faceted graphs, legend
Layers: none
Here is the ggplot2 command:
p <- ggplot(df, aes(x = sbp, y = dbp)) +
  xlab("Systolic BP") + ylab("Diastolic BP")
p + geom_point(size=5, aes(color=smoker), alpha=.8, shape=17) + facet_grid(. ~ sex)
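The command above refers to a data table df that is not defined in these notes. A minimal runnable sketch, assuming ggplot2 is installed and that df holds only the six rows printed above (the full table would have many more), could look like this. The name plot_nhanes is hypothetical.

```r
library(ggplot2)

# Assumption: df contains just the six rows shown above
df <- data.frame(
  sbp    = c(129, 105, 122, 128, 123, 122),
  dbp    = c(75, 62, 72, 83, 90, 77),
  sex    = c("male", "female", "male", "female", "male", "male"),
  smoker = c("never", "never", "never", "former", "former", "current")
)

# Frame: x and y aesthetics, plus axis labels (guides)
p <- ggplot(df, aes(x = sbp, y = dbp)) +
  xlab("Systolic BP") + ylab("Diastolic BP")

# Glyph: points. color is an aesthetic (mapped to smoker);
# size, alpha and shape are graphical attributes (fixed values).
# Facets: one panel per level of sex.
plot_nhanes <- p +
  geom_point(size = 5, aes(color = smoker), alpha = .8, shape = 17) +
  facet_grid(. ~ sex)

plot_nhanes
```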
Consider the graphic:
Identify each of the components:
Frame: a rectangular region
Glyphs: confidence regions and stars (two different glyphs)
Facets: none
Aesthetics: x, y, color of center point of confidence interval, y position of bottom of confidence interval, y position of top of confidence interval, number of stars
Scales: x=protein, y=center, color of center point of confidence interval=polarity, y position of bottom of confidence interval=low, y position of top of confidence interval=high, number of stars=signif
Graphical attributes: hard to tell, but probably alpha and size
Guides: tick marks on y, protein names next to CI
Layers: different glyphs
Look at the graph from the New York Times on predictions of 36 Senate seats from different polling organizations.
Answ: c. Notice that text is a guide, not an aesthetic. (Perhaps you are thinking of text as a glyph, but it really is a guide. If text were a glyph, then font would be an example of a graphical attribute of the glyph.)
To do: Finish the in-class exercises and hw1 for Tuesday. Next time: chapter 7.