Source file ⇒ ~/Desktop/lec6.Rmd
For homework from now on, please submit an HTML R Markdown file with the .Rmd file embedded in it.

For example, to make the .Rmd file:

In RStudio, choose File > New File > R Markdown, then select From Template and pick the simple HTML template.

To embed the .Rmd source in your HTML document, follow my instructions in Wednesday’s lecture, in the section headed “Embedded Files within HTML files”.

To submit your HTML document, save it on your computer, then upload it to bCourses as you would any other document. You may upload multiple documents for an assignment. Don’t use RPubs: other students can go to the RPubs website and see your solution.

This week

We will introduce graphing in R with ggplot in four steps:

  1. We will look at some different kinds of graphs that we can make (chapter 5).
  2. We will discuss the component parts of our graphs: frames, scales, guides, facets, and layers (chapter 6).
  3. We will make our data tables ready for graphing (chapter 7).
  4. We will introduce ggplot and the grammar of graphics (chapter 8).

Today: chapters 5 and 6

Chapter 5 Introduction to Graphics

Scatter plots

The main purpose of a scatter plot is to show the relationship between two variables across several or many cases. Most often, there is a Cartesian coordinate system in which the x-axis represents the value of one variable and the y-axis the value of a second variable.

Example: Consider the NHANES data giving medical and morphometric measurements of individual people. Here is a scatter plot showing the relationship between two variables: height and age.

Each dot is one case. The position of that dot signifies the value of the two variables for that case.
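A scatter plot like the one described here can be reproduced with ggplot2. This is a minimal sketch, assuming the NHANES table comes from the NHANES package (the DataComputing setup may provide its own copy); in that package Age is in years and Height in centimeters.

```r
library(ggplot2)
library(NHANES)  # assumption: NHANES data table from the NHANES package

# One point (glyph) per person: Age on the x-axis, Height on the y-axis
height_age_plot <- ggplot(data = NHANES, aes(x = Age, y = Height)) +
  geom_point(alpha = 0.2)  # translucent points, since many cases overlap

height_age_plot  # rows with a missing Height are dropped with a warning
```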

Your book uses the word glyph to describe the basic graphical unit that represents one case. A glyph means a mark or a symbol. A glyph in this example is a point.

Constructing a scatter plot interactively with scatterGraphHelper()

Steps:

  0. Make sure the DataComputing package is loaded.
  1. In the console, type scatterGraphHelper(NHANES). (It won’t work in your R Markdown file, because you need to interact with the function and compiling the document is not interactive.)
  2. Map variables in NHANES to attributes (aesthetics) of our glyph, such as color or size.

Example:
map Age to x
map Height to y
map Sex to color
map Sex to facet
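The mappings in the example above correspond to a ggplot command along the lines of the one below. This is a hand-written sketch, not the literal output of scatterGraphHelper(). It assumes the NHANES package's version of the table, where the sex variable is named Gender, so Gender stands in for Sex; use whatever name your copy of the table has.

```r
library(ggplot2)
library(NHANES)  # assumption: NHANES table from the NHANES package

# Age -> x, Height -> y, Gender (sex) -> color and facet
sex_plot <- ggplot(data = NHANES, aes(x = Age, y = Height, color = Gender)) +
  geom_point(alpha = 0.2) +
  facet_wrap(~ Gender)  # one panel per sex

sex_plot
```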

Warning: these interactive functions are very buggy, so don’t experiment with them too much or you may crash RStudio. In fact, don’t use any of the interactive tools in this chapter other than scatterGraphHelper().

Vocabulary:

glyph = graphical unit (here, a point)
aesthetic = a visual property of the glyph (position, shape, color)
scale = the relationship between a variable and the aesthetic to which it is mapped, e.g.:
Age -> x
Height -> y
Sex -> color
frame = the position scale describing how data are mapped to x and y
guide = an indication of the scale for human viewers
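Each vocabulary term has a counterpart in ggplot2. In the sketch below, geom_point() sets the glyph, aes() lists the aesthetic mappings, and the scale_*() functions control each scale; their name argument sets the title of the guide (axis label or legend) that displays the scale. The color values and titles are illustrative choices, and the sketch assumes the NHANES package's table, where Gender is the sex variable.

```r
library(ggplot2)
library(NHANES)  # assumption: NHANES table from the NHANES package

labeled_plot <- ggplot(NHANES, aes(x = Age, y = Height, color = Gender)) +  # aesthetics
  geom_point(alpha = 0.2) +                      # glyph
  scale_x_continuous(name = "Age (years)") +     # frame: x position scale, guide = x-axis
  scale_y_continuous(name = "Height (cm)") +     # frame: y position scale, guide = y-axis
  scale_color_manual(name = "Sex",               # color scale, guide = legend
                     values = c("darkorange", "steelblue"))

labeled_plot
```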

Your turn:

# load the data table at http://tiny.cc/dcf/table-6-2.csv
my_table <- read.csv("http://tiny.cc/dcf/table-6-2.csv")
head(my_table)
##               country      gdp educ roadways net_users
## 1             Albania  9383.46  3.3     0.63      >35%
## 2             Algeria  7335.03  4.3     0.05       >5%
## 3              Angola  6904.82  3.5     0.04       >0%
## 4            Anguilla 10903.89  2.8     1.92      >15%
## 5 Antigua and Barbuda 17635.14  2.4     2.64      >60%
## 6           Argentina 17920.07  6.3     0.08      >15%
# scatterGraphHelper(my_table)   # type this in the console, not in a chunk
# you decide what attributes you want your glyph to have
# "show expression" prints the ggplot command in the console; paste it into an R Markdown chunk to include the graph in your report

Make an interactive scatter plot with scatterGraphHelper(), then answer:

  1. What variables constitute the frame?
  2. What glyph is used?
  3. What are the aesthetics for those glyphs?
  4. Which variable is mapped to each aesthetic?
  5. Which variable, if any, is used for faceting?
  6. Which scales are displayed with a guide?

Other Graphs

Displays of Distribution

A histogram shows how many cases fall into given ranges of the variable. For instance, here’s a histogram of heights from NHANES:
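A histogram like this can be sketched with geom_histogram(). The 5 cm bin width is an arbitrary choice for illustration, and the sketch assumes the NHANES package's version of the table.

```r
library(ggplot2)
library(NHANES)  # assumption: NHANES table from the NHANES package

# Each bar counts the people whose Height falls in a 5 cm range
height_hist <- ggplot(NHANES, aes(x = Height)) +
  geom_histogram(binwidth = 5)

height_hist  # missing heights are removed with a warning
```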

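Bar chart

# Minneapolis2013 comes from the DataComputing package; %>%, rename(), group_by(), etc. come from dplyr
library(dplyr)
library(DataComputing)
# not glyph ready: one row per ballot, but we want one bar per candidate
head(Minneapolis2013)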
##   Precinct           First      Second      Third Ward
## 1     P-10    BETSY HODGES   undervote  undervote  W-7
## 2     P-06        BOB FINE MARK ANDREW  undervote W-10
## 3     P-09 KURTIS W. HANNA    BOB FINE MIKE GOULD W-10
## 4     P-05    BETSY HODGES DON SAMUELS  undervote W-13
## 5     P-01     DON SAMUELS   undervote  undervote  W-5
## 6     P-04       undervote   undervote  undervote  W-6
FirstPlaceTally <- Minneapolis2013 %>%
  rename(candidate = First) %>%   # first-choice vote on each ballot
  group_by(candidate) %>%
  summarise(total = n()) %>%      # count ballots per candidate
  arrange(desc(total))            # largest tally first

# glyph ready: one row per candidate, i.e., one row per bar

FirstPlaceTally
## Source: local data frame [38 x 2]
## 
##             candidate total
##                 (chr) (int)
## 1        BETSY HODGES 28935
## 2         MARK ANDREW 19584
## 3         DON SAMUELS  8335
## 4          CAM WINTON  7511
## 5  JACKIE CHERRYHOMES  3524
## 6            BOB FINE  2094
## 7           DAN COHEN  1798
## 8  STEPHANIE WOODRUFF  1010
## 9     MARK V ANDERSON   975
## 10          undervote   834
## ..                ...   ...
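The bar chart itself can be drawn with geom_col(), which uses the total column as the bar height. To keep this sketch self-contained, it rebuilds the top five rows of FirstPlaceTally from the printed output above; with DataComputing loaded you would pass FirstPlaceTally itself. reorder() sorts the bars from most to fewest first-place votes.

```r
library(ggplot2)

# Top five rows copied from the FirstPlaceTally output above
top_five <- data.frame(
  candidate = c("BETSY HODGES", "MARK ANDREW", "DON SAMUELS",
                "CAM WINTON", "JACKIE CHERRYHOMES"),
  total = c(28935, 19584, 8335, 7511, 3524)
)

# One bar (glyph) per candidate; bar height = number of first-place votes
vote_bars <- ggplot(top_five, aes(x = reorder(candidate, -total), y = total)) +
  geom_col() +
  labs(x = "candidate", y = "first-place votes")

vote_bars
```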

There are many more kinds of graphs (frequency polygons, maps, networks).
Geom = glyph: ggplot calls a glyph a geom.

See: http://docs.ggplot2.org/current/
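Because the geom is the glyph, changing the kind of graph often means changing only the geom_*() function. As a sketch, assuming the NHANES package's table, the same Height data and mapping can be drawn either as histogram bars or as a frequency polygon, one of the other graph types listed above.

```r
library(ggplot2)
library(NHANES)  # assumption: NHANES table from the NHANES package

base <- ggplot(NHANES, aes(x = Height))          # same data and aesthetic mapping

as_bars <- base + geom_histogram(binwidth = 5)   # glyph: bars
as_line <- base + geom_freqpoly(binwidth = 5)    # glyph: a line through the bin counts

as_line
```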

In-class exercises

Exercise 6-3

Exercise 6-2 (if time)