Source file ⇒ Lec7_2.Rmd

Today:

  1. Review ideas of Chapter 6 (Frames, Glyphs, and other components of graphs)
  2. Chapter 7 (Wrangling and Data Verbs)

Chapter 6

Vocabulary:

See Figure 6.7 on page 57.

Frame = The space for drawing glyphs. Here, a rectangular region set by roadways and gdp.
Glyph = A symbol or marking in a frame. There is one glyph for each case in the data table. Here, points.
Aesthetics = The properties of a glyph. Here, x, y, and size.
Scales and Guides = Variables of the data table are mapped to aesthetics of the glyph (scales), and axes and legends show the reader that mapping (guides). Here, roadways is mapped to x, gdp to y, and educ to size.
Facets = Multiple side-by-side graphs used to display levels of a categorical variable. Here, the facets are different ranges of Internet connectivity.
Layers = Data from more than one data table are graphed together. Here, the graph is not layered.
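A minimal ggplot2 sketch can tie these terms to code. It assumes the ggplot2 package is installed. Because the roadways/gdp/educ data behind Figure 6.7 are not reproduced here, it uses R's built-in mtcars data as a stand-in. The choice of variables (disp, mpg, hp, cyl) is purely illustrative.

```r
library(ggplot2)

# mtcars is a stand-in for the figure's data (illustrative choice only)
# Frame: the x and y positions, here disp and mpg, define the drawing region.
# Glyph: geom_point() draws one point per case (one per car).
# Aesthetics: x, y, and size are each mapped to a variable; the scales turn
# variable values into positions and point sizes, and the axes and the
# size legend are the guides.
p <- ggplot(mtcars, aes(x = disp, y = mpg, size = hp)) +
  geom_point() +
  # Facets: one panel for each level of cyl (4, 6, or 8 cylinders)
  facet_wrap(~ cyl)

# Layers: a second data table (mean mpg for each cylinder count) drawn in
# the same frame as a dashed horizontal line in each facet
cyl_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
p <- p + geom_hline(aes(yintercept = mpg), data = cyl_means,
                    linetype = "dashed")

print(p)
```

Each component can be changed on its own. For example, dropping the geom_hline() layer or faceting by a different categorical variable leaves the rest of the graph unchanged.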

In-class exercise:

Look at the New York Times graph of predictions for the 36 Senate seats from different polling organizations.

  1. What variables define the frame in this graphic?
  2. What is the glyph and its graphical attributes (i.e. aesthetics)?
  3. What do you think the scales are?

Chapter 7 Wrangling and Data Verbs

Rarely is a data table glyph-ready; we usually need to wrangle the data first. For example, say we want to make a bar chart ranking the candidates by the number of ballots that list them as the first choice.

#not glyph ready
head(Minneapolis2013)
##   Precinct           First      Second      Third Ward
## 1     P-10    BETSY HODGES   undervote  undervote  W-7
## 2     P-06        BOB FINE MARK ANDREW  undervote W-10
## 3     P-09 KURTIS W. HANNA    BOB FINE MIKE GOULD W-10
## 4     P-05    BETSY HODGES DON SAMUELS  undervote W-13
## 5     P-01     DON SAMUELS   undervote  undervote  W-5
## 6     P-04       undervote   undervote  undervote  W-6
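FirstPlaceTally <- Minneapolis2013 %>%
  rename(candidate = First) %>%   # the first-choice column becomes "candidate"
  group_by(candidate) %>%         # one group per candidate
  summarise(total = n()) %>%      # count the ballots (rows) in each group
  arrange(desc(total))            # most first-choice votes first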

#glyph ready

FirstPlaceTally
## Source: local data frame [38 x 2]
## 
##             candidate total
##                 (chr) (int)
## 1        BETSY HODGES 28935
## 2         MARK ANDREW 19584
## 3         DON SAMUELS  8335
## 4          CAM WINTON  7511
## 5  JACKIE CHERRYHOMES  3524
## 6            BOB FINE  2094
## 7           DAN COHEN  1798
## 8  STEPHANIE WOODRUFF  1010
## 9     MARK V ANDERSON   975
## 10          undervote   834
## ..                ...   ...
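With FirstPlaceTally in hand, the bar chart described above takes only a few lines. This sketch assumes ggplot2 is installed. So that it runs without the Minneapolis2013 data, it rebuilds only the top five rows from the printed output. Using reorder() to rank the bars is one choice among several.

```r
library(ggplot2)

# Top five rows of FirstPlaceTally, copied from the printed output above
top5 <- data.frame(
  candidate = c("BETSY HODGES", "MARK ANDREW", "DON SAMUELS",
                "CAM WINTON", "JACKIE CHERRYHOMES"),
  total = c(28935, 19584, 8335, 7511, 3524)
)

# Glyph: one bar per case (candidate). reorder() sorts the bars by
# descending total, so the chart shows the ranking from left to right.
bars <- ggplot(top5, aes(x = reorder(candidate, -total), y = total)) +
  geom_col() +
  labs(x = "First-choice candidate", y = "Number of ballots")

print(bars)
```

With the full table, you would pass FirstPlaceTally in place of top5.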

Here is a nice presentation of Data Verbs we will be using: link

In-class exercise:

Each of these tasks can be performed using a single data verb. For each task, say which verb it is. The choices are summarise(), mutate(), arrange(), filter(), select(), and group_by().


  1. Find the average of one of the variables.

  2. Add a new column that is the ratio between two variables.

  3. Sort the cases in descending order of a variable.

  4. Create a new data table that includes only those cases that meet a criterion.

  5. From a data table with several variables, produce an output that has the same cases but only the first two variables.

  6. From a data table with a categorical variable and a quantitative variable, produce a count of the number of cases for each level of the categorical variable.

Solution:

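# The printout below shows only mpg, cyl, and disp, so these notes appear to use
# a three-column version of mtcars, e.g. mtcars <- mtcars %>% select(mpg, cyl, disp)
mtcars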
##                      mpg cyl  disp
## Mazda RX4           21.0   6 160.0
## Mazda RX4 Wag       21.0   6 160.0
## Datsun 710          22.8   4 108.0
## Hornet 4 Drive      21.4   6 258.0
## Hornet Sportabout   18.7   8 360.0
## Valiant             18.1   6 225.0
## Duster 360          14.3   8 360.0
## Merc 240D           24.4   4 146.7
## Merc 230            22.8   4 140.8
## Merc 280            19.2   6 167.6
## Merc 280C           17.8   6 167.6
## Merc 450SE          16.4   8 275.8
## Merc 450SL          17.3   8 275.8
## Merc 450SLC         15.2   8 275.8
## Cadillac Fleetwood  10.4   8 472.0
## Lincoln Continental 10.4   8 460.0
## Chrysler Imperial   14.7   8 440.0
## Fiat 128            32.4   4  78.7
## Honda Civic         30.4   4  75.7
## Toyota Corolla      33.9   4  71.1
## Toyota Corona       21.5   4 120.1
## Dodge Challenger    15.5   8 318.0
## AMC Javelin         15.2   8 304.0
## Camaro Z28          13.3   8 350.0
## Pontiac Firebird    19.2   8 400.0
## Fiat X1-9           27.3   4  79.0
## Porsche 914-2       26.0   4 120.3
## Lotus Europa        30.4   4  95.1
## Ford Pantera L      15.8   8 351.0
## Ferrari Dino        19.7   6 145.0
## Maserati Bora       15.0   8 301.0
## Volvo 142E          21.4   4 121.0
# Task 1, summarise(): the average of one variable (mean number of cylinders)
mtcars %>%
  summarise(avg_count = mean(cyl))
##   avg_count
## 1    6.1875
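# Task 2, mutate(): add a column holding the ratio mpg / cyl
# (tail() prints only the last six rows)
mtcars %>%
  mutate(ratio = mpg / cyl) %>%
  tail()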
##     mpg cyl  disp    ratio
## 27 26.0   4 120.3 6.500000
## 28 30.4   4  95.1 7.600000
## 29 15.8   8 351.0 1.975000
## 30 19.7   6 145.0 3.283333
## 31 15.0   8 301.0 1.875000
## 32 21.4   4 121.0 5.350000
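# Task 3, arrange(): sort in descending order of mpg
# (tail() shows the end of the sorted table, i.e. the lowest mpg values)
mtcars %>%
  arrange(desc(mpg)) %>%
  tail()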
##     mpg cyl disp
## 27 15.0   8  301
## 28 14.7   8  440
## 29 14.3   8  360
## 30 13.3   8  350
## 31 10.4   8  472
## 32 10.4   8  460
# Task 4, filter(): keep only the cars with mpg above 30
mtcars %>%
  filter(mpg > 30)
##    mpg cyl disp
## 1 32.4   4 78.7
## 2 30.4   4 75.7
## 3 33.9   4 71.1
## 4 30.4   4 95.1
# Task 5, select(): same cases, only the first two variables
mtcars %>%
  select(mpg, cyl) %>%
  tail()
##                 mpg cyl
## Porsche 914-2  26.0   4
## Lotus Europa   30.4   4
## Ford Pantera L 15.8   8
## Ferrari Dino   19.7   6
## Maserati Bora  15.0   8
## Volvo 142E     21.4   4
# Task 6, group_by() then summarise(): the total of count for each level of sex
BabyNames %>%
  group_by(sex) %>%
  summarise(total = sum(count))
## Source: local data frame [2 x 2]
## 
##     sex     total
##   (chr)     (int)
## 1     F 165280729
## 2     M 168137041
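Task 6 asks for a count of the cases at each level. The BabyNames solution instead adds up the count variable (the number of babies), which is a different quantity. If the task is read as counting rows, n() inside summarise() does it. Here is a sketch on the built-in mtcars data, assuming dplyr is installed:

```r
library(dplyr)

# Number of cases (rows) at each level of the categorical variable cyl
cyl_counts <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n())
cyl_counts
# cyl = 4: 11 cars, cyl = 6: 7 cars, cyl = 8: 14 cars

# count() is dplyr's shortcut for the same group_by() + summarise(n = n())
mtcars %>% count(cyl)
```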

iClicker question from class

The graph presented in class isn’t glyph-ready, so you need to do some data wrangling. Notice that we need to join two tables to get a table that includes LandArea. (This is a correction from class.) We will cover joins in Chapter 10.

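# One row per state: the average population of that state's ZIP codes
table1 <- ZipGeography %>%
  group_by(State) %>%
  summarise(aveZipPopulation = mean(Population, na.rm = TRUE)) %>%
  arrange(aveZipPopulation)

# One row per ZIP code: its state and land area
table2 <- ZipGeography %>%
  select(State, LandArea)

# Each state's row in table1 is matched to every ZIP code row of that
# state in table2, so the result has one row per ZIP code
table1 %>% left_join(table2, by = "State")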
## Source: local data frame [42,741 x 3]
## 
##           State aveZipPopulation LandArea
##          (fctr)            (dbl)    (dbl)
## 1  North Dakota         1687.966      0.9
## 2  North Dakota         1687.966       NA
## 3  North Dakota         1687.966    148.7
## 4  North Dakota         1687.966    240.6
## 5  North Dakota         1687.966    152.7
## 6  North Dakota         1687.966    133.4
## 7  North Dakota         1687.966    164.5
## 8  North Dakota         1687.966     85.8
## 9  North Dakota         1687.966    314.3
## 10 North Dakota         1687.966    207.5
## ..          ...              ...      ...
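The joined table has one row per ZIP code (42,741 rows), because each state's single row in table1 is matched with every ZIP code of that state in table2. If the graph instead needs one row per state, you can skip the join and compute both summaries in the same summarise(). The sketch below makes three assumptions:

* one row per state is the goal;
* a tiny hypothetical stand-in for ZipGeography (three made-up ZIP codes) is enough to show the idea;
* summing ZIP code land areas is the right state-level summary.

```r
library(dplyr)

# Hypothetical stand-in for ZipGeography: made-up values, not real data
zips <- data.frame(
  State      = c("A", "A", "B"),
  Population = c(100, 300, 50),
  LandArea   = c(2.0, NA, 5.0)
)

# The join approach from the notes: one row per ZIP code
per_state_pop <- zips %>%
  group_by(State) %>%
  summarise(aveZipPopulation = mean(Population, na.rm = TRUE))
joined <- per_state_pop %>%
  left_join(select(zips, State, LandArea), by = "State")
nrow(joined)   # 3: one row per ZIP code, not per state

# Alternative (assumes one row per state is wanted): both summaries come
# from a single summarise(), so no join is needed
per_state <- zips %>%
  group_by(State) %>%
  summarise(aveZipPopulation = mean(Population, na.rm = TRUE),
            totalLandArea    = sum(LandArea, na.rm = TRUE))
per_state
```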