Source file ⇒ Lec7.Rmd
Vocabulary:
See Figure 6.7, page 57.
Frame = The space for drawing glyphs. Here, a rectangular region set by roadways and gdp.
Glyph = A symbol or marking in a frame. There is one glyph for each case in the data table. Here, points.
Aesthetics = The properties of a glyph. Here, x, y, and size.
Scales and Guides = How variables of the data table are mapped to aesthetics of the glyph, and the axes and legends that display that mapping. Here, x = roadways, y = gdp, size = educ.
Facets = Multiple side-by-side graphs used to display levels of a categorical variable. Here, the facets are different ranges of Internet connectivity.
Layers = Data from more than one data table graphed together. Here, the graph is not layered.
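As a rough sketch of how this vocabulary shows up in code, the following uses ggplot2 on a small made-up data frame. The name `Countries`, its column `internet`, and all of its values are assumptions for illustration only; they are not the data behind the figure.

```r
library(ggplot2)

# Hypothetical stand-in data: every value here is invented for illustration.
Countries <- data.frame(
  roadways = c(0.2, 0.5, 1.1, 1.8, 0.9, 1.4),
  gdp      = c(1500, 4000, 22000, 41000, 9000, 30000),
  educ     = c(3.1, 4.0, 5.2, 6.0, 4.4, 5.5),
  internet = c("low", "low", "high", "high", "low", "high")
)

p <- ggplot(Countries, aes(x = roadways, y = gdp)) +  # frame: set by x and y
  geom_point(aes(size = educ)) +                      # glyph: points; aesthetic: size
  facet_wrap(~ internet)                              # facets: one panel per level
p
```

Each row of the data frame becomes one point, and ggplot2 draws the guides (axes and the size legend) automatically from the mappings in `aes()`.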
Look at the graph from the New York Times on predictions for 36 Senate seats from different polling organizations.
Rarely is a data table glyph-ready. We usually need to wrangle the data first. For example, say we want to make a bar chart ranking the candidates by how many ballots list them in first place.
# not glyph-ready
head(Minneapolis2013)
##   Precinct           First      Second      Third Ward
## 1     P-10    BETSY HODGES   undervote  undervote  W-7
## 2     P-06        BOB FINE MARK ANDREW  undervote W-10
## 3     P-09 KURTIS W. HANNA    BOB FINE MIKE GOULD W-10
## 4     P-05    BETSY HODGES DON SAMUELS  undervote W-13
## 5     P-01     DON SAMUELS   undervote  undervote  W-5
## 6     P-04       undervote   undervote  undervote  W-6
FirstPlaceTally <- Minneapolis2013 %>%
  rename(candidate = First) %>%
  group_by(candidate) %>%
  summarise(total = n()) %>%
  arrange(desc(total))
# glyph-ready
FirstPlaceTally
## Source: local data frame [38 x 2]
##
##             candidate total
##                 (chr) (int)
## 1        BETSY HODGES 28935
## 2         MARK ANDREW 19584
## 3         DON SAMUELS  8335
## 4          CAM WINTON  7511
## 5  JACKIE CHERRYHOMES  3524
## 6            BOB FINE  2094
## 7           DAN COHEN  1798
## 8  STEPHANIE WOODRUFF  1010
## 9     MARK V ANDERSON   975
## 10          undervote   834
## ..                ...   ...
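Once a tally like this is glyph-ready, the bar chart itself is short. The sketch below repeats the same wrangling steps on a tiny made-up table and then draws the bars with ggplot2. The data frame `Ballots` and its values are hypothetical stand-ins for Minneapolis2013, and `geom_col()` with `reorder()` is one reasonable way to draw a ranked bar chart, not necessarily the one used in class.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for Minneapolis2013: invented rows, one per case.
Ballots <- data.frame(
  First = c("A", "B", "A", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# Same wrangling steps as above: count the rows for each first-place candidate.
Tally <- Ballots %>%
  rename(candidate = First) %>%
  group_by(candidate) %>%
  summarise(total = n()) %>%
  arrange(desc(total))

# reorder() sorts the bars by total so the chart reads as a ranking.
bars <- ggplot(Tally, aes(x = reorder(candidate, -total), y = total)) +
  geom_col() +
  labs(x = "candidate", y = "first-place count")
bars
```

Here each case of `Tally` maps to exactly one bar, which is what makes the table glyph-ready.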
Here is a nice presentation of Data Verbs we will be using: link
Each of these tasks can be performed using a single data verb. For each task, say which verb it is. The choices are: summarise(), mutate(), arrange(), filter(), select(), group_by(), n()
```
Find the average of one of the variables.
Add a new column that is the ratio between two variables.
Sort the cases in descending order of a variable.
Create a new data table that includes only those cases that meet a criterion.
From a data table with several variables, produce an output that has the same cases but only the first two variables.
From a data table with a categorical variable and a quantitative variable, produce a count of the number of cases for each level of the categorical variable.
```
Solution:
mtcars
##                      mpg cyl  disp
## Mazda RX4           21.0   6 160.0
## Mazda RX4 Wag       21.0   6 160.0
## Datsun 710          22.8   4 108.0
## Hornet 4 Drive      21.4   6 258.0
## Hornet Sportabout   18.7   8 360.0
## Valiant             18.1   6 225.0
## Duster 360          14.3   8 360.0
## Merc 240D           24.4   4 146.7
## Merc 230            22.8   4 140.8
## Merc 280            19.2   6 167.6
## Merc 280C           17.8   6 167.6
## Merc 450SE          16.4   8 275.8
## Merc 450SL          17.3   8 275.8
## Merc 450SLC         15.2   8 275.8
## Cadillac Fleetwood  10.4   8 472.0
## Lincoln Continental 10.4   8 460.0
## Chrysler Imperial   14.7   8 440.0
## Fiat 128            32.4   4  78.7
## Honda Civic         30.4   4  75.7
## Toyota Corolla      33.9   4  71.1
## Toyota Corona       21.5   4 120.1
## Dodge Challenger    15.5   8 318.0
## AMC Javelin         15.2   8 304.0
## Camaro Z28          13.3   8 350.0
## Pontiac Firebird    19.2   8 400.0
## Fiat X1-9           27.3   4  79.0
## Porsche 914-2       26.0   4 120.3
## Lotus Europa        30.4   4  95.1
## Ford Pantera L      15.8   8 351.0
## Ferrari Dino        19.7   6 145.0
## Maserati Bora       15.0   8 301.0
## Volvo 142E          21.4   4 121.0
mtcars %>%
  summarise(avg_count = mean(cyl))
##   avg_count
## 1    6.1875
mtcars %>%
  mutate(ratio = mpg / cyl) %>%
  tail()
##     mpg cyl  disp    ratio
## 27 26.0   4 120.3 6.500000
## 28 30.4   4  95.1 7.600000
## 29 15.8   8 351.0 1.975000
## 30 19.7   6 145.0 3.283333
## 31 15.0   8 301.0 1.875000
## 32 21.4   4 121.0 5.350000
mtcars %>%
  arrange(desc(mpg)) %>%
  tail()
##     mpg cyl disp
## 27 15.0   8  301
## 28 14.7   8  440
## 29 14.3   8  360
## 30 13.3   8  350
## 31 10.4   8  472
## 32 10.4   8  460
mtcars %>%
  filter(mpg > 30)
##    mpg cyl disp
## 1 32.4   4 78.7
## 2 30.4   4 75.7
## 3 33.9   4 71.1
## 4 30.4   4 95.1
mtcars %>%
  select(mpg, cyl) %>%
  tail()
##                 mpg cyl
## Porsche 914-2  26.0   4
## Lotus Europa   30.4   4
## Ford Pantera L 15.8   8
## Ferrari Dino   19.7   6
## Maserati Bora  15.0   8
## Volvo 142E     21.4   4
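# group_by() is the verb for this task. Note that sum(count) adds up the count
# column within each sex; summarise(total = n()) would count the cases instead.
BabyNames %>%
  group_by(sex) %>%
  summarise(total = sum(count))
## Source: local data frame [2 x 2]
##
##     sex     total
##   (chr)     (int)
## 1     F 165280729
## 2     M 168137041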