Source file ⇒ /Users/sambamamba/Stat 133 Stuff/Assignment 3.Rmd
Problem 5.4 Using the CPS85 data table (from the mosaicData package), make the graphic illustrated on pg. 54, under Problem 5.4:
frame <- CPS85 %>%
ggplot(aes(x=exper, y=wage))
frame + geom_point(aes(alpha=married)) + facet_wrap(~sector, ncol = 4 ) + scale_x_log10() + scale_y_log10()
Problem 6.3 Consider the graphic in the text for Problem 6.3. Suppose the glyph-ready data underlying the graphic were structured as in the table under Problem 6.3. Consider the two kinds of glyph present in the graph.
graphical attribute: double star; variable: “negative polarity”.
graphical attribute: bar; variable: “center”, “low”, “high”.
Problem 6.6 In Figure 6.9, what is the glyph and its graphical attributes? a. Glyph: names of the states. Graphical attribute: font. b. Glyph: names of the polling organization. Graphical attribute: the organization’s logo. c. Glyph: Rectangle. Graphical attribute: color. d. Glyph: Rectangle. Graphical attribute: color and text.
Problem 6.8 The NCHS data (in the DataComputing package) has 31126 rows. To speed things up, work with a small subset of NCHS:
Small <- NCHS %>% sample_n(size=5000)
Using the data in Small, make the plot (in book) with scatterGraphHelper() (in the DataComputing package). Then, write down the mapping between variables and graphical attributes.
Small <- NCHS %>% sample_n(size=5000)
scatterGraphHelper(Small)
Resulting ggplot code:
ggplot(data=Small, aes(x=bmi, y=weight)) + geom_point() + aes(colour=smoker)
| Variable | Graphical Attribute |
|---|---|
| bmi | x-axis |
| weight | y-axis |
| smoker | color |
Small <- NCHS %>%
sample_n(size=5000) %>%
ggplot(aes(x = bmi, y = weight))
Small + geom_point(aes(color = smoker))
Variables: weight, bmi, smoker
Attributes: weight = y-axis, bmi = x-axis, smoker = color (yes or no)
Problem 6.2 Consider the graph under problem 6.2. Here are some variables and their levels:
* Log enzyme concentration: numerical -3 to 5
* target: CcpN, Uptake, Other
* flux: zero or positive
* gene: MaeN, PtsG, DctP,…
* molecule: Glucose, Fructose, Gluconate,…
The flux variable is mapped by having either a filled dot or a hollow dot.
The log enzyme concentration is mapped by the tick-marks from -3 to 5 on the vertical (y) axis.
The target variable is mapped by lines on top of the graph indicating the region in which each level of the variable occurs.
The gene variable is mapped on a tick-mark on the horizontal (x) axis.
The molecule variable is mapped by the different colors of the dot glyphs.
molecules
Fill: describes whether the molecule has flux = 0 or flux > 0. Location: x-axis (log enzyme concentration) and y-axis (gene type)
target and gene.
Problem 7.2 These questions refer to the diamonds data table in the ggplot2 package. Take a look at the codebook (using help()) so that you’ll understand the meaning of the tasks. Each of the following tasks can be accomplished by a statement of the form described in the book. For each task, give appropriate R functions or arguments to substitute in place of verb1, verb2, args1, args2, and args3.
diamonds %>% group_by(color) %>% summarise(avg = mean(carat)) %>% arrange(desc(avg)) %>% head(1)
diamonds %>% group_by(color) %>% summarise(Tables = mean(table/carat)) %>% arrange(desc(Tables)) %>% head(1)
Problem 7.4 Each of these statements has an error. It might be an error in syntax or an error in the way the data tables are used, etc. Tell what the error(s) are in these expressions.
Problem 7.5 For each of the following outputs, identify the operation linking the input to the output and write down the details (i.e., arguments) of the operation.
Problem 7.6 Using the Minneapolis2013 data table, answer these questions: 1. How many cases are there?
80101
There are 80101 cases.
Second vote selections.
Minneapolis2013 %>%
group_by(Second) %>%
tally(sort=TRUE)
## Source: local data frame [38 x 2]
##
## Second n
## (chr) (int)
## 1 BETSY HODGES 14399
## 2 DON SAMUELS 14170
## 3 MARK ANDREW 12757
## 4 undervote 10598
## 5 JACKIE CHERRYHOMES 6470
## 6 BOB FINE 3751
## 7 CAM WINTON 3751
## 8 DAN COHEN 2283
## 9 STEPHANIE WOODRUFF 2128
## 10 DOUG MANN 1052
## .. ... ...
Top 5 candidates are:
| Second | Count |
|---|---|
| BETSY HODGES | 14399 |
| DON SAMUELS | 14170 |
| MARK ANDREW | 12757 |
| undervote | 10598 |
| JACKIE CHERRYHOMES | 6470 |
Minneapolis2013 %>%
group_by(First) %>%
filter(First=="undervote") %>%
nrow()
## [1] 834
There are 834 undervote ballots among the first choice selections.
Minneapolis2013 %>%
group_by(Second) %>%
filter(Second=="undervote") %>%
nrow()
## [1] 10598
There are 10598 undervote ballots among the second choice selections.
Minneapolis2013 %>%
group_by(Third) %>%
filter(Third=="undervote") %>%
nrow()
## [1] 19210
There are 19210 undervote ballots among the third choice selections.
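The three undervote counts can also be computed in one statement rather than three separate pipelines. The following is a minimal sketch, assuming a table with character columns First, Second, and Third laid out like Minneapolis2013. The `ballots` table is a made-up toy stand-in, not the real data, so its counts say nothing about the actual election.

```r
library(dplyr)

# Hypothetical toy ballots standing in for Minneapolis2013 (assumed column names).
ballots <- tibble(
  First  = c("A", "undervote", "B", "A"),
  Second = c("undervote", "undervote", "A", "B"),
  Third  = c("undervote", "undervote", "undervote", "C")
)

# Summing a logical vector counts its TRUE values,
# so each sum() is the number of undervotes in that column.
undervotes <- ballots %>%
  summarise(
    first  = sum(First == "undervote"),
    second = sum(Second == "undervote"),
    third  = sum(Third == "undervote")
  )
undervotes
```

Replacing `ballots` with `Minneapolis2013` should give all three answers side by side, which makes a copy-paste slip between the columns easier to spot.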
First and Second vote selections? (That is, of all the possible ways a voter might have marked his or her first and second choices, which received the highest number of votes?)
The top three combinations are:
| First | Second |
|---|---|
| ABDUL M RAHAMAN “THE ROCK” | undervote |
| ABDUL M RAHAMAN “THE ROCK” | ABDUL M RAHAMAN “THE ROCK” |
| ABDUL M RAHAMAN “THE ROCK” | BETSY HODGES |
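Ranking combinations by popularity requires counting each First/Second pair and then sorting by that count; grouping alone returns the pairs in alphabetical order. One way to do this is sketched below, assuming dplyr's `count()` with `sort = TRUE`. The `ballots` table is a hypothetical toy example with made-up candidates, not the Minneapolis2013 data.

```r
library(dplyr)

# Hypothetical toy ballots (candidate labels A, B, C are invented for illustration).
ballots <- tibble(
  First  = c("A", "A", "A", "B", "B", "A"),
  Second = c("B", "B", "B", "A", "A", "C")
)

# count() groups by both columns, tallies the rows into a column named n,
# and sort = TRUE orders the result from most to least frequent.
combos <- ballots %>%
  count(First, Second, sort = TRUE)

# Keep the three most common combinations.
head(combos, 3)
```

Applied to the real table, the same statement would also report the count `n` next to each combination, so the ranking can be checked directly.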
Precinct had the highest number of ballots cast?
Problem 8.1 Here are several functions from the ggplot2 graphics package used in DataComputing (in the text)
Match each of the functions to the task it performs.
Problem 8.2 Here are two more graphics based on the mosaicData::CPS85 data table. Write ggplot2() statements that will construct each graphic.
frame <- CPS85 %>%
ggplot(aes(x = age, y = wage))
frame + geom_point(aes(color = married)) + facet_wrap(~sector) + coord_cartesian(xlim = c(20, 65), ylim = c(0,30))
frame <- CPS85 %>%
ggplot(aes(x = age, y = wage))
frame + geom_point(aes(color = married)) + facet_grid(sex~married) + coord_cartesian(xlim = c(15, 65), ylim = c(0, 45))