Source file ⇒ /Users/sambamamba/Stat 133 Stuff/Assignment 3.Rmd

Problem 5.4 Using the CPS85 data table (from the mosaicData package), make the graphic illustrated on pg. 54, under Problem 5.4:

frame <- CPS85 %>%
  ggplot(aes(x = exper, y = wage))
frame + geom_point(aes(alpha = married)) + facet_wrap(~sector, ncol = 4) +
  scale_x_log10() + scale_y_log10()

Problem 6.3 Consider the graphic in the text for Problem 6.3. Suppose the glyph-ready data underlying the graphic were structured as in the table under Problem 6.3. Consider the two kinds of glyph present in the graph.

  1. For each of the two glyphs, list the set of graphical attributes both geometrically (e.g., “dot”) and in terms of the variable from the table that is mapped to that attribute (e.g., polarity).
  1. Which variables define the frame? Give variables for both the horizontal and vertical coordinates.
  1. Is color an attribute of the “**” glyph?
  1. What guides (if any) are displayed?

Problem 6.6 In Figure 6.9, what is the glyph and its graphical attributes?

  a. Glyph: names of the states. Graphical attribute: font.
  b. Glyph: names of the polling organization. Graphical attribute: the organization’s logo.
  c. Glyph: Rectangle. Graphical attribute: color.
  d. Glyph: Rectangle. Graphical attribute: color and text.

Problem 6.8 The NCHS data (in the DataComputing package) has 31126 rows. To speed things up, work with a small subset of NCHS:

Small <- NCHS %>% sample_n(size=5000)

Using the data in Small, make the plot (in book) with scatterGraphHelper() (in the DataComputing package). Then, write down the mapping between variables and graphical attributes.

Small <- NCHS %>% sample_n(size=5000)
scatterGraphHelper(Small)

Resulting ggplot code:

ggplot(data=Small,aes(x=bmi,y=weight))+geom_point()+aes(colour=smoker)

| Variable | Graphical Attribute |
|----------|---------------------|
| bmi      | x-axis              |
| weight   | y-axis              |
| smoker   | color               |

Small <- NCHS %>%
  sample_n(size=5000) %>%
  ggplot(aes(x = bmi, y = weight))
Small + geom_point(aes(color = smoker))

Variables: weight, bmi, smoker

Attributes: weight = y-axis, bmi = x-axis, smoker = color (yes or no)

Problem 6.2 Consider the graph under Problem 6.2. Here are some variables and their levels:

  * Log enzyme concentration: numerical, -3 to 5
  * target: CcpN, Uptake, Other
  * flux: zero or positive
  * gene: MaeN, PtsG, DctP,…
  * molecule: Glucose, Fructose, Gluconate,…

  1. List all of the guides in the graph. For each one, say which variable is being mapped to which graphical attribute.
  1. The basic glyph is a dot. Say what are the graphical attributes of the dot (e.g. color, size,…). For each graphical attribute found in the graph, say which variable is mapped to that attribute.
  1. Which two variables set the frame?
  1. The scaling of the horizontal variable (e.g., the translation of position to variable levels) is set by a combination of two variables. Which two?

Problem 7.2 These questions refer to the diamonds data table in the ggplot2 package. Take a look at the codebook (using help()) so that you’ll understand the meaning of the tasks. Each of the following tasks can be accomplished by a statement of the form described in the book. For each task, give appropriate R functions or arguments to substitute in place of verb1, verb2, args1, args2, and args3.

  1. Which color diamonds seem to be the largest on average (in terms of carats)?

diamonds %>% group_by(color) %>% summarise(avg = mean(carat)) %>% arrange(desc(avg)) %>% head(1)

  1. Which clarity of diamonds has the largest average “table” per carat?

diamonds %>% group_by(clarity) %>% summarise(Tables = mean(table/carat)) %>% arrange(desc(Tables)) %>% head(1)
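
Both tasks use the same group-then-summarise pattern, grouping by the variable each question asks about. As a minimal sketch, the pattern can be checked by hand on a tiny made-up table. The `toy` data frame and its values are hypothetical stand-ins, not the real diamonds data; only the column names match.

```r
library(dplyr)

# Hypothetical stand-in for ggplot2::diamonds, small enough to verify by hand.
toy <- data.frame(
  color   = c("D", "D", "J", "J"),
  clarity = c("SI1", "VS1", "SI1", "VS1"),
  carat   = c(0.5, 0.7, 1.0, 1.4),
  table   = c(55, 56, 60, 63)
)

# Largest average carat by color: group, average, sort descending, keep the top row.
largest_color <- toy %>%
  group_by(color) %>%
  summarise(avg = mean(carat)) %>%
  arrange(desc(avg)) %>%
  head(1)

# Largest average table-per-carat by clarity: same pattern, different grouping variable.
largest_table <- toy %>%
  group_by(clarity) %>%
  summarise(Tables = mean(table / carat)) %>%
  arrange(desc(Tables)) %>%
  head(1)
```

In this toy table the averages are 0.6 (D) and 1.2 (J) carats, so `largest_color` holds color J; the per-carat table averages are 85 (SI1) and 62.5 (VS1), so `largest_table` holds clarity SI1.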

Problem 7.4 Each of these statements has an error. It might be an error in syntax or an error in the way the data tables are used, etc. Tell what the error(s) are in these expressions.

  1. BabyNames %>% group_by("First") %>% summarise( votesReceived=n())
  1. Tmp <- group_by(BabyNames, year, sex) %>% summarise( Tmp, totalBirths=sum(count))
  1. Tmp <- group_by(BabyNames, year, sex) summarise( BabyNames, totalBirth=sum(count))
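
One reading of the errors, sketched on a hypothetical four-row stand-in for BabyNames (the `Babies` table and its values are invented; only the column names name, sex, count, and year are taken from the statements above): the first statement groups by a quoted string instead of a bare variable name, the second passes the data to summarise() again even though the pipe already supplies it, and the third leaves out the pipe between the two verbs.

```r
library(dplyr)

# Hypothetical miniature of BabyNames (invented values, same column names).
Babies <- data.frame(
  name  = c("Ann", "Ann", "Bob", "Bob"),
  sex   = c("F", "F", "M", "M"),
  count = c(10, 20, 5, 15),
  year  = c(2000, 2001, 2000, 2000)
)

# 1. Group by a bare variable name, not a quoted string.
by_name <- Babies %>%
  group_by(name) %>%
  summarise(votesReceived = n())

# 2. and 3. Pipe the grouped table into summarise() and do not pass the data again.
Tmp <- Babies %>%
  group_by(year, sex) %>%
  summarise(totalBirths = sum(count)) %>%
  ungroup()
```

With these invented rows, `Tmp` has one row per year and sex combination, and the totals add back up to the 50 births in `Babies`.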

Problem 7.5 For each of the following outputs, identify the operation linking the input to the output and write down the details (i.e., arguments) of the operation.

  1. BabyNames %>% arrange(sex, count)
  2. BabyNames %>% filter(sex=="F")
  3. BabyNames %>% filter(sex=="M", count > 10)
  4. BabyNames %>% summarise(total= sum(count))
  5. BabyNames %>% select(name, count)

Problem 7.6 Using the Minneapolis2013 data table, answer these questions:

  1. How many cases are there?

80101

There are 80101 cases.

  1. Who were the top 5 candidates in the Second vote selections?
Minneapolis2013 %>%
  group_by(Second) %>%
  tally(sort=TRUE)
## Source: local data frame [38 x 2]
## 
##                Second     n
##                 (chr) (int)
## 1        BETSY HODGES 14399
## 2         DON SAMUELS 14170
## 3         MARK ANDREW 12757
## 4           undervote 10598
## 5  JACKIE CHERRYHOMES  6470
## 6            BOB FINE  3751
## 7          CAM WINTON  3751
## 8           DAN COHEN  2283
## 9  STEPHANIE WOODRUFF  2128
## 10          DOUG MANN  1052
## ..                ...   ...

Top 5 candidates are:

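| Second             | Count |
|--------------------|-------|
| BETSY HODGES       | 14399 |
| DON SAMUELS        | 14170 |
| MARK ANDREW        | 12757 |
| undervote          | 10598 |
| JACKIE CHERRYHOMES | 6470  |

  1. How many ballots are marked “undervote” in the First, Second, and Third choice selections?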
Minneapolis2013 %>% 
  group_by(First) %>%
  filter(First=="undervote") %>%
  nrow()
## [1] 834

There are 834 ballots marked “undervote” in the first choice selections.

Minneapolis2013 %>% 
  group_by(Second) %>%
  filter(Second=="undervote") %>%
  nrow()
## [1] 10598

There are 10598 ballots marked “undervote” in the second choice selections.

Minneapolis2013 %>%
  group_by(Third) %>%
  filter(Third=="undervote") %>%
  nrow()
## [1] 19210

There are 19210 ballots marked “undervote” in the third choice selections.
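
The three counts use the same filter, one choice column at a time. A minimal sketch of getting all three in a single summarise() call, on a hypothetical four-ballot table (the `Ballots` data frame and its values are invented; only the column names First, Second, and Third come from the text):

```r
library(dplyr)

# Hypothetical stand-in for Minneapolis2013 with invented ballots.
Ballots <- data.frame(
  First  = c("A", "undervote", "B", "A"),
  Second = c("undervote", "undervote", "A", "B"),
  Third  = c("undervote", "undervote", "undervote", "C")
)

# sum() over a logical vector counts the TRUE values,
# so each line counts the undervotes in one column.
undervotes <- Ballots %>%
  summarise(
    n_first  = sum(First == "undervote"),
    n_second = sum(Second == "undervote"),
    n_third  = sum(Third == "undervote")
  )
```

On the real table, replacing `Ballots` with `Minneapolis2013` should give the same three numbers as the separate filter() and nrow() pipelines.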

  1. What are the top 3 combinations of First and Second vote selections? (That is, of all the possible ways a voter might have marked his or her first and second choices, which received the highest number of votes?)

The top three combinations are:

| First                      | Second                     |
|----------------------------|----------------------------|
| ABDUL M RAHAMAN “THE ROCK” | undervote                  |
| ABDUL M RAHAMAN “THE ROCK” | ABDUL M RAHAMAN “THE ROCK” |
| ABDUL M RAHAMAN “THE ROCK” | BETSY HODGES               |

  1. Which Precinct had the highest number of ballots cast?
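
Ranking combinations means counting ballots for each (First, Second) pair and sorting by that count, and the precinct question is the same count on a single column. A hedged sketch on invented ballots follows; the `Ballots` table and its values are hypothetical, and only the column names Precinct, First, and Second come from the text. It uses dplyr's `count()`, which is roughly shorthand for `group_by()` followed by `tally()`.

```r
library(dplyr)

# Hypothetical stand-in for Minneapolis2013 with invented ballots.
Ballots <- data.frame(
  Precinct = c("P-1", "P-1", "P-2", "P-2", "P-2", "P-2"),
  First    = c("A", "A", "A", "B", "B", "C"),
  Second   = c("B", "B", "B", "A", "A", "A")
)

# Count ballots per (First, Second) pair, largest count first, keep the top 3.
top_combos <- Ballots %>%
  count(First, Second, sort = TRUE) %>%
  head(3)

# Count ballots per precinct and keep the one with the most ballots.
busiest <- Ballots %>%
  count(Precinct, sort = TRUE) %>%
  head(1)
```

Sorting by the count column `n` is what makes the first rows the most frequent combinations; without it the rows come back in alphabetical order of the grouping variables.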

Problem 8.1 Here are several functions from the ggplot2 graphics package used in DataComputing (in the text).

Match each of the functions to the task it performs.

  1. Construct the graphics frame
  1. Add a layer of glyphs
  1. Set an axis label
  1. Divide the frame into facets
  1. Change the scale of the frame
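
As a hedged illustration of the five tasks, the sketch below uses one ggplot2 function per task. All five functions exist in ggplot2, but which of them appear in the book's list is an assumption, since the list itself is not reproduced here. The built-in `mtcars` table stands in for the book's data.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +  # construct the graphics frame
  geom_point() +                             # add a layer of glyphs
  ylab("Miles per gallon") +                 # set an axis label
  facet_wrap(~cyl) +                         # divide the frame into facets
  scale_y_log10()                            # change the scale of the frame
```

Printing `p` draws the plot; other functions can fill the same roles, for example `facet_grid()` for faceting or `xlim()` for changing the scale.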

Problem 8.2 Here are two more graphics based on the mosaicData::CPS85 data table. Write ggplot2 statements that will construct each graphic.

frame <- CPS85 %>% 
  ggplot(aes(x = age, y = wage)) 
frame + geom_point(aes(color = married)) + facet_wrap(~sector) + coord_cartesian(xlim = c(20, 65), ylim = c(0,30))

frame <- CPS85 %>%
  ggplot(aes(x = age, y = wage))
frame + geom_point(aes(color = married)) + facet_grid(sex~married) + coord_cartesian(xlim = c(15, 65), ylim = c(0, 45))