Assignment 3

Source file ⇒ /Users/sambamamba/Stat 133 Stuff/Assignment 3.Rmd

Problem 5.4 Using the CPS85 data table (from the mosaicData package), make the graphic illustrated on pg. 54, under Problem 5.4:

frame <- CPS85 %>%
  ggplot(aes(x=exper, y=wage)) 
frame + geom_point(aes(alpha=married)) + facet_wrap(~sector, ncol = 4 ) + scale_x_log10() + scale_y_log10()

Problem 6.3 Consider the graphic in the text for Problem 6.3. Suppose the glyph-ready data underlying the graphic were structured as in the table under Problem 6.3. Consider the two kinds of glyph present in the graph.

For each of the two glyphs, list the set of graphical attributes, both geometrically (e.g., “dot”) and in terms of the variable from the table that is mapped to that attribute (e.g., polarity).

graphical attribute: double star; variable: “negative polarity”.
graphical attribute: bar; variable: “center”, “low”, “high”.

Which variables define the frame? Give variables for both the horizontal and vertical coordinates.

The variables are protein and cell density.

Is color an attribute of the “**” glyph?

No, color is not an attribute of that glyph.

What guides (if any) are displayed?

The guides for the graph are low vs high cell density and protein type.

Problem 6.6 In Figure 6.9, what is the glyph and its graphical attributes?

a. Glyph: names of the states. Graphical attribute: font.
b. Glyph: names of the polling organization. Graphical attribute: the organization’s logo.
c. Glyph: Rectangle. Graphical attribute: color.
d. Glyph: Rectangle. Graphical attribute: color and text.

Problem 6.8 The NCHS data (in the DataComputing package) has 31126 rows. To speed things up, work with a small subset of NCHS:

Small <- NCHS %>% sample_n(size=5000)

Using the data in Small, make the plot (in book) with scatterGraphHelper() (in the DataComputing package). Then, write down the mapping between variables and graphical attributes.

Small <- NCHS %>% sample_n(size = 5000)
scatterGraphHelper(Small)

Resulting ggplot code:

ggplot(data = Small, aes(x = bmi, y = weight)) + geom_point() + aes(colour = smoker)

Variable	Graphical Attribute
bmi	x-axis
weight	y-axis
smoker	color

Small <- NCHS %>%
  sample_n(size = 5000) %>%
  ggplot(aes(x = bmi, y = weight))
Small + geom_point(aes(color = smoker))

Variables: weight, bmi, smoker
Attributes: weight = y-axis, bmi = x-axis, smoker = color (yes or no)

Problem 6.2 Consider the graph under Problem 6.2. Here are some variables and their levels:

* Log enzyme concentration: numerical, -3 to 5
* target: CcpN, Uptake, Other
* flux: zero or positive
* gene: MaeN, PtsG, DctP,…
* molecule: Glucose, Fructose, Gluconate,…

List all of the guides in the graph. For each one, say which variable is being mapped to which graphical attribute.

The flux variable is mapped by having either a filled dot or a hollow dot.
Log enzyme concentration is mapped by the tick-marks from -3 to 5 on the vertical (y) axis.
Target is mapped by lines on top of the graph indicating the region in which each level of the variable occurs.
Each gene variable is mapped on a tick-mark on the horizontal (x) axis.
The molecule variable is mapped by the different colors of the dot glyphs.

The basic glyph is a dot. Say what are the graphical attributes of the dot (e.g. color, size,…). For each graphical attribute found in the graph, say which variable is mapped to that attribute.

Some of the graphical attributes are:

Color: describes the different types of molecules.
Fill: describes whether the molecule has flux = 0 or flux > 0.
Location: position along the x-axis (gene) and the y-axis (log enzyme concentration).

Which two variables set the frame?

The two variables are log enzyme concentration (vertical axis) and gene (horizontal axis).

The scaling of the horizontal variable (e.g., the translation of position to variable levels) is set by a combination of two variables. Which two?

The two variables are target and gene.

Problem 7.2 These questions refer to the diamonds data table in the ggplot2 package. Take a look at the codebook (using help()) so that you’ll understand the meaning of the tasks. Each of the following tasks can be accomplished by a statement of the form described in the book. For each task, give appropriate R functions or arguments to substitute in place of verb1, verb2, args1, args2, and args3.

Which color diamonds seem to be the largest on average (in terms of carats)?

The largest are color J at 1.162137 carats

diamonds %>% group_by(color) %>% summarise(avg = mean(carat)) %>% arrange(desc(avg)) %>% head(1)

Which clarity of diamonds has the largest average “table” per carat?

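Grouped by color, the largest is color E at 114.0752 tables/carat. The question asks about clarity, so replacing color with clarity in group_by() answers it as posed.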

diamonds %>% group_by(color) %>% summarise(Tables = mean(table/carat)) %>% arrange(desc(Tables)) %>% head(1)
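Since the question asks about clarity, the same pipeline grouped by clarity can be sketched as follows. This assumes the dplyr and ggplot2 packages are installed (ggplot2 supplies the diamonds table). The name TablePerCarat is only illustrative, and no result is quoted here because the sketch was not run as part of the assignment.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data table

# Average "table" per carat within each clarity level, largest first.
# TablePerCarat is an illustrative name, not one used in the assignment.
TablePerCarat <- diamonds %>%
  group_by(clarity) %>%
  summarise(Tables = mean(table / carat)) %>%
  arrange(desc(Tables)) %>%
  head(1)

TablePerCarat
```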

Problem 7.4 Each of these statements has an error. It might be an error in syntax or an error in the way the data tables are used, etc. Tell what the error(s) are in these expressions.

BabyNames %>% group_by("First") %>% summarise(votesReceived = n())

The variable First should not be in quotation marks.

Tmp <- group_by(BabyNames, year, sex) %>% summarise( Tmp, totalBirths=sum(count))

You cannot use Tmp as an argument to summarise() here, because Tmp is not yet defined when the right-hand side is evaluated. The pipe already supplies the grouped table as the first argument.

Tmp <- group_by(BabyNames, year, sex)
summarise(BabyNames, totalBirth = sum(count))

The first argument in the function summarise should be Tmp, not BabyNames.
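The corrected forms of the three statements can be sketched as follows. Because the real BabyNames table lives in the DataComputing package, this sketch uses a small made-up stand-in called Names (hypothetical data with name, sex, count, and year columns). It also groups by name in the first statement, on the assumption that name is the intended grouping variable.

```r
library(dplyr)

# Hypothetical stand-in for BabyNames (made-up counts)
Names <- data.frame(
  name  = c("Ann", "Ann", "Bob", "Bob", "Cy"),
  sex   = c("F", "F", "M", "M", "M"),
  count = c(10, 20, 5, 15, 7),
  year  = c(2000, 2001, 2000, 2001, 2001)
)

# 1. The grouping variable is written without quotation marks
ByName <- Names %>%
  group_by(name) %>%
  summarise(rows = n())

# 2. The pipe supplies the grouped table, so Tmp is not passed to summarise()
Tmp <- Names %>%
  group_by(year, sex) %>%
  summarise(totalBirths = sum(count))

# 3. Two separate statements; the grouped table is the first argument
Grouped <- group_by(Names, year, sex)
Totals <- summarise(Grouped, totalBirths = sum(count))
```

Statements 2 and 3 produce the same totals; the only difference is whether the grouped table reaches summarise() through the pipe or as an explicit first argument.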

Problem 7.5 For each of the following outputs, identify the operation linking the input to the output and write down the details (i.e., arguments) of the operation.

BabyNames %>% arrange(sex, count)
BabyNames %>% filter(sex == "F")
BabyNames %>% filter(sex == "M", count > 10)
BabyNames %>% summarise(total= sum(count))
BabyNames %>% select(name, count)

Problem 7.6 Using the Minneapolis2013 data table, answer these questions:

How many cases are there?

80101

There are 80101 cases.

Who were the top 5 candidates in the Second vote selections?

Minneapolis2013 %>%
  group_by(Second) %>%
  tally(sort=TRUE)

## Source: local data frame [38 x 2]
## 
##                Second     n
##                 (chr) (int)
## 1        BETSY HODGES 14399
## 2         DON SAMUELS 14170
## 3         MARK ANDREW 12757
## 4           undervote 10598
## 5  JACKIE CHERRYHOMES  6470
## 6            BOB FINE  3751
## 7          CAM WINTON  3751
## 8           DAN COHEN  2283
## 9  STEPHANIE WOODRUFF  2128
## 10          DOUG MANN  1052
## ..                ...   ...

The top 5 Second selections (counting “undervote”, which is not a candidate) are:

Second	Count
BETSY HODGES	14399
DON SAMUELS	14170
MARK ANDREW	12757
undervote	10598
JACKIE CHERRYHOMES	6470

How many ballots are marked “undervote” in

First choice selections?

Minneapolis2013 %>% 
  group_by(First) %>%
  filter(First=="undervote") %>%
  nrow()

## [1] 834

There are 834 undervote ballots among the first choice selections.

Second choice selections?

Minneapolis2013 %>% 
  group_by(Second) %>%
  filter(Second=="undervote") %>%
  nrow()

## [1] 10598

There are 10598 undervote ballots among the second choice selections.

Third choice selections?

Minneapolis2013 %>% 
  group_by(Third) %>%
  filter(Third=="undervote") %>%
  nrow()

## [1] 19210

There are 19210 undervote ballots among the third choice selections.
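The three undervote counts could also be obtained in a single summarise() call. A minimal sketch, using a small made-up table Ballots in place of Minneapolis2013 (the candidate labels A and B and all rows are hypothetical):

```r
library(dplyr)

# Hypothetical ballots: one row per voter, one column per ranked choice
Ballots <- data.frame(
  First  = c("A", "B", "undervote", "A"),
  Second = c("undervote", "A", "undervote", "B"),
  Third  = c("undervote", "undervote", "undervote", "A"),
  stringsAsFactors = FALSE
)

# Count the "undervote" entries in each choice column in one pass
Undervotes <- Ballots %>%
  summarise(
    first  = sum(First == "undervote"),
    second = sum(Second == "undervote"),
    third  = sum(Third == "undervote")
  )

Undervotes
```

Naming each column explicitly makes it harder to reuse the wrong column when copying a pipeline from one choice to the next.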

What are the top 3 combinations of First and Second vote selections? (That is, of all the possible ways a voter might have marked his or her first and second choices, which received the highest number of votes?)

The top three combinations are:

First	Second
ABDUL M RAHAMAN “THE ROCK”	undervote
ABDUL M RAHAMAN “THE ROCK”	ABDUL M RAHAMAN “THE ROCK”
ABDUL M RAHAMAN “THE ROCK”	BETSY HODGES
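Ranking the combinations requires counting each First and Second pair and sorting by that count. A minimal sketch of one way to do this, using a made-up table Ballots in place of Minneapolis2013 (all rows and the labels A, B, C are hypothetical):

```r
library(dplyr)

# Hypothetical ballots with only the first two choices
Ballots <- data.frame(
  First  = c("A", "A", "A", "B", "B", "C", "A"),
  Second = c("B", "B", "B", "A", "A", "undervote", "C"),
  stringsAsFactors = FALSE
)

# Count each (First, Second) pair, then sort so the most common pairs come first
TopPairs <- Ballots %>%
  group_by(First, Second) %>%
  tally() %>%
  ungroup() %>%
  arrange(desc(n)) %>%
  head(3)

TopPairs
```

Without the arrange(desc(n)) step, head(3) would simply return the first three pairs in alphabetical order rather than the most frequent ones.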

Which Precinct had the highest number of ballots cast?

The precinct with the highest number of ballots is P-06 with 9711 ballots.

Problem 8.1 Here are several functions from the ggplot2 graphics package used in DataComputing (in the text)

Match each of the functions to the task it performs.

Construct the graphics frame

ggplot()

Add a layer of glyphs

geom_point(), geom_histogram(), geom_segment()

Set an axis label

ylab()

Divide the frame into facets

facet_wrap() and facet_grid()

Change the scale of the frame

scale_y_log10()
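The five roles above can be seen together in one plot. A minimal sketch, assuming ggplot2 is installed and using a made-up table Workers (hypothetical values, loosely shaped like CPS85):

```r
library(ggplot2)

# Hypothetical data (made-up values)
Workers <- data.frame(
  age    = c(25, 35, 45, 55, 30, 40),
  wage   = c(5, 12, 20, 9, 15, 30),
  sector = c("sales", "sales", "manag", "manag", "sales", "manag")
)

p <- ggplot(Workers, aes(x = age, y = wage)) +  # construct the graphics frame
  geom_point() +                                # add a layer of glyphs
  ylab("Hourly wage") +                         # set an axis label
  facet_wrap(~sector) +                         # divide the frame into facets
  scale_y_log10()                               # change the scale of the frame

p
```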

Problem 8.2 Here are two more graphics based on the mosaicData::CPS85 data table. Write ggplot2 statements that will construct each graphic.

frame <- CPS85 %>% 
  ggplot(aes(x = age, y = wage)) 
frame + geom_point(aes(color = married)) + facet_wrap(~sector) + coord_cartesian(xlim = c(20, 65), ylim = c(0,30))

frame <- CPS85 %>%
  ggplot(aes(x = age, y = wage))
frame + geom_point(aes(color = married)) + facet_grid(sex~married) + coord_cartesian(xlim = c(15, 65), ylim = c(0, 45))

Samba Njie Jr

February 6, 2016