Grammar of Graphics

PROBLEMS TO TURN IN

MDSR B.1

A user has typed the following commands into the RStudio console.

a <- c(10, 15)
b <- c(TRUE, FALSE)
c <- c("happy", "sad")

What does each of the following commands return? Describe the class of the object as well as its value.

# This returns a data frame (class "data.frame", stored internally as a list) with 2 rows and 3 columns, one column per vector from above. Column a has entries of type double, column b has entries of type logical, and column c has entries of type character (in R versions before 4.0, where stringsAsFactors defaulted to TRUE, column c would be a factor).
data.frame(a,b,c)

##    a     b     c
## 1 10  TRUE happy
## 2 15 FALSE   sad

# This returns a 2 x 2 numeric matrix (not a data frame) with vectors a and b as its columns. A matrix holds a single type, so the logicals in b are coerced to the doubles 1 and 0.
cbind(a,b)

##       a b
## [1,] 10 1
## [2,] 15 0

# This returns a 2 x 2 numeric matrix (not a data frame) with vectors a and b as its rows. As above, the logicals in b are coerced to the doubles 1 and 0.
rbind(a,b)

##   [,1] [,2]
## a   10   15
## b    1    0

# This returns a 2 x 3 character matrix (not a data frame) with vectors a, b, and c as its columns. Because c is character, every entry is coerced to character.
cbind(a,b,c)

##      a    b       c      
## [1,] "10" "TRUE"  "happy"
## [2,] "15" "FALSE" "sad"

# This builds a list containing vectors a, b, and c, then extracts its second element with [[2]]. The result is the logical vector b, with values TRUE and FALSE.
list(a,b,c)[[2]]

## [1]  TRUE FALSE
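
The classes described above can be checked directly in the console. The following is a minimal sketch using only base R; the names df, m_cols, m_rows, m_chr, and second are introduced here for illustration and do not appear in the exercise.

```r
a <- c(10, 15)
b <- c(TRUE, FALSE)
c <- c("happy", "sad")

# Store each result so it can be inspected.
df     <- data.frame(a, b, c)
m_cols <- cbind(a, b)
m_rows <- rbind(a, b)
m_chr  <- cbind(a, b, c)
second <- list(a, b, c)[[2]]

# class() reports the object's class; typeof() reports how its entries are stored.
class(df)       # "data.frame"
typeof(df)      # "list"
class(m_cols)   # "matrix" (R >= 4.0 also reports "array")
typeof(m_cols)  # "double"
dim(m_rows)     # 2 2
typeof(m_chr)   # "character"
class(second)   # "logical"
```

Calling c(...) still works after the assignment c <- ..., because R skips non-function objects when it looks up a function name.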

MDSR D.8

Why does the mosaic package plain R Markdown template include the code chunk option message=FALSE when the mosaic package is loaded?

SOLUTION:

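The mosaic template sets the code chunk option message=FALSE on the chunk that loads mosaic so that the messages produced during loading do not appear in the knitted document. Loading mosaic also loads several other packages and prints notes about them (for example, which objects are masked). With message=FALSE the code still runs, but those messages are left out of the output, which reduces clutter.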
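
The difference between a message and ordinary output can be sketched in base R. The function noisy_load below is a hypothetical stand-in for a package that announces itself when loaded; it is not part of mosaic. Wrapping the call in suppressMessages() is roughly what message=FALSE does for a chunk's output.

```r
# Hypothetical stand-in for loading a package that prints start-up notes.
noisy_load <- function() {
  message("Loading required package: example")  # diagnostic text sent via message()
  "loaded"                                      # ordinary return value
}

# The message is dropped, but the code still runs and its value is kept.
result <- suppressMessages(noisy_load())
result
```

Warnings and errors travel through separate channels, so this option leaves them visible.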

MDSR 2.3 (Graphical critique)

Choose one of the data graphics listed at http://mdsr-book.github.io/exercises.html#exercise_23 and answer the following questions. Be sure to indicate which graphical display you picked.

1. Identify the visual cues, coordinate system, and scale(s).
2. How many variables are depicted in the graphic? Explicitly link each variable to a visual cue that you listed above.
3. Critique this data graphic using the taxonomy described in this chapter.

SOLUTION:

I chose the “Who does not Pay Income Tax?” graphic, a pie chart that breaks down who does and does not pay income tax. The visual cues are the angle (and corresponding area) of each slice and its color. A pie chart uses a polar coordinate system, and its scale is relative: each slice is read as a share of the whole circle. Each slice also has its own color and a written label.
There are two variables in this display: the reason for not paying income tax (no reason is given for those who paid) and the percentage of people in each group. The percentage is mapped to the angle of each slice, and the reason is mapped to color. Those who paid income tax are shown in grey, and the four groups who did not are shown in shades of red, each with its own label.
While it is usually best to avoid pie charts, I think this graphic is simple and effective for its target audience. It was created in response to a comment Mitt Romney made during the 2012 election and breaks down why 46% of the US population did not pay income tax. Slices of a pie chart are usually hard to compare, because people judge angles and areas less accurately than positions or lengths. Here the distinctions are easier to make because each slice is pulled slightly away from the center and has a clearly different color.

MDSR 3.1 (Galton)

Using the famous Galton data set from the mosaicData package:

1. Create a scatterplot of each person’s height against their father’s height
2. Separate your plot into facets by sex
3. Add regression lines to all of your facets
4. Using the plot from (1) color the points by sex

Hint: recall that you can find out more about the data set by running the command ?Galton.

SOLUTION:

#1
library(ggplot2)
library(mosaicData)  # provides the Galton data set

# "Height against father's height": the person's height goes on y, the father's on x
ggplot(Galton, aes(x = father, y = height)) +
  geom_jitter() +
  labs(x = "Father's Height (in)", y = "Height (in)", title = "Individuals' Height vs. Their Father's Height")

#2
ggplot(Galton, aes(x = father, y = height)) +
  geom_jitter() +
  facet_wrap(~ sex) +
  labs(x = "Father's Height (in)", y = "Height (in)", title = "Individuals' Height vs. Their Father's Height by Sex")

#3
# method = "lm" fits a separate least-squares line in each facet
ggplot(Galton, aes(x = father, y = height)) +
  geom_jitter() +
  facet_wrap(~ sex) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Father's Height (in)", y = "Height (in)", title = "Individuals' Height vs. Their Father's Height by Sex")

#4
ggplot(Galton, aes(x = father, y = height, col = sex)) +
  geom_jitter() +
  labs(x = "Father's Height (in)", y = "Height (in)", title = "Individuals' Height vs. Their Father's Height by Sex")
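
The regression lines in part 3 can also be checked numerically. The sketch below assumes the person's height is treated as the response and the father's height as the predictor, and it fits one least-squares line per level of sex, which is what geom_smooth(method = "lm") does within each facet. The helper name slope_by_sex is introduced here for illustration.

```r
# Fit height ~ father separately within each level of `sex` and return the slopes.
slope_by_sex <- function(df) {
  sapply(split(df, df$sex), function(d) {
    coef(lm(height ~ father, data = d))[["father"]]
  })
}

# Usage, assuming the mosaicData package is installed:
if (requireNamespace("mosaicData", quietly = TRUE)) {
  print(slope_by_sex(mosaicData::Galton))
}
```

Each slope estimates how many inches taller a person is, on average, for each additional inch of the father's height, within that sex.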

MDSR 3.4 (Marriage)

The following questions use the Marriage data set from the mdsr package.

1. Create an informative and meaningful data graphic.
2. Identify each of the visual cues that you are using, and describe how they are related to each variable.
3. Create a data graphic with at least five variables (either quantitative or categorical). For the purposes of this exercise, do not worry about making your visualization meaningful; just try to encode five variables into one plot.

SOLUTION:

#1
ggplot(Marriage, aes(x=age, y=hs, col=race)) +
  geom_jitter() +
  labs(x="Age", y="Number of Years of High School Education", title="Age of Marriage based on Years of Education")

In the above graphic I looked at the relationship between the age at which a person married and the number of years of high school education they received. I also looked at how race might play into that relationship. To do this I mapped age to the x-axis (position), years of high school education to the y-axis (position), and race to color. I then added jitter so that individual points are easier to see instead of sitting on top of one another. Because the data set has few observations, it is difficult to say decisively whether a pattern exists. Those who did not finish high school appear to marry earlier on average, but there is not enough data to be confident.
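
The impression that fewer years of high school go with earlier marriage can be checked with a quick summary. This is a minimal sketch in base R; the helper name mean_age_by_hs is introduced here for illustration, and the usage line assumes the data are available as mosaicData::Marriage.

```r
# Average age at marriage for each value of hs (years of high school education).
mean_age_by_hs <- function(df) {
  aggregate(age ~ hs, data = df, FUN = mean)
}

# Usage, assuming the mosaicData package is installed:
if (requireNamespace("mosaicData", quietly = TRUE)) {
  print(mean_age_by_hs(mosaicData::Marriage))
}
```

With so few observations, some values of hs may be represented by only a handful of people, so the group means should be read with the same caution as the plot.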

#3
ggplot(Marriage, aes(x= race, y=hs, size=age, col=sign)) +
  geom_jitter() +
  facet_wrap(.~person) +
  labs(x = "Race", y = "Years of High School Education", title = "Race vs. Years of High School Education", size="Age when married")

MDSR 3.10 (Weather)

The following exercises use the storms data table from the nasaweather package.

1. Use the geom_path function to plot the path of each tropical storm. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
2. Using the graphic from part (1) incorporate pressure.
3. Using the graphic from part (1) incorporate wind.

SOLUTION:

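#1
library(ggplot2)

# nasaweather::storms is named explicitly so that dplyr's storms table is not used by mistake;
# the nasaweather table has no category column, so color is mapped to the storm name instead.
# Longitude goes on x and latitude on y so that the paths are oriented like a map.
ggplot(nasaweather::storms, aes(x = long, y = lat, col = name)) +
  geom_path() +
  facet_wrap(~ year) +
  labs(x = "Longitude", y = "Latitude", title = "Storm Paths by Year")

#2
# Line width shows pressure (ggplot2 3.4.0 and later prefer the linewidth aesthetic for lines)
ggplot(nasaweather::storms, aes(x = long, y = lat, col = name, size = pressure)) +
  geom_path() +
  facet_wrap(~ year) +
  labs(x = "Longitude", y = "Latitude", title = "Storm Paths by Year with Pressure", size = "Pressure")

#3
# Line width shows wind speed
ggplot(nasaweather::storms, aes(x = long, y = lat, col = name, size = wind)) +
  geom_path() +
  facet_wrap(~ year) +
  labs(x = "Longitude", y = "Latitude", title = "Storm Paths by Year with Wind Speed", size = "Wind")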

Oliver Baldwin Edwards

9/11/18