A user has typed the following commands into the RStudio console.
a <- c(10, 15)
b <- c(TRUE, FALSE)
c <- c("happy", "sad")
What do each of the following commands return? Describe the class of the object as well as its value.
# This returns a data frame with 2 rows and 3 columns; its class is data.frame and its underlying type is list. Each column is one of the vectors from above: column a has entries of type double, column b has entries of type logical, and column c has entries of type character (in R 4.0.0 and later; earlier versions convert strings to a factor by default).
data.frame(a,b,c)
## a b c
## 1 10 TRUE happy
## 2 15 FALSE sad
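The description above can be checked directly in the console. This is a minimal sketch using only base R; the name `df` is introduced here for illustration, and the type reported for column c assumes R 4.0.0 or later.

```r
a <- c(10, 15)
b <- c(TRUE, FALSE)
c <- c("happy", "sad")

# Build the data frame exactly as in the exercise
df <- data.frame(a, b, c)

class(df)           # "data.frame"
typeof(df)          # "list": a data frame is stored as a list of columns
dim(df)             # 2 rows, 3 columns
sapply(df, typeof)  # a is "double", b is "logical"; c is "character" in R >= 4.0.0
```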
# This returns a 2 x 2 numeric matrix (class matrix, type double), not a data frame. Vectors a and b are combined as columns, and the logical values in b are coerced to 1 and 0.
cbind(a,b)
## a b
## [1,] 10 1
## [2,] 15 0
# This returns a 2 x 2 numeric matrix (class matrix, type double), not a data frame. Vectors a and b are combined as rows, and the logical values in b are coerced to 1 and 0.
rbind(a,b)
## [,1] [,2]
## a 10 15
## b 1 0
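# This returns a 2 x 3 character matrix (class matrix, type character), not a data frame. Vectors a, b, and c are combined as columns, and because a matrix holds a single type, the numbers and logicals are coerced to character strings.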
cbind(a,b,c)
## a b c
## [1,] "10" "TRUE" "happy"
## [2,] "15" "FALSE" "sad"
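The cbind() and rbind() results above can be inspected to confirm that they are matrices and to see the coercion at work. This is a short sketch in base R; the names `m_num` and `m_chr` are illustrative only.

```r
a <- c(10, 15)
b <- c(TRUE, FALSE)
c <- c("happy", "sad")

m_num <- cbind(a, b)     # numeric and logical combined
m_chr <- cbind(a, b, c)  # numeric, logical, and character combined

is.matrix(m_num)      # TRUE
is.data.frame(m_num)  # FALSE
typeof(m_num)         # "double": TRUE/FALSE became 1/0
typeof(m_chr)         # "character": everything became a string
dim(rbind(a, b))      # 2 2, with a and b as the rows
```

A matrix is an atomic vector with dimensions, so all of its entries must share one type. R picks the most general type present, in the order logical, then double, then character.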
# This returns the second element of a list containing vectors a, b, and c. In this case that element is vector b, a logical vector with the values TRUE and FALSE.
list(a,b,c)[[2]]
## [1] TRUE FALSE
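The double brackets matter here. As a brief sketch in base R (the name `lst` is illustrative), `[[2]]` extracts the element itself, while `[2]` returns a list of length one that still wraps it.

```r
a <- c(10, 15)
b <- c(TRUE, FALSE)
c <- c("happy", "sad")

lst <- list(a, b, c)

lst[[2]]         # the logical vector b itself
class(lst[[2]])  # "logical"

lst[2]           # a list of length 1 containing b
class(lst[2])    # "list"
```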
Why does the mosaic package plain R Markdown template include the code chunk option message=FALSE when the mosaic package is loaded?
SOLUTION:
The template sets the code chunk option message=FALSE on the chunk that loads mosaic so that the startup messages generated by loading the package are left out of the knitted document. Only the messages are suppressed; the code still runs and the package is still loaded, so the output stays free of clutter.
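For intuition, the effect of message=FALSE resembles wrapping the chunk's code in suppressMessages(). The sketch below uses only base R; `noisy_load` is a hypothetical stand-in for a library() call that emits startup messages, not part of mosaic or knitr.

```r
# Hypothetical stand-in for loading a package that prints startup messages
noisy_load <- function() {
  packageStartupMessage("Loading a hypothetical package...")
  invisible("loaded")
}

# The message is dropped, but the code still runs and returns its value
result <- suppressMessages(noisy_load())
result
```

Ordinary printed output is not affected by this, since only conditions signalled as messages are filtered.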
Choose one of the data graphics listed at http://mdsr-book.github.io/exercises.html#exercise_23 and answer the following questions. Be sure to indicate which graphical display you picked.
SOLUTION:
I chose the “Who does not Pay Income Tax?” graphic. This graphic shows a pie chart that represents the breakdown of who does and doesn’t pay income taxes. Because it’s a pie chart, it uses a polar coordinate system: each percentage is mapped to the angle of a slice, so the scale is relative between the different pieces of the chart. Each piece of the chart, which represents a different statistic, is shown in a different color alongside a written description.
There are two variables in this display: the reason for not paying income tax (no reason given if paid) and the associated percentage of people with said reason. Those who paid their income taxes are mapped to a piece of the pie chart with a grey color, and the other four pieces are mapped to a red color scale with a different color/label given to each individual reason for lack of payment on income taxes.
While it is normally best to avoid pie charts, I think that this graphic was simple and effective in its message for its target audience. This pie chart was created in response to a comment Mitt Romney made during the 2012 election and served to break down the reasons why 46% of the US population didn’t pay income taxes. In pie charts it is usually difficult to compare individual pieces (due to the nature of the graphic), but in this case the distinctions were easier to make because each piece was somewhat turned away from the center and had a clearly different color.
Using the famous Galton data set from the mosaicData package:
Hint: recall that you can find out more about the data set by running the command ?Galton.
SOLUTION:
#1
ggplot(Galton, aes(x=height, y=father)) +
geom_jitter() +
labs(x="Height (in)", y = "Father's Height (in)", title = "Individuals' Height vs. Their Father's Height")
#2
ggplot(Galton, aes(x=height, y=father)) +
geom_jitter() +
facet_wrap(. ~ sex) +
labs(x="Height (in)", y = "Father's Height (in)", title = "Individuals' Height vs. Their Father's Height by Sex")
#3
ggplot(Galton, aes(x=height, y=father)) +
geom_jitter() +
facet_wrap(. ~ sex) +
geom_smooth(method = "lm", se=FALSE) +
labs(x="Height (in)", y = "Father's Height (in)", title = "Individuals' Height vs. Their Father's Height by Sex")
#4
ggplot(Galton, aes(x=height, y=father, col=sex)) +
geom_jitter() +
labs(x="Height (in)", y = "Father's Height (in)", title = "Individuals' Height vs. Their Father's Height by Sex")
The following questions use the Marriage data set from the mdsr package.
SOLUTION:
#1
ggplot(Marriage, aes(x=age, y=hs, col=race)) +
geom_jitter() +
labs(x="Age", y="Number of Years of High School Education", title="Age of Marriage based on Years of Education")
#3
ggplot(Marriage, aes(x= race, y=hs, size=age, col=sign)) +
geom_jitter() +
facet_wrap(.~person) +
labs(x = "Race", y = "Years of High School Education", title = "Race vs. Years of High School Education", size="Age when married")
The following exercises use the storms data table from the nasaweather package.
Use the geom_path function to plot the path of each tropical storm. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
SOLUTION:
#1
ggplot(storms, aes(x=lat, y=long, col=category)) +
geom_path() +
facet_wrap(.~year) +
labs(x = "Latitude", y = "Longitude", title = "Path of Storms each Year by Hurricane Category")
#2
ggplot(storms, aes(x=lat, y=long, col=category, size=pressure)) +
geom_path() +
facet_wrap(.~year) +
labs(x = "Latitude", y = "Longitude", title = "Path of Storms each Year by Hurricane Category")
#3
ggplot(storms, aes(x=lat, y=long, col=category, size=wind)) +
geom_path() +
facet_wrap(.~year) +
labs(x = "Latitude", y = "Longitude", title = "Path of Storms each Year by Hurricane Category")
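One point worth illustrating for path plots like those above: geom_path() connects observations in the order they appear within each group, so giving every storm its own group keeps one storm's track from being joined to another's. The sketch below uses a small made-up data frame, `toy_storms`, with hypothetical storm names and coordinates, not the real storms table. It assumes ggplot2 is installed and places longitude on the horizontal axis, as on a conventional map.

```r
library(ggplot2)

# Hypothetical toy data: two storms with three position fixes each
toy_storms <- data.frame(
  name = rep(c("Alpha", "Beta"), each = 3),
  year = 1995,
  long = c(-80, -78, -75, -60, -62, -65),
  lat  = c(25, 27, 30, 15, 18, 22)
)

# group = name draws one path per storm; color = name tells them apart
p <- ggplot(toy_storms, aes(x = long, y = lat, group = name, color = name)) +
  geom_path() +
  facet_wrap(~ year) +
  labs(x = "Longitude", y = "Latitude", title = "One Path per Storm (toy data)")
p
```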