Instructions

Exercises: 1-5 (Pgs. 6-7); 1-2, 5 (Pg. 12); 1-5 (Pgs. 20-21)

Assigned: Friday, August 24, 2018

Due: Friday, August 31, 2018 by 5:00 PM

Submission: Submit an electronic document on Sakai. It must be submitted as an HTML file generated in RStudio. All assigned problems are taken from the textbook R for Data Science.

Chapter 1 (Pgs. 6-7)

Exercise 1

ggplot(data=mpg)

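The result is a blank space for a graph, with absolutely nothing on it: ggplot() only sets up the plot and its data, and no geom layer has been added to draw anything.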

Exercise 2

dim(mpg)
## [1] 234  11
nrow(mpg)
## [1] 234
ncol(mpg)
## [1] 11

There are 234 rows and 11 columns in the dataset.

Exercise 3

?mpg
unique(mpg$drv)
## [1] "f" "4" "r"

The variable drv is a categorical variable, stored as a character vector, that takes the following values:

“f” = front-wheel drive, “4” = 4-wheel drive, “r” = rear-wheel drive
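
As a quick check of how drv is stored and how many vehicles fall into each category, something like the following could be run. This is only a sketch, assuming ggplot2 is installed; the name drv_factor and its descriptive labels are illustrative and not part of the dataset.

```r
library(ggplot2)  # provides the mpg dataset

# Inspect the storage type of drv
class(mpg$drv)

# Count the vehicles in each drive type
table(mpg$drv)

# Optional: build a factor with descriptive labels (labels are illustrative)
drv_factor <- factor(mpg$drv,
                     levels = c("f", "4", "r"),
                     labels = c("front-wheel", "4-wheel", "rear-wheel"))
levels(drv_factor)
```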

Exercise 4

ggplot(data=mpg,aes(x=hwy,y=cyl)) +
  geom_point() + 
  xlab("Highway Miles Per Gallon") +
  ylab("Number of Cylinders")

Exercise 5

ggplot(data=mpg,aes(x=class,y=drv)) + 
  geom_point() + 
  xlab("Type of Car") +
  ylab("Type of Drive")

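Scatter plots are not meant to visualize the relationship between two categorical/qualitative variables. Many observations share the same combination of class and drv and are drawn on top of each other, so the plot shows which combinations occur but not how often.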
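
The overlap can be checked directly, and one possible alternative is geom_count(), which sizes each point by the number of observations at that position. This is a sketch rather than the textbook's answer; the names n_obs, n_pairs, and p_count are made up for illustration.

```r
library(ggplot2)

# Compare the number of rows with the number of distinct (class, drv) pairs:
# every row beyond the distinct pairs is drawn on top of another point.
n_obs <- nrow(mpg)
n_pairs <- nrow(unique(mpg[, c("class", "drv")]))
c(observations = n_obs, distinct_points = n_pairs)

# geom_count() maps the count at each position to point size
p_count <- ggplot(data = mpg, aes(x = class, y = drv)) +
  geom_count() +
  xlab("Type of Car") +
  ylab("Type of Drive")
p_count
```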

Chapter 1 (Pg. 12)

Exercise 1

ggplot(data=mpg) +
  geom_point(mapping=aes(x=displ,y=hwy,color="blue"))

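The points are not blue because color="blue" is placed inside aes(), so "blue" is treated as a variable (with a single value) rather than as a literal color, and ggplot2 assigns it a color from the default scale. To fix this, we must move color="blue" outside the mapping argument.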

ggplot(data=mpg) +
  geom_point(mapping=aes(x=displ,y=hwy),color="blue")
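
One way to confirm the difference, assuming ggplot2 is loaded, is to inspect the colors each plot actually uses with layer_data(). The object names p_mapped and p_fixed are illustrative.

```r
library(ggplot2)

# color inside aes(): "blue" is handled as a data value
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# color outside aes(): "blue" is a fixed setting for the layer
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# layer_data() returns the values used to draw a layer
unique(layer_data(p_mapped)$colour)  # a default palette colour, not "blue"
unique(layer_data(p_fixed)$colour)   # the literal colour that was set
```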

Exercise 2

str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame':    234 obs. of  11 variables:
##  $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
##  $ model       : chr  "a4" "a4" "a4" "a4" ...
##  $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr  "f" "f" "f" "f" ...
##  $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr  "p" "p" "p" "p" ...
##  $ class       : chr  "compact" "compact" "compact" "compact" ...

The variables hwy, displ, and cty are all continuous variables.

ggplot(data=mpg) +
  geom_point(mapping=aes(x=displ,y=hwy,color=cty))

ggplot(data=mpg) +
  geom_point(mapping=aes(x=displ,y=hwy,size=cty))

# Does not work for shape. Requires a categorical variable #
# ggplot(data=mpg) +
#   geom_point(mapping=aes(x=displ,y=hwy,shape=cty))

For color, a gradient is used to represent the magnitude of the third variable cty. For size, the area of each point is used to represent the value of cty. Shape cannot be mapped to a continuous variable, so that plot produces an error.
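
The shape failure can be captured without stopping the document, as in the sketch below. The error is raised when the plot is built, so ggplot_build() is wrapped in tryCatch(). Binning cty with cut_interval() is only one possible workaround, and the choice of three bins is arbitrary.

```r
library(ggplot2)

p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = cty))

# Building the plot triggers the error; keep its message instead of stopping
shape_error <- tryCatch(
  {
    ggplot_build(p_shape)
    NULL
  },
  error = function(e) conditionMessage(e)
)
shape_error

# Possible workaround: bin cty into three equal-width intervals first
p_binned <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy,
                           shape = cut_interval(cty, n = 3)))
p_binned
```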

Exercise 5

ggplot(data=mpg) +
  geom_point(mapping=aes(x=displ,y=hwy,stroke=cty))

?geom_point

The stroke aesthetic works similarly to the size aesthetic, but it controls the width of the outline drawn around each point rather than the size of the point itself. Here the width of the outline is based on the magnitude of the variable cty.
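
The outline is easiest to see with a shape that has a separate border and interior. The sketch below sets stroke to a constant instead of mapping it to cty; the values 21, "white", and 2 are illustrative choices.

```r
library(ggplot2)

# shape = 21 is a circle with a border (color) and an interior (fill);
# stroke sets the width of the border.
p_stroke <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy),
             shape = 21, color = "black", fill = "white", stroke = 2)
p_stroke
```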

Chapter 1 (Pgs. 20-21)

Exercise 1

For a line chart, use geom_line(). For a boxplot, use geom_boxplot(). For a histogram, use geom_histogram().

Exercise 2

ggplot(data=mpg,mapping=aes(x=displ,y=hwy,color=drv)) +
  geom_point() +
  geom_smooth(se=FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Exercise 3

# show.legend is an argument of each geom, not of ggplot()
ggplot(data=mpg,mapping=aes(x=displ,y=hwy,color=drv)) +
  geom_point(show.legend=FALSE) +
  geom_smooth(se=FALSE,show.legend=FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

In order to remove the legend, the option show.legend=FALSE must be specified for each geometry. Another method for removing the legend is to use guides() and set the targeted aesthetic to "none" (setting it to FALSE is deprecated in recent versions of ggplot2).

ggplot(data=mpg,mapping=aes(x=displ,y=hwy,color=drv)) +
  geom_point() +
  geom_smooth(se=FALSE) +
  guides(color="none")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Exercise 4

The se argument to geom_smooth() controls whether the confidence band (95% by default, based on the standard error) is drawn around the smooth curve fitted to the data.
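
To see what se adds, the limits of the band can be read from the computed data of the smooth layer. This is a minimal sketch: method and formula are written out only to match the defaults reported in the messages above, and the names p_band and band are illustrative.

```r
library(ggplot2)

# se = TRUE (the default) draws the band; level sets its confidence level
p_band <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE, level = 0.95)

# The smooth is the second layer; ymin and ymax are the band limits
band <- layer_data(p_band, 2)
head(band[, c("x", "y", "ymin", "ymax")])
```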

Exercise 5

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot() + 
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

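The plots look identical, because both use the same data and aesthetic mappings. The first specifies them once in ggplot(), while the second repeats them in each geom.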
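
The visual comparison can be backed up by comparing the data each layer computes. The sketch below assumes that equal layer data is a reasonable stand-in for equal plots; p_global and p_local are illustrative names.

```r
library(ggplot2)

# Data and mapping declared once, inherited by both layers
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Data and mapping repeated in each layer
p_local <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy),
              method = "loess", formula = y ~ x)

# Compare the computed data for the point layer (1) and the smooth layer (2)
same_points <- isTRUE(all.equal(layer_data(p_global, 1), layer_data(p_local, 1)))
same_smooth <- isTRUE(all.equal(layer_data(p_global, 2), layer_data(p_local, 2)))
c(points = same_points, smooth = same_smooth)
```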

Open Response

Step 1: Select 1 numeric variable and 2 categorical variables. Create a graphic using geom_boxplot() and facet_wrap to illustrate the empirical distributions of the sample.

ggplot(data=diamonds) +
  geom_boxplot(aes(x=color,y=price,color=color)) + 
  facet_wrap(~cut) +
  theme(axis.text.x=element_blank(),axis.ticks.x=element_blank()) + 
  guides(color=guide_legend(title="Color"))+
  xlab("") + ylab("Price (Dollars)")

Step 2: Choose 2 numeric variables and 2 categorical variables and creatively illustrate the relationship between all the variables.

ggplot(data=diamonds) + 
  geom_point(aes(x=carat,y=price,shape=cut,color=color)) 
## Warning: Using shapes for an ordinal variable is not advised