Instructions

Exercises: 1-5 (3.2.4 Exercises); 1-2, 5 (3.3.1 Exercises); 1-5 (3.6.1 Exercises); Open Response

Submission: Submit an electronic document on Sakai. Must be submitted as an HTML file generated in RStudio. All assigned problems are chosen according to the textbook R for Data Science. You do not need R code to answer every question. If you answer without using R code, delete the code chunk. If the question requires R code, make sure you display R code. If the question requires a figure, make sure you display a figure. A lot of the questions can be answered in written response, but require R code and/or figures for understanding and explaining.

Chapter 3 (3.2.4 Exercises)

Exercise 1

ggplot(data = mpg)

# This shows only an empty gray plot background because no geom
# layer has been added, so there is nothing to draw.

Exercise 2

nrow(mpg)
## [1] 234
ncol(mpg)
## [1] 11
# Based on the output above, there are 234 rows
# and 11 columns.
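
As a small aside, dim() returns both counts in a single call. A minimal sketch, assuming ggplot2 is installed so that the mpg dataset is available:

```r
library(ggplot2)  # provides the mpg dataset

# dim() returns c(number of rows, number of columns)
dims <- dim(mpg)
dims
```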

Exercise 3

?mpg
# The variable drv takes the values f, 4, and r.
# f stands for front-wheel drive
# 4 stands for 4-wheel drive
# r stands for rear-wheel drive
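
The values listed in the help page can also be checked against the data itself. This is only a sketch; the name drv_counts is introduced here for illustration:

```r
library(ggplot2)

# Distinct values of drv and how many cars have each one
drv_counts <- table(mpg$drv)
drv_counts
```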

Exercise 4

ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy))

Exercise 5

ggplot(data = mpg) +
  geom_point(mapping = aes(x = drv, y = class))

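# This plot is not useful because both variables are categorical:
# many cars share the same drv and class, so their points overlap
# and the plot cannot show how many cars fall in each combination.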
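
One possible way to make the overlap visible, offered as a sketch rather than the expected answer, is to count the distinct combinations and to draw the plot with geom_count(), which sizes each point by the number of observations at that position. The names combos and p_count are made up for this example.

```r
library(ggplot2)

# Each distinct (drv, class) pair appears as a single point in the scatterplot
combos <- unique(mpg[c("drv", "class")])
nrow(combos)  # far fewer than the 234 rows in mpg

# geom_count() maps the number of overlapping observations to point size
p_count <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = drv, y = class))
p_count
```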

Chapter 3 (3.3.1 Exercises)

Exercise 1

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# The points are not blue because color = "blue" is inside aes(),
# where it is treated as a variable to map rather than a color to set.
# It needs to be outside aes(), as an argument to geom_point().
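
A corrected version of the plot, sketched here for comparison, moves color out of aes() so that it is set as a fixed property of the points. The name p_blue is only for illustration.

```r
library(ggplot2)

# color is an argument of geom_point(), not part of the aesthetic mapping,
# so every point is drawn in blue and no legend is created
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
p_blue
```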

Exercise 2

str(mpg)
## tibble [234 x 11] (S3: tbl_df/tbl/data.frame)
##  $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
##  $ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
##  $ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr [1:234] "f" "f" "f" "f" ...
##  $ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr [1:234] "p" "p" "p" "p" ...
##  $ class       : chr [1:234] "compact" "compact" "compact" "compact" ...
# Based on the output above,
# categorical variables are class, fl, drv, trans, model, and manufacturer;
# continuous variables are displ, year, cyl, cty, and hwy.
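
The same split can be read off programmatically from the column types. This sketch assumes, as the answer above does, that character columns count as categorical and numeric or integer columns count as continuous; the variable names are invented for the example.

```r
library(ggplot2)

# First class of each column, e.g. "character", "numeric", "integer"
col_types <- sapply(mpg, function(col) class(col)[1])

# Group the column names by type
categorical <- names(col_types)[col_types == "character"]
continuous <- names(col_types)[col_types %in% c("numeric", "integer")]

categorical
continuous
```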

Exercise 5

?geom_point
## starting httpd help server ... done
# The stroke aesthetic changes the width of the border for shapes
# that have one, such as the filled shapes 21 through 24.
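
A short sketch of stroke in use, with shape 21 chosen as an example of a shape that has both a fill and a border. The specific values (size = 3, stroke = 2) are arbitrary choices for illustration.

```r
library(ggplot2)

# Shape 21 is a filled circle: fill sets the inside, color sets the border,
# and stroke sets the border width
p_stroke <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    shape = 21, fill = "white", color = "black", size = 3, stroke = 2
  )
p_stroke
```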

Chapter 3 (3.6.1 Exercises)

Exercise 1

# Line chart: geom_line(); boxplot: geom_boxplot();
# histogram: geom_histogram(); area chart: geom_area()

Exercise 2

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
  geom_point() + 
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# Prediction: it will display displ as x and hwy as y, and the color will be green.
# Actual: the color differs based on drv, and each drv group gets its own smooth line.

Exercise 3

# show.legend = FALSE hides the legend that is automatically created.
# Removing it would show the legend, because the default shows a legend
# whenever an aesthetic is mapped.
# It was used earlier to show how legends are displayed.
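
To see the effect side by side, the sketch below builds the same plot with and without show.legend = FALSE. The names p_default and p_hidden are hypothetical, and inspecting the layer's show.legend field is just one way to confirm the setting.

```r
library(ggplot2)

# Default behaviour: a legend for drv is drawn
p_default <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, color = drv))

# Same plot with the legend suppressed for this layer
p_hidden <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, color = drv), show.legend = FALSE)

# The setting is stored on each layer
p_default$layers[[1]]$show.legend
p_hidden$layers[[1]]$show.legend
```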

Exercise 4

# The se argument in geom_smooth() controls whether the gray confidence band is drawn around the smoothed line. The band shows the uncertainty in the fitted line, not where the data is above average.
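
A small sketch of the options, assuming the default smoothing method is acceptable. The level argument, which sets the confidence level of the band, is included as a related setting; the plot names are made up for the example.

```r
library(ggplot2)

base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

p_band <- base + geom_smooth(se = TRUE)      # band drawn (the default)
p_no_band <- base + geom_smooth(se = FALSE)  # line only, no band
p_wide <- base + geom_smooth(level = 0.99)   # wider band at a 99% level
```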

Exercise 5

I don’t know if they will look different. Let me check.

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot() + 
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# No, because both geom_point() and geom_smooth() use the same data and mappings.

They do not look different. I am incredibly surprised.
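
Beyond comparing the two plots by eye, the data that each layer actually draws can be compared with layer_data(). This is a sketch only; p_global and p_local are names invented here for the two versions of the plot.

```r
library(ggplot2)

# Data and mapping given once in ggplot() and inherited by both layers
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

# Data and mapping repeated in each layer
p_local <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))

# layer_data(plot, i) returns the computed data for layer i;
# layer 1 is the point layer in both plots
same_points <- isTRUE(all.equal(layer_data(p_global, 1), layer_data(p_local, 1)))
same_points
```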

Open Response

For this exercise, use the diamonds dataset in the tidyverse. Use ?diamonds to get more information about the dataset.

Step 1: Select 1 numeric variable and 2 categorical variables. Create a graphic using geom_boxplot() and facet_wrap to illustrate the empirical distributions of the sample.

ggplot(data = diamonds) +
  geom_boxplot(mapping = aes(x = color, y = price, color = clarity)) +
  facet_wrap(~cut)

Step 2: Choose 2 numeric variables and 2 categorical variables and creatively illustrate the relationship between all the variables.

ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price, shape = cut, color = clarity))
## Warning: Using shapes for an ordinal variable is not advised
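
The warning appears because cut is an ordered factor. One possible alternative, sketched here and not the only reasonable design, is to show cut with facets instead of shapes, which keeps all four variables in the plot. The alpha value and the name p_facets are arbitrary choices for this example.

```r
library(ggplot2)

# cut is stored as an ordered factor, which is what triggers the shape warning
is.ordered(diamonds$cut)

# Facet by cut instead of mapping it to shape; alpha reduces overplotting
p_facets <- ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price, color = clarity), alpha = 0.3) +
  facet_wrap(~cut)
p_facets
```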