Exercises: 1-5 (3.2.4 Exercises); 1-2, 5 (3.3.1 Exercises); 1-5 (3.6.1 Exercises); Open Response
Submission: Submit an electronic document on Sakai. Must be submitted as an HTML file generated in RStudio. All assigned problems are chosen according to the textbook R for Data Science. You do not need R code to answer every question. If you answer without using R code, delete the code chunk. If the question requires R code, make sure you display R code. If the question requires a figure, make sure you display a figure. A lot of the questions can be answered in written response, but require R code and/or figures for understanding and explaining.
ggplot(data = mpg)
# This shows only an empty plot background because no geom layer
# has been added to ggplot().
nrow(mpg)
## [1] 234
ncol(mpg)
## [1] 11
# Based on the output above, mpg has 234 rows
# and 11 columns.
?mpg
# The variable drv takes the values f, 4, and r.
# f stands for front-wheel drive
# 4 stands for four-wheel drive
# r stands for rear-wheel drive
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = drv, y = class))
# The plot above is not useful: both variables are categorical, so the
# 234 observations collapse onto a small number of overlapping points.
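One way to make the overlap visible is to size each point by how many observations it represents. The following is a minimal sketch using geom_count(), which is not part of the original answer; the name p_count is only illustrative.

```r
library(ggplot2)

# geom_count() counts the observations at each (drv, class) combination
# and maps that count to point size, so stacked points become visible.
p_count <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = drv, y = class))
p_count
```

Adding position = "jitter" to geom_point() is another option, although jittering two categorical axes can still be hard to read.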
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
# The points are not blue because color = "blue" is inside aes(), where it
# is treated as a variable to map rather than a fixed color. To set the
# color manually, it needs to be outside aes().
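For comparison, here is a sketch of the same plot with the color set as a fixed value, assuming the goal is simply to draw every point in blue. The name p_blue is only illustrative.

```r
library(ggplot2)

# color is an argument of geom_point() here, not part of aes(),
# so it is applied as a constant to every point.
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
p_blue
```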
str(mpg)
## tibble [234 x 11] (S3: tbl_df/tbl/data.frame)
## $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
## $ model : chr [1:234] "a4" "a4" "a4" "a4" ...
## $ displ : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr [1:234] "f" "f" "f" "f" ...
## $ cty : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr [1:234] "p" "p" "p" "p" ...
## $ class : chr [1:234] "compact" "compact" "compact" "compact" ...
# Based on the output above:
# categorical variables are manufacturer, model, trans, drv, fl, and class
# continuous variables are displ, year, cyl, cty, and hwy.
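The same split can be checked in code. This is a rough sketch that treats "numeric column" as a proxy for "continuous", which is an assumption: year and cyl are stored as integers and could reasonably be treated as discrete. The variable names are only illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Flag each column as numeric or not, then split the column names.
is_num <- vapply(mpg, is.numeric, logical(1))
categorical_vars <- names(mpg)[!is_num]
continuous_vars <- names(mpg)[is_num]

categorical_vars
continuous_vars
```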
?geom_point
## starting httpd help server ... done
# The stroke aesthetic sets the width of the border for shapes that
# have one, such as the filled shapes 21 through 25.
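A small sketch of stroke in use, assuming shape 21 (a filled circle with a border). The fill, size, and stroke values are arbitrary choices for illustration, and p_stroke is an illustrative name.

```r
library(ggplot2)

# For shape 21, color sets the border color, fill sets the interior,
# and stroke sets the border width.
p_stroke <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    shape = 21, color = "black", fill = "white", size = 3, stroke = 2
  )
p_stroke
```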
# Line chart: geom_line(); boxplot: geom_boxplot();
# histogram: geom_histogram(); area chart: geom_area()
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
# Prediction: displ on the x axis, hwy on the y axis, and green points and lines.
# Actual: the color of the points and smooth lines varies by drv.
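# show.legend = FALSE hides the legend that would otherwise be created automatically.
# Removing it brings the legend back, because by default a legend is drawn
# for any mapped aesthetic.
# It was probably used earlier in the chapter so that the plot would match
# the layout of the plots shown beside it, which have no legend.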
# The se argument in geom_smooth() controls whether a shaded confidence band,
# based on the standard error, is drawn around the smooth line.
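To see the effect of se directly, the sketch below builds the same smooth with and without the band. The method and formula are spelled out only to avoid the console message; the names p_with_band, p_line_only, and band are illustrative.

```r
library(ggplot2)

# se = TRUE (the default) draws a shaded confidence band around the line.
p_with_band <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE)

# se = FALSE draws only the smooth line.
p_line_only <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# With se = TRUE, the band limits appear as ymin and ymax
# in the computed layer data.
band <- layer_data(p_with_band, 1)
head(band[, c("x", "y", "ymin", "ymax")])
```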
I don’t know if they will look different. Let me check.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
# No, because both geom_point() and geom_smooth() use the same data and mappings.
They do not look different. I am incredibly surprised.
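The visual comparison can also be backed up by comparing the data each plot actually draws. This is a sketch, assuming that matching x and y values in each layer is a fair test of "looks the same"; the names p_global, p_local, same_points, and same_smooth are illustrative.

```r
library(ggplot2)

# Data and mapping declared once in ggplot() and inherited by both layers.
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# The same data and mapping repeated inside each layer.
p_local <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy),
              method = "loess", formula = y ~ x)

# Compare the plotted coordinates layer by layer (1 = points, 2 = smooth).
cols <- c("x", "y")
same_points <- isTRUE(all.equal(layer_data(p_global, 1)[cols],
                                layer_data(p_local, 1)[cols]))
same_smooth <- isTRUE(all.equal(layer_data(p_global, 2)[cols],
                                layer_data(p_local, 2)[cols]))
c(points = same_points, smooth = same_smooth)
```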
For this exercise, use the diamonds dataset in the tidyverse. Use ?diamonds to get more information about the dataset.
geom_boxplot() and facet_wrap() to illustrate the empirical distributions of the sample.
ggplot(data = diamonds) +
  facet_wrap(~cut) +
  geom_boxplot(mapping = aes(x = color, y = price, color = clarity))
ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price, shape = cut, color = clarity))
## Warning: Using shapes for an ordinal variable is not advised
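The warning appears because cut is an ordered factor and shapes have no natural order. One possible alternative, sketched here as a suggestion rather than as the required answer, is to show cut with facets instead of shapes. The alpha value and the name p_facets are arbitrary choices for illustration.

```r
library(ggplot2)

# One panel per cut level replaces the shape aesthetic; alpha reduces
# overplotting among the roughly 54,000 points.
p_facets <- ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price, color = clarity), alpha = 0.3) +
  facet_wrap(~cut)
p_facets
```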