Exercises: 1-5 (Pgs. 6-7); 1-2, 5 (Pg. 12); 1-5 (Pgs. 20-21)
Assigned: Friday, August 24, 2018
Due: Friday, August 31, 2018 by 5:00 PM
Submission: Submit as an electronic document on Sakai. The document must be an HTML file generated in RStudio. All assigned problems are taken from the textbook R for Data Science.
ggplot(data=mpg)
The output is a blank plotting area with nothing drawn on it, because no geom layer has been added to the plot.
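As a rough check of why the panel is empty, the sketch below counts the layers in the plot object before and after a geom is added. It assumes ggplot2 is installed; the names p_empty and p_points are illustrative only.

```r
library(ggplot2)

# A plot with data but no geom: only the blank panel is drawn
p_empty <- ggplot(data = mpg)
length(p_empty$layers)    # number of layers in the empty plot

# Adding a geom with x and y mapped creates the first layer
p_points <- p_empty + geom_point(mapping = aes(x = displ, y = hwy))
length(p_points$layers)   # number of layers after adding geom_point()
```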
dim(mpg)
## [1] 234 11
nrow(mpg)
## [1] 234
ncol(mpg)
## [1] 11
There are 234 rows and 11 columns in the dataset.
## Exercise 3
?mpg
unique(mpg$drv)
## [1] "f" "4" "r"
The variable drv is a categorical variable, stored as a character vector, that takes the following values:
“f” = front-wheel drive, “4” = 4-wheel drive, “r” = rear-wheel drive
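The storage type and the category counts can be checked directly. The following is a minimal sketch, assuming ggplot2 is loaded so that mpg is available; drv_labelled is a hypothetical name used only for this illustration.

```r
library(ggplot2)

# drv is stored as a character vector in mpg
class(mpg$drv)

# Count the vehicles in each drive-train category
table(mpg$drv)

# Optional: build a labelled factor from the three codes
drv_labelled <- factor(
  mpg$drv,
  levels = c("f", "4", "r"),
  labels = c("front-wheel drive", "4-wheel drive", "rear-wheel drive")
)
levels(drv_labelled)
```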
ggplot(data=mpg,aes(x=hwy,y=cyl)) +
geom_point() +
xlab("Highway Miles Per Gallon") +
ylab("Number of Cylinders")
ggplot(data=mpg,aes(x=class,y=drv)) +
geom_point() +
xlab("Type of Car") +
ylab("Type of Drive")
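A scatter plot is not well suited to showing the relationship between two categorical/qualitative variables. Many cars share the same combination of class and drv, so their points overlap and the plot shows only which combinations occur, not how many cars fall into each one.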
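One possible alternative, sketched here rather than prescribed by the exercise, is geom_count(), which scales each point by the number of observations at that position. A contingency table gives the same counts numerically. The name p_count is illustrative.

```r
library(ggplot2)

# Point size reflects how many cars share each class/drv combination
p_count <- ggplot(data = mpg, aes(x = class, y = drv)) +
  geom_count() +
  xlab("Type of Car") +
  ylab("Type of Drive")
p_count

# The same information as a contingency table
table(mpg$drv, mpg$class)
```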
ggplot(data=mpg) +
geom_point(mapping=aes(x=displ,y=hwy,color="blue"))
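The points are not blue because color=“blue” sits inside aes(), so ggplot2 treats “blue” as a data value (a variable with a single level) rather than as a color setting, and it assigns that level a color from the default palette. To fix this, we must move color=“blue” outside aes() so that it is passed directly to geom_point().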
ggplot(data=mpg) +
geom_point(mapping=aes(x=displ,y=hwy),color="blue")
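The difference can also be inspected without looking at the plots. This sketch uses layer_data() to pull the colours ggplot2 actually computes for each version; p_inside and p_outside are illustrative names.

```r
library(ggplot2)

# color = "blue" inside aes(): treated as a one-level variable
p_inside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# color = "blue" outside aes(): treated as a fixed setting
p_outside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Colours assigned to the points in each plot
unique(layer_data(p_inside)$colour)
unique(layer_data(p_outside)$colour)
```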
str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame': 234 obs. of 11 variables:
## $ manufacturer: chr "audi" "audi" "audi" "audi" ...
## $ model : chr "a4" "a4" "a4" "a4" ...
## $ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr "f" "f" "f" "f" ...
## $ cty : int 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr "p" "p" "p" "p" ...
## $ class : chr "compact" "compact" "compact" "compact" ...
The variables hwy, displ, and cty are all continuous variables.
ggplot(data=mpg) +
geom_point(mapping=aes(x=displ,y=hwy,color=cty))
ggplot(data=mpg) +
geom_point(mapping=aes(x=displ,y=hwy,size=cty))
# Does not work for shape. Requires a categorical variable #
# ggplot(data=mpg) +
# geom_point(mapping=aes(x=displ,y=hwy,shape=cty))
For color, a gradient is used to represent the magnitude of the third variable cty. For size, the area of each point is used to represent the value of cty. For shape, the plot fails with an error because a continuous variable cannot be mapped to shape.
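The shape failure, and one possible workaround, can be sketched as follows. Binning cty with cut_number() into three groups is an assumption made for illustration, not part of the exercise; any discretisation would do. The names p_bad, shape_error, and p_binned are hypothetical.

```r
library(ggplot2)

# Mapping the continuous cty to shape fails when the plot is built
p_bad <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = cty))
shape_error <- tryCatch(
  {
    ggplot_build(p_bad)
    NULL
  },
  error = function(e) conditionMessage(e)
)
shape_error

# Binning cty into three groups gives shape a discrete variable to work with
p_binned <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = cut_number(cty, 3)))
p_binned
```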
ggplot(data=mpg) +
geom_point(mapping=aes(x=displ,y=hwy,stroke=cty))
?geom_point
The stroke aesthetic works similarly to the size aesthetic. It sets the width of the outline around each point, so here the thickness of the outline depends on the magnitude of the variable cty.
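The effect of stroke is easiest to see on a point shape that has a separate border and fill. The sketch below sets stroke to a constant instead of mapping it to cty; the values 21, 3, and 1.5 are arbitrary choices for illustration.

```r
library(ggplot2)

# shape = 21 is a circle with a separate border and fill;
# stroke sets the border width, size sets the overall point size
p_stroke <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    shape = 21, fill = "white", colour = "black",
    size = 3, stroke = 1.5
  )
p_stroke
```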
For a line chart, use geom_line(). For a boxplot, use geom_boxplot(). For a histogram, use geom_histogram().
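A minimal sketch of each geom is given below. The choice of variables, the economics dataset (which ships with ggplot2), and the binwidth of 2 are assumptions made for illustration.

```r
library(ggplot2)

# Line chart: unemployment over time
p_line <- ggplot(data = economics, aes(x = date, y = unemploy)) +
  geom_line()

# Boxplot: highway mileage by class of car
p_box <- ggplot(data = mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# Histogram: distribution of highway mileage
p_hist <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

p_line
p_box
p_hist
```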
ggplot(data=mpg,mapping=aes(x=displ,y=hwy,color=drv)) +
geom_point() +
geom_smooth(se=FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(data=mpg,mapping=aes(x=displ,y=hwy,color=drv)) +
geom_point(show.legend=FALSE) +
geom_smooth(se=FALSE,show.legend=FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
To remove the legend with show.legend, the option must be set to FALSE in each geom; it has no effect when passed to ggplot(). Another method for removing the legend is to use guides() and set the targeted aesthetic to "none".
ggplot(data=mpg,mapping=aes(x=displ,y=hwy,color=drv)) +
geom_point() +
geom_smooth(se=FALSE) +
guides(color="none")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
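A third option, not used in the answer above, is to hide every legend at once through the theme. The sketch below passes method and formula explicitly only to keep geom_smooth() from printing its message; p_no_legend is an illustrative name.

```r
library(ggplot2)

# theme(legend.position = "none") hides all legends for the plot
p_no_legend <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x) +
  theme(legend.position = "none")
p_no_legend
```

This differs from show.legend in that it is set once for the whole plot instead of once per geom.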
The se argument to geom_smooth() controls whether the confidence band around the fitted smooth curve is displayed. With se=TRUE (the default) the band is drawn; with se=FALSE only the curve is shown.
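The two settings can be compared side by side. This sketch swaps in a linear fit (method = "lm") purely to keep the example simple and quiet; the level argument, shown at its default of 0.95, sets the coverage of the band. The names p_band and p_curve_only are illustrative.

```r
library(ggplot2)

# se = TRUE draws the confidence band around the fitted line
p_band <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95)

# se = FALSE draws only the fitted line
p_curve_only <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# With se = TRUE the smooth layer (layer 2) carries the band limits
head(layer_data(p_band, 2)[, c("x", "y", "ymin", "ymax")])
```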
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot() +
geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
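The plots look identical, because both geoms use the same data and mappings in each case. The first version declares them once in ggplot(), and the second repeats them in every geom.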
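The visual comparison can be backed up by comparing the data each plot computes for its layers. This is only a sketch: method and formula are spelled out to match the defaults reported in the output above and to suppress the message, and p_global and p_local are hypothetical names.

```r
library(ggplot2)

# Data and mappings declared once in ggplot() and inherited by both geoms
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# The same data and mappings repeated in each geom
p_local <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy),
              method = "loess", formula = y ~ x)

# Compare the computed data for the point layer (1) and the smooth layer (2)
all.equal(layer_data(p_global, 1), layer_data(p_local, 1))
all.equal(layer_data(p_global, 2), layer_data(p_local, 2))
```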
geom_boxplot() and facet_wrap() are used to illustrate the empirical distributions of the sample.
ggplot(data=diamonds) +
geom_boxplot(aes(x=color,y=price,color=color)) +
facet_wrap(~cut) +
theme(axis.text.x=element_blank(),axis.ticks.x=element_blank()) +
guides(color=guide_legend(title="Color"))+
xlab("") + ylab("Price (Dollars)")
ggplot(data=diamonds) +
geom_point(aes(x=carat,y=price,shape=cut,color=color))
## Warning: Using shapes for an ordinal variable is not advised