Instructions

Exercises: 1-5 (3.2.4 Exercises); 1-2, 5 (3.3.1 Exercises); 1-5 (3.6.1 Exercises); Open Response

Submission: Submit an electronic document. Must be submitted as a PDF file generated in RStudio. All assigned problems are chosen according to the textbook R for Data Science (https://r4ds.had.co.nz/data-visualisation.html#exercises). You do not need R code to answer every question. If you answer without using R code, delete the code chunk. If the question requires R code, make sure you display R code. If the question requires a figure, make sure you display a figure. A lot of the questions can be answered in written response, but require R code and/or figures for understanding and explaining.

Chapter 3 (3.2.4 Exercises)

Exercise 1

library(tidyverse)  # loads ggplot2, which provides ggplot() and the mpg data
ggplot(data = mpg)

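Running ggplot(data = mpg) on its own produces an empty gray panel. ggplot() only attaches the data and sets up the coordinate system; nothing is drawn until a layer such as geom_point() is added.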
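The empty result can be confirmed by inspecting the plot object. This is a minimal sketch, assuming ggplot2 is installed; p_empty and p_points are illustrative names, not part of the assignment.

```r
library(ggplot2)  # provides ggplot() and the mpg data set

# ggplot() alone stores the data but contains no layers, so nothing is drawn
p_empty <- ggplot(data = mpg)
length(p_empty$layers)   # 0

# Adding a geom layer is what actually puts marks on the plot
p_points <- p_empty + geom_point(mapping = aes(x = displ, y = hwy))
length(p_points$layers)  # 1
```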

Exercise 2

dim(mpg)
## [1] 234  11

The data set has 234 rows and 11 columns.

Exercise 3

?mpg

drv describes the type of drive train, where f = front-wheel drive, r = rear-wheel drive, and 4 = four-wheel drive.
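The values that drv actually takes can be checked by tabulating the column. A minimal sketch, assuming ggplot2 is loaded for the mpg data; drv_counts is an illustrative name.

```r
library(ggplot2)  # mpg data set

# Count how many cars have each drive-train code
drv_counts <- table(mpg$drv)
drv_counts
```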


Exercise 4

ggplot(data = mpg) + geom_point(mapping = aes(x = cyl, y = hwy))

Exercise 5

ggplot(data = mpg)+geom_point(mapping=aes(x=class,y=drv))

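Both class and drv are categorical, so each point only marks a class/drv combination that occurs in the data. Many cars share the same combination and are drawn on top of each other, and neither axis has a numeric scale, so the plot shows no trend or relationship and is not useful.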
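The overplotting can be made visible by counting the combinations. This sketch assumes dplyr is available (it ships with the tidyverse) and uses geom_count(), which sizes each point by the number of cars at that position; combos is an illustrative name.

```r
library(ggplot2)
library(dplyr)

# One row per class/drv combination present in mpg, with its frequency n
combos <- count(mpg, class, drv)
combos

# geom_count() scales point size by how many cars share each position
ggplot(data = mpg) +
  geom_count(mapping = aes(x = class, y = drv))
```

Far fewer distinct points exist than the 234 cars, which is why the geom_point() version hides most of the data.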

Chapter 3 (3.3.1 Exercises)

Exercise 1

ggplot(data = mpg)+geom_point(mapping = aes(x=displ,y=hwy),color="blue")

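The difference between setting a color inside and outside aes() can be checked in the data that ggplot2 computes for each layer. A minimal sketch; p_inside and p_outside are illustrative names.

```r
library(ggplot2)

# Inside aes(), "blue" is treated as a data value to map, so ggplot2
# gives it the first default hue (a reddish color) and adds a legend
p_inside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Outside aes(), color is a fixed setting: every point is blue
p_outside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Compare the colors computed for each layer
unique(ggplot_build(p_inside)$data[[1]]$colour)
unique(ggplot_build(p_outside)$data[[1]]$colour)
```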
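The color setting must go outside aes(). Inside aes(), color = "blue" is treated as a value to map, so ggplot2 picks a default color and adds a legend; outside aes(), as an argument of geom_point(), it sets every point to blue.

Exercise 2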

?mpg
str(mpg)
## tibble [234 × 11] (S3: tbl_df/tbl/data.frame)
##  $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
##  $ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
##  $ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr [1:234] "f" "f" "f" "f" ...
##  $ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr [1:234] "p" "p" "p" "p" ...
##  $ class       : chr [1:234] "compact" "compact" "compact" "compact" ...

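The column types shown by str(mpg) can also be split programmatically. A minimal sketch, assuming character columns are the categorical ones; is_categorical is an illustrative name. Note that year and cyl are stored as integers but take only a few distinct values, so they can reasonably be treated as discrete.

```r
library(ggplot2)  # mpg data set

# TRUE for character columns (labels), FALSE for numeric/integer columns
is_categorical <- vapply(mpg, is.character, logical(1))
names(mpg)[is_categorical]   # categorical variables
names(mpg)[!is_categorical]  # numeric variables
```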
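The categorical variables are manufacturer, model, trans, drv, fl, and class (stored as chr). The numeric variables are displ, year, cyl, cty, and hwy (stored as num or int). Running str(mpg), or printing mpg as a tibble, shows the type of each column.

Exercise 5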

?geom_point
ggplot(data = mpg)+geom_point(mapping = aes(x=displ,y=hwy,stroke=cty))

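The stroke aesthetic modifies the width of a point's border. Its effect is clearest on shapes that have both a border and a fill (shapes 21 to 24). For the default solid circle, a larger stroke simply makes the point bigger, so in the plot above the size of each circle reflects cty.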
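To see stroke act on a border separately from the fill, a shape with both is needed. A minimal sketch, assuming shape 21 (a fillable circle) and arbitrary example values for size and stroke; p_stroke is an illustrative name.

```r
library(ggplot2)

# Shape 21 has a border (colour) and an interior (fill),
# so stroke changes only the thickness of the black border
p_stroke <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    shape = 21, colour = "black", fill = "white", size = 3, stroke = 2
  )
p_stroke
```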

Chapter 3 (3.6.1 Exercises)

Exercise 1


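A minimal sketch that builds a line chart, a boxplot, a histogram, and an area chart, one per geom. The economics time series (shipped with ggplot2) and binwidth = 2 are example choices, not part of the assignment.

```r
library(ggplot2)

# One plot per geom
line_chart <- ggplot(economics, aes(x = date, y = unemploy)) + geom_line()
box_plot   <- ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot()
histogram  <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2)
area_chart <- ggplot(economics, aes(x = date, y = unemploy)) + geom_area()

# Print any of them to draw it
area_chart
```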
Use geom_line() for a line chart, geom_boxplot() for a boxplot, geom_histogram() for a histogram, and geom_area() for an area chart.

Exercise 2

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
  geom_point() + 
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

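The number of smooth lines can be checked by counting the groups that geom_smooth() fits. A minimal sketch; p_smooth and smooth_data are illustrative names, and method and formula are passed only to silence the message shown above.

```r
library(ggplot2)

# color = drv in the global mapping is inherited by both layers
p_smooth <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x)

# Layer 2 is geom_smooth(); each distinct group is one fitted line
smooth_data <- ggplot_build(p_smooth)$data[[2]]
length(unique(smooth_data$group))  # one group per drv level
```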
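I did not predict the three lines. Because color = drv is set in the global mapping inside ggplot(), both geom_point() and geom_smooth() inherit it: the points are colored by drive type, and geom_smooth() fits a separate smooth line for each of the three drv groups (4, f, r), without confidence bands because se = FALSE.

Exercise 3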

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point(show.legend = FALSE) +
  geom_smooth(se = FALSE, show.legend = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

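show.legend works per layer. A minimal sketch comparing it with theme(legend.position = "none"), which removes every legend of the plot at once; p_layer and p_theme are illustrative names.

```r
library(ggplot2)

# show.legend = FALSE hides this layer's contribution to the legend
p_layer <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point(show.legend = FALSE)

# Alternatively, drop all legends for the whole plot via the theme
p_theme <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  theme(legend.position = "none")
```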
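show.legend = FALSE hides the legend for the layer it is set in; if it is removed, the legend for drv appears next to the plot. The author used it earlier in the chapter to remove the legend and keep the graph simpler.

Exercise 4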


The se argument controls whether geom_smooth() displays a confidence interval, drawn as a shaded band around each smooth line. It is TRUE by default; setting se = FALSE removes the band.
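The effect of se shows up in the data ggplot2 computes for the smooth layer. A minimal sketch; p_se and p_no_se are illustrative names, and method and formula are set only to silence the loess message.

```r
library(ggplot2)

# Same smooth with the default se = TRUE and with se = FALSE
p_se <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(method = "loess", formula = y ~ x)
p_no_se <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# With se = TRUE the layer data carries lower/upper bounds for the band
c("ymin", "ymax") %in% names(ggplot_build(p_se)$data[[1]])
c("ymin", "ymax") %in% names(ggplot_build(p_no_se)$data[[1]])
```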

Exercise 5

I don’t know if they will look different. Let me check.

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot() + 
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

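They look the same. Both graphs use the same data and the same x and y mappings. In the first version the mapping is set once in ggplot() and inherited by both geom_point() and geom_smooth(), so it does not have to be repeated in each layer as in the second version.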
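The claim that both versions draw the same thing can be checked by comparing the point positions ggplot2 computes. A minimal sketch; p_global and p_local are illustrative names.

```r
library(ggplot2)

# Mappings set once in ggplot() are inherited by every layer
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# The same plot with data and mappings repeated in each layer
p_local <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy),
              method = "loess", formula = y ~ x)

# The point layer (layer 1) has the same x/y values in both plots
all.equal(
  ggplot_build(p_global)$data[[1]][, c("x", "y")],
  ggplot_build(p_local)$data[[1]][, c("x", "y")]
)
```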

Open Response

For this exercise, use the diamonds dataset in the tidyverse. Use ?diamonds to get more information about the dataset.

Step 1: Select 1 numeric variable and 2 categorical variables. Create a graphic using geom_boxplot() and facet_wrap to illustrate the empirical distributions of the sample.

ggplot(data=diamonds) +
  geom_boxplot(aes(x=color,y=price,color=color)) + 
  facet_wrap(~cut) +
  theme(axis.text.x=element_blank(),axis.ticks.x=element_blank()) + 
  guides(color=guide_legend(title="Color"))+
  xlab("") + ylab("Price (Dollars)")
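The center line of each box is the median price for one cut/color combination, so the same comparison can be read from a summary table. A minimal sketch, assuming dplyr (part of the tidyverse) is available; median_prices is an illustrative name.

```r
library(ggplot2)
library(dplyr)

# Median price and number of diamonds for every cut/color combination
median_prices <- diamonds %>%
  group_by(cut, color) %>%
  summarise(median_price = median(price), n = n(), .groups = "drop")
median_prices
```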

Step 2: Choose 2 numeric variables and 2 categorical variables and creatively illustrate the relationship between all the variables.

ggplot(data=diamonds) + 
  geom_point(aes(x=carat,y=price,shape=cut,color=color)) 
## Warning: Using shapes for an ordinal variable is not advised
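The warning appears because cut is an ordered factor while point shapes have no natural order. One possible alternative, offered as a presentation choice rather than the required answer, shows cut as facets and uses transparency to reduce overplotting among the 53,940 diamonds; the alpha and size values and the name p_diamonds are illustrative.

```r
library(ggplot2)

# carat and price (numeric) on the axes, color (categorical) as point color,
# cut (categorical) as one panel per level instead of point shapes
p_diamonds <- ggplot(data = diamonds, mapping = aes(x = carat, y = price, color = color)) +
  geom_point(alpha = 0.3, size = 0.5) +
  facet_wrap(~ cut) +
  labs(x = "Carat", y = "Price (Dollars)", color = "Color")
p_diamonds
```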

?diamonds