Question 0: Finish all Lab Exercises and submit an .R script. For text answers, use comments.

1. Make a scatter plot of displ vs hwy from the mpg data set.

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

2. Observe the plot. What preliminary conclusion can you draw from it?

As engine displacement (in litres) increases, highway miles per gallon tends to decrease.

3. Explain why the total number of data points on the plot is less than the total number of samples (which is 234).

Some points cannot be seen because observations with identical displ and hwy values are plotted on top of each other.
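The overlap can be checked numerically. The following is a minimal sketch, assuming ggplot2 is installed (it provides mpg); the variable names n_rows and n_distinct_points are only illustrative.

```r
library(ggplot2)  # provides the mpg data set

# Total number of samples (rows) in mpg
n_rows <- nrow(mpg)

# Number of distinct (displ, hwy) pairs, i.e. distinct positions on the scatter plot
n_distinct_points <- nrow(unique(mpg[, c("displ", "hwy")]))

print(n_rows)
print(n_distinct_points)
```

If the second number is smaller than the first, several samples share the same position and are drawn as a single visible point.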

1. Which vehicle class contains the fewest samples in the data set?

ggplot(mpg) +
  geom_bar(mapping = aes(y = class, fill = class))

answer: The 2seater class contains the fewest samples in the data set.

2. Which three manufacturers contain the most samples in the data set?

ggplot(mpg) + 
  geom_bar(aes(y = manufacturer, fill = manufacturer))

answer: dodge, toyota, and volkswagen contain the most samples in the data set.
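The bar heights read from the two plots above can be cross-checked with plain count tables. This is a sketch using base R's table() and sort(); class_counts and manufacturer_counts are illustrative names, not part of the lab.

```r
library(ggplot2)  # provides the mpg data set

# Samples per vehicle class, smallest first
class_counts <- sort(table(mpg$class))

# Samples per manufacturer, largest first
manufacturer_counts <- sort(table(mpg$manufacturer), decreasing = TRUE)

print(names(class_counts)[1])        # class with the fewest samples
print(head(manufacturer_counts, 3))  # three manufacturers with the most samples
```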

Use dodged bar plots to answer the following questions:

1. In this data set, which manufacturer produces the most SUVs?

ggplot(mpg) + 
  geom_bar(mapping = aes(x = manufacturer, fill = class), position = "dodge")

answer: chevrolet and ford produce the most SUVs (their SUV bars tie for tallest), with jeep and toyota close behind.

2. Change the keyword x in the aes function to y and reproduce the plot. What do you see?

ggplot(mpg) + 
  geom_bar(mapping = aes(y = manufacturer, fill = class), position = "dodge")

answer: The manufacturer names no longer overlap, so the plot is easier to read.
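Bars of similar height are hard to rank by eye in a dodged bar plot, so the SUV question can also be checked with a count table. A minimal sketch, assuming the class label for SUVs is "suv" as it appears in mpg; suv and suv_counts are illustrative names.

```r
library(ggplot2)  # provides the mpg data set

# Keep only the SUV rows, then count them per manufacturer
suv <- mpg[mpg$class == "suv", ]
suv_counts <- sort(table(suv$manufacturer), decreasing = TRUE)

print(suv_counts)

# Manufacturers that share the maximum count (there may be a tie)
top_suv <- names(suv_counts)[suv_counts == max(suv_counts)]
print(top_suv)
```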

1. Try removing the geom_boxplot() line from the code above and see what you get.

ggplot(data = mpg, mapping = aes(y = displ)) + 
  stat_boxplot(geom = "errorbar", width = 0.5) + # "width" sets the width of the whisker end caps
  scale_x_discrete(breaks = NULL)

answer: Only the whiskers (error bar) are drawn; the box itself is missing.

2. Try putting the geom_boxplot() line before the stat_boxplot line and see what you get.

ggplot(data = mpg, mapping = aes(y = displ)) + 
  geom_boxplot() +
  stat_boxplot(geom = "errorbar", width = 0.5) + # "width" sets the width of the whisker end caps
  scale_x_discrete(breaks = NULL)

answer: The error bar is drawn on top of the box, so its vertical line runs through the box.

Create a multiple boxplot for the variables manufacturer and cty, then answer the following questions:

ggplot(mpg, mapping = aes(x = cty, y = manufacturer)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot()

1. Within the data set, which manufacturer's cars are the most fuel-efficient?

answer: By median cty, honda is the most fuel-efficient. volkswagen has the highest individual cty values, but they appear as outliers.

2. Within the data set, which manufacturer's cars are the least fuel-efficient?

answer: By median cty, lincoln and land rover are the least fuel-efficient; jeep also ranks low.

3. Do you think the conclusions from Q1 and Q2 are generally true for data beyond the current data set?

answer: Probably, but not certainly. The data set covers only a limited selection of models and years, so the conclusions should be checked against more comprehensive data.
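The ranking read from the boxplots depends on which feature of each box is compared. The sketch below assumes the median is the summary of interest (the line inside each box) and also reports the per-manufacturer maximum for comparison; median_cty and max_cty are illustrative names.

```r
library(ggplot2)  # provides the mpg data set

# Median city mileage per manufacturer, highest first
median_cty <- sort(tapply(mpg$cty, mpg$manufacturer, median), decreasing = TRUE)

# Best single city mileage per manufacturer, highest first
max_cty <- sort(tapply(mpg$cty, mpg$manufacturer, max), decreasing = TRUE)

print(median_cty)
print(max_cty)
```

The two orderings need not agree, since a manufacturer with one unusually efficient model can top the maximum without topping the median.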

Question 1: Use the built-in diamonds data set in ggplot2, create a scatter plot and a smooth line plot (in the same graph) for price in y and carat in x. What conclusions can you draw from your figure? Submit a shared Google Doc (paste your figures into the document).

ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

answer: As carat increases, the price of a diamond tends to increase.
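The upward trend can also be summarized with a single number. As a rough sketch, the Pearson correlation between carat and price measures how strongly the two move together; it assumes a roughly linear relationship, which the smooth line suggests holds only approximately. The name carat_price_cor is illustrative.

```r
library(ggplot2)  # provides the diamonds data set

# Pearson correlation between carat and price (values near 1 mean a strong positive association)
carat_price_cor <- cor(diamonds$carat, diamonds$price)
print(carat_price_cor)
```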

Question 2: (self-study) Do some self-study to see how the function geom_count() works. Create a plot with mpg data set using geom_count(). Submit a shared Google Doc (paste your figures into the document). It can be the same one as the first question.

ggplot(mpg, aes(x = fl, y = hwy)) +
  geom_count()
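geom_count() counts the observations at each (x, y) location and maps that count to point size, so overlapping points show up as larger dots. The same counts can be tabulated directly; this is a sketch with the illustrative name location_counts.

```r
library(ggplot2)  # provides the mpg data set

# Count observations for every (fl, hwy) combination
location_counts <- as.data.frame(table(fl = mpg$fl, hwy = mpg$hwy))

# Keep only the combinations that actually occur; each one is a dot in the plot
location_counts <- location_counts[location_counts$Freq > 0, ]

print(head(location_counts[order(-location_counts$Freq), ]))
```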