knitr::opts_chunk$set(echo = FALSE,
                      message = FALSE,
                      warning = FALSE)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.0      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.1 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

In this assignment, we’ll use the diamonds data. First, load the data:
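The loading code is hidden in this rendered output (the chunk options set echo = FALSE). A minimal sketch of what this step might look like, assuming the data set is the diamonds tibble bundled with ggplot2:

```r
library(ggplot2)  # ggplot2 ships the diamonds data set

# Assumption: "diamonds data" refers to ggplot2::diamonds
data(diamonds)
```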

Check variable names and types:

##  [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"  
##  [8] "x"       "y"       "z"
## # A tibble: 6 × 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
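The output above is consistent with calls like the following. This is a sketch, since the original chunk is not shown:

```r
library(ggplot2)  # provides diamonds

names(diamonds)  # character vector of the ten column names
head(diamonds)   # first six rows; the tibble header shows each column's type
```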

Task 1: show a scatterplot of diamond price against carat for different cuts. Color by cut.
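One possible way to draw this plot is sketched below. The object name p1, the transparency value, and the axis labels are illustrative choices, not taken from the original code:

```r
library(ggplot2)

# Scatterplot of price against carat, one color per cut
p1 <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.3) +  # assumed alpha, chosen to reduce overplotting
  labs(x = "Carat", y = "Price (USD)", color = "Cut")

p1
```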

According to the plot, ideal and premium cut diamonds tend to have higher prices. Moreover, the higher the carat, the higher the price.

Task 2: show diamond prices over carat by different cuts; show trend line with a linear smoothing function.

Use the variables “carat”, “price”, and “cut” from the diamonds data.
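A sketch of this plot, assuming a linear fit via geom_smooth(method = "lm"). Because color is mapped to cut in the global aesthetics, one trend line is fitted per cut. The name p2 and the choice to hide the confidence band are illustrative:

```r
library(ggplot2)

# Price over carat, colored by cut, with a linear trend line per cut
p2 <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.2) +                # assumed alpha to reduce overplotting
  geom_smooth(method = "lm", se = FALSE)   # linear smoothing; se = FALSE hides the band

p2
```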

Based on the plot, carat and price are positively correlated. Price and cut are also positively correlated.

Task 3: facet the plot from task 2 by a) cut and b) clarity. How do cut and clarity affect price over carat?

a) facet by cut
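A minimal sketch of the faceted version, assuming facet_wrap() is used to give each cut its own panel (p3a is a hypothetical name):

```r
library(ggplot2)

# Same plot as Task 2, split into one panel per cut
p3a <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ cut)

p3a
```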

Based on the graph, the prices of better-cut diamonds are more stable. Moreover, diamond prices show heteroskedasticity: the spread of prices widens as carat increases.

b) facet by clarity
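The same idea with clarity as the faceting variable. This is a sketch, and p3b is a hypothetical name:

```r
library(ggplot2)

# Same plot as Task 2, split into one panel per clarity grade
p3b <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ clarity)

p3b
```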

Based on the plot, diamonds of lower clarity show a wider spread in carat, and their prices are relatively lower.
Diamonds of better clarity are limited in size and have higher prices.

Save your graphical output to PDF files, and upload both the graphs and the R code to Canvas.
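One way to write a plot to PDF is ggsave(), which picks the graphics device from the file extension. The sketch below saves a single plot. The file name and dimensions are hypothetical, and the same call would be repeated for each graph:

```r
library(ggplot2)

# Rebuild the Task 1 scatterplot so this example is self-contained
p <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.3)

out_file <- "task1_scatter.pdf"  # hypothetical file name

# Width and height are in inches by default
ggsave(out_file, plot = p, width = 8, height = 6)
```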