knitr::opts_chunk$set(echo = FALSE,
message = FALSE,
warning = FALSE)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
in this assignment, we’ll use the diamond data. first load the data:
## [1] "carat" "cut" "color" "clarity" "depth" "table" "price"
## [8] "x" "y" "z"
## # A tibble: 6 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
According to the plot, it is found that the ideal and premium cut diamonds are having higher price. Moreover, it is observed that the higher the carat the high the price.
Use the variables of “cut” and “price” from the diamonds data.
Based on the plot, it is found that carat and price are positively correlated. Price and cut also have positive correlation.
Based on the graph, it is found that the price of the better cutting diamonds are more stable. Moreover, the price of the diamonds has heteroskedasticity, the price range is higher when carat is larger.
Based on the plot, it is found that the less clear diamonds are
having wider spread on carat and the prices are relatively lower.
The better cutting diamonds are limited in size with higher price.