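The diamonds dataset contains information on 53,940 diamonds.
Note that the category counts in the summary output add up to 53,920, which suggests that 20 rows were excluded before the summary was computed.
Variables include: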
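library(ggplot2)  # provides the diamonds data and the plotting functions
library(scales)   # provides dollar_format() used for the price axes
head(diamonds, 10)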
## # A tibble: 10 × 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
##  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
##  3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
##  4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
##  5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
##  6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
##  7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
##  8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
##  9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
## 10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
summary(diamonds)
##      carat               cut        color        clarity          depth
##  Min.   :0.2000   Fair     : 1609   D: 6774   SI1    :13063   Min.   :43.00
##  1st Qu.:0.4000   Good     : 4902   E: 9797   VS2    :12254   1st Qu.:61.00
##  Median :0.7000   Very Good:12081   F: 9538   SI2    : 9185   Median :61.80
##  Mean   :0.7977   Premium  :13780   G:11284   VS1    : 8170   Mean   :61.75
##  3rd Qu.:1.0400   Ideal    :21548   H: 8298   VVS2   : 5066   3rd Qu.:62.50
##  Max.   :5.0100                     I: 5421   VVS1   : 3654   Max.   :79.00
##                                     J: 2808   (Other): 2528
##      table           price             x                y
##  Min.   :43.00   Min.   :  326   Min.   : 3.730   Min.   : 3.680
##  1st Qu.:56.00   1st Qu.:  949   1st Qu.: 4.710   1st Qu.: 4.720
##  Median :57.00   Median : 2401   Median : 5.700   Median : 5.710
##  Mean   :57.46   Mean   : 3931   Mean   : 5.732   Mean   : 5.735
##  3rd Qu.:59.00   3rd Qu.: 5323   3rd Qu.: 6.540   3rd Qu.: 6.540
##  Max.   :95.00   Max.   :18823   Max.   :10.740   Max.   :58.900
##
##        z
##  Min.   : 1.07
##  1st Qu.: 2.91
##  Median : 3.53
##  Mean   : 3.54
##  3rd Qu.: 4.04
##  Max.   :31.80
##
# Price against carat on the original scales, with a linear fit
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.5, color = "steelblue") +
  geom_smooth(method = "lm", se = FALSE, color = "darkred", linewidth = 1) +
  scale_y_continuous(labels = dollar_format(), limits = c(0, 20000)) +
  scale_x_continuous(breaks = seq(0, 5, 0.5)) +
  labs(
    title = "Diamond Price vs Weight (carat)",
    x = "Carat",
    y = "Price (USD)"
  ) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )
## `geom_smooth()` using formula = 'y ~ x'
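Comment: Price increases with carat weight, and the relationship is strongly positive. The spread of prices also widens as carat increases, so a single straight line fits heavier diamonds less well.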
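To put a number on the strength of this relationship, the correlation and the R-squared of the same straight-line model that `geom_smooth(method = "lm")` draws can be computed directly. This is a minimal sketch that assumes the unmodified `diamonds` data shipped with ggplot2; the names `r` and `fit` are illustrative and do not come from the original analysis.

```r
library(ggplot2)  # only needed here for the diamonds data

# Pearson correlation between weight and price
r <- cor(diamonds$carat, diamonds$price)

# Simple linear regression of price on carat
fit <- lm(price ~ carat, data = diamonds)

r
summary(fit)$r.squared  # for a one-predictor model this equals r^2
```

A correlation close to 1 supports the "strong positive" reading, although it says nothing about whether a straight line is the right shape.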
# Same variables with both axes on a log10 scale
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3, color = "darkgreen") +
  geom_smooth(method = "lm", se = FALSE, color = "black", linewidth = 1) +
  scale_x_log10(breaks = c(0.2, 0.5, 1, 2, 5)) +
  scale_y_log10(labels = dollar_format()) +
  labs(
    title = "Diamond Price vs Carat (log–log scale)",
    x = "Carat (log)",
    y = "Price (USD, log)"
  ) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )
## `geom_smooth()` using formula = 'y ~ x'
Comment: On a log-log scale the relationship looks approximately linear, which suggests that price follows a power law in carat.
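A straight line on log-log axes corresponds to price being proportional to carat raised to some power b, where b is the slope of the line. The sketch below estimates b with an ordinary linear model on the logged variables. It assumes the unmodified ggplot2 `diamonds` data and uses natural logs; the slope is the same for any log base as long as both variables use the same one. The names `fit_log` and `b` are illustrative.

```r
library(ggplot2)  # only needed here for the diamonds data

# Fit log(price) = a + b * log(carat)
fit_log <- lm(log(price) ~ log(carat), data = diamonds)

# Slope: the exponent of the implied power law, price proportional to carat^b
b <- coef(fit_log)[["log(carat)"]]
b

# Implied price multiplier when carat doubles
2^b
```

A slope above 1 would mean that price grows faster than weight, so doubling the carat more than doubles the expected price.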
# Price against depth percentage, x-axis restricted to 50-70
ggplot(diamonds, aes(x = depth, y = price)) +
  geom_point(alpha = 0.3, color = "purple") +
  geom_smooth(method = "lm", se = FALSE, color = "orange", linewidth = 1) +
  scale_x_continuous(limits = c(50, 70), breaks = seq(50, 70, 2)) +
  scale_y_continuous(labels = dollar_format(), limits = c(0, 20000)) +
  labs(
    title = "Diamond Price vs Depth Percentage",
    x = "Depth (%)",
    y = "Price (USD)"
  ) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )
## `geom_smooth()` using formula = 'y ~ x'
Comment: Depth percentage alone shows only a weak association with price.
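The weakness of this association can be checked numerically. The sketch below also counts the rows that fall outside the x-axis limits of 50 to 70 used in the plot: limits set through `scale_x_continuous()` remove out-of-range rows before `geom_smooth()` fits its line, whereas `coord_cartesian()` only zooms the view. It assumes the unmodified ggplot2 `diamonds` data, and the names `r_depth` and `n_outside` are illustrative.

```r
library(ggplot2)  # only needed here for the diamonds data

# Pearson correlation between depth percentage and price
r_depth <- cor(diamonds$depth, diamonds$price)
r_depth

# Rows outside the plot's x-axis limits; the scale drops these
# before the linear fit is computed
n_outside <- sum(diamonds$depth < 50 | diamonds$depth > 70)
n_outside
```

If the fit should use every row while the view stays zoomed, replacing the `limits` argument with `coord_cartesian(xlim = c(50, 70))` is one option.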
# Price against length (x, in mm), x-axis restricted to 3-10
ggplot(diamonds, aes(x = x, y = price)) +
  geom_point(alpha = 0.3, color = "darkcyan") +
  geom_smooth(method = "lm", se = FALSE, color = "brown", linewidth = 1) +
  scale_x_continuous(limits = c(3, 10), breaks = seq(3, 10, 1)) +
  scale_y_continuous(labels = dollar_format(), limits = c(0, 20000)) +
  labs(
    title = "Diamond Price vs Length (in mm)",
    x = "Length (mm)",
    y = "Price (USD)"
  ) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )
## `geom_smooth()` using formula = 'y ~ x'
Comment: Longer diamonds tend to have higher prices, which is expected because length (x) is closely tied to carat weight.
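One way to see why length tracks price is to compare how strongly length correlates with carat and with price. This is a rough sketch, not part of the original analysis. It assumes the ggplot2 `diamonds` data and, as a labelled assumption, drops rows whose recorded length is 0, in line with the summary above where the minimum of x is 3.73. The names `d`, `cor_x_carat` and `cor_x_price` are illustrative.

```r
library(ggplot2)  # only needed here for the diamonds data

# Assumption: rows with a recorded length of 0 are data errors and are dropped
d <- subset(diamonds, x > 0)

cor_x_carat <- cor(d$x, d$carat)  # length vs weight
cor_x_price <- cor(d$x, d$price)  # length vs price

c(x_vs_carat = cor_x_carat, x_vs_price = cor_x_price)
```

A very high correlation between length and carat would indicate that this plot largely repeats the information in the price against carat plot.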