diamond_sizes

We have data about 53940 diamonds. Only 126 are larger than 2.5 carats. The distribution of the remainder is shown in Figure 1:

Figure 1: Diamond Carat Distribution

The frequency polygon shows sharp spikes in count at specific carat values, which suggests that many diamonds are cut to those sizes, possibly because of market or industry conventions. Despite these peaks, the overall number of diamonds declines as carat increases.
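The code behind Figure 1 is not shown in the report. The following is a minimal sketch of how such a chart could be produced, assuming the `diamonds` dataset from ggplot2; the object names `smaller`, `carat_plot`, and `top_carats` and the bin width of 0.01 are illustrative assumptions, not the report's actual settings.

```r
library(ggplot2)  # provides the diamonds dataset and plotting functions
library(dplyr)

# Keep only diamonds of at most 2.5 carats (the "remainder" described above)
smaller <- diamonds |>
  filter(carat <= 2.5)

# Frequency polygon of carat; binwidth = 0.01 is an assumed value
carat_plot <- ggplot(smaller, aes(x = carat)) +
  geom_freqpoly(binwidth = 0.01) +
  labs(x = "Carat", y = "Count", title = "Diamond Carat Distribution")

# The five most frequent carat values, to see where the spikes sit
top_carats <- smaller |>
  count(carat, sort = TRUE) |>
  slice_head(n = 5)

print(carat_plot)
print(top_carats)
```

A narrow bin width keeps the spikes visible; a wider one would smooth them into the overall downward trend.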

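library(tidyverse)  # provides diamonds (ggplot2) and group_by/summarize (dplyr)
# Share of diamonds larger than 2.5 carats, printed as a percentage
percent_larger <- mean(diamonds$carat > 2.5)
cat(round(percent_larger * 100, 2), "%\n")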
0.23 %
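
The counts quoted in the opening paragraph can be recomputed in the same way. This is a small sketch; `n_total` and `n_larger` are names introduced here for illustration.

```r
library(ggplot2)  # provides the diamonds dataset

# Total number of diamonds and how many exceed 2.5 carats
n_total <- nrow(diamonds)
n_larger <- sum(diamonds$carat > 2.5)

cat(n_total, "diamonds,", n_larger, "larger than 2.5 carats\n")

# The percentage above is simply the ratio of the two counts
cat(round(n_larger / n_total * 100, 2), "%\n")
```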

Exploring diamond sizes by color, cut, and clarity

diamonds |>
  group_by(cut) |>
  summarize(median_carat = median(carat), .groups = "drop") |>
  knitr::kable()

|cut       | median_carat|
|:---------|------------:|
|Fair      |         1.00|
|Good      |         0.82|
|Very Good |         0.71|
|Premium   |         0.86|
|Ideal     |         0.54|

diamonds |>
  group_by(color) |>
  summarize(median_carat = median(carat), .groups = "drop") |>
  knitr::kable()

|color | median_carat|
|:-----|------------:|
|D     |         0.53|
|E     |         0.53|
|F     |         0.70|
|G     |         0.70|
|H     |         0.90|
|I     |         1.00|
|J     |         1.11|

diamonds |>
  group_by(clarity) |>
  summarize(median_carat = median(carat), .groups = "drop") |>
  knitr::kable()

|clarity | median_carat|
|:-------|------------:|
|I1      |         1.12|
|SI2     |         1.01|
|SI1     |         0.76|
|VS2     |         0.63|
|VS1     |         0.57|
|VVS2    |         0.44|
|VVS1    |         0.39|
|IF      |         0.35|
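
The three summaries above repeat the same pipeline with a different grouping column. One way to avoid the repetition is sketched below; the helper `median_carat_by()` is hypothetical and not part of the report. It uses dplyr's embrace operator `{{ }}` to pass the grouping column, and adds a group size `n` that the original tables do not show.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)

# Hypothetical helper: median carat (and group size) for any grouping column
median_carat_by <- function(data, group_var) {
  data |>
    group_by({{ group_var }}) |>
    summarize(median_carat = median(carat), n = n(), .groups = "drop")
}

by_cut <- median_carat_by(diamonds, cut)
by_color <- median_carat_by(diamonds, color)
by_clarity <- median_carat_by(diamonds, clarity)

print(by_cut)
print(by_color)
print(by_clarity)
```

Including `n` helps when reading the medians, since a median computed from a small group deserves less weight than one computed from a large group.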

The Largest 20 Diamonds

top20 <- diamonds |> arrange(desc(carat)) |> slice_head(n = 20)
knitr::kable(top20[, c("carat", "cut", "color", "clarity")])

| carat|cut       |color |clarity |
|-----:|:---------|:-----|:-------|
|  5.01|Fair      |J     |I1      |
|  4.50|Fair      |J     |I1      |
|  4.13|Fair      |H     |I1      |
|  4.01|Premium   |I     |I1      |
|  4.01|Premium   |J     |I1      |
|  4.00|Very Good |I     |I1      |
|  3.67|Premium   |I     |I1      |
|  3.65|Fair      |H     |I1      |
|  3.51|Premium   |J     |VS2     |
|  3.50|Ideal     |H     |I1      |
|  3.40|Fair      |D     |I1      |
|  3.24|Premium   |H     |I1      |
|  3.22|Ideal     |I     |I1      |
|  3.11|Fair      |J     |I1      |
|  3.05|Premium   |E     |I1      |
|  3.04|Very Good |I     |SI2     |
|  3.04|Premium   |I     |SI2     |
|  3.02|Fair      |I     |I1      |
|  3.01|Premium   |I     |I1      |
|  3.01|Premium   |F     |I1      |
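
The list ends on two diamonds of 3.01 carats, so the cutoff at 20 rows may split a group of tied values. The sketch below shows one way to check this with dplyr's `slice_max()`; the object names are illustrative, and whether any additional ties exist is left for the code to report rather than assumed here.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)

# Exactly 20 rows; if several diamonds tie at the cutoff, only some are kept
top20_exact <- diamonds |>
  slice_max(carat, n = 20, with_ties = FALSE)

# Keeps every diamond tied with the 20th-largest carat, so may exceed 20 rows
top20_ties <- diamonds |>
  slice_max(carat, n = 20, with_ties = TRUE)

cat("Rows without ties:", nrow(top20_exact), "\n")
cat("Rows with ties:   ", nrow(top20_ties), "\n")

# Clarity breakdown of the 20 largest diamonds
clarity_counts <- top20_exact |>
  count(clarity, sort = TRUE)
print(clarity_counts)
```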

Diamond
knitr::kable(head(diamonds, 5))
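
| carat|cut     |color |clarity | depth| table| price|    x|    y|    z|
|-----:|:-------|:-----|:-------|-----:|-----:|-----:|----:|----:|----:|
|  0.23|Ideal   |E     |SI2     |  61.5|    55|   326| 3.95| 3.98| 2.43|
|  0.21|Premium |E     |SI1     |  59.8|    61|   326| 3.89| 3.84| 2.31|
|  0.23|Good    |E     |VS1     |  56.9|    65|   327| 4.05| 4.07| 2.31|
|  0.29|Premium |I     |VS2     |  62.4|    58|   334| 4.20| 4.23| 2.63|
|  0.31|Good    |J     |SI2     |  63.3|    58|   335| 4.34| 4.35| 2.75|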